README
Transcribes audio and video files to text using OpenAI Whisper. Runs a local web server at http://localhost:8765 with file upload, concurrent chunk processing, clipboard copy, .txt download, and optional Google Sheets output.
Prerequisites
| Dependency | Install or check |
|---|---|
| Python 3.9+ | python3 --version |
| ffmpeg | brew install ffmpeg / sudo apt-get install ffmpeg |
| Python packages | pip install -r scripts/requirements.txt |
| OpenAI API key | Set OPENAI_API_KEY in environment |
Scripts
| File | Purpose |
|---|---|
| scripts/server.py | HTTP server that exposes the web UI and calls Whisper |
| scripts/transcriber.py | Chunked transcription logic using the Whisper API |
| scripts/sheets_handler.py | Optional Google Sheets output (requires credentials.json) |
| scripts/requirements.txt | Python dependency list |
| scripts/index.html | Web UI served by the server |
Quick Start
# 1. Install ffmpeg (macOS)
brew install ffmpeg
# 2. Install Python packages
pip install -r scripts/requirements.txt
# 3. Set your OpenAI API key
export OPENAI_API_KEY=sk-...
# 4. Start the server
python scripts/server.py
# 5. Open the UI
open http://localhost:8765
Google Sheets (optional)
Place a credentials.json file (Google Cloud service account or OAuth client) in the project root before starting the server. The UI exposes a sheet ID input when credentials are detected.
Runtime Compatibility — Intentional Deviation
The skill-creator standard says executable scripts must ship both a Python
and a TypeScript version (scripts/python/ + scripts/ts/). This skill
deviates by design: the scripts are a full Flask web application
(server.py, transcriber.py, sheets_handler.py, index.html), and
porting the server to Node.js would double maintenance with no runtime
benefit — the skill already runs in both Claude Code and Claude.ai
because Python is available in both. The scripts live at scripts/*.py
directly rather than under scripts/python/.
Follow-ups
No Python test suite yet. Candidate units for tests/python/:
- transcriber.py: chunk-boundary logic, overlap handling, reassembly order
- sheets_handler.py: spreadsheet URL parsing, worksheet resolution
SKILL.md
Runs a local web server with a clean UI for audio/video transcription using OpenAI Whisper.
Features: File upload · Concurrent chunk processing · Copy to clipboard · Download as .txt · Google Sheets output
When to Activate
- User wants to transcribe audio or video files (mp3, wav, m4a, mp4, webm, mov, etc.)
- User asks to convert speech to text
- User wants a podcast, meeting, or interview transcript
- User wants to save transcriptions to Google Sheets
- User needs to process large audio files with concurrent chunking
Skip When
- Real-time microphone transcription (this skill processes existing files)
- Speech synthesis / text-to-speech tasks
- Translation between languages (this skill detects the spoken language and transcribes it as-is; it does not translate)
Quick Start
1. Install system dependency
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
2. Install Python packages
pip install -r ${CLAUDE_SKILL_DIR}[[/scripts/requirements.txt]]
3. Start the server
python ${CLAUDE_SKILL_DIR}[[/scripts/server.py]]
4. Open the UI
Navigate to http://localhost:8765 in your browser.
Using the Interface
Transcription
- Enter your OpenAI API key (stored only in your browser session, never sent anywhere except OpenAI)
- Select language or leave on Auto-detect
- Drop or browse an audio/video file
- Click Transcribe — the terminal shows progress for large chunked files
- Use Copy or Download .txt to save the result
Google Sheets (optional)
To enable saving transcripts to a Google Sheet:
- Follow the one-time setup in ${CLAUDE_SKILL_DIR}[[/references/google-sheets-setup.md]]
- Click Connect Google Account (a browser tab opens for authorization)
- Paste the Spreadsheet URL and click Load sheets
- Select a worksheet → Save to Sheet
Each saved row contains: timestamp, source file, duration, cost estimate, full transcript.
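The Sheets flow above can be sketched roughly as follows. This is an illustration only, not the code in sheets_handler.py: the function names `spreadsheet_id_from_url` and `build_row` are hypothetical, and the ISO 8601 UTC timestamp format is an assumption. The only facts taken from this document are that the UI accepts a spreadsheet URL and that a row holds timestamp, source file, duration, cost estimate, and full transcript, in that order.

```python
import re
from datetime import datetime, timezone

# Google Sheets URLs carry the spreadsheet ID after "/spreadsheets/d/".
_SHEET_ID_RE = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def spreadsheet_id_from_url(url):
    """Extract the spreadsheet ID from a Google Sheets URL (hypothetical helper)."""
    match = _SHEET_ID_RE.search(url)
    if match is None:
        raise ValueError(f"not a Google Sheets URL: {url!r}")
    return match.group(1)


def build_row(source_file, duration_seconds, cost_estimate, transcript, now=None):
    """Build one row in the column order listed above (hypothetical helper).

    The timestamp format is an assumption; the real handler may differ.
    """
    now = now or datetime.now(timezone.utc)
    return [
        now.isoformat(timespec="seconds"),
        source_file,
        duration_seconds,
        cost_estimate,
        transcript,
    ]
```

A list built this way can be passed to gspread's `Worksheet.append_row`, which takes a list of cell values.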
Architecture
| File | Purpose |
|---|---|
| ${CLAUDE_SKILL_DIR}[[/scripts/server.py]] | Flask server on port 8765, all API routes |
| ${CLAUDE_SKILL_DIR}[[/scripts/transcriber.py]] | Chunking + concurrent Whisper API calls |
| ${CLAUDE_SKILL_DIR}[[/scripts/sheets_handler.py]] | gspread OAuth + spreadsheet read/write |
| ${CLAUDE_SKILL_DIR}[[/scripts/index.html]] | Single-page UI (served by Flask) |
| ${CLAUDE_SKILL_DIR}[[/scripts/requirements.txt]] | Python dependencies |
| ${CLAUDE_SKILL_DIR}[[/references/google-sheets-setup.md]] | One-time Google Cloud setup guide |
Chunking logic
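Files larger than 5 MB are split into overlapping chunks and transcribed concurrently (up to 10 parallel API requests). Chunks are reassembled in their original order. Because the chunks are processed in parallel, large files generally finish faster than they would with sequential processing; files at or below the threshold are sent in a single request.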
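A minimal sketch of the idea, under stated assumptions: `chunk_spans`, `transcribe_all`, and the `chunk_ms` parameter are hypothetical names, and transcriber.py may compute chunk boundaries differently (for example from file size rather than a fixed duration). The 1000 ms overlap and the 10-worker limit are the defaults listed in the Configuration section.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_spans(duration_ms, chunk_ms, overlap_ms=1000):
    """Return (start_ms, end_ms) spans covering the whole file.

    Every span after the first begins overlap_ms before the previous
    span ends, so a word cut at one boundary is whole in the next chunk.
    """
    if chunk_ms <= overlap_ms:
        raise ValueError("chunk_ms must be larger than overlap_ms")
    spans = []
    start = 0
    while start < duration_ms:
        end = min(start + chunk_ms, duration_ms)
        spans.append((start, end))
        if end == duration_ms:
            break
        start = end - overlap_ms
    return spans


def transcribe_all(spans, transcribe_one, max_workers=10):
    """Run transcribe_one on every span concurrently.

    Executor.map yields results in input order, so the returned list is
    already in playback order regardless of which request finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transcribe_one, spans))
```

Here `transcribe_one` stands in for whatever function sends one chunk to the Whisper API. The sketch does not remove the text duplicated in the overlap regions; that step is left out.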
Configuration
Environment variable overrides:
PORT=9000 python ${CLAUDE_SKILL_DIR}[[/scripts/server.py]]
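For illustration, a server can honor a PORT override along these lines. This is a sketch, not the code in server.py: the helper name `resolve_port` and the range check are assumptions; only the variable name PORT and the default 8765 come from this document.

```python
import os

DEFAULT_PORT = 8765  # default port stated in this document


def resolve_port(environ=os.environ):
    """Return the port from PORT, or the default when PORT is unset or blank."""
    raw = environ.get("PORT", "").strip()
    if not raw:
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```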
Code-level defaults (in transcriber.py):
| Constant | Default | Description |
|---|---|---|
| DEFAULT_MAX_DIRECT_MB | 5 | File size threshold before chunking |
| DEFAULT_CHUNK_OVERLAP_MS | 1000 | Overlap between chunks (avoids cutting words) |
| DEFAULT_MAX_WORKERS | 10 | Concurrent Whisper API requests |
| WHISPER_COST_PER_MINUTE | 0.006 | For cost estimation only |
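As a rough sketch of how two of these constants might be applied: the helper names `needs_chunking` and `estimate_cost` are hypothetical, treating 1 MB as 1024 × 1024 bytes is an assumption, and so is the rounding to four decimal places. The constant values are the defaults from the table.

```python
DEFAULT_MAX_DIRECT_MB = 5
WHISPER_COST_PER_MINUTE = 0.006


def needs_chunking(size_bytes, max_direct_mb=DEFAULT_MAX_DIRECT_MB):
    """True when the file is strictly larger than the direct-upload threshold."""
    return size_bytes > max_direct_mb * 1024 * 1024


def estimate_cost(duration_seconds, rate_per_minute=WHISPER_COST_PER_MINUTE):
    """Estimated cost: duration in minutes times the per-minute rate."""
    return round(duration_seconds / 60 * rate_per_minute, 4)
```

With the default rate, a 10-minute file comes to 10 × 0.006 = 0.06. The figure is an estimate only, as the table notes.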
Supported Formats
| Type | Extensions |
|---|---|
| Audio | mp3, wav, m4a, ogg, flac, aac, wma |
| Video | mp4, webm, mov, avi, mkv, m4v |
Video files have their audio track extracted automatically before transcription.
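One way to perform this extraction step with ffmpeg is sketched below. The helper names are hypothetical and the exact ffmpeg options used by transcriber.py are not documented here; this version uses only `-y` (overwrite the output), `-i` (input file), and `-vn` (drop the video stream), and lets ffmpeg choose the audio codec from the output file extension.

```python
import subprocess
from pathlib import Path

# Video extensions from the Supported Formats table.
VIDEO_EXTENSIONS = {".mp4", ".webm", ".mov", ".avi", ".mkv", ".m4v"}


def is_video(path):
    """True when the file extension is one of the listed video types."""
    return Path(path).suffix.lower() in VIDEO_EXTENSIONS


def ffmpeg_extract_command(src, dst):
    """Build the ffmpeg argument list that writes only the audio track to dst."""
    return ["ffmpeg", "-y", "-i", str(src), "-vn", str(dst)]


def extract_audio(src, dst):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(ffmpeg_extract_command(src, dst), check=True, capture_output=True)
    return Path(dst)
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.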
Troubleshooting
| Issue | Fix |
|---|---|
| ffmpeg not found | Install ffmpeg (see Quick Start above) |
| Module not found | Run pip install -r ${CLAUDE_SKILL_DIR}[[/scripts/requirements.txt]] |
| File too large / timeout | Reduce DEFAULT_MAX_DIRECT_MB or check network connection |
| Google Sheets auth fails | See ${CLAUDE_SKILL_DIR}[[/references/google-sheets-setup.md]] |
| Port 8765 already in use | Run with PORT=9000 python ${CLAUDE_SKILL_DIR}[[/scripts/server.py]] |