README

Transcribes audio and video files to text using OpenAI Whisper. Runs a local web server at http://localhost:8765 with file upload, concurrent chunk processing, clipboard copy, .txt download, and optional Google Sheets output.

Prerequisites

Dependency	Install
Python 3.9+	Check with `python3 --version`
ffmpeg	`brew install ffmpeg` / `sudo apt-get install ffmpeg`
Python packages	`pip install -r scripts/requirements.txt`
OpenAI API key	Set `OPENAI_API_KEY` in environment
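A minimal preflight sketch, assuming you want to confirm the prerequisites above before launching the server. The `preflight` function is hypothetical and is not part of `scripts/`. It treats `OPENAI_API_KEY` as required, matching this table, even though the UI also accepts a key.

```python
import os
import shutil
import sys


def preflight():
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # The README requires Python 3.9 or newer.
    if sys.version_info < (3, 9):
        problems.append(f"Python 3.9+ required, found {sys.version.split()[0]}")
    # ffmpeg must be on PATH for audio extraction and chunking.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # This sketch checks the environment variable listed in the table.
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("MISSING:", issue)
    print("OK" if not issues else f"{len(issues)} problem(s) found")
```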

Scripts

File	Purpose
`scripts/server.py`	HTTP server — exposes the web UI and calls Whisper
`scripts/transcriber.py`	Chunked transcription logic using the Whisper API
`scripts/sheets_handler.py`	Optional Google Sheets output (requires `credentials.json`)
`scripts/requirements.txt`	Python dependency list
`scripts/index.html`	Web UI served by the server

Quick Start

# 1. Install ffmpeg (macOS)
brew install ffmpeg

# 2. Install Python packages
pip install -r scripts/requirements.txt

# 3. Set your OpenAI API key
export OPENAI_API_KEY=sk-...

# 4. Start the server
python scripts/server.py

# 5. Open the UI (macOS; on Linux use xdg-open)
open http://localhost:8765

Google Sheets (optional)

Place a credentials.json file (Google Cloud service account or OAuth client) in the project root before starting the server. The UI exposes a sheet ID input when credentials are detected.
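The README does not document how the server detects credentials, so the following is only an illustrative sketch of telling the two accepted kinds of `credentials.json` apart. `detect_credentials` is a hypothetical helper. It relies on the standard layouts of Google's downloaded JSON files: service-account keys carry `"type": "service_account"`, and OAuth client secrets nest their fields under `"installed"` or `"web"`.

```python
import json
from pathlib import Path
from typing import Optional


def detect_credentials(root: str = ".") -> Optional[str]:
    """Classify credentials.json in `root`; return None if it is absent."""
    path = Path(root) / "credentials.json"
    if not path.is_file():
        return None
    data = json.loads(path.read_text(encoding="utf-8"))
    # Service-account keys downloaded from Google Cloud carry this marker.
    if data.get("type") == "service_account":
        return "service_account"
    # OAuth client secrets keep their fields under "installed" or "web".
    if "installed" in data or "web" in data:
        return "oauth_client"
    return "unknown"
```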

Runtime Compatibility — Intentional Deviation

The skill-creator standard requires executable scripts to ship in both Python and TypeScript (scripts/python/ + scripts/ts/). This skill deviates by design. The scripts form a full Flask web application (server.py, transcriber.py, sheets_handler.py, index.html), and porting the server to Node.js would double the maintenance burden with no runtime benefit: the skill already runs in both Claude Code and Claude.ai because Python is available in both. Accordingly, the scripts live directly at scripts/*.py rather than under scripts/python/.

Follow-ups

No Python test suite yet. Candidate units for tests/python/:

transcriber.py — chunk-boundary logic, overlap handling, reassembly order
sheets_handler.py — spreadsheet URL parsing, worksheet resolution
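A sketch of what one `tests/python/` unit for spreadsheet URL parsing might look like. `parse_spreadsheet_id` is hypothetical; the real function in `sheets_handler.py` may have a different name and behavior. It assumes the standard `https://docs.google.com/spreadsheets/d/<ID>/...` URL shape and also accepts a bare ID.

```python
import re
from typing import Optional

# Spreadsheet URLs look like https://docs.google.com/spreadsheets/d/<ID>/edit#gid=0
_SHEET_ID_RE = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def parse_spreadsheet_id(url_or_id: str) -> Optional[str]:
    """Extract the spreadsheet ID from a URL, or accept a bare ID as-is."""
    text = url_or_id.strip()
    match = _SHEET_ID_RE.search(text)
    if match:
        return match.group(1)
    # A bare ID contains only URL-safe characters and no slashes.
    if re.fullmatch(r"[a-zA-Z0-9_-]+", text):
        return text
    return None


def test_parse_spreadsheet_id():
    url = "https://docs.google.com/spreadsheets/d/1AbC-d_E/edit#gid=0"
    assert parse_spreadsheet_id(url) == "1AbC-d_E"
    assert parse_spreadsheet_id("1AbC-d_E") == "1AbC-d_E"
    assert parse_spreadsheet_id("https://example.com/nope") is None
```

If `sheets_handler.py` opens sheets through gspread, `Client.open_by_url` already performs this extraction internally, so a test there would focus on error handling instead.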

SKILL.md

Runs a local web server with a clean UI for audio/video transcription using OpenAI Whisper.

Features: File upload · Concurrent chunk processing · Copy to clipboard · Download as .txt · Google Sheets output

When to Activate

User wants to transcribe audio or video files (mp3, wav, m4a, mp4, webm, mov, etc.)
User asks to convert speech to text
User wants a podcast, meeting, or interview transcript
User wants to save transcriptions to Google Sheets
User needs to process large audio files with concurrent chunking

Skip When

Real-time microphone transcription (this skill processes existing files)
Speech synthesis / text-to-speech tasks
Translation between languages (this skill transcribes speech in its original language and does not translate it)

Quick Start

1. Install system dependency

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

2. Install Python packages

pip install -r ${CLAUDE_SKILL_DIR}/scripts/requirements.txt

3. Start the server

python ${CLAUDE_SKILL_DIR}/scripts/server.py

4. Open the UI

Navigate to http://localhost:8765 in your browser.

Using the Interface

Transcription

Enter your OpenAI API key (kept only in your browser session and forwarded by the local server to OpenAI; it is not sent anywhere else)
Select language or leave on Auto-detect
Drop or browse an audio/video file
Click Transcribe — the terminal shows progress for large chunked files
Use Copy or Download .txt to save the result

Google Sheets (optional)

To enable saving transcripts to a Google Sheet:

Follow the one-time setup in ${CLAUDE_SKILL_DIR}/references/google-sheets-setup.md
Click Connect Google Account — a browser tab opens for authorization
Paste the Spreadsheet URL and click Load sheets
Select a worksheet → Save to Sheet

Each saved row contains: timestamp, source file, duration, cost estimate, full transcript.
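A minimal sketch of assembling one row in the documented column order. `build_row` is a hypothetical helper, and the UTC ISO 8601 timestamp and seconds-based duration are assumptions; the real `sheets_handler.py` formatting is not documented here.

```python
from datetime import datetime, timezone
from typing import Optional


def build_row(source_file: str, duration_s: float, cost_usd: float,
              transcript: str, now: Optional[datetime] = None) -> list:
    """One sheet row: timestamp, source file, duration, cost estimate, transcript."""
    now = now or datetime.now(timezone.utc)
    return [
        now.isoformat(timespec="seconds"),  # timestamp (UTC, ISO 8601)
        source_file,                         # source file name
        round(duration_s, 1),                # duration in seconds
        round(cost_usd, 4),                  # cost estimate in USD
        transcript,                          # full transcript
    ]
```

With gspread, a row like this would be written with `worksheet.append_row(row)`. Google Sheets caps a single cell at 50,000 characters, so very long transcripts may need splitting; whether `sheets_handler.py` handles that is not stated.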

Architecture

File	Purpose
`${CLAUDE_SKILL_DIR}/scripts/server.py`	Flask server on port 8765, all API routes
`${CLAUDE_SKILL_DIR}/scripts/transcriber.py`	Chunking + concurrent Whisper API calls
`${CLAUDE_SKILL_DIR}/scripts/sheets_handler.py`	gspread OAuth + spreadsheet read/write
`${CLAUDE_SKILL_DIR}/scripts/index.html`	Single-page UI (served by Flask)
`${CLAUDE_SKILL_DIR}/scripts/requirements.txt`	Python dependencies
`${CLAUDE_SKILL_DIR}/references/google-sheets-setup.md`	One-time Google Cloud setup guide

Chunking logic

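Files larger than 5 MB are split into chunks that overlap by 1 second and are transcribed concurrently, with up to 10 parallel Whisper API requests. The chunk transcripts are then reassembled in their original order. Because the requests run in parallel, total wall-clock time is roughly the sum of the individual request times divided by the number of workers, so long files finish much faster than with sequential processing. All three values are configurable (see Configuration).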
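A simplified sketch of the idea, assuming chunks are planned as time windows in milliseconds. `plan_chunks` and `transcribe_all` are hypothetical names, the chunk length is a free parameter, and the `transcribe` callable stands in for a real Whisper request; `transcriber.py` may slice audio differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

DEFAULT_CHUNK_OVERLAP_MS = 1000  # from the Configuration table
DEFAULT_MAX_WORKERS = 10


def plan_chunks(total_ms: int, chunk_ms: int,
                overlap_ms: int = DEFAULT_CHUNK_OVERLAP_MS) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) windows; each starts overlap_ms before the previous end."""
    if chunk_ms <= overlap_ms:
        raise ValueError("chunk_ms must exceed overlap_ms")
    windows, start = [], 0
    while start < total_ms:
        end = min(start + chunk_ms, total_ms)
        windows.append((start, end))
        if end == total_ms:
            break
        start = end - overlap_ms
    return windows


def transcribe_all(windows: List[Tuple[int, int]],
                   transcribe: Callable[[Tuple[int, int]], str],
                   max_workers: int = DEFAULT_MAX_WORKERS) -> str:
    """Run `transcribe` on every window concurrently and join results in window order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, whichever call finishes first.
        texts = list(pool.map(transcribe, windows))
    return " ".join(t.strip() for t in texts)
```

For 25 s of audio with 10 s chunks, `plan_chunks(25000, 10000)` yields `(0, 10000)`, `(9000, 19000)`, `(18000, 25000)`. The naive join keeps any words spoken inside the overlap twice; trimming them is the "overlap handling" listed under Follow-ups.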

Configuration

Environment variable overrides:

PORT=9000 python ${CLAUDE_SKILL_DIR}/scripts/server.py
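A sketch of how a `PORT` override with an 8765 fallback can be resolved. `resolve_port` is hypothetical, and the range check and loopback host are assumptions rather than documented behavior of `server.py`.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 8765


def resolve_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Return PORT from the environment, falling back to 8765."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "")
    if not raw:
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port

# In a Flask app this could feed: app.run(host="127.0.0.1", port=resolve_port())
```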

Code-level defaults (in transcriber.py):

Constant	Default	Description
`DEFAULT_MAX_DIRECT_MB`	5	File size threshold in MB; larger files are chunked
`DEFAULT_CHUNK_OVERLAP_MS`	1000	Overlap between consecutive chunks in ms (avoids cutting words)
`DEFAULT_MAX_WORKERS`	10	Maximum concurrent Whisper API requests
`WHISPER_COST_PER_MINUTE`	0.006	USD per audio minute, used only for the cost estimate
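A worked cost estimate using the default rate, assuming billing is proportional to the audio duration sent. One hour in a single pass is 60 min × $0.006 = $0.36. The sketch also counts the 1 s of audio that each chunk boundary sends twice; whether `transcriber.py` includes that in its estimate is not documented, and `estimate_cost` is a hypothetical helper.

```python
WHISPER_COST_PER_MINUTE = 0.006  # USD per audio minute (default from the table above)
DEFAULT_CHUNK_OVERLAP_MS = 1000


def estimate_cost(duration_s: float, n_chunks: int = 1,
                  overlap_ms: int = DEFAULT_CHUNK_OVERLAP_MS,
                  rate: float = WHISPER_COST_PER_MINUTE) -> float:
    """Estimated USD cost; every chunk boundary re-sends overlap_ms of audio."""
    billed_s = duration_s + max(n_chunks - 1, 0) * overlap_ms / 1000
    return round(billed_s / 60 * rate, 4)
```

Split into 13 chunks, the same hour bills 12 extra seconds (60.2 min), about $0.3612, so the overlap cost is negligible.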

Supported Formats

Type	Extensions
Audio	mp3, wav, m4a, ogg, flac, aac, wma
Video	mp4, webm, mov, avi, mkv, m4v

Video files have their audio track extracted automatically before transcription.
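A sketch of routing a file by extension and building an ffmpeg command that drops the video stream. The helper names are hypothetical, and the mono 16 kHz output is an assumption chosen to keep uploads small (the open-source Whisper models work on 16 kHz audio); `transcriber.py` may use different flags.

```python
import subprocess
from pathlib import Path

AUDIO_EXTS = {"mp3", "wav", "m4a", "ogg", "flac", "aac", "wma"}
VIDEO_EXTS = {"mp4", "webm", "mov", "avi", "mkv", "m4v"}


def media_kind(path: str) -> str:
    """Return "audio" or "video" based on the file extension."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unsupported extension: .{ext}")


def extract_audio_cmd(src: str, dst: str) -> list:
    """ffmpeg arguments: overwrite output, drop video (-vn), mono, 16 kHz."""
    return ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", "-ar", "16000", dst]


def extract_audio(src: str, dst: str) -> None:
    """Run ffmpeg; raises CalledProcessError if extraction fails."""
    subprocess.run(extract_audio_cmd(src, dst), check=True, capture_output=True)
```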

Troubleshooting

Issue	Fix
`ffmpeg not found`	Install ffmpeg (see Quick Start above)
`Module not found`	Run `pip install -r ${CLAUDE_SKILL_DIR}/scripts/requirements.txt`
File too large / timeout	Reduce `DEFAULT_MAX_DIRECT_MB` or check your network connection
Google Sheets auth fails	See `${CLAUDE_SKILL_DIR}/references/google-sheets-setup.md`
Port 8765 already in use	Run with `PORT=9000 python ${CLAUDE_SKILL_DIR}/scripts/server.py`
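To confirm the port conflict before picking another port, a small check like the following can help. `port_in_use` is a hypothetical helper that only tests whether something already accepts TCP connections on the loopback address.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for candidate in (8765, 9000):
        print(candidate, "in use" if port_in_use(candidate) else "free")
```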