README

Transcribes audio and video files to text using OpenAI Whisper. Runs a local web server at http://localhost:8765 with file upload, concurrent chunk processing, clipboard copy, .txt download, and optional Google Sheets output.

Prerequisites

| Dependency | Install |
| --- | --- |
| Python 3.9+ | python3 --version |
| ffmpeg | brew install ffmpeg / sudo apt-get install ffmpeg |
| Python packages | pip install -r scripts/requirements.txt |
| OpenAI API key | Set OPENAI_API_KEY in environment |
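
The prerequisites above can be verified before the server starts. The following is a minimal sketch, assuming a hypothetical helper named check_prerequisites that is not part of the shipped scripts; it only checks the Python version, the presence of ffmpeg on PATH, and the OPENAI_API_KEY variable.

```python
import os
import shutil
import sys


def check_prerequisites(env=None, which=shutil.which, version=sys.version_info):
    """Return a list of problems; an empty list means everything was found.

    Hypothetical helper for illustration. The env, which, and version
    parameters exist only so the check can be exercised without touching
    the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    if tuple(version[:2]) < (3, 9):
        problems.append("Python 3.9+ is required")
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

Running the file directly prints one line per missing prerequisite and prints nothing when all of them are present.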

Scripts

| File | Purpose |
| --- | --- |
| scripts/server.py | HTTP server: exposes the web UI and calls Whisper |
| scripts/transcriber.py | Chunked transcription logic using the Whisper API |
| scripts/sheets_handler.py | Optional Google Sheets output (requires credentials.json) |
| scripts/requirements.txt | Python dependency list |
| scripts/index.html | Web UI served by the server |

Quick Start

# 1. Install ffmpeg (macOS)
brew install ffmpeg

# 2. Install Python packages
pip install -r scripts/requirements.txt

# 3. Set your OpenAI API key
export OPENAI_API_KEY=sk-...

# 4. Start the server
python scripts/server.py

# 5. Open the UI (macOS; on Linux, use xdg-open or browse to the URL)
open http://localhost:8765

Google Sheets (optional)

Place a credentials.json file (Google Cloud service account or OAuth client) in the project root before starting the server. The UI exposes a sheet ID input when credentials are detected.

Runtime Compatibility — Intentional Deviation

The skill-creator standard requires executable scripts to ship in both a Python and a TypeScript version (scripts/python/ + scripts/ts/). This skill deviates by design. The scripts form a full Flask web application (server.py, transcriber.py, sheets_handler.py, index.html), and porting the server to Node.js would double maintenance with no runtime benefit: the skill already runs in both Claude Code and Claude.ai because Python is available in both. The scripts live directly at scripts/*.py rather than under scripts/python/.

Follow-ups

No Python test suite yet. Candidate units for tests/python/:

  • transcriber.py — chunk-boundary logic, overlap handling, reassembly order
  • sheets_handler.py — spreadsheet URL parsing, worksheet resolution
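
As an illustration of what a first unit test for spreadsheet URL parsing might look like, here is a sketch in pytest style. The function name extract_spreadsheet_id and its regex are assumptions made for this example; the actual parsing code in sheets_handler.py may be named and structured differently. The only fact relied on is that Google Sheets URLs carry the spreadsheet ID in the path segment after /spreadsheets/d/.

```python
import re
from typing import Optional

# Assumed pattern: the ID is the path segment following /spreadsheets/d/.
_SHEET_ID_RE = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def extract_spreadsheet_id(url: str) -> Optional[str]:
    """Hypothetical stand-in for the URL parsing in sheets_handler.py."""
    match = _SHEET_ID_RE.search(url)
    return match.group(1) if match else None


def test_extracts_id_from_edit_url():
    url = "https://docs.google.com/spreadsheets/d/abc123_XY-z/edit#gid=0"
    assert extract_spreadsheet_id(url) == "abc123_XY-z"


def test_returns_none_for_unrelated_url():
    assert extract_spreadsheet_id("https://example.com/") is None
```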

SKILL.md

Runs a local web server with a clean UI for audio/video transcription using OpenAI Whisper.

Features: File upload · Concurrent chunk processing · Copy to clipboard · Download as .txt · Google Sheets output

When to Activate

  • User wants to transcribe audio or video files (mp3, wav, m4a, mp4, webm, mov, etc.)
  • User asks to convert speech to text
  • User wants a podcast, meeting, or interview transcript
  • User wants to save transcriptions to Google Sheets
  • User needs to process large audio files with concurrent chunking

Skip When

  • Real-time microphone transcription (this skill processes existing files)
  • Speech synthesis / text-to-speech tasks
  • Translation between languages (this skill transcribes in the detected source language; it does not translate)

Quick Start

1. Install system dependency

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

2. Install Python packages

pip install -r ${CLAUDE_SKILL_DIR}[[/scripts/requirements.txt]]

3. Start the server

python ${CLAUDE_SKILL_DIR}[[/scripts/server.py]]

4. Open the UI

Navigate to http://localhost:8765 in your browser.


Using the Interface

Transcription

  1. Enter your OpenAI API key (stored only in your browser session, never sent anywhere except OpenAI)
  2. Select language or leave on Auto-detect
  3. Drop or browse an audio/video file
  4. Click Transcribe — the terminal shows progress for large chunked files
  5. Use Copy or Download .txt to save the result

Google Sheets (optional)

To enable saving transcripts to a Google Sheet:

  1. Follow the one-time setup in ${CLAUDE_SKILL_DIR}[[/references/google-sheets-setup.md]]
  2. Click Connect Google Account — a browser tab opens for authorization
  3. Paste the Spreadsheet URL and click Load sheets
  4. Select a worksheet, then click Save to Sheet

Each saved row contains: timestamp, source file, duration, cost estimate, full transcript.
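
The row layout described above can be sketched as a plain list in column order. This is an illustration only: build_sheet_row is a hypothetical name, and the timestamp format, the duration unit (seconds), and the rounding are assumptions rather than the behavior of sheets_handler.py.

```python
from datetime import datetime, timezone


def build_sheet_row(source_file, duration_seconds, cost_usd, transcript, now=None):
    """Return one row: timestamp, source file, duration, cost estimate, transcript.

    Hypothetical helper. The ISO 8601 UTC timestamp and the rounding rules
    are illustrative choices, not taken from the shipped code.
    """
    now = now or datetime.now(timezone.utc)
    return [
        now.isoformat(timespec="seconds"),
        source_file,
        round(duration_seconds, 1),
        round(cost_usd, 4),
        transcript,
    ]
```

With gspread, a list shaped like this could be passed to Worksheet.append_row to add it as the next row of the selected worksheet.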


Architecture

| File | Purpose |
| --- | --- |
| ${CLAUDE_SKILL_DIR}[[/scripts/server.py]] | Flask server on port 8765, all API routes |
| ${CLAUDE_SKILL_DIR}[[/scripts/transcriber.py]] | Chunking + concurrent Whisper API calls |
| ${CLAUDE_SKILL_DIR}[[/scripts/sheets_handler.py]] | gspread OAuth + spreadsheet read/write |
| ${CLAUDE_SKILL_DIR}[[/scripts/index.html]] | Single-page UI (served by Flask) |
| ${CLAUDE_SKILL_DIR}[[/scripts/requirements.txt]] | Python dependencies |
| ${CLAUDE_SKILL_DIR}[[/references/google-sheets-setup.md]] | One-time Google Cloud setup guide |

Chunking logic

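Files larger than 5 MB (DEFAULT_MAX_DIRECT_MB) are split into chunks that overlap by 1 second (DEFAULT_CHUNK_OVERLAP_MS) and transcribed concurrently, with up to 10 parallel API requests (DEFAULT_MAX_WORKERS). Chunks are reassembled in their original order. Because the requests run in parallel, large files typically finish much faster than they would with sequential processing.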
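
The overlap and in-order reassembly can be sketched as follows. This is not the code in transcriber.py: plan_chunks, transcribe_all, and the fixed chunk_ms length are hypothetical, and the sketch does not remove the duplicated words that overlapping chunks can produce.

```python
from concurrent.futures import ThreadPoolExecutor

DEFAULT_CHUNK_OVERLAP_MS = 1000
DEFAULT_MAX_WORKERS = 10


def plan_chunks(total_ms, chunk_ms, overlap_ms=DEFAULT_CHUNK_OVERLAP_MS):
    """Return (start_ms, end_ms) spans covering the audio.

    Each span after the first starts overlap_ms before the previous span
    ends. chunk_ms is an assumed fixed chunk length.
    """
    if chunk_ms <= overlap_ms:
        raise ValueError("chunk_ms must be larger than overlap_ms")
    spans = []
    start = 0
    while start < total_ms:
        end = min(start + chunk_ms, total_ms)
        spans.append((start, end))
        if end == total_ms:
            break
        start = end - overlap_ms
    return spans


def transcribe_all(spans, transcribe_one, max_workers=DEFAULT_MAX_WORKERS):
    """Run transcribe_one over all spans concurrently, keeping span order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even when later
        # chunks finish before earlier ones.
        return list(pool.map(transcribe_one, spans))


if __name__ == "__main__":
    spans = plan_chunks(total_ms=25_000, chunk_ms=10_000)
    # Stand-in for a real Whisper API call on one chunk.
    texts = transcribe_all(spans, lambda span: f"[{span[0]}-{span[1]}]")
    print(" ".join(texts))
```

Using Executor.map rather than as_completed trades a little latency for simplicity: results come back already ordered, so no explicit sort by chunk index is needed.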


Configuration

Environment variable overrides:

PORT=9000 python ${CLAUDE_SKILL_DIR}[[/scripts/server.py]]
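
A PORT override of this kind is commonly read from the environment with a fallback to the default. The sketch below assumes a hypothetical resolve_port helper and a 1 to 65535 range check; server.py may read the variable differently.

```python
import os

DEFAULT_PORT = 8765


def resolve_port(env=None):
    """Return the port from PORT if set, else the default (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("PORT")
    if not raw:
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```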

Code-level defaults (in transcriber.py):

| Constant | Default | Description |
| --- | --- | --- |
| DEFAULT_MAX_DIRECT_MB | 5 | File size threshold before chunking |
| DEFAULT_CHUNK_OVERLAP_MS | 1000 | Overlap between chunks (avoids cutting words) |
| DEFAULT_MAX_WORKERS | 10 | Concurrent Whisper API requests |
| WHISPER_COST_PER_MINUTE | 0.006 | For cost estimation only |
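
To make the size threshold and the cost constant concrete, here is a small sketch of how they might be applied. The function names are hypothetical, and treating 1 MB as 1024 × 1024 bytes and rounding the cost to four decimals are assumptions, not confirmed behavior of transcriber.py.

```python
DEFAULT_MAX_DIRECT_MB = 5
WHISPER_COST_PER_MINUTE = 0.006


def needs_chunking(size_bytes, max_direct_mb=DEFAULT_MAX_DIRECT_MB):
    """True when the file exceeds the direct-upload threshold."""
    # Assumption: 1 MB is counted as 1024 * 1024 bytes.
    return size_bytes > max_direct_mb * 1024 * 1024


def estimate_cost_usd(duration_seconds):
    """Estimated cost: duration in minutes times the per-minute rate."""
    return round(duration_seconds / 60 * WHISPER_COST_PER_MINUTE, 4)
```

For example, under these assumptions a 10-minute recording is estimated at 10 × 0.006 = 0.06 USD.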

Supported Formats

| Type | Extensions |
| --- | --- |
| Audio | mp3, wav, m4a, ogg, flac, aac, wma |
| Video | mp4, webm, mov, avi, mkv, m4v |

Video files have their audio track extracted automatically before transcription.
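
The format table and the extraction step can be sketched as below. Both function names are hypothetical, and the ffmpeg invocation is an assumption about how audio could be extracted (-i names the input, -vn drops the video stream, -y overwrites the output); the shipped code may use different options.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {"mp3", "wav", "m4a", "ogg", "flac", "aac", "wma"}
VIDEO_EXTENSIONS = {"mp4", "webm", "mov", "avi", "mkv", "m4v"}


def classify_media(filename):
    """Return "audio", "video", or None based on the file extension."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return None


def audio_extraction_command(src, dst):
    """Build an ffmpeg argument list that writes only the audio track to dst."""
    return ["ffmpeg", "-y", "-i", str(src), "-vn", str(dst)]
```

The returned list is suitable for subprocess.run, which avoids shell quoting problems with file names that contain spaces.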


Troubleshooting

| Issue | Fix |
| --- | --- |
| ffmpeg not found | Install ffmpeg (see Quick Start above) |
| Module not found | Run pip install -r ${CLAUDE_SKILL_DIR}[[/scripts/requirements.txt]] |
| File too large / timeout | Reduce DEFAULT_MAX_DIRECT_MB so chunking starts at a smaller file size, or check your network connection |
| Google Sheets auth fails | See ${CLAUDE_SKILL_DIR}[[/references/google-sheets-setup.md]] |
| Port 8765 already in use | Run with PORT=9000 python ${CLAUDE_SKILL_DIR}[[/scripts/server.py]] |