Skill: cantonese-transcribe
Transcribe Cantonese/Chinese phone call audio files using Gemini AI, with speaker diarization, timestamps, and sentiment analysis.
Trigger
User says things like:
- "transcribe [audio file]"
- "轉錄 [音訊檔]"
- "transcribe this call / recording"
- Any .m4a, .mp3, or .wav file with Cantonese/Chinese content
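The trigger list names three extensions, while the upload step in this skill hard-codes `audio/mp4` for `.m4a`. A minimal sketch of a lookup that picks the MIME type per extension might look like the following. The helper name `audio_mime_type` is hypothetical, and the `.mp3` and `.wav` values are assumptions that should be checked against the Gemini audio documentation; only the `.m4a` entry comes from this skill.

```python
from pathlib import Path

# Extension -> MIME type. ".m4a" -> "audio/mp4" is the value this skill uses;
# the ".mp3" and ".wav" entries are assumptions to verify before relying on them.
AUDIO_MIME_TYPES = {
    ".m4a": "audio/mp4",
    ".mp3": "audio/mp3",
    ".wav": "audio/wav",
}


def audio_mime_type(path: str) -> str:
    """Return the MIME type for a supported audio file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    try:
        return AUDIO_MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported audio extension: {suffix!r}") from None
```

The returned value could be passed wherever the workflow currently writes `"audio/mp4"`.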
Output Format (ALWAYS use this exact format)
轉錄:[date/time description] 電話錄音
說話者:[Speaker1](主叫)/ [Speaker2](被叫)
語言:粵語(廣東話)
格式:[時間] [說話者] 「內容」(情緒)
============================================================
[MM:SS] [Speaker1] 「對話內容」(情緒)
[MM:SS] [Speaker2] 「對話內容」(情緒)
============================================================
=== 爭吵摘要 === (or 通話摘要 if not an argument)
(100字內核心重點)
Emotion labels: 生氣 / 不滿 / 激動 / 平靜 / 擔憂 / 委屈 / 諷刺 / 無奈 / 其他
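Because the format is meant to be exact, it can help to check each transcript line mechanically. The sketch below is one possible validator, not part of the skill itself: `parse_line` is a hypothetical helper, and the regular expression is an assumption about how strictly the format should be read. It accepts both ASCII and full-width parentheses, and tolerates an optional `情緒:` prefix inside them.

```python
import re

# The emotion labels listed in the output format.
EMOTIONS = {"生氣", "不滿", "激動", "平靜", "擔憂", "委屈", "諷刺", "無奈", "其他"}

# [MM:SS] [speaker] 「text」(emotion), with ASCII or full-width parentheses.
LINE_RE = re.compile(
    r"^\[(?P<mm>\d{2,}):(?P<ss>[0-5]\d)\] "
    r"\[(?P<speaker>[^\]]+)\] "
    r"「(?P<text>.*)」"
    r"[((](?:情緒[::])?(?P<emotion>[^()()]+)[))]$"
)


def parse_line(line):
    """Return the parsed fields of one transcript line, or None if it is malformed."""
    match = LINE_RE.match(line.strip())
    if match is None or match["emotion"] not in EMOTIONS:
        return None
    return {
        "seconds": int(match["mm"]) * 60 + int(match["ss"]),
        "speaker": match["speaker"],
        "text": match["text"],
        "emotion": match["emotion"],
    }
```

Lines that return `None` can be flagged for a manual look before the transcript is saved.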
Workflow
Step 1: Locate & Copy File
Audio paths often contain spaces, which break unquoted shell commands. Use the find -exec cp pattern to copy the file to a space-free path first:
bash
find /search/path/ -maxdepth 3 -name "*keyword*" -exec cp {} /tmp/audio_call.m4a \;
ls -lh /tmp/audio_call.m4a
Step 2: Upload to Gemini Files API
python
import os
from google import genai
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
uploaded = client.files.upload(
    file="/tmp/audio_call.m4a",
    config={"mime_type": "audio/mp4", "display_name": "call.m4a"},
)
print(f"URI: {uploaded.uri} state: {uploaded.state}")
Step 3: Transcribe with Gemini
python
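from google.genai import types
# {Speaker1} and {Speaker2} below are placeholders. The string is not an f-string,
# so substitute the real speaker names before sending the prompt.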
prompt = """你是粵語(廣東話)語音轉錄專家。
這是一段電話錄音,通話雙方:
- {Speaker1}(主叫方)
- {Speaker2}(被叫方)
完整轉錄要求:
1. 說話者標籤:每句標明 [{Speaker1}] 或 [{Speaker2}]
2. 保留原始粵語(繁體中文),不翻譯
3. 時間戳:每段 MM:SS
4. 情緒:每句標明(生氣/不滿/激動/平靜/擔憂/委屈/諷刺/無奈)
格式:
[MM:SS] [說話者] 「內容」(情緒:xxx)
最後:
=== 爭吵摘要 ===
(100字內說明核心爭議)"""
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        types.Content(parts=[
            types.Part(file_data=types.FileData(
                file_uri=uploaded.uri,
                mime_type="audio/mp4",
            )),
            types.Part(text=prompt),
        ])
    ],
)
print(response.text)
# Delete the uploaded file from Gemini once the transcript has been returned.
client.files.delete(name=uploaded.name)
Step 4: Save Output
Use the Write tool (not Bash) to save the transcript; this avoids path-with-spaces issues. Save to the same folder as the original audio, e.g.: /Users/d/Desktop/John /transcript_{date}_{time}.txt
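The target path can be derived from the original audio location rather than typed by hand. The following is a minimal sketch under stated assumptions: `transcript_path` is a hypothetical helper, and the `YYYYMMDD_HHMM` rendering of `{date}_{time}` is an assumed convention, since the skill does not specify one. It only computes the path; the file itself is still written with the Write tool.

```python
from datetime import datetime
from pathlib import Path


def transcript_path(audio_path: str, recorded_at: datetime) -> Path:
    """Build transcript_{date}_{time}.txt in the same folder as the audio file."""
    folder = Path(audio_path).parent  # spaces in folder names are preserved as-is
    stamp = recorded_at.strftime("%Y%m%d_%H%M")  # assumed date/time format
    return folder / f"transcript_{stamp}.txt"
```

Because `pathlib` never passes the path through a shell, spaces in the folder name need no quoting here.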
Key Notes
- Model: gemini-2.5-pro (confirmed working 2026-04-20)
- Timeout: Set 300000ms; large files take 2-3 min
- Path spaces: NEVER use cp "path with spaces"; use find -exec cp
- Broken CWD: If shell CWD is invalid, add dangerouslyDisableSandbox:true to Bash calls
- File URI: Use the URI returned by upload, not a constructed URL
- GEMINI_API_KEY: Available in env
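For the path-spaces note, a shell-free alternative is to locate and copy the file from Python, where paths are never word-split. This is a sketch of that idea rather than the skill's prescribed method: `copy_audio` is a hypothetical helper that approximates `find -maxdepth 3 -name "*keyword*"`, with one deliberate difference: it copies only the first match in sorted order, whereas the find command copies every match over the same destination.

```python
import shutil
from pathlib import Path


def copy_audio(search_root: str, keyword: str,
               dest: str = "/tmp/audio_call.m4a", max_depth: int = 3) -> Path:
    """Copy the first file whose name contains `keyword` to a space-free path."""
    root = Path(search_root)
    matches = sorted(
        p for p in root.rglob(f"*{keyword}*")
        # Depth is counted like find: a file directly under root has depth 1.
        if p.is_file() and len(p.relative_to(root).parts) <= max_depth
    )
    if not matches:
        raise FileNotFoundError(f"No file matching *{keyword}* under {root}")
    shutil.copy(matches[0], dest)
    return Path(dest)
```

A call such as `copy_audio("/search/path/", "keyword")` would leave the audio at `/tmp/audio_call.m4a`, ready for the upload step.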
David's Context
- Calls with John (partner): save to /Users/d/Desktop/John /
- David = 主叫 (caller), other = 被叫 (callee)
- John's name: 范文尊
- These are argument recordings; use 爭吵摘要
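The speaker roles above map onto the `{Speaker1}` and `{Speaker2}` placeholders in the Step 3 prompt. One way to fill them in is sketched below; `fill_speakers` is a hypothetical helper, and the short `template` is an abbreviated stand-in for the full prompt. Plain `str.replace` is used instead of `str.format` so that any other braces in the prompt text are left untouched.

```python
def fill_speakers(template: str, caller: str, callee: str) -> str:
    """Replace the {Speaker1}/{Speaker2} placeholders with real speaker names."""
    return template.replace("{Speaker1}", caller).replace("{Speaker2}", callee)


# Abbreviated stand-in for the Step 3 prompt; the real prompt is longer.
template = "通話雙方:\n- {Speaker1}(主叫方)\n- {Speaker2}(被叫方)"

# David is the caller (主叫); John (范文尊) is the callee (被叫).
prompt = fill_speakers(template, caller="David", callee="范文尊")
```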