
Skill: cantonese-transcribe

Transcribe Cantonese/Chinese phone call audio files using Gemini AI, with speaker diarization, timestamps, and sentiment analysis.

Trigger

User says things like:

  • "transcribe [audio file]"
  • "轉錄 [音訊檔]" (Chinese for "transcribe [audio file]")
  • "transcribe this call / recording"
  • Any .m4a, .mp3, or .wav file containing Cantonese/Chinese speech
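The agent matches this trigger itself, not through code. If you want to approximate it in a script (for example, to test routing), a minimal sketch could look like the following. `should_trigger`, `TRIGGER_PHRASES`, and `AUDIO_FILE_RE` are hypothetical names, and the check cannot tell whether the audio is actually Cantonese.

```python
import re

# Hypothetical trigger matcher built from the phrases and extensions listed above
TRIGGER_PHRASES = ("transcribe", "轉錄")
AUDIO_FILE_RE = re.compile(r"\.(?:m4a|mp3|wav)\b", re.IGNORECASE)


def should_trigger(message: str) -> bool:
    """Return True if a message looks like a transcription request."""
    lowered = message.lower()
    if any(phrase in lowered for phrase in TRIGGER_PHRASES):
        return True
    # A mention of a supported audio file also qualifies; the spoken language is not checked
    return AUDIO_FILE_RE.search(message) is not None


print(should_trigger("轉錄 call.m4a"))         # True
print(should_trigger("What's the weather?"))  # False
```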

Output Format (ALWAYS use this exact format)

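```
轉錄:[date/time description] 電話錄音
說話者:[Speaker1](主叫)/ [Speaker2](被叫)
語言:粵語(廣東話)
格式:[時間] [說話者] 「內容」(情緒)
============================================================

[MM:SS] [Speaker1] 「對話內容」(情緒)
[MM:SS] [Speaker2] 「對話內容」(情緒)

============================================================
=== 爭吵摘要 ===
(100字內核心重點)
```

Use the 爭吵摘要 (argument summary) heading for arguments and 通話摘要 (call summary) otherwise. The summary states the key points in 100 characters or fewer. Header glosses: 轉錄 = transcript, 說話者 = speakers (主叫 = caller, 被叫 = callee), 語言 = language (粵語/廣東話 = Cantonese), 格式 = line format ([time] [speaker] 「content」(emotion)).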

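Emotion labels (write the Chinese label only): 生氣 (angry) / 不滿 (dissatisfied) / 激動 (agitated) / 平靜 (calm) / 擔憂 (worried) / 委屈 (aggrieved) / 諷刺 (sarcastic) / 無奈 (resigned) / 其他 (other)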
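Every line has to follow the exact format, so a small formatter and parser can catch drift before the transcript is saved. The following is a sketch. `Utterance`, `format_line`, and `parse_line` are hypothetical names. The parser also accepts the `(情緒:xxx)` form that the Step 3 prompt asks for, as well as full-width brackets and colons.

```python
import re
from typing import NamedTuple, Optional

# Emotion labels allowed by the output format
EMOTIONS = {"生氣", "不滿", "激動", "平靜", "擔憂", "委屈", "諷刺", "無奈", "其他"}

# [MM:SS] [Speaker] 「content」(emotion); ASCII or full-width brackets/colon,
# optional "情緒:" prefix inside the brackets
LINE_RE = re.compile(
    r"^\[(\d+):(\d{2})\] \[([^\]]+)\] 「(.*?)」[((](?:情緒[::])?([^))]+)[))]$"
)


class Utterance(NamedTuple):
    seconds: int
    speaker: str
    text: str
    emotion: str


def format_line(u: Utterance) -> str:
    """Render one utterance as [MM:SS] [Speaker] 「content」(emotion)."""
    if u.emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion label: {u.emotion}")
    mm, ss = divmod(u.seconds, 60)
    return f"[{mm:02d}:{ss:02d}] [{u.speaker}] 「{u.text}」({u.emotion})"


def parse_line(line: str) -> Optional[Utterance]:
    """Parse one transcript line; return None if it does not match the format."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    mm, ss, speaker, text, emotion = m.groups()
    return Utterance(int(mm) * 60 + int(ss), speaker, text, emotion.strip())
```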

Workflow

Step 1: Locate & Copy File

Audio paths often contain spaces that break shell commands. Copy the file to a space-free temporary path with the find -exec cp pattern:

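```bash
# -type f skips directories; -quit stops after the first copy so a later match can't overwrite it
find /search/path/ -maxdepth 3 -type f -name "*keyword*" -exec cp {} /tmp/audio_call.m4a \; -quit
ls -lh /tmp/audio_call.m4a
```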
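If shell quoting keeps getting in the way, you can do the same copy in Python, where paths are objects rather than shell strings. Here is a minimal sketch using a hypothetical `copy_first_match` helper. It differs from the one-liner above in two ways: it keeps the source extension (e.g. `/tmp/audio_call.mp3`), so the MIME type can be chosen per file, and it sorts matches so the pick is deterministic.

```python
import shutil
from pathlib import Path


def copy_first_match(search_root: str, keyword: str,
                     dest_dir: str = "/tmp", max_depth: int = 3) -> Path:
    """Copy the first file whose name contains `keyword` into dest_dir.

    Paths stay Path objects and never pass through a shell, so spaces
    (including a trailing space in a folder name) need no quoting.
    """
    root = Path(search_root)
    for path in sorted(root.rglob(f"*{keyword}*")):
        # Mirror find -maxdepth: a file directly under root has depth 1
        depth = len(path.relative_to(root).parts)
        if path.is_file() and depth <= max_depth:
            # Keep the original extension so the MIME type can be chosen later
            target = Path(dest_dir) / f"audio_call{path.suffix.lower()}"
            shutil.copy(path, target)
            return target
    raise FileNotFoundError(f"no file matching *{keyword}* within {max_depth} levels of {root}")
```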

Step 2: Upload to Gemini Files API

```python
import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
uploaded = client.files.upload(
    file="/tmp/audio_call.m4a",
    config={"mime_type": "audio/mp4", "display_name": "call.m4a"}
)
print(f"URI: {uploaded.uri}  state: {uploaded.state}")
```
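The upload step leaves two things implicit. First, the MIME type is fixed to `audio/mp4`, which fits `.m4a` but not `.mp3` or `.wav`. Second, a freshly uploaded file may briefly sit in the `PROCESSING` state. The sketch below assumes `audio/mp3` and `audio/wav` for those extensions (the types listed in Gemini's audio documentation). It polls `client.files.get(name=...)` until the file is no longer processing. `mime_for` and `wait_until_active` are hypothetical helpers, and the client is passed in so the helpers do not import the SDK themselves.

```python
import time
from pathlib import Path

# Assumed mapping: audio/mp4 is what this skill uses for .m4a;
# audio/mp3 and audio/wav are the types Gemini documents for those formats
MIME_BY_SUFFIX = {".m4a": "audio/mp4", ".mp3": "audio/mp3", ".wav": "audio/wav"}


def mime_for(path: str) -> str:
    """Pick the upload MIME type from the file extension."""
    suffix = Path(path).suffix.lower()
    try:
        return MIME_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported audio extension: {suffix!r}") from None


def _state_name(file) -> str:
    # file.state is an enum in google-genai; fall back to str() for plain values
    state = file.state
    return getattr(state, "name", str(state))


def wait_until_active(client, file, timeout_s: float = 300, poll_s: float = 2):
    """Poll client.files.get until the upload leaves PROCESSING."""
    deadline = time.monotonic() + timeout_s
    while _state_name(file) == "PROCESSING":
        if time.monotonic() > deadline:
            raise TimeoutError(f"{file.name} still processing after {timeout_s}s")
        time.sleep(poll_s)
        file = client.files.get(name=file.name)
    if _state_name(file) == "FAILED":
        raise RuntimeError(f"Gemini failed to process {file.name}")
    return file
```

Usage: pass `mime_for(path)` as `mime_type` in the upload config, then call `uploaded = wait_until_active(client, uploaded)` before Step 3.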

Step 3: Transcribe with Gemini

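```python
from google.genai import types

# Speaker labels substituted into the prompt (e.g. per David's Context)
speaker1 = "David"   # caller (主叫)
speaker2 = "范文尊"  # callee (被叫)

# f-string so {speaker1}/{speaker2} are actually filled in.
# The prompt stays in Cantonese: it asks for speaker labels, untranslated
# Traditional Chinese, MM:SS timestamps, a per-line emotion, and a summary.
prompt = f"""你是粵語(廣東話)語音轉錄專家。

這是一段電話錄音,通話雙方:
- {speaker1}(主叫方)
- {speaker2}(被叫方)

完整轉錄要求:
1. 說話者標籤:每句標明 [{speaker1}] 或 [{speaker2}]
2. 保留原始粵語(繁體中文),不翻譯
3. 時間戳:每段 MM:SS
4. 情緒:每句標明(生氣/不滿/激動/平靜/擔憂/委屈/諷刺/無奈)

格式:
[MM:SS] [說話者] 「內容」(情緒:xxx)

最後:
=== 爭吵摘要 ===
(100字內說明核心爭議)"""

try:
    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=[
            types.Content(role="user", parts=[
                types.Part(file_data=types.FileData(
                    file_uri=uploaded.uri,
                    mime_type="audio/mp4"
                )),
                types.Part(text=prompt)
            ])
        ]
    )
    print(response.text)
finally:
    # Delete the uploaded audio even if generation fails or times out
    client.files.delete(name=uploaded.name)
```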
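The model's reply is free text, so a quick sanity check before saving can catch out-of-order timestamps, a missing summary heading, or a summary over the 100-character limit. The following sketch uses a hypothetical `check_transcript` helper. It counts characters with whitespace removed, which is one reasonable reading of 100字.

```python
import re

TIMESTAMP_RE = re.compile(r"^\[(\d+):(\d{2})\]", re.MULTILINE)
# Summary heading line, then everything after it (the summary is the last section)
SUMMARY_RE = re.compile(r"^=== (爭吵摘要|通話摘要) ===[ \t]*\n(.+)", re.MULTILINE | re.DOTALL)


def check_transcript(text: str, max_summary_chars: int = 100) -> list[str]:
    """Return a list of format problems; an empty list means the checks passed."""
    problems = []
    seconds = [int(m) * 60 + int(s) for m, s in TIMESTAMP_RE.findall(text)]
    if not seconds:
        problems.append("no [MM:SS] lines found")
    for prev, cur in zip(seconds, seconds[1:]):
        if cur < prev:
            problems.append(f"timestamp goes backwards: {prev}s -> {cur}s")
    summary = SUMMARY_RE.search(text)
    if summary is None:
        problems.append("missing summary section (=== 爭吵摘要 === or === 通話摘要 ===)")
    else:
        # Count characters with all whitespace removed
        body = "".join(summary.group(2).split())
        if len(body) > max_summary_chars:
            problems.append(f"summary is {len(body)} chars (> {max_summary_chars})")
    return problems
```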

Step 4: Save Output

Use the Write tool (not Bash) to save the transcript; this avoids path-with-spaces issues. Save it in the same folder as the original audio, e.g. /Users/d/Desktop/John /transcript_{date}_{time}.txt (note the space before the final slash in the folder name).
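Step 4 prefers the Write tool. If you save the transcript from the Python session instead, `pathlib` keeps the folder path (including the space in `John `) intact without any shell quoting. This is a minimal sketch. `save_transcript` is hypothetical, and the `YYYYMMDD_HHMM` date/time format is an assumption, since the original only says `{date}_{time}`.

```python
from datetime import datetime
from pathlib import Path


def save_transcript(audio_path: str, transcript: str, when: datetime) -> Path:
    """Write the transcript next to the original audio as transcript_{date}_{time}.txt."""
    folder = Path(audio_path).parent  # spaces in folder names are preserved as-is
    # Assumed naming: transcript_YYYYMMDD_HHMM.txt
    out = folder / f"transcript_{when:%Y%m%d}_{when:%H%M}.txt"
    out.write_text(transcript, encoding="utf-8")  # UTF-8 for Traditional Chinese text
    return out
```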

Key Notes

  • Model: gemini-2.5-pro (confirmed working 2026-04-20)
  • Timeout: set the Bash tool timeout to 300000 ms (5 min); large files take 2 to 3 min
  • Path spaces: never pass a space-containing path to cp directly; copy it with find -exec cp (Step 1)
  • Broken CWD: if the shell's working directory is invalid, add dangerouslyDisableSandbox: true to Bash calls
  • File URI: Use the URI returned by upload, not a constructed URL
  • GEMINI_API_KEY: Available in env

David's Context

  • Calls with John (partner): save to /Users/d/Desktop/John /
  • David = 主叫 (caller), the other party = 被叫 (callee)
  • John's name: 范文尊
  • These are argument recordings, so use the 爭吵摘要 (argument summary) heading
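You can keep the context above as a small config and use it to render the header lines of the output format. This is a sketch only. `CONTEXT`, `render_header`, and `render_summary_heading` are hypothetical, and labeling the callee with the Chinese name 范文尊 rather than "John" is an assumption.

```python
# Hypothetical config mirroring David's Context
CONTEXT = {
    "caller": "David",              # 主叫 (caller)
    "callee": "范文尊",              # 被叫 (callee); John's Chinese name
    "summary_heading": "爭吵摘要",    # argument recordings
}


def render_header(description: str, ctx: dict = CONTEXT) -> str:
    """Render the four header lines of the output format."""
    return "\n".join([
        f"轉錄:{description} 電話錄音",
        f"說話者:{ctx['caller']}(主叫)/ {ctx['callee']}(被叫)",
        "語言:粵語(廣東話)",
        "格式:[時間] [說話者] 「內容」(情緒)",
    ])


def render_summary_heading(ctx: dict = CONTEXT) -> str:
    """Render the summary heading line, e.g. === 爭吵摘要 ===."""
    return f"=== {ctx['summary_heading']} ==="
```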

Read-only documentation bundle of the Med Tracker agent stack. AU compliance baked in (AHPRA + Privacy Act 1988 + Spam Act 2003).