Skill: cantonese-transcribe

Transcribe Cantonese/Chinese phone call audio files using Gemini AI, with speaker diarization, timestamps, and sentiment analysis.

Trigger

User says things like:

"transcribe [audio file]"
"轉錄 [音訊檔]"
"transcribe this call / recording"
Also triggers on any .m4a, .mp3, or .wav file with Cantonese/Chinese content

Output Format (ALWAYS use this exact format)

轉錄：[date/time description] 電話錄音
說話者：[Speaker1]（主叫）/ [Speaker2]（被叫）
語言：粵語（廣東話）
格式：[時間] [說話者] 「內容」（情緒）
============================================================

[MM:SS] [Speaker1] 「對話內容」（情緒）
[MM:SS] [Speaker2] 「對話內容」（情緒）

============================================================
=== 爭吵摘要 ===  (or 通話摘要 if not an argument)
（100字內核心重點）

Emotion labels: 生氣 / 不滿 / 激動 / 平靜 / 擔憂 / 委屈 / 諷刺 / 無奈 / 其他
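Because the format must be followed exactly, a small checker can catch malformed lines before the transcript is saved. The following is a minimal sketch, not part of the skill itself: the names `EMOTIONS`, `LINE_RE`, and `parse_line` are hypothetical, and the regular expression is one assumed reading of the template above.

```python
import re

# Emotion labels allowed by the output format above.
EMOTIONS = {"生氣", "不滿", "激動", "平靜", "擔憂", "委屈", "諷刺", "無奈", "其他"}

# Matches: [MM:SS] [Speaker] 「content」（emotion）
# Minutes may exceed two digits for calls longer than 99 minutes.
LINE_RE = re.compile(r"^\[(\d{2,}):([0-5]\d)\] \[([^\]]+)\] 「(.*)」（([^）]+)）$")


def parse_line(line):
    """Return (offset_seconds, speaker, text, emotion), or None if malformed."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    minutes, seconds, speaker, text, emotion = match.groups()
    if emotion not in EMOTIONS:
        # Rejects unknown labels and variants such as "情緒：生氣".
        return None
    return int(minutes) * 60 + int(seconds), speaker, text, emotion
```

Lines for which `parse_line` returns `None` can be flagged for manual review rather than silently dropped.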

Workflow

Step 1: Locate & Copy File

Audio paths often contain spaces, which break unquoted shell commands. Use the find -exec cp pattern to copy the file to a space-free path under /tmp:

bash

find /search/path/ -maxdepth 3 -name "*keyword*" -exec cp {} /tmp/audio_call.m4a \;
ls -lh /tmp/audio_call.m4a
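The same locate-and-copy step can be sketched in Python, where paths are passed as values and spaces need no quoting. This is an illustrative alternative, not the skill's prescribed method: `stage_audio` is a hypothetical helper, and it copies the first match in sorted order, whereas the find command copies every match to the same destination so the last one wins.

```python
import shutil
from pathlib import Path


def stage_audio(search_root, keyword, dest):
    """Copy the first file whose name contains `keyword` to `dest`.

    Returns the destination path, or None when nothing matches.
    Assumes `keyword` contains no glob metacharacters such as * or [.
    """
    root = Path(search_root)
    for path in sorted(root.rglob(f"*{keyword}*")):
        # Mirror -maxdepth 3: at most three path components below the root.
        depth = len(path.relative_to(root).parts)
        if path.is_file() and depth <= 3:
            shutil.copyfile(path, dest)
            return Path(dest)
    return None
```

A call such as `stage_audio("/search/path", "keyword", "/tmp/audio_call.m4a")` leaves the file at the fixed path used by the later steps.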

Step 2: Upload to Gemini Files API

python

import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
uploaded = client.files.upload(
    file="/tmp/audio_call.m4a",
    config={"mime_type": "audio/mp4", "display_name": "call.m4a"}
)
print(f"URI: {uploaded.uri}  state: {uploaded.state}")
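The upload above hard-codes `audio/mp4`, which fits .m4a files only. A hedged sketch of choosing the upload config from the file extension follows. `AUDIO_MIME_TYPES` and `upload_config` are hypothetical names, and only the .m4a entry comes from this document; the .mp3 and .wav values are standard MIME types assumed to be accepted by the Files API and should be checked against the Gemini documentation.

```python
from pathlib import Path

# ".m4a" -> "audio/mp4" is taken from the upload step above.
# The other two entries are assumptions, not confirmed by this document.
AUDIO_MIME_TYPES = {
    ".m4a": "audio/mp4",
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
}


def upload_config(path):
    """Build the `config` dict for client.files.upload from a file path."""
    suffix = Path(path).suffix.lower()
    if suffix not in AUDIO_MIME_TYPES:
        raise ValueError(f"unsupported audio extension: {suffix!r}")
    return {
        "mime_type": AUDIO_MIME_TYPES[suffix],
        "display_name": Path(path).name,
    }
```

The result can be passed as `config=upload_config("/tmp/audio_call.m4a")`, and the same `mime_type` value should then be reused when building the `FileData` part.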

Step 3: Transcribe with Gemini

python

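from google.genai import types

# Reuses `client` and `uploaded` from Step 2.
# {Speaker1}/{Speaker2} are placeholders: this is a plain string, not an
# f-string, so substitute the real names before sending the prompt.
prompt = """你是粵語（廣東話）語音轉錄專家。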

這是一段電話錄音，通話雙方：
- {Speaker1}（主叫方）
- {Speaker2}（被叫方）

完整轉錄要求：
1. 說話者標籤：每句標明 [{Speaker1}] 或 [{Speaker2}]
2. 保留原始粵語（繁體中文），不翻譯
3. 時間戳：每段 MM:SS
4. 情緒：每句標明（生氣/不滿/激動/平靜/擔憂/委屈/諷刺/無奈/其他）

格式：
[MM:SS] [說話者] 「內容」（情緒）

最後：
=== 爭吵摘要 ===
（100字內說明核心爭議）"""

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        types.Content(parts=[
            types.Part(file_data=types.FileData(
                file_uri=uploaded.uri,
                mime_type="audio/mp4"
            )),
            types.Part(text=prompt)
        ])
    ]
)
print(response.text)
client.files.delete(name=uploaded.name)
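The prompt contains `{Speaker1}` and `{Speaker2}` placeholders that would otherwise be sent to the model literally. One way to fill them is sketched below. `fill_speakers` is a hypothetical helper, and the short `template` is a stand-in for the full prompt. Plain `str.replace` is used instead of `str.format` so that any other braces in the prompt are left alone.

```python
def fill_speakers(template, caller, callee):
    """Substitute the {Speaker1}/{Speaker2} placeholders with real names."""
    return template.replace("{Speaker1}", caller).replace("{Speaker2}", callee)


# Shortened stand-in for the full transcription prompt.
template = (
    "- {Speaker1}（主叫方）\n"
    "- {Speaker2}（被叫方）\n"
    "每句標明 [{Speaker1}] 或 [{Speaker2}]"
)
filled = fill_speakers(template, "David", "范文尊")
```

With the full prompt, `prompt = fill_speakers(prompt, caller, callee)` would run before the `generate_content` call.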

Step 4: Save Output

Use the Write tool (not Bash) to save the transcript; this avoids path-with-spaces issues. Save to the same folder as the original audio, e.g.: /Users/d/Desktop/John /transcript_{date}_{time}.txt
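The output path can be derived from the original audio path rather than typed by hand. The sketch below shows one way to do it. `transcript_path` is a hypothetical helper, and the `YYYYMMDD_HHMM` stamp is an assumption, since the naming pattern only specifies `{date}` and `{time}`.

```python
from datetime import datetime
from pathlib import Path


def transcript_path(audio_path, recorded_at):
    """Build transcript_{date}_{time}.txt next to the original audio file."""
    # Assumed formats: date as YYYYMMDD, time as HHMM (24-hour).
    stamp = recorded_at.strftime("%Y%m%d_%H%M")
    return Path(audio_path).parent / f"transcript_{stamp}.txt"
```

Since `pathlib` treats the folder name as data, a trailing space in a directory name such as `John ` is preserved without any quoting.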

Key Notes

Model: gemini-2.5-pro (confirmed working 2026-04-20)
Timeout: Set 300000 ms (5 min); large files take 2-3 min to process
Path spaces: NEVER use cp "path with spaces" — use find -exec cp
Broken CWD: If shell CWD is invalid, add dangerouslyDisableSandbox:true to Bash calls
File URI: Use the URI returned by upload, not a constructed URL
GEMINI_API_KEY: Available in env

David's Context

Calls with John (partner): save to /Users/d/Desktop/John /
David = 主叫, other = 被叫
John's name: 范文尊
These are argument recordings — use 爭吵摘要
