Metrics

Created: 2026-06-24 · Last edited: 2026-06-26

MER: Mixed Error Rate

Measures the error rate over the entire mixed Chinese-English sentence; lower is better.

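\(R^{mix}\): the reference mixed token sequence (each Chinese character and each English word is one token)
\(H^{mix}\): the hypothesis mixed token sequence
\(d(\cdot,\cdot)\): Levenshtein edit distance
\(u\): index over the utterances in the evaluation set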

\[ MER = \frac{\sum_u d(R^{mix}_u, H^{mix}_u)}{\sum_u |R^{mix}_u|} \]

Reference: 我想喝latte
ref mixed token sequence: [我, 想, 喝, latte]
Hypothesis: 我想喝辣椒
hyp mixed token sequence: [我, 想, 喝, 辣, 椒]

Distance: 2
substitute latte -> 辣
insert 椒

2/4 = 0.5
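The MER definition and the worked example can be reproduced with a short corpus-level sketch. It assumes a hypothetical tokenizer `tokenize_mixed` that treats every CJK character in U+4E00 to U+9FFF as one token and every run of ASCII letters as one English word; real pipelines may normalize case, punctuation, or numbers differently, and the function names are illustrative only.

```python
import re
from typing import Iterable, List, Sequence, Tuple

# Assumed tokenizer: one token per CJK character, one token per English word.
MIXED_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+(?:'[A-Za-z]+)?")


def tokenize_mixed(text: str) -> List[str]:
    return MIXED_TOKEN.findall(text)


def levenshtein(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Token-level edit distance; substitution, insertion and deletion each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j - 1] + (r != h),  # match or substitution
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
            )
        prev = curr
    return prev[-1]


def mer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus MER: total edit distance divided by total reference tokens."""
    errors = total = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = tokenize_mixed(ref_text), tokenize_mixed(hyp_text)
        errors += levenshtein(ref, hyp)
        total += len(ref)
    return errors / total if total else 0.0


print(mer([("我想喝latte", "我想喝辣椒")]))  # 0.5
```

Because errors and reference lengths are summed over all utterances before dividing, as in the formula, long utterances weigh more than they would if per-utterance rates were averaged.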

Chen et al., 2023. Generative Error Correction for Code-Switching Speech Recognition Using Large Language Models. arXiv:2310.13013. https://arxiv.org/abs/2310.13013
Hamed et al., 2022/2023. Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition. arXiv:2211.16319. https://arxiv.org/abs/2211.16319

CER: Chinese Character Error Rate

Considers only the Chinese portion and ignores English; lower is better.

\(R^{zh}\): the reference Chinese character sequence
\(H^{zh}\): the hypothesis Chinese character sequence

\[ CER = \frac{\sum_u d(R^{zh}_u, H^{zh}_u)}{\sum_u |R^{zh}_u|} \]

Reference: 我想喝latte
[我, 想, 喝]
Hypothesis: 我想喝辣椒
[我, 想, 喝, 辣, 椒]

Distance: 2
insert 辣
insert 椒
2/3 = 0.6667
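A matching sketch for CER keeps only Chinese characters and then applies the same token-level edit distance. Treating U+4E00 to U+9FFF as "Chinese characters" is an assumption (it omits rarer CJK extension blocks), and the helper names are hypothetical; on the example it yields 2/3.

```python
import re
from typing import Iterable, List, Sequence, Tuple

# Assumption: Chinese characters = CJK Unified Ideographs U+4E00..U+9FFF.
HAN_CHAR = re.compile(r"[\u4e00-\u9fff]")


def chinese_chars(text: str) -> List[str]:
    """Keep only Chinese characters; English words and symbols are dropped."""
    return HAN_CHAR.findall(text)


def levenshtein(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Token-level edit distance; substitution, insertion and deletion each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j - 1] + (r != h), prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    return prev[-1]


def cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus CER over the Chinese-only projections of reference and hypothesis."""
    errors = total = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = chinese_chars(ref_text), chinese_chars(hyp_text)
        errors += levenshtein(ref, hyp)
        total += len(ref)
    return errors / total if total else 0.0


print(round(cer([("我想喝latte", "我想喝辣椒")]), 4))  # 0.6667
```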

Hamed et al., 2022/2023. Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition. arXiv:2211.16319. https://arxiv.org/abs/2211.16319
Kadaoui et al., 2024. PolyWER: A Holistic Evaluation Framework for Code-Switched Speech Recognition. Findings of EMNLP 2024. https://aclanthology.org/2024.findings-emnlp.356/

WER: English Word Error Rate

Considers only English and ignores Chinese; lower is better.

\(R^{en}\): the reference English word sequence
\(H^{en}\): the hypothesis English word sequence

\[ WER = \frac{\sum_u d(R^{en}_u, H^{en}_u)}{\sum_u |R^{en}_u|} \]

Reference: 我想喝latte
[latte]
Hypothesis: 我想喝辣椒
[]

Distance: 1
delete latte

1/1 = 1
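For WER the same computation runs on English words only. This sketch assumes an English word is a run of ASCII letters (optionally with one apostrophe) and applies no case or punctuation normalization; the helper names are hypothetical.

```python
import re
from typing import Iterable, List, Sequence, Tuple

# Assumption: an English word is a run of ASCII letters, e.g. "latte" or "don't".
ENGLISH_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def english_words(text: str) -> List[str]:
    """Keep only English words; Chinese characters are dropped."""
    return ENGLISH_WORD.findall(text)


def levenshtein(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Token-level edit distance; substitution, insertion and deletion each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j - 1] + (r != h), prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    return prev[-1]


def wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus WER over the English-only projections of reference and hypothesis."""
    errors = total = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = english_words(ref_text), english_words(hyp_text)
        errors += levenshtein(ref, hyp)
        total += len(ref)
    return errors / total if total else 0.0


print(wer([("我想喝latte", "我想喝辣椒")]))  # 1.0
```

An utterance with no English in the reference adds nothing to the denominator, yet any English words in its hypothesis still count as insertions in the numerator.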

Hamed et al., 2022/2023. Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition. arXiv:2211.16319. https://arxiv.org/abs/2211.16319
Wan et al., 2023. New Datasets and Controllable Iterative Data Augmentation Method for Code-switching ASR Error Correction. Findings of EMNLP 2023. https://aclanthology.org/2023.findings-emnlp.543/

PIER-En (Point-of-Interest Error Rate on English Tokens)

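Instead of the whole-sentence error rate, it looks only at the positions that matter in code-switching (here, the English tokens); lower is better.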

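\(I_u\): the index set of English tokens in the reference of utterance \(u\)
\(A_{I,u}\): the set of erroneous English POIs whose alignment positions fall in \(I_u\) (currently every English token in the reference is treated as a POI)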

\[ PIER\text{-}En = \frac{\sum_u |A_{I,u}|}{\sum_u |I_u|} \]

ref: 我 想 買 iphone case
hyp: 我 想 買 phone case

[iphone, case]
[phone, case]
iphone -> phone is a substitution at a POI
1/2 = 0.5

hyp: 我 想 買 new iphone case
[_, iphone, case]
[new, iphone, case]
new is an insertion with no reference index, so it does not fall in \(I_u\)
0/2 = 0
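PIER-En can be sketched by aligning reference and hypothesis tokens with a Levenshtein backtrace and counting only substitutions and deletions at reference positions in \(I_u\). This is a minimal sketch, not the implementation from Pham et al.; it assumes pre-tokenized input, treats every ASCII-letter token in the reference as a POI (as the note above describes), and breaks alignment ties toward substitution, which can change which token an error is attributed to.

```python
import re
from typing import Iterable, List, Optional, Sequence, Tuple

# Assumption: a token is English if it is made of ASCII letters (optionally one apostrophe).
ENGLISH_TOKEN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def align(ref: Sequence[str], hyp: Sequence[str]) -> List[Tuple[str, Optional[int]]]:
    """Levenshtein alignment as (op, ref_index) pairs in reference order.

    op is "ok", "sub", "del" or "ins"; ref_index is None for insertions.
    Ties are broken toward match/substitution, then deletion, then insertion.
    """
    n, m = len(ref), len(hyp)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
                           dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    ops: List[Tuple[str, Optional[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            ops.append(("ok" if ref[i - 1] == hyp[j - 1] else "sub", i - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", i - 1))
            i -= 1
        else:
            ops.append(("ins", None))
            j -= 1
    return ops[::-1]


def pier_en(pairs: Iterable[Tuple[List[str], List[str]]]) -> float:
    """Errors at English POIs in the reference divided by the number of English POIs."""
    poi_errors = poi_total = 0
    for ref, hyp in pairs:
        poi = {k for k, tok in enumerate(ref) if ENGLISH_TOKEN.fullmatch(tok)}  # I_u
        poi_total += len(poi)
        poi_errors += sum(1 for op, k in align(ref, hyp)
                          if op in ("sub", "del") and k in poi)  # |A_{I,u}|
    return poi_errors / poi_total if poi_total else 0.0


ref = "我 想 買 iphone case".split()
print(pier_en([(ref, "我 想 買 phone case".split())]))      # 0.5
print(pier_en([(ref, "我 想 買 new iphone case".split())]))  # 0.0
```

Insertions such as new carry no reference index, so this sketch never charges them to a POI, which matches the 0/2 result above.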

Pham et al., 2025. PIER: A Novel Metric for Evaluating What Matters in Code-Switching. arXiv:2501.09512 / ICASSP 2025. https://arxiv.org/abs/2501.09512

Over-correction Rate

The proportion of tokens that the raw ASR output got right but the LLM correction turned into errors; lower is better.

raw correct tokens: tokens in the raw ASR output that are correct relative to the reference
over-corrections: tokens that were correct in the raw ASR output but became wrong after correction

\[ OCR = \frac{\text{over-corrections}}{\text{raw correct tokens}} \]

ref: 我想喝 latte
ASR: 我想喝 latte
LLM: 我想喝 coffee

raw_correct_tokens: 4 (我, 想, 喝, latte)
over_corrections: 1 (latte -> coffee)
1/4 = 0.25
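Over-correction can be counted per reference token: align the reference with both the raw ASR output and the LLM output, and count positions that match in the first alignment but not in the second. The helpers `tokenize_mixed`, `edit_table` and `correct_positions` are hypothetical, and judging "correct" by an exact match in one Levenshtein backtrace is an assumption of this sketch.

```python
import re
from typing import Iterable, List, Sequence, Set, Tuple

# Assumed tokenizer: one token per CJK character, one token per English word.
MIXED_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+(?:'[A-Za-z]+)?")


def tokenize_mixed(text: str) -> List[str]:
    return MIXED_TOKEN.findall(text)


def edit_table(ref: Sequence[str], hyp: Sequence[str]) -> List[List[int]]:
    """Full Levenshtein DP table; table[i][j] = distance(ref[:i], hyp[:j])."""
    n, m = len(ref), len(hyp)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
                           dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp


def correct_positions(ref: Sequence[str], hyp: Sequence[str]) -> Set[int]:
    """Reference indices aligned to an identical hypothesis token."""
    dp = edit_table(ref, hyp)
    matched: Set[int] = set()
    i, j = len(ref), len(hyp)
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                matched.add(i - 1)
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1  # deletion
        else:
            j -= 1  # insertion
    return matched


def over_correction_rate(triples: Iterable[Tuple[str, str, str]]) -> float:
    """OCR over (reference, raw ASR, LLM output) triples."""
    raw_correct = over = 0
    for ref_text, asr_text, llm_text in triples:
        ref = tokenize_mixed(ref_text)
        raw_ok = correct_positions(ref, tokenize_mixed(asr_text))
        llm_ok = correct_positions(ref, tokenize_mixed(llm_text))
        raw_correct += len(raw_ok)
        over += len(raw_ok - llm_ok)  # right before correction, wrong after
    return over / raw_correct if raw_correct else 0.0


print(over_correction_rate([("我想喝 latte", "我想喝 latte", "我想喝 coffee")]))  # 0.25
```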

Correction Precision (CorP)

Of the modifications the LLM makes, the fraction that are improvements; higher is better.

beneficial edits: edits where the raw ASR was wrong and the corrected output is right
modifications: all changes the correction makes relative to the raw ASR output

\[ CorP = \frac{\text{beneficial edits}}{\text{modifications}} \]

ref: 我想買 iphone case
ASR: 我想買 phone case
LLM: 我想買 iphone cover

[我, 想, 買, iphone, case]
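[我, 想, 買, phone, case] -> ASR: 1 wrong, 4 correct
[我, 想, 買, iphone, cover] -> the LLM fixes 1 ASR error (phone -> iphone) and makes 2 modifications to the ASR output in total (phone -> iphone, case -> cover)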

beneficial edits = 1
modifications = 2
1/2 = 0.5
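A token-level approximation of CorP counts beneficial edits as reference positions that are wrong in the raw ASR alignment but correct in the LLM alignment, and counts modifications as the edit distance between the raw ASR output and the LLM output. This is a simplification of edit-based scorers such as the MaxMatch (M²) method of Dahlmeier and Ng, not a reimplementation of it; the helper names are hypothetical, and returning 0.0 when nothing was modified is an assumed convention.

```python
import re
from typing import Iterable, List, Sequence, Set, Tuple

# Assumed tokenizer: one token per CJK character, one token per English word.
MIXED_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+(?:'[A-Za-z]+)?")


def tokenize_mixed(text: str) -> List[str]:
    return MIXED_TOKEN.findall(text)


def edit_table(ref: Sequence[str], hyp: Sequence[str]) -> List[List[int]]:
    """Full Levenshtein DP table; table[i][j] = distance(ref[:i], hyp[:j])."""
    n, m = len(ref), len(hyp)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
                           dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp


def correct_positions(ref: Sequence[str], hyp: Sequence[str]) -> Set[int]:
    """Reference indices aligned to an identical hypothesis token."""
    dp = edit_table(ref, hyp)
    matched: Set[int] = set()
    i, j = len(ref), len(hyp)
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                matched.add(i - 1)
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1  # deletion
        else:
            j -= 1  # insertion
    return matched


def correction_precision(triples: Iterable[Tuple[str, str, str]]) -> float:
    """CorP over (reference, raw ASR, LLM output) triples."""
    beneficial = modifications = 0
    for ref_text, asr_text, llm_text in triples:
        ref, asr, llm = (tokenize_mixed(t) for t in (ref_text, asr_text, llm_text))
        # Wrong in raw ASR, correct after the LLM correction.
        beneficial += len(correct_positions(ref, llm) - correct_positions(ref, asr))
        # Every edit the LLM made to the raw ASR output.
        modifications += edit_table(asr, llm)[-1][-1]
    return beneficial / modifications if modifications else 0.0  # assumed convention


print(correction_precision([("我想買 iphone case", "我想買 phone case", "我想買 iphone cover")]))  # 0.5
```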

Wan et al., 2023. New Datasets and Controllable Iterative Data Augmentation Method for Code-switching ASR Error Correction. Findings of EMNLP 2023. https://aclanthology.org/2023.findings-emnlp.543/
Dahlmeier and Ng, 2012. Better Evaluation for Grammatical Error Correction. https://aclanthology.org/N12-1067/

Correction Recall (CorR)

The fraction of raw ASR errors that the correction fixes; higher is better.

beneficial edits: edits where the raw ASR was wrong and the corrected output is right
raw errors: errors in the raw ASR output relative to the reference

\[ CorR = \frac{\text{beneficial edits}}{\text{raw errors}} \]

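High CorP, low CorR → the LLM corrects conservatively and misses many errors it should have fixed.
Low CorP, high CorR → the LLM corrects aggressively but damages parts that were originally correct.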
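CorR can reuse the same token-level beneficial-edit count and divide it by the raw ASR errors, taken here (an assumption) as the edit distance between the reference and the raw ASR output. Because insertions have no reference position, removing an inserted token does not register as a beneficial edit in this approximation, so CorR can stay below 1 even when every error is fixed. The helper names are hypothetical, and returning 0.0 when the raw ASR had no errors is an assumed convention.

```python
import re
from typing import Iterable, List, Sequence, Set, Tuple

# Assumed tokenizer: one token per CJK character, one token per English word.
MIXED_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+(?:'[A-Za-z]+)?")


def tokenize_mixed(text: str) -> List[str]:
    return MIXED_TOKEN.findall(text)


def edit_table(ref: Sequence[str], hyp: Sequence[str]) -> List[List[int]]:
    """Full Levenshtein DP table; table[i][j] = distance(ref[:i], hyp[:j])."""
    n, m = len(ref), len(hyp)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
                           dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp


def correct_positions(ref: Sequence[str], hyp: Sequence[str]) -> Set[int]:
    """Reference indices aligned to an identical hypothesis token."""
    dp = edit_table(ref, hyp)
    matched: Set[int] = set()
    i, j = len(ref), len(hyp)
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                matched.add(i - 1)
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1  # deletion
        else:
            j -= 1  # insertion
    return matched


def correction_recall(triples: Iterable[Tuple[str, str, str]]) -> float:
    """CorR over (reference, raw ASR, LLM output) triples."""
    beneficial = raw_errors = 0
    for ref_text, asr_text, llm_text in triples:
        ref, asr, llm = (tokenize_mixed(t) for t in (ref_text, asr_text, llm_text))
        # Wrong in raw ASR, correct after the LLM correction.
        beneficial += len(correct_positions(ref, llm) - correct_positions(ref, asr))
        # All raw ASR errors: substitutions, deletions and insertions.
        raw_errors += edit_table(ref, asr)[-1][-1]
    return beneficial / raw_errors if raw_errors else 0.0  # assumed convention


print(correction_recall([("我想買 iphone case", "我想買 phone case", "我想買 iphone cover")]))  # 1.0
```

On the iphone case example from the CorP section this gives CorR = 1/1 = 1.0 alongside CorP = 0.5, i.e. the second pattern above: every raw error was fixed, but half of the modifications (case -> cover) were not improvements.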

Wan et al., 2023. New Datasets and Controllable Iterative Data Augmentation Method for Code-switching ASR Error Correction. Findings of EMNLP 2023. https://aclanthology.org/2023.findings-emnlp.543/
Dahlmeier and Ng, 2012. Better Evaluation for Grammatical Error Correction. https://aclanthology.org/N12-1067/

F0.5

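Combines correction precision and correction recall, favoring fewer corrections over wrong ones (\(\beta = 0.5\) weights precision more heavily than recall); higher is better.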

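High F0.5: the correction behavior is fairly reliable, with few bad edits and a reasonable ability to fix errors
Low F0.5: too many bad edits, too many missed errors, or both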

where \(P = CorP\) and \(R = CorR\)

\[ F_{0.5} = \frac{(1 + 0.5^2)PR}{0.5^2P + R} \]
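As a worked instance, take P = 0.5 from the CorP example and R = 1.0 (the only raw ASR error there, phone, was fixed, so CorR = 1/1). The helper below is a generic F-beta function written for illustration, not code from the cited papers; returning 0.0 when both P and R are 0 is an assumed convention.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


print(round(f_beta(0.5, 1.0), 4))            # 0.5556
print(round(f_beta(0.5, 1.0, beta=1.0), 4))  # 0.6667 (F1, for comparison)
```

With the same inputs F1 is 0.6667, so F0.5 penalizes the unneeded case -> cover edit more heavily, which is the "fewer corrections over wrong ones" preference described above.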

Wan et al., 2023. New Datasets and Controllable Iterative Data Augmentation Method for Code-switching ASR Error Correction. Findings of EMNLP 2023. https://aclanthology.org/2023.findings-emnlp.543/

FB (Fallback Rate)

Measures the proportion of cases in which the method ultimately does not adopt the LLM output and falls back to the raw ASR output instead.
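These notes do not give a formula for FB; one natural reading, assumed here, is the number of utterances whose final output is the raw ASR hypothesis divided by the total number of utterances. The `Decision` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Decision:
    """Hypothetical per-utterance record of which output the pipeline finally kept."""
    utt_id: str
    used_raw_asr: bool  # True if the LLM output was rejected and raw ASR was used


def fallback_rate(decisions: Sequence[Decision]) -> float:
    """Fraction of utterances that fell back to the raw ASR output."""
    if not decisions:
        return 0.0
    return sum(d.used_raw_asr for d in decisions) / len(decisions)


example = [Decision("u1", False), Decision("u2", True),
           Decision("u3", False), Decision("u4", False)]
print(fallback_rate(example))  # 0.25
```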