Metrics

Created: 2026-06-24 · Last edited: 2026-06-25

MER: Mixed Error Rate

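Measures the error rate over the whole Chinese-English code-switched sentence, with Chinese tokenized per character and English per word. Lower is better.

R_m: reference mixed token sequence
H_m: hypothesis mixed token sequence
D(R_m, H_m): Levenshtein edit distance between them

\[ MER = \frac{D(R_m, H_m)}{\max(|R_m|, |H_m|)} \]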

Reference: 我想喝latte
[我, 想, 喝, latte]
Hypothesis: 我想喝辣椒
[我, 想, 喝, 辣, 椒]

Distance: 2
substitute latte -> 辣
insert 椒

MER = 2 / max(4, 5) = 0.4
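
The worked example above can be reproduced with a short sketch. The tokenizer is an assumption inferred from the example (one token per CJK character, one token per run of Latin letters or digits); `tokenize_mixed`, `edit_distance`, and `mer` are illustrative names, not taken from the cited papers.

```python
import re

# Assumed tokenization: each CJK character is a token, each Latin/digit run is a token.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")

def tokenize_mixed(text):
    return TOKEN_RE.findall(text)

def edit_distance(ref, hyp):
    # Standard Levenshtein distance over token lists, one DP row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1]

def mer(ref_text, hyp_text):
    ref, hyp = tokenize_mixed(ref_text), tokenize_mixed(hyp_text)
    denom = max(len(ref), len(hyp))
    # Returning 0.0 for two empty sequences is a convention chosen for this sketch.
    return edit_distance(ref, hyp) / denom if denom else 0.0

print(mer("我想喝latte", "我想喝辣椒"))  # 0.4
```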

Chen et al., 2023. Generative Error Correction for Code-Switching Speech Recognition Using Large Language Models. arXiv:2310.13013. https://arxiv.org/abs/2310.13013
Hamed et al., 2022/2023. Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition. arXiv:2211.16319. https://arxiv.org/abs/2211.16319

CER: Chinese Character Error Rate

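Scores only the Chinese characters and ignores English. Lower is better.

R_c: Chinese characters of the reference
H_c: Chinese characters of the hypothesis

\[ CER = \frac{D(R_c, H_c)}{\max(|R_c|, |H_c|)} \]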

Reference: 我想喝latte
[我, 想, 喝]
Hypothesis: 我想喝辣椒
[我, 想, 喝, 辣, 椒]

Distance: 2
insert 辣
insert 椒

CER = 2 / max(3, 5) = 0.4
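
A minimal sketch of the same computation, assuming "Chinese" means characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF); the helper names are illustrative.

```python
def chinese_chars(text):
    # Assumption: keep only characters in the basic CJK Unified Ideographs block.
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]

def edit_distance(ref, hyp):
    # Levenshtein distance over two lists of characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def cer(ref_text, hyp_text):
    ref, hyp = chinese_chars(ref_text), chinese_chars(hyp_text)
    denom = max(len(ref), len(hyp))
    return edit_distance(ref, hyp) / denom if denom else 0.0

print(cer("我想喝latte", "我想喝辣椒"))  # 0.4
```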

Hamed et al., 2022/2023. Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition. arXiv:2211.16319. https://arxiv.org/abs/2211.16319
Kadaoui et al., 2024. PolyWER: A Holistic Evaluation Framework for Code-Switched Speech Recognition. Findings of EMNLP 2024. https://aclanthology.org/2024.findings-emnlp.356/

WER: English Word Error Rate

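Scores only the English words and ignores Chinese. Lower is better.

R_{w,u}: English words of the reference for utterance u
H_{w,u}: English words of the hypothesis for utterance u

\[ WER = \frac{\sum_u D(R_{w,u}, H_{w,u})}{\max(\sum_u |R_{w,u}|, \sum_u |H_{w,u}|)} \]

For a single utterance this reduces to D(R_w, H_w) / max(|R_w|, |H_w|).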

Reference: 我想喝latte
[latte]
Hypothesis: 我想喝辣椒
[]

Distance: 1
delete latte

WER = 1 / max(1, 0) = 1
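
The sketch below aggregates over utterances, summing edit distances and lengths before dividing. It assumes English words are runs of Latin letters, digits, or apostrophes; `corpus_wer` is an illustrative name.

```python
import re

def english_words(text):
    # Assumption: an English word is a run of Latin letters, digits, or apostrophes.
    return re.findall(r"[A-Za-z0-9']+", text)

def edit_distance(ref, hyp):
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def corpus_wer(pairs):
    # pairs: iterable of (reference_text, hypothesis_text), one per utterance.
    dist = ref_len = hyp_len = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = english_words(ref_text), english_words(hyp_text)
        dist += edit_distance(ref, hyp)
        ref_len += len(ref)
        hyp_len += len(hyp)
    denom = max(ref_len, hyp_len)
    return dist / denom if denom else 0.0

print(corpus_wer([("我想喝latte", "我想喝辣椒")]))  # 1.0
```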

Hamed et al., 2022/2023. Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition. arXiv:2211.16319. https://arxiv.org/abs/2211.16319
Wan et al., 2023. New Datasets and Controllable Iterative Data Augmentation Method for Code-switching ASR Error Correction. Findings of EMNLP 2023. https://aclanthology.org/2023.findings-emnlp.543/

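PIER-En: Point-of-Interest Error Rate on English Tokens

Rather than scoring the whole sentence, PIER-En scores only the positions that matter for code-switching, here the English tokens of the reference. Lower is better.

\(I_u\): index set of the English tokens in the reference of utterance u
\(A_{I,u}\): set of English points of interest (POIs) in \(I_u\) that the alignment marks as errors; every English token in the reference counts as a POI

\[ PIER\text{-}En = \frac{\sum_u |A_{I,u}|}{\sum_u |I_u|} \]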

ref: 我 想 買 iphone case
hyp: 我 想 買 phone case

[iphone, case]
[phone, case]
1/2 = 0.5

hyp: 我 想 買 new iphone case
[_, iphone, case]
[new, iphone, case]
0/2 = 0 (the inserted "new" aligns to no reference POI, so it is not counted)
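
A sketch of both examples, under stated assumptions: the alignment comes from a plain Levenshtein backtrace, a POI counts as an error when it is substituted or deleted, and insertions are ignored, as in the second example. The PIER paper may handle details differently; `align` and `pier_en` are illustrative names.

```python
import re

def is_english(token):
    return re.fullmatch(r"[A-Za-z0-9']+", token) is not None

def align(ref, hyp):
    # Levenshtein alignment; returns a list of (op, ref_index, hyp_index).
    n, m = len(ref), len(hyp)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            ops.append(("ok" if ref[i - 1] == hyp[j - 1] else "sub", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", i - 1, None))
            i -= 1
        else:
            ops.append(("ins", None, j - 1))
            j -= 1
    return ops[::-1]

def pier_en(pairs):
    # pairs: iterable of (ref_tokens, hyp_tokens), one per utterance.
    errors = total = 0
    for ref, hyp in pairs:
        poi = {i for i, tok in enumerate(ref) if is_english(tok)}
        total += len(poi)
        # Assumption: only substitutions and deletions at POI positions count as errors.
        errors += sum(1 for op, i, _ in align(ref, hyp) if op in ("sub", "del") and i in poi)
    return errors / total if total else 0.0

ref = "我 想 買 iphone case".split()
print(pier_en([(ref, "我 想 買 phone case".split())]))      # 0.5
print(pier_en([(ref, "我 想 買 new iphone case".split())]))  # 0.0
```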

Pham et al., 2025. PIER: A Novel Metric for Evaluating What Matters in Code-Switching. arXiv:2501.09512 / ICASSP 2025. https://arxiv.org/abs/2501.09512

EnP: English Precision / EnR: English Recall

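EnP is the share of English words in the output that are correct; EnR is the share of English words in the reference that are recovered. Higher is better for both.

\[ EnP = \frac{\text{correctly aligned English words}}{\text{English words in the hypothesis}} \]

\[ EnR = \frac{\text{correctly aligned English words}}{\text{English words in the reference}} \]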

ref: 我想喝 latte
hyp: 我想喝 coffee

Reference English words: [latte]
Hypothesis English words: [coffee]

EnP = 0 / 1 = 0
EnR = 0 / 1 = 0

ref: 我想喝 latte
hyp: 我想喝 latte coffee

Reference English words: [latte]
Hypothesis English words: [latte, coffee]

EnP = 1 / 2 = 0.5
EnR = 1 / 1 = 1

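High EnR, low EnP → the LLM hallucinates extra English: nothing is missed, but words are added.
Low EnR, high EnP → the LLM is conservative and often drops English.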
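
The two examples above can be checked with the sketch below. It makes one simplifying assumption: "correctly aligned" is approximated by the longest common subsequence of the reference and hypothesis English word lists, which may differ from a full mixed-sequence alignment. `enp_enr` and `lcs_len` are illustrative names.

```python
import re

def english_words(text):
    return re.findall(r"[A-Za-z0-9']+", text)

def lcs_len(a, b):
    # Length of the longest common subsequence of two word lists.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def enp_enr(ref_text, hyp_text):
    ref, hyp = english_words(ref_text), english_words(hyp_text)
    matched = lcs_len(ref, hyp)
    # Returning 0.0 on an empty denominator is a convention chosen for this sketch.
    enp = matched / len(hyp) if hyp else 0.0
    enr = matched / len(ref) if ref else 0.0
    return enp, enr

print(enp_enr("我想喝 latte", "我想喝 coffee"))        # (0.0, 0.0)
print(enp_enr("我想喝 latte", "我想喝 latte coffee"))  # (0.5, 1.0)
```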

Over-correction rate

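Tokens that the raw ASR output had right but the LLM changed to something wrong. Lower is better.

raw_correct_tokens: number of tokens the raw ASR output got right
over_corrections: number of those tokens that the LLM changed to a wrong token

\[ \mathrm{OverCorrectionRate} = \frac{\mathrm{over\_corrections}}{\mathrm{raw\_correct\_tokens}} \]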

ref: 我想喝 latte
ASR: 我想喝 latte
LLM: 我想喝 coffee

raw_correct_tokens: 4
over_corrections: 1
1/4 = 0.25
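
A minimal sketch of this count. It assumes the reference, ASR, and LLM token sequences have equal length and are aligned position by position, as in the example; real outputs would need an alignment step first. `over_correction_rate` is an illustrative name.

```python
def over_correction_rate(ref, asr, llm):
    # Assumption: the three token lists are already aligned position by position.
    if not (len(ref) == len(asr) == len(llm)):
        raise ValueError("this sketch needs equal-length, position-aligned sequences")
    raw_correct = [i for i in range(len(ref)) if asr[i] == ref[i]]
    over_corrections = sum(1 for i in raw_correct if llm[i] != ref[i])
    return over_corrections / len(raw_correct) if raw_correct else 0.0

ref = ["我", "想", "喝", "latte"]
asr = ["我", "想", "喝", "latte"]
llm = ["我", "想", "喝", "coffee"]
print(over_correction_rate(ref, asr, llm))  # 0.25
```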

Correction precision

Of the modifications the LLM makes, the share that are improvements. Higher is better.

\[ \mathrm{CorrectionPrecision} = \frac{\text{improvements}}{\text{modifications}} \]

Correction recall

The share of tokens the raw ASR output got wrong that the LLM fixed. Higher is better.

\[ \mathrm{CorrectionRecall} = \frac{\text{improvements}}{\text{raw\_error\_tokens}} \]

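High precision, low recall → the LLM corrects conservatively and misses many errors it should fix.
Low precision, high recall → the LLM corrects aggressively but damages tokens that were already correct.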
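
A sketch of both ratios, under assumptions the notes do not spell out: sequences are equal-length and position-aligned, a modification is any position where the LLM output differs from the ASR output, and an improvement is a modification that turns a wrong token into the reference token. The example sentences are made up for illustration.

```python
def correction_precision_recall(ref, asr, llm):
    # Assumption: the three token lists are already aligned position by position.
    if not (len(ref) == len(asr) == len(llm)):
        raise ValueError("this sketch needs equal-length, position-aligned sequences")
    modifications = improvements = raw_error_tokens = 0
    for r, a, c in zip(ref, asr, llm):
        if a != r:
            raw_error_tokens += 1
        if c != a:
            modifications += 1
            if a != r and c == r:
                improvements += 1
    precision = improvements / modifications if modifications else 0.0
    recall = improvements / raw_error_tokens if raw_error_tokens else 0.0
    return precision, recall

ref = "我 想 買 iphone case".split()
asr = "我 想 賣 phone case".split()   # two raw errors: 賣, phone
llm = "我 想 買 phone cases".split()  # fixes 賣, misses phone, breaks case
print(correction_precision_recall(ref, asr, llm))  # (0.5, 0.5)
```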

ETCR

\(E_{raw}\): English tokens in the raw ASR output
\(E_{corr}\): English tokens after correction
\(d(\cdot,\cdot)\): Levenshtein edit distance
\(|E|\): number of English tokens in \(E\)

\[ ETCR = \frac{d(E_{raw}, E_{corr})}{\max(|E_{raw}|, |E_{corr}|)} \]

ASR: 我想買 iphone case
Corrected: 我想買 phone case

raw: [iphone, case]
corrected: [phone, case]
edit distance = 1, max length = 2
1/2 = 0.5
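
The example can be reproduced with a short sketch. It needs no reference transcript, only the raw and corrected outputs. English tokens are assumed to be runs of Latin letters, digits, or apostrophes; `etcr` is an illustrative name.

```python
import re

def english_tokens(text):
    return re.findall(r"[A-Za-z0-9']+", text)

def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def etcr(asr_text, corrected_text):
    e_raw, e_corr = english_tokens(asr_text), english_tokens(corrected_text)
    denom = max(len(e_raw), len(e_corr))
    # Returning 0.0 when neither side has English tokens is a convention of this sketch.
    return edit_distance(e_raw, e_corr) / denom if denom else 0.0

print(etcr("我想買 iphone case", "我想買 phone case"))  # 0.5
```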