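Some studies need to hold one variable fixed in order to examine the relationship between two others: for example, stratifying by sex to assess how well antiretroviral combination therapy ("cocktail therapy") works against AIDS, or stratifying by region to examine the link between party affiliation and voting. The resulting table is called a \(2 \times 2\times k\) contingency table, a kind of three-way table. The most important tool for analyzing a \(2 \times 2\times k\) table is the Cochran-Mantel-Haenszel test.
The Cochran-Mantel-Haenszel test, named after William Cochran, Nathan Mantel, and William Haenszel, is a test based on the chi-squared distribution. In a \(2 \times 2 \times k\) table with k strata, write the counts in stratum i (i = 1, ..., k) as in the table below; the Cochran-Mantel-Haenszel statistic is then given by the formula that follows it: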
Case | Yes | No | Total |
---|---|---|---|
Case 1 | \(a_{i}\) | \(b_{i}\) | \(N_{1,i}\) |
Case 2 | \(c_{i}\) | \(d_{i}\) | \(N_{2,i}\) |
Total | \(M_{1,i}\) | \(M_{2,i}\) | \(T_{i}\) |
\[CMH=\dfrac{\left( \displaystyle\sum_{i=1}^{k}(a_{i}-\frac{N_{1,i}M_{1,i}}{T_{i}}) \right)^2}{\displaystyle\sum_{i=1}^{k} \frac{N_{1,i}N_{2,i}M_{1,i}M_{2,i}}{T_{i}^2(T_{i}-1)}}\]
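Shohei Ohtani vs. Babe Ruth: The Pitching Matchup
Two-way star Shohei Ohtani started against the Detroit Tigers on August 19, 2021, pitching 8 innings with 8 strikeouts and only 1 run allowed, while also hitting his MLB-leading 40th home run of the season, the most ever hit in a single season by a left-handed hitter for the Los Angeles Angels. Performances like this have led more and more people to compare Ohtani with the "God of Baseball," Babe Ruth. Interestingly, Ohtani throws right and bats left, whereas Ruth threw left and batted left. We can stratify each player's numbers by the handedness of the batters they faced and use the CMH test to compare them. Their career pitching data from Baseball Reference are as follows: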
Batter | Pitcher | G | AB | Getting Hit | Batter Out |
---|---|---|---|---|---|
vs. RHB | Shohei Ohtani | 29 | 271 | 70 | 201 |
vs. RHB | Babe Ruth | 57 | 1170 | 329 | 841 |
vs. LHB | Shohei Ohtani | 29 | 249 | 69 | 180 |
vs. LHB | Babe Ruth | 56 | 502 | 141 | 361 |
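We reproduce the same cross table in R. As a first step, create a data frame containing every combination of pitcher, defensive outcome, and batter handedness: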
> pitcher<-c("Shohei Ohtani", "Babe Ruth")
> defence<-c("Getting Hit", "Out")
> batter<-c("vs. RHB", "vs. LHB")
> table<-list(Pitcher=pitcher, Defence=defence, Batter=batter)
> table<-expand.grid(table)
> table
Pitcher Defence Batter
1 Shohei Ohtani Getting Hit vs. RHB
2 Babe Ruth Getting Hit vs. RHB
3 Shohei Ohtani Out vs. RHB
4 Babe Ruth Out vs. RHB
5 Shohei Ohtani Getting Hit vs. LHB
6 Babe Ruth Getting Hit vs. LHB
7 Shohei Ohtani Out vs. LHB
8 Babe Ruth Out vs. LHB
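Next, fill in the counts in the same order as the rows printed above; xtabs() then builds the contingency table (the R counterpart of a pivot table):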
> data<-c(70, 329, 201, 841, 69, 141, 180, 361)
> crosstab<-cbind(table, data)
> xtabs(data~Pitcher+Defence+Batter, crosstab)
, , Batter = vs. RHB

               Defence
Pitcher         Getting Hit Out
  Shohei Ohtani          70 201
  Babe Ruth             329 841

, , Batter = vs. LHB

               Defence
Pitcher         Getting Hit Out
  Shohei Ohtani          69 180
  Babe Ruth             141 361
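As an alternative sketch (not the approach used in this post), the same three-way table can be typed in directly with base R's array(). Because array() fills the first dimension fastest, the counts go in exactly the same order as the expand.grid() rows. The object name pitch_tab is only an illustrative choice.

```r
# Counts in column-major order: Pitcher varies fastest, then Defence, then Batter.
pitch_tab <- as.table(array(
  c(70, 329, 201, 841,    # vs. RHB: hits allowed (Ohtani, Ruth), then outs
    69, 141, 180, 361),   # vs. LHB: hits allowed (Ohtani, Ruth), then outs
  dim = c(2, 2, 2),
  dimnames = list(
    Pitcher = c("Shohei Ohtani", "Babe Ruth"),
    Defence = c("Getting Hit", "Out"),
    Batter  = c("vs. RHB", "vs. LHB")
  )
))

# ftable() flattens the two strata into a single printable table.
print(ftable(pitch_tab, row.vars = c("Batter", "Pitcher")))
```

The ftable() layout mirrors the Baseball Reference table, which makes it easy to check each count against the source.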
Cochran-Mantel-Haenszel test:
> Ohtani_Ruth_pitch<-xtabs(data~Pitcher+Defence+Batter, crosstab)
> mantelhaen.test(Ohtani_Ruth_pitch)
Mantel-Haenszel chi-squared test with continuity correction
data: Ohtani_Ruth_pitch
Mantel-Haenszel X-squared = 0.34347, df = 1, p-value = 0.5578
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
0.7420511 1.1628302
sample estimates:
common odds ratio
0.928913
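The Cochran-Mantel-Haenszel statistic follows a chi-squared distribution with 1 degree of freedom. The larger its value, the further the observed counts in the strata lie from their expected counts, and the stronger the grounds for rejecting the null hypothesis that the common odds ratio equals 1. For the pitching data the test is not significant (p = 0.5578, and the 95% confidence interval for the common odds ratio, 0.74 to 1.16, contains 1), so there is no evidence that Ohtani and Ruth differ in how often they give up hits to left- and right-handed batters. Ohtani's major-league career has only just begun and his accumulated numbers are still incomplete, but at least for now the two look like pitchers of the same caliber.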
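To connect the formula above to the R output, here is a minimal sketch that computes the statistic by hand from the pitching counts. The variable names (tab, delta, variance, and so on) are illustrative. The continuity-corrected version subtracts 0.5 from the absolute numerator before squaring, which is the variant mantelhaen.test() reports by default.

```r
# Pitching counts from the post: Pitcher x Defence x Batter.
tab <- array(
  c(70, 329, 201, 841, 69, 141, 180, 361),
  dim = c(2, 2, 2),
  dimnames = list(
    Pitcher = c("Shohei Ohtani", "Babe Ruth"),
    Defence = c("Getting Hit", "Out"),
    Batter  = c("vs. RHB", "vs. LHB")
  )
)

a  <- tab[1, 1, ]           # a_i: top-left cell of each stratum
N1 <- colSums(tab[1, , ])   # row totals N_{1,i}
N2 <- colSums(tab[2, , ])   # row totals N_{2,i}
M1 <- colSums(tab[, 1, ])   # column totals M_{1,i}
M2 <- colSums(tab[, 2, ])   # column totals M_{2,i}
Ti <- N1 + N2               # stratum totals T_i

delta    <- sum(a - N1 * M1 / Ti)                       # observed minus expected
variance <- sum(N1 * N2 * M1 * M2 / (Ti^2 * (Ti - 1)))  # denominator of the formula

cmh_plain     <- delta^2 / variance               # formula as written above
cmh_corrected <- (abs(delta) - 0.5)^2 / variance  # with continuity correction

print(c(plain = cmh_plain, corrected = cmh_corrected))
```

The corrected value should agree with the X-squared reported by mantelhaen.test(); passing correct = FALSE to that function gives the uncorrected one.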
Shohei Ohtani vs. Babe Ruth: The Hitting Matchup
Likewise, their career hitting data from Baseball Reference are as follows:
Pitcher | Hitter | G | AB | Hit | Out |
---|---|---|---|---|---|
vs. RHP | Shohei Ohtani | 339 | 881 | 371 | 510 |
vs. RHP | Babe Ruth | 1707 | 5019 | 2611 | 2408 |
vs. LHP | Shohei Ohtani | 224 | 387 | 140 | 247 |
vs. LHP | Babe Ruth | 882 | 2078 | 983 | 1095 |
Build the cross table in R from these data:
> hitter<-c("Shohei Ohtani", "Babe Ruth")
> offense<-c("Hit", "Out")
> pitcher<-c("vs. RHP", "vs. LHP")
> table<-list(Hitter=hitter, Offense=offense, Pitcher=pitcher)
> table<-expand.grid(table)
> data<-c(371, 2611, 510, 2408, 140, 983, 247, 1095)
> crosstab<-cbind(table, data)
> xtabs(data~Hitter+Offense+Pitcher, crosstab)
, , Pitcher = vs. RHP

               Offense
Hitter           Hit  Out
  Shohei Ohtani  371  510
  Babe Ruth     2611 2408

, , Pitcher = vs. LHP

               Offense
Hitter           Hit  Out
  Shohei Ohtani  140  247
  Babe Ruth      983 1095
Cochran-Mantel-Haenszel test:
> Ohtani_Ruth_hit<-xtabs(data~Hitter+Offense+Pitcher, crosstab)
> mantelhaen.test(Ohtani_Ruth_hit)
Mantel-Haenszel chi-squared test with continuity correction
data: Ohtani_Ruth_hit
Mantel-Haenszel X-squared = 45.167, df = 1, p-value = 1.81e-11
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
0.5834310 0.7441049
sample estimates:
common odds ratio
0.6588884
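The "common odds ratio" in the output above is the Mantel-Haenszel estimator, which pools the strata by weighting each stratum's cross products by its total. A minimal sketch of that calculation follows; the names hit_tab, ad, bc, and or_mh are illustrative.

```r
# Hitting counts from the post: Hitter x Offense x Pitcher.
hit_tab <- array(
  c(371, 2611, 510, 2408, 140, 983, 247, 1095),
  dim = c(2, 2, 2),
  dimnames = list(
    Hitter  = c("Shohei Ohtani", "Babe Ruth"),
    Offense = c("Hit", "Out"),
    Pitcher = c("vs. RHP", "vs. LHP")
  )
)

n_i <- apply(hit_tab, 3, sum)                    # stratum totals T_i
ad  <- hit_tab[1, 1, ] * hit_tab[2, 2, ] / n_i   # a_i * d_i / T_i
bc  <- hit_tab[1, 2, ] * hit_tab[2, 1, ] / n_i   # b_i * c_i / T_i

or_mh <- sum(ad) / sum(bc)   # Mantel-Haenszel common odds ratio
print(or_mh)
```

A value below 1 here means the first row (Shohei Ohtani) has lower odds of the first column (Hit) than the second row (Babe Ruth).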
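The Cochran-Mantel-Haenszel test is statistically significant, indicating that Ohtani's and Ruth's hitting results differ. But who is the better hitter, Ohtani or Ruth? We can split the stratified table into two \(2 \times 2\) tables and compare the two players' hit rates: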
> vsRHP<-matrix(c(371,510,2611,2408), ncol=2, byrow=TRUE, dimnames=list(c("Shohei Ohtani", "Babe Ruth"),c("Hit","Out")))
> vsLHP<-matrix(c(140,247,983,1095), ncol=2, byrow=TRUE, dimnames=list(c("Shohei Ohtani", "Babe Ruth"),c("Hit","Out")))
> prop.table(vsRHP,1)
Hit Out
Shohei Ohtani 0.4211124 0.5788876
Babe Ruth 0.5202232 0.4797768
> prop.table(vsLHP,1)
Hit Out
Shohei Ohtani 0.3617571 0.6382429
Babe Ruth 0.4730510 0.5269490
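The same row proportions can also be read straight off the three-way table without splitting it, by asking prop.table() to normalize within each hitter and stratum. This is a sketch under the assumption that the table is stored as a 2 x 2 x 2 array; hit_tab and rates are illustrative names.

```r
# Hitting counts from the post: Hitter x Offense x Pitcher.
hit_tab <- array(
  c(371, 2611, 510, 2408, 140, 983, 247, 1095),
  dim = c(2, 2, 2),
  dimnames = list(
    Hitter  = c("Shohei Ohtani", "Babe Ruth"),
    Offense = c("Hit", "Out"),
    Pitcher = c("vs. RHP", "vs. LHP")
  )
)

# margin = c(1, 3): proportions sum to 1 over Offense for each Hitter x Pitcher pair.
rates <- prop.table(hit_tab, margin = c(1, 3))

# Keep only the "Hit" slice: one hit rate per hitter and pitcher handedness.
print(round(rates[, "Hit", ], 3))
```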
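From the hit rates above, Ruth, a left-handed hitter, has the higher hit rate and the lower out rate against right-handed pitchers (0.520 vs. 0.421), and he also outhits Ohtani against left-handed pitchers (0.473 vs. 0.362). Now compare the odds ratios: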
> library(epitools)
> oddsratio(vsRHP)
$data
Hit Out Total
Shohei Ohtani 371 510 881
Babe Ruth 2611 2408 5019
Total 2982 2918 5900
$measure
NA
odds ratio with 95% C.I. estimate lower upper
Shohei Ohtani 1.0000000 NA NA
Babe Ruth 0.6710105 0.5802926 0.7752787
$p.value
NA
two-sided midp.exact fisher.exact chi.square
Shohei Ohtani NA NA NA
Babe Ruth 5.595547e-08 6.08892e-08 5.736062e-08
$correction
[1] FALSE
attr(,"method")
[1] "median-unbiased estimate & mid-p exact CI"
> oddsratio(vsLHP)
$data
Hit Out Total
Shohei Ohtani 140 247 387
Babe Ruth 983 1095 2078
Total 1123 1342 2465
$measure
NA
odds ratio with 95% C.I. estimate lower upper
Shohei Ohtani 1.0000000 NA NA
Babe Ruth 0.6317902 0.5036948 0.7897257
$p.value
NA
two-sided midp.exact fisher.exact chi.square
Shohei Ohtani NA NA NA
Babe Ruth 4.96021e-05 6.034054e-05 5.428246e-05
$correction
[1] FALSE
attr(,"method")
[1] "median-unbiased estimate & mid-p exact CI"
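For comparison, the plain sample odds ratio and a Wald confidence interval can be computed in base R without epitools. This is a different estimator from the median-unbiased estimate and mid-p exact interval shown above, so the numbers should be close but not identical. The names odds, or_hat, se_log, and ci are illustrative.

```r
vsRHP <- matrix(c(371, 510, 2611, 2408), ncol = 2, byrow = TRUE,
                dimnames = list(c("Shohei Ohtani", "Babe Ruth"), c("Hit", "Out")))

# Odds of a hit for each hitter, then their ratio (Ohtani relative to Ruth).
odds   <- vsRHP[, "Hit"] / vsRHP[, "Out"]
or_hat <- unname(odds["Shohei Ohtani"] / odds["Babe Ruth"])

# Wald interval on the log scale: log(OR) +/- z * sqrt(sum of reciprocal cell counts).
se_log <- sqrt(sum(1 / vsRHP))
ci     <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log)

print(round(c(estimate = or_hat, lower = ci[1], upper = ci[2]), 4))
```

As far as I can tell, epitools prints its estimate on the Babe Ruth row because it treats the first row as the reference group and the second column (Out) as the outcome. Ruth's odds of an out relative to Ohtani are numerically the same as Ohtani's odds of a hit relative to Ruth, which is the quantity computed here.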
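The odds ratios show that Ohtani's odds of getting a hit are only about 60% of Ruth's (roughly 0.63 to 0.67, depending on the pitcher's handedness). Still, Ruth is a retired legend while Ohtani's MLB career has only just begun; whether he will go on to produce more two-way miracles and truly surpass the God of Baseball, only time will tell.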