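Some studies need to control for one variable while examining the relationship between two others. Examples include controlling for sex when assessing how well cocktail therapy (combination antiretroviral therapy) works against AIDS, or controlling for region when examining how party affiliation relates to voting. Such data form a \(2 \times 2 \times k\) contingency table, also called a three-way table. The standard tool for analyzing a \(2 \times 2 \times k\) table is the Cochran-Mantel-Haenszel (CMH) test.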
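The Cochran-Mantel-Haenszel test is named after William Cochran, Nathan Mantel, and William Haenszel, and it is based on the chi-square distribution. Under the null hypothesis that the row and column variables are independent within every stratum (a common odds ratio of 1), the statistic approximately follows a chi-square distribution with 1 degree of freedom. For a \(2 \times 2 \times k\) table with \(k\) strata, write the counts of the \(i\)-th stratum (\(i = 1, \dots, k\)) as in the table below. The CMH statistic is then: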
| Stratum \(i\) | Yes | No | Subtotal |
|---|---|---|---|
| Group 1 | \(a_{i}\) | \(b_{i}\) | \(N_{1,i}\) |
| Group 2 | \(c_{i}\) | \(d_{i}\) | \(N_{2,i}\) |
| Subtotal | \(M_{1,i}\) | \(M_{2,i}\) | \(T_{i}\) |
\[CMH=\dfrac{\left( \displaystyle\sum_{i=1}^{k}(a_{i}-\frac{N_{1,i}M_{1,i}}{T_{i}}) \right)^2}{\displaystyle\sum_{i=1}^{k} \frac{N_{1,i}N_{2,i}M_{1,i}M_{2,i}}{T_{i}^2(T_{i}-1)}}\]
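The following minimal R sketch makes the formula concrete by computing it term by term for a 2 × 2 × k array (rows × columns × strata). The function name `cmh_stat` is hypothetical and is not part of any package. The formula above has no continuity correction. Base R's `mantelhaen.test()` applies one by default: it subtracts 0.5 from the absolute value of the numerator sum before squaring. The optional `correct` argument mimics that behavior. The example counts are the pitching data used later in this post.

```r
# Hypothetical helper: CMH statistic for a 2 x 2 x k array,
# computed term by term from the formula above.
cmh_stat <- function(x, correct = FALSE) {
  stopifnot(length(dim(x)) == 3, all(dim(x)[1:2] == 2))
  a  <- x[1, 1, ]                 # a_i
  N1 <- x[1, 1, ] + x[1, 2, ]     # N_{1,i}: first-row subtotal
  N2 <- x[2, 1, ] + x[2, 2, ]     # N_{2,i}: second-row subtotal
  M1 <- x[1, 1, ] + x[2, 1, ]     # M_{1,i}: first-column subtotal
  M2 <- x[1, 2, ] + x[2, 2, ]     # M_{2,i}: second-column subtotal
  Ti <- N1 + N2                   # T_i: stratum total
  delta <- sum(a - N1 * M1 / Ti)                    # numerator before squaring
  v     <- sum(N1 * N2 * M1 * M2 / (Ti^2 * (Ti - 1)))  # denominator
  # Optional continuity correction, as in mantelhaen.test(correct = TRUE)
  yates <- if (correct && abs(delta) >= 0.5) 0.5 else 0
  stat  <- (abs(delta) - yates)^2 / v
  c(statistic = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

# Pitching counts, filled column-major into [pitcher, outcome, batter]
x <- array(c(70, 329, 201, 841, 69, 141, 180, 361), dim = c(2, 2, 2))
cmh_stat(x)                   # the formula exactly as written
cmh_stat(x, correct = TRUE)   # should agree with mantelhaen.test(x)
```

With a few thousand observations per stratum the correction barely matters. It is only worth switching off when you want the textbook statistic exactly as written above.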
Shohei Ohtani vs. Babe Ruth: Pitching Matchup
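On August 19, 2021, two-way star Shohei Ohtani started against the Detroit Tigers. He allowed just 1 run with 8 strikeouts over 8 innings. In the same game he hit his 40th home run, the most in MLB that season, which set the Los Angeles Angels' single-season home run record for a left-handed hitter. Performances like this have led more and more people to compare Ohtani with Babe Ruth, the "god of baseball". Interestingly, Ohtani throws right and bats left, while Ruth threw left and batted left. We can therefore stratify each player's record by the handedness of the opposing batter and compare them with the CMH test. Their career pitching splits from Baseball Reference are as follows: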
| Batter | Pitcher | G | AB | Getting Hit | Batter Out |
|---|---|---|---|---|---|
| vs. RHB | Shohei Ohtani | 29 | 271 | 70 | 201 |
| vs. RHB | Babe Ruth | 57 | 1170 | 329 | 841 |
| vs. LHB | Shohei Ohtani | 29 | 249 | 69 | 180 |
| vs. LHB | Babe Ruth | 56 | 502 | 141 | 361 |
> pitcher<-c("Shohei Ohtani", "Babe Ruth")
> defence<-c("Getting Hit", "Out")
> batter<-c("vs. RHB", "vs. LHB")
> table<-list(Pitcher=pitcher, Defence=defence, Batter=batter)
> table<-expand.grid(table)
> table
Pitcher Defence Batter
1 Shohei Ohtani Getting Hit vs. RHB
2 Babe Ruth Getting Hit vs. RHB
3 Shohei Ohtani Out vs. RHB
4 Babe Ruth Out vs. RHB
5 Shohei Ohtani Getting Hit vs. LHB
6 Babe Ruth Getting Hit vs. LHB
7 Shohei Ohtani Out vs. LHB
8 Babe Ruth Out vs. LHB
> data<-c(70, 329, 201, 841, 69, 141, 180, 361)
> crosstab<-cbind(table, data)
> xtabs(data~Pitcher+Defence+Batter, crosstab)
, , Batter = vs. RHB
Pitcher Getting Hit Out
Shohei Ohtani 70 201
Babe Ruth 329 841
, , Batter = vs. LHB
Pitcher Getting Hit Out
Shohei Ohtani 69 180
Babe Ruth 141 361
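The `expand.grid()` + `xtabs()` route works, but the same three-way table can also be typed in directly with `array()` and named `dimnames`. This is a minimal sketch that assumes you want the same level order as above. `array()` fills values with the first dimension varying fastest, exactly like `expand.grid()`, so the count vector is identical to `data`. The name `pitch_arr` is only illustrative.

```r
# Same pitching table built directly; values fill [Pitcher, Defence, Batter]
# with Pitcher varying fastest, then Defence, then Batter.
pitch_arr <- array(
  c(70, 329, 201, 841, 69, 141, 180, 361),
  dim = c(2, 2, 2),
  dimnames = list(
    Pitcher = c("Shohei Ohtani", "Babe Ruth"),
    Defence = c("Getting Hit", "Out"),
    Batter  = c("vs. RHB", "vs. LHB")
  )
)
pitch_arr["Babe Ruth", "Out", "vs. LHB"]   # 361
mantelhaen.test(pitch_arr)                 # same test as on the xtabs object
```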
> Ohtani_Ruth_pitch<-xtabs(data~Pitcher+Defence+Batter, crosstab)
> mantelhaen.test(Ohtani_Ruth_pitch)
Mantel-Haenszel chi-squared test with continuity correction
data: Ohtani_Ruth_pitch
Mantel-Haenszel X-squared = 0.34347, df = 1, p-value = 0.5578
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
0.7420511 1.1628302
sample estimates:
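common odds ratio

The CMH test is not significant (p = 0.5578), and the 95% confidence interval for the common odds ratio (0.742 to 1.163) contains 1. Once batter handedness is taken into account, the data give no evidence that Ohtani and Ruth differ in how often they allowed hits.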
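The point estimate of the common odds ratio can be reproduced by hand. `mantelhaen.test()` reports the Mantel-Haenszel estimator \(\hat{\psi}_{MH} = \sum_i (a_i d_i / T_i) \big/ \sum_i (b_i c_i / T_i)\). The sketch below computes it for the pitching table. It also prints each stratum's own odds ratio \(a_i d_i / (b_i c_i)\), because a single common odds ratio is only a sensible summary when the stratum odds ratios are roughly similar. The helper name `mh_or` is hypothetical.

```r
# Hypothetical helper: Mantel-Haenszel common odds ratio for a 2 x 2 x k array
mh_or <- function(x) {
  n11 <- x[1, 1, ]; n12 <- x[1, 2, ]   # a_i, b_i
  n21 <- x[2, 1, ]; n22 <- x[2, 2, ]   # c_i, d_i
  Ti  <- n11 + n12 + n21 + n22         # T_i
  sum(n11 * n22 / Ti) / sum(n12 * n21 / Ti)
}

# Pitching counts in [Pitcher, Defence, Batter] order (stratum 1 = vs. RHB)
x <- array(c(70, 329, 201, 841, 69, 141, 180, 361), dim = c(2, 2, 2))

# Odds ratio within each stratum: about 0.89 (vs. RHB) and 0.98 (vs. LHB)
apply(x, 3, function(s) (s[1, 1] * s[2, 2]) / (s[1, 2] * s[2, 1]))

mh_or(x)                               # pooled estimate, about 0.93
unname(mantelhaen.test(x)$estimate)    # should be the same number
```

The two stratum estimates are close to each other and to the pooled value, so summarizing them with one common odds ratio looks reasonable here. A formal homogeneity check, such as the Breslow-Day test, is available in add-on packages rather than in base R.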
Shohei Ohtani vs. Babe Ruth: Hitting Matchup
As before, the two players' career batting splits from Baseball Reference are:
| Pitcher | Hitter | G | AB | Hit | Out |
|---|---|---|---|---|---|
| vs. RHP | Shohei Ohtani | 339 | 881 | 371 | 510 |
| vs. RHP | Babe Ruth | 1707 | 5019 | 2611 | 2408 |
| vs. LHP | Shohei Ohtani | 224 | 387 | 140 | 247 |
| vs. LHP | Babe Ruth | 882 | 2078 | 983 | 1095 |
> hitter<-c("Shohei Ohtani", "Babe Ruth")
> offense<-c("Hit", "Out")
> pitcher<-c("vs. RHP", "vs. LHP")
> table<-list(Hitter=hitter, Offense=offense, Pitcher=pitcher)
> table<-expand.grid(table)
> data<-c(371, 2611, 510, 2408, 140, 983, 247, 1095)
> crosstab<-cbind(table, data)
> xtabs(data~Hitter+Offense+Pitcher, crosstab)
, , Pitcher = vs. RHP
Hitter Hit Out
Shohei Ohtani 371 510
Babe Ruth 2611 2408
, , Pitcher = vs. LHP
Hitter Hit Out
Shohei Ohtani 140 247
Babe Ruth 983 1095
> Ohtani_Ruth_hit<-xtabs(data~Hitter+Offense+Pitcher, crosstab)
> mantelhaen.test(Ohtani_Ruth_hit)
Mantel-Haenszel chi-squared test with continuity correction
data: Ohtani_Ruth_hit
Mantel-Haenszel X-squared = 45.167, df = 1, p-value = 1.81e-11
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
0.5834310 0.7441049
sample estimates:
common odds ratio
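Which player has the edge depends on how the common odds ratio is oriented. With Ohtani in the first row and Hit in the first column, the estimate is Ohtani's odds of a Hit divided by Ruth's. The small sketch below reuses the hitting counts from the table above. Swapping the row order makes `mantelhaen.test()` report the reciprocal, that is, Ruth's odds of a Hit relative to Ohtani's. The test statistic and p-value are unchanged. The object names are only illustrative.

```r
# Hitting counts in [Hitter, Offense, Pitcher] order, as in the xtabs output
hit_arr <- array(c(371, 2611, 510, 2408, 140, 983, 247, 1095), dim = c(2, 2, 2),
                 dimnames = list(Hitter  = c("Shohei Ohtani", "Babe Ruth"),
                                 Offense = c("Hit", "Out"),
                                 Pitcher = c("vs. RHP", "vs. LHP")))

ohtani_vs_ruth <- mantelhaen.test(hit_arr)
ruth_vs_ohtani <- mantelhaen.test(hit_arr[2:1, , ])  # Ruth moved to the first row

ohtani_vs_ruth$estimate        # below 1: Ohtani's odds of a Hit are lower
ruth_vs_ohtani$estimate        # the reciprocal, above 1
1 / ohtani_vs_ruth$conf.int    # Ruth-vs-Ohtani CI, endpoints in reverse order
```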
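The Cochran-Mantel-Haenszel test is statistically significant (p = 1.81e-11), so Ohtani's and Ruth's hitting records differ even after stratifying by pitcher handedness. Which of the two hit better? The rows are ordered Ohtani then Ruth and the columns Hit then Out, so the common odds ratio compares Ohtani's odds of a Hit with Ruth's. Its 95% confidence interval (0.583 to 0.744) lies entirely below 1, which points to Ruth. To see this stratum by stratum, we can split the stratified table into two \(2 \times 2\) tables and compare the two players' hit rates (the share of Hit among AB):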
> vsRHP<-matrix(c(371,510,2611,2408), ncol=2, byrow=TRUE, dimnames=list(c("Shohei Ohtani", "Babe Ruth"),c("Hit","Out")))
> vsLHP<-matrix(c(140,247,983,1095), ncol=2, byrow=TRUE, dimnames=list(c("Shohei Ohtani", "Babe Ruth"),c("Hit","Out")))
> prop.table(vsRHP,1)
Hit Out
Shohei Ohtani 0.4211124 0.5788876
Babe Ruth 0.5202232 0.4797768
> prop.table(vsLHP,1)
Hit Out
Shohei Ohtani 0.3617571 0.6382429
Babe Ruth 0.4730510 0.5269490
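The row proportions above can be turned into odds, \(p / (1 - p)\). The ratio of the two players' odds then reproduces the sample odds ratio of each \(2 \times 2\) table, the cross-product \(ad / bc\). The short sketch below does this for the vs. RHP table. The matrix is rebuilt so the block runs on its own.

```r
vsRHP <- matrix(c(371, 510, 2611, 2408), ncol = 2, byrow = TRUE,
                dimnames = list(c("Shohei Ohtani", "Babe Ruth"), c("Hit", "Out")))

p    <- prop.table(vsRHP, 1)[, "Hit"]   # share of Hit in each row
odds <- p / (1 - p)                     # odds of Hit vs. Out for each player

odds["Shohei Ohtani"] / odds["Babe Ruth"]                   # ratio of odds
(vsRHP[1, 1] * vsRHP[2, 2]) / (vsRHP[1, 2] * vsRHP[2, 1])   # ad / bc, same value
```

Both lines give about 0.671, the plain cross-product estimate. `epitools::oddsratio()` reports a median-unbiased estimate by default instead, so its figure can differ from this one in the last digits.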
> library(epitools)
> oddsratio(vsRHP)
Hit Out Total
Shohei Ohtani 371 510 881
Babe Ruth 2611 2408 5019
Total 2982 2918 5900
odds ratio with 95% C.I.
estimate lower upper
Shohei Ohtani 1.0000000 NA NA
Babe Ruth 0.6710105 0.5802926 0.7752787
two-sided
midp.exact fisher.exact chi.square
Shohei Ohtani NA NA NA
Babe Ruth 5.595547e-08 6.08892e-08 5.736062e-08
[1] "median-unbiased estimate & mid-p exact CI"
> oddsratio(vsLHP)
Hit Out Total
Shohei Ohtani 140 247 387
Babe Ruth 983 1095 2078
Total 1123 1342 2465
odds ratio with 95% C.I.
estimate lower upper
Shohei Ohtani 1.0000000 NA NA
Babe Ruth 0.6317902 0.5036948 0.7897257
two-sided
midp.exact fisher.exact chi.square
Shohei Ohtani NA NA NA
Babe Ruth 4.96021e-05 6.034054e-05 5.428246e-05
[1] "median-unbiased estimate & mid-p exact CI"
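If epitools is not available, base R's `fisher.test()` gives a comparable per-stratum check. For a \(2 \times 2\) table it returns a conditional maximum-likelihood estimate of the odds ratio \(ad / bc\), with an exact confidence interval. That is a different estimator from epitools' median-unbiased one, so the digits need not match exactly. This is a sketch; the `results` object is only illustrative.

```r
vsRHP <- matrix(c(371, 510, 2611, 2408), ncol = 2, byrow = TRUE,
                dimnames = list(c("Shohei Ohtani", "Babe Ruth"), c("Hit", "Out")))
vsLHP <- matrix(c(140, 247, 983, 1095), ncol = 2, byrow = TRUE,
                dimnames = list(c("Shohei Ohtani", "Babe Ruth"), c("Hit", "Out")))

# Conditional MLE of the odds ratio with an exact 95% CI, one row per stratum
results <- lapply(list(vsRHP = vsRHP, vsLHP = vsLHP), function(tab) {
  ft <- fisher.test(tab)
  c(estimate = unname(ft$estimate), lower = ft$conf.int[1],
    upper = ft$conf.int[2], p.value = ft$p.value)
})
do.call(rbind, results)
```

Both strata give odds ratios near 0.63 to 0.67 with intervals below 1, matching the epitools output and the significant CMH result. Against both right- and left-handed pitchers, Ohtani's odds of a Hit are lower than Ruth's.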