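Some studies need to hold one variable fixed in order to examine the relationship between two others. For example, one might stratify by sex when studying how well combination antiretroviral ("cocktail") therapy works against AIDS, or stratify by region when studying party preference and voting behavior. Such a table is called a \(2 \times 2\times k\) contingency table, which is one kind of three-way table. The most important tool for analyzing a \(2 \times 2\times k\) table is the Cochran-Mantel-Haenszel test.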
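The Cochran-Mantel-Haenszel test, named after William Cochran, Nathan Mantel, and William Haenszel, is based on the chi-squared distribution. For a \(2 \times 2 \times k\) table with \(k\) strata, label the cells of the \(i\)-th stratum (\(i = 1, \dots, k\)) as in the following table; the Cochran-Mantel-Haenszel statistic is then given by the formula beneath it: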
| | Yes | No | Total |
|---|---|---|---|
| Group 1 | \(a_{i}\) | \(b_{i}\) | \(N_{1,i}\) |
| Group 2 | \(c_{i}\) | \(d_{i}\) | \(N_{2,i}\) |
| Total | \(M_{1,i}\) | \(M_{2,i}\) | \(T_{i}\) |
\[CMH=\dfrac{\left( \displaystyle\sum_{i=1}^{k}(a_{i}-\frac{N_{1,i}M_{1,i}}{T_{i}}) \right)^2}{\displaystyle\sum_{i=1}^{k} \frac{N_{1,i}N_{2,i}M_{1,i}M_{2,i}}{T_{i}^2(T_{i}-1)}}\]
Shohei Ohtani vs. Babe Ruth: The Pitching Matchup
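Two-way player Shohei Ohtani started against the Detroit Tigers on August 19, 2021, pitching 8 innings with 8 strikeouts and only 1 run allowed. In the same game he hit his MLB-leading 40th home run of the season, which made him the left-handed hitter with the most home runs in a single season for the Los Angeles Angels. Performances like this have led more and more people to compare Ohtani with Babe Ruth, the "god of baseball". Interestingly, Ohtani throws right and bats left, while Ruth threw left and batted left. We can stratify both players' numbers by the handedness of their opponents and use the CMH test to compare them, first as pitchers and then as hitters. Their career pitching splits from Baseball Reference are as follows: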
| Batter | Pitcher | G | AB | Getting Hit | Batter Out |
|---|---|---|---|---|---|
| vs. RHB | Shohei Ohtani | 29 | 271 | 70 | 201 |
| vs. RHB | Babe Ruth | 57 | 1170 | 329 | 841 |
| vs. LHB | Shohei Ohtani | 29 | 249 | 69 | 180 |
| vs. LHB | Babe Ruth | 56 | 502 | 141 | 361 |
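We now reproduce the same contingency table in R. As a preliminary step, build a data frame that contains every combination of pitcher, outcome, and batter handedness: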
     > pitcher<-c("Shohei Ohtani", "Babe Ruth")
     > defence<-c("Getting Hit", "Out")
     > batter<-c("vs. RHB", "vs. LHB")
     > table<-list(Pitcher=pitcher, Defence=defence, Batter=batter)
     > table<-expand.grid(table)
     > table
             Pitcher     Defence  Batter
     1 Shohei Ohtani Getting Hit vs. RHB
     2     Babe Ruth Getting Hit vs. RHB
     3 Shohei Ohtani         Out vs. RHB
     4     Babe Ruth         Out vs. RHB
     5 Shohei Ohtani Getting Hit vs. LHB
     6     Babe Ruth Getting Hit vs. LHB
     7 Shohei Ohtani         Out vs. LHB
     8     Babe Ruth         Out vs. LHB
    
After attaching the counts, xtabs() builds the cross-tabulation, much like a pivot table:
     > data<-c(70, 329, 201, 841, 69, 141, 180, 361)
     > crosstab<-cbind(table, data)
     > xtabs(data~Pitcher+Defence+Batter, crosstab)
     , , Batter = vs. RHB
     
                    Defence
     Pitcher         Getting Hit Out
       Shohei Ohtani          70 201
       Babe Ruth             329 841
     
     , , Batter = vs. LHB
     
                    Defence
     Pitcher         Getting Hit Out
       Shohei Ohtani          69 180
       Babe Ruth             141 361
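The same three-way table can also be built without an intermediate data frame. The following is a minimal sketch, assuming the counts are typed in column-major order (rows vary fastest, then columns, then strata); the object name `pitch_arr` is made up for this illustration.

```r
# Counts in column-major order. Within each stratum:
# Ohtani hits, Ruth hits, Ohtani outs, Ruth outs.
counts <- c(70, 329, 201, 841,   # vs. RHB
            69, 141, 180, 361)   # vs. LHB

# pitch_arr is a hypothetical name used only in this sketch
pitch_arr <- array(
  counts,
  dim = c(2, 2, 2),
  dimnames = list(
    Pitcher = c("Shohei Ohtani", "Babe Ruth"),
    Defence = c("Getting Hit", "Out"),
    Batter  = c("vs. RHB", "vs. LHB")
  )
)

# Spot-check one cell against the printed table
pitch_arr["Babe Ruth", "Out", "vs. LHB"]
```

mantelhaen.test() accepts a plain three-dimensional array like this one as well as an xtabs object, so either construction can feed the test.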
    
The Cochran-Mantel-Haenszel test:
     > Ohtani_Ruth_pitch<-xtabs(data~Pitcher+Defence+Batter, crosstab)
     > mantelhaen.test(Ohtani_Ruth_pitch)
     
             Mantel-Haenszel chi-squared test with continuity correction
     
     data:  Ohtani_Ruth_pitch
     Mantel-Haenszel X-squared = 0.34347, df = 1, p-value = 0.5578
     alternative hypothesis: true common odds ratio is not equal to 1
     95 percent confidence interval:
      0.7420511 1.1628302
     sample estimates:
     common odds ratio
              0.928913
    
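The Cochran-Mantel-Haenszel test is a chi-squared test with 1 degree of freedom. The larger the statistic, the further the observed counts in the strata lie from their expected counts, and the stronger the grounds for rejecting the null hypothesis. For the pitching data the test is not significant (p = 0.5578), so there is no evidence that Ohtani and Ruth differ in how often left-handed or right-handed batters get hits off them. Ohtani's major-league career has only just begun and his numbers are still accumulating, but at least for now the two look like pitchers of the same class.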
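To connect the formula to the R output, the statistic can be recomputed by hand. This is a minimal sketch under two assumptions: the variable names are invented for the illustration, and the continuity correction is taken to be the usual one that subtracts 0.5 from the absolute summed deviation before squaring (the R output header says "with continuity correction"; the formula given earlier is the uncorrected version).

```r
# Pitching counts from the article, one 2x2 table per stratum
tab <- array(c(70, 329, 201, 841, 69, 141, 180, 361), dim = c(2, 2, 2))

a  <- tab[1, 1, ]                 # a_i: hits allowed by Ohtani
N1 <- tab[1, 1, ] + tab[1, 2, ]   # row totals N_{1,i}
N2 <- tab[2, 1, ] + tab[2, 2, ]   # row totals N_{2,i}
M1 <- tab[1, 1, ] + tab[2, 1, ]   # column totals M_{1,i}
M2 <- tab[1, 2, ] + tab[2, 2, ]   # column totals M_{2,i}
Tt <- N1 + N2                     # stratum totals T_i

# Summed deviation of a_i from its expected value, and its variance
delta    <- sum(a - N1 * M1 / Tt)
variance <- sum(N1 * N2 * M1 * M2 / (Tt^2 * (Tt - 1)))

cmh_plain     <- delta^2 / variance               # formula as written
cmh_corrected <- (abs(delta) - 0.5)^2 / variance  # assumed continuity correction

# Upper-tail probability of a chi-squared variable with 1 df
p_value <- pchisq(cmh_corrected, df = 1, lower.tail = FALSE)
c(plain = cmh_plain, corrected = cmh_corrected, p = p_value)
```

The corrected value should agree with the X-squared that mantelhaen.test() printed, and passing `correct = FALSE` to mantelhaen.test() should give the plain version.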
Shohei Ohtani vs. Babe Ruth: The Hitting Matchup
Their career batting splits, likewise taken from Baseball Reference, are as follows:
| Pitcher | Hitter | G | AB | Hit | Out |
|---|---|---|---|---|---|
| vs. RHP | Shohei Ohtani | 339 | 881 | 371 | 510 |
| vs. RHP | Babe Ruth | 1707 | 5019 | 2611 | 2408 |
| vs. LHP | Shohei Ohtani | 224 | 387 | 140 | 247 |
| vs. LHP | Babe Ruth | 882 | 2078 | 983 | 1095 |
Build the contingency table in R from these data:
     > hitter<-c("Shohei Ohtani", "Babe Ruth")
     > offense<-c("Hit", "Out")
     > pitcher<-c("vs. RHP", "vs. LHP")
     > table<-list(Hitter=hitter, Offense=offense, Pitcher=pitcher)
     > table<-expand.grid(table)
     > data<-c(371, 2611, 510, 2408, 140, 983, 247, 1095)
     > crosstab<-cbind(table, data)
     > xtabs(data~Hitter+Offense+Pitcher, crosstab)
     , , Pitcher = vs. RHP
     
                    Offense
     Hitter           Hit  Out
       Shohei Ohtani  371  510
       Babe Ruth     2611 2408
     
     , , Pitcher = vs. LHP
     
                    Offense
     Hitter           Hit  Out
       Shohei Ohtani  140  247
       Babe Ruth      983 1095
    
The Cochran-Mantel-Haenszel test:
    > Ohtani_Ruth_hit<-xtabs(data~Hitter+Offense+Pitcher, crosstab)
    > mantelhaen.test(Ohtani_Ruth_hit)
    
            Mantel-Haenszel chi-squared test with continuity correction
    
    data:  Ohtani_Ruth_hit
    Mantel-Haenszel X-squared = 45.167, df = 1, p-value = 1.81e-11
    alternative hypothesis: true common odds ratio is not equal to 1
    95 percent confidence interval:
     0.5834310 0.7441049
    sample estimates:
    common odds ratio
            0.6588884
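The "common odds ratio" in this output is a pooled estimate across the two strata. As a rough cross-check, here is a minimal sketch that assumes the standard Mantel-Haenszel estimator, \(\sum_i a_i d_i / T_i\) divided by \(\sum_i b_i c_i / T_i\); the variable names are invented for the illustration.

```r
# Batting counts from the article (Hitter x Offense x Pitcher), column-major
hit <- array(c(371, 2611, 510, 2408, 140, 983, 247, 1095), dim = c(2, 2, 2))

n_i <- apply(hit, 3, sum)                     # stratum totals T_i
num <- sum(hit[1, 1, ] * hit[2, 2, ] / n_i)   # sum of a_i * d_i / T_i
den <- sum(hit[1, 2, ] * hit[2, 1, ] / n_i)   # sum of b_i * c_i / T_i

# Pooled (Mantel-Haenszel) odds ratio: Ohtani's odds of a hit relative to Ruth's
or_mh <- num / den
or_mh
```

A value below 1 means the first row (Ohtani) has lower odds of a hit than the second row (Ruth) once pitcher handedness is accounted for.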
   
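The Cochran-Mantel-Haenszel test is statistically significant, so the batting results of Ohtani and Ruth differ. But who hit better, Ohtani or Ruth? We can split the stratified table into two \(2 \times 2\) tables and compare the proportion of at-bats in which each player got a hit: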
    > vsRHP<-matrix(c(371,510,2611,2408), ncol=2, byrow=TRUE, dimnames=list(c("Shohei Ohtani", "Babe Ruth"),c("Hit","Out")))
    > vsLHP<-matrix(c(140,247,983,1095), ncol=2, byrow=TRUE, dimnames=list(c("Shohei Ohtani", "Babe Ruth"),c("Hit","Out")))
    > prop.table(vsRHP,1)
                        Hit       Out
    Shohei Ohtani 0.4211124 0.5788876
    Babe Ruth     0.5202232 0.4797768
    > prop.table(vsLHP,1)
                        Hit       Out
    Shohei Ohtani 0.3617571 0.6382429
    Babe Ruth     0.4730510 0.5269490
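The same row proportions can be obtained from the three-way table in one call, without splitting it by hand. This is a sketch only, assuming the table is rebuilt as a named array (the names `hit` and `rates` are hypothetical) and that `prop.table()` is asked to normalise within each hitter-by-pitcher combination.

```r
# Batting counts from the article as a named 2 x 2 x 2 array
hit <- array(
  c(371, 2611, 510, 2408, 140, 983, 247, 1095),
  dim = c(2, 2, 2),
  dimnames = list(
    Hitter  = c("Shohei Ohtani", "Babe Ruth"),
    Offense = c("Hit", "Out"),
    Pitcher = c("vs. RHP", "vs. LHP")
  )
)

# Normalise within every Hitter-by-Pitcher combination,
# so Hit + Out sums to 1 for each player in each stratum
rates <- prop.table(hit, margin = c(1, 3))

# Hit rates only: rows are hitters, columns are strata
rates[, "Hit", ]
```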
   
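Judging by these numbers, the left-handed-hitting Ruth had the higher hit rate and the lower out rate against right-handed pitchers, and he also outperformed Ohtani against left-handed pitchers. Next, compare the two players' odds ratios: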
    > library(epitools)
    > oddsratio(vsRHP)
    $data
                   Hit  Out Total
    Shohei Ohtani  371  510   881
    Babe Ruth     2611 2408  5019
    Total         2982 2918  5900
    
    $measure
                            NA
    odds ratio with 95% C.I.  estimate     lower     upper
               Shohei Ohtani 1.0000000        NA        NA
               Babe Ruth     0.6710105 0.5802926 0.7752787
    $p.value
                  NA
    two-sided        midp.exact fisher.exact   chi.square
      Shohei Ohtani          NA           NA           NA
      Babe Ruth     5.595547e-08  6.08892e-08 5.736062e-08
    
    $correction
    [1] FALSE
    
    attr(,"method")
    [1] "median-unbiased estimate & mid-p exact CI"
    > oddsratio(vsLHP)
    $data
                   Hit  Out Total
    Shohei Ohtani  140  247   387
    Babe Ruth      983 1095  2078
    Total         1123 1342  2465
    
    $measure
                            NA
    odds ratio with 95% C.I.  estimate     lower     upper
               Shohei Ohtani 1.0000000        NA        NA
               Babe Ruth     0.6317902 0.5036948 0.7897257
    $p.value
                  NA
    two-sided        midp.exact fisher.exact   chi.square
      Shohei Ohtani          NA           NA           NA
      Babe Ruth     4.96021e-05 6.034054e-05 5.428246e-05
    
    $correction
    [1] FALSE
    
    attr(,"method")
    [1] "median-unbiased estimate & mid-p exact CI"
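The epitools output reports a median-unbiased estimate with a mid-p exact interval, as its "method" attribute states. For comparison, the following sketch computes the plain cross-product odds ratio with a Wald interval on the log scale. It assumes a normal approximation for the log odds ratio, and `wald_or` is a hypothetical helper written for this illustration, not part of epitools. The results should be close to the epitools figures but not identical.

```r
vsRHP <- matrix(c(371, 510, 2611, 2408), ncol = 2, byrow = TRUE,
                dimnames = list(c("Shohei Ohtani", "Babe Ruth"), c("Hit", "Out")))
vsLHP <- matrix(c(140, 247, 983, 1095), ncol = 2, byrow = TRUE,
                dimnames = list(c("Shohei Ohtani", "Babe Ruth"), c("Hit", "Out")))

# wald_or is a hypothetical helper: cross-product odds ratio of a 2x2 table
# with a Wald confidence interval computed on the log scale
wald_or <- function(m, level = 0.95) {
  est <- (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1])
  se  <- sqrt(sum(1 / m))                   # SE of log(OR)
  z   <- qnorm(1 - (1 - level) / 2)
  ci  <- exp(log(est) + c(-1, 1) * z * se)
  c(estimate = est, lower = ci[1], upper = ci[2])
}

or_rhp <- wald_or(vsRHP)
or_lhp <- wald_or(vsLHP)
rbind(vsRHP = or_rhp, vsLHP = or_lhp)
```

With the tables laid out this way, the estimate is Ohtani's odds of a hit divided by Ruth's odds of a hit, so values below 1 favor Ruth.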
   
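The odds ratios show that Ohtani's odds of getting a hit are only about 60% to 70% of Ruth's: roughly 0.67 against right-handed pitchers and 0.63 against left-handed pitchers. Ruth, however, is a retired star, while Ohtani's MLB career has only just begun. Whether he will go on to produce more two-way miracles and truly surpass the "god of baseball" is something only time will tell.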
 
  
   
    
    
   