2x2xK Contingency Tables

Introduction

This section introduces the Cochran-Mantel-Haenszel test for analyzing \(2 \times 2 \times k\) contingency tables.


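Some studies need to control for one variable in order to examine the relationship between two others. Examples include controlling for sex when assessing the effect of cocktail therapy on AIDS, or controlling for region when examining party affiliation and voting. The resulting table is called a \(2 \times 2\times k\) contingency table, a kind of three-way table. The most important tool for analyzing a \(2 \times 2\times k\) contingency table is the Cochran-Mantel-Haenszel test.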

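The Cochran-Mantel-Haenszel test, named after William Cochran, Nathan Mantel, and William Haenszel, is a test based on the chi-squared distribution. For a \(2 \times 2 \times k\) contingency table with \(k\) strata indexed by \(i = 1, \dots, k\), write the \(i\)-th \(2 \times 2\) table as shown below. The Cochran-Mantel-Haenszel statistic is then: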

           Column 1      Column 2      Subtotal
Case 1     \(a_{i}\)     \(b_{i}\)     \(N_{1,i}\)
Case 2     \(c_{i}\)     \(d_{i}\)     \(N_{2,i}\)
Subtotal   \(M_{1,i}\)   \(M_{2,i}\)   \(T_{i}\)

\[CMH=\dfrac{\left( \displaystyle\sum_{i=1}^{k}(a_{i}-\frac{N_{1,i}M_{1,i}}{T_{i}}) \right)^2}{\displaystyle\sum_{i=1}^{k} \frac{N_{1,i}N_{2,i}M_{1,i}M_{2,i}}{T_{i}^2(T_{i}-1)}}\]
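The formula can be checked numerically. The following is a minimal sketch that applies it to the pitching counts used later in this lecture and compares the result with `mantelhaen.test()`. It assumes `correct = FALSE`, because the formula above has no continuity correction, so the value will differ slightly from the continuity-corrected output shown further down. The variable names (`pitch`, `N1`, `Tot`, and so on) are illustrative choices only.

```r
# Pitching counts as a 2 x 2 x 2 array.
# Fill order: Pitcher varies fastest, then Defence, then Batter (the stratum).
pitch <- array(
  c(70, 329, 201, 841,    # vs. RHB
    69, 141, 180, 361),   # vs. LHB
  dim = c(2, 2, 2),
  dimnames = list(
    Pitcher = c("Shohei Ohtani", "Babe Ruth"),
    Defence = c("Getting Hit", "Out"),
    Batter  = c("vs. RHB", "vs. LHB")
  )
)

a   <- pitch[1, 1, ]          # a_i: upper-left cell of each stratum
N1  <- colSums(pitch[1, , ])  # N_{1,i}: row 1 total per stratum
N2  <- colSums(pitch[2, , ])  # N_{2,i}: row 2 total per stratum
M1  <- colSums(pitch[, 1, ])  # M_{1,i}: column 1 total per stratum
M2  <- colSums(pitch[, 2, ])  # M_{2,i}: column 2 total per stratum
Tot <- N1 + N2                # T_i: stratum total

# Numerator: squared sum of (observed - expected) for the a_i cells.
# Denominator: sum of the hypergeometric variances of a_i.
cmh_manual <- sum(a - N1 * M1 / Tot)^2 /
  sum(N1 * N2 * M1 * M2 / (Tot^2 * (Tot - 1)))

# Built-in test without continuity correction, for comparison.
cmh_builtin <- unname(mantelhaen.test(pitch, correct = FALSE)$statistic)

print(c(manual = cmh_manual, builtin = cmh_builtin))
```

The two printed values should agree, which confirms that the symbols in the table map onto the array as described in the comments.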

Shohei Ohtani vs. Babe Ruth: Pitching Duel

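Two-way star Shohei Ohtani started against the Detroit Tigers on August 19, 2021, pitching 8 innings with 8 strikeouts and allowing only 1 run. In the same game he hit his MLB-leading 40th home run of the season, the most in a single season by a left-handed hitter in Los Angeles Angels history. Performances like this have led more and more people to compare Ohtani with the baseball legend Babe Ruth. Interestingly, Ohtani throws right and bats left, whereas Ruth threw left and batted left. We can stratify the two players' data by the handedness of the opponents they faced and use the CMH test to compare their pitching and hitting. The career pitching data for both players, taken from Baseball Reference, are as follows: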

Batter    Pitcher         G    AB     Getting Hit   Batter Out
vs. RHB   Shohei Ohtani   29   271    70            201
          Babe Ruth       57   1170   329           841
vs. LHB   Shohei Ohtani   29   249    69            180
          Babe Ruth       56   502    141           361

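We reproduce the same contingency table in R. As a preliminary step, create a data frame containing every combination of pitcher, outcome (Defence), and batter handedness: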

> pitcher<-c("Shohei Ohtani", "Babe Ruth")
> defence<-c("Getting Hit", "Out")
> batter<-c("vs. RHB", "vs. LHB")
> table<-list(Pitcher=pitcher, Defence=defence, Batter=batter)
> table<-expand.grid(table)
> table
        Pitcher     Defence  Batter
1 Shohei Ohtani Getting Hit vs. RHB
2     Babe Ruth Getting Hit vs. RHB
3 Shohei Ohtani         Out vs. RHB
4     Babe Ruth         Out vs. RHB
5 Shohei Ohtani Getting Hit vs. LHB
6     Babe Ruth Getting Hit vs. LHB
7 Shohei Ohtani         Out vs. LHB
8     Babe Ruth         Out vs. LHB

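Next, fill in the counts, listed in the same row order as the grid above, and call xtabs() to build the stratified contingency table: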

> data<-c(70, 329, 201, 841, 69, 141, 180, 361)
> crosstab<-cbind(table, data)
> xtabs(data~Pitcher+Defence+Batter, crosstab)
, , Batter = vs. RHB

               Defence
Pitcher         Getting Hit Out
  Shohei Ohtani          70 201
  Babe Ruth             329 841

, , Batter = vs. LHB

               Defence
Pitcher         Getting Hit Out
  Shohei Ohtani          69 180
  Babe Ruth             141 361

Cochran-Mantel-Haenszel test:

> Ohtani_Ruth_pitch<-xtabs(data~Pitcher+Defence+Batter, crosstab)
> mantelhaen.test(Ohtani_Ruth_pitch)

        Mantel-Haenszel chi-squared test with continuity correction

data:  Ohtani_Ruth_pitch
Mantel-Haenszel X-squared = 0.34347, df = 1, p-value = 0.5578
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
 0.7420511 1.1628302
sample estimates:
common odds ratio
         0.928913

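Under the null hypothesis that the common odds ratio equals 1, the Cochran-Mantel-Haenszel statistic follows a chi-squared distribution with 1 degree of freedom. The larger the statistic, the further the observed counts in the strata lie from their expected counts, and a sufficiently large value leads to rejecting the null hypothesis. For the pitching data the test is not significant (p = 0.5578), and the 95% confidence interval for the common odds ratio (0.74 to 1.16) contains 1. There is therefore no evidence that Ohtani and Ruth differ in how often they give up hits once batter handedness is controlled for. Ohtani's major-league career has only just begun and his accumulated data are still limited, but at least for now these data do not separate the two as pitchers.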
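The "common odds ratio" in the mantelhaen.test() output can also be reproduced by hand. The sketch below assumes the standard Mantel-Haenszel estimator, \(\sum_i (a_i d_i / T_i) / \sum_i (b_i c_i / T_i)\), and prints the per-stratum odds ratios alongside it. The variable names are illustrative.

```r
# Same pitching counts as above, stored as a 2 x 2 x 2 array
# (Pitcher varies fastest, then Defence, then Batter).
pitch <- array(
  c(70, 329, 201, 841,    # vs. RHB
    69, 141, 180, 361),   # vs. LHB
  dim = c(2, 2, 2),
  dimnames = list(
    Pitcher = c("Shohei Ohtani", "Babe Ruth"),
    Defence = c("Getting Hit", "Out"),
    Batter  = c("vs. RHB", "vs. LHB")
  )
)

a  <- pitch[1, 1, ]   # Ohtani, Getting Hit
b  <- pitch[1, 2, ]   # Ohtani, Out
cc <- pitch[2, 1, ]   # Ruth, Getting Hit ('cc' avoids masking base::c)
d  <- pitch[2, 2, ]   # Ruth, Out
n  <- a + b + cc + d  # stratum totals

# Odds ratio within each stratum (cross-product ratio).
or_by_stratum <- (a * d) / (b * cc)

# Mantel-Haenszel pooled estimate: a weighted combination of the strata.
or_mh <- sum(a * d / n) / sum(b * cc / n)

print(or_by_stratum)
print(or_mh)
```

Both per-stratum odds ratios are below 1 and close to each other, which is the situation in which a single pooled estimate is a reasonable summary.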

Shohei Ohtani vs. Babe Ruth: Hitting Duel

The career hitting data for both players, again taken from Baseball Reference, are as follows:

Pitcher   Hitter          G      AB     Hit    Out
vs. RHP   Shohei Ohtani   339    881    371    510
          Babe Ruth       1707   5019   2611   2408
vs. LHP   Shohei Ohtani   224    387    140    247
          Babe Ruth       882    2078   983    1095

Build the contingency table in R from these data:

> hitter<-c("Shohei Ohtani", "Babe Ruth")
> offense<-c("Hit", "Out")
> pitcher<-c("vs. RHP", "vs. LHP")
> table<-list(Hitter=hitter, Offense=offense, Pitcher=pitcher)
> table<-expand.grid(table)
> data<-c(371, 2611, 510, 2408, 140, 983, 247, 1095)
> crosstab<-cbind(table, data)
> xtabs(data~Hitter+Offense+Pitcher, crosstab)
, , Pitcher = vs. RHP

               Offense
Hitter           Hit  Out
  Shohei Ohtani  371  510
  Babe Ruth     2611 2408

, , Pitcher = vs. LHP

               Offense
Hitter           Hit  Out
  Shohei Ohtani  140  247
  Babe Ruth      983 1095

Cochran-Mantel-Haenszel test:

> Ohtani_Ruth_hit<-xtabs(data~Hitter+Offense+Pitcher, crosstab)
> mantelhaen.test(Ohtani_Ruth_hit)

        Mantel-Haenszel chi-squared test with continuity correction

data:  Ohtani_Ruth_hit
Mantel-Haenszel X-squared = 45.167, df = 1, p-value = 1.81e-11
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
 0.5834310 0.7441049
sample estimates:
common odds ratio
        0.6588884

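The Cochran-Mantel-Haenszel test is statistically significant, indicating that Ohtani's and Ruth's hitting results differ. So which of the two is the better hitter? We can split the stratified table into two \(2 \times 2\) tables and compare the two players' hit rates: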

> vsRHP<-matrix(c(371,510,2611,2408), ncol=2, byrow=TRUE, dimnames=list(c("Shohei Ohtani", "Babe Ruth"),c("Hit","Out")))
> vsLHP<-matrix(c(140,247,983,1095), ncol=2, byrow=TRUE, dimnames=list(c("Shohei Ohtani", "Babe Ruth"),c("Hit","Out")))
> prop.table(vsRHP,1)
                    Hit       Out
Shohei Ohtani 0.4211124 0.5788876
Babe Ruth     0.5202232 0.4797768
> prop.table(vsLHP,1)
                    Hit       Out
Shohei Ohtani 0.3617571 0.6382429
Babe Ruth     0.4730510 0.5269490
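The same row proportions can be read directly off the three-way table without building two separate matrices. This is a minimal sketch, assuming the hitting counts are stored as a 2 x 2 x 2 array named `hit` (an illustrative name), that uses prop.table() with `margin = c(1, 3)` so that proportions sum to 1 within each hitter and stratum.

```r
# Hitting counts as a 2 x 2 x 2 array
# (Hitter varies fastest, then Offense, then Pitcher handedness).
hit <- array(
  c(371, 2611, 510, 2408,   # vs. RHP
    140, 983, 247, 1095),   # vs. LHP
  dim = c(2, 2, 2),
  dimnames = list(
    Hitter  = c("Shohei Ohtani", "Babe Ruth"),
    Offense = c("Hit", "Out"),
    Pitcher = c("vs. RHP", "vs. LHP")
  )
)

# Proportions of Hit and Out within each Hitter x Pitcher combination.
rates <- prop.table(hit, margin = c(1, 3))

# Keep only the "Hit" slice: a Hitter x Pitcher matrix of hit rates.
hit_rate <- rates[, "Hit", ]
print(round(hit_rate, 4))
```

Laying the hit rates out as one Hitter by Pitcher matrix makes it easy to see that the ordering of the two players is the same in both strata.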

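These proportions show that Babe Ruth, a left-handed batter, had a higher hit rate and a lower out rate than Ohtani against right-handed pitchers, and that he also outperformed Ohtani against left-handed pitchers. Next, compare the two players' odds ratios: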

> library(epitools)
> oddsratio(vsRHP)
$data
               Hit  Out Total
Shohei Ohtani  371  510   881
Babe Ruth     2611 2408  5019
Total         2982 2918  5900

$measure
                        NA
odds ratio with 95% C.I.  estimate     lower     upper
           Shohei Ohtani 1.0000000        NA        NA
           Babe Ruth     0.6710105 0.5802926 0.7752787
$p.value
              NA
two-sided        midp.exact fisher.exact   chi.square
  Shohei Ohtani          NA           NA           NA
  Babe Ruth     5.595547e-08  6.08892e-08 5.736062e-08

$correction
[1] FALSE

attr(,"method")
[1] "median-unbiased estimate & mid-p exact CI"
> oddsratio(vsLHP)
$data
               Hit  Out Total
Shohei Ohtani  140  247   387
Babe Ruth      983 1095  2078
Total         1123 1342  2465

$measure
                        NA
odds ratio with 95% C.I.  estimate     lower     upper
           Shohei Ohtani 1.0000000        NA        NA
           Babe Ruth     0.6317902 0.5036948 0.7897257
$p.value
              NA
two-sided        midp.exact fisher.exact   chi.square
  Shohei Ohtani          NA           NA           NA
  Babe Ruth     4.96021e-05 6.034054e-05 5.428246e-05

$correction
[1] FALSE

attr(,"method")
[1] "median-unbiased estimate & mid-p exact CI"
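The estimates above are median-unbiased estimates with mid-p exact intervals. As a cross-check that needs only base R, the sketch below computes the plain cross-product odds ratio with a Wald 95% interval on the log scale. This is a different estimator from the one epitools reports, so the numbers should be close but not identical. The helper `wald_or()` is a hypothetical function written for this illustration.

```r
vsRHP <- matrix(c(371, 510, 2611, 2408), ncol = 2, byrow = TRUE,
                dimnames = list(c("Shohei Ohtani", "Babe Ruth"), c("Hit", "Out")))
vsLHP <- matrix(c(140, 247, 983, 1095), ncol = 2, byrow = TRUE,
                dimnames = list(c("Shohei Ohtani", "Babe Ruth"), c("Hit", "Out")))

# Hypothetical helper: cross-product odds ratio with a Wald interval.
# For rows (Ohtani, Ruth) and columns (Hit, Out) this is
# Ohtani's odds of a hit divided by Ruth's odds of a hit.
wald_or <- function(m, level = 0.95) {
  est <- (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1])
  se  <- sqrt(sum(1 / m))              # standard error of log(OR)
  z   <- qnorm(1 - (1 - level) / 2)    # about 1.96 for a 95% interval
  c(estimate = est,
    lower = exp(log(est) - z * se),
    upper = exp(log(est) + z * se))
}

or_rhp <- wald_or(vsRHP)
or_lhp <- wald_or(vsLHP)
print(rbind("vs. RHP" = or_rhp, "vs. LHP" = or_lhp))
```

In both strata the upper limit of the interval stays below 1, consistent with the small p-values in the epitools output.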

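The odds ratios show that Ohtani's odds of getting a hit are only about 63% to 67% of Ruth's (0.67 against right-handed pitchers and 0.63 against left-handed pitchers), in line with the common odds ratio of 0.66 from the CMH test. However, Ruth is a retired star, while Ohtani's MLB career has only just begun. Whether he will go on to produce more two-way miracles and truly surpass the baseball legend remains to be seen.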