The Bradley-Terry Model

Introduction

This section introduces the Bradley-Terry model and fits it in R with the BradleyTerry2 package.


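In the paper Rank Analysis of Incomplete Block Designs I: The Method of Paired Comparisons, Ralph Bradley and Milton Terry of Virginia Polytechnic Institute developed a logistic model for paired data. Because the data are paired, the model applies to any one-on-one "comparison", and sports contests are its most common application. The model assumes that when i plays j, the probability that i wins is \(\Pi_{ij}\), and it is written as:

\[\operatorname{logit}(\Pi_{ij})=\log\frac{\Pi_{ij}}{1-\Pi_{ij}}=\beta_{i}-\beta_{j}\]

Solving for \(\Pi_{ij}\) gives:

\[\Pi_{ij} = \frac{\exp(\beta_{i}-\beta_{j})}{1+\exp(\beta_{i}-\beta_{j})}\]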
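As a quick numerical check of the formula above, the inverse logit can be evaluated in base R with plogis(). This is only a minimal sketch: the helper name bt_prob and the two ability values are hypothetical and chosen purely for illustration.

```r
# Hypothetical helper: P(i beats j) under the Bradley-Terry model.
# plogis(x) is base R's inverse logit, exp(x) / (1 + exp(x)).
bt_prob <- function(beta_i, beta_j) {
  plogis(beta_i - beta_j)
}

# Illustrative (made-up) abilities on the log-odds scale.
beta_a <- 0.5
beta_b <- -0.2

p_ab <- bt_prob(beta_a, beta_b)  # probability that a beats b
p_ba <- bt_prob(beta_b, beta_a)  # probability that b beats a

# The two probabilities are complementary, and equal abilities give 0.5.
print(c(p_ab = p_ab, p_ba = p_ba, total = p_ab + p_ba))
print(bt_prob(1, 1))
```

Only the difference between the two abilities enters the formula, so adding the same constant to every ability leaves all win probabilities unchanged. That is why one player has to be fixed as a reference when the model is fitted.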

Who is the GOAT of tennis?

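From Roger Federer's first Grand Slam title at The Championships, Wimbledon in 2003 through the 2021 US Open, Federer, Rafael Nadal, Novak Djokovic, and Andy Murray dominated men's tennis. Over those 18 years the four won 63 of 73 Grand Slam tournaments and also monopolized the ATP Masters events, earning them the name Big Four.

In recent years, as injuries sidelined Murray, the Big Four has gradually become the Big Three. Although Federer, Nadal, and Djokovic are close to retirement, their dominance shows no sign of fading: each holds 20 Grand Slam titles and more than 80 ATP titles, and together they have long monopolized the world No. 1 ranking. Who is the GOAT of tennis therefore remains a favorite debate among fans. We can look for an answer in the head-to-head records among the four.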

Win \ Lose   Murray   Djokovic   Nadal   Federer
Murray            0         11       7        11
Djokovic         25          0      30        27
Nadal            17         28       0        24
Federer          14         23      16         0

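The table above was taken directly from the ATP website and covers matches through the end of the 2021 US Open. We can build the same head-to-head table in R: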

> winner<-c("Andy Murray", "Novak Djokovic", "Rafael Nadal", "Roger Federer")
> loser<-c("Andy Murray", "Novak Djokovic", "Rafael Nadal", "Roger Federer")
> table<-list(Win=winner, Lose=loser)
> table<-expand.grid(table)
> data<-c(0,25,17,14,11,0,28,23,7,30,0,16,11,27,24,0)
> crosstab<-cbind(table, data)
> tennis<-xtabs(data~Win+Lose, crosstab)
> tennis
                Lose
Win              Andy Murray Novak Djokovic Rafael Nadal Roger Federer
  Andy Murray              0             11            7            11
  Novak Djokovic          25              0           30            27
  Rafael Nadal            17             28            0            24
  Roger Federer           14             23           16             0

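Next, load the BradleyTerry2 package (install it first if needed) and use countsToBinomial() to convert the data to paired form. Taking the 4 players two at a time gives \(\binom{4}{2}=6\) matchups.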

> library(BradleyTerry2)
> Head2Head<-countsToBinomial(tennis)
> names(Head2Head)[3:4]<-c("Win", "Lose")
> Head2Head
         player1        player2 Win Lose
1    Andy Murray Novak Djokovic  11   25
2    Andy Murray   Rafael Nadal   7   17
3    Andy Murray  Roger Federer  11   14
4 Novak Djokovic   Rafael Nadal  30   28
5 Novak Djokovic  Roger Federer  27   23
6   Rafael Nadal  Roger Federer  24   16
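
To make the reshaping step concrete, the following base R sketch builds the same six rows by hand from the win matrix. It is an illustration of the idea only, not the package's implementation; the object names wins and paired are hypothetical.

```r
players <- c("Andy Murray", "Novak Djokovic", "Rafael Nadal", "Roger Federer")

# Win counts from the table above: rows are winners, columns are losers.
wins <- matrix(
  c( 0, 11,  7, 11,
    25,  0, 30, 27,
    17, 28,  0, 24,
    14, 23, 16,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(Win = players, Lose = players)
)

# Each column of combn() is one unordered pair (i, j) with i < j.
pairs <- combn(length(players), 2)
i <- pairs[1, ]
j <- pairs[2, ]

paired <- data.frame(
  player1 = players[i],
  player2 = players[j],
  Win  = wins[cbind(i, j)],  # times player1 beat player2
  Lose = wins[cbind(j, i)]   # times player2 beat player1
)
print(paired)
```

Each matchup appears once, with the upper-triangle cell as player1's wins and the mirrored lower-triangle cell as player1's losses.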

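Overall, Djokovic holds the edge against Murray, Nadal, and Federer at 25-11, 30-28, and 27-23 respectively, although his record against Nadal is close to even and the edge there is slight. Because Djokovic is the current world No. 1, we use him as the reference category when building the model: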

> model<-BTm(cbind(Win, Lose), player1, player2, formula=~player, id="player", refcat="Novak Djokovic", data=Head2Head)
> summary(model)

Call:
BTm(outcome = cbind(Win, Lose), player1 = player1, player2 = player2,
    formula = ~player, id = "player", refcat = "Novak Djokovic",
    data = Head2Head)

Deviance Residuals:
      1        2        3        4        5        6
-0.2046  -0.2674   0.4786   0.3421  -0.5346   0.2242

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
playerAndy Murray   -0.74728    0.25236  -2.961  0.00306 **
playerRafael Nadal   0.02088    0.20831   0.100  0.92017
playerRoger Federer -0.31241    0.21610  -1.446  0.14827
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 12.2482  on 6  degrees of freedom
Residual deviance:  0.7955  on 3  degrees of freedom
AIC: 30.825

Number of Fisher Scoring iterations: 3

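With Djokovic as the reference, the log-odds coefficients of the other three players are -0.747 for Murray, 0.021 for Nadal, and -0.312 for Federer, and only Murray's is statistically significant. In the Bradley-Terry model, each parameter is a player's "ability" (ability parameter) relative to the reference player. Compared with Djokovic, Murray and Federer both trail in the head-to-head, while Nadal does not. Only Murray vs. Djokovic reaches significance, however: on the current record (even though this is the full population of matches), Djokovic clearly beats Murray, but statistically he has not yet reached the point of decisively beating Nadal or Federer.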
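
Because the model is a logistic regression on ability differences, the same estimates can be approximated with base R's glm(). The sketch below is a cross-check under that assumption, not a description of how BTm() is implemented; the names h2h, others, X, and fit are hypothetical.

```r
# Paired counts from the Head2Head table; Win = wins by player1.
h2h <- data.frame(
  player1 = c("Andy Murray", "Andy Murray", "Andy Murray",
              "Novak Djokovic", "Novak Djokovic", "Rafael Nadal"),
  player2 = c("Novak Djokovic", "Rafael Nadal", "Roger Federer",
              "Rafael Nadal", "Roger Federer", "Roger Federer"),
  Win  = c(11, 7, 11, 30, 27, 24),
  Lose = c(25, 17, 14, 28, 23, 16)
)

# One column per non-reference player: +1 if player1, -1 if player2, else 0.
# Djokovic is the reference, so he gets no column (ability fixed at 0).
others <- c("Andy Murray", "Rafael Nadal", "Roger Federer")
X <- sapply(others, function(p) (h2h$player1 == p) - (h2h$player2 == p))

# Logistic regression without an intercept:
# logit(P(player1 wins)) = beta_player1 - beta_player2.
fit <- glm(cbind(Win, Lose) ~ 0 + X, family = binomial, data = h2h)
print(round(coef(fit), 5))
```

The three coefficients should agree with the summary(model) output above to the printed precision, which shows that each "ability" is an ordinary logistic regression coefficient on a +1/-1 design.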

BTabilities() prints the players' "ability" table directly, that is, the Bradley-Terry parameter estimates. Because Djokovic is the reference, his parameter is 0:

> BTabilities(model)
                   ability      s.e.
Andy Murray    -0.74728269 0.2523606
Novak Djokovic  0.00000000 0.0000000
Rafael Nadal    0.02087824 0.2083145
Roger Federer  -0.31240554 0.2160995

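With these log-odds parameters, exp() undoes the logarithm and gives Djokovic's odds against each of the other three. Because his own ability is fixed at 0, his odds of beating player j are \(\exp(0-\beta_{j})\).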
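
The conversion from abilities to odds can be sketched in a few lines of base R. The ability values are copied from the BTabilities(model) output above; the variable names are hypothetical.

```r
# Abilities (log odds relative to Djokovic) copied from BTabilities(model).
ability <- c("Andy Murray"    = -0.74728269,
             "Novak Djokovic" =  0.00000000,
             "Rafael Nadal"   =  0.02087824,
             "Roger Federer"  = -0.31240554)

rivals <- c("Andy Murray", "Rafael Nadal", "Roger Federer")

# Djokovic's log odds against rival j: beta_Djokovic - beta_j = 0 - beta_j.
log_odds <- ability[["Novak Djokovic"]] - ability[rivals]

odds <- exp(log_odds)      # odds that Djokovic wins
prob <- plogis(log_odds)   # the same quantity as odds / (1 + odds)

print(round(cbind(odds, prob), 3))
```

Under these estimates Djokovic's odds exceed 1 against Murray and Federer and fall slightly below 1 against Nadal, which matches the signs of the coefficients.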

To change the model's reference category, use update() with the refcat argument. For example, to change the reference from Djokovic to Nadal:

> update(model, refcat="Rafael Nadal")
Bradley Terry model fit by glm.fit

Call:  BTm(outcome = cbind(Win, Lose), player1 = player1, player2 = player2,
    formula = ~player, id = "player", refcat = "Rafael Nadal",
    data = Head2Head)

Coefficients:
   playerAndy Murray  playerNovak Djokovic   playerRoger Federer
            -0.76816              -0.02088              -0.33328

Degrees of Freedom: 6 Total (i.e. Null);  3 Residual
Null Deviance:      12.25
Residual Deviance: 0.7955       AIC: 30.83
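
Changing the reference only shifts every ability by a constant, which can be checked by hand. This sketch starts from the Djokovic-referenced abilities printed by BTabilities(model); the variable names are hypothetical.

```r
# Abilities with Djokovic as the reference, copied from BTabilities(model).
ability_djokovic_ref <- c("Andy Murray"    = -0.74728269,
                          "Novak Djokovic" =  0.00000000,
                          "Rafael Nadal"   =  0.02087824,
                          "Roger Federer"  = -0.31240554)

# Re-reference to Nadal by subtracting his ability from everyone's.
ability_nadal_ref <- ability_djokovic_ref - ability_djokovic_ref[["Rafael Nadal"]]
print(round(ability_nadal_ref, 5))

# Pairwise differences, and therefore all win probabilities, are unchanged.
diff_before <- outer(ability_djokovic_ref, ability_djokovic_ref, "-")
diff_after  <- outer(ability_nadal_ref, ability_nadal_ref, "-")
print(all.equal(diff_before, diff_after))
```

The shifted values agree with the coefficients in the update() output above, and the deviance and AIC are identical in both fits because it is the same model in a different parameterization.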

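Hard vs. Clay vs. Grass

Tennis is played on hard, clay, and grass courts, and players have different favored surfaces: Nadal is known as the King of Clay, and former world No. 1 Pete Sampras was hailed as supreme on grass. We list the Big Four's head-to-head records by surface and analyze their performance on each: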

Hard
Win \ Lose   Murray   Djokovic   Nadal   Federer
Murray            0          8       5        10
Djokovic         20          0      20        20
Nadal             7          7       0         9
Federer          12         18      11         0

Clay
Win \ Lose   Murray   Djokovic   Nadal   Federer
Murray            0          1       2         0
Djokovic          5          0       8         4
Nadal             7         19       0        14
Federer           0          4       2         0

Grass
Win \ Lose   Murray   Djokovic   Nadal   Federer
Murray            0          2       0         1
Djokovic          0          0       2         3
Nadal             3          2       0         1
Federer           2          1       3         0

Build the head-to-head records for the three surfaces in R:

> data_hard<-c(0,20,7,12,8,0,7,18,5,20,0,11,10,20,9,0)
> data_clay<-c(0,5,7,0,1,0,19,4,2,8,0,2,0,4,14,0)
> data_grass<-c(0,0,3,2,2,0,2,1,0,2,0,3,1,3,1,0)
> crosstab_hard<-cbind(table, data_hard)
> crosstab_clay<-cbind(table, data_clay)
> crosstab_grass<-cbind(table, data_grass)
> hard<-xtabs(data_hard~Win+Lose, crosstab_hard)
> clay<-xtabs(data_clay~Win+Lose, crosstab_clay)
> grass<-xtabs(data_grass~Win+Lose, crosstab_grass)
> hard
                Lose
Win              Andy Murray Novak Djokovic Rafael Nadal Roger Federer
  Andy Murray              0              8            5            10
  Novak Djokovic          20              0           20            20
  Rafael Nadal             7              7            0             9
  Roger Federer           12             18           11             0
> clay
                Lose
Win              Andy Murray Novak Djokovic Rafael Nadal Roger Federer
  Andy Murray              0              1            2             0
  Novak Djokovic           5              0            8             4
  Rafael Nadal             7             19            0            14
  Roger Federer            0              4            2             0
> grass
                Lose
Win              Andy Murray Novak Djokovic Rafael Nadal Roger Federer
  Andy Murray              0              2            0             1
  Novak Djokovic           0              0            2             3
  Rafael Nadal             3              2            0             1
  Roger Federer            2              1            3             0
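
A simple consistency check is to confirm that the three surface tables add up to the overall head-to-head table. The sketch below rebuilds the four matrices from the count vectors used above; the helper to_matrix and the *_counts names are hypothetical.

```r
players <- c("Andy Murray", "Novak Djokovic", "Rafael Nadal", "Roger Federer")

# Counts are stored column by column (winner varies fastest), matching
# the expand.grid() ordering used in the transcript above.
to_matrix <- function(counts) {
  matrix(counts, nrow = 4, dimnames = list(Win = players, Lose = players))
}

overall_counts <- to_matrix(c(0,25,17,14,11,0,28,23,7,30,0,16,11,27,24,0))
hard_counts    <- to_matrix(c(0,20,7,12,8,0,7,18,5,20,0,11,10,20,9,0))
clay_counts    <- to_matrix(c(0,5,7,0,1,0,19,4,2,8,0,2,0,4,14,0))
grass_counts   <- to_matrix(c(0,0,3,2,2,0,2,1,0,2,0,3,1,3,1,0))

# Every cell of the overall table should equal the sum over surfaces.
surface_total <- hard_counts + clay_counts + grass_counts
print(all(surface_total == overall_counts))

# Matches played per surface (each match is counted once, as one win).
print(c(hard = sum(hard_counts), clay = sum(clay_counts), grass = sum(grass_counts)))
```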

Convert each table to paired form:

> (Head2Head_hard<-countsToBinomial(hard))
         player1        player2 win1 win2
1    Andy Murray Novak Djokovic    8   20
2    Andy Murray   Rafael Nadal    5    7
3    Andy Murray  Roger Federer   10   12
4 Novak Djokovic   Rafael Nadal   20    7
5 Novak Djokovic  Roger Federer   20   18
6   Rafael Nadal  Roger Federer    9   11
> (Head2Head_clay<-countsToBinomial(clay))
         player1        player2 win1 win2
1    Andy Murray Novak Djokovic    1    5
2    Andy Murray   Rafael Nadal    2    7
3 Novak Djokovic   Rafael Nadal    8   19
4 Novak Djokovic  Roger Federer    4    4
5   Rafael Nadal  Roger Federer   14    2
> (Head2Head_grass<-countsToBinomial(grass))
         player1        player2 win1 win2
1    Andy Murray Novak Djokovic    2    0
2    Andy Murray   Rafael Nadal    0    3
3    Andy Murray  Roger Federer    1    2
4 Novak Djokovic   Rafael Nadal    2    2
5 Novak Djokovic  Roger Federer    3    1
6   Rafael Nadal  Roger Federer    1    3

Fit the Bradley-Terry model for hard courts:

> model_hard<-BTm(cbind(win1, win2), player1, player2, formula=~player, id="player", refcat="Novak Djokovic", data=Head2Head_hard)
> summary(model_hard)

Call:
BTm(outcome = cbind(win1, win2), player1 = player1, player2 = player2,
    formula = ~player, id = "player", refcat = "Novak Djokovic",
    data = Head2Head_hard)

Deviance Residuals:
      1        2        3        4        5        6
-0.2625  -0.4658   0.6247   0.7177  -0.7688   0.4106

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
playerAndy Murray    -0.8073     0.2975  -2.713  0.00666 **
playerRafael Nadal   -0.7419     0.3007  -2.467  0.01362 *
playerRoger Federer  -0.3560     0.2613  -1.362  0.17305
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 12.6625  on 6  degrees of freedom
Residual deviance:  1.9509  on 3  degrees of freedom
AIC: 29.136


Number of Fisher Scoring iterations: 3

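In the hard-court model, Djokovic has the advantage over all three rivals, and the advantage is statistically significant against Murray and Nadal, reflecting his dominance at the Australian Open. His hard-court edge over Federer, by contrast, is not significant. Next, fit the clay model: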

> model_clay<-BTm(cbind(win1, win2), player1, player2, formula=~player, id="player", refcat="Novak Djokovic", data=Head2Head_clay)
> summary(model_clay)

Call:
BTm(outcome = cbind(win1, win2), player1 = player1, player2 = player2,
    formula = ~player, id = "player", refcat = "Novak Djokovic",
    data = Head2Head_clay)

Deviance Residuals:
      1        2        3        4        5
-0.7023   0.6636   0.1272  -0.7475   0.6968

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
playerAndy Murray    -0.8959     0.6795  -1.318   0.1874
playerRafael Nadal    0.9188     0.3742   2.456   0.0141 *
playerRoger Federer  -0.5316     0.5258  -1.011   0.3119
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 20.592  on 5  degrees of freedom
Residual deviance:  1.994  on 2  degrees of freedom
AIC: 20.849

Number of Fisher Scoring iterations: 4

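In the clay model, Nadal's coefficient relative to Djokovic is 0.9188. Converting back, Djokovic's probability of beating Nadal on clay is:

\(\frac{\exp(0-0.9188)}{1+\exp(0-0.9188)}\approx 29\%\)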
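
The same arithmetic can be reproduced in base R. This is a minimal sketch using the coefficient reported in the clay model summary; the variable names are hypothetical.

```r
# Abilities on clay: Djokovic is the reference (0); Nadal's estimate is
# copied from the summary(model_clay) output.
beta_djokovic <- 0
beta_nadal    <- 0.9188

# P(Djokovic beats Nadal) = exp(b_D - b_N) / (1 + exp(b_D - b_N)).
p_djokovic <- plogis(beta_djokovic - beta_nadal)
p_nadal    <- 1 - p_djokovic

print(round(c(Djokovic = p_djokovic, Nadal = p_nadal), 3))
```

The result is about 0.285 for Djokovic, which rounds to the 29% quoted above, and about 0.715 for Nadal.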

This shows how dominant Nadal is on clay. The picture on grass is different again.

Fit the grass model:

> model_grass<-BTm(cbind(win1, win2), player1, player2, formula=~player, id="player", refcat="Novak Djokovic", data=Head2Head_grass)
> summary(model_grass)

Call:
BTm(outcome = cbind(win1, win2), player1 = player1, player2 = player2,
    formula = ~player, id = "player", refcat = "Novak Djokovic",
    data = Head2Head_grass)

Deviance Residuals:
      1        2        3        4        5        6
 1.9307  -1.6614  -0.1280   0.1063   1.1271  -1.0230

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
playerAndy Murray    -0.4314     0.8774  -0.492    0.623
playerRafael Nadal    0.1063     0.7489   0.142    0.887
playerRoger Federer   0.1063     0.7489   0.142    0.887

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 9.3643  on 6  degrees of freedom
Residual deviance: 8.8323  on 3  degrees of freedom
AIC: 21.868

Number of Fisher Scoring iterations: 4

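Grass is the surface on which the Big Four have the fewest head-to-head matches, so none of the coefficients reaches statistical significance. Nadal and Federer have the same coefficient (0.1063), so Djokovic's standing against the two on grass is about the same.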
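
To compare the three surfaces side by side, the reported coefficients can be converted into Djokovic's estimated win probability against each rival. The sketch below copies the point estimates from the three model summaries above and ignores their standard errors, so it is a rough comparison only; the object names are hypothetical.

```r
# Coefficients relative to Djokovic (reference = 0), one row per surface,
# copied from summary(model_hard), summary(model_clay), summary(model_grass).
coefs <- rbind(
  hard  = c(Murray = -0.8073, Nadal = -0.7419, Federer = -0.3560),
  clay  = c(Murray = -0.8959, Nadal =  0.9188, Federer = -0.5316),
  grass = c(Murray = -0.4314, Nadal =  0.1063, Federer =  0.1063)
)

# P(Djokovic beats j) = exp(0 - b_j) / (1 + exp(0 - b_j)) = 1 / (1 + exp(b_j)).
djokovic_win_prob <- 1 / (1 + exp(coefs))

print(round(djokovic_win_prob, 2))
```

The grass estimates in particular rest on very few matches and come with large standard errors, so they should be read as point estimates only.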