BMJ Statistics Mini-Quiz (47): Meta-analyses: heterogeneity and subgroup analysis

Preface:

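This installment is a special insert on how to read the heterogeneity and subgroup analysis results on a meta-analysis forest plot. Two friends recently asked how to interpret the χ² and I² values shown on a forest plot. Over the past few days, besides livestreaming, moving research analyses forward, and preparing teaching materials, I set aside some time to pick this article and work through it. It explains both concepts quite clearly and should clear up much of the confusion on this topic. If you want a challenge, read the question first and choose your answers before reading on.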

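Also, the new content for the online course “如何使用R語言進行統合分析” (How to Conduct Meta-Analysis with R) is now in post-production. Today, Monday, is a very busy day, so it is scheduled to go live on Tuesday, 6/21, at 9:00 a.m. The additions mainly cover the concepts and code for continuous outcome variables and reinforce the earlier material, with an emphasis on hands-on practice. If you have purchased the course, you can come back to watch and study it from tomorrow onward.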

This post is somewhat challenging, but it is well worth reading to the end. Hope u enjoy it.😊

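P.S. I want to thank a few friends in particular. I would love to thank each of you by name, but I know some of you prefer to lurk. So here is a thank-you for the likes, the encouragement, and the support for 匯東華 all along, whether through using 匯東華’s services or buying its products. Next month marks one year since 匯東華 was founded. Things are going steadily, and I am truly grateful. 匯東華 will look a little different every year and keep getting better, and we will work hard to help our partners keep getting better too😊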

Question:

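Researchers carried out a meta-analysis to evaluate the effectiveness of comprehensive geriatric assessment for older adults admitted to hospital as an emergency. It included randomised controlled trials comparing comprehensive geriatric assessment with usual care. Comprehensive geriatric assessment is a multidimensional, interdisciplinary diagnostic process used to determine the medical, psychological, and functional capabilities of a frail older person, so that a coordinated and integrated plan for treatment and long term follow-up can be developed. Usual care generally meant admission to a general ward under the care of a non-specialist. Twenty two trials were included, evaluating 10 315 participants in six countries [1].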

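The primary outcome was “living at home” at the end of the scheduled follow-up period. Eighteen trials with 7062 participants reported this outcome. The median follow-up was 12 months (range six weeks to 12 months). The test of heterogeneity across these trials gave χ²=28.49, df=17, P=0.04, I²=40%. The total overall estimate indicated that, at the end of scheduled follow-up, the odds of living at home were significantly higher for patients who received comprehensive geriatric assessment than for those who received usual care (odds ratio (OR)=1.16 (95% confidence interval 1.05 to 1.28; P=0.003)).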

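Subgroup analysis was performed according to the type of comprehensive geriatric assessment model. Two broad types of model were identified: assessment in a designated ward by a coordinated specialist team, and assessment by a mobile team wherever the patient was admitted. The test of heterogeneity for “ward” gave χ²=17.66, df=13, P=0.17, I²=26%; that for “team” gave χ²=1.86, df=3, P=0.60, I²=0%.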

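The subtotal estimate for “ward” indicated that, compared with usual care, comprehensive geriatric assessment was significantly more likely to result in patients being in their own homes at the end of scheduled follow-up (OR=1.22 (1.1 to 1.35; P<0.001)). However, when comprehensive geriatric assessment was carried out by mobile teams, its effect relative to usual care was inconclusive (OR=0.75 (0.55 to 1.01; P=0.06)). The test for subgroup differences gave χ²=9.06, df=1, P=0.003, I²=89%.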

Which of the following statements are true?

a) It can be inferred that homogeneity existed between the sample estimates across all trials.

b) Homogeneity existed between the sample estimates within both the “ward” and “team” subgroups.

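c) Because the statistical significance of the treatment effect on the primary outcome differed between the subgroups, it can be inferred that the effect of treatment on the primary outcome differed between the subgroups.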

d) A significant interaction existed between the “ward” and “team” subgroups in the primary outcome.

Answer:

Statements b and d are true; a and c are false.

Detailed explanation:

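The aim of this meta-analysis was to pool the sample estimates in order to estimate the population odds ratio of living at home for comprehensive geriatric assessment relative to usual care. The forest plot of the meta-analysis is shown in the figure below.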

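A total overall estimate was calculated across all trials, regardless of whether comprehensive geriatric assessment was performed in a designated ward or by a mobile team. A meta-analysis must include a statistical test of heterogeneity to assess the extent of variation between the sample estimates of all trials. The most commonly used tests of heterogeneity are Cochran’s Q and Higgins’s I².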

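Cochran’s Q is the more traditional test and is based on the χ² test. Like a traditional statistical hypothesis test, it has a null hypothesis and an alternative hypothesis. The null hypothesis states that homogeneity exists between the sample estimates of the population parameter across all trials; that is, the variation between them is no greater than would be expected when sampling from the same population. In other words, the variation between them is small and results from sampling error. The alternative hypothesis states that heterogeneity exists between the sample estimates.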
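To make the Q statistic concrete, here is a minimal Python sketch of the usual inverse-variance form of Cochran’s Q, computed on the log odds ratio scale. The function name `cochran_q` and the four study estimates are hypothetical, chosen only for illustration; they are not data from the review, which may also have used a different weighting scheme.

```python
def cochran_q(estimates, std_errors):
    """Return (Q, df) for study estimates given on the log scale.

    Q is the weighted sum of squared deviations of each study estimate
    from the inverse-variance pooled estimate.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    return q, len(estimates) - 1


# Hypothetical log odds ratios and standard errors (NOT from the review).
log_ors = [0.10, 0.25, -0.05, 0.30]
ses = [0.10, 0.12, 0.15, 0.20]

q, df = cochran_q(log_ors, ses)
print(f"Q = {q:.2f} on {df} degrees of freedom")
```

Under the null hypothesis of homogeneity, Q is referred to a χ² distribution with (number of trials minus 1) degrees of freedom, which is why the forest plot reports df alongside χ².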

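Cochran’s Q test may not detect heterogeneity between sample estimates accurately, so Higgins’s I² statistic is also commonly reported. Higgins’s I² is the percentage of the variation between sample estimates that is due to heterogeneity rather than sampling error. It ranges from 0% to 100%, with 0% indicating statistical homogeneity. I² values of 25%, 50%, and 75% have been suggested as indicating low, moderate, and high heterogeneity, respectively. If I² is 50% or more, substantial heterogeneity is considered to exist.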
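The link between Q and I² can be checked with a few lines of Python. This sketch assumes the standard definition I² = max(0, (Q − df) / Q) × 100%; the helper name `i_squared` is hypothetical. The Q and df values are the ones reported in the question.

```python
def i_squared(q, df):
    """Higgins's I² (in percent) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero.
    return max(0.0, (q - df) / q) * 100.0


# (Q, df) pairs as reported in the question.
reported = {
    "all trials": (28.49, 17),
    "ward": (17.66, 13),
    "team": (1.86, 3),
    "subgroup differences": (9.06, 1),
}

for label, (q, df) in reported.items():
    print(f"{label}: I² = {i_squared(q, df):.0f}%")
```

Rounded to whole percentages, these reproduce the reported 40%, 26%, 0%, and 89%.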

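In the meta-analysis above, the results of Cochran’s Q test and Higgins’s I² for all the sample estimates are shown at the bottom of the forest plot, as “Test for heterogeneity: χ²=28.49, df=17, P=0.04, I²=40%.” The P value of 0.04 means that, at the 5% critical level of significance, the null hypothesis is rejected in favour of the alternative hypothesis. Higgins’s I² indicates low to moderate heterogeneity. It is therefore concluded that statistical heterogeneity existed between the sample estimates (a is false).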
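The reported P values can be recovered from χ² and df. The sketch below uses only the Python standard library and relies on the closed-form upper-tail probability of a χ² distribution with integer degrees of freedom; the function name `chi2_sf` is hypothetical. In practice a statistics library would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) times a finite sum of y**i / i!.
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from df = 1 (a squared standard normal) and add correction terms.
    tail = math.erfc(math.sqrt(y))
    extra = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return tail + math.exp(-y) * extra


# (chi-square, df) pairs as reported in the question.
for label, (q, df) in {
    "all trials": (28.49, 17),
    "ward": (17.66, 13),
    "team": (1.86, 3),
    "subgroup differences": (9.06, 1),
}.items():
    print(f"{label}: P = {chi2_sf(q, df):.3f}")
```

The results should agree, after rounding, with the reported P values of 0.04, 0.17, 0.60, and 0.003.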

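Subgroup analysis was performed to explore the heterogeneity. The analysis was based on the model of comprehensive geriatric assessment, namely designated wards or mobile teams. Heterogeneity still has to be tested within each subgroup, and the results are shown in the figure beneath the list of studies for each subgroup: χ²=17.66, df=13, P=0.17, I²=26% for “ward” and χ²=1.86, df=3, P=0.60, I²=0% for “team.” Neither Q test was significant and both I² values were below 50%, so homogeneity existed between the sample estimates within each of the two subgroups (b is true).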

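The result of the test of heterogeneity influences how the overall estimate for each subgroup is obtained. When homogeneity exists, a fixed effect method should be used to derive the subtotal estimate. When heterogeneity exists, a random effects method is used. Compared with a fixed effect meta-analysis, a random effects analysis gives a wider confidence interval for the subtotal estimate, so the subtotal estimate is less precise.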
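The widening of the confidence interval under a random effects model can be illustrated with a short Python sketch. It assumes inverse-variance pooling and the DerSimonian-Laird estimate of the between-trial variance (tau²); the source does not say which random effects method the review used, and the four study estimates are hypothetical values chosen to be clearly heterogeneous.

```python
import math


def pool(estimates, std_errors, tau2=0.0):
    """Inverse-variance pooled estimate and its standard error.

    tau2 = 0 gives the fixed effect result; tau2 > 0 gives a random effects result.
    """
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def dersimonian_laird_tau2(estimates, std_errors):
    """DerSimonian-Laird moment estimate of the between-trial variance."""
    w = [1.0 / se ** 2 for se in std_errors]
    fixed, _ = pool(estimates, std_errors)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - df) / c)


# Hypothetical, deliberately heterogeneous log odds ratios (NOT from the review).
log_ors = [0.50, -0.20, 0.35, -0.10]
ses = [0.10, 0.12, 0.15, 0.20]

tau2 = dersimonian_laird_tau2(log_ors, ses)
fixed_est, fixed_se = pool(log_ors, ses)
random_est, random_se = pool(log_ors, ses, tau2)

for name, est, se in [("fixed", fixed_est, fixed_se), ("random", random_est, random_se)]:
    lo, hi = math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)
    print(f"{name}: OR = {math.exp(est):.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Because tau² is added to every trial’s variance, the random effects weights are smaller and more nearly equal, and the pooled standard error can only stay the same or grow.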

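The subgroup analysis indicated that, at the end of scheduled follow-up, patients who received comprehensive geriatric assessment in a designated ward were significantly more likely to be living at home than those who received usual care (OR=1.22 (1.1 to 1.35; P<0.001)). However, when comprehensive geriatric assessment by mobile teams was compared with usual care, the result was inconclusive (OR=0.75 (0.55 to 1.01; P=0.06)).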
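The P values attached to the two subtotal estimates can be approximated from the odds ratios and their confidence intervals. This sketch assumes the intervals are symmetric on the log scale and were built with a critical value of 1.96; the function name `z_test_from_ci` is hypothetical, and the rounded inputs mean the results are only approximate.

```python
import math


def z_test_from_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Approximate log OR, standard error, z, and two-sided P from an OR and its 95% CI."""
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return log_or, se, z, p


# Subtotal estimates as reported in the question.
for label, (or_, lo, hi) in {"ward": (1.22, 1.10, 1.35), "team": (0.75, 0.55, 1.01)}.items():
    log_or, se, z, p = z_test_from_ci(or_, lo, hi)
    print(f"{label}: log OR = {log_or:.3f}, SE = {se:.3f}, z = {z:.2f}, P = {p:.4f}")
```

The ward P value comes out well below 0.001 and the team P value close to 0.06, consistent with the reported figures.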

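Compared with usual care, the treatment effect of comprehensive geriatric assessment was statistically significant in the “ward” subgroup but not in the “team” subgroup. However, it cannot be inferred from the statistical significance within each subgroup that the effect of treatment on the primary outcome differed between the ward and team subgroups (c is false); the correct approach is to compare the subgroup treatment effects directly. Furthermore, the inference that homogeneity existed between the sample estimates within each subgroup does not necessarily mean that the assessment model (ward or team) explains the heterogeneity observed between the sample estimates across all trials. In particular, the numbers of trials and participants in a subgroup analysis may be too small to give sufficient statistical power, whether for demonstrating a significant treatment effect or for detecting heterogeneity.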

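The treatment effects in the subgroups should be compared with a test of interaction, not by comparing the significance of their P values. An interaction examines whether the effect of the intervention (comprehensive geriatric assessment compared with usual care) on the primary outcome differs between the subgroups. Interaction is sometimes called effect modification. In a meta-analysis, the test of interaction is performed with Cochran’s Q and Higgins’s I². Here the tests are applied to the subtotal estimates of the subgroups, unlike their earlier use, where they compared the sample estimates of the treatment effect across all trials.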

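For the test of interaction, the null hypothesis of Cochran’s Q and Higgins’s I² is that homogeneity exists between the subgroup estimates of the population parameter: when the subgroups come from the same population, the variation between them is no greater than expected. That is, the variation between them is small and results from sampling error. Higgins’s I² measures the proportion of the total variation that is due to heterogeneity rather than sampling error. The result of the test of interaction for the meta-analysis is shown at the bottom of the forest plot in the figure, under the heading “Test for subgroup differences: χ²=9.06, df=1, P=0.003, I²=89%.”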

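Cochran’s Q test was significant at the 0.05 level of significance, and Higgins’s I² was greater than 50%. Both measures indicate a significant interaction between the subtotal estimates of the subgroups (d is true). It can be concluded that the subgroups were estimating different population parameters.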

References:

[1] Ellis G, Whitehead MA, Robinson D, O’Neill D, Langhorne P. Comprehensive geriatric assessment for older adults admitted to hospital: meta-analysis of randomised controlled trials. BMJ 2011;343:d6553

#BMJ

#醫學統計

#meta-analysis

#heterogeneity

#subgroup analysis

Original question:

Researchers undertook a meta-analysis to evaluate the effectiveness of comprehensive geriatric assessment in hospital for older adults admitted as an emergency. They included randomised controlled trials that compared comprehensive geriatric assessment with usual care. Comprehensive geriatric assessment is a multidimensional interdisciplinary diagnostic process used to determine the medical, psychological, and functional capabilities of a frail elderly person so as to develop a coordinated and integrated plan for treatment and long term follow-up. Usual care usually involved admission to a general medical ward setting under the care of a non-specialist. Twenty two trials were identified, evaluating 10 315 participants in six countries.[1]

The primary outcome was “living at home” at the end of the scheduled follow-up period. This outcome was reported by 18 trials evaluating 7062 participants. The median follow-up was 12 months (range six weeks to 12 months). The test of heterogeneity for these trials gave χ²=28.49, df=17, P=0.04, I²=40%. The total overall estimate indicated that the odds of a patient living at home at the end of scheduled follow-up were significantly higher in those patients who had undergone comprehensive geriatric assessment than in those who received usual care (odds ratio=1.16 (95% confidence interval 1.05 to 1.28; P=0.003)).

Subgroup analysis was undertaken, based on the type of model of comprehensive geriatric assessment performed. Two broad types of model were identified: assessment in designated wards by a coordinated specialist team; and assessment by mobile teams wherever the patient was admitted. The test of heterogeneity for “ward” gave χ²=17.66, df=13, P=0.17, I²=26% while that for “team” gave χ²=1.86, df=3, P=0.60, I²=0%.

The subtotal estimate for “ward” indicated that comprehensive geriatric assessment was significantly more likely to result in patients being in their own homes at the end of scheduled follow-up than was usual care (odds ratio 1.22 (1.1 to 1.35; P<0.001)). However, when comprehensive geriatric assessment was undertaken by mobile teams its effects were inconclusive in comparison with usual care (odds ratio 0.75 (0.55 to 1.01; P=0.06)). The test for subgroup differences gave χ²=9.06, df=1, P=0.003, I²=89%.

Which of the following statements, if any, are true?

a) It can be inferred that homogeneity existed between the sample estimates across all trials.

b) Homogeneity existed between the sample estimates in both subgroups of “ward” and “team.”

c) It can be inferred that the effect of treatment on the primary outcome was different in the subgroups of wards and teams on the basis of the statistical significance in the subgroups.

d) A significant interaction existed between the subgroups of “ward” and “team” in the primary outcome.

Answers

Statements b and d are true, whereas a and c are false.

Cite this as: BMJ 2013;346:f404

https://www.bmj.com/content/346/bmj.f4040

匯東華統計顧問有限公司

Sunday, June 19, 2022