Kwangmin Kim - The Pitfalls of Composite Outcomes

This is the final post in the series on Schulz Ch.18. It covers the pitfalls of composite outcomes.

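1 Intuition: the appeal and the pitfalls of composites

The motivation for a composite outcome is statistical efficiency. Combining several outcomes raises the event rate, and for a given relative effect a higher event rate lowers the required sample size.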
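The efficiency argument can be made concrete with a rough sample size calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions; the function name `n_per_arm`, the event rates, and the 25% relative reduction are all hypothetical values chosen for illustration, not figures from Schulz.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control: float, rel_reduction: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per arm for comparing two proportions."""
    p_treat = p_control * (1 - rel_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2)


# Hypothetical control-arm rates: a rare single outcome vs. a broader composite
for label, rate in [("single outcome", 0.02), ("composite", 0.10)]:
    print(f"{label}: control rate {rate:.0%}, n per arm = {n_per_arm(rate, 0.25)}")
```

With the same relative reduction, the arm with the higher event rate needs several times fewer patients, which is exactly the incentive the rest of this post warns about.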

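Schulz's warning, in paraphrase: a composite buys better precision at the price of a fuzzy target, so it measures the wrong thing precisely.

Goal of this post: four criteria for using a composite appropriately, plus common pitfalls illustrated by real cases.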

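2 Four equivalence criteria for a composite outcome

Schulz's core recommendation: for a composite to be valid, its components should be comparable on the following four criteria.

2.1 Criterion 1: Seriousness

All components should have similar clinical seriousness.

Counterexample: a “death or hospitalization” composite. Death is catastrophic; hospitalization is serious. Are they comparable? Usually death is far more serious, so combining the two with equal weight is misleading.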

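2.2 Criterion 2: Frequency

All components should occur at similar frequencies.

Counterexample, DREAM: incident diabetes 25% versus death 1.3%, a roughly 20-fold difference in frequency. The composite in effect measures diabetes alone.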

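2.3 Criterion 3: Direction of Effect

The treatment should move all components in the same direction.

Counterexample: a drug reduces death but increases stroke. In a composite of death and stroke the two effects can cancel out, so the direction of effect has to be checked component by component.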
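A small calculation shows how opposing effects can cancel inside a composite. The risks below are hypothetical, and `composite_risk` assumes the two events are independent, which is a simplification made only for this sketch.

```python
def composite_risk(p_death: float, p_stroke: float) -> float:
    """Risk of at least one event, assuming the two events are independent."""
    return 1 - (1 - p_death) * (1 - p_stroke)


# Hypothetical risks, chosen only to illustrate cancellation
control = {"death": 0.10, "stroke": 0.10}
treated = {"death": 0.07, "stroke": 0.13}   # death down 30%, stroke up 30%

rr_death = treated["death"] / control["death"]
rr_stroke = treated["stroke"] / control["stroke"]
rr_composite = (composite_risk(treated["death"], treated["stroke"])
                / composite_risk(control["death"], control["stroke"]))

print(f"Risk ratio, death:     {rr_death:.2f}")
print(f"Risk ratio, stroke:    {rr_stroke:.2f}")
print(f"Risk ratio, composite: {rr_composite:.2f}")
```

The composite risk ratio sits close to 1 even though both components change by 30%, so a reader who sees only the composite would conclude the drug does nothing.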

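2.4 Criterion 4: Importance to Participants

Components should matter about equally from the patient's point of view.

Counterexample: “death or ECG abnormality”. To a patient, death is catastrophic and an ECG abnormality is trivial. Combining them ignores patient values.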

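3 DREAM Trial: the pitfall of the 60% reduction

This case appeared in the earlier overview post. Here it is analyzed in depth.

3.1 Trial overview

Gerstein et al. (2006, Lancet). 5269 participants with impaired fasting glucose and/or impaired glucose tolerance. Rosiglitazone vs placebo.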

3.2 Composite Outcome

“Incident diabetes or death”

3.3 Results (Panel 18.4)

Outcome	Rosiglitazone (%)	Placebo (%)	HR (95% CI)
Composite (diabetes or death)	11.6	26.0	0.40 (0.35-0.46)
Diabetes alone	10.6	25.0	0.38 (0.33-0.44)
Death alone	1.1	1.3	0.90 (0.55-1.5)
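The percentages in the table above are enough to see where the composite result comes from. The sketch below computes crude risk ratios and absolute differences from those reported percentages; these are ratios of proportions, so they will not match the hazard ratios in the table exactly, and the variable names are illustrative only.

```python
# Event percentages as reported in the table above: (rosiglitazone, placebo)
reported = {
    "composite": (11.6, 26.0),
    "diabetes": (10.6, 25.0),
    "death": (1.1, 1.3),
}

summary = {}
for outcome, (treat_pct, placebo_pct) in reported.items():
    risk_ratio = treat_pct / placebo_pct
    abs_diff = placebo_pct - treat_pct          # in percentage points
    summary[outcome] = (risk_ratio, abs_diff)
    print(f"{outcome:9s} risk ratio {risk_ratio:.2f}, "
          f"absolute reduction {abs_diff:.1f} points")

# Placebo-arm diabetes rate relative to the placebo-arm composite rate
share = reported["diabetes"][1] / reported["composite"][1]
print(f"Diabetes rate is about {share:.0%} of the placebo composite rate")
```

Nearly all of the absolute reduction in the composite is the reduction in diabetes; the difference in deaths is a fraction of a percentage point.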

3.4 The report's misleading first sentence

“This large, prospective, blinded international clinical trial shows that 8 mg of rosiglitazone daily … substantially reduces the risk of diabetes or death by 60% in individuals at high risk for diabetes.”

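3.5 Analysis of the pitfalls

Pitfall 1, the casual reader's misreading: “reduces the risk of diabetes or death by 60%” invites the guess that death fell by 60%. The actual HR for death is 0.90 (95% CI 0.55-1.5). The interval spans 1, so the trial shows no demonstrable effect on death.

Pitfall 2, three of the four criteria are violated:

Seriousness: death ≫ incident diabetes (violated)

Frequency: diabetes 25% ≫ death 1.3% (violated)

Direction: not opposed (diabetes reduced, death essentially unaffected), so this criterion is met

Importance: death ≫ incident diabetes (violated)

Pitfall 3, the nature of a composite: the common component (diabetes) dominates the number, and the true effect on the rare component (death) is hidden.

Schulz's criticism, in paraphrase: the composite suffers from both quantitative heterogeneity (frequency) and qualitative heterogeneity (importance).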

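4 A further pitfall: the pseudo-primary outcome

4.1 A flawed reporting pattern

A trial prespecifies five primary outcomes. In the analysis, four are non-significant and only one is significant.

The flawed move: reclassify or bundle the four into a composite that did not exist before, then emphasize the significant result.

Consequence: this is a form of p-hacking. The decision is made after looking at the data, which raises the false-positive rate.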

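4.2 The survey by Lim et al. (2008)

An analysis of 304 RCTs published from 2000 to 2007 that used a composite outcome:

Finding	Implication
Median of 3 components per composite	Reasonable
Smaller trials used more components	Suggests an intent to “boost events”
Regression: each additional component was associated with 721 fewer patients	Consistent with an intent to cut sample size
Asymmetric distribution of composite p-values (excess of p<0.05)	Publication bias or p-hacking
Death contributed little to the composites	Common events dominate

Schulz's assessment, in paraphrase: drawn by the appeal of statistical efficiency, composites are being used invalidly.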

5 Magnesium Sulphate Trial: a justified composite

5.1 The trial (Crowther et al. 2003, JAMA)

Intravenous magnesium sulphate vs placebo in pregnant women with imminent preterm birth, testing for a neuroprotective effect on the newborn.

5.2 Composite Outcome

“Death or cerebral palsy at age 2 years”

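5.3 Criticism and justification

5.3.1 Criticism

The objection, in paraphrase: death and cerebral palsy differ in economic, social, and clinical value, so combining them is misleading.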

5.3.2 The authors' response (justified)

“A child cannot have motor dysfunction at age 2 if he or she is already dead.”

Mechanism: if magnesium increased deaths, it could appear to reduce cerebral palsy among the children who survive. The composite guards against this competing-risk artifact.
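The competing-risk mechanism can be illustrated with simple expected counts. Everything in the sketch below is hypothetical: it assumes a subgroup of frail infants who alone are at risk of death or cerebral palsy (CP), and a treatment that doubles deaths in that subgroup while having no effect on CP. These numbers are not from the Crowther trial.

```python
def arm_summary(n_total: int, n_frail: int, frail_deaths: int,
                cp_rate_in_frail_survivors: float) -> dict:
    """Expected event proportions when only frail infants can die or develop CP."""
    frail_survivors = n_frail - frail_deaths
    cp_cases = frail_survivors * cp_rate_in_frail_survivors
    survivors = n_total - frail_deaths
    return {
        "death": frail_deaths / n_total,
        "cp_among_survivors": cp_cases / survivors,
        "death_or_cp": (frail_deaths + cp_cases) / n_total,
    }


# Hypothetical arms: the treatment doubles deaths and does nothing to CP risk
placebo = arm_summary(1000, 100, frail_deaths=20, cp_rate_in_frail_survivors=0.5)
treated = arm_summary(1000, 100, frail_deaths=40, cp_rate_in_frail_survivors=0.5)

for name in ("death", "cp_among_survivors", "death_or_cp"):
    print(f"{name:19s} placebo {placebo[name]:.1%}  treated {treated[name]:.1%}")
```

Under these assumptions CP among survivors looks lower in the treated arm purely because more at-risk infants died, while the composite of death or CP correctly shows the treated arm doing worse.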

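5.4 Assessment against the four criteria

Criterion	Assessment
Seriousness	Met: both are catastrophic
Frequency	Similar (both are rare events)
Direction	Same (both expected to decrease)
Importance to participants	Similar (both very serious)

Takeaway: a justified composite is a tool for guarding against competing risks. It can be used when the four criteria are met.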

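6 Recommendations for the appropriate use of composite outcomes

6.1 CONSORT recommendation

Results for every individual component must be reported.

6.2 Schulz's recommendations

Prespecify: define the exact components in the protocol
Meet the four criteria: Seriousness, Frequency, Direction, Importance
Report all components: the composite plus the result for each component
No post-hoc changes: a post-hoc composite is hazardous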

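7 Composites as p-hacking

7.1 Scenario

A trial has five outcomes (A, B, C, D, E). Prespecified: A is primary, the rest are secondary.

Results: A, B, and C are non-significant; D and E are significant.

Post-hoc move: create a “D + E” composite and report it as if it were primary. It shows p < 0.05.

7.2 The problem

The choice was made after seeing the data, so the type I error rate is inflated well above the nominal 5% and the reported p-value loses its usual meaning.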
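A short simulation gives a sense of how much the false-positive rate can inflate. The sketch below is a stylized version of the scenario: five independent outcomes with no true treatment effect, and an analyst who reports whichever of the five single outcomes or ten pairwise composites looks best. The selection rule, sample sizes, and event rate are assumptions made for illustration only.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
n_sims, n_per_arm, n_outcomes = 2000, 400, 5
event_rate = 0.10      # identical in both arms, so the null hypothesis is true
z_crit = 1.96          # two-sided 5% threshold


def z_stat(events_a: np.ndarray, events_b: np.ndarray) -> np.ndarray:
    """Two-proportion z statistic for each simulated trial (equal arm sizes)."""
    p_a, p_b = events_a.mean(axis=1), events_b.mean(axis=1)
    pooled = (p_a + p_b) / 2
    se = np.sqrt(2 * pooled * (1 - pooled) / events_a.shape[1])
    return (p_a - p_b) / se


# Shape: (simulated trial, patient, outcome); True marks an event
treat = rng.random((n_sims, n_per_arm, n_outcomes)) < event_rate
ctrl = rng.random((n_sims, n_per_arm, n_outcomes)) < event_rate

# Candidate analyses: each single outcome plus every pairwise composite
candidates = [z_stat(treat[:, :, i], ctrl[:, :, i]) for i in range(n_outcomes)]
for i, j in combinations(range(n_outcomes), 2):
    candidates.append(z_stat(treat[:, :, i] | treat[:, :, j],
                             ctrl[:, :, i] | ctrl[:, :, j]))
abs_z = np.abs(np.array(candidates))                  # shape: (15, n_sims)

prespecified_rate = (abs_z[0] > z_crit).mean()        # outcome A, fixed in advance
post_hoc_rate = (abs_z.max(axis=0) > z_crit).mean()   # best-looking analysis picked later

print(f"Prespecified primary: false-positive rate {prespecified_rate:.3f}")
print(f"Post-hoc pick of 15:  false-positive rate {post_hoc_rate:.3f}")
```

The prespecified analysis stays near the nominal 5%, while picking the best of the candidate analyses after the fact produces a "significant" result far more often, even though no outcome has a real effect.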

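7.3 What CONSORT requires

The primary outcome must be prespecified, and any change must be documented as a protocol amendment.

Evidence from practice: the systematic review by Cordoba et al. (2010, BMJ). Of 40 RCTs with a composite outcome, 70% did not justify their choice of components and only 60% reported results for each component. Most reports based their inference on the composite alone.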

8 Code example: simulating the DREAM trial pitfall

import numpy as np

rng = np.random.default_rng(42)  # seeded generator for reproducibility

n_per_arm = 2500
true_diabetes_effect = 0.60   # 60% relative reduction in diabetes
true_death_effect = 0.0       # no effect on death

# Baseline (placebo-arm) rates
baseline_diabetes = 0.25
baseline_death = 0.013

# Simulate each arm; the two components are drawn independently for simplicity
diabetes_T = rng.binomial(1, baseline_diabetes * (1 - true_diabetes_effect), n_per_arm)
death_T = rng.binomial(1, baseline_death * (1 - true_death_effect), n_per_arm)
composite_T = diabetes_T | death_T

diabetes_C = rng.binomial(1, baseline_diabetes, n_per_arm)
death_C = rng.binomial(1, baseline_death, n_per_arm)
composite_C = diabetes_C | death_C

# Note: ratios of event proportions are risk ratios (RR), not hazard ratios
print("[DREAM trial simulation]")
print("\nComposite (diabetes or death):")
print(f"  Treatment: {composite_T.mean():.1%}, Placebo: {composite_C.mean():.1%}")
print(f"  RR ≈ {composite_T.mean() / composite_C.mean():.2f}")
print("  This is where the '60% reduction' comes from\n")

print("Diabetes alone:")
print(f"  Treatment: {diabetes_T.mean():.1%}, Placebo: {diabetes_C.mean():.1%}")
print(f"  RR ≈ {diabetes_T.mean() / diabetes_C.mean():.2f}\n")

print("Death alone:")
print(f"  Treatment: {death_T.mean():.1%}, Placebo: {death_C.mean():.1%}")
print(f"  RR ≈ {death_T.mean() / death_C.mean():.2f}")

print("\n→ 'Reduces diabetes or death by 60%' reflects only the diabetes effect")
print("→ The true effect on death is zero, and the composite hides that")

# Assessment against the four criteria
print("\n[Four-criteria assessment]")
criteria = {
    "Seriousness": "death ≫ incident diabetes: violated",
    "Frequency": f"diabetes ({baseline_diabetes:.0%}) ≫ death ({baseline_death:.1%}): violated",
    "Direction": "not opposed: met",
    "Importance": "death ≫ diabetes: violated",
}
for criterion, evaluation in criteria.items():
    print(f"  {criterion}: {evaluation}")

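9 Conclusion: wrapping up the Ch.18 series

Surrogates and composites are tools of statistical efficiency, but they risk obscuring clinical truth. Outcomes that matter are the real purpose of a trial.

Key messages:

Four composite criteria: Seriousness, Frequency, Direction, Importance
The DREAM trial's 60% pitfall: the common component dominates
p-hacking risk: composites created post hoc
Magnesium sulphate example: a justified composite (competing risk)
CONSORT requirement: report all components

Summing up the Ch.18 series: both surrogates and composites trade clinical truth for the appeal of statistical efficiency. Only a validated surrogate (Level 2) or a composite that meets the four criteria should be used. Otherwise, measure the true endpoint.

The next chapter (Ch.20) is Multiplicity II: subgroup and interim analyses.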

10 References

Schulz, K. F. & Grimes, D. A. (2019). Essential Concepts in Clinical Research (2nd ed.), Ch.18. Elsevier.
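Gerstein, H. C., Yusuf, S., Bosch, J., et al. (2006). Effect of rosiglitazone on the frequency of diabetes in patients with impaired glucose tolerance or impaired fasting glucose: a randomised controlled trial (DREAM). Lancet 368, 1096-1105.
Crowther, C. A., Hiller, J. E., Doyle, L. W., Haslam, R. R. (2003). Effect of magnesium sulfate given for neuroprotection before preterm birth: a randomized controlled trial. JAMA 290, 2669-2676.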
Cordoba, G., Schwartz, L., Woloshin, S., Bae, H., Gøtzsche, P. C. (2010). Definition, reporting, and interpretation of composite outcomes. BMJ 341, c3920.
Lim, E., Brown, A., Helmy, A., Mussa, S., Altman, D. G. (2008). Composite outcomes in cardiovascular research. Ann. Intern. Med. 149, 612-617.
Goldberg, R., Gore, J. M., Barton, B., Gurwitz, J. (2014). Individual and composite study endpoints: separating the wheat from the chaff. Am. J. Med. 127, 379-384.
Tomlinson, G. & Detsky, A. S. (2010). Composite end points in randomized trials: there is no free lunch. JAMA 303, 267-268.
Ferreira-Gonzalez, I., et al. (2007). Methodological discussions for using composite endpoints. J. Clin. Epidemiol. 60, 651-657.