Kwangmin Kim - RAR + Play-the-Winner

Sources

This post is written from prior knowledge (not checked against the textbooks; it relies on the agent's pretraining). Key references: Wei (1979), Berry (2015), Korn & Freidlin (2011), Hu & Rosenberger (2003), Bartlett et al. (1985).

This is the second post in the J-ADAPT series. It covers Response-Adaptive Randomization (RAR), specifically Wei's Play-the-Winner and Berry's Bayesian RAR.

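1 Intuition: “More patients on the arm that works better”

Standard RCT: 50:50 randomization. The better arm and the worse arm receive the same number of patients.

RAR in one line: as results accumulate, raise the randomization probability of the arm that looks better and lower it for the arm that looks worse.

Ethical motivation: from the patient's point of view, the chance of receiving the better treatment goes up.

Trade-off: statistical efficiency goes down, because for a fixed total sample size a balanced (or near-balanced) allocation gives the most power.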
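To put a number on this trade-off, the sketch below computes approximate power at several fixed allocation splits. It is a rough illustration only: it assumes a two-sided two-proportion z-test with an unpooled standard error and a normal approximation, and the function name `approx_power` and the success rates are hypothetical. In real RAR the split is random rather than fixed, so this shows only the cost of imbalance itself.

```python
from statistics import NormalDist

def approx_power(p0, p1, n_total, frac_arm1, alpha=0.05):
    """Normal-approximation power of a two-sided two-proportion z-test.

    frac_arm1 is the (fixed) fraction of patients assigned to arm 1.
    """
    n1 = n_total * frac_arm1
    n0 = n_total - n1
    # Standard error of the difference in observed success rates.
    se = (p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1) ** 0.5
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(p1 - p0) / se
    # Probability that the test statistic falls in either rejection region.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

# Assumed success rates 0.3 vs 0.5 and 200 patients in total.
for frac in (0.5, 0.65, 0.8, 0.95):
    print(f"arm 1 share {frac:.2f}: power ~ {approx_power(0.3, 0.5, 200, frac):.3f}")
```

With the total sample size held fixed, power falls as the split moves away from roughly 50:50, which is the efficiency cost that the ethical benefit has to be weighed against.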

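1.1 Analogy

MAB (Multi-Armed Bandit): pull the better arm more often. In RAR, a patient is a pull and an arm is a treatment.

This connects directly to the MAB series (Phase I of this blog series). RAR is in effect a bandit algorithm applied to clinical trials.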

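2 Wei (1979): Play-the-Winner

2.1 Significance

One of the earliest RAR designs, built on an intuitive urn model. The deterministic play-the-winner rule is usually credited to Zelen (1969) and its randomized urn version to Wei & Durham (1978); Wei (1979) generalizes the urn design.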

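2.2 Mechanism

An urn holds balls, and each color stands for an arm.

A patient arrives → draw one ball from the urn, give the treatment of the arm matching its color, and put the ball back.

Update after the outcome:

Success → add a ball of the same color

Failure → add a ball of the opposite color

Net effect: balls for the better arm accumulate, so its randomization probability rises.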
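The urn update above can be sketched as a short simulation. This is a minimal sketch, assuming a two-arm trial with immediate binary outcomes and an urn that starts with one ball per arm. The function name `simulate_urn`, the initial composition, and the success rates are illustrative assumptions, not details taken from Wei (1979).

```python
import random

def simulate_urn(true_p, n_patients, init_balls=1, seed=0):
    """Simulate a two-arm play-the-winner urn.

    true_p: assumed success probabilities per arm (hypothetical values).
    Returns (assignments, outcomes, final ball counts).
    """
    rng = random.Random(seed)
    balls = [init_balls, init_balls]  # balls[k] = number of balls for arm k
    assignments, outcomes = [], []
    for _ in range(n_patients):
        # Draw a ball with replacement: arm 1 has probability balls[1] / total.
        arm = 1 if rng.random() < balls[1] / sum(balls) else 0
        success = rng.random() < true_p[arm]
        # Success adds a ball of the same color, failure adds the other color.
        if success:
            balls[arm] += 1
        else:
            balls[1 - arm] += 1
        assignments.append(arm)
        outcomes.append(int(success))
    return assignments, outcomes, balls

assignments, outcomes, balls = simulate_urn([0.3, 0.5], n_patients=200)
print("patients per arm:", assignments.count(0), assignments.count(1))
print("final urn:", balls)
```

Each patient adds exactly one ball, so the urn grows by one per patient and the allocation probability changes more slowly as the trial goes on.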

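2.3 Intuition

Self-reinforcing dynamics, like a Pólya urn. The sooner a difference between the arms shows up in the outcomes, the sooner the allocation converges toward the better arm.

2.4 Limitations

Extreme imbalance is possible in small samples. The Bayesian RAR described next softens this.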

3 Bayesian RAR (Berry 2015)

3.1 Mechanism

For each patient:

Update the posterior distribution of each arm (e.g. Beta-Bernoulli)

Compute the posterior probability of being best: \(P(\theta_k = \max_j \theta_j \mid \text{data})\)

Set the randomization probability as a function of that posterior probability (e.g. \(p_k \propto \sqrt{P(\text{arm } k \text{ best})}\))

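3.2 Example: Beta-Bernoulli

Arm \(k\): prior \(\theta_k \sim \text{Beta}(\alpha_k, \beta_k)\), which acts like \(\alpha_k - 1\) prior successes and \(\beta_k - 1\) prior failures relative to a flat \(\text{Beta}(1, 1)\).

Posterior after \(s_k\) observed successes and \(f_k\) observed failures: \(\theta_k \sim \text{Beta}(\alpha_k + s_k, \beta_k + f_k)\).

Estimate \(P(\theta_k = \max_j \theta_j)\) by Monte Carlo sampling from these posteriors.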
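The Monte Carlo step can be sketched as follows: draw one value per arm from its Beta posterior, record which arm is largest, and repeat. This is a sketch under assumptions: the function name `prob_best`, the interim counts, and the shared Beta(1, 1) prior are all hypothetical, and it uses only the standard library.

```python
import random

def prob_best(successes, failures, prior=(1.0, 1.0), n_draws=20000, seed=0):
    """Monte Carlo estimate of P(theta_k is the largest | data) for each arm."""
    rng = random.Random(seed)
    a0, b0 = prior
    wins = [0] * len(successes)
    for _ in range(n_draws):
        # One joint draw from the independent Beta posteriors.
        draws = [rng.betavariate(a0 + s, b0 + f)
                 for s, f in zip(successes, failures)]
        wins[draws.index(max(draws))] += 1
    return [w / n_draws for w in wins]

# Hypothetical interim data: arm 0 has 3/10 successes, arm 1 has 6/10.
print(prob_best(successes=[3, 6], failures=[7, 4]))
```

The estimate carries Monte Carlo error of order \(1/\sqrt{\text{n\_draws}}\), so the number of draws should be large enough that this noise does not itself move the allocation probabilities.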

3.3 Natural Connection

Same mechanism as Thompson Sampling in the Phase I MAB series. RAR is TS applied to clinical trials.

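3.4 Smoothing

Power parameter \(c\):

\[ p_k \propto P(\text{arm } k \text{ best})^{1/c} \]

\(c \to \infty\): uniform allocation (standard RCT)

\(c = 1\): TS

\(c \to 0\): only the arm with the largest posterior probability is chosen (greedy)

Choose \(c\) to trade off efficiency against balance. The square-root rule in 3.1 corresponds to \(c = 2\).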
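A small numeric sketch of the power rule shows how \(c\) moves the allocation between greedy and uniform. The function name `smooth_allocation` and the posterior probabilities are hypothetical.

```python
def smooth_allocation(p_best, c):
    """Return allocation probabilities proportional to p_best ** (1 / c)."""
    weights = [p ** (1.0 / c) for p in p_best]
    total = sum(weights)
    return [w / total for w in weights]

p_best = [0.1, 0.9]  # hypothetical posterior probabilities of being best
for c in (0.25, 1, 2, 100):
    alloc = smooth_allocation(p_best, c)
    print(f"c = {c}: " + ", ".join(f"{a:.3f}" for a in alloc))
```

With these inputs, \(c = 1\) reproduces the posterior probabilities, \(c = 2\) gives a 25:75 split, a small \(c\) puts almost everything on the leading arm, and a large \(c\) approaches 50:50.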

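4 Statistical Challenges of RAR

4.1 Challenge 1: Type I Error Control

Adaptive randomization changes the distribution of the test statistic under the null hypothesis, so standard tests can be biased and need not hold the nominal type I error rate.

A permutation test or re-randomization test is needed.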
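A re-randomization test can be sketched like this: under the sharp null (no patient's outcome depends on the arm), hold the observed outcome sequence fixed, re-run the allocation rule many times, and see how often the re-drawn assignments give a statistic at least as extreme as the observed one. The sketch assumes the allocation rule is the two-arm urn from section 2 with one initial ball per arm. All function names and the trial log are hypothetical, and treating an empty arm as a difference of 0.0 is a simplifying convention.

```python
import random

def run_urn(outcomes, rng):
    """Re-run a two-arm play-the-winner urn on a fixed outcome sequence."""
    balls = [1, 1]  # assumed initial urn: one ball per arm
    arms = []
    for y in outcomes:
        arm = 1 if rng.random() < balls[1] / sum(balls) else 0
        # Success adds a ball for the same arm, failure for the other arm.
        balls[arm if y == 1 else 1 - arm] += 1
        arms.append(arm)
    return arms

def rate_difference(arms, outcomes):
    """Success rate of arm 1 minus arm 0 (0.0 if either arm is empty)."""
    n1 = sum(arms)
    n0 = len(arms) - n1
    if n0 == 0 or n1 == 0:
        return 0.0  # simplifying convention for degenerate allocations
    s1 = sum(y for a, y in zip(arms, outcomes) if a == 1)
    s0 = sum(outcomes) - s1
    return s1 / n1 - s0 / n0

def rerandomization_pvalue(arms, outcomes, n_rerand=2000, seed=0):
    """Two-sided re-randomization p-value for the rate difference."""
    rng = random.Random(seed)
    observed = abs(rate_difference(arms, outcomes))
    extreme = 0
    for _ in range(n_rerand):
        stat = abs(rate_difference(run_urn(outcomes, rng), outcomes))
        if stat >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (extreme + 1) / (n_rerand + 1)

# Hypothetical trial log: assignments and binary outcomes in enrollment order.
arms = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
outcomes = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1]
print(rerandomization_pvalue(arms, outcomes))
```

The reference distribution comes from the allocation rule actually used, not from a textbook null distribution, which is why the rule has to be fully specified and reproducible.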

4.2 Challenge 2: Estimator Bias

Adaptive sampling over-represents arms that happened to look good along the way, so the naive estimator (the raw observed success rate per arm) is biased.

Inverse Probability Weighting (IPW) or another adjusted estimator is needed.
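As a sketch of the IPW idea, each observed outcome is weighted by the inverse of the probability with which that patient's arm was chosen at the time. This assumes the randomization probabilities were logged for every patient and were never zero. The function names and the trial log are hypothetical, and this Horvitz-Thompson form is only one of several possible adjusted estimators.

```python
def naive_mean(arm, assignments, outcomes):
    """Raw observed success rate among patients assigned to `arm`."""
    ys = [y for a, y in zip(assignments, outcomes) if a == arm]
    return sum(ys) / len(ys)

def ipw_mean(arm, assignments, outcomes, probs):
    """Horvitz-Thompson style IPW estimate of the success rate of `arm`.

    probs[i] is the probability with which patient i's assigned arm was
    chosen at the time of randomization (assumed logged and positive).
    """
    total = 0.0
    for a, y, pi in zip(assignments, outcomes, probs):
        if a == arm:
            total += y / pi
    # Divide by ALL patients, not only those on this arm.
    return total / len(outcomes)

# Hypothetical log: arm assigned, outcome, probability of the assigned arm.
assignments = [0, 1, 1, 0, 1, 1, 1, 1]
outcomes = [0, 1, 1, 1, 0, 1, 1, 0]
probs = [0.5, 0.5, 0.6, 0.4, 0.6, 0.7, 0.7, 0.8]
for k in (0, 1):
    print(k, round(naive_mean(k, assignments, outcomes), 3),
          round(ipw_mean(k, assignments, outcomes, probs), 3))
```

This estimator can be noisy and can even exceed 1 when some assignment probabilities are small, which is one reason stabilized or model-based adjustments are often preferred.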

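4.3 Challenge 3: Time Trend

If patient characteristics drift over time (for example by season or by year), RAR's balance breaks down: the allocation ratio also changes over time, so the arm comparison becomes confounded with enrollment time. Use stratified RAR or covariate adjustment.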

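5 Korn & Freidlin (2011): Critique

5.1 Significance

A critical analysis of the statistical efficiency of RAR, based on power comparisons.

5.2 Results

The power of RAR is generally lower than that of a balanced RCT, especially for small effect sizes.

Does the ethical benefit of RAR (more patients on the better arm) offset the statistical inefficiency? Not always.

5.3 Recommendations

For small effect sizes or large samples, the balanced RCT is superior. RAR is justified for large effects or severe outcomes (such as death).

5.4 Impact

Broad use of RAR is discouraged; it is reserved for specific scenarios (rare disease, large effect).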

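6 ECMO Trial (Bartlett et al. 1985): Extreme RAR

6.1 Scenario

Neonatal respiratory failure: ECMO (extracorporeal membrane oxygenation) vs conventional therapy.

6.2 Application

A strong RAR scheme of the play-the-winner type.

6.3 Results

All 11 patients assigned to ECMO survived, and the single patient assigned to conventional therapy died → the urn tilted heavily toward ECMO.

Patients randomized over the whole trial: 11 ECMO, 1 conventional.

6.4 Criticism

Insufficient statistical power. A single conventional-therapy patient can hardly be representative. An extreme case of RAR.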
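A back-of-the-envelope calculation shows how little a 11-versus-1 split can support. The sketch below applies a one-sided Fisher exact test to the 2x2 table implied by the counts above. It deliberately ignores the adaptive allocation, so it is an illustration of the sample-size problem only, not the analysis reported for the trial. The function name `fisher_one_sided` is hypothetical.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are arms, columns are (survived, died). Returns the probability,
    with all margins fixed, of at least `a` survivors in the first row.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, upper + 1))
    return tail / comb(n, row1)

# ECMO: 11 survived, 0 died; conventional: 0 survived, 1 died.
p = fisher_one_sided(11, 0, 0, 1)
print(f"one-sided p = {p:.4f}")  # 1/12, about 0.083
```

With one death among 12 patients, the chance that it falls on the lone conventional patient is 1 in 12 under the null, so even this most favorable outcome does not reach the conventional 0.05 level.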

6.5 Impact

The standard example of the conflict between ethics and statistics in RAR. Since then, milder RAR or Bayesian smoothing has been recommended.

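7 I-SPY 2: Bayesian RAR + Platform

7.1 Scenario

Evaluation of neoadjuvant (pre-surgery) drugs for breast cancer. Several drugs are evaluated at the same time.

7.2 Mechanism

Bayesian RAR: more patients go to the arms that look effective. Effects are analyzed by biomarker subgroup.

7.3 Results

Several drugs either graduated (judged ready for Phase 3) or were dropped (insufficient efficacy). Less time and lower cost than standard RCTs.

7.4 Significance

A success story for adaptive designs. Collaboration with the FDA, plus breakthrough designation.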

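8 Hu & Rosenberger (2003): Asymptotic Analysis

8.1 Significance

Put the asymptotic efficiency of RAR on a formal footing.

8.2 Results

How well RAR performs depends on the combination of the adaptive allocation ratio and the true effect. Under certain assumptions, an optimal RAR procedure can optimize patient allocation and statistical power at the same time.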
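To make "allocation ratio" concrete, the sketch below computes two target allocations that are commonly discussed in the RAR literature for binary outcomes. They are brought in here as illustrative assumptions and are not quoted from this post's sources: Neyman allocation, which minimizes the variance of the estimated difference for a fixed total sample size, and the square-root rule \(\sqrt{p_1} / (\sqrt{p_0} + \sqrt{p_1})\), which minimizes expected failures for a fixed variance. The function names are hypothetical.

```python
from math import sqrt

def neyman_allocation(p0, p1):
    """Fraction on arm 1 that minimizes Var(p1_hat - p0_hat) for fixed n."""
    s0, s1 = sqrt(p0 * (1 - p0)), sqrt(p1 * (1 - p1))
    return s1 / (s0 + s1)

def sqrt_rule_allocation(p0, p1):
    """Fraction on arm 1 that minimizes expected failures at fixed variance."""
    return sqrt(p1) / (sqrt(p0) + sqrt(p1))

# Assumed true success rates (unknown in practice, so they are estimated
# sequentially and the target is tracked adaptively).
p0, p1 = 0.3, 0.5
print(f"Neyman target for arm 1:    {neyman_allocation(p0, p1):.3f}")
print(f"sqrt-rule target for arm 1: {sqrt_rule_allocation(p0, p1):.3f}")
```

Both targets sit fairly close to 50:50 for moderate effects, which fits the point that the ethical gain available without sacrificing power is limited unless the effect is large.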

8.3 Impact

A theoretical foundation for both Bayesian and frequentist RAR.

9 Simulation: Bayesian RAR

import numpy as np

rng = np.random.default_rng(42)  # seeded generator for reproducibility

# Scenario: two-arm trial
# Arm 0 (control): success rate 0.3
# Arm 1 (treatment): success rate 0.5
true_p = [0.3, 0.5]

n_total = 200
n_arms = 2

# Beta(1, 1) prior for each arm
alpha = np.ones(n_arms)
beta = np.ones(n_arms)

# RAR bookkeeping
arm_assignments = []
outcomes = []

n_mc = 1000   # Monte Carlo draws for the posterior probability of being best

for _ in range(n_total):
    # Posterior probability that each arm is best, via Monte Carlo
    samples = rng.beta(alpha, beta, size=(n_mc, n_arms))
    best_arm = np.argmax(samples, axis=1)
    p_best = np.array([np.mean(best_arm == k) for k in range(n_arms)])

    # Smoothing: p_k proportional to P(best)^(1/c) with c = 2
    p_assign = p_best ** 0.5
    p_assign = p_assign / p_assign.sum()

    # Assign the patient and observe the outcome
    arm = rng.choice(n_arms, p=p_assign)
    outcome = rng.binomial(1, true_p[arm])

    # Update the posterior of the assigned arm
    if outcome == 1:
        alpha[arm] += 1
    else:
        beta[arm] += 1

    arm_assignments.append(arm)
    outcomes.append(outcome)

arm_assignments = np.array(arm_assignments)
outcomes = np.array(outcomes)

print("[Bayesian RAR simulation]\n")
print(f"True success rates: {true_p}\n")
print("Patients per arm:")
for k in range(n_arms):
    n_k = (arm_assignments == k).sum()
    p_k = outcomes[arm_assignments == k].mean()
    print(f"  Arm {k}: n = {n_k} ({100*n_k/n_total:.1f}%), observed p = {p_k:.3f}")

# Reference point: balanced RCT (power is not computed in this script)
print("\n[Comparison: balanced RCT (50:50)]")
n_per_arm_rct = n_total // 2
print(f"  Arm 0: {n_per_arm_rct}, Arm 1: {n_per_arm_rct}")

# Effect of RAR on allocation
n_treat = (arm_assignments == 1).sum()
print("\n[RAR result]")
print(f"  {100*n_treat/n_total:.0f}% of patients assigned to the treatment arm (the better arm)")
print("  A standard RCT would assign 50%")

# Total successes
total_success = outcomes.sum()
print("\n[Total successes]")
print(f"  RAR: {total_success} / {n_total} = {total_success/n_total:.2%}")
print(f"  Balanced RCT (expected): {n_total * np.mean(true_p):.0f} / {n_total} = {np.mean(true_p):.2%}")
print("  RAR's ethical gain shows up as the difference between these two lines (one run only)")

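10 Conclusion

RAR assigns more patients to the arm that looks better. Its lineage runs from Wei's Play-the-Winner to Berry's Bayesian RAR. The ethical benefit has to be weighed against the loss of statistical efficiency, as Korn and Freidlin's critique points out. ECMO and I-SPY 2 are the standard case studies. RAR connects directly to the MAB material in Phase I.

Key messages:

RAR: raise the randomization probability of the better arm
Wei 1979 Play-the-Winner: urn model
Berry Bayesian RAR: posterior probability of being best
Smoothing: trade-off controlled by the \(c\) parameter
Korn-Freidlin 2011 critique: loss of statistical efficiency
ECMO 1985: extreme case
I-SPY 2: success case
Link to MAB: TS = Bayesian RAR

Next post: Bayesian Adaptive + Platform Trials.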

11 Related Topics

Prerequisites

J-ADAPT-0 Overview
Phase I MAB series: Thompson Sampling, UCB
(Phase C) RCT randomization

Later Phase J posts

Bayesian + Platform Trials (placeholder)
FDA Adaptive Designs (placeholder)

12 References

Wei, L. J. (1979). The generalized Pólya’s urn design for sequential medical trials. Annals of Statistics 7, 291-296.
Berry, D. A. (2015). The Brave New World of clinical cancer research. Mol. Oncology 9, 951-959.
Korn, E. L. & Freidlin, B. (2011). Outcome-adaptive randomization: Is it useful? J. Clin. Oncol. 29, 771-776.
Hu, F. & Rosenberger, W. F. (2003). Optimality, variability, power: Evaluating response-adaptive randomization procedures. J. Amer. Statist. Assoc. 98, 671-678.
Bartlett, R. H., Roloff, D. W., Cornell, R. G., Andrews, A. F., Dillon, P. W., Zwischenberger, J. B. (1985). Extracorporeal circulation in neonatal respiratory failure. Pediatrics 76, 479-487.
Barker, A. D., Sigman, C. C., Kelloff, G. J., Hylton, N. M., Berry, D. A., Esserman, L. J. (2009). I-SPY 2: an adaptive breast cancer trial design. Clin. Pharmacol. Ther. 86, 97-100.