Kwangmin Kim - Klein § 7.5~7.6 — Stratified Tests · Matched Pairs

1 Introduction: The Third Ch.7 Deep Dive

Part	Topic
Ch.7 Overview	Survey of the 9 sections
§ 7.1~7.2	One-Sample
§ 7.3~7.4	K-Sample + Trend
§ 7.5~7.6 (this part)	Stratified + Matched + Renyi
§ 7.7~7.8 (planned)	Cramer-von Mises · KM-based · Median
§ 7.9 (planned)	Exercises

§ 7.5~7.6 in one paragraph

“The stratified test of § 7.5 adjusts for a covariate: it sums the stratum-level \(Z_{js}\) and \(\widehat{\sigma}_{jgs}\) into one global test. When the effect reverses across strata, however, the contributions cancel (Klein Example 7.7, BMT: stratified \(Z = 0.57\) vs HOD only \(Z = 2.89\)). For matched pairs, Eq. 7.5.7, \(Z = (D_1 - D_2)/\sqrt{D_1 + D_2}\), is a censored-data sign test (Klein Example 7.8, 6-MP: \(D_p = 18, D_{6MP} = 3 \to Z = 3.27\)). The Renyi test of § 7.6, Eq. 7.6.3 \(Q = \sup |Z(t)|/\sigma(\tau)\), avoids the log-rank cancellation under crossing hazards. In Klein Example 7.9 (gastrointestinal cancer) the log-rank test is non-significant at \(p = 0.63\), while the Renyi test gives \(Q = 2.20\) (\(p = 0.053\)), a marginal detection.”

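2 § 7.5: Stratified Tests

2.1 Motivation: The Need for Covariate Adjustment

Intuition: why stratify?

Limitation of the K-sample test (§ 7.3):

Hypothesis tested: \(h_1 = h_2 = \cdots = h_K\).
Confounding: when another covariate (e.g. age, sex, disease stage) is distributed differently across groups, it is unclear whether a group difference reflects a true treatment effect or the covariate.

Example: BMT auto vs allo (Klein Example 7.7):

Whether the patient has HOD (Hodgkin's disease) or NHL (non-Hodgkin's lymphoma) affects the outcome.
If the HOD : NHL mix differs between the two treatment groups, the treatment difference is confounded with the disease difference.

Remedies:

Regression (Cox PH, Ch.8): include the covariate in a regression model.
Stratification (§ 7.5): test within each stratum, then combine.

→ § 7.5 is nonparametric stratification: a test only, with no model.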

2.2 Eqs. 7.5.2 · 7.5.3 · 7.5.4

Definition: K-Sample Stratified Test

\(M\) strata and \(K\) groups. Hypothesis:

\[ H_0: h_{1s}(t) = h_{2s}(t) = \cdots = h_{Ks}(t), \quad s = 1, \ldots, M, \quad t < \tau \]

Within each stratum \(s\), compute \(Z_{js}(\tau)\) from Eq. 7.3.3 together with its covariance matrix \(\widehat{\sigma}_{jgs}\).

Summing over strata (Eq. 7.5.2):

\[ Z_{j \cdot}(\tau) = \sum_{s=1}^M Z_{js}(\tau), \quad \widehat{\sigma}_{jg \cdot} = \sum_{s=1}^M \widehat{\sigma}_{jgs} \]

K-sample test statistic (Eq. 7.5.3):

\[ \chi^2 = (Z_{1 \cdot}, \ldots, Z_{(K-1) \cdot}) \widehat{\Sigma}_\cdot^{-1} (Z_{1 \cdot}, \ldots, Z_{(K-1) \cdot})^T \sim \chi^2_{K-1} \]

Two-group special case (Eq. 7.5.4):

\[ Z = \frac{\sum_{s=1}^M Z_{1s}(\tau)}{\sqrt{\sum_{s=1}^M \widehat{\sigma}_{11s}}} \sim N(0, 1) \]

Intuition: what Eq. 7.5.4 means

In each stratum \(s\), \(Z_{1s}\) is the within-stratum “observed minus expected”.

The sum \(\sum_s Z_{1s}\):

Adds the stratum-level differences into a global difference.
Same direction in every stratum: the sum grows, giving high power.
Opposite directions: the terms cancel, giving low power.

The denominator \(\sqrt{\sum_s \widehat{\sigma}_{11s}}\):

Sum of the stratum variances, giving the global standard error.
The variances add because the strata are independent.

→ Pool the within-stratum treatment differences, then standardize.
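The behavior of Eq. 7.5.4 can be sketched in a few lines of Python. The stratum values below are hypothetical (chosen only to contrast same-direction and opposite-direction effects), and the helper names `stratified_z` and `two_sided_p` are illustrative, not from Klein.

```python
import math

def stratified_z(strata):
    """Two-sample stratified statistic (Eq. 7.5.4).

    `strata` is a list of (Z_1s, sigma_11s) pairs, one per stratum.
    """
    num = sum(z for z, _ in strata)     # numerators add across strata
    var = sum(v for _, v in strata)     # variances add (independent strata)
    return num / math.sqrt(var)

def two_sided_p(z):
    # 2 * P(N(0,1) > |z|), written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical strata: same magnitudes, different sign patterns
same_direction = [(3.0, 2.0), (2.5, 3.0)]
opposite_direction = [(3.0, 2.0), (-2.5, 3.0)]

z_same = stratified_z(same_direction)
z_opp = stratified_z(opposite_direction)
print(f"same direction:     Z = {z_same:.2f}, p = {two_sided_p(z_same):.3f}")
print(f"opposite direction: Z = {z_opp:.2f}, p = {two_sided_p(z_opp):.3f}")
```

With identical variances, flipping the sign of one stratum is enough to move the pooled statistic from clearly significant to near zero.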

2.3 Klein Example 7.7: BMT (Stratified by HOD vs NHL)

Data (Klein § 1.10)

Allogeneic vs autologous BMT for HOD or NHL:

\(H_0\): leukemia-free survival is the same under the two treatments.
Stratification: HOD vs NHL.

Worked by hand, step by step

Step 1: HOD stratum alone.

\(Z_{1, \text{HOD}}(2144) \approx 3.56\), \(\widehat{\sigma}_{11, \text{HOD}} = 1.518\).

\(Z_{HOD} = 3.56 / \sqrt{1.518} = 2.89 \to Z^2 = \chi^2_1 \approx 8.35\), \(p = 0.004\).
Allo is favored in HOD. (The numerator is taken as 3.56 because a value of 3.106 gives \(3.106/\sqrt{1.518} = 2.52\) and a stratified \(Z\) of 0.36, matching neither the reported 2.89 nor the reported 0.568.)

Step 2: NHL stratum alone.

\(Z_{1, \text{NHL}}(2144) = -2.306\), \(\widehat{\sigma}_{11, \text{NHL}} = 3.356\).

\(Z_{NHL} = -2.306 / \sqrt{3.356} = -1.26\), \(p = 0.21\).
In NHL, auto is slightly favored (opposite direction).

Step 3: Stratified (Eq. 7.5.4).

\[ Z_{\text{strat}} = \frac{3.56 + (-2.306)}{\sqrt{1.518 + 3.356}} = \frac{1.254}{\sqrt{4.874}} = \frac{1.254}{2.208} = 0.568 \]

\(p = 2 \cdot P(Z > 0.568) = 0.5699\).

→ The stratified test is not significant (\(p = 0.57\)).
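A quick arithmetic check of the three steps, as a sketch rather than a reproduction of Klein's computation. One assumption to flag: the HOD numerator is taken as 3.56, the value consistent with the reported 2.89 and 0.568; plugging 3.106 into the same formulas gives 2.52 and 0.36 instead.

```python
import math

def two_sided_p(z):
    # 2 * P(N(0,1) > |z|)
    return math.erfc(abs(z) / math.sqrt(2))

# (numerator Z_1s, variance sigma_11s) per stratum.
# ASSUMPTION: 3.56 for HOD is back-solved from the reported 2.89 and 0.568.
hod = (3.56, 1.518)
nhl = (-2.306, 3.356)

z_hod = hod[0] / math.sqrt(hod[1])                             # Step 1
z_nhl = nhl[0] / math.sqrt(nhl[1])                             # Step 2
z_strat = (hod[0] + nhl[0]) / math.sqrt(hod[1] + nhl[1])       # Step 3, Eq. 7.5.4
p_strat = two_sided_p(z_strat)

print(f"HOD only:   Z = {z_hod:.2f}, p = {two_sided_p(z_hod):.3f}")
print(f"NHL only:   Z = {z_nhl:.2f}, p = {two_sided_p(z_nhl):.3f}")
print(f"Stratified: Z = {z_strat:.3f}, p = {p_strat:.4f}")
```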

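Pitfall: reversed effects across strata

Interpreting Klein Example 7.7:

HOD alone: allo favored (\(p = 0.004\)).
NHL alone: auto slightly favored (\(p = 0.21\)).
Stratified: not significant (\(p = 0.57\)).

Why is the stratified test non-significant?

The HOD numerator is positive while the NHL numerator is negative (\(Z_{1,\text{NHL}} = -2.306 < 0\)): opposite directions.

Summed in the numerator of Eq. 7.5.4, the two largely cancel.

→ Weakness of the stratified test: when the direction of the treatment effect differs across strata, power is very low.

What to do:

Practical Note 3: “stratified tests will have good power against alternatives that are in the same direction in each stratum. When this is not the case, these statistics may have very low power, and separate tests for each stratum are indicated.”
That is, report the results for each stratum separately.
HOD: allo favored. NHL: auto slightly favored (not significant). → Clinical decisions rest on stratum-specific conclusions.

Connection to Cox PH (Ch.8):

Besides stratification, the reversal can be modeled with an interaction term (\(treat \times disease\)). The stratified test of § 7.5 only tests; Cox PH also quantifies the effect and supports diagnostics.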

2.4 Klein Example 7.4, Continued: 3 BMT Groups × MTX Strata

137 BMT patients × 2 strata (MTX used vs not used)

No-MTX stratum (part of Klein Example 7.4):

\[ Z_{\text{NOMTX}} = (-103, -892, 995) \]

\[ \widehat{\Sigma}_{\text{NOMTX}} = \begin{pmatrix} 49367 & -32121 & -17246 \\ -32121 & 69389 & -37268 \\ -17246 & -37268 & 54514 \end{pmatrix} \]

MTX stratum:

\[ Z_{\text{MTX}} = (20, -45, 25) \]

\[ \widehat{\Sigma}_{\text{MTX}} = \begin{pmatrix} 5137 & -2686 & -2452 \\ -2686 & 4398 & -1712 \\ -2452 & -1712 & 4164 \end{pmatrix} \]

Pooling (Eq. 7.5.2):

\[ Z_\cdot = (-83, -937, 1020) \]

\[ \widehat{\Sigma}_\cdot = \begin{pmatrix} 54504 & -34806 & -19698 \\ -34806 & 73786 & -38980 \\ -19698 & -38980 & 58678 \end{pmatrix} \]

Stratified χ² (Eq. 7.5.3), where \(\widehat{\Sigma}_{2 \times 2}\) is the upper-left \(2 \times 2\) block of \(\widehat{\Sigma}_\cdot\):

\[ \chi^2_{\text{strat}} = (-83, -937) \widehat{\Sigma}_{2 \times 2}^{-1} (-83, -937)^T = 19.14 \]

Referred to the \(\chi^2_2\) distribution: \(p < 0.0001\).
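The pooling and the quadratic form can be checked with plain Python, as a sketch. It uses only the first two components of each stratum (the three components sum to zero, so the third is redundant) and the closed-form inverse of a \(2 \times 2\) matrix; the variable names are illustrative.

```python
import math

# Stratum summaries as quoted above (first two components only)
z_nomtx = [-103.0, -892.0]
z_mtx = [20.0, -45.0]
s_nomtx = [[49367.0, -32121.0], [-32121.0, 69389.0]]
s_mtx = [[5137.0, -2686.0], [-2686.0, 4398.0]]

# Eq. 7.5.2: add numerators and covariance matrices across strata
z = [a + b for a, b in zip(z_nomtx, z_mtx)]
s = [[s_nomtx[i][j] + s_mtx[i][j] for j in range(2)] for i in range(2)]

# Eq. 7.5.3: quadratic form z' S^{-1} z via the closed-form 2x2 inverse
det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
chi2 = (s[1][1] * z[0] ** 2 - 2 * s[0][1] * z[0] * z[1] + s[0][0] * z[1] ** 2) / det

# For 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2)
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

Summing the rounded stratum entries differs from the pooled matrix above by one unit in two cells, which does not change the statistic at two decimals.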

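Comparison:

No-MTX stratum alone: \(\chi^2 = 19.18\) (\(p = 0.0001\)).
MTX stratum alone: \(\chi^2 = 0.48\) (\(p = 0.79\)); the MTX stratum is small, so it carries little information.
Stratified: \(\chi^2 = 19.14\) (\(p < 0.0001\)).
Unstratified (ignoring MTX): \(\chi^2 = 16.24\) (\(p = 0.0003\)).

→ In this data set the stratified statistic is larger than the unstratified one (19.14 vs 16.24): adjusting for the covariate sharpens the comparison.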

2.5 Matched Pairs: Censored Sign Test

Definition: matched-pair test (Eqs. 7.5.5 · 7.5.6 · 7.5.7)

\(M\) matched pairs. For the two members of pair \(i\):

Event times \((T_{1i}, T_{2i})\).
Event indicators \((\delta_{1i}, \delta_{2i})\).

\(H_0\): \(h_{1i}(t) = h_{2i}(t)\) for all \(i\).

Eq. 7.5.5, the contribution \(Z_{1i}\) of each pair:

\[ Z_{1i}(\tau) = \begin{cases} W(T_{1i})/2 & \text{if } T_{1i} < T_{2i}, \delta_{1i} = 1 \\ -W(T_{2i})/2 & \text{if } T_{2i} < T_{1i}, \delta_{2i} = 1 \\ 0 & \text{otherwise (both censored, or the earlier time censored)} \end{cases} \]

Eq. 7.5.6, pooling:

\[ Z_{1 \cdot}(\tau) = w \cdot \frac{D_1 - D_2}{2}, \quad \widehat{\sigma}_{11 \cdot} = w^2 \cdot \frac{D_1 + D_2}{4} \]

where:

\(D_1\) = number of pairs in which the sample 1 member has the event first.
\(D_2\) = number of pairs in which the sample 2 member has the event first.
\(w\) = value of the weight function at the smaller event time (the same for every pair, so it cancels).

Eq. 7.5.7, test statistic:

\[ Z = \frac{D_1 - D_2}{\sqrt{D_1 + D_2}} \sim N(0, 1) \]

Intuition: deriving Eq. 7.5.7

Substitute Eq. 7.5.6 into Eq. 7.5.4:

\[ Z = \frac{Z_{1 \cdot}}{\sqrt{\widehat{\sigma}_{11 \cdot}}} = \frac{w (D_1 - D_2)/2}{\sqrt{w^2 (D_1 + D_2)/4}} = \frac{w (D_1 - D_2)/2}{w \sqrt{D_1 + D_2}/2} = \frac{D_1 - D_2}{\sqrt{D_1 + D_2}} \]

→ The weight \(w\) cancels: every weight function gives the same statistic.
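The cancellation can be seen numerically by building the statistic from the per-pair contributions of Eq. 7.5.5 with several values of \(w\). This is a minimal sketch; the pairs are hypothetical and the function names are illustrative.

```python
import math

def pair_contribution(t1, d1, t2, d2, w=1.0):
    """Eq. 7.5.5: (numerator, variance) contributed by one pair."""
    if t1 < t2 and d1 == 1:
        return w / 2, w ** 2 / 4        # sample 1 member fails first
    if t2 < t1 and d2 == 1:
        return -w / 2, w ** 2 / 4       # sample 2 member fails first
    return 0.0, 0.0                     # earlier time censored (or tie)

def matched_pairs_z(pairs, w=1.0):
    parts = [pair_contribution(*p, w=w) for p in pairs]
    num = sum(n for n, _ in parts)
    var = sum(v for _, v in parts)
    return num / math.sqrt(var)

# Hypothetical pairs: (T1, delta1, T2, delta2); here D1 = 3 and D2 = 1
pairs = [(3, 1, 9, 1), (5, 1, 12, 0), (8, 0, 10, 1), (7, 1, 4, 1), (2, 1, 6, 0)]
zs = [matched_pairs_z(pairs, w) for w in (0.5, 1.0, 7.0)]
print(zs)   # the same value for every w: (3 - 1) / sqrt(3 + 1)
```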

Relation to the sign test:

Sign test for uncensored matched pairs: conditional on \(D_1 + D_2\), \(D_1 \sim \text{Binomial}(D_1+D_2, 0.5)\) under \(H_0\).
Then \(D_1 - D_2\) has mean 0 and variance \(D_1 + D_2\), which gives the z-score of Eq. 7.5.7.

→ Eq. 7.5.7 is the generalization of the sign test to censored data (Practical Note 2).
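Because Eq. 7.5.7 is the normal approximation to a Binomial(\(D_1 + D_2\), 0.5) count, the exact binomial tail is a natural cross-check when the number of informative pairs is small. The sketch below uses the 6-MP counts (18 vs 3) from Klein Example 7.8; doubling the one-sided tail is one common two-sided convention, adopted here as an assumption.

```python
import math

def sign_test(d1, d2):
    n = d1 + d2
    z = (d1 - d2) / math.sqrt(n)                        # Eq. 7.5.7
    p_normal = math.erfc(abs(z) / math.sqrt(2))         # two-sided normal p
    # Exact two-sided p: double the Binomial(n, 1/2) tail at max(d1, d2)
    k = max(d1, d2)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return z, p_normal, p_exact

z, p_normal, p_exact = sign_test(18, 3)
print(f"Z = {z:.2f}, normal p = {p_normal:.4f}, exact p = {p_exact:.4f}")
```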

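Effective sample size (Practical Note 1):

Only pairs whose smaller time is an event contribute to the test.
Pairs with both members censored, or with the censored time first, contribute 0.
Effective \(n = D_1 + D_2\): the more censoring, the less information.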

2.6 Klein Example 7.8: 6-MP Matched Pairs

Data (Klein § 1.2, Table 1.1)

Freireich (1963) matched-pair clinical trial of 6-MP vs placebo. 21 pairs.

Matched on hospital and remission status (complete or partial).
Within each pair, 6-MP or placebo was assigned at random.
Follow-up: until relapse or end of study.

Worked by hand

From Klein Table 1.1 (§ 1.2), the 21 pairs give:

\(D_{\text{placebo}} = 18\) (placebo member relapses first).
\(D_{6\text{-MP}} = 3\) (6-MP member relapses first).
(Both censored: 0 pairs.)

Applying Eq. 7.5.7:

\[ Z = \frac{18 - 3}{\sqrt{18 + 3}} = \frac{15}{\sqrt{21}} = \frac{15}{4.583} = 3.27 \]

\(p\)-value: \(2 \cdot P(Z > 3.27) = 0.001\).

Conclusion: 6-MP is clearly superior to placebo.

Clinical implication: the power of the matched-pair design

Advantages of the matched-pair design:

Removes confounding: matching on hospital and remission status makes the two arms comparable at baseline.
More power: comparisons are made within pairs, which reduces variance.
Works with small samples: 21 pairs suffice for \(p = 0.001\), a strong effect detected.

Comparison with the simple two-group test (ignoring the matching):

Log-rank on 21 6-MP + 21 placebo patients → \(\chi^2 \approx 16.79\), \(p < 0.0001\) (based on Klein Example 4.1).
Matched-pair: \(Z = 3.27\), \(p = 0.001\); slightly more conservative, but still strong.

→ The matched-pair design is efficient when the sample is small, and it is a standard option in clinical trial design.

3 § 7.6: Renyi Type Tests

3.1 Motivation: The Log-Rank Weakness Under Crossing Hazards

Pitfall: log-rank cancellation (revisited)

The weighted log-rank test of § 7.3 integrates over time, so regions of opposite sign cancel.

Example (Klein Example 7.5, BMT auto vs allo, the canonical case from the § 7.3 deep dive):

Early (≤ 12 mo): allo hazard > auto (deaths from GVH).
Late (> 12 mo): allo hazard < auto (cure).
Integrated: the two regions cancel → log-rank ≈ 0 → not significant.

Renyi's idea: use the sup (maximum) of the partial sums instead of the full integral, which avoids the cancellation.

3.2 Eqs. 7.6.1 · 7.6.2 · 7.6.3

Definition: Renyi statistic (Eqs. 7.6.1 · 7.6.2 · 7.6.3)

Eq. 7.6.1, sequential partial sums:

\[ Z(t_i) = \sum_{t_k \leq t_i} W(t_k) \left[d_{k1} - Y_{k1} \frac{d_k}{Y_k}\right] \]

(The running sum of Eq. 7.3.3 up to time \(t_i\): a cumulative log-rank statistic.)

Eq. 7.6.2, variance (same as the denominator of Eq. 7.3.7):

\[ \sigma^2(\tau) = \sum_{t_k \leq \tau} W^2(t_k) \frac{Y_{k1}}{Y_k} \frac{Y_{k2}}{Y_k} \frac{Y_k - d_k}{Y_k - 1} d_k \]

Eq. 7.6.3, test statistic (two-sided):

\[ Q = \frac{\sup_{t \leq \tau} |Z(t)|}{\sigma(\tau)} \]

→ Under \(H_0\), the asymptotic distribution of \(Q\) is that of \(\sup |B(x)|\) (\(0 \leq x \leq 1\), \(B\) = standard Brownian motion). Critical values are tabulated in Klein Appendix C.5.

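Intuition: how the Renyi test works

Weakness of log-rank: \(Z(\tau)\) is the partial sum at the last time point, i.e. the full integral, where the cancellation has already happened.

Strengths of Renyi:

\(Z(t)\) is evaluated at every time point.
The maximum of \(|Z(t)|\) is taken.
The maximum can occur around the crossing time, so the difference is caught before it cancels.

Analogy: a flight path. Looking only at origin and destination suggests a straight line; the maximum deviation of the actual route reveals the detour.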
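A toy path makes the contrast concrete. The increments below are hypothetical stand-ins for the terms \(d_{k1} - Y_{k1} d_k / Y_k\) of Eq. 7.6.1 with \(W = 1\): an excess of group 1 deaths early, a deficit later.

```python
from itertools import accumulate

# Hypothetical increments at successive event times (crossing hazards)
increments = [0.6, 0.7, 0.5, 0.8, 0.4, -0.5, -0.7, -0.6, -0.8, -0.3]

path = list(accumulate(increments))     # Z(t_1), Z(t_2), ...
z_final = path[-1]                      # what the log-rank test looks at
z_sup = max(abs(z) for z in path)       # what the Renyi test looks at

print(f"Z(tau) = {z_final:.2f}, sup |Z(t)| = {z_sup:.2f}")
```

The final value is close to zero even though the path strayed far from it midway, which is exactly the information the sup retains.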

Brownian motion limit: under \(H_0\), \(Z(t)/\sigma(\tau) \to B[\sigma^2_0(t)/\sigma^2_0(\infty)]\), a time-transformed standard Brownian motion.

Distribution of \(\sup |B(x)|\) for \(x \in [0, 1]\): Billingsley 1968, Klein Appendix C.5.
Critical values: \(Q_{0.05} \approx 2.241\), \(Q_{0.025} \approx 2.498\).
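The tail probability of \(\sup |B(x)|\) has a well-known alternating-series form (the one attributed to Billingsley in the Theoretical Notes), so the two critical values can be checked directly. A minimal sketch; the function name and the truncation at 50 terms are choices made here, not part of the source.

```python
import math

def sup_abs_bm_tail(y, terms=50):
    """P(sup_{0<=x<=1} |B(x)| > y) from the alternating series."""
    s = sum((-1) ** k / (2 * k + 1)
            * math.exp(-math.pi ** 2 * (2 * k + 1) ** 2 / (8 * y ** 2))
            for k in range(terms))
    return 1 - 4 / math.pi * s

for q in (2.241, 2.498):
    print(f"P(sup|B| > {q}) = {sup_abs_bm_tail(q):.4f}")
```

The series converges very quickly for values of this size; the first three terms already fix four decimals.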

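3.3 A Censored-Data Generalization of Kolmogorov-Smirnov

Intuition: correspondence with the KS test

Kolmogorov-Smirnov for uncensored data:

\[ KS = \sup_t |F_1(t) - F_2(t)| \]

→ The maximum difference between the two empirical CDFs.

Renyi for censored data (Gill 1980):

\[ Q = \sup_t \left|\sum_{t_k \leq t} W(t_k) [d_{k1} - Y_{k1} d_k/Y_k]\right| / \sigma(\tau) \]

→ The maximum of the weighted cumulative difference between the two hazards.

→ A generalization of KS that handles censoring. More powerful against non-PH alternatives.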
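For comparison, the uncensored KS statistic is short enough to write out. This sketch evaluates both empirical CDFs at every observed time and takes the largest gap; the two samples are hypothetical.

```python
def ks_statistic(x, y):
    """sup_t |F1(t) - F2(t)| for two uncensored samples."""
    grid = sorted(set(x) | set(y))      # the sup is attained at an observed time

    def ecdf(sample, t):
        return sum(v <= t for v in sample) / len(sample)

    return max(abs(ecdf(x, t) - ecdf(y, t)) for t in grid)

# Hypothetical uncensored survival times
group1 = [2, 4, 5, 7, 9]
group2 = [6, 8, 10, 11, 12]
d = ks_statistic(group1, group2)
print(f"KS = {d:.1f}")
```

The Renyi statistic replaces the CDF gap with the cumulative observed-minus-expected sum, which stays well defined when some times are censored.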

3.4 Klein Example 7.9: Gastrointestinal Cancer

Data (GITSG 1982, Stablein-Koutrouvelis 1985)

90 patients with advanced gastric cancer from the Gastrointestinal Tumor Study Group:

Group 1: chemotherapy only (n = 45).
Group 2: chemotherapy + radiotherapy (n = 45).
8 years of follow-up.
3 censored (chemo only) + 8 censored (chemo+radio).

Clinical background: crossing hazards

Hazard pattern by treatment:

Chemo + radio: early toxicity (radiation side effects) → very high mortality in the first year.
Chemo only: safe early, disease progression later → more late deaths.
The KM curves of the two groups cross at about 1 year (Klein Figure 7.5).

→ A canonical example of crossing hazards.

Worked by hand: Renyi vs log-rank

Applying Eq. 7.6.1 (log-rank weight \(W = 1\)), with chemo only as group 1:

Compute the cumulative \(Z(t_i)\) up to each event time \(t_i\). The plot of \(|Z(t_i)|\) is Klein Figure 7.4:

Small \(t_i\): chemo+radio patients die first, so chemo only has fewer deaths than expected → \(Z(t_i) < 0\) and \(|Z(t_i)|\) grows.
\(t_i\) about 315 days: \(|Z| = 9.80\), the maximum.
After that: chemo-only deaths catch up → \(|Z(t)|\) shrinks.
\(t = 2363\) (last event): \(Z(2363) = -2.15\); most of the early difference has cancelled.

Eq. 7.6.2, standard error:

\[ \sigma(2363) = 4.46 \]

Eq. 7.6.3, Renyi statistic:

\[ Q = \frac{\sup |Z(t)|}{\sigma(\tau)} = \frac{9.80}{4.46} = 2.20 \]

From Klein Appendix C.5, \(Q_{0.05} \approx 2.24\) → \(p \approx 0.053\) (marginally significant).

Comparison with the ordinary log-rank test:

\[ Z(2363) = -2.15, \quad \sigma = 4.46 \to z = -2.15/4.46 = -0.48 \]

\(p = 2 \cdot P(Z > 0.48) = 0.63\) → not significant.

→ Renyi \(p = 0.053\) vs log-rank \(p = 0.63\): the conclusions differ sharply.
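Both p-values can be recomputed from the three summary numbers in the example. This is a sketch: the Renyi p-value comes from the alternating series for \(\sup |B|\), which evaluates to roughly 0.056 at \(Q = 9.80/4.46\), close to the tabled 0.053 quoted above.

```python
import math

def sup_abs_bm_tail(y, terms=50):
    """P(sup_{0<=x<=1} |B(x)| > y) from the alternating series."""
    s = sum((-1) ** k / (2 * k + 1)
            * math.exp(-math.pi ** 2 * (2 * k + 1) ** 2 / (8 * y ** 2))
            for k in range(terms))
    return 1 - 4 / math.pi * s

# Summary values quoted in the example
z_sup, z_final, sigma = 9.80, -2.15, 4.46

q = z_sup / sigma                                   # Eq. 7.6.3
p_renyi = sup_abs_bm_tail(q)

z_lr = z_final / sigma                              # ordinary log-rank z
p_lr = math.erfc(abs(z_lr) / math.sqrt(2))          # two-sided normal p

print(f"Renyi:    Q = {q:.2f}, p = {p_renyi:.3f}")
print(f"Log-rank: z = {z_lr:.2f}, p = {p_lr:.2f}")
```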

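Clinical conclusion

Log-rank: “no difference between the two treatments”, a misleading conclusion.

Renyi: detects a (marginal) difference, but carries no direction information (sup |Z| is two-sided only).

→ Renyi flags a difference but does not say which treatment is better. Further analysis is needed:

Early (≤ 1 yr): chemo only is better (avoids radiation toxicity).
Late (> 1 yr): chemo+radio is better (cure).

Clinical decision-making:

Patients with short expected survival → chemo only.
Patients who may survive long term → chemo + radio.

→ Clinical implication of crossing hazards: “which treatment is better” has no single answer; it depends on the time horizon.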

4 Practical Notes (3 items)

Practical Note 1 (§ 7.6): One-Sided Renyi

One-sided test of \(H_0\): \(S_1 = S_2\) vs \(H_A\): \(S_1 < S_2\):

\[ Q^* = \sup_{t \leq \tau} \frac{Z(t)}{\sigma(\tau)} \]

(The sup of \(Z\) itself, without the absolute value.)

Distribution: that of \(\sup B(t)\), the supremum of Brownian motion. Under \(H_0\):

\[ P(\sup B(t) > Q^*) = 2[1 - \Phi(Q^*)] \]

→ Twice the one-sided standard normal tail probability, so critical values are very simple.

Example: \(Q^* = 1.96\) → \(p = 2 \cdot (1 - 0.975) = 0.05\).
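The one-sided p-value needs nothing beyond the normal tail. A minimal sketch using the identity \(2[1 - \Phi(q)] = \operatorname{erfc}(q/\sqrt{2})\); the function name is illustrative.

```python
import math

def one_sided_renyi_p(q_star):
    """P(sup B > q*) = 2 * [1 - Phi(q*)] for q* > 0 (reflection principle)."""
    if q_star <= 0:
        return 1.0      # sup B is at least B(0) = 0, so the event is certain
    return math.erfc(q_star / math.sqrt(2))

p = one_sided_renyi_p(1.96)
print(f"Q* = 1.96 -> p = {p:.3f}")
```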

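Practical Note 1 (§ 7.5): Effective Sample Size

The effective \(n = D_1 + D_2\) of the matched-pair test shrinks as censoring increases.

Example: 50 pairs, of which 30 have both members censored:

\(D_1 + D_2 = 20\) if every remaining pair has its event first: an effective sample size of 20.
\(Z = (D_1 - D_2)/\sqrt{20}\): only 20 of the 50 pairs inform the test, so power is low.

→ In a matched-pair design, minimizing censoring is critical for power.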
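To see how strongly the number of informative pairs drives the statistic, the sketch below holds the split between the two outcomes fixed at 4:1 and varies only \(D_1 + D_2\). The designs are hypothetical.

```python
import math

def censored_sign_z(d1, d2):
    """Eq. 7.5.7: Z = (D1 - D2) / sqrt(D1 + D2)."""
    return (d1 - d2) / math.sqrt(d1 + d2)

# Hypothetical trials: same 4:1 split, different numbers of informative pairs
designs = [(8, 2), (16, 4), (40, 10)]
results = [(d1 + d2, censored_sign_z(d1, d2)) for d1, d2 in designs]
for n_eff, z in results:
    print(f"effective n = {n_eff:2d}: Z = {z:.2f}")
```

With the same underlying split, Z grows like the square root of the effective sample size, so pairs lost to censoring translate directly into lost evidence.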

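Practical Note 3 (§ 7.5): Limits of the Stratified Test

When the treatment effect has the same direction across strata, the stratified test is powerful.

When it does not (e.g. BMT, Klein Example 7.7), the stratified test has very low power.

What to do:

Report each stratum separately (when interaction is suspected).
Model the interaction explicitly with an interaction term in Cox PH.
Visualize with a forest plot.

→ Stratification is a simple adjustment tool; complex interactions are better handled by regression.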

5 Theoretical Notes

Theoretical Note 1 (§ 7.6): Gill 1980 Renyi

Gill (1980) introduced the Renyi-type statistic of Eq. 7.6.3:

The sequential maximum of the two-sample weighted log-rank statistic.
The name “Renyi” reflects that taking the sup of a cumulative process falls under Renyi's general framework.

Asymptotics: counting-process martingales plus convergence to Brownian motion (Theoretical Note 3).

Extensions:

Fleming-Harrington 1991: Renyi tests for a general weight family.
Schumacher 1984: Pitman efficiency comparisons.
Fleming et al. 1987: asymptotic details.

Theoretical Note 2 (§ 7.6): Distribution of the Brownian Motion Sup

Result from Billingsley (1968):

\[ P(\sup_{0 \leq t \leq 1} |B(t)| > y) = 1 - \frac{4}{\pi} \sum_{k=0}^\infty \frac{(-1)^k}{2k+1} \exp\left[-\frac{\pi^2 (2k+1)^2}{8 y^2}\right] \]

→ A series expansion. The critical values in Klein Appendix C.5 derive from it.

One-sided (\(\sup B\), no absolute value):

\[ P(\sup_{0 \leq t \leq 1} B(t) > y) = 2[1 - \Phi(y)] \]

(reflection principle).

Theoretical Note 5 (§ 7.6): Pitman Efficiency

Simulations in Schumacher 1984 and Fleming et al. 1987:

PH + light censoring: Renyi is nearly equivalent to log-rank (slight loss of power).
PH violated (crossing hazards): Renyi is far superior.
Heavy censoring: the advantage of Renyi shrinks.

→ Practical recommendation: when a PH violation is suspected, report Renyi alongside log-rank.

6 Applications

Situation	Test to apply
Matched cohort (twin, sibling)	Eq. 7.5.7 censored sign test
Matched within hospital	Eq. 7.5.7
Multi-center trial adjustment	Stratified by center
Age adjustment	Stratified by age group
Immunotherapy with crossing hazards	Renyi or FH p<q
Chemotherapy + radiotherapy crossing	Renyi (Klein Example 7.9)
Reversed effect by subgroup	Report strata separately (Klein Example 7.7)

7 Code Examples

7.1 Step 1: Stratified Log-Rank (R `survival`)

library(survival)
library(KMsurv)  # assumed source of the Klein data sets

# Klein Example 7.7: BMT allo vs auto, stratified by HOD vs NHL.
# Assumes KMsurv::hodg with columns time, delta, gtype (1 = allo, 2 = auto)
# and dtype (1 = NHL, 2 = HOD); substitute your own data frame otherwise.
data(hodg)

# Pooled (ignoring the stratification)
fit_pooled <- survdiff(Surv(time, delta) ~ gtype, data = hodg)
print(fit_pooled)

# Stratified by disease (HOD vs NHL)
fit_strat <- survdiff(Surv(time, delta) ~ gtype + strata(dtype),
                       data = hodg)
print(fit_strat)

# Per-stratum analyses
fit_hod <- survdiff(Surv(time, delta) ~ gtype,
                     data = subset(hodg, dtype == 2))
fit_nhl <- survdiff(Surv(time, delta) ~ gtype,
                     data = subset(hodg, dtype == 1))

print(fit_hod)
print(fit_nhl)

7.2 Step 2: Matched Pairs Sign Test (Python)

import numpy as np
from scipy.stats import norm

def matched_pairs_logrank(times1, events1, times2, events2):
    """Eq. 7.5.7: censored sign test for matched pairs."""
    D1 = D2 = 0
    uninformative_pairs = 0
    for t1, e1, t2, e2 in zip(times1, events1, times2, events2):
        if t1 < t2 and e1 == 1:
            D1 += 1              # sample 1 member has the event first
        elif t2 < t1 and e2 == 1:
            D2 += 1              # sample 2 member has the event first
        else:
            uninformative_pairs += 1   # earlier time censored, or tie

    if D1 + D2 == 0:
        return None, None, 0     # no informative pairs: test undefined
    Z = (D1 - D2) / np.sqrt(D1 + D2)
    p = 2 * (1 - norm.cdf(abs(Z)))
    return Z, p, D1 + D2

# Klein Example 7.8: 6-MP matched pairs (Klein Table 1.1)
times_p = np.array([1, 22, 3, 12, 8, 17, 2, 11, 8, 12,
                    2, 5, 4, 15, 8, 23, 5, 11, 4, 1, 8])
events_p = np.array([1] * 21)
times_6mp = np.array([10, 7, 32, 23, 22, 6, 16, 34, 32, 25,
                       11, 20, 19, 6, 17, 35, 6, 13, 9, 6, 10])
events_6mp = np.array([1, 1, 0, 1, 1, 1, 1, 0, 0, 0,
                        0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0])

Z, p, n_eff = matched_pairs_logrank(times_p, events_p,
                                      times_6mp, events_6mp)
print(f"Z = {Z:.2f}, p = {p:.3f}, effective n = {n_eff}")
# Z = 3.27, p = 0.001, effective n = 21

7.3 Step 3: Renyi Test (Python, from scratch)

import numpy as np

def renyi_test(times1, events1, times2, events2):
    """Eqs. 7.6.1 · 7.6.2 · 7.6.3: Renyi test with log-rank weights."""
    times_all = np.concatenate([times1, times2])
    events_all = np.concatenate([events1, events2])
    groups = np.concatenate([np.ones(len(times1)),
                              np.zeros(len(times2))])

    sorted_idx = np.argsort(times_all)
    t_sorted = times_all[sorted_idx]
    e_sorted = events_all[sorted_idx]
    g_sorted = groups[sorted_idx]
    event_times = np.unique(t_sorted[e_sorted == 1])

    Z_cum = 0.0
    sigma2 = 0.0
    Z_history = []

    for t in event_times:
        # At-risk counts just before t, per group
        Y1 = np.sum((t_sorted >= t) & (g_sorted == 1))
        Y2 = np.sum((t_sorted >= t) & (g_sorted == 0))
        Y_total = Y1 + Y2
        # Events at t: group 1 and overall
        d1 = np.sum((t_sorted == t) & (e_sorted == 1) & (g_sorted == 1))
        d_total = np.sum((t_sorted == t) & (e_sorted == 1))

        # Eq. 7.6.1 (W = 1): running observed-minus-expected sum
        Z_cum += d1 - Y1 * d_total / Y_total
        Z_history.append((t, Z_cum))

        # Eq. 7.6.2: variance increment (skipped when one subject is at risk)
        if Y_total > 1:
            tc = (Y_total - d_total) / (Y_total - 1)
            sigma2 += (Y1 / Y_total) * (Y2 / Y_total) * tc * d_total

    # Eq. 7.6.3: Renyi Q
    sup_Z = max(abs(z) for _, z in Z_history)
    sigma = np.sqrt(sigma2)
    Q = sup_Z / sigma

    # p-value: P(sup |B| > Q) from the series expansion (Theoretical Note 2)
    p = 0.0
    for k in range(20):
        p += (-1) ** k / (2 * k + 1) * np.exp(-np.pi ** 2 * (2*k+1)**2 / (8 * Q**2))
    p = 1 - 4 / np.pi * p

    return Q, p, sup_Z, sigma, Z_history


# Klein Example 7.9: gastrointestinal cancer
times_chemo = [1, 63, 105, 129, 182, 216, 250, 262, 301, 301,
                342, 354, 356, 358, 380, 383, 383, 388, 394,
                408, 460, 489, 499, 523, 524, 535, 562, 569,
                675, 676, 748, 778, 786, 797, 955, 968, 1000,
                1245, 1271, 1420, 1551, 1694, 2363, 2754, 2950]
events_chemo = [1] * 42 + [0, 0, 0]  # last 3 treated as censored

times_radio = [17, 42, 44, 48, 60, 72, 74, 95, 103, 108, 122,
                144, 167, 170, 183, 185, 193, 195, 197, 208,
                234, 235, 254, 307, 315, 401, 445, 464, 484,
                528, 542, 547, 577, 580, 795, 855, 1366, 1577,
                2060, 2412, 2486, 2796, 2802, 2934, 2988]
events_radio = [1] * 38 + [0] * 7    # last 7 treated as censored

# NOTE: the censoring pattern must be verified against the source data.
# As written, these vectors censor 3 + 7 observations and treat t = 2363 as
# censored, which does not match the 3 + 8 and the last event at 2363 in the text.
Q, p, sup_Z, sigma, history = renyi_test(times_chemo, events_chemo,
                                           times_radio, events_radio)
print(f"Q = {Q:.2f}, p = {p:.3f}")
# Klein reports Q = 2.20 (p = 0.053); with unverified indicators the
# printed values may differ.

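7.4 Step 4: R `survRM2` (Restricted Mean Survival Time Difference)

When PH is violated, the RMST difference is another alternative:

library(survRM2)

# Klein Example 7.5: BMT auto vs allo with crossing hazards.
# `bmt` is a placeholder data frame with columns time (months),
# event (1 = event, 0 = censored) and type ("allo" / "auto").
result <- rmst2(time = bmt$time,
                status = bmt$event,
                arm = as.integer(bmt$type == "allo"),  # rmst2 expects 0/1
                tau = 60)  # 60 months

print(result)
# RMST difference (allo vs auto) with 95% CI and p-value
# Another nonparametric option under crossing hazards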

8 Key Takeaways

Five lessons from § 7.5~7.6

Stratified test (Eq. 7.5.4): sums the stratum-level \(Z_{js}\) and variances to adjust for a covariate. Reversed effects across strata cancel (Klein 7.7, BMT: \(Z=0.57\) vs HOD \(Z=2.89\)). Reporting each stratum separately is recommended.
Stratified vs unstratified: Klein 7.4, BMT × MTX: stratified \(\chi^2=19.14\) vs unstratified \(\chi^2=16.24\). Adjusting for the covariate gave the larger statistic.
Matched pairs, Eq. 7.5.7 = censored sign test: \(Z = (D_1 - D_2)/\sqrt{D_1+D_2}\), independent of the weight. Klein 7.8, 6-MP: \(D_p=18, D_{6MP}=3 \to Z=3.27, p=0.001\). Effective \(n = D_1 + D_2\) shrinks as censoring increases.
Renyi (Eq. 7.6.3), the sup of the path: avoids the log-rank cancellation under crossing hazards. \(Q = \sup|Z(t)|/\sigma(\tau) \to \sup |B(x)|\), a Brownian motion distribution. A censored-data generalization of Kolmogorov-Smirnov.
Klein 7.9 GITSG, the canonical log-rank vs Renyi contrast: log-rank \(p=0.63\) (non-significant through cancellation) vs Renyi \(Q=2.20, p=0.053\) (marginal detection). When PH is violated, report several tests and interpret clinically by time period.

9 Related Topics

Next topics

§ 7.7 — Other Two-Sample Tests (Cramer-von Mises, KM-based, median)
§ 7.8 — Fixed-time test
§ 7.9 — Exercises
Ch.8 — Cox PH (stratified Cox + interaction term + score test)

Related concepts

Sign test (uncensored matched pairs) → generalized to censored data by Eq. 7.5.7
Kolmogorov-Smirnov (uncensored) → generalized to censored data by the § 7.6 Renyi test
Distribution of the Brownian motion sup (Billingsley 1968): the basis of the Renyi critical values
Pitman efficiency (Schumacher 1984): a standard for comparing tests
Cox PH stratification (Ch.9): the regression generalization of § 7.5