Kwangmin Kim - Klein Ch.1 § 1.9~1.10 In Depth: Autologous/Allogeneic BMT

1 Introduction: Wrapping Up the Ch.1 Series

Klein series ladder:

Part	Topic
Ch.1 Overview (01)	Catalog of 19 examples
§ 1.1~1.2 (01-1)	Introduction + 6-MP Leukemia
§ 1.3~1.4 (01-2)	BMT + Dialysis
§ 1.5~1.6 (01-3)	Breast Cancer + Burn
§ 1.7~1.8 (01-4)	Kidney Transplant + Laryngeal Cancer
§ 1.9~1.10 (this part)	Auto/Allo BMT + Lymphoma BMT + Ch.1 wrap-up

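Five questions this part answers

Autologous vs Allogeneic BMT: how does the trade-off between graft-vs-leukemia and GVHD shape the design of the statistical analysis?
What does each of the four Cox diagnostic tools (martingale, score, deviance, influence) detect?
How do we decide the functional form of the Karnofsky score (linear vs spline)?
Stratified analysis by disease type: what are the statistical implications of comparing Allo/Auto with HOD/NHL as strata?
How do the nine core Ch.1 examples map onto the tools of Klein's 13 chapters, and which tool does each one demonstrate?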

2 § 1.9 Autologous vs Allogeneic BMT (IBMTR)

2.1 Medical Background: Two Kinds of Bone Marrow Transplant

2.1.1 Autologous BMT

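Harvest the patient's own marrow
    ↓ cryopreserve
    ↓ high-dose chemotherapy (myeloablation)
    ↓ reinfuse into the patient
Recovery

Advantages: no GVHD, no donor needed.
Disadvantage: the harvested marrow may contain residual leukemia cells → risk of relapse.
Analogy: "Re-laying my own soil in my leukemia garden: the weed seeds remain."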

2.1.2 Allogeneic BMT

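Harvest marrow from an HLA-matched sibling donor
    ↓
Infuse into the (myeloablated) patient
Recovery + formation of a new immune system (engraftment)
    ↓
Graft-vs-Leukemia (GVL): donor immune cells attack residual leukemia

Advantage: GVL effect → lower relapse risk.
Disadvantages: GVHD (Graft-vs-Host Disease), donor required.
Analogy: "Laying someone else's soil: no new weed seeds, but the soil itself clashes with the patient."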

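Intuition: the statistical trade-off between Auto and Allo

Failure causes for Auto:

Relapse: common (residual leukemia cells).
Treatment-related mortality (TRM): rare (no GVHD).

Failure causes for Allo:

Relapse: less common (GVL effect).
TRM: common (GVHD, infection).

→ A composite endpoint such as "leukemia-free survival" folds the trade-off into a single outcome.

Composite event = relapse OR TRM, whichever occurs first.
The two BMT types can have similar leukemia-free survival (their weaknesses offset each other).
Or one may be clearly superior → choose by patient characteristics.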
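
As a minimal sketch of how such a composite endpoint can be built, the helper below derives a leukemia-free-survival time and event indicator from per-patient relapse, death, and last-follow-up times. The function name `lfs_endpoint` and its arguments are hypothetical illustrations, not part of Klein's data files; times are assumed to be in months.

```python
from typing import Optional, Tuple

def lfs_endpoint(relapse: Optional[float], death: Optional[float],
                 last_followup: float) -> Tuple[float, int]:
    """Return (time, event) for leukemia-free survival.

    event = 1 if relapse or death was observed (the earlier one counts),
    event = 0 if the patient was alive and relapse-free at last follow-up.
    """
    observed = [t for t in (relapse, death) if t is not None]
    if observed:
        return min(observed), 1
    return last_followup, 0

# Hypothetical patients: relapse then death, death in remission, censored
print(lfs_endpoint(relapse=6.2, death=9.0, last_followup=9.0))
print(lfs_endpoint(relapse=None, death=1.5, last_followup=1.5))
print(lfs_endpoint(relapse=None, death=None, last_followup=48.0))
```

A relapse and a death in remission both count as a single "failure", which is why the endpoint cannot by itself tell the two BMT types' failure profiles apart.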

The data set used here (101 patients) serves as the running example for analyzing this trade-off.

2.2 Data: Table 1.4

n: 101 (51 auto + 50 allo).
Disease: advanced acute myelogenous leukemia (AML).
Source: International Bone Marrow Transplant Registry (IBMTR).
Times are leukemia-free survival in months; a trailing + marks a censored observation.

2.2.1 Allo, 50 patients (months)

0.030, 0.493, 0.855, 1.184, 1.283, 1.480, 1.776, 2.138, 2.500, 2.763, 2.993, 3.224, 3.421, 4.178, 4.441+, 5.691, 5.855+, 6.941+, 6.941, 7.993+, 8.882, 8.882, 9.145+, 11.480, 11.513, 12.105+, 12.796, 12.993+, 13.849+, 16.612+, 17.138+, 20.066, 20.329+, 22.368+, 26.776+, 28.717+ (×2), 32.928+, 33.783+, 34.211+, 34.770+, 39.539+, 41.118+, 45.033+, 46.053+, 46.941+, 48.289+, 57.401+, 58.322+, 60.625+

2.2.2 Auto, 51 patients (months)

0.658, 0.822, 1.414, 2.500, 3.322, 3.816, 4.737, 4.836+, 4.934, 5.033, 5.757, 5.855, 5.987, 6.151, 6.217, 6.447+, 8.651, 8.717, 9.441+, 10.329, 11.480, 12.007, 12.007+, 12.237, 12.401+, 13.059+, 14.474+, 15.000+, 15.461, 15.757, 16.480, 16.711, 17.204+, 17.237, 17.303+, 17.664+, 18.092, 18.092+, 18.750+, 20.625+, 23.158, 27.730+, 31.184+, 32.434+, 35.921+, 42.237+, 44.638+, 46.480+, 47.467+, 48.322+, 56.086

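Intuition: first impression of the data

Allo (50 patients):

Events: 22/50 (44%).
Censored: 28/50 (56%).
Events cluster early (0.03 to 3 months): TRM suspected.

Auto (51 patients):

Events: 28/51 (55%).
Censored: 23/51 (45%).
Events are spread out over time: relapse suspected.

Observations:

Leukemia-free survival is roughly similar in the two groups (crude event fractions of 44% vs 55%).
But the timing of events differs → the PH assumption may be violated.
Allo: early risk (TRM), later safety (GVL).
Auto: steady risk (relapse from residual leukemia).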
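
The counts above can be checked directly from the Table 1.4 listings. The following is a small sketch, assuming the values are transcribed exactly as listed in § 2.2.1 and § 2.2.2 with "+" marking censoring; the helper names `parse` and `summarize` and the 3-month cutoff are illustrative choices.

```python
ALLO_RAW = """
0.030 0.493 0.855 1.184 1.283 1.480 1.776 2.138 2.500 2.763 2.993 3.224 3.421
4.178 4.441+ 5.691 5.855+ 6.941+ 6.941 7.993+ 8.882 8.882 9.145+ 11.480 11.513
12.105+ 12.796 12.993+ 13.849+ 16.612+ 17.138+ 20.066 20.329+ 22.368+ 26.776+
28.717+ 28.717+ 32.928+ 33.783+ 34.211+ 34.770+ 39.539+ 41.118+ 45.033+ 46.053+
46.941+ 48.289+ 57.401+ 58.322+ 60.625+
"""
AUTO_RAW = """
0.658 0.822 1.414 2.500 3.322 3.816 4.737 4.836+ 4.934 5.033 5.757 5.855 5.987
6.151 6.217 6.447+ 8.651 8.717 9.441+ 10.329 11.480 12.007 12.007+ 12.237
12.401+ 13.059+ 14.474+ 15.000+ 15.461 15.757 16.480 16.711 17.204+ 17.237
17.303+ 17.664+ 18.092 18.092+ 18.750+ 20.625+ 23.158 27.730+ 31.184+ 32.434+
35.921+ 42.237+ 44.638+ 46.480+ 47.467+ 48.322+ 56.086
"""

def parse(raw):
    """Split 'time' / 'time+' tokens into times and status (1 = event, 0 = censored)."""
    tokens = raw.split()
    times = [float(tok.rstrip("+")) for tok in tokens]
    status = [0 if tok.endswith("+") else 1 for tok in tokens]
    return times, status

def summarize(raw, early_cutoff=3.0):
    """Count patients, events, censored, and events at or before the cutoff (months)."""
    times, status = parse(raw)
    n, events = len(times), sum(status)
    early = sum(1 for t, d in zip(times, status) if d == 1 and t <= early_cutoff)
    return {"n": n, "events": events, "censored": n - events, "events_by_3mo": early}

print("Allo:", summarize(ALLO_RAW))
print("Auto:", summarize(AUTO_RAW))
```

With these listings the tally is 22 events among 50 Allo patients (11 of them within the first 3 months) and 28 events among 51 Auto patients (4 within 3 months), which is the early-versus-spread-out contrast described above.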

2.3 Mapping to Klein's Book

Chapter	Use of this data set
Ch.7	Comparison of weighted log-rank, censored median test, and censored t-test
Ch.11.3	Martingale residuals (overall model fit)
Ch.11.4	Score residuals (PH assumption check)
Ch.11.5	Deviance residuals (outlier detection)
Ch.11.6	Influence diagnostics (per-subject influence)
Ch.12.5	Parametric AFT diagnostic plots

→ Tools from six sections across three chapters (7, 11, 12). A standard data set for Cox diagnostics.

2.4 The Four Cox Diagnostic Tools (Ch.11.3~11.6)

2.4.1 1. Martingale Residual (Ch.11.3)

\[ \hat M_i = \delta_i - \hat H(T_i \mid Z_i) \]

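$\delta_i$ = observed event indicator (1 or 0).
$\hat H(T_i \mid Z_i)$ = model-predicted cumulative hazard.

Interpretation: "observed events minus model-expected events".

Intuition: what the martingale residual means

Positive ($\hat M_i > 0$): the event occurred earlier than the model expected; the model under-predicts risk.
Negative: the event occurred later than expected, or the subject was censored; the model over-predicts risk.
Zero: observed and expected agree.

Uses:

Functional-form check: plot $\hat M_i$ against a covariate $Z_i$.
- A systematic trend (for example curved or U-shaped) → a nonlinear transformation is needed.
- Random scatter → the current form is adequate.
Overall-fit check: distribution of $\hat M_i$ against the predicted values.

This data set (BMT): check the functional form of the covariates (group, age, etc.) in the leukemia-free survival model.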
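
To make the definition concrete, here is a minimal sketch for the covariate-free case, where $\hat H$ reduces to the Nelson-Aalen estimator and $\hat M_i = \delta_i - \hat H(T_i)$. The toy data and the function names are hypothetical; a real Cox fit would multiply the baseline cumulative hazard by $\exp(\hat\beta' Z_i)$.

```python
def nelson_aalen(times, status):
    """Return H(t), the Nelson-Aalen cumulative hazard (tied events share a risk set)."""
    event_times = sorted({t for t, d in zip(times, status) if d == 1})
    increments = []
    for s in event_times:
        at_risk = sum(1 for t in times if t >= s)
        deaths = sum(1 for t, d in zip(times, status) if t == s and d == 1)
        increments.append((s, deaths / at_risk))

    def H(t):
        return sum(inc for s, inc in increments if s <= t)

    return H

def martingale_residuals(times, status):
    """M_i = delta_i - H(T_i) for a model without covariates."""
    H = nelson_aalen(times, status)
    return [d - H(t) for t, d in zip(times, status)]

# Hypothetical toy data: months, 1 = event, 0 = censored
times = [1.0, 2.0, 2.0, 3.5, 5.0, 8.0]
status = [1, 1, 0, 1, 0, 1]
resid = martingale_residuals(times, status)
for t, d, m in zip(times, status, resid):
    print(f"T={t:4.1f}  delta={d}  M={m:+.3f}")
print("sum of residuals:", round(sum(resid), 12))
```

The residuals sum to zero by construction, censored subjects always get a non-positive value, and a late event (the last subject here) can still receive a negative residual.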

2.4.2 2. Score Residual (Ch.11.4)

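Each subject $i$'s contribution to the score function (the first derivative of the log partial likelihood):

\[ \hat L_i(\beta) = \int_0^{T_i} \bigl(Z_i - \bar Z(t)\bigr) dM_i(t) \]

Interpretation: "how much each subject contributes to the score".

Intuition: using score residuals to check PH

When the PH assumption is violated, the score residuals show a pattern over time.

Check (similar in spirit to Schoenfeld residuals):

\[ \rho = \text{cor}(\hat L_i, \log T_i) \]

$\rho \approx 0$: consistent with PH.
$\rho$ clearly different from 0: evidence against PH.

Or visually: plot $\hat L_i$ against $\log T_i$.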

2.4.3 3. Deviance Residual (Ch.11.5)

A normalizing transformation of the martingale residual:

\[ \hat D_i = \text{sign}(\hat M_i) \sqrt{-2 (\hat M_i + \delta_i \log(\delta_i - \hat M_i))} \]

Its distribution is approximately standard normal.
Useful for outlier detection.

Intuition: how the deviance residual normalizes

Limitations of the martingale residual:

$\delta_i = 0$: $\hat M_i \in (-\infty, 0]$.
$\delta_i = 1$: $\hat M_i \in (-\infty, 1)$.
Heavily skewed distribution → hard to compare outliers.

Deviance residual:

Roughly symmetric (normal approximation).
$|\hat D_i| > 2$: outlier candidate.
$|\hat D_i| > 3$: examine carefully.

This data set: identify outliers → analyze together with influence diagnostics.
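
The transformation is short enough to write out directly. The sketch below applies the formula above to a few hypothetical martingale residuals; the function name `deviance_residual` is illustrative.

```python
import math

def deviance_residual(m, delta):
    """Deviance residual from a martingale residual m and event indicator delta."""
    # delta * log(delta - m) is taken as 0 for censored subjects (delta = 0)
    inner = m + (math.log(1.0 - m) if delta == 1 else 0.0)
    sign = (m > 0) - (m < 0)
    # inner is <= 0 in exact arithmetic; max() guards against rounding noise
    return sign * math.sqrt(max(0.0, -2.0 * inner))

# Hypothetical (martingale residual, delta) pairs
for m, delta in [(0.833, 1), (0.5, 1), (-0.7, 1), (-0.367, 0), (-2.0, 0)]:
    print(f"M={m:+.3f} delta={delta} -> D={deviance_residual(m, delta):+.3f}")
```

Large negative martingale residuals are pulled in toward zero (a censored subject with $\hat M_i = -2$ gets $\hat D_i = -2$), while values close to 1 are stretched out, which is what makes the $\pm 2$ rule of thumb usable.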

2.4.4 4. Influence Diagnostics (Ch.11.6)

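Change in the estimate when each subject is removed:

\[ \Delta\hat\beta_{(i)} = \hat\beta - \hat\beta_{(-i)} \]

$\hat\beta_{(-i)}$ = fit excluding subject $i$.
"Jackknife residual".

Intuition: the value of influence diagnostics

Outlier vs influential:

Outlier: large deviance residual; poorly fit by the model.
Influential: results change substantially when the subject is removed; large effect on the estimate.
These are two different things.

Example:

An outlier at the edge of the covariate range (extreme covariate) → influential.
An outlier in the middle → less influential.

Detection:

$|\Delta\hat\beta_{(i)}| >$ threshold → subject $i$ is influential.
Verify that subject's data (data-entry errors, etc.).

This data set: per-subject influence diagnostics across the 101 patients show how robust the estimate is.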
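
The leave-one-out definition can be sketched by brute force. The code below is an illustrative pure-Python implementation for a single covariate with Breslow handling of ties, not the approximation that `residuals(..., type = "dfbeta")` computes; the toy data and function names are hypothetical.

```python
import math

def cox_beta(times, status, z, iters=25):
    """Newton-Raphson estimate of beta for a one-covariate Cox model (Breslow ties)."""
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for ti, di, zi in zip(times, status, z):
            if di != 1:
                continue
            risk = [zj for tj, zj in zip(times, z) if tj >= ti]
            w = [math.exp(beta * zj) for zj in risk]
            s0 = sum(w)
            s1 = sum(wj * zj for wj, zj in zip(w, risk))
            s2 = sum(wj * zj * zj for wj, zj in zip(w, risk))
            score += zi - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        if info <= 0.0:
            break
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta

def jackknife_dfbeta(times, status, z):
    """Return the full-data estimate and beta_hat - beta_hat(-i) for every subject."""
    full = cox_beta(times, status, z)
    changes = []
    for i in range(len(times)):
        keep = [j for j in range(len(times)) if j != i]
        loo = cox_beta([times[j] for j in keep],
                       [status[j] for j in keep],
                       [z[j] for j in keep])
        changes.append(full - loo)
    return full, changes

# Hypothetical toy data: z = 1 for one treatment group, 0 for the other
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
status = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
z = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
full, dfb = jackknife_dfbeta(times, status, z)
print("beta_hat:", round(full, 4))
for i, change in enumerate(dfb):
    print(f"subject {i}: delta beta = {change:+.4f}")
```

Refitting n times is affordable at this sample size; for larger data the one-step DFBETA approximation from the score residuals is the usual shortcut.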

3 § 1.10 Lymphoma BMT (Avalos 1993)

3.1 Medical Background

3.1.1 Hodgkin’s vs Non-Hodgkin’s Lymphoma

HOD (Hodgkin’s lymphoma): characterized by Reed-Sternberg cells, B-cell origin, good prognosis.
NHL (Non-Hodgkin’s lymphoma): many subtypes, the more common lymphoma, variable prognosis.

→ Different disease entities, so baseline survival differs.

3.1.2 Karnofsky Performance Status (KPS)

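Assessment of functional status:

KPS	Meaning
100	Normal
90	Minor symptoms
80	Normal activity with effort
70	Cares for self, unable to work
60	Requires occasional assistance
50	Requires considerable assistance
40	Disabled
30	Severely disabled
20	Very sick
10	Moribund
0	Dead

Standard scale for assessing pre-transplant health status.
A key variable in deciding transplant eligibility.

3.1.3 Waiting Time

Time from diagnosis to transplant (months).

Short: quick decision + favorable donor availability.
Long: disease progression, or conversely time for health to recover.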

3.2 Data: Table 1.5

n: 43.
4 groups:
- Allo NHL (11 patients)
- Auto NHL (12 patients)
- Allo HOD (5 patients)
- Auto HOD (15 patients)
Variables:
- $T_i$: disease-free survival time (days).
- $\delta_i$: event indicator (1 = death/relapse, 0 = censored).
- $Z_1$: Karnofsky score (20~100).
- $Z_2$: Waiting time (5~171 months).

3.3 Mapping to Klein's Book

Chapter	Use of this data set
Ch.7.5	Stratified test by disease (Allo/Auto comparison adjusted for the HOD/NHL strata)
Ch.11.3	Functional form of the Karnofsky score via martingale residuals

3.4 Stratified Test by Disease (Ch.7.5)

3.4.1 Motivation

HOD and NHL are different diseases → different baseline survival.
In a pooled Allo vs Auto comparison, disease type acts as a confounder.

3.4.2 Stratified Log-Rank

Compare Allo vs Auto within each stratum (HOD, NHL), then combine:

\[ Q_{\text{strat}} = \frac{\bigl[\sum_s (O_s - E_s)\bigr]^2}{\sum_s V_s} \]

$s$ = stratum (HOD, NHL).
Observed minus expected is summed over the strata; $V_s$ is the variance of $O_s - E_s$ in stratum $s$.
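
A worked sketch of the statistic, under the assumption of two groups and the usual hypergeometric variance at each event time. The function names and the toy strata are hypothetical and are not the Table 1.5 values.

```python
def logrank_parts(times, status, group):
    """Return (O - E, V) for group == 1 in a two-sample log-rank comparison."""
    o_minus_e, var = 0.0, 0.0
    for s in sorted({t for t, d in zip(times, status) if d == 1}):
        n = sum(1 for t in times if t >= s)                            # at risk, both groups
        n1 = sum(1 for t, g in zip(times, group) if t >= s and g == 1)
        d = sum(1 for t, e in zip(times, status) if t == s and e == 1)
        d1 = sum(1 for t, e, g in zip(times, status, group)
                 if t == s and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e, var

def stratified_logrank(strata):
    """strata: list of (times, status, group) triples, one per stratum."""
    parts = [logrank_parts(*stratum) for stratum in strata]
    numerator = sum(p[0] for p in parts) ** 2
    denominator = sum(p[1] for p in parts)
    return numerator / denominator

# Hypothetical strata: (times in days, status, group with 1 = Allo, 0 = Auto)
hod = ([5, 8, 12, 20, 25, 30], [1, 1, 0, 1, 0, 1], [1, 1, 1, 0, 0, 0])
nhl = ([3, 6, 9, 14, 18], [1, 0, 1, 1, 0], [0, 1, 0, 1, 1])
print("Q_strat =", round(stratified_logrank([hod, nhl]), 4))
```

Under the null hypothesis the statistic is compared with a chi-square distribution on 1 degree of freedom. Because subjects are only ever compared within their own stratum, a difference in baseline hazard between HOD and NHL never enters the numerator.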

Intuition: stratified vs adjusted

Stratified test:

Compare within each stratum → combine.
"Allo/Auto difference within HOD + Allo/Auto difference within NHL = pooled effect".
Each stratum has its own baseline (a different hazard).

Adjusted Cox (disease as a covariate):

\[ h(t \mid \text{disease, type}) = h_0(t) \exp(\beta_1 \text{disease} + \beta_2 \text{type}) \]

Assumes the disease effect is multiplicative.
Assumes PH (the disease HR does not depend on time).

Which one, and when?

Disease is a nuisance covariate + PH is doubtful → stratified.
The disease effect itself is of interest → adjusted.
This data set (lymphoma): disease is a confounder and the interest is Allo/Auto → stratified is recommended.

3.5 Functional Form of Karnofsky Score (Ch.11.3)

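3.5.1 Problem

The Karnofsky score is treated as continuous (ordinal in steps of 10). How should it be modeled?

Linear: $\beta \cdot$ KPS, i.e. each 1-unit change multiplies the hazard by a constant ratio.
Categorical: four groups such as 100, 90-80, 70-60, ≤50.
Spline: smooth nonlinear effect.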
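
The three options differ only in how the score is turned into model columns. The sketch below shows one possible coding of each; the function name `kps_codings`, the linear-spline (hinge) basis, and the knot at 70 are illustrative assumptions rather than choices made in Klein's text.

```python
def kps_codings(kps, knot=70):
    """Return three candidate codings of a single Karnofsky score."""
    if kps >= 100:
        band = "100"
    elif kps >= 80:
        band = "80-90"
    elif kps >= 60:
        band = "60-70"
    else:
        band = "<=50"
    return {
        "linear": kps,                          # one column, constant HR per unit
        "band": band,                           # four categories, dummy-coded in a model
        "spline": (kps, max(0, kps - knot)),    # linear spline: slope may change at the knot
    }

for score in (30, 60, 70, 90, 100):
    print(score, kps_codings(score))
```

The linear coding spends one parameter, the four bands spend three, and the hinge basis spends two while still allowing the slope to change at the knot.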

3.5.2 Martingale Residual Diagnostic

Fit a Cox model (without KPS) → compute martingale residuals → plot them against KPS.

# Column names follow the lymphoma data frame defined in section 5.1
fit_null <- coxph(Surv(time, status) ~ disease_type + bmt_type, data = lymphoma)
mart <- residuals(fit_null, type = "martingale")
plot(lymphoma$kps, mart, ylab = "Martingale residual",
     xlab = "Karnofsky score")
lines(lowess(lymphoma$kps, mart, iter = 0), col = "red")

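Interpretation:

Linear pattern: a linear effect of KPS is adequate.
Curved pattern: nonlinearity (spline or binning) is needed.
U-shape: both very low and very high KPS carry higher risk (unusual).

Intuition: the power of the martingale-residual diagnostic

Role:

"Does the current model capture the functional form of covariate $Z$ correctly?"
Fit with $Z$ left out of the model → the residuals expose the pattern in $Z$.

Example:

Simple Cox: assume $h(t) = h_0(t) \exp(\beta \cdot \text{KPS})$.
If the truth is $\exp(\beta_1 \cdot \text{KPS} + \beta_2 \cdot \text{KPS}^2)$ → the residuals of the linear-KPS fit show a quadratic pattern in KPS.

Remedy:

Visualize the pattern with a lowess fit.
Clear pattern → add a nonlinear transformation.

This data set: check whether the hazard is monotone in KPS and whether there is a threshold (e.g. risk rising sharply for KPS < 70).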

4 R + Python EDA — Auto/Allo BMT

4.1 R: `survival` + Cox diagnostics

library(survival)
library(survminer)

# Klein Table 1.4: Auto vs Allo BMT (101 patients)
allo_times <- c(0.030, 0.493, 0.855, 1.184, 1.283, 1.480, 1.776, 2.138,
                2.500, 2.763, 2.993, 3.224, 3.421, 4.178, 4.441, 5.691,
                5.855, 6.941, 6.941, 7.993, 8.882, 8.882, 9.145, 11.480,
                11.513, 12.105, 12.796, 12.993, 13.849, 16.612, 17.138,
                20.066, 20.329, 22.368, 26.776, 28.717, 28.717, 32.928,
                33.783, 34.211, 34.770, 39.539, 41.118, 45.033, 46.053,
                46.941, 48.289, 57.401, 58.322, 60.625)
allo_status <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1,
                 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
                 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

auto_times <- c(0.658, 0.822, 1.414, 2.500, 3.322, 3.816, 4.737, 4.836,
                4.934, 5.033, 5.757, 5.855, 5.987, 6.151, 6.217, 6.447,
                8.651, 8.717, 9.441, 10.329, 11.480, 12.007, 12.007,
                12.237, 12.401, 13.059, 14.474, 15.000, 15.461, 15.757,
                16.480, 16.711, 17.204, 17.237, 17.303, 17.664, 18.092,
                18.092, 18.750, 20.625, 23.158, 27.730, 31.184, 32.434,
                35.921, 42.237, 44.638, 46.480, 47.467, 48.322, 56.086)
auto_status <- c(1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,
                 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0,
                 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)

bmt <- data.frame(
  group = c(rep("Allo", 50), rep("Auto", 51)),
  time = c(allo_times, auto_times),
  status = c(allo_status, auto_status)
)

# KM
fit <- survfit(Surv(time, status) ~ group, data = bmt)
ggsurvplot(fit, data = bmt, pval = TRUE, conf.int = TRUE,
           palette = c("blue", "red"),
           xlab = "Months", ylab = "Leukemia-free survival",
           legend.labs = c("Allo", "Auto"))

# Standard log-rank
survdiff(Surv(time, status) ~ group, data = bmt)

# Weighted log-rank: rho = 1 gives the Peto-Peto version of the Wilcoxon test (weights early differences)
survdiff(Surv(time, status) ~ group, data = bmt, rho = 1)

# Cox model
cox_fit <- coxph(Surv(time, status) ~ group, data = bmt)
summary(cox_fit)

# 1. Martingale residuals (overall fit)
mart <- residuals(cox_fit, type = "martingale")
plot(predict(cox_fit), mart, xlab = "Linear predictor",
     ylab = "Martingale residual", main = "Overall fit")
abline(h = 0, lty = 2)

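# 2. Score residuals (PH check)
# as.matrix(): with a single covariate residuals() returns a plain vector
score <- as.matrix(residuals(cox_fit, type = "score"))
plot(bmt$time, score[, 1], xlab = "Time",
     ylab = "Score residual", main = "PH assumption")

# Alternatively cox.zph (PH test based on Schoenfeld residuals)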
test_ph <- cox.zph(cox_fit)
print(test_ph)
plot(test_ph)

# 3. Deviance residuals (outlier)
dev <- residuals(cox_fit, type = "deviance")
plot(dev, xlab = "Index", ylab = "Deviance residual",
     main = "Outlier detection")
abline(h = c(-2, 2), col = "red", lty = 2)

# 4. Influence diagnostics (jackknife)
infl <- residuals(cox_fit, type = "dfbeta")
plot(infl, xlab = "Index", ylab = "DFBETA",
     main = "Individual influence")

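# Parametric AFT (Ch.12)
aft_wei <- survreg(Surv(time, status) ~ group, data = bmt, dist = "weibull")
aft_logn <- survreg(Surv(time, status) ~ group, data = bmt, dist = "lognormal")

# AIC comparison among the parametric fits only: the Cox partial likelihood
# is not on the same scale as a full parametric likelihood
AIC(aft_wei, aft_logn)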

4.2 Python: `lifelines` diagnostics

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter, CoxPHFitter, WeibullAFTFitter
from lifelines.statistics import logrank_test

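# Data: same values as the R block; "+" marks a censored time
allo_raw = """
0.030 0.493 0.855 1.184 1.283 1.480 1.776 2.138 2.500 2.763 2.993 3.224 3.421
4.178 4.441+ 5.691 5.855+ 6.941+ 6.941 7.993+ 8.882 8.882 9.145+ 11.480 11.513
12.105+ 12.796 12.993+ 13.849+ 16.612+ 17.138+ 20.066 20.329+ 22.368+ 26.776+
28.717+ 28.717+ 32.928+ 33.783+ 34.211+ 34.770+ 39.539+ 41.118+ 45.033+ 46.053+
46.941+ 48.289+ 57.401+ 58.322+ 60.625+
"""
auto_raw = """
0.658 0.822 1.414 2.500 3.322 3.816 4.737 4.836+ 4.934 5.033 5.757 5.855 5.987
6.151 6.217 6.447+ 8.651 8.717 9.441+ 10.329 11.480 12.007 12.007+ 12.237
12.401+ 13.059+ 14.474+ 15.000+ 15.461 15.757 16.480 16.711 17.204+ 17.237
17.303+ 17.664+ 18.092 18.092+ 18.750+ 20.625+ 23.158 27.730+ 31.184+ 32.434+
35.921+ 42.237+ 44.638+ 46.480+ 47.467+ 48.322+ 56.086
"""

def parse(raw):
    """Split 'time' / 'time+' tokens into times and status (1 = event, 0 = censored)."""
    tokens = raw.split()
    return ([float(t.rstrip("+")) for t in tokens],
            [0 if t.endswith("+") else 1 for t in tokens])

allo_times, allo_status = parse(allo_raw)
auto_times, auto_status = parse(auto_raw)

bmt = pd.DataFrame({
    "group": ["Allo"] * 50 + ["Auto"] * 51,
    "time": allo_times + auto_times,
    "status": allo_status + auto_status,
    "auto": [0] * 50 + [1] * 51,
})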

# KM
fig, ax = plt.subplots(figsize=(9, 6))
for grp, color in [("Allo", "blue"), ("Auto", "red")]:
    sub = bmt[bmt["group"] == grp]
    kmf = KaplanMeierFitter()
    kmf.fit(sub["time"], sub["status"], label=grp)
    kmf.plot_survival_function(ax=ax, color=color)
ax.set_xlabel("Months")
ax.set_ylabel("Leukemia-free survival")
plt.tight_layout()

# Cox
cph = CoxPHFitter()
cph.fit(bmt[["time", "status", "auto"]], duration_col="time", event_col="status")
print(cph.summary)

# Martingale residuals
# compute_residuals may return rows ordered by duration; realign to bmt's index
mart = cph.compute_residuals(bmt[["time", "status", "auto"]], kind="martingale").loc[bmt.index]
plt.figure()
plt.scatter(cph.predict_partial_hazard(bmt[["auto"]]), mart["martingale"])
plt.axhline(0, ls="--", color="gray")
plt.xlabel("Predicted hazard")
plt.ylabel("Martingale residual")

# Schoenfeld for PH
ph_check = cph.check_assumptions(bmt[["time", "status", "auto"]],
                                   p_value_threshold=0.05)

# Deviance residuals
# realign to bmt's index so the x-axis really is the patient index
dev = cph.compute_residuals(bmt[["time", "status", "auto"]], kind="deviance").loc[bmt.index]
plt.figure()
plt.scatter(range(len(dev)), dev["deviance"])
plt.axhline(2, ls="--", color="red")
plt.axhline(-2, ls="--", color="red")
plt.xlabel("Patient index")
plt.ylabel("Deviance residual")

5 R + Python EDA — Lymphoma BMT

5.1 R: direct entry (Klein Table 1.5)

library(survival)
library(survminer)  # for ggsurvplot

# Avalos 1993: 43 lymphoma BMT patients
# Values as transcribed here; verify against Table 1.5 in the book before relying on them
# Row order: Allo NHL (11), Auto NHL (12), Allo HOD (5), Auto HOD (15)
lymphoma <- data.frame(
  disease_type = rep(c("NHL", "HOD"), times = c(23, 20)),
  bmt_type = rep(c("Allo", "Auto", "Allo", "Auto"), times = c(11, 12, 5, 15)),
  time = c(28, 32, 49, 84, 357, 933, 1078, 1183, 1560, 2114, 2144,
           42, 53, 57, 63, 81, 140, 176, 210, 252, 476, 524, 1037,
           2, 4, 72, 77, 79,
           30, 36, 41, 52, 62, 108, 132, 180, 307, 406, 446,
           484, 748, 1290, 1345),
  status = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0,
             1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0,
             1, 1, 1, 1, 1,
             1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  kps = c(90, 30, 40, 60, 70, 90, 100, 90, 80, 80, 90,
          80, 90, 30, 60, 50, 100, 80, 90, 90, 90, 90, 90,
          20, 50, 80, 60, 70,
          90, 80, 70, 60, 90, 70, 60, 100, 100, 100, 90, 90, 90, 80, 90),
  wait = c(24, 7, 8, 10, 42, 9, 16, 16, 20, 27, 5,
           19, 17, 9, 13, 12, 11, 38, 16, 21, 24, 39, 84,
           34, 28, 59, 102, 71,
           73, 61, 34, 18, 40, 65, 17, 61, 24, 48, 52, 84, 171, 20, 98)
)

# KM by group
fit <- survfit(Surv(time, status) ~ disease_type + bmt_type, data = lymphoma)
ggsurvplot(fit, data = lymphoma, pval = TRUE,
           xlab = "Days", ylab = "Survival")

# Stratified log-rank by disease (Ch.7.5)
survdiff(Surv(time, status) ~ bmt_type + strata(disease_type), data = lymphoma)

# Cox with disease + type
cox_full <- coxph(Surv(time, status) ~ bmt_type + disease_type, data = lymphoma)
summary(cox_full)

# Stratified Cox (disease as stratum)
cox_strat <- coxph(Surv(time, status) ~ bmt_type + strata(disease_type),
                   data = lymphoma)
summary(cox_strat)

# Functional form of KPS via martingale (Ch.11.3)
cox_partial <- coxph(Surv(time, status) ~ bmt_type + strata(disease_type),
                     data = lymphoma)
mart <- residuals(cox_partial, type = "martingale")

plot(lymphoma$kps, mart, xlab = "Karnofsky score",
     ylab = "Martingale residual",
     main = "Functional form of KPS")
lines(lowess(lymphoma$kps, mart, iter = 0), col = "red", lwd = 2)
abline(h = 0, lty = 2)

# Same check for waiting time
plot(lymphoma$wait, mart, xlab = "Waiting time (months)",
     ylab = "Martingale residual",
     main = "Functional form of waiting time")
lines(lowess(lymphoma$wait, mart, iter = 0), col = "red")

5.2 Python — Stratified Cox

import pandas as pd
import matplotlib.pyplot as plt
from lifelines import CoxPHFitter

lymphoma = pd.DataFrame({
    # Row order: Allo NHL (11), Auto NHL (12), Allo HOD (5), Auto HOD (15)
    "disease_NHL": [1]*23 + [0]*20,
    "auto": [0]*11 + [1]*12 + [0]*5 + [1]*15,
    "kps": [90, 30, 40, 60, 70, 90, 100, 90, 80, 80, 90,
            80, 90, 30, 60, 50, 100, 80, 90, 90, 90, 90, 90,
            20, 50, 80, 60, 70, 90, 80, 70, 60, 90, 70, 60,
            100, 100, 100, 90, 90, 90, 80, 90],
    "wait": [24, 7, 8, 10, 42, 9, 16, 16, 20, 27, 5,
             19, 17, 9, 13, 12, 11, 38, 16, 21, 24, 39, 84,
             34, 28, 59, 102, 71, 73, 61, 34, 18, 40, 65, 17,
             61, 24, 48, 52, 84, 171, 20, 98],
    "time": [28, 32, 49, 84, 357, 933, 1078, 1183, 1560, 2114, 2144,
             42, 53, 57, 63, 81, 140, 176, 210, 252, 476, 524, 1037,
             2, 4, 72, 77, 79, 30, 36, 41, 52, 62, 108, 132, 180,
             307, 406, 446, 484, 748, 1290, 1345],
    "status": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0,
               1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0,
               1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
})

# Stratified Cox by disease; KPS and waiting time are left out on purpose
# so that their functional form can be read off the martingale residuals
model_df = lymphoma[["time", "status", "auto", "disease_NHL"]]
cph = CoxPHFitter()
cph.fit(model_df, duration_col="time", event_col="status",
        strata=["disease_NHL"])
print(cph.summary)

# Functional form check
# compute_residuals may reorder rows; realign to the data frame's index
mart = cph.compute_residuals(model_df, kind="martingale").loc[lymphoma.index]
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].scatter(lymphoma["kps"], mart["martingale"])
axes[0].axhline(0, ls="--", color="gray")
axes[0].set_xlabel("Karnofsky score")
axes[0].set_ylabel("Martingale residual")
axes[1].scatter(lymphoma["wait"], mart["martingale"])
axes[1].axhline(0, ls="--", color="gray")
axes[1].set_xlabel("Waiting time")
axes[1].set_ylabel("Martingale residual")
plt.tight_layout()

6 Ch.1 Series Wrap-Up

6.1 Statistical Tools Demonstrated by the 9 Core Examples

§	Data	n	Key tools	Klein chapter
1.2	Leukemia 6-MP	42	Matched-pair, KM, log-rank	4·6·7·9
1.3	BMT (AML/ALL)	137	Multistate, competing risks, time-dep	4·6·7·8·9·11
1.4	Kidney Dialysis	119	PH violation, weighted log-rank	7·8·9
1.5	Breast Cancer	45	Small sample, Aalen additive	8·10
1.6	Burn	154	Multi-covariate Cox, time-dep	8·9
1.7	Kidney Transplant	863	Kernel smoothing, continuous covariate	6·8
1.8	Laryngeal Cancer	90	Ordinal trend, AFT, deviance residuals	7·8·10·12
1.9	Auto/Allo BMT	101	Four Cox diagnostic tools	7·11·12
1.10	Lymphoma BMT	43	Stratified test, functional form	7·11

6.2 Data-Tool Cross-Reference for Klein's 13 Chapters

Ch	Tools	Demonstration data
2	Basic quantities ($S, h, mrl$)	general
3	Censoring/truncation likelihood	all
4	KM, NA, CIF	1.2, 1.3, 1.7
5	Other sampling (interval, double, right truncation)	external
6	Kernel hazard, Bayesian density	1.3, 1.7
7	Hypothesis tests (log-rank, weighted, trend, stratified)	1.2, 1.4, 1.8, 1.9, 1.10
8	Cox PH (fixed)	1.5, 1.6, 1.7, 1.8
9	Cox refinements (time-dep, stratified, multistate)	1.3, 1.6
10	Additive hazards	1.5, 1.8
11	Diagnostics (martingale, score, deviance, influence)	1.9, 1.10
12	Parametric AFT	1.8, 1.9
13	Multivariate survival	external

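6.3 The Pedagogy of the Data-Tool Mapping

Intuition: learning 13 chapters from 9 data sets

Each data set demonstrates one or two key tools:

Simple data (1.2 Leukemia, 1.5 Breast, 1.10 Lymphoma): small n, simple comparisons → basic tools (KM, log-rank, Cox).
Rich data (1.3 BMT, 1.6 Burn): multivariate + time-dependent → advanced tools (multistate, time-dependent Cox).
Large data (1.7 Kidney): nonparametric smoothing.
Ordinal data (1.8 Laryngeal): trend test, AFT.
Diagnostics data (1.9, 1.10): diagnostics + functional form.

→ The nine data sets cover the tools of Chapters 4 and 6 through 12; Chapters 5 and 13 draw on other data.

This makes for a rich learning resource: the variety of real medical studies motivates the variety of statistical tools.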

7 Key Intuitions, Consolidated

Auto vs Allo trade-off = GVL effect (less relapse) vs GVHD (more TRM).
Composite endpoint (leukemia-free survival) = the trade-off folded into one outcome.
Four Cox diagnostic tools:
- Martingale: overall fit.
- Score: PH check.
- Deviance: outliers.
- Influence: per-subject influence.
Stratified analysis = control a nuisance covariate (disease type) + compare the variable of interest (BMT type).
Functional form = deciding between linear and spline for a continuous covariate (KPS).
9 data sets → 13 chapters = a rich learning map.

8 Practical Checklist: § 1.9~1.10

§ 1.9 Auto/Allo BMT

Recognize the Auto/Allo trade-off (GVL vs GVHD).
Leukemia-free survival = composite endpoint.
Compare KM + standard log-rank + weighted log-rank.
Four Cox diagnostic tools:
- Martingale (overall fit).
- Score (PH).
- Deviance (outliers).
- Influence (DFBETA).
Compare parametric AFT fits (Weibull, lognormal).

§ 1.10 Lymphoma BMT

HOD vs NHL = different diseases, different baselines.
Stratified Cox by disease: disease is a confounder.
Decide the functional form of the Karnofsky score (martingale residuals).
Functional form of waiting time.
KM comparison of the 4 groups (HOD/NHL × Allo/Auto).

EDA

Events, censored, n, and median by group.
KM curves + confidence intervals.
Cox diagnostic plots (4 kinds).
PH assumption check (Schoenfeld).

Ch.1 wrap-up

Know the mapping between the 9 core examples and the tools of the 13 chapters.
Identify the key statistical challenge in each data set.

Next steps

§ 1.11~1.19 (nine additional examples, optional).
Proceed to Ch.2 (Basic Quantities and Models): precise definitions, relationships, and parametric models for $S(t), h(t)$.

10 References

Klein, J. P., & Moeschberger, M. L. (2003). Survival Analysis: Techniques for Censored and Truncated Data (2nd ed.), Ch.1 § 1.9~1.10. Springer.
Avalos, B. R., Klein, J. L., Kapoor, N., et al. (1993). Bone Marrow Transplantation for Hodgkin’s and Non-Hodgkin’s Lymphoma. Bone Marrow Transplantation, 11(3), 225-232.
International Bone Marrow Transplant Registry (IBMTR) data.
Cox, D. R. (1972). Regression Models and Life-Tables. JRSS B, 34(2), 187-220.
Therneau, T. M., Grambsch, P. M., & Fleming, T. R. (1990). Martingale-Based Residuals for Survival Models. Biometrika, 77(1), 147-160.
Grambsch, P. M., & Therneau, T. M. (1994). Proportional Hazards Tests and Diagnostics Based on Weighted Residuals. Biometrika, 81(3), 515-526.
Schoenfeld, D. (1982). Partial Residuals for the Proportional Hazards Regression Model. Biometrika, 69(1), 239-241.
Lin, D. Y., Wei, L. J., & Ying, Z. (1993). Checking the Cox Model with Cumulative Sums of Martingale-Based Residuals. Biometrika, 80(3), 557-572.
Karnofsky, D. A., & Burchenal, J. H. (1949). The Clinical Evaluation of Chemotherapeutic Agents in Cancer. In Evaluation of Chemotherapeutic Agents, ed. C. M. MacLeod. Columbia University Press.
Therneau, T. M., & Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model. Springer.
Davidson-Pilon, C. (2019). lifelines: Survival Analysis in Python. JOSS, 4(40), 1317.