Kwangmin Kim - Klein Ch.1 § 1.7~1.8 In Depth: Kidney Transplant

1 Introduction: where this installment fits

The Klein series ladder:

Installment	Topic
Ch.1 Overview (01)	Catalog of 19 examples
§ 1.1~1.2 (01-1)	Introduction + 6-MP Leukemia
§ 1.3~1.4 (01-2)	BMT + Dialysis
§ 1.5~1.6 (01-3)	Breast Cancer + Burn
§ 1.7~1.8 (this installment)	Kidney Transplant + Laryngeal Cancer
§ 1.9~1.10 (planned)	Auto/Allo BMT + Lymphoma BMT

Five questions this installment answers

Effect of sample size: with 863 patients (kidney) versus 90 patients (laryngeal), which tools are feasible and which are not?
Kernel hazard smoothing: what is the trade-off in choosing the bandwidth for a nonparametric hazard estimate?
Modeling a continuous covariate: for age (continuous), is binning, a linear term, or a spline most appropriate?
TNM ordinal stage: how do the K-sample test (K-1 degrees of freedom) and the trend test (1 degree of freedom) differ?
Parametric AFT + log-logistic: how does it differ from Cox PH, and when is it the more natural choice?

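2 § 1.7 Kidney Transplant Death (OSU 1982-1992)

2.1 Medical background: long-term follow-up of kidney transplant patients

2.1.1 Standardization of kidney transplantation

1980s: the introduction of cyclosporine sharply raised transplant success rates.
OSU Transplant Center: a standardized protocol over the 10 years 1982-1992.
Patient follow-up: until death or a move away from Columbus (lost to follow-up).

Intuition: the value of a large sample

863 patients = one of the largest datasets in Klein's book.

Analyses that a large sample makes possible:

Kernel hazard smoothing: in small data the noise dominates and smoothing says little; a large sample lets the underlying hazard pattern emerge.
Precise modeling of a continuous covariate (age 9.5 months ~ 74.5 years).
Simultaneous comparison of multiple groups (4 race × gender combinations).
Subgroup analysis (e.g. the age effect within each race).

With a small dataset (the 45-patient breast cancer data) these analyses are not realistically feasible.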

2.2 Data structure

Group	n	Deaths	Crude death rate
White Male	432	73	16.9%
Black Male	92	14	15.2%
White Female	280	39	13.9%
Black Female	59	14	23.7%
Total	863	140	16.2%

Study period: 1982-1992.
Max follow-up: 9.47 years.
Continuous covariate: age at transplant (9.5 months ~ 74.5 years, mean 42.8).
Censoring: lost to follow-up (moved away from Columbus) or still alive on 1992-06-30.

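Intuition: death-rate patterns by group

Black Female (23.7%) vs White Female (13.9%):

Same sex, different race → suggests a race effect.
But the sample sizes differ (BF = 59 vs WF = 280) → the two rates are estimated with very different precision.

Male vs Female:

WM 16.9% vs WF 13.9% → slightly higher in men.
BM 15.2% vs BF 23.7% → the direction reverses: Black women have the highest crude rate of the four groups.

Possible explanations:

Confounding by age, baseline disease severity, comorbidities.
Assess the race × gender effect after adjusting for these in a multivariable Cox model.

These are crude proportions that ignore differences in follow-up time, so they are only a first look. A small dataset could not support subgroup comparisons at this level of detail.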
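To make the precision point concrete, here is a minimal sketch that puts an approximate 95% Wilson score interval around each group's crude death proportion, using only the counts from the table in section 2.2. The Wilson interval is my choice for illustration, not a method taken from Klein; it treats each death as a binomial outcome and ignores censoring, so it is a rough check rather than a survival analysis.

```python
import math

# (n, deaths) per group, copied from the table in section 2.2.
groups = {"WM": (432, 73), "BM": (92, 14), "WF": (280, 39), "BF": (59, 14)}

def wilson_interval(deaths, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = deaths / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

intervals = {g: wilson_interval(d, n) for g, (n, d) in groups.items()}
widths = {g: hi - lo for g, (lo, hi) in intervals.items()}

for g, (lo, hi) in intervals.items():
    n, d = groups[g]
    print(f"{g}: crude rate {d / n:.3f}, interval ({lo:.3f}, {hi:.3f}), width {widths[g]:.3f}")
```

The interval for the 59 Black female patients comes out much wider than the one for the 280 White female patients, which is why the 23.7% vs 13.9% gap should be read cautiously before any adjusted model is fitted.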

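2.3 Kernel Hazard Estimation (Klein Ch.6)

2.3.1 Why nonparametric hazard estimation is hard

The Nelson-Aalen estimator accumulates the increment \(d_j / n_j\) (deaths over number at risk) at each event time \(t_j\) into the cumulative hazard:

\[ \hat\Lambda(t) = \sum_{t_j \leq t} \frac{d_j}{n_j} \]

The hazard itself (a rate) is the derivative of this step function, so estimating it directly from the raw jumps is very noisy.

2.3.2 Kernel Smoothing

At each time \(t\), take a weighted average of the increments at nearby event times:

\[ \hat h(t) = \frac{1}{b} \sum_j K\Bigl(\frac{t - t_j}{b}\Bigr) \cdot \frac{d_j}{n_j} \]

\(K(\cdot)\) = kernel function (Gaussian, Epanechnikov, biweight).
\(b\) = bandwidth (smoothing parameter).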

Intuition: the bandwidth trade-off

Small \(b\) (under-smoothing):

The hazard curve is noisy.
Local detail is preserved.
Variance is large.

Large \(b\) (over-smoothing):

The hazard curve is smooth.
Detail is lost.
Bias is large.

Choosing \(b\):

Cross-validation.
Plug-in estimator.
Visual judgement.

Boundary effect (near the start and end of the observation window):

The kernel reaches outside the observed range → bias.
Correct with a boundary kernel or reflection.
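The estimator and the bandwidth trade-off above can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the algorithm used by any particular package: the data are simulated with a constant true hazard of 0.2, censoring is administrative at 8 years, the kernel is Epanechnikov, and the evaluation grid stays far enough from both ends that no boundary correction is needed. The names `nelson_aalen_increments`, `smoothed_hazard`, and `roughness` are hypothetical helpers.

```python
import random

def nelson_aalen_increments(times, events):
    """Return (t_j, d_j / n_j) for each distinct time with at least one event."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    increments = []
    i = 0
    while i < len(data):
        t = data[i][0]
        j, deaths = i, 0
        while j < len(data) and data[j][0] == t:
            deaths += data[j][1]
            j += 1
        if deaths > 0:
            increments.append((t, deaths / n_at_risk))
        n_at_risk -= j - i          # everyone observed at t leaves the risk set
        i = j
    return increments

def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def smoothed_hazard(t, increments, b):
    """h_hat(t) = (1/b) * sum_j K((t - t_j) / b) * d_j / n_j."""
    return sum(epanechnikov((t - tj) / b) * inc for tj, inc in increments) / b

def roughness(values):
    """Total variation of the curve on the grid; larger means wigglier."""
    return sum(abs(a - b) for a, b in zip(values, values[1:]))

# Simulated data (assumption): exponential lifetimes, true hazard 0.2 per year.
random.seed(0)
true_rate, censor_time = 0.2, 8.0
latent = [random.expovariate(true_rate) for _ in range(800)]
times = [min(t, censor_time) for t in latent]
events = [1 if t <= censor_time else 0 for t in latent]

increments = nelson_aalen_increments(times, events)
grid = [1.0 + 0.1 * k for k in range(41)]            # 1.0 to 5.0 years
narrow = [smoothed_hazard(t, increments, 0.1) for t in grid]
wide = [smoothed_hazard(t, increments, 1.0) for t in grid]

print("roughness, b = 0.1:", round(roughness(narrow), 3))
print("roughness, b = 1.0:", round(roughness(wide), 3))
print("mean estimate, b = 1.0:", round(sum(wide) / len(wide), 3))
```

With the narrow bandwidth the curve jumps around the true value; with the wide one it is nearly flat. Because the true hazard here is constant, over-smoothing costs nothing in this toy case. With a hazard that actually changes over time, the wide bandwidth would flatten real features, which is the bias side of the trade-off.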

2.3.3 Types of kernel function

Kernel	\(K(u)\)	Properties
Gaussian	\(\frac{1}{\sqrt{2\pi}} e^{-u^2/2}\)	Infinite support, smooth
Epanechnikov	\(\frac{3}{4}(1 - u^2)_+\)	Finite support, asymptotically MSE-optimal
Biweight	\(\frac{15}{16}(1 - u^2)^2_+\)	Finite support, smoother
Uniform	\(\frac{1}{2} \mathbb{1}_{|u| \leq 1}\)	Simple, rough curve

→ Epanechnikov is a common default. Boundary handling matters.
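As a quick sanity check on the table, the following sketch codes the four kernels exactly as written and verifies numerically that each integrates to 1, which is what makes the smoothed estimate a weighted average rather than a rescaled one. The midpoint-rule integrator and the integration range of [-8, 8] for the Gaussian are illustrative choices.

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def biweight(u):
    return (15 / 16) * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0

def uniform(u):
    return 0.5 if abs(u) <= 1 else 0.0

def integrate(f, lo, hi, steps=20000):
    """Midpoint rule; accurate enough to confirm the normalizing constants."""
    width = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * width) for k in range(steps)) * width

KERNELS = {"gaussian": gaussian, "epanechnikov": epanechnikov,
           "biweight": biweight, "uniform": uniform}
areas = {name: integrate(f, -8.0, 8.0) for name, f in KERNELS.items()}

for name, area in areas.items():
    print(f"{name}: area = {area:.4f}")
```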

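2.4 Continuous Covariate Discretization (Klein Ch.8)

2.4.1 Options for modeling age

Option 1: Linear (\(\beta \cdot\) age):

Simple.
Assumes "each additional year of age multiplies the hazard by \(\exp(\beta)\)".
Assumes one extra year has the same multiplicative effect in middle age (40s) as in old age (70s), which is medically questionable.

Option 2: Binning (4 categories: <30, 30-50, 50-65, 65+):

One dummy variable per group.
Can represent a nonlinear relationship.
The cutpoints are arbitrary.

Option 3: Spline (B-spline or restricted cubic spline):

Smooth nonlinear relationship.
Less dependence on arbitrary cutpoints.
The degrees of freedom must be chosen.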
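The difference between options 1 and 2 is easiest to see in the design matrix. The sketch below turns one age into the columns each coding would feed to a Cox model, using the cutpoints from option 2. The half-open convention [30, 50) and the choice of "<30" as the reference category are assumptions for illustration; other software may close the intervals on the other side.

```python
import bisect

CUTPOINTS = [30, 50, 65]                       # cutpoints from option 2
LABELS = ["<30", "30-50", "50-65", "65+"]

def age_bin(age):
    """Index of the age category; an age equal to a cutpoint goes to the upper bin."""
    return bisect.bisect_right(CUTPOINTS, age)

def linear_row(age):
    """Option 1: a single column, the age itself."""
    return [age]

def binned_row(age, reference=0):
    """Option 2: one 0/1 dummy per non-reference category."""
    idx = age_bin(age)
    return [1 if idx == k else 0 for k in range(len(LABELS)) if k != reference]

for age in (0.79, 42.8, 64.9, 65.0, 74.5):
    print(age, LABELS[age_bin(age)], linear_row(age), binned_row(age))
```

Ages 64.9 and 65.0 get different dummy rows although they differ by about a month, while 65.0 and 74.5 get identical rows. That is the within-bin information loss and cutpoint sensitivity that the binned coding trades for flexibility.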

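Intuition: the information lost by discretization

Continuous → Categorical:

Information is lost (within-bin variation is ignored).
But interpretation is simple (a reference group plus one effect per group).
Natural when the bins match a clinical cutoff (e.g. 65 years = "elderly").

Linear vs Binning:

Linear: efficient, but a strong assumption.
Binning: weaker assumption, at a cost in efficiency.

Checking:

Use martingale residuals (Ch.11) to check the functional form.
If the "residual vs covariate" plot shows a pattern → a nonlinear term is needed.

Klein's Ch.8 uses the § 1.7 data to demonstrate the methodology for discretizing age.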

2.5 Where Klein's book uses this dataset

Chapter	Use of this dataset
Ch.6	Kernel hazard smoothing: effect of bandwidth and kernel choice
Ch.8	Methodology for discretizing a continuous covariate (age)

→ The large sample is what makes both analyses possible.

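3 § 1.8 Laryngeal Cancer Death (Kardaun 1983)

3.1 Medical background: the TNM classification

3.1.1 Cancer of the larynx

Location: the larynx (voice box), in the throat.
Risk factors: smoking, alcohol.
Treatment: radiation, surgery, chemotherapy.
Prognosis: depends strongly on stage and patient age.

3.1.2 TNM classification (American Joint Committee 1972)

T (Tumor): size and invasion of the primary tumor.
- \(T_1\): small, confined to one site.
- \(T_2\): invasion of adjacent sites.
- \(T_3\): vocal cord fixation.
- \(T_4\): extension beyond the larynx.
N (Nodes): lymph node involvement.
- \(N_0\): none.
- \(N_1\): a single lymph node.
M (Metastasis): distant metastasis.
- \(M_0\): none.

3.1.3 Stage classification (4 stages)

Stage	TNM combination	Patients	Meaning
I	\(T_1 N_0 M_0\)	33	Least severe
II	\(T_2 N_0 M_0\)	17	Somewhat advanced
III	\(T_3 N_0 M_0\) or \(T_x N_1 M_0\)	27	Lymph node involvement or large tumor
IV	All others (excluding TIS)	13	Most severe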

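Intuition: statistical implications of an ordinal stage

Nominal vs Ordinal:

Nominal (e.g. race): 4 groups, no ordering.
Ordinal (e.g. stage): 4 groups, ordered (I < II < III < IV).

How the tests differ:

K-sample test (\(K-1 = 3\) degrees of freedom): "is there any difference among the groups?"
Trend test (1 degree of freedom): "does the hazard increase with stage?"

The trend test is more powerful when the true alternative really is ordered, because it targets exactly that alternative.

The hypothesis that stages I~IV run from "low to high hazard" is itself information → 1 degree of freedom is enough.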

3.2 Where Klein's book uses this dataset

Chapter	Use of this dataset
Ch.7.4	Trend test (ordinal stage)
Ch.8	Global Cox test + local tests + ANOVA-style + age interaction
Ch.8	Linear combination contrasts
Ch.10	Additive hazards model
Ch.12	Parametric AFT (accelerated failure-time)
Ch.12.5	Log-logistic AFT + deviance residuals

→ A standard demonstration dataset for 6 tools (Ch.7, 8, 10, 12).

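3.3 The trend test in detail (Ch.7.4)

3.3.1 K-Sample Log-Rank

A quadratic form in each group's observed-minus-expected difference:

\[ Q = (O - E)^T V^{-1} (O - E) \sim \chi^2_{K-1} \]

Here \(O - E\) and its covariance matrix \(V\) are taken over any \(K-1\) of the groups, because the \(K\) differences sum to zero.
Tests for any difference among the groups.
Degrees of freedom \(K-1 = 3\).

3.3.2 Trend Log-Rank

Give stage \(j \in \{1, 2, 3, 4\}\) the score \(w_j = j\) (or another ordinal score):

\[ Z = \frac{\sum_j w_j (O_j - E_j)}{\sqrt{\sum_j \sum_k w_j w_k V_{jk}}} \sim N(0, 1) \]

The denominator is the standard deviation of the weighted sum, with \(V_{jk}\) the covariance of \(O_j - E_j\) and \(O_k - E_k\) over all \(K\) groups.
Tests for a monotone relationship between score and hazard.
Degrees of freedom 1 (\(Z^2 \sim \chi^2_1\)).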
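A from-scratch sketch of the trend statistic may help tie the symbols together. It accumulates \(O_j\), \(E_j\) and the hypergeometric covariance at each event time, then forms the weighted sum over its standard deviation. The 16 observations are a made-up toy dataset (not the larynx data), and `logrank_components` and `trend_z` are hypothetical helper names.

```python
import math

def logrank_components(times, events, groups, K):
    """Observed counts O, expected counts E, and covariance matrix V of O - E."""
    O = [0.0] * K
    E = [0.0] * K
    V = [[0.0] * K for _ in range(K)]
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = [0] * K
        deaths = [0] * K
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:
                at_risk[gi] += 1
                if ti == t and ei == 1:
                    deaths[gi] += 1
        n, d = sum(at_risk), sum(deaths)
        for j in range(K):
            O[j] += deaths[j]
            E[j] += d * at_risk[j] / n
        if n > 1:
            c = d * (n - d) / (n - 1)        # hypergeometric variance factor
            for j in range(K):
                for k in range(K):
                    pj, pk = at_risk[j] / n, at_risk[k] / n
                    V[j][k] += c * pj * ((1.0 if j == k else 0.0) - pk)
    return O, E, V

def trend_z(O, E, V, scores):
    """Z = sum_j w_j (O_j - E_j) / sqrt(sum_jk w_j w_k V_jk)."""
    K = len(scores)
    num = sum(scores[j] * (O[j] - E[j]) for j in range(K))
    var = sum(scores[j] * scores[k] * V[j][k] for j in range(K) for k in range(K))
    return num / math.sqrt(var)

# Toy data (assumption): 4 patients per stage, later stages die earlier.
times = [9, 8, 7, 6,  8, 6, 5, 4,  6, 4, 3, 2,  4, 3, 2, 1]
events = [0, 0, 1, 0,  0, 1, 1, 0,  1, 1, 1, 0,  1, 1, 1, 1]
groups = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4      # 0 = stage I ... 3 = stage IV

O, E, V = logrank_components(times, events, groups, K=4)
z = trend_z(O, E, V, scores=[1, 2, 3, 4])
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print("O - E by stage:", [round(o - e, 3) for o, e in zip(O, E)])
print("trend Z:", round(z, 3), "two-sided p:", round(p_two_sided, 4))
```

A positive Z means the higher-scored stages had more deaths than expected under the null. Changing `scores` (for example to unequal spacing) changes which ordered alternative the test is tuned to.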

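Intuition: comparing power

On the same data (illustrative numbers, not computed from the larynx data):

K-sample: \(\chi^2_3\), p = 0.01.
Trend: \(\chi^2_1\), p = 0.001.

The trend test gives the smaller p-value → more powerful in this situation.

Why:

The trend test uses the specific ordering hypothesis.
The K-sample test asks only "is there a difference, in any pattern" and so spreads its power over 3 degrees of freedom.

Conditions:

The stage is genuinely ordinal (a clear ordering).
The hazard is monotone in stage.

If, say, stage III had a higher hazard than stage IV (a reversal), the trend test could be weaker than the K-sample test.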
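The degrees-of-freedom effect can be checked with a few lines of arithmetic. When the group differences lie almost entirely along the scored trend, the K-sample statistic and the squared trend statistic are of similar size, but they are referred to different chi-square distributions. The sketch below uses the closed-form tail probabilities for 1 and 3 degrees of freedom; the statistic value 8.0 is an arbitrary illustration.

```python
import math

def chi2_sf_df1(x):
    """P(chi-square with 1 df > x)."""
    return math.erfc(math.sqrt(x / 2))

def chi2_sf_df3(x):
    """P(chi-square with 3 df > x), closed form for odd degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

statistic = 8.0   # hypothetical value shared by both tests
p_trend = chi2_sf_df1(statistic)
p_ksample = chi2_sf_df3(statistic)
print(f"statistic = {statistic}: p (1 df) = {p_trend:.4f}, p (3 df) = {p_ksample:.4f}")
```

The same statistic is roughly ten times more significant on 1 degree of freedom than on 3. If the true pattern is not monotone, the trend statistic itself shrinks and this advantage disappears.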

3.4 ANOVA-Style Cox + Age Interaction (Ch.8)

3.4.1 Global Test

Tests the overall stage effect:

\[ H_0: \beta_{II} = \beta_{III} = \beta_{IV} = 0 \text{ (stage I as reference)} \]

→ Likelihood ratio test or Wald test.

3.4.2 Local Tests

Test the effect of each stage individually.

\[ H_0^{(j)}: \beta_j = 0 \]

3.4.3 Age Interaction

\[ h(t \mid Z) = h_0(t) \exp(\beta_{\text{stage}} \cdot \text{stage} + \beta_{\text{age}} \cdot \text{age} + \beta_{\text{int}} \cdot \text{stage} \times \text{age}) \]

Intuition: what a stage × age interaction means

Possible interactions:

The age effect in stage IV exceeds the age effect in stage I = "older patients face a particularly large increase in risk at stage IV".
Or the reverse = "older stage I patients are at nearly the same risk as stage IV patients", which would point to different biology.

Clinical meaning:

Consider age and stage together when deciding on treatment.
A single age effect does not apply equally to every stage.

Testing:

Likelihood ratio test on \(\beta_{\text{int}}\).
If significant → report a separate age slope for each stage.

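3.5 Linear Combination Contrasts

Testing a specific hypothesis:

\[ H_0: \beta_{IV} - \beta_{II} = 0 \quad \text{(stage IV equals stage II)} \]

Or:

\[ H_0: \beta_{II} + \beta_{III} - 2 \beta_{I} = 0 \quad \text{(the average of II and III equals the stage I log hazard)} \]

With stage I as the reference, \(\beta_I = 0\), so the second hypothesis reduces to \(\beta_{II} + \beta_{III} = 0\).

Test: a Wald test, referred to \(\chi^2_1\) under \(H_0\).

\[ W = \frac{(c^T \widehat\beta)^2}{c^T \widehat{\text{Var}}(\widehat\beta) c} \]

\(c\) = contrast vector.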
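The Wald contrast is just two quadratic-form evaluations, as this sketch shows for the stage IV versus stage II hypothesis. The coefficient vector and covariance matrix are hypothetical round numbers chosen to make the arithmetic easy to follow; they are not estimates from the larynx data.

```python
import math

def wald_contrast(beta, cov, c):
    """Return (c'beta, c'Vc, W) for a single linear contrast."""
    k = len(c)
    estimate = sum(c[i] * beta[i] for i in range(k))
    variance = sum(c[i] * cov[i][j] * c[j] for i in range(k) for j in range(k))
    return estimate, variance, estimate ** 2 / variance

# Hypothetical log hazard ratios vs stage I, ordered (II, III, IV).
beta = [0.10, 0.60, 1.70]
cov = [[0.21, 0.07, 0.07],
       [0.07, 0.13, 0.07],
       [0.07, 0.07, 0.18]]
c = [-1, 0, 1]                      # beta_IV - beta_II

estimate, variance, W = wald_contrast(beta, cov, c)
p_value = math.erfc(math.sqrt(W / 2))   # upper tail of chi-square with 1 df
print(f"estimate = {estimate:.2f}, variance = {variance:.2f}, W = {W:.2f}, p = {p_value:.4f}")
print(f"hazard ratio IV vs II = {math.exp(estimate):.2f}")
```

Note that the variance of the contrast uses the covariance between the two coefficients (here 0.21 + 0.18 - 2 × 0.07 = 0.25), which is why the contrast cannot be read off the two individual standard errors alone.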

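3.6 Parametric AFT (Ch.12)

3.6.1 Cox PH vs AFT

Cox PH:

\[ h(t \mid Z) = h_0(t) \exp(\beta Z) \]

Hazards are proportional.
\(h_0\) is nonparametric.

AFT (Accelerated Failure-Time):

\[ \log T_i = \alpha + \beta Z_i + \sigma W_i \]

The log of \(T\) follows a linear regression.
The distribution of \(W\) is parametric (extreme value, normal, logistic).
\(\sigma\) = scale parameter.

Intuition: when AFT is the natural choice

Cox interpretation: "Z multiplies the hazard by X" (a statement about the rate).

AFT interpretation: "Z multiplies the lifetime by X" (the time scale is stretched or compressed).

In equation form, consistent with the regression above:

\[ T(Z) = T(0) \cdot \exp(\beta Z) \]

\(\beta < 0\): the lifetime shortens as \(Z\) grows; \(\beta > 0\): it lengthens.
Equivalently \(S(t \mid Z) = S_0\bigl(t \exp(-\beta Z)\bigr)\), so the "acceleration factor" is \(\exp(-\beta Z)\).

Natural applications:

Engineering reliability: higher temperature shortens lifetime (Arrhenius equation).
Medical stage: a stage IV lifetime is 1/X times a stage I lifetime.

This dataset (laryngeal): the stage effect is naturally read as a shortening of lifetime.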
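The time-scaling reading of the regression form \(\log T = \alpha + \beta Z + \sigma W\) can be checked by simulation: the ratio of median lifetimes between \(Z = 1\) and \(Z = 0\) should be close to \(\exp(\beta)\). This sketch draws \(W\) from a standard logistic distribution (so \(T\) is log-logistic); the parameter values are arbitrary illustrations, and `simulate_aft` is a hypothetical helper.

```python
import math
import random
import statistics

def simulate_aft(z, n, alpha=2.0, beta=-0.7, sigma=0.5, seed=0):
    """Draw n lifetimes from log T = alpha + beta * z + sigma * W, W ~ Logistic(0, 1)."""
    rng = random.Random(seed)
    lifetimes = []
    for _ in range(n):
        u = rng.uniform(1e-12, 1 - 1e-12)
        w = math.log(u / (1 - u))               # inverse CDF of the standard logistic
        lifetimes.append(math.exp(alpha + beta * z + sigma * w))
    return lifetimes

beta = -0.7
baseline = simulate_aft(z=0, n=20000, beta=beta, seed=1)
exposed = simulate_aft(z=1, n=20000, beta=beta, seed=2)

ratio = statistics.median(exposed) / statistics.median(baseline)
print(f"median ratio = {ratio:.3f}, exp(beta) = {math.exp(beta):.3f}")
print(f"acceleration factor exp(-beta) = {math.exp(-beta):.3f}")
```

With \(\beta = -0.7\) the exposed group's median lifetime is about half the baseline median, and the same factor applies to every other quantile. That whole-distribution rescaling is what distinguishes the AFT reading from a hazard ratio.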

3.7 Log-Logistic AFT + Deviance Residuals

3.7.1 The log-logistic distribution

The log of \(T\) follows a logistic distribution:

\[ \log T = \alpha + \beta Z + \sigma W, \quad W \sim \text{Logistic}(0, 1) \]

Properties:

The hazard can be non-monotone (rising early, then falling) when \(\sigma < 1\).
The median lifetime can be estimated directly.
A natural fit for cancer survival.
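The non-monotone hazard follows directly from the closed form. With no covariates, \(\lambda = e^{\alpha}\) and \(\kappa = 1/\sigma\), the survival function is \(S(t) = 1 / (1 + (t/\lambda)^{\kappa})\) and the hazard is \(h(t) = (\kappa/\lambda)(t/\lambda)^{\kappa-1} / (1 + (t/\lambda)^{\kappa})\). The sketch below evaluates it for two illustrative values of \(\sigma\); the parameter values are assumptions chosen for easy arithmetic.

```python
import math

def loglogistic_hazard(t, alpha=0.0, sigma=0.5):
    """Hazard of T when log T = alpha + sigma * W, W ~ Logistic(0, 1)."""
    lam, kappa = math.exp(alpha), 1.0 / sigma
    x = (t / lam) ** kappa
    return (kappa / t) * x / (1.0 + x)

def hazard_peak(alpha=0.0, sigma=0.5):
    """Time of maximum hazard; defined only when kappa = 1 / sigma > 1."""
    lam, kappa = math.exp(alpha), 1.0 / sigma
    return lam * (kappa - 1.0) ** (1.0 / kappa)

grid = [0.25, 0.5, 1.0, 2.0, 4.0]
hump = [loglogistic_hazard(t, sigma=0.5) for t in grid]      # kappa = 2
falling = [loglogistic_hazard(t, sigma=2.0) for t in grid]   # kappa = 0.5

print("sigma = 0.5:", [round(h, 3) for h in hump], "peak at t =", hazard_peak(sigma=0.5))
print("sigma = 2.0:", [round(h, 3) for h in falling])
```

For \(\sigma = 0.5\) the hazard rises to a peak at \(t = \lambda\) and then declines; for \(\sigma = 2\) it falls from the start. A Weibull model cannot produce the first shape, which is one reason the log-logistic is often considered for cancer data where risk peaks some time after diagnosis.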

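3.7.2 Deviance Residuals

A transformation of the martingale residual that makes it closer to symmetric:

\[ r_i^{\text{deviance}} = \text{sign}(r_i^{\text{martingale}}) \sqrt{-2 \bigl[ r_i^{\text{martingale}} + \delta_i \log(\delta_i - r_i^{\text{martingale}}) \bigr]} \]

Here \(\delta_i\) is the event indicator, and the term \(\delta_i \log(\cdot)\) is taken as 0 for censored observations.
Small absolute value → the model fits that observation.
Large absolute value → candidate outlier.
The distribution is approximately standard normal.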
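The transformation is short enough to code directly. This sketch uses the form sign(r) · sqrt(-2 [r + δ log(δ - r)]), with the log term set to 0 when δ = 0, and assumes the martingale residual is strictly below 1 for events (true whenever the fitted cumulative hazard is positive). `deviance_residual` is a hypothetical helper, not a library function.

```python
import math

def deviance_residual(martingale, delta):
    """Deviance residual from a martingale residual and an event indicator (0 or 1)."""
    log_term = math.log(delta - martingale) if delta == 1 else 0.0
    inner = -2.0 * (martingale + delta * log_term)
    sign = (martingale > 0) - (martingale < 0)
    return sign * math.sqrt(max(inner, 0.0))    # guard against tiny negative rounding

# Martingale residuals are bounded above by 1 but unbounded below (skewed left).
examples = [(0.9, 1), (0.5, 1), (0.0, 1), (-0.5, 0), (-2.0, 1)]
for r, d in examples:
    print(f"martingale = {r:5.2f}, delta = {d}: deviance = {deviance_residual(r, d):6.3f}")
```

An event with martingale residual 0.9 (died much earlier than the model expected) maps to a deviance residual well above 0.9, while large negative martingale residuals are pulled in toward zero. This stretching and shrinking is what makes the result more symmetric.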

3.8 Historical significance

Kardaun's (1983) 90-patient dataset has become a standard in survival textbooks.
A model example of a prognostic model built on TNM stage + age.
Part of the statistical basis for stage-stratified treatment in modern oncology.

4 R + Python EDA: Kidney Transplant data

4.1 R: `survival` + `muhaz` (kernel)

library(survival)
library(survminer)
library(muhaz)

# Kidney transplant data (simulated placeholder, NOT the OSU data):
# group sizes and crude death proportions follow the table in section 2.2
set.seed(42)
n <- 863
groups <- c(rep("WM", 432), rep("BM", 92), rep("WF", 280), rep("BF", 59))
event_rates <- c(WM = 0.169, BM = 0.152, WF = 0.139, BF = 0.237)

kidney <- data.frame(
  group = groups,
  age = pmin(74.5, pmax(0.79, rnorm(n, 42.8, 15))),  # clipped to 9.5 months ~ 74.5 years
  time = rexp(n, rate = 0.05),
  status = rbinom(n, 1, event_rates[groups])          # event indicator, drawn independently of time
)

# KM by group
fit <- survfit(Surv(time, status) ~ group, data = kidney)
ggsurvplot(fit, data = kidney, pval = TRUE,
           xlab = "Years", ylab = "Survival probability",
           legend.title = "Group")

# Kernel hazard estimate (Ch.6)
# Overall hazard, locally varying bandwidth
hazard_est <- muhaz(kidney$time, kidney$status,
                    bw.method = "local", bw.smooth = 1)
plot(hazard_est, xlab = "Years", ylab = "Hazard rate")

# Effect of changing the bandwidth:
# with bw.method = "global", a single value in bw.grid fixes the bandwidth
par(mfrow = c(1, 3))
for (bw in c(0.5, 1.0, 2.0)) {
  h <- muhaz(kidney$time, kidney$status, bw.grid = bw, bw.method = "global")
  plot(h, main = paste("bw =", bw), xlab = "Years", ylab = "Hazard")
}
par(mfrow = c(1, 1))

# Discretizing continuous age (Ch.8)
kidney$age_group <- cut(kidney$age,
                        breaks = c(-Inf, 30, 50, 65, Inf),
                        labels = c("<30", "30-50", "50-65", "65+"))
fit_age <- survfit(Surv(time, status) ~ age_group, data = kidney)
ggsurvplot(fit_age, data = kidney, pval = TRUE,
           xlab = "Years", ylab = "Survival",
           legend.title = "Age group")

# Cox with age (linear)
cox_lin <- coxph(Surv(time, status) ~ group + age, data = kidney)

# Cox with age (binned)
cox_bin <- coxph(Surv(time, status) ~ group + age_group, data = kidney)

# Cox with age (penalized spline)
cox_spline <- coxph(Surv(time, status) ~ group + pspline(age, df = 3),
                    data = kidney)

# Compare via AIC (all three use the same partial likelihood, so they are comparable)
AIC(cox_lin, cox_bin, cox_spline)

4.2 Python: `lifelines`

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter, CoxPHFitter, NelsonAalenFitter

# Data (simulated placeholder, NOT the OSU data)
rng = np.random.default_rng(42)
n_per_group = {"WM": 432, "BM": 92, "WF": 280, "BF": 59}
event_rates = {"WM": 0.169, "BM": 0.152, "WF": 0.139, "BF": 0.237}

groups, ages, times, statuses = [], [], [], []
for g, n in n_per_group.items():
    age = np.clip(rng.normal(42.8, 15, n), 0.79, 74.5)
    t = rng.exponential(scale=20, size=n)
    s = rng.binomial(1, event_rates[g], n)
    groups.extend([g] * n)
    ages.extend(age)
    times.extend(t)
    statuses.extend(s)

kidney = pd.DataFrame({"group": groups, "age": ages,
                       "time": times, "status": statuses})

# KM by group
fig, ax = plt.subplots(figsize=(9, 6))
for grp, color in zip(["WM", "BM", "WF", "BF"], ["blue", "green", "red", "orange"]):
    sub = kidney[kidney["group"] == grp]
    kmf = KaplanMeierFitter()
    kmf.fit(sub["time"], sub["status"], label=grp)
    kmf.plot_survival_function(ax=ax, color=color)
ax.set_xlabel("Years")
ax.set_ylabel("Survival probability")
plt.tight_layout()

# Kernel hazard via Nelson-Aalen + smoothing
naf = NelsonAalenFitter()
naf.fit(kidney["time"], kidney["status"])
hazard_smooth = naf.smoothed_hazard_(bandwidth=1.0)
plt.figure()
naf.plot_hazard(bandwidth=1.0)
plt.xlabel("Years")
plt.ylabel("Hazard rate")
plt.title("Smoothed hazard (bandwidth=1.0)")

# Cox with continuous age; the formula interface dummy-codes the string column `group`
cph = CoxPHFitter()
cph.fit(kidney, duration_col="time", event_col="status",
        formula="group + age")
print(cph.summary)

5 R + Python EDA: Laryngeal Cancer data

5.1 R: loading the data directly

library(survival)
library(survminer)

# Kardaun (1983) data as distributed in the KMsurv package
library(KMsurv)
data(larynx)
str(larynx)
# stage, time, age, diagyr, delta

# KM by stage
fit <- survfit(Surv(time, delta) ~ stage, data = larynx)
ggsurvplot(fit, data = larynx, pval = TRUE,
           xlab = "Years", ylab = "Survival",
           legend.title = "Stage")

# K-sample log-rank
survdiff(Surv(time, delta) ~ stage, data = larynx)

# Trend test (ordinal score 1..4):
# the score test from a Cox model with stage entered as a single numeric score
# is a log-rank-type test for linear trend on 1 degree of freedom
larynx$stage_score <- as.numeric(larynx$stage)
cox_trend <- coxph(Surv(time, delta) ~ stage_score, data = larynx)
summary(cox_trend)$sctest

# Cox PH (stage as factor)
cox_fit <- coxph(Surv(time, delta) ~ factor(stage) + age, data = larynx)
summary(cox_fit)

# ANOVA-style decomposition
anova(cox_fit, test = "Chisq")

# Stage × age interaction
cox_int <- coxph(Surv(time, delta) ~ factor(stage) * age, data = larynx)
anova(cox_fit, cox_int, test = "Chisq")  # interaction LRT

# Linear combination contrast: stage IV - stage II
# coefficient order in cox_fit: factor(stage)2, factor(stage)3, factor(stage)4, age
library(multcomp)
K <- matrix(c(-1, 0, 1, 0), nrow = 1,
            dimnames = list("IV - II", names(coef(cox_fit))))
contrast <- glht(cox_fit, linfct = K)
summary(contrast)

# Parametric AFT (Ch.12)
aft_logn <- survreg(Surv(time, delta) ~ factor(stage) + age,
                    data = larynx, dist = "lognormal")
summary(aft_logn)

aft_loglog <- survreg(Surv(time, delta) ~ factor(stage) + age,
                      data = larynx, dist = "loglogistic")
summary(aft_loglog)

# AIC comparison between the parametric models only:
# the Cox AIC is based on the partial likelihood and is not on the same scale
AIC(aft_logn, aft_loglog)

# Deviance residuals (Ch.12.5)
deviance_resid <- residuals(aft_loglog, type = "deviance")
plot(deviance_resid, ylab = "Deviance residual", main = "Log-logistic AFT")
abline(h = c(-2, 2), col = "red", lty = 2)

5.2 Python: `lifelines`

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines import WeibullAFTFitter, LogLogisticAFTFitter, LogNormalAFTFitter
from lifelines.statistics import multivariate_logrank_test

rng = np.random.default_rng(42)

# Larynx data: simulated placeholder with the same stage counts as KMsurv::larynx.
# Times, event indicators, and ages are random, so the results below are NOT Kardaun's.
larynx = pd.DataFrame({
    "stage": [1]*33 + [2]*17 + [3]*27 + [4]*13,
    "time": rng.uniform(0.5, 10, 90),
    "delta": rng.binomial(1, 0.5, 90),
    "age": rng.normal(64, 10, 90),
})

# KM by stage
fig, ax = plt.subplots(figsize=(9, 6))
for s in [1, 2, 3, 4]:
    sub = larynx[larynx["stage"] == s]
    kmf = KaplanMeierFitter()
    kmf.fit(sub["time"], sub["delta"], label=f"Stage {s}")
    kmf.plot_survival_function(ax=ax)
ax.set_xlabel("Years")
ax.set_ylabel("Survival")
plt.tight_layout()

# K-sample log-rank
result = multivariate_logrank_test(larynx["time"], larynx["stage"], larynx["delta"])
print(f"K-sample log-rank: chi^2 = {result.test_statistic:.3f}, p = {result.p_value:.4f}")

# Cox with stage as dummies (stage 1 as reference) + age
larynx_dummy = pd.get_dummies(larynx, columns=["stage"], drop_first=True, dtype=float)
cph = CoxPHFitter()
cph.fit(larynx_dummy, duration_col="time", event_col="delta")
print(cph.summary)

# Parametric AFT: log-logistic
llaft = LogLogisticAFTFitter()
llaft.fit(larynx_dummy, duration_col="time", event_col="delta")
print(llaft.summary)

# Log-normal AFT
lnaft = LogNormalAFTFitter()
lnaft.fit(larynx_dummy, duration_col="time", event_col="delta")
print(lnaft.summary)

# AIC comparison: the two AFT values are directly comparable;
# the Cox value is a partial-likelihood AIC and is shown for reference only
print(f"Cox partial AIC: {cph.AIC_partial_:.1f}")
print(f"Log-logistic AIC: {llaft.AIC_:.1f}")
print(f"Log-normal AIC: {lnaft.AIC_:.1f}")

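6 Contrasting the two datasets

Aspect	§ 1.7 Kidney	§ 1.8 Laryngeal
n	863	90
Number of groups	4 (race × gender)	4 (TNM stage I~IV)
Group type	Nominal	Ordinal
Event	Death	Death
Key covariate	Age (continuous)	Stage (ordinal) + age
Statistical challenge	Kernel smoothing + continuous discretization	Trend test + AFT + interaction
Used in Klein	Ch.6, 8	Ch.7.4, 8, 10, 12

Intuition: the effect of sample size

§ 1.7 (n = 863):

Large data → kernel smoothing is feasible.
Multivariable Cox with continuous age is estimated precisely.
Subgroup analysis (race × gender) is feasible.

§ 1.8 (n = 90):

Small data → smoothing gives rough results.
But the ordinal structure adds information → the trend test is efficient.
A parametric AFT model tends to be more stable than nonparametric estimates (when n is small).

Rule of thumb:

Large n → nonparametric + flexible (kernel, spline).
Small n → parametric + parsimonious (AFT, ordinal score).

This is the same flexibility-versus-parsimony trade-off found throughout statistics.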

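7 Key intuitions in one place

Large sample (863) = kernel hazard smoothing becomes feasible.
Continuous covariate (age) = a trade-off among linear, binning, and spline.
Ordinal stage = the trend test is more powerful than the K-sample test when the hazard is monotone in stage.
Stage × age interaction = a clinically meaningful effect modifier.
Linear combination contrast = a precise test of a specific hypothesis.
AFT vs Cox = the lifetime interpretation (acceleration factor) can be more natural.
Log-logistic = can represent a non-monotone hazard.
Deviance residuals = a diagnostic tool for AFT models.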

8 Practical checklist: § 1.7~1.8

§ 1.7 Kidney Transplant

Recognize what the large sample (863) makes analytically possible.
Understand the death-rate pattern across the 4 groups (race × gender).
Kernel hazard estimate (Ch.6): choice of bandwidth and kernel.
Compare ways of modeling continuous age (linear / binning / spline).
Check the functional form with martingale residuals.

§ 1.8 Laryngeal Cancer

Recognize the ordinal nature of TNM stage.
Trend test (Ch.7.4): why it can be more powerful than the K-sample test.
Global Cox + local tests + ANOVA-style (Ch.8).
Test the stage × age interaction (LRT).
Use linear combination contrasts for specific hypotheses.
Fit a parametric AFT model (log-logistic) (Ch.12).
Diagnose the model with deviance residuals (Ch.12.5).

EDA

KM by group + log-rank.
Hazard estimate (kernel smoothing).
Cox + AFT + AIC comparison.
Residual diagnostics (martingale, deviance, score).

Next steps

§ 1.9 (Auto vs Allo BMT): the standard example for Cox diagnostics.
§ 1.10 (Lymphoma BMT): Karnofsky score and functional form.

9 Related topics

Klein series

Ch.1 Overview
§ 1.1~1.2: Introduction · Leukemia
§ 1.3~1.4: BMT · Dialysis
§ 1.5~1.6: Breast Cancer · Burn
(next) § 1.9~1.10 (planned): Auto/Allo BMT · Lymphoma BMT

10 References

Klein, J. P., & Moeschberger, M. L. (2003). Survival Analysis: Techniques for Censored and Truncated Data (2nd ed.), Ch.1 § 1.7~1.8. Springer.
Kardaun, O. (1983). Statistical Survival Analysis of Male Larynx-Cancer Patients - A Case Study. Statistica Neerlandica, 37(3), 103-125.
American Joint Committee for Cancer Staging (1972). Manual for Staging of Cancer. Chicago: AJCC.
Cox, D. R. (1972). Regression Models and Life-Tables. JRSS B, 34(2), 187-220.
Hess, K. R., Serachitopol, D. M., & Brown, B. W. (1999). Hazard Function Estimators: A Simulation Study. Statistics in Medicine, 18(22), 3075-3088.
Müller, H. G., & Wang, J. L. (1994). Hazard Rate Estimation under Random Censoring with Varying Kernels and Bandwidths. Biometrics, 50(1), 61-76.
Therneau, T. M., & Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model. Springer.
Kalbfleisch, J. D., & Prentice, R. L. (2002). The Statistical Analysis of Failure Time Data, 2nd ed. Wiley.
Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data, 2nd ed. Wiley. (Standard reference for AFT)
Cochran, W. G. (1954). Some Methods for Strengthening the Common \(\chi^2\) Tests. Biometrics, 10(4), 417-451. (Trend test)
Armitage, P. (1955). Tests for Linear Trends in Proportions and Frequencies. Biometrics, 11(3), 375-386.
Davidson-Pilon, C. (2019). lifelines: Survival Analysis in Python. JOSS, 4(40), 1317.
Pölsterl, S. (2020). scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn. JMLR, 21(212), 1-6.