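Kwangmin Kim - Klein § 3.1~3.2: Right Censoring (Six Forms)

1 Introduction: The Six Faces of Right Censoring

Nearly every case in the 19-dataset clinical catalog covered in the Ch.1 series involves right censoring. But “right censoring” is not a single phenomenon: depending on the sampling design, it falls into six forms. The likelihood is derived differently for each form, and the asymptotic properties of estimation and testing differ as well.

§ 3.1~3.2 in one line

“The essence of right censoring is the lower-bound information ‘\(X > C_r\)’. How the likelihood is derived depends on whether \(C_r\) is fixed, random, or determined by the other subjects' event times (Type II).”

Form	What ends follow-up	Nature of \(C_r\)	Canonical example
Type I	Pre-specified time	Fixed, same for all	NCTR carcinogen experiment
Generalized Type I	Pre-specified end date + subject-specific entry	Fixed, differs by subject	§ 1.2 Leukemia, § 1.5 Breast
Progressive Type I	Multi-stage pre-specified sacrifice	Fixed, multi-stage	Mouse experiment with sacrifices at 42 and 104 weeks
Type II	Until the first \(r\) events	Random (\(X_{(r)}\))	Light-bulb reliability test
Progressive Type II	Multi-stage random sacrifice	Random	Animal-experiment variant
Random / Competing	Subject-specific random time	Random, distribution \(G\)	Dropout, § 1.3 BMT

This post treats the six forms precisely along four dimensions: definition, notation, likelihood, and clinical mapping.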

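2 § 3.1: Introduction

2.1 Where the seven sections sit

After the introduction (§ 3.1), the remaining sections of the seven in Ch.3 branch as follows:

§ 3.2 (second half of this post): Right censoring (the most common).
§ 3.3: Left and interval censoring.
§ 3.4: Truncation.
§ 3.5: A unified likelihood for all of the above.
§ 3.6: Counting process framework (Aalen 1975).
§ 3.7: Exercises.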

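2.2 Censoring vs Truncation: a one-line comparison

The essence of the two phenomena

	Censoring	Truncation
Subject	Included in the sample	Only those meeting a condition are sampled
Information	Partial information on the event time (lower/upper bound)	No information at all
Interpretation	“Follow-up stopped”	“Observability itself is conditional”
Likelihood	\(f\), \(S\), or an interval probability	Conditional distribution (adds a denominator)

Right censoring: “Patient A is still alive at week 12” gives the lower bound that the survival time is at least 12 weeks.

Right truncation: “Only those whose onset occurred by the 1986-06-30 sampling date are observed.” Anyone with a later onset is not in the sample, so there is no information on them at all.

This distinction is the fork in the § 3.5 likelihood: censoring terms vs truncation terms.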
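The contrast can be made concrete with a small simulation. The sketch below is only illustrative: the exponential rate, the cutoff, and the sample size are assumed values, not taken from Klein. Under right censoring every subject stays in the sample; under right truncation the late-onset subjects never appear.

```python
import random

random.seed(0)
n, rate, cutoff = 1000, 0.05, 30.0   # illustrative values, not from the source
x = [random.expovariate(rate) for _ in range(n)]

# Right censoring: every subject stays in the sample;
# late events are recorded as (cutoff, 0).
censored_sample = [(min(xi, cutoff), int(xi <= cutoff)) for xi in x]

# Right truncation: subjects with X > cutoff never enter the sample at all.
truncated_sample = [xi for xi in x if xi <= cutoff]

print(len(censored_sample), len(truncated_sample))
```

The censored sample still has \(n\) rows, some carrying only a lower bound. The truncated sample is shorter, and its size alone does not reveal how many subjects were lost.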

3 § 3.2: Six Forms of Right Censoring

3.1 Unified notation

Notation

True event time: \(X_i\), i.i.d. with density \(f\) and survival function \(S\).
Right censoring time: \(C_{r,i}\).
Observed time: \(T_i = \min(X_i, C_{r,i})\).
Censoring indicator: \(\delta_i = I(X_i \leq C_{r,i}) = \begin{cases} 1 & \text{event} \\ 0 & \text{right censored} \end{cases}\)
Observed data: \(\{(T_i, \delta_i)\}_{i=1}^n\).

This notation is shared by all six forms. The only difference is how \(C_{r,i}\) is determined.
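As a quick illustration of the notation, the following sketch maps hand-picked (made-up) event and censoring times to observed pairs \((T_i, \delta_i)\). The helper name `observe` is hypothetical.

```python
def observe(x, c):
    """Return the observed pair (T, delta) for event time x and censoring time c."""
    return min(x, c), int(x <= c)

event_times = [5.0, 12.0, 40.0]      # X_i, made-up values
censor_times = [10.0, 20.0, 30.0]    # C_{r,i}, made-up values
data = [observe(x, c) for x, c in zip(event_times, censor_times)]
print(data)  # [(5.0, 1), (12.0, 1), (30.0, 0)]
```

Only the third subject is censored: its event time (40) exceeds its censoring time (30), so the data record the lower bound 30.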

3.2 Form 1: Type I Censoring (pre-specified end time)

Definition

\(C_{r,i} = C_r\) (fixed, the same for every subject).
The experiment ends at time \(C_r\), and every subject still alive is sacrificed.

Example (Klein Example 3.1): the carcinogen animal experiment at the National Center for Toxicological Research (NCTR). 200 mice are given a carcinogen and followed up to a pre-specified sacrifice time. Every mouse still alive at that time is censored.

3.2.1 Deriving the likelihood

\(\delta = 0\) (\(X > C_r\)):

\[ \Pr[T = C_r, \delta = 0] = \Pr[X > C_r] = S(C_r) \]

\(\delta = 1\) (\(X \leq C_r\), exact time observed), with the left-hand side read as a density in \(t\):

\[ \Pr[T = t, \delta = 1] = \Pr[X = t, X \leq C_r] = f(t) \]

Combined expression:

\[ \Pr[t, \delta] = [f(t)]^\delta [S(t)]^{1-\delta} \]

Full likelihood (Eq. 3.5.3), where the second equality uses \(f = hS\) and \(S = \exp(-H)\):

\[ L_I = \prod_{i=1}^n [f(t_i)]^{\delta_i} [S(t_i)]^{1-\delta_i} = \prod_{i=1}^n [h(t_i)]^{\delta_i} \exp[-H(t_i)] \]
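The equality of the density form and the hazard form can be checked numerically. The sketch below assumes a Weibull model with \(S(t) = \exp(-\lambda t^\alpha)\), chosen only for illustration; the data pairs and parameter values are made up.

```python
import math

def weibull_loglik_density_form(data, lam, alpha):
    """Sum of delta*log f(t) + (1-delta)*log S(t), with S(t) = exp(-lam * t**alpha)."""
    total = 0.0
    for t, delta in data:
        log_S = -lam * t ** alpha
        log_f = math.log(lam * alpha) + (alpha - 1) * math.log(t) + log_S
        total += delta * log_f + (1 - delta) * log_S
    return total

def weibull_loglik_hazard_form(data, lam, alpha):
    """Sum of delta*log h(t) - H(t), with h(t) = lam*alpha*t**(alpha-1), H(t) = lam*t**alpha."""
    total = 0.0
    for t, delta in data:
        log_h = math.log(lam * alpha) + (alpha - 1) * math.log(t)
        total += delta * log_h - lam * t ** alpha
    return total

data = [(3.0, 1), (7.5, 0), (12.0, 1), (30.0, 0)]   # made-up (t, delta) pairs
a = weibull_loglik_density_form(data, lam=0.02, alpha=1.3)
b = weibull_loglik_hazard_form(data, lam=0.02, alpha=1.3)
print(a, b)
```

The two values agree up to floating-point rounding, which is why software can work with whichever form is more convenient.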

Simple closed form for the exponential (Eq. 3.5.4)

Substituting \(f(x) = \lambda e^{-\lambda x}\) and \(S(x) = e^{-\lambda x}\):

\[ L_I = \lambda^r \exp[-\lambda S_T] \]

where

\(r = \sum \delta_i\): number of observed events.
\(S_T = \sum t_i\): total observed time (events and censored observations alike).

MLE: \(\hat{\lambda} = r / S_T\), i.e. “number of events / total time at risk”. Intuitively, hazard = event intensity = (number of events) / (exposure time).

Asymptotic variance: \(\text{Var}(\hat{\lambda}) \approx \lambda^2 / r\). Precision is driven by the number of events \(r\), not by \(n\).
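A worked example of the closed form, as a minimal sketch: the function name `exp_mle` and the five data pairs are hypothetical, and the standard error plugs \(\hat{\lambda}\) into the approximation \(\lambda^2 / r\).

```python
import math

def exp_mle(data):
    """Return (lambda_hat, approx_se) for right-censored exponential data.

    Assumes at least one observed event (r > 0).
    """
    r = sum(delta for _, delta in data)       # number of observed events
    total_time = sum(t for t, _ in data)      # S_T: events and censored alike
    lam_hat = r / total_time
    se = lam_hat / math.sqrt(r)               # plug-in version of sqrt(lambda^2 / r)
    return lam_hat, se

data = [(4.0, 1), (10.0, 1), (30.0, 0), (16.0, 1), (30.0, 0)]   # made-up values
lam_hat, se = exp_mle(data)
print(lam_hat, se)
```

Here \(r = 3\) and \(S_T = 90\), so \(\hat{\lambda} = 1/30\). The two censored subjects add 60 weeks of exposure to the denominator but nothing to the numerator.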

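3.3 Form 2: Generalized Type I (subject-specific entry, common end date)

Definition

Patients enter the study at different times, but the end date is a common, pre-specified time.

Entry time of patient \(i\): \(E_i\) (calendar time).
End date: \(T^*\) (common to all).
Hence \(C_{r,i} = T^* - E_i\) (differs by subject, but pre-specified).

Mathematically: each \(C_{r,i}\) is fixed but subject-specific, so Eq. (3.5.3) applies unchanged.

Examples (Klein Figures 3.3, 3.4, 3.5):
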
§ 1.2 Leukemia (Freireich 1963): staggered patient entry, common termination date.
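§ 1.5 Breast cancer (Sedmak 1989): entry between 1955 and 1980, follow-up ends 1985-12-31.
§ 1.14 NLSY weaning: births 1980~1986, interview on 1986-06-01.
§ 1.15 Psychiatric (Woolson 1981): admissions 1935~1945.

3.3.1 Lexis Diagram (Keiding 1990)

Visualization

x-axis: calendar time (actual calendar dates).
y-axis: time on study (the subject's follow-up length).
45° line: each subject's passage through time.
A subject who reaches the end date (vertical line) is censored (open dot); one whose event occurs first is marked as an event (filled dot).

The diagram shows each patient's entry, event, and censoring timing at a glance.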
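The geometry of a Lexis lifeline follows directly from \(C_{r,i} = T^* - E_i\). The sketch below computes the two endpoints of each 45° segment; the function name `lexis_segment` and the three subjects are made up for illustration, and no plotting library is assumed.

```python
def lexis_segment(entry, event_time, study_end):
    """Return ((x0, y0), (x1, y1), delta) for one subject's 45-degree lifeline.

    x is calendar time, y is time on study.
    """
    c = study_end - entry                # C_{r,i} = T* - E_i
    t = min(event_time, c)               # observed follow-up T_i
    delta = int(event_time <= c)         # 1 = event (filled dot), 0 = censored (open dot)
    return (entry, 0.0), (entry + t, t), delta

subjects = [(0.0, 8.0), (5.0, 40.0), (11.0, 15.0)]   # (E_i, X_i), made-up values
segments = [lexis_segment(e, x, study_end=30.0) for e, x in subjects]
for seg in segments:
    print(seg)
```

The second subject's lifeline ends on the vertical line at calendar time 30 with \(\delta = 0\); the other two end before it with \(\delta = 1\).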

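3.3.2 Likelihood

Because each subject's \(C_{r,i}\) is fixed, substitute \(C_{r,i}\) for \(C_r\) in Eq. (3.5.3):

\[ L = \prod_{i=1}^n [f(t_i)]^{\delta_i} [S(t_i)]^{1-\delta_i} \]

(The subject-specific \(C_{r,i}\) does not appear explicitly in the likelihood; it enters only through \(T_i = \min(X_i, C_{r,i})\) and \(\delta_i\).)

Key assumption

For the Generalized Type I likelihood (3.5.3) to be valid, the entry time \(E_i\) must be unrelated to the event-time distribution \(f\).

If patients decide when to enter according to their own risk level (selection bias), \(E_i\) is informative and the standard likelihood is biased.

Checking: randomized controlled trials (RCTs) are typically taken to satisfy this. Observational studies need domain-specific verification.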

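3.4 Form 3: Progressive Type I (multi-stage pre-specified sacrifice)

Definition

At several pre-specified times \(C_r^{(1)} < C_r^{(2)} < \cdots < C_r^{(K)}\), a portion of the surviving subjects is sacrificed.

The number (or fraction) sacrificed at each stage is also pre-specified.
Example: \(n = 200\) mice, 100 of the survivors sacrificed at week 42, the rest at week 104.

Example (Klein Example 3.2 + Figure 3.2): a mouse tumor study. 4 dose levels × 2 sexes × 200 animals. Some are sacrificed at week 42 (early information), the rest at week 104. Both times are pre-specified.

3.4.1 Why multi-stage sacrifice is useful

Natural-history information on non-fatal disease

Necropsy of the mice sacrificed at week 42 reveals the distribution of tumors and lesions at that time. Progression information is obtained without waiting for death.

Cost savings (no need to wait until every animal dies).
Estimation of the distribution of non-fatal endpoints.
A natural source of multi-state models (Klein Ch.13).

3.4.2 Likelihood

Each sacrifice time is fixed, so this is a multi-stage generalization of Type I. Each subject has a fixed \(C_{r,i}\) equal to its scheduled sacrifice time, and the likelihood is essentially the same as Eq. (3.5.3).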
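A two-stage sacrifice schedule can be simulated in a few lines. This is a sketch under assumed values (exponential lifetimes, rate 0.02 per week, 50 animals sacrificed at week 42); animals for the first sacrifice are chosen at random among survivors, so the choice carries no information about their remaining lifetime.

```python
import random

random.seed(1)
n, rate = 200, 0.02            # illustrative values, not from Klein
x = [random.expovariate(rate) for _ in range(n)]

# Stage 1: at week 42, sacrifice up to 50 of the animals still alive.
alive_42 = [i for i in range(n) if x[i] > 42]
stage1 = set(random.sample(alive_42, min(50, len(alive_42))))

# Stage 2: everything still alive at week 104 is sacrificed.
c = [42.0 if i in stage1 else 104.0 for i in range(n)]
t = [min(xi, ci) for xi, ci in zip(x, c)]
delta = [int(xi <= ci) for xi, ci in zip(x, c)]

# Same f^delta * S^(1-delta) likelihood as Type I, so the same exponential MLE.
lam_hat = sum(delta) / sum(t)
print(sum(delta), round(lam_hat, 4))
```

Each animal ends up with one fixed censoring time (42 or 104), which is why no new likelihood is needed.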

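3.5 Form 4: Type II (until the first r events)

Definition

A pre-specified integer \(r < n\).
The experiment ends when the first \(r\) events have occurred.
Data: the first \(r\) order statistics \(X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(r)}\).
The remaining \(n - r\) subjects are all censored at time \(X_{(r)}\).
\(r\) is fixed; \(X_{(r)}\) is random.

Example: 100 light bulbs are tested simultaneously, and the test ends when the first 60 have burned out. This is a standard design in reliability engineering.

3.5.1 Likelihood (Eq. 3.5.7)

Joint density of the order statistics:

\[ L_{II,1} = \frac{n!}{(n-r)!} \left[\prod_{i=1}^r f(x_{(i)})\right] [S(x_{(r)})]^{n-r} \]

Intuition for the formula

\(\prod f(x_{(i)})\): the exact event times of the first \(r\) failures.
\([S(x_{(r)})]^{n-r}\): probability that the remaining \(n-r\) survive beyond \(x_{(r)}\).
\(n!/(n-r)!\): number of ordered ways to choose which subjects make up the first \(r\) failures.

Key point: this is proportional to the general form of Eq. (3.5.1). The constant \(n!/(n-r)!\) does not affect inference.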
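For the exponential model the Type II likelihood reduces, up to the constant \(n!/(n-r)!\), to \(\lambda^r \exp(-\lambda \cdot \text{TTT})\) with total time on test \(\text{TTT} = \sum_{i \le r} x_{(i)} + (n-r)\,x_{(r)}\). A minimal sketch with assumed values mirroring the light-bulb example:

```python
import random

random.seed(2)
n, r, rate = 100, 60, 0.05     # illustrative: 100 bulbs, stop at the 60th failure
x_sorted = sorted(random.expovariate(rate) for _ in range(n))

failures = x_sorted[:r]                      # x_(1) <= ... <= x_(r)
x_r = failures[-1]                           # random stopping time X_(r)
total_time_on_test = sum(failures) + (n - r) * x_r
lam_hat = r / total_time_on_test             # maximizes lambda^r * exp(-lambda * TTT)
print(round(x_r, 2), round(lam_hat, 4))
```

The estimator has the same "events / exposure" shape as under Type I. Here the numerator is fixed by design and the denominator is random.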

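3.5.2 Type I vs Type II

Differences between the two designs

Item	Type I	Type II
Stopping trigger	Pre-specified time	Occurrence of the first \(r\) events
Censoring time	Fixed (\(C_r\))	Random (\(X_{(r)}\))
Number of events	Random (\(r = \sum \delta_i\))	Fixed (\(r\))
Cost forecasting	Hard (number of events unknown)	Hard (duration unknown)
Mathematical treatment	Likelihood directly	Order-statistics theory
Application	Clinical trials (the majority)	Reliability engineering

Clinical trials are almost always Type I: the number of patients and the follow-up period are fixed in advance. Type II is common in reliability engineering, in designs that need “failure times of the first 60 units”.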
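The trade-off in the table (random event count vs random duration) shows up clearly under repeated sampling. The following sketch uses assumed design constants and 500 replications; it only summarizes the spread and makes no claim about any real study.

```python
import random
import statistics

random.seed(3)
n, rate, c_fixed, r_fixed = 100, 0.05, 30.0, 70   # illustrative values
type1_events, type2_durations = [], []

for _ in range(500):
    x = sorted(random.expovariate(rate) for _ in range(n))
    type1_events.append(sum(xi <= c_fixed for xi in x))   # random count, fixed time
    type2_durations.append(x[r_fixed - 1])                # fixed count, random time

mean_events = statistics.mean(type1_events)
sd_events = statistics.stdev(type1_events)
mean_duration = statistics.mean(type2_durations)
sd_duration = statistics.stdev(type2_durations)
print("Type I  events:   mean %.1f, sd %.1f" % (mean_events, sd_events))
print("Type II duration: mean %.1f, sd %.1f" % (mean_duration, sd_duration))
```

Under Type I the test always lasts 30 weeks but the number of failures varies from run to run. Under Type II exactly 70 failures are seen but the stopping time varies.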

3.6 Form 5: Progressive Type II (multi-stage random sacrifice)

Definition

A multi-stage generalization of Type II:

After the first \(r_1\) events, \(n_1 - r_1\) of the surviving subjects are sacrificed, leaving \(n - n_1\) on study.
After the next \(r_2\) events, \(n_2 - r_2\) are sacrificed.
… (a pre-specified sequence).

The \(r_i, n_i\) are fixed integers; the sacrifice times \(T_{(r_1)}, T_{(n_1 + r_2)}, \ldots\) are random.

3.6.1 Likelihood

Combine order statistics with truncation (Klein Theoretical Note 2):

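\[ L_{II,2} \propto \left[\prod_{i=1}^{r_1} f(x_{(i)})\right] [S(x_{(r_1)})]^{n_1 - r_1} \left[\prod_{i=1}^{r_2} f(x^*_{(i)})\right] [S(x^*_{(r_2)})]^{n - n_1 - r_2} \]

Here the \(x^*_{(i)}\) are the ordered failure times of the second stage, i.e. realizations from the distribution truncated at \(x_{(r_1)}\) (conditional on \(x \geq x_{(r_1)}\)). The exponents count censored subjects: \(n_1 - r_1\) removed at \(x_{(r_1)}\) and, for a two-stage design, \(n - n_1 - r_2\) still alive at \(x^*_{(r_2)}\).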
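A two-stage progressive Type II experiment can be simulated to see how the pieces of the likelihood map to counts. The sketch assumes exponential lifetimes and made-up design constants. For the exponential model the MLE is again events divided by total time on test, with \(n_1 - r_1\) subjects censored at the first stopping time and the remaining survivors censored at the second.

```python
import random

random.seed(4)
n, r1, n1, r2, rate = 100, 30, 50, 40, 0.05   # illustrative design constants
x = sorted(random.expovariate(rate) for _ in range(n))

stage1 = x[:r1]                      # first r1 failures
x_r1 = stage1[-1]                    # first (random) sacrifice time
survivors = x[r1:]
random.shuffle(survivors)            # removal is random among survivors
kept = sorted(survivors[n1 - r1:])   # n - n1 items stay on test
# The n1 - r1 removed items are censored at x_r1.

stage2 = kept[:r2]                   # next r2 failures
x_r2 = stage2[-1]                    # second (random) stopping time
n_end = n - n1 - r2                  # still alive when the test stops

total_time = sum(stage1) + (n1 - r1) * x_r1 + sum(stage2) + n_end * x_r2
lam_hat = (r1 + r2) / total_time
print(round(lam_hat, 4))
```

The bookkeeping adds up to \(n\): \(r_1\) failures, \(n_1 - r_1\) removals, \(r_2\) failures, and \(n - n_1 - r_2\) final survivors.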

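Use

In animal experiments this form achieves two goals at once: cost savings and intermediate information. It is also used in reliability engineering, in designs meant to detect changes in the residual-life distribution after a given fraction of failures.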

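3.7 Form 6: Random / Competing Risks Censoring (the most general)

Definition

Each subject has a random censoring time \(C_{r,i}\) with distribution \(G\) (density \(g\); below, \(G(t) = \Pr[C_r > t]\) denotes its survival function).
Assume \(X_i \perp C_{r,i}\) (a strong assumption).
Data: \(T_i = \min(X_i, C_{r,i})\), \(\delta_i\).

Examples where independence is reasonable:

Accidental death (not from natural causes).
Patient relocation.
Loss to follow-up for unrelated reasons.
Administrative end of study (e.g. funding ends).

Examples where independence is doubtful:

Dropout due to a deteriorating condition (informative).
Treatment discontinuation due to side effects.
Death from another cause (competing risks).

3.7.1 Likelihood (Klein Example 3.10)

Under \(X \perp C_r\), the likelihood of the observed \((t_i, \delta_i)\) is

\[ L = \prod_{i=1}^n [f(t_i) G(t_i)]^{\delta_i} [g(t_i) S(t_i)]^{1-\delta_i} \]

Factorization:

\[ L = \underbrace{\left\{\prod_i G(t_i)^{\delta_i} g(t_i)^{1-\delta_i}\right\}}_{\text{censoring distribution}} \times \underbrace{\left\{\prod_i f(t_i)^{\delta_i} S(t_i)^{1-\delta_i}\right\}}_{\text{part of interest}} \]

Definition of “non-informative censoring”

If \(G\) does not involve the parameters of the event distribution \(f\), the first factor is a constant and the likelihood reduces to Eq. (3.5.6):

\[ L \propto \prod_i [f(t_i)]^{\delta_i} [S(t_i)]^{1-\delta_i} \]

This has the same form as Eq. (3.5.3). In other words, non-informative random censoring is handled with the same inferential tools as Type I censoring.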
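The factorization can be verified numerically. The sketch below assumes exponential event times (rate \(\lambda\)) and exponential censoring times (rate \(\mu\)), both illustrative choices. It checks that the full log-likelihood equals the sum of the two parts and that the maximizer in \(\lambda\) does not depend on \(\mu\).

```python
import math
import random

random.seed(5)
n, lam_true, mu_true = 500, 0.05, 0.02     # illustrative values
x = [random.expovariate(lam_true) for _ in range(n)]
c = [random.expovariate(mu_true) for _ in range(n)]
t = [min(a, b) for a, b in zip(x, c)]
d = [int(a <= b) for a, b in zip(x, c)]

def full_loglik(lam, mu):
    # delta=1 contributes f(t)*G(t); delta=0 contributes g(t)*S(t)
    return sum(di * (math.log(lam) - lam * ti - mu * ti)
               + (1 - di) * (math.log(mu) - mu * ti - lam * ti)
               for ti, di in zip(t, d))

def event_part(lam):        # log of prod f^delta * S^(1-delta)
    return sum(di * math.log(lam) - lam * ti for ti, di in zip(t, d))

def censoring_part(mu):     # log of prod G^delta * g^(1-delta)
    return sum((1 - di) * math.log(mu) - mu * ti for ti, di in zip(t, d))

lam_hat = sum(d) / sum(t)   # maximizer of event_part, whatever mu is
print(full_loglik(0.04, 0.03), event_part(0.04) + censoring_part(0.03))
print(round(lam_hat, 4))
```

Because the censoring part is additive on the log scale and free of \(\lambda\), it drops out when differentiating with respect to \(\lambda\).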

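Conditions:

\(X \perp C_r\) (independence).
\(G\) does not depend on the parameters of \(f\).

If these conditions fail (informative censoring), the standard KM, Nelson-Aalen, and Cox estimators are all biased.

3.7.2 Independence cannot be verified

Same structure as Tsiatis (1975)

\(X \perp C_r\) is not identifiable from \((T, \delta)\) alone. For any dependent pair \((X, C_r)\), there always exists an independent model \((X', C_r')\) with the same cause-specific hazards.

Therefore:

The plausibility of independence can be judged only from domain knowledge (why did the subject drop out?).
Random accidents, relocation, administrative end: plausible.
Side effects, clinical deterioration, other causes: doubtful.

It cannot be checked from the observed data, the same structure as the identifiability dilemma in Klein § 2.7.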
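What is at stake can be illustrated with a deliberately extreme, hypothetical dropout mechanism: half of the subjects leave the study at 90% of their own event time. The censoring fraction is about the same as in an independent scenario, yet the naive exponential MLE is far off. All constants below are assumptions made for the illustration.

```python
import random

random.seed(6)
n, lam_true = 5000, 0.05
x = [random.expovariate(lam_true) for _ in range(n)]

# Independent censoring: C drawn without reference to X.
c_indep = [random.expovariate(0.05) for _ in range(n)]
t_ind = [min(a, b) for a, b in zip(x, c_indep)]
d_ind = [int(a <= b) for a, b in zip(x, c_indep)]

# Dependent censoring (hypothetical): half of the subjects drop out
# at 90% of their own event time, i.e. shortly before the event.
t_dep, d_dep = [], []
for xi in x:
    if random.random() < 0.5:
        t_dep.append(0.9 * xi)
        d_dep.append(0)
    else:
        t_dep.append(xi)
        d_dep.append(1)

lam_ind = sum(d_ind) / sum(t_ind)
lam_dep = sum(d_dep) / sum(t_dep)
print(round(lam_ind, 4), round(lam_dep, 4))
```

Under the dependent mechanism the estimate falls well below the true rate, because events are systematically replaced by censorings just before they happen. Both datasets have the same \((T, \delta)\) format, so judging which mechanism generated them takes knowledge of why subjects left.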

3.7.3 Random censoring combined with Type I

Real clinical trials are almost always a combination of Type I (pre-specified end) and random censoring (subject-level dropout):

Some patients are randomly censored by dropout.
The remaining surviving patients are Type I censored at the end date.
Eq. (3.5.6) applies to both cases (provided the random censoring is non-informative; the Type I times are fixed).

3.8 Dependent random censoring (Theoretical Note 3)

If \(X\) and \(C_r\) are dependent, with joint survival function \(S(x, c) = \Pr[X > x, C_r > c]\), the likelihood changes:

\[ L_{III} \propto \prod_i \left\{[-\partial S(x, t_i)/\partial x]_{x=t_i}\right\}^{\delta_i} \left\{[-\partial S(t_i, c)/\partial c]_{c=t_i}\right\}^{1-\delta_i} \]

This can differ substantially from Eq. (3.5.6). Analysis under informative censoring typically calls for inverse probability of censoring weighting (IPCW) or a sensitivity analysis.

4 Mapping the six forms to the 19 Ch.1 examples

Canonical mapping

Klein classification	Canonical Ch.1 case	Feature
Type I (simple)	NCTR carcinogen experiment (not among the Ch.1 examples)	Same end time for all subjects
Generalized Type I	§ 1.2 Leukemia, § 1.5 Breast, § 1.14 Weaning, § 1.15 Psychiatric	Subject-specific entry, common end date
Progressive Type I	Mouse experiment with sacrifices at 42 and 104 weeks	Multi-stage sacrifice
Type II	Light-bulb test in reliability engineering	First r events
Progressive Type II	Animal-experiment variant	Multi-stage random
Random / Competing	§ 1.3 BMT, § 1.7 Kidney, § 1.13 Pneumonia	Dropout, competing causes

Most clinical trials = Generalized Type I combined with random censoring.

4.1 Censoring forms in Klein's 19 examples

Censoring classification of the 19 examples

§ 1.2 Leukemia (Freireich 1963): Generalized Type I.
§ 1.3 BMT (Copelan 1991): Generalized Type I + competing risks.
§ 1.4 Dialysis (Nahman 1992): Generalized Type I.
§ 1.5 Breast cancer (Sedmak 1989): Generalized Type I.
§ 1.6 Burn (Ichida 1993): Generalized Type I.
§ 1.7 Kidney transplant (OSU): Random + Type I.
§ 1.8 Laryngeal (Kardaun 1983): Generalized Type I.
§ 1.9 Auto/Allo BMT (IBMTR): Generalized Type I + competing risks.
§ 1.10 Lymphoma BMT (Avalos 1993): Generalized Type I.
§ 1.11 Tongue (Sickle-Santanello 1988): Generalized Type I.
§ 1.12 STD (877 subjects): Generalized Type I.
§ 1.13 Pneumonia (NLSY): Generalized Type I + dropout (random).
§ 1.14 Weaning (NLSY): Generalized Type I.
§ 1.15 Psychiatric (Woolson 1981): Generalized Type I.
§ 1.16 Channing (Hyde 1980): Generalized Type I + left truncation (Ch.3 § 3.4).
§ 1.17 Marijuana (Turnbull-Weiss 1978): Doubly censored (Ch.3 § 3.3).
§ 1.18 Breast cosmetic (Beadle 1984): Interval censored (Ch.3 § 3.3).
§ 1.19 AIDS (Lagakos 1988): Right truncated (Ch.3 § 3.4).

The four cases that the six right-censoring forms alone cannot classify (§ 1.16~1.19) are what motivate Ch.3 § 3.3~3.4.

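5 R + Python: simulation comparison of the censoring schemes

5.1 Simulation setup

True distribution: Exponential(\(\lambda = 0.05\)/week), \(n = 100\).

Type I: \(C_r = 30\) weeks (same for all).
Generalized Type I: entry times \(E_i \sim U(0, 12)\), end date \(T^* = 30\), \(C_{r,i} = 30 - E_i\).
Type II: until the first \(r = 70\) events.
Random: \(C_{r,i} \sim\) Exp(rate \(= 0.02\), mean 50 weeks).

The R code additionally simulates Progressive Type I and a Random + Type I combination. Progressive Type II is not simulated.

5.2 R: simulating six schemes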

library(survival)
set.seed(42)

n <- 100
lambda_true <- 0.05
X <- rexp(n, rate = lambda_true)

# (1) Type I: C_r = 30
T1 <- pmin(X, 30)
delta1 <- as.integer(X <= 30)

# (2) Generalized Type I
E <- runif(n, 0, 12)
T_star <- 30
Cr_gen <- T_star - E  # subject-specific censoring time
T2 <- pmin(X, Cr_gen)
delta2 <- as.integer(X <= Cr_gen)

# (3) Progressive Type I: sacrifice 30 survivors at week 15, the rest at week 30
alive15 <- which(X > 15)
# Index into alive15 so a length-1 vector is not expanded by sample();
# min() guards against fewer than 30 survivors.
sacrifice_idx <- alive15[sample.int(length(alive15), min(30, length(alive15)))]
Cr_prog <- ifelse(seq_len(n) %in% sacrifice_idx, 15, 30)
T3 <- pmin(X, Cr_prog)
delta3 <- as.integer(X <= Cr_prog)

# (4) Type II: first 70 events
sorted_X <- sort(X)
T_r <- sorted_X[70]
T4 <- pmin(X, T_r)
delta4 <- as.integer(X <= T_r)

# (5) Random censoring (independent)
C_r <- rexp(n, rate = 0.02)
T5 <- pmin(X, C_r)
delta5 <- as.integer(X <= C_r)

# (6) Random + Type I combined
T6 <- pmin(X, C_r, 30)
delta6 <- as.integer(X <= pmin(C_r, 30))

# Exponential-model MLE for each of the six schemes
schemes <- list(
  TypeI = list(t = T1, d = delta1),
  GenTypeI = list(t = T2, d = delta2),
  ProgI = list(t = T3, d = delta3),
  TypeII = list(t = T4, d = delta4),
  Random = list(t = T5, d = delta5),
  Combined = list(t = T6, d = delta6)
)

results <- data.frame(
  scheme = names(schemes),
  n_events = sapply(schemes, function(x) sum(x$d)),
  total_time = sapply(schemes, function(x) sum(x$t)),
  lambda_hat = sapply(schemes, function(x) sum(x$d) / sum(x$t)),
  bias_pct = sapply(schemes, function(x)
    100 * (sum(x$d)/sum(x$t) - lambda_true) / lambda_true)
)
print(results)

# KM curve under Type I censoring
fit1 <- survfit(Surv(T1, delta1) ~ 1)
plot(fit1, lwd = 2, col = "red", xlab = "Weeks", ylab = "S(t)",
     main = "Type I censoring (C_r = 30)")
curve(exp(-lambda_true * x), 0, 50, add = TRUE, col = "black", lty = 2, lwd = 2)
legend("topright", c("KM", "True S"), col = c("red", "black"),
       lty = c(1, 2), lwd = 2)

5.3 Python: computing the likelihood directly

import numpy as np
import pandas as pd
from scipy.optimize import minimize_scalar
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

np.random.seed(42)
n = 100
lambda_true = 0.05
X = np.random.exponential(1/lambda_true, n)

# Negative log-likelihood (Eq. 3.5.3; shared by all schemes below, up to a constant)
def neg_loglik_exp(lam, t, delta):
    return -np.sum(delta * np.log(lam) - lam * t)

schemes = {}

# (1) Type I
schemes["Type I"] = (np.minimum(X, 30), (X <= 30).astype(int))

# (2) Generalized Type I
E = np.random.uniform(0, 12, n)
Cr_gen = 30 - E
schemes["Gen Type I"] = (np.minimum(X, Cr_gen), (X <= Cr_gen).astype(int))

# (3) Type II: first 70 events
sorted_X = np.sort(X)
T_r = sorted_X[69]  # r=70
schemes["Type II"] = (np.minimum(X, T_r), (X <= T_r).astype(int))

# (4) Random
C_r = np.random.exponential(1/0.02, n)
schemes["Random"] = (np.minimum(X, C_r), (X <= C_r).astype(int))

# MLE for each of the four schemes
results = []
for name, (t, d) in schemes.items():
    res = minimize_scalar(neg_loglik_exp, args=(t, d),
                          bounds=(0.001, 1), method="bounded")
    lam_hat = res.x
    results.append({
        "scheme": name,
        "n_events": int(d.sum()),
        "total_time": t.sum(),
        "lambda_hat": lam_hat,
        "bias_pct": 100 * (lam_hat - lambda_true) / lambda_true
    })
print(pd.DataFrame(results).to_string(index=False))

# KM comparison across the four schemes
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
ts = np.linspace(0, 60, 100)
true_S = np.exp(-lambda_true * ts)

for ax, (name, (t, d)) in zip(axes.flatten(), schemes.items()):
    kmf = KaplanMeierFitter().fit(t, d)
    kmf.plot_survival_function(ax=ax, color="red", ci_show=True)
    ax.plot(ts, true_S, "k--", label="True S(t)", linewidth=2)
    ax.set_title(f"{name}: events={d.sum()}, λ̂={results[list(schemes).index(name)]['lambda_hat']:.4f}")
    ax.set_xlabel("Time")
    ax.set_ylabel("S(t)")
    ax.legend()

plt.tight_layout()
plt.savefig("klein_3_2_right_censoring.png", dpi=100)

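Interpreting the results

Under every scheme the MLE is consistent (approximately unbiased in large samples), because the non-informative censoring assumption holds. The bias_pct computed from a single simulated dataset reflects sampling error, not bias.
The number of events \(r\) determines precision: for the same total time \(S_T\), a smaller \(r\) means a larger variance.
Type II fixes the number of events at exactly \(r = 70\), an advantage for reliability test design.
The Random + Type I combination is the most common form in clinical trials.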
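Bias is a property of the estimator over repeated samples, so it can only be assessed by replicating the simulation. The sketch below repeats the Type I and Random schemes 2000 times (an arbitrary choice) with the setup values from § 5.1 and summarizes \(\hat{\lambda}\) by its mean and standard deviation.

```python
import random
import statistics

random.seed(7)
n, lam_true, reps = 100, 0.05, 2000

def one_estimate(scheme):
    """Simulate one dataset under the given scheme and return r / S_T."""
    x = [random.expovariate(lam_true) for _ in range(n)]
    if scheme == "type1":
        c = [30.0] * n                                    # fixed C_r = 30
    else:
        c = [random.expovariate(0.02) for _ in range(n)]  # independent, mean 50
    t = [min(a, b) for a, b in zip(x, c)]
    d = [a <= b for a, b in zip(x, c)]
    return sum(d) / sum(t)

summary = {}
for scheme in ("type1", "random"):
    est = [one_estimate(scheme) for _ in range(reps)]
    summary[scheme] = (statistics.mean(est), statistics.stdev(est))
    print(scheme, summary[scheme])
```

The replicate means sit close to the true rate 0.05, and the standard deviations are roughly \(\lambda / \sqrt{r}\) with \(r\) in the 70s.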

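6 Pulling the intuition together: the unifying message of the six forms

Five key lessons

All six forms are special cases of Eq. (3.5.1): a censored observation contributes \(S(C_{r,i})\) and an event contributes \(f(t_i)\). This formula is the unified expression for all right censoring.
Non-informative assumption:
- Type I / Generalized / Progressive Type I: fixed \(C_r\), satisfied automatically.
- Type II / Progressive Type II: fixed \(r\), satisfied automatically.
- Random: requires independence and a \(G\) unrelated to \(f\); not verifiable.
The number of events \(r\) determines precision: \(\text{Var}(\hat{\lambda}) \approx \lambda^2/r\) for the exponential. What matters is the number of events, not the sample size \(n\).
Clinical trials = Generalized Type I + Random. Staggered patient entry and dropout occur together.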
Reliability engineering: Type II is common. The number of failures is fixed in advance, although the test duration is not.

7 Practical checklist: § 3.1~3.2

Identifying censoring

Distinguish right censoring from left/interval censoring.
Distinguish censoring from truncation.
Identify which of the six forms applies (Type I, Generalized Type I, Progressive Type I, Type II, Progressive Type II, Random).

Likelihood

Master formula \(L = \prod f^{\delta} S^{1-\delta}\) (Eq. 3.5.3).
Closed form for the exponential, \(L = \lambda^r e^{-\lambda S_T}\) (Eq. 3.5.4).
Order-statistics expression for Type II (Eq. 3.5.7).
For random censoring, the factorization that relies on independence and non-informativeness.

Checking assumptions

Non-informative censoring: not verifiable from data; rely on domain knowledge.
Visualize generalized Type I with a Lexis diagram.
Assess how informative dropout is.

Clinical mapping

Classify the censoring form of each of the 19 Ch.1 examples.
Recognize clinical trials as Generalized Type I combined with Random.

8 Related topics

Klein series

(Previous) Ch.3 overview
(Next) § 3.3~3.4 in depth: Left/Interval Censoring + Truncation (planned)
(Next) § 3.5~3.6 in depth: Likelihood + Counting Process (planned)


9 References

Klein, J. P., & Moeschberger, M. L. (2003). Survival Analysis: Techniques for Censored and Truncated Data (2nd ed.), Ch.3 § 3.1~3.2, pp. 63-70. Springer.
David, H. A., & Moeschberger, M. L. (1978). The Theory of Competing Risks. Griffin.
David, H. A. (1981). Order Statistics, 2nd ed. Wiley.
Keiding, N. (1990). Statistical inference in the Lexis diagram. Philosophical Transactions of the Royal Society A, 332, 487-509.
Tsiatis, A. (1975). A nonidentifiability aspect of the problem of competing risks. PNAS, 72(1), 20-22.
Robins, J. M., & Rotnitzky, A. (1992). Recovery of information and adjustment for dependent censoring using surrogate markers. In AIDS Epidemiology, Birkhäuser.
Andersen, P. K., Borgan, Ø., Gill, R. D., & Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer.
Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data, 2nd ed. Wiley.
Kalbfleisch, J. D., & Prentice, R. L. (2002). The Statistical Analysis of Failure Time Data, 2nd ed. Wiley.
Therneau, T. M., & Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model. Springer.