Kwangmin Kim - Spillover Detection

Source

This post is based on prior knowledge (textbooks not checked; based on the agent's pretraining). Key references: Hudgens & Halloran (2008), Aronow & Samii (2017), Eckles, Karrer, Ugander (2017), Athey, Eckles, Imbens (2018).

This is the final post in the J-SWITCH series. It covers the standard approaches to spillover detection: Aronow-Samii and Hudgens-Halloran.

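1 Intuition: “Measuring the effect on friends, too”

Spillover detection in one line: one user's treatment affects the behavior of that user's friends. The goal is to measure this indirect effect separately from the direct effect.

1.1 Example

A new Facebook feature is enabled for a subset of users. Enabled users post more themselves (direct effect). Their friends like and comment more (indirect effect, because the friend's posts have become more interesting).

A standard A/B test compares treated and control users who are both exposed to spillover, so it cannot separate the direct and indirect components, and its estimate generally differs from the effect of deploying to the entire population.

1.2 Analogy

Vaccines: getting vaccinated lowers my own infection risk (direct). When the people around me get vaccinated, my infection risk also falls (herd immunity, indirect).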

2 Hudgens & Halloran (2008) — Partial Interference

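2.1 Significance

Put causal inference under spillover on a formal footing, with a framework for spillover at the cluster level.

2.2 Partial Interference

Assumption: spillover occurs only within a cluster, never between clusters.

Example: spillover happens only within a classroom; classrooms are isolated from one another.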

2.3 2-Stage Randomization

Stage 1: randomly assign each cluster a treatment share \(\alpha\) (e.g., \(\alpha = 0.5\) for some clusters and \(\alpha = 0.0\) for others).

Stage 2: within each cluster, randomly treat a share \(\alpha\) of its units.
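The two stages can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `two_stage_assign`, the equal-probability choice among saturation levels, and the 20 clusters of 10 units are all assumptions made for the example.

```python
import numpy as np

def two_stage_assign(cluster_ids, alphas, rng):
    """Stage 1: draw a saturation per cluster. Stage 2: treat that share of its units."""
    cluster_ids = np.asarray(cluster_ids)
    clusters = np.unique(cluster_ids)
    # Stage 1: each cluster gets one saturation level, chosen uniformly at random (assumption)
    cluster_alpha = dict(zip(clusters, rng.choice(alphas, size=len(clusters))))
    A = np.zeros(len(cluster_ids), dtype=int)
    for c in clusters:
        members = np.flatnonzero(cluster_ids == c)
        # Stage 2: treat round(alpha * cluster size) units, sampled without replacement
        n_treat = int(round(cluster_alpha[c] * len(members)))
        A[rng.choice(members, size=n_treat, replace=False)] = 1
    return A, cluster_alpha

rng = np.random.default_rng(0)
cluster_ids = np.repeat(np.arange(20), 10)   # hypothetical: 20 clusters of 10 units
A, cluster_alpha = two_stage_assign(cluster_ids, alphas=[0.0, 0.5], rng=rng)
for c in range(3):
    print(f"cluster {c}: alpha = {cluster_alpha[c]}, treated = {A[cluster_ids == c].sum()}")
```

Clusters drawn with \(\alpha = 0.0\) contain only untreated units, which is what later makes the comparison against untreated units in \(\alpha = 0.5\) clusters informative about spillover.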

2.4 Estimands

Effect	Definition
Direct Effect (DE)	\(\bar{Y}(A_i=1, \alpha) - \bar{Y}(A_i=0, \alpha)\)
Indirect Effect (IE)	\(\bar{Y}(A_i=0, \alpha=0.5) - \bar{Y}(A_i=0, \alpha=0)\)
Total Effect (TE)	\(\bar{Y}(A_i=1, \alpha=0.5) - \bar{Y}(A_i=0, \alpha=0)\)
Overall Effect (OE)	\(\bar{Y}(\alpha=0.5) - \bar{Y}(\alpha=0)\)
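The four estimands are linked by simple identities. The sketch below assumes that the marginal average under saturation \(\alpha\) weights the two arms by their shares, \(\bar{Y}(\alpha) = \alpha\,\bar{Y}(A_i=1, \alpha) + (1-\alpha)\,\bar{Y}(A_i=0, \alpha)\); that weighting is an assumption of this sketch rather than something stated in the table.

\[ \text{TE} = \underbrace{\bar{Y}(A_i=1, \alpha=0.5) - \bar{Y}(A_i=0, \alpha=0.5)}_{\text{DE at } \alpha = 0.5} + \underbrace{\bar{Y}(A_i=0, \alpha=0.5) - \bar{Y}(A_i=0, \alpha=0)}_{\text{IE}} \]

\[ \text{OE} = \bar{Y}(\alpha=0.5) - \bar{Y}(\alpha=0) = 0.5\,\text{TE} + 0.5\,\text{IE} \]

The first line holds by adding and subtracting \(\bar{Y}(A_i=0, \alpha=0.5)\). The second uses \(\bar{Y}(\alpha=0) = \bar{Y}(A_i=0, \alpha=0)\), since nobody is treated at \(\alpha = 0\).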

2.5 Significance

Gives each effect a clear definition. Applied in vaccine, education, and public health studies.

3 Aronow & Samii (2017) — Exposure Mapping

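3.1 Significance

Generalizes Hudgens-Halloran: handles arbitrary spillover patterns on a network.

3.2 Exposure Mapping

The exposure of unit \(i\) is a function of the treatment assignment, typically of the treatment status of \(i\)'s neighbors. Examples:

Share of \(i\)'s friends who are treated

Number of \(i\)'s friends who are treated

Binary: 1 if at least one of \(i\)'s friends is treated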
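The three example mappings can be computed directly from an adjacency matrix. The sketch below is illustrative only: the function name `exposure_mappings`, the toy path graph, and the convention that a unit with no friends gets a share of 0 are assumptions.

```python
import numpy as np

def exposure_mappings(adj, A):
    """Return (share, count, binary) exposure summaries per unit.

    adj is a 0/1 adjacency matrix (adj[i, j] = 1 if j is a friend of i),
    A is the 0/1 treatment vector.
    """
    adj = np.asarray(adj)
    A = np.asarray(A)
    n_treated_friends = adj @ A                 # number of treated friends
    degree = adj.sum(axis=1)                    # number of friends
    # Share of treated friends; units without friends get 0 by convention (assumption)
    frac_treated = np.divide(n_treated_friends, degree,
                             out=np.zeros(len(A), dtype=float), where=degree > 0)
    any_treated = (n_treated_friends >= 1).astype(int)   # at least one treated friend
    return frac_treated, n_treated_friends, any_treated

# Toy path graph 0-1-2-3 with units 0 and 3 treated
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
A = np.array([1, 0, 0, 1])
frac, count, binary = exposure_mappings(adj, A)
print("share :", frac)
print("count :", count)
print("binary:", binary)
```

In this toy graph only the two middle units have a treated friend, so they are the only ones with non-zero exposure under all three mappings.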

3.3 Mechanism

Define the network (friendship ties, shared cluster, distance, etc.)

Choose the exposure mapping (e.g., \(E_i\) = share of treated friends)

Classify units by exposure level

Compare outcomes across exposure levels to estimate the spillover effect

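3.4 Estimand

\[ \tau_{\text{spillover}}(e_1, e_2) = \mathbb{E}[Y_i \mid A_i = 0, E_i = e_1] - \mathbb{E}[Y_i \mid A_i = 0, E_i = e_2] \]

This is the difference in outcomes between exposure levels among untreated units.

3.5 Inverse Probability Weighting (IPW)

Compute each unit's probability of receiving each exposure level under the randomization design. Weighting observed outcomes by the inverse of these probabilities (a Horvitz-Thompson estimator) gives an unbiased estimate, provided every such probability is positive.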
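The sketch below shows why the weighting matters when units differ in their number of friends. Several choices are assumptions made for the illustration: the random network, the outcome model (baseline rising with degree, a spillover of +2), independent Bernoulli assignment (which gives closed-form exposure probabilities; for other designs they are usually approximated by simulating the randomization), and the use of the normalized (Hajek) form of the weighted mean, which is a ratio estimator chosen here for stability rather than the exactly unbiased Horvitz-Thompson form.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10_000, 0.5

# Hypothetical network: unit i names 1 to 4 distinct friends (never itself)
degree = rng.integers(1, 5, size=n)
friends = []
for i, d in enumerate(degree):
    f = rng.choice(n - 1, size=d, replace=False)
    f[f >= i] += 1                      # shift indices so that unit i is skipped
    friends.append(f)

A = rng.binomial(1, p, size=n)
any_treated = np.array([A[f].any() for f in friends])

# Assumed outcome model: baseline rises with degree, spillover adds +2
true_spill = 2.0
Y = 10 + 2.0 * degree + 4.0 * A + true_spill * any_treated + rng.normal(0, 1, n)

# Exposure conditions among untreated units
is_spill = (A == 0) & any_treated       # untreated, at least one treated friend
is_pure = (A == 0) & ~any_treated       # untreated, no treated friend

# Exposure probabilities under independent Bernoulli(p) assignment (closed form)
pi_pure = (1 - p) * (1 - p) ** degree
pi_spill = (1 - p) * (1 - (1 - p) ** degree)

def hajek_mean(y, indicator, pi):
    """Inverse-probability-weighted mean, normalized by the sum of the weights."""
    w = indicator / pi
    return np.sum(w * y) / np.sum(w)

ipw = hajek_mean(Y, is_spill, pi_spill) - hajek_mean(Y, is_pure, pi_pure)
unweighted = Y[is_spill].mean() - Y[is_pure].mean()
print(f"true spillover effect : {true_spill:.2f}")
print(f"IPW (Hajek) estimate  : {ipw:.2f}")
print(f"unweighted difference : {unweighted:.2f}")
```

Units with many friends are more likely to have a treated friend, so the unweighted comparison picks up the degree-related baseline difference; the inverse-probability weights remove it.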

4 Eckles, Karrer, Ugander (2017) — Network Experiment

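4.1 Motivation

Handling spillover in network experiments at Facebook scale.

4.2 Key contribution

Analyzes how well cluster-randomized designs reduce bias from interference, and what they cost in efficiency. Uses the network's natural clusters, found by community detection.

4.3 Example

A change to Facebook's post promotion algorithm, with treatment assigned at the level of clusters in the friend network.

4.4 Trade-off

Cluster size involves a trade-off:

Large clusters: spillover is handled better, but sample efficiency drops

Small clusters: efficiency improves, but some spillover leaks across clusters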
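One way to see the trade-off is to count how many friendship edges cross cluster boundaries as clusters grow. The sketch below is a toy illustration under stated assumptions: a ring lattice where each node links to its next three neighbors, and clusters formed as contiguous blocks. The helper names `ring_edges` and `cut_edge_share` are hypothetical, and a real analysis would use a community detection algorithm on the actual graph.

```python
import numpy as np

def ring_edges(n, k):
    """Edges of a ring lattice: node i is linked to i+1, ..., i+k (mod n)."""
    src = np.repeat(np.arange(n), k)
    dst = (src + np.tile(np.arange(1, k + 1), n)) % n
    return src, dst

def cut_edge_share(n, k, cluster_size):
    """Share of edges whose endpoints fall in different contiguous clusters."""
    src, dst = ring_edges(n, k)
    cluster = np.arange(n) // cluster_size
    return float(np.mean(cluster[src] != cluster[dst]))

n, k = 1200, 3
for size in [1, 10, 50, 200]:
    print(f"cluster size {size:>3}: {n // size:>4} clusters, "
          f"cut-edge share {cut_edge_share(n, k, size):.3f}")
```

With unit-level randomization (cluster size 1) every edge crosses a boundary, while at size 200 only 1% of edges do. The price is that the number of independently randomized clusters falls from 1200 to 6.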

5 Athey, Eckles, Imbens (2018) — Exact P-values

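5.1 Motivation

Fisher exact p-values under network interference.

5.2 Mechanism

Randomization (permutation) inference: re-draw the treatment assignment according to the design and build the exact null distribution of a test statistic. A hypothesis such as "no spillover" is not sharp, because it does not pin down every unit's outcome under every assignment, so the method restricts the test to a subset of focal units and re-randomizes only the treatments of the remaining units.

5.3 Significance

Exact inference without asymptotic assumptions. Useful for small samples or non-standard exposure definitions.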
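A minimal sketch of the randomization-test idea follows. It tests only the fully sharp null that no assignment affects any outcome, which lets all outcomes stay fixed while treatments are re-drawn; it does not implement the focal-unit construction of Athey, Eckles, and Imbens that is needed for the "no spillover, but direct effects allowed" null. The ring network, the outcome model, and the function names are assumptions for the illustration.

```python
import numpy as np

def spill_stat(Y, A, friends):
    """Among untreated units: mean Y with any treated friend minus mean Y with none."""
    exposed = A[friends].sum(axis=1) >= 1
    ctrl = A == 0
    return Y[ctrl & exposed].mean() - Y[ctrl & ~exposed].mean()

def randomization_p_value(Y, A, friends, p, n_draws, rng):
    """Monte Carlo p-value under the sharp null that no assignment changes any outcome."""
    t_obs = abs(spill_stat(Y, A, friends))
    t_null = np.array([abs(spill_stat(Y, rng.binomial(1, p, len(Y)), friends))
                       for _ in range(n_draws)])
    # Adding 1 to numerator and denominator keeps the Monte Carlo p-value valid
    return (1 + np.sum(t_null >= t_obs)) / (1 + n_draws)

rng = np.random.default_rng(7)
n, k, p = 600, 2, 0.5
friends = (np.arange(n)[:, None] + np.arange(1, k + 1)) % n   # ring neighbors (assumption)
A = rng.binomial(1, p, n)
# Assumed outcome model with a spillover of +1.5 and no direct effect
Y = 10 + 1.5 * (A[friends].sum(axis=1) >= 1) + rng.normal(0, 1, n)

p_value = randomization_p_value(Y, A, friends, p, n_draws=999, rng=rng)
print(f"randomization p-value: {p_value:.3f}")
```

Because the reference distribution comes from the randomization itself, the p-value needs no large-sample approximation. Its resolution is limited by the number of draws, so 999 draws cannot return a value below 0.001.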

6 Direct Effect vs Indirect Effect

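6.1 Definitions

Direct effect (DE): the change in a unit's own outcome caused by that unit being treated (holding the treatment share of other units fixed)
Indirect effect (IE) = Spillover: the change in a unit's outcome caused by a change in the treatment share of other units (holding the unit's own treatment fixed)

6.2 Example: Vaccines

DE: lower infection risk for the vaccinated person
IE: a higher vaccination rate among the people nearby lowers a person's infection risk (even if that person is unvaccinated)

6.3 Policy implications

The effect of a vaccine mandate = DE + IE. A standard A/B analysis that ignores IE underestimates this effect.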
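The gap between a standard A/B estimate and the effect of treating everyone can be illustrated with a small simulation. The sketch assumes the same linear outcome model as the simulation in section 9 (baseline 50, direct effect 5, spillover 3 times the share of treated friends); the ring network and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5000, 5
friends = (np.arange(n)[:, None] + np.arange(1, k + 1)) % n   # hypothetical ring network
noise = rng.normal(0, 3, n)

def outcome(A):
    """Assumed outcome: baseline 50, direct effect 5, spillover 3 x treated-friend share."""
    return 50 + 5.0 * A + 3.0 * A[friends].mean(axis=1) + noise

# Standard A/B: 50% of users treated, compare treated with control
A = rng.binomial(1, 0.5, n)
Y = outcome(A)
naive = Y[A == 1].mean() - Y[A == 0].mean()

# Full rollout: everyone treated versus nobody treated
rollout = outcome(np.ones(n, dtype=int)).mean() - outcome(np.zeros(n, dtype=int)).mean()

print(f"naive A/B difference: {naive:.2f}")
print(f"full-rollout effect : {rollout:.2f}")
```

Under this model the control group receives roughly the same spillover as the treated group, so the naive difference recovers only the direct effect of about 5, while the full-rollout effect is DE + IE = 8.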

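7 Application: Vaccines

7.1 Halloran et al. (2008+)

A school-level vaccine experiment: some schools receive mass vaccination, others continue as usual.

Stage 1: vaccination share per school. Stage 2: vaccination within each school.

Results (illustrative figures; as stated at the top of this post, they have not been checked against the original studies):

DE: 50% lower infection risk for vaccinated individuals

IE: 30% lower infection risk for unvaccinated individuals when the surrounding vaccination rate rises by 50%

TE: 60% lower, reflecting the additional effect of herd immunity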

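8 Challenges in Network Experiments

8.1 Challenge 1: Network Definition

Are friendship ties well defined? How should influence be weighted?

8.2 Challenge 2: Exposure Mapping

Which function captures the true spillover? A misspecified mapping gives biased estimates.

8.3 Challenge 3: Cluster Identification

Do natural clusters exist? How accurate is community detection?

8.4 Challenge 4: Sample Size

Cluster randomization lowers sample efficiency, so a larger sample is needed.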
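A common rule of thumb for quantifying this loss is the design effect, \(1 + (m - 1)\rho\), where \(m\) is the cluster size and \(\rho\) the intra-cluster correlation of the outcome. It is brought in here as an assumption from standard cluster-sampling theory (equal cluster sizes), not something derived in this post, and the value \(\rho = 0.05\) below is purely illustrative.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters of equal size instead of units."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_units, cluster_size, icc):
    """Number of independent units that would give the same precision."""
    return n_units / design_effect(cluster_size, icc)

for m in [1, 10, 100]:
    deff = design_effect(m, 0.05)
    ess = effective_sample_size(10_000, m, 0.05)
    print(f"cluster size {m:>3}: design effect {deff:.2f}, effective n = {ess:.0f}")
```

Even a modest correlation inside clusters shrinks the effective sample quickly as clusters grow, which is the efficiency side of the cluster-size trade-off.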

9 Simulation: Network Spillover

import numpy as np
import statsmodels.api as sm

np.random.seed(42)

# Scenario: 1000 users on a friend graph (5 friends each)
n = 1000
n_friends = 5

# Random friend graph: each user picks 5 distinct other users (directed)
friends = np.zeros((n, n_friends), dtype=int)
for i in range(n):
    candidates = np.setdiff1d(np.arange(n), [i])
    friends[i] = np.random.choice(candidates, n_friends, replace=False)

# Randomize treatment with probability 0.5
A = np.random.binomial(1, 0.5, n)

# Outcome:
# - Direct effect: A_i = 1 adds +5
# - Indirect (spillover): share of treated friends × 3
true_direct = 5.0
true_indirect = 3.0

# Share of each user's friends who are treated
friend_treatment_rate = A[friends].mean(axis=1)

Y = 50 + true_direct * A + true_indirect * friend_treatment_rate + np.random.normal(0, 3, n)

# Naive A/B: difference in means between treated and control
naive = Y[A == 1].mean() - Y[A == 0].mean()
print("[Spillover Detection simulation]\n")
print(f"True direct effect: {true_direct}")
print(f"True indirect effect (spillover, per unit of treated-friend share): {true_indirect}\n")

print("[Standard A/B]")
print(f"  Naive estimate: {naive:.2f}")
print("  → close to the direct effect, but the indirect effect is not separated out")

# Aronow-Samii: analysis by exposure level
print("\n[Aronow-Samii Exposure Analysis]")
print("  Outcome of untreated users (A=0) by friend treatment rate:")
# With 5 friends the rate takes the values 0.0, 0.2, ..., 1.0 (1.0 included)
for k in range(n_friends + 1):
    rate = k / n_friends
    mask = (A == 0) & np.isclose(friend_treatment_rate, rate)
    if mask.sum() > 5:
        print(f"    rate = {rate:.1f}: n = {mask.sum()}, mean Y = {Y[mask].mean():.2f}")

# IE estimate: high (3+ treated friends) vs low (0-1 treated friends) exposure
high_friend = (A == 0) & (friend_treatment_rate > 0.5)
low_friend = (A == 0) & (friend_treatment_rate < 0.3)
ie = Y[high_friend].mean() - Y[low_friend].mean()
# Expected contrast = coefficient × gap in mean exposure between the two groups
rate_gap = friend_treatment_rate[high_friend].mean() - friend_treatment_rate[low_friend].mean()
print(f"\n  IE estimate (high vs low friend exposure): {ie:.2f}")
print(f"  → compare with the true IE: rate gap {rate_gap:.2f} × 3 = {true_indirect * rate_gap:.2f}")

# Regression: separate direct and indirect effects
X = np.column_stack([A, friend_treatment_rate])
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit()
print("\n[Regression (Y ~ A + friend_rate)]")
print(f"  β_A (direct): {model.params[1]:.2f}")
print(f"  β_friend_rate (indirect coef): {model.params[2]:.2f}")
print("  → direct and indirect effects separated; compare with the true values")

10 Conclusion

Spillover detection separates direct and indirect effects. The standard toolkit consists of Hudgens-Halloran's two-stage randomization, Aronow-Samii's exposure mapping, and the cluster-randomized designs of Eckles et al. Applications include vaccines, social networks, and advertising. These methods let the effect of a population-wide rollout be assessed in advance.

Key messages:

Direct vs Indirect (Spillover) Effect: a clear separation
Hudgens-Halloran 2008: 2-stage randomization, partial interference
Aronow-Samii 2017: exposure mapping, IPW
Eckles-Karrer-Ugander 2017: cluster-randomized, Facebook scale
Athey-Eckles-Imbens 2018: exact p-values
Applications: vaccines, social networks, advertising
Challenges: network definition, exposure mapping, cluster identification

J-SWITCH series summary:

J-SWITCH-0: Overview
J-SWITCH-1: Switchback (time-based)
J-SWITCH-2: Geo Holdout (region-based)
J-SWITCH-3: Spillover Detection (analysis stage), this post

Next series (J-ADAPT): Adaptive Trial.

11 Related Topics

Prerequisites

Follow-up series in Phase J

Adaptive Trial (placeholder)

12 References

Hudgens, M. G. & Halloran, M. E. (2008). Toward causal inference with interference. J. Amer. Statist. Assoc. 103, 832-842.
Aronow, P. M. & Samii, C. (2017). Estimating average causal effects under general interference. Annals of Applied Statistics 11, 1912-1947.
Eckles, D., Karrer, B., Ugander, J. (2017). Design and analysis of experiments in networks: Reducing bias from interference. Journal of Causal Inference 5, 1-23.
Athey, S., Eckles, D., Imbens, G. W. (2018). Exact p-values for network interference. J. Amer. Statist. Assoc. 113, 230-240.
Manski, C. F. (2013). Identification of treatment response with social interactions. Econometrics Journal 16, S1-S23.
Halloran, M. E., Longini, I. M., Struchiner, C. J. (2010). Design and Analysis of Vaccine Studies. Springer.