Kwangmin Kim - Synthetic Control and Comparison

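Sources

This post is based on prior knowledge (not checked against a textbook; it draws on the agent's pretraining knowledge). Key references: Abadie & Gardeazabal (2003), Abadie, Diamond, Hainmueller (2010), Xu (2017), Athey et al. (2021).

This is the 16th post in the Phase J series and the last post in the J-DID series. It covers the Synthetic Control Method (SCM), a close relative of DiD that has become popular in recent years.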

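1 Intuition: “Building the Control Group Yourself”

DiD assumes that a clearly defined control group exists. In practice, that often fails:

The real-world challenge: there is only a single treated unit (e.g., the Basque Country, the state of California), and no single unit is a suitable control on its own. Only a pool of candidate control units is available.

Synthetic Control in one line: build a synthetic control as a weighted average of the candidate control units. The weights are chosen so that the synthetic control's pre-treatment outcomes match those of the treated unit.

In other words, construct the most similar hypothetical control in a data-driven way.

Analogy (music recommendation): to build a virtual user who resembles a given user, take a weighted sum of other users. The later behavior of that synthetic user then serves as an estimate of the original user's counterfactual.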

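2 Abadie & Gardeazabal (2003): The Basque Case

2.1 Scenario

Basque separatist terrorism in Spain's Basque Country, 1968~1997. The goal is to measure its economic impact.

2.2 Challenge

Only the Basque Country was treated (terrorist activity). Which region is comparable?

2.3 Method

A synthetic Basque Country is built as a weighted average of other Spanish regions (15+). The weights are chosen so that:

economic indicators (GDP per capita, etc.) over 1955~1969 (before the terrorism) match those of the Basque Country

2.4 Results

Compare the GDP of the synthetic Basque Country (a hypothetical Basque Country in which the terrorism had not occurred) with the actual Basque GDP. The difference is the economic cost of terrorism.

Result: after the outbreak of terrorism, Basque per capita GDP fell by about 10 percentage points relative to the synthetic control, a large cost.

2.5 Impact

The first application of Synthetic Control. It has since been applied widely in political science, economics, and public health.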

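3 Abadie, Diamond, Hainmueller (2010): California Tobacco

3.1 Scenario

California's Proposition 99 in 1988: a cigarette tax increase plus an anti-smoking campaign. The goal is to measure its effect on cigarette consumption.

3.2 Synthetic California

A weighted sum of 38 states, chosen so that pre-1988 cigarette consumption and covariates (population, income, etc.) match California's.

3.3 Results

Post-1988 cigarette consumption in synthetic California vs. actual California. The gap is large, which points to a clear causal effect of Prop 99.

3.4 Statistical Significance

Placebo test: pretend that each of the other states had passed Prop 99 and rerun the same analysis. California's effect is the largest among the placebos. Measured by the ratio of post-treatment to pre-treatment MSPE, California ranks first among all 39 units, which implies a permutation p-value of 1/39 ≈ 0.026.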
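The placebo logic above can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the procedure from the paper: the panel, the effect size of -8, and the helper names `fit_weights` and `rmspe_ratio` are all hypothetical. As a simplifying design choice, the unit that was actually treated is kept out of every placebo donor pool so that its post-period effect cannot leak into the placebo fits.

```python
import numpy as np
from scipy.optimize import minimize

def fit_weights(donors_pre, target_pre):
    """Simplex-constrained least squares: w_j >= 0, sum(w) = 1."""
    n = donors_pre.shape[0]
    res = minimize(
        lambda w: np.mean((target_pre - donors_pre.T @ w) ** 2),
        x0=np.full(n, 1.0 / n),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
    )
    return res.x

def rmspe_ratio(Y, i, treated_idx, t0):
    """Post/pre RMSPE ratio when unit i is analysed as if it were treated."""
    keep = np.ones(Y.shape[0], dtype=bool)
    keep[[i, treated_idx]] = False          # drop unit i and the truly treated unit
    donors = Y[keep]
    w = fit_weights(donors[:, :t0], Y[i, :t0])
    gap = Y[i] - donors.T @ w
    return np.sqrt(np.mean(gap[t0:] ** 2)) / np.sqrt(np.mean(gap[:t0] ** 2))

# Hypothetical panel: 12 units x 20 periods, unit 0 treated from t0 = 12 onward
rng = np.random.default_rng(0)
n_units, n_periods, t0 = 12, 20, 12
level = np.concatenate([[50.0], np.linspace(40.0, 60.0, n_units - 1)])
Y = level[:, None] + np.linspace(0, 5, n_periods) + rng.normal(0, 1, (n_units, n_periods))
Y[0, t0:] -= 8.0                            # assumed true effect on the treated unit

ratios = np.array([rmspe_ratio(Y, i, 0, t0) for i in range(n_units)])
p_value = np.mean(ratios >= ratios[0])      # share of units at least as extreme as unit 0
print("post/pre RMSPE ratios:", ratios.round(2))
print("permutation p-value:", round(float(p_value), 3))
```

Using the ratio rather than the raw post-period gap means a placebo unit with a poor pre-treatment fit is not counted as showing a large "effect".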

4 Synthetic Control Algorithm

SCM Algorithm (Abadie, Diamond, Hainmueller 2010)

Input:
  - Treated unit (1)
  - J candidate control units
  - Pre-treatment outcomes (periods 1~T_0)
  - K covariates

Step 1: Choose the weights
  Find W = (w_1, ..., w_J)
  - Constraint: w_j ≥ 0, Σ w_j = 1
  - Objective: make the synthetic control's pre-treatment outcomes and covariates
    match those of the treated unit (minimize the Mean Squared Prediction Error)

Step 2: Compute the synthetic control outcome
  Y_synth(t) = Σ w_j * Y_j(t)   for each t

Step 3: Estimate the effect
  ATT(t) = Y_treated(t) - Y_synth(t)   for t > T_0

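4.1 Choosing the Weights: Formula

Minimize the pre-treatment distance:

\[ W^* = \arg\min_W \|X_T - X_C W\|^2_V \]

\(X_T\): pre-treatment outcomes and covariates of the treated unit
\(X_C\): the same data for the candidate control units
\(V\): weight matrix over the variables (usually reflecting each variable's importance for predicting the outcome), with \(\|x\|^2_V = x^\top V x\)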
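To make the role of \(V\) concrete, here is a small sketch of the weighted objective with a diagonal \(V\). The predictor values, the helper names `v_weighted_loss` and `solve_weights`, and the choice of SLSQP as the solver are assumptions made for illustration; in practice \(V\) is itself usually tuned rather than fixed by hand.

```python
import numpy as np
from scipy.optimize import minimize

def v_weighted_loss(w, x_treated, x_controls, v_diag):
    """Squared V-norm (X_T - X_C w)' V (X_T - X_C w) for a diagonal V."""
    diff = x_treated - x_controls @ w
    return float(diff @ (v_diag * diff))

def solve_weights(x_treated, x_controls, v_diag):
    """Minimize the V-weighted distance over the simplex (w_j >= 0, sum w_j = 1)."""
    n_controls = x_controls.shape[1]
    res = minimize(
        v_weighted_loss,
        x0=np.full(n_controls, 1.0 / n_controls),
        args=(x_treated, x_controls, v_diag),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_controls,
        constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return res.x

# Hypothetical predictor matrix: rows = K predictors, columns = J control units
x_controls = np.array([
    [10.0, 20.0, 30.0],   # e.g. an average pre-treatment outcome
    [ 1.0,  5.0,  2.0],   # e.g. a covariate
])
x_treated = np.array([20.0, 1.5])   # equals 0.5 * control 0 + 0.5 * control 2

w_equal = solve_weights(x_treated, x_controls, v_diag=np.array([1.0, 1.0]))
print("weights with equal V:", w_equal.round(3))

# A treated unit outside the convex hull: no exact fit, so V decides the compromise
x_far = np.array([35.0, 4.0])
w_a = solve_weights(x_far, x_controls, v_diag=np.array([1.0, 0.01]))
w_b = solve_weights(x_far, x_controls, v_diag=np.array([0.01, 1.0]))
print("V favors predictor 1:", w_a.round(3))
print("V favors predictor 2:", w_b.round(3))
```

When an exact fit exists, any positive \(V\) leads to the same weights. \(V\) only matters when the treated unit cannot be matched exactly, which is the usual situation with real data.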

4.2 Constraint

Convex weights: \(w_j \geq 0\), \(\sum w_j = 1\). This avoids extrapolation, because the synthetic control stays inside the convex hull of the candidates.
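A quick numerical check of the no-extrapolation property, using made-up donor paths: every convex combination stays between the smallest and largest donor value in each period, so a treated unit that lies above all donors cannot be matched.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical donor paths: 3 donors x 3 periods
donors = np.array([
    [10.0, 11.0, 12.0],
    [20.0, 21.0, 22.0],
    [30.0, 31.0, 32.0],
])

# Draw many weight vectors on the simplex (w_j >= 0, sum w_j = 1)
W = rng.dirichlet(np.ones(3), size=1000)   # shape (1000, 3)
synth = W @ donors                         # each row is one synthetic path

lo, hi = donors.min(axis=0), donors.max(axis=0)
inside = np.all((synth >= lo - 1e-9) & (synth <= hi + 1e-9))
print("all synthetic paths inside the donor range:", inside)

# A treated unit above every donor cannot be reproduced by convex weights
treated = np.array([40.0, 41.0, 42.0])
best_gap = np.min(np.abs(treated - synth).max(axis=1))
print("smallest worst-period gap over sampled weights:", round(float(best_gap), 2))
```

This is the reason a poor pre-treatment fit is informative: it signals that the treated unit lies outside what the donor pool can represent.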

4.3 Donor Pool

The set of candidate control units. Include only units that resemble the treated unit, and exclude units that are too different.

5 DiD vs Synthetic Control: Trade-offs

Aspect	DiD	Synthetic Control
Number of treated units	Many	One (or a few)
Control group	Clearly defined	Synthetic (weighted average)
Assumption	Parallel Trends	Pre-treatment fit
Verifiability	Pre-trend test	Pre-treatment MSPE
Inference	Cluster SE / Wild bootstrap	Permutation / Placebo
Applications	Panels with many units	Single-policy case studies

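5.1 When to Prefer SCM

Few treated units (one, or 2~3)

Many candidate control units

Enough pre-treatment periods (e.g., 10+ years)

Continuous outcome (binary outcomes are difficult)

5.2 When to Prefer DiD

Many treated units

A clearly defined control group

Rich panel data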

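6 Generalized Synthetic Control (Xu 2017)

6.1 Motivation

Extends SCM to multiple treated units by combining an interactive fixed effects model with factor analysis.

6.2 Algorithm (Brief)

Fit a factor model, estimating the factors \(f_t\) from the control units and each treated unit's loadings \(\lambda_i\) from its pre-treatment data: \[Y_{it} = \lambda_i^\top f_t + \varepsilon_{it}\]

Use the estimated factors and loadings to predict each treated unit's post-treatment counterfactual

ATT = observed - predicted counterfactual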
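The three steps above can be sketched with plain NumPy. This is a simplified illustration under stated assumptions, not the `gsynth` implementation: it omits covariates and additive fixed effects, fixes the number of factors `r` by hand instead of choosing it by cross-validation, and uses simulated data with an assumed true effect of 3.

```python
import numpy as np

rng = np.random.default_rng(0)
n_controls, n_treated, n_periods, t0, r = 40, 5, 30, 20, 2

# Simulate Y_it = lambda_i' f_t + eps_it (hypothetical data)
F = rng.normal(2.0, 1.0, (n_periods, r))            # factors f_t
L_co = rng.normal(1.0, 0.5, (n_controls, r))        # control loadings
L_tr = rng.normal(1.0, 0.5, (n_treated, r))         # treated loadings
Y_co = L_co @ F.T + rng.normal(0, 0.1, (n_controls, n_periods))
Y_tr = L_tr @ F.T + rng.normal(0, 0.1, (n_treated, n_periods))
true_att = 3.0
Y_tr[:, t0:] += true_att                            # treatment starts at t0

# Step 1: estimate factors from the control units over all periods
# (top-r right singular vectors span the estimated factor space)
_, _, Vt = np.linalg.svd(Y_co, full_matrices=False)
F_hat = Vt[:r].T                                    # shape (n_periods, r)

# Step 2: estimate treated loadings by least squares on pre-treatment periods only
L_hat, *_ = np.linalg.lstsq(F_hat[:t0], Y_tr[:, :t0].T, rcond=None)   # shape (r, n_treated)

# Step 3: predict the untreated counterfactual and take the difference
Y0_hat = (F_hat @ L_hat).T                          # shape (n_treated, n_periods)
att_by_period = (Y_tr[:, t0:] - Y0_hat[:, t0:]).mean(axis=0)
att = att_by_period.mean()
print("ATT by post-period:", att_by_period.round(2))
print("average ATT:", round(float(att), 3))
```

The factors are only identified up to rotation, which is harmless here because only the fitted product of loadings and factors enters the prediction.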

6.3 Advantages

Unifies DiD and SCM: it handles multiple treated units together with a factor structure.

6.4 R Package: `gsynth`

7 Matrix Completion (Athey et al. 2021)

7.1 Motivation

Model the panel data as a low-rank matrix plus noise, and fill in the missing values (the counterfactual outcomes) by matrix completion.

7.2 Mechanism

Outcome matrix \(Y\): rows are units, columns are time periods

The counterfactual of every treated cell is treated as missing

Fill in the missing cells with nuclear norm penalized regression

ATT = observed - imputed counterfactual
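One common way to solve a nuclear norm penalized completion problem is iterative soft-thresholding of singular values. The sketch below uses that generic algorithm on simulated data. It is an assumption-laden illustration rather than the estimator from the paper: there are no separately handled fixed effects, the penalty `lam` is set by hand instead of by cross-validation, and the function name `soft_impute` and all data are hypothetical.

```python
import numpy as np

def soft_impute(Y, observed, lam, n_iter=200):
    """Nuclear-norm-regularized completion via iterative singular value soft-thresholding."""
    L = np.zeros_like(Y)
    for _ in range(n_iter):
        filled = np.where(observed, Y, L)          # keep observed cells, plug in current guess elsewhere
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)               # shrink singular values toward zero
        L = (U * s) @ Vt
    return L

# Hypothetical low-rank panel: 50 units x 30 periods, first 10 units treated from t0 = 20
rng = np.random.default_rng(0)
n_units, n_periods, n_treated, t0, rank = 50, 30, 10, 20, 2
L_true = 10.0 + rng.normal(0, 1, (n_units, rank)) @ rng.normal(0, 1, (rank, n_periods))
Y = L_true + rng.normal(0, 0.1, (n_units, n_periods))
true_att = 3.0
Y[:n_treated, t0:] += true_att

# Treated cells: the untreated potential outcome is missing
observed = np.ones(Y.shape, dtype=bool)
observed[:n_treated, t0:] = False

L_hat = soft_impute(Y, observed, lam=1.5)
att = np.mean(Y[:n_treated, t0:] - L_hat[:n_treated, t0:])
print("estimated ATT:", round(float(att), 3))
```

A larger `lam` pushes the completed matrix toward lower rank at the cost of more shrinkage bias, which is why the penalty is normally tuned on held-out observed cells.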

7.3 Advantages

A modern ML approach. Hyperparameters are tuned by cross-validation. Efficient on large datasets.

7.4 Applications

At the frontier of causal panel methods. A standard tool of Athey and co-authors.

8 Card-Krueger Revisited: A Synthetic Control View

8.1 Original Analysis (1994)

DiD: NJ vs PA, a 2x2 design. Result: the minimum wage increase did not reduce employment.
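For reference, the 2x2 DiD estimate is just a difference of two before-after changes. The numbers below are illustrative placeholders, not the figures reported by Card and Krueger, and `did_2x2` is a hypothetical helper name.

```python
def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences on group means."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Illustrative numbers only (not Card-Krueger's actual figures):
# average employment per store before and after the policy change
estimate = did_2x2(treated_pre=20.0, treated_post=21.0,
                   control_pre=23.0, control_post=21.5)
print(estimate)   # 2.5
```

A positive value here means employment in the treated group rose relative to the control group, which is the pattern behind a "no employment loss" conclusion.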

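8.2 Critique (Neumark & Wascher 2000)

Different data and a different analysis found a decline in employment, pointing to limitations in Card-Krueger's data and analysis.

8.3 Follow-up: Synthetic Control

Sabia, Burkhauser, Hansen (2012) and others brought synthetic-control-style comparisons to state minimum wage increases (their case study is New York State, not NJ). A synthetic control may be a better comparison than a single neighboring state such as PA.

8.4 Cengiz, Dube, Lindner, Zipperer (2019)

An event-study plus bunching estimator over 138 state-level minimum wage increases. Almost no employment loss, with a shift in the wage distribution. This supports Card-Krueger's big picture.

8.5 Implications

Apply several methodologies to the same policy evaluation to check robustness: SCM, DiD, and event-study alike.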

9 Simulation: Synthetic Control

import numpy as np
from scipy.optimize import minimize

np.random.seed(42)

# Scenario: 1 treated unit + 10 candidate control (donor) units,
# 10 periods (t=0~4 pre-treatment, t=5~9 post-treatment)
T_periods = 10
t_treat = 5

# Donor outcomes (10 units x 10 periods):
# unit-specific baseline + common time trend + noise
n_donors = 10
baseline = np.random.normal(50, 10, (n_donors, 1))        # one baseline level per unit
donors = baseline + np.linspace(0, 5, T_periods)          # common time trend (broadcast over units)
donors += np.random.normal(0, 1, (n_donors, T_periods))   # noise

# Treated unit: average of donors 1, 3, 5 plus noise
true_weights = np.zeros(n_donors)
true_weights[[1, 3, 5]] = 1/3
# donors has shape (unit, period), so combine units with donors.T @ w
treated = donors.T @ true_weights + np.random.normal(0, 1, T_periods)

# Treatment effect (t >= t_treat)
true_ATT = 5
treated_observed = treated.copy()
treated_observed[t_treat:] += true_ATT

print("[Synthetic Control simulation]\n")
print(f"True treatment effect: {true_ATT}")
print(f"True weights (donors 1, 3, 5 = 1/3): {true_weights.round(3)}\n")

# Synthetic Control: minimize the pre-period MSPE
def mspe(w, donors_pre, treated_pre):
    synth = donors_pre.T @ w
    return np.mean((treated_pre - synth) ** 2)

# Pre-treatment data
donors_pre = donors[:, :t_treat]
treated_pre = treated_observed[:t_treat]

# Optimization with simplex constraint (w ≥ 0, sum = 1)
def constraint_sum(w):
    return np.sum(w) - 1

result = minimize(
    mspe,
    x0=np.ones(n_donors) / n_donors,
    args=(donors_pre, treated_pre),
    method='SLSQP',
    bounds=[(0, 1)] * n_donors,
    constraints={'type': 'eq', 'fun': constraint_sum}
)
w_synth = result.x

# With 5 pre-periods and 10 donors the weights are not uniquely identified,
# so the learned weights need not equal the true ones even when the fit is good
print(f"Learned weights: {w_synth.round(3)}")

# Synthetic control outcomes
synth_outcomes = donors.T @ w_synth

# Effect over time
print("\n[Treatment effect over time (post-period)]")
for t in range(t_treat, T_periods):
    effect = treated_observed[t] - synth_outcomes[t]
    print(f"  t={t}: effect = {effect:+.3f}")

avg_post_effect = (treated_observed[t_treat:] - synth_outcomes[t_treat:]).mean()
print(f"\nAverage post-period effect: {avg_post_effect:.3f}")
print(f"Compared with the true effect: {avg_post_effect:.3f} vs {true_ATT}")

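10 Conclusion

Synthetic Control is the standard approach for a single treated unit with many candidate controls. It estimates the counterfactual with a synthetic control built as a weighted average. It both complements and competes with DiD. Recent extensions include Generalized SCM and Matrix Completion.

Key messages:

SCM definition: a synthetic control is a weighted average of candidate controls
Choosing weights: pre-treatment fit (minimize MSPE)
Basque Country, California Tobacco: the classic cases
DiD vs SCM: number of treated units, assumptions, applications
Generalized SCM (Xu 2017): multiple treated units
Matrix Completion (Athey et al. 2021): modern ML
Card-Krueger revisited: the value of multiple methodologies

J-DID series overview:

J-DID-0: Overview
J-DID-1: PTA + TWFE
J-DID-2: Staggered (recent research)
J-DID-3: Synthetic Control (this post)

Next series (J-RDD): Regression Discontinuity Design.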

11 Related Topics

Prerequisites

DiD series
(Phase D) Counterfactual framework

Later series in Phase J

RDD: Regression Discontinuity (placeholder, to be written soon in the Causal_Inference folder)
Switchback·Geo (placeholder)
Adaptive Trial (placeholder)

12 References

Abadie, A. & Gardeazabal, J. (2003). The economic costs of conflict: A case study of the Basque Country. American Economic Review 93, 113-132.
Abadie, A., Diamond, A., Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program. Journal of the American Statistical Association 105, 493-505.
Abadie, A. (2021). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature 59, 391-425.
Xu, Y. (2017). Generalized synthetic control method: Causal inference with interactive fixed effects models. Political Analysis 25, 57-76.
Athey, S., Bayati, M., Doudchenko, N., Imbens, G., Khosravi, K. (2021). Matrix completion methods for causal panel data models. Journal of the American Statistical Association 116, 1716-1730.
Cengiz, D., Dube, A., Lindner, A., Zipperer, B. (2019). The effect of minimum wages on low-wage jobs. Quarterly Journal of Economics 134, 1405-1454.
Sabia, J. J., Burkhauser, R. V., Hansen, B. (2012). Are the effects of minimum wage increases always small? New evidence from a case study of New York State. Industrial and Labor Relations Review 65, 350-376.
