Kwangmin Kim - McCrary Density Test + 진단

출처

이 글은 사전지식 기반 (교재 미확인 — agent 사전학습 기반). 핵심 인용 — McCrary (2008), Cattaneo, Jansson, Ma (2020), Imbens & Lemieux (2008), Lee & Lemieux (2010).

이 글은 J-RDD 시리즈의 마지막 글. RDD 가정의 검증 — McCrary density test + 다른 robustness check — 을 다룬다.

1 진입 직관 — “RDD 가정이 진짜로 성립하는가?”

RDD 의 식별 가정:

Continuity: 다른 모든 변수의 distribution 이 cutoff 에서 연속
No manipulation: cutoff 양쪽으로 조작 불가능

이 가정들이 실제로 성립 하는지 검증 해야. 그렇지 않으면 RDD 추정치가 biased.

검증 방법 4 가지:

McCrary density test: running variable manipulation

Covariate balance: 다른 변수의 cutoff 연속성

Placebo cutoff: 가짜 cutoff 에서의 jump

Bandwidth sensitivity: bandwidth 변경 시 추정치 안정성

2 McCrary Density Test (2008)

2.1 동기

만약 사람들이 cutoff 위로 점수 조작 가능하면 cutoff 바로 위 에 sample 몰림. Running variable 의 density 가 cutoff 에서 jump.

2.2 사례

SAT 1300 점 cutoff 가 입학 기준. 학생들이 시험 문제 review, 재시험 등으로 1300 점 위로 조작. 1295~1299 점 sample 적고 1300~1304 점 sample 많음 — density jump.

2.3 검정

Running variable 의 density 를 cutoff 양쪽에서 별도 추정 + cutoff 에서 동일 검정.

Local linear density estimator — McCrary (2008) 의 핵심.

2.4 통계량

\[ \hat{\theta} = \log\hat{f}(c^+) - \log\hat{f}(c^-) \]

\(\hat{f}(c^+)\), \(\hat{f}(c^-)\): cutoff 우/좌의 density 추정치.

\(H_0: \theta = 0\) (manipulation 없음).

2.5 R Package

DCdensity() from rdd package.

또는 rddensity (Cattaneo et al.) — 다음 절.

3 Cattaneo, Jansson, Ma (2020) — 개선 방법

3.1 동기

McCrary 의 kernel density estimator 의 boundary bias. Local polynomial 로 개선.

3.2 메커니즘

Local polynomial density estimator — boundary 에서 더 정확. Robust SE 포함.

3.3 R/Python Package: `rddensity`

library(rddensity)
result <- rddensity(X = data$running_var, c = cutoff)
summary(result)

Modern RDD 의 표준 manipulation test.

4 Covariate Balance — Continuity 검증

4.1 동기

Cutoff 양쪽의 다른 covariate 가 연속 인가? Jump 발견 시 RDD 가정 위반.

4.2 사례

나이 65 cutoff (Medicare):

성별 distribution 이 cutoff 에서 연속?

인종 distribution 이 cutoff 에서 연속?

직업 distribution 이 cutoff 에서 연속?

만약 65 세 바로 직전 에 유난히 여성 많음 → continuity 위반 → confounding.

4.3 검정

각 covariate 를 outcome 처럼 RDD 분석. Cutoff 에서 jump 의 통계적 유의.

모든 covariate 의 jump 가 0 에 가까움 이 이상적.

4.4 보고

표 형식 — 각 covariate 의 cutoff jump + SE + p-value.

5 Placebo Cutoff — 가짜 cutoff 검증

5.1 동기

진짜 cutoff \(c\) 가 아니라 임의의 다른 위치 에서도 jump 가 발견된다면, RDD 결과는 spurious.

5.2 메커니즘

임의의 placebo cutoff \(c'\) 선택 (예: \(c \pm 0.5\)). Placebo cutoff 에서 RDD 분석. Jump 가 0 에 가까운가?

만약 placebo 에서도 큰 jump 발견 → 진짜 cutoff 의 jump 도 artifact 가능성.

5.3 보고

여러 placebo cutoff 의 jump distribution. 진짜 cutoff 의 jump 가 극단값 (placebo 들 중 가장 큼) 인가?

6 Bandwidth Sensitivity Analysis

6.1 동기

Bandwidth 변경 시 RDD 추정치가 안정 인가?

6.2 메커니즘

여러 bandwidth (\(h_1, h_2, ..., h_K\)) 로 RDD 분석. 추정치 + SE 의 변화 패턴.

6.3 보고

표 또는 plot — bandwidth vs RDD estimate. 안정 (변화 작음) 이 이상적.

큰 변동 시 bandwidth-dependent artifact 의심.

7 다른 진단

7.1 Donut Hole Test

Cutoff 바로 옆 sample (예: \(|X - c| < \epsilon\)) 을 제외 하고 RDD 분석. Manipulation 의심 일 때 유용.

7.2 Visual Inspection

RDD plot — outcome scatter + fitted line. Cutoff jump 가 시각적으로 명확? Cutoff 멀리 의 추세는 연속?

모든 RDD paper 에 이런 plot 첨부 표준.

8 응용 사례 — Test Score Manipulation

8.1 Almond, Doyle, Kowalski, Williams (2010)

저체중 신생아 1500g cutoff 에서 집중 치료 효과.

McCrary test: 1500g 양쪽의 birth weight density. Manipulation 없음 (1500g 정확히 측정 어려움).

RDD 결과 신뢰. NICU 입원 의 사망률 감소 효과 추정.

8.2 Camacho & Conover (2011)

Colombia 의 빈곤 지원 score cutoff. Manipulation 발견 — score 조작 가능.

McCrary test 가 significant jump. RDD 결과 biased 가능성. 결과 해석 신중.

8.3 Eggers et al. (2015)

50% 득표 cutoff (선거) 의 manipulation. 대부분의 선거 데이터에서 manipulation 없음 (정밀 측정).

예외: Mexico 의 일부 선거에서 manipulation 발견.

9 실제 분석 워크플로우

9.1 단계

Visual inspection: RDD plot 확인
McCrary density test: manipulation
Covariate balance: 다른 변수 continuity
Local linear regression with CCT: 점 추정 + SE
Bandwidth sensitivity: 안정성
Placebo cutoff: spurious jump 의심
종합 해석: 모든 진단 통과 시 RDD 결과 신뢰

9.2 보고 형식

논문 / 보고서 에 표:

Main RDD result (CCT bandwidth)

McCrary p-value

Covariate balance table

Bandwidth sensitivity (h × 0.5, 1, 2)

Placebo cutoff results

10 시뮬레이션 — McCrary Test

import numpy as np
from scipy.stats import norm

np.random.seed(42)

# Scenario 1: Manipulation 없음
n = 5000
X_no_manip = np.random.uniform(2.0, 4.0, n)

# Scenario 2: Manipulation 있음 — 3.5 바로 위로 조작
X_manip = X_no_manip.copy()
# 3.4~3.5 sample 의 30% 가 3.5~3.6 으로 이동
manip_mask = (X_manip >= 3.4) & (X_manip < 3.5) & (np.random.random(n) < 0.3)
X_manip[manip_mask] = X_manip[manip_mask] + 0.1

# McCrary test (간단 구현)
def mccrary_test(X, cutoff, h=0.2):
    """Naive density jump test."""
    left_count = ((X >= cutoff - h) & (X < cutoff)).sum()
    right_count = ((X >= cutoff) & (X < cutoff + h)).sum()
    n_total = ((X >= cutoff - h) & (X < cutoff + h)).sum()

    f_left = left_count / (n_total * h)
    f_right = right_count / (n_total * h)

    log_diff = np.log(f_right) - np.log(f_left)

    # SE (간략)
    se = np.sqrt((1/left_count + 1/right_count))
    z = log_diff / se
    p_value = 2 * (1 - norm.cdf(abs(z)))

    return log_diff, se, z, p_value

print(f"[McCrary Density Test 시뮬레이션]\n")

print(f"Scenario 1: Manipulation 없음")
log_diff, se, z, p = mccrary_test(X_no_manip, cutoff=3.5)
print(f"  log(f+) - log(f-) = {log_diff:+.3f}")
print(f"  z = {z:.2f}, p = {p:.4f}")
print(f"  → {'유의 jump (manipulation 의심)' if p < 0.05 else 'no significant jump (OK)'}")

print(f"\nScenario 2: Manipulation 있음")
log_diff, se, z, p = mccrary_test(X_manip, cutoff=3.5)
print(f"  log(f+) - log(f-) = {log_diff:+.3f}")
print(f"  z = {z:.2f}, p = {p:.4f}")
print(f"  → {'유의 jump (manipulation 의심)' if p < 0.05 else 'no significant jump'}")

# Histogram 확인
print(f"\n[Histogram bins (cutoff=3.5)]")
print(f"Scenario 1 (no manip):")
for low, high in [(3.3, 3.4), (3.4, 3.5), (3.5, 3.6), (3.6, 3.7)]:
    count = ((X_no_manip >= low) & (X_no_manip < high)).sum()
    print(f"  [{low:.1f}, {high:.1f}): {count}")

print(f"Scenario 2 (manip):")
for low, high in [(3.3, 3.4), (3.4, 3.5), (3.5, 3.6), (3.6, 3.7)]:
    count = ((X_manip >= low) & (X_manip < high)).sum()
    print(f"  [{low:.1f}, {high:.1f}): {count}")

11 결론

RDD 의 가정 검증은 McCrary density test, covariate balance, placebo cutoff, bandwidth sensitivity 의 결합. 모든 진단 통과 시 RDD 결과 신뢰. Cattaneo, Jansson, Ma (2020) 의 local polynomial density test 가 modern 표준.

핵심 메시지:

McCrary density test: manipulation 탐지
CJM (2020): local polynomial 개선
Covariate balance: continuity 검증
Placebo cutoff: spurious jump 검증
Bandwidth sensitivity: 추정치 안정성
Donut hole + Visual inspection: 추가 진단
워크플로우: 단계적 종합 분석

J-RDD 시리즈 종합:

J-RDD-0: Overview
J-RDD-1: Sharp vs Fuzzy
J-RDD-2: Local linear regression + bandwidth
J-RDD-3: McCrary + 진단 — 이 글

다음 시리즈 (J-SWITCH): Switchback / Geo / Spillover.

12 관련 주제

선행 지식

Phase J 후속 시리즈

Switchback / Geo / Spillover (placeholder)
Adaptive Trial (placeholder)

13 참고문헌

McCrary, J. (2008). Manipulation of the running variable in the regression discontinuity design: A density test. J. Econometrics 142, 698-714.
Cattaneo, M. D., Jansson, M., Ma, X. (2020). Simple local polynomial density estimators. J. Amer. Statist. Assoc. 115, 1449-1455.
Imbens, G. W. & Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. J. Econometrics 142, 615-635.
Lee, D. S. & Lemieux, T. (2010). Regression discontinuity designs in economics. J. Economic Literature 48, 281-355.
Almond, D., Doyle, J. J., Kowalski, A. E., Williams, H. (2010). Estimating marginal returns to medical care. QJE 125, 591-634.
Camacho, A. & Conover, E. (2011). Manipulation of social program eligibility. AEJ: Economic Policy 3, 41-65.
Eggers, A. C., Fowler, A., Hainmueller, J., Hall, A. B., Snyder, J. M. (2015). On the validity of the regression discontinuity design for estimating electoral effects. AJPS 59, 259-274.

1 진입 직관 — “RDD 가정이 진짜로 성립하는가?”

2 McCrary Density Test (2008)

2.1 동기

2.2 사례

2.3 검정

2.4 통계량

2.5 R Package

3 Cattaneo, Jansson, Ma (2020) — 개선 방법

3.1 동기

3.2 메커니즘

3.3 R/Python Package: rddensity

4 Covariate Balance — Continuity 검증

4.1 동기

4.2 사례

4.3 검정

4.4 보고

5 Placebo Cutoff — 가짜 cutoff 검증

5.1 동기

5.2 메커니즘

5.3 보고

6 Bandwidth Sensitivity Analysis

6.1 동기

6.2 메커니즘

6.3 보고

7 다른 진단

7.1 Donut Hole Test

7.2 Visual Inspection

8 응용 사례 — Test Score Manipulation

8.1 Almond, Doyle, Kowalski, Williams (2010)

8.2 Camacho & Conover (2011)

8.3 Eggers et al. (2015)

9 실제 분석 워크플로우

9.1 단계

9.2 보고 형식

10 시뮬레이션 — McCrary Test

11 결론

12 관련 주제

13 참고문헌

3.3 R/Python Package: `rddensity`