Kwangmin Kim - The Goodhart, Campbell, and Lucas Laws: Epistemological Pitfalls in OEC Design

1 Definition

Definition: Three laws

Three laws that warn about the epistemological risks of designing an OEC (Overall Evaluation Criterion) (Kohavi, Tang, Xu, 2020, Ch.7.4).

Goodhart's law (Goodhart 1975, Strathern 1997) “When a measure becomes a target, it ceases to be a good measure.”
Campbell's law (Campbell 1979) “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures.”
Lucas Critique (Lucas 1976) “Relationships observed in historical data cannot be considered structural — policy decisions can alter the structure.”

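All three laws stress two points: measuring something for control invites distortion, and correlation is not causation. The warning for OEC design is that the moment an OEC becomes a target, its value as a measure can erode.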

2 Concepts and principles

2.1 Goodhart's law

Original wording (1975, in the context of UK monetary policy): “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

2.1.1 Implication

Once a metric is used for decisions, people and systems start manipulating the metric directly, so its original meaning as a proxy for the true value weakens.

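2.1.2 Example 1: School exams

Originally: exam score = proxy for a student's academic ability.

Once exam scores drive graduation decisions and teacher evaluations, teachers and students game the score itself (test prep, teaching to the test). The correlation between exam scores and real ability falls.

2.1.3 Example 2: National GDP

Originally: GDP = proxy for economic activity.

Once GDP becomes a political success metric, governments push GDP directly (over-investment in infrastructure, ignoring sustainability). Its correlation with real economic value falls.

2.1.4 Example 3: Digital OEC

Originally: CTR = proxy for user satisfaction.

Once CTR becomes the optimization target of an algorithm, clickbait appears. The correlation between CTR and satisfaction falls.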

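Intuition: why this pattern is general

Root mechanism: the limits of measurement.

Every metric captures only part of the true value and also picks up factors unrelated to it. Schematically, metric = true_value + unrelated_factor.

True Value: the student's academic ability
Metric (exam score): True Value × 0.7 + Test-Prep_Skill × 0.3

While the metric is not used for decisions, the Test-Prep_Skill dimension stays small, so metric ≈ True_Value × 0.7 (a good proxy).

Once the metric is used and incentives attach to it, people actively pursue Test-Prep_Skill, and the distribution of the metric gets pulled toward test prep.

New distribution: True Value × 0.7 + Test-Prep_Skill (after gaming) × 0.3, where the second term has grown.

At that point metric ≠ True_Value × 0.7. The coefficients are unchanged; what changes is how much of the metric's variation comes from the unrelated factor. This shift in effective weight is the mechanism behind Goodhart's law.

This is the common fate of proxy metrics: the harder a proxy is used, the lower its fidelity to the true value.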
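The weighting argument above can be checked with a small simulation. This is a minimal sketch under stated assumptions, not part of the source: ability and test-prep skill are modeled as independent standard normal variables, and the hypothetical parameter `prep_scale` stands in for how much effort goes into test prep once the score carries stakes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students = 50_000

# Hypothetical latent traits, both standardized and independent (an assumption).
true_ability = rng.normal(0.0, 1.0, n_students)
test_prep_baseline = rng.normal(0.0, 1.0, n_students)


def score_vs_ability_corr(prep_scale: float) -> float:
    """Correlation between the exam score and true ability.

    prep_scale multiplies the spread of test-prep skill. Small values mimic
    a score that is not yet a target; large values mimic heavy gaming.
    """
    test_prep = prep_scale * test_prep_baseline
    score = 0.7 * true_ability + 0.3 * test_prep  # same 0.7 / 0.3 weights as the text
    return float(np.corrcoef(score, true_ability)[0, 1])


for prep_scale in (0.2, 1.0, 3.0, 6.0):
    corr = score_vs_ability_corr(prep_scale)
    print(f"prep_scale={prep_scale}: corr(score, ability) = {corr:.2f}")
```

Under these assumptions the population correlation is 0.7 / sqrt(0.49 + 0.09 × prep_scale²), so it falls from about 0.92 at prep_scale = 1 to about 0.36 at prep_scale = 6 even though the 0.7 and 0.3 coefficients never change.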

2.2 Campbell's law

Donald Campbell (1979): a stronger version of Goodhart's law, applied to social policy.

“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”

2.2.1 Goodhart vs. Campbell

Goodhart: the metric itself loses value.
Campbell: the metric corrupts the social process it measures.

In other words, Campbell argues that a metric reshapes society in the metric's own image. This is the stronger claim.

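2.2.2 Example: No Child Left Behind (NCLB) in the United States

A 2001 US education policy. School ranking = exam scores.

Results:

Test prep raised exam scores (Goodhart).
The deeper change: school curricula themselves were reshaped around the exams. Subjects outside the tests, such as music, art, and physical education, were cut back, and students internalized exam scores as the very definition of “education” (Campbell).

This corruption of the social process costs more than the distortion of the exam scores.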

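Assumption: the Campbell effect in digital OECs

The YouTube watchtime case (background knowledge).

Watchtime became the optimization target of the algorithm, so the algorithm recommended sensational and divisive content (engagement ↑).

Goodhart: watchtime weakens as a signal of real value.
Campbell: the content ecosystem itself is reshaped. Creators produce content optimized for watchtime, and users internalize sensational content as “normal”.

Why YouTube moved to satisfied watchtime, on this reading: it had detected the Campbell effect. A metric change that addresses only Goodhart is superficial; Campbell concerns change at the level of the whole system.

Campbell's implication for OEC design: an OEC reshapes user behavior and the content ecosystem itself, so the design has to consider system-level impact, not just numerical distortion.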

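2.3 Lucas critique

Robert Lucas (1976): a critique of using historical correlations to set economic policy.

2.3.1 Core message

Correlations observed in historical data reflect behavior under the policy regime in force at the time.

Change the policy and people change their behavior, which produces new correlations. The old ones break.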

2.3.2 The Phillips curve case

Phillips (1958): a strong correlation in UK data from 1861 to 1957.

Inflation ↑ → Unemployment ↓
Inflation ↓ → Unemployment ↑

Policy hypothesis drawn from this correlation: “Deliberately raising inflation can lower unemployment.”

US recession of 1973-1975: policy pursued inflation. Result:

Inflation ↑
Unemployment also ↑ (the opposite of the prediction)
“Stagflation” emerged

The assumption broke. The Phillips correlation held only before inflation became a policy variable. Once it was used as a policy lever, people formed inflation expectations, wage and price setting changed, and the correlation disappeared.
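One common textbook way to make this mechanism precise is the expectations-augmented Phillips curve. The sketch below is not taken from the source, and its notation is an assumption: \pi_t is inflation, \pi_t^{e} expected inflation, u_t unemployment, u^{*} the natural rate, \beta > 0 a slope, and \varepsilon_t, \eta_t are shocks.

```latex
\begin{align*}
\pi_t &= \pi_t^{e} - \beta\,(u_t - u^{*}) + \varepsilon_t, \qquad \beta > 0 \\[4pt]
\text{Expectations fixed } (\pi_t^{e} = \bar{\pi}):\quad
  u_t &= u^{*} - \tfrac{1}{\beta}\,(\pi_t - \bar{\pi} - \varepsilon_t)
  \;\Rightarrow\; \text{higher } \pi_t \text{ goes with lower } u_t \\[4pt]
\text{Expectations adapt } (\pi_t^{e} = \pi_t - \eta_t):\quad
  u_t &= u^{*} - \tfrac{1}{\beta}\,(\eta_t - \varepsilon_t)
  \;\Rightarrow\; u_t \text{ no longer depends on the level of } \pi_t
\end{align*}
```

In the first regime the historical data show the inflation-unemployment trade-off. In the second, only the surprise \eta_t moves unemployment, so a policy of persistently high inflation buys no reduction in unemployment.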

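Intuition: what the Lucas critique implies for OECs

Lesson of the Phillips curve: building policy on a historical correlation is risky.

Applied to OECs:

1. Historical data shows a strong correlation between metric A and goal B.
2. A is adopted as an OEC driver.
3. Teams pursue A aggressively.
4. The A-B correlation weakens (Goodhart), or the meaning of A itself changes (Campbell).
5. The OEC is no longer a proxy for the goal.

Remedy: causal validation (Ch.6.3). Treat a historical correlation as the starting point for a hypothesis, then test it with a causal experiment (the metric validation experiment of Ch.6.3). Re-check the correlation whenever the OEC is periodically refreshed.

Skip this process and the OEC eventually loses its value to Goodhart and Campbell effects.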

2.4 Tim Harford's Fort Knox analogy

Harford (2014, 147): “Fort Knox has never been robbed, so we can save money by sacking the guards.”

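2.4.1 The message of the analogy

Looking only at historical data (“never robbed”) leads to the conclusion that the guards are unnecessary. But if policy changes on that conclusion (the guards are sacked):

Robbers reassess their odds of success
Attempts ↑
The outcome becomes “robbed”

The historical “never robbed” is the outcome with guards in place; the outcome after removing them is different. This generalizes the Lucas critique.

2.4.2 Applying it to digital OECs

“Nobody tried to game this metric when we used it for decisions” holds only while the metric is hidden rather than the OEC. Once the OEC is made public, gaming attempts appear.

So when evaluating an OEC candidate, measurements taken while it is still hidden are not enough. Its behavior once it becomes the OEC has to be simulated.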

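2.5 What correlation ≠ causation implies for OECs

The core of all three laws: designing an OEC from historical correlation is risky.

2.5.1 Five approaches to causal validation (restated from Ch.6.3)

1. Comparison with surveys/UER
2. Observational data (can invalidate)
3. Replication at other companies
4. Metric validation experiments (establish causality)
5. Corpus of historical experiments

Approach 4 is the strongest form of validation because it is a direct causal experiment. The other four are supporting evidence.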

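2.5.2 Example: Customer loyalty program

Hypothesis: the loyalty program raises retention.

Historical data: loyalty members show higher retention than non-members.

Pitfall (Lucas): loyalty members may already have been more inclined to stay (self-selection), so the program may not be the cause.

Validation: roll the loyalty program out at random, offering it to a randomly chosen subset of users.

Treatment retention minus Control retention = causal effect
Large, statistically significant difference → causal relationship established
Small or no difference → hypothesis rejected; do not use loyalty membership as a retention proxy in the OEC

Defining the OEC from historical correlation without this causal validation falls into the Lucas critique trap.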
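The gap between the historical comparison and the randomized rollout can be illustrated with synthetic data. This is a sketch under assumptions that are not in the source: a hypothetical latent `engagement` drives both program sign-up and retention, and the program's true causal lift is fixed at `TRUE_EFFECT`.

```python
import numpy as np

rng = np.random.default_rng(7)
n_users = 200_000
TRUE_EFFECT = 0.02  # assumed causal lift in retention probability

# Hypothetical latent engagement drives both joining and retention.
engagement = rng.normal(0.0, 1.0, n_users)
base_retention = 1.0 / (1.0 + np.exp(-(engagement - 0.5)))


def retained(in_program: np.ndarray) -> np.ndarray:
    """Draw one retention outcome per user given program membership."""
    p = np.clip(base_retention + TRUE_EFFECT * in_program, 0.0, 1.0)
    return rng.random(n_users) < p


# Observational world: engaged users self-select into the program.
joined = rng.random(n_users) < 1.0 / (1.0 + np.exp(-2.0 * engagement))
obs_outcomes = retained(joined)
naive_diff = obs_outcomes[joined].mean() - obs_outcomes[~joined].mean()

# Experimental world: membership is assigned by coin flip.
assigned = rng.random(n_users) < 0.5
exp_outcomes = retained(assigned)
rct_diff = exp_outcomes[assigned].mean() - exp_outcomes[~assigned].mean()

print(f"naive (self-selected) difference: {naive_diff:.3f}")
print(f"randomized difference:            {rct_diff:.3f}")
print(f"assumed true effect:              {TRUE_EFFECT:.3f}")
```

In this setup the naive member vs. non-member gap is many times larger than the assumed effect, because it mostly reflects who chose to join, while the randomized difference lands close to `TRUE_EFFECT`.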

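3 Why it matters

Designing an OEC without awareness of the three laws:

Goodhart trap: the OEC gradually loses value after adoption, and it can take years to notice.
Campbell trap: corruption at the system level, hard to reverse once discovered.
Lucas trap: an OEC built on historical correlation breaks once it is applied.

Designing with awareness of the three laws:

Periodic OEC validation and refresh
Monitoring for gaming attempts
Causal validation first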
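Periodic validation can be as simple as tracking how well OEC movements still predict goal movements over time. The following is a minimal sketch, not a method from the source: the function names, the window size, and the `max_drop` threshold are all hypothetical, and the data are synthetic per-experiment deltas in which gaming starts in window 6.

```python
import numpy as np


def windowed_correlations(oec: np.ndarray, goal: np.ndarray, window: int) -> np.ndarray:
    """Pearson correlation of oec vs. goal in consecutive, non-overlapping windows."""
    n_windows = len(oec) // window
    corrs = np.empty(n_windows)
    for i in range(n_windows):
        sl = slice(i * window, (i + 1) * window)
        corrs[i] = np.corrcoef(oec[sl], goal[sl])[0, 1]
    return corrs


def flag_proxy_decay(corrs: np.ndarray, baseline_windows: int, max_drop: float) -> list[int]:
    """Indices of later windows whose correlation fell more than max_drop below baseline."""
    baseline = corrs[:baseline_windows].mean()
    return [
        i for i, c in enumerate(corrs)
        if i >= baseline_windows and baseline - c > max_drop
    ]


# Synthetic data (assumption): 10 windows of 200 experiment deltas each.
rng = np.random.default_rng(1)
window, n_windows = 200, 10
true_effect = rng.normal(0.0, 1.0, window * n_windows)
goal_delta = true_effect + rng.normal(0.0, 0.5, true_effect.size)
# From window 6 on, the OEC also moves for reasons that carry no goal value.
gaming_scale = np.repeat([0.0] * 6 + [2.0] * 4, window)
oec_delta = true_effect + gaming_scale * rng.normal(0.0, 1.0, true_effect.size)

corrs = windowed_correlations(oec_delta, goal_delta, window)
alerts = flag_proxy_decay(corrs, baseline_windows=4, max_drop=0.2)
print(np.round(corrs, 2))
print("windows flagged for re-validation:", alerts)
```

With these parameters the population correlation drops from about 0.89 to about 0.40 once gaming starts, so the gamed windows are the ones flagged. A flag is only a prompt to rerun causal validation; a correlation drop alone does not say why the proxy decayed.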

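4 Applications

4.1 YouTube: from Goodhart to satisfied watchtime

Phase 1: Watchtime as the target → the algorithm recommends sensational/clickbait content (Goodhart).
Phase 2: User surveys measure what share of watchtime is satisfying → gaming is identified.
Phase 3: The OEC evolves to satisfied watchtime, weighting surveys and behavioral signals (skip, dislike).

This evolution follows the pattern of discovering a Goodhart effect and then correcting for it.

4.2 The Campbell effect in school ranking

The US school system changed after NCLB was introduced. The cutback of non-tested subjects is a corruption of the social process.

Later policy corrections introduced multiple metrics (creativity, collaboration, and so on), but the system-level impact is hard to undo.

4.3 Lucas validation at Bing

Bing case (Ch.7.2): historically, distinct queries were positively associated with revenue, but in the ranker bug case distinct queries rose while user satisfaction fell (long-term churn).

Validation: A/A tests, segment analysis, and long-term holdouts were used to reassess the real value of distinct queries.

Conclusion: distinct queries are not used alone as the OEC. The OEC evolved to sessions per user (under constraints).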

5 Example: Lucas critique simulation

import numpy as np

rng = np.random.default_rng(42)

# Phase 1: the OEC is hidden. Measure the correlation.
# Synthetic historical data: metric A (CTR) and goal B (LTV) share one driver.
n_users = 10_000
hidden_quality = rng.normal(0, 1, n_users)  # true value
ctr = 0.05 + 0.02 * hidden_quality + rng.normal(0, 0.005, n_users)
ltv = 100 + 30 * hidden_quality + rng.normal(0, 10, n_users)

corr_phase1 = np.corrcoef(ctr, ltv)[0, 1]
print(f"Phase 1 (CTR hidden): corr(CTR, LTV) = {corr_phase1:.3f}")

# Phase 2: CTR is the OEC. The algorithm chases CTR (clickbait).
# One shared exposure variable: the clickbait that lifts a user's CTR also
# lowers that same user's LTV, independently of hidden_quality.
clickbait = rng.uniform(0, 1, n_users)
ctr_with_clickbait = ctr + 0.04 * clickbait    # clickbait inflates CTR
ltv_after_clickbait = ltv - 60 * clickbait     # dissatisfaction lowers LTV

corr_phase2 = np.corrcoef(ctr_with_clickbait, ltv_after_clickbait)[0, 1]
print(f"Phase 2 (CTR is OEC): corr(CTR, LTV) = {corr_phase2:.3f}")

# Result: the correlation weakens, which is the Lucas critique in miniature.
print(f"\nCorrelation weakened: {corr_phase1:.3f} → {corr_phase2:.3f}")
print("→ Once CTR becomes the OEC, its link to LTV weakens")

Approximate output. The figures below are the population correlations implied by the variances in the code (0.6/√0.425 and 0.4/√0.726); an actual run prints three decimals that differ slightly from these.

Phase 1 (CTR hidden): corr(CTR, LTV) ≈ 0.92
Phase 2 (CTR is OEC): corr(CTR, LTV) ≈ 0.47

Intuition: the message of the simulation

Phase 1 (CTR hidden): strong CTR ↔︎ LTV correlation (about 0.92). Users behave naturally: they click when quality is good.

Phase 2 (CTR is the OEC and the algorithm chases CTR):
Clickbait raises CTR (even low-quality items attract clicks)
User dissatisfaction lowers LTV
Result: the CTR-LTV correlation weakens (about 0.47)

This pattern is a quantitative illustration of the Lucas critique. Adopt CTR as the OEC on the strength of the historical correlation (about 0.92), and after adoption the correlation falls to about 0.47. The OEC's value as an LTV proxy has eroded.

Remedies (restated from Ch.6.3 and Ch.7.2):

Multi-metric ensemble: CTR + satisfied CTR + an LTV-aware metric
Periodic OEC refresh: evolve the OEC once Phase 2 is detected
Causal validation: causal experiments, not just historical correlation

OEC design is a lifecycle, not a one-off. The three laws show why that lifecycle is needed.
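The multi-metric remedy can be sketched as a normalized weighted sum with a guardrail checked first. Everything here is a hypothetical illustration: the metric names, scales, weights, and the LTV floor are invented for the example and are not values from the source.

```python
def combined_oec(deltas: dict[str, float], scales: dict[str, float],
                 weights: dict[str, float]) -> float:
    """Weighted sum of metric deltas, each divided by a typical scale."""
    return sum(weights[m] * deltas[m] / scales[m] for m in weights)


def ship_decision(deltas: dict[str, float], scales: dict[str, float],
                  weights: dict[str, float], guardrails: dict[str, float]) -> bool:
    """Ship only if no guardrail floor is breached and the combined OEC is positive."""
    for metric, floor in guardrails.items():
        if deltas[metric] < floor:
            return False
    return combined_oec(deltas, scales, weights) > 0


# Hypothetical clickbait-like change: CTR up, satisfaction and LTV down.
deltas = {"ctr": 0.004, "satisfied_ctr": -0.001, "ltv": -1.5}
scales = {"ctr": 0.005, "satisfied_ctr": 0.005, "ltv": 10.0}
weights = {"ctr": 0.2, "satisfied_ctr": 0.4, "ltv": 0.4}
guardrails = {"ltv": -1.0}  # do not ship if LTV drops by more than 1.0

print("combined OEC:", round(combined_oec(deltas, scales, weights), 3))
print("ship?", ship_decision(deltas, scales, weights, guardrails))
```

Here the combined score is slightly positive because the CTR gain outweighs the other terms, yet the change is blocked by the LTV guardrail. Checking constraints before the weighted sum keeps a large gain on a gameable metric from buying its way past a real loss.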

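6 Wrapping up the Ch.7 series

Core principles of the whole series (F7-0 to F7-2).

1. Experimentation metric ≠ business metric: four properties (measurable, attributable, sensitive, timely)
2. A limit of five metrics: the Otis Redding trap plus multiple testing
3. Multiple metrics → a single OEC: normalization, weighted sum, four-scenario classification
4. Constraints first: an unconstrained metric almost always gets gamed
5. Goodhart awareness: a measure loses value once it becomes a target
6. Campbell awareness: reshaping at the system level
7. Lucas awareness: the fragility of historical correlation
8. OEC lifecycle: periodic validation, evolution, causal validation

Together these eight make the OEC work as a strategic tool rather than a mere evaluation tool.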

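7 Related topics

Prerequisites: Ch.7 series

Next chapter

F8-*: Institutional memory (Ch.8), tracking OEC evolution

Related chapters

F6-*: Organizational metrics + gameability
F19-*: A/A test (Ch.19), validating the metric itself

Links to other categories

Causal Inference: the general framework for causal validation
Statistics, Confounding: self-selection bias
Strategy Frameworks: governance of KPI evolution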