EF
Status Report · Week 6 of 9

Bronze layer closed · IR Phase 1 ships · Boston in-person ratified · NTE threshold crossed

Monday opened at EF Boston with back-to-back in-person sessions (Rich canonical model sync + SFMC strategy with Vivek + Grabbe + Rich). Tuesday parallelised the remaining bronze ports across 5 PRs covering 32 of 33 models, and shipped the parity validation framework (#324). Wednesday added the Splink IR Python pipeline scaffold (PR #4185), the IR normalize macros (PR #4178) and the survivorship-chain audit. Thursday merged PR #4177 (9 contact_point models, unblocking the #158 close-out), provisioned EF_SPLINK_WH (Snowpark-optimized MEDIUM, 16x memory) and closed 6 RLT tickets; the Rich Thursday alignment call confirmed DC 9.0% ↔ Snowflake 8.97% match parity. Friday shipped IR Phase 1 as pure-SQL deterministic ER (PR #4178; AC7 PASS at 15,182,794 unified IDs vs DC's 15,104,418, +0.52%, inside the ±2% band) and built the SPCS Splink Phase 2 infrastructure end-to-end (image pushed, OAuth, full data pull). Phase 2 is blocked because the HIGH_MEMORY_X64_M instance family is not available in eu-central-2.
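
The parity harness (±0.1%) and AC7 (±2%) both reduce to a symmetric relative-band check against the DC baseline. The sketch below reproduces the AC7 arithmetic in Python. It assumes the band limits are rounded to the nearest whole ID, and `BandCheck` is a hypothetical name, not the API of scripts/parity/.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BandCheck:
    """Compare a measured count with a baseline inside a symmetric relative band.

    Hypothetical helper: tolerance=0.02 models the AC7 +/-2% band and
    tolerance=0.001 the +/-0.1% parity tolerance.
    """

    actual: int
    baseline: int
    tolerance: float

    @property
    def delta_pct(self) -> float:
        # Signed relative difference from the baseline, in percent.
        return (self.actual - self.baseline) / self.baseline * 100.0

    @property
    def bounds(self) -> tuple[int, int]:
        # Inclusive band limits, rounded to whole IDs (an assumption).
        return (
            round(self.baseline * (1 - self.tolerance)),
            round(self.baseline * (1 + self.tolerance)),
        )

    @property
    def passed(self) -> bool:
        low, high = self.bounds
        return low <= self.actual <= high


ac7 = BandCheck(actual=15_182_794, baseline=15_104_418, tolerance=0.02)
print(ac7.bounds, round(ac7.delta_pct, 2), ac7.passed)
# (14802330, 15406506) 0.52 True
```

With tolerance=0.001 the same counts fail, since +0.52% exceeds ±0.1%.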

W6 · Day 5 of 5 · P1 ✅ · P2 IR Phase 1 ship · P3 canonical un-blocked · ~145 h · ~$24,000 · ~96% NTE · AC7 PASS · 6 RLT closed
Plan

9-week build · IR Phase 1 ship + bronze fully closed

Phase 1 (bronze staging, P1) closed at the end of W6 with PR #4177 merged and #158 hitting 54/54 parity pairs PASS. The IR pipeline (#161) shipped as Phase 1 pure-SQL deterministic ER on PR #4178 (AC7 PASS · within ±2% band). SPCS Splink Phase 2 is built end-to-end but blocked on instance-family availability. Canonical model body work (#175) can start once IR clusters are mounted by #175a.

P0 · W1 Foundation
P1 · W2-6 Normalize + Replication
P2 · W6 IR Phase 1
P3 · W6-7 Canonical + CIs
P4 · W7 SFMC + Marts
P5 · W8 Cutover
P6 · W9 + Jul Handoff

Receiving team unchanged: Mike Grabbe + Adrian LeDoux (ET Data Engineering). Rich Thursday alignment call validated the Snowpark WH split and the DC ↔ Snowflake match parity. NTE 95% escalation threshold crossed this week.

Done this week

What landed in W6

  1. Boston in-person ×2 (W6 D1 Mon). Rich canonical model sync: 3 decisions ratified (Global_Rewards ET-canonical · highest_attained ET-only · revenue upstream removal). SFMC strategy session with Vivek + Grabbe + Rich (1h): global business rules in Snowflake (15-18M → 1-2M), segments in git for AI translation, Prefect + Python activation, Fivetran Activations $80K/yr eval, OneTrust unsub redesign.
    Session logs: docs/sessions/2026-06-01-rich-canonical-model-sync.md + docs/sessions/2026-06-01-sfmc-strategy-vivek-grabbe-rich.md
  2. RLT-3662 revenue removal + 5 RT-scaffold tickets (W6 D1). Revenue columns removed across 8 stg sales / orders models (commit ee5cd2562 on PR #4107, +23/-140, 3-layer QA PASS). 5 RT-scaffold tickets staged on canonical PR #4083: RLT-3969 RT2-OR Global_Rewards ×3 booleans · RLT-3976 RT2d last_year_hosted · RLT-3970 RT3s Lead_Source / Lead_Source_Detail / Utm_Source · RLT-3990 NBSP fix in seed (PR #4159) · RLT-3991 closed_lost_reason seed + macro (PR #4160, 171 rows × 4 cols × 7 BUs). All transitioned to Under Review.
    PR #4107 (revenue) · PR #4083 (RT scaffolds) · PR #4159 (NBSP) · PR #4160 (closed_lost_reason)
  3. #158 staging port completion across 5 PRs (W6 D2 Tue). 32 of 33 models opened as batch PRs: #4176 contacts (8 ports, 2,685 lines) · #4177 contact_points (9 ports, 1,333 lines) · #4180 variants (11 ports, 1,877 lines) · #4179 entity-specific (3 of 4) · #4181 HSEY order (final entity-specific). Plus #322 HSEY ORDER replication SOLVED end-to-end: the US task was created via Snowflake Scripting (ALTER TASK COPY_ROOT SUSPEND → CREATE OR REPLACE TASK COPY__SALESFORCE_HSEY__ORDER → RESUME) and the CH side was refreshed via a manual ALTER REPLICATION GROUP REFRESH. 168,508 rows synced US↔CH. PR #4181 replaced the 47-line placeholder with a 156-line real port. 10 GH tickets closed (#155 / #156 / #138 / #140 / #225 / #293 / #294 / #296 / #300 / #318).
    5 PRs · bot triage round 2 fixed 17 sqlfluff violations · #322 closed Done
  4. Parity validation framework PR #324 + GH #323 (W6 D2). Python harness under scripts/parity/ (connections, snapshots, compare, report, run_parity) per spec specs/build/parity-validation-framework.md (228 lines, ±0.1% tolerance, pinned snapshot timing). 4 entity-family configs. First 8 spot-check parity runs all PASS ±0.1% (CCAP × 4 entities deltas -0.0088 to -0.0142% · Academy sales_order -0.0016% · GY × 3 entities 0.0000% exact). Reports under docs/parity-reports/spot-checks-2026-06-02/.
    PR #324 · GH #323 · 8 spot-checks logged
  5. Spec #161 + IR Phase 1 macros PR #4178 + IR Phase 2 scaffold PR #4185 (W6 D2-D3). 207-line spec specs/build/161-splink-ir-pipeline.md with 10 ACs + 6-phase plan. PR #4178 = 3 normalize macros (normalize_email / normalize_phone / normalize_name, idempotent + null-safe). PR #4185 = Splink 4 SettingsCreator (171 lines · match rule per Rich 2026-05-06 · threshold 0.95) + 2 Python dbt models (predict + cluster) + int_ir__contacts_normalized.sql (230 lines · UNION ALL of 9 BU CTEs with normalize macros applied · ST EFSA carve-out per RLT-3577).
    PRs #4178 + #4185 · spec landed · Phase 2-4 authored
  6. #4177 merged + #158 closed (W6 D4 Thu). PR #4177 (9 contact_point models) merged into main at 08:37 UTC as commit 0870d7558 after a lint-fix cycle (405 violations · 400 auto-fixed · 5 manual AL03 fixes). dbt Cloud job 611029 auto-built into EF_DBT_PROD.EF_DATA_HUB_CH_STAGING at 08:38-08:39 UTC. All 9 contact_point parity pairs verified PASS ±0.1% (max delta -0.0385% on Language phone, attributed to Fivetran lag). #158 closed with 54/54 parity pairs PASS. Report: docs/parity-reports/post-merge-2026-06-04-contact-points/REPORT.md.
    PR #4177 merged · #158 closed · 54/54 parity
  7. EF_SPLINK_WH provisioned + EF_DATA_HUB_RAW_TO_CH_RG identified (W6 D4). Created a Snowpark-optimized warehouse via Snowflake Scripting EXECUTE IMMEDIATE: SNOWPARK-OPTIMIZED MEDIUM, MEMORY_16X, auto_suspend 60s, INITIALLY_SUSPENDED. Grants: SYSADMIN / MCP_READER USAGE; EF_DBT_{DEV,QA,PROD}_RW USAGE; EF_DBT_PROD_RW OPERATE. 6 RLT tickets closed in Jira (RLT-3991 / RLT-3662 / RLT-3577 / RLT-3989 / RLT-3579 / RLT-3578) with evidence-cited Done transitions. Rich Thursday call: DC 9.0% ↔ Snowflake 8.97% match parity = 0.03pp delta.
    EF_SPLINK_WH live · 6 RLT Done · Rich alignment call docs/sessions/2026-06-04-rich-alignment-call.md
  8. IR Phase 1 SHIPS as pure-SQL deterministic ER (W6 D5 Fri): AC7 PASS. Two Splink runtimes were exhausted first: (1) a dbt Python model in the Snowflake UDF heap (~2-4GB cap; OOMed at every sample size, including 10k) and (2) a Splink runtime on Snowpark Container Services (built end-to-end; OOMed in Splink predict() at the 30Gi limit of the CPU_X64_L pool, the largest instance family available in eu-central-2). The work then pivoted to pure-SQL email-exact deterministic ER on PR #4178. Measured AC7: 15,182,794 unified IDs on 16,329,245 source rows = 6.93% consolidation. DC baseline: 15,104,418 / 9.30%. Delta vs DC: +0.52%, well inside the ±2% AC7 band [14,802,330 - 15,406,506] → PASS. Per-run cost: $0 incremental (existing dbt WH). The PR description and close-out comment carry the validation table.
    PR #4178 · CI green except Cortex review gate awaiting eftours/data-engineering team approval
  9. SPCS Phase 2 infrastructure built + escrowed (W6 D5). scripts/splink-spcs/ committed for handoff: Dockerfile (python:3.11-slim + Splink 4.0.16 + DuckDB 1.1.3 + sf-connector 3.13, amd64) · run_splink_ir.py with DC match rule (email exact + lastname exact + firstname JW @0.9/@0.8 fuzzy) + SPCS-OAuth + local-dev dual auth · ac7_parity_check.py post-run gate · service_spec.yaml 28Gi req / 30Gi limit · deploy.sql idempotent DDL · README. Image pushed to ef_dbt_prod.ef_data_hub_ch_intermediate.splink_images / splink-runner@sha256:483ae744…. Runs 001→003 progressed past every soft blocker (OAuth ✓ · role/db wiring ✓ · full 16.3M data pull in 152s ✓ · Splink starts ✓) and OOMed during predict() at the 30Gi pool memory cap. Hard blocker: HIGH_MEMORY_X64_M (64GB) and HIGH_MEMORY_X64_L (128GB) instance families are not available in eu-central-2 as of 2026-06-05. Spec specs/build/176-splink-runtime-pivot.md close-out section documents escalation paths.
    Spec #176 close-out (commit 00f766b) · image escrowed · escalation paths: SF support ticket / GH Actions 128GB runner / r6i.4xlarge EC2
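
Item 8's pure-SQL ER keys on an exact, normalized email. A minimal Python sketch of that idea on in-memory rows follows. `normalize_email` is a hypothetical stand-in for the dbt macro of the same name (its trim / lowercase / single-"@" rules are assumptions, not the macro body), and picking the smallest source ID as the unified ID is an illustrative choice, not necessarily what PR #4178 does.

```python
from typing import Iterable, Optional


def normalize_email(raw: Optional[str]) -> Optional[str]:
    """Hypothetical analog of the normalize_email macro: null-safe and idempotent.

    Assumed rules: trim, lowercase, and map anything without exactly one
    non-leading, non-trailing '@' to None.
    """
    if raw is None:
        return None
    email = raw.strip().lower()
    if email.count("@") != 1 or email.startswith("@") or email.endswith("@"):
        return None
    return email


def email_exact_clusters(rows: Iterable[tuple[str, Optional[str]]]) -> dict[str, str]:
    """Map each source row ID to a unified ID by exact match on normalized email.

    Rows sharing an email get the smallest source ID in their group (like
    MIN(id) OVER (PARTITION BY email) in SQL); rows with no usable email
    stay singletons. Source IDs are assumed unique.
    """
    rows = list(rows)
    first_by_email: dict[str, str] = {}
    for source_id, raw_email in rows:
        email = normalize_email(raw_email)
        if email is not None:
            current = first_by_email.get(email)
            if current is None or source_id < current:
                first_by_email[email] = source_id

    unified: dict[str, str] = {}
    for source_id, raw_email in rows:
        email = normalize_email(raw_email)
        unified[source_id] = first_by_email[email] if email is not None else source_id
    return unified


rows = [
    ("a1", " Jane.Doe@Example.com "),
    ("b7", "jane.doe@example.com"),
    ("c3", None),
    ("d4", "not-an-email"),
]
print(len(set(email_exact_clusters(rows).values())))  # 3
```

Rows without a usable email stay singletons instead of collapsing into one shared NULL cluster; placeholder addresses remain a way for junk clusters to form, which the Phase 1.1 blocklist targets.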
Pending external action

Blocked-on-others

Adrien · review queue
6 PRs still in queue · #4083 chain blocks canonical closures

PRs #4083 (canonical scaffold + 4 RT macros), #4099 (4-BU accounts), #4106 (8-BU opportunities + Phase B macros), #4107 (8-BU sales_orders + Phase B macros), #4178 (IR Phase 1 pure-SQL ER + AC7 PASS), #4185 (IR Phase 2 Splink scaffold). PR #4177's Thursday merge unblocked the #158 closure. Rich committed on Thursday to follow up with Adrien on the remaining queue.

Critical path: #4083 chain blocks #175 canonical body un-draft + 4 RT-extension ticket closures. #4178 Cortex AI gate is a manual data-engineering team approval (not a code failure).
Snowflake support · infra ceiling
HIGH_MEMORY pool family in eu-central-2

SPCS Splink full DC-rule run blocked on instance family availability. CPU_X64_L (32GB) is the largest available in eu-central-2 and is insufficient for 16.3M-row Splink predict(). Need SF support ticket to enable HIGH_MEMORY_X64_M (64GB) or HIGH_MEMORY_X64_L (128GB) for account HH82036. Fallback paths if not approved: GH Actions ubuntu-latest-32-cores-128gb runner OR r6i.4xlarge EC2 one-shot — same Docker image, same script, same DC match rule.

Phase 1 SQL ER already PASSes AC7, so this is an improvement, not a blocker: the aim is to move from the 6.93% deterministic email-only consolidation rate to DC's exact 9.30%.
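
The SPCS path would run the full DC match rule, whose first-name term is fuzzy (JW @0.9 / @0.8) rather than exact. The sketch below shows how a pair of first names falls into those levels using the textbook Jaro-Winkler definition (prefix scale 0.1, common prefix capped at 4). It assumes names are already normalized (for example by normalize_name); `first_name_level` and its level labels are hypothetical, and scores from the DuckDB backend used by run_splink_ir.py may differ in edge cases. In Splink such levels feed estimated match weights; this only shows the bucketing.

```python
from typing import Optional


def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only match if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    transpositions, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the shared prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


def first_name_level(a: Optional[str], b: Optional[str]) -> str:
    """Hypothetical bucketing of a first-name pair into 'JW @0.9 / @0.8' levels."""
    if a is None or b is None:
        return "null"
    if a == b:
        return "exact"
    score = jaro_winkler(a, b)
    if score >= 0.9:
        return "jw>=0.9"
    if score >= 0.8:
        return "jw>=0.8"
    return "else"


print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
print(first_name_level("DWAYNE", "DUANE"))  # jw>=0.8
```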
Rich + Vivek · canonical + SFMC
Pending sign-offs + sending-domain kickoff

Rich: ratify RLT-3972 reclassification before #175a canonical body materializes the field · confirm RLT-3990 NBSP fix · confirm RLT-3991 5 HSEY closed-lost codes. Vivek: kick off the sending-domain SF-side change (2-4 wk lead time). Boston Monday SFMC session confirmed AI translation of segments + Snowflake business rules as the architecture; awaiting Vivek's domain action.

All tracked in Jira + GH counterparts. RLT-3990 / RLT-3970 / RLT-3972 / RLT-3976 / RLT-3812 / RLT-3969 reviewed Thursday — blocked on Rich or on the #4083 chain.
Budget snapshot

$25k NTE · spent vs remaining

  • NTE cap: $25,000 (SOW Amendment #2, 29 Apr)
  • Spent to date: $23,910 · ~145 hours · ~95.6% of NTE
  • Remaining: $1,090 · ≈ 7.8 h at the full-time rate ($140/h)
  • W6 burn: ~35 h · ~$5,000 · Boston + bronze closure + IR ship + SPCS attempt + 6 RLT closed

NTE consumption

95% threshold CROSSED. Per the CLAUDE.md rule, STOP + ESCALATE before taking on any further new scope. Remaining headroom (~$1,090) is ~8 hours at the full-time rate, ~7 hours at the part-time rate. Realistic landing: the canonical model bodies and the CI build for the canonical individual entity (covered by Phase D) depend on #4083 merging and Adrien's review queue draining. Cutover (P5) and handoff hypercare (P6) are AT RISK on NTE if scope holds; a re-amendment conversation is recommended ahead of W7.
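
The headroom figures are straightforward arithmetic against the cap. A minimal sketch, assuming the $140/h full-time rate and a 95% escalation threshold expressed as a fraction of the $25,000 NTE; `nte_status` is a hypothetical helper, not project tooling.

```python
from typing import NamedTuple


class NteStatus(NamedTuple):
    consumed_pct: float     # share of the NTE cap already spent, in percent
    remaining_usd: float    # dollars left under the cap
    remaining_hours: float  # remaining dollars expressed as hours at hourly_rate
    escalate: bool          # True once consumption reaches the escalation threshold


def nte_status(spent: float, cap: float, hourly_rate: float, escalate_at: float = 0.95) -> NteStatus:
    consumed = spent / cap
    remaining = cap - spent
    return NteStatus(
        consumed_pct=round(consumed * 100, 1),
        remaining_usd=round(remaining, 2),
        remaining_hours=round(remaining / hourly_rate, 1),
        escalate=consumed >= escalate_at,
    )


print(nte_status(spent=23_910, cap=25_000, hourly_rate=140))
```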

Scope reality vs $25k NTE

Bucket | SOW low (h) | Actual h | Actual $ | Status
P0 · M1 Foundation (W1) | 16 | 14.0 | $2,040 | Done
P1 · dbt #1 normalize + replication (W2-6) | 25 | 112.0 | $16,180 | Done · #158 closed 54/54 parity
P2 · IR Phase 1 SQL ER (W6) | 37 | 18.0 | $2,600 | AC7 PASS · PR #4178
P3 · Canonical scaffold + RT extensions (W5-6) | 37 | ~5.0 | ~$700 | Scaffold + RT macros landed · bodies pending
P4 · SFMC + Marts (W7) | 56 | 0 | $0 | Upcoming · gated on segments decision
P5 · Cutover (W8) | 14 | 0 | $0 | Upcoming · AT RISK on NTE
P6 · Handoff + Hypercare (W9 + Jul) | 18 | 0 | $0 | Upcoming · AT RISK on NTE

Trace: at 95% of NTE at the end of W6, the engagement needs an explicit scope discussion. Options ahead of W7: (a) Amendment #3 to fund P4 / P5 / P6 · (b) explicit prioritisation among canonical bodies / CIs / SFMC / cutover to fit the remaining $1.1k · (c) descope P6 hypercare. The Phase 1 bronze and IR Phase 1 deliverables are complete and shippable as-is.

What's next

W7 (8 Jun – 12 Jun) · canonical bodies + scope conversation

Build deliverables · W7 (NTE-permitting)
What lands if scope holds
  • Canonical individual body · materialize int_canonical__individual from int_ir__splink_clusters + int_ir__unified_individual_membership with RT1/RT2/RT3s/RT4 survivorship across the 85 locked UI fields
  • Adrien queue drain · merge #4083 + #4099 + #4106 + #4107 + #4178 + #4185 → unlocks 4 RT-extension ticket closures + canonical body work
  • Phase 1.1 polish on IR · placeholder-email blocklist expansion (*@ef.com staff + noreply@* + aaa@aaa.com family) → splits ~2,500 junk-cluster rows back to singletons
  • SPCS Phase 2 escalation · file Snowflake support ticket for HIGH_MEMORY pool family in eu-central-2; if denied, wire up GH Actions runner fallback
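
The Phase 1.1 blocklist can be expressed as shell-style patterns applied before emails become match keys. A minimal Python sketch using the standard library's fnmatchcase; the pattern tuple and both function names are hypothetical, and since the "aaa@aaa.com family" is not enumerated in this report, only that literal address is listed.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical patterns taken from the Phase 1.1 bullet. The rest of the
# "aaa@aaa.com family" is not specified, so only the literal address appears.
PLACEHOLDER_PATTERNS = ("*@ef.com", "noreply@*", "aaa@aaa.com")


def is_placeholder(email: Optional[str], patterns: tuple[str, ...] = PLACEHOLDER_PATTERNS) -> bool:
    """True if the trimmed, lowercased email matches any blocklist pattern."""
    if email is None:
        return False
    candidate = email.strip().lower()
    return any(fnmatchcase(candidate, pattern) for pattern in patterns)


def clustering_email(email: Optional[str]) -> Optional[str]:
    """Email to use as the exact-match key, or None so the row stays a singleton."""
    if email is None or is_placeholder(email):
        return None
    return email.strip().lower()


print(clustering_email("NoReply@shop.example"))  # None
print(clustering_email(" Jane@Example.com "))  # jane@example.com
```

fnmatchcase runs on already-lowercased input so matching is case-insensitive on every platform; plain fnmatch.fnmatch normalizes case via os.path.normcase, which only folds case on Windows.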
Stakeholder asks · W7
What we need from EF / ET
  • Mike + Adrian + Rich · scope conversation given NTE 96% — Amendment #3 / re-prioritisation / descope decision
  • Adrien · drain the 6-PR review queue (especially #4083 chain to unblock canonical bodies)
  • Rich · ratify RLT-3972 reclassification · confirm RLT-3990 NBSP + RLT-3991 5 codes · weigh in on Phase 1.1 vs SPCS Phase 2 path for getting from 6.93% to DC's 9.30% consolidation rate
  • Vivek · kick off sending-domain SF-side change · attend follow-up SFMC working session
  • Grabbe · S3 bucket provisioning per docs/integration/sfmc-s3-bucket-setup.md (6 numbered action items + 7 open questions)
References

Where to dig deeper

This week's specs + docs
Landed this week
  • specs/build/161-splink-ir-pipeline.md — IR pipeline spec (10 ACs, 6 phases)
  • specs/build/176-splink-runtime-pivot.md — close-out section (Phase 1 ship + Phase 2 SPCS infra + HIGH_MEM blocker)
  • specs/build/parity-validation-framework.md — Python parity harness
  • docs/sessions/2026-06-01-rich-canonical-model-sync.md — Boston canonical decisions
  • docs/sessions/2026-06-01-sfmc-strategy-vivek-grabbe-rich.md — SFMC strategy session
  • docs/sessions/2026-06-04-rich-alignment-call.md — DC 9.0% ↔ Snowflake 8.97% parity confirmation
  • docs/parity-reports/post-merge-2026-06-04-contact-points/REPORT.md — 9 contact_point parity pairs (all PASS ±0.1%)
Cross-repo PRs
PRs opened / merged this week
  • eftours/de-dbt#4178 — IR Phase 1 pure-SQL ER (AC7 PASS) + scripts/splink-spcs/ escrowed (Cortex review gate awaits data-engineering team approval)
  • eftours/de-dbt#4185 — IR Phase 2 Splink scaffold (executes when HIGH_MEM available)
  • eftours/de-dbt#4177 — 9 contact_point models (MERGED Thursday)
  • eftours/de-dbt#4176 · #4180 · #4179 · #4181 — bronze layer completion (5 PRs total this week)
  • eftours/de-dbt#324 — parity validation framework
  • eftours/de-dbt#4159 · #4160 — RLT-3990 NBSP + RLT-3991 closed_lost_reason
  • 10 GH issues closed Tuesday · 6 RLT tickets closed Thursday in Jira