9-week build · IR Phase 1 ship + bronze fully closed
Phase 1 (bronze staging, P1) closed end of W6 with PR #4177 merged and #158 hitting 54/54 parity pairs PASS. IR pipeline (#161) shipped as Phase 1 pure-SQL deterministic ER on PR #4178 (AC7 PASS · within ±2% band). SPCS Splink Phase 2 built end-to-end but blocked on instance-family availability. Canonical model body work (#175) can start once IR clusters are mounted by #175a.
Receiving team unchanged: Mike Grabbe + Adrian LeDoux (ET Data Engineering). Rich Thursday alignment call validated the Snowpark WH split and the DC ↔ Snowflake match parity. NTE 95% escalation threshold crossed this week.
What landed in W6
1. Boston in-person ×2 (W6 D1 Mon). Rich canonical model sync: 3 decisions ratified (Global_Rewards ET-canonical · highest_attained ET-only · revenue upstream removal). SFMC strategy session with Vivek + Grabbe + Rich (1h): global business rules in Snowflake (15-18M → 1-2M), segments in git for AI translation, Prefect + Python activation, Fivetran Activations $80K/yr eval, OneTrust unsub redesign.
2. RLT-3662 revenue removal + 5 RT-scaffold tickets (W6 D1). Revenue columns removed across 8 stg sales / orders models (commit `ee5cd2562` on PR #4107, +23/-140, 3-layer QA PASS). 5 RT-scaffold tickets staged on canonical PR #4083: RLT-3969 RT2-OR Global_Rewards ×3 booleans · RLT-3976 RT2d last_year_hosted · RLT-3970 RT3s Lead_Source / Lead_Source_Detail / Utm_Source · RLT-3990 NBSP fix in seed (PR #4159) · RLT-3991 closed_lost_reason seed + macro (PR #4160, 171 rows × 4 cols × 7 BUs). All transitioned to Under Review.
3. #158 staging port completion across 5 PRs (W6 D2 Tue). 32 of 33 models opened as batch PRs: #4176 contacts (8 ports, 2,685 lines) · #4177 contact_points (9 ports, 1,333 lines) · #4180 variants (11 ports, 1,877 lines) · #4179 entity-specific (3 of 4) · #4181 HSEY order (final entity-specific). Plus #322 HSEY ORDER replication SOLVED end-to-end: US task created via Snowflake Scripting (ALTER TASK COPY_ROOT SUSPEND → CREATE OR REPLACE TASK COPY__SALESFORCE_HSEY__ORDER → RESUME); CH side via manual ALTER REPLICATION GROUP REFRESH. 168,508 rows synced US↔CH. PR #4181 ported 47-line placeholder → 156-line real port. 10 GH tickets closed (#155 / #156 / #138 / #140 / #225 / #293 / #294 / #296 / #300 / #318).
4. Parity validation framework PR #324 + GH #323 (W6 D2). Python harness under `scripts/parity/` (connections, snapshots, compare, report, run_parity) per spec `specs/build/parity-validation-framework.md` (228 lines, ±0.1% tolerance, pinned snapshot timing). 4 entity-family configs. First 8 spot-check parity runs all PASS ±0.1% (CCAP × 4 entities deltas -0.0088 to -0.0142% · Academy sales_order -0.0016% · GY × 3 entities 0.0000% exact). Reports under `docs/parity-reports/spot-checks-2026-06-02/`.
5. Spec #161 + IR Phase 1 macros PR #4178 + IR Phase 2 scaffold PR #4185 (W6 D2-D3). 207-line spec `specs/build/161-splink-ir-pipeline.md` with 10 ACs + 6-phase plan. PR #4178 = 3 normalize macros (`normalize_email` / `normalize_phone` / `normalize_name`, idempotent + null-safe). PR #4185 = Splink 4 SettingsCreator (171 lines · match rule per Rich 2026-05-06 · threshold 0.95) + 2 Python dbt models (predict + cluster) + `int_ir__contacts_normalized.sql` (230 lines · UNION ALL of 9 BU CTEs with normalize macros applied · ST EFSA carve-out per RLT-3577).
6. #4177 merged + #158 closed (W6 D4 Thu). PR #4177 (9 contact_point models) merged into main at 08:37 UTC as commit `0870d7558` after lint-fix cycle (405 violations · 400 auto-fixed · 5 manual AL03). dbt Cloud job 611029 auto-built into `EF_DBT_PROD.EF_DATA_HUB_CH_STAGING` at 08:38-08:39 UTC. 9 contact_point parity pairs verified all PASS ±0.1% (max delta -0.0385% Language phone · Fivetran lag). #158 closed with 54/54 parity pairs PASS. Report: `docs/parity-reports/post-merge-2026-06-04-contact-points/REPORT.md`.
7. `EF_SPLINK_WH` provisioned + `EF_DATA_HUB_RAW_TO_CH_RG` identified (W6 D4). Created Snowpark-optimized warehouse via Snowflake Scripting EXECUTE IMMEDIATE: SNOWPARK-OPTIMIZED MEDIUM, MEMORY_16X, auto_suspend 60s, INITIALLY_SUSPENDED. Grants: SYSADMIN / MCP_READER USAGE; EF_DBT_{DEV,QA,PROD}_RW USAGE; EF_DBT_PROD_RW OPERATE. 6 RLT tickets closed in Jira (RLT-3991 / RLT-3662 / RLT-3577 / RLT-3989 / RLT-3579 / RLT-3578) with evidence-cited Done transitions. Rich Thursday call: DC 9.0% ↔ Snowflake 8.97% match parity = 0.03pp delta.
8. IR Phase 1 SHIPS as pure-SQL deterministic ER (W6 D5 Fri), AC7 PASS. Two runtimes were exhausted first: (1) a dbt Python model in the Snowflake UDF heap (~2-4GB cap, OOMed at every sample size including 10k) and (2) a Snowpark Container Services Splink runtime (built end-to-end, OOMed on Splink predict() at the 30Gi `CPU_X64_L` pool limit, the only instance family available in `eu-central-2`). The work then pivoted to pure-SQL email-exact deterministic ER on PR #4178. Measured AC7: 15,182,794 unified IDs on 16,329,245 source rows = 6.93% consolidation. DC baseline 15,104,418 / 9.30%. Delta vs DC +0.52%, well inside the ±2% AC7 band [14,802,330 - 15,406,506] → PASS. Per-run cost: $0 incremental (existing dbt WH). PR description + close-out comment posted with the validation table.
9. SPCS Phase 2 infrastructure built + escrowed (W6 D5). `scripts/splink-spcs/` committed for handoff: Dockerfile (python:3.11-slim + Splink 4.0.16 + DuckDB 1.1.3 + sf-connector 3.13, amd64) · `run_splink_ir.py` with DC match rule (email exact + lastname exact + firstname JW @0.9/@0.8 fuzzy) + SPCS-OAuth + local-dev dual auth · `ac7_parity_check.py` post-run gate · `service_spec.yaml` 28Gi req / 30Gi limit · `deploy.sql` idempotent DDL · README. Image pushed to `ef_dbt_prod.ef_data_hub_ch_intermediate.splink_images / splink-runner@sha256:483ae744…`. Runs 001→003 progressed past every soft blocker (OAuth ✓ · role/db wiring ✓ · full 16.3M data pull in 152s ✓ · Splink starts ✓) and OOMed during `predict()` at the 30Gi pool memory cap. Hard blocker: `HIGH_MEMORY_X64_M` (64GB) and `HIGH_MEMORY_X64_L` (128GB) instance families are not available in `eu-central-2` as of 2026-06-05. Spec `specs/build/176-splink-runtime-pivot.md` close-out section documents escalation paths.
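The two gates quoted in the list above (±0.1% for parity pairs, ±2% for AC7) are both relative-delta checks against a baseline count. The sketch below is an illustration only: `tolerance_gate` and `GateResult` are hypothetical names, not the contents of `ac7_parity_check.py` or the parity harness, and rounding the band edges to whole rows is an assumption. The AC7 inputs are the figures reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    delta_pct: float  # signed relative delta vs baseline, in percent
    lower: int        # inclusive lower edge of the accepted band
    upper: int        # inclusive upper edge of the accepted band
    passed: bool


def tolerance_gate(actual: int, baseline: int, tolerance: float) -> GateResult:
    """Check that `actual` lies within baseline * (1 ± tolerance)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    # Assumption: band edges are rounded to the nearest whole row.
    lower = round(baseline * (1 - tolerance))
    upper = round(baseline * (1 + tolerance))
    delta_pct = (actual - baseline) / baseline * 100
    return GateResult(delta_pct, lower, upper, lower <= actual <= upper)


# AC7 figures from the report: unified IDs vs the DC baseline, ±2% band.
ac7 = tolerance_gate(actual=15_182_794, baseline=15_104_418, tolerance=0.02)
print(f"{ac7.delta_pct:+.2f}% band=[{ac7.lower:,} - {ac7.upper:,}] pass={ac7.passed}")
```

The same function would cover a parity pair by passing `tolerance=0.001` with the two row counts being compared.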
Blocked-on-others
Six PRs await review: #4083 (canonical scaffold + 4 RT macros), #4099 (4-BU accounts), #4106 (8-BU opportunities + Phase B macros), #4107 (8-BU sales_orders + Phase B macros), #4178 (IR Phase 1 pure-SQL ER + AC7 PASS), #4185 (IR Phase 2 Splink scaffold). PR #4177 merged Thursday, which unblocked #158 closure. Rich committed Thursday to follow up with Adrian on the remaining queue.
SPCS Splink full DC-rule run blocked on instance family availability. CPU_X64_L (32GB) is the largest available in eu-central-2 and is insufficient for 16.3M-row Splink predict(). Need SF support ticket to enable HIGH_MEMORY_X64_M (64GB) or HIGH_MEMORY_X64_L (128GB) for account HH82036. Fallback paths if not approved: GH Actions ubuntu-latest-32-cores-128gb runner OR r6i.4xlarge EC2 one-shot — same Docker image, same script, same DC match rule.
Rich: ratify RLT-3972 reclassification before #175a canonical body materializes the field · confirm RLT-3990 NBSP fix · confirm RLT-3991 5 HSEY closed-lost codes. Vivek: kick off the sending-domain SF-side change (2-4 wk lead time). Boston Monday SFMC session confirmed AI translation of segments + Snowflake business rules as the architecture; awaiting Vivek's domain action.
$25 k NTE · spent vs remaining
NTE consumption
95% threshold CROSSED. Per CLAUDE.md rule, STOP + ESCALATE before any further new scope. Remaining headroom (~$1,090) is ~8 hours at full-time rate, ~7 hours at part-time. Realistic landing: canonical model bodies + CI build for the canonical individual entity covered by Phase D depend on #4083 merging + Adrian's review queue draining. Cutover (P5) + handoff hypercare (P6) are AT RISK on NTE if scope holds, so a re-amendment conversation is recommended ahead of W7.
Scope reality vs $25 k NTE
| Bucket | SOW low (h) | Actual h | Actual $ | Status |
|---|---|---|---|---|
| P0 · M1 Foundation (W1) | 16 | 14.0 | $2,040 | Done |
| P1 · dbt #1 normalize + replication (W2-6) | 25 | 112.0 | $16,180 | Done · #158 closed 54/54 parity |
| P2 · IR Phase 1 SQL ER (W6) | 37 | 18.0 | $2,600 | AC7 PASS · PR #4178 |
| P3 · Canonical scaffold + RT extensions (W5-6) | 37 | ~5.0 | ~$700 | Scaffold + RT macros landed · bodies pending |
| P4 · SFMC + Marts (W7) | 56 | 0 | $0 | Upcoming · gated on segments decision |
| P5 · Cutover (W8) | 14 | 0 | $0 | Upcoming · AT RISK on NTE |
| P6 · Handoff + Hypercare (W9 + Jul) | 18 | 0 | $0 | Upcoming · AT RISK on NTE |
Trace: at 95% NTE end of W6, the engagement now needs explicit scope discussion. Options ahead of W7: (a) Amendment #3 for P4 / P5 / P6 funding · (b) explicit prioritisation among canonical bodies / CIs / SFMC / cutover to fit remaining $1.1k · (c) descope P6 hypercare. Phase 1 bronze + IR Phase 1 deliverables are complete and shippable as-is.
W7 (8 Jun – 12 Jun) · canonical bodies + scope conversation
- Canonical individual body · materialize `int_canonical__individual` from `int_ir__splink_clusters` + `int_ir__unified_individual_membership` with RT1/RT2/RT3s/RT4 survivorship across the 85 locked UI fields
- Adrian queue drain · merge #4083 + #4099 + #4106 + #4107 + #4178 + #4185 → unlocks 4 RT-extension ticket closures + canonical body work
- Phase 1.1 polish on IR · placeholder-email blocklist expansion (`*@ef.com` staff + `noreply@*` + `aaa@aaa.com` family) → splits ~2,500 junk-cluster rows back to singletons
- SPCS Phase 2 escalation · file Snowflake support ticket for HIGH_MEMORY pool family in eu-central-2; if denied, wire up GH Actions runner fallback
- Mike + Adrian + Rich · scope conversation given NTE 96% — Amendment #3 / re-prioritisation / descope decision
- Adrian · drain the 6-PR review queue (especially #4083 chain to unblock canonical bodies)
- Rich · ratify RLT-3972 reclassification · confirm RLT-3990 NBSP + RLT-3991 5 codes · weigh in on Phase 1.1 vs SPCS Phase 2 path for getting from 6.93% to DC's 9.30% consolidation rate
- Vivek · kick off sending-domain SF-side change · attend follow-up SFMC working session
- Grabbe · S3 bucket provisioning per `docs/integration/sfmc-s3-bucket-setup.md` (6 numbered action items + 7 open questions)
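The Phase 1.1 item in the W7 list describes a placeholder-email blocklist that sends junk-cluster rows back to singletons. A minimal sketch of that idea on top of email-exact matching follows. It is not the SQL shipped in PR #4178: the function names, the glob-style matching via `fnmatchcase`, the trim-and-lowercase normalization, and the sample rows are all assumptions for illustration, and the `aaa@aaa.com` "family" is represented only by its literal address.

```python
from collections import defaultdict
from fnmatch import fnmatchcase
from typing import Optional

# Patterns taken from the Phase 1.1 item; glob matching is an assumption.
PLACEHOLDER_PATTERNS = ("*@ef.com", "noreply@*", "aaa@aaa.com")


def normalize_email(raw: Optional[str]) -> Optional[str]:
    """Null-safe, idempotent normalization: trim + lowercase; empty -> None."""
    if raw is None:
        return None
    cleaned = raw.strip().lower()
    return cleaned or None


def is_placeholder(email: str) -> bool:
    """True when a normalized email matches any blocklist pattern."""
    return any(fnmatchcase(email, pattern) for pattern in PLACEHOLDER_PATTERNS)


def assign_unified_ids(rows: list[tuple[str, Optional[str]]]) -> dict[str, str]:
    """Map each source row id to a unified id by exact normalized-email match.

    Rows with no email or a blocklisted email stay singletons.
    """
    by_email: dict[str, list[str]] = defaultdict(list)
    unified: dict[str, str] = {}
    for row_id, raw_email in rows:
        email = normalize_email(raw_email)
        if email is None or is_placeholder(email):
            unified[row_id] = row_id  # singleton: the row is its own cluster
        else:
            by_email[email].append(row_id)
    for members in by_email.values():
        anchor = min(members)  # deterministic choice of cluster id
        for row_id in members:
            unified[row_id] = anchor
    return unified


# Hypothetical sample rows: (source row id, raw email).
rows = [
    ("a1", "Ana@Example.com "),
    ("a2", "ana@example.com"),
    ("b1", "noreply@shop.example"),
    ("b2", "noreply@shop.example"),
    ("c1", None),
]
ids = assign_unified_ids(rows)
print(ids)
```

In this sample the two `ana@example.com` rows collapse into one unified ID, while the two `noreply@` rows remain separate instead of forming a junk cluster.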
Where to dig deeper
- `specs/build/161-splink-ir-pipeline.md`: IR pipeline spec (10 ACs, 6 phases)
- `specs/build/176-splink-runtime-pivot.md`: close-out section (Phase 1 ship + Phase 2 SPCS infra + HIGH_MEM blocker)
- `specs/build/parity-validation-framework.md`: Python parity harness
- `docs/sessions/2026-06-01-rich-canonical-model-sync.md`: Boston canonical decisions
- `docs/sessions/2026-06-01-sfmc-strategy-vivek-grabbe-rich.md`: SFMC strategy session
- `docs/sessions/2026-06-04-rich-alignment-call.md`: DC 9.0% ↔ Snowflake 8.97% parity confirmation
- `docs/parity-reports/post-merge-2026-06-04-contact-points/REPORT.md`: 9 contact_point parity pairs (all PASS ±0.1%)
- `eftours/de-dbt#4178`: IR Phase 1 pure-SQL ER (AC7 PASS) + `scripts/splink-spcs/` escrowed (Cortex review gate awaits data-engineering team approval)
- `eftours/de-dbt#4185`: IR Phase 2 Splink scaffold (executes when HIGH_MEM available)
- `eftours/de-dbt#4177`: 9 contact_point models (MERGED Thursday)
- `eftours/de-dbt#4176` · `#4180` · `#4179` · `#4181`: bronze layer completion (5 PRs total this week)
- `eftours/de-dbt#324`: parity validation framework
- `eftours/de-dbt#4159` · `#4160`: RLT-3990 NBSP + RLT-3991 closed_lost_reason
- 10 GH issues closed Tuesday · 6 RLT tickets closed Thursday in Jira