EF
Status Report · Week 4 of 9

From placeholders to real per-BU staging models · canonical design locked

The bronze layer is now real. Monday landed the bootstrap PR eftours/de-dbt#3868 + Phase B seeds (16-cat lead_source_mappings, 82-row language_iso_mappings, normalize macros). Wednesday the first real Phase C model shipped — stg_ccap__contacts ports the legacy CCAP_CONTACTS_V end-to-end (878,903 rows, lead-source normalized + survivorship clock). Thursday extended the same pattern to the other 7 BUs (52 M rows total in DEV). Friday morning Rich returned his final decisions on the 17 ambiguous fields from the DC-direct verification: canonical model #175a now design-locked at 85 active fields. Adrien approved 2 of 3 PRs this morning (sources refactor + CCAP pilot); 7-BU extension still pending review.

W4 · Day 5 of 5 · P1 close-out · P3 design lock · 95.0 h · $13,640 · 54.6% NTE · 2 of 3 PRs approved · 1 awaiting review
Plan

9-week build · P1 close-out + P3 design lock

The replication pipeline keeps running every night. P1 (bronze staging) is closing out: bootstrap PR merged Monday, Phase B seeds + macros landed, and all 8 BU contacts staging models materialize in DEV against real data. Rich's reconciliation review closed Friday morning — the canonical model field list is now locked at 85 fields, and #175a (P3a canonical) is ready to start once the 3 staging PRs merge. Splink IR (P2) build window is the next workstream, in parallel to the canonical build.

P0 · W1 Foundation
P1 · W2-4 Normalize + Replication
P2 · W5-6 Splink IR
P3 · W5-6 Canonical + CIs
P4 · W6-7 SFMC + Marts
P5 · W8 Cutover
P6 · W9 + Jul Handoff

Receiving team unchanged: Mike Grabbe + Adrian LeDoux (ET Data Engineering). Rich transitioning out — weekly standups continue; this report keeps you in the loop.

Three milestones this week

W4 cleared the critical-path bottleneck

Two long-running blockers cleared this week — Rich's reconciliation review came back, and Adrien approved the first two of three staging PRs. Plus the structural verification of the CH ↔ US replication confirmed the pipeline is healthy across all 49 source tables (rows + columns + bytes identical). Net: canonical build (#175a) can start, blocked only on the third PR review.

Milestone 1 · field decisions
Rich locked canonical at 85 fields

Friday morning Rich returned the DC-direct verification XLS with his final decisions on the 17 ambiguous fields:

  • 6 fields Drop (gone from canonical + staging)
  • 1 field Keep in canonical (Contact_Restriction_Code · ET+EFSA scope)
  • 2 fields Keep on Source only (per-BU staging, not promoted to unified): Departure_Year (HSEY), Is_Primary_Contact (CCAP+Academy)
  • 1 field Use for CI: Is_Currently_Hosting (cross-BU CCAP+HSEY)
  • 7 ghost fields Derive in DBT: rather than deploy them as DMO columns, Rich chose to derive them INTO the canonical from the CCAP source (the only BU with the data). No DC metadata changes needed; survivorship is a passthrough since CCAP is the only source.

Canonical model #175a active UI fields: 89 → 85 (77 implicit Keep + 1 explicit Keep + 7 ghosts derived from CCAP). Design-locked.
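As a quick arithmetic cross-check of the decision list, here is a minimal Python sketch. The category counts and the 77 implicit Keeps are copied from above. The function name is illustrative only.

```python
# Rich's decisions on the 17 ambiguous fields (counts copied from the list above).
decisions = {
    "Drop": 6,
    "Keep in canonical": 1,
    "Keep on Source only": 2,
    "Use for CI": 1,
    "Derive in DBT (ghost, from CCAP)": 7,
}

IMPLICIT_KEEP = 77  # fields never flagged as ambiguous


def canonical_field_count(decisions: dict[str, int], implicit_keep: int) -> int:
    """Fields in canonical #175a: implicit Keeps + explicit Keep + ghosts derived from CCAP."""
    promoted = decisions["Keep in canonical"] + decisions["Derive in DBT (ghost, from CCAP)"]
    return implicit_keep + promoted


# Every ambiguous field received exactly one decision.
print(sum(decisions.values()))                        # 17
print(canonical_field_count(decisions, IMPLICIT_KEEP))  # 85
```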

Milestone 2 · code review
Adrien approved 2 of 3 staging PRs

Friday morning Adrien approved both foundational PRs:

  • #3989 · sources refactor — splits monolithic sources.yml into per-product files inside models/staging/<product>/
  • #4027 · CCAP Phase C pilot — first real stg_ccap__contacts (878,903 rows, lead-source normalized, survivorship clock wired)

The 7-BU extension PR (#4059) is still pending Adrien's review — gentle follow-up sent. Likely waits until #4027 lands so review happens against the merged base.
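For readers less familiar with the per-product layout in #3989, here is a small sketch of the path convention. The source-to-product assignments below are hypothetical, since the real split is whatever the PR contains. Only the models/staging/<product>/_<product>__sources.yml naming comes from the description above.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical source -> product-folder assignments, for illustration only.
SOURCE_TO_PRODUCT = {
    "salesforce_ccap": "ccap",
    "salesforce_academy": "academy",
    "salesforce_hsey": "hsey",
    "language_kafka": "language",
    "higher_ed_gy": "gap_year",
    "higher_ed_sa": "study_abroad",
}


def per_product_source_files(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Group sources into one _<product>__sources.yml per models/staging/<product>/ folder."""
    files: dict[str, list[str]] = defaultdict(list)
    for source, product in sorted(mapping.items()):
        path = PurePosixPath("models", "staging", product, f"_{product}__sources.yml")
        files[str(path)].append(source)
    return dict(files)


for path, sources in per_product_source_files(SOURCE_TO_PRODUCT).items():
    print(path, sources)
# e.g. models/staging/academy/_academy__sources.yml ['salesforce_academy']
```

When two sources share a product, they land in the same file. That grouping is what replaces the single ~181-line sources.yml.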

Milestone 3 · pipeline audit
CH = US 100% across 49 tables

Comprehensive sweep across every registered source table for the dbt project — verified row count, column count, and storage bytes are identical between US consolidation (EF_DATA_HUB_RAW) and CH replica:

  • 49 tables across 9 schemas (SF orgs, shared DBs, Juno Kafka, NeverBounce)
  • 285 M+ rows total · all match exactly
  • Original Fivetran sources sit ~0.01-0.02% above CH (nightly task lag — expected)

Replication group is healthy. No structural drift, no missing tables.
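The audit reduces to a per-table comparison of three numbers. The minimal sketch below assumes that row count, column count and storage bytes have already been pulled for each side, for example from Snowflake's INFORMATION_SCHEMA views. That query is not shown, and the table names and numbers are made up.

```python
from typing import NamedTuple


class TableStats(NamedTuple):
    rows: int
    cols: int
    size_bytes: int


def audit(us: dict[str, TableStats], ch: dict[str, TableStats]) -> dict[str, list[str]]:
    """Compare the US consolidation against the CH replica, table by table."""
    shared = set(us) & set(ch)
    return {
        "missing_on_ch": sorted(set(us) - set(ch)),
        "extra_on_ch": sorted(set(ch) - set(us)),
        # Any difference in rows, columns or bytes counts as structural drift.
        "drift": sorted(t for t in shared if us[t] != ch[t]),
    }


def lag_pct(source_rows: int, replica_rows: int) -> float:
    """How far (in %) an original Fivetran source runs ahead of the replica."""
    return 100.0 * (source_rows - replica_rows) / replica_rows


# Illustrative numbers only; table names and stats are made up.
us = {"salesforce_ccap.contact": TableStats(1_000, 120, 2_048_000),
      "wojo.contacts": TableStats(500, 40, 512_000)}
ch = dict(us)
print(audit(us, ch))  # {'missing_on_ch': [], 'extra_on_ch': [], 'drift': []}
print(round(lag_pct(1_000_150, 1_000_000), 3))  # 0.015
```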

Done this week

What landed in W4

  1. Bootstrap PR eftours/de-dbt#3868 merged Monday (W4 D1). The vanilla-audit restructured project (8 → 4 vanilla schemas, UPPERCASE prefixed, sources outside models/) is now on main. Pass 3c DDL executed on HH82036: the 8 obsolete unprefixed schemas were dropped in each of DEV/QA/PROD (24 schemas total). The cross-repo project surface is now clean, and Phase B work builds on top of the merged base.
    Closes #143 + #155 + #158 Phase A · PR #3868 also brought Phase B seeds (16-cat lead_source_mappings · 82-row language_iso_mappings) and 8 normalize macros (lead_source, language_iso, gender, geo, age, contactable, etc.) live in PROD
  2. Phase C pilot: stg_ccap__contacts ported end-to-end (W4 D3). First real per-BU staging model. UNION of Contact (host-family record type, non-au-pair, US-only) + Lead (US, non-deleted, not-yet-converted), LEFT JOIN to Account for is_primary_contact, full utm_source normalization (~20 mappings preserved from legacy), lead_source_normalized via seed JOIN, survivorship_ts macro wired. Materializes 878,903 rows in DEV (346,778 Contact + 532,125 Lead) with 81.2% seed coverage on lead_source_normalized.
    PR eftours/de-dbt#4027 · CI green · Adrien-approved Friday
  3. Staging contacts extended to the other 7 BUs (W4 D4) using the same pattern. Real ports for academy (PersonAccount + Contact-Guardian UNION, 2.77 M rows), hsey (Contact + Lead UNION, 1.59 M), language (Juno Kafka source, 33.4 M), student_tours (data_hub_share, 5.79 M), study_abroad (PLB share, 16.1 k), gap_year (PLB share, 132 k), world_journeys (multi-tenant via BUSINESS_CODE, 8.69 M). Each materializes in ≤ 2.4 s in DEV. Row counts reconcile exactly with the filter logic, verified independently via raw-source count queries.
    PR eftours/de-dbt#4059 · Ready for Review since Thursday EOD · awaiting Adrien (was draft when he reviewed the others on Friday AM)
  4. Sources refactor PR (#3989) Adrien-approved. Splits the monolithic sources/sources.yml (~181 lines, 9 sources in one file) into per-product _<product>__sources.yml files inside each models/staging/<product>/ folder, aligning with the dbt Labs 1.12 best practice Adrien flagged on May 13. Also adds the missing Fivetran freshness config for salesforce_ccap. No new logic; organization only.
    Approved Friday AM (after a re-nudge that morning) · awaits merge
  5. Rich's reconciliation review closed Friday morning: canonical at 85 fields. Rich returned the DC-direct verification XLS with his decisions on all 17 ambiguous fields: 6 Drop · 1 Keep · 2 Keep-on-Source-only · 1 Use-for-CI · 7 ghost fields Derive-in-DBT (rather than deploy as DMO columns). No DC metadata changes needed. Canonical model #175a active field count: 89 → 85 (77 implicit Keep + 1 explicit Keep + 7 ghosts derived from CCAP). Design-locked.
    Reply saved: docs/sf-analysis/unifiedindividual-fillrate-rich-final-decisions-2026-05-22.csv · Summary: docs/sf-analysis/unifiedindividual-rich-final-decisions-summary-2026-05-22.md
  6. DC-direct verification + 49-table CH ↔ US audit. Earlier in the week (W4 D3-D4), ran a full DC-direct verification of the 17 ambiguous fields against UnifiedIndividual__dlm in datahub_prod (15.09 M rows). This included the surprise finding that the 7 "ghost" fields don't actually exist on the DMO (PROPOSED in dataCloudMappings/ + recon rules but never deployed). Then verified that the CH-side replica matches the US consolidation 100% on row count, column count, AND storage bytes across all 49 source tables (285 M+ rows total). Original Fivetran sources sit ~0.01-0.02% above CH due to nightly task lag, which is expected.
    PR #317 (Rich XLS) merged W4 D4 · RLT-3812 commented with the per-BU replication match table
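The seed-driven normalization in item 2 is essentially a lookup with a LEFT JOIN fallback. The 81.2% coverage figure is the share of rows that found a match. The minimal Python sketch below assumes a trim + lowercase clean-up before the lookup; the real normalize macro may differ. It also uses a made-up excerpt of the lead_source_mappings seed.

```python
from __future__ import annotations

# Hypothetical excerpt of the lead_source_mappings seed (raw value -> category);
# the real seed has 16 categories and is not reproduced here.
LEAD_SOURCE_MAPPINGS = {
    "web": "Website",
    "website": "Website",
    "referral": "Referral",
    "trade show": "Event",
}


def normalize_lead_source(raw: str | None) -> str | None:
    """Seed-JOIN equivalent: clean the raw value, then look it up.

    Unmatched values stay None, mirroring a LEFT JOIN to the seed.
    The trim + lowercase step is an assumption about the macro.
    """
    if raw is None:
        return None
    return LEAD_SOURCE_MAPPINGS.get(raw.strip().lower())


def seed_coverage(raw_values: list[str | None]) -> float:
    """Share of rows that received a normalized category."""
    if not raw_values:
        return 0.0
    hits = sum(normalize_lead_source(v) is not None for v in raw_values)
    return hits / len(raw_values)


rows = ["Web", "Referral ", "Trade Show", "Partner", None]
print([normalize_lead_source(r) for r in rows])
# ['Website', 'Referral', 'Event', None, None]
print(f"{seed_coverage(rows):.1%}")  # 60.0%
```

In dbt the lookup is a LEFT JOIN to the seed rather than a dict. The same ratio, computed over the 878,903 CCAP rows, gives the reported 81.2%.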
Pipeline + bronze layer live

What's running every night + what's now built on top

Source tables on CH
49
across 9 schemas in EF_DATA_HUB_RAW · CH = US 100% match (rows + cols + bytes)
Staging contacts models · real
8 / 8
all 8 BUs ported · 52 M+ rows materialized in DEV (in PROD: 1 model once #4027 merges, the other 7 once #4059 merges)
Total dbt models in DAG
57
8 contacts (real) + 49 placeholders for other entities · CI-validated

Nightly schedule (UTC)

Time | Step | Owner
00:00 | Root task COPY_ROOT fires; 48 children copy from source DBs into EF_DATA_HUB_RAW on US | Snowflake task graph (autonomous)
~00:30 | All 48 children complete · row counts logged | Snowflake
01:00 | Replication group EF_DATA_HUB_RAW_TO_CH_RG refresh · pulls deltas to CH | Snowflake replication
~01:30+ | Available on HH82036 · ready for dbt build |

Schemas + sizes (CH side)

Schema | Tables | Rows | Size | Source
data_hub_share | 7 | 45.6 M | 8.07 GB | ET / Student Tours share
language_kafka | 9 | 209.6 M | 35.20 GB | EU Language (Juno + Poseidon Kafka)
wojo | 3 | 11.5 M | 0.65 GB | World Journeys share
salesforce_ccap | 7 | 8.6 M | 1.59 GB | CCAP Fivetran landing
salesforce_academy | 8 | 4.6 M | 0.50 GB | Academy Fivetran landing
salesforce_hsey | 6 | 2.9 M | 0.36 GB | HSEY Fivetran landing
ef_us_uploads | 1 | 0.75 M | 0.01 GB | NeverBounce upload (audit pending)
higher_ed_gy | 4 | 0.16 M | 0.02 GB | Higher Ed share (Gap Year)
higher_ed_sa | 4 | 0.18 M | 0.02 GB | Higher Ed share (Study Abroad)

EVALUATION_QUESTION_RESPONSES_V from W2 has been on CH for a week now — included in the 49 source count above.

Schema surface on HH82036 (post Pass 3c · obsolete schemas dropped W4 D3)

Database | Schema | Layer | Current content
EF_DBT_DEV | EF_DATA_HUB_CH_STAGING | bronze | 8 real stg_<bu>__contacts models · 49 placeholders for other entities (opportunities, sales_orders, contact_point_*) · all CI-validated
EF_DBT_DEV | EF_DATA_HUB_CH_INTERMEDIATE | silver | Empty · canonical model (#175a · 85 UI fields locked) + Splink IR (#161) land here
EF_DBT_DEV | EF_DATA_HUB_CH_CORE | silver | Empty · ~37 CIs (#166) + segments (#142)
EF_DBT_DEV | EF_DATA_HUB_CH_MARTS | gold | Empty · Cortex agent feeds · cross-regional sharing · BI consumers (#224)
Same 4-schema set mirrored in EF_DBT_QA + EF_DBT_PROD (12 total). The 8 obsolete unprefixed schemas were dropped W4 D3.
Board state

Where each in-flight ticket landed

Closed this week

Ticket | Title | Why closed
#143 | CH dbt home: Pass 3 final | Pass 3c executed W4 D3; 8 obsolete schemas dropped across DEV/QA/PROD (24 total)
#155 | Bootstrap single dbt project ef_data_hub_ch | PR #3868 merged W4 D1 via merge queue
#225 | Pre-spec audit · reconciliation rules + IR + field normalization | Rich's final decisions on the 17 ambiguous ("?") fields received Friday morning · canonical design-locked at 85 UI fields
#158 Phase A | Staging placeholders for 8 products | Superseded by Phase C: real staging contacts ports for all 8 BUs landed (in PRs #4027 + #4059)

Now ready for the build · pending PR merges only

Phase 1 · close-out · ready
#158 Phase C — 8 BU contacts ports + per-product sources refactor

PR #4027 (CCAP exemplar · 878 k rows) and PR #3989 (sources refactor) both Adrien-approved Friday morning. PR #4059 (the 7 other BUs · 52 M rows) ready for review since Thursday EOD; promotion from DRAFT happened after Adrien's first review pass, so the next pass will pick it up. Together they close P1 bronze.

Gates on: Adrien merging #3989 + #4027 (already approved) + reviewing/approving #4059
Phase 3 · build · canonical
#175a — canonical model (85 UI fields locked)

Field list locked at 85 UI fields after Rich's Friday decisions: 77 implicit Keep + 1 explicit Keep (Contact_Restriction_Code) + 7 ghosts derived from CCAP source INTO canonical (Rich's option (b) — derive in dbt rather than deploy DMO column). The 6 Drops, 2 Keep-on-Source-only, and 1 Use-for-CI are out. Spec refreshed today. Build commits can start against DEV stg views (no longer gated on the staging PRs for design — just for materialization in PROD).

Gates on: spec #175a refresh · staging PR merges for PROD ref()

Still pending or stakeholder-blocked

  • PR #4059 · Adrien review (was draft when he reviewed the other two on Friday AM; gentle re-nudge sent, expecting Monday)
  • #142 — Segments reframed: MC primary / Snowflake Plan B; Vivek call gates the methodology
  • #161 + #162 — Splink IR + parity QA (incorporate Amon's modified 10-rule design from the parallel CH workstream's SF PS session); next workstream after canonical build kickoff
  • #224 — Marts sublayer (Cortex / cross-regional / BI consumers); spec write pending after #158 Phase C fully merges
  • #136-#140 — RLT-* normalize tickets: tracked as follow-up enrichments on the per-BU staging models once #4059 merges (nationality maps, host family aggregation, contact_origin_bu split, etc.)
  • RLT-3991 — 5 HSEY closed-lost codes (CN/CY/EF/APP/INV → "Other"): Rich to confirm
Budget snapshot

$25 k NTE · spent vs remaining

NTE cap
$25,000
SOW Amendment #2 (29 Apr)
Spent to date
$13,640
95.0 hours · 54.6% of NTE
Remaining
$11,360
≈ 81 h at full-time rate ($140 / h)
W4 burn
18 h
$2,560 · P1 close-out · Phase C ports · CH↔US audit

NTE consumption

54.6% consumed end of W4. Trajectory still healthy: P0 + P1 done in 4 active weeks with ~45% of the cap reserved for P2 (Splink IR) through P6 (handoff). EF-engagement work only — the parallel ohanacloud-CH workstream (IR design with SF PS) is tracked separately and not billed against this NTE.

How this is calculated

Three rate tiers per SOW Amendment §4 (depending on day intensity):

Day intensity | Rate | Applied to
Full-time (> 4 h/day) | $140 / h | Most active days · ~83 h × $140 ≈ $11,620
Part-time (2.1–4 h/day) | $160 / h | 3 days · ~10 h × $160 ≈ $1,600
Ad-hoc (≤ 2 h/day) | $180 / h | 3 days · ~2-4 h × $180 = $360-720

Total: 95 h · $13,640. Detailed daily breakdown lives in sprint-log.md.
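The tier logic fits in a small function. This sketch assumes that each day's total hours set the rate for all of that day's hours. It uses a made-up week rather than the real W4 log. The $25,000 NTE and $13,640 spent figures are the ones above.

```python
NTE_USD = 25_000
FULL_TIME_RATE = 140


def day_rate(hours: float) -> int:
    """Hourly rate for a day, chosen by that day's total hours (assumed whole-day tiering)."""
    if hours <= 2:
        return 180  # ad-hoc (<= 2 h/day)
    if hours <= 4:
        return 160  # part-time (2.1-4 h/day)
    return 140      # full-time (> 4 h/day)


def billed(day_hours: list[float]) -> tuple[float, float]:
    """Total hours and dollars for a list of per-day hour totals."""
    return sum(day_hours), sum(h * day_rate(h) for h in day_hours)


def remaining(spent_usd: float) -> tuple[float, float]:
    """Dollars left under the NTE and the full-time-rate hours they buy."""
    left = NTE_USD - spent_usd
    return left, left / FULL_TIME_RATE


# A made-up week, not the real W4 log.
print(billed([6, 5, 3.5, 1.5]))  # (16.0, 2370.0)
left, hours = remaining(13_640)
print(left, round(hours, 1))     # 11360 81.1
```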

Phase status

Phase | SOW low (h) | Actual h | Actual $ | Status
P0 · M1 Foundation & access (W1) | 16 | 14.0 | $2,040 | Done
P1 · dbt #1 normalize + replication (W2-4) | 25 | 78.0 | $11,120 | Closing · 312% h · 278% $
P2 · Splink IR (W5-6) | 37 | 0 | $0 | Upcoming
P3 · Canonical + CIs (W5-6) | 37 | 3.0 | $480 | Design locked · build pending PR merges
P4 · SFMC + Marts (W6-7) | 56 | 0 | $0 | Upcoming
P5 · Cutover (W8) | 14 | 0 | $0 | Upcoming
P6 · Handoff + Hypercare (W9 + Jul) | 18 | 0 | $0 | Upcoming

P1 is closing at 78 h vs the 25-h SOW estimate (overrun absorbed). Phase scope expanded materially over W2-W4: #143 + #159 + #155 bootstrap + vanilla-audit restructure + CI debug + Pass 3 DDL + Phase B seeds + Phase C real ports for all 8 BUs + CH↔US 49-table audit. Cumulative NTE at 54.6% leaves ~$11.4 k for P2 → P6. Pacing P5/P6 conservatively; no escalation needed.

What's next

W5 (26 May – 1 Jun) · canonical build kickoff · IR design intake

P3a · canonical_individual model build

With the design locked at 85 fields and 8 BU staging contacts views materialized in DEV, the canonical model build is the natural next step. canonical_individual sits in EF_DATA_HUB_CH_INTERMEDIATE and references the 8 per-BU staging views through the survivorship pattern. Splink IR (#161) work runs in parallel: the rule design from the parallel CH workstream's SF PS sessions can start being integrated here.
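The minimal Python sketch below makes the survivorship pattern concrete with one rule type, RT1 source-priority. It applies that rule to staging rows already matched to a single individual; the matching itself is the Splink IR step and is not shown. The BU priority order, the field values and the tie-break on survivorship_ts are assumptions for illustration. In canonical_individual this would be a CTE, typically a ROW_NUMBER() window partitioned per individual.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime

# Hypothetical BU priority for one field (earlier wins); real per-field
# priorities and rule types (RT1-RT6) are defined in spec #175a.
PRIORITY = ["ccap", "academy", "hsey", "student_tours"]


@dataclass
class StagingRow:
    bu: str                    # which stg_<bu>__contacts model the row came from
    survivorship_ts: datetime  # output of the survivorship clock
    value: str | None          # candidate value for the field being resolved


def source_priority(rows: list[StagingRow], priority: list[str]) -> str | None:
    """RT1-style pick: the highest-priority BU with a non-null value wins;
    within that BU, the newest survivorship_ts wins."""
    rank = {bu: i for i, bu in enumerate(priority)}
    candidates = [r for r in rows if r.value is not None and r.bu in rank]
    if not candidates:
        return None
    top = min(rank[r.bu] for r in candidates)
    same_bu = [r for r in candidates if rank[r.bu] == top]
    return max(same_bu, key=lambda r: r.survivorship_ts).value


rows = [
    StagingRow("hsey", datetime(2026, 5, 1), "US"),
    StagingRow("academy", datetime(2026, 4, 1), "CA"),
    StagingRow("academy", datetime(2026, 5, 10), "MX"),
    StagingRow("ccap", datetime(2026, 5, 20), None),  # null value: skipped
]
print(source_priority(rows, PRIORITY))  # MX
```

When only one BU carries a field, as with the 7 CCAP-only ghost fields, the same rule reduces to a passthrough.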

Build deliverables · W5
What lands by Friday 29 May
  • Spec #175a refresh · 85-field canonical list locked with Rich's 4 decision categories documented
  • Canonical scaffold · canonical_individual.sql in models/intermediate/canonical/ reading from ref('stg_<bu>__contacts') × 8
  • Survivorship CTE pattern · 6-rule-type framework wired (RT1 source-priority → RT6 aggregate) classified across the 85 fields
  • 3 BU contacts enrichments · biggest-impact TODOs landed: HSEY nationality→ISO2 + host-family aggregation · Academy nationality + guardian dedup · Student Tours contact_origin_bu split
  • 3 Google_Analytics_* cols · added to stg_ccap__contacts (Rich's "derive in DBT" decision)
  • PR #4059 merged (assumes Adrien reviews early-week)
Stakeholder asks · W5
What we need from EF / ET
  • Adrien — review + merge PR #4059 (was draft when he checked the other two on Friday AM; re-nudge sent)
  • Rich — confirm 5 HSEY closed-lost codes for RLT-3991 (CN/CY/EF/APP/INV → "Other") to close that ticket
  • Vivek (MC) — still gating #142 segments methodology call
  • Mike / Adrian — NeverBounce source status confirmation (5+ months stale)
  • Mike — sanity check on first 4 weeks of replication credits to size the cost line

Looking past W5 — phase trajectory

Window | Phase | What gets built
W5-6 (26 May - 8 Jun) | P2 + P3a | Canonical model (85-field canonical_individual with 6-RT survivorship CTE pattern) · Splink IR pipeline kickoff (blocking, fuzzy, graph resolution) · staging BU enrichments (nationality maps, host-family aggregation, contact_origin_bu split)
W6-7 (3-16 Jun) | P3b + P4 | ~37 CIs (including new ci__is_currently_hosting from Rich's decision) · IR parity QA vs DC's UnifiedIndividual · marts pre-aggregated for Cortex agent · Segments (MC primary or Snowflake fallback) · SFMC engagement via S3
W7-8 (10-23 Jun) | P4 + P5 | Marketing Cloud integration · CloudPage UI · cutover runbook · parallel-run reconciliation report · Prefect orchestration · DC schedule disable plan
W9 + Jul | P6 | Handoff to Mike Grabbe + Adrian LeDoux · operational runbook · KT sessions · 2-cycle hypercare · formal sign-off checklist

Critical path now: PR #4059 merge → #175a canonical_individual → #161 Splink IR (parallel) → #166 CIs → #142 segments → cutover. The 3 open PRs are the last bronze-layer gate.
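The critical path can be checked mechanically with Python's standard graphlib module. In this sketch the dependency edges are one reading of the arrows. Treating Splink IR as waiting only on the bronze merge, and the CIs as waiting on both canonical and IR, are assumptions.

```python
from graphlib import TopologicalSorter

# Each node maps to its prerequisites (assumed edges, read off the critical-path line).
deps = {
    "#175a canonical_individual": {"PR #4059 merge"},
    "#161 Splink IR": {"PR #4059 merge"},  # runs in parallel with #175a
    "#166 CIs": {"#175a canonical_individual", "#161 Splink IR"},
    "#142 segments": {"#166 CIs"},
    "cutover": {"#142 segments"},
}


def waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; everything within one wave can run in parallel."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    out = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        out.append(ready)
        ts.done(*ready)
    return out


for i, wave in enumerate(waves(deps), 1):
    print(i, wave)
# 1 ['PR #4059 merge']
# 2 ['#161 Splink IR', '#175a canonical_individual']
# 3 ['#166 CIs']
# 4 ['#142 segments']
# 5 ['cutover']
```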

Reference material

Where to dig deeper

Engagement docs
Plan + commercial baseline
  • Migration plan deck + Architecture options + Structure proposal + Vanilla audit deck
  • SOW Amendment #2 (signed 29 Apr 2026) — NTE $25 k, T&M capped, Net 30
  • Sprint log — weekly hours and budget tracker
This week's deliverables
Audits + decisions captured
  • docs/sf-analysis/unifiedindividual-fillrate-rich-final-decisions-2026-05-22.csv — Rich's reply with the 17 ? field decisions
  • docs/sf-analysis/unifiedindividual-rich-final-decisions-summary-2026-05-22.md — per-category breakdown + downstream actions
  • docs/sf-analysis/unifiedindividual-fillrate-for-rich-2026-05-21-dc-verified.xlsx — the DC-direct verification XLS shared with Rich on Wed
  • Sprint log W4 days 1-5 with per-day breakdown
Specs landed / updated
Build-ready specs
  • specs/build/158-dbt1-normalize-staging-models.md — Phase A/B/C plan (now closing P1)
  • specs/build/175a-canonical-survivorship.md: canonical model (85-field design lock pending today's refresh)
  • specs/setup/143-ch-dbt2-home.md — CH dbt home (closed; Pass 3c done)
  • specs/setup/159-us-ch-replication-group.md — replication group US→CH (closed; 49 tables verified)
Cross-repo + tooling
Outside this repo
  • eftours/de-dbt#3868 — bootstrap restructure (MERGED W4 D1)
  • eftours/de-dbt#3989 — sources refactor (Adrien-approved · pending merge)
  • eftours/de-dbt#4027 — CCAP Phase C pilot real model (Adrien-approved · pending merge)
  • eftours/de-dbt#4059 — 7-BU staging contacts extension (Ready for Review · awaiting Adrien)
  • Repository CH = US 100% match audit · 49 source tables · 285M+ rows