EF
Status Report · Week 2 of 9

From plan to live pipeline

Big week. Architecture pivoted Tuesday morning to a single CH dbt project, refined Wednesday afternoon with Rich, and executed Wednesday–Thursday. Raw data now flows nightly US → CH — 48 source tables, 283 M rows, 47 GB byte-parity on HH82036. P0 foundation done; P1 build (#158 staging + #175 canonical) unblocked for next week.

W2 · Day 4 of 5 P1 · dbt #1 normalize 42.0 h · $5,960 · 23.8% NTE Pipeline live
Plan

9-week build · architecture pivoted W2 day 2

Architecture simplified Tuesday 6 May after a call with Adrien and Allan: single dbt project in CH (replacing the planned dbt-#1-US + dbt-#2-CH split). Raw US data lands in a consolidation DB (EF_DATA_HUB_RAW) and replicates nightly to CH. The 57 legacy ANALYTICS views stay alive in US as deprecating-legacy. Phase numbering unchanged; phase content adjusted accordingly.

P0 · W1 Foundation
P1 · W2-3 Normalize + Replication
P2 · W3-4 Splink IR
P3 · W4-5 Canonical + CIs
P4 · W6-7 SFMC + Marts
P5 · W8 Cutover
P6 · W9 + Jul Handoff

Receiving team unchanged: Mike Grabbe + Adrien LeDoux (ET Data Engineering). Rich transitioning out — weekly standups continue; this report keeps you in the loop.

Two pivot points this week

Tue 6 May AM · architecture · Wed 7 May PM · refinements

Tue · architecture
Single dbt project in CH (was: 2 projects)

Decision driver: Adrien's preference for one operational surface and a clean retirement path for the 57 ANALYTICS views (they stay alive as legacy and deprecate alongside Data Cloud, instead of being ported).

Implication: raw US source DBs land into a single consolidation DB (EF_DATA_HUB_RAW) on US side; that DB then replicates to CH nightly. Consolidation is needed because Snowflake replication groups can't include shared databases — discovered Tue PM.

Cost: ~85% of Day 1 work product preserved (inventory, spec structure, macros). ~15% discarded (the US-side dbt-#1 home provisioning that's no longer needed).

Session log: docs/sessions/2026-05-05-architecture-pivot-call-adrien.md
Wed · Rich's deep-dive
3 envs · single CI layer · marts active · MC-segments primary

Live call Wednesday afternoon settled five architectural refinements that the structure-proposal ticket (#215) had open:

  • 3 envs (DEV / QA / PROD) as full DB copies — match EF framework (Rich's team owns QA maintenance)
  • CI layers collapse 4 → 1 — Snowflake doesn't have DC's SQL/join limits, so Helpers/Base/Rollup/Final become a single cis schema
  • Marts is active (was reserved) — pre-aggregated for Cortex agent + cross-regional + BI
  • Segments: MC-driven primary, Snowflake Plan B — preserves marketing self-serve, Vivek call gates methodology
  • Reconciliation rule refinements — phone / first_name / primary_address → most-recent across BUs · contact_creation_date → oldest across BUs (affects Splink survivorship)
Session log: docs/sessions/2026-05-06-rich-architecture-deep-dive.md
Done this week

What landed in W2

  1. 1
    #143 — CH dbt home rewritten + executed (closed). Original 2-DB sandbox/prod from W1 evolved into the post-Rich layout: 3 environments (EF_DBT_DEV / EF_DBT_QA / EF_DBT_PROD) × 8 schemas each (staging, normalize, sfmc_engagement, canonical, ir, cis, segments, marts). 3 RW roles renamed/created with cross-env grants. ACCOUNTADMIN revoked post-verification (least-privilege restored).
    Spec: specs/setup/143-ch-dbt2-home.md · DDL: scripts/setup/143-ch-dbt-home-rewrite.sql · status: Done
  2. 2
    #159 — Replication group US → CH executed end-to-end (closed). Three parts in one day:

    Part A — provisioning. Consolidation DB EF_DATA_HUB_RAW on TAC01291 with 9 data schemas + a PUBLIC orchestration schema. Dedicated copy-job role with 815 grants (sources via IMPORTED PRIVILEGES on the shared DBs, write privileges on the target schemas, EXECUTE TASK on account, USAGE on REPORTING_SALESFORCE_WH).

    Part B — copy tasks. 1 root task + 48 child tasks (one per source table) scheduled at 00:00 UTC nightly. Smoke test landed 48 tables / 283 M rows / 47 GB in EF_DATA_HUB_RAW on US — two Snowflake quirks fixed mid-flight (CREATE TASK clause order; predecessor-schema constraint forces children + root to share a schema).

    Part C — replication group. EF_DATA_HUB_RAW_TO_CH_RG on US side (CRON 0 1 * * * UTC) plus the replica on CH. Initial cross-region refresh complete with full byte-parity. Four read roles granted on the replicated DB: EF_DBT_DEV_RW, EF_DBT_QA_RW, EF_DBT_PROD_RW, MCP_READER.

    Spec: specs/setup/159-us-ch-replication-group.md · status: Done
  3. 3
    Cross-repo PR eftours/de-dbt#3868 — bootstrap content under ef_data_hub_ch/. Drop-in artefacts ready for Allan: dbt_project.yml (3-env framework, 8 schemas, single CI layer, marts active), profiles.yml.template, sources.yml (49 source tables across 9 schemas), 6 reusable macros extracted from the 57 legacy views (bu_attribution, cancellation_reason_normalized, phone_e164, gender_canonical, country_code_normalize, traveler_age_3depth), 7 layer placeholders. CI runs green except 📦 Compile — expected, no models yet.
    Allan to review + finalize the dbt Cloud project connection · awaiting first review
  4. 4
    Three decks refreshed for Monday's wider audience presentation. Architecture options · CH dbt project structure (decisions locked, 12 slides, +2 new for segment strategy + reconciliation rules) · migration plan deck. HTML edits + PPTX re-renders.
    Decks: docs/migration-plan/{architecture-options,dbt-project-structure,migration-plan-deck}-2026-05-05.html
  5. 5
    Ticket housekeeping. 6 tickets closed (#143, #159, #163 + #164 + #165 CI layers collapsed, #215 structure proposal). 10 status-updated (#155, #158, #161, #162, #166 reframed + renamed to single CI layer, #167, #175, #224, #142, #225). Net board state cleaner: P0/P1 setup work largely done, P1 build is now actually unblocked.
Pipeline live

What's running every night now

Tables on CH
48
across 9 schemas in EF_DATA_HUB_RAW
Rows replicated
283 M
byte-parity vs US side
Data volume
47 GB
LANGUAGE_KAFKA = 36 GB (Juno + Poseidon)

Nightly schedule (UTC)

TimeStepOwner
00:00Root task COPY_ROOT fires; 48 children copy from source DBs into EF_DATA_HUB_RAW on USSnowflake task graph (autonomous)
~00:30All 48 children complete · row counts loggedSnowflake
01:00Replication group EF_DATA_HUB_RAW_TO_CH_RG refresh · pulls deltas to CHSnowflake replication
~01:30+Available on HH82036 · ready for dbt build

Schemas + sizes (CH side)

SchemaTablesRowsSizeSource
data_hub_share745.6 M8.07 GBET / Student Tours share
language_kafka9209.6 M35.20 GBEU Language (Juno + Poseidon Kafka)
wojo311.5 M0.65 GBWorld Journeys share
salesforce_ccap78.6 M1.59 GBCCAP Fivetran landing
salesforce_academy84.6 M0.50 GBAcademy Fivetran landing
salesforce_hsey62.9 M0.36 GBHSEY Fivetran landing
ef_us_uploads10.75 M0.01 GBNeverBounce upload (audit pending)
higher_ed_gy40.16 M0.02 GBHigher Ed share — Gap Year
higher_ed_sa40.18 M0.02 GBHigher Ed share — Study Abroad

Note: data_hub_share shows 7 tables once EVALUATION_QUESTION_RESPONSES_V (added today on US) lands on CH at the next 01:00 UTC refresh.

Board state

Where each in-flight ticket landed

Closed this week

TicketTitleWhy closed
#143CH Snowflake dbt home3 envs × 8 schemas provisioned + roles + grants done
#159Replication group US→CHEnd-to-end live · 48 tables / 283 M rows on CH
#163CIs Helpers layerNot needed — Snowflake doesn't have DC's SQL limits
#164CIs Base layerCollapsed into single cis schema
#165CIs Rollup layerCollapsed into single cis schema
#215CH dbt project structure proposal4 of 5 questions resolved + 2 new decisions surfaced

Now unblocked for build

Phase 1 · build
#158 · Staging models for 8 products

Sources resolve to EF_DATA_HUB_RAW.<schema>.<table> on CH (already configured). 5-phase migration plan in spec; HSEY recommended as the smallest first bite (364 MB / 2.9 M rows).

Gates on: #155 dbt Cloud project provisioning (Allan)
Phase 3 · build
#175 · Canonical model (7 DMOs / 259 fields)

Reads raw replicas, writes to canonical schema. Survivorship rules to incorporate the reconciliation refinements Rich confirmed Wednesday.

Gates on: #155 + #161 Splink IR

Still pending or scope-shifting

  • #155 — dbt Cloud project bootstrap (Allan; cross-repo PR #3868 has the content ready, awaiting first review)
  • #142 — Segments reframed: MC primary / Snowflake Plan B; Vivek call gates the methodology
  • #166 — single CI layer build (renamed; replaces #163-#165); pending spec write
  • #161 + #162 — Splink IR + parity QA, with the new survivorship rules
  • #224 — Marts sublayer (NEW; Cortex / cross-regional / BI consumers)
  • #225 — Confluence reconciliation-rules pre-spec cleanup pass
Budget snapshot

$25 k NTE · spent vs remaining

NTE cap
$25,000
SOW Amendment #2 (29 Apr)
Spent to date
$5,960
42.0 hours · 23.8% of NTE
Remaining
$19,040
≈ 136 h at full-time rate ($140 / h)
W2 phase burn
28 h
$3,920 · over hour estimate, within $ budget

NTE consumption

23.8% consumed end of W2 day 4. Trajectory healthy: P0 + a chunk of P1 done in 4 active days, with ~76% of the cap reserved for P2 (Splink IR) through P6 (handoff).

How this is calculated

Three rate tiers per SOW Amendment §4 (depending on day intensity):

Day intensityRateApplied to
Full-time (> 4 h/day)$140 / h4 of 5 active days so far (4/30, 5/4, 5/5, 5/6, 5/7) → 40 h × $140 = $5,600
Part-time (2.1–4 h/day)$160 / h0 days so far
Ad-hoc (≤ 2 h/day)$180 / h1 day so far (4/29 walkthrough) → 2 h × $180 = $360

Total: 42 h · $5,960. The full-time intensity rate ($140) applies whenever a day exceeds 4 hours — that's been every active day except the initial 29 Apr walkthrough. Detailed daily breakdown lives in sprint-log.md.

Phase status

PhaseSOW low (h)Actual hActual $Status
P0 · M1 Foundation & access (W1)1614.0$2,040Done
P1 · dbt #1 normalize + replication (W2-3)2528.0$3,920112% h · 98% $
P2 · Splink IR (W3-4)370$0Upcoming
P3 · Canonical + CIs (W4-5)370$0Upcoming
P4 · SFMC + Marts (W6-7)560$0Upcoming
P5 · Cutover (W8)140$0Upcoming
P6 · Handoff + Hypercare (W9 + Jul)180$0Upcoming

P1 burn: 28 h at $140/h = $3,920 vs $4,000 phase target (98%). Hours over the 25-h estimate (which was for normalize alone; we also executed the entire #143 + #159 stack this week). No NTE escalation needed — phase estimate was conservative on hours, dollar target intact.

What's next

W3 (12 May – 18 May) · build phase opens for real

P1 build kicks off — staging models

Now that the replication pipeline is live, the staging layer can read from EF_DATA_HUB_RAW on CH and produce stg_<product>__<entity> models in the staging schema. The plan is to land HSEY first (smallest product at 364 MB / 2.9 M rows), validate the pattern + macros + tests against legacy parity, then fan out to the other 7 products. Splink-IR (#161) and canonical model (#175) follow on top of staging in W3-4.

Build deliverables · W3
What lands by Friday
  • dbt source freshness passes against the replicated DB (validates the nightly cadence)
  • HSEY staging models — 5-6 models covering the largest entities (Account, Contact, Lead, Opportunity, Order_Item) with parity tests vs legacy ANALYTICS views
  • 6 reusable macros dropped in from the bootstrap (BU attribution, cancellation reason, phone E.164, gender canonical, country normalize, traveler age)
  • #166 CI spec — single-layer scope (replaces the four DC-style sub-tickets), ~37 CI definitions ported from DC
  • #142 segments spec branched per Vivek call output (MC-primary path or Plan B)
  • #225 Confluence reconciliation pass — refine the field-level rules with the most-recent / oldest patterns Rich confirmed
Stakeholder asks · W3
What we need from EF / ET
  • Allan — first review of eftours/de-dbt#3868; finalize the dbt Cloud project connection so #155 closes and the build phase has its execution surface
  • Vivek (MC) — schedule the segment-creation methodology call; this gates #142 and the reader-role decisions on the gold layer
  • Mike (Snowflake) — eyes on the first 2-3 nightly replication refresh credits to size the cost line; flag if it materially exceeds expectations
  • Adrien / Mike — confirm the NeverBounce flow status (OLD_WAY_NEVERBOUNCE_UPLOADS is 5+ months stale; need to know whether to wire up a fresh source or treat as deprecated)
  • Rich — sign off the Confluence reconciliation rules once #225 lands a proposal

Looking past W3 — phase trajectory

WindowPhaseWhat gets built
W3-4 (12-26 May)P1 + P2Staging + normalize models for all 8 products · Splink IR pipeline (blocking, fuzzy, survivorship, graph resolution) on Snowpark-optimized warehouse · IR parity QA vs DC's UnifiedIndividual
W4-5 (20 May - 2 Jun)P3Canonical model (7 DMOs / 259 fields) · ~37 CIs in single layer · marts pre-aggregated for Cortex agent
W6-7 (3-16 Jun)P4Segments (MC primary path or Snowflake fallback) · SFMC engagement ingestion via S3 · Marketing Cloud integration · CloudPage UI
W8 (17-23 Jun)P5Cutover runbook · parallel-run reconciliation report · Prefect orchestration end-to-end · DC schedule disable plan
W9 + JulP6Handoff to Mike Grabbe + Adrien · operational runbook · KT sessions · 2-cycle hypercare · formal sign-off checklist

Critical path now runs through #155 (dbt Cloud project) → #158 (staging) → #161 (IR) → #175 (canonical) → #166 (CIs). Replication pipeline is no longer on the critical path — it's autonomous.

Reference material

Where to dig deeper

Engagement docs
Plan + commercial baseline
  • Migration plan deck + Architecture options + Structure proposal
  • SOW Amendment #2 (signed 29 Apr 2026) — NTE $25 k, T&M capped, Net 30
  • Sprint log — weekly hours and budget tracker
This week's session logs
Pivot + deep-dive notes
  • 2026-05-04 · W2 kickoff
  • 2026-05-05 · Architecture pivot (Adrien)
  • 2026-05-05 · Naming lock
  • 2026-05-06 · Rich deep-dive
Specs landed / updated
Build-ready specs
  • specs/setup/143-ch-dbt2-home.md — CH dbt home (3 envs / 8 schemas)
  • specs/setup/159-us-ch-replication-group.md — replication group US→CH
  • specs/build/158-dbt1-normalize-staging-models.md — staging models (5-phase plan)
  • specs/build/175-canonical-model.md — canonical model
Cross-repo + tooling
Outside this repo
  • eftours/de-dbt#3868 — bootstrap content (Allan to review)
  • dbt-project-bootstrap/ — drop-in artefacts mirror
  • scripts/setup/ — DDL + Python generators for #143 + #159