9-week build · architecture pivoted W2 day 2
Architecture simplified Tuesday 6 May after a call with Adrien and Allan: single dbt project in CH (replacing the planned dbt-#1-US + dbt-#2-CH split). Raw US data lands in a consolidation DB (EF_DATA_HUB_RAW) and replicates nightly to CH. The 57 legacy ANALYTICS views stay alive in US as deprecating-legacy. Phase numbering unchanged; phase content adjusted accordingly.
Receiving team unchanged: Mike Grabbe + Adrien LeDoux (ET Data Engineering). Rich transitioning out — weekly standups continue; this report keeps you in the loop.
Tue 6 May AM · architecture · Wed 7 May PM · refinements
Decision driver: Adrien's preference for one operational surface and a clean retirement path for the 57 ANALYTICS views (they stay alive as legacy and deprecate alongside Data Cloud, instead of being ported).
Implication: raw US source DBs land into a single consolidation DB (EF_DATA_HUB_RAW) on US side; that DB then replicates to CH nightly. Consolidation is needed because Snowflake replication groups can't include shared databases — discovered Tue PM.
Cost: ~85% of Day 1 work product preserved (inventory, spec structure, macros). ~15% discarded (the US-side dbt-#1 home provisioning that's no longer needed).
Live call Wednesday afternoon settled five architectural refinements that the structure-proposal ticket (#215) had open:
- 3 envs (DEV / QA / PROD) as full DB copies — match EF framework (Rich's team owns QA maintenance)
- CI layers collapse 4 → 1 — Snowflake doesn't have DC's SQL/join limits, so Helpers/Base/Rollup/Final become a single
cisschema - Marts is active (was reserved) — pre-aggregated for Cortex agent + cross-regional + BI
- Segments: MC-driven primary, Snowflake Plan B — preserves marketing self-serve, Vivek call gates methodology
- Reconciliation rule refinements — phone / first_name / primary_address → most-recent across BUs · contact_creation_date → oldest across BUs (affects Splink survivorship)
What landed in W2
-
1
#143 — CH dbt home rewritten + executed (closed). Original 2-DB sandbox/prod from W1 evolved into the post-Rich layout: 3 environments (
EF_DBT_DEV/EF_DBT_QA/EF_DBT_PROD) × 8 schemas each (staging,normalize,sfmc_engagement,canonical,ir,cis,segments,marts). 3 RW roles renamed/created with cross-env grants. ACCOUNTADMIN revoked post-verification (least-privilege restored). -
2
#159 — Replication group US → CH executed end-to-end (closed). Three parts in one day:
Part A — provisioning. Consolidation DB
EF_DATA_HUB_RAWon TAC01291 with 9 data schemas + aPUBLICorchestration schema. Dedicated copy-job role with 815 grants (sources via IMPORTED PRIVILEGES on the shared DBs, write privileges on the target schemas, EXECUTE TASK on account, USAGE onREPORTING_SALESFORCE_WH).Part B — copy tasks. 1 root task + 48 child tasks (one per source table) scheduled at 00:00 UTC nightly. Smoke test landed 48 tables / 283 M rows / 47 GB in
EF_DATA_HUB_RAWon US — two Snowflake quirks fixed mid-flight (CREATE TASK clause order; predecessor-schema constraint forces children + root to share a schema).Part C — replication group.
EF_DATA_HUB_RAW_TO_CH_RGon US side (CRON0 1 * * * UTC) plus the replica on CH. Initial cross-region refresh complete with full byte-parity. Four read roles granted on the replicated DB:EF_DBT_DEV_RW,EF_DBT_QA_RW,EF_DBT_PROD_RW,MCP_READER. -
3
Cross-repo PR eftours/de-dbt#3868 — bootstrap content under
ef_data_hub_ch/. Drop-in artefacts ready for Allan:dbt_project.yml(3-env framework, 8 schemas, single CI layer, marts active),profiles.yml.template,sources.yml(49 source tables across 9 schemas), 6 reusable macros extracted from the 57 legacy views (bu_attribution,cancellation_reason_normalized,phone_e164,gender_canonical,country_code_normalize,traveler_age_3depth), 7 layer placeholders. CI runs green except📦 Compile— expected, no models yet. -
4
Three decks refreshed for Monday's wider audience presentation. Architecture options · CH dbt project structure (decisions locked, 12 slides, +2 new for segment strategy + reconciliation rules) · migration plan deck. HTML edits + PPTX re-renders.
-
5
Ticket housekeeping. 6 tickets closed (#143, #159, #163 + #164 + #165 CI layers collapsed, #215 structure proposal). 10 status-updated (#155, #158, #161, #162, #166 reframed + renamed to single CI layer, #167, #175, #224, #142, #225). Net board state cleaner: P0/P1 setup work largely done, P1 build is now actually unblocked.
What's running every night now
EF_DATA_HUB_RAWNightly schedule (UTC)
| Time | Step | Owner |
|---|---|---|
00:00 | Root task COPY_ROOT fires; 48 children copy from source DBs into EF_DATA_HUB_RAW on US | Snowflake task graph (autonomous) |
~00:30 | All 48 children complete · row counts logged | Snowflake |
01:00 | Replication group EF_DATA_HUB_RAW_TO_CH_RG refresh · pulls deltas to CH | Snowflake replication |
~01:30+ | Available on HH82036 · ready for dbt build | — |
Schemas + sizes (CH side)
| Schema | Tables | Rows | Size | Source |
|---|---|---|---|---|
data_hub_share | 7 | 45.6 M | 8.07 GB | ET / Student Tours share |
language_kafka | 9 | 209.6 M | 35.20 GB | EU Language (Juno + Poseidon Kafka) |
wojo | 3 | 11.5 M | 0.65 GB | World Journeys share |
salesforce_ccap | 7 | 8.6 M | 1.59 GB | CCAP Fivetran landing |
salesforce_academy | 8 | 4.6 M | 0.50 GB | Academy Fivetran landing |
salesforce_hsey | 6 | 2.9 M | 0.36 GB | HSEY Fivetran landing |
ef_us_uploads | 1 | 0.75 M | 0.01 GB | NeverBounce upload (audit pending) |
higher_ed_gy | 4 | 0.16 M | 0.02 GB | Higher Ed share — Gap Year |
higher_ed_sa | 4 | 0.18 M | 0.02 GB | Higher Ed share — Study Abroad |
Note: data_hub_share shows 7 tables once EVALUATION_QUESTION_RESPONSES_V (added today on US) lands on CH at the next 01:00 UTC refresh.
Where each in-flight ticket landed
Closed this week
| Ticket | Title | Why closed |
|---|---|---|
| #143 | CH Snowflake dbt home | 3 envs × 8 schemas provisioned + roles + grants done |
| #159 | Replication group US→CH | End-to-end live · 48 tables / 283 M rows on CH |
| #163 | CIs Helpers layer | Not needed — Snowflake doesn't have DC's SQL limits |
| #164 | CIs Base layer | Collapsed into single cis schema |
| #165 | CIs Rollup layer | Collapsed into single cis schema |
| #215 | CH dbt project structure proposal | 4 of 5 questions resolved + 2 new decisions surfaced |
Now unblocked for build
Sources resolve to EF_DATA_HUB_RAW.<schema>.<table> on CH (already configured). 5-phase migration plan in spec; HSEY recommended as the smallest first bite (364 MB / 2.9 M rows).
Reads raw replicas, writes to canonical schema. Survivorship rules to incorporate the reconciliation refinements Rich confirmed Wednesday.
Still pending or scope-shifting
- #155 — dbt Cloud project bootstrap (Allan; cross-repo PR #3868 has the content ready, awaiting first review)
- #142 — Segments reframed: MC primary / Snowflake Plan B; Vivek call gates the methodology
- #166 — single CI layer build (renamed; replaces #163-#165); pending spec write
- #161 + #162 — Splink IR + parity QA, with the new survivorship rules
- #224 — Marts sublayer (NEW; Cortex / cross-regional / BI consumers)
- #225 — Confluence reconciliation-rules pre-spec cleanup pass
$25 k NTE · spent vs remaining
NTE consumption
23.8% consumed end of W2 day 4. Trajectory healthy: P0 + a chunk of P1 done in 4 active days, with ~76% of the cap reserved for P2 (Splink IR) through P6 (handoff).
How this is calculated
Three rate tiers per SOW Amendment §4 (depending on day intensity):
| Day intensity | Rate | Applied to |
|---|---|---|
| Full-time (> 4 h/day) | $140 / h | 4 of 5 active days so far (4/30, 5/4, 5/5, 5/6, 5/7) → 40 h × $140 = $5,600 |
| Part-time (2.1–4 h/day) | $160 / h | 0 days so far |
| Ad-hoc (≤ 2 h/day) | $180 / h | 1 day so far (4/29 walkthrough) → 2 h × $180 = $360 |
Total: 42 h · $5,960. The full-time intensity rate ($140) applies whenever a day exceeds 4 hours — that's been every active day except the initial 29 Apr walkthrough. Detailed daily breakdown lives in sprint-log.md.
Phase status
| Phase | SOW low (h) | Actual h | Actual $ | Status |
|---|---|---|---|---|
| P0 · M1 Foundation & access (W1) | 16 | 14.0 | $2,040 | Done |
| P1 · dbt #1 normalize + replication (W2-3) | 25 | 28.0 | $3,920 | 112% h · 98% $ |
| P2 · Splink IR (W3-4) | 37 | 0 | $0 | Upcoming |
| P3 · Canonical + CIs (W4-5) | 37 | 0 | $0 | Upcoming |
| P4 · SFMC + Marts (W6-7) | 56 | 0 | $0 | Upcoming |
| P5 · Cutover (W8) | 14 | 0 | $0 | Upcoming |
| P6 · Handoff + Hypercare (W9 + Jul) | 18 | 0 | $0 | Upcoming |
P1 burn: 28 h at $140/h = $3,920 vs $4,000 phase target (98%). Hours over the 25-h estimate (which was for normalize alone; we also executed the entire #143 + #159 stack this week). No NTE escalation needed — phase estimate was conservative on hours, dollar target intact.
W3 (12 May – 18 May) · build phase opens for real
P1 build kicks off — staging models
Now that the replication pipeline is live, the staging layer can read from EF_DATA_HUB_RAW on CH and produce stg_<product>__<entity> models in the staging schema. The plan is to land HSEY first (smallest product at 364 MB / 2.9 M rows), validate the pattern + macros + tests against legacy parity, then fan out to the other 7 products. Splink-IR (#161) and canonical model (#175) follow on top of staging in W3-4.
- dbt source freshness passes against the replicated DB (validates the nightly cadence)
- HSEY staging models — 5-6 models covering the largest entities (Account, Contact, Lead, Opportunity, Order_Item) with parity tests vs legacy ANALYTICS views
- 6 reusable macros dropped in from the bootstrap (BU attribution, cancellation reason, phone E.164, gender canonical, country normalize, traveler age)
- #166 CI spec — single-layer scope (replaces the four DC-style sub-tickets), ~37 CI definitions ported from DC
- #142 segments spec branched per Vivek call output (MC-primary path or Plan B)
- #225 Confluence reconciliation pass — refine the field-level rules with the most-recent / oldest patterns Rich confirmed
- Allan — first review of
eftours/de-dbt#3868; finalize the dbt Cloud project connection so #155 closes and the build phase has its execution surface - Vivek (MC) — schedule the segment-creation methodology call; this gates #142 and the reader-role decisions on the gold layer
- Mike (Snowflake) — eyes on the first 2-3 nightly replication refresh credits to size the cost line; flag if it materially exceeds expectations
- Adrien / Mike — confirm the NeverBounce flow status (
OLD_WAY_NEVERBOUNCE_UPLOADSis 5+ months stale; need to know whether to wire up a fresh source or treat as deprecated) - Rich — sign off the Confluence reconciliation rules once #225 lands a proposal
Looking past W3 — phase trajectory
| Window | Phase | What gets built |
|---|---|---|
| W3-4 (12-26 May) | P1 + P2 | Staging + normalize models for all 8 products · Splink IR pipeline (blocking, fuzzy, survivorship, graph resolution) on Snowpark-optimized warehouse · IR parity QA vs DC's UnifiedIndividual |
| W4-5 (20 May - 2 Jun) | P3 | Canonical model (7 DMOs / 259 fields) · ~37 CIs in single layer · marts pre-aggregated for Cortex agent |
| W6-7 (3-16 Jun) | P4 | Segments (MC primary path or Snowflake fallback) · SFMC engagement ingestion via S3 · Marketing Cloud integration · CloudPage UI |
| W8 (17-23 Jun) | P5 | Cutover runbook · parallel-run reconciliation report · Prefect orchestration end-to-end · DC schedule disable plan |
| W9 + Jul | P6 | Handoff to Mike Grabbe + Adrien · operational runbook · KT sessions · 2-cycle hypercare · formal sign-off checklist |
Critical path now runs through #155 (dbt Cloud project) → #158 (staging) → #161 (IR) → #175 (canonical) → #166 (CIs). Replication pipeline is no longer on the critical path — it's autonomous.
Where to dig deeper
- Migration plan deck + Architecture options + Structure proposal
- SOW Amendment #2 (signed 29 Apr 2026) — NTE $25 k, T&M capped, Net 30
- Sprint log — weekly hours and budget tracker
- 2026-05-04 · W2 kickoff
- 2026-05-05 · Architecture pivot (Adrien)
- 2026-05-05 · Naming lock
- 2026-05-06 · Rich deep-dive
- specs/setup/143-ch-dbt2-home.md — CH dbt home (3 envs / 8 schemas)
- specs/setup/159-us-ch-replication-group.md — replication group US→CH
- specs/build/158-dbt1-normalize-staging-models.md — staging models (5-phase plan)
- specs/build/175-canonical-model.md — canonical model
- eftours/de-dbt#3868 — bootstrap content (Allan to review)
- dbt-project-bootstrap/ — drop-in artefacts mirror
- scripts/setup/ — DDL + Python generators for #143 + #159