EF Migration · W3 Status Report

Plan

9-week build · P1 foundation now structurally complete

The replication pipeline from W2 keeps running every night. This week the dbt project layer was restructured to the receiving team's vanilla conventions, the CI route was fixed, and the Snowflake-side schemas + grants for the new layout were provisioned live on HH82036. Phase 1 build (real model logic in #158, then #175 canonical) opens Monday as soon as the bootstrap PR merges.

P0 · W1 Foundation

P1 · W2-3 Normalize + Replication

P2 · W3-4 Splink IR

P3 · W4-5 Canonical + CIs

P4 · W6-7 SFMC + Marts

P5 · W8 Cutover

P6 · W9 + Jul Handoff

Receiving team unchanged: Mike Grabbe + Adrien LeDoux (ET Data Engineering). Rich is transitioning out; weekly standups continue, and this report keeps you in the loop.

Vanilla-audit decisions ratified

Tue 13 May · Allan + Adrien sync · 3 structural decisions

Tuesday's audit deck identified 18 layout items vs the dbt Labs Best Practices Guide + the eftours/de-dbt monorepo conventions. 8 items already aligned, 7 quick wins land before model logic, and 3 substantive decisions needed sign-off. All three settled in the same call:

Decision 1 · layer collapse

8 schemas → 4 vanilla layers

Drop the DC-vocab schemas (canonical / ir / cis / segments) as Snowflake schemas. Vocabulary survives at the model-name + model-folder level (e.g. int_canonical__individual, models/intermediate/ir/) but the Snowflake surface is the dbt Labs canonical 3-layer + medallion gold:

STAGING (bronze) · source-aligned views
INTERMEDIATE (silver) · business logic · was normalize + canonical + ir
CORE (silver) · final non-public · was cis + segments
MARTS (gold) · public consumption · Cortex / cross-regional / BI

Driver: receiving team navigates 40+ projects on account 338; less cognitive switching cost.
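The collapse can be written down as a lookup table. The following is a minimal Python sketch, not project code: the names `LAYER_FOR_OLD_SCHEMA`, `MEDALLION_TIER` and `new_layers` are illustrative, the eight old schema names are taken from the Pass 3c note later in this report, and `sfmc_engagement` is left unmapped because the decision text does not say which layer absorbs it.

```python
# Old schema -> new vanilla layer, per Decision 1.
# sfmc_engagement is deliberately None: its target layer is not stated in this report.
LAYER_FOR_OLD_SCHEMA = {
    "staging": "STAGING",
    "normalize": "INTERMEDIATE",
    "canonical": "INTERMEDIATE",
    "ir": "INTERMEDIATE",
    "cis": "CORE",
    "segments": "CORE",
    "marts": "MARTS",
    "sfmc_engagement": None,
}

# Medallion tier attached to each new layer in the decision.
MEDALLION_TIER = {
    "STAGING": "bronze",
    "INTERMEDIATE": "silver",
    "CORE": "silver",
    "MARTS": "gold",
}


def new_layers():
    """Return the distinct target layers in first-seen order."""
    seen = []
    for layer in LAYER_FOR_OLD_SCHEMA.values():
        if layer is not None and layer not in seen:
            seen.append(layer)
    return seen


if __name__ == "__main__":
    print(len(LAYER_FOR_OLD_SCHEMA), "old schemas ->", len(new_layers()), "layers")
    for layer in new_layers():
        print(layer, MEDALLION_TIER[layer])
```

The DC vocabulary itself is not lost in this mapping; it moves into model names and folders, as described above.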

Decision 2 · naming

UPPERCASE + project-prefixed schemas

Schemas in Snowflake renamed to EF_DATA_HUB_CH_STAGING / _INTERMEDIATE / _CORE / _MARTS. Matches the convention in 5 of 6 sampled production projects (reporting_student_tours, core_finance, reverse_etl, etc.). The prefix protects against cross-project schema collisions if a future project lands on the same CH databases.

Decision 3 · sources

Top-level sources/ outside models/

Source declarations live at the project root in sources/sources.yml, with dbt_project.yml declaring model-paths: ["models", "sources"]. Adrien flagged this directly on PR #3868 review. Differs from the existing pattern in 20+ monorepo projects (which keep models/sources/), but Adrien is raising the bar to stricter dbt Labs vanilla for new projects.

Session log: docs/sessions/2026-05-13-allan-adrien-dbt-structure-decisions.md

Done this week

What landed in W3

1
Spec #143 rewritten for the vanilla-audit layout. Pass 1 + 2 marked done with acceptance criteria checked. New Pass 3 split into three sub-passes: 3a SYSADMIN (create 4 new project-prefixed schemas), 3b ACCOUNTADMIN (grants on the new schemas), 3c SYSADMIN (drop the 8 obsolete unprefixed schemas after PR merge). Idempotent DDL inline + companion scripts on disk.
Spec: specs/setup/143-ch-dbt2-home.md · DDL: scripts/setup/143-ch-dbt-home-pass3.sql + ...-pass3ab-only.sql
2
Pass 3a + 3b DDL executed live on HH82036. 12 new schemas (EF_DATA_HUB_CH_STAGING / _INTERMEDIATE / _CORE / _MARTS × DEV / QA / PROD) + full grant matrix on each: USAGE/MONITOR/MODIFY + 10 CREATE privileges to the owning RW role, USAGE cross-env to the other two RW roles, USAGE to MCP_READER on all 12. ACCOUNTADMIN granted temporarily to EF_MCP_SERVICE_USER_CH for the grant block and revoked immediately after — least-privilege restored.
Pass 3c (drop the 8 obsolete staging/normalize/sfmc_engagement/canonical/ir/cis/segments/marts schemas) queued for after PR #3868 merges
3
Cross-repo PR eftours/de-dbt#3868: restructured end-to-end to the vanilla layout. dbt_project.yml collapsed 8 layer configs → 4 (UPPERCASE schema directives). Sources moved to top-level sources/ outside models/. 7 quick wins from the audit deck landed: dbt-cloud: project-id: 17231 block, +group: ef_data_hub_ch + models/_groups.yml, +access: public on marts, +meta.required_tests on persistent layers, +transient: false + +materialized: table on core + marts, medallion tags dropped. Custom generate_schema_name macro replicated from the EF monorepo standard pattern. .dbt_cloud.txt with project 17231's job IDs routes the workflow to the dbt-cloud CI path. _tours_utilities wired as local package; migrated from calogica/dbt_expectations → metaplane/dbt_expectations to match transitive deps. Phase A placeholder macro call replaced with cast(null as timestamp) across all 57 staging skeletons (the original macro hardcoded a per-source column name that didn't exist on every source table). CI is 14/14 green; PR is mergeable.
Adrien tagged for code-owner review · 6 commits pushed this week to feat/ef_data_hub_ch-bootstrap
4
CI diagnose-and-fix marathon: 5 distinct failure modes resolved end-to-end. (1) Stale base vs main caused a "missing script" false alarm; fixed by merging origin/main into the PR branch. (2) KeyError: 'javascript' from dbt-snowflake 1.11.5 when CI fell to the dbt-core path; root-caused as a routing issue, not the regression itself. (3) Routing fix: shipped .dbt_cloud.txt with the project's job IDs so the workflow uses the dbt-cloud build path. (4) Defer chicken-and-egg: defer-based CI jobs cancel before running any steps if the env has no baseline manifest. Solved by triggering deploy-type runs (compile-only, no DML) to seed the PROD env (job 611029 PATCH-trigger-restore) and the QA env (created temp job 622417, ran it, deleted it). (5) Macro/package gaps: identified tnightengale/dbt_meta_testing as the missing source for required_tests / required_docs; pulled in transitively via _tours_utilities. The final placeholder macro fix cleared the SQL-validity errors on staging models.
Lesson captured: state:modified defer is sticky — steps_override can't bypass the pre-check, only a deploy-type job seeds the env baseline
5
Allan's Terraform PR eftours/de-dbt-cloud-terraform#119 applied. Connection 17238 in dbt Cloud account 338 flipped from the leftover EF_DBT_SANDBOX/EF_DBT_SANDBOX_RW to EF_DBT_DEV/EF_DBT_DEV_RW. Same PR reconciled Terraform drift on the QA/PROD env credentials (added EF_DBT_PROD as the explicit database for PROD; aligned QA role + database to EF_DBT_QA_RW / EF_DBT_QA). Future terraform apply won't revert the manual cred PATCHes from W3 Day 1.
6
Ticket housekeeping. #143 spec rewrite landed via PR #274. Pass 3 scripts added via PR #283 + #284. Session log + decision capture via PR #273. Several mid-week correction PRs (#272, #278, #279, #280, #281) keeping the local source-of-truth in sync with the cross-repo state. Once PR #3868 merges, #143 + #155 close.
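The Pass 3a + 3b surface described in item 2 can be enumerated programmatically. The sketch below is illustrative only and is not the DDL in the Pass 3 scripts: it assumes each environment's RW role owns that environment's schemas, it assumes the PROD role is named `EF_DBT_PROD_RW` (only the DEV and QA role names appear in this report), and it leaves out the 10 CREATE privileges because they are not listed here.

```python
PREFIX = "EF_DATA_HUB_CH"
LAYERS = ["STAGING", "INTERMEDIATE", "CORE", "MARTS"]

# env -> (database, owning RW role). EF_DBT_PROD_RW is an assumed name.
ENVS = {
    "DEV": ("EF_DBT_DEV", "EF_DBT_DEV_RW"),
    "QA": ("EF_DBT_QA", "EF_DBT_QA_RW"),
    "PROD": ("EF_DBT_PROD", "EF_DBT_PROD_RW"),
}
READER_ROLE = "MCP_READER"
OWNER_PRIVS = ["USAGE", "MONITOR", "MODIFY"]  # CREATE privileges omitted (not listed in the report)


def schema_names():
    """Fully qualified names of the project-prefixed schemas, one per env and layer."""
    return [f"{db}.{PREFIX}_{layer}" for db, _ in ENVS.values() for layer in LAYERS]


def grant_statements():
    """Build the owner, cross-env, and reader grants for every schema."""
    stmts = []
    for env, (db, owner) in ENVS.items():
        for layer in LAYERS:
            schema = f"{db}.{PREFIX}_{layer}"
            # Owning RW role gets the non-CREATE privileges named in item 2.
            stmts.append(f"GRANT {', '.join(OWNER_PRIVS)} ON SCHEMA {schema} TO ROLE {owner};")
            # The other two RW roles get cross-env USAGE.
            for other_env, (_, other_role) in ENVS.items():
                if other_env != env:
                    stmts.append(f"GRANT USAGE ON SCHEMA {schema} TO ROLE {other_role};")
            # The reader role gets USAGE on all schemas.
            stmts.append(f"GRANT USAGE ON SCHEMA {schema} TO ROLE {READER_ROLE};")
    return stmts


if __name__ == "__main__":
    print(len(schema_names()), "schemas")
    for stmt in grant_statements()[:4]:
        print(stmt)
```

Generating the statements from one table of environments keeps the three environments symmetrical, which is the property the grant matrix relies on.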

Pipeline + project surface live

What's running every night + what's now built on top

Source tables on CH

49

across 9 schemas in EF_DATA_HUB_RAW · nightly refresh

dbt models resolving

57

staging placeholders × 8 products · CI-validated

New schemas provisioned

12

4 vanilla layers × DEV / QA / PROD (HH82036)

Nightly schedule (UTC)

Time	Step	Owner
`00:00`	Root task `COPY_ROOT` fires; 48 children copy from source DBs into `EF_DATA_HUB_RAW` on US	Snowflake task graph (autonomous)
`~00:30`	All 48 children complete · row counts logged	Snowflake
`01:00`	Replication group `EF_DATA_HUB_RAW_TO_CH_RG` refresh · pulls deltas to CH	Snowflake replication
`~01:30+`	Available on HH82036 · ready for dbt build	—

Schemas + sizes (CH side)

Schema	Tables	Rows	Size	Source
`data_hub_share`	7	45.6 M	8.07 GB	ET / Student Tours share
`language_kafka`	9	209.6 M	35.20 GB	EU Language (Juno + Poseidon Kafka)
`wojo`	3	11.5 M	0.65 GB	World Journeys share
`salesforce_ccap`	7	8.6 M	1.59 GB	CCAP Fivetran landing
`salesforce_academy`	8	4.6 M	0.50 GB	Academy Fivetran landing
`salesforce_hsey`	6	2.9 M	0.36 GB	HSEY Fivetran landing
`ef_us_uploads`	1	0.75 M	0.01 GB	NeverBounce upload (audit pending)
`higher_ed_gy`	4	0.16 M	0.02 GB	Higher Ed share — Gap Year
`higher_ed_sa`	4	0.18 M	0.02 GB	Higher Ed share — Study Abroad

EVALUATION_QUESTION_RESPONSES_V from W2 has been on CH for a week now — included in the 49 source count above.
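As a cross-check on the table above, the per-schema figures can be totalled. This is a small sketch with the values copied from the table; the variable names are illustrative.

```python
# (schema, tables, rows in millions, size in GB), copied from the CH-side table
CH_SCHEMAS = [
    ("data_hub_share", 7, 45.6, 8.07),
    ("language_kafka", 9, 209.6, 35.20),
    ("wojo", 3, 11.5, 0.65),
    ("salesforce_ccap", 7, 8.6, 1.59),
    ("salesforce_academy", 8, 4.6, 0.50),
    ("salesforce_hsey", 6, 2.9, 0.36),
    ("ef_us_uploads", 1, 0.75, 0.01),
    ("higher_ed_gy", 4, 0.16, 0.02),
    ("higher_ed_sa", 4, 0.18, 0.02),
]

total_tables = sum(tables for _name, tables, _rows, _gb in CH_SCHEMAS)
total_rows_m = round(sum(rows for _name, _tables, rows, _gb in CH_SCHEMAS), 2)
total_gb = round(sum(gb for _name, _tables, _rows, gb in CH_SCHEMAS), 2)

if __name__ == "__main__":
    print(f"{len(CH_SCHEMAS)} schemas, {total_tables} tables, "
          f"{total_rows_m} M rows, {total_gb} GB")
```

The table count comes to 49 across 9 schemas, which matches the source count quoted in the note above.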

New schema surface on HH82036 (post Pass 3a/3b)

Database	Schema	Layer	Future content
`EF_DBT_DEV`	`EF_DATA_HUB_CH_STAGING`	bronze	57 `stg_<product>__<entity>` models (currently CI-validated placeholders; real SQL from #158 Phase B)
	`EF_DATA_HUB_CH_INTERMEDIATE`	silver	Normalize · canonical model (#175a) · Splink IR (#161)
	`EF_DATA_HUB_CH_CORE`	silver	~37 CIs (#166) · segments (#142)
	`EF_DATA_HUB_CH_MARTS`	gold	Cortex agent feeds · cross-regional sharing · BI consumers (#224)
Same 4-schema set mirrored in `EF_DBT_QA` + `EF_DBT_PROD` (12 total). Pass 3c will drop the 8 obsolete unprefixed schemas after PR merges.

Board state

Where each in-flight ticket landed

Ready to close once PR #3868 merges

Ticket	Title	Why closing
#155	Bootstrap single dbt project `ef_data_hub_ch`	Project restructured + 14/14 CI · PR #3868 mergeable
#143	CH dbt home — Pass 3 final	Pass 3a + 3b done · Pass 3c (drop obsolete schemas) runs post-merge
#158 Phase A	Staging placeholders for 8 products	57 placeholders compile, materialize as views, validate vs sources · ready to move to Phase B real-SQL implementation

Now unblocked for the real build

Phase 1 · build · next bite

#158 Phase B — port-and-evolve from ANALYTICS

57 staging models currently compile as where 1=0 placeholders. Phase B replaces them with real port-and-evolve SQL from the legacy EF_DATA_HUB.ANALYTICS views in US (which still serve the legacy stack). Plan: pilot on stg_ccap__contact (CCAP audited, smallest controlled scope), then fan out by product.

Gates on: PR #3868 merge (after which dev work moves to the merged branch)

Phase 3 · build · canonical

#175a — canonical model (6-RT survivorship)

Canonical individual + 3 ContactPoint canonicals using the 6-rule-type CTE pattern (RT1 source-priority through RT6 aggregate), with all 150 DMO fields classified by rule type. Writes to EF_DATA_HUB_CH_INTERMEDIATE. The spec already incorporates the survivorship classification completed in W2.

Gates on: Rich's CSV validation (pending 7 days) + #161 Splink IR

Still pending or stakeholder-blocked

#3868 PR review — Adrien (code-owner approval needed; CI is 14/14 green)
#225 — Confluence reconciliation-rules CSV validation (Rich; 150-row inventory pending since 8 May, gentle nudge planned for W4)
#142 — Segments reframed: MC primary / Snowflake Plan B; Vivek call gates the methodology
#161 + #162 — Splink IR + parity QA (incorporate Amon's modified 10-rule design from the parallel CH workstream's SF PS session)
#224 — Marts sublayer (Cortex / cross-regional / BI consumers); spec write pending after #158 Phase B settles patterns
#136-#140 — RLT-* normalize tickets: ready to move to In Review on Board #7 once #158 Phase B resolves the underlying scope

Budget snapshot

$25 k NTE · spent vs remaining

NTE cap

$25,000

SOW Amendment #2 (29 Apr)

Spent to date

$10,600

74.0 hours · 42.4% of NTE

Remaining

$14,400

≈ 103 h at full-time rate ($140 / h)

W3 burn

24 h

$3,520 · structure rewrite + CI marathon

NTE consumption

42.4% consumed end of W3. Trajectory healthy: P0 done + structural P1 foundation complete in 3 active weeks, with ~58% of the cap reserved for P2 (Splink IR) through P6 (handoff). EF-engagement work only — the parallel ohanacloud-CH workstream (IR design with SF PS) is tracked separately and not billed against this NTE.

How this is calculated

Three rate tiers per SOW Amendment §4 (depending on day intensity):

Day intensity	Rate	Applied to
Full-time (> 4 h/day)	$140 / h	8 of 10 active days · 66 h × $140 = $9,240
Part-time (2.1–4 h/day)	$160 / h	1 day (5/11) · 4 h × $160 = $640
Ad-hoc (≤ 2 h/day)	$180 / h	2 days (4/29 walkthrough · 5/12) · 4 h × $180 = $720

Total: 74 h · $10,600. Detailed daily breakdown lives in sprint-log.md.
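The tier rule can be expressed as a small function. This is a sketch of the arithmetic only, not the billing system of record: the function names are illustrative, and treating anything above 2 h and up to 4 h as part-time is an assumption that closes the gap between "≤ 2 h" and "2.1 h" in the table.

```python
NTE_CAP = 25_000        # SOW Amendment #2
SPENT_TO_DATE = 10_600  # total stated in this report
FULL_TIME_RATE = 140


def hourly_rate(hours_in_day: float) -> int:
    """Pick the rate tier from the hours worked on a single day."""
    if hours_in_day > 4:
        return FULL_TIME_RATE  # full-time
    if hours_in_day > 2:       # assumption: (2, 4] counts as part-time
        return 160
    return 180                 # ad-hoc


def day_cost(hours_in_day: float) -> float:
    """Cost of one day billed at its own tier."""
    return hours_in_day * hourly_rate(hours_in_day)


part_time_cost = day_cost(4)             # the single 4 h part-time day
ad_hoc_cost = day_cost(2) + day_cost(2)  # two ad-hoc days totalling 4 h
pct_consumed = round(SPENT_TO_DATE / NTE_CAP * 100, 1)
remaining = NTE_CAP - SPENT_TO_DATE
remaining_full_time_hours = remaining / FULL_TIME_RATE

if __name__ == "__main__":
    print(part_time_cost, ad_hoc_cost, pct_consumed, remaining,
          round(remaining_full_time_hours))
```

Because the rate is chosen per day rather than per week, the same number of hours costs more when spread over short days, which is why the daily breakdown in sprint-log.md is the authoritative input.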

Phase status

Phase	SOW low (h)	Actual h	Actual $	Status
P0 · M1 Foundation & access (W1)	16	14.0	$2,040	Done
P1 · dbt #1 normalize + replication (W2-3)	25	60.0	$8,560	240% h · 214% $
P2 · Splink IR (W3-4)	37	0	$0	Upcoming
P3 · Canonical + CIs (W4-5)	37	0	$0	Upcoming
P4 · SFMC + Marts (W6-7)	56	0	$0	Upcoming
P5 · Cutover (W8)	14	0	$0	Upcoming
P6 · Handoff + Hypercare (W9 + Jul)	18	0	$0	Upcoming

P1 burn at 60 h vs the 25-h estimate is a clear over-run on the phase tracker, but the phase scope expanded materially: it now bundles #143 + #159 (W2), then the entire #155 bootstrap + vanilla-audit restructure + CI debug + Pass 3 DDL (W3) — all of which are P1 foundation prereqs that the SOW didn't separately budget. Cumulative NTE at 42.4% leaves ~$14.4 k for P2 → P6. No escalation; tracking the over-phase burn as a flag to pace P2 onward more strictly.
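The over-run percentages in the phase table can be reproduced as follows. The hours figure follows directly from the table. The dollar baseline is not stated in this report, so the sketch assumes the 25 SOW hours priced at $160 / h, which is one value that reproduces the 214% figure; treat that rate as an assumption, not a contract term.

```python
# SOW low-estimate hours per phase, copied from the phase table
SOW_LOW_HOURS = {"P0": 16, "P1": 25, "P2": 37, "P3": 37, "P4": 56, "P5": 14, "P6": 18}

P1_ACTUAL_HOURS = 60.0
P1_ACTUAL_DOLLARS = 8_560
ASSUMED_BASELINE_RATE = 160  # assumption: not stated in the report

hours_pct = round(P1_ACTUAL_HOURS / SOW_LOW_HOURS["P1"] * 100)
baseline_dollars = SOW_LOW_HOURS["P1"] * ASSUMED_BASELINE_RATE
dollars_pct = round(P1_ACTUAL_DOLLARS / baseline_dollars * 100)
total_sow_low_hours = sum(SOW_LOW_HOURS.values())

if __name__ == "__main__":
    print(f"P1 hours: {hours_pct}% of SOW low estimate")
    print(f"P1 dollars: {dollars_pct}% of assumed ${baseline_dollars:,} baseline")
    print(f"SOW low estimate, all phases: {total_sow_low_hours} h")
```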

What's next

W4 (19 May – 25 May) · real-SQL build opens once PR merges

P1 Phase B — port-and-evolve from ANALYTICS

With structural P1 foundation done (12 schemas + grants + project layout + CI green), the bottleneck moves to real model SQL. Plan: pilot stg_ccap__contact from the legacy EF_DATA_HUB.ANALYTICS.CCAP.CONTACT_V view (CCAP fully audited, smallest controlled scope) to validate the port-and-evolve pattern + macros + parity tests end-to-end. Then fan out by product. Splink-IR (#161) and canonical model (#175a) build on staging once Phase B has 2-3 products materializing real data.

Build deliverables · W4

What lands by Friday 23 May

PR eftours/de-dbt#3868 merged · Pass 3c DDL applied (drops the 8 obsolete unprefixed schemas)
Spec #158 update · paths + schema configs reflect the new vanilla layout · re-baseline before Phase B starts
Phase B pilot · stg_ccap__contact ported with real SQL · parity tests against EF_DATA_HUB.ANALYTICS.CCAP.CONTACT_V
Phase B fan-out · 2-3 more products (HSEY + ET / Student Tours likely next) using the validated pattern
survivorship_ts macro enhancement · per-table column overrides for wojo + higher_ed (replace the bootstrap stub on placeholders)
#225 Confluence reconciliation pass · once Rich validates the CSV inventory

Stakeholder asks · W4

What we need from EF / ET

Adrien — code-owner review on PR eftours/de-dbt#3868 (CI 14/14 green; pinged 14 May)
Rich — sign off the 150-row reconciliation CSV (docs/sf-analysis/reconciliation-rules-inventory-2026-05-08.csv; pending since 8 May) to unblock #175a survivorship classification
Vivek (MC) — schedule the segment-creation methodology call; gates #142 and the reader-role decisions on the gold layer
Mike (Snowflake) — sanity check the first 2-3 weeks of nightly replication credits to size the cost line
Adrien / Mike — confirm NeverBounce flow status (OLD_WAY_NEVERBOUNCE_UPLOADS is 5+ months stale; need to know whether to wire up a fresh source or treat as deprecated)

Looking past W4 — phase trajectory

Window	Phase	What gets built
W4-5 (19 May - 1 Jun)	P1 + P2	Phase B staging port-and-evolve for all 8 products · Splink IR pipeline (blocking, fuzzy, survivorship, graph resolution) on Snowpark-optimized warehouse · IR parity QA vs DC's UnifiedIndividual
W5-6 (26 May - 8 Jun)	P3	Canonical model (7 DMOs / 150 fields with 6-RT survivorship pattern) · ~37 CIs in single layer · marts pre-aggregated for Cortex agent
W6-7 (3-16 Jun)	P4	Segments (MC primary path or Snowflake fallback) · SFMC engagement ingestion via S3 · Marketing Cloud integration · CloudPage UI
W8 (17-23 Jun)	P5	Cutover runbook · parallel-run reconciliation report · Prefect orchestration end-to-end · DC schedule disable plan
W9 + Jul	P6	Handoff to Mike Grabbe + Adrien · operational runbook · KT sessions · 2-cycle hypercare · formal sign-off checklist

Critical path now: PR #3868 merge → #158 Phase B real SQL → #161 Splink IR → #175a canonical → #166 CIs → #142 segments → cutover. Replication pipeline and dbt project surface are both autonomous now — not on the critical path.

Reference material

Where to dig deeper

Engagement docs

Plan + commercial baseline

Migration plan deck + Architecture options + Structure proposal + Vanilla audit deck
SOW Amendment #2 (signed 29 Apr 2026) — NTE $25 k, T&M capped, Net 30
Sprint log — weekly hours and budget tracker

This week's session logs

Decisions + diagnostics

2026-05-13 · Allan + Adrien · vanilla-audit decisions ratified
2026-05-12 (W3 prep) · vanilla audit deck for Adrien
Sprint log entries for W3 days 1-4 with full per-day breakdown

Specs landed / updated

Build-ready specs

specs/setup/143-ch-dbt2-home.md — CH dbt home (3 envs · 4 vanilla schemas · UPPERCASE project-prefixed)
specs/setup/159-us-ch-replication-group.md — replication group US→CH
specs/build/158-dbt1-normalize-staging-models.md — staging models (5-phase plan)
specs/build/175a-canonical-survivorship.md — canonical model with 6-RT survivorship pattern

Cross-repo + tooling

Outside this repo

eftours/de-dbt#3868 — restructured project · CI 14/14 green · Adrien to review
eftours/de-dbt-cloud-terraform#119 — Connection 17238 fix + cred reconciliation (applied 2026-05-13)
scripts/setup/143-ch-dbt-home-pass3.sql + ...-pass3ab-only.sql — Pass 3 DDL (3a + 3b applied; 3c queued for post-merge)

EF Migration · Status

From live pipeline to project ready-to-merge