EF
Status Report · Week 3 of 9

From live pipeline to project ready-to-merge

Decisions locked, project restructured, CI green. The vanilla-audit deck landed three structural decisions on Tuesday with Allan + Adrien (collapse 8 schemas → 4 vanilla layers, project-prefixed UPPERCASE, sources outside models/). The ef_data_hub_ch dbt project was rewritten end-to-end in cross-repo PR eftours/de-dbt#3868, Pass 3 DDL on HH82036 created the 4 new schemas in each of DEV / QA / PROD (12 total) plus grants, and CI is now 14 of 14 green. The PR is mergeable and is waiting on Adrien's code-owner review.

W3 · Day 5 of 5 | P1 · dbt build foundation | 74.0 h · $10,600 · 42.4% NTE | PR mergeable · CI 14/14
Plan

9-week build · P1 foundation now structurally complete

The replication pipeline from W2 keeps running every night. This week the dbt project layer was restructured to the receiving team's vanilla conventions, the CI route was fixed, and the Snowflake-side schemas + grants for the new layout were provisioned live on HH82036. Phase 1 build (real model logic in #158, then #175 canonical) opens Monday as soon as the bootstrap PR merges.

P0 · W1 Foundation
P1 · W2-3 Normalize + Replication
P2 · W3-4 Splink IR
P3 · W4-5 Canonical + CIs
P4 · W6-7 SFMC + Marts
P5 · W8 Cutover
P6 · W9 + Jul Handoff

Receiving team unchanged: Mike Grabbe + Adrien LeDoux (ET Data Engineering). Rich transitioning out — weekly standups continue; this report keeps you in the loop.

Vanilla-audit decisions ratified

Tue 13 May · Allan + Adrien sync · 3 structural decisions

Tuesday's audit deck identified 18 layout items vs the dbt Labs Best Practices Guide + the eftours/de-dbt monorepo conventions. 8 items already aligned, 7 quick wins land before model logic, and 3 substantive decisions needed sign-off. All three settled in the same call:

Decision 1 · layer collapse
8 schemas → 4 vanilla layers

Drop the DC-vocab schemas (canonical / ir / cis / segments) as Snowflake schemas. Vocabulary survives at the model-name + model-folder level (e.g. int_canonical__individual, models/intermediate/ir/) but the Snowflake surface is the dbt Labs canonical 3-layer + medallion gold:

  • STAGING (bronze) · source-aligned views
  • INTERMEDIATE (silver) · business logic · was normalize + canonical + ir
  • CORE (silver) · final non-public · was cis + segments
  • MARTS (gold) · public consumption · Cortex / cross-regional / BI

Driver: receiving team navigates 40+ projects on account 338; less cognitive switching cost.

Decision 2 · naming
UPPERCASE + project-prefixed schemas

Schemas in Snowflake renamed to EF_DATA_HUB_CH_STAGING / _INTERMEDIATE / _CORE / _MARTS. Matches the convention in 5 of 6 sampled production projects (reporting_student_tours, core_finance, reverse_etl, etc.). The prefix protects against cross-project schema collisions if a future project lands on the same CH databases.
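
The naming rule in Decisions 1 and 2 can be sketched in a few lines. This is an illustration only, not code from the project: the function names are hypothetical, and the three database names follow the EF_DBT_DEV / EF_DBT_QA / EF_DBT_PROD pattern mentioned elsewhere in this report.

```python
from itertools import product

# Values taken from the decisions above.
PROJECT_PREFIX = "EF_DATA_HUB_CH"
LAYERS = ["STAGING", "INTERMEDIATE", "CORE", "MARTS"]
DATABASES = ["EF_DBT_DEV", "EF_DBT_QA", "EF_DBT_PROD"]


def schema_name(layer: str) -> str:
    """Project-prefixed, UPPERCASE schema name for one vanilla layer."""
    return f"{PROJECT_PREFIX}_{layer}".upper()


def all_schemas() -> list[str]:
    """Fully qualified DATABASE.SCHEMA names for every env and layer."""
    return [f"{db}.{schema_name(layer)}" for db, layer in product(DATABASES, LAYERS)]


if __name__ == "__main__":
    for name in all_schemas():
        print(name)
```

Four layers across three environments give the 12 schemas provisioned in Pass 3a.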

Decision 3 · sources
Top-level sources/ outside models/

Source declarations live at the project root in sources/sources.yml, with dbt_project.yml declaring model-paths: ["models", "sources"]. Adrien flagged this directly on PR #3868 review. Differs from the existing pattern in 20+ monorepo projects (which keep models/sources/), but Adrien is raising the bar to stricter dbt Labs vanilla for new projects.

Session log: docs/sessions/2026-05-13-allan-adrien-dbt-structure-decisions.md
Done this week

What landed in W3

  1. Spec #143 rewritten for the vanilla-audit layout. Pass 1 + 2 marked done with acceptance criteria checked. New Pass 3 split into three sub-passes: 3a SYSADMIN (create 4 new project-prefixed schemas), 3b ACCOUNTADMIN (grants on the new schemas), 3c SYSADMIN (drop the 8 obsolete unprefixed schemas after PR merge). Idempotent DDL inline + companion scripts on disk.
    Spec: specs/setup/143-ch-dbt2-home.md · DDL: scripts/setup/143-ch-dbt-home-pass3.sql + ...-pass3ab-only.sql
  2. Pass 3a + 3b DDL executed live on HH82036. 12 new schemas (EF_DATA_HUB_CH_STAGING / _INTERMEDIATE / _CORE / _MARTS × DEV / QA / PROD) + full grant matrix on each: USAGE/MONITOR/MODIFY + 10 CREATE privileges to the owning RW role, USAGE cross-env to the other two RW roles, USAGE to MCP_READER on all 12. ACCOUNTADMIN granted temporarily to EF_MCP_SERVICE_USER_CH for the grant block and revoked immediately after; least-privilege restored.
    Pass 3c (drop the 8 obsolete staging/normalize/sfmc_engagement/canonical/ir/cis/segments/marts schemas) queued for after PR #3868 merges
  3. Cross-repo PR eftours/de-dbt#3868: restructured end-to-end to the vanilla layout. dbt_project.yml collapsed 8 layer configs → 4 (UPPERCASE schema directives). Sources moved to top-level sources/ outside models/. The 7 quick wins from the audit deck landed: dbt-cloud: project-id: 17231 block, +group: ef_data_hub_ch + models/_groups.yml, +access: public on marts, +meta.required_tests on persistent layers, +transient: false + +materialized: table on core + marts, medallion tags dropped. Custom generate_schema_name macro replicated from the EF monorepo standard pattern. .dbt_cloud.txt with project 17231's job IDs routes the workflow to the dbt-cloud CI path. _tours_utilities wired as a local package; dbt_expectations migrated from calogica/dbt_expectations → metaplane/dbt_expectations to match transitive deps. Phase A placeholder macro call replaced with cast(null as timestamp) across all 56 staging skeletons (the original macro hardcoded a per-source column name that didn't exist on every source table). CI is 14/14 green; PR is mergeable.
    Adrien tagged for code-owner review · 6 commits pushed this week to feat/ef_data_hub_ch-bootstrap
  4. CI diagnose-and-fix marathon: 5 distinct failure modes resolved end-to-end. (1) Stale base vs main caused a "missing script" false alarm; fixed by merging origin/main into the PR branch. (2) KeyError: 'javascript' from dbt-snowflake 1.11.5 when CI fell back to the dbt-core path; root-caused as a routing issue, not the regression itself. (3) Routing fix: shipped .dbt_cloud.txt with the project's job IDs so the workflow uses the dbt-cloud build path. (4) Defer chicken-and-egg: defer-based CI jobs cancel BEFORE running steps if the env has no baseline manifest. Solved by triggering deploy-type runs (compile-only, no DML) to seed the PROD env (job 611029 PATCH-trigger-restore) and the QA env (created temp job 622417, ran it, deleted it). (5) Macro/package gaps: identified tnightengale/dbt_meta_testing as the missing source for required_tests / required_docs; pulled in transitively via _tours_utilities. The final placeholder macro fix cleared the SQL-validity errors on staging models.
    Lesson captured: state:modified defer is sticky — steps_override can't bypass the pre-check, only a deploy-type job seeds the env baseline
  5. Allan's Terraform PR eftours/de-dbt-cloud-terraform#119 applied. Connection 17238 in dbt Cloud account 338 flipped from the leftover EF_DBT_SANDBOX/EF_DBT_SANDBOX_RW to EF_DBT_DEV/EF_DBT_DEV_RW. Same PR reconciled Terraform drift on the QA/PROD env credentials (added EF_DBT_PROD as the explicit database for PROD; aligned QA role + database to EF_DBT_QA_RW / EF_DBT_QA). Future terraform apply won't revert the manual cred PATCHes from W3 Day 1.
  6. Ticket housekeeping. #143 spec rewrite landed via PR #274. Pass 3 scripts added via PR #283 + #284. Session log + decision capture via PR #273. Several mid-week correction PRs (#272, #278, #279, #280, #281) keeping the local source-of-truth in sync with the cross-repo state. Once PR #3868 merges, #143 + #155 close.
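
The shape of the Pass 3a + 3b grant matrix described in item 2 can be sketched by generating the statements programmatically. This is not the script on disk, only a rough illustration under stated assumptions: the role name EF_DBT_PROD_RW is assumed by analogy with EF_DBT_DEV_RW and EF_DBT_QA_RW, the helper names are hypothetical, and the 10 CREATE privileges are left out because the report does not enumerate them.

```python
ENVS = ("DEV", "QA", "PROD")
LAYERS = ("STAGING", "INTERMEDIATE", "CORE", "MARTS")


def rw_role(env: str) -> str:
    # EF_DBT_PROD_RW is an assumption; only the DEV and QA role names appear in the report.
    return f"EF_DBT_{env}_RW"


def pass3ab_statements() -> list[str]:
    """Idempotent create + grant statements for the 12 project-prefixed schemas."""
    stmts = []
    for env in ENVS:
        for layer in LAYERS:
            schema = f"EF_DBT_{env}.EF_DATA_HUB_CH_{layer}"
            # Pass 3a: create the schema only if it is missing.
            stmts.append(f"CREATE SCHEMA IF NOT EXISTS {schema};")
            # Pass 3b: owning RW role (CREATE privileges intentionally omitted here).
            stmts.append(
                f"GRANT USAGE, MONITOR, MODIFY ON SCHEMA {schema} TO ROLE {rw_role(env)};"
            )
            # Cross-env USAGE to the other two RW roles.
            for other in ENVS:
                if other != env:
                    stmts.append(f"GRANT USAGE ON SCHEMA {schema} TO ROLE {rw_role(other)};")
            # Read access for the MCP reader role on every schema.
            stmts.append(f"GRANT USAGE ON SCHEMA {schema} TO ROLE MCP_READER;")
    return stmts


if __name__ == "__main__":
    print("\n".join(pass3ab_statements()))
```

Generating the statements from the env and layer lists keeps the matrix symmetric across DEV / QA / PROD, and re-running GRANT or CREATE SCHEMA IF NOT EXISTS is harmless in Snowflake, which is what makes the DDL idempotent.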
Pipeline + project surface live

What's running every night + what's now built on top

Source tables on CH
49
across 9 schemas in EF_DATA_HUB_RAW · nightly refresh
dbt models resolving
57
staging placeholders × 8 products · CI-validated
New schemas provisioned
12
4 vanilla layers × DEV / QA / PROD (HH82036)

Nightly schedule (UTC)

Time | Step | Owner
00:00 | Root task COPY_ROOT fires; 48 children copy from source DBs into EF_DATA_HUB_RAW on US | Snowflake task graph (autonomous)
~00:30 | All 48 children complete · row counts logged | Snowflake
01:00 | Replication group EF_DATA_HUB_RAW_TO_CH_RG refresh · pulls deltas to CH | Snowflake replication
~01:30+ | Available on HH82036 · ready for dbt build |

Schemas + sizes (CH side)

Schema | Tables | Rows | Size | Source
data_hub_share | 7 | 45.6 M | 8.07 GB | ET / Student Tours share
language_kafka | 9 | 209.6 M | 35.20 GB | EU Language (Juno + Poseidon Kafka)
wojo | 3 | 11.5 M | 0.65 GB | World Journeys share
salesforce_ccap | 7 | 8.6 M | 1.59 GB | CCAP Fivetran landing
salesforce_academy | 8 | 4.6 M | 0.50 GB | Academy Fivetran landing
salesforce_hsey | 6 | 2.9 M | 0.36 GB | HSEY Fivetran landing
ef_us_uploads | 1 | 0.75 M | 0.01 GB | NeverBounce upload (audit pending)
higher_ed_gy | 4 | 0.16 M | 0.02 GB | Higher Ed share · Gap Year
higher_ed_sa | 4 | 0.18 M | 0.02 GB | Higher Ed share · Study Abroad

EVALUATION_QUESTION_RESPONSES_V from W2 has been on CH for a week now — included in the 49 source count above.

New schema surface on HH82036 (post Pass 3a/3b)

Database | Schema | Layer | Future content
EF_DBT_DEV | EF_DATA_HUB_CH_STAGING | bronze | 57 stg_<product>__<entity> models (currently CI-validated placeholders; real SQL from #158 Phase B)
EF_DBT_DEV | EF_DATA_HUB_CH_INTERMEDIATE | silver | Normalize · canonical model (#175a) · Splink IR (#161)
EF_DBT_DEV | EF_DATA_HUB_CH_CORE | silver | ~37 CIs (#166) · segments (#142)
EF_DBT_DEV | EF_DATA_HUB_CH_MARTS | gold | Cortex agent feeds · cross-regional sharing · BI consumers (#224)
Same 4-schema set mirrored in EF_DBT_QA + EF_DBT_PROD (12 total). Pass 3c will drop the 8 obsolete unprefixed schemas after PR merges.
Board state

Where each in-flight ticket landed

Ready to close once PR #3868 merges

Ticket | Title | Why closing
#155 | Bootstrap single dbt project ef_data_hub_ch | Project restructured + 14/14 CI · PR #3868 mergeable
#143 | CH dbt home · Pass 3 final | Pass 3a + 3b done · Pass 3c (drop obsolete schemas) runs post-merge
#158 Phase A | Staging placeholders for 8 products | 57 placeholders compile, materialize as views, validate vs sources · ready to move to Phase B real-SQL implementation

Now unblocked for the real build

Phase 1 · build · next bite
#158 Phase B — port-and-evolve from ANALYTICS

57 staging models currently compile as where 1=0 placeholders. Phase B replaces them with real port-and-evolve SQL from the legacy EF_DATA_HUB.ANALYTICS views in US (which still serve the legacy stack). Plan: pilot on stg_ccap__contact (CCAP audited, smallest controlled scope), then fan out by product.

Gates on: PR #3868 merge (after which dev work moves to the merged branch)
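
For readers unfamiliar with the pattern, here is a rough sketch of what a `where 1=0` placeholder looks like once rendered. It is not the project's macro or model code: the helper name, the `survivorship_ts` output column, and the `contact` table name are assumptions for illustration, and a real placeholder would likely project more columns.

```python
def placeholder_sql(source_name: str, table_name: str,
                    ts_column: str = "survivorship_ts") -> str:
    """Render a dbt staging placeholder that compiles but returns no rows."""
    return (
        "select\n"
        # A typed NULL keeps the column contract stable without depending on
        # a per-source column that may not exist on every table.
        f"    cast(null as timestamp) as {ts_column}\n"
        # Doubled braces emit literal Jinja delimiters for dbt's source() call.
        f"from {{{{ source('{source_name}', '{table_name}') }}}}\n"
        # The always-false predicate lets the model validate against the source with zero rows.
        "where 1=0\n"
    )


if __name__ == "__main__":
    print(placeholder_sql("salesforce_ccap", "contact"))
```

Because the model still references its source, dbt resolves the lineage and CI can validate the project before any real SQL exists; Phase B then swaps the body for the ported logic.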
Phase 3 · build · canonical
#175a — canonical model (6-RT survivorship)

Canonical individual + 3 ContactPoint canonicals using the 6-rule-type CTE pattern (RT1 source-priority through RT6 aggregate), classified across all 150 DMO fields. Writes to EF_DATA_HUB_CH_INTERMEDIATE. The spec already incorporates the survivorship classification completed in W2.

Gates on: Rich's CSV validation (pending 7 days) + #161 Splink IR

Still pending or stakeholder-blocked

  • #3868 PR review — Adrien (code-owner approval needed; CI is 14/14 green)
  • #225 — Confluence reconciliation-rules CSV validation (Rich; 150-row inventory pending since 8 May, gentle nudge planned for W4)
  • #142 — Segments reframed: MC primary / Snowflake Plan B; Vivek call gates the methodology
  • #161 + #162 — Splink IR + parity QA (incorporate Amon's modified 10-rule design from the parallel CH workstream's SF PS session)
  • #224 — Marts sublayer (Cortex / cross-regional / BI consumers); spec write pending after #158 Phase B settles patterns
  • #136-#140 — RLT-* normalize tickets: ready to move to In Review on Board #7 once #158 Phase B resolves the underlying scope
Budget snapshot

$25 k NTE · spent vs remaining

NTE cap
$25,000
SOW Amendment #2 (29 Apr)
Spent to date
$10,600
74.0 hours · 42.4% of NTE
Remaining
$14,400
≈ 103 h at full-time rate ($140 / h)
W3 burn
24 h
$3,520 · structure rewrite + CI marathon

NTE consumption

42.4% consumed end of W3. Trajectory healthy: P0 done + structural P1 foundation complete in 3 active weeks, with ~58% of the cap reserved for P2 (Splink IR) through P6 (handoff). EF-engagement work only — the parallel ohanacloud-CH workstream (IR design with SF PS) is tracked separately and not billed against this NTE.
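
The headline budget figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only numbers stated in this report; the function name and dictionary keys are illustrative.

```python
NTE_CAP_USD = 25_000        # SOW Amendment #2 cap
SPENT_USD = 10_600          # spent to date (74.0 h)
FULL_TIME_RATE_USD = 140    # per hour, used to size the remaining runway


def nte_snapshot(cap: int = NTE_CAP_USD, spent: int = SPENT_USD,
                 rate: int = FULL_TIME_RATE_USD) -> dict:
    """Percent of cap consumed, dollars remaining, and hours left at the full-time rate."""
    remaining = cap - spent
    return {
        "pct_consumed": round(100 * spent / cap, 1),
        "remaining_usd": remaining,
        "remaining_hours": round(remaining / rate),
    }


if __name__ == "__main__":
    print(nte_snapshot())
```

The remaining-hours figure is an upper bound on runway: any day billed at the part-time or ad-hoc rate consumes the cap faster than $140 / h.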

How this is calculated

Three rate tiers per SOW Amendment §4 (depending on day intensity):

Day intensity | Rate | Applied to
Full-time (> 4 h/day) | $140 / h | 8 of 10 active days · 66 h × $140 = $9,240
Part-time (2.1–4 h/day) | $160 / h | 1 day (5/11) · 4 h × $160 = $640
Ad-hoc (≤ 2 h/day) | $180 / h | 2 days (4/29 walkthrough · 5/12) · 4 h × $180 = $720

Total: 74 h · $10,600. Detailed daily breakdown lives in sprint-log.md.
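
The tier rule can be expressed as a small function. This is a sketch, not the billing logic from the SOW: the function names are hypothetical, and days between 2 and 2.1 hours are treated as part-time here, which is an assumption since the table leaves that gap undefined.

```python
def hourly_rate(hours_in_day: float) -> int:
    """Hourly rate in USD for a day, chosen by that day's intensity."""
    if hours_in_day > 4:
        return 140  # full-time
    if hours_in_day > 2:
        return 160  # part-time (assumes anything above 2 h and up to 4 h)
    return 180      # ad-hoc


def day_cost(hours_in_day: float) -> float:
    """Cost of one day: hours multiplied by the tier rate for that day."""
    return hours_in_day * hourly_rate(hours_in_day)


if __name__ == "__main__":
    # One 4 h part-time day and two 2 h ad-hoc days, as in the table above.
    print(day_cost(4), day_cost(2) + day_cost(2))
```

Note that the rate is set per day, not per hour block, so a 4 h day costs more per hour than a 4.5 h day.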

Phase status

Phase | SOW low (h) | Actual h | Actual $ | Status
P0 · M1 Foundation & access (W1) | 16 | 14.0 | $2,040 | Done
P1 · dbt #1 normalize + replication (W2-3) | 25 | 60.0 | $8,560 | 240% h · 214% $
P2 · Splink IR (W3-4) | 37 | 0 | $0 | Upcoming
P3 · Canonical + CIs (W4-5) | 37 | 0 | $0 | Upcoming
P4 · SFMC + Marts (W6-7) | 56 | 0 | $0 | Upcoming
P5 · Cutover (W8) | 14 | 0 | $0 | Upcoming
P6 · Handoff + Hypercare (W9 + Jul) | 18 | 0 | $0 | Upcoming

P1 burn at 60 h vs the 25-h estimate is a clear over-run on the phase tracker, but the phase scope expanded materially: it now bundles #143 + #159 (W2), then the entire #155 bootstrap + vanilla-audit restructure + CI debug + Pass 3 DDL (W3) — all of which are P1 foundation prereqs that the SOW didn't separately budget. Cumulative NTE at 42.4% leaves ~$14.4 k for P2 → P6. No escalation; tracking the over-phase burn as a flag to pace P2 onward more strictly.

What's next

W4 (19 May – 25 May) · real-SQL build opens once PR merges

P1 Phase B — port-and-evolve from ANALYTICS

With structural P1 foundation done (12 schemas + grants + project layout + CI green), the bottleneck moves to real model SQL. Plan: pilot stg_ccap__contact from the legacy EF_DATA_HUB.ANALYTICS.CCAP.CONTACT_V view (CCAP fully audited, smallest controlled scope) to validate the port-and-evolve pattern + macros + parity tests end-to-end. Then fan out by product. Splink-IR (#161) and canonical model (#175a) build on staging once Phase B has 2-3 products materializing real data.

Build deliverables · W4
What lands by Friday 23 May
  • PR eftours/de-dbt#3868 merged · Pass 3c DDL applied (drops the 8 obsolete unprefixed schemas)
  • Spec #158 update · paths + schema configs reflect the new vanilla layout · re-baseline before Phase B starts
  • Phase B pilot · stg_ccap__contact ported with real SQL · parity tests against EF_DATA_HUB.ANALYTICS.CCAP.CONTACT_V
  • Phase B fan-out · 2-3 more products (HSEY + ET / Student Tours likely next) using the validated pattern
  • survivorship_ts macro enhancement · per-table column overrides for wojo + higher_ed (replace the bootstrap stub on placeholders)
  • #225 Confluence reconciliation pass · once Rich validates the CSV inventory
Stakeholder asks · W4
What we need from EF / ET
  • Adrien — code-owner review on PR eftours/de-dbt#3868 (CI 14/14 green; pinged 14 May)
  • Rich — sign off the 150-row reconciliation CSV (docs/sf-analysis/reconciliation-rules-inventory-2026-05-08.csv; pending since 8 May) to unblock #175a survivorship classification
  • Vivek (MC) — schedule the segment-creation methodology call; gates #142 and the reader-role decisions on the gold layer
  • Mike (Snowflake) — sanity check the first 2-3 weeks of nightly replication credits to size the cost line
  • Adrien / Mike — confirm NeverBounce flow status (OLD_WAY_NEVERBOUNCE_UPLOADS is 5+ months stale; need to know whether to wire up a fresh source or treat as deprecated)

Looking past W4 — phase trajectory

Window | Phase | What gets built
W4-5 (19 May - 1 Jun) | P1 + P2 | Phase B staging port-and-evolve for all 8 products · Splink IR pipeline (blocking, fuzzy, survivorship, graph resolution) on Snowpark-optimized warehouse · IR parity QA vs DC's UnifiedIndividual
W5-6 (26 May - 8 Jun) | P3 | Canonical model (7 DMOs / 150 fields with 6-RT survivorship pattern) · ~37 CIs in single layer · marts pre-aggregated for Cortex agent
W6-7 (3-16 Jun) | P4 | Segments (MC primary path or Snowflake fallback) · SFMC engagement ingestion via S3 · Marketing Cloud integration · CloudPage UI
W8 (17-23 Jun) | P5 | Cutover runbook · parallel-run reconciliation report · Prefect orchestration end-to-end · DC schedule disable plan
W9 + Jul | P6 | Handoff to Mike Grabbe + Adrien · operational runbook · KT sessions · 2-cycle hypercare · formal sign-off checklist

Critical path now: PR #3868 merge → #158 Phase B real SQL → #161 Splink IR → #175a canonical → #166 CIs → #142 segments → cutover. Replication pipeline and dbt project surface are both autonomous now — not on the critical path.
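
The critical path and its stakeholder gates can be modelled as a small dependency graph. This is an illustrative sketch only: the node labels are shorthand for the tickets named in this report, and the two stakeholder gates (Rich's CSV validation for #175a, the Vivek call for #142) are taken from the sections above.

```python
from graphlib import TopologicalSorter

# Each key lists the items that must finish before it can start.
GATES = {
    "#158 Phase B": {"PR #3868 merge"},
    "#161 Splink IR": {"#158 Phase B"},
    "#175a canonical": {"#161 Splink IR", "#225 CSV validation"},
    "#166 CIs": {"#175a canonical"},
    "#142 segments": {"#166 CIs", "Vivek methodology call"},
    "cutover": {"#142 segments"},
}


def build_order() -> list[str]:
    """One valid execution order that respects every gate."""
    return list(TopologicalSorter(GATES).static_order())


if __name__ == "__main__":
    for step in build_order():
        print(step)
```

Items with no prerequisites (the PR merge and the two stakeholder asks) can proceed in parallel, which is why the stakeholder gates are worth clearing early even though they sit mid-chain.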

Reference material

Where to dig deeper

Engagement docs
Plan + commercial baseline
  • Migration plan deck + Architecture options + Structure proposal + Vanilla audit deck
  • SOW Amendment #2 (signed 29 Apr 2026) — NTE $25 k, T&M capped, Net 30
  • Sprint log — weekly hours and budget tracker
This week's session logs
Decisions + diagnostics
  • 2026-05-13 · Allan + Adrien · vanilla-audit decisions ratified
  • 2026-05-12 (W3 prep) · vanilla audit deck for Adrien
  • Sprint log entries for W3 days 1-4 with full per-day breakdown
Specs landed / updated
Build-ready specs
  • specs/setup/143-ch-dbt2-home.md — CH dbt home (3 envs · 4 vanilla schemas · UPPERCASE project-prefixed)
  • specs/setup/159-us-ch-replication-group.md — replication group US→CH
  • specs/build/158-dbt1-normalize-staging-models.md — staging models (5-phase plan)
  • specs/build/175a-canonical-survivorship.md — canonical model with 6-RT survivorship pattern
Cross-repo + tooling
Outside this repo
  • eftours/de-dbt#3868 — restructured project · CI 14/14 green · Adrien to review
  • eftours/de-dbt-cloud-terraform#119 — Connection 17238 fix + cred reconciliation (applied 2026-05-13)
  • scripts/setup/143-ch-dbt-home-pass3.sql + ...-pass3ab-only.sql — Pass 3 DDL (3a + 3b applied; 3c queued for post-merge)