9-week build · P1 foundation now structurally complete
The replication pipeline from W2 keeps running every night. This week the dbt project layer was restructured to the receiving team's vanilla conventions, the CI route was fixed, and the Snowflake-side schemas + grants for the new layout were provisioned live on HH82036. Phase 1 build (real model logic in #158, then #175 canonical) opens Monday as soon as the bootstrap PR merges.
Receiving team unchanged: Mike Grabbe + Adrien LeDoux (ET Data Engineering). Rich transitioning out — weekly standups continue; this report keeps you in the loop.
Tue 13 May · Allan + Adrien sync · 3 structural decisions
Tuesday's audit deck identified 18 layout items vs the dbt Labs Best Practices Guide + the eftours/de-dbt monorepo conventions. 8 items already aligned, 7 quick wins land before model logic, and 3 substantive decisions needed sign-off. All three settled in the same call:
Drop the DC-vocab schemas (canonical / ir / cis / segments) as Snowflake schemas. Vocabulary survives at the model-name + model-folder level (e.g. int_canonical__individual, models/intermediate/ir/) but the Snowflake surface is the dbt Labs canonical 3-layer + medallion gold:
- STAGING (bronze) · source-aligned views
- INTERMEDIATE (silver) · business logic · was normalize + canonical + ir
- CORE (silver) · final non-public · was cis + segments
- MARTS (gold) · public consumption · Cortex / cross-regional / BI
Driver: receiving team navigates 40+ projects on account 338; less cognitive switching cost.
Schemas in Snowflake renamed to EF_DATA_HUB_CH_STAGING / _INTERMEDIATE / _CORE / _MARTS. Matches the convention in 5 of 6 sampled production projects (reporting_student_tours, core_finance, reverse_etl, etc.). The prefix protects against cross-project schema collisions if a future project lands on the same CH databases.
sources/ outside models/. Source declarations live at the project root in sources/sources.yml, with dbt_project.yml declaring model-paths: ["models", "sources"]. Adrien flagged this directly on PR #3868 review. This differs from the existing pattern in 20+ monorepo projects (which keep models/sources/), but Adrien is raising the bar to stricter dbt Labs vanilla for new projects.
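The layer consolidation in the first two decisions can be sketched as a lookup from the retired DC-vocab layer names to the new project-prefixed schemas. This is a minimal illustration, not project code: `LAYER_MAP` and `target_schema` are hypothetical names, and only the layers named in the decision text are included.

```python
# Hypothetical helper illustrating the layer consolidation described above.
PROJECT_PREFIX = "EF_DATA_HUB_CH"

# Old layer name -> new dbt layer, as listed in the decision text.
# "staging" and "marts" are assumed to keep their own layer.
LAYER_MAP = {
    "staging": "STAGING",
    "normalize": "INTERMEDIATE",
    "canonical": "INTERMEDIATE",
    "ir": "INTERMEDIATE",
    "cis": "CORE",
    "segments": "CORE",
    "marts": "MARTS",
}


def target_schema(old_layer: str) -> str:
    """Return the project-prefixed Snowflake schema for a retired layer name."""
    layer = LAYER_MAP[old_layer.lower()]
    return f"{PROJECT_PREFIX}_{layer}"


if __name__ == "__main__":
    for old in LAYER_MAP:
        print(f"{old:>10} -> {target_schema(old)}")
```

The vocabulary itself is not lost: it moves into model and folder names, so the lookup only describes where a model materializes in Snowflake.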
What landed in W3
1. Spec #143 rewritten for the vanilla-audit layout. Pass 1 + 2 marked done with acceptance criteria checked. New Pass 3 split into three sub-passes: 3a SYSADMIN (create 4 new project-prefixed schemas), 3b ACCOUNTADMIN (grants on the new schemas), 3c SYSADMIN (drop the 8 obsolete unprefixed schemas after PR merge). Idempotent DDL inline + companion scripts on disk.
2. Pass 3a + 3b DDL executed live on HH82036. 12 new schemas (EF_DATA_HUB_CH_STAGING / _INTERMEDIATE / _CORE / _MARTS × DEV / QA / PROD) + full grant matrix on each: USAGE / MONITOR / MODIFY + 10 CREATE privileges to the owning RW role, USAGE cross-env to the other two RW roles, USAGE to MCP_READER on all 12. ACCOUNTADMIN granted temporarily to EF_MCP_SERVICE_USER_CH for the grant block and revoked immediately after — least-privilege restored.
3. Cross-repo PR eftours/de-dbt#3868 — restructured end-to-end to the vanilla layout. dbt_project.yml collapsed 8 layer configs → 4 (UPPERCASE schema directives). Sources moved to top-level sources/ outside models/. 7 quick wins from the audit deck landed: dbt-cloud: project-id: 17231 block, +group: ef_data_hub_ch + models/_groups.yml, +access: public on marts, +meta.required_tests on persistent layers, +transient: false and +materialized: table on core + marts, medallion tags dropped. Custom generate_schema_name macro replicated from the EF monorepo standard pattern. .dbt_cloud.txt with project 17231's job IDs routes the workflow to the dbt-cloud CI path. _tours_utilities wired as local package; migrated from calogica/dbt_expectations → metaplane/dbt_expectations to match transitive deps. Phase A placeholder macro call replaced with cast(null as timestamp) across all 56 staging skeletons (the original macro hardcoded a per-source column name that didn't exist on every source table). CI is 14/14 green; PR is mergeable.
4. CI diagnose-and-fix marathon — 5 distinct failure modes resolved end-to-end. (1) Stale base vs main caused a "missing script" false alarm — fixed by merging origin/main into the PR branch. (2) KeyError: 'javascript' from dbt-snowflake 1.11.5 when CI fell to the dbt-core path → root-caused as a routing issue, not the regression itself. (3) Routing fix: shipped .dbt_cloud.txt with the project's job IDs so the workflow uses the dbt-cloud build path. (4) Defer chicken-and-egg: defer-based CI jobs cancel BEFORE running steps if the env has no baseline manifest. Solved by triggering deploy-type runs (compile-only, no DML) to seed PROD env (job 611029 PATCH-trigger-restore) and QA env (created temp job 622417, ran it, deleted it). (5) Macro/package gaps: identified tnightengale/dbt_meta_testing as the missing source for required_tests / required_docs; pulled in transitively via _tours_utilities. The final placeholder macro fix cleared the remaining SQL-validity errors on staging models.
5. Allan's Terraform PR eftours/de-dbt-cloud-terraform#119 applied. Connection 17238 in dbt Cloud account 338 flipped from the leftover EF_DBT_SANDBOX / EF_DBT_SANDBOX_RW to EF_DBT_DEV / EF_DBT_DEV_RW. Same PR reconciled Terraform drift on the QA/PROD env credentials (added EF_DBT_PROD as the explicit database for PROD; aligned QA role + database to EF_DBT_QA_RW / EF_DBT_QA). Future terraform apply won't revert the manual cred PATCHes from W3 Day 1.
6. Ticket housekeeping. #143 spec rewrite landed via PR #274. Pass 3 scripts added via PR #283 + #284. Session log + decision capture via PR #273. Several mid-week correction PRs (#272, #278, #279, #280, #281) kept the local source-of-truth in sync with the cross-repo state. Once PR #3868 merges, #143 + #155 close.
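The grant matrix from item 2 can be approximated by generating the statements rather than hand-writing them. The sketch below is illustrative only. The DEV and QA role names appear in this report, while `EF_DBT_PROD_RW` is assumed by analogy. The 10 CREATE privileges are left out because the report does not enumerate them.

```python
from itertools import product

LAYERS = ["STAGING", "INTERMEDIATE", "CORE", "MARTS"]

# Database -> owning RW role. EF_DBT_PROD_RW is an assumption (named by analogy).
ENV_ROLES = {
    "EF_DBT_DEV": "EF_DBT_DEV_RW",
    "EF_DBT_QA": "EF_DBT_QA_RW",
    "EF_DBT_PROD": "EF_DBT_PROD_RW",
}
READER_ROLE = "MCP_READER"

# Only the non-CREATE owner privileges; the CREATE list is not given in the report.
OWNER_PRIVS = ["USAGE", "MONITOR", "MODIFY"]


def grant_statements() -> list[str]:
    """Build one owner grant, two cross-env grants and one reader grant per schema."""
    stmts = []
    for db, layer in product(ENV_ROLES, LAYERS):
        schema = f"{db}.EF_DATA_HUB_CH_{layer}"
        stmts.append(
            f"GRANT {', '.join(OWNER_PRIVS)} ON SCHEMA {schema} TO ROLE {ENV_ROLES[db]};"
        )
        for other_db, other_role in ENV_ROLES.items():
            if other_db != db:
                stmts.append(f"GRANT USAGE ON SCHEMA {schema} TO ROLE {other_role};")
        stmts.append(f"GRANT USAGE ON SCHEMA {schema} TO ROLE {READER_ROLE};")
    return stmts


if __name__ == "__main__":
    print("\n".join(grant_statements()))
```

Re-running a generated GRANT script is safe in Snowflake, which fits the idempotent-DDL approach the spec describes.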
What's running every night + what's now built on top
EF_DATA_HUB_RAW · nightly refresh
Nightly schedule (UTC)
| Time | Step | Owner |
|---|---|---|
| 00:00 | Root task COPY_ROOT fires; 48 children copy from source DBs into EF_DATA_HUB_RAW on US | Snowflake task graph (autonomous) |
| ~00:30 | All 48 children complete · row counts logged | Snowflake |
| 01:00 | Replication group EF_DATA_HUB_RAW_TO_CH_RG refresh · pulls deltas to CH | Snowflake replication |
| ~01:30+ | Available on HH82036 · ready for dbt build | — |
Schemas + sizes (CH side)
| Schema | Tables | Rows | Size | Source |
|---|---|---|---|---|
| data_hub_share | 7 | 45.6 M | 8.07 GB | ET / Student Tours share |
| language_kafka | 9 | 209.6 M | 35.20 GB | EU Language (Juno + Poseidon Kafka) |
| wojo | 3 | 11.5 M | 0.65 GB | World Journeys share |
| salesforce_ccap | 7 | 8.6 M | 1.59 GB | CCAP Fivetran landing |
| salesforce_academy | 8 | 4.6 M | 0.50 GB | Academy Fivetran landing |
| salesforce_hsey | 6 | 2.9 M | 0.36 GB | HSEY Fivetran landing |
| ef_us_uploads | 1 | 0.75 M | 0.01 GB | NeverBounce upload (audit pending) |
| higher_ed_gy | 4 | 0.16 M | 0.02 GB | Higher Ed share — Gap Year |
| higher_ed_sa | 4 | 0.18 M | 0.02 GB | Higher Ed share — Study Abroad |
EVALUATION_QUESTION_RESPONSES_V from W2 has been on CH for a week now and is included in the 49 source tables counted in the table above.
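As a quick cross-check of the inventory, the per-schema figures can be summed directly. The sketch below copies the numbers from the table (rows in millions, size in GB). The variable names are illustrative.

```python
# (schema, tables, rows in millions, size in GB), copied from the CH-side table.
INVENTORY = [
    ("data_hub_share", 7, 45.6, 8.07),
    ("language_kafka", 9, 209.6, 35.20),
    ("wojo", 3, 11.5, 0.65),
    ("salesforce_ccap", 7, 8.6, 1.59),
    ("salesforce_academy", 8, 4.6, 0.50),
    ("salesforce_hsey", 6, 2.9, 0.36),
    ("ef_us_uploads", 1, 0.75, 0.01),
    ("higher_ed_gy", 4, 0.16, 0.02),
    ("higher_ed_sa", 4, 0.18, 0.02),
]

total_tables = sum(tables for _, tables, _, _ in INVENTORY)
total_rows_m = round(sum(rows for _, _, rows, _ in INVENTORY), 2)
total_size_gb = round(sum(size for _, _, _, size in INVENTORY), 2)

if __name__ == "__main__":
    print(f"{total_tables} tables, {total_rows_m} M rows, {total_size_gb} GB")
```

The table total comes to 49, which matches the source count quoted for the CH side.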
New schema surface on HH82036 (post Pass 3a/3b)
| Database | Schema | Layer | Future content |
|---|---|---|---|
| EF_DBT_DEV | EF_DATA_HUB_CH_STAGING | bronze | 57 stg_<product>__<entity> models (currently CI-validated placeholders; real SQL from #158 Phase B) |
| EF_DBT_DEV | EF_DATA_HUB_CH_INTERMEDIATE | silver | Normalize · canonical model (#175a) · Splink IR (#161) |
| EF_DBT_DEV | EF_DATA_HUB_CH_CORE | silver | ~37 CIs (#166) · segments (#142) |
| EF_DBT_DEV | EF_DATA_HUB_CH_MARTS | gold | Cortex agent feeds · cross-regional sharing · BI consumers (#224) |

Same 4-schema set mirrored in EF_DBT_QA + EF_DBT_PROD (12 total). Pass 3c will drop the 8 obsolete unprefixed schemas after the PR merges.
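The 12-schema surface is the cross product of three databases and four layers. A minimal sketch of idempotent Pass 3a-style DDL is shown below. It is not the contents of the actual companion scripts, and the function name is hypothetical.

```python
DATABASES = ["EF_DBT_DEV", "EF_DBT_QA", "EF_DBT_PROD"]
LAYERS = ["STAGING", "INTERMEDIATE", "CORE", "MARTS"]
PROJECT_PREFIX = "EF_DATA_HUB_CH"


def create_schema_ddl() -> list[str]:
    """Return one CREATE SCHEMA IF NOT EXISTS statement per database/layer pair."""
    return [
        f"CREATE SCHEMA IF NOT EXISTS {db}.{PROJECT_PREFIX}_{layer};"
        for db in DATABASES
        for layer in LAYERS
    ]


if __name__ == "__main__":
    for stmt in create_schema_ddl():
        print(stmt)
```

Because of `IF NOT EXISTS`, the statements can be re-run without error once the schemas exist, which is the property the spec relies on.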
Where each in-flight ticket landed
Ready to close once PR #3868 merges
| Ticket | Title | Why closing |
|---|---|---|
| #155 | Bootstrap single dbt project ef_data_hub_ch | Project restructured + 14/14 CI · PR #3868 mergeable |
| #143 | CH dbt home — Pass 3 final | Pass 3a + 3b done · Pass 3c (drop obsolete schemas) runs post-merge |
| #158 Phase A | Staging placeholders for 8 products | 57 placeholders compile, materialize as views, validate vs sources · ready to move to Phase B real-SQL implementation |
Now unblocked for the real build
#158 Phase B. The 57 staging models currently compile as where 1=0 placeholders. Phase B replaces them with real port-and-evolve SQL from the legacy EF_DATA_HUB.ANALYTICS views in US (which still serve the legacy stack). Plan: pilot on stg_ccap__contact (CCAP is fully audited and the smallest controlled scope), then fan out by product.
#175a canonical model. Canonical individual + 3 ContactPoint canonicals, built with the 6-rule-type CTE pattern (RT1 source-priority through RT6 aggregate), with a rule type classified for each of the 150 DMO fields. Writes to EF_DATA_HUB_CH_INTERMEDIATE. The spec already includes the survivorship classification done in W2.
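Of the six rule types, only RT1 (source-priority) and RT6 (aggregate) are named here. As a rough illustration of what a source-priority rule does, the sketch below picks the first non-null value in a fixed source order. The function name, the source order and the sample values are all hypothetical. The real pattern is implemented as SQL CTEs per the spec.

```python
from typing import Optional


def survive_by_source_priority(
    candidates: dict[str, Optional[str]],
    priority: list[str],
) -> Optional[str]:
    """Return the value from the highest-priority source that has a non-null value."""
    for source in priority:
        value = candidates.get(source)
        if value is not None:
            return value
    return None


if __name__ == "__main__":
    # Hypothetical priority order and field values, for illustration only.
    priority = ["ccap", "hsey", "academy"]
    email_by_source = {"ccap": None, "hsey": "a@example.com", "academy": "b@example.com"}
    print(survive_by_source_priority(email_by_source, priority))
```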
Still pending or stakeholder-blocked
- #3868 PR review — Adrien (code-owner approval needed; CI is 14/14 green)
- #225 — Confluence reconciliation-rules CSV validation (Rich; 150-row inventory pending since 8 May, gentle nudge planned for W4)
- #142 — Segments reframed: MC primary / Snowflake Plan B; Vivek call gates the methodology
- #161 + #162 — Splink IR + parity QA (incorporate Amon's modified 10-rule design from the parallel CH workstream's SF PS session)
- #224 — Marts sublayer (Cortex / cross-regional / BI consumers); spec write pending after #158 Phase B settles patterns
- #136-#140 — RLT-* normalize tickets: ready to move to In Review on Board #7 once #158 Phase B resolves the underlying scope
$25 k NTE · spent vs remaining
NTE consumption
42.4% consumed end of W3. Trajectory healthy: P0 done + structural P1 foundation complete in 3 active weeks, with ~58% of the cap reserved for P2 (Splink IR) through P6 (handoff). EF-engagement work only — the parallel ohanacloud-CH workstream (IR design with SF PS) is tracked separately and not billed against this NTE.
How this is calculated
Three rate tiers per SOW Amendment §4 (depending on day intensity):
| Day intensity | Rate | Applied to |
|---|---|---|
| Full-time (> 4 h/day) | $140 / h | 8 of 10 active days · 66 h × $140 = $9,240 |
| Part-time (2.1–4 h/day) | $160 / h | 1 day (5/11) · 4 h × $160 = $640 |
| Ad-hoc (≤ 2 h/day) | $180 / h | 2 days (4/29 walkthrough · 5/12) · 4 h × $180 = $720 |
Total: 74 h · $10,600. Detailed daily breakdown lives in sprint-log.md.
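The tiering and the consumption figure can be reproduced with a few lines of arithmetic. The sketch below encodes the three day-intensity thresholds from the table and derives the consumed share and remaining budget from the reported $10,600 total. The function names and the three-day sample are illustrative, not taken from the sprint log.

```python
NTE_CAP = 25_000        # USD cap from the SOW amendment
SPENT_TO_DATE = 10_600  # USD total reported at end of W3


def rate_for(hours_in_day: float) -> int:
    """Hourly rate for a day, chosen by that day's intensity tier."""
    if hours_in_day > 4:
        return 140  # full-time
    if hours_in_day > 2:
        return 160  # part-time
    return 180      # ad-hoc


def cost_of_days(daily_hours: list[float]) -> float:
    """Total cost for a list of per-day hour counts."""
    return sum(hours * rate_for(hours) for hours in daily_hours)


consumed_pct = round(SPENT_TO_DATE / NTE_CAP * 100, 1)
remaining = NTE_CAP - SPENT_TO_DATE

if __name__ == "__main__":
    # Hypothetical sample: one full-time, one part-time and one ad-hoc day.
    print(cost_of_days([8, 4, 2]))
    print(consumed_pct, remaining)
```

Because the rate depends on the day's intensity, the same number of hours costs more when spread over short days than when concentrated in full days.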
Phase status
| Phase | SOW low (h) | Actual h | Actual $ | Status |
|---|---|---|---|---|
| P0 · M1 Foundation & access (W1) | 16 | 14.0 | $2,040 | Done |
| P1 · dbt #1 normalize + replication (W2-3) | 25 | 60.0 | $8,560 | 240% h · 214% $ |
| P2 · Splink IR (W3-4) | 37 | 0 | $0 | Upcoming |
| P3 · Canonical + CIs (W4-5) | 37 | 0 | $0 | Upcoming |
| P4 · SFMC + Marts (W6-7) | 56 | 0 | $0 | Upcoming |
| P5 · Cutover (W8) | 14 | 0 | $0 | Upcoming |
| P6 · Handoff + Hypercare (W9 + Jul) | 18 | 0 | $0 | Upcoming |
P1 burn at 60 h vs the 25-h estimate is a clear over-run on the phase tracker, but the phase scope expanded materially: it now bundles #143 + #159 (W2), then the entire #155 bootstrap + vanilla-audit restructure + CI debug + Pass 3 DDL (W3) — all of which are P1 foundation prereqs that the SOW didn't separately budget. Cumulative NTE at 42.4% leaves ~$14.4 k for P2 → P6. No escalation; tracking the over-phase burn as a flag to pace P2 onward more strictly.
W4 (19 May – 25 May) · real-SQL build opens once PR merges
P1 Phase B — port-and-evolve from ANALYTICS
With structural P1 foundation done (12 schemas + grants + project layout + CI green), the bottleneck moves to real model SQL. Plan: pilot stg_ccap__contact from the legacy EF_DATA_HUB.ANALYTICS.CCAP.CONTACT_V view (CCAP fully audited, smallest controlled scope) to validate the port-and-evolve pattern + macros + parity tests end-to-end. Then fan out by product. Splink-IR (#161) and canonical model (#175a) build on staging once Phase B has 2-3 products materializing real data.
- PR eftours/de-dbt#3868 merged · Pass 3c DDL applied (drops the 8 obsolete unprefixed schemas)
- Spec #158 update · paths + schema configs reflect the new vanilla layout · re-baseline before Phase B starts
- Phase B pilot · stg_ccap__contact ported with real SQL · parity tests against EF_DATA_HUB.ANALYTICS.CCAP.CONTACT_V
- Phase B fan-out · 2-3 more products (HSEY + ET / Student Tours likely next) using the validated pattern
- survivorship_ts macro enhancement · per-table column overrides for wojo + higher_ed (replace the bootstrap stub on placeholders)
- #225 Confluence reconciliation pass · once Rich validates the CSV inventory
- Adrien — code-owner review on PR eftours/de-dbt#3868 (CI 14/14 green; pinged 14 May)
- Rich — sign off the 150-row reconciliation CSV (docs/sf-analysis/reconciliation-rules-inventory-2026-05-08.csv; pending since 8 May) to unblock #175a survivorship classification
- Vivek (MC) — schedule the segment-creation methodology call; gates #142 and the reader-role decisions on the gold layer
- Mike (Snowflake) — sanity check the first 2-3 weeks of nightly replication credits to size the cost line
- Adrien / Mike — confirm NeverBounce flow status (OLD_WAY_NEVERBOUNCE_UPLOADS is 5+ months stale; need to know whether to wire up a fresh source or treat as deprecated)
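The parity tests planned for the Phase B pilot are not specified in this report. One plausible shape is a keyed comparison between rows from the legacy view and rows from the ported model. The sketch below works on in-memory rows and assumes a single primary-key column. `parity_report` and the sample data are hypothetical.

```python
def parity_report(legacy_rows: list[dict], new_rows: list[dict], key: str) -> dict:
    """Compare two row sets by primary key and list the differences."""
    legacy = {row[key]: row for row in legacy_rows}
    new = {row[key]: row for row in new_rows}
    shared = legacy.keys() & new.keys()
    return {
        "missing_in_new": sorted(legacy.keys() - new.keys()),
        "extra_in_new": sorted(new.keys() - legacy.keys()),
        "mismatched": sorted(k for k in shared if legacy[k] != new[k]),
    }


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    legacy = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]
    ported = [{"id": 2, "email": "B@example.com"}, {"id": 3, "email": "c@example.com"}]
    print(parity_report(legacy, ported, key="id"))
```

In practice the same comparison would more likely run inside Snowflake or as dbt tests. The in-memory version only shows the three outcomes worth reporting.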
Looking past W4 — phase trajectory
| Window | Phase | What gets built |
|---|---|---|
| W4-5 (19 May - 1 Jun) | P1 + P2 | Phase B staging port-and-evolve for all 8 products · Splink IR pipeline (blocking, fuzzy, survivorship, graph resolution) on Snowpark-optimized warehouse · IR parity QA vs DC's UnifiedIndividual |
| W5-6 (26 May - 8 Jun) | P3 | Canonical model (7 DMOs / 150 fields with 6-RT survivorship pattern) · ~37 CIs in single layer · marts pre-aggregated for Cortex agent |
| W6-7 (3-16 Jun) | P4 | Segments (MC primary path or Snowflake fallback) · SFMC engagement ingestion via S3 · Marketing Cloud integration · CloudPage UI |
| W8 (17-23 Jun) | P5 | Cutover runbook · parallel-run reconciliation report · Prefect orchestration end-to-end · DC schedule disable plan |
| W9 + Jul | P6 | Handoff to Mike Grabbe + Adrien · operational runbook · KT sessions · 2-cycle hypercare · formal sign-off checklist |
Critical path now: PR #3868 merge → #158 Phase B real SQL → #161 Splink IR → #175a canonical → #166 CIs → #142 segments → cutover. Replication pipeline and dbt project surface are both autonomous now — not on the critical path.
Where to dig deeper
- Migration plan deck + Architecture options + Structure proposal + Vanilla audit deck
- SOW Amendment #2 (signed 29 Apr 2026) — NTE $25 k, T&M capped, Net 30
- Sprint log — weekly hours and budget tracker
- 2026-05-13 · Allan + Adrien · vanilla-audit decisions ratified
- 2026-05-12 (W3 prep) · vanilla audit deck for Adrien
- Sprint log entries for W3 days 1-4 with full per-day breakdown
- specs/setup/143-ch-dbt2-home.md — CH dbt home (3 envs · 4 vanilla schemas · UPPERCASE project-prefixed)
- specs/setup/159-us-ch-replication-group.md — replication group US→CH
- specs/build/158-dbt1-normalize-staging-models.md — staging models (5-phase plan)
- specs/build/175a-canonical-survivorship.md — canonical model with 6-RT survivorship pattern
- eftours/de-dbt#3868 — restructured project · CI 14/14 green · Adrien to review
- eftours/de-dbt-cloud-terraform#119 — Connection 17238 fix + cred reconciliation (applied 2026-05-13)
- scripts/setup/143-ch-dbt-home-pass3.sql + ...-pass3ab-only.sql — Pass 3 DDL (3a + 3b applied; 3c queued for post-merge)