Cdx Portal with Insights

Madhuri Ganta's org

Architecture dossier

Cdx Portal with Insights

The root board of this workspace: the central hub for viewing, organizing and navigating all systems, entities and connections.

9 findings · 3 traced paths across this system.

Reliability & Blast Radius — 2026-06-05

3 insights · 2 paths

Reliability & Blast Radius Report

Summary

3 findings: 1 critical, 2 high, 0 medium, 0 low
Polarities: 3 risks, 0 strengths, 0 opportunities, 0 observations
2 paths traced, 1 suggestion

Two structural single points of failure dominate the root board: the unreplicated PostgreSQL instance and the single Lambda handler that fronts the entire API.

Findings

1. PostgreSQL is an unreplicated single point of failure (risk · critical · confidence: inferred)

Element: postgres-db

Nine of ten backend domains plus NextAuth depend on one PostgreSQL/pgvector instance. No replica or standby appears in the graph.

Recommendation: Run multi-AZ with a read replica; route dashboard reads to the replica; define and test RTO/RPO.


2. Entire API runs through one Lambda handler (risk · high · confidence: verified)

Element: lambda-runtime
File: apps/server/serverless.yml

The whole NestJS app is one dist/main.handler behind ANY /{proxy+}. Every domain shares one function's concurrency and cold-start budget.

Recommendation: Set reserved/provisioned concurrency and per-route timeouts; split auth + billing webhooks into dedicated functions.


3. Auth failure means total lockout (risk · high · confidence: inferred)

Element: sso-platform-auth

Every authenticated request depends on sso-platform-auth, which depends on the IdP and the database. A DB or auth outage blocks all new logins.

Recommendation: Add circuit breakers and timeouts on IdP/Cognito calls, plus graceful-degradation messaging.

Paths

PostgreSQL Outage Blast Radius (risk · critical)

web → nestjs-api → postgres-db (Ring 0) branching into auth lockout, dashboard failure, billing/webhook backup, and onboarding loss. Nothing stops the cascade — the blast boundary is the whole platform.

Single Lambda Handler Chokepoint (risk · high)

web → nestjs-api → lambda-runtime. One function, no per-domain isolation.

Suggestions

add node — Add a PostgreSQL read replica: removes the DB as an absolute read SPOF and shrinks the Ring 1 blast radius.

Trace a path on the diagram

PostgreSQL Outage Blast Radius

How a single database failure cascades across the platform's domains.

  1. User request enters the portal
  2. NestJS API handles the request (sync_call)
  3. Ring 0: PostgreSQL unavailable

Single Lambda Handler Chokepoint

All portal traffic funnels through one Lambda function.

  1. All portal + API traffic
  2. NestJS app handles every route (sync_call)
  3. Single dist/main.handler behind ANY /{proxy+}

risk · 3

PostgreSQL is an unreplicated single point of failure

risk · critical · inferred · medium

Nine of ten backend domains plus NextAuth read or write the single PostgreSQL/pgvector instance (billing, deployment, container-images, sso, onboarding, portal, org-members, user-profile, curation, and nextjs-web all have edges into postgres-db). There is no evidence of a read replica or multi-AZ standby in the graph.

Impact: A database outage takes down auth, dashboard, billing, and onboarding simultaneously — a full platform outage with no partial degradation.

Run PostgreSQL with a multi-AZ standby and a read replica; route dashboard reads to the replica and define an RTO/RPO target with a tested restore runbook.

reliability · spof · database

Entire API runs through one Lambda handler

risk · high · verified · large

The whole NestJS app is bundled as a single dist/main.handler behind an httpApi catch-all (ANY /{proxy+}), confirmed in apps/server/serverless.yml. Every domain shares one function's concurrency, memory, and cold-start budget.

Impact: A poison request, memory leak, or concurrency throttle in any one domain degrades every endpoint at once; there is no per-domain isolation.

Set reserved/provisioned concurrency and per-route timeouts; consider splitting hot or risky domains (auth, billing webhooks) into dedicated functions.

reliability · spof · serverless

Auth failure means total lockout

risk · high · inferred · medium

sso-platform-auth sits between nextjs-web and both the IdP and PostgreSQL. Because every authenticated request depends on it, an auth or DB failure blocks all logins — existing sessions may survive but no new access is possible.

Impact: During a DB or auth outage, users cannot sign in at all; the blast radius is the entire user base.

Add circuit breakers and timeouts on the IdP and Cognito calls, and graceful-degradation messaging so the portal fails informatively rather than hanging.

reliability · auth
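The circuit-breaker and timeout recommendation above could look roughly like the sketch below. It is a minimal illustration, not the portal's code: the class name, thresholds, and the injected clock are all assumptions, and the wrapped call stands in for whatever IdP or Cognito client the auth module actually uses.

```typescript
// Minimal circuit breaker with a per-call timeout. All names are illustrative.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly timeoutMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number,
    resetAfterMs: number,
    timeoutMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.timeoutMs = timeoutMs;
    this.now = now;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.openedAt !== null) {
      if (this.now() - this.openedAt < this.resetAfterMs) {
        // Fail fast so the portal can show a degraded-mode message instead of hanging.
        throw new Error("circuit open");
      }
      // Cool-down elapsed: allow one trial call; a single failure reopens the circuit.
      this.openedAt = null;
      this.failures = this.failureThreshold - 1;
    }
    try {
      const result = await this.withTimeout(fn());
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) this.openedAt = this.now();
      throw err;
    }
  }

  // Rejects after timeoutMs; note the underlying call itself is not cancelled.
  private withTimeout<T>(p: Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const timer = setTimeout(() => reject(new Error("timeout")), this.timeoutMs);
      p.then(
        (v) => { clearTimeout(timer); resolve(v); },
        (e) => { clearTimeout(timer); reject(e); },
      );
    });
  }
}
```

A caller would catch the "circuit open" error and render the graceful-degradation message, which is what keeps a slow IdP from turning into hung login requests.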

How Does It Work? — Onboarding — 2026-05-30 (on Onboarding)

6 insights · 1 path

How Onboarding Works

In one sentence: Onboarding is a client-driven, four-step REST flow on OnboardingController (/user/*) that progressively provisions a user's profile, an organisation + first-user membership, a scored qualification record, and finally a subscription + deployment — sending a fire-and-forget notification to the support team at the end.

The four steps

| Step | Endpoint | What OnboardingService does |
| --- | --- | --- |
| 1 | POST /user/save-profile-precis | Sets name + userHandle on the user profile. (GET /user/isUserHandleAvailable backs the live check; a handle is rejected if taken as a user or org handle.) |
| 2 | POST /user/setup-solo-handle or setup-organisation-handle | Creates the organisation (kind solo-user or org) and the first org_user membership, then lists the public profile. |
| 3 | POST /user/save-qualification | Scores the answers (computeQualificationScore), inserts an onboarding_qualification row, mirrors the role onto userProfile.title, and writes a summary onto organisation.businessDescription. (save-business-information is the deprecated predecessor.) |
| 4 | POST /user/setup-deployment-choice | Creates a subscription + deployment, updates subscriptionStatus, and fires the onboarding-complete email. |

Every mutating step first runs assertNotInvitedMember — invited (non-admin) members are rejected with 403.

Decision points

  • Handle kind — solo workspace reuses the user handle as the org handle; the org path takes an explicit org name + handle.
  • Billing mode (BILLING_MODE env) — stripe-trial -> FREE plan, immediately ACTIVE; founder (default) -> FOUNDER plan with a 6-month preview, status WAITLIST unless the email is in ROOT_ADMIN_EMAILS (then ACTIVE).
  • Deployment type: SAAS -> deployment COMPLETED with a platform URL; self-hosted -> PENDING.
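The billing-mode and deployment-type branches above can be summarised as one pure decision function. The sketch below is illustrative only: the function name, type names, and the exact-match email comparison are assumptions, and it omits the handle-kind branch and the 6-month preview window.

```typescript
// Hypothetical types; only the plan/status values come from the decision points above.
type BillingMode = "stripe-trial" | "founder";
type DeploymentType = "SAAS" | "SELF_HOSTED";

interface ProvisioningDecision {
  plan: "FREE" | "FOUNDER";
  subscriptionStatus: "ACTIVE" | "WAITLIST";
  deploymentStatus: "COMPLETED" | "PENDING";
}

function decideProvisioning(
  billingMode: BillingMode | undefined,
  deploymentType: DeploymentType,
  email: string,
  rootAdminEmails: string[],
): ProvisioningDecision {
  // founder is the default when BILLING_MODE is unset.
  const mode: BillingMode = billingMode ?? "founder";
  const deploymentStatus = deploymentType === "SAAS" ? "COMPLETED" : "PENDING";

  if (mode === "stripe-trial") {
    return { plan: "FREE", subscriptionStatus: "ACTIVE", deploymentStatus };
  }
  // Assumption: ROOT_ADMIN_EMAILS is compared by exact string match.
  const isRootAdmin = rootAdminEmails.includes(email);
  return {
    plan: "FOUNDER",
    subscriptionStatus: isRootAdmin ? "ACTIVE" : "WAITLIST",
    deploymentStatus,
  };
}
```

For example, an unset billing mode with a self-hosted deployment and a non-admin email yields the FOUNDER plan, WAITLIST status, and a PENDING deployment.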

What stood out

  1. The org module's service is bypassed for writes (how-002). Step 2 re-implements OrganisationService.createOrganisation inline against the repositories.
  2. Multi-write steps aren't transactional (how-003) — partial failure orphans rows.
  3. organisation.businessDescription doubles as a progress flag (how-006) — a semantic field overloaded with flow-control meaning.

The qualification scoring (how-004) is a clean, pure, testable function, and the completion email (how-005) is correctly non-blocking.

Trace a path on the diagram

Final step: onboarding-complete notification

  1. POST /user/setup-deployment-choice
  2. setupDeploymentChoice provisions sub + deployment, then notifies
  3. EmailService.sendEmail (not awaited)
  4. Resend delivers to support@contextdx.com

risk · 3

Multi-write steps run without a transaction

risk · high · verified · medium

setupSoloHandle does organisationRepository.create then organisationUserRepository.create as two separate awaits; setupDeploymentChoice does subscription.create -> deployment.create -> profile.update as three. A failure on any later write leaves the earlier rows committed and orphaned (org with no user, subscription with no deployment).

Impact: Orphaned organisations/subscriptions on partial failure; the user can land in a half-provisioned state that later steps assume is complete.

Wrap each step's writes in a single transaction via DatabaseService.runEffect() so partial provisioning rolls back atomically.
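To make the all-or-nothing behaviour concrete, here is a small in-memory simulation. It does not use DatabaseService.runEffect(), whose signature is not shown in this report; InMemoryDb, runInTransaction, and setupSoloHandleSketch are hypothetical stand-ins that only demonstrate the intended semantics: staged writes become visible only if every write in the step succeeds.

```typescript
// In-memory stand-in for a database. Real code would go through the
// project's own transaction helper instead of this class.
type Row = { table: string; id: string };

class InMemoryDb {
  committed: Row[] = [];

  // Stages writes and commits them only if the callback resolves.
  async runInTransaction<T>(
    work: (tx: { insert(row: Row): void }) => Promise<T>,
  ): Promise<T> {
    const staged: Row[] = [];
    const result = await work({ insert: (row) => { staged.push(row); } });
    this.committed.push(...staged); // reached only when no write threw
    return result;
  }
}

// Hypothetical version of step 2: organisation + first membership in one unit.
async function setupSoloHandleSketch(
  db: InMemoryDb,
  handle: string,
  failOnMembership: boolean,
): Promise<void> {
  await db.runInTransaction(async (tx) => {
    tx.insert({ table: "organisation", id: handle });
    if (failOnMembership) throw new Error("membership insert failed");
    tx.insert({ table: "org_user", id: `${handle}:owner` });
  });
}
```

When the membership write fails, nothing is committed, so no organisation is left without a user. The same shape would apply to the three writes in setupDeploymentChoice.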

Onboarding bypasses OrganisationService and writes org tables directly

risk · medium · verified · small

setupSoloHandle/setupOrganisationHandle build the org via OrganisationFactory and persist with OrganisationRepository.create + OrganisationUserRepository.create directly, duplicating the exact logic in OrganisationService.createOrganisation. The nested module's service is used only for the findOrganisation read in assertNotInvitedMember.

Impact: Two copies of provisioning logic can drift; the module's public service no longer owns its own writes.

Route provisioning through OrganisationService.createOrganisation so the org/first-user creation logic lives in one place and the module boundary is respected.

Qualification summary is overloaded onto organisation.businessDescription

risk · medium · verified · medium

saveQualification persists structured answers to onboarding_qualification, then also writes a human-readable summary string into organisation.businessDescription. The code comment states this is to keep an existing state-machine check (!org?.description) working — so the field doubles as an onboarding-progress signal.

Impact: A semantic field is coupled to flow-control; clearing or editing the description could silently reset the user's perceived onboarding state.

Track onboarding step completion explicitly (e.g. a status/step column) instead of inferring progress from whether businessDescription is populated.
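One possible shape for explicit step tracking is sketched below. The step names, the completedStep field, and both helpers are assumptions made for illustration; the recommendation above only calls for "a status/step column". The point is that progress is read from a dedicated field, so businessDescription can be edited or cleared without affecting onboarding state.

```typescript
// Hypothetical step names and field, listed in flow order.
const ONBOARDING_STEPS = ["PROFILE", "HANDLE", "QUALIFICATION", "DEPLOYMENT"] as const;
type OnboardingStep = (typeof ONBOARDING_STEPS)[number];

interface OnboardingState {
  completedStep: OnboardingStep | null; // explicit progress marker
  businessDescription: string | null;   // no longer used for flow control
}

// Returns the step the user should perform next, or "DONE".
function nextStep(state: OnboardingState): OnboardingStep | "DONE" {
  if (state.completedStep === null) return ONBOARDING_STEPS[0];
  const idx = ONBOARDING_STEPS.indexOf(state.completedStep);
  return idx + 1 < ONBOARDING_STEPS.length ? ONBOARDING_STEPS[idx + 1] : "DONE";
}

// Records a completed step, rejecting out-of-order calls on the server side.
function completeStep(state: OnboardingState, step: OnboardingStep): OnboardingState {
  const expected = nextStep(state);
  if (expected !== step) {
    throw new Error(`step ${step} is out of order; expected ${expected}`);
  }
  return { ...state, completedStep: step };
}
```

A guard like completeStep would also give the backend its own view of step order instead of relying on the frontend alone.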

strength · 1

Qualification scoring is a deterministic pure function

strength · low · verified

computeQualificationScore takes the answers and returns {score, tier} with no I/O: role (+1..+3), each qualifying system type (+2), team size (+1..+2), qualifying challenge (+2), tooling signal (+1). Tier is HIGH at >=11, MEDIUM at >=7, else LOW. The service persists this alongside the raw answers.

Being side-effect-free makes the scoring trivially unit-testable and safe to re-run; only OnboardingService.saveQualification touches the database around it.
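The weights and thresholds described above can be sketched as follows. The input shape is an assumption (the report gives the point values but not the answer format), and the function is deliberately named as a sketch rather than as the real computeQualificationScore.

```typescript
type Tier = "HIGH" | "MEDIUM" | "LOW";

// Hypothetical input shape: answers are assumed to be pre-mapped to weights.
interface QualificationAnswers {
  roleWeight: 1 | 2 | 3;           // role contributes +1..+3
  qualifyingSystemTypes: number;   // count of qualifying system types, +2 each
  teamSizeWeight: 1 | 2;           // team size contributes +1..+2
  hasQualifyingChallenge: boolean; // +2 when true
  hasToolingSignal: boolean;       // +1 when true
}

// Pure function: no I/O, same input always gives the same output.
function computeQualificationScoreSketch(
  a: QualificationAnswers,
): { score: number; tier: Tier } {
  const score =
    a.roleWeight +
    2 * a.qualifyingSystemTypes +
    a.teamSizeWeight +
    (a.hasQualifyingChallenge ? 2 : 0) +
    (a.hasToolingSignal ? 1 : 0);
  // HIGH at >= 11, MEDIUM at >= 7, else LOW.
  const tier: Tier = score >= 11 ? "HIGH" : score >= 7 ? "MEDIUM" : "LOW";
  return { score, tier };
}
```

With a role weight of 3, two qualifying system types, a team-size weight of 2, a qualifying challenge, and a tooling signal, the score is 3 + 4 + 2 + 2 + 1 = 12, which lands in the HIGH tier.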

observation · 2

Onboarding is a client-driven 4-step state machine

observation · medium · verified

OnboardingService exposes four independent POST steps under /user: save-profile-precis (name + handle), setup-solo/organisation-handle (provision org + first user), save-qualification (score the lead), and setup-deployment-choice (subscription + deployment). Each is a separate stateless call; the client decides ordering — there is no server-side step sequencing.

Step order is enforced only by the frontend. The backend accepts any step in any order, relying on data from earlier steps (e.g. the org must already exist before save-qualification looks it up by orgCode).

Onboarding-complete notification is fire-and-forget

observation · low · verified

After the final step, setupDeploymentChoice calls sendOnboardingNotification (EmailService.sendEmail -> Resend) to support@contextdx.com without awaiting it; the promise is .catch()'d and only logged. Email delivery failure never blocks or fails onboarding completion.

Good degradation behaviour for a non-critical side effect, but it also means a silently dropped notification has no retry or alerting beyond a log line.
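The fire-and-forget pattern described here can be sketched as below. SendEmail is a hypothetical stand-in for EmailService.sendEmail (its real signature is not shown), the loggedErrors array stands in for the logger, and the function names are illustrative.

```typescript
// Hypothetical stand-in for EmailService.sendEmail.
type SendEmail = (to: string, subject: string) => Promise<void>;

// Stand-in for a logger; a real service would log (and ideally alert) here.
const loggedErrors: string[] = [];

function notifyOnboardingComplete(sendEmail: SendEmail, orgCode: string): void {
  // Deliberately not awaited: delivery failure must not fail the onboarding step.
  sendEmail("support@contextdx.com", `Onboarding complete: ${orgCode}`).catch(
    (err: unknown) => {
      const message = err instanceof Error ? err.message : String(err);
      loggedErrors.push(message);
    },
  );
}

async function setupDeploymentChoiceSketch(
  sendEmail: SendEmail,
  orgCode: string,
): Promise<string> {
  // ... subscription + deployment writes would happen here ...
  notifyOnboardingComplete(sendEmail, orgCode);
  return "onboarding-complete";
}
```

The .catch() handler is the only place a dropped notification surfaces, so it is the natural spot to add a retry or a metric if log lines alone prove too quiet.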
