Discovery & Probes — Arcitopsia™ | Automated Enterprise Knowledge Graph

Section 1 · Why this matters

Time-to-Knowledge-Graph is the EA bottleneck. CXO Architect

Enterprises spend 6–18 months manually surveying tools, transcribing architectures into Visio, and hand-mapping team ownership before they can answer a basic question like "which services would fail if we deprecate this database?" Probes collapse that into days.

~45 min

Full self-discovery

Arcitopsia scanning its own GitHub + AWS + RDS + ECS stack end-to-end (§24 dogfood)

≥ 180

Records / DB scan

PostgreSQL DataProbe against a real production schema — schemas, tables, views, columns, FKs, routines

≥ 80

Provenance citations

Auto-generated Solution Architecture doc cites only Records discovered by probes — zero hallucination

≤ 5,500

Credits / full sweep

Full Arcitopsia stack end-to-end + EA artefact synthesis + doc generation, total cost ceiling

The substrate is already there. Workflow execution, ConnectorInstance, Record / SpecType / LinkType, ChangeSet semantics via WorkflowJob.changeSetId, AIPromptTemplate, and ToolConfiguration scope-resolver all exist in the platform. Probes add a first-class concept of a scheduled or on-demand scanner, multi-target tool configuration, probe → org-hierarchy routing, and a shared pattern library that grounds AI generation in tenant context.

Section 2 · v1 mandatory probes

Four concrete probes ship in v1. CXO Sales

The end-of-v1 demo is the Probe Analyzer running these four probes against the live Arcitopsia production stack and producing an Enterprise Knowledge Graph rich enough to generate a complete, traceable Solution Architecture Document about Arcitopsia itself — the dogfood acceptance contract (§24).

GitHub Source-Code Probe

Octokit + tree-sitter · 6 sub-stages · LIBRARY-mostly

Scans github.com/arcitopsia/arcitopsia-application across the default branch. Discovers the Next.js app + BullMQ workers, ≥50 library dependencies, CI workflows, and CODEOWNERS-derived ownership.

Emits: code-repo, service, library, api-spec, pipeline
Read-only PAT — repo:read, read:org, metadata:read
~$0.50 LLM cost per full repo scan

AWS Infrastructure Probe

AWS SDK v3 · 4 sub-stages · LIBRARY-only · 0 LLM tokens

Scans account 848111426925 in us-east-1. Discovers VPCs, subnets, security groups, IAM roles, IAM users, ECR repositories and images.

Emits: cloud-account, vpc, subnet, security-group, iam-role, iam-user, container-image-registry
STS role-assumption to arcitopsia-probe-discovery-role (ReadOnlyAccess)
~$0.00 LLM cost — pure SDK calls

AWS ECS / Fargate Probe

AWS SDK v3 · 5 sub-stages · LIBRARY-only · 0 LLM tokens

Discovers cluster arcitopsia-cluster + service arcitopsia-service + task definitions. Captures container env-var names only (never values — secrets stay opaque). CloudWatch log-group ARNs captured; log contents never read in v1.

Emits: cluster, service-deployment, container-workload, deployment-environment
Same arcitopsia-probe-discovery-role as Infrastructure probe
Links back to the cloud-account Record from Stage 1

PostgreSQL DataProbe Sensitive

Direct SQL · 4 sub-stages · LIBRARY-only · METADATA-only

Scans arcitopsia-prod.ci9wausu8oeh.us-east-1.rds.amazonaws.com. Reads information_schema and pg_catalog only. Never executes SELECT * FROM tenant_data_table. Three-defence design: SQL guard (synchronous reject) + session-level read-only + DB user with metadata-only grants.

Emits: schema, table, view, stored-procedure, column-metadata, FK Links
DB user: arcitopsia_probe_meta — zero write privileges anywhere
Records carry column names + types + isPii heuristic flag — never sample values

Data sensitivity is enforced in three layers. probe-output-connector.ts synchronously rejects any SQL that targets a non-system table. The PostgreSQL session is opened with SET default_transaction_read_only = on. And the database user's grants only include SELECT on information_schema.* + pg_catalog.* — even if the application layer were bypassed, the database itself would refuse a write.
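The first of the three layers, the synchronous SQL guard, could look roughly like the sketch below. It is an illustration only: the function name `isMetadataOnlyQuery` and the regex-based approach are assumptions, not the contents of probe-output-connector.ts, and a production guard would more likely use a real SQL parser. The sketch deliberately rejects anything it cannot classify.

```typescript
// Hypothetical guard: accepts only a single SELECT statement whose every
// FROM/JOIN target is qualified with an allowed system schema.
const ALLOWED_SCHEMAS = ["information_schema", "pg_catalog"];

export function isMetadataOnlyQuery(sql: string): boolean {
  const text = sql.trim().replace(/;\s*$/, "");

  // Reject multi-statement input, comments and quoted identifiers outright:
  // the regexes below cannot reason about them safely.
  if (text.includes(";") || text.includes("--") || text.includes("/*")) return false;
  if (text.includes('"')) return false;
  if (!/^select\b/i.test(text)) return false;

  // Reject comma-style joins such as "FROM a, b".
  if (/\bfrom\s+[\w.]+(?:\s+(?:as\s+)?\w+)?\s*,/i.test(text)) return false;

  const tableRef = /\b(?:from|join)\s+([a-z_][\w.]*)/gi;
  let seen = 0;
  let match: RegExpExecArray | null;
  while ((match = tableRef.exec(text)) !== null) {
    seen += 1;
    const parts = (match[1] ?? "").toLowerCase().split(".");
    if (parts.length !== 2 || !ALLOWED_SCHEMAS.includes(parts[0] ?? "")) return false;
  }
  return seen > 0; // a query with no table reference is also rejected
}
```

A guard like this is only the outermost layer. The read-only session and the metadata-only grants still apply if it is bypassed or fooled.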

Section 3 · Probe taxonomy

Seven first-class probe categories. Architect LLM

Every probe carries a category that determines the SpecTypes it can emit, the LinkTypes it can create, and the default ownership-routing rules. v1 ships the four concrete probes above; v1.5+ adds the remaining categories one probe per PR.

Category	Reads from	Emits SpecTypes	Status
SOURCE_CODE	GitHub, GitLab, Bitbucket	`application`, `service`, `library`, `api-spec`, `code-repo`, `pipeline`	v1 — GitHub
INFRASTRUCTURE	AWS, GCP, Azure, Terraform Cloud, Kubernetes	`cloud-account`, `vpc`, `compute-instance`, `terraform-module`, `deployment-environment`, `cluster`	v1 — AWS Infra + ECS
DATA	RDS, Snowflake, BigQuery, Databricks	`database`, `schema`, `table`, `view`, `dataset`, `etl-pipeline`, `data-lake`	v1 — PostgreSQL
IDENTITY	Okta, Azure AD, Google Workspace, GitHub teams	`user`, `group`, `role`, `permission-policy`	v1.5
CI_CD	GitHub Actions, Jenkins, ArgoCD	`pipeline`, `deployment`, `release`	v1.5
OBSERVABILITY	Datadog, Grafana, Prometheus, New Relic	`monitor`, `dashboard`, `slo`, `alert-policy`	v1.5
DOCUMENT / EA_CONTENT	Confluence, Notion, SharePoint, Google Drive	`architecture-doc`, `runbook`, `decision-record`, `customer-ea-document-candidate`	v1.5 (skeleton in v1.1)

Section 4 · Probe Analyzer

A seven-phase wizard turns "we have lots of tools" into a deterministic execution plan. Architect

Probes work best when the platform already knows which systems are authoritative for which facts. The Probe Analyzer captures that upfront, runs a shallow discovery sweep to validate scope, and produces a stage-ordered DAG of which probes to run, with which targets, in what sequence, and where parallelism is safe.

flowchart LR
    A["A · Inventory
What systems
do you have?"]:::phase
    Ap["A' · Connect
Map systems to
ConnectorInstance"]:::phase
    B["B · Source-of-Truth
Pick primary source
per FactCategory"]:::phase
    Bp["B' · EA Content
Pre-Probe + Package
Gap Analysis"]:::phase
    C["C · Discovery Sweep
Shallow probes
~30-120s"]:::phase
    D["D · Plan Generation
Pure function
topological sort"]:::phase
    E["E · Approve & Run
Credit reservation
+ ProbeExecutionRun"]:::phase

    A --> Ap --> B --> Bp --> C --> D --> E

    classDef phase fill:#161616,stroke:#a3e635,stroke-width:1.5px,color:#e5e5e5,font-family:JetBrains Mono;

The 7-phase wizard (Phase B' added in v1.1 for EA content pre-probe).

Inputs to plan generation

OrganizationProbeProfile — singleton per tenant
SystemRegistration[] — declared systems by kind
FactCategoryMapping[] — primary source per FactCategory
DiscoverySnapshot[] — shallow-sweep findings per system

Outputs

ProbeExecutionPlan — versioned, supersedeable
ProbeExecutionPlanStage[] — barrier-ordered
ProbeExecutionPlanItem[] — probes within stage, with overrides
ProbeExecutionRun — on Approve & Run click

Canonical probe DAG

flowchart TB
    subgraph S0["Stage 0 · Foundation"]
        HR[HRProbe]:::s0
        ID[IdentityProbe]:::s0
        CO[CloudOrgProbe]:::s0
        VC[VCSOrgProbe]:::s0
    end
    subgraph S1["Stage 1 · System Scanning"]
        SC[SourceCodeProbe]:::s1
        IN[InfrastructureProbe]:::s1
        DP[DataPlatformOrgProbe]:::s1
        CI[CI_CDProbe]:::s1
    end
    subgraph S2["Stage 2 · Detail Scanning"]
        DA[DataProbe]:::s2
        OB[ObservabilityProbe]:::s2
        DO[DocumentProbe]:::s2
    end
    subgraph S3["Stage 3 · Analytical / Derived"]
        PM[PatternMatchProbe]:::s3
        GA[GapAnalysisProbe]:::s3
        HS[HierarchySynthesisProbe]:::s3
    end
    subgraph S4["Stage 4 · Generative"]
        BG[BacklogGenerationWorkflow]:::s4
        DG[DocDraftGenerationWorkflow]:::s4
    end

    HR --> SC
    ID --> SC
    CO --> IN
    VC --> SC
    SC --> DA
    IN --> DA
    DP --> DA
    SC --> OB
    IN --> OB
    SC --> DO
    DA --> PM
    DA --> GA
    DA --> HS
    PM --> DG
    GA --> BG
    HS --> DG

    classDef s0 fill:#0d1a0d,stroke:#a3e635,color:#fff;
    classDef s1 fill:#1e2a3a,stroke:#38bdf8,color:#fff;
    classDef s2 fill:#3b2e1a,stroke:#fbbf24,color:#fff;
    classDef s3 fill:#3b0764,stroke:#c084fc,color:#fff;
    classDef s4 fill:#3f1d1d,stroke:#fb7185,color:#fff;

Hard barriers between stages — Stage 1 only runs after every Stage 0 probe reaches terminal status.
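As a rough illustration of plan generation as a pure function, the sketch below derives barrier-ordered stages from probe dependencies alone: each probe lands one stage after its latest dependency. The function name `planStages` and the input shape are assumptions. The canonical DAG also places some probes by category (DataPlatformOrgProbe sits in Stage 1 with no inbound edge), so the real planner presumably uses more than edges.

```typescript
// Hypothetical input shape: probe name -> names of probes it depends on.
type ProbeDependencies = { [probe: string]: string[] };

export function planStages(deps: ProbeDependencies): string[][] {
  const stageOf = new Map<string, number>();
  const inProgress = new Set<string>();

  const resolve = (probe: string): number => {
    const known = stageOf.get(probe);
    if (known !== undefined) return known;
    const upstream = deps[probe];
    if (upstream === undefined) throw new Error(`Unknown probe: ${probe}`);
    if (inProgress.has(probe)) throw new Error(`Dependency cycle at ${probe}`);
    inProgress.add(probe);
    // A probe runs one stage after its latest-running dependency.
    let stage = 0;
    for (const dep of upstream) stage = Math.max(stage, resolve(dep) + 1);
    inProgress.delete(probe);
    stageOf.set(probe, stage);
    return stage;
  };

  const stages: string[][] = [];
  // Sorting the keys keeps the output deterministic for identical input.
  for (const probe of Object.keys(deps).sort()) {
    const stage = resolve(probe);
    while (stages.length <= stage) stages.push([]);
    (stages[stage] as string[]).push(probe);
  }
  return stages;
}

export const examplePlan = planStages({
  HRProbe: [],
  CloudOrgProbe: [],
  SourceCodeProbe: ["HRProbe"],
  InfrastructureProbe: ["CloudOrgProbe"],
  DataProbe: ["SourceCodeProbe", "InfrastructureProbe"],
});
```

Probes within one inner array have no dependencies on each other, so they are the ones that can safely run in parallel.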

Section 5 · Intra-probe sub-stages

Each probe is itself a layered scan, not one LLM blast. Architect LLM

A single LLM context cannot ingest an entire 1,500-routine schema. A single workflow cannot reliably emit 500 Records in one transaction. So every probe carries an ordered list of ProbeStageDefinition rows; stages execute sequentially per target with cross-target fan-out in parallel.

Worked example — PostgreSQL DataProbe

#	Goal	Extraction strategy	Output SpecTypes	LLM tokens
1	Schema Inventory — schemas, tables, views	LIBRARY — `information_schema.tables`	`schema`, `table`, `view`	0
2	Structural Detail — columns, PKs, FKs, indexes	LIBRARY — `information_schema.columns`, `pg_indexes`	`column-metadata`, `references` Links	0
3	Routines & Triggers — stored procedures, functions	HYBRID — `pg_proc` for structure; LLM for top-50 summaries via §19 fan-out	`stored-procedure`, `function`	~18K
4	Routine Dependencies — read/write graph	LIBRARY — `pg_depend` + `node-sql-parser`	`reads-from`, `writes-to`, `calls` Links	0
5	Lineage Stitching — table → SP → table chains	COMPUTED — no DB hit, pure graph traversal	`derives-from` Links	0

The first run of a 200-table schema burns ~$0.20 in LLM cost, whereas a naive "describe-everything-at-once" approach would exceed $50 just to summarise the routines. The architectural lever is decomposition (§19) + library-first (§17).

Per-stage failure modes

failureMode	If this stage fails…
`BLOCK_DOWNSTREAM`	Stop the probe run for this target. Mark all later stages `BLOCKED`. Default for Stage 1 — without inventory, nothing else makes sense.
`DEGRADE_DOWNSTREAM`	Mark later stages `degradedDownstream=true`. They still run, but AI calls auto-downgrade to `PENDING_DISCOVERY`. Default for Stage 3.
`CONTINUE`	Later stages unaffected. Default for Stage 5 — lineage stitching is best-effort.
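The three failure modes can be sketched as a small sequential runner. This is a minimal illustration under assumptions: `runStages`, `StageDef` and the `execute` callback are hypothetical names, and the real engine persists state rather than returning an array.

```typescript
type FailureMode = "BLOCK_DOWNSTREAM" | "DEGRADE_DOWNSTREAM" | "CONTINUE";
type StageStatus = "SUCCEEDED" | "FAILED" | "BLOCKED";

interface StageDef { name: string; failureMode: FailureMode }
interface StageOutcome { name: string; status: StageStatus; degradedDownstream: boolean }

// `execute` stands in for real stage execution and returns true on success.
export function runStages(
  stages: StageDef[],
  execute: (stage: StageDef, degraded: boolean) => boolean,
): StageOutcome[] {
  let blocked = false;
  let degraded = false;
  const outcomes: StageOutcome[] = [];
  for (const stage of stages) {
    if (blocked) {
      // An earlier BLOCK_DOWNSTREAM failure stops everything after it.
      outcomes.push({ name: stage.name, status: "BLOCKED", degradedDownstream: degraded });
      continue;
    }
    const ok = execute(stage, degraded);
    outcomes.push({ name: stage.name, status: ok ? "SUCCEEDED" : "FAILED", degradedDownstream: degraded });
    if (!ok && stage.failureMode === "BLOCK_DOWNSTREAM") blocked = true;
    if (!ok && stage.failureMode === "DEGRADE_DOWNSTREAM") degraded = true;
    // CONTINUE: a failure leaves later stages untouched.
  }
  return outcomes;
}
```

The `degraded` flag passed to `execute` is where a stage would downgrade its AI output to `PENDING_DISCOVERY`.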

Section 6 · Library-First Doctrine

LLM tokens are economics, not magic. Most extraction is library work. Architect CXO

Every ProbeStageDefinition declares extractionStrategy ∈ {LIBRARY, LLM, HYBRID, COMPUTED}. LLM strategy without a justification (whyLlm) fails package validation. Plan-review surfaces the library coverage ratio; anything below 70% earns a yellow warning.
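A package-validation check along these lines might look like the sketch below. The function name `reviewStages` is hypothetical, and the coverage formula (share of stages that are LIBRARY or COMPUTED) is an assumption, since the ratio's exact definition is not given here.

```typescript
type ExtractionStrategy = "LIBRARY" | "LLM" | "HYBRID" | "COMPUTED";

interface StageDeclaration {
  name: string;
  extractionStrategy: ExtractionStrategy;
  whyLlm?: string;
}

export function reviewStages(stages: StageDeclaration[]) {
  // An LLM stage with a missing or blank justification is a validation error.
  const errors = stages
    .filter((s) => s.extractionStrategy === "LLM" && !s.whyLlm?.trim())
    .map((s) => `${s.name}: LLM strategy requires whyLlm`);

  // Assumption: a stage counts towards library coverage when it spends no LLM tokens.
  const tokenFree = stages.filter(
    (s) => s.extractionStrategy === "LIBRARY" || s.extractionStrategy === "COMPUTED",
  ).length;
  const libraryCoverage = stages.length === 0 ? 1 : tokenFree / stages.length;

  return { errors, libraryCoverage, warning: libraryCoverage < 0.7 };
}
```

Under this assumed formula, the five-stage PostgreSQL DataProbe (three LIBRARY, one HYBRID, one COMPUTED) scores 0.8 and stays clear of the yellow warning.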

Tooling catalogue (v1 bundled)

Probe	Library	Why preferred over LLM	Tokens avoided / use
GitHub source-code	`@octokit/rest`, `@octokit/graphql`	Native API; rate-limited but deterministic	~3K × repo
GitHub — TS/JS service discovery	`tree-sitter`, `tree-sitter-typescript`	Universal AST; 100+ grammars; incremental	~10K × class
GitHub — manifests	`yaml`, native `JSON.parse`	Deterministic structural parsing	~2K × file
AWS Infra/ECS	`@aws-sdk/client-*` v3	First-party; consistent error surface	~5K × resource
PostgreSQL — schema	Native SQL → `information_schema`	Zero LLM cost; type-safe metadata	~3K × table
PostgreSQL — SQL parsing	`node-sql-parser`	Deterministic SP parsing	~5K × routine

When LLM is justified (v1)

Ownership inference when CODEOWNERS / tags exhausted (§3.2 step 5)
Identity resolution / dedup across naming-divergent systems (§15 #4)
EA artefact drafting — no library writes a "Postgres Standards" document (§14, §15 #6)
One-line semantic descriptors for routines / classes / endpoints
Anomaly natural-language synthesis (library detects; LLM phrases)
Document summarisation (Tika extracts text; only LLM produces summary)

Section 7 · Hierarchy Synthesis

Auto-create the EA chain upstream of every discovered Record. Architect

A probe discovers payments-postgres-prod (Delivery Stream Record owned by Payments Team). For governance to function, that Record needs a chain of EA Stream Records — Postgres Standards, Postgres Guardrails, Reference Architecture, OLTP Pattern — each owned by the correct EA team. In an immature tenant none of these exist. The Hierarchy Synthesizer fixes this.

flowchart TB
    DR[Discovered Record
payments-postgres-prod
SpecType: database]:::disc
    DT[Delivery Team
Payments Team]:::team

    DR -- owned-by --> DT

    AS[Architecture Standard
Postgres 15 Standard]:::ea
    AG[Architecture Guardrail
Postgres Prod Guardrails]:::ea
    AP[Architecture Pattern
OLTP DB Pattern]:::ea
    RA[Reference Architecture
Standard Postgres Deployment]:::ea

    DR -- governed-by --> AS
    DR -- subject-to --> AG
    DR -- conforms-to --> AP
    DR -- instantiates-from --> RA

    TST[Tech-Stack Team
Data Platform Team]:::eateam
    SDT[Sub-Domain Team
Data Storage]:::eateam
    DT2[Domain Team
Data Architecture]:::eateam

    AS --> TST
    AG --> TST
    AP --> TST
    RA --> TST
    TST -- child-of --> SDT
    SDT -- child-of --> DT2

    classDef disc fill:#3b2e1a,stroke:#fbbf24,color:#fff;
    classDef team fill:#1e2a3a,stroke:#38bdf8,color:#fff;
    classDef ea fill:#0d1a0d,stroke:#a3e635,color:#fff;
    classDef eateam fill:#3b0764,stroke:#c084fc,color:#fff;

The EA chain auto-built when a Delivery-Stream Record is discovered for the first time.

The four governance LinkTypes

LinkType	From → To	Semantics
`governed-by`	any tech Record → `architecture-standard`	"This thing must comply with this Standard"
`subject-to`	any tech Record → `architecture-guardrail`	"This Guardrail applies here"
`conforms-to`	any tech Record → `architecture-pattern`	"This implements this Pattern"
`instantiates-from`	any tech Record → `reference-architecture`	"This is an instance of this RefArch"

Idempotent on re-run. Every step uses upsert-by-composite-key. A second run against the same tenant emits zero new Records / zero new Links. Rejecting an EA artefact in the review queue does NOT cascade into re-creation on the next probe run — see Direction of Authority.
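The upsert-by-composite-key behaviour can be illustrated with an in-memory store. This is a sketch under assumptions: the key fields (`tenantId`, `specTypeKey`, `externalId`) and the `RecordStore` class are hypothetical, and the platform's actual composite key may differ.

```typescript
interface DiscoveredRecord {
  tenantId: string;
  specTypeKey: string;
  externalId: string;
  name: string;
}

export class RecordStore {
  private readonly byKey = new Map<string, DiscoveredRecord>();

  // Returns true only when the call created a Record that did not exist before.
  upsert(record: DiscoveredRecord): boolean {
    const key = JSON.stringify([record.tenantId, record.specTypeKey, record.externalId]);
    const created = !this.byKey.has(key);
    this.byKey.set(key, record); // existing entries are updated in place
    return created;
  }
}

// Counts how many genuinely new Records a probe run emitted.
export function runProbe(store: RecordStore, findings: DiscoveredRecord[]): number {
  return findings.filter((finding) => store.upsert(finding)).length;
}
```

Running the same findings through `runProbe` twice against one store yields a non-zero count the first time and zero the second, which is the property the re-run guarantee describes.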

Section 8 · Publication policy

Auto-publish or human-review is policy-driven, not a single boolean. Architect

A DBA running the DataProbe should auto-publish discovered table metadata but must review newly drafted stored-procedure semantic summaries. ProbeDefinition.publicationPolicyKey references a PublicationPolicy row; each policy carries an ordered rule list (first-match wins).

Worked policy — `data-probe-default`

{
  "key": "data-probe-default",
  "defaultDecision": "PENDING_DISCOVERY",
  "rules": [
    { "specTypeKey": "table",
      "matchSensitivity": ["PII"],
      "decision": "PENDING_DISCOVERY",
      "requiredApproverPersonas": ["data-steward"],
      "rationale": "PII tables require data steward sign-off" },

    { "specTypeKey": "table",
      "matchSynthesisOrigin": ["DISCOVERED"],
      "decision": "AUTO_PUBLISH",
      "rationale": "Structural facts from JDBC are reliable" },

    { "specTypeKey": "stored-procedure",
      "matchAiConfidenceLt": 0.85,
      "decision": "PENDING_DISCOVERY",
      "requiredApproverRoleCategories": ["DBA"] },

    { "specTypeKey": "architecture-standard",
      "matchSynthesisOrigin": ["AI_DRAFTED_ARTIFACT"],
      "decision": "PENDING_DISCOVERY",
      "requiredApproverPersonas": ["ea-architect", "domain-architect"] }
  ]
}

Bulk-approve endpoints (POST /api/probe-runs/:id/approve) slice their updateMany by matching rule; the caller's roles/personas determine which Records flip to APPROVED in a single call.
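First-match-wins evaluation of a policy can be sketched as follows. The types and the `decide` function are hypothetical and cover only the matchers used in the worked policy. Because the first matching rule wins, a narrower rule (such as a PII match) has to sit ahead of a broader one for the same SpecType.

```typescript
type Decision = "AUTO_PUBLISH" | "PENDING_DISCOVERY";

interface PolicyRule {
  specTypeKey: string;
  matchSynthesisOrigin?: string[];
  matchSensitivity?: string[];
  matchAiConfidenceLt?: number;
  decision: Decision;
}
interface PublicationPolicy { key: string; defaultDecision: Decision; rules: PolicyRule[] }
interface Candidate { specTypeKey: string; synthesisOrigin: string; sensitivity?: string; aiConfidence?: number }

// A rule matches when every matcher it declares is satisfied.
function matches(rule: PolicyRule, rec: Candidate): boolean {
  if (rule.specTypeKey !== rec.specTypeKey) return false;
  if (rule.matchSynthesisOrigin && !rule.matchSynthesisOrigin.includes(rec.synthesisOrigin)) return false;
  if (rule.matchSensitivity &&
      !(rec.sensitivity !== undefined && rule.matchSensitivity.includes(rec.sensitivity))) return false;
  if (rule.matchAiConfidenceLt !== undefined &&
      !(rec.aiConfidence !== undefined && rec.aiConfidence < rule.matchAiConfidenceLt)) return false;
  return true;
}

export function decide(policy: PublicationPolicy, rec: Candidate): Decision {
  const rule = policy.rules.find((r) => matches(r, rec));
  return rule ? rule.decision : policy.defaultDecision;
}

export const examplePolicy: PublicationPolicy = {
  key: "data-probe-default",
  defaultDecision: "PENDING_DISCOVERY",
  rules: [
    { specTypeKey: "table", matchSensitivity: ["PII"], decision: "PENDING_DISCOVERY" },
    { specTypeKey: "table", matchSynthesisOrigin: ["DISCOVERED"], decision: "AUTO_PUBLISH" },
    { specTypeKey: "stored-procedure", matchAiConfidenceLt: 0.85, decision: "PENDING_DISCOVERY" },
  ],
};
```

A Record that matches no rule falls through to `defaultDecision`, so the safe default is what protects SpecTypes the policy author did not anticipate.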

Section 9 · Child-workflow decomposition

Fan out, consolidate. The same pattern powers all batched AI work. Architect

LLM context windows cannot ingest "describe every stored procedure in a 1,500-routine schema." The diagram-generation workflow in ea-package-demo3 solves this by splitting into N child workflows and consolidating outputs. Probes reuse the exact same primitives.

flowchart LR
    P["Parent Stage
50 routines
chunkSize=10"]:::parent
    C1[Child #1
routines 1-10]:::child
    C2[Child #2
routines 11-20]:::child
    C3[Child #3
routines 21-30]:::child
    C4[Child #4
routines 31-40]:::child
    C5[Child #5
routines 41-50]:::child
    M[Consolidate
mergeStrategy=CONCAT]:::merge

    P --> C1
    P --> C2
    P --> C3
    P --> C4
    P --> C5
    C1 --> M
    C2 --> M
    C3 --> M
    C4 --> M
    C5 --> M

    classDef parent fill:#0d1a0d,stroke:#a3e635,color:#fff;
    classDef child fill:#1e2a3a,stroke:#38bdf8,color:#fff;
    classDef merge fill:#3b2e1a,stroke:#fbbf24,color:#fff;

Token cost vs naive single-LLM approach: ~98% reduction. Latency slightly higher, but children run in parallel.

Decomposition thresholds — scope-resolved

All four decomposition thresholds are resolved at probe-execution time via the existing ToolConfiguration scope hierarchy (RECORD → TEAM → USER → ROLE → TENANT). This lets ops tune cost versus latency per deployment context.

Setting	Default	Hard floor	Hard ceiling
`decompositionTriggerItemCount`	50	1	10,000
`chunkSize` (LLM strategies)	10	5	50
`chunkSize` (LIBRARY strategies)	100	50	1,000
`maxConcurrentChildren`	20	1	50
`childTimeoutMs`	60,000	5,000	600,000
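Resolving one threshold and splitting a parent stage into children might look like the sketch below. Two things are assumptions rather than documented behaviour: that the most specific scope with a value wins (RECORD first, TENANT last), and that out-of-range overrides are clamped to the hard floor and ceiling rather than rejected. The helper names are hypothetical.

```typescript
interface Bounds { default: number; floor: number; ceiling: number }

// Bounds copied from the table above (LLM-strategy chunk size shown).
const CHUNK_SIZE_LLM: Bounds = { default: 10, floor: 5, ceiling: 50 };

const SCOPE_ORDER = ["RECORD", "TEAM", "USER", "ROLE", "TENANT"] as const;
type Scope = (typeof SCOPE_ORDER)[number];

// Assumed semantics: the first scope in SCOPE_ORDER that defines a value wins.
export function resolveSetting(bounds: Bounds, overrides: Partial<Record<Scope, number>>): number {
  let value = bounds.default;
  for (const scope of SCOPE_ORDER) {
    const candidate = overrides[scope];
    if (candidate !== undefined) {
      value = candidate;
      break;
    }
  }
  return Math.min(bounds.ceiling, Math.max(bounds.floor, value));
}

// Splits a parent stage's work items into child-sized batches.
export function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}
```

With the default `chunkSize` of 10, fifty routines split into five children, matching the fan-out diagram.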

Section 10 · Context Resolution Pipeline

Generated artefacts are grounded in real Records. Nothing else. Architect LLM

A context-resolver is a Record of SpecType context-resolver (so resolvers are tenant-customisable, versioned, and shipped via PDL). Its body is a declarative DSL: anchor SpecType, facets (multi-hop graph queries with dependencies), and a completeness gate.

Resolver DSL — `solution-architecture-doc-context` (excerpt)

{
  "anchor": { "specTypeKey": "solution-architecture", "required": true },
  "facets": [
    { "key": "technologyStack",
      "traversal": { "from": "anchor", "linkType": "uses-technology", "direction": "OUTBOUND" },
      "required": true,
      "completenessRule": "AT_LEAST_ONE" },

    { "key": "standardsPerTech",
      "dependsOn": "technologyStack",
      "traversal": { "fromEach": "technologies", "linkType": "governed-by",
                     "filterSpecType": "architecture-standard", "filterStatusIn": ["APPROVED"] },
      "required": true,
      "completenessRule": "AT_LEAST_ONE_PER_INPUT" },

    // ... 19 more facets: refArchs, guardrails, patterns, ADRs, NFRs, compliance, ...
  ],
  "completenessGate": {
    "minimumRequiredFacetsPresent": 1.0,
    "onMissing": "BLOCK_GENERATION"
  }
}

BLOCK_GENERATION

Engine throws. AI workflow does not run. UI surfaces which facets are missing and offers "Open backlog" (creates Tasks for each gap).

DEGRADE

Engine returns what it has. Template must include placeholders like {standards | "[no documented]"} so the LLM acknowledges the gap.

WARN

Full payload + warnings[]. Generation proceeds; warnings persisted on the generated Record's contextResolverWarnings.
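The completeness gate and its three `onMissing` behaviours can be sketched as below. This is an assumed shape, not the engine's API: `FacetResult`, `applyGate` and the `resultsPerInput` representation are hypothetical, and DEGRADE and WARN are treated alike here because they differ only in what the template does with the payload.

```typescript
type CompletenessRule = "AT_LEAST_ONE" | "AT_LEAST_ONE_PER_INPUT";
type OnMissing = "BLOCK_GENERATION" | "DEGRADE" | "WARN";

interface FacetResult {
  key: string;
  required: boolean;
  completenessRule: CompletenessRule;
  // One entry per input Record (a single entry for anchor-level facets).
  resultsPerInput: string[][];
}

function isComplete(facet: FacetResult): boolean {
  if (facet.completenessRule === "AT_LEAST_ONE") {
    return facet.resultsPerInput.some((ids) => ids.length > 0);
  }
  // AT_LEAST_ONE_PER_INPUT: every input must have produced at least one hit.
  return facet.resultsPerInput.length > 0 && facet.resultsPerInput.every((ids) => ids.length > 0);
}

export function applyGate(facets: FacetResult[], minimumRequiredFacetsPresent: number, onMissing: OnMissing) {
  const required = facets.filter((f) => f.required);
  const missing = required.filter((f) => !isComplete(f)).map((f) => f.key);
  const present = required.length === 0 ? 1 : (required.length - missing.length) / required.length;
  const passed = present >= minimumRequiredFacetsPresent;
  if (!passed && onMissing === "BLOCK_GENERATION") {
    throw new Error(`Missing required facets: ${missing.join(", ")}`);
  }
  const warnings: string[] = passed ? [] : missing.map((key) => `facet ${key} incomplete`);
  return { passed, missing, warnings };
}
```

In the WARN case the returned `warnings` array is what would be persisted on the generated Record's `contextResolverWarnings`.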

Reverse traceability — every generated Record knows what fed it

Record.contextResolverKey — which resolver, if AI-generated
Record.contextRecordIds[] — exact Record IDs that fed the LLM
Record.contextRecordVersionIds[] — per-Record version pinned at generation time
Record.aiInvocationIds[] — all LLM calls that produced this Record
Record.generationStaleness — FRESH | DRIFT_DETECTED | STALE | REGENERATING

Section 11 · Two-gate injection completeness

Fetched ≠ Injected. Two gates close two silent-failure paths. Architect LLM

The Resolver Completeness Gate verifies that all required facets were fetched from the graph. But fetched data is not the same as injected data. Two silent failure modes exist between resolver output and the LLM call: token-budget truncation, and template-authoring omission. The Injection Completeness Gate closes both.

flowchart LR
    R[Resolver runs
fetches 21 facets]:::start
    G1{Gate 1
Completeness?}:::gate
    Bk["BLOCK
Required facet
didn't resolve"]:::fail
    Pr[Prompt renderer
renders template
vs facet data]:::middle
    G2{Gate 2
Mandatory facets
in prompt text?}:::gate
    Bk2["BLOCK
Mandatory facet truncated
or unreferenced"]:::fail
    L[LLM call proceeds
with full context]:::success

    R --> G1
    G1 -- pass --> Pr
    G1 -- fail --> Bk
    Pr --> G2
    G2 -- pass --> L
    G2 -- fail --> Bk2

    classDef start fill:#0d1a0d,stroke:#a3e635,color:#fff;
    classDef gate fill:#3b2e1a,stroke:#fbbf24,color:#fff;
    classDef middle fill:#1e2a3a,stroke:#38bdf8,color:#fff;
    classDef fail fill:#3f1d1d,stroke:#fb7185,color:#fff;
    classDef success fill:#0a3b16,stroke:#86efac,color:#fff;

Both gates must pass for an AI invocation to proceed. Skipping either is a hallucination vector.

Token-budget truncation policy. When rendered prompt > 80% of model context window: priority order is mandatory facets → required facets → other facets in dependsOn order. If truncation reaches a mandatory facet, the call is aborted — don't silently call with known-incomplete context. AIUsageLog.contextTruncated=true records the event.
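The truncation policy can be sketched as a budget-fitting function. Several details are assumptions: facets are dropped whole rather than trimmed, the input order of same-priority facets is taken to be the dependsOn order, and template overhead is ignored. `fitToBudget` and its types are hypothetical names.

```typescript
type FacetPriority = "MANDATORY" | "REQUIRED" | "OTHER";
interface RenderedFacet { key: string; priority: FacetPriority; tokens: number }

const PRIORITY_RANK: { [P in FacetPriority]: number } = { MANDATORY: 0, REQUIRED: 1, OTHER: 2 };

export function fitToBudget(facets: RenderedFacet[], contextWindowTokens: number) {
  const budget = Math.floor(contextWindowTokens * 0.8);

  // Rank by priority; ties keep their incoming (assumed dependsOn) order.
  const ordered = facets
    .map((facet, index) => ({ facet, index }))
    .sort((a, b) => PRIORITY_RANK[a.facet.priority] - PRIORITY_RANK[b.facet.priority] || a.index - b.index)
    .map((entry) => entry.facet);

  const included: string[] = [];
  let usedTokens = 0;
  let contextTruncated = false;
  for (const facet of ordered) {
    if (usedTokens + facet.tokens <= budget) {
      usedTokens += facet.tokens;
      included.push(facet.key);
      continue;
    }
    if (facet.priority === "MANDATORY") {
      // Never call the model with known-incomplete mandatory context.
      throw new Error(`Aborting LLM call: mandatory facet "${facet.key}" does not fit`);
    }
    contextTruncated = true; // mirrors AIUsageLog.contextTruncated
  }
  return { included, usedTokens, contextTruncated };
}
```

Throwing on a mandatory facet keeps the abort on the same code path as a Gate 2 failure instead of letting the call proceed with a partial prompt.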

Section 12 · Governed Entity Pattern

Canonical state + version snapshots + project change proposals. One pattern, ten governed types. Architect

The canonical-vs-change separation applies to every governed entity type that multiple projects can modify concurrently: Applications, Services, APIs, Workflows, Databases, Systems, Tech Stacks, Reference Architectures, EA artefacts. The same three SpecTypes + six LinkTypes + one concurrency-analyser workflow handle them all (parameterised by canonical SpecType).

Governed Entity	Canonical SpecType (EA-owned)	Implementation Snapshot	Change Proposal (Delivery-owned)
Applications	`application`	`application-version-snapshot`	`application-change-proposal`
Services	`service`	`service-version-snapshot`	`service-change-proposal`
APIs	`api-spec`	`api-spec-version-snapshot`	`api-change-proposal`
Workflows	`workflow-definition`	`workflow-version-snapshot`	`workflow-change-proposal`
Database Schemas	`database-schema`	`database-schema-version-snapshot`	`database-schema-change-proposal`
Systems	`system`	`system-version-snapshot`	`system-change-proposal`
Tech Stacks	`technology-stack`	`technology-stack-version-snapshot`	`tech-stack-change-proposal`
Reference Architectures	`reference-architecture`	RecordVersion serves	`reference-architecture-change-proposal`
EA Standards / Guardrails / Patterns	`architecture-standard`, `architecture-guardrail`, `architecture-pattern`	RecordVersion serves	`<artefact>-change-proposal`

Concurrent change detection

The change-concurrency-analyzer workflow is parameterised by canonicalSpecTypeKey — not hardcoded for Applications. Triggered on every new/updated change-proposal Record across all 10 entity types, plus a nightly catch-up sweep. For each canonical entity with ≥ 2 in-flight proposals, runs a structural diff (deterministic), then an LLM judge call (§15 confidence-gate) only for ambiguous text cases. Emits change-conflict-finding Records with severity BLOCKER | HIGH | LOW.

Section 13 · Direction of Authority

The apparent paradox — EA gates generation, but probes write EA Stream. Resolved. Architect

Different gates apply at different lifecycle stages. Probes are never gated by EA completeness — only generation is. Five direction-of-authority rules are enforced in code with unit tests per rule.

Conflict	Rule
Probe re-discovers a tech the architect previously `REJECTED`	Synthesizer does not re-create. Emits `policy-divergence-finding` to App Architecture Team.
Architect-approved Standard claims "Postgres 13" but probe finds Postgres 16	Probe does not silently update the Standard. Emits `tech-taxonomy-drift-finding`; architect must update Standard or fix prod.
Two probe runs disagree about the same fact (different sources, different values)	Last-write-wins only when sources agree on authority. Otherwise emits `discovery-conflict-finding` and routes to review.
Architect manually edits a probe-discovered Record	Edit sticks. `Record.lastHumanEditAt` timestamp protects against silent overwrite on next probe run. If reality still differs, emits `discovery-divergence-finding`.
Probe needs to delete a probe-discovered Record (entity removed from source)	Never hard-delete. Update `Record.status=ARCHIVED`, set `archivedAt`, retain full audit trail.

Section 14 · EA Content Pre-Probe + Package Gap Analysis

Don't overwrite the customer's existing investment. CXO Sales

Many enterprise customers hold years of EA documentation in Confluence / SharePoint / Notion / Git docs. Installing PDL packages first risks overwriting or duplicating that material, devaluing prior investment. The customer-friendly approach: probe the existing EA corpus first, compare against what PDL packages provide, and surface per-item architect consent.

Per-item consent options

Install Package

Package's Record created in EKG as-is on Phase E execution.

Keep Customer

Record created with package SpecType scaffolding; body / metadata from matched customer document.

Hybrid / Merge

Both versions imported; customer version lands PENDING_DISCOVERY for manual merge.

Skip

Neither installed; architect deems out of scope.

Decisions persist as OrganizationProbeProfile.packageInstallationDecisions — durable record of {packageKey, itemKey, choice, decidedByUserId, decidedAt, matchedCandidateId?} per item. Re-running gap analysis after a package update only re-evaluates changed items, preserving prior decisions. Adopted customer documents are marked synthesisOrigin = CUSTOMER_DOC_ADOPTED with source URL + last-modified date.

Section 15 · Credentials Management

Five credential families. Read-only enforcement. Five-step validation gauntlet. Architect Sales

Probes are read-only by design. The wizard's Phase A' refuses to persist credentials that fail the read-only scope check. Credentials are stored in AWS Secrets Manager under tenant-scoped KMS keys, and raw secrets never reach the DOM more than once.

Five credential families

Family	Example probes	Validation gate
`STATIC_TOKEN`	GitHub PAT, Confluence API token, Datadog API key	API call → inspect `x-oauth-scopes` against allowed/forbidden lists
`AWS_ROLE_ASSUMPTION`	AWS Infrastructure, AWS ECS, RDS IAM auth	`sts:AssumeRole` + `iam:SimulatePrincipalPolicy` against representative write actions; all must return `implicitDeny`
`DATABASE_CONNECTION_STRING`	PostgreSQL, MySQL, MSSQL, Oracle, Snowflake	`SELECT has_table_privilege(current_user, 'public.x', 'INSERT')` must be FALSE; canary INSERT in rolled-back savepoint must fail with insufficient-privilege
`MULTI_SECRET`	GitHub App, Azure AD service principal	Per-connector handshake (GitHub App: App-ID + private key → installation access token)
`OAUTH_AUTHORIZATION_CODE`	Confluence Cloud, Notion, Google Workspace, Okta admin (v1.5)	Full OAuth handshake; refresh-token storage

The five-step save-time gauntlet

Schema validation — form values match ConnectorDefinition.configSchema (type, required, pattern, minLength)
Connectivity test — call validation.endpoint; non-2xx → reject with UX-surfaced error
Identity verification — confirm the authenticated identity matches the architect's declared identity (e.g. GitHub login)
Scope enforcement — per-family check (above). Rejected if any forbidden capability is present.
Read-only confirmation — architect checks a box confirming read-only intent before save proceeds.

Read-only scope enforcement is a hard gate in v1. A PAT with repo:write scope is rejected with INSUFFICIENT_PRIVILEGE_ENFORCEMENT_FAILED — no "save anyway" override. Architect must regenerate with reduced scope. Every read goes through lib/probes/credentials/access-gateway.ts, which writes a CredentialAccessLog row first.
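For the `STATIC_TOKEN` family, the scope check on a GitHub PAT could be sketched as below. GitHub returns granted scopes as a comma-separated `x-oauth-scopes` response header. The function name, the result shape, and the choice to treat every scope outside the allow-list as forbidden are assumptions; the allow-list reuses the read-only scopes named for the GitHub probe.

```typescript
interface ScopeCheck {
  ok: boolean;
  errorCode?: string;
  offending: string[];
}

// Illustrative allow-list, taken from the read-only PAT scopes listed for the GitHub probe.
const ALLOWED_TOKEN_SCOPES = ["repo:read", "read:org", "metadata:read"];

export function checkTokenScopes(
  xOauthScopesHeader: string,
  allowed: string[] = ALLOWED_TOKEN_SCOPES,
): ScopeCheck {
  const granted = xOauthScopesHeader
    .split(",")
    .map((scope) => scope.trim())
    .filter((scope) => scope.length > 0);

  // Assumption: anything outside the allow-list counts as a forbidden capability.
  const offending = granted.filter((scope) => !allowed.includes(scope));

  return offending.length === 0
    ? { ok: true, offending }
    : { ok: false, errorCode: "INSUFFICIENT_PRIVILEGE_ENFORCEMENT_FAILED", offending };
}
```

Returning the offending scopes lets the wizard tell the architect exactly which capability to remove before regenerating the token.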

Encryption at rest

Storage: AWS Secrets Manager. Secret naming pattern: arcitopsia/<tenantId>/<connectorKey>/<targetKey>.
Key management: tenant-scoped KMS key alias alias/arcitopsia-tenant-<tenantId>; annual auto-rotation.
In the DB: only the secret reference (secretRef) is stored. Resolved at probe-run time via the access-gateway, held in memory for one probe stage only, never logged, never written to telemetry.
Cross-tenant access is impossible: each tenant's secrets are encrypted with that tenant's own KMS key, and the IAM role's kms:Decrypt permission is scoped per tenant via the kms:ResourceAliases condition.

Section 16 · AI LLM Integration

Ten integration points. One confidence-gate. Full provenance. Architect LLM

Every LLM call in v1 goes through lib/ai/confidence-gate.ts — context injection, schema validation, confidence thresholds, provenance recording, and cost caps in one gateway. Below threshold → Record forced to PENDING_DISCOVERY regardless of autoPublish.

#	Where	Output	Autonomy	Threshold
1	SpecType classification	`{specTypeKey, confidence, reasoning?}`	Auto-apply if ≥ threshold	0.85
2	EA Domain / Tech tagging	`{eaDomainKey, subDomainKey?, techStackKey?, productKey?}`	Auto-tag if ≥ threshold	0.80
3	Ownership inference (step 5 of §3.2)	`{ownerTeamId, confidence, rationale}`	< threshold forces PENDING_DISCOVERY	0.70
4	Identity resolution / dedup	`{verdict: SAME | DIFFERENT | UNSURE, mergeKey?}`	Auto-merge if SAME ≥ 0.9	0.90 / 0.70
5	Document parsing	`{extractedFields, summary, mentionedEntityIds[]}`	Never auto-apply; review queue	n/a
6	EA artefact drafting (RefArch/Standard/Guardrail/Pattern)	`{title, body Markdown, structured metadata}`	Never auto-apply; PENDING_DISCOVERY	n/a
7	Pattern match scoring	`[{patternId, score, gaps[], strengths[]}]`	Top match ≥ threshold sets `Record.patternId`	0.60
8	Diagram generation (Mermaid)	`{c4Component?, c4Container?, sequence?, er?}`	Always review queue	n/a
9	Anomaly flagging	`[{severity, finding, suggestedAction}]`	Auto-emit if severity ≥ MEDIUM	severity-gated
10	Analyzer guidance	`{suggestions: string[]}`	Suggestion only — never auto-action	n/a

Confidence sources (template-declared)

MODEL_SELF_REPORT — model returns a confidence field. Cheap; default for classification/tagging.
MULTI_SHOT_CONSISTENCY — call N times with temperature > 0, compute agreement. Used for dedup + ownership (expensive but high-trust).
HEURISTIC — derived from input quality signals (e.g. "doc text length < 200 chars → confidence ≤ 0.5"). Used for document extraction.
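As an illustration of MULTI_SHOT_CONSISTENCY feeding the gate, the sketch below scores agreement across N sampled verdicts and then applies a threshold. The function names are hypothetical, "agreement" is assumed to mean the share of samples that returned the most common verdict, and ties are assumed to resolve towards the more cautious verdict.

```typescript
type Verdict = "SAME" | "DIFFERENT" | "UNSURE";

// Ordered most-cautious first so that ties never favour an auto-merge.
const VERDICTS: Verdict[] = ["UNSURE", "DIFFERENT", "SAME"];

export function consistencyConfidence(samples: Verdict[]): { verdict: Verdict; confidence: number } {
  if (samples.length === 0) return { verdict: "UNSURE", confidence: 0 };
  let verdict: Verdict = "UNSURE";
  let best = 0;
  for (const candidate of VERDICTS) {
    const count = samples.filter((s) => s === candidate).length;
    if (count > best) {
      best = count;
      verdict = candidate;
    }
  }
  return { verdict, confidence: best / samples.length };
}

// Below threshold the result is always held for review, whatever autoPublish says.
export function gate(confidence: number, threshold: number, autoPublish: boolean): "AUTO_APPLY" | "PENDING_DISCOVERY" {
  return autoPublish && confidence >= threshold ? "AUTO_APPLY" : "PENDING_DISCOVERY";
}
```

With ten samples, nine SAME and one DIFFERENT give a confidence of 0.9, which just clears the 0.90 auto-merge threshold in the table.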

Section 17 · Credit economics

Reserve worst-case → settle actual. 99% gross margin at scale. CXO Sales

Per PAYG §5.0: every credit-bearing activity is priced at 20% of the documented manual cost to produce the equivalent outcome. baseCredits = round(manualCostBasisUSD × captureRatePct) where captureRatePct defaults to 0.20 (negotiable 0.15–0.25 per TenantCreditContract).

Worked example — PostgreSQL DataProbe with full hierarchy synthesis

200 tables, 50 stored procedures (10 require AI summarisation), 4 EA artefacts drafted per tech across 7 techs. Tenant on MULTINATIONAL tier (×1.30) with 20% Scale-pack discount (globalCreditMultiplier = 0.80), non-BYOLLM.

Line item	Activity	Manual basis	Base credits
Probe Discovery Run	`probe-discovery-run`	$500	100
Bulk Import 200 tables	`bulk-import-per-100-records`	$50 / 100	200
AI Classification × 10 routines	`ai-enrichment-per-record`	$100 / record	200
Hierarchy Synthesis × 4 artefacts × 7 techs (subset)	`probe-ea-artifact-draft`	$250 / artefact	200
Σ baseCredits			700

Worst-case reservation at Phase E

maxReserve = baseCredits × maxComplexityMultiplier × tierMultiplier × globalMultiplier
           = 700 × 3.0 × 1.30 × 0.80
           = 2,184 credits   // ≈ $1,747 at Scale-pack rate

Actual settled (after 10-factor complexity computation with log₂ dampening)

complexityScore       = 3.225           // weighted sum across 10 dimensions
rawMultiplier         = log₂(3.225 + 1) × 1.44
                      = 2.99
complexityMultiplier  = clamp(2.99, min=0.5, max=3.0) = 2.99

finalCredits = round(700 × 2.99 × 1.30 × 0.80) = 2,177
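The reservation and settlement arithmetic above can be reproduced with a few lines. The function names are hypothetical, and rounding the complexity multiplier to two decimals before use is an assumption inferred from the worked numbers (an unrounded multiplier would settle at a slightly different figure).

```typescript
export function baseCredits(manualCostBasisUSD: number, captureRatePct = 0.2): number {
  return Math.round(manualCostBasisUSD * captureRatePct);
}

export function complexityMultiplier(complexityScore: number): number {
  const raw = Math.log2(complexityScore + 1) * 1.44;
  // Assumption: two-decimal rounding, as in the worked numbers.
  const rounded = Math.round(raw * 100) / 100;
  return Math.min(3.0, Math.max(0.5, rounded));
}

export function credits(
  base: number,
  complexity: number,
  tierMultiplier: number,
  globalMultiplier: number,
): number {
  return Math.round(base * complexity * tierMultiplier * globalMultiplier);
}

// Worst case is reserved at Approve & Run; the actual amount is settled afterwards.
export const reserved = credits(700, 3.0, 1.3, 0.8);
export const settled = credits(700, complexityMultiplier(3.225), 1.3, 0.8);
export const released = reserved - settled;
```

The difference between `reserved` and `settled` is what returns to the tenant's balance on settle.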

2,184

Reserved

worst-case at Approve & Run

2,177

Settled

actual after complexity

7

Released

back to balance on settle

$0.24

Internal cost

LLM + compute + storage (internal-only)

Gross margin: 99.99% on this run. Per PAYG §7.0 two-layer rule: customer never sees internal-cost or margin numbers — only the credit-side ledger visible in /admin/credits.

Failure scenario. If Stage 2 fails after importing 50 tables and 3 routine summaries: releaseReservation(executionId) flips reservation status to RELEASED_ON_FAILURE. Full 2,184 credits returned to balance. Zero credits consumed per PAYG §5.4 — "credits are never deducted for failed jobs." Internal telemetry still recorded (compute consumed by the failed run is iArchitron's cost).

Section 18 · Self-Discovery Dogfood Acceptance

v1 ships when an architect can scan Arcitopsia itself and generate a real architecture doc. CXO Sales Architect

A "framework with no probes" is not deliverable. The v1 acceptance test is an architect sitting down, completing the wizard against the Arcitopsia production stack, running the plan, and producing a Solution Architecture Document that cites only Records discovered by the four v1 probes — and passes architect spot-check.

18

Steps

in the end-to-end acceptance walkthrough

≤ 5,500

Credits

total spend ceiling (MULTINATIONAL tier)

≤ 45 min

Wall-clock

end-to-end including review-queue approval

≥ 80

Citations

contextRecordIds.length on generated doc

The acceptance test in one paragraph

Open /admin/probe-analyzer/ → register 4 SystemRegistrations (GitHub org arcitopsia, AWS account 848111426925, AWS-RDS prod DB, ECS cluster arcitopsia-cluster) → connect each via Phase A' with the 5-step gauntlet → assign FactCategoryMappings → run Discovery Sweep → Plan Generation produces a 4-stage DAG → Approve & Run reserves ≤ 5,000 credits → 4 probes execute in correct order → Hierarchy Synthesis auto-drafts ~28 EA artefacts → architect approves a subset in /admin/ea-review-queue/ → click "Generate Solution Architecture Document" on the synthesised application:arcitopsia-platform Record → Context Resolver application-context runs (completeness gate passes after approvals) → 11-section doc generated via §19 child-workflow fan-out → architect spot-checks 10 random claims → all 10 trace cleanly to source Records.

What the dogfood test validates. Passing §24.6 validates the framework end-to-end: multi-target connectors, intra-probe sub-stages, library-first doctrine, persona/role gating, child-workflow decomposition, context resolver completeness, application architecture separation, Hierarchy Synthesis, AI gateway + two-gate injection completeness, credit reservation + settle, data sensitivity safeguards. Failing any step triggers fix-forward, not scope reduction.

Section 19 · Probe API surface

Ten REST routes. Tenant-scoped. Same auth as the rest of the platform. Architect

Every route uses requireAuth + tenantId scoping per memory/multi-tenancy-patterns.md. Cross-tenant requests return 404 (not 403 — avoids existence-leakage). No DELETE in v1; soft-delete via PATCH { isArchived: true }.

Route	Verbs	Purpose
/api/probes	GET, POST	List / create ProbeDefinition
/api/probes/:id	GET, PATCH	Get / update probe
/api/probes/:id/targets	GET, POST, PATCH	Manage `ProbeConnectionTarget` rows for a probe
/api/probes/:id/runs	GET, POST	List runs / trigger an on-demand run
/api/probe-runs/:id	GET	Run details with per-stage status
/api/probe-runs/:id/discoveries	GET	Discovered Records (paginated, filterable by SpecType)
/api/probe-runs/:id/approve	POST	Bulk-approve, sliced per `PublicationPolicy` rule
/api/probe-runs/:id/reject	POST	Bulk-reject with optional reason
/api/probe-runs/:id/lineage	GET	Per-Record ownership-resolver decision audit
/api/connection-targets	GET	Cross-probe view of all `ProbeConnectionTarget` rows for the tenant
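The tenant-scoping rule (404 rather than 403 for cross-tenant requests) and the PATCH-based soft-delete can be illustrated with a framework-free sketch. The in-memory row array and the function names getProbe and archiveProbe are assumptions for illustration; the real routes sit behind requireAuth.

```typescript
// In-memory stand-in for the probe table; fields other than tenantId and
// isArchived are assumed for illustration.
interface ProbeDefinition {
  id: string;
  tenantId: string;
  key: string;
  isArchived: boolean;
}

interface ApiResult {
  status: 200 | 404;
  body?: ProbeDefinition;
}

// Match on id AND tenant in one predicate, so "wrong tenant" and "no such id"
// are indistinguishable to the caller.
function findForTenant(rows: ProbeDefinition[], tenantId: string, id: string): ProbeDefinition | undefined {
  return rows.filter((p) => p.id === id && p.tenantId === tenantId)[0];
}

// GET /api/probes/:id
function getProbe(rows: ProbeDefinition[], tenantId: string, id: string): ApiResult {
  const probe = findForTenant(rows, tenantId, id);
  return probe ? { status: 200, body: probe } : { status: 404 };
}

// PATCH /api/probes/:id with { isArchived: true } is the v1 soft-delete.
function archiveProbe(rows: ProbeDefinition[], tenantId: string, id: string): ApiResult {
  const probe = findForTenant(rows, tenantId, id);
  if (!probe) return { status: 404 };
  probe.isArchived = true;
  return { status: 200, body: probe };
}
```

Folding the tenant check into the lookup itself, rather than fetching by id and then comparing tenants, leaves no code path that could return a 403 and reveal that the row exists.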

Analyzer-specific routes ship in parallel under /api/probe-analyzer/* — 8 routes covering the wizard phases (profile / inventory / mappings / sweep / plan / approve / runs). Gated behind canConfigureTenant from lib/auth/permissions/admin-bypass.ts.

Section 20 · PDL Distribution

Probes, policies, resolvers, and TaxMaps ship as PDL packages. Architect

The Package Definition Language (PDL) is the platform's portable, idempotent, version-controlled installer format. Every Probe Framework artefact has a corresponding PDL component type so packages travel cleanly across tenants.

New PDL component types added for Probe Framework

Component type	Phase	Installer	Idempotent on
`probes[]` (24th type)	4	`lib/platform/installers/probe-installer.ts`	`(tenantId, probeDefinition.key)`
`publicationPolicies[]`	12	`publication-policy-installer.ts`	`(tenantId, policy.key)`
`techTaxonomyMaps[]`	7	`tech-taxonomy-map-installer.ts`	`(tenantId, productKey)`
`factCategoryMatrix`	5	analyzer-extension installer	tenant-singleton merge
`probeDependencyEdges[]`	5	analyzer-extension installer	per-edge upsert
`aiPromptTemplates[]` (extended)	8	`prompt-template-installer.ts`	`(tenantId, template.key)`
`contextResolvers[]` (Records)	14	standard record installer	`(tenantId, specType, key)`
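The "Idempotent on" column amounts to an upsert keyed on the listed tuple. Below is a minimal sketch for the probes[] case, keyed on (tenantId, probeDefinition.key). The ProbeStore class, the entry fields name and workflowId, and the outcome labels are hypothetical; the real installer lives in lib/platform/installers/probe-installer.ts and its shape is not shown in this document.

```typescript
// Assumed shapes for illustration only.
interface ProbePackageEntry {
  key: string;
  name: string;
  workflowId: string;
}

interface InstalledProbe extends ProbePackageEntry {
  tenantId: string;
}

type InstallOutcome = "CREATED" | "UPDATED" | "UNCHANGED";

class ProbeStore {
  private rows: InstalledProbe[] = [];

  // Upsert keyed on (tenantId, key): re-running the same package changes
  // nothing, and a changed entry updates the existing row in place.
  install(tenantId: string, entry: ProbePackageEntry): InstallOutcome {
    const existing = this.rows.filter((r) => r.tenantId === tenantId && r.key === entry.key)[0];
    if (!existing) {
      this.rows.push({ tenantId, key: entry.key, name: entry.name, workflowId: entry.workflowId });
      return "CREATED";
    }
    if (existing.name === entry.name && existing.workflowId === entry.workflowId) {
      return "UNCHANGED";
    }
    existing.name = entry.name;
    existing.workflowId = entry.workflowId;
    return "UPDATED";
  }

  count(): number {
    return this.rows.length;
  }
}
```

Installing the same package twice leaves one row per (tenantId, key), which is what lets packages travel across tenants without duplicating definitions.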

Two reference distributions ship with v1: ea-foundation v-next (new SpecTypes + LinkTypes for the governance chain + Governed Entity Pattern) and ea-probes-arcitopsia-v1 (the 4 dogfood probe definitions + their ProbeConnectionTarget seeds for the Arcitopsia prod tenant — not installed by default; architect installs manually for the §24 acceptance test).

Section 21 · EA Review Queue

The human-in-the-loop surface for everything AI drafts. Architect CXO

Lives at /admin/ea-review-queue/. Lists every Record in PENDING_DISCOVERY with synthesisOrigin ∈ {AUTO_CREATED_HIERARCHY, AI_DRAFTED_ARTIFACT, CUSTOMER_DOC_ADOPTED}, plus every Team with creationMode = AUTO_PROVISIONED awaiting sign-off. It is the single workflow that closes the loop on probes → drafts → governance.

Available actions per item

Inspect

Full AI provenance: model, prompt template, confidence, prompt hash, the source Records that fed the draft. One click to the source-Record list.

Edit

Refine title / body / structured metadata before approval. The edit is recorded as a manual override; aiProvenance.overriddenAt timestamp is set.

Approve

Flips Record to APPROVED. Existing approval workflow fires (cascades to dependents, lifecycle transitions, notifications).

Merge

Pick two Records of the same SpecType drafted for the same tech. Edges from the loser repoint to the winner. Audit trail preserved.

Reject

Flips to REJECTED. Cascades: inbound governance Links deleted. Records discovered by future probe runs that would re-trigger this artefact emit policy-divergence-finding instead of re-creating (per §13 Direction of Authority).

Bulk approve by template

"Approve all standards drafted by template postgres-standard-draft-v1 with confidence ≥ 0.85." Slices the updateMany per PublicationPolicy rule + the caller's persona.

Persona / role enforcement is server-side. The bulk-approve endpoint checks the caller's roles against each Record's matching PublicationPolicy rule. Records the caller cannot approve come back in a notApproved[] array with the required-persona hint — never silently approved.
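The server-side slicing can be sketched as a partition over the selected Records. The shapes below are assumptions: PolicyRule is reduced to one required persona per SpecType, and Records with no matching rule are treated as not approvable (fail closed). The real PublicationPolicy rule matching is richer than this.

```typescript
interface DraftRecord {
  id: string;
  specType: string;
  status: "PENDING_DISCOVERY" | "APPROVED";
}

// Assumed simplification: one rule per SpecType, one persona per rule.
interface PolicyRule {
  specType: string;
  requiredPersona: string;
}

interface BulkApproveResult {
  approved: string[];
  notApproved: { recordId: string; requiredPersona: string | null }[];
}

function bulkApprove(records: DraftRecord[], rules: PolicyRule[], callerRoles: string[]): BulkApproveResult {
  const result: BulkApproveResult = { approved: [], notApproved: [] };
  for (const record of records) {
    const rule = rules.filter((r) => r.specType === record.specType)[0];
    const allowed = rule !== undefined && callerRoles.indexOf(rule.requiredPersona) !== -1;
    if (allowed) {
      record.status = "APPROVED";
      result.approved.push(record.id);
    } else {
      // Never silently approved: report the Record with a persona hint.
      // A missing rule is reported with a null hint (fail closed, assumption).
      result.notApproved.push({ recordId: record.id, requiredPersona: rule ? rule.requiredPersona : null });
    }
  }
  return result;
}
```

Every input Record ends up in exactly one of the two arrays, so the caller can always account for the full selection.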

Section 22 · Application Architecture Ownership

EA owns the canonical Application. Projects own change proposals. They link, never duplicate. CXO Architect

The original motivating case for the Governed Entity Pattern (§12). One Application has one canonical Record (slow-changing, EA-owned) and many in-flight application-change-proposal Records (one per project modifying it). Project proposals carry only the delta — the canonical Record stays stable across many parallel projects.

The three Application-tier SpecTypes

SpecType	Owner	Lifecycle	Cardinality / Application
`application`	EA team	slow-changing canonical	1
`application-version-snapshot`	EA team	append-only history	N (one per merged change)
`application-change-proposal`	Delivery team	ephemeral (lives for the project)	M (one per active project)

Cross-project concurrency detection

The change-concurrency-analyzer workflow (per §12 — generic across all 10 Governed Entity types) fires whenever an application-change-proposal is created or updated, plus a nightly catch-up sweep. For each application with ≥ 2 in-flight proposals, it runs a structural diff (deterministic) over the proposal bodies — and only escalates to an LLM judge for ambiguous text cases. Emits change-conflict-finding Records with severity BLOCKER | HIGH | LOW, owner = App Architecture Team.
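The deterministic part of the analyzer can be approximated as an overlap check on what each proposal touches. In this sketch each proposal body is reduced to a hypothetical changedPaths list, which is an assumption about how the structural diff is represented. Severity assignment and the LLM escalation for ambiguous text are left out.

```typescript
interface ChangeProposal {
  id: string;
  applicationId: string;
  changedPaths: string[]; // assumed representation of the proposal delta
}

interface StructuralConflict {
  applicationId: string;
  path: string;
  proposalIds: string[];
}

const distinct = (values: string[]): string[] =>
  values.filter((v, i, all) => all.indexOf(v) === i);

function findStructuralConflicts(proposals: ChangeProposal[]): StructuralConflict[] {
  const conflicts: StructuralConflict[] = [];
  for (const applicationId of distinct(proposals.map((p) => p.applicationId))) {
    const inFlight = proposals.filter((p) => p.applicationId === applicationId);
    if (inFlight.length < 2) continue; // only applications with >= 2 in-flight proposals

    const paths = distinct(inFlight.reduce<string[]>((acc, p) => acc.concat(p.changedPaths), []));
    for (const path of paths) {
      // A path touched by two or more proposals is a candidate conflict.
      const proposalIds = inFlight
        .filter((p) => p.changedPaths.indexOf(path) !== -1)
        .map((p) => p.id);
      if (proposalIds.length >= 2) conflicts.push({ applicationId, path, proposalIds });
    }
  }
  return conflicts;
}
```

Each returned entry would correspond to one change-conflict-finding candidate for the same application.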

Visibility surfaces (v1.5 UI). Each canonical application Record gets an "In-Flight Changes" panel listing every application-change-proposal currently linked via proposes-change-to. Each proposal Record gets a "Cross-Project Conflicts" panel listing every change-conflict-finding it's implicated in. App Architecture Team gets a portfolio dashboard rolling up conflicts across every Application they own. v1 ships the data + API; v1.5 ships the rich UI panels.

Section 23 · Package Recommendations

Probes never auto-install packages. They recommend, you decide. Architect Sales

Packages mutate SpecTypes, Workflows, and permissions — too high-stakes for autonomous action. Instead, probes emit package-recommendation Records that surface in /admin/package-recommendations/ for architect-gated install.

Three detection points

1. Probe Analyzer Phase D

Plan-generator sees a SystemKind=DATABASE_PLATFORM, productKey=snowflake registered with no ea-data-snowflake-extension installed → emits recommendation referencing the missing package.

2. Post-probe gap analyzer

Gap-findings cluster around a missing SpecType (e.g., ≥ 10 services need data-classification Records but the SpecType isn't installed) → recommends ea-data-classification-v1.

3. Hierarchy Synthesizer

Unmapped tech whose canonical TechTaxonomyMap could be supplied by a known package → emits recommendation alongside (not instead of) the unmapped-technology-finding.

Anti-spam guarantee

@@unique([tenantId, recommendedPackageKey]) while status=OPEN. Re-triggers append to the existing recommendation's triggeringProbeRunIds[] array rather than inserting a duplicate row. A single recommendation can cite 20 probe runs without polluting the admin queue.
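The append-instead-of-insert behaviour can be sketched as follows. The function name recordRecommendation and the in-memory queue are assumptions; only the field names recommendedPackageKey, triggeringProbeRunIds, and the OPEN / INSTALLED statuses come from this section.

```typescript
interface PackageRecommendation {
  tenantId: string;
  recommendedPackageKey: string;
  status: "OPEN" | "INSTALLED";
  triggeringProbeRunIds: string[];
}

function recordRecommendation(
  queue: PackageRecommendation[],
  tenantId: string,
  packageKey: string,
  probeRunId: string
): PackageRecommendation {
  // At most one OPEN row per (tenantId, recommendedPackageKey).
  const open = queue.filter(
    (r) => r.tenantId === tenantId && r.recommendedPackageKey === packageKey && r.status === "OPEN"
  )[0];
  if (open) {
    // Re-trigger: append the run id, skipping ids that are already cited.
    if (open.triggeringProbeRunIds.indexOf(probeRunId) === -1) {
      open.triggeringProbeRunIds.push(probeRunId);
    }
    return open;
  }
  const created: PackageRecommendation = {
    tenantId,
    recommendedPackageKey: packageKey,
    status: "OPEN",
    triggeringProbeRunIds: [probeRunId],
  };
  queue.push(created);
  return created;
}
```

Because the uniqueness only applies while status is OPEN, a recommendation that has already flipped to INSTALLED does not absorb later triggers; a new OPEN row is created instead.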

Install lifecycle

1. Recommendation lands in /admin/package-recommendations/ + the EA Review Queue
2. Architect inspects (gap description, expected findings closed, dependencies, risk notes)
3. Picks version, confirms; install runs through standard lib/platform/package-installer.ts
4. Originating gap-findings auto-resolve on next resolver run
5. Recommendation flips to INSTALLED with the install ChangeSet ID for audit

Section 24 · Resolver Inferencer

Starter resolvers inferred from observed graph topology. Architect LLM

Hand-authoring Context Resolver DSL is precise but expensive. The Resolver Inferencer drafts starter resolvers automatically by observing the LinkType usage patterns in a working tenant — ea-package-demo3 in v1. A mature, well-linked tenant graph encodes its own best practices; the inferencer extracts them mechanically.

Algorithm

function inferResolverForAnchor(anchorSpecTypeKey, tenantGraph) {
  candidates = top-20 most-linked Records of SpecType anchorSpecTypeKey
  for each candidate:
    walk OUTBOUND Links → tally (LinkType, target SpecType) pairs
    walk INBOUND  Links → tally same
    walk 2-hop transitive for governance LinkTypes
  rank facets by frequency-across-candidates
  set required=true for facets present on ≥80% of candidates
  set completenessRule="AT_LEAST_ONE_PER_INPUT" for fan-out facets
  order facets by dependency (foundation first — team, program, technology)
  return draft ContextResolver DSL JSON
}
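A runnable TypeScript sketch of the core of the pseudocode above follows. It is deliberately simplified and all names are assumptions: it ranks candidates by outbound link count only, tallies outbound facets only, and omits the inbound walk, the 2-hop governance walk, completenessRule, and dependency ordering. It is not the scripts/generate-starter-resolvers.ts implementation.

```typescript
interface GraphRecord { id: string; specType: string; }
interface GraphLink { fromId: string; toId: string; linkType: string; }
interface InferredFacet {
  linkType: string;
  targetSpecType: string;
  coverage: number;   // share of candidates that carry this facet
  required: boolean;
}

function inferFacets(
  anchorSpecType: string,
  records: GraphRecord[],
  links: GraphLink[],
  topN: number = 20,
  requiredCoverage: number = 0.8
): InferredFacet[] {
  const outboundOf = (id: string) => links.filter((l) => l.fromId === id);
  const specTypeOf = (id: string) => {
    const rec = records.filter((r) => r.id === id)[0];
    return rec ? rec.specType : "unknown";
  };

  // Candidates: the most-linked anchor Records (outbound count only here).
  const candidates = records
    .filter((r) => r.specType === anchorSpecType)
    .sort((a, b) => outboundOf(b.id).length - outboundOf(a.id).length)
    .slice(0, topN);

  // For each (LinkType, target SpecType) pair, count the candidates that have it.
  const tally: { linkType: string; targetSpecType: string; seenOn: number }[] = [];
  for (const c of candidates) {
    const seenForCandidate: string[] = [];
    for (const l of outboundOf(c.id)) {
      const targetSpecType = specTypeOf(l.toId);
      const facetKey = l.linkType + " -> " + targetSpecType;
      if (seenForCandidate.indexOf(facetKey) !== -1) continue; // once per candidate
      seenForCandidate.push(facetKey);
      const row = tally.filter((t) => t.linkType === l.linkType && t.targetSpecType === targetSpecType)[0];
      if (row) row.seenOn += 1;
      else tally.push({ linkType: l.linkType, targetSpecType, seenOn: 1 });
    }
  }

  // Rank by frequency across candidates; mark facets at or above the threshold as required.
  return tally
    .map((t) => {
      const coverage = t.seenOn / candidates.length;
      return { linkType: t.linkType, targetSpecType: t.targetSpecType, coverage, required: coverage >= requiredCoverage };
    })
    .sort((a, b) => b.coverage - a.coverage);
}
```

Counting each facet once per candidate keeps a single heavily linked Record from inflating a facet's coverage.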

v1 — build-time only

A CI job runs scripts/generate-starter-resolvers.ts against a freshly-rehydrated ea-package-demo3 tenant on every release branch. Output is committed to packages/ea-foundation-v-next/resolvers/. Tenants get the starter resolvers by installing the package; they edit per-tenant in the standard record editor afterwards.

v2 — tenant-runnable. Same engine, exposed under /admin/context-resolvers/inferencer. A tenant outgrowing the platform-default resolvers can re-run the inferencer against their own graph topology to draft tenant-customised resolvers. Compared against the platform starter; deltas surfaced for architect review.

Section 25 · Multi-Target Connectors

One `ConnectorInstance`, N targets. Native fan-out at probe time. Architect

The original ConnectorInstance design assumed "one Slack workspace, one Jira instance, one GitHub org" — its @@unique([tenantId, connectorKey]) meant exactly one instance per tenant per connector. That breaks for "scan all 5 AWS accounts under our Org" or "introspect all 30 GitHub orgs we own." The ProbeConnectionTarget table solves this without changing the connector model.

ProbeConnectionTarget — the multi-target row

ProbeConnectionTarget
  id                String   // PK
  tenantId          String   // for the multi-tenancy guard
  toolConfigurationId String // FK → ToolConfiguration
  targetKey         String   // tenant-stable id, e.g. "aws-prod-848111426925"
  targetType        String   // "AWS_ACCOUNT" | "GITHUB_ORG" | "SNOWFLAKE_WAREHOUSE" | …
  externalId        String?  // native id at the source
  displayName       String
  environment       String?  // "prod" | "stage" | "dev" — null = unspecified
  region            String?
  tier              String?  // "tier-0" | "tier-1" | "tier-2"
  connectionConfig  Json     // per-target credential overrides (secretRef + scope hints)
  isEnabled         Boolean  // per-target enable/disable without deleting

  @@unique([tenantId, toolConfigurationId, targetKey])

Per-target credential override

Different AWS accounts often need different cross-account role ARNs; different GitHub orgs may require different installation tokens. The probe runtime resolves credentials in this order:

1. Target-level ProbeConnectionTarget.connectionConfig.secretRef (most specific)
2. ToolConfiguration scope-resolved connectionConfigs[connectorKey] per ENVIRONMENT → TEAM → PROJECT → PROGRAM → TENANT walk
3. ConnectorInstance.configuration tenant default (least specific)
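The resolution order above reduces to a first-match lookup. In this sketch the CredentialSources shape and the source labels are hypothetical; it assumes each level has already been reduced to an optional secretRef string.

```typescript
type Scope = "ENVIRONMENT" | "TEAM" | "PROJECT" | "PROGRAM" | "TENANT";

// Walk order as listed in the text, most specific scope first.
const SCOPE_WALK: Scope[] = ["ENVIRONMENT", "TEAM", "PROJECT", "PROGRAM", "TENANT"];

interface CredentialSources {
  targetSecretRef?: string;                     // ProbeConnectionTarget.connectionConfig.secretRef
  scopedSecretRefs: { [S in Scope]?: string };  // ToolConfiguration value per scope
  connectorDefaultSecretRef?: string;           // ConnectorInstance.configuration tenant default
}

interface ResolvedCredential {
  secretRef: string;
  source: string; // which level supplied the credential, for audit
}

function resolveCredential(sources: CredentialSources): ResolvedCredential {
  if (sources.targetSecretRef) {
    return { secretRef: sources.targetSecretRef, source: "TARGET" };
  }
  for (const scope of SCOPE_WALK) {
    const ref = sources.scopedSecretRefs[scope];
    if (ref) return { secretRef: ref, source: "TOOL_CONFIGURATION:" + scope };
  }
  if (sources.connectorDefaultSecretRef) {
    return { secretRef: sources.connectorDefaultSecretRef, source: "CONNECTOR_DEFAULT" };
  }
  throw new Error("no credential configured for target");
}
```

Returning the source alongside the secretRef is a design choice of the sketch; it makes it easy to see which level won when an override is not taking effect.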

Fan-out at probe execution time

A single ProbeRun creates one parent WorkflowJob + N child WorkflowJobs (one per ProbeConnectionTarget), all sharing one changeSetId for audit / rollback. Across-target stages run in parallel; intra-target stages preserve their declared order per §5. Hard cap of 100 concurrent child jobs per tenant (configurable via OrganizationProbeProfile.maxConcurrentTargets).
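The parent/child job structure can be sketched as a planning function. The names planFanOut and PlannedJob and the job-id scheme are hypothetical. Skipping targets with isEnabled = false is an assumption based on the field's description, and splitting children into fixed-size waves is a simplification of a concurrency cap (a real scheduler would more likely start a new child as soon as one finishes).

```typescript
interface TargetRef {
  targetKey: string;
  isEnabled: boolean;
}

interface PlannedJob {
  jobId: string;
  parentJobId: string | null;
  targetKey: string | null;
  changeSetId: string; // shared by the parent and every child
}

interface FanOutPlan {
  parent: PlannedJob;
  children: PlannedJob[];
  waves: PlannedJob[][]; // each wave holds at most maxConcurrentTargets jobs
}

function planFanOut(
  probeRunId: string,
  changeSetId: string,
  targets: TargetRef[],
  maxConcurrentTargets: number = 100
): FanOutPlan {
  if (maxConcurrentTargets < 1) throw new Error("maxConcurrentTargets must be at least 1");

  const parent: PlannedJob = {
    jobId: probeRunId + ":parent",
    parentJobId: null,
    targetKey: null,
    changeSetId,
  };

  // One child job per enabled target, all under the same changeSetId.
  const children = targets
    .filter((t) => t.isEnabled)
    .map((t): PlannedJob => ({
      jobId: probeRunId + ":" + t.targetKey,
      parentJobId: parent.jobId,
      targetKey: t.targetKey,
      changeSetId,
    }));

  const waves: PlannedJob[][] = [];
  for (let i = 0; i < children.length; i += maxConcurrentTargets) {
    waves.push(children.slice(i, i + maxConcurrentTargets));
  }
  return { parent, children, waves };
}
```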

Auto-discovery of targets. Some probes (AWS Organizations probe, GitHub Enterprise probe) discover their own targets — they list all member accounts / orgs and emit ProbeConnectionTarget rows via the probe.upsertConnectionTarget operation. A follow-up probe (the actual scanner) then fans out across those targets on its next run.

Section 26 · EA Stream vs Delivery Stream Routing

A Record's stream is derived, not stored. Architect

Routing to EA Stream vs Delivery Stream is automatic the moment a Record's ownership is resolved. Existing platform permission logic (lib/auth/permissions/ea-delivery-separation.ts) keys off record.ownerTeam.program.programType — probes inherit it for free, no new column needed.

The derivation rule

record.stream = record.ownerTeam.program.programType
              ∈ { EA, GOVERNANCE, PLATFORM }  → "EA Stream"
              ∈ { DELIVERY, SHARED }            → "Delivery Stream"
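Expressed as code, the derivation is a pure function of the owner team's program type. The type and function names below are illustrative assumptions; the platform's own logic lives in lib/auth/permissions/ea-delivery-separation.ts and is not reproduced here.

```typescript
type ProgramType = "EA" | "GOVERNANCE" | "PLATFORM" | "DELIVERY" | "SHARED";
type Stream = "EA Stream" | "Delivery Stream";

// Only the path the rule reads is modelled; real Records carry more fields.
interface RecordWithOwner {
  ownerTeam: { program: { programType: ProgramType } };
}

function deriveStream(record: RecordWithOwner): Stream {
  switch (record.ownerTeam.program.programType) {
    case "EA":
    case "GOVERNANCE":
    case "PLATFORM":
      return "EA Stream";
    case "DELIVERY":
    case "SHARED":
      return "Delivery Stream";
  }
}

const kafkaCluster: RecordWithOwner = {
  ownerTeam: { program: { programType: "PLATFORM" } },
};
console.log(deriveStream(kafkaCluster)); // "EA Stream"
```

Because nothing is stored, moving a team to a different program changes the stream of every Record it owns with no data migration.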

Five-step ownership resolution (§3.2)

For every Record a probe is about to upsert, the resolver walks this pipeline in order and stops at the first match. If Step 5 (AI inference) returns a confidence below the threshold, the Record is forced to PENDING_DISCOVERY regardless of the probe's autoPublish setting.

#	Source	Example	Confidence
1	Explicit signal from the probed system	GitHub `CODEOWNERS` file maps to a tenant-known team; AWS resource tag `arc:ownerTeam=payments`	1.00
2	`DeliveryTeamType.specTypeKeys[]` lookup	SpecType `service` claimed by exactly one DTT in the tenant	0.95
3	`DeliveryTeamType.govLayerLevel` routing	L3 for App/Integration, L6 for Data, L7 for Infra/Cloud, L8 Security, L9 DevOps	0.85
4	`ProbeDefinition.defaultOwnerTeamId` fallback	Configured at probe registration time	0.75
5	AI inference (last resort)	Calls `record-owner-inference` AIPromptTemplate with tenant team list	model-reported
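The first-match pipeline in the table can be sketched as an ordered list of resolver functions. Everything here is a hypothetical shape for illustration: each step is modelled as a function returning a candidate owner or null, and aiThreshold stands in for whatever threshold the tenant configures.

```typescript
interface ProbedRecord {
  key: string;
  specType: string;
}

interface OwnerCandidate {
  ownerTeamId: string;
  confidence: number;
}

interface ResolutionStep {
  step: number; // 1..5, matching the table rows
  resolve: (record: ProbedRecord) => OwnerCandidate | null;
}

interface OwnershipDecision extends OwnerCandidate {
  step: number;
  forcePendingDiscovery: boolean;
}

function resolveOwner(
  record: ProbedRecord,
  steps: ResolutionStep[],
  aiThreshold: number
): OwnershipDecision | null {
  // Walk the steps in table order and stop at the first one that matches.
  const ordered = steps.slice().sort((a, b) => a.step - b.step);
  for (const s of ordered) {
    const hit = s.resolve(record);
    if (!hit) continue;
    return {
      ownerTeamId: hit.ownerTeamId,
      confidence: hit.confidence,
      step: s.step,
      // Only the AI step can force review; its confidence is model-reported.
      forcePendingDiscovery: s.step === 5 && hit.confidence < aiThreshold,
    };
  }
  return null; // no step produced an owner
}
```

The caller would combine forcePendingDiscovery with the probe's autoPublish setting, letting the forced flag win.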

Conflict case — shared infrastructure

A Kafka cluster used by both Delivery and EA streams: the probe writes one Record owned by the Platform team (L7/L9 DTT) and emits cross-stream Links (used-by) to the consuming Delivery teams. No duplication, no ambiguity about which Record is authoritative.

Section 27 · v1 scope vs v1.5+ deferrals

What ships in v1, and what's deliberately deferred. Sales Architect

v1 ships

Probe framework + 4 concrete probes (GitHub, AWS Infra, AWS ECS, PostgreSQL)
Probe Analyzer 6-phase wizard + plan generation
Hierarchy Synthesis (4 governance LinkTypes + AI artefact drafting)
AI gateway + confidence-gate + provenance + two-gate completeness
Context Resolution Pipeline + 1 starter resolver (solution-architecture-doc-context)
Publication policy engine with default policies
Library-first doctrine + tooling registry for the 4 v1 probes
Child-workflow decomposition (scope-resolved thresholds)
Governed Entity Pattern for 10 entity types + concurrency analyser
Credentials management (3 of 5 families: STATIC_TOKEN, AWS_ROLE_ASSUMPTION, DATABASE_CONNECTION_STRING)
Intelligence Credits integration (estimate / reserve / settle / release)
Self-Discovery Dogfood Acceptance against the live Arcitopsia stack

Deferred to v1.5 / v2

Concrete probes for everything beyond v1 (GitLab, GCP, Azure, Snowflake, Okta, Datadog, etc.)
OAuth credential family (Confluence Cloud, Notion, Google Workspace)
MULTI_SECRET credential family (GitHub App)
Scheduler worker + drift detector + scheduled re-analysis
Embedding-similarity match strategy (RAG vector store) — v2
Tenant-runnable Context Resolver Inferencer — v2 (build-time only in v1)
Visual DSL editor for resolvers — v2
Stripe payouts API roundtrip for reconciliation — v1.5
Annual committed credit packages with carryover — v2
Razorpay (v1.5) / Chargebee (v2) / Paddle (v3) payment-processor add-ons
Auto-merge of redundant project-architecture Records — v2
Real-time credit consumption websocket dashboard — v2

Section 28 · Glossary

Vocabulary for LLM grounding + onboarding. LLM Architect

A canonical glossary keyed to the runtime types. When grounding an LLM in Arcitopsia's vocabulary, paste this section + the §3 taxonomy table + the §4 analyzer DAG.

Record: A typed instance of a SpecType in a tenant's Knowledge Graph. Carries status (PENDING_DISCOVERY / APPROVED / REJECTED / ARCHIVED), ownerTeamId, programId, projectId, and arbitrary data JSON.
SpecType: The schema definition for a class of Record (e.g. service, database, architecture-standard). Versioned; tenant-customisable.
Link: A typed edge between two Records. Has source SpecType, target SpecType, and a LinkType (e.g. uses-technology, governed-by).
LinkType: The schema for a class of Link. Defines from/to SpecType constraints, cardinality, and traversal directionality.
ProbeDefinition: A scanner spec: category, workflowId, target selector, publication policy, default owner team, list of ProbeStageDefinition sub-stages.
ProbeRun: One execution of a ProbeDefinition. Wraps a parent WorkflowJob + fans out into N child WorkflowJobs (one per ProbeConnectionTarget). Shares one changeSetId for audit / rollback.
ProbeStageRun: One sub-stage execution of a ProbeRun against a specific target. Strict intra-target ordering; cross-target parallel.
ProbeConnectionTarget: A specific scan target under a ProbeDefinition (e.g. one AWS account, one GitHub org). Carries per-target credentials + scope hints.
SystemRegistration: Probe-Analyzer wizard Phase A output. Declares "we have this kind of system" — kind ∈ {IDP, HR_SYSTEM, CLOUD_ORG, VCS_ORG, CI_CD_PLATFORM, OBSERVABILITY, DOC_REPO, DATABASE_PLATFORM, MESSAGING, DATA_WAREHOUSE, GRC_TOOL, FINANCE_SYSTEM, MANUAL_INPUT}.
FactCategoryMapping: Phase B output. Maps a FactCategory (e.g. ORG_HIERARCHY, CLOUD_ACCOUNT_INVENTORY) to a primary source + sourcing decision (PROBED / MANUAL / AI_INFERRED / DEFERRED).
ProbeExecutionPlan: Phase D output. Versioned, supersedeable DAG: stages → items. Pure function of inputs — re-runnable deterministically.
Hierarchy Synthesis: The post-Stage-1 probe that walks the EA chain upward from each newly-discovered Delivery-Stream Record, auto-creating missing Teams + drafting missing EA artefacts as PENDING_DISCOVERY. Idempotent.
Context Resolver: A Record of SpecType context-resolver. Body is a DSL declaring an anchor SpecType + multi-hop facets + a completeness gate. Used by AI workflows to fetch grounded context.
Completeness Gate: The mechanism by which a resolver refuses to feed an LLM call with incomplete context. Modes: BLOCK_GENERATION, DEGRADE, WARN.
Two-Gate Injection: The pair of gates that protect against silent AI degradation: (1) Resolver Completeness — facets fetched; (2) Injection Completeness — mandatory facets reach the rendered prompt text.
Governed Entity: Any SpecType that follows the canonical-vs-change separation: one EA-owned canonical Record + N implementation snapshots + M project change proposals. Applies to ten entity types in v1.
EA Stream vs Delivery Stream: Streams are derived, not stored: record.stream = record.ownerTeam.program.programType. EA / GOVERNANCE / PLATFORM programs → EA Stream; DELIVERY / SHARED programs → Delivery Stream. Governs visibility + approval rules.
Direction of Authority: The set of rules (§13) that resolve probes-vs-architects conflicts: probes never silently overwrite architect decisions; architects never silently override discovered reality; both flag divergence as findings.
Library-First Doctrine: The rule: every stage that can use a deterministic library MUST use one. LLM is for semantic judgment, not structural extraction. Validated by package validator + plan-review library-coverage ratio.
Confidence Gate: lib/ai/confidence-gate.ts — the single AI gateway. Looks up template, injects context, calls model, validates output schema, computes confidence, enforces threshold + fallback, persists AIUsageLog with full provenance.
Provenance: Record.aiProvenance + Record.contextRecordIds[] + Record.aiInvocationIds[] + AIUsageLog entries. Every AI-touched Record traces back to its model, prompt template, confidence, source Records, and pinned versions at generation time.
Credit Ledger: The PSP-agnostic CreditTransaction table that records every credit movement (DEDUCTION / TOPUP / MONTHLY_GRANT / EXPIRY / REFUND). Idempotent via (executionId, type) unique index. Never deleted.
Reserve / Settle / Release: The credit lifecycle: reserve the worst-case at Phase E (hold against balance); settle the actual after complexity computation (deduct, release remainder); release the full reservation if execution failed.
Complexity Multiplier: 10 runtime dimensions × weights, sum normalised, then log₂(score + 1) × 1.44 dampening, clamped to [contract.min, contract.max]. Captures real cost variation without runaway bills on legitimately large scans.
PSP-Agnostic Invariant: Per PAYG §9.6: the credit ledger (credit-engine.ts, credit-service.ts, complexity-calculator.ts) contains zero Stripe-specific types/IDs/imports. Stripe is isolated to stripe-credits.ts + webhook handler. Future PSPs (Razorpay v1.5, Chargebee v2, Paddle v3) drop in without touching the ledger.
Two-Layer Measurement: Per PAYG §7.0: internal telemetry (iArchitron-only — WorkflowExecutionTelemetry with compute units, tokens, LLM cost, gross margin) is strictly separated from customer-facing credits (CreditTransaction with fixed activity-type base × complexity × tier).
