Sai Dubbaca's Design (MOST RECENT)

Updated 2026-03-17 18:09
OVERALL DESIGN AREAS

1. Publish Workflow
   Researchers approve content and trigger a publish action that generates a snapshot of changed content with all dependencies, emitted as an event to Stream 1.

2. Event Streaming Layer
   Stream 1 carries the publish event. Synchronous OCI functions handle metadata reads and dashboard status responses. Asynchronous OCI functions launch OCI Batch jobs.

3. OCI Batch Processing
   Jobs have task-level dependencies (e.g., validation job must complete before export job). Handles writes to Global Distribute V2, TTL/TriG generation, and SPARQL updates to SKG.

4. Global Distribute V2
   Stores only the latest published version of data. Receives flagged writes (insert/update/delete). Source of truth for downstream consumers.

5. SKG Export
   Batch job calls the SKG curation API directly with generated TriG/TTL content. Replication from SKG into EHRC for clinician use.

6. VantageRx / Shadow Testing
   Periodic package generation aligned to Gen 1 release cadence. Automated diff comparison between Gen 1 and Gen 2 artifacts. Flip to Gen 2 as source after confidence is established.

7. Service Ownership / Data Isolation
   Each database owned by exactly one service. No direct cross-service database access. All interaction via messages or service APIs.

8. Error Handling / Dead Letter Queue
   Failed batch launches queue to DLQ. Operators monitor stuck workflows. Validation failures surfaced to researcher dashboard before and during publish.

9. Gen 1 / Gen 2 Transition
   Tables managed per-generation with defined source-of-truth ownership. Merge via replication/batch as Gen 2 coverage expands.

OUTSTANDING ITEMS

1. Payload size limit -- The 1MB OCI Streams cap is insufficient for PEL files (~280KB each) and SNOMED-scale content. Options: object store reference pattern, or platform ticket to raise the limit. Decision pending.

2. Cascade change capture -- When a new NDC causes downstream obsolescence of other records (e.g., drug names), the mechanism for capturing and pushing those secondary changes to SKG is not defined.

3. Write-back / versioning model -- Agreed no write-back from pipeline to Global Research. Versioning approach (V1/V2 publish versions visible in research UI) needs to be fleshed out.

4. Research service snapshot responsibility -- Exactly what data gets included in the publish event per clinical area (which tables, which dependencies) needs definition and buy-in from Grace and Scott.

5. Package inventory -- Full list of output packages requiring generation and shadow testing (beyond VantageRx and CMCD) needs to be pulled from Drew/Brian Retz.

6. TTL/TriG incremental generation -- Current scripts do full regeneration. Incremental delta generation is needed for the near-real-time model. George has a parallel effort to automate this; Michael has a session scheduled.

7. Permission model -- Who can trigger a publish needs to be defined and implemented in the Research Web App layer.

8. Use case stress tests -- Michael and Chandra to run specific clinical area scenarios through the design to validate no overlooked edge cases before PDRC review.

9. SPA vs. traditional web app -- Technology decision for Research Web App UI not finalized.

10. PDRC baseline -- Design review and committee sign-off pending completion of the above.
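
SKETCH: OBJECT STORE REFERENCE PATTERN (OUTSTANDING ITEM 1)

The decision on the payload size limit is still pending. The sketch below only illustrates what the object store reference option could look like. It is not an agreed design. Small snapshots travel inline in the publish event; larger ones are written to an object store and the event carries only a key. All names here (InMemoryObjectStore, build_publish_event, resolve_publish_event, the event fields) are hypothetical, the in-memory dict stands in for a real bucket, and the 1,000,000-byte default is an assumed reading of the 1MB cap.

```python
import hashlib
import json
from typing import Dict


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store bucket (assumption, not an OCI client)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Content-addressed key: identical snapshots map to the same object.
        key = hashlib.sha256(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


def build_publish_event(
    snapshot: dict,
    store: InMemoryObjectStore,
    max_inline_bytes: int = 1_000_000,  # assumed value for the stream cap
) -> dict:
    """Return an inline event if it fits under the cap, else a reference event."""
    inline_event = {"kind": "inline", "payload": snapshot}
    encoded = json.dumps(inline_event, sort_keys=True).encode("utf-8")
    if len(encoded) <= max_inline_bytes:
        return inline_event

    # Too large for the stream: park the snapshot in the store, send only the key.
    body = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    key = store.put(body)
    return {"kind": "reference", "object_key": key, "size_bytes": len(body)}


def resolve_publish_event(event: dict, store: InMemoryObjectStore) -> dict:
    """Consumer side: return the snapshot whether it was inlined or referenced."""
    if event["kind"] == "inline":
        return event["payload"]
    return json.loads(store.get(event["object_key"]).decode("utf-8"))
```

With this shape, consumers call resolve_publish_event and do not need to know which path the producer took. The trade-off is an extra read against the store for large payloads, plus a retention policy for stored objects, which this sketch does not address.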
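
SKETCH: TASK-LEVEL DEPENDENCIES (DESIGN AREA 3)

Design area 3 states that batch jobs have task-level dependencies, with validation completing before export. The sketch below illustrates that ordering rule locally using Python's standard graphlib module. It does not use or model the OCI Batch API. The function name run_tasks, the status strings, and the task names in the usage note are hypothetical; only the "validation before export" edge comes from the notes.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_tasks(
    tasks: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
) -> Dict[str, str]:
    """Run tasks in dependency order.

    A task runs only if every prerequisite finished with status "ok";
    otherwise it is marked "skipped". A task that raises is marked "failed".
    """
    graph = {name: set(depends_on.get(name, set())) for name in tasks}

    # Reject dependencies on tasks that were never defined.
    unknown = {dep for deps in graph.values() for dep in deps} - set(tasks)
    if unknown:
        raise ValueError(f"unknown prerequisite tasks: {sorted(unknown)}")

    status: Dict[str, str] = {}
    # static_order() yields each task after all of its prerequisites
    # and raises graphlib.CycleError if the dependencies form a cycle.
    for name in TopologicalSorter(graph).static_order():
        if any(status[dep] != "ok" for dep in graph[name]):
            status[name] = "skipped"
            continue
        try:
            tasks[name]()
            status[name] = "ok"
        except Exception:
            status[name] = "failed"
    return status
```

For example, calling run_tasks with tasks named "validate" and "export" and depends_on = {"export": {"validate"}} runs validation first, and marks export as "skipped" if validation raises. How skipped or failed tasks would feed the DLQ and the researcher dashboard is outside this sketch.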