Proposal: All Tables to SKG
Updated 2026-03-09 16:19
What is our goal?
Speed or quality? Prioritize each on a scale of 1-10
Multiple Databases: GD (GR + GE + MedSupplies)
Product Consistency: multi-model distribution files, RDF, etc.
MODELING DATA:
RDF does not get tool tables or Xref tables
Table and Column Names (Semantic Renames)
NOT FOR DISTRIBUTION
Tables: global_drug_name_distributed
IDs: GDN:drug_name_id (changes over time)
MODEL OPTIMIZATION (PERFORMANCE)
Optimized for read not write
Highly normalized for WRITE, denormalized for READ
USABILITY:
Full: highly complex
All tables: too many hops (performance; SKG 4 optional only)
PRODUCT VIEWS
Replace NDC identifiers for certain countries (e.g., SNOMED for UK, country-specific codes for ES)
Certain countries get certain data, others do not (filtered)
All tables, send country A to country etc.
Licensing of Data divides what can be in a specific country
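One way to picture the product-view idea (filter by country licensing, then swap NDC for the country's own identifier) is the minimal sketch below. Everything in it is hypothetical: the COUNTRY_RULES table, the source tags ("core", "us_only"), the placeholder identifier values, and the choice to drop a record rather than fall back to NDC when no local code exists.

```python
# Hypothetical per-country rules: which licensed data sources a country may
# receive, and which identifier system replaces NDC there (None = keep NDC).
COUNTRY_RULES = {
    "US": {"licensed": {"core", "us_only"}, "id_system": None},
    "UK": {"licensed": {"core"}, "id_system": "snomed"},
}

# Hypothetical source records; identifier values are placeholders.
records = [
    {"ndc": "0000-0000-01", "snomed": "S-1", "source": "core", "name": "drug A"},
    {"ndc": "0000-0000-02", "snomed": None, "source": "us_only", "name": "drug B"},
]


def product_view(records, country):
    """Build a country-specific view: filter by licensing, then swap identifiers."""
    rules = COUNTRY_RULES[country]
    id_system = rules["id_system"] or "ndc"
    view = []
    for rec in records:
        if rec["source"] not in rules["licensed"]:
            continue  # licensing excludes this record for this country
        if rec.get(id_system) is None:
            continue  # no code in the country's system; drop it rather than fall back to NDC
        view.append({"id": rec[id_system], "id_system": id_system, "name": rec["name"]})
    return view


print(product_view(records, "UK"))
# [{'id': 'S-1', 'id_system': 'snomed', 'name': 'drug A'}]
```

Keeping the rules as data rather than code means a new country is a new entry in the table, not a new view definition.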
*** ask AI
What Has to Change Going from Relational to RDF
The question "why not just convert everything" is reasonable on the surface. The answer is that RDF is not a storage format -- it's a modeling paradigm. The conversion isn't mechanical; it requires rethinking how data is structured, identified, and queried.
1. Identity: surrogate keys become IRIs
Relational models use surrogate keys (integers, sequences) that are meaningless outside the database. RDF requires globally unique identifiers: every entity needs an IRI, and Linked Data practice further recommends that those IRIs be dereferenceable. This means designing a URI scheme, deciding what constitutes a stable identity, and handling cases where two tables represent the same real-world entity. This is non-trivial for pharmaceutical data, where the same drug concept may exist across 6 normalized tables.
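A minimal sketch of minting IRIs from a stable business key instead of a surrogate key. The base namespace (https://example.org/id/), the shared concept_code column, and the table/column names are all hypothetical; the point is only that two rows with unrelated surrogate keys resolve to one IRI when they share a stable identity.

```python
from urllib.parse import quote

# Hypothetical base namespace; the real URI scheme is a design decision.
BASE = "https://example.org/id/"


def mint_iri(entity_type: str, stable_key: str) -> str:
    """Build an IRI from an entity type and a stable business key.

    The surrogate key is deliberately not used: it is meaningless outside the
    source database and may change when data is reloaded.
    """
    return f"{BASE}{entity_type}/{quote(stable_key, safe='')}"


# Rows from two hypothetical tables with unrelated surrogate keys that share
# a stable concept code, i.e. they describe the same real-world drug.
name_row = {"drug_name_id": 101, "concept_code": "D-0042", "name": "ibuprofen"}
class_row = {"drug_class_id": 9317, "concept_code": "D-0042", "class": "NSAID"}

iri_a = mint_iri("drug", name_row["concept_code"])
iri_b = mint_iri("drug", class_row["concept_code"])
print(iri_a)           # https://example.org/id/drug/D-0042
print(iri_a == iri_b)  # True
```

The percent-encoding (quote with safe='') keeps keys containing spaces or slashes from breaking the IRI path.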
2. Normalization logic inverts
3NF/BCNF normalization eliminates redundancy by splitting data across tables with foreign keys. RDF flattens this into subject-predicate-object triples. A join you write once in SQL becomes a property traversal in SPARQL. The modeling decision shifts from "how do I avoid duplication" to "what are the semantic relationships between entities." These are different design questions with different answers.
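To make "a join becomes a property traversal" concrete, the sketch below answers the same question twice: once with a two-join SQL query over a bridge table (using the standard-library sqlite3 module), and once by following edges in a set of triples. The ex: prefix, the ex:treats predicate, and the tiny in-memory schema are hypothetical, and the plain-Python traversal stands in for what a SPARQL engine would do.

```python
import sqlite3

# Relational side: normalized tables linked through a bridge table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE drug (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE indication (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE drug_indication (drug_id INTEGER, indication_id INTEGER);
INSERT INTO drug VALUES (1, 'ibuprofen');
INSERT INTO indication VALUES (10, 'pain'), (11, 'fever');
INSERT INTO drug_indication VALUES (1, 10), (1, 11);
""")
sql_result = sorted(row[0] for row in con.execute("""
    SELECT i.label FROM drug d
    JOIN drug_indication di ON di.drug_id = d.id
    JOIN indication i ON i.id = di.indication_id
    WHERE d.name = 'ibuprofen'
"""))

# Graph side: the same facts as (subject, predicate, object) triples.
# The bridge table is gone; each of its rows is now one ex:treats edge.
triples = {
    ("ex:ibuprofen", "ex:treats", "ex:pain"),
    ("ex:ibuprofen", "ex:treats", "ex:fever"),
    ("ex:pain", "rdfs:label", "pain"),
    ("ex:fever", "rdfs:label", "fever"),
}


def objects(graph, subject, predicate):
    """Follow one property edge: every o such that (subject, predicate, o) is in graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Roughly: SELECT ?label WHERE { ex:ibuprofen ex:treats ?i . ?i rdfs:label ?label }
graph_result = sorted(
    label
    for ind in objects(triples, "ex:ibuprofen", "ex:treats")
    for label in objects(triples, ind, "rdfs:label")
)
print(sql_result, graph_result)  # ['fever', 'pain'] ['fever', 'pain']
```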
3. Schema becomes ontology
A relational schema defines structure: column names, types, constraints. An RDF ontology defines meaning: classes, properties, domain/range restrictions, subclass hierarchies. Columns with the same data type but different semantics (e.g., start_date on a prescription vs. start_date on a clinical trial) must be explicitly differentiated as distinct properties or mapped to shared ones intentionally.
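A minimal sketch of the start_date example: the mapping from (table, column) to property is where the semantic decision is recorded. The property names ex:prescriptionStartDate and ex:trialStartDate and the row shapes are hypothetical.

```python
# Hypothetical column-to-property mapping: two columns share a name and SQL
# type, but the ontology gives each a distinct property so their meanings
# are not conflated.
COLUMN_TO_PROPERTY = {
    ("prescription", "start_date"): "ex:prescriptionStartDate",
    ("clinical_trial", "start_date"): "ex:trialStartDate",
}


def row_to_triples(table, subject_iri, row):
    """Emit one triple per mapped column; unmapped columns are skipped."""
    triples = set()
    for column, value in row.items():
        prop = COLUMN_TO_PROPERTY.get((table, column))
        if prop is not None:
            triples.add((subject_iri, prop, value))
    return triples


rx = row_to_triples("prescription", "ex:rx/1", {"start_date": "2024-01-05", "dose": "10 mg"})
trial = row_to_triples("clinical_trial", "ex:trial/7", {"start_date": "2024-01-05"})
print(rx)     # {('ex:rx/1', 'ex:prescriptionStartDate', '2024-01-05')}
print(trial)  # {('ex:trial/7', 'ex:trialStartDate', '2024-01-05')}
```

Pointing both entries at one shared property would be the "mapped to shared ones intentionally" alternative; either way, the choice is explicit in the mapping rather than implied by a column name.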
4. Null handling changes completely
Relational nulls represent unknown or inapplicable values and are handled with IS NULL checks. RDF has no nulls: a missing value is simply a missing triple, and the absence of a triple means only that the fact is not asserted, not that it is false. This is the Open World Assumption (OWA) vs. Closed World Assumption (CWA) distinction. Queries that rely on "this field is null, therefore this condition applies" will not translate correctly without explicit redesign.
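The sketch below shows one NULL-dependent rule surviving conversion only through redesign. It assumes a hypothetical product table where a NULL discontinued_date is read as "still marketed", and a hypothetical ex:marketingStatus property that states that fact explicitly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product (code TEXT PRIMARY KEY, discontinued_date TEXT);
INSERT INTO product VALUES ('P1', NULL), ('P2', '2023-06-30');
""")

# Relational reading: a NULL discontinued_date is taken to mean "still marketed".
active_sql = [r[0] for r in con.execute(
    "SELECT code FROM product WHERE discontinued_date IS NULL ORDER BY code")]

rows = list(con.execute("SELECT code, discontinued_date FROM product ORDER BY code"))

# Direct conversion: a NULL yields no triple, so P1 simply has no date.
# Nothing in the graph says P1 is marketed; the fact is just not asserted.
direct = {(f"ex:{code}", "ex:discontinuedDate", disc) for code, disc in rows if disc is not None}

# Explicit redesign: state the fact the NULL was standing in for.
status = {
    (f"ex:{code}", "ex:marketingStatus", "ex:Marketed" if disc is None else "ex:Discontinued")
    for code, disc in rows
}
active_graph = sorted(s for s, p, o in status if o == "ex:Marketed")
print(active_sql, active_graph)  # ['P1'] ['ex:P1']
```

The redesign moves the closed-world interpretation of the NULL out of every query and into a single, visible conversion rule.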
5. Query semantics shift
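SQL is set-based and operates under the CWA: if a fact is not in the database, it is treated as false. SPARQL pattern matching is also evaluated only against the triples actually present (FILTER NOT EXISTS and MINUS behave much like SQL anti-joins), but RDF and OWL semantics are open-world: a missing triple does not make a statement false, and merging another source or enabling inference can add the very fact a negative query relied on being absent. Queries that do negative inference in SQL (NOT EXISTS, outer joins checking for absence) therefore need redesign, not just translation.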
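A small illustration of why a negative answer is only "not asserted here": the function below mimics a FILTER NOT EXISTS query over plain Python triples, and its answer changes once a second source is merged in. The drugs, sources, and the ex:interactsWith predicate are hypothetical placeholders.

```python
# Two hypothetical sources, each a set of (subject, predicate, object) triples.
source_a = {
    ("ex:drugA", "rdf:type", "ex:Drug"),
    ("ex:drugB", "rdf:type", "ex:Drug"),
    ("ex:drugA", "ex:interactsWith", "ex:drugC"),
}
source_b = {
    ("ex:drugB", "ex:interactsWith", "ex:drugC"),
}


def no_known_interaction(graph, other):
    """Analogue of SPARQL, with ?other bound to `other`:
    SELECT ?d WHERE { ?d rdf:type ex:Drug FILTER NOT EXISTS { ?d ex:interactsWith ?other } }
    i.e. drugs with no interaction triple present *in this graph*."""
    drugs = {s for s, p, o in graph if p == "rdf:type" and o == "ex:Drug"}
    return {d for d in drugs if (d, "ex:interactsWith", other) not in graph}


print(no_known_interaction(source_a, "ex:drugC"))             # {'ex:drugB'}
print(no_known_interaction(source_a | source_b, "ex:drugC"))  # set()
```

Merging RDF graphs is just a set union of triples, which is why a result like {'ex:drugB'} should be read as "no interaction asserted in source_a", not "no interaction".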
6. Many-to-many relationships lose their join tables
Associative/bridge tables (drug-indication mappings, drug-drug interactions) exist in relational models as implementation artifacts. In RDF, a bridge table with no columns of its own collapses into a single triple per row (drug treats indication). When the relationship itself has properties, such as an interaction's severity, it needs its own node (the n-ary relation pattern), reification (or RDF-star), or named graphs. This is often where the most semantic richness lives in pharmaceutical data, and it requires the most careful modeling.
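A minimal sketch of both cases, assuming hypothetical bridge-table rows and hypothetical predicates (ex:treats, ex:participant, ex:severity). The attribute-free table becomes direct edges; the interaction table, which carries a severity, gets an intermediate node so the severity has a subject to attach to.

```python
# Hypothetical bridge-table rows.
drug_indication = [("drugA", "pain"), ("drugA", "fever")]  # (drug, indication)
drug_interaction = [("drugA", "drugB", "major")]            # (drug, drug, severity)


def plain_bridge(rows, predicate):
    """A bridge table with no columns of its own: one triple per row."""
    return {(f"ex:{s}", predicate, f"ex:{o}") for s, o in rows}


def attributed_bridge(rows):
    """A relationship with its own attribute gets its own node (n-ary relation
    pattern), giving the attribute a subject to attach to."""
    triples = set()
    for a, b, severity in rows:
        node = f"ex:interaction/{a}-{b}"  # hypothetical IRI scheme for the relationship
        triples |= {
            (node, "rdf:type", "ex:DrugInteraction"),
            (node, "ex:participant", f"ex:{a}"),
            (node, "ex:participant", f"ex:{b}"),
            (node, "ex:severity", severity),
        }
    return triples


graph = plain_bridge(drug_indication, "ex:treats") | attributed_bridge(drug_interaction)
print(len(graph))  # 6
```

This sketch picks the n-ary pattern only because it needs nothing beyond plain triples; reification or named graphs are the other options the text mentions, and the choice affects how queries over the relationship are written.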
7. Inference and reasoners are now in scope
Relational systems return what is stored: apart from views and queries you write explicitly, nothing new is derived. RDF + OWL enables inference: a reasoner can derive facts that are not explicitly stored. This is a capability, but also a risk surface. If you convert tables without understanding what inferences become possible, you may get results that are technically correct by the ontology but wrong for your application.
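The risk is easy to reproduce with just two RDFS entailment rules (rdfs2 for domain, rdfs9 for subclass typing). The tiny forward-chainer below is a sketch, not a real reasoner, and the ontology and data are hypothetical: ex:dose was intended only for prescriptions, but a converted product row reuses it, so the product gets typed as a prescription.

```python
def rdfs_closure(triples):
    """Forward-chain two RDFS entailment rules to a fixed point:
    rdfs2: p rdfs:domain C  and  x p y              => x rdf:type C
    rdfs9: C rdfs:subClassOf D  and  x rdf:type C   => x rdf:type D
    """
    g = set(triples)
    while True:
        new = set()
        for s, rel, o in g:
            if rel == "rdfs:domain":  # s is a property, o its domain class
                new |= {(x, "rdf:type", o) for x, p, _ in g if p == s}
            elif rel == "rdfs:subClassOf":  # s is a subclass of o
                new |= {(x, "rdf:type", o) for x, p, c in g if p == "rdf:type" and c == s}
        if new <= g:
            return g  # nothing new: fixed point reached
        g |= new


# Hypothetical ontology: ex:dose was meant only for prescriptions.
ontology = {
    ("ex:dose", "rdfs:domain", "ex:Prescription"),
    ("ex:Prescription", "rdfs:subClassOf", "ex:ClinicalDocument"),
}
# A converted product row that reuses ex:dose for a labelled strength.
data = {("ex:product/42", "ex:dose", "200 mg")}

inferred = rdfs_closure(ontology | data) - (ontology | data)
print(sorted(inferred))
# [('ex:product/42', 'rdf:type', 'ex:ClinicalDocument'), ('ex:product/42', 'rdf:type', 'ex:Prescription')]
```

Both inferred triples are valid under RDFS semantics; the error lies in reusing ex:dose, and nothing in the converted tables would have flagged it.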
The honest summary for your colleagues: conversion is doable table-by-table with R2RML, but "converting tables" produces RDF-shaped relational data, not a knowledge graph. The value of RDF comes from the remodeling work, not the format change.
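What "RDF-shaped relational data" looks like can be sketched with a naive table-by-table converter. This is not R2RML; it loosely follows the IRI conventions of the W3C Direct Mapping (row nodes named table/pk=value, predicates named table#column), simplified, and the drug_indication table is hypothetical. The bridge row survives as its own node holding integer foreign keys, so the join has not gone away.

```python
import sqlite3


def direct_map(con, table, pk):
    """Naive table-by-table conversion, loosely modeled on the W3C Direct Mapping
    (simplified): one node per row named from the table and its key, an
    rdf:type triple for the table, and one predicate per column."""
    cur = con.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    cols = [d[0] for d in cur.description]
    triples = set()
    for row in cur:
        rec = dict(zip(cols, row))
        subject = f"<{table}/{pk}={rec[pk]}>"
        triples.add((subject, "rdf:type", f"<{table}>"))
        for col, val in rec.items():
            if val is not None:  # NULLs produce no triple
                triples.add((subject, f"<{table}#{col}>", val))
    return triples


con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE drug_indication (id INTEGER PRIMARY KEY, drug_id INTEGER, indication_id INTEGER);
INSERT INTO drug_indication VALUES (1, 1, 10), (2, 1, 11);
""")
triples = direct_map(con, "drug_indication", "id")
print(len(triples))  # 8
```

Every triple here is syntactically valid RDF, but the surrogate keys, column-named predicates, and bridge-row nodes are the relational model carried over unchanged.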