Healthcare data is messy. Between HL7 v2 messages, CDA documents, proprietary EHR exports, and CSV flat files from legacy systems, building a reliable data pipeline in healthcare feels like assembling furniture from five different stores with no shared instruction manual.
Enter FHIR (Fast Healthcare Interoperability Resources). Over the past two years, I have been building FHIR-native data pipelines for clinical and regulatory use cases in Switzerland, and it has fundamentally changed how I think about healthcare data engineering.
Why FHIR Changes the Game
Traditional healthcare data pipelines spend most of their complexity budget on translation: converting between formats, mapping terminologies, and reconciling schemas that were never designed to talk to each other. FHIR flips this by providing a universal resource model that both humans and machines can understand.
The best data pipeline is one where you spend more time on business logic than on format translation.
FHIR resources like Patient, Observation, MedicationRequest, and Encounter map directly to clinical concepts. This means your pipeline code reads like a description of the domain, not a maze of transformation rules.
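To make that concrete, here is a hand-written Observation for a heart-rate measurement. The values, the patient reference, and the choice of code are illustrative only; a real resource would conform to the Swiss profiles discussed below.

```javascript
// A minimal FHIR Observation for a heart-rate measurement.
// Hand-written for illustration; values and references are made up.
const observation = {
  resourceType: "Observation",
  status: "final",
  code: {
    coding: [
      {
        system: "http://loinc.org",
        code: "8867-4", // LOINC code for heart rate
        display: "Heart rate",
      },
    ],
  },
  subject: { reference: "Patient/example" },
  valueQuantity: {
    value: 72,
    unit: "beats/minute",
    system: "http://unitsofmeasure.org",
    code: "/min", // UCUM unit
  },
};
```

The resource reads like the clinical statement it encodes: the referenced patient has a heart rate of 72 beats per minute, coded in LOINC and UCUM.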
Architecture: A FHIR-Native Pipeline
Here is the high-level architecture I have been using for FHIR-based pipelines:
- Ingestion Layer — Accept data in any format (HL7 v2, CSV, CDA) and convert to FHIR resources using HAPI FHIR or custom mappers.
- Validation Layer — Validate resources against FHIR profiles (StructureDefinitions) specific to the Swiss healthcare context.
- Transformation Layer — Apply business rules, enrich with terminology services (SNOMED CT, LOINC), and link related resources into Bundles.
- Storage Layer — Persist to a FHIR server (like HAPI JPA) or a data lake with Parquet files partitioned by resource type.
- Serving Layer — Expose data via FHIR REST API or feed downstream analytics systems.
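The layers above can be sketched as a single composable flow. Every function name here is hypothetical, standing in for real mappers, validators, terminology clients, and stores; it is a sketch of the shape, not a working pipeline.

```javascript
// Sketch of the layered pipeline; all functions in `deps` are
// hypothetical stand-ins injected by the caller.
async function runPipeline(rawRecord, deps) {
  const resource = deps.toFhir(rawRecord);       // Ingestion: any format -> FHIR
  const report = await deps.validate(resource);  // Validation: against Swiss profiles
  if (!report.valid) {
    return { rejected: true, errors: report.errors };
  }
  const enriched = await deps.enrich(resource);  // Transformation: terminology, linking
  await deps.store(enriched);                    // Storage: FHIR server or data lake
  return { rejected: false, resource: enriched }; // Serving happens downstream
}
```

Injecting the layer implementations keeps each one independently testable and swappable, which matters when the ingestion formats keep changing.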
The Validation Challenge
The hardest part is not building the pipeline; it is validation. Swiss healthcare has its own FHIR profiles (the CH Core implementation guides maintained by HL7 Switzerland), and ensuring every resource conforms to them without sacrificing throughput requires careful design.
We solved this by implementing a two-pass validation strategy:
- Fast pass: Structural validation using JSON Schema — catches 80% of issues in milliseconds.
- Deep pass: Full profile validation using the FHIR validator — catches terminology bindings, invariants, and cross-resource constraints.
```javascript
// Simplified two-pass validation. jsonSchemaValidate, fhirValidator,
// and getSwissProfile are provided by the surrounding pipeline.
async function validateResource(resource) {
  // Fast structural check
  const structuralErrors = jsonSchemaValidate(resource);
  if (structuralErrors.length > 0) {
    return { valid: false, errors: structuralErrors, pass: 'structural' };
  }

  // Deep profile validation (only runs if the structural pass succeeds)
  const profileErrors = await fhirValidator.validate(
    resource,
    getSwissProfile(resource.resourceType)
  );
  return { valid: profileErrors.length === 0, errors: profileErrors, pass: 'profile' };
}
```
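For a sense of what the fast pass catches, here is a hand-rolled stand-in for jsonSchemaValidate. In production this would be a compiled JSON Schema validator; the checks below are only a sample of the structural rules.

```javascript
// Illustrative stand-in for the fast structural pass; a real
// implementation would compile a JSON Schema per resource type.
function jsonSchemaValidate(resource) {
  if (typeof resource !== "object" || resource === null) {
    return ["resource must be a JSON object"];
  }
  const errors = [];
  if (typeof resource.resourceType !== "string") {
    errors.push("missing or non-string resourceType");
  }
  // FHIR ids are restricted to letters, digits, '-' and '.', max 64 chars
  if ("id" in resource && !/^[A-Za-z0-9\-.]{1,64}$/.test(resource.id)) {
    errors.push("id does not match the FHIR id pattern");
  }
  return errors;
}
```

Checks like these are cheap enough to run on every message, which is what lets the expensive profile validator stay out of the hot path for obviously malformed input.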
Lessons Learned
After two years of building FHIR pipelines, here are the key takeaways:
- Start with profiles, not code. Define your FHIR profiles first. They are your contract and your documentation.
- Invest in terminology services. A good terminology server (like Ontoserver) pays for itself in reduced mapping errors.
- Bundle wisely. FHIR Bundles are powerful but can become unwieldy. Keep them focused on a single clinical event or workflow.
- Monitor data quality continuously. Healthcare data quality degrades silently. Build dashboards that track validation pass rates, terminology coverage, and resource completeness.
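The monitoring point can be made concrete with a small metric function over per-resource validation results. The input shape mirrors what validateResource returns above; the metric names are otherwise illustrative.

```javascript
// Aggregate per-resource validation results into dashboard metrics.
// Field names are illustrative, not a real monitoring API.
function qualityMetrics(results) {
  const total = results.length;
  const passed = results.filter((r) => r.valid).length;
  const failuresByPass = {};
  for (const r of results) {
    if (!r.valid) {
      // Count where in the two-pass flow each failure occurred
      failuresByPass[r.pass] = (failuresByPass[r.pass] || 0) + 1;
    }
  }
  return {
    validationPassRate: total === 0 ? 1 : passed / total,
    failuresByPass,
  };
}
```

Tracking the pass rate per validation stage, not just overall, tells you whether failures are cheap structural problems at the source or deeper terminology and profile issues.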
What is Next
The Swiss healthcare ecosystem is moving rapidly toward FHIR adoption, driven by the EPD (Electronic Patient Dossier) mandate. I am excited about the convergence of FHIR with modern data engineering tools — imagine dbt models that speak FHIR, or Spark jobs that process Bundles natively.
If you are building healthcare data infrastructure and want to chat about FHIR pipelines, feel free to reach out. I am always happy to exchange ideas with fellow data engineers navigating the healthcare interoperability landscape.