Post FHIR Datathon impressions
April 6, 2017
On March 26, HL7 and AMIA teamed up to put on a new type of event, the first-ever FHIR Datathon. As I mentioned in my last post, it was something like a FHIR Connectathon, but the target audience was a little different. Instead of developers working on their applications’ support for FHIR, this event was intended for clinical informaticists looking to learn how FHIR could be used to perform analytics.
The concepts and tools we covered at the event are quite useful and should be of interest to a lot of people, so I thought I would include a synopsis of them here.
SyntheticMass – A “realistic” synthetic patient clinical data set
Our friends at the MITRE Corporation have been working on a synthetic patient population simulator. If you have worked in healthcare IT, you know that one of the challenges to testing out a proof of concept is having a set of test data to play with. This challenge is complicated by very important and necessary HIPAA rules around data use, security, and privacy. The creation of a realistic synthetic data set that is compiled through the use of specially tuned algorithms is one approach that shows a lot of promise. To address the challenge, MITRE has created SyntheticMass, an open-source, simulated Health Information Exchange (HIE) populated with one million realistic “synthetic residents” of Massachusetts. The data is free from cost, privacy, and security restrictions. It can be used without restriction for a variety of secondary uses in academia, research, industry, and government.
SyntheticMass even provides a FHIR-based API layer that can be used to access the data stored within it. We used this FHIR server extensively during the Datathon. The server endpoint can be found here.
clinFHIR – A tool to play with FHIR
One of the challenges in working with any standard is accessibility for people who are just getting started or are not developers. While FHIR is written to be easily understood by the implementer, there is still a bit to learn to get started, especially for those who don’t program. This was certainly a challenge we had going into the Datathon. A good portion of the attendees had never used FHIR, and many had never used REST or JSON, at least not knowingly.
Fortunately, we had a tool that made FHIR much more accessible. clinFHIR, developed by a great industry colleague, David Hay, is an educational tool that allows people to create or search for FHIR-based resources and link them to tell a clinical story. It is meant to help those not yet familiar with FHIR understand what it is and how it can be used. clinFHIR is a front-end tool that can connect to any number of FHIR servers to explore existing data and even store new data.
During the Datathon we pointed clinFHIR at the SyntheticMass FHIR server, which let participants start exploring the data through a point-and-click interface. I highly recommend this tool, particularly if you are just starting out with FHIR and would like to learn how to explore and build resources.
FHIR in support of analytics
One of the challenges we ran into when using FHIR for analytics is its querying capability. FHIR provides a great method to access, add, update, and perform other actions on data that was previously locked away in silos. However, as we started to learn during the Datathon, there are some limitations when it comes to supporting the complex query logic you need for data analytics. FHIR largely provides querying capabilities based on a single type of resource at a time. There are ways to query for resources that have a referential link to other resources that fit certain requirements, but this can only be taken so far.
For example, I can query for patients who have a MedicationRequest for a particular medication, say Prednisone, like so: https://syntheticmass.mitre.org/fhir/Patient?_has:MedicationRequest:patient:code=312617
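A minimal sketch of how that reverse-chained (`_has`) URL can be assembled in Python. The helper name `reverse_chain_url` and its arguments are my own illustration, not part of any FHIR library, and the SyntheticMass endpoint is the one quoted above, which may no longer be live. The code only builds the URL; it does not call the server.

```python
from urllib.parse import urlencode

# Endpoint as quoted in this post; assumed here, it may have moved since.
BASE_URL = "https://syntheticmass.mitre.org/fhir"


def reverse_chain_url(base_url, target_type, source_type, reference_param,
                      search_param, value):
    """Build a search for target_type resources that are referenced by a
    source_type resource matching search_param=value (FHIR _has)."""
    key = f"_has:{source_type}:{reference_param}:{search_param}"
    # safe=":" leaves the colons in the _has key unescaped for readability.
    query = urlencode({key: value}, safe=":")
    return f"{base_url}/{target_type}?{query}"


# Patients referenced (via the "patient" search parameter) by a
# MedicationRequest whose code is 312617.
url = reverse_chain_url(BASE_URL, "Patient", "MedicationRequest",
                        "patient", "code", "312617")
print(url)
```

Building the key from its four parts makes the structure of `_has` easier to see: the referring resource type, the search parameter that points back at the patient, and the parameter being tested on the referring resource.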
However, if I only want to retrieve patients prescribed that medication in the last two years, I cannot simply use this query:
What this query will give me is a set of patients having a MedicationRequest with the specified code or a MedicationRequest that has a dateWritten after the specified date; the two conditions are not required to hold for the same MedicationRequest. What I really want is the set of patients for whom a single MedicationRequest has both that code and a dateWritten after the specified date. There does not seem to be a simple way of doing this in a single query for patients. You have to take a different approach. In this instance you would have to query the MedicationRequest resources directly and include the patients: https://syntheticmass.mitre.org/fhir/MedicationRequest?code=312617&datewritten=gt2015-03-26&_include=MedicationRequest:patient
(Note: At the time of this post, SyntheticMass supported version 1.8 of FHIR STU3. In the final release of STU3, dateWritten was changed to authoredOn.)
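A search that uses `_include` returns one Bundle containing both the matching MedicationRequests and the Patients they point to, so the client has to separate the two. The sketch below shows one way to do that in Python. The bundle is a hand-written, heavily trimmed stand-in for a server response (the ids and dates are made up), and the function name `split_bundle` is hypothetical.

```python
def split_bundle(bundle):
    """Separate matched MedicationRequests from the Patients that
    _include pulled into the same search Bundle."""
    requests = []
    patients = {}
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        kind = resource.get("resourceType")
        if kind == "MedicationRequest":
            requests.append(resource)
        elif kind == "Patient":
            # Key by id so a patient with several requests appears once.
            patients[resource["id"]] = resource
    return requests, patients


# Illustrative data only: two requests for the same patient.
sample_bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "MedicationRequest", "id": "mr1"},
         "search": {"mode": "match"}},
        {"resource": {"resourceType": "MedicationRequest", "id": "mr2"},
         "search": {"mode": "match"}},
        {"resource": {"resourceType": "Patient", "id": "p1"},
         "search": {"mode": "include"}},
    ],
}

requests, patients = split_bundle(sample_bundle)
print(len(requests), sorted(patients))
```

A real result set may span several pages, in which case the same function would be applied to each page in turn.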
It may be possible to get around the issue in this simple example, but you can see how that might not be possible with more complex requirements. One such example is finding a cohort of patients who have been diagnosed with asthma, are on a corticosteroid, and have an FEV1/FVC observation result of less than 80%. In such cases, it would be necessary to gather the data through a set of queries, one for each requirement, and then perform some additional processing locally to find the intersection of patients who meet all the requirements.
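The local processing step can be sketched in Python as a set intersection over patient ids. Everything here is illustrative: the three lists stand in for the results of three separate searches, the resources are trimmed to the fields the code reads, and I am assuming each resource points at its patient through a `subject` reference, which may differ by resource type and FHIR version.

```python
def patient_ids(resources, field="subject"):
    """Collect patient ids from references such as 'Patient/p2'.
    The reference field name is an assumption; adjust it per resource."""
    ids = set()
    for resource in resources:
        reference = resource.get(field, {}).get("reference", "")
        if reference.startswith("Patient/"):
            ids.add(reference.split("/", 1)[1])
    return ids


def below_threshold(observations, threshold=80):
    """Keep observations whose numeric value is under the threshold."""
    return [o for o in observations
            if o.get("valueQuantity", {}).get("value", threshold) < threshold]


def ref(pid):
    return {"reference": f"Patient/{pid}"}


# Made-up results of three independent queries.
asthma_conditions = [{"resourceType": "Condition", "subject": ref(p)}
                     for p in ("p1", "p2", "p3")]
steroid_requests = [{"resourceType": "MedicationRequest", "subject": ref(p)}
                    for p in ("p2", "p3", "p4")]
ratio_observations = [
    {"resourceType": "Observation", "subject": ref("p1"), "valueQuantity": {"value": 60}},
    {"resourceType": "Observation", "subject": ref("p2"), "valueQuantity": {"value": 72}},
    {"resourceType": "Observation", "subject": ref("p3"), "valueQuantity": {"value": 85}},
]

# A patient is in the cohort only if they appear in all three result sets.
cohort = (patient_ids(asthma_conditions)
          & patient_ids(steroid_requests)
          & patient_ids(below_threshold(ratio_observations)))
print(sorted(cohort))
```

With this sample data only p2 satisfies all three criteria: p1 has no corticosteroid request, p3 has a ratio of 85, and p4 has no asthma diagnosis.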
More complex queries are possible in FHIR through the use of the _filter parameter (https://www.hl7.org/fhir/search_filter.html); however, very few systems robustly support it currently. Taken to its furthest extent, complex query logic is somewhat beyond the scope of FHIR, and other approaches are being developed. I describe one of them below. As we continue to explore the use of FHIR for data analytics, we will take note of our lessons learned and continue to improve the standard.
Apache Drill – Direct clinical BI on FHIR
Another approach that we took at the Datathon to address the complex query logic challenge was to employ more traditional analytics tools to analyze a native FHIR resource data store. We used Apache Drill, a SQL query engine for NoSQL data, to directly query FHIR-formatted resources stored in NoSQL data stores. The technique was introduced by Chris Grenz and you can read about it here. This approach appears to be quite powerful and allows one to run common SQL-based queries directly against a data store that holds native FHIR resources. With this technique you could either analyze the data directly in your FHIR server or analyze a separate data store that you keep synchronized with certain resources of interest using the history method (https://www.hl7.org/fhir/http.html#history).
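To make the synchronization idea a little more concrete, here is a rough Python sketch of replaying a history Bundle into a local store that an analytics engine could then read. It is not the method used at the Datathon. It assumes each history entry carries a `request.method`, that delete entries have no resource and identify their target as `Type/id` in `request.url`, and that the Bundle lists the newest change first; the function name `apply_history` and the sample data are made up.

```python
def apply_history(store, history_bundle):
    """Replay a FHIR history Bundle into a dict keyed by 'Type/id'."""
    # Assumed newest-first ordering, so replay from the end of the list.
    for entry in reversed(history_bundle.get("entry", [])):
        request = entry.get("request", {})
        if request.get("method") == "DELETE":
            # Assumes the url is of the form 'Patient/p2'.
            store.pop(request.get("url"), None)
        elif "resource" in entry:
            resource = entry["resource"]
            key = f"{resource['resourceType']}/{resource['id']}"
            store[key] = resource  # create or overwrite with this version
    return store


# Illustrative history: p1 created then updated, p2 created then deleted.
sample_history = {
    "resourceType": "Bundle",
    "type": "history",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1",
                      "meta": {"versionId": "2"}},
         "request": {"method": "PUT", "url": "Patient/p1"}},
        {"request": {"method": "DELETE", "url": "Patient/p2"}},
        {"resource": {"resourceType": "Patient", "id": "p2",
                      "meta": {"versionId": "1"}},
         "request": {"method": "POST", "url": "Patient"}},
        {"resource": {"resourceType": "Patient", "id": "p1",
                      "meta": {"versionId": "1"}},
         "request": {"method": "POST", "url": "Patient"}},
    ],
}

local_store = apply_history({}, sample_history)
print(sorted(local_store))
```

In practice the local store would be whatever Drill is pointed at, for example a directory of JSON files, and the history call would be limited to changes since the last sync.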
In all, I would say that the Datathon was rather successful. We received a lot of great feedback, and quite a few of the participants said that they went in with very little knowledge of FHIR but learned a great deal during the event. I have to say that I learned a bit too. Our efforts do not end here, however. I look forward to other events like this in the future. Perhaps HL7 and AMIA will have a chance to collaborate on something like it again. HL7 is also putting together a new FHIR Connectathon track specifically for Data Analytics. If all goes according to plan, the first one will be at the next HL7 meeting in May in Madrid. See you in Spain!
-Corey Spears, Director of Healthcare Interoperability Standards for Healthcare
- North America