Super Integration Fighter III
There are only three choices when connecting with other systems. Pick your battles wisely.
A Brief Preface from the Author
Hi reader. This is the 1st time we’ve interrupted your reading recently, but 98% of our readers don’t follow up to look at Flexpa. Many think they’ll have a conversation later, but then forget. Today we ask you to protect Health API Guy. All we ask is you reach out to the Flexpa team, or tell your friends at Medicare brokers, fintech apps, healthcare payers, and digital health providers to connect, to secure Health API Guy’s ability to continue to write these articles. We even appreciate you just giving feedback on our developer docs, signing up to play around with a free developer portal account, or trying out the MyFlexpa application to see your claims data and help us improve our product (especially if you’re on a Medicare, Medicaid, or ACA plan or one of the plans that have exposed commercial plan access). I ask you, humbly: Please don’t scroll away. We’ve built a variety of awesome API tools for claims that I’m sure you and your team can benefit from (check out the use cases here or transparent pricing here). If you are already one of the existing Flexpa customers or developers, we warmly thank you.
At a bare minimum, head on over and subscribe to the Flexpa Substack - we’ll be rebooting it shortly with lots of content about claims, payers, APIs, and more.
The best things come in threes. The Great Pyramids. The Wise Men. The rule of 3 in writing. The copious number of trilogies that Hollywood loves to pump out. Blind mice. The Triple Crown. Neapolitan ice cream. The three trust paradigms. The three HIPAA roles. You get the picture.
Today, we explore another healthcare triumvirate - the primary paths (at the most generalized, 10,000-foot view) you can use to integrate with healthcare organizations. We first hinted at these in How to Win Friends and Integrate Systems (which you might consider The Hobbit to this epic Lord of the Rings trio), but we’ll dig a bit deeper into the pros and cons of the approaches. Hell, we’ll even throw in some overlay of how AI might affect each path (AI, so hot right now) and perhaps even mix in a spicy meme or two.
Three integration methods walk into a bar. The bartender says, “What will you be having today?”
Sanctioned Interfaces takes a look at the menu, sees only Coors Light, and says, “Coors Light.”
The bartender looks over at Screenscraper, who had just grabbed a bottle of Patron from the top shelf.
Meanwhile, the CSO bouncer is tossing Direct-To-Database out the door, who “found” some Pappy Van Winkle while rummaging in the basement.
Our allegoric prompt is such - you, the intrepid application, sit in the land of the customer staring at the software system, a byzantine fortress containing riches of data. How best to breach those walls is the adventure ahead.
In the broadest simplification, every piece of software can be broken down into three atomic units with regard to data, stitched together by code implementing myriad business logic and rules:
Databases (system internal) - the places where you store data
User interfaces (human-to-system) - the places where you show data to your users
System interfaces (system-to-system) - the inputs and outputs where you allow data to flow in/out from other systems (non-human users)
If you want to manipulate data from an external ancillary system, you must interact with one of these three components as your primary mechanism for data exchange. Thus we get the three ways to integrate, each with its own advantages and drawbacks:
Native database driver - Software connects to an underlying database without an intermediary layer. Colloquially known as direct-to-database.
Robotic process automation (RPA) - Software pretends to be a user and emulates actions through existing user interfaces. Colloquially known as screen-scraping.
Sanctioned Interfaces - Software uses a system interface to read or write data. The formats and standards vary considerably here.
Native database driver
If a database is where all the data is stored, why not go to the source? The simplest way to read or write as a third-party application is just to perform those operations against the database. This method is how the main software does it - why complicate things?
Indeed, there’s some real firepower and appeal to this approach. The possibilities for what you can do are infinite, limited only by what the database itself can do - any data element is hypothetically available to you, just as it would be to the main software. For groups that don’t need writes but need read access to all the data, like analytics or population health use cases, direct-to-database saves a ton of headaches of backloads and conversions and piecemeal data synchronization, requiring one request to achieve this end state rather than hundreds of thousands or more.
However, the corresponding challenges can be equally daunting. Every database has its own schema, a blueprint that outlines the structure, organization, and relationship between the various components of a database. It defines how data is stored, organized, and accessed in a database. Thus, the first obstacle for direct-to-database connections is getting to the level of knowledge to do what you need (and not muck things up). This task can be hard, as the internal database schema may be poorly documented (or unavailable, given software vendors often view schema as proprietary), forcing developers to intuit or guess at definitions or data relationships. Furthermore, it’s not a given that the schema is even the same between different customers of the same software application due to build customizations, requiring interpretation and lift that may limit the scaling speed, especially for complex software applications.
Compounding this issue is the fact that software changes over time, and the schema changes with it. These changes make direct-to-database connections brittle, and unexpected software patches or vendor upgrades can totally rewrite the beautiful schema you painstakingly reverse-engineered over weeks or months unless you've developed robust monitoring and alerting to inform you when there's an anomaly in your data collection. It's not uncommon for customers to forget to inform startups about these updates, causing system downtime or data corruption.
Direct-to-database has interesting variability in terms of latency. If the system replicates the production database or runs an extensive series of integrity checks when inserting new data, the processing of the reads/writes may be slow. Given major middleware vendors are trumpeting improved write speeds of 30 seconds, it’s clearly not always as real-time as one would hope or need.
These obstacles are all gravy compared to the final boss: the access problem. A direct database connection is an extremely high-trust endeavor. Any Chief Security Officer (CSO) at a customer organization worth their salt wants to minimize exposure by allowing applications to do only what they need to and is not super psyched about giving out unlimited power willy-nilly. It certainly can be possible to negotiate around that (silly CSO, “least privilege" is for nerds), but your application usually needs to have extremely high value or importance to that organization, and your security posture should be rock solid to build that trust. In healthcare, this is especially true, as risk aversion is a logical and ubiquitous tactic thanks to the HIPAA Security Rule. Some applications can limit the perceived risk and make direct-to-database more palatable by connecting to a read-only copy or an analytics database. However, this only works if the workflow does not require write access.
What kind of sucks: customer approval is only 50% of the access problem - your customer’s permission may not be enough, in that control of the underlying database isn’t always in their hands solely. You may need the approval of the vendor, and, unfortunately, some vendors (like larger EHRs such as Epic or Cerner) have worked to remove all native database connections to prevent the risk of downtime or data corruption. Worse, they may blackball your organization for even trying this path against their operational database, which ultimately can be a company killer (see Sansoro and Multiscale Health).
In summary, direct-to-database is simultaneously high upside but also high risk and complexity. It can be viable for smaller organizations using systems with simpler schemas and vendors with low command and control of their database (such as dental EHRs or mid-range outpatient EHRs like eClinicalWorks or NextGen). It requires yet another trio - a highly competent team, a supremely high degree of trust, and a tremendous amount of customer organizational buy-in. If you pursue this path, you will encounter diverse technologies and protocols like:
Open Database Connectivity (ODBC) - a standard interface allowing applications to access various database management systems, including Microsoft SQL Server, Oracle, and MySQL. ODBC is a generic API that is not optimized for any particular database and can be used with various programming languages, including C++, Java, and Python.
Java Database Connectivity (JDBC) - a standard interface for Java applications to interact with a wide range of database management systems. It has a distinct relational database vibe to its syntax (likely stemming from its history).
Native database APIs - Many databases provide their own APIs for accessing data. These APIs are typically more efficient than generic APIs like ODBC or JDBC, as they are optimized for a specific database. Examples include the Oracle Call Interface (OCI) and the Microsoft SQL Server Native Client.
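To make the pattern concrete, here’s a minimal sketch of the direct-to-database flow - introspect the schema, then run a read-only query (the lowest-risk flavor of this approach). I’m using Python’s built-in sqlite3 as a stand-in; against a real vendor database you’d connect via an ODBC driver (e.g. pyodbc) or JDBC, and the table and column names here are entirely hypothetical, not any actual EHR’s schema.

```python
import sqlite3

# Stand-in for a vendor's operational database. In practice you'd connect
# via ODBC/JDBC to SQL Server, Oracle, etc.; the schema below is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (pat_id INTEGER PRIMARY KEY, last_name TEXT, dob TEXT);
    INSERT INTO patient VALUES (1, 'Smith', '1980-04-01');
""")

# Step 1: introspect the schema, since vendor documentation is often
# missing or withheld as proprietary.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Step 2: run a read-only query against the reverse-engineered tables.
rows = conn.execute(
    "SELECT last_name, dob FROM patient WHERE pat_id = ?", (1,)).fetchall()
print(tables)  # ['patient']
print(rows)    # [('Smith', '1980-04-01')]
```

The fragility discussed above lives in step 1: a vendor patch that renames a column silently breaks step 2, which is why robust monitoring around these queries is non-negotiable.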
The direct-to-database strategy seems most popular for shitty, long-tail EHRs (often specialty-specific). These typically have fewer sanctioned interfaces available, have low command and control to stop that sort of integration, and have simpler data constructs. Thus, middleware integration companies like NexHealth have made inroads with dental EHRs using a direct-to-database strategy. Similarly, Healthjump has found success using direct-to-database as their primary integration method (supplemented by APIs) with AdvancedMD, Greenway, and more.
For cloud-based EHRs, this technique is a bit trickier, as the database is more tightly controlled by the vendor rather than the healthcare organization. If the vendor has the database locked away in their cloud instance, finagling that access may be impossible.
Robotic Process Automation
Now listen closely, my dear friend. The screens, they hold the key to the heart and soul of the software. The developer's true intent lies within them, illuminating the path to understanding. When we focus on these screens, we remove the limitations that bind us and allow ourselves to reach new heights of the users themselves. By Jove, if only I had realized this sooner, I could have spared myself the trouble of wasting my time on HL7.
For most people, graphical user interfaces (GUIs), Xerox’s lasting and indelible contribution to modern computing, are the primary way we interact with software. The triumph of GUIs over command-line interfaces occurred in the late 1980s and early 1990s, ushering in a new era through slightly more intuitive and distinctly more visually appealing interaction patterns. Command line interfaces live on, sheltered in the last refuge of developer tooling, but GUI dominates the kingdom of personal computing.
The next integration pathway lies buried within the icons, windows, and menus that developers expose - robotic process automation. Data entry, file manipulation, report generation, and other repetitive tasks can be time-consuming and error-prone when done manually. By using software robots or "bots" that can mimic human actions, organizations can automate those redundant, rules-based chores. Given that there’s nothing more repetitive than copy-pasting patient demographics and other information from a core system to an ancillary application, integration is well-suited as a problem space for these RPA bots. This technique is commonly called “screen-scraping,” as early RPA approaches only extracted data from the screen.
When I was at Epic, it was maybe month one or two where they absolutely drilled it into us as part of our onboarding that screen-scraping was unequivocally bad - the vendors out there trying to use it were essentially war criminals destined for Judy’s version of the Hague. Screen-scraping was theft, damnit, and thieves pay for their sins. For years following this Pavlovian training, any mention or thought prompted visceral reactions - torrential sweats, mental hives, momentary lapses in consciousness.
And yet:
The idea of bots posing as real users and performing actions is not new, but screen-scraping has had a renaissance in the past few years (albeit somewhat less so now with Olive’s fall from grace) as administrative tasks have been automated using Robotic Process Automation (a very sexy glow-up in my opinion).
Like direct-to-database, RPA has a reasonably high ceiling on what you can do. If the software developer created a user interface to show something, by nature, it is available for you to read. If they made a mechanism for a user to perform an action, well, hell, you can do that action too (most likely some sort of write). Sure, some columns in the database may not be available like they might be via direct-to-database, but the overlap in the Venn diagram of “what do I want my application to be able to do” and “what things can a user see or do on different screens” is actually absurdly large.
Another benefit is that the access problem is less thorny. Vendors have fewer levers they can pull to block things here in that they need to allow users to see and do things. They can still put up blockers and evasive maneuvers, like traffic pattern detection (if a user is logged in 24/7, it’s generally a sign of something amiss, and they’re not just some workaholic doctor), IP blocking, two-factor authentication, and other tactics. But it’s more of a cat-and-mouse game than the straightforward lockouts of direct-to-database.
You can also still have the consternation of that CSO, but most software has role-based access control they can configure to create a user with only the privileges needed. Screen-scraping is safer than direct-to-database by virtue of the protections developers build into their GUIs, which generally guard against users causing massive system downtime or large-scale data corruption (although don’t count out registrar Alice asleep with her elbow accidentally slamming repeatedly on the “New Patient Create” button).
What’s challenging here is akin to the schema problem we mentioned for databases. Software user interfaces change all the time! So the logic you build to do actions can break suddenly and unexpectedly. RPA integration is one of the areas where I imagine artificial intelligence could be powerful. Rather than programming a compounding series of if/then statements that are fragile to shifts in CSS or spacing, artificial intelligence with a prompt like “place an order” could hypothetically intuit and adapt even as buttons move or new screens appear.
Screen-scraping can be done against applications of different form factors available to users:
Web applications are, by virtue of being online, built with the structure of HTML and CSS. As such, web scraping tools can easily access the web page's DOM (Document Object Model) and extract the required data, especially with technologies like the Chrome DevTools Protocol and XMLHttpRequest. Many libraries and frameworks are available for web scraping, such as BeautifulSoup, Scrapy, and Selenium.
Mobile apps often have native APIs that provide access to data and functionality that may not be available through web scraping. Native APIs allow more efficient data extraction and provide access to device-specific data such as location, sensors, or contacts. However, it requires a deeper understanding of the app's code and the specific APIs. Mobile apps are also sometimes only a subset of the functionality available via the web or desktop.
Desktop apps are some of the hardest to scrape effectively. These applications are often designed with security measures to prevent unauthorized access or data extraction. In addition, the structure of desktop apps can be more complex than that of web or mobile apps - frequently, optical character recognition of the screen is the only way to understand what’s happening.
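As a toy illustration of the web flavor - the easiest of the three form factors - here’s a sketch using only Python’s standard-library HTML parser to pull data out of a hypothetical patient-list page. A real scraper would more likely drive a browser with Selenium or parse with BeautifulSoup, but the core move is the same: walk the DOM and pluck out the cells you care about. The page markup and class names here are invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical snippet of a web EHR's patient-list screen.
PAGE = """
<table id="patients">
  <tr><td class="name">Jane Doe</td><td class="mrn">12345</td></tr>
  <tr><td class="name">John Roe</td><td class="mrn">67890</td></tr>
</table>
"""

class PatientScraper(HTMLParser):
    """Collect the text of every <td class="mrn"> cell."""
    def __init__(self):
        super().__init__()
        self.in_mrn = False
        self.mrns = []

    def handle_starttag(self, tag, attrs):
        # Flag only the cells whose class attribute marks them as MRNs.
        self.in_mrn = tag == "td" and ("class", "mrn") in attrs

    def handle_data(self, data):
        if self.in_mrn and data.strip():
            self.mrns.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_mrn = False

scraper = PatientScraper()
scraper.feed(PAGE)
print(scraper.mrns)  # ['12345', '67890']
```

The brittleness described above is visible even in this sketch: rename the `mrn` class in a UI refresh and the bot silently collects nothing.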
Web applications are the sweet spot for scraping (at least in healthcare). I haven’t done a ton of this myself, but I’ve talked to many scrappy startups aimed at low-end or specialty EHRs that whip up a quick Chrome extension to get simple integration going with SaaS EHRs that primary care doctors often use, like Athenahealth, DrChrono, or Azalea. One particular use case where web scraping pops up a lot is “sidecar” assistants to surface care gaps and help with risk adjustment, with groups ranging from Innovaccer and Arcadia to Vim, Confido Health, and Diagnoss all offering nearly identical companion applications. However, it’s worth noting that the responsive design of modern web apps can sometimes be a real problem for bots. Desktop apps don’t have this problem, as the surface area and actual pixel density are always the same.
Similarly, middleware products like Shift Health and Arrow Health’s Bridge aim to offer developers a more streamlined experience, deploying an RPA-powered sidebar in which applications can embed their solutions.
Unfortunately for healthcare companies wanting to pursue this path more broadly, the dominant paradigm for inpatient EHRs like Epic and Cerner is heavy desktop applications. Actually getting access to the desktop application can be tricky - your bot needs to run within the customer's network, unlike a website, and Citrix client access is no joke. It’s roughly analogous to sending your bot away to a foreign land and hoping it is doing its job well and has not been waylaid on its adventure. While groups like Olive have seen traction using RPA in this space, it feels rare - most organizations pursue more sanctioned paths.
Sanctioned Interfaces
Applications often don’t want to live in the danger zone. They want to sleep in peace and solace with some (perhaps misguided) belief that they will wake up the following day without war rooms to deal with breaking changes, angry customers screaming about downtime, or fortune-changing blackballing. To have that comfort, they choose the last of our options: sanctioned interfaces, the inputs and outputs the software vendor has specifically created and condoned.
Most of what we call interfacing and interoperability these days falls into this bucket. The formats and methods are all over the place here, from FHIR and APIs to older and more rudimentary techniques like HL7v2, X12, or CSV exports. The key is less on a specific technique and more on the fact that the software vendor says, “This is the way” and provides their blessing and support.
The advantages are straightforward. By walking on the well-paved path, you avoid retribution from vendors and have a clean narrative on the security and access front. Documentation is (hypothetically) better, allowing for easier development. If using a standards-based approach, the integration should be more repeatable. With the advent of APIs and app ecosystems, there’s the possible benefit of partnering closely with software vendors and receiving the benefits of collaboration, marketing, and distribution.
From a security perspective, sanctioned interfaces are typically recommended and endorsed by our friend the CSO. However, for a few (especially those with on-premise architecture and a more antiquated security posture), provisioning APIs can be seen as riskier since they are available via the public web. The more APIs exposed, the more an organization increases the attack surface of its environment. Various reports indicate healthcare companies have not necessarily proven themselves adept at securing this new technology.
The biggest drawback, though, is the ceiling. Building system-to-system interfaces may be a lower priority for software vendors - they want to serve their primary users first and foremost. Mature EHRs like Epic have hundreds of interfaces and APIs, but farther down the long tail, there often are limited or no available inputs or outputs.
Even when vendors make these interfaces available, the available capability set is still a pale shadow of the possible, allowing reads and writes of just a tiny subset of the total available data. Depending on what you want to do, there might not yet be a safe, sanctioned path. For instance, many EHRs don’t let external applications create new patients, in order to prevent duplicates. Similarly, very few allow external applications to create new medication administrations.
Going against the grain here with a hot take, but these are rarely malicious or anti-competitive acts. Instead, it’s a healthcare-specific offshoot of Hanlon's Razor:
Never attribute to malice that which is adequately explained by incompetence or paternalism.
The primary reason EHRs lack APIs is incompetence. Most have barely scratched the surface of the feature set needed to adequately service their primary personas of providers and patients. Very few have any remaining development calories to build for the secondary (or tertiary) persona of external developers. Building system interfaces is, for most, not a proactive product platform strategy and comes in dribs and drabs only after extreme, explicit customer demand. Even for EHRs like Epic, with hundreds of available interfaces and dozens of APIs, niche inpatient and specialty workflows still have data elements only available via reporting database export, such as bed events or surgical supplies.
Secondarily, other (generally more mature) EHRs often develop the belief that by the volume of clients and range of implementations, they have seen a broader spectrum of scenarios and thus know better than their clients what is right or wrong. Therefore, in their mind, they are pursuing a noble and just path - helping and protecting their customers by restricting their control and choices. They are sometimes correct! For instance, in the aforementioned medication administration scenario, the stakes are high when there are mistakes - a wrong dose or unit impacts care. This point strikes me as a less black-and-white scenario and more holistically an ethics question of sorts - when does a software vendor have the right to disenfranchise its customer from the control to do the things it wants?
This path leaves applications at the mercy of the EHR. Sanctioned interfaces can be deprecated or removed wholesale with little or no warning, as we’ve seen recently with Epic’s drawdown of the App Orchard.
Headless EHRs (and all headless software) attempt to bridge this gap by ensuring a sanctioned path exists for every data element. This idea gives agency and ownership back to the customer organization. For innovative provider organizations (especially virtual care startups), having no limits when creating and building on top of the main EHR chassis is especially enticing. However, these headless EHRs are geared towards the fundamental workflows and data of outpatient care and hold less promise (or a much longer time horizon) for larger inpatient institutions.
The many, many data input and output formats are far too varied to enumerate individually but can be either a standards-based format, like the ones listed in A Brief History of Standards and outlined in A Song of Health and FHIR, or a proprietary format specific to a vendor. They fall into three main categories:
Electronic Data Interchange (EDI) - Typically a combination of fixed-length fields and delimiter characters to structure data. These were the first formats to arise as electronic exchange began in the 1980s, focused on sending data in a compact layout to limit message size. Healthcare standards such as HL7v2 and X12 fall into this category, as do more exotic structures like GS1.
Extensible Markup Language (XML) - A syntax of documents consisting of elements and attributes delimited by tags, it became prevalent for exchange thanks to the improved ease of use inherent to human readability and the built-in tooling for web-based applications. The primary reason to play with XML in healthcare is when dealing with CDA documents or sending NCPDP messages (like e-prescriptions). Some EHRs have proprietary APIs in XML format.
JavaScript Object Notation (JSON) - A lightweight data-interchange format based on a subset of the JavaScript programming language. It’s the current format du jour thanks to simpler syntax/markup and the ever-rising popularity of JavaScript/TypeScript across tech. While FHIR technically can be XML or other formats, most exchange today is in JSON form.
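To make the format differences tangible, here’s a quick sketch extracting the same datum - a patient’s family name - from an HL7v2-style EDI segment and from a FHIR JSON resource. The sample payloads are simplified and hypothetical, not spec-complete messages, but they show why EDI is compact-but-cryptic while JSON is verbose-but-readable.

```python
import json

# HL7v2 (EDI-style): pipe-delimited fields, caret-delimited components.
# PID-5 is the patient name field; PID-5.1 is the family name.
pid_segment = "PID|1||MRN12345^^^HOSP||Doe^Jane||19800401|F"
fields = pid_segment.split("|")
last_name = fields[5].split("^")[0]

# FHIR (JSON): the same name carried as a nested, self-describing structure.
fhir_patient = json.loads(
    '{"resourceType": "Patient", "name": [{"family": "Doe", "given": ["Jane"]}]}'
)
fhir_last_name = fhir_patient["name"][0]["family"]

print(last_name, fhir_last_name)  # Doe Doe
```

The HL7v2 parse only works if you already know that field 5, component 1 holds the family name - the positional knowledge problem that makes EDI formats feel so byzantine to newcomers.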
Regulatory Tailwind Alert
Sanctioned interfaces have progressed immensely as a result of legislation and regulation. Whether it’s X12 and NCPDP via HIPAA, CDA via Meaningful Use, or USCDI FHIR via Cures, the government is a major force pushing forward standardized interfaces across vendors and healthcare organizations.
One particular area to watch is the bulk FHIR export required by ONC Cures. Required for the December 2022 deadline, this new sanctioned interface should become a valuable tool for the analytics and population health use cases that previously were largely reliant on direct-to-database.
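Per the Bulk Data Access spec, bulk FHIR delivers its payloads as newline-delimited JSON (NDJSON) files - one resource per line - after an asynchronous $export kick-off. Here’s a minimal sketch of consuming such a file once downloaded; the sample resources are hypothetical stand-ins for a real export.

```python
import json

# A (hypothetical) two-line slice of a Patient.ndjson file from a
# bulk FHIR $export: each line is one complete FHIR resource.
ndjson = """\
{"resourceType": "Patient", "id": "p1", "gender": "female"}
{"resourceType": "Patient", "id": "p2", "gender": "male"}
"""

# Parse line-by-line; real exports can be millions of lines, so this
# pattern streams nicely without loading one giant JSON array.
patients = [json.loads(line) for line in ndjson.splitlines() if line]
ids = [p["id"] for p in patients]
print(ids)  # ['p1', 'p2']
```

That line-at-a-time shape is exactly why NDJSON was chosen for population-scale exports: it lets analytics pipelines stream a whole panel of patients without the full-database replication that direct-to-database required.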
The large majority of applications use sanctioned interfaces to connect to EHRs - you can find lists of them on EHR app stores like Athenahealth’s App Marketplace or third-party app listings like Redox’s Connected Products or Xealth’s Partner Community.
Redox is the extremely dominant middleware integration vendor, plugging into sanctioned interfaces and normalizing to a consistent API. Rhapsody (the artist formerly known as Lyniate Envoy but double formerly known as Rhapsody before that) also operates similarly as more of a budget option. Xealth can also be a good fit in some specific circumstances (especially for digital therapeutics with large health system clients).
So those, my friends, are the big three. Everything in integration falls into these categories, leading to some clear takeaways for different readers.
For those building patient/consumer facing applications:
Indiana Jones and the Personal Health Record is the best lay of the land, but to recap - Until recently, screen-scraping the patient portal was the main method to access and pull data, with middlemen like HumanAPI making that capability available to you. However, with the advent of ONC Cures and the CMS Patient Access rule, APIs are available to read a patient’s core data from providers and payers. Companies like Flexpa work to simplify and normalize that access across all those nodes, similar to Plaid in banking. Given you have no relationship with the healthcare organization, direct-to-database is out of the question.
For traditional providers or virtual first care providers:
When interacting with the broader healthcare ecosystem, as detailed in A Tail of Ramps and Rails and more recently/fully in the tag team effort with Elion Health on The Digital Health Provider’s Guide to Interoperability, you will be using sanctioned interfaces - typically standards-based protocols mediated by networks such as Surescripts, Carequality, and DirectTrust. You may use an on-ramp company to more quickly get started on these rails. Screenscraping and direct-to-database will likely be rare events, aside from connecting the systems within your own enterprise, as external hospitals, clinics, pharmacies, and such will rarely be excited to use those integration patterns.
For those building applications and selling to providers:
In the grandest scheme of things, if you imagine your application deployed at every hospital, clinic, and care facility nationwide in the long run, it’s highly probable that you’ll need all three of these techniques to integrate optimally with all the EHRs you’ll see.
In the short, more tactical time frame, there’s still no perfect one-size-fits-all path when it comes to integration. The end goal of the boundaries of your application and the core software blurring together into a seamless workflow experience for your users may be the same. It’s the “how” by which you achieve that end that is very much dependent on your specific situation (unsurprisingly coming full circle back to the conclusions of How to Win Friends and Integrate Systems).
For the investors and aspirant interoperability founders out there:
I’m asked quite frequently where the opportunities are in interoperability. As seen in this article, it is not an unknown, uncharted wilderness filled with eureka moments of discovery. The spectrum of the possible is well-defined and explored in integration. So when betting on or building a new interoperability company, it’s rare to find a truly novel, greenfield idea. Instead, you are simply believing in better execution, perhaps buoyed by new tech or standards advances, regulatory tailwinds, or quality of team.
For everyone else:
I know what you’re thinking. Where have all the APIs gone? Suffice it to say, I am still very much the API Guy, but increasingly I’m adding in broader perspectives to cover my bases.
Big thanks to Colin Keeler (swift and unrelenting with editor’s red pen), Garrett Rhodes (payer integrationist extraordinaire), Samir Unni (Palantir dark arts), Angela Liu (impeccable design sensibilities), Rik Renard (screenscraping perspectives), Xand Griffin (direct to database bullishness), Shawn Myers (product leader experiences), Rohan D’Souza (master of triple threat integration), Bailey Davidson (wife supportive of writing during paternity) and everyone else who contributed.