Identifying and Harnessing Demand to Drive Open Data

David Portnoy is part of the Entrepreneur-in-Residence (EIR) program at the U.S. Department of Health and Human Services (HHS). The program brings together talent from within and outside government to tackle high-priority issues in health, health care, and the delivery of human services. This is the first post in David's EIR blog series documenting his project and experience.

So it begins…

My project as an Entrepreneur-in-Residence (EIR) with the HHS IDEA Lab is called "Innovative Design, Development and Linkages of Databases." Think of it as Web 3.0 (the next generation of machine-readable and programmable internet applications) applied to open government, with a focus on healthcare and social service applications. The original underlying hypothesis was that by investigating how HHS could better leverage its vast data repositories as a strategic asset, we would discover innovative ways to create value by linking datasets from different agencies. My hope was that my experience with big data in industry, at both startups and large enterprises, would be a sufficient catalyst for progress. I've been fortunate to have the project championed by a phenomenal group of internal backers: Keith Tucker and Cynthia Colton, who lead the Enterprise Data Inventory (EDI) in the Office of the Chief Information Officer (OCIO), and Damon Davis, who heads up the Health Data Initiative.

So, tell me: what are your (wildest) data dreams?

The first step in this effort was to set out on a journey of discovery. With guidance from the internal sponsors, I secured meetings with leaders and innovators in big data and analytics across HHS. I had the privilege of engaging in stimulating discussions at CMS, FDA, NIH, CDC, NCHS, ONC, ASPE and several other organizations. When I tried to synthesize the information gathered into something actionable, I noticed that past open data projects fell into two camps. The first camp had ample examples of external organizations doing fantastic and often unexpected things with the data. The second camp lacked clarity on the true value to "customers," despite being viewed as successful internally. That got me thinking about why this was happening.

The “aha” moment

That's when it hit me: we were trying to solve the wrong problem. The greatest value created with existing HHS data, and thereby the most innovative linkages, seemed to come from industry, researchers and citizen activists. That meant we could accomplish the main goals of the project by looking at the problem a bit differently. Instead of building outright the linkages that we think have value, we can accelerate the rate at which external organizations do what they do best. It seemed so obvious now. In fact, I had experienced this phenomenon myself. Prior to the HHS EIR program, I built an online marketplace for medical services called Symbiosis Health. I made use of three datasets from different HHS organizations, but I did so with great difficulty. Each had deficiencies that I thought should be easy to fix: providing more frequent refreshes, adding a field that enables joins to another dataset, providing a data dictionary, or consolidating data sources. If only I could have told someone at HHS what we needed!
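To make the point about "a field that enables joins" concrete, here is a minimal Python sketch. The two record sets, their field names (`provider_id`, `procedure`, `price`, `quality_score`) and their values are all hypothetical, not actual HHS datasets; the sketch only shows that two datasets published separately can be linked once they share a common identifier field.

```python
# Hypothetical records from two datasets published by different agencies.
# They can only be linked because both carry the same "provider_id" field.
procedures = [
    {"provider_id": "P001", "procedure": "knee replacement", "price": 24000},
    {"provider_id": "P002", "procedure": "knee replacement", "price": 31000},
]
quality = [
    {"provider_id": "P001", "quality_score": 4.5},
    {"provider_id": "P003", "quality_score": 3.9},
]


def inner_join(left, right, key):
    """Merge rows whose key value appears in both datasets (an inner join)."""
    # Index the right-hand dataset by key; assumes the key is unique there.
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


linked = inner_join(procedures, quality, "provider_id")
print(linked)
```

Only provider P001 appears in both datasets, so only its rows are merged. Without a shared identifier, the same linkage would require fuzzy matching on names or addresses, which is exactly the kind of difficulty described above.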

Let’s pivot this thing

Thus, the "pivot" was made. While pivoting is a well-known concept for rapid course correction in Lean Startup circles, it's not something typically associated with government. Entrepreneurs are expected to make mistakes and correct course quickly; government is expected to plan ahead and stay the course. In this case, though, we have the best of both worlds: the HHS IDEA Lab. The IDEA Lab provides access to all the resources and deep domain expertise of HHS, along with the freedom to pivot and keep iterating without being weighed down by original assumptions! I feel fortunate to have the opportunity to work in such an environment.

[Graphic: the pivot taken as part of the Demand-Driven Open Data project]

So what exactly is this thing?  

The project born from this pivot is called Demand-Driven Open Data (DDOD). It's a framework of tools and methods that provides a systematic, ongoing and transparent mechanism for industry and academia to tell HHS what data they need. In other words, it lets "customers" drive the value that will better inform current open data efforts. With DDOD, all open data efforts are managed in terms of "use cases," which allows limited resources to be allocated based on value. A use case is a clear and concise way to define a desired outcome. It outlines, from a user's point of view, a system's behavior as it responds to a request. Each use case is represented as a sequence of simple steps, beginning with a user's goal and ending when that goal is fulfilled. It's the Lean Startup approach to open data. In this context, one example of a use case would be a company that provides consumers with quality information on doctors and wants access to a more complete dataset of relevant physician data.

As use cases are completed, several things happen. Beyond the actual work of adding and improving datasets, both the specifications and the solutions associated with the use cases are documented and made publicly available on the DDOD website. Additionally, for the datasets involved and the linkages enabled, we add or enhance relevant tagging, dataset-level metadata, data dictionaries, cross-dataset relationships and long-form dataset descriptions. This, in turn, accelerates future discovery of datasets. Best of all, it stimulates the linking we wanted in the first place, through coded relationships and field-level matching. In other words, this open, transparent process makes existing datasets better and more accessible.
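For readers who think in code, here is a minimal Python sketch of how a use case like the one above might be recorded and prioritized. The `UseCase` fields, the `prioritize` helper, the dataset names and the value function (simply counting the datasets a use case touches) are hypothetical illustrations, not the actual DDOD data model or its scoring method.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class UseCase:
    """Hypothetical record for a DDOD-style use case (not the real schema)."""
    requester: str
    goal: str  # the user's goal that starts the use case
    steps: List[str] = field(default_factory=list)  # simple ordered steps
    datasets: List[str] = field(default_factory=list)  # datasets involved
    fulfilled: bool = False

    def complete(self) -> None:
        # A use case ends when the requester's goal is fulfilled.
        self.fulfilled = True


def prioritize(use_cases: List[UseCase],
               value: Callable[[UseCase], float]) -> List[UseCase]:
    """Return open use cases ordered by an estimated value, highest first."""
    open_cases = [uc for uc in use_cases if not uc.fulfilled]
    return sorted(open_cases, key=value, reverse=True)


physician = UseCase(
    requester="Doctor-ratings company",
    goal="Access a more complete dataset of physician data",
    steps=["Submit request", "HHS adds missing fields", "Company links data"],
    datasets=["physician directory", "quality measures"],
)
refresh = UseCase(
    requester="Researcher",
    goal="Get more frequent dataset refreshes",
    steps=["Submit request", "HHS increases refresh frequency"],
    datasets=["claims summary"],
)

# Placeholder value estimate: the number of datasets a use case touches.
ranked = prioritize([physician, refresh], value=lambda uc: len(uc.datasets))
print([uc.requester for uc in ranked])

physician.complete()  # once fulfilled, it drops out of the open queue
print([uc.requester for uc in prioritize([physician, refresh],
                                         value=lambda uc: len(uc.datasets))])
```

Keeping the value estimate as a pluggable function reflects the idea that resources are allocated based on value, without committing to any particular way of measuring it.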

How does it fit into the big picture?

It's beautiful how the pieces come together. DDOD fits quite well with HHS's existing Health Data Initiative (HDI) and HealthData.gov. While DDOD is driven by demand from outside HHS, you can think of HDI as its supply-driven counterpart, guided by brilliant subject matter experts throughout HHS. Finally, HealthData.gov is the data indexing and discovery platform that serves as a home and a bridge for both of these components. In fact, we're aiming for DDOD to serve as the community section of HealthData.gov.

Let’s roll!

So now the fun begins. Next up? More adventures as we work through actual pilot use cases. We'll also cover some cool potential components of DDOD that would put more emphasis on the "linkages" aspect of the project. These include usage analytics, data maturity reporting, and semantic tagging of the dataset catalog and of fields in the data dictionary. Stay tuned. In the meantime, you can get involved in two ways: spread the word to your network about the opportunities DDOD provides, or, if you have actual use cases to add, go to http://hhs.ddod.us and enter them!

You can also follow David on Twitter @dportnoy or via http://david.portnoy.us. HHS EIRs are brought on for a fixed period of time to leverage their skill sets to develop solutions while also bringing new skills, experience and capabilities to HHS. Learn more about the EIR program.