Data Lake Enables Seamless Analytics of Legacy Data for a Healthcare Organization

Industry

Healthcare

Our Expertise

Data Lake and Cloud

Key Benefits

  • The enterprise data lake enabled downstream users to capitalize on the value in their data by bringing together internal and external datasets in a single place
  • Easy and fast access to a variety of data
  • Capability to ingest and process any type of data: structured, semi-structured, and unstructured

Technologies Used

Spark
Cloudera

The Solution

The solution required us to create a data ecosystem (a data lake and a data pipeline) that would serve as a central location giving business users and service groups access to core data domains and to critical data on patients and clinical trial results. Based on the business and technical requirements, we envisioned a cloud-based platform that would accommodate the types of data and compute needs most relevant to the business. The platform comprises a data pipeline that ingests and transforms data into the data lake, which acts as the central repository; discovery through a data catalog; and data access methods that support disparate needs, integrated with a cloud computing environment for applications and analytics.

Microsoft Azure was chosen as the cloud provider, and the solution was delivered using an agile methodology.

At a high level, the main tasks consisted of:

a) Building and supporting the Azure data lake platform
b) Master data and metadata management, and data quality
c) Data modelling
d) ETL ingestion pipeline using Talend
e) Ingesting the cleansed, validated, transformed and normalised data into the data lake
f) Data warehouse creation on Snowflake
g) Security implementation
h) Continuous integration/continuous delivery (CI/CD) pipeline
i) Azure Cost Optimization
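
To make steps d) and e) more concrete, the following is a minimal sketch of a cleanse, validate, and normalise stage. It is written in plain Python for illustration only; the actual pipeline was built in Talend, and this is not a reproduction of it. The record shape (PatientRecord), the field names, the accepted date formats, and all function names are hypothetical assumptions, not details from the project.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class PatientRecord:
    """Normalised target shape (hypothetical, not the client's actual model)."""
    patient_id: str
    full_name: str
    birth_date: date


# Date formats assumed to occur across source systems (illustrative only).
_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(raw: str) -> Optional[date]:
    """Try each assumed format; return None if none matches."""
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def cleanse(row: dict) -> dict:
    """Lower-case and trim keys, trim values, and map empty strings to None."""
    cleaned = {}
    for key, value in row.items():
        text = str(value).strip() if value is not None else ""
        cleaned[key.strip().lower()] = text or None
    return cleaned


def validate(row: dict) -> list:
    """Return a list of problems; an empty list means the row is valid."""
    problems = []
    for field in ("patient_id", "full_name", "birth_date"):
        if not row.get(field):
            problems.append(f"missing {field}")
    if row.get("birth_date") and parse_date(row["birth_date"]) is None:
        problems.append("unparseable birth_date")
    return problems


def normalise(row: dict) -> PatientRecord:
    """Convert a cleansed, validated row into the common target shape."""
    return PatientRecord(
        patient_id=row["patient_id"].upper(),
        full_name=" ".join(row["full_name"].split()).title(),
        birth_date=parse_date(row["birth_date"]),
    )


def process(rows):
    """Split raw rows into load-ready records and rejected rows with reasons."""
    accepted, rejected = [], []
    for raw in rows:
        row = cleanse(raw)
        problems = validate(row)
        if problems:
            rejected.append((raw, problems))
        else:
            accepted.append(normalise(row))
    return accepted, rejected
```

In this sketch, rejected rows are kept together with the reasons for rejection rather than silently dropped, so that data quality issues can be reported back to the owners of the source systems. Only the accepted records would be passed on for loading into the data lake.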

Summary

Our client is one of the largest health care organizations in the United States. It has a few nationally recognized academic hospitals. The organization is working to modernize its IT systems to accommodate its growing needs and save costs.

The client had crucial and confidential data stored in silos across 55 data sources, such as IBM dashDB, Oracle DB, and SQL Server, in structured, semi-structured and unstructured formats. It was critical to bring all this data together in one place to give data consumers and stakeholders across the organization better insights, decision making and analytics.

Ellicium implemented the data lake on the Azure cloud after analyzing the customer's business requirements and pain points.

Challenges

The existing systems posed the following challenges:

  • Complex data structures in legacy source systems made self-service reporting impossible
  • Variety of data sources
  • High data volume and complex business logic

The Results

  • An enterprise data lake was created and is available 24/7 to downstream users.
  • Downstream users can now access enterprise-wide data in a single place instead of depending on various internal and external systems to provide it.
  • Downstream users can perform analysis at two levels: through data cataloging services on Azure, and through the data warehouse on Snowflake for faster analytics.