Solving Key Business Challenges on AWS
The key challenges Eliza needed to address arose from compliance requirements such as the Health Insurance Portability and Accountability Act (HIPAA), the variety of the data ingested, and the need for a common view of the data.
Meeting HIPAA Requirements
AWS enables customers and partners to build HIPAA-compliant applications. Based on the requirements for encryption in transit and encryption at rest, and following the guidelines in the whitepaper along with various other HIPAA compliance guidelines, NorthBay implemented a number of steps to ensure the Data Lake was HIPAA compliant. These included spinning up Amazon Elastic MapReduce (EMR) in a dedicated Virtual Private Cloud (VPC), encrypting and decrypting data when needed, and launching a data pipeline orchestration process.
EMR resources are provisioned in dedicated VPCs, and most of the processing is done on transient clusters that use a mix of Spot, Reserved, and On-Demand Instances. In addition, a long-running cluster was provisioned for ad-hoc data analysis. To make real-time streaming data ingestion HIPAA-compliant for Eliza, NorthBay used the Amazon Kinesis Producer Library to encrypt the data before putting it into Amazon Kinesis, and then decrypted it before putting it into Amazon Simple Storage Service (Amazon S3). NorthBay also launched a data pipeline orchestration process, which in turn accesses resources in a dedicated VPC.
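The transient-cluster pattern described above can be expressed as a request to the EMR RunJobFlow API. The sketch below only builds the request dictionary, so it runs without AWS credentials. It assumes a private subnet in the dedicated VPC and an EMR security configuration (which is where EMR's encryption at rest and in transit is switched on) already exist; the cluster name, release label, instance types, and counts are hypothetical placeholders, not Eliza's actual settings.

```python
from typing import Any, Dict, List


def build_transient_emr_request(
    subnet_id: str,
    security_configuration: str,
    log_uri: str,
    steps: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Return keyword arguments for the EMR RunJobFlow API (boto3: run_job_flow).

    Cluster name, release label, instance types, and counts are hypothetical.
    """
    return {
        "Name": "transient-etl",  # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",  # assumption: any supported EMR release
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}],
        # Name of an EMR security configuration, created beforehand, that
        # turns on encryption at rest and in transit for the cluster.
        "SecurityConfiguration": security_configuration,
        "Instances": {
            # Private subnet inside the dedicated VPC.
            "Ec2SubnetId": subnet_id,
            # Transient: the cluster terminates after its last step finishes.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
            "InstanceGroups": [
                # Master and core nodes run On-Demand; any matching Reserved
                # Instances are applied to this usage by AWS billing.
                {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
                # Task nodes hold no HDFS data, so losing Spot capacity is tolerable.
                {"Name": "task", "InstanceRole": "TASK", "Market": "SPOT",
                 "InstanceType": "m5.xlarge", "InstanceCount": 4},
            ],
        },
        "Steps": steps,
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

Passing the result to `boto3.client("emr").run_job_flow(**request)` would launch the cluster; the security configuration itself can be created with the same client's `create_security_configuration` call. Keeping Spot capacity on task nodes only is one common way to trade cost against the risk of losing nodes mid-job.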
Data Obfuscation, Data Cleansing, and Data Mapping
To satisfy Eliza’s interpretation of HIPAA's data protection requirements, NorthBay established a business rule for handling PII (Personally Identifiable Information) and PHI (Protected Health Information): in non-production environments, PII must be obfuscated or masked before it can be shared with development teams. Given the volume and velocity of the data, the obfuscation task itself became a Big Data problem. To solve it, NorthBay helped develop an algorithm and a data map that read the data and apply the corresponding obfuscation to protect it. The data map also gave Eliza a common view across all of its data sources, something the company had previously struggled to achieve.
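The source does not describe NorthBay's algorithm, but the idea of a data map that drives obfuscation can be sketched as follows. This is a minimal illustration under stated assumptions: the source names, field names, and strategies are all hypothetical, and the keyed-hash pseudonymization (HMAC-SHA256) is a swapped-in technique chosen so that the same person gets the same token in every source. The map also renames source-specific fields to canonical names, which is how a single map can double as a common view across sources. At Eliza's data volumes, a per-record function like this would typically run inside a distributed job rather than on one machine.

```python
import hashlib
import hmac
from typing import Callable, Dict, Tuple

Obfuscator = Callable[[str, bytes], str]


def pseudonymize(value: str, key: bytes) -> str:
    """Deterministic keyed hash: equal inputs give equal tokens, so joins still work."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_keep_last4(value: str, key: bytes) -> str:
    """Replace all but the last four characters with '*' (short values fully masked)."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def redact(value: str, key: bytes) -> str:
    return "[REDACTED]"


def keep(value: str, key: bytes) -> str:
    return value


OBFUSCATORS: Dict[str, Obfuscator] = {
    "pseudonymize": pseudonymize,
    "mask": mask_keep_last4,
    "redact": redact,
    "keep": keep,
}

# Hypothetical data map: source -> source field -> (canonical field, strategy).
DATA_MAP: Dict[str, Dict[str, Tuple[str, str]]] = {
    "claims_feed": {
        "member_name": ("patient_name", "pseudonymize"),
        "ssn": ("ssn", "mask"),
        "dx_notes": ("clinical_notes", "redact"),
        "state": ("state", "keep"),
    },
    "web_signup": {
        "fullName": ("patient_name", "pseudonymize"),
        "socialSecurityNumber": ("ssn", "mask"),
        "stateCode": ("state", "keep"),
    },
}


def obfuscate_record(source: str, record: Dict[str, str], key: bytes) -> Dict[str, str]:
    """Map a source record onto canonical fields, obfuscating each per the data map.

    Unmapped fields are dropped (fail closed) so unreviewed columns cannot
    leak PII into non-production environments.
    """
    field_map = DATA_MAP[source]
    out: Dict[str, str] = {}
    for field, value in record.items():
        if field not in field_map:
            continue
        canonical, strategy = field_map[field]
        out[canonical] = OBFUSCATORS[strategy](value, key)
    return out
```

For example, `obfuscate_record("claims_feed", {"ssn": "123-45-6789"}, key)` yields `{"ssn": "*******6789"}`. Dropping unmapped fields rather than passing them through is a deliberate, conservative choice for this sketch.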
The data Eliza receives is populated by disparate systems and can include free-form entries by consumers and customers, which creates inconsistencies across entries. NorthBay helped Eliza implement an additional process to cleanse the data and bring it into a common format. The schema structure put in place allows Eliza to apply multiple data cleansing rules to the same field and to choose the order in which those rules are applied.
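A schema that attaches an ordered list of cleansing rules to each field might look like the sketch below. The rule names, fields, and formats are hypothetical and only illustrate why ordering matters: a phone number must be reduced to digits before it can be reformatted.

```python
import re
from typing import Callable, Dict, List

Rule = Callable[[str], str]


def strip_whitespace(value: str) -> str:
    return value.strip()


def collapse_spaces(value: str) -> str:
    return re.sub(r"\s+", " ", value)


def title_case(value: str) -> str:
    return value.title()


def upper(value: str) -> str:
    return value.upper()


def digits_only(value: str) -> str:
    return re.sub(r"\D", "", value)


def format_us_phone(value: str) -> str:
    # Only formats a bare 10-digit string; anything else is left unchanged.
    if len(value) == 10 and value.isdigit():
        return f"({value[:3]}) {value[3:6]}-{value[6:]}"
    return value


RULES: Dict[str, Rule] = {
    "strip": strip_whitespace,
    "collapse_spaces": collapse_spaces,
    "title_case": title_case,
    "upper": upper,
    "digits_only": digits_only,
    "format_us_phone": format_us_phone,
}

# Hypothetical schema: field -> rule names, applied left to right.
CLEANSING_SCHEMA: Dict[str, List[str]] = {
    "city": ["strip", "collapse_spaces", "title_case"],
    "state": ["strip", "upper"],
    "phone": ["digits_only", "format_us_phone"],
}


def cleanse(record: Dict[str, str], schema: Dict[str, List[str]] = CLEANSING_SCHEMA) -> Dict[str, str]:
    """Apply each field's rules in the configured order; other fields pass through."""
    out = dict(record)
    for field, rule_names in schema.items():
        if field in out:
            value = out[field]
            for name in rule_names:
                value = RULES[name](value)
            out[field] = value
    return out
```

With this schema, `{"city": "  new   york ", "phone": "555.123.4567"}` cleanses to `{"city": "New York", "phone": "(555) 123-4567"}`. Reversing the phone rules would leave the value as the unformatted `5551234567`, because the formatter sees punctuation and skips it before the digits are extracted.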