Data Engineer

How you will help
You will support the engineering team’s data endeavors: diving in to fix issues, optimizing processes and automating anything you do more than once. You’ll use the best tools for the job, whether modern and revolutionary or time-tested and proven, to deliver elegant, scalable solutions that meet business and technical needs.

What you will do
  • Work with internal stakeholders to load data into HealthVerity's data warehouse
  • Troubleshoot and resolve issues relating to data integrity
  • Help establish procedures and best practices for transforming and storing data
  • Lead requirements gathering around data pipeline automation improvements
  • Work with some of the most exciting open-source tools, such as Spark, Hadoop, Docker, Airflow and Zeppelin
  • Leverage distributed computing and serverless architectures, such as AWS EMR and AWS Lambda, to develop data transformation pipelines
  • Enjoy the peace that comes with working in a mature software development environment
  • Marvel at the speed with which your creation makes it into production
  • Research new technologies and work with a team of developers to execute strategies and deliver solutions
  • Produce high-quality, peer-reviewed software
  • Solve complex problems related to the real-time discovery of large datasets

About you
You are...
  • Experienced in writing scalable applications on distributed architectures
  • Data-driven, testing and measuring as much as you can
  • Eager to both review peer code and have your code reviewed
  • Comfortable on the command line, which you consider an essential tool
  • Confident in SQL: you know it, you write smart queries, and it’s no big deal

Required skills and experience
  • 5+ years of work experience
  • 3+ years of experience with Python
  • 3+ years of experience with PySpark and Spark SQL (writing, testing and debugging Spark routines)
  • 1+ years of experience with AWS EMR and AWS S3, including comfort using the AWS CLI and boto3
  • Comfortable working in remote environments
  • Comfortable using the *nix command line (shell scripting, awk, sed)
  • Experience with MySQL and Postgres

Desired experience
  • Experience with Apache Airflow
  • Experience with Apache Zeppelin
  • Experience with healthcare data

HealthVerity, based in Center City Philadelphia, is a venture-backed technology company that is transforming the way data-led organizations make critical decisions. Our technology platform serves as the foundation for the rapid creation, exchange and management of healthcare and consumer data in a fully interoperable, privacy-protecting manner. Backed by highly sophisticated identity resolution and matching capabilities, HealthVerity is on a mission to increase transparency, forge interoperability and activate deeper insights.

Our company challenges
  • Empowering clients with highly rewarding data discovery and licensing tools
  • Ingesting and managing billions of healthcare records from a wide variety of partners
  • Standardizing on common data models across data types
  • Orchestrating an industry-leading HIPAA privacy layer
  • Innovating our proprietary de-identification and data science algorithms
  • Building a culture that supports rapid iteration and new possibilities

The infrastructure and culture we are building will provide an environment that cultivates innovation. We want to move fast, knowing we can fix anything we break along the way. If a new need arises, we want to turn around a solution quickly. We want to solve our challenges in ways that create even more possibilities. We’re creating a platform that lets us discover what else we might do.

We have big plans
We are building a platform that will scale to support an ever-growing array of data providers and innovative products. You must be able to think big while still delivering on near-term requirements.

HealthVerity is an equal opportunity employer.
