Data Engineer

Principal Responsibilities and Essential Duties:

  • Build data pipelines according to data transformation specifications, converting source data for loading into the data lake using a proprietary big data processing platform.
  • Support and improve current data ingestion processes for our proprietary healthcare data applications and systems.
  • Develop and maintain data engineering processes using a variety of tools, including T-SQL, Spark, Scala, and shell scripting. The work generally focuses on data ingestion for healthcare data management, data validation, statistical report generation, and program validation.
  • Develop tools and techniques for improving process efficiencies and data performance.
  • Review and test data to ensure its accuracy and validity before uploading it to the data lake.
  • Troubleshoot and analyze data issues.
  • Perform data analysis, data mining, and investigations, and identify the root causes of issues using several cutting-edge data analysis tools.
  • Work with Technical Operations to troubleshoot complex database issues related to the entire environment, including OS, storage, and servers. Provide off-hours support to resolve production issues when necessary.

Requirements:

  • MUST: 3+ years of experience with data aggregation, standardization, linking, quality check mechanisms, and reporting.
  • MUST: 3+ years of experience with big data technologies such as Hadoop and Spark.
  • MUST: 3+ years of experience with RDBMS (Oracle, MS SQL Server) and with SQL or other data integration/ETL tools.
  • MUST: Solid understanding of Linux environments; strong knowledge of shell scripting and file systems.
  • Bachelor’s degree in a relevant field such as Computer Science, Engineering, or a related discipline, with 3-5 years of industry experience.

Preferred Qualifications:

  • Experience with healthcare data, preferably in a data operations role.
  • Data expertise, with the ability to debug data issues, identify root causes, and fix them in a fast-paced environment.
  • Experience operating in an Agile environment.

Deadline: 

Saturday, February 19, 2022