Principal Responsibilities and Essential Duties:
- Build data pipelines according to data transformation specifications, converting source data for loading into the data lake using a proprietary big data processing platform.
- Support and improve current data ingestion processes for our proprietary healthcare data applications and systems.
- Develop and maintain data engineering processes using a variety of tools, including T-SQL, Spark and Scala, and shell scripting. This work generally focuses on data ingestion for healthcare data management, data validation, statistical report generation, and program validation.
- Develop tools and techniques for improving process efficiencies and data performance.
- Review and test data to ensure its accuracy and validity before uploading it to the data lake.
- Perform data troubleshooting and analysis.
- Perform data analysis, data mining, and investigations, and identify the root cause of issues using several data analysis tools.
- Work with Technical Operations to troubleshoot complex database issues across the entire environment, including OS, storage, and servers. Provide off-hours support to resolve production issues when necessary.
Requirements:
- MUST: Solid understanding of Linux environments; strong knowledge of shell scripting and file systems.
- MUST: 1-2 years of experience with data aggregation, standardization, linking, quality check mechanisms, and reporting.
- MUST: 1-2 years of experience with big data technologies such as Hadoop and Spark.
- MUST: 1-2 years of experience with RDBMS (Oracle, MS SQL Server) and with SQL or other data integration/ETL tools.
- Bachelor’s degree in a relevant field such as Computer Science, Engineering, or a related discipline, with 1-2 years of industry experience.
- Experience with healthcare data, preferably in a data operations role.
- Data expert with the ability to debug data issues, identify root causes, and fix them in a fast-paced environment.
- Experience operating in an Agile environment.
Saturday, February 19, 2022