Our client is an industry-leading provider of Digital Transformation services. They specialize in helping their customers accelerate and achieve exceptional business outcomes by transforming, simplifying, and integrating their data and cloud technologies. They are seeking a qualified Data Pipeline Engineer. This position will be located in Durham, NC.
We are looking for a data pipeline engineer to be part of our enterprise-wide data lake initiative. The candidate will be responsible for designing and building data pipelines for various sources across the business units. The Data Engineer will work with development and product teams, architects, analysts, and data scientists. We expect the candidate to deliver an optimal data pipeline that is consistent and repeatable across the business units. The candidate must be able to work independently, with minimal guidance from the development teams, and be comfortable supporting the data needs of multiple business units. This is a unique opportunity to build reusable pipelines from the ground up, at the enterprise level, for implementation by multiple business units.
Responsibilities for Data Engineer
- Design and develop the data pipeline components, including extraction, ingestion, preparation, and curation of the data.
- Understand batch and near-real-time processes from a data pipeline perspective.
- Schedule pipeline execution, accounting for cross-business-unit dependencies.
- Implement nonfunctional aspects of the pipeline such as performance, logging, monitoring, restartability, validation, regression test suites, CI/CD, and QA automation.
- Work with business unit product teams to assist with data-related technical issues and support their data consumption needs.
Qualifications for Data Engineer
3-5 years of experience in the following technical areas:
- Overall AWS experience (EC2, S3, etc.); AWS certification is a plus.
- Python experience building pipelines is a must.
- Advanced working knowledge of and experience with SQL and/or Snowflake is a must.
- Experience with a DAG tool such as Apache Airflow is a plus.
- Knowledge of scheduling tools such as Control-M is a plus.
- Debugging and root cause analysis experience is a must.
- Knowledge of streaming tools such as Spark and Kafka is a plus.