Data Engineer - Python
RESPONSIBILITIES
● Build and maintain multiple data pipelines to ingest new data sources (API and file-based) and support products used by both external users and internal teams.
● Optimize our pipelines by building tools that evaluate and automatically monitor data quality, and by developing automated scheduling, testing, alerting, and distribution of feeds.
● Work with our data product manager and analytics team to design, rapidly prototype, and productize new data product ideas and capabilities.
● Work with the data engineering team to migrate our existing ETL pipeline to a new Python-based system and enhance it along the way.
● Conquer complex problems with simple, efficient approaches, focusing on the reliability, scalability, quality, and cost of our platforms.
● Build processes supporting data transformation, data structures, metadata, and workload management.
● Collaborate with the team to perform root-cause analysis and audit internal and external data and processes to help answer specific business questions.
REQUIREMENTS
Basic Qualifications:
● 3-5+ years of professional software engineering experience.
● Strong skills in the Python programming language and extensive knowledge of Python libraries and frameworks.
● Experience working with data lake architectures built on AWS S3 and Athena.
● Experience building data pipelines using PySpark on EMR or AWS Glue.
● Comfortable working directly with the analytics team to bridge business requirements and data engineering.
● Experience with AWS infrastructure, especially Lambda and Kinesis.
● Excellent troubleshooting and problem-solving skills.
● Experience with workflow management tools (Airflow, Oozie, Azkaban).
● Ability to operate in an agile, entrepreneurial start-up environment and to prioritize effectively.
● Excellent communication and teamwork skills, and a passion for learning.
● Strong Computer Science fundamentals and basic working knowledge of statistics.
Preferred Qualifications:
● Experience with Redshift, Snowflake, or other MPP databases.
● Experience with SQL.
● Familiarity with other distributed computing platforms (e.g., Hadoop, Spark, Storm).
● Some experience with LAMP-based web applications.