Lead Engineer

Description of Work & Required Skills:
 
  • Our environment is as follows:
  • We are sunsetting 2 large Netezza appliances with 1 TB of data.
  • The new database is Snowflake, hosted on AWS.  Note: this is NOT a lift-and-shift of the data and processes from Netezza to Snowflake.
  • In the future state, data will be minimally processed and legacy business rules in the ETL are not being carried forward.
  • The reference pattern for loading data is: acquire data from the source, minimally process it in SparkSQL (with other AWS services if required), partition and write the data to S3, then load the data into Snowflake.
  • Rudimentary monitoring & alerting and DevOps have been implemented and need to be enhanced.
  • The work we seek to outsource is as follows:
  • Acquiring data from internal and external sources, prepping it with minimal ETL in SparkSQL, landing it in Amazon S3 and loading it into a Snowflake database.
  • Working with our Data Platform team to register data into Collibra.
  • Implementing enhanced monitoring and alerting across the SparkSQL ETL and Snowpipe jobs.
  • Implementing a to-be-determined (TBD) lightweight DevOps enhancement capability.
  • Advisory services to assess what was implemented in AWS this year and how it can be improved.
  • Specifics
  • Load a temporary database with up to 50 TB of data to serve as an operational backup in the event a Netezza appliance experiences technical problems and an extended outage.
  • Potentially backfill some historical data.  The jobs for current daily loads work; historical data dating back to last year needs to be reprocessed with these jobs and loaded.  This will require putting an automation wrapper around the daily jobs so they can process the historical data.  Additional EMR clusters will likely need to be set up and configured.
  • Additional data sets are being identified this week to be acquired, minimally processed, landed in S3, and loaded into Snowflake.  This could be 10 or more data sets.
  • Monitoring and alerting of all jobs (from data acquisition to writing to S3) needs to be reviewed and then optimized in conjunction with our architecture team.  This must align with best practices.
  • Our DevOps model needs to be reviewed and an enhancement roadmap developed so it aligns with best practices.  The implementation approach will be developed with CA and timeboxed based on the available hours.
  • We also seek advisory services based on your expertise working in similar AWS environments.  Specifics to be determined before the SOW is submitted.
  • Key Skill Sets for Consulting Team Members
  • 1 person must serve as the hands-on technical leader and key point of contact for CA.
  • Strong Java skills in an Amazon Web Services (AWS) context.
  • Broad skills across the AWS service set.
  • Specifically: S3 partitioning, SparkSQL processing, Kafka, streaming patterns, pipelining, etc.
  • Cloud database expertise: Snowflake is preferred (and hard to find); Redshift, Vertica Eon, and other alternatives are acceptable.
  • All must have professional consultant demeanor and communication skills.
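To make the reference loading pattern and the historical backfill concrete, the automation wrapper described above (iterating the existing daily jobs over past dates, then loading the partitioned S3 output into Snowflake) could be sketched roughly as follows. This is an illustrative sketch only: the bucket, stage, and table names (`example-data-lake`, `landing_stage`, and the `dataset` values) are hypothetical placeholders, not part of the actual environment, and the real daily jobs would be invoked in place of the generated paths.

```python
from datetime import date, timedelta

# Hypothetical placeholder -- the real bucket name would come from the
# actual environment, not from this sketch.
S3_BUCKET = "s3://example-data-lake"


def daily_partition_path(dataset: str, load_date: date) -> str:
    """S3 prefix where one day's partitioned output of the daily job lands."""
    return f"{S3_BUCKET}/{dataset}/load_date={load_date.isoformat()}/"


def snowflake_copy_statement(dataset: str, load_date: date) -> str:
    """COPY INTO statement to load one day's partition from an external
    stage (assumed here to point at the bucket above)."""
    return (
        f"COPY INTO {dataset} "
        f"FROM @landing_stage/{dataset}/load_date={load_date.isoformat()}/ "
        f"FILE_FORMAT = (TYPE = PARQUET)"
    )


def backfill_dates(start: date, end: date):
    """Yield every date in [start, end] inclusive -- the automation wrapper
    would submit the unchanged daily job once per date, e.g. on additional
    EMR clusters running in parallel."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)
```

A wrapper built on these pieces would iterate `backfill_dates`, run the existing daily SparkSQL job for each date, and then either issue the generated `COPY INTO` statement or rely on Snowpipe auto-ingest to pick up the new S3 partitions.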


