Data Engineers


Help build technology that positively affects hundreds of thousands of Injured Workers every day!

 
CLARA Analytics drives change in the commercial insurance markets with easy-to-use artificial intelligence (AI)-based solutions that dramatically reduce claims costs by anticipating the needs of claimants and helping align the best resources to meet those needs. Leading examples of our solutions include CLARA Providers, an award-winning provider scoring engine that helps rapidly connect injured workers to the right providers, and CLARA Claims, an early warning system that helps frontline claims teams efficiently manage claims, reduce escalations, and understand the drivers of complexity. CLARA’s customers span a broad spectrum, from the top 25 insurance carriers to small, self-insured organizations. For more information, visit www.claraanalytics.com.

Job Description

If you are a Data Engineer with a craving for making sense of structured and unstructured data and a desire to affect people’s lives in a positive way, please read on!

We are looking for a Big Data Engineer who will collect, store, process, and analyze huge sets of data. The primary focus will be on working with the Data Team to design technologies that wrangle, standardize, and enhance our master data repositories, and then on implementing, maintaining, and monitoring them. You will also be responsible for integrating them with the architecture used across the company.

A skill set unique to this job is the ability to translate Python code into clean, high-quality Spark/Scala libraries that can be reused within our platform, and to create orchestration workflows that ingest structured and unstructured data, enhance it, and make it available for use throughout the platform.
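
As a rough illustration of that first skill, here is a minimal sketch of what a Python cleaning step re-expressed as a reusable Spark/Scala transformation might look like. The object name, column names, and S3 paths are hypothetical examples, not part of our actual platform.

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, lower, to_date, trim}

    // Hypothetical reusable library of provider transformations.
    // Column names and rules are illustrative only.
    object ProviderTransforms {
      // Standardize raw provider records: normalize names, parse dates,
      // and drop rows that are missing a provider identifier.
      def standardizeProviders(raw: DataFrame): DataFrame =
        raw
          .withColumn("provider_name", trim(lower(col("provider_name"))))
          .withColumn("first_seen", to_date(col("first_seen"), "yyyy-MM-dd"))
          .filter(col("provider_id").isNotNull)
    }

    object StandardizeProvidersJob extends App {
      val spark = SparkSession.builder()
        .appName("provider-standardization")
        .master("local[*]")
        .getOrCreate()

      // The same transformation can be reused from any batch job,
      // streaming job, or notebook in the platform.
      val raw = spark.read.option("header", "true").csv("s3://example-bucket/providers/raw/")
      ProviderTransforms
        .standardizeProviders(raw)
        .write.mode("overwrite")
        .parquet("s3://example-bucket/providers/clean/")

      spark.stop()
    }

Keeping each transformation as a pure DataFrame-to-DataFrame function makes it straightforward to unit-test and to reuse from batch jobs, streaming jobs, and notebooks alike.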

Qualifications

  • A minimum of 4-6 years of experience implementing large-scale production systems
  • Experience with Java or Scala build systems: Maven, Ant, sbt
  • OO design and implementation
  • Understanding of database design (SQL/NoSQL)
  • Experience with multiple Apache Hadoop / Spark ecosystem applications, such as Spark, Hadoop, Hive, and Zeppelin
  • Experience with Python 
  • Experience building and operating at scale 
  • Excellent analytical and problem-solving skills
  • BS/MS in Math, Computer Science, or equivalent experience

Extra points for experience with:

  • NoSQL data storage solutions
  • AWS
  • Continuous delivery
  • Microservice architectures
  • Selecting and integrating any Big Data tools and frameworks required to provide requested capabilities
  • Implementing ETL processes
  • Monitoring performance and advising on any necessary infrastructure changes
  • Defining data retention policies
  • Creating and maintaining optimal data pipeline architecture
  • Assembling large, complex data sets that meet functional and non-functional business requirements
  • Identifying, designing, and implementing internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
  • Building the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies
  • Building analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics
  • Working with stakeholders, including the Executive, Product, Data, and Design teams, to assist with data-related technical issues and support their data infrastructure needs
  • Keeping our data separated and secure across national boundaries through multiple data centers and AWS regions
  • Creating data tools for analytics and data science team members that assist them in building and optimizing our product into an innovative industry leader
  • Working with data and analytics experts to strive for greater functionality in our data systems
  • Managing databases running on MySQL, MongoDB, and Redis

Technologies

  • Object-oriented and functional languages: Scala, Java, Python
  • Big data tools: Spark, Hadoop, Kafka, etc.
  • Relational SQL and NoSQL databases, including Postgres and Cassandra
  • Data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
  • AWS cloud services: EC2, S3, EMR, RDS, DynamoDB
  • Stream-processing systems: Storm, Spark Streaming, etc. (see the sketch after this list)
  • Microservices
  • Docker
  • Elastic caching (Redis, Memcached)
  • Git, JIRA & Confluence
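
For illustration only, a minimal Spark Structured Streaming sketch that reads claim events from Kafka and lands them for downstream enrichment might look like the following. The topic name, broker address, and S3 paths are assumptions for the example, not a description of our actual pipeline, and the job requires the spark-sql-kafka connector on the classpath.

    import org.apache.spark.sql.SparkSession

    object ClaimEventStream extends App {
      val spark = SparkSession.builder()
        .appName("claim-event-stream")
        .getOrCreate()

      // Read a stream of claim events from a (hypothetical) Kafka topic.
      val events = spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "claim-events")
        .load()
        .selectExpr("CAST(value AS STRING) AS payload", "timestamp")

      // Land the raw payloads as Parquet for downstream batch enrichment.
      val query = events.writeStream
        .format("parquet")
        .option("path", "s3://example-bucket/claim_events/")
        .option("checkpointLocation", "s3://example-bucket/checkpoints/claim_events/")
        .start()

      query.awaitTermination()
    }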

Requirements

  • Proficient understanding of distributed computing principles
  • Understanding of how to manage Hadoop clusters, with all included services
  • Ability to solve any ongoing issues with operating the cluster
  • Proficiency with Hadoop v2, MapReduce, HDFS
  • Experience building stream-processing systems using solutions such as Storm or Spark Streaming
  • Good knowledge of Big Data querying tools, such as Hive
  • Experience with Spark and Scala
  • Experience with integration of data from multiple data sources
  • Experience with NoSQL databases, such as HBase, Cassandra, MongoDB
  • Knowledge of various ETL techniques and frameworks, such as Flume
  • Experience with various messaging systems, such as Kafka or RabbitMQ
  • Experience with Big Data ML toolkits, such as Mahout, SparkML, or H2O
  • Good understanding of Lambda Architecture, along with its advantages and drawbacks
  • Experience with NiFi, Kylo, or other ETL tools, such as those bundled with Cloudera/MapR/Hortonworks distributions
  • Advanced working knowledge of SQL, including authoring complex queries, plus experience with relational databases and working familiarity with a variety of database technologies.
  • Experience building and optimizing ‘big data’ data pipelines, architectures and data sets.
  • Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
  • Strong analytic skills related to working with unstructured datasets.
  • Experience building processes supporting data transformation, data structures, metadata, dependency, and workload management.
  • A successful history of manipulating, processing and extracting value from large disconnected datasets.
  • Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores.
  • Strong project management and organizational skills.
  • Experience supporting and working with cross-functional teams in a dynamic environment.
  • 4+ years of experience in a Data Engineer role and a graduate degree in Computer Science, Statistics, Informatics, Information Systems, or another quantitative field.
