Infrastructure Engineer at ClusterOne

ClusterOne serves large enterprise customers, and researchers rely on our product to build the next AI breakthroughs for a variety of industries, including life sciences and robotics. Our products address various aspects of managing the training and deployment of large-scale Machine Learning models, which requires handling thousands of servers and petabytes of data across various clouds and data centers, securely and efficiently. 

We seek engineers with a passion for enabling and empowering researchers and developers by building fast and reliable AI-related infrastructure. In this role you will solve difficult problems together with a team of extremely knowledgeable and talented people. 

Key Qualifications 
  • Passionate about Continuous Build, Integration, Test, Deployment, Delivery and the DevOps culture 
  • Proficient in Golang and Python
  • Experience with the internals of Docker
  • Deep understanding of UNIX/Linux 

Description 
This is an exciting role for a software engineer who is comfortable taking ownership of specific solutions we are looking to develop. This role demands a strong understanding of containers, build systems, and infrastructure automation. Here, you will be hands-on in developing the tools that orchestrate containers. These tools will help solve problems ranging from providing immutable build infrastructure and CI pipelines to tooling for distributed Golang/Python builds. This role requires an experienced engineer who is used to challenging the status quo.
A person in this role will thrive in an ambiguous and fast-paced environment, operating at the tactical level while solving difficult problems with a high degree of autonomy. 

On your first day, we’ll expect you to have:
  • Software development experience with Python or Golang
  • Deep understanding of Linux systems. 
  • Serious troubleshooting skills across different levels of the stack.
  • Experience with AWS cloud infrastructure. 
  • Expertise in monitoring distributed systems and application architectures. 
  • Ability to diagnose and resolve problems in high-throughput web applications and network services. 
  • Solid communication skills with team members near and far. 
  • Experience with microservice architectures and container management tools such as Docker

It’s great, but not required, if you have:
  • Experience with building, automating, and maintaining infrastructure in Amazon Web Services. 
  • Experience with Kubernetes. 
  • Experience with cloud monitoring services such as Datadog. 
  • Experience working with Atlassian products such as Jira
  • Advanced networking experience. 
  • Experience working with a geographically distributed team.

Education 
Technical BS/MS/PhD or relevant industry experience.

