Infrastructure Engineer at Clusterone
Clusterone serves large enterprise customers, and researchers rely on our product to build the next AI breakthroughs for a variety of industries, including life sciences and robotics. Our products solve various aspects of managing the training and deployment of large-scale machine learning models, which requires handling thousands of servers and petabytes of data across various clouds and data centers, securely and efficiently.
We seek engineers with a passion for enabling and empowering researchers and developers by building fast and reliable AI-related infrastructure. This role requires you to solve difficult problems together with a team of extremely knowledgeable and talented people.
Key Qualifications
- Passionate about Continuous Build, Integration, Test, Deployment, Delivery and the DevOps culture
- Proficient in Golang and Python
- Experience with the internals of Docker
- Deep understanding of UNIX/Linux
Description
This is an exciting role for a software engineer who is comfortable taking ownership of specific solutions we are looking to develop. This role demands a strong understanding of containers, build systems, and infrastructure automation. Here, you will be hands-on developing the tools that orchestrate containers. These tools will help solve problems ranging from providing immutable build infrastructure and CI pipelines to tooling for distributed Golang/Python builds. This role requires an experienced engineer who is used to challenging the status quo.
A person in this role will thrive in an ambiguous and fast-paced environment, operating at the tactical level while solving difficult problems with a high degree of autonomy.
On your first day, we’ll expect you to have:
- Software development experience with Python or Golang.
- Deep understanding of Linux systems.
- Serious troubleshooting skills across different levels of the stack.
- Experience with AWS cloud infrastructure.
- Expertise in monitoring distributed systems and application architectures.
- Ability to diagnose and resolve problems in high-throughput web applications and network services.
- Solid communication skills with team members near and far.
- Experience with container management tools such as Docker, and with microservice architectures.
It’s great, but not required, if you have:
- Experience with building, automating, and maintaining infrastructure in Amazon Web Services.
- Experience with Kubernetes.
- Experience with cloud monitoring services such as Datadog.
- Experience working with Atlassian products such as Jira.
- Advanced networking experience.
- Experience working with a geographically distributed team.
Education
Technical BS/MS/PhD or relevant industry experience.