DevOps Reliability Engineer

Department: Engineering

At Scality, we take pride in software development. We deliver innovative products through agile, continuous development and continuous integration. Scality is an industry-recognized leader in distributed file system and object storage, proven with 500 million users and over a trillion objects. You will be developing software for Scality’s open source, distributed, multi-cloud controller called Zenko. If you are excited by constantly challenging engineering problems and can deliver exceptional quality in a fast-paced, agile environment, we'd like to meet you.


Job Description:

You will be working in a team responsible for the core of our S3-endpoint service (S3 connector) and open source Zenko functionality, enabling policy-based data management and data mobility use cases. Our DevOps Reliability Engineers are a hybrid of software DevOps and systems engineers. We code our way out of operational problems. We are responsible for reliability, scalability, and automation while keeping an eye on latency, performance, and capacity.

The role involves interacting with our product owner and customer-facing technical sales team to understand customer use cases and requirements, hardening features with leads and product architects, and writing software to automate operations, systems engineering, management, and configuration so that features remain reliable and scalable. The DevOps Reliability Engineer will work throughout the stack, using GitHub for code versioning. The engineer will also be responsible for automated testing, leveraging pipeline-as-code practices to automate unit, functional, integration, and end-to-end tests, and for the quality, performance, scalability, and triage-ability of newly added features in the proprietary S3-endpoint service (S3 connector) and the open source Zenko storage product, using a variety of tools and technologies. The engineer will benefit from knowledge of distributed environments, service-oriented architectures, microservices architecture, containers, cloud native ecosystems, object-oriented design principles, async design patterns, REST, load balancers, API gateways, cloud terminology, Linux, Kubernetes, and related tools and technologies.


Responsibilities

  • Design, write, and maintain software to improve the availability, scalability, latency, and efficiency of S3C and Zenko services, incorporating third-party open-source tools when available.
  • Work closely with release engineering to automate use cases for test-driven deployment and operations in the S3C and Zenko distributed systems stack.
  • Design and implement smart tools and processes for continuous integration, development, and deployment.
  • Design, implement, execute, improve, and automate configuration management.
  • Own, maintain, and continuously improve all systems provided as a service, such as monitoring and logging, thereby improving product serviceability and triage-ability.
  • Anticipate, investigate, and fix performance bottlenecks, and engage in software performance analysis and system tuning.
 

Must-Have Qualifications

  • Fluent in two or more of: C/C++, JVM, JavaScript, Go, Python. (The language doesn’t matter; we believe you can learn the syntax of new languages if you have strong object-oriented fundamentals.)
  • 2+ years of industry experience in the software engineering field
  • REST API experience (we use Node.js and Go; knowledge of Node.js and awareness of server frameworks such as Express/HAPI is a plus)
  • Able to optimize the performance of your own code and of the system as a whole
  • Unit, functional, and integration test automation and TDD experience. Experience building an automation-focused culture in both testing and build/deploy (we have a fully integrated CI/CD pipeline; awareness of it is a plus)
  • Client-focused, reacts well to change, works well in teams, and is able to multitask across multiple projects
  • Experience working with Unix/Linux systems and an understanding of the high-level concepts (processes, virtual memory, file systems)
  • Familiarity with log analysis and triage using key troubleshooting tools (such as iperf, ps, lsof, and iostat)
  • Experience working with file systems and client-server protocols
  • Experience with network protocols and theory (TCP/IP, UDP, ICMP, MAC addresses, IP packets, DNS, OSI layers, load balancing, etc.)
  • Experience with Salt and Ansible, or another configuration management tool
 

Nice-to-Have Qualifications

  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
  • Kubernetes (a plus)
  • Familiarity with Kafka, ZooKeeper, MongoDB, Redis, Prometheus, Grafana, and the concept of containers.
  • Familiarity with object storage and cloud storage (AWS, GCP, Azure, or others) is a plus.
