The Site Reliability Engineer (SRE) at Clarisights fills the mission-critical role of ensuring that our complex, web-scale system is healthy, monitored, automated, and designed to scale. You will use your background as an operations generalist to work closely with our development teams, from the early stages of design all the way through identifying and resolving production issues.
Responsibilities:
- Help keep our services up and running, meet SLAs, and scale our cloud infrastructure.
- Maintain and scale our databases, including sharding.
- Be accountable for SLAs (availability and latency), capacity planning, and efficient use of infrastructure (cost control).
Skills:
- Must have 2+ years of experience maintaining a production cluster.
- Experience in horizontal scaling.
- Must have on-call experience.
Preferred Qualifications:
- Python experience, specifically for systems automation.
- Broad experience with multiple types of database management systems (relational, document-oriented, key-value, time-series).
- Good RESTful API and systems design sensibilities.
- Experience with general performance tuning and optimization of all aspects of platforms and services (systems, network, code).
- Experience troubleshooting issues that span systems, network, and code.
- Broad understanding of Internet protocols and network programming.
- Excellent communication skills, initiative, and teamwork.