Senior Site Reliability Engineer: Systems

Salary Range: 300,000 - 500,000 RMB

About Instrumental:
Instrumental is creating the future of manufacturing, empowering hardware companies to optimize their factories through the use of artificial intelligence. Thanks to Instrumental, companies can:

  • increase yields (the percent of goods passing inspection);
  • decrease dark yield (the escape of goods that should have failed inspection); and
  • trace everything (to narrowly scope recalls to only those goods affected).

Our product is delighting customers, who now have access to technology that produces results previously unattainable. The Instrumental customer list is growing across a diverse set of manufacturing applications, and we are asking you to help us scale to the needs of that entire market.

About the role:
Interested in a Systems SRE role with a twist?

As an SRE at Instrumental, you'll apply your expertise in Linux configuration and software development to guide the development of our distributed compute platform and make sure our software deploys correctly, runs well, and can be modified at a moment's notice. The twist? This compute infrastructure is not in the cloud -- it's distributed across the most secure factory floors in the world.

This isn't a solo job -- you will work not only with fellow SREs as we build this new team, but also with developers, operations, and product staff in a quest to improve the way that things are made, wherever they are made.

What you can expect in the first few months:
  • Learning and improving our process for deploying Instrumental hardware and software systems
  • Becoming the go-to person for technical operational questions on our software and compute hardware in the field
  • Fixing breakages remotely and automating fix distribution to the rest of the fleet
  • Researching the problems we encounter and developing potential solutions to them, including the Great Firewall of China, VPN solutions, bandwidth issues, and end-to-end encryption
  • Improving our monitoring, alerting, and uptime
  • Planning the next iteration of our factory compute hardware & software

Requirements:
  • Fluency in English (CET >6 or TOEFL >80) – speaking, writing, and reading – this is a firm requirement for the role
  • Three or more years supporting high-availability systems in an SRE, Linux sysadmin, or DevOps role
  • Ability to work in both English & Mandarin, spoken and written
  • Deep experience scripting & configuring Linux systems, with an emphasis on hardening for security
  • Ability to solve software delivery and monitoring problems within unusual network constraints. This means working knowledge of TCP, DNS, NAT, routing, firewalls, and VPNs
  • A relentless desire to automate (or the foundation and desire to learn!)
  • Ability to effectively communicate complex ideas to technical and non-technical individuals alike
  • Ability to thrive in a fast-paced environment

Relevant Technobabble:
Ansible, Terraform, Bash, Docker, Python, Ubuntu, Squid Proxy, Packer

