Trimble Cloud xOps
Trimble Cloud xOps is a shared services organisation for Trimble divisions delivering products and services to the market using public cloud. We offer public cloud access and billing, common infrastructure and security services, consulting and application operations, and a suite of DevOps tools hosted for the enterprise. As public cloud usage at Trimble grows, Trimble Cloud xOps is looking to expand the team and the breadth of our service offerings.
Title: Lead Site Reliability Engineer (DevOps)
Location: Ipswich
Department: Cloud Ops (DevOps)
Position Overview:
In this role, you’ll be the expert in monitoring web applications and IoT services, driving improvements in availability and in supportability by our 24x7 AppOps team. You’ll seek out state-of-the-art solutions and put them into practice at a large and diverse scale.
Primary Responsibilities:
- Drive intelligent monitoring for supported Trimble apps, based on actionable alerts (reducing the volume of non-actionable alerts)
- Close monitoring gaps for supported Trimble apps
- Define monitoring standards for our customers and create automation (e.g., JSON templates), including defining the value of each monitor
- Be the expert in prescribing monitoring for new Trimble apps onboarding to our services
- Define process improvements and drive consistency across supported Trimble apps
- Install and configure applications, including custom and off-the-shelf products
- Work with monitoring and dashboarding tools such as Datadog, Pingdom and New Relic, logging tools such as Sumo Logic, and API monitoring tools such as Runscope; assess, plan and support core monitoring platform services
- Work with monitoring tool vendors on platform fixes and enhancements that address business needs
- Lead the Monitoring-as-Code initiative
- Implement self-healing scripts that monitor the health of multiple monitoring tools and recover them when they fail
- Own monitoring integration with downstream tools for alerting, ticketing and customer dashboards
Required Skills:
- Proven experience in IT, application development or DevOps, including excellent knowledge of networking, computing and storage
- In-depth experience with AWS
- Bachelor’s degree or equivalent in Computer Science, Engineering or a related field, or additional comparable experience
- Strong understanding of application architecture, common failure modes and the development process
- Strong understanding of 24x7 application operations, including Incident Management, Change Management and Capacity Management
- Experience designing complete monitoring and remediation solutions using operations tools including Datadog, Sumo Logic, Pingdom, Runscope, New Relic, PagerDuty, and ITSM ticketing platforms such as Freshservice
- Expert knowledge in Windows or Unix system administration
- Scripting skills in languages such as shell, Python, PowerShell or Ruby, including automation with REST APIs and manipulation of JSON
- Excellent written and verbal communication and customer service skills
- Excellent troubleshooting and problem-solving skills
- Strong desire to learn new things independently
- Knowledge of security best practices
Preferred Skills:
- Proven experience with AWS, including AWS certification in SysOps, DevOps or a related area
- Experience with cost optimization on AWS and/or Azure
- Developing and supporting shared services
- Configuration management tooling, including Puppet, Amazon SSM, Ansible & Terraform
- Experience managing tasks / priorities and code using tools including Jira, Git and Jenkins/Bamboo
- SAML integration and Active Directory / LDAP directory services
- AWS SDK experience
- RDBMS or NoSQL database experience
- Working knowledge of Microsoft Azure
- Expert knowledge in Windows and Unix system administration
- Understanding of the software development lifecycle and agile processes
- Able to travel to international destinations when required
Trimble Inc. is proud to be an Equal Opportunity and Affirmative Action Employer and considers qualified applicants for employment without regard to race, gender, age, color, religion, national origin, marital status, disability, sexual orientation, status as a covered veteran in accordance with applicable federal, state and local laws, or any other protected factor. EOE/M/F/V/D