Trimble is an exciting, entrepreneurial company, with a history of exceptional growth coupled with a disciplined and strategic focus on being the best. While GPS is at our core, we have grown beyond this technology to embrace other sophisticated positioning technologies and, in doing so, we are changing the way the world works. Our solutions are used in over 140 countries and we have incredibly diverse lines of business.
Our employees represent this diversity and can be found in over 30 countries, working closely with their colleagues around the world. Due to our geographic, product and customer reach, there is plenty of room at Trimble for exceptional people to grow. Come position yourself with an innovative industry leader and set yourself up for success.
Trimble in Chennai is seeking a Site Reliability Engineer (SRE) for its Cloud Platform Engineering Team to build and run large-scale, distributed, fault-tolerant systems and services. The SRE for the Trimble Cloud platform will strive to make platform services secure, highly available, reliable and performant for our users by working closely with the Engineering squads. The SRE role offers a unique opportunity to advocate for and participate in building services that are resilient, effectively monitored, alerted on and self-healing by applying software engineering best practices.
- Design and troubleshoot software and systems in a distributed, internet-scale cloud environment.
- Improve service reliability through root cause analysis, postmortems, and using code to prevent or respond to problem recurrence.
- Automate routine tasks.
- Lead the way to continuously refine and improve our AWS or Azure deployment practices for improved reliability, repeatability, and security. You’ll create plans, collaborate with other DevOps team members, and coordinate with development teams and Product Managers. These high-visibility initiatives will help to increase service levels, lower costs, and deliver features more quickly.
- Write code and scripts to automate the provisioning of AWS services and to configure services, using tools and languages including AWS CLI / API, Terraform, Ansible, Python, Bash, and Git.
- Design effective monitoring / alerting (for conditions such as application errors or high memory usage) and log aggregation approaches (to quickly access logs for troubleshooting, or generate reports for trend analysis), working closely with business stakeholders to proactively notify them of issues and communicate metrics. Tools include AWS CloudWatch, Sumo Logic, and New Relic.
- Help refine DevSecOps security practices (including regular security patching, minimum-permission accounts, etc.) and verify them, using tools like Veracode to analyze and verify compliance.
- Clearly document and diagram deployment-specific aspects of architectures and environments, working closely with Software Engineers, Quality Engineers, and other DevOps Engineers.
- Troubleshoot issues in production and other environments, applying debugging and problem-solving techniques (e.g., log analysis, non-invasive tests), working closely with Development and QE teams.
- Suggest improvements to deployment patterns and practices based on learnings from past deployments and production issues, and collaborate with the DevOps team to implement them.
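As a small illustration of the kind of provisioning and monitoring automation described in the responsibilities above (the alarm name, metric namespace, and threshold here are illustrative assumptions, not details from this posting), a routine task such as defining a CloudWatch memory alarm can be expressed as code before being applied via the AWS CLI or an SDK:

```python
# Sketch only: builds the payload for a CloudWatch PutMetricAlarm request.
# The namespace, metric name, and threshold below are assumed examples.
def build_memory_alarm(instance_id: str, threshold_pct: float = 80.0) -> dict:
    """Return a PutMetricAlarm-style payload for high memory usage."""
    return {
        "AlarmName": f"high-memory-{instance_id}",
        "Namespace": "CWAgent",            # CloudWatch Agent's default namespace
        "MetricName": "mem_used_percent",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                     # evaluate over 5-minute windows
        "EvaluationPeriods": 2,            # require 2 consecutive breaches
        "Threshold": threshold_pct,
        "ComparisonOperator": "GreaterThanThreshold",
    }

alarm = build_memory_alarm("i-0123456789abcdef0")
print(alarm["AlarmName"])
```

Keeping alarm definitions in version-controlled code like this (or in Terraform) makes monitoring repeatable and reviewable, rather than configured by hand in a console.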
Skills and Qualifications
- 10+ years of experience, including 3 years of software engineering experience on public clouds such as AWS or Azure; solid experience managing environments from development to production
- Strong Linux system administration skills
- AWS Certified Solution Architect
- AWS administration experience, including provisioning EC2 instances, VPCs, Lambda functions, RDS databases, S3 storage, IAM security, ECS containers, and CloudWatch metrics & logs
- Experience developing and/or deploying serverless functions using AWS Lambda, Azure Functions, or Google Cloud Functions
- Experience developing and/or deploying Docker Containers on ECS or Kubernetes
- Experience with SQL DBMS
- Experience with monitoring / alerting tools such as New Relic, Grafana
- Experience with log aggregation tools such as Sumo Logic, Fluentd, ELK, Splunk
- Strong communication skills, ownership, and drive.
- Working knowledge of public cloud cost management and security best practices
- Experience or exposure to ISO 27001 certification
- Experience with containerization (Docker) and/or orchestration (e.g., Kubernetes) is a plus
- Experience with messaging technologies such as Kafka, RabbitMQ, etc.
- Strong working knowledge of web server technologies (e.g., Nginx, Tomcat)
- Strong troubleshooting knowledge of the JVM, Spring Boot, and .NET Core
- A systematic approach to solving problems.