At CARDFREE, we ensure Merchant Mobile Apps delight users and keep them engaged. From innovative solutions for Offers and Loyalty to Stored Value and Order Ahead, we are building products that power user engagement and measure key metrics so merchants can sharpen their mobile business. We have won the best Mobile Startup Award with the MEA, been listed as part of the Red Herring Top 100 North America Award, and power the Dunkin’ Donuts mobile application.
The Sr. DevOps Engineer is responsible for the integrity and stability of production and non-production systems. This includes installation, monitoring, maintenance, support, and design.
As part of the Operations team, this role participates in building scalable solutions that millions of users rely on, now and in the future, as their primary mobile interface to submit and pay for their orders. We nurture creativity and encourage individuality to create a great work environment with personality. In turn, we encourage talent that doesn’t always want to do it the corporate way but strives to create innovative solutions that amaze, has some fun doing it, and builds good internal team relationships along the way.
The Sr. DevOps Engineer has a broad range of experience working in online e-commerce environments and is continually making a difference, adding his or her valuable skills to our platform solution. This role takes responsibility for critical contributions that have a big impact on the quality of our platform and services provided to our clients.
- You like to automate things. If it isn’t automated, it isn’t complete.
- You are familiar with a variety of cloud-based solutions and can recommend the best tool for the job.
- You are comfortable working in a fast-paced environment where direction changes quickly and process is always evolving.
- You are a self-starter who doesn’t wait for someone to give you a task.
- You are able to communicate complicated ideas simply and clearly.
- You can work well with a geographically distributed team.
- You enjoy working directly with project managers, developers, and QA to ship quickly while maintaining stability.
- You would rather provide value today than pontificate on the perfect solution.
- You keep your cool and work well under pressure.
Responsibilities:
- Augment our stack for deploying our cloud-based infrastructure
- Build tools to increase the velocity of our engineering teams
- Work with engineering teams to increase the operational stability of the platform
- Help drive the organization towards continuous delivery
- Monitor systems daily: verify the integrity and availability of all hardware, server resources, systems, and key processes; review system and application logs; and verify completion of scheduled jobs such as backups
- Implement a system for automatically scaling our platform to handle our increasing load
- Keep systems patched, secure, and up to date
- Participate in maintenance events and the on-call rotation
- Debug platform failures while working with the business on incident response
- Help ensure that in the event of catastrophic failure we can recover
- Additional duties as assigned
Requirements:
- 4+ years of experience building and/or operating software in a production environment
- 2+ years of experience working in a public cloud (AWS/GCE/Azure)
- Previous experience operating and scaling RDBMS
- Familiarity with scaling in a cloud-native environment
- Understanding of how to monitor common types of applications
- A firm understanding of HTTP and how HTTP APIs work
- Firm understanding of how to administer CI/CD platform solutions (TeamCity, Jenkins)
- Advanced understanding of Git
- Familiarity with using change management tools to manage a fleet of servers (Terraform, Chef, Ansible, Puppet)
- The ability to manage a mixed platform environment consisting of Linux and Windows
- Training engineering and new ops team members on our standard operating procedures.
Nice to Have:
- Demonstrated success scaling a platform
- You were a DBA in another life
- Experience with database management best practices on MySQL
- Experience administering Windows servers
- Experience working with IIS or Tomcat
- Experience working in a PCI-compliant organization
- Strong understanding of security best practices
- Experience with centralized logging (e.g., Splunk, ELK stack)
- Experience with monitoring solutions (Prometheus, New Relic, Grafana)
- Experience with distributed caches (Redis, Memcached, ElastiCache)