Our data platform enables analytics, experimentation, machine learning models, streaming, reporting infrastructure, and systems metrics, which together power and drive innovation at UserTesting. Our team of data engineers and scientists is focused on creating a competitive advantage for UserTesting and our customers through novel data infrastructure, metrics, insights, and data services. We are a small but rapidly growing team that builds and leverages state-of-the-art analytics systems, especially around video and Natural Language Processing (NLP).
This role works with Data Scientists and Software and Systems Engineers on large-scale operations focused on improving the reliability of the Data Infrastructure, Streaming Infrastructure, and Data Products teams, as well as helping create the foundational architecture for rapid ML development. The large volume of video data we store and the millions of transactional and event records we process each day provide a great opportunity to apply machine learning techniques to design powerful and meaningful experiences for our users.
You will help the ML Engineers deliver applications with minimal delays, at the right resource footprint, and with elasticity, while ensuring robust security, privacy, and confidentiality. You are equally comfortable enabling data pipelines that transport data into and out of our platform through custom-built systems.
- Improve team health by empowering teams to balance operational responsibilities with development
- Create training and deployment infrastructure for the Machine Learning team, which specializes in Deep Learning for Natural Language Processing using TensorFlow
- Manage Machine Learning and Data Platform infrastructure in Amazon Web Services with a bias towards automation, drawing on deep familiarity with its core compute, networking, storage, security, compliance, serverless, and analytics offerings, including VPC networking and IAM roles
- Define, develop, and maintain a monitoring and reporting infrastructure
- Be the principal team resource for security, performance, and reliability concerns
- Set and maintain team standards for our production environment
- Provide a key perspective in architecture discussions with an eye towards reliability, maintenance, scalability, etc.
- Establish SLOs/SLAs for services and support the team in meeting them
- Implement emergency response protocols for the team, including delegating pager duty
- 3+ years of DevOps/SRE experience, with deep familiarity with DevOps and Site Reliability Engineering principles
- 3+ years of programming experience with Python (and ML packages) and at least one of Scala, Ruby, Java, C/C++, or a related language
- Experience with AWS technologies such as EC2, CloudFormation, EMR, S3, Kinesis, and Redshift
- Knowledge of validated approaches for scaling and productionizing models and for applying machine learning to large and diverse datasets (especially expertise in techniques for Deep Learning at scale)
- Experience deploying Machine Learning and Deep Learning algorithms on cloud-based services (preferably AWS)
- Expertise in configuration management and CI/CD systems (Docker, Kubernetes, CloudFormation, etc.)
- Familiarity with asynchronous messaging systems (e.g. Kinesis, Kafka, ActiveMQ, RabbitMQ, AWS SQS)
- Self-driven and passionate about solving complex problems while maintaining a learning mindset when engaging with the team on ideas and solutions
Besides a great work environment and the opportunity to change the world, we offer a competitive salary, benefits, plenty of perks, and equity participation.
UserTesting is an Equal Opportunity Employer and participant in the U.S. Federal E-Verify program. Women, minorities, individuals with disabilities and protected veterans are encouraged to apply. We welcome people of different backgrounds, experiences, abilities and perspectives. UserTesting will consider qualified applicants with criminal histories in a manner consistent with the San Francisco Fair Chance Ordinance.