We are currently on the lookout for a Site Reliability Engineer to join our team and contribute to the operations of the world’s largest foodservice company.
RESPONSIBILITIES
• Maintaining service reliability by monitoring systems, responding to incidents, troubleshooting infrastructure issues, and identifying root causes in accordance with SLA/SLO targets
• Designing, building, and maintaining highly reliable, scalable, and observable systems and platforms
• Engaging and coordinating with internal engineering teams and third-party vendors to facilitate efficient incident investigation and resolution
• Solving technical problems related to patching, deploying, and managing systems and integrating them with third-party products, server platforms, and networks
• Proactively identifying improvement opportunities and product enhancements to increase efficiency
• Developing automations for incident detection, remediation, deployments, environment provisioning, and operational tasks
• Working closely with development teams to influence architecture and ensure services meet reliability and performance goals
• Maintaining and improving infrastructure through Infrastructure-as-Code tools and modern cloud-native technologies
REQUIREMENTS
• A Bachelor’s degree in Computer Science, IT, or an equivalent qualification
• 1-2 years of experience in Site Reliability Engineering, Platform Support, Cloud Operations, or a similar role supporting enterprise-scale distributed systems
• Experience integrating services, troubleshooting complex issues, and performing root cause analysis in dynamic environments while collaborating across engineering and operations teams
• Strong analytical and problem-solving skills with the ability to quickly sift through artifacts/data to get to a root cause
• Hands-on experience with reliability engineering concepts: SLIs, SLOs, error budgets, chaos engineering, and incident management
• Experience in managing and scaling distributed systems in a public, private, or hybrid cloud environment
• Experience with cloud technologies such as AWS, Azure, and GCP, and a strong understanding of CI/CD concepts and tooling (GitHub Actions, Jenkins, Argo CD, etc.)
• Experience with Nginx, HAProxy, Docker, Kubernetes, Terraform, or similar technologies
• Strong self-motivation, flexibility, and willingness to accept additional responsibilities as they arise
• The ability to work in a roster-based environment, including night and weekend shifts
• Experience with DevOps concepts, tools, and automation capabilities
• The ability to program and automate in one or more languages, such as Python, Shell, Java, C/C++, or JavaScript