Site Reliability Engineer

We are currently on the lookout for a Site Reliability Engineer to join our team and contribute to the operations of the world’s largest foodservice company.

RESPONSIBILITIES

• Maintaining service reliability by monitoring systems, responding to incidents, troubleshooting infrastructure issues, and identifying root causes in accordance with SLA/SLO targets

• Designing, building, and maintaining highly reliable, scalable, and observable systems and platforms

• Engaging and coordinating with internal engineering teams and third-party vendors to facilitate efficient incident investigation and resolution

• Resolving technical problems in patching, deploying, and managing systems, and in integrating them with third-party products, server platforms, or networks

• Proactively identifying improvement opportunities and product enhancements to increase efficiency

• Developing automations for incident detection, remediation, deployments, environment provisioning, and operational tasks

• Working closely with development teams to influence architecture and ensure services meet reliability and performance goals

• Maintaining and improving infrastructure through Infrastructure-as-Code tools and modern cloud-native technologies

REQUIREMENTS

• A Bachelor’s Degree in Computer Science, IT or equivalent qualification

• 1-2 years of experience in Site Reliability Engineering, Platform Support, Cloud Operations, or a similar role supporting enterprise-scale distributed systems

• Experience integrating services, troubleshooting complex issues, and performing root cause analysis in dynamic environments while collaborating across engineering and operations teams

• Strong analytical and problem-solving skills with the ability to quickly sift through artifacts/data to get to a root cause

• Hands-on experience with reliability engineering concepts: SLIs, SLOs, error budgets, chaos engineering, and incident management

• Experience in managing and scaling distributed systems in a public, private, or hybrid cloud environment

• Experience with cloud platforms such as AWS, Azure, or GCP, and a strong understanding of CI/CD concepts and tooling (GitHub Actions, Jenkins, Argo CD, etc.)

• Experience with Nginx, HAProxy, Docker, Kubernetes, Terraform, or similar technologies

• Strong self-motivation, flexibility, and willingness to accept additional responsibilities as they arise

• The ability to work in a roster-based environment including night shifts and weekend shifts

• Experience with DevOps concepts, tools, and automation

• The ability to program and automate in one or more languages, such as Python, Shell, Java, C/C++, or JavaScript
