Site Reliability Engineering Lead
Lulalend
Administration, Software Engineering
Cape Town, South Africa
Posted on Jan 16, 2026
OVERALL PURPOSE
We are seeking an experienced Site Reliability Engineering Lead to lead, mentor, and grow our SRE team. The ideal candidate will have a deep understanding of Microsoft Azure, cloud computing, and distributed systems.
As the SRE Lead, you will be responsible for the overall strategy and execution of our SRE function. You will guide your team to monitor, maintain, and improve our Azure-based infrastructure and applications, ensuring their reliability, scalability, and security.
KEY RESPONSIBILITIES
- Lead, mentor, and develop a high-performing SRE team, fostering a culture of ownership, collaboration, and continuous improvement.
- Manage the team's performance, including setting clear goals, conducting regular 1:1s, and supporting career development.
- Collaborate with the software engineering manager on the recruitment process to grow the SRE team, ensuring a high bar for technical skill and cultural fit.
- Own and manage the 24/7 on-call rotation and incident response process, acting as a key escalation point and driving effective root cause analysis (RCA) and remediation plans.
- Define and drive the SRE technical roadmap, partnering with the Engineering, DevOps, and SecOps teams to build and manage highly available, scalable, and resilient architectures on Azure.
- Oversee the platform's monitoring and alerting strategy, guiding the team to build a holistic view of infrastructure and application performance using tools like Azure Monitor.
- Champion automation by directing the team's development of scripts and tools that streamline the deployment and management of Azure services.
- Drive platform optimisation by analysing performance metrics and evaluating new Azure features and services to improve workflows.
- Ensure the security of the Azure infrastructure by enforcing security policies and best practices in partnership with the SecOps team.
- Foster a culture of delivery and innovation within the SRE team, encouraging experimentation.
THE EXPERIENCE WE’RE LOOKING FOR
- Matric certificate or equivalent.
- 5+ years of experience in a senior SRE, DevOps, or Cloud Infrastructure role, with deep knowledge of maintaining Azure infrastructure.
- At least 2 years of formal people management and leadership experience.
- Demonstrable experience leading incident response and root cause analysis.
- Strong understanding of Azure services such as App Service (Web Apps), Azure Functions, and Application Gateway.
- Strong experience with automation tools such as PowerShell, Azure CLI, and ARM templates.
- Deep experience with monitoring and logging tools such as Azure Monitor, Log Analytics, Application Insights, Logic Apps, and Grafana (or similar).
- Excellent troubleshooting, problem-solving, and strategic planning skills.
- Azure certifications, such as Azure Administrator Associate, are preferred.
- Strong familiarity with DevOps practices and tools such as Jira and OpsGenie.
OUR TECH STACK
- Cloud Platform: Microsoft Azure
- Automation & IaC: Azure CLI, Python, Azure DevOps, GitHub, and Terraform
- Monitoring & Observability: Azure Monitor, Log Analytics, and Grafana
- Operations & Incident Management: Jira, Microsoft Sentinel, and OpsGenie
- Dev Stack: .NET, React, MS SQL