Manager (Customer Reliability Engineer) (MTN Software Solutions)



MTN Group
Verified
Featured
Johannesburg, MTN Head Office, Innovation Centre, Fairland
Gauteng, ZA
Posted Date:
19 Sept 2025
Closing Date:
27 Sept 2025
Tech
Full-Time
Executive level
Hybrid
Negotiable
Responsibilities
Identify and address potential reliability risks before they impact customers by implementing observability tools, runbooks, and automated responses.
Drive reliability improvements that also reduce operational costs by eliminating manual processes, optimizing resource usage, and reducing reactive work.
Ensure the continuous and stable operation of customer-facing systems by applying reliability engineering principles and best practices.
Oversee timely incident response, root cause analysis, and implementation of long-term fixes to prevent recurring issues and improve service resilience.
Build and lead a high-performing reliability engineering team, providing coaching, mentorship, and career development to support individual and team growth.
Work closely with software engineering, DevOps, product, and support teams to embed reliability into the end-to-end service lifecycle.
Ensure effective monitoring systems, dashboards, and alerts are in place to detect, respond to, and analyse system performance and failures.
Define and drive the implementation of a reliability roadmap aligned with business objectives, system scalability, and customer needs.
Translate system performance into customer impact metrics (e.g., NPS, downtime minutes) and work to continuously enhance the end-user experience.
Track and report on key reliability metrics such as uptime, latency, error rates, and incident frequency to support transparency and data-driven decisions.
Proactively identify technical and operational risks, ensuring mitigation strategies are in place and aligned with compliance standards.
Foster a culture of experimentation and improvement by exploring automation, new tools, and process enhancements to strengthen reliability practices.
Qualifications
Education:
Bachelor's Degree in Computer Science, Software Engineering, Information Technology, or a related technical discipline.
Certifications in relevant areas such as Site Reliability Engineering (SRE), DevOps, ITIL, or Cloud Infrastructure (e.g., AWS, Azure, GCP) are highly desirable.
A Master's Degree in Technology Management, Engineering, or Business Administration is an added advantage.
Experience
7–10 years of experience in IT operations, systems engineering, or reliability engineering within a technology-driven environment.
At least 3–5 years in a leadership or managerial role, with proven experience leading reliability or DevOps teams.
Skills
Hands-on experience implementing and managing observability platforms, monitoring tools (e.g., Prometheus, Grafana, Splunk), and automation frameworks.
Demonstrated ability to lead incident response efforts, conduct root cause analysis, and implement sustainable, long-term service reliability improvements.
Experience working in agile environments and with cross-functional teams, including software development, infrastructure, product, and support.
Strong understanding of cloud-native technologies, container orchestration (e.g., Kubernetes), CI/CD pipelines, and infrastructure as code (e.g., Terraform, Ansible).
Job Identification
(6489)
Spanisam is 100% free for job seekers, and no fees are allowed at any stage of the application process.
Any request for payment should be reported to us immediately.
You Find Jobs. We Build Careers. Own Your Future.
South Africa's leading job board connecting talented professionals with amazing opportunities across the country.
© 2026 Spanisam. All rights reserved.