Manager (Customer Reliability Engineer) (MTN Software Solutions) (Id-04)

MTN Group

Johannesburg, MTN Head Office, Innovation Centre, Fairland

Gauteng, ZA

Posted Date: 19 Sept 2025

Closing Date: 27 Sept 2025

Tech

Full-Time

Executive level

Hybrid

Negotiable

Job Description

  • Achieve measurable improvements in system uptime and performance by implementing robust reliability engineering practices and leading incident prevention initiatives.

  • Reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) through streamlined incident response protocols and team readiness, ensuring minimal disruption to customers.

  • Build, lead, and develop a skilled team of Customer Reliability Engineers with a strong focus on ownership, collaboration, and continuous learning.

  • Ensure that reliability is embedded into service design, development, deployment, and operations by partnering with engineering, product, and operations teams.

  • Deliver clear and actionable reporting on reliability metrics to support leadership decision-making and continuous improvement.

  • Align reliability goals with customer expectations by addressing root causes of service degradation and championing seamless user experiences.

Responsibilities

  • Identify and address potential reliability risks before they impact customers by implementing observability tools, runbooks, and automated responses.

  • Drive reliability improvements that also reduce operational costs by eliminating manual processes, optimizing resource usage, and reducing reactive work.

  • Ensure the continuous and stable operation of customer-facing systems by applying reliability engineering principles and best practices.

  • Oversee timely incident response, root cause analysis, and implementation of long-term fixes to prevent recurring issues and improve service resilience.

  • Build and lead a high-performing reliability engineering team, providing coaching, mentorship, and career development to support individual and team growth.

  • Work closely with software engineering, DevOps, product, and support teams to embed reliability into the end-to-end service lifecycle.

  • Ensure effective monitoring systems, dashboards, and alerts are in place to detect and respond to failures and to analyse system performance.

  • Define and drive the implementation of a reliability roadmap aligned with business objectives, system scalability, and customer needs.

  • Translate system performance into customer impact metrics (e.g., NPS, downtime minutes) and work to continuously enhance the end-user experience.

  • Track and report on key reliability metrics such as uptime, latency, error rates, and incident frequency to support transparency and data-driven decisions.

  • Proactively identify technical and operational risks, ensuring mitigation strategies are in place and aligned with compliance standards.

  • Foster a culture of experimentation and improvement by exploring automation, new tools, and process enhancements to strengthen reliability practices.

Qualifications

Education:

  • Bachelor's Degree in Computer Science, Software Engineering, Information Technology, or a related technical discipline.

  • Certifications in relevant areas such as Site Reliability Engineering (SRE), DevOps, ITIL, or Cloud Infrastructure (e.g., AWS, Azure, GCP) are highly desirable.

  • A Master's Degree in Technology Management, Engineering, or Business Administration is an added advantage.

Experience

  • 7–10 years of experience in IT operations, systems engineering, or reliability engineering within a technology-driven environment.

  • At least 3–5 years in a leadership or managerial role, with proven experience leading reliability or DevOps teams.

Skills

  • Hands-on experience implementing and managing observability platforms, monitoring tools (e.g., Prometheus, Grafana, Splunk), and automation frameworks.

  • Demonstrated ability to lead incident response efforts, conduct root cause analysis, and implement sustainable, long-term service reliability improvements.

  • Experience working in agile environments and with cross-functional teams, including software development, infrastructure, product, and support.

  • Strong understanding of cloud-native technologies, container orchestration (e.g., Kubernetes), CI/CD pipelines, and infrastructure as code (e.g., Terraform, Ansible).

Job Identification

  • 6489

SPANISAM

You Find Jobs. We Build Careers. Own Your Future.

South Africa's leading job board connecting talented professionals with amazing opportunities across the country.

© 2025 Spanisam. All rights reserved.
