Senior Site Reliability Engineer (Automation & Observability)
About the role
As a Senior Site Reliability Engineer, you will make an impact by driving automation, improving system reliability, and enabling intelligent, self-healing operations across critical services. You will be a valued member of the Engineering / Production Support team and work collaboratively with DevOps, platform engineering, and application teams to enhance performance and operational excellence.
In this role, you will:
- Design and implement automation solutions to eliminate repetitive manual support and operational tasks
- Define and manage Service Level Objectives (SLOs) and apply error budget principles to guide reliability and release decisions
- Build and enhance observability frameworks, including dashboards, monitoring, and alerting systems
- Develop runbooks and convert them into automated remediation workflows and self-service capabilities
- Implement self-healing solutions and optimize alerting to reduce noise and improve incident response efficiency
Work model
We strive to provide flexibility wherever possible. Based on this role’s business requirements, this is a hybrid position requiring 3 days per week in a client or Cognizant office in Pittsburgh, PA. Regardless of your working arrangement, we are here to support a healthy work-life balance through our various wellbeing programs. The working arrangements for this role are accurate as of the date of posting. They may change based on the project you’re engaged in, as well as business and client requirements. Rest assured, we will always be clear about role expectations.
*You must be legally authorized to work in the USA without the need for employer sponsorship, now or at any time in the future*
What you need to have to be considered:
- Proven experience in Production Support, Site Reliability Engineering (SRE), or DevOps environments
- Strong programming or scripting skills (Python, Java, or similar)
- Hands-on experience with automation, monitoring, and observability tools
- Solid understanding of SLOs, SLIs, error budgets, and reliability engineering principles
- Demonstrated ability to troubleshoot complex systems and drive root cause analysis and resolution
These will help you stand out:
- Experience implementing self-healing systems and intelligent automation (AIOps)
- Familiarity with alerting and event management tools such as Moogsoft or similar platforms
- Experience improving batch processing reliability and recovery patterns
- Track record of reducing incident volume through automation and permanent fixes
- Exposure to large-scale distributed systems and cloud-native environments
Salary and Other Compensation:
Applications will be accepted until July 30th, 2026.
The annual salary for this position is between $63,000 and $115,000, depending on the experience and other qualifications of the successful candidate.
This position is also eligible for Cognizant’s discretionary annual incentive program, based on performance and subject to the terms of Cognizant’s applicable plans.
Benefits: Cognizant offers the following benefits for this position, subject to applicable eligibility requirements:
- Medical/Dental/Vision/Life Insurance
- Paid holidays plus Paid Time Off
- 401(k) plan and contributions
- Long-term/Short-term Disability
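About Cognizant
Cognizant (Nasdaq: CTSH) is an AI builder and technology services provider that helps enterprises turn AI investments into real value by building full-stack AI solutions. Drawing on deep industry experience, process optimization, and engineering expertise, the company builds each enterprise's unique business context into its technology systems. This empowers organizations to unlock the potential of their people, deliver tangible outcomes, and stay ahead in a fast-changing world. To learn more, visit cognizant.ai or follow @cognizant.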
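Additional employment information
Compensation information is accurate as of the date of this posting. Cognizant reserves the right to modify this information at any time, to the extent permitted by applicable law.
Applicants may be required to attend interviews in person or by video conference. In addition, candidates may be required to present their current state or government-issued ID at each interview.
Cognizant is an equal opportunity employer. Your application and candidacy will not be affected during the hiring process on the basis of race, color, sex, religion, creed, sexual orientation, gender identity, national origin, disability, genetic information, pregnancy, veteran status, or any other characteristic protected by federal, state, or local law.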







