Job Summary
15+ years of experience in DevOps, infrastructure automation, and Kubernetes administration.
Proven leadership in managing on-prem container orchestration platforms at scale.
Architectural understanding of microservices, distributed systems, and secure automation frameworks.
Deep expertise in Docker, Kubernetes, OpenShift, and CI/CD tooling.
Experience with Helm, GitOps, and secure credential management.
Strong proficiency in Linux administration, Shell scripting, and Python.
Responsibilities
Kubernetes Cluster Leadership
Architect, administer, and scale enterprise-grade Kubernetes clusters in on-prem datacentres.
Lead cluster lifecycle management: provisioning, upgrades, patching, node pools, and capacity planning.
Define and enforce multi-tenant governance using RBAC, network policies, Pod Security Policies, and Namespaces.
Implement and optimize Ingress controllers, service meshes, and API gateways for secure traffic routing.
Establish high availability, disaster recovery, and backup strategies for cluster components and workloads.
Drive root cause analysis and resolution of complex cluster-level issues.
Containerization & Orchestration Strategy
Oversee containerization standards using Docker Compose and private registries.
Lead deployment and orchestration of microservices via Kubernetes and Helm.
Define resource optimization strategies, including autoscaling, affinity rules, and quota enforcement.
CI/CD Architecture
Architect and govern CI/CD pipelines.
Standardize build and release processes across diverse tech stacks.
Design reusable pipeline frameworks and automation templates for rapid onboarding and delivery.
Integrate CI/CD with Kubernetes for seamless rollout, rollback, and canary deployments.
AI Workflow Enablement (ClearML)
Lead integration of ClearML for experiment tracking, model versioning, and pipeline orchestration.
Collaborate with AI/ML teams to containerize models and automate GPU job scheduling.
Build and maintain custom ClearML agents and workflows for reproducible experimentation and deployment.
Scripting & Tooling
Develop robust automation scripts in Shell and Python.
Build internal tools and dashboards to enhance infrastructure observability and operational efficiency.
An understanding of NIM services, CUDA frameworks, and libraries/models from OpenAI and Hugging Face is good to have from an infrastructure perspective.
Certifications Required
Relevant certifications.
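About Cognizant
Cognizant (Nasdaq: CTSH) is an AI builder and technology services provider that builds full-stack AI solutions to help enterprises turn their AI investments into real value. Drawing on deep industry experience, process optimization, and engineering expertise, the company embeds each enterprise's unique business context into its technology systems, empowering organizations to unlock the potential of their people, deliver tangible outcomes, and stay ahead in a fast-changing world. To learn more, visit cognizant.ai or follow @cognizant.
Additional Employment Information
Compensation information is accurate as of the date of this posting. Cognizant reserves the right to modify this information at any time, to the extent permitted by applicable law.
Applicants may be required to attend interviews in person or by video conference. In addition, candidates may be required to present their current state or government-issued ID at each interview.
Cognizant is an equal opportunity employer. During the hiring process, your application and candidacy will not be affected by race, color, sex, religion, creed, sexual orientation, gender identity, national origin, disability, genetic information, pregnancy, veteran status, or any other characteristic protected by federal, state, or local law.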







