Key Responsibilities
Infrastructure & Environment Management
• Manage the full lifecycle of environments (Dev / QA / Staging / Production) across cloud platforms
• Design, provision, and maintain infrastructure primarily based on Kubernetes
• Operate and manage Kubernetes clusters and containerized workloads in production
• Ensure systems are scalable, highly available, and cost-efficient
Deployment, Automation & GitOps
• Implement and manage GitOps-based deployments using ArgoCD or FluxCD on Kubernetes
• Design and maintain CI/CD workflows aligned with GitOps practices
• Build and manage infrastructure using Terraform and/or Ansible
• Automate operational processes to reduce manual intervention
• Develop internal tools and scripts using Go, Python, or TypeScript
• Integrate systems with external services (GitHub/GitLab, Slack, webhooks, APIs)
• Ensure consistent, repeatable, and auditable deployments across environments
Monitoring, Logging & Reliability
• Implement and maintain observability using Prometheus, Grafana, and InfluxDB
• Define alerting and monitoring strategies to ensure system health
• Analyze metrics and logs to proactively identify issues and performance bottlenecks
• Continuously improve system reliability and availability
Troubleshooting & Incident Management
• Investigate and resolve production issues in Kubernetes-based distributed systems
• Analyze logs and metrics to identify root causes
• Work with engineering teams to implement fixes and long-term improvements
• Participate in incident reviews and improve system resilience
Performance & Testing
• Design and execute load and stress testing scenarios
• Support creation of test environments and mock services when needed
• Analyze performance results and recommend optimizations
Architecture & Delivery Support
• Collaborate with developers and architects on system design (Kubernetes-native architectures)
• Provide input on scalability, high availability, and resilience
• Support project delivery by defining infrastructure and release requirements
• Work with PMs to ensure smooth and timely releases
Documentation & Collaboration
• Maintain clear documentation (runbooks, procedures, architecture diagrams)
• Support knowledge sharing within the team and with customers
• Participate in technical discussions with internal teams and external stakeholders
-----------------------------------------
Required Skills & Experience
Core Requirements
• Strong hands-on Kubernetes experience in production environments (mandatory)
• Proven experience with GitOps tools (ArgoCD or FluxCD)
• Strong experience with Infrastructure as Code: Terraform (preferred) and/or Ansible
• Solid Linux system administration skills
• Experience working with at least one major cloud provider (AWS / Azure / GCP / Alibaba Cloud)
Programming / Automation
• Hands-on coding experience in at least one of Go, Python, or TypeScript
• Ability to build tools, automation, and integrations (not just basic scripting)
• Comfortable with a 'vibe coding' mindset: pragmatic, iterative development that solves real operational problems quickly
Observability
• Experience with Prometheus, Grafana, and InfluxDB
• Strong understanding of monitoring, alerting, and system reliability
Nice to Have
• Experience with multi-cloud environments
• Experience operating high-scale distributed systems
Profile
• Senior-level experience in DevOps / SRE roles
• Strong ownership and problem-solving mindset
• Comfortable working across teams (Dev, QA, PM, Customers)
• Strong troubleshooting and system-level thinking
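About Us
Wiredcraft was founded in 2009 and officially joined Publicis Groupe China in June 2022. We have offices in Shanghai and Singapore and serve clients across China and the Asia-Pacific region. Our team of more than 100 local and international professionals works across technology, design, R&D, product management, strategy consulting, and data, helping companies deliver omnichannel, cross-border digital transformation strategies.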
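📩 How to apply for this position:
(Choose either of the following methods)
1. Apply via our website:
https://wiredcraft.com/expertise/
Apply through the website link above, or visit our official recruitment platform for more hiring information.
2. Email:
[email protected] (Send your résumé in Chinese and English to this address, using the subject format: Position applied for | Name | From V2EX)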