Site Reliability Engineer

  • On-site
    • Tehran, Tehrān, Iran, Islamic Republic of
  • Tech

Job description

In this role, you will support the SRE Platform team's mission by improving the foundational platforms that automate manual workflows and increase system reliability. Your work will keep our staging environments stable and production-like, so QA and development teams can test, validate, and deploy their applications with confidence. You will also take part in the weekly on-call rotation, helping keep infrastructure performance consistent and dependable.

  • Automate and optimize operational processes

  • Enhance and maintain the observability stack

  • Oversee the management of test and staging environments

  • Develop and support critical production components

  • Handle and resolve production incidents

  • Participate in the on-call rotation

Job requirements

  • Strong teamwork and collaboration skills

  • Solid understanding of SRE concepts, including SLIs, SLOs, SLAs, and error budgets

  • Proficiency in Python or another scripting language

  • Strong grasp of software engineering principles

  • Hands-on experience with observability and monitoring tools such as Prometheus and Grafana

  • Familiarity with logging stacks (e.g., ELK, Loki) and tracing systems (e.g., Jaeger, Tempo)

  • Understanding of relational databases (RDBMS) and Redis

  • Experience working with Kubernetes and related tooling (e.g., Helm)

or