
Site Reliability Engineer
- On-site
- Tehran, Tehrān, Iran, Islamic Republic of
- Tech
Job description
In this role, you will strengthen the SRE Platform team’s mission by advancing the foundational platforms that automate manual workflows and elevate system reliability. Your work will ensure our staging environments remain stable and production-like, empowering QA and development teams to test, validate, and deploy their applications with confidence. You will also contribute to operational excellence through active participation in the weekly on-call rotation, supporting consistent and dependable infrastructure performance.
Automate and optimize operational processes
Enhance and maintain the observability stack
Oversee the management of test and staging environments
Develop and support critical production components
Handle and resolve production incidents
Participate in the on-call rotation
Job requirements
Strong teamwork and collaboration skills
Solid understanding of SRE concepts, including SLIs, SLOs, SLAs, and Error Budgets
Proficiency in Python or another scripting language
Strong grasp of software engineering principles
Hands-on experience with observability and monitoring tools such as Prometheus and Grafana
Familiarity with logging stacks (e.g., ELK, Loki) and tracing systems (e.g., Jaeger, Tempo)
Understanding of RDBMS and Redis
Experience working with Kubernetes and related tooling (e.g., Helm)
