
Infrastructure Observability Engineer
- On-site
- Tehran, Tehrān, Iran, Islamic Republic of
- Tech
Job description
Our Journey So Far
At Snapp, we’re redefining how cities move. Our ride-hailing and mobility platform connects millions of riders and drivers every day, delivering safe, reliable, and efficient transport solutions. Powered by real-time data and robust infrastructure, we make urban travel faster, simpler, and more sustainable.
We operate with the mindset of a global tech leader and the agility of a startup, building services that scale across markets while staying responsive to local needs.

Your Impact
As an Infrastructure Observability Engineer within the Platform team, you will work across observability platforms, infrastructure monitoring, and DevOps automation to ensure comprehensive visibility and high system reliability. You will maintain and enhance monitoring and logging stacks, analyze infrastructure events, and drive proactive improvements that strengthen performance and resilience. This highly technical role emphasizes automation and continuous optimization rather than reactive support.
What You’ll Drive Forward
Build, operate, and optimize monitoring and logging systems (Prometheus, Grafana, ELK, Zabbix, etc.).
Ensure full observability coverage for infrastructure, networks, and services.
Maintain alerting rules, dashboards, SLO/SLA metrics, and anomaly detection.
Analyze logs and metrics to identify patterns and potential risks.
Monitor infrastructure health across compute, storage, virtualization, and network layers.
Perform root cause analysis of network-related incidents (routing and switching, load balancing, DNS, firewalls).
Collaborate with network and datacenter teams on incident follow-ups.
Maintain knowledge of network topologies, protocols, and traffic flows.
Support improvement of infrastructure reliability and performance.
Work with CI/CD pipelines to ensure reliable delivery and deployment processes.
Develop automation for observability, monitoring, and operational workflows.
Maintain Linux-based systems and automate routine infrastructure tasks.
Contribute to reliability engineering initiatives (IaC, Docker, GitOps, auto-remediation, etc.).
What Powers Your Drive
At least 2 years of experience in NOC/IOC, SRE, infrastructure operations, DevOps, or a similar technical role.
Strong hands-on experience with monitoring & logging stacks (Prometheus, Grafana, ELK, Zabbix, etc.).
Solid understanding of networking fundamentals at CCNA level (routing, switching, VLANs, BGP, OSPF, load balancing).
Strong Linux administration background.
Familiarity with CI/CD tools (GitLab CI, ArgoCD, Jenkins, GitHub Actions, etc.).
Hands-on experience with containerization (Docker) and service mesh tools.
Practical knowledge of automation using Bash, Python, or similar scripting languages.
Ability to read and interpret logs, metrics, traces, and alerts.
Strong communication and documentation skills, especially in technical reporting.
Preferred Qualifications (optional)
Experience designing observability architecture for large-scale infrastructure.
Experience contributing to reliability engineering initiatives (Terraform, Ansible, Docker, GitOps, auto-remediation, etc.).
Knowledge of ITIL Incident/Problem Management practices.
Experience with cloud infrastructure or private cloud platforms.
Experience with Kubernetes (cluster operation, troubleshooting, manifests, Helm, etc.).
Ready to Get on Board?
Help us shape the future of ride-hailing and urban mobility. Submit your CV and let’s build smarter cities together.
