Site Reliability Engineer (SRE)
Exness, a leading fintech company and multi-asset broker, is looking for a Site Reliability Engineer to join Exness Technology, a team of more than 150 engineers who create cutting-edge solutions and continually raise the bar. We count on our site reliability engineers to give our users a rich feature set, high availability, and stellar performance so they can pursue their missions. As we expand our customer deployments, we are seeking an experienced SRE to deliver real-time insights from data at massive scale.
What you will do:
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding;
- Partner with development teams to improve services through rigorous testing and release procedures;
- Manage incident response, run blameless postmortems, and complete root cause analyses;
- Build sustainable systems and services through automation and continuous improvement;
- Balance feature development speed and reliability with well-defined service level objectives;
- Participate in system design consulting, platform management, and capacity planning;
- Contribute to handbooks, runbooks, and general documentation;
- Share knowledge, mentor colleagues, and create training materials.
What you need to succeed:
- At least 5 years of strong hands-on experience with Linux and Windows;
- At least 3 years of programming experience in Golang, Python, C++, or Java;
- Understanding of Linux and TCP/IP network fundamentals;
- Experience running services such as load balancers, relational databases, messaging systems and orchestration systems;
- Demonstrated expertise with Kubernetes and Docker in a hybrid environment;
- Experience with GitLab CI/CD and Terraform (IaC);
- Experience analyzing and troubleshooting systems;
- Ability to quickly learn new technologies, frameworks, and architectures.
Nice to have:
- Experience working with globally distributed systems or infrastructure;
- Experience with AWS, Alibaba Cloud, or Google Cloud;
- Experience with Nginx, HAProxy, Envoy, or Traefik;
- Experience with Istio or Consul service mesh;
- Experience with PostgreSQL, ClickHouse, MongoDB, or Elasticsearch;
- Experience with etcd, ZooKeeper, or Consul;
- Experience with Redis, Kafka, RabbitMQ, or NATS;
- Experience with Graylog, Loki, Prometheus, Thanos, Grafana, Zabbix;
- Experience with Jaeger, Sentry, or Datadog;
- Experience with WAFs and CDNs;
- Experience with Rancher 1.x and 2.x;
- Experience working with a NOC and SOC;
- Fluency in Golang and/or Python.
What we offer:
- Medical insurance for employees.
- Coworking expenses compensation.
- “Get to know your Team” trip to Cyprus.
- Sports activities compensation.
- Extensive learning opportunities.
- Flexible public holidays.
- Fully paid annual leave.
- Professional development.