Posted Jun 17, 2026

[Remote] Senior Site Reliability Engineer

Note: This is a remote position open to candidates in the USA. Doghouse Recruitment is seeking a Senior/Staff Site Reliability Engineer to join their client's team, which is building a cloud platform for high-throughput, compute-heavy workloads. The role involves owning production reliability, defining SLIs/SLOs, and improving deployment safety in a bare-metal environment.

Responsibilities

Define SLIs/SLOs
Run error budget conversations
Ship changes that reduce incidents and improve latency (p95/p99)
Build automation to kill toil
Improve deployment safety (canary/rollback)
Turn observability into signal rather than noise

Skills

Extensive Production Engineering experience running bare metal / on-prem / data center infrastructure (not public cloud only)
Deep hands-on expertise in Linux systems debugging and performance (CPU, memory, IO, - level behaviors)
Strong understanding of networking (DNS/TCP/TLS, latency, packet loss, congestion, troubleshooting under load)
Strong Kubernetes experience beyond manifests: scheduler behavior, autoscaling edge cases, kubelet pressure/evictions, etcd/control plane
Experience with Terraform, Docker, Helm, and modern CI/CD practices
Strong coding skills in Go and/or Python are required, beyond automation scripting; real engineering capability is a must
Experience in low-latency environments

Company Overview

Doghouse Recruitment provides recruitment for technology teams. You don't need another agency flooding your inbox with mismatched candidates. The company was founded in 2015 and is headquartered in Amsterdam, North Holland, NL, with a workforce of 11-50 employees. Its website is http://www.doghouse.nl.
