Note: This is a remote position open to candidates in the USA. Vizcom is a visual creation platform that combines modern web tooling with AI-powered workflows. They are hiring a Senior Platform & Reliability Engineer to own service reliability end-to-end, prevent incidents, and lead recovery efforts when production degrades.
Responsibilities
- Reliability bar: Set and enforce SLIs/SLOs/error budgets for critical user flows
- Production architecture resilience: Drive failure isolation across API, workers, queues, and dependencies so one subsystem cannot take down core access
- Kubernetes runtime reliability: Define probe contracts, rollout/rollback standards, graceful shutdown behavior, scaling/resource policies, and startup safety
- Queue + job safety (BullMQ/Redis): Own poison pill containment and workload isolation
- Incident command quality: Lead Sev1/Sev2 response end-to-end (containment, communications, technical direction, RCA, corrective action execution)
- Reliability operating system: Own observability quality (signals over noise), on-call effectiveness, runbooks, and postmortem discipline
- Release safety authority: Gate risky deploys and enforce reliability guardrails when production health is at risk
Skills
- Experience with setting and enforcing SLIs/SLOs/error budgets for critical user flows
- Proven ability to drive failure isolation across API, workers, queues, and dependencies
- Expertise in defining probe contracts, rollout/rollback standards, graceful shutdown behavior, scaling/resource policies, and startup safety in Kubernetes
- Experience with BullMQ/Redis for queue and job safety, including poison pill containment and workload isolation
- Demonstrated ability to lead Sev1/Sev2 incident response end-to-end
- Strong skills in observability quality, on-call effectiveness, runbooks, and postmortem discipline
- Ability to gate risky deploys and enforce reliability guardrails
- Calm, structured incident commander under pressure
- Ability to think in failure modes and blast radius by default
- Pragmatic approach to stabilizing quickly and implementing durable fixes
- High ownership and strong written communication skills
Company Overview
Vizcom builds tools that shorten the distance between having ideas and bringing them to life. It was founded in 2021 and is headquartered in San Francisco, California, US, with a workforce of 51-200 employees. Its website is https://www.vizcom.com.
Company H1B Sponsorship
Vizcom has a track record of offering H1B sponsorships, with 1 sponsorship recorded in 2026. Please note that this does not guarantee sponsorship for this specific role.