Senior Site Reliability Engineer
- Los Angeles, CA, United States
- San Francisco, CA, United States
- Irvine, CA, United States
Who we are:
We are a tech company operating a thriving, growing broadcast platform. Alexa ranks us in the top 100 sites internationally and the top 25 in the United States, with approximately 10 million daily users and a worldwide community of fans. Independent broadcasters use our platform to create and share live streaming video, photographs, and similar content. The content is generally adult in nature, but adult content is not required.
Site stats you will improve:
- 728+ Nvidia P100/T4 GPUs
- 32k+ physical cores across 24 carrier hotels, with 6 Tbps capacity
- 10k+ concurrent live video broadcasts
- 400k+ concurrent live video streams
- 26B+ weekly web requests
- 95% of web requests completed in 59ms-72ms
- 2M database queries per minute, average response 3.5ms
- Redis clusters handling 300k+ commands/sec
What you will do:
- Performance analysis to identify sources of instability using data from APM and distributed telemetry tools
- Analyze complex systems to identify operational surprises and minimize downtime
- Software engineering and patching to incrementally improve performance, scalability, and reliability
- Infrastructure modifications in both a data center metal environment with advanced routing/switching and in the public cloud
- Predictive failure analysis and disaster planning
- Author new tools and automation to streamline the DevOps pipeline
- Collaborate with Frontend/Backend engineering, QA, DevSecOps, and Data teams
- Database and key-value store administration and configuration with a focus on uptime and performance
- Incident response and postmortem reports
What you bring:
- STEM degree and relevant experience as a Site Reliability Engineer
- Exceptional problem-solving skills
- High proficiency in at least one language such as C, C++, Java, Python, or Go
- High proficiency in a Unix/Linux environment, with excellent knowledge of internals (e.g., filesystems, system calls)
- Networking knowledge (e.g., routing, switching, TCP stack) for both metal and cloud (VPC, Security Groups) environments
- Experience in database administration and configuration
- Experience with DevOps tools such as Ansible, Docker, and Kubernetes
- On-call availability to respond to monitoring and alerting for core website functions as needed
- Experience in growing data center teams (nice to have)
What you will receive:
- A strong team of A-players
- A robust engineering culture
- Opportunity to make an impact on a highly popular product
- Freedom to bring ideas to the table and to make technical decisions
- Support and guidance of a highly professional and knowledgeable team
- Flexible working environment
We are a technology company that operates a sex-positive video and chat platform, Chaturbate.com, that allows a very diverse, worldwide group of independent broadcasters to safely earn a very comfortable income from home. Our community has millions of fans, and we rank among the top 100 of all websites.
We have a great team of driven individuals who love being creative while solving challenging problems. We take pride in our work and love being the best at what we do.
Skip straight to final-round interviews by applying through Triplebyte.