Sr. Site Reliability Engineer, Big Data
- Silicon Valley, CA, United States
Machine Zone is seeking an experienced Site Reliability Engineer to join our engineering team and own Big Data within the company. You should have experience with large-scale big data environments and a knack for problem solving and optimization. Qualified candidates will lead a team of highly skilled engineers who push the limits of scalability through our specialized single-shard technology and cross-functional collaboration with all engineering groups at Machine Zone. You should be well-versed in and hands-on with different Big Data frameworks, and you will be responsible for architecture and design. You will be viewed as an expert in your field and should have experience leading or mentoring a team. As part of this role, you will help deliver high-quality online mobile MMOs, and your work will be seen by tens of millions of global players!
What you'll be doing:
- Hadoop / HBase Administration - maintaining, developing and implementing policies and procedures for ensuring the security and integrity of the clusters.
- Administer, manage and scale multiple Storm, Kafka and Druid clusters
- Monitor and resolve performance and capacity issues.
- Assist Engineering teams with troubleshooting and fine-tuning Spark and MapReduce jobs.
- Automation is in everything we do: create scripts and programs that automate daily tasks.
- Provision new servers for existing clusters and make sure they are monitored accordingly.
- Investigate new versions of Hadoop and other data stores as needed.
- Work closely with the Engineering teams in ensuring good practices are followed.
- Investigate and benchmark other Big Data solutions
- Implement graphs with Graphite or other monitoring tools.
Your background and who you are:
- 5+ years of experience in the job offered or a related field
- Extensive knowledge of Hadoop and HBase and their internals
- Knowledge of Storm and Kafka cluster administration
- Expertise in large scale, high volume operations environments
- Ability to optimize clusters for peak performance under heavy load
- Experience in automation using Bash or Python
- Good understanding and knowledge of Linux (CentOS)
- BS in Computer Science or a related field
- Knowledge of Druid cluster administration
- Knowledge of or experience with Kubernetes, Docker, and OpenShift administration
MZ is an equal opportunity employer and considers qualified applicants without regard to race, gender, sexual orientation, gender identity or expression, genetic information, national origin, age, disability, medical condition, religion, marital status or veteran status, or any other basis protected by law.
About Machine Zone
Machine Zone is a global leader in mobile gaming, with a track record of delivering some of the world’s most successful mobile games including Game of War, Mobile Strike, and Final Fantasy XV: A New Empire. We combine the power of technology and creative vision to create experiences that connect people from all corners of the globe.
Our massive mobile games break down linguistic and geographic barriers by uniting an unprecedented number of global players in one gaming world. We empower our game developers to push the boundaries of innovation in a player-driven ecosystem.
Everything we’ve built, we’ve built together. We are passionate about what we do. We take risks and push the boundaries of what is possible. We challenge each other continuously, learn from our mistakes, and support one another. We triumph together!