Database Reliability Engineer
Seattle, WA, USA
This job was posted on:
September 2, 2020
The Database Reliability team at Outreach is responsible for automating the management of, and providing critical support for, all stateful systems that Outreach engineers use to build their services. In this capacity, DBREs are peers to the other SRE teams and bring stateful-systems experience to the SRE and product engineering teams. That means we need to be empathetic to the needs of our co-workers as they do their jobs. It also means that we must stay closely focused on how our systems are performing against our SLOs and SLIs.
About the Team
As a small team, DBREs provide guidelines and best practices for how stateful services are used at Outreach, along with critical support for managing these services. The DBRE team’s purview includes RDBMS systems, queues, NoSQL and object storage. Most of these are managed AWS services (RDS, ElastiCache, etc.), alongside self-hosted services (RabbitMQ, Kafka, etc.). With the continued growth of Outreach, it’s critical that we automate the provisioning and management of these systems. We are looking to grow the team with experienced software engineers who have deep technical knowledge of managing databases and queues at scale in a production environment.
You should join our team if you are interested in Kafka, MySQL and PostgreSQL, and in managing these systems at scale.
Your Daily Adventures Include
- Ensure the reliability and performance of Outreach.io’s databases, working from within the SRE team as well as across teams as needed.
- Tackle the automation of stateful infrastructure and help engineering succeed by providing self-service tools.
- Analyze solutions and implement best practices for our database clusters and other stateful components.
- Develop solutions for migrating data between systems.
- Provide database expertise to engineering teams (for example through reviews of database migrations, queries and performance optimizations).
- Work with peer SREs to roll out changes to our production environment and help mitigate database-related production incidents.
- Provide on-call support as part of a rotation within the SRE organization.
- Support and debug database production issues across services and levels of the stack.
- Proactively configure monitors and alerts that fire on symptoms, rather than only once an outage has occurred.
- Document every action so your learnings turn into repeatable actions and then into automation.
Qualifications
- Demonstrable expertise in database performance
- Experience with infrastructure as code, using tools like Chef, CloudFormation and Terraform
- Experience developing large-scale schema migration solutions
- Experience with database performance analysis in either MySQL or PostgreSQL
- Experience with NoSQL datastores like DynamoDB or Cassandra
- Demonstrated ability to implement proper test coverage
- Experience coaching and mentoring junior engineers
- Experience writing, deploying and maintaining production code at scale (production Go experience preferred)
- Experience in disaster recovery planning and execution
- Familiarity with running Kafka in production environments under heavy load
- Understanding of microservice architecture and best practices
- Knowledge of cloud-native architecture