Job Description
THE TEAM
The Ticketmaster SRE team builds and runs large, complex, resilient, and reliable distributed systems that operate at huge scale. Our job is to ensure appropriate reliability for both internal and fan-facing stakeholders. Our focus is on continual improvement of our systems: keeping capacity and saturation at optimal levels, looking for opportunities to improve performance, simplifying infrastructure and architecture, using automation to eliminate work that doesn't directly add value, and providing information and actionable insights that help us understand our systems when they inevitably misbehave.
You will be part of the Marketplace SRE team, which is responsible for the stability of our systems. The team works closely with the engineering teams to deploy everything they require while keeping the systems safe.
The role is remote, but there is the option to work in any Ticketmaster office, and the team convene to work together throughout the year.
THE JOB
You will be working with various development teams across US and EU time zones, most of whom have services deployed in Kubernetes, either on-prem (Rancher) or in AWS (EKS), so we would expect you to have experience of K8s or similar container orchestration platforms.
We have many complex and highly distributed systems, spanning decades of history, so we would expect you to be comfortable working in a complex, multi-technology platform. Our business model demands that we sustain exceptionally high traffic at specific times, so staying calm and being able to troubleshoot under high pressure is essential.
We would expect you to contribute to improving our internal systems and to challenge their design and implementation. These systems are mostly written in Go and Python, so you will need skills in one, or ideally both, of these languages. The services we support are mostly written in Java, .NET, and JavaScript, so knowledge of these languages and platforms would be beneficial.
Challenging each other, and management, is mandatory in our team, so we would expect you to have the confidence to ask questions, whether they seem silly or critical.
Our team works closely with engineers developing a wide range of fan-facing services, and we would expect you to integrate quickly with these teams and with your own team members, mindful of the challenges posed by different time zones and first languages (fewer than a third of our team speak English as a first language). Networking with engineers across the organisation is an important part of what we do, so we would expect you to build connections with senior engineers across Ticketmaster.
We believe that monitoring, alerting, and observability are foundational to reliability, so we would want you to contribute to the continuous improvement of how we measure and alert on both potential causes and symptoms, as we define service level indicators (SLIs) and objectives (SLOs) and strive to meet them.
All engineers in our team are part of an on-call rotation. We always seek to reduce the number of pages and to ensure that people are only paged when human intervention is the only way to prevent revenue or reputational impact. We treat redundancy and automatic healing as design fundamentals, and we expect on-call engineers to put fixes in place so that the same page doesn't happen again.
We value documentation: we continuously update our team guidebook and project READMEs. We maintain tickets for project work, which are kept up to date, discussed daily, and serve as a very useful reference for future work and troubleshooting, so we would expect you to value documentation highly.
We aim to keep toil to considerably less than half of our time, so we would expect you to look for opportunities to automate and simplify, allowing you and others to spend more time on project work.
The world of SRE and software engineering is in constant flux: we often have to learn new things at short notice and be prepared to change technologies. We would like you to be open-minded and always willing to learn and try new things.
We would expect you to show initiative and work independently in this role, although you will have guidance from management and more senior engineers in the team. We would also expect you to contribute to the design and improvement of our systems and to demonstrate good taste!
You will also be working with more junior engineers, and with people in teams that receive SRE support. We would expect you to teach and mentor more junior people in the areas in which you are particularly strong.
We would expect you to have had about three years of experience in progressively more complex environments. We do not have any preference for educational background or level and have no interest in professional qualifications. We optimize for a willingness to learn and grow, regardless of background.
WHAT YOU WILL BE DOING
• Support the full system lifecycle for automation and tools including the design, assessment, selection, commissioning, validation, and implementation of systems.
• Provide input into the design, development and implementation of systems automation and tooling for software engineering teams to achieve their goals.
• Work closely with peers in software engineering teams to implement solutions that are scalable, secure, and easily maintained.
• Provide infrastructure support for B2C products both in the public cloud and on premises.
• Develop tools, both command line and web based, that are responsible for maintenance and management functions of development and production systems.
• Work with systems and software engineers to develop and document requirements and functional specifications.
• Implement monitoring and health check scripts.
• Administer and develop cloud management tools (e.g. self-provisioning scripts).
WHAT YOU NEED TO KNOW (or TECHNICAL SKILLS)
Minimum Qualifications:
• Excellent knowledge of high-level languages such as Python, or of batch scripting
• Previous experience with public clouds (AWS) and Terraform
• Significant experience working with CMS at large scale.
• Knowledge and experience of containers and Kubernetes clusters
• Knowledge of GitLab CI/CD
• Technical writing skills for documenting environments and procedures
Preferred Qualifications:
• A strong understanding of core network protocols and services
• Strong Linux experience as a systems engineer (RHEL, CentOS, CoreOS)
• Experience architecting, developing, and troubleshooting systems
• Solid knowledge of working with third party APIs
• Experience with automation tools (Chef, Ansible, …)
• Experience with monitoring and alerting systems
YOU (BEHAVIOURAL SKILLS)
• Autonomous and proactive.
• Self-motivated, energetic, and tenacious.
• Able to work as part of a team as well as independently.
• Enjoy working in cross-functional and multidisciplinary teams.
• Flexible and pragmatic.
• Strong organisational and time management skills.
• A desire to learn and use a broad range of skills in a highly complex environment.
• Excellent analytical, problem solving and resolution skills.
• A keen interest in new technologies and open source.
• Passionate about automation and tooling.
LIFE AT TICKETMASTER
We are proud to be a part of Live Nation Entertainment, the world's largest live entertainment company.
Our vision at Ticketmaster is to connect people around the world to t
Jobcode: Reference SBJ-dyn553-3-142-195-167-42 in your application.