Sr. Engineer Site Reliability @ Warner Bros. Entertainment Group - Burbank, CA
Business: Warner Bros. Entertainment Group
Position Type: Full Time
Job ID: 177557BR
WarnerMedia is a leading media and entertainment company that creates and distributes premium and popular content from a diverse array of talented storytellers and journalists to global audiences through its consumer brands including: HBO, HBO Now, HBO Max, Warner Bros., TNT, TBS, truTV, CNN, DC Entertainment, New Line, Cartoon Network, Adult Swim, Turner Classic Movies and others.
Warner Media, LLC and its subsidiaries are equal opportunity employers. Qualified candidates will receive consideration for employment without regard to race, color, religion, national origin, gender, sexual orientation, gender identity or expression, age, mental or physical disability, genetic information, marital status, citizenship status, military status, protected veteran status or any other category protected by law.
Business Unit Overview
WB Technology combines Warner Bros.’ industry-leading technologists and disciplines to ensure global alignment with business strategy and accelerated delivery of innovative technology solutions studio- and industry-wide. From pre-production through archiving, the WBT organization will provide critical business and technology intelligence and services to all Studio business units. WBT manages the Studio’s enterprise systems and solutions, emerging platforms, information security, consumer intelligence, content mastering and delivery, and more.
Warner Bros. has been entertaining audiences for more than 90 years through the world’s most-loved characters and franchises. Warner Bros. employs people all over the world in a wide variety of disciplines. We're always on the lookout for energetic, creative people to join our team.
The Site Reliability Engineer (SRE) will be an integral part of the Security and System Engineering (SSE) team, working alongside the Engineering, Analytics and Data Science teams to drive operational efficiency, ensure application and platform security, and implement DevSecOps processes throughout the Warner Bros. Technology organization.
Site Reliability/DevSecOps/System Engineering encompasses networking, security, software development, and server administration. The SRE will:
- Identify and prioritize opportunities to create business process, workflow, or system efficiencies. Help evangelize treating infrastructure and cloud resource deployment with the same rigor as we treat app code.
- Deploy, automate, maintain and manage AWS cloud systems, ensuring availability, performance, scalability and security.
- Create solution designs that increase productivity and profitability through innovation, standardization, optimization and automation.
- Create meaningful dashboards, logging, alerting, and responses to ensure that issues are captured and addressed proactively.
- Collaborate with application developers to strengthen the operational viability and maturity of applications as they are designing, writing and testing their code.
- Support engineering efforts by providing tools, automation, and hardware assistance.
- Help establish and configure infrastructure in an agile way by storing infrastructure as code and using automated configuration management tooling, with the goal of provisioning environments rapidly at any point in time and a strong focus on automation, scalability and reliability.
- Use Terraform and Ansible to automate all aspects of AWS resource implementation, ensuring no task that will be repeated is done manually.
- Strengthen our application and environment security, applying standards and best practices.
- Work closely with engineers and data scientists to automate CI/CD.
- Serve as the subject matter expert for our 10+ DI cloud environments.
- Analyze the effectiveness of existing processes and recommend effective solutions to improve operations.
Cloud Platform Ops
- Support all users of the DI cloud environments, including Engineering, Data Science, Data Analytics and MarTech.
- Troubleshoot and solve user issues related to cloud resources across all environments.
- Draft postmortems and root cause analyses for cloud incidents to prevent recurrence.
- Create runbooks and user guides to complement all processes.
- Identify and troubleshoot availability and performance issues at every layer of a deployment, from cloud resources and operating systems to the network and the application.
- Work across team boundaries to identify and solve pain points that affect DI team velocity and overall system reliability.
- Be an active participant in managing infrastructure operations to support regulatory compliance and Warner Media Security policies and standards.
Cloud Subject Matter Expertise
- Stay abreast of all new features and capabilities in AWS ensuring the SSE team as well as the rest of the DI organization is aware of what’s available in the market and uses the right tools for the right jobs, configured optimally from a financial and technical perspective.
- Stay on top of security best practices and well-architected frameworks.
- Desirable Cloud Certifications:
- 5+ years’ experience (mandatory) in a cloud support environment, including AWS.
- 5 – 10 years overall DevSecOps or Site Reliability engineering experience.
- Extensive knowledge of and experience with Linux OS (system administrator).
- 5 – 10 years of experience with scripting and configuration languages such as YAML, HCL, C shell or Bash.
- 3+ years of programming experience in Python, Node.js or another organized, high-level language.
- 2-3 years of experience with Terraform and Ansible.
- 5 – 7 years of CI/CD pipeline development experience (e.g., Jenkins).
- 5+ years’ experience working in an Agile DevOps environment.
- Experience with containers and container orchestration.
- Experience with firewalls, routing, switching, load balancers, security and DNS is a plus.
- Experience with metrics and monitoring (Datadog, Evident IO, Config, etc.).
- Business Continuity/Disaster Recovery planning concepts, strategies and methodologies.
- Knowledge of TCP/IP networking and DNS, LDAP, NFS and SMTP, load balancing and high availability architectures.
- Experience with databases such as MySQL.
- Familiarity with web-based applications and microservice architecture is an advantage.
- Experience with Docker, ECS, Fargate and Kubernetes is a plus.
- Experience building effective monitoring, alerts, logging and metrics for production services.
Security Focused (Ideal Experience)
- Enterprise conformance with industry security best practices, as communicated by industry standards bodies and sources including Warner Media, AWS Security White Papers, OWASP, ISC2, ISSA, ISA, FIPS and NIST.
- Worked with enterprise clients to ensure that core security principles, such as least privilege, SSO and security by design, are woven into the design from inception.
- Evaluated new cloud technologies and managed service offerings in order to bring those resources under management, or exclude them from use, based on perceived risk.
- Provided technical support for organizations seeking to introduce new technology into enterprise technology stacks or who could benefit from security and technical architecture consulting.
- In depth knowledge of cloud platforms.
- Knowledge of common DevOps design patterns for implementation.
- Terraform, Ansible and Packer expertise.
- In-depth knowledge of testing methodologies (black-box, white-box, gray-box, etc.).
- In depth knowledge of pipeline and workflow build practices.
- Security compliance standards and Best Practices knowledge.
- Technical architecture expertise.
- Agile team collaboration.
- Jenkins (as well as other CI/CD tools).
- Jira, GitHub and Bitbucket.
- Atlassian tools (Jira, Confluence and Bitbucket).
- Git and Git-derived tools such as GitHub.
- Knowledge of GitFlow.
- Knowledge of governance, centralized logging and auditing technologies such as Splunk.