Site Reliability Engineer
Pune, MH, IN
PubMatic (Nasdaq: PUBM) is an independent technology company maximizing customer value by delivering digital advertising’s supply chain of the future.
PubMatic’s sell-side platform empowers the world’s leading digital content creators across the open internet to control access to their inventory and increase monetization by enabling marketers to drive return on investment and reach addressable audiences across ad formats and devices.
Since 2006, our infrastructure-driven approach has allowed for the efficient processing and utilization of data in real time. By delivering scalable and flexible programmatic innovation, we improve outcomes for our customers while championing a vibrant and transparent digital advertising supply chain.
Position Description:
The Ad Server and RTB Production Infrastructure is pivotal to ensuring our software applications' reliability, availability, and overall excellence. As an SRE, you will be responsible for this infrastructure. Your essential duties encompass ensuring the seamless operation and optimal performance of large-scale distributed software applications. Your role revolves around maintaining a robust and high-performing environment, contributing to the reliability of our services, and innovating solutions to guarantee 24/7 availability. By leveraging your technical expertise and dedication, you will help maintain a seamless experience for our users while upholding the highest standards of operational excellence.
Responsibilities:
- Operational Support
- Be a primary point of contact for operational support of multiple large-scale distributed software applications in the Ad Server environment.
- Monitor application availability, promptly detect anomalies, analyze their impact, debug problems in production, and follow up on resolution by working closely with the engineering team.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Diligently work with the engineering team to expedite the resolution of incidents and ensure a swift return to normal operations.
- Be innovative in building dashboards, adding metrics, writing automation scripts to reduce operational toil, and streamlining processes to enhance system reliability and stability.
- Design and construct software and systems to effectively manage the Ad Serving platform, its underlying infrastructure, and applications.
- On Call Availability and Support
- Work in shifts to provide continuous on-call support for the production systems and resolve issues independently using predefined handbooks.
- Show a sense of urgency for high-priority issues and arrange war rooms to resolve the problems.
- Provide timely updates on high-priority issues and hand over to the next shift when a problem needs to be worked on 24/7.
- Conduct post-incident reviews to identify root causes, recommend preventive measures, and contribute to a culture of learning and improvement.
Requirements:
- 3+ years of total experience in software development.
- Ability to program in languages like C or C++ and in scripting languages like Shell or Python.
- Prior experience in technical engineering is good to have.
- A proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
- Must know networking, database (MySQL), and Linux system concepts, as well as debugging and analyzing core dumps.
- Hands-on experience with monitoring and observability tools like Grafana, Nagios, Influx, ELK, etc.
- Familiarity with container tools like Docker and incident management systems like Zenduty.
- Excellent communication and collaboration skills, with the ability to work effectively across teams.
- A self-motivated, positive mindset when investigating incidents.
- Excellent interpersonal, written, and verbal communication skills.
Qualifications:
- B.E./ B.Tech. in Computers or equivalent.
Return to Office: PubMatic employees around the globe have returned to our offices on a hybrid work schedule (3 days “in office” and 2 days “working remotely”) that is intended to maximize collaboration, innovation, and productivity among teams and across functions.
Benefits: Our benefits package includes the best of what leading organizations provide, such as stock options, paternity/maternity leave, healthcare insurance, and broadband reimbursement. In addition, when we’re in the office, we all benefit from a kitchen loaded with healthy snacks and drinks, catered lunches, and much more!
Diversity and Inclusion: PubMatic is proud to be an equal opportunity employer; we don’t just value diversity, we promote and celebrate it. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.