Principal Member of Technical Staff, AI Workload Orchestration
Oracle Cloud Infrastructure
Austin, TX, United States
- Job Identification: 271248
- Job Category: Product Development
- Posting Date: 12/17/2024, 03:32 PM
- Role: Individual Contributor
- Job Type: Regular Employee
- Experience Level: Professional
- Does this position require a security clearance? No
- Years: 6 to 10+ years
- Applicants: Less than 10 applicants
- Applicants are required to read, write, and speak the following languages: English
Job Description
We are looking for a highly skilled distributed systems engineer to optimize Kubernetes schedulers for AI workloads, increasing GPU workload utilization and throughput. In this role, you will ensure top performance for AI workloads scheduled on our platform. You will provide technical leadership to the team, bring clarity to ambiguous problems, and develop innovative solutions that make it easy for our customers to deploy AI workloads on our GPU infrastructure. You will also collaborate with cross-functional teams to enhance the GPU control plane and GPU data plane and deliver an exceptional customer experience.
- BS (or equivalent experience) in Computer Science, Engineering, or related field.
- 6 years of software development experience in programming languages including, but not limited to, C, C++, C#, Java, Go, and Rust.
- 3 years of experience designing and developing large-scale infrastructure, distributed systems, and services.
- 1 year of experience with Kubernetes.
- 1 year of experience providing technical leadership and clarity to cross-functional teams and projects while collaborating with stakeholders.
- Systematic problem-solving approach, strong communication skills, a sense of ownership, and drive.
- Ability to adapt to a fast-paced, dynamic environment and manage multiple tasks and priorities effectively.
- Experience scheduling high-performance workloads on Kubernetes using tools such as Apache YuniKorn, Volcano, or Slurm.
- Experience managing cloud infrastructure with hundreds of thousands of servers.
Career Level - IC4
Responsibilities
- Design and develop orchestration solutions that optimize Kubernetes schedulers for AI workloads, increasing GPU workload utilization and throughput and ensuring top performance for AI workloads scheduled on our platform.
- Develop a “best-in-class” AI workload orchestration system for our customers by ensuring that its services and components are well-defined, modular, secure, reliable, diagnosable, actively monitored, compliant, and reusable.
- Collaborate with cross-functional teams, including development, operations, and product management, to understand their requirements and design innovative orchestration solutions.
- Mentor junior developers and drive modern software engineering practices such as using data and telemetry to make decisions, well-defined interfaces across components, design reviews, coding standards, code reviews, and comprehensive coverage through unit tests, integration tests, and active production monitoring.
- Develop benchmark metrics and automation to drive and track performance and reliability across customer workloads and the lower infrastructure stack.
About Us
As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s problems. True innovation starts with diverse perspectives and various abilities and backgrounds.
When everyone’s voice is heard, we’re inspired to go beyond what’s been done before. It’s why we’re committed to expanding our inclusive workforce that promotes diverse insights and perspectives.
We’ve partnered with industry leaders in almost every sector, and we continue to thrive after 40+ years of change by operating with integrity.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer a highly competitive suite of employee benefits designed on the principles of parity and consistency. We put our people first with flexible medical, life insurance and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by calling +1 888 404 2494, option one.
Disclaimer:
Oracle is an Equal Employment Opportunity Employer*. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
* Which includes being a United States Affirmative Action Employer