Cloud Engineering Infrastructure Development
Oracle Cloud Infrastructure (OCI) AI Infrastructure is at the forefront of building a cutting-edge, ultra-high-performance GPU platform designed to support AI/ML/HPC workloads. This is your chance to be part of the AI revolution, creating systems that allow customers to scale from tens to thousands of GPUs without compromising performance.
Our team is responsible for designing and developing fundamental architectural changes for GPU delivery, health monitoring, triage automation, and diagnostic services. These are essential for running distributed AI/ML/HPC workloads across thousands of GPUs, leveraging technologies like RoCE and InfiniBand.
Why Join Us?
- Innovative Projects: Build groundbreaking solutions for our customers from the ground up.
- Exciting Times: Be part of a young, fast-growing team working on ambitious new initiatives.
- Dynamic Environment: Collaborate in a vibrant, agile environment where learning and adaptability are key.
What We’re Looking For:
- Adaptable Engineers: Self-motivated individuals with a quick learning ability.
- Technical Excellence: Rock-solid developers with a deep understanding of distributed systems and algorithms, comfortable diving deep into any part of the stack, including software debugging and low-level systems troubleshooting.
- Passion for Simplicity and Scale: Value simplicity and scalability in design and implementation.
- Collaborative Spirit: Comfortable working in a collaborative, agile environment and eager to learn. Able to work effectively with the teams we depend on, including Network and Data Center operations.
Join us and be a part of the team that's pushing the boundaries of AI technology!
Career Level - IC5
Job Responsibilities:
- Provide technical leadership to engineering teams in the development, deployment and validation of high-speed optical transceivers (short reach to long haul), fiber optic components, passive optical components and fiber assemblies.
- Conduct in-depth analysis of system requirements and translate them into detailed design specifications.
- Perform simulations and modeling to optimize transceiver performance and system integration.
- Collaborate closely with cross-functional teams (hardware, software, test engineering, manufacturing) to ensure deployment success.
- Coordinate with qualification teams, and develop and execute comprehensive test plans to evaluate link performance and reliability.
- Conduct failure analysis and root-cause investigations when issues arise.
- Stay abreast of the latest advancements in optical technology, industry trends, and emerging standards (IEEE, OIF, and industry MSAs). Participate in industry forums.
- Support source selection and vendor due diligence by providing technical expertise.
- Contribute to roadmap development for next generation interconnects.
Qualifications:
- Master's degree in Optical Engineering, Electrical Engineering, or Physics.
- Minimum of 5 years of hands-on experience in the design, development, testing and installation of optical transceivers, fiber optic components, and passive optical components.
- Understanding of cloud infrastructure, server and switch platforms.
- Deep understanding of optical communication principles, including modulation formats, fiber optics, lasers, optical receivers, and link budgets.
- Proficiency in optical simulation and modeling tools (e.g., OptiSystem, RSoft, VPI).
- Extensive experience with optical test and measurement equipment and methodologies.
- In-depth knowledge of industry standards (IEEE, OIF, industry MSAs) and their impact on product development.
- Strong problem-solving, analytical, and troubleshooting skills.
- Excellent communication and interpersonal skills.
Preferred Qualifications:
- Experience with high-speed optical interconnect technologies (100G, 400G, 800G, etc.).
- Knowledge of transceiver design, optical packaging, assembly processes, lasers and failure modes.
- Track record of successful product development and commercialization.