Design and implement high-performance inference pipelines
Optimize model serving for throughput, latency, and cost across different workloads
Collaborate with research and product teams to integrate inference into real-world applications
Help improve and manage the deployment pipeline, and monitor production clusters
Debug production inference issues
Stay up to date with the latest inference techniques and open-source frameworks

Qualifications:

Deep experience developing and tuning LLM inference frameworks (e.g. vLLM)
Solid communication skills; ability to work independently and within a team
Experience with cloud infrastructure (AWS, GCP, Azure) and Kubernetes
Passion for AI and practical ML systems
Experience building, deploying, and operating highly available, scalable, distributed cloud services