Build a High-Quality AI Image Generation Inference Pipeline
We're looking for an experienced ML/MLOps engineer to design and deploy an inference pipeline that generates high-quality AI images from text prompts. The goal is a reliable, production-ready system we can call via API to produce images at consistent quality, with reasonable latency and cost.
Scope of Work
- Select and deploy a state-of-the-art text-to-image model (e.g., FLUX, Stable Diffusion 3.5 / SDXL, or your recommendation based on our needs)
- Build a clean inference API (REST endpoint) that accepts a prompt and parameters and returns generated images
- Tune output quality through scheduler/sampler selection, refiners, and upscaling where appropriate
- Set up a GPU-backed deployment
- Optimize for latency, throughput, and cost; implement queueing/batching if needed
- Provide documentation covering setup, API usage, parameters, and how to update or swap models
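To give a sense of the "prompt and parameters" item above, a request to the endpoint might be validated roughly like this. This is only a sketch: the field names (`prompt`, `width`, `height`, `steps`, `guidance_scale`, `num_images`, `seed`), the defaults, and the limits are all assumptions for illustration, not a fixed spec, and we are open to your recommended design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request body for the text-to-image endpoint."""
    prompt: str
    width: int = 1024            # assumed default; depends on the chosen model
    height: int = 1024
    steps: int = 30              # number of denoising steps (assumed default)
    guidance_scale: float = 5.0  # assumed default
    num_images: int = 1
    seed: Optional[int] = None   # None means "pick a random seed"


def validate(req: GenerationRequest) -> List[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors: List[str] = []
    if not req.prompt.strip():
        errors.append("prompt must not be empty")
    # Assumed limits: many latent diffusion models expect sizes divisible by 8.
    for name in ("width", "height"):
        value = getattr(req, name)
        if value % 8 != 0 or not 256 <= value <= 2048:
            errors.append(f"{name} must be a multiple of 8 between 256 and 2048")
    if not 1 <= req.steps <= 100:
        errors.append("steps must be between 1 and 100")
    if req.guidance_scale < 0:
        errors.append("guidance_scale must be non-negative")
    if not 1 <= req.num_images <= 4:
        errors.append("num_images must be between 1 and 4")
    return errors
```

Rejecting bad parameters before a job reaches the GPU keeps invalid requests from consuming paid compute time, which matters for the cost goal above.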
Required Skills
- Strong experience with diffusion models and image-generation inference
- Python + PyTorch; familiarity with Hugging Face Diffusers
- GPU deployment and serving experience (containers, autoscaling, cold-start handling)
- API development and basic security (auth, rate limiting)
Nice to Have
- Experience with LoRA, ControlNet, IP-Adapter, or fine-tuning
- Prompt engineering and quality benchmarking
- Cost optimization on serverless GPU platforms
Deliverables
- Deployed, documented inference endpoint
- Source code in a repository we own
- Brief handoff doc or call