Running AI Models on Cloudflare Workers AI
Workers AI provides serverless GPU inference — run LLMs, image models, and embeddings with zero infrastructure.
Setup
```bash
npm create cloudflare@latest my-ai-app
cd my-ai-app
```
Basic LLM Inference
```typescript
export default {
async fetch(request, env) {
const response = await env.AI.run(
'@cf/meta/llama-3.2-3b-instruct',
{
messages: [
{ role: 'user', content: 'What is machine learning?' }
],
max_tokens: 256,
}
)
return Response.json(response)
},
}
```
Image Generation
```typescript
const image = await env.AI.run(
'@cf/stabilityai/stable-diffusion-xl-base-1.0',
{ prompt: 'A futuristic cityscape at sunset' }
)
return new Response(image, { headers: { 'Content-Type': 'image/png' } })
```
Embeddings for RAG
```typescript
const embeddings = await env.AI.run(
'@cf/baai/bge-base-en-v1.5',
{ text: ['Your document text'] }
)
// Store in Vectorize for similarity search
```
Available Models (50+)
- **LLMs**: Llama 3.2, Gemma 2, Mistral, Qwen
- **Image**: Stable Diffusion XL
- **Embeddings**: BGE, GTE
- **Speech**: Whisper
- **Translation**: M2M-100