Running AI Models on Cloudflare Workers AI

Workers AI provides serverless GPU inference — run LLMs, image models, and embeddings with zero infrastructure.

```bash

npm create cloudflare@latest my-ai-app

cd my-ai-app

```

```typescript

export default {

async fetch(request, env) {

const response = await env.AI.run(

'@cf/meta/llama-3.2-3b-instruct',

{

messages: [

{ role: 'user', content: 'What is machine learning?' }

max_tokens: 256,

}

)

return Response.json(response)

}

```

```typescript

const image = await env.AI.run(

'@cf/stabilityai/stable-diffusion-xl-base-1.0',

{ prompt: 'A futuristic cityscape at sunset' }

)

return new Response(image, { headers: { 'Content-Type': 'image/png' } })

```

```typescript

const embeddings = await env.AI.run(

'@cf/baai/bge-base-en-v1.5',

{ text: ['Your document text'] }

)

// Store in Vectorize for similarity search

```

- **LLMs**: Llama 3.2, Gemma 2, Mistral, Qwen

- **Image**: Stable Diffusion XL

- **Embeddings**: BGE, GTE

- **Speech**: Whisper

- **Translation**: M2M-100