Why Run Models Locally?
Local AI gives you complete data privacy, zero API costs after hardware investment, and no rate limits. Essential for sensitive data workloads.
Install Ollama
Download from ollama.com for Mac, Windows, or Linux. It installs as a background service.
Run Your First Model
```bash
Download and run Llama 3.3 70B
ollama run llama3.3
Or smaller/faster Phi-4
ollama run phi4
List available models
ollama list
```
Use the OpenAI-Compatible API
Ollama exposes an API at http://localhost:11434 that is compatible with the OpenAI SDK:
```javascript
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "http://localhost:11434/v1",
apiKey: "ollama", // required but ignored
});
const response = await client.chat.completions.create({
model: "llama3.3",
messages: [{ role: "user", content: "Hello!" }],
});
```
Hardware Requirements
- 7B models: 8GB RAM (runs on most laptops)
- 13B models: 16GB RAM
- 70B models: 48GB RAM or VRAM (M3 Max or RTX 4090)