Native Multimodality
Gemini is natively multimodal: it accepts text, images, audio, and video in a single request, with no separate model call per modality. An analysis workflow that would otherwise chain several specialized models can often be reduced to one API call.
Image Analysis
```javascript
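import fs from "node:fs";
import { GoogleGenerativeAI } from "@google/generative-ai";

// Assumes the API key is available in the GEMINI_API_KEY environment variable.
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });

// Inline images are sent as base64-encoded bytes plus a MIME type.
const image = {
  inlineData: {
    data: fs.readFileSync("screenshot.png").toString("base64"),
    mimeType: "image/png",
  },
};
const result = await model.generateContent([image, "Describe any bugs visible in this UI"]);
console.log(result.response.text());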
```
Video Processing
Upload video files through the File API, then reference the uploaded file in your prompt. Gemini 2.5 Pro can process up to about 1 hour of video in a single request.
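The upload-then-prompt flow might look like the sketch below. It assumes the `@google/generative-ai` Node SDK, with `fileManager` being a `GoogleAIFileManager` (from `@google/generative-ai/server`) and `model` the object returned by `getGenerativeModel`. The helper name `analyzeVideo`, the default MIME type, and the polling interval are illustrative choices, not part of the SDK.

```javascript
// Sketch only: `fileManager` and `model` are passed in, so this helper does
// not depend on how the SDK clients were constructed.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function analyzeVideo(
  fileManager,
  model,
  path,
  prompt,
  { mimeType = "video/mp4", pollMs = 5000 } = {}
) {
  // Upload the video; the File API returns metadata including a name and URI.
  const upload = await fileManager.uploadFile(path, { mimeType, displayName: path });

  // Video is processed asynchronously, so poll until it leaves PROCESSING.
  let file = upload.file;
  while (file.state === "PROCESSING") {
    await sleep(pollMs);
    file = await fileManager.getFile(file.name);
  }
  if (file.state !== "ACTIVE") {
    throw new Error(`Video processing ended in state ${file.state}`);
  }

  // Reference the uploaded file by URI alongside the text prompt.
  const result = await model.generateContent([
    { fileData: { fileUri: file.uri, mimeType: file.mimeType } },
    { text: prompt },
  ]);
  return result.response.text();
}
```

With real clients, this could be called as `await analyzeVideo(fileManager, model, "demo.mp4", "Summarize this recording")`, where `demo.mp4` is a placeholder path.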
Audio Transcription + Analysis
Send audio files directly to Gemini for transcription, summarization, or sentiment analysis in a single API call.
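As a rough illustration, one request can carry the audio and a combined instruction. The sketch below builds the parts array for such a call. The helper name `buildAudioRequest`, the prompt wording, and the 20 MB inline threshold are assumptions made for this example; check the size limit against the current API documentation.

```javascript
// Assumption: inline data suits small files. The 20 MB threshold is an
// illustrative limit for the whole request, not a value taken from this article.
const MAX_INLINE_BYTES = 20 * 1024 * 1024;

// Builds the parts array for a single generateContent call that asks for a
// transcript, a summary, and a sentiment label together.
function buildAudioRequest(audioBytes, mimeType = "audio/mp3") {
  if (audioBytes.length > MAX_INLINE_BYTES) {
    throw new Error("Audio too large to inline; upload it via the File API instead");
  }
  return [
    { inlineData: { data: Buffer.from(audioBytes).toString("base64"), mimeType } },
    {
      text: "Transcribe this audio, then summarize it in three sentences and state the overall sentiment.",
    },
  ];
}

// Hypothetical usage, given a `model` from getGenerativeModel and Node's `fs`:
// const parts = buildAudioRequest(fs.readFileSync("meeting.mp3"));
// const result = await model.generateContent(parts);
// console.log(result.response.text());
```

Keeping the request builder separate from the API call makes the size check and prompt easy to adjust without touching the network code.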
1M Token Context
Gemini 2.5 Pro's 1M-token context window is large enough to analyze a full codebase, summarize an entire book, or process an hour-long meeting recording in a single request.
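Before sending a very large input, it can help to check that it fits. The sketch below assumes `model` exposes the SDK's `countTokens` method, which resolves to an object with a `totalTokens` field. The helper name `fitsInContext`, the `headroom` parameter, and the rounded limit constant are illustrative assumptions.

```javascript
// Assumption: 1000000 mirrors the "1M" figure above; check the model's
// documented input token limit before relying on it.
const CONTEXT_LIMIT = 1000000;

// Counts the tokens in `text` and reports whether it fits in the context
// window while keeping `headroom` tokens free for additional prompt text.
async function fitsInContext(model, text, headroom = 0) {
  const { totalTokens } = await model.countTokens(text);
  return {
    totalTokens,
    fits: totalTokens + headroom <= CONTEXT_LIMIT,
  };
}

// Hypothetical usage, given a `model` from getGenerativeModel:
// const check = await fitsInContext(model, entireCodebaseAsString, 2000);
// if (!check.fits) console.warn(`Input is ${check.totalTokens} tokens; split it up.`);
```

Counting first avoids a failed request on oversized input, at the cost of one extra API call.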