Run AI models locally — Ollama, LM Studio, vLLM
New API authentication middleware supporting API keys and JWTs has been released for enhanced security.
Local AI Models now offers advanced 4-bit and 2-bit quantization to reduce model memory requirements.
Local AI Models introduces new quantization techniques to boost on-device model performance and reduce resource consumption.
Local AI Models now supports advanced quantization methods for faster, more efficient local AI model inference.
Local AI Models has launched a new API endpoint for real-time text embedding generation.
Local AI Models now supports model quantization, improving performance and reducing memory usage for local deployments.
Local AI Models has updated its real-time text generation API endpoint for increased stability and lower latency.
Local AI Models now offers improved quantization features for running LLMs efficiently on limited hardware.
Local AI Models has boosted performance through new optimized quantization techniques for on-device models.
Local AI Models introduces new API endpoints for enhanced control and batch processing of its image generation models.
Local AI Models has improved inference speed and reduced memory usage through updated quantization techniques.
Local AI Models has launched a new API endpoint enabling developers to fine-tune models locally.
Local AI Models now supports quantization for faster and more memory-efficient model inference.
An urgent security update has been released to fix a potential vulnerability in the model loading process; users should update immediately.
Local AI Models now offers enhanced fine-tuning options for open-source language models, allowing for greater customization with user datasets.
Local AI Models introduces a new API endpoint for optimized, low-latency real-time text generation.
Local AI Models now offers faster inference and lower memory usage with new automatic quantization features.
Local AI Models API v2.1 improves real-time streaming and provides more detailed error handling for developers.
Local AI Models now offers advanced quantization techniques to speed up model inference and reduce resource requirements.
Local AI Models now offers a new API endpoint for handling asynchronous AI tasks, improving scalability.
Local AI Models has rolled out enhanced quantization methods to boost inference speed and reduce model size.
Ollama 0.6 adds multi-GPU support, instant model switching, and AMD ROCm compatibility.
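
Several items above cite 4-bit and 2-bit quantization as a way to cut model memory requirements. The arithmetic behind that claim can be sketched as below, assuming weight memory is roughly the parameter count times bits per weight divided by 8. The function name `weight_memory_gib` and the 7-billion-parameter model size are hypothetical, chosen only for illustration. The estimate covers weights only. Real quantized formats also store scales and other metadata, and inference needs extra memory for activations and the KV cache, so actual usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in GiB.

    Assumes every weight is stored at the same bit width and ignores
    per-block scales, activations, and the KV cache.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


# Hypothetical model size used only for illustration.
PARAMS_7B = 7e9

if __name__ == "__main__":
    for bits in (16, 8, 4, 2):
        gib = weight_memory_gib(PARAMS_7B, bits)
        print(f"{bits:>2}-bit weights: {gib:.2f} GiB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the estimate to a quarter, from about 13 GiB to about 3.3 GiB for the hypothetical 7B model. This is why quantized models fit on consumer GPUs and laptops that cannot hold the full-precision weights.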