Running local models on Macs gets faster with Ollama’s MLX support (31.03.2026)

Ollama, a tool for running large language models locally, has gotten faster on Apple Silicon Macs by integrating Apple's open-source MLX framework. The update, available in preview with Ollama 0.19, also improves caching and adds support for Nvidia's NVFP4 format, which reduces memory usage for certain models. These changes land amid growing interest in local AI models, driven by projects like OpenClaw and by developer frustration with the rate limits and costs of cloud-based coding tools. For now, the MLX support is limited to the 35-billion-parameter version of Alibaba's Qwen3.5 model and requires a Mac with at least 32GB of RAM.

Full article: Ars Technica
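For readers who want to try the preview, using the new backend should not change how you talk to a locally running Ollama server; the MLX path is selected under the hood. The sketch below is a minimal Python example that calls Ollama's documented local REST API, assuming the server is running on its default port (11434). The model tag `qwen3.5:35b` is an assumption for illustration, not a name confirmed by the article; check `ollama list` for the actual tag of the preview build.

```python
import json
import urllib.request

# Ollama's local HTTP API listens on port 11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"

# NOTE: this model tag is a placeholder/assumption; the real tag for the
# Qwen3.5 35B MLX preview may differ.
payload = {
    "model": "qwen3.5:35b",
    "prompt": "Summarize the benefits of running LLMs locally on Apple Silicon.",
    "stream": False,  # return one JSON object instead of a token stream
}

request = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    result = json.loads(response.read().decode("utf-8"))

# The generated text is returned in the "response" field.
print(result["response"])
```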