live · on your machine · no API key v3.5.x

LM Studio: the local LLM desktop app for Mac & Windows

LM Studio is the desktop application that downloads, configures and runs open-source language models directly on your Mac or Windows PC. No cloud account, no per-token billing, no prompt ever leaves the machine you typed it on.

// playground

Try LM Studio with a local LLM, right here

Pick a model on the left, type a prompt or click a chip, and watch a simulated streaming response with live tokens-per-second readout. This is a deterministic mock — the real thing runs on your hardware after install.

Models · GGUF

Pick a prompt below or type your own. The mock streams tokens at the picked model's typical local throughput.

▸ no network call · the playground responds deterministically from a small set of curated answers

your hardware

Will LM Studio run on my Mac or Windows PC?

Drag the RAM slider, pick your GPU class, and see which open-weight models will fit. Numbers assume Q4_K_M quantization — the default LM Studio recommends for most consumer hardware.

8162432486496128 GB

Apple Silicon (M1/M2/M3/M4) counts as "dedicated" — unified memory feeds the GPU directly.

model catalogue

LM Studio model library — open weights by family

LM Studio's model browser pulls open weights directly from Hugging Face. Below is a snapshot of the families most commonly downloaded in 2026 — click a chip to filter.

Llama-3.1-8B8 B
Meta · Llama 3.1

General-purpose instruct model. The default starting point for most LM Studio users — strong reasoning, fast on 16 GB hardware.

4.6 GB Q4_K_M8K ctx
Llama-3.1-70B70 B
Meta · Llama 3.1

Heavy-class generalist. Comfortable on a 64 GB Apple Silicon Mac or a 48 GB+ Nvidia rig; reasoning approaches GPT-4 class.

42 GB Q4_K_M128K ctx
Qwen3-14B14 B
Alibaba · Qwen3

Strong multilingual model with native tool-calling support. The current sweet spot for 32 GB machines.

9.8 GB Q5_K_M32K ctx
DeepSeek-R1-7B7 B
DeepSeek · R1 distill

Reasoning-tuned distill that shows its work. Excellent for code review and step-by-step math, on a 16 GB machine.

4.1 GB Q4_016K ctx
Mistral-Nemo-12B12 B
Mistral · Nemo

Apache-licensed mid-size workhorse with a 128K context window. Popular for retrieval-augmented workflows.

7.5 GB Q4_K_M128K ctx
gpt-oss-20B20 B
OpenAI · open release

OpenAI's 2025 open-weight release. Slower per token than smaller models but the highest local quality short of 70B-class.

12.4 GB Q4_K_M32K ctx
Gemma-3-9B9 B
Google · Gemma 3

Google's open-weight family, tuned for instruction following with light hardware footprint.

5.4 GB Q4_K_M8K ctx
Phi-4-14B14 B
Microsoft · Phi-4

Reasoning-focused 14B from Microsoft. Punches well above its weight on code and math benchmarks for the size.

8.7 GB Q4_K_M16K ctx
SDXL 1.03.5 B
Stability · diffusion

Stable Diffusion XL — the open-weight image generator most commonly run inside LM Studio's diffusion mode.

6.9 GB safetensors1024² px
FLUX.1-schnell12 B
Black Forest Labs · FLUX

Apache-licensed diffusion model. State-of-the-art quality at home on 24 GB+ unified or dedicated VRAM.

23 GB safetensors2048² px
tokens / second

LM Studio tokens-per-second across Mac and Windows

Illustrative throughput numbers measured at Q4_K_M quantization. Pick a hardware preset and the bars race to the matching tokens-per-second rate.

llama 3.1-8B-Q4
— tok/s
qwen 3-14B-Q5
— tok/s
deepseek r1-7B-Q4
— tok/s
gpt-oss 20B-Q4
— tok/s
phi 4-14B-Q4
— tok/s
platforms

LM Studio Mac vs LM Studio Windows — same UI, different back-ends

Same UI, same model catalogue, but the back-end stack changes per OS. The lm studio mac edition has the MLX runtime; the lmstudio windows edition has CUDA, ROCm and XPU.

lm studio mac

macOS 13.4 or later · Apple Silicon native

Notarized .dmg with the MLX runtime for measurably faster on-device inference on M-series chips. Integrates with Spotlight, Shortcuts and the menu bar.

  • Native Apple Silicon (M1 onwards)
  • MLX runtime for diffusion + LLMs
  • Unified memory used as VRAM
  • Shortcuts & CLI hooks
16 GB
min unified RAM
≥ M1
Apple Silicon
.dmg
notarized installer

lmstudio windows

Windows 10 / 11 · x64 + ARM64

Signed MSI installer with full GPU acceleration across NVIDIA CUDA, AMD ROCm and Intel Arc XPU back-ends. SmartScreen-clean, AppLocker-friendly.

  • NVIDIA CUDA back-end
  • AMD ROCm back-end
  • Intel Arc / XPU acceleration
  • Vulkan fallback for any GPU
16 GB
min system RAM
8 GB
recommended VRAM
.msi
signed installer
// marketplace

Extend the app with lm studio plugins

The plugin runtime turns LM Studio into a programmable surface. Browse community extensions, install with one click — every plugin runs in a sandboxed worker on your machine.

web-search
Search & RAG

Live web search before generation. Hooks into SearXNG or a self-hosted Brave Search endpoint.

↓ 42k · ★ 4.8
rag-folder
Search & RAG

Index a local folder of PDFs and Markdown, then chat against it with citations.

↓ 31k · ★ 4.7
code-runner
Code

Sandboxed Python and JavaScript execution for tool-calling workflows. Results stream back as messages.

↓ 28k · ★ 4.9
git-context
Code

Drops the current Git diff into the system prompt — great for code review with a local model.

↓ 14k · ★ 4.6
mcp-bridge
Tools

Connect any MCP server (Model Context Protocol) to your local model as a tool-calling target.

↓ 18k · ★ 4.8
shell-call
Tools

Allow the model to run whitelisted shell commands. Useful for local agentic workflows.

↓ 9k · ★ 4.4
theme-mono
UI

Monochrome editor theme with monospace everywhere and reduced motion.

↓ 6k · ★ 4.5
cmd-palette
UI

⌘K-style global command palette across models, prompts and conversations.

↓ 12k · ★ 4.7
quantization

LM Studio quantization explained — Q4, Q5, Q8 and what they cost you

Quantization shrinks a model's weights by storing them in fewer bits. The slider walks you through the common levels — watch the file size shrink and the quality meter drop in real time.

Q4_K_MQuantization level 5 of 7
FP16Q8Q6Q5Q4Q3Q2
File size (8 B model)— GB
Output quality (vs FP16)— %

▸ Q-suffixes from llama.cpp · K_M variants pack the most quality per byte at most levels

LM Studio is a desktop application that runs large language models on your own computer — no cloud account, no API key, no per-token billing. Built for Mac and Windows users who want to chat with open-source AI privately, the lm studio app turns a laptop into a self-contained AI workstation. It ships with a polished model browser that pulls open weights from Hugging Face, a fast inference engine with GPU and Apple Silicon acceleration, an OpenAI-compatible local server for developers, a plugin system for community extensions and recent support for image generation. The result is the most beginner-friendly entry point into the local AI ecosystem.

LM Studio in plain English

The plainest answer to what is lm studio is this: a desktop GUI for downloading, configuring and running open-source language models on local hardware. The longer answer is that lm studio is the most polished consumer application in the local AI tooling category — a category that two years ago required a terminal, a Python environment and a working knowledge of llama.cpp, and now requires nothing beyond an installer. The lm studio ai workflow is straightforward: open the app, browse the model catalog by quantization and parameter count, click download, then start a conversation in the built-in chat. No account is needed, no usage data is uploaded, and everything you type stays on the machine you typed it on.

How LM Studio handles security on Mac and Windows

For a tool that downloads model weights and exposes an inference server on your machine, lm studio is reassuringly conservative. The application is code-signed on Mac and Windows, installers are scanned cleanly on independent services, and the runtime itself does not phone home with the contents of your prompts. Models are pulled directly from Hugging Face over HTTPS using their published hashes, so a downloaded file can be verified byte-for-byte against the original. The local API server runs on a port you control (default 1234), binds to localhost by default, and never exposes your machine to the public internet unless you opt in to it.

You learn what a local LLM is really capable of the first time you run a 14B-parameter model on your own laptop and realize the response would have cost forty cents on a hosted API — and it stayed on your machine. Kenji Nakamura — Senior LLM Tooling Editor

The lm studio mac edition: Apple Silicon and MLX

The lm studio mac edition is shipped as a notarized .dmg, runs natively on Apple Silicon (M1 onwards) and requires macOS 13.4 or later. On Apple Silicon Macs it has one major advantage over the Windows build: support for the MLX runtime, Apple's own on-device machine learning framework, which delivers measurably higher tokens-per-second on quantized models than the cross-platform llama.cpp backend. A 7B parameter model runs comfortably on a 16 GB M-series Mac; 13B and 14B models become practical at 32 GB. The lmstudio mac install integrates cleanly with the menu bar and the Notification Center, and the model directory can be relocated to an external SSD when on-device storage runs short.

The lmstudio windows edition: CUDA, ROCm and XPU

On the PC side, the lm studio download windows installer is offered as a standard MSI for Windows 10 and Windows 11, with both x64 and ARM64 builds. The lmstudio windows edition is functionally identical to the Mac build with one big platform-specific addition: full GPU acceleration through NVIDIA CUDA, AMD ROCm and Intel Arc XPU back-ends. On a desktop with a recent NVIDIA card, inference speed on a 13B model can reach two to three times the throughput of a comparable Mac mini, simply because dedicated GPU memory removes the unified-memory bottleneck. The Windows installer is digitally signed and registers cleanly with SmartScreen, so corporate or AppLocker policies treat it as a trusted publisher.

Running open models with the lm studio local llm workflow

The lm studio local llm workflow has three steps: pick a model in the browser, download it to disk, load it into memory and start chatting. Behind that simplicity the lm studio local llm app is doing the heavy lifting of memory mapping, quantization detection, KV-cache management and context-window routing. As a lm studio local llm desktop app it competes directly with Ollama, GPT4All and Jan, and it differentiates itself with a noticeably better visual model catalogue (browsing by family, parameter count, quantization and licence), real-time tokens-per-second readout, and a side-by-side comparison mode that lets you load two models at once and prompt them against the same input.

Common model families

  • Llama family — Llama 3.1/3.2/4 in 8B, 70B and 405B sizes, with multiple quantizations.
  • Qwen, DeepSeek, Mistral, Gemma, gpt-oss and Phi — broad coverage of open-weight releases from major labs.
The plugin runtime is what turned LM Studio into a programmable surface — a place where a 14B model on your own hardware can finally act on the outside world without leaving your machine. Kenji Nakamura — Senior LLM Tooling Editor

Extending the app with lm studio plugins

The lm studio plugins system shipped in late 2024 and turned the application from a polished chat client into an extensible runtime. Plugins are written in TypeScript or Python, run inside the application's sandboxed worker, and can intercept inference requests, add new prompt processors, attach tool-calling backends, or expose entirely new interfaces. Common community plugins cover web search, code interpretation, retrieval-augmented generation over a local document folder, and integration with external API services that you choose explicitly. The plugin marketplace is curated by the publisher, so installing a community extension does not require sideloading from an arbitrary GitHub release. Each plugin declares its required permissions up front — network access, file-system read, or both — and the user can revoke them later from Settings without removing the plugin itself.

LM Studio image generation and multimodal models

The lm studio image generation feature joined the application alongside text generation and uses local diffusion models — Stable Diffusion, SDXL, FLUX-class — through the same model browser interface that handles language models. Memory requirements are heavier than text inference: SDXL needs about 12 GB of VRAM for comfortable generation, FLUX-class models start around 16 GB. On Apple Silicon the MLX runtime extends to diffusion models, and generation speed on an M3 Max is competitive with mid-range NVIDIA hardware. Outputs land in a configurable local folder, EXIF data records the prompt and seed, and there is no upload step, watermark or external API call involved. For anyone running batch generation overnight, the publisher's CLI exposes a non-interactive mode that loads a model once and processes prompts in a loop until the queue is empty.

The lm studio download: getting started cleanly

The cleanest lm studio download path is the publisher's official website, which auto-detects the visiting operating system and serves the matching installer. The Mac edition is a notarized .dmg suitable for drag-and-drop installation, the Windows edition is a standard MSI that supports unattended deployment in managed environments, and a Linux AppImage is offered for completeness. All releases are GPG-signed and the checksums are published alongside the binaries on the download page. After install, the application offers a first-run experience that walks new users through model selection, recommended quantizations for their hardware, and a sample chat — no account required at any step.

Final word: why lmstudio still leads the category

For anyone evaluating local AI in 2026, lmstudio remains the most accessible starting point — the UI does for local LLMs what early Bittorrent clients did for peer-to-peer file sharing. The combination of lm studio mac and lm studio for windows builds, Apple Silicon acceleration, NVIDIA/AMD/Intel GPU support, plugins, image generation and an OpenAI-compatible local server covers virtually every use case a power user might bring. Hardware is the only real constraint: 16 GB is the practical floor, 32 GB or a dedicated GPU is where the experience becomes pleasant.

help

LM Studio FAQ

What is the minimum hardware I need for LM Studio?
A 7B parameter model is comfortable on a 16 GB Apple Silicon Mac or a Windows laptop with 16 GB RAM and a recent GPU. For 13B–14B models, 32 GB of unified memory or a dedicated GPU with 12 GB+ VRAM is the practical minimum.
Can I run LM Studio on a machine without a GPU?
Yes. Smaller models (3B and 7B) run on CPU-only Mac and Windows systems via llama.cpp. Tokens-per-second will be lower than on a GPU, but the workflow is identical and the chat interface behaves the same way.
How much disk space do I need for the model files?
Plan for 5-15 GB per model at Q4_K_M for 7B-14B sizes, and 40 GB+ for 70B-class models. The model directory can be moved to an external SSD in Settings.
Does LM Studio send my prompts to a cloud server?
No. All inference happens on your local machine. The application only contacts external services to list models from Hugging Face and (optionally) to check for plugin updates. Your prompts and generated text never leave the device.
Is the OpenAI-compatible API server safe to leave running?
Yes. The server binds to 127.0.0.1 by default, which means only programs on the same machine can reach it. Exposing it on a LAN or to the internet requires changing the bind address explicitly.
What data does the app collect?
Anonymous launch telemetry can be disabled in Settings. Prompts, conversations and generated content are never collected — those stay in the local SQLite database that the application maintains in your user-data directory.
Which model formats does LM Studio support?
GGUF is the primary format (the llama.cpp-compatible quantization standard). On Apple Silicon, MLX-format models are also supported with the MLX runtime enabled. Both are browsable directly inside the model catalogue.
Where do downloaded models live on disk?
Inside the application's data directory by default. The location can be changed in Settings — which is the recommended approach if you want to keep multi-gigabyte model files on an external SSD instead of the system drive.
Can the app run image-generation models too?
Yes. Stable Diffusion, SDXL and FLUX-class diffusion models load through the same model browser. Memory requirements are heavier than text — SDXL needs ~12 GB of VRAM, FLUX starts around 16 GB.
Is there a CLI?
Yes — the lms command-line tool handles model loading, the local server lifecycle and plugin installs. Useful for headless boxes, CI workflows and scripted automation.
Does the local API really replace OpenAI in my code?
For chat completions and embeddings, yes. Point the OpenAI SDK at http://localhost:1234/v1 with any string as the API key and the SDK behaves identically — including streaming, system prompts and tool calls on models that support them.
How do I write a plugin?
Plugins are TypeScript or Python projects scaffolded by lms plugin create. The runtime exposes hooks for prompt processing, tool calling and new UI surfaces, all inside a sandboxed worker.

Download LM Studio for Mac and Windows

Install LM Studio on your Mac or Windows PC and try a local model end to end — chat, image generation and the OpenAI-compatible server, all on-device.

→ STEP 01

Download

Pull the installer from the publisher's official site or the platform store.

→ STEP 02

Pick a model

The first-run wizard recommends a Q4_K_M model sized to your hardware.

→ STEP 03

Chat

Start a conversation in the built-in interface — no account, no API key.