On‑premise AI · your data stays in your building

Local AI hardware and software, built to run open LLMs on‑prem.

Pico Software designs, builds and configures on‑premise hardware — and installs the full local‑LLM software stack — so companies and individuals can run today's best open‑weight models in‑house. No cloud. No prompts or documents leaving your network.

Apache 2.0 / MIT open models Data never leaves your network UK build, install & support
On‑prem inference node
In your building · air‑gapped capable
Memory in use
Llama 3.3 70B running
Qwen3‑235B‑A22B fits @4‑bit
GLM‑4.5 · 355B fits @4‑bit
Running locally · 0 bytes sent to cloud
Hardware + software, end to end

Everything you need to run AI in‑house

We supply the box and the brains: the right on‑prem hardware, the local‑LLM software stack, and the help to get it running for your team.

Hardware, built & shipped

From a silent Mac mini to multi‑GPU RTX 5090 workstations and RTX PRO 6000 Blackwell nodes — specced, assembled and burned‑in for inference.

Local‑LLM software stack

We install and tune the inference stack — open‑weight models, runtime, OpenAI‑compatible API and a chat UI — so your team can chat, code and build on day one.

Setup, tuning & support

On‑site or remote install, model selection, quantisation and ongoing support — sized to your workloads, context lengths and budget.

Total data sovereignty

Everything runs inside your building. Prompts, documents and model weights stay on your network — ideal for regulated and privacy‑first teams.

Hardware tiers & pricing

Pick the box that fits your models

Memory capacity decides which models fit; bandwidth decides how fast they generate. Each tier lists the largest open‑weight models that fit in memory at 4‑bit.

Mac Mini
Compact & silent
Apple M4 / M4 Pro Up to 48GB unified memory
Approx. cost £800–£1,400 single box · entry on‑prem
Get a quote
Largest models it runs (4‑bit)
  • Llama 3.3 70B 70B
  • Mixtral 8×7B 46.7B
  • Qwen3‑32B 32B
RTX PRO 6000 Blackwell
Prosumer flagship
1–2× RTX PRO 6000 Blackwell 96–192GB ECC GDDR7
Approx. cost £8,000–£18,000 1–2 cards · dept. server
Get a quote
Largest models it runs (4‑bit)
  • GLM‑4.5 355B
  • Qwen3‑235B‑A22B 235B
  • Command A+ 218B

Indicative UK pricing (mid‑2026) and example open‑weight models — the largest that fit in memory at 4‑bit quantisation, where all parameters must fit in RAM/VRAM (Mixture‑of‑Experts models keep every parameter resident even though only a subset is active per token). Prices were moving month‑to‑month under a DRAM shortage; figures are from our June 2026 on‑prem LLM hardware report — please re‑check before ordering.

Why on‑premise

Keep the model — and the data — in the building

  • 1

    Your data never leaves your network

    Run inference entirely on‑prem — no prompts, files or embeddings sent to third‑party APIs.

  • 2

    Right‑sized to the models you need

    Memory capacity decides which models fit; we match the hardware to your target models and context lengths.

  • 3

    Open, commercially‑usable weights

    We focus on Apache 2.0 / MIT models you can deploy and build on commercially — and we flag the licensing traps.

  • 4

    Built, installed & supported in the UK

    Assembly, burn‑in, on‑site install and ongoing support — whether it's one desktop or a department server.

On‑prem, by the numbers

0bytes of your data sent to the cloud during inference.
70–355Bparameter open models running on a single workstation.
4‑bitquantisation fits big models into a fraction of the memory.
UKdesign, build, install and support — start to finish.

Bring AI in‑house.

Tell us your use case and target models — we'll spec the right on‑prem build and quote it.

sales@pico-software.com