On‑premise AI · your data stays in your building

Local AI hardware and software, built to run open LLMs on‑prem.

Pico Software designs, builds and configures on‑premise hardware — and installs the full local‑LLM software stack — so companies and individuals can run today's best open‑weight models in‑house. No cloud. No prompts or documents leaving your network.

Apache 2.0 / MIT open models Data never leaves your network UK build, install & support UK/EU Cloud servers available Custom business context / LLM fine‑tuning available
On‑prem inference node
In your building · air‑gapped capable
Memory in use
Llama 3.3 70B running
Qwen3‑235B‑A22B fits @4‑bit
GLM‑4.5 · 355B fits @4‑bit
Running locally · 0 bytes sent to cloud
Hardware + software, end to end

Everything you need to run AI in‑house

We supply the box and the brains: the right on‑prem hardware, the local‑LLM software stack, and the help to get it running for your team.

Hardware, built & shipped

From a silent Mac mini to multi‑GPU RTX 5090 workstations and RTX PRO 6000 Blackwell nodes — specced, assembled and burned‑in for inference.

Local‑LLM software stack

We install and tune the inference stack — open‑weight models, runtime, OpenAI‑compatible API and a chat UI — so your team can chat, code and build on day one.

Setup, tuning & support

On‑site or remote install, model selection, quantisation and ongoing support — sized to your workloads, context lengths and budget.

Total data sovereignty

Everything runs inside your building. Prompts, documents and model weights stay on your network — ideal for regulated and privacy‑first teams.

Hardware tiers & pricing

Pick the box that fits your models

Memory capacity decides which models fit; bandwidth decides how fast they generate. Each tier lists the largest open‑weight models that fit in memory at 4‑bit.

Mac Mini
Compact & silent
Apple M4 / M4 Pro Up to 48GB unified memory
Approx. cost £800–£1,400 single box · entry on‑prem
Get a quote
Largest models it runs (4‑bit)
  • Llama 3.3 70B 70B
  • Mixtral 8×7B 46.7B
  • Qwen3‑32B 32B
RTX PRO 6000 Blackwell
Prosumer flagship
1–2× RTX PRO 6000 Blackwell 96–192GB ECC GDDR7
Approx. cost £8,000–£18,000 1–2 cards · dept. server
Get a quote
Largest models it runs (4‑bit)
  • GLM‑4.5 355B
  • Qwen3‑235B‑A22B 235B
  • Command A+ 218B

Indicative UK pricing (mid‑2026) and example open‑weight models — the largest that fit in memory at 4‑bit quantisation, where all parameters must fit in RAM/VRAM (Mixture‑of‑Experts models keep every parameter resident even though only a subset is active per token). Prices were moving month‑to‑month under a DRAM shortage; figures are from our June 2026 on‑prem LLM hardware report — please re‑check before ordering.

Why on‑premise

Keep the model — and the data — in the building

  • 1

    Your data never leaves your network

    Run inference entirely on‑prem — no prompts, files or embeddings sent to third‑party APIs.

  • 2

    Right‑sized to the models you need

    Memory capacity decides which models fit; we match the hardware to your target models and context lengths.

  • 3

    Open, commercially‑usable weights

    We focus on Apache 2.0 / MIT models you can deploy and build on commercially — and we flag the licensing traps.

  • 4

    Built, installed & supported in the UK

    Assembly, burn‑in, on‑site install and ongoing support — whether it's one desktop or a department server.

On‑prem, by the numbers

0bytes of your data sent to the cloud during inference.
70–355Bparameter open models running on a single workstation.
4‑bitquantisation fits big models into a fraction of the memory.
UKdesign, build, install and support — start to finish.
About Pico Software

Engineers who build what they sell

We're a small, focused UK team that designs on‑premise AI hardware and installs the full software stack — so you can run open‑weight LLMs locally without depending on cloud providers.

  • Hands‑on, end to end

    We spec the components, build and burn‑in every machine, install the inference stack and hand it over running. One team, no middlemen.

  • UK‑based, UK‑built

    Pico Software is a trading name of Pina Colada Software Limited (Company 9605428), registered in England. All hardware is assembled and supported from the UK.

  • Privacy as a first principle

    We started this business because we believe companies and individuals deserve to use powerful AI without handing their data to third parties.

What you get

HardwarePurpose‑built workstations and servers specced for the models you need — from a quiet Mac mini to multi‑GPU NVIDIA nodes.
SoftwareA fully configured local‑LLM stack: inference runtime, OpenAI‑compatible API, chat UI, and your chosen open‑weight models — ready on day one.
SupportOn‑site or remote install, model updates, quantisation advice and ongoing help — for as long as you need it.

Bring AI in‑house.

Tell us your use case and target models — we'll spec the right on‑prem build and quote it.

sales@pico-software.com