How to match CPU, RAM, storage type, and bandwidth to what your workload actually does, without overbuying or starving it.
Most VPS sizing mistakes come from guessing at peak instead of measuring the steady state. You either pay for headroom you never touch, or you starve a process that quietly degrades under load. This guide walks through how to pick CPU, memory, storage type, and bandwidth based on what your workload actually does.
Start with the bottleneck, not the spec sheet
Every workload is gated by one resource before the others. Identify which one and the rest of the decision gets simpler.
- CPU-bound: build pipelines, video transcoding, compression, anything that pins a core for sustained periods.
- Memory-bound: in-memory caches, large database working sets, JVM-heavy services, anything that pages to disk when it runs short.
- I/O-bound: write-heavy databases, queue brokers, log ingestion, small random reads and writes at high frequency.
- Network-bound: file distribution, media serving, reverse proxies fronting a lot of clients.
If you can only profile one thing, profile this. On an existing box, watch top or htop for CPU, free -m for memory pressure, iostat -x 1 for disk utilization and await, and vnstat or interface counters for throughput.
CPU: cores versus clock
Count how many things run in parallel. A single-threaded web app behind a worker pool wants higher per-core performance and a modest core count. A build server or a service with many concurrent workers wants more cores even if each is slightly slower.
A common error is buying eight cores for a process that never uses more than one. Run your service, generate representative load, and look at per-core utilization. If one core sits at 100% while the rest idle, more cores will not help. You need faster cores or a code path that parallelizes.
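The per-core check above can be sketched as a small function. This is a minimal illustration, not a standard tool: the function name, the labels it returns, and the 90% and 20% cutoffs are all assumptions chosen for the example. Feed it per-core utilization percentages captured while your representative load is running.

```python
def diagnose_cpu(per_core_util, hot=90.0, idle=20.0):
    """Classify per-core utilization percentages sampled under load.

    The hot/idle cutoffs are illustrative assumptions, not fixed rules.
    """
    if not per_core_util:
        raise ValueError("need at least one core sample")
    n = len(per_core_util)
    hot_count = sum(1 for u in per_core_util if u >= hot)
    idle_count = sum(1 for u in per_core_util if u <= idle)

    if hot_count == 0:
        return "not-cpu-bound"        # no core is pinned
    if hot_count == n:
        return "all-cores-busy"       # more cores may help
    if hot_count + idle_count == n:
        return "single-thread-bound"  # pinned core(s), the rest idle
    return "mixed"                    # partial parallelism, profile further
```

A result of "single-thread-bound" is the eight-cores-for-one-thread case: extra cores would sit idle, so faster cores or a parallel code path are the options left.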
Memory: size for the working set plus a margin
Memory is the resource that fails the hardest. CPU contention slows you down. Running out of memory triggers the OOM killer and takes a process down with no warning.
Measure your resident set under real traffic, not at idle. For databases, account for the buffer pool or cache you have configured plus connection overhead. For application runtimes, watch behavior after the process has been running for hours, since heaps and caches grow.
A practical rule: provision so your steady-state usage sits around 70% of allocated memory. That leaves room for traffic spikes, page cache, and the occasional large request without forcing the kernel to reclaim aggressively.
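The 70% rule is simple arithmetic: divide measured steady-state usage by 0.70 and round up. The sketch below does that with integer math to avoid floating-point rounding surprises. The plan sizes are hypothetical placeholders, not any provider's actual lineup.

```python
def memory_to_provision_mib(steady_state_mib, target_percent=70):
    """Memory (MiB) needed so steady-state usage sits at target_percent.

    Expects an integer MiB measurement taken under real traffic.
    """
    if steady_state_mib <= 0:
        raise ValueError("steady_state_mib must be positive")
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    # Ceiling division using integers only.
    return -(-steady_state_mib * 100 // target_percent)


# Hypothetical plan sizes, for illustration only.
HYPOTHETICAL_PLANS_MIB = (2048, 4096, 8192, 16384, 32768)


def smallest_fitting_plan(required_mib, plans=HYPOTHETICAL_PLANS_MIB):
    """Return the smallest plan that covers required_mib, or None."""
    for size in sorted(plans):
        if size >= required_mib:
            return size
    return None
```

With a measured steady state of 5600 MiB, the function asks for 8000 MiB, which lands on the 8192 MiB plan in this made-up list.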
A note on swap
Swap is a safety net, not a capacity plan. A small swap file prevents a single spike from killing a process. If your workload swaps continuously, you are memory-bound and need a larger plan. Watch the si and so columns in vmstat 1. Sustained nonzero values mean you are paying a disk penalty for every memory access that misses.
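If you want to check for sustained swapping from captured vmstat 1 output rather than by eye, a sketch along these lines works. It finds the si and so columns by header name instead of by position, since column layout can vary between vmstat versions. The function names, the sample numbers, and the "five samples in a row" default are assumptions for illustration.

```python
def swap_samples(vmstat_text):
    """Return a list of (si, so) tuples, one per data row of vmstat output."""
    columns = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            columns = fields  # header row naming each column
            continue
        if columns is None or len(fields) != len(columns):
            continue  # banner line or malformed row
        if not all(f.isdigit() for f in fields):
            continue
        row = dict(zip(columns, fields))
        samples.append((int(row["si"]), int(row["so"])))
    return samples


def sustained_swapping(samples, min_consecutive=5):
    """True if swap-in or swap-out is nonzero for min_consecutive rows in a row."""
    run = 0
    for si, so in samples:
        run = run + 1 if (si or so) else 0
        if run >= min_consecutive:
            return True
    return False


# Made-up sample output, for illustration only.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0  10240  81234  10211 402133    0    0    12    30  210  400  5  2 92  1  0
 2  1  20480  40110  10190 398020  120   64   900   610  350  720 20  6 60 14  0
 3  1  30720  22004  10150 390511  340  212  1800  1400  410  810 25  8 45 22  0
"""
```

Keep in mind that the first data row vmstat prints reports averages since boot, so you may want to drop it before judging current behavior.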
SSD versus NVMe: where the difference is real
Both are fast. The gap matters in specific cases and is invisible in others.
Choose NVMe when your workload does heavy random I/O at high concurrency: a busy transactional database, a write-amplified queue, log aggregation, or anything where you see disk await climbing under load on slower storage. NVMe's advantage shows up in queue depth and latency under concurrent operations, not in a single large sequential copy.
Choose SSD when your I/O is sequential, bursty, or simply not the bottleneck: most web apps, content serving, dev and staging environments, and services where the working set lives in memory and disk is touched mainly on startup and logging. Paying for NVMe on a workload that is CPU-bound or memory-bound buys you nothing.
The honest test: if iostat -x shows disk utilization well under 50% during peak, storage type is not your constraint. Spend the budget on memory or cores instead.
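The storage decision above reduces to two observations taken at peak: how busy the disk is and whether await climbs under load. Here is a minimal sketch of that logic. The function name and return labels are assumptions, and the 50% cutoff is the rule of thumb from this guide rather than a hard limit.

```python
def storage_recommendation(peak_util_pct, await_climbs_under_load):
    """Suggest a storage tier from peak iostat observations.

    peak_util_pct: highest disk %util seen during peak traffic.
    await_climbs_under_load: True if await rises as concurrency rises.
    """
    if await_climbs_under_load:
        return "nvme"             # latency under concurrency is the problem
    if peak_util_pct < 50.0:
        return "ssd"              # storage is not the constraint
    return "measure-further"      # busy disk but stable await: keep watching
```

When this returns "ssd", the budget is better spent on memory or cores.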
Bandwidth: throughput and consistency
A 1Gbps uplink is plenty for the overwhelming majority of workloads. The question is rarely peak throughput. It is usually consistency and proximity to your users.
For services concentrated in the Midwest and the Detroit region, being on well-connected, redundant routing keeps round-trip times low and predictable, which matters more than raw ceiling for interactive applications. A login flow that does six sequential round trips feels the latency far more than it feels the bandwidth.
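To see why round trips dominate, put rough numbers on the login flow. The sketch below adds latency cost (round trips times RTT) to transfer cost (payload over link speed). The RTT values and the 50 KB payload used in the example are illustrative assumptions, not measurements.

```python
def flow_time_ms(round_trips, rtt_ms, payload_bytes=0, link_bps=1_000_000_000):
    """Estimate time for a flow of sequential round trips plus payload transfer.

    Ignores server processing time, TCP slow start, and packet loss.
    """
    latency_ms = round_trips * rtt_ms
    transfer_ms = payload_bytes * 8 / link_bps * 1000
    return latency_ms + transfer_ms


# Six sequential round trips, 50 KB total payload, 1Gbps link (assumed values).
nearby = flow_time_ms(6, 10, payload_bytes=50_000)    # about 60.4 ms
far_away = flow_time_ms(6, 70, payload_bytes=50_000)  # about 420.4 ms
```

In both cases the transfer itself costs about 0.4 ms. The rest is round-trip latency, which is why proximity matters more than pipe size here.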
If you serve large files or media, calculate sustained throughput, not just total transfer. A single 1Gbps link carries a lot of concurrent streams, but if you are saturating it during peak you should be thinking about caching at the edge of your own stack before you think about raw pipe size.
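The gap between total transfer and sustained throughput is easy to show with arithmetic. The sketch below converts a monthly transfer figure into an average rate, then computes link utilization at peak concurrency. The 10 TB, 150 streams, and 5 Mbps figures in the usage note are made-up inputs.

```python
def average_mbps(monthly_transfer_tb, days=30):
    """Average rate implied by a monthly transfer total (decimal TB)."""
    bits = monthly_transfer_tb * 1e12 * 8
    return bits / (days * 86_400) / 1e6


def peak_link_utilization(concurrent_streams, stream_mbps, link_mbps=1000):
    """Fraction of the link consumed at peak concurrency."""
    return concurrent_streams * stream_mbps / link_mbps
```

For example, 10 TB per month averages out to roughly 31 Mbps, which looks trivial on a 1Gbps link. But 150 concurrent streams at 5 Mbps each use 75% of that same link during peak. The monthly total hides the number that matters.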
A worked example
Say you are running a Postgres-backed web application with a few thousand daily active users.
- Profile: mostly memory-bound on the database, light CPU on the app tier, modest I/O, low bandwidth.
- Memory: size so the database working set and buffer pool fit comfortably in RAM with the app tier and OS on top, then add margin to land near 70% steady state.
- CPU: a few cores. The app is request-driven and parallel, but not compute-heavy per request.
- Storage: SSD is fine unless write volume or concurrent query load pushes disk await up. If it does, move to NVMe.
- Bandwidth: the 1Gbps uplink is far beyond what this needs. Latency to users matters more than ceiling.
Now flip one variable. Add heavy logging and analytics writes and the same app becomes I/O-bound. That is when NVMe earns its place and your storage decision changes even though CPU and memory did not.
How to validate before you commit
- Deploy on a plan you think is close, not your maximum guess.
- Run representative load. Synthetic benchmarks lie. Use real traffic patterns or a replay of them.
- Watch CPU, memory, disk await, and throughput together during peak.
- Resize toward the resource that actually saturated first.
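The last step, finding which resource saturated first, can be sketched as a scan over time-ordered samples collected during the load run. The resource names and threshold values here are assumptions for illustration. Pick limits that match your own tolerance.

```python
# Illustrative saturation limits in percent; these are assumptions, tune to taste.
HYPOTHETICAL_LIMITS = {"cpu": 90, "memory": 90, "disk_util": 50, "net": 80}


def first_saturated(samples, limits=HYPOTHETICAL_LIMITS):
    """Return (sample_index, resource) for the first resource to cross its limit.

    samples: list of dicts mapping resource name to percent, in time order.
    If several cross in the same sample, the largest overshoot wins.
    Returns None if nothing saturates.
    """
    for index, sample in enumerate(samples):
        crossed = [
            name for name, limit in limits.items()
            if sample.get(name, 0) >= limit
        ]
        if crossed:
            worst = max(crossed, key=lambda name: sample[name] - limits[name])
            return index, worst
    return None
```

If this returns memory at an early sample while CPU only crosses its limit later, resize memory first and rerun the load before touching anything else.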
Sizing is iterative. Measure, adjust, measure again. A plan that fits the workload runs cooler, fails less, and costs less than the one you sized by fear of running short.