GPU Servers and Clusters
Private dedicated hardware for your AI/ML workloads and more.
- Flexible monthly billing for A100 and H100 multi-GPU deployments.
- Private resources for your use case. Virtualize your hardware only if it fits your needs.
- Completely customizable and built to order.
- Consistent and reliable performance.
- Connect to our OpenStack deployments for more functionality, or deploy Bare Metal for complete control.

Private GPU Servers for AI/ML workloads
Fully customizable deployments ranging from large-scale 8x GPU setups to CPU-based inference.
| GPU | GPU Memory | GPU Cores | CPU | Storage | Memory | Price |
|---|---|---|---|---|---|---|
| **X-Large:** The most complete AI hardware we offer. Ideal for AI/ML training, high-throughput inference, and demanding compute workloads that push performance to the limit. | | | | | | |
| 8x NVIDIA H100 SXM5 | 640 GB HBM3 | CUDA: 135,168 / Tensor: 4,224 | 2x Intel Xeon Gold 6530, 64C/128T, 2.1/4.0 GHz | Up to 16 NVMe drives, 2x 960 GB boot disks | Up to 8 TB DDR5 5600 MT/s | Contact Us |
| **Large:** Perfect for mid-sized GPU workloads with maximum flexibility. These servers support up to 2x H100 GPUs, 2 TB of memory, and 24 drives each. | | | | | | |
| 2x NVIDIA H100 PCIe | 160 GB HBM3 | CUDA: 33,792 / Tensor: 1,056 | 2x Intel Xeon Gold 6530, 64C/128T, 2.1/4.0 GHz | 1x 6.4 TB NVMe, 2x 960 GB boot disks | 1024 GB DDR5 4800 MHz | $4,608.00/mo (eq. $6.31/hr) |
| 1x NVIDIA H100 PCIe | 80 GB HBM3 | CUDA: 16,896 / Tensor: 528 | 2x Intel Xeon Gold 6530, 64C/128T, 2.1/4.0 GHz | 1x 6.4 TB NVMe, 2x 960 GB boot disks | 1024 GB DDR5 4800 MHz | $2,995.20/mo (eq. $4.10/hr) |
| 2x NVIDIA A100 80G | 160 GB HBM2e | CUDA: 13,824 / Tensor: 864 | 2x Intel Xeon Gold 6530, 64C/128T, 2.1/4.0 GHz | 1x 6.4 TB NVMe, 2x 960 GB boot disks | 1024 GB DDR5 4800 MHz | $3,087.36/mo (eq. $4.23/hr) |
| 1x NVIDIA A100 80G | 80 GB HBM2e | CUDA: 6,912 / Tensor: 432 | 2x Intel Xeon Gold 6530, 64C/128T, 2.1/4.0 GHz | 1x 6.4 TB NVMe, 2x 960 GB boot disks | 1024 GB DDR5 4800 MHz | $2,234.88/mo (eq. $3.06/hr) |
| **Medium:** Low-cost GPU workloads. Less flexible than our Large GPU deployments, but far more powerful than CPU inferencing. | | | | | | |
| 1x NVIDIA A100 40G | 40 GB HBM2e | CUDA: 6,912 / Tensor: 432 | AMD EPYC 7272, 12C/24T, 2.9 GHz | 1 TB NVMe | 256 GB DDR4 3200 MHz | $714.24/mo (eq. $0.98/hr) |
Small – CPU Based
Running AI inference on Intel's 5th Gen Xeon processors with Advanced Matrix Extensions (AMX) is the most affordable option. Ideal for small models and non-production use cases.
| Size | CPU | Cores | Storage | Memory | Private BW | Public BW | Price |
|---|---|---|---|---|---|---|---|
| XXL v4 | 2x Intel Xeon Gold 6530 | 64C/128T, 2.1/4.0 GHz | 6x 6.4 TB NVMe, 2x 960 GB boot disks | 2048 GB DDR5 4800 MHz | 20 Gbps | 10 Gbps | $2,223.36 |
| XL v4 (Top Seller) | 2x Intel Xeon Gold 6530 | 64C/128T, 2.1/4.0 GHz | 4x 6.4 TB NVMe, 2x 960 GB boot disks | 1024 GB DDR5 4800 MHz | 20 Gbps | 6 Gbps | $1,589.76 |
| XL v4 High Frequency | 2x Intel Xeon Gold 6544Y | 32C/64T, 3.6/4.1 GHz | 4x 6.4 TB NVMe, 2x 960 GB boot disks | 1024 GB DDR5 5200 MHz | 20 Gbps | 6 Gbps | $1,751.04 |
| Large v4 (Top Seller) | 2x Intel Xeon Gold 6526Y | 32C/64T, 2.8/3.9 GHz | 2x 6.4 TB NVMe, 2x 960 GB boot disks | 512 GB DDR5 5200 MHz | 20 Gbps | 4 Gbps | $938.88 |
| Medium v4 (Top Seller) | 2x Intel Xeon Silver 4510 | 24C/48T, 2.4/4.1 GHz | 1x 6.4 TB NVMe, 2x 960 GB boot disks | 256 GB DDR5 4400 MHz | 20 Gbps | 2 Gbps | $495.36 |
Pricing shown requires a 3-year agreement. Lower pricing may be available with longer commitments. Final pricing will be confirmed by your sales representative and is subject to change.
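The "eq. $/hr" figures in the tables above correspond to dividing the monthly price by roughly 730 hours, the average length of a month. A quick sketch, assuming that convention:

```python
# Convert a monthly price to its approximate hourly equivalent,
# assuming an average month of 730 hours (8,760 hours / 12 months).
HOURS_PER_MONTH = 8760 / 12  # 730.0

def monthly_to_hourly(monthly_price: float) -> float:
    """Return the equivalent hourly rate, rounded to cents."""
    return round(monthly_price / HOURS_PER_MONTH, 2)

# Cross-check against the published table figures.
print(monthly_to_hourly(714.24))   # Medium 1x A100 40G -> 0.98
print(monthly_to_hourly(4608.00))  # Large 2x H100 PCIe -> 6.31
```

This matches each listed hourly equivalent, so you can apply the same conversion to any quoted monthly price.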
How is Private AI on OpenMetal Infrastructure Different?
It's private, customizable, and our engineers are on your team.
Private Resources
We provide dedicated hardware exclusively for your team. None of the resources are virtualized or shared with other users, ensuring consistent performance and allowing you to fully leverage your GPU’s capabilities.
Built to Order
Connect with our team to design your ideal AI/ML deployment. We’ll handle ordering, setup, and ensure everything runs reliably. The specifications listed are just a starting point.
Access to Engineers
Our engineers are here to help you evaluate hardware capabilities and identify the best solution for your specific use case. After deployment, we’ll work with you to maximize value.
What You Should Know Before Running Your Own AI Workloads
Performance Comparison of GPUs
Different GPU models offer varying levels of performance based on core counts, memory bandwidth, and architectural improvements. Comparing models like the A100 and H100 helps identify which hardware best supports specific AI workloads.
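As a first-order comparison, you can compute spec ratios directly from the table above; note that raw core counts are only one axis, and real-world speedup also depends on memory bandwidth, precision support (e.g. FP8 on Hopper), and software stack. A minimal sketch using the published numbers:

```python
# First-order spec comparison built from the per-GPU numbers in the
# pricing table above. Core-count ratios are a rough proxy only.
SPECS = {
    "A100 80G":  {"cuda_cores": 6912,  "tensor_cores": 432},
    "H100 PCIe": {"cuda_cores": 16896, "tensor_cores": 528},
}

def spec_ratio(a: str, b: str, key: str) -> float:
    """Ratio of spec `key` for GPU `a` over GPU `b`."""
    return round(SPECS[a][key] / SPECS[b][key], 2)

print(spec_ratio("H100 PCIe", "A100 80G", "cuda_cores"))    # -> 2.44
print(spec_ratio("H100 PCIe", "A100 80G", "tensor_cores"))  # -> 1.22
```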
Inference on CPU
CPU-based inference remains a practical option for certain workloads, especially when GPUs are not required. Intel’s Advanced Matrix Extensions (AMX) on 5th Gen processors improve matrix computation performance, making CPU inference more viable.
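Before planning CPU-based inference, it is worth confirming that a host actually exposes AMX. A minimal sketch for Linux, reading the kernel's CPU feature flags (`amx_tile`, `amx_bf16`, `amx_int8`); it simply returns False where `/proc/cpuinfo` is unavailable:

```python
# Check for Intel AMX support by inspecting CPU feature flags on Linux.
# Returns False on non-Linux systems or on CPUs without AMX.
from pathlib import Path

AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}

def has_amx(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    try:
        text = Path(cpuinfo_path).read_text()
    except OSError:
        return False
    for line in text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return AMX_FLAGS.issubset(flags)
    return False

print("AMX available:", has_amx())
```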
Private vs Public AI
Bare metal provides direct access to physical hardware without virtualization overhead, offering predictable performance ideal for AI training and large inference tasks.
Comparing Costs
The cost of running AI workloads depends on hardware selection, usage patterns, and resource efficiency. Dedicated GPUs involve higher upfront costs but deliver faster results.
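Whether dedicated hardware beats renting on-demand GPUs comes down to utilization. A sketch of the break-even calculation, using the Medium A100 40G price from the table above and an assumed $4.00/hr on-demand rate (a hypothetical figure for illustration only):

```python
# Break-even utilization: hours of use per month above which a dedicated
# monthly price is cheaper than renting the same GPU on demand.
# The $4.00/hr on-demand rate is a hypothetical figure for illustration.
HOURS_PER_MONTH = 730

def break_even_hours(dedicated_monthly: float, on_demand_hourly: float) -> float:
    """Hours per month at which dedicated and on-demand costs are equal."""
    return round(dedicated_monthly / on_demand_hourly, 1)

hours = break_even_hours(714.24, 4.00)  # Medium 1x A100 40G
utilization = round(hours / HOURS_PER_MONTH * 100, 1)
print(f"Break-even at {hours} hrs/month ({utilization}% utilization)")
```

Under these assumptions, a GPU busy more than about a quarter of the month is already cheaper on dedicated hardware.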
MIG vs Time-Slicing
Multi-Instance GPU (MIG) and time-slicing are two methods for sharing GPU resources, each offering different levels of isolation and performance. OpenMetal supports both.
Measuring Inference Performance
Inference performance is measured by throughput, latency, and token generation speed for large language models. Accurate benchmarking is critical for production planning.
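The metrics named above can be computed directly from per-request timings. A minimal sketch with synthetic latency samples (the numbers are invented for illustration, and throughput here assumes requests run sequentially):

```python
# Compute throughput (tokens/s) and a latency percentile from per-request
# timings. Latency samples below are synthetic, for illustration only.
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[rank]

latencies_s = [0.82, 0.91, 0.88, 1.10, 0.95, 0.87, 1.30, 0.90]  # per request
tokens_per_request = 128

total_time = sum(latencies_s)  # sequential-execution assumption
throughput = round(len(latencies_s) * tokens_per_request / total_time, 1)
p95 = percentile(latencies_s, 95)

print(f"throughput: {throughput} tokens/s, p95 latency: {p95} s")
```

In production you would gather real timings under concurrent load; the tail percentile (p95/p99) usually matters more than the mean for capacity planning.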
GPU Server Deployment Sizes for Various Workloads
Access dedicated GPU servers with full control over resource utilization. Users can run workloads directly on bare metal or connect to OpenStack to create and manage virtual machines, networks, and storage.
X-Large GPU Server
Built for enterprise-grade AI/ML workloads requiring maximum performance and scalability. This deployment includes 8x NVIDIA H100 GPUs per node, designed to handle nearly all use cases, from large-scale model training to high-throughput inference and multi-user environments.

Large GPU Server
Ideal for teams running frequent AI experiments or large-scale model training jobs. This deployment is fully customizable, allowing selection of GPU type, CPU, memory, and storage to match specific workload requirements.

Medium GPU Server
Suited for teams transitioning from proof-of-concept to production workloads. This deployment supports a single NVIDIA A100 GPU per node, providing sufficient resources for moderate AI/ML pipelines.

Small – CPU Only
Recommended for development environments, application integration, or running smaller models in production where GPU acceleration is not required. Designed for CPU-only inference workloads.

Contact Us
Connect with our team to discuss your requirements, delivery timelines, capabilities, and agreement pricing.
Pricing FAQs, Eligibility and Usage Restrictions
Not Sure Yet? Our Welcome Team Can:
Get You Started Fast
Our expert hardware engineers will work closely with your team. Save time, improve performance, and lower costs.
Negotiate Ramp Pricing
Don’t pay twice during the move process. Work with your account manager to get a move plan that fits.
Beat Your Bill
Has your mega cloud provider hit you with a mega bill? Transparent prices, fixed budgets, and a team that cares.