GPU Servers and Clusters
Private dedicated hardware for your AI/ML workloads and more.
- Flexible monthly billing for A100 and H100 multi-GPU deployments.
- Private resources for your use case. Virtualize your hardware only if it fits your needs.
- Completely customizable and built to order.
- Consistent and reliable performance.
- Connect to our OpenStack deployments for more functionality, or deploy Bare Metal for complete control.

Private GPU Servers for AI/ML workloads
Fully customizable deployments ranging from large-scale 8x GPU setups to CPU-based inference.
| GPU | GPU Memory | GPU Cores | CPU | Storage | Memory | Price |
|---|---|---|---|---|---|---|
| **X-Large:** The most complete AI hardware we offer. Ideal for AI/ML training, high-throughput inference, and demanding compute workloads that push performance to the limit. | | | | | | |
| 8x NVIDIA H100 SXM5 | 640 GB HBM3 | CUDA: 135,168 / Tensor: 4,224 | 2x Intel Xeon Gold 6530, 64C/128T, 2.1/4.0 GHz | Up to 16 NVMe drives, 2x 960 GB boot disks | Up to 8 TB DDR5 5600 MT/s | Contact Us |
| **Large:** Perfect for mid-sized GPU workloads with maximum flexibility. These servers support up to 2x H100 GPUs, 2 TB of memory, and 24 drives each. | | | | | | |
| 2x NVIDIA H100 PCIe | 160 GB HBM3 | CUDA: 33,792 / Tensor: 1,056 | 2x Intel Xeon Gold 6530, 64C/128T, 2.1/4.0 GHz | 1x 6.4 TB NVMe, 2x 960 GB boot disks | 1024 GB DDR5 4800 MHz | $4,608.00/mo (eq. $6.31/hr) |
| 1x NVIDIA H100 PCIe | 80 GB HBM3 | CUDA: 16,896 / Tensor: 528 | 2x Intel Xeon Gold 6530, 64C/128T, 2.1/4.0 GHz | 1x 6.4 TB NVMe, 2x 960 GB boot disks | 1024 GB DDR5 4800 MHz | $2,995.20/mo (eq. $4.10/hr) |
| 2x NVIDIA A100 80G | 160 GB HBM2e | CUDA: 13,824 / Tensor: 864 | 2x Intel Xeon Gold 6530, 64C/128T, 2.1/4.0 GHz | 1x 6.4 TB NVMe, 2x 960 GB boot disks | 1024 GB DDR5 4800 MHz | $3,087.36/mo (eq. $4.23/hr) |
| 1x NVIDIA A100 80G | 80 GB HBM2e | CUDA: 6,912 / Tensor: 432 | 2x Intel Xeon Gold 6530, 64C/128T, 2.1/4.0 GHz | 1x 6.4 TB NVMe, 2x 960 GB boot disks | 1024 GB DDR5 4800 MHz | $2,234.88/mo (eq. $3.06/hr) |
| **Medium:** Low-cost GPU workloads. Less flexible than our Large GPU deployments, but far more powerful than CPU inferencing. | | | | | | |
| 1x NVIDIA A100 40G | 40 GB HBM2e | CUDA: 6,912 / Tensor: 432 | AMD EPYC 7272, 12C/24T, 2.9 GHz | 1 TB NVMe | 256 GB DDR4 3200 MHz | $714.24/mo (eq. $0.98/hr) |
Small – CPU Based
Running AI inference on Intel's 5th Gen Xeon processors with Advanced Matrix Extensions (AMX) is the most affordable option. Ideal for small models and non-production use cases.
| Size | CPU | Cores | Storage | Memory | Private BW | Public BW | Price |
|---|---|---|---|---|---|---|---|
| XXL v4 | 2x Intel Xeon Gold 6530 | 64C/128T, 2.1/4.0 GHz | 6x 6.4 TB NVMe, 2x 960 GB boot disks | 2048 GB DDR5 4800 MHz | 20 Gbps | 10 Gbps | $2,223.36 |
| XL v4 (Top Seller) | 2x Intel Xeon Gold 6530 | 64C/128T, 2.1/4.0 GHz | 4x 6.4 TB NVMe, 2x 960 GB boot disks | 1024 GB DDR5 4800 MHz | 20 Gbps | 6 Gbps | $1,589.76 |
| XL v4 High Frequency | 2x Intel Xeon Gold 6544Y | 32C/64T, 3.6/4.1 GHz | 4x 6.4 TB NVMe, 2x 960 GB boot disks | 1024 GB DDR5 5200 MHz | 20 Gbps | 6 Gbps | $1,751.04 |
| Large v4 (Top Seller) | 2x Intel Xeon Gold 6526Y | 32C/64T, 2.8/3.9 GHz | 2x 6.4 TB NVMe, 2x 960 GB boot disks | 512 GB DDR5 5200 MHz | 20 Gbps | 4 Gbps | $938.88 |
| Medium v4 (Top Seller) | 2x Intel Xeon Silver 4510 | 24C/48T, 2.4/4.1 GHz | 1x 6.4 TB NVMe, 2x 960 GB boot disks | 256 GB DDR5 4400 MHz | 20 Gbps | 2 Gbps | $495.36 |
Pricing shown requires a 3-year agreement. Lower pricing may be available with longer commitments. Final pricing will be confirmed by your sales representative and is subject to change.
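The "eq. $/hr" figures in the tables above correspond to dividing the monthly price by roughly 730 hours, the average length of a month. A quick sketch, assuming that convention:

```python
# Convert a monthly price to its approximate hourly equivalent,
# assuming an average month of 730 hours (8,760 hours / 12 months).
HOURS_PER_MONTH = 8760 / 12  # 730.0

def monthly_to_hourly(monthly_price: float) -> float:
    """Return the equivalent hourly rate, rounded to cents."""
    return round(monthly_price / HOURS_PER_MONTH, 2)

# Cross-check against the published table figures.
print(monthly_to_hourly(714.24))   # Medium 1x A100 40G -> 0.98
print(monthly_to_hourly(4608.00))  # Large 2x H100 PCIe -> 6.31
```

This matches each listed hourly equivalent, so you can apply the same conversion to any quoted monthly price.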
How is Private AI on OpenMetal Infrastructure Different?
It's private, customizable, and our engineers are on your team.
Private Resources
We provide dedicated hardware exclusively for your team. None of the resources are virtualized or shared with other users, ensuring consistent performance and allowing you to fully leverage your GPU’s capabilities.
Built to Order
Connect with our team to design your ideal AI/ML deployment. We’ll handle ordering, setup, and ensure everything runs reliably. The specifications listed are just a starting point.
Access to Engineers
Our engineers are here to help you evaluate hardware capabilities and identify the best solution for your specific use case. After deployment, we’ll work with you to maximize value.
What You Should Know Before Running Your Own AI Workloads
Performance Comparison of GPUs
Different GPU models offer varying levels of performance based on core counts, memory bandwidth, and architectural improvements. Comparing models like the A100 and H100 helps identify which hardware best supports specific AI workloads.
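As a first-order comparison, you can compute spec ratios directly from the table above; note that raw core counts are only one axis, and real-world speedup also depends on memory bandwidth, precision support (e.g. FP8 on Hopper), and software stack. A minimal sketch using the published numbers:

```python
# First-order spec comparison built from the per-GPU numbers in the
# pricing table above. Core-count ratios are a rough proxy only.
SPECS = {
    "A100 80G":  {"cuda_cores": 6912,  "tensor_cores": 432},
    "H100 PCIe": {"cuda_cores": 16896, "tensor_cores": 528},
}

def spec_ratio(a: str, b: str, key: str) -> float:
    """Ratio of spec `key` for GPU `a` over GPU `b`."""
    return round(SPECS[a][key] / SPECS[b][key], 2)

print(spec_ratio("H100 PCIe", "A100 80G", "cuda_cores"))    # -> 2.44
print(spec_ratio("H100 PCIe", "A100 80G", "tensor_cores"))  # -> 1.22
```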
Inference on CPU
CPU-based inference remains a practical option for certain workloads, especially when GPUs are not required. Intel’s Advanced Matrix Extensions (AMX) on 5th Gen processors improve matrix computation performance, making CPU inference more viable.
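Before planning CPU-based inference, it is worth confirming that a host actually exposes AMX. A minimal sketch for Linux, reading the kernel's CPU feature flags (`amx_tile`, `amx_bf16`, `amx_int8`); it simply returns False where `/proc/cpuinfo` is unavailable:

```python
# Check for Intel AMX support by inspecting CPU feature flags on Linux.
# Returns False on non-Linux systems or on CPUs without AMX.
from pathlib import Path

AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}

def has_amx(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    try:
        text = Path(cpuinfo_path).read_text()
    except OSError:
        return False
    for line in text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return AMX_FLAGS.issubset(flags)
    return False

print("AMX available:", has_amx())
```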
Private vs Public AI
Bare metal provides direct access to physical hardware without virtualization overhead, offering predictable performance ideal for AI training and large inference tasks.
Comparing Costs
The cost of running AI workloads depends on hardware selection, usage patterns, and resource efficiency. Dedicated GPUs involve higher upfront costs but deliver faster results.
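Whether dedicated hardware beats renting on-demand GPUs comes down to utilization. A sketch of the break-even calculation, using the Medium A100 40G price from the table above and an assumed $4.00/hr on-demand rate (a hypothetical figure for illustration only):

```python
# Break-even utilization: hours of use per month above which a dedicated
# monthly price is cheaper than renting the same GPU on demand.
# The $4.00/hr on-demand rate is a hypothetical figure for illustration.
HOURS_PER_MONTH = 730

def break_even_hours(dedicated_monthly: float, on_demand_hourly: float) -> float:
    """Hours per month at which dedicated and on-demand costs are equal."""
    return round(dedicated_monthly / on_demand_hourly, 1)

hours = break_even_hours(714.24, 4.00)  # Medium 1x A100 40G
utilization = round(hours / HOURS_PER_MONTH * 100, 1)
print(f"Break-even at {hours} hrs/month ({utilization}% utilization)")
```

Under these assumptions, a GPU busy more than about a quarter of the month is already cheaper on dedicated hardware.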
MIG vs Time-Slicing
Multi-Instance GPU (MIG) and time-slicing are two methods for sharing GPU resources, each offering different levels of isolation and performance. OpenMetal supports both.
Measuring Inference Performance
Inference performance is measured by throughput, latency, and token generation speed for large language models. Accurate benchmarking is critical for production planning.
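The metrics named above can be computed directly from per-request timings. A minimal sketch with synthetic latency samples (the numbers are invented for illustration, and throughput here assumes requests run sequentially):

```python
# Compute throughput (tokens/s) and a latency percentile from per-request
# timings. Latency samples below are synthetic, for illustration only.
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[rank]

latencies_s = [0.82, 0.91, 0.88, 1.10, 0.95, 0.87, 1.30, 0.90]  # per request
tokens_per_request = 128

total_time = sum(latencies_s)  # sequential-execution assumption
throughput = round(len(latencies_s) * tokens_per_request / total_time, 1)
p95 = percentile(latencies_s, 95)

print(f"throughput: {throughput} tokens/s, p95 latency: {p95} s")
```

In production you would gather real timings under concurrent load; the tail percentile (p95/p99) usually matters more than the mean for capacity planning.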
GPU Server Deployment Sizes for Various Workloads
Access dedicated GPU servers with full control over resource utilization. Users can run workloads directly on bare metal or connect to OpenStack to create and manage virtual machines, networks, and storage.
X-Large GPU Server
Built for enterprise-grade AI/ML workloads requiring maximum performance and scalability. This deployment includes 8x NVIDIA H100 GPUs per node, designed to handle nearly all use cases, from large-scale model training to high-throughput inference and multi-user environments.

Large GPU Server
Ideal for teams running frequent AI experiments or large-scale model training jobs. This deployment is fully customizable, allowing selection of GPU type, CPU, memory, and storage to match specific workload requirements.

Medium GPU Server
Suited for teams transitioning from proof-of-concept to production workloads. This deployment supports a single NVIDIA A100 GPU per node, providing sufficient resources for moderate AI/ML pipelines.

Small – CPU Only
Recommended for development environments, application integration, or running smaller models in production where GPU acceleration is not required. Designed for CPU-only inference workloads.

Contact Us
Connect with our team to discuss your requirements, delivery timelines, capabilities, and agreement pricing.
Pricing FAQs, Eligibility and Usage Restrictions
Not Sure Yet? Our Welcome Team Can:
Get You Started Fast
Our expert hardware engineers will work closely with your team. Save time, improve performance, and lower costs.
Negotiate Ramp Pricing
Don’t pay twice during the move process. Work with your account manager to get a move plan that fits.
Beat Your Bill
Has your mega cloud provider hit you with a mega bill? Transparent prices, fixed budgets, and a team that cares.