Hosting and Scaling Unreal Pixel Streaming

Updated June 2026
Production pixel streaming requires cloud GPU infrastructure that can scale with user demand while keeping costs under control. This guide covers the full hosting stack, from choosing GPU instances through auto-scaling architecture, matchmaker deployment, TURN server configuration, and cost optimization strategies that make pixel streaming financially viable at scale.

Unlike browser-native web games where the server only handles lightweight tasks like matchmaking and save data, pixel streaming requires dedicated GPU capacity for every concurrent user session. This fundamentally different cost model means that hosting architecture decisions directly impact your operating budget. A poorly designed infrastructure can cost five to ten times more than a well-optimized one serving the same number of users.

Choose Your Cloud Provider and GPU Instance Type

The three major cloud providers all offer GPU instances suitable for pixel streaming, but each has different pricing, availability, and feature sets.

AWS is the most common choice for pixel streaming deployments. The G4dn instance family uses NVIDIA T4 GPUs and offers good price-to-performance for streaming workloads. The g4dn.xlarge (1 T4 GPU, 4 vCPUs, 16 GB RAM) costs approximately $0.53/hour on-demand and can handle 1 to 4 concurrent streaming sessions depending on scene complexity. The G5 family uses NVIDIA A10G GPUs with significantly more rendering power, suitable for complex scenes or higher-resolution streaming, at approximately $1.01/hour for g5.xlarge.

Google Cloud Platform offers N1 instances with attached NVIDIA T4 GPUs. Pricing is comparable to AWS G4dn. GCP's preemptible instances (similar to AWS spot instances) can reduce costs by 60-80% for workloads that can tolerate interruption.

Microsoft Azure provides NV-series VMs, most of which use NVIDIA GPUs. The exception is the NVv4 series (sizes such as NV4as_v4), which uses AMD GPUs (Radeon Instinct MI25); these are not compatible with NVENC encoding and require a different encoder configuration. The NCasT4_v3 series uses NVIDIA T4 GPUs and is directly comparable to AWS G4dn for pixel streaming.

When selecting an instance type, consider four resources. GPU memory determines how complex your scene can be before it runs out of VRAM. System memory must be sufficient for the Unreal application and all loaded assets. CPU cores matter if your application has significant game logic or physics computation. Network bandwidth determines how many concurrent streams the instance can push: the instance must sustain the aggregate bitrate of all concurrent sessions, so four 1080p streams at 15 Mbps each need about 60 Mbps of sustained egress.
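The bandwidth check can be sketched as simple arithmetic. The snippet below is a minimal illustration, not a sizing tool: the function names, the 20% headroom, and the 1000 Mbps link figure are assumptions chosen for the example, and the 15 Mbps default is the per-stream bitrate used in this guide.

```python
def aggregate_bandwidth_mbps(sessions: int, mbps_per_stream: float = 15.0) -> float:
    """Total egress needed when every session streams at the same bitrate."""
    return sessions * mbps_per_stream


def max_sessions_by_bandwidth(instance_mbps: float,
                              mbps_per_stream: float = 15.0,
                              headroom: float = 0.8) -> int:
    """Sessions the network link can carry while keeping some headroom.

    headroom=0.8 means only 80% of the link is budgeted for video
    (an assumed safety margin, not a provider recommendation).
    """
    return int((instance_mbps * headroom) // mbps_per_stream)


if __name__ == "__main__":
    print(aggregate_bandwidth_mbps(4))        # 60.0
    print(max_sessions_by_bandwidth(1000.0))  # 53
```

In practice GPU memory and encoder capacity usually run out long before the network link does, so treat the bandwidth figure as an upper bound rather than a target.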

Deploy the Base Infrastructure

Start with a single GPU instance running your Unreal application, the signaling server, and the web frontend. This baseline deployment serves one to a few concurrent users and validates that your application works correctly in a cloud environment before adding scaling complexity.

Create a GPU instance using your chosen instance type. Install the GPU drivers (NVIDIA's data center drivers for Linux, or the standard NVIDIA drivers for Windows Server). Verify GPU availability by running nvidia-smi to confirm the GPU is detected and the driver is loaded.
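The nvidia-smi check can also be scripted so that an instance reports its own GPU status at boot. This is a minimal sketch that assumes nvidia-smi is on the PATH; the function names are hypothetical, and the query fields used are standard nvidia-smi options.

```python
import subprocess

# Ask nvidia-smi for a compact CSV line per GPU instead of the full table.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_rows(output: str) -> list[dict]:
    """Turn lines like 'Tesla T4, 535.104.05, 15360 MiB' into dictionaries."""
    gpus = []
    for line in output.strip().splitlines():
        if not line.strip():
            continue
        name, driver, memory = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "driver": driver, "memory": memory})
    return gpus


def detect_gpus() -> list[dict]:
    """Return detected GPUs, or an empty list if the driver is missing or broken."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True,
                                check=True, timeout=30)
    except (FileNotFoundError, subprocess.CalledProcessError,
            subprocess.TimeoutExpired):
        return []
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    found = detect_gpus()
    print(found if found else "No GPU detected: check the driver installation")
```

An empty result is a useful health-check failure signal: an instance without a working driver should not be registered with the matchmaker.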

Upload your packaged Unreal application to the instance. Install Node.js for the signaling server. Configure the signaling server, Unreal application, and network settings as described in the setup guide.

For Linux deployments, you need a virtual display adapter since the server has no physical monitor. Install Xvfb (X Virtual Framebuffer) or use the -RenderOffScreen flag when launching the Unreal application. The GPU renders to an off-screen buffer that the Pixel Streaming plugin captures for encoding.
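A headless launch might look like the sketch below. The binary path, the signaling URL, and the resolution are placeholders. The flags shown (-RenderOffScreen, -PixelStreamingURL, -ResX, -ResY, -ForceRes, -Unattended) match Unreal Engine 5 builds that use the original Pixel Streaming plugin, but flag names have changed between engine versions, so verify them against the documentation for your release.

```python
import subprocess


def build_launch_command(binary: str,
                         signaling_url: str = "ws://127.0.0.1:8888",
                         width: int = 1920,
                         height: int = 1080) -> list[str]:
    """Assemble the argument list for a headless Pixel Streaming launch."""
    return [
        binary,
        "-RenderOffScreen",                     # render without a display
        f"-PixelStreamingURL={signaling_url}",  # where the streamer connects
        f"-ResX={width}",
        f"-ResY={height}",
        "-ForceRes",                            # keep the requested resolution
        "-Unattended",                          # suppress interactive dialogs
    ]


def launch(binary: str) -> subprocess.Popen:
    """Start the application as a child process and return its handle."""
    return subprocess.Popen(build_launch_command(binary))


if __name__ == "__main__":
    # Hypothetical path: replace with your packaged project's launcher.
    print(" ".join(build_launch_command("/opt/myapp/MyApp.sh")))
```

Keeping the command in a function makes it easy to bake the same launch line into the machine image's startup script and into local tests.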

Create an AMI (on AWS) or machine image (on other providers) of this configured instance. This image becomes the template for auto-scaling, allowing new instances to launch with everything pre-installed and configured.

Add a Matchmaker for Multi-User Scaling

A single Unreal application instance serves one user at a time by default (the Pixel Streaming plugin establishes a one-to-one connection). Supporting multiple concurrent users requires multiple Unreal instances, and a matchmaker that routes each incoming user to an available instance.

Epic provides a reference matchmaker as part of the Pixel Streaming Infrastructure. The matchmaker maintains a registry of running Unreal instances, tracks how many sessions each is handling (if you have configured multi-session support), and redirects incoming browser connections to an available instance. When all instances are at capacity, the matchmaker can signal the auto-scaling system to launch additional instances.

The matchmaker runs on a lightweight instance (it does no GPU work) and listens on a public endpoint. Users connect to the matchmaker URL rather than directly to a specific Unreal instance. The matchmaker assigns them to an instance and the browser establishes its WebRTC connection with that specific instance's signaling server.
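The routing logic described above can be illustrated with a small in-memory registry. This is a conceptual sketch only, not Epic's reference matchmaker: the class names, the per-instance capacity field, and the packing strategy are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamerInstance:
    address: str
    capacity: int = 1          # sessions this instance may host
    active_sessions: int = 0

    @property
    def has_room(self) -> bool:
        return self.active_sessions < self.capacity


class Matchmaker:
    """Tracks running instances and routes each new user to one with room."""

    def __init__(self) -> None:
        self._instances: dict[str, StreamerInstance] = {}
        self.queued_users = 0  # a natural signal for the auto-scaler

    def register(self, address: str, capacity: int = 1) -> None:
        self._instances[address] = StreamerInstance(address, capacity)

    def assign(self) -> Optional[str]:
        candidates = [i for i in self._instances.values() if i.has_room]
        if not candidates:
            self.queued_users += 1
            return None
        # Prefer the busiest instance that still has room, so lightly used
        # instances empty out and can be terminated on scale-in.
        chosen = max(candidates, key=lambda i: i.active_sessions)
        chosen.active_sessions += 1
        return chosen.address

    def release(self, address: str) -> None:
        instance = self._instances.get(address)
        if instance and instance.active_sessions > 0:
            instance.active_sessions -= 1


if __name__ == "__main__":
    mm = Matchmaker()
    mm.register("10.0.0.1")
    print(mm.assign())  # 10.0.0.1
    print(mm.assign())  # None (user is queued)
```

Packing sessions onto the busiest instance is one possible choice; spreading them evenly gives each user more GPU headroom but makes it harder to find idle instances to shut down.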

For more sophisticated deployments, TensorWorks' Scalable Pixel Streaming (SPS) provides a production-grade matchmaker and orchestration layer that runs on Amazon Elastic Kubernetes Service (EKS). It handles session management, health monitoring, graceful instance draining, and integration with Kubernetes auto-scaling, reducing the engineering effort compared to building custom orchestration.

Session management is a critical detail. Define session timeouts (how long an idle session persists before being terminated), maximum session duration (to prevent single users from monopolizing expensive GPU resources), and reconnection windows (how long a disconnected user has to reconnect before their session is cleaned up).
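The three limits can be expressed as a small policy object that the matchmaker evaluates periodically. The sketch below is illustrative: the type names are hypothetical, the 300-second idle timeout mirrors the 5-minute figure used later in this guide, and the other defaults are assumed values to tune for your application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionPolicy:
    idle_timeout_s: float = 300.0      # no input for this long ends the session
    max_duration_s: float = 3600.0     # assumed hard cap per session
    reconnect_window_s: float = 60.0   # assumed grace period after a disconnect


@dataclass
class Session:
    started_at: float
    last_input_at: float
    disconnected_at: Optional[float] = None


def termination_reason(session: Session, policy: SessionPolicy,
                       now: float) -> Optional[str]:
    """Return why the session should end, or None if it may continue."""
    if session.disconnected_at is not None:
        # A disconnected user keeps the slot only for the reconnection window.
        if now - session.disconnected_at > policy.reconnect_window_s:
            return "reconnect_window_expired"
        return None
    if now - session.started_at > policy.max_duration_s:
        return "max_duration"
    if now - session.last_input_at > policy.idle_timeout_s:
        return "idle"
    return None


if __name__ == "__main__":
    policy = SessionPolicy()
    print(termination_reason(Session(0.0, 0.0), policy, now=301.0))  # idle
```

Passing the current time in as an argument, instead of reading the clock inside the function, keeps the policy easy to test.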

Configure Auto-Scaling

Auto-scaling provisions GPU instances when demand increases and terminates them when demand drops, ensuring you pay only for the capacity you need.

On AWS, create an Auto Scaling Group (ASG) using your GPU instance AMI as the launch template. Configure scaling policies based on custom CloudWatch metrics published by the matchmaker, such as the number of queued users waiting for an available instance, the percentage of running instances at capacity, or the total number of active sessions across all instances.
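Publishing those metrics from the matchmaker could look like the following sketch. The namespace and metric names are made up for illustration. The put_metric_data call is the standard boto3 CloudWatch API, and boto3 is imported lazily so the payload builder can be tested without AWS credentials.

```python
def build_metric_data(queued_users: int, active_sessions: int,
                      total_capacity: int) -> list[dict]:
    """Build the MetricData payload for CloudWatch from matchmaker counters."""
    utilization = (100.0 * active_sessions / total_capacity
                   if total_capacity else 0.0)
    return [
        {"MetricName": "QueuedUsers", "Value": float(queued_users), "Unit": "Count"},
        {"MetricName": "ActiveSessions", "Value": float(active_sessions), "Unit": "Count"},
        {"MetricName": "SessionUtilization", "Value": utilization, "Unit": "Percent"},
    ]


def publish(metric_data: list[dict],
            namespace: str = "PixelStreaming/Matchmaker") -> None:
    """Send the payload to CloudWatch (requires boto3 and AWS credentials)."""
    import boto3  # imported here so build_metric_data has no AWS dependency

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace, MetricData=metric_data
    )


if __name__ == "__main__":
    print(build_metric_data(queued_users=2, active_sessions=7, total_capacity=10))
```

A scaling policy can then reference SessionUtilization or QueuedUsers in the chosen namespace. Publishing on a fixed interval, for example once a minute, keeps the data points evenly spaced for the scaling policy to evaluate.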

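A practical scaling configuration uses a target tracking policy: maintain the average session utilization across all instances at 70%. When average utilization exceeds 70%, the ASG launches new instances. When it drops below that target, the ASG terminates instances after a cooldown period. The cooldown period (typically 5 to 10 minutes) prevents rapid scaling oscillation when demand fluctuates. Because the ASG cannot tell on its own which instances still host sessions, have the matchmaker drain instances before they are removed, or enable instance scale-in protection on instances with active sessions.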
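To see what a 70% target means in instance counts, the arithmetic can be sketched as follows. This approximates the steady state a target tracking policy converges toward; it is not the exact algorithm AWS runs, and the function name and parameters are assumptions for the example.

```python
def desired_instances(active_sessions: int,
                      sessions_per_instance: int,
                      target_percent: int = 70,
                      min_instances: int = 1) -> int:
    """Instances needed so average session utilization stays at the target."""
    if sessions_per_instance <= 0 or not 0 < target_percent <= 100:
        raise ValueError("capacity and target must be positive")
    usable_per_instance = sessions_per_instance * target_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-(active_sessions * 100) // usable_per_instance)
    return max(min_instances, needed)


if __name__ == "__main__":
    print(desired_instances(14, 2))  # 10
    print(desired_instances(15, 2))  # 11
```

With two sessions per instance, 14 active sessions fill 10 instances to exactly 70%, and the fifteenth session pushes the fleet to 11. The 30% of unused capacity is the buffer that absorbs new users while additional instances boot.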

GPU instances take longer to boot than standard compute instances. A G4dn instance typically takes 3 to 5 minutes from launch to ready, including OS boot, GPU driver initialization, and Unreal application startup. Factor this warm-up time into your scaling strategy. If you expect demand spikes (a marketing event, a product launch), pre-scale by increasing the ASG minimum capacity before the expected traffic arrives.
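One way to size the pre-scale is to estimate how many users arrive while a new instance is still booting. The sketch below is an assumption-laden estimate: the arrival rate must come from your own traffic data, the 5-minute default matches the upper boot time quoted above, and the helper names are hypothetical. The update_auto_scaling_group call is the standard boto3 Auto Scaling API.

```python
import math


def warm_buffer_instances(arrivals_per_minute: float,
                          boot_minutes: float = 5.0,
                          sessions_per_instance: int = 1) -> int:
    """Spare instances needed to absorb arrivals during one boot cycle."""
    sessions_during_boot = arrivals_per_minute * boot_minutes
    return math.ceil(sessions_during_boot / sessions_per_instance)


def raise_minimum(asg_name: str, min_size: int) -> None:
    """Raise the ASG minimum ahead of an event (requires boto3 and credentials).

    min_size must not exceed the group's configured maximum size.
    """
    import boto3  # imported lazily so the estimate works without AWS access

    boto3.client("autoscaling").update_auto_scaling_group(
        AutoScalingGroupName=asg_name, MinSize=min_size
    )


if __name__ == "__main__":
    print(warm_buffer_instances(2.0))                           # 10
    print(warm_buffer_instances(2.0, sessions_per_instance=4))  # 3
```

Remember to lower the minimum again after the event, otherwise the pre-warmed instances keep running at full cost.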

For Kubernetes-based deployments, use the Cluster Autoscaler to provision GPU node pools and Horizontal Pod Autoscaler to scale Unreal application pods based on session metrics. This approach works well with TensorWorks SPS, which is designed for Kubernetes orchestration.

Deploy TURN Relay Servers

Approximately 10 to 15 percent of users cannot establish direct WebRTC connections due to restrictive network configurations. Without TURN relay servers, these users see a connection failure or timeout.

Deploy coturn, the most widely used open-source TURN server, on standard compute instances (no GPU needed). A single coturn instance can handle hundreds of relay connections, limited primarily by network bandwidth. Place TURN servers in the same region as your GPU instances to minimize relay latency.

Configure coturn with authentication credentials (either static long-term credentials or time-limited HMAC credentials for better security). Add these credentials to the Pixel Streaming signaling server configuration so it includes them in the ICE server list sent to connecting browsers.
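Time-limited credentials follow the shared-secret scheme that coturn enables with its use-auth-secret and static-auth-secret options: the username is an expiry timestamp plus a user label, and the password is the Base64-encoded HMAC-SHA1 of that username. The sketch below shows the derivation; the helper names, the one-hour lifetime, and the host name are illustrative assumptions.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def make_turn_credentials(shared_secret: str, user_id: str,
                          ttl_seconds: int = 3600,
                          now: Optional[float] = None) -> tuple[str, str]:
    """Derive a (username, password) pair that expires after ttl_seconds."""
    issued_at = time.time() if now is None else now
    expiry = int(issued_at + ttl_seconds)
    username = f"{expiry}:{user_id}"
    digest = hmac.new(shared_secret.encode(), username.encode(),
                      hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()


def ice_server_entry(turn_host: str, username: str, credential: str,
                     port: int = 3478) -> dict:
    """Format the credentials as an entry for the browser's ICE server list."""
    return {
        "urls": [f"turn:{turn_host}:{port}?transport=udp"],
        "username": username,
        "credential": credential,
    }


if __name__ == "__main__":
    user, password = make_turn_credentials("replace-with-shared-secret", "viewer-1")
    print(ice_server_entry("turn.example.com", user, password))
```

Because the TURN server recomputes the HMAC from the username and its own copy of the secret, no per-user state has to be stored, and a leaked credential stops working once the timestamp passes.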

TURN relay is bandwidth-intensive because it forwards the entire video stream through the relay server. Budget approximately 15 to 20 Mbps per relayed session for 1080p streaming. If 10% of your users require TURN relay, a deployment serving 100 concurrent users needs TURN bandwidth for approximately 10 simultaneous relay connections, or about 150 to 200 Mbps of relay throughput.
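The relay budget is straightforward to compute for other fleet sizes. The following sketch reproduces the figures above; the function name and its defaults are chosen for this example only.

```python
import math


def relay_throughput_mbps(concurrent_users: int,
                          relay_percent: int = 10,
                          mbps_per_session: tuple[int, int] = (15, 20)
                          ) -> tuple[int, int, int]:
    """Return (relayed sessions, low Mbps estimate, high Mbps estimate)."""
    relayed = math.ceil(concurrent_users * relay_percent / 100)
    low, high = mbps_per_session
    return relayed, relayed * low, relayed * high


if __name__ == "__main__":
    print(relay_throughput_mbps(100))                    # (10, 150, 200)
    print(relay_throughput_mbps(500, relay_percent=15))  # (75, 1125, 1500)
```

Keep in mind that a relay both receives and forwards each stream, so the server's network interface carries this volume inbound as well as outbound.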

Monitor TURN usage through coturn's built-in metrics. If the relay percentage exceeds 15%, investigate whether your signaling server or network configuration can be improved to enable more direct connections. High relay percentages often indicate a STUN configuration issue rather than a genuine network restriction.

Optimize for Cost

GPU infrastructure is expensive, and pixel streaming costs scale linearly with concurrent users. Several strategies can significantly reduce your monthly bill.

Spot/Preemptible Instances: AWS spot instances offer 60 to 70% discounts on G4dn instances. The tradeoff is that AWS can reclaim the instance with two minutes' notice when it needs the capacity back. For non-critical use cases (demos, development, testing) or deployments that can gracefully handle instance termination, spot instances provide substantial savings. Implement graceful shutdown: when a spot termination notice arrives, stop accepting new sessions on that instance, notify connected users, and save any session state.
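On AWS, the termination notice appears in the instance metadata service at the spot/instance-action path, which returns 404 until an interruption is scheduled. A minimal polling sketch using IMDSv2 is shown below; the function names are hypothetical, and the handling of connected users is left as a comment because it depends on your application.

```python
import json
import urllib.request
from typing import Optional

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body: str) -> Optional[dict]:
    """Return the notice if the body describes a scheduled interruption."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return None
    if isinstance(doc, dict) and doc.get("action") in ("terminate", "stop", "hibernate"):
        return doc
    return None


def fetch_instance_action(timeout: float = 2.0) -> Optional[dict]:
    """Query the metadata service; None means no interruption is scheduled."""
    token_request = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    try:
        with urllib.request.urlopen(token_request, timeout=timeout) as resp:
            token = resp.read().decode()
        action_request = urllib.request.Request(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(action_request, timeout=timeout) as resp:
            return parse_instance_action(resp.read().decode())
    except OSError:
        # Covers HTTP 404 (no notice yet), timeouts, and running outside EC2.
        return None


if __name__ == "__main__":
    notice = fetch_instance_action()
    if notice:
        # Here: deregister from the matchmaker, warn users, save session state.
        print("Interruption scheduled for", notice.get("time"))
    else:
        print("No interruption notice")
```

Polling every few seconds from a small background process leaves most of the two-minute window for the shutdown work itself.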

Reserved Instances: For predictable baseline demand, AWS Reserved Instances (1-year or 3-year commitments) provide 30 to 60% discounts compared to on-demand pricing. Use reserved instances for your minimum expected concurrent users and on-demand or spot instances for overflow.
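The effect of mixing reserved and on-demand capacity can be estimated with a short calculation. This is a rough sketch under stated assumptions: the $0.53 hourly price is the g4dn.xlarge figure quoted earlier, the 40% discount is an arbitrary point inside the 30 to 60% range, 730 is the average number of hours in a month, and the function name is made up for the example.

```python
def monthly_cost(baseline_instances: int,
                 overflow_instance_hours: float,
                 on_demand_hourly: float = 0.53,
                 reserved_discount: float = 0.40,
                 hours_per_month: int = 730) -> float:
    """Reserved baseline running all month plus on-demand overflow hours."""
    reserved = (baseline_instances * hours_per_month
                * on_demand_hourly * (1 - reserved_discount))
    overflow = overflow_instance_hours * on_demand_hourly
    return round(reserved + overflow, 2)


if __name__ == "__main__":
    blended = monthly_cost(baseline_instances=10, overflow_instance_hours=2000)
    all_on_demand = monthly_cost(10, 2000, reserved_discount=0.0)
    print(blended, all_on_demand)
```

With these inputs the blended bill comes to roughly $3,381 per month against roughly $4,929 for the same hours purchased entirely on demand. The saving only materializes if the reserved baseline is actually in use, so size it from observed minimum concurrency rather than from the peak.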

Session Timeouts: Enforce idle timeouts that disconnect users after a period of inactivity (no input events). A 5-minute idle timeout prevents users from accidentally occupying GPU resources while away from their browser. Warn users before disconnecting and allow easy reconnection.

Regional Deployment: Deploy GPU instances only in regions where your users are located. Running instances in a single region reduces costs but may increase latency for distant users. Analyze your traffic patterns and deploy in the minimum number of regions needed for acceptable latency coverage.

Managed Platforms: If your team lacks the infrastructure engineering capacity to build and operate custom scaling infrastructure, managed platforms like Arcware Cloud and PureWeb eliminate the operational burden. They typically charge $0.50 to $1.00 per stream-hour, which is more expensive than optimized self-hosted infrastructure but includes everything: GPU provisioning, scaling, matchmaking, TURN relay, monitoring, and support.
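To compare a managed price with self-hosting, convert the instance price into an effective cost per stream-hour. The sketch below uses the g4dn.xlarge on-demand price and the 70% utilization target from earlier in this guide; the function name is hypothetical, and the result deliberately ignores data transfer, TURN servers, the matchmaker, and engineering time.

```python
def cost_per_stream_hour(instance_hourly: float,
                         sessions_per_instance: int,
                         utilization: float) -> float:
    """GPU cost attributable to one hour of one user's stream.

    utilization is the fraction of provisioned session slots that are
    actually occupied (0.70 means 30% of paid capacity sits idle).
    """
    if sessions_per_instance <= 0 or not 0 < utilization <= 1:
        raise ValueError("capacity and utilization must be positive")
    return instance_hourly / (sessions_per_instance * utilization)


if __name__ == "__main__":
    print(round(cost_per_stream_hour(0.53, 1, 0.70), 3))
    print(round(cost_per_stream_hour(0.53, 2, 0.70), 3))
```

At one session per instance the GPU alone costs about $0.76 per stream-hour, which is already inside the managed price range; at two sessions per instance it falls to about $0.38. Self-hosting therefore pays off mainly when sessions can share a GPU or when spot and reserved pricing lower the hourly rate.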

Key Takeaway

Pixel streaming hosting costs scale linearly with concurrent users, making cost optimization essential for financial viability. Start with a single GPU instance to validate your deployment, then add matchmaking and auto-scaling incrementally. Use spot or reserved instances for the bulk of your capacity, enforce session timeouts, and consider managed platforms if infrastructure engineering is not your core competency.