Improving WebGL Performance on Mobile

Updated June 2026

Improving WebGL performance on mobile requires addressing the specific constraints of tile-based GPUs, limited memory bandwidth, and the browser's translation layer overhead. This guide walks through the highest-impact optimizations in order, from resolution scaling and shader precision through draw call batching, texture compression, and render target management.

Mobile WebGL performance is not just a scaled-down version of desktop performance. The hardware architecture is fundamentally different, and the browser adds overhead that native apps do not face. The following steps target the most impactful areas first, so even implementing the first two or three will produce measurable improvements on most mobile devices.

Scale Down Rendering Resolution

This is the single highest-impact optimization for any GPU-bound WebGL game on mobile. Modern phone screens have pixel densities of 400-500 PPI, which means the device's native resolution can be 1080p, 1440p, or even higher. Rendering at full native resolution means the fragment shader runs for every one of those millions of pixels, and every framebuffer read and write consumes bandwidth on the memory bus that the GPU shares with the CPU.

Reduce your WebGL canvas resolution to roughly 50-70% of native along each axis by multiplying devicePixelRatio by 0.5-0.7 when setting the canvas width and height. On a phone with a devicePixelRatio of 3.0, rendering at an effective ratio of 1.5-2.0 cuts the fragment count by 55-75%, because the pixel count scales with the square of the ratio, while remaining visually sharp on a 6-inch screen. The browser upscales the canvas to fill the CSS dimensions automatically.
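The sizing arithmetic above can be sketched as a small pure function. The names computeRenderSize and RenderSize and the 0.6 default are illustrative assumptions, not part of any library.

```typescript
// Hypothetical helper: computes the canvas backing-store size for a render scale.
interface RenderSize {
  width: number;
  height: number;
}

function computeRenderSize(
  cssWidth: number,
  cssHeight: number,
  dpr: number,
  renderScale: number = 0.6, // assumed default, within the 0.5-0.7 range
): RenderSize {
  const effectiveRatio = dpr * renderScale;
  return {
    // Never let a dimension collapse to zero on very small canvases.
    width: Math.max(1, Math.floor(cssWidth * effectiveRatio)),
    height: Math.max(1, Math.floor(cssHeight * effectiveRatio)),
  };
}

// Browser usage (not executed here):
//   const size = computeRenderSize(canvas.clientWidth, canvas.clientHeight,
//                                  window.devicePixelRatio, 0.6);
//   canvas.width = size.width;
//   canvas.height = size.height;
//   gl.viewport(0, 0, size.width, size.height);
```

The CSS size of the canvas stays unchanged; only the backing-store width and height shrink, and the browser stretches the result.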

Expose this as a quality setting if possible. Let the player choose between "high" (70% of native), "medium" (50%), and "low" (35%) so they can balance visual quality against smooth performance on their specific device. Adaptive resolution, which dynamically adjusts the render scale based on recent frame times, is even better but requires careful implementation to avoid visual popping.
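One possible shape for adaptive resolution is sketched below, assuming a smoothed frame time and a cooldown between changes to limit popping. The class name, thresholds, step size, and cooldown are all illustrative assumptions that would need tuning per game.

```typescript
// Hypothetical adaptive render-scale controller (a sketch, not a tuned implementation).
class AdaptiveRenderScale {
  private readonly targetMs: number;
  private readonly minScale: number;
  private readonly maxScale: number;
  private readonly step: number;
  private readonly cooldownFrames: number;
  private scale: number;
  private smoothedMs: number;
  private framesSinceChange: number;

  constructor(targetMs = 16.7, minScale = 0.35, maxScale = 0.7, step = 0.05, cooldownFrames = 30) {
    this.targetMs = targetMs;
    this.minScale = minScale;
    this.maxScale = maxScale;
    this.step = step;
    this.cooldownFrames = cooldownFrames;
    this.scale = maxScale;
    this.smoothedMs = targetMs;
    this.framesSinceChange = cooldownFrames; // allow an immediate first adjustment
  }

  /** Feed one frame time in milliseconds; returns the scale to use next frame. */
  update(frameMs: number): number {
    // Exponential moving average dampens the effect of isolated slow frames.
    this.smoothedMs = this.smoothedMs * 0.9 + frameMs * 0.1;
    this.framesSinceChange++;
    if (this.framesSinceChange < this.cooldownFrames) return this.scale;

    let next = this.scale;
    if (this.smoothedMs > this.targetMs * 1.1) {
      next = Math.max(this.minScale, this.scale - this.step);
    } else if (this.smoothedMs < this.targetMs * 0.8) {
      next = Math.min(this.maxScale, this.scale + this.step);
    }
    if (next !== this.scale) {
      this.scale = next;
      this.framesSinceChange = 0;
    }
    return this.scale;
  }
}
```

The gap between the lower and upper thresholds acts as hysteresis, so the scale does not oscillate when the frame time sits near the target.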

Use mediump Precision in Fragment Shaders

GLSL ES shaders support three precision qualifiers: highp (typically 32-bit float), mediump (typically 16-bit float), and lowp (roughly 10-bit fixed point covering the range -2 to 2). These are minimum requirements, so a GPU is free to run any of them at higher precision. On many mobile GPUs, the hardware has dedicated 16-bit floating-point ALUs that run at up to twice the throughput of the 32-bit units. Qualifying calculations as mediump where full 32-bit precision is unnecessary lets the GPU use these faster units.

Set precision mediump float; as the default at the top of your fragment shaders, then selectively use highp only for calculations that need it, such as world-space position reconstruction from depth, shadow map comparisons, or accumulated values that would lose accuracy at 16 bits. Color calculations, lighting normals, UV coordinate math, and most texture sampling are perfectly fine at mediump.
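As a sketch of that split, the following GLSL ES 3.00 fragment shader (held in a TypeScript string) defaults to mediump and raises only the depth-reconstruction math to highp. The uniform and varying names, the fog term, and the lighting model are illustrative assumptions.

```typescript
// Hypothetical shader source: mediump by default, highp only where it matters.
const litFragmentShaderSource = `#version 300 es
precision mediump float;

uniform sampler2D uAlbedo;
uniform highp sampler2D uDepth;     // depth values need full precision
uniform highp mat4 uInvViewProj;    // assumed inverse view-projection matrix
uniform vec3 uLightDir;

in vec2 vUv;
in vec3 vNormal;
out vec4 outColor;

void main() {
  // Color, normal, and UV math run at the mediump default.
  vec3 albedo = texture(uAlbedo, vUv).rgb;
  float ndotl = max(dot(normalize(vNormal), uLightDir), 0.0);

  // World-space position reconstruction from depth is kept at highp.
  highp float depth = texture(uDepth, vUv).r;
  highp vec4 clip = vec4(vUv * 2.0 - 1.0, depth * 2.0 - 1.0, 1.0);
  highp vec4 world = uInvViewProj * clip;
  highp vec3 worldPos = world.xyz / world.w;

  // The final fog factor is a 0..1 value, so mediump is enough again.
  float fog = clamp(length(worldPos) * 0.01, 0.0, 1.0);
  outColor = vec4(mix(albedo * ndotl, vec3(0.5), fog), 1.0);
}
`;
```

Because precision artifacts only appear on GPUs that actually honor mediump, a shader like this is worth checking on a physical Android device rather than on desktop alone.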

On Qualcomm Adreno GPUs in particular, switching the fragment shader from highp to mediump can improve fragment throughput by 30-50%. ARM Mali GPUs show similar gains. On platforms where the GPU or the browser's shader translator runs mediump at 32-bit anyway, which is reportedly the case for some Apple configurations, the qualifier simply has no effect, so specifying mediump costs nothing there and still benefits the Android devices in your audience.

Batch Draw Calls with Texture Atlases

Each WebGL draw call passes through the browser's validation layer, a translation layer such as ANGLE where the browser uses one, and the native driver before reaching the GPU. On mobile, this overhead typically limits practical draw call counts to roughly 200-500 per frame before the CPU becomes the bottleneck. The solution is batching: combining multiple objects that share the same shader program and texture into a single draw call.

For 2D games, pack all sprite images into one or more texture atlases (large images containing many smaller images arranged in a grid). Instead of binding a different texture for each sprite and issuing a separate draw call, put all sprite vertices into a single dynamic vertex buffer with UV coordinates pointing to each sprite's region in the atlas, then draw everything in one call. A game with 300 on-screen sprites can go from 300 draw calls to 1-4 draw calls with this technique.
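A minimal sketch of the vertex-filling half of sprite batching follows, assuming an interleaved x, y, u, v layout and two triangles per sprite. The types AtlasRegion and SpriteInstance and the function writeSpriteBatch are hypothetical names.

```typescript
// Hypothetical sprite batcher: writes all sprites into one pre-allocated buffer.
interface AtlasRegion { u0: number; v0: number; u1: number; v1: number; }
interface SpriteInstance { x: number; y: number; width: number; height: number; region: AtlasRegion; }

const FLOATS_PER_VERTEX = 4;   // x, y, u, v
const VERTICES_PER_SPRITE = 6; // two triangles
const FLOATS_PER_SPRITE = FLOATS_PER_VERTEX * VERTICES_PER_SPRITE;

/** Fills `out` and returns the number of vertices written. */
function writeSpriteBatch(sprites: readonly SpriteInstance[], out: Float32Array): number {
  const capacity = Math.floor(out.length / FLOATS_PER_SPRITE);
  const count = Math.min(sprites.length, capacity); // extra sprites are dropped
  let o = 0;
  for (let i = 0; i < count; i++) {
    const s = sprites[i];
    const x0 = s.x, y0 = s.y, x1 = s.x + s.width, y1 = s.y + s.height;
    const r = s.region;
    // Triangle 1: top-left, top-right, bottom-left
    out[o++] = x0; out[o++] = y0; out[o++] = r.u0; out[o++] = r.v0;
    out[o++] = x1; out[o++] = y0; out[o++] = r.u1; out[o++] = r.v0;
    out[o++] = x0; out[o++] = y1; out[o++] = r.u0; out[o++] = r.v1;
    // Triangle 2: bottom-left, top-right, bottom-right
    out[o++] = x0; out[o++] = y1; out[o++] = r.u0; out[o++] = r.v1;
    out[o++] = x1; out[o++] = y0; out[o++] = r.u1; out[o++] = r.v0;
    out[o++] = x1; out[o++] = y1; out[o++] = r.u1; out[o++] = r.v1;
  }
  return count * VERTICES_PER_SPRITE;
}

// Browser usage with a WebGL 2 context (not executed here):
//   const vertexCount = writeSpriteBatch(sprites, batchData);
//   gl.bufferSubData(gl.ARRAY_BUFFER, 0, batchData, 0, vertexCount * FLOATS_PER_VERTEX);
//   gl.drawArrays(gl.TRIANGLES, 0, vertexCount);
```

The Float32Array is allocated once at startup and reused every frame, so batching does not add garbage-collection pressure of its own.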

For 3D games, merge static geometry that shares the same material into combined meshes at load time. If you have 50 crate objects scattered around a level that all use the same texture and shader, merge their vertex and index buffers into a single mesh. Dynamic objects that move independently cannot be merged this way, but instanced rendering (gl.drawElementsInstanced() in WebGL 2) handles cases where many copies of the same mesh need different per-instance transforms.
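Load-time merging comes down to concatenating vertex data and offsetting indices, as in the sketch below. It assumes positions are already transformed into world space and stored as three floats per vertex; StaticMesh and mergeStaticMeshes are hypothetical names, and other attributes (normals, UVs) would be concatenated the same way.

```typescript
// Hypothetical load-time merge of meshes that share one material.
interface StaticMesh {
  positions: Float32Array;            // xyz per vertex, already in world space
  indices: Uint16Array | Uint32Array;
}

function mergeStaticMeshes(meshes: readonly StaticMesh[]): { positions: Float32Array; indices: Uint32Array } {
  let totalFloats = 0;
  let totalIndices = 0;
  for (const m of meshes) {
    totalFloats += m.positions.length;
    totalIndices += m.indices.length;
  }

  const positions = new Float32Array(totalFloats);
  const indices = new Uint32Array(totalIndices);
  let floatOffset = 0;
  let indexOffset = 0;
  for (const m of meshes) {
    const baseVertex = floatOffset / 3; // index of this mesh's first vertex in the merged buffer
    positions.set(m.positions, floatOffset);
    for (let i = 0; i < m.indices.length; i++) {
      indices[indexOffset + i] = m.indices[i] + baseVertex;
    }
    floatOffset += m.positions.length;
    indexOffset += m.indices.length;
  }
  return { positions, indices };
}
```

The merged index buffer uses 32-bit indices so that the combined vertex count can exceed 65,535; these are drawn with gl.UNSIGNED_INT, which is core in WebGL 2 and requires the OES_element_index_uint extension in WebGL 1.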

Sort your draw calls to minimize state changes. Group all objects using shader A together, then all objects using shader B, and within each group sort by texture binding. This minimizes the number of expensive state transitions (program switches, texture rebinds) the driver must perform between draw calls.
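The grouping described above can be sketched as a two-key sort. The QueuedDraw shape and its numeric programId and textureId fields are assumptions for illustration; a real renderer would map its own program and texture objects to such ids.

```typescript
// Hypothetical draw queue entry: numeric ids stand in for program and texture objects.
interface QueuedDraw {
  programId: number;
  textureId: number;
}

// Defined once at module level so sorting every frame does not allocate a closure.
function compareByState(a: QueuedDraw, b: QueuedDraw): number {
  return a.programId - b.programId || a.textureId - b.textureId;
}

/** Sorts in place: by program first, then by texture within each program. */
function sortByState(queue: QueuedDraw[]): QueuedDraw[] {
  return queue.sort(compareByState);
}

/** Counts program switches plus texture rebinds between consecutive draws. */
function countStateChanges(queue: readonly QueuedDraw[]): number {
  let changes = 0;
  for (let i = 1; i < queue.length; i++) {
    if (queue[i].programId !== queue[i - 1].programId) changes++;
    if (queue[i].textureId !== queue[i - 1].textureId) changes++;
  }
  return changes;
}
```

This ordering suits opaque geometry. Blended, transparent objects usually still need back-to-front ordering, so they are typically kept in a separate queue.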

Compress Textures with ASTC or ETC2

Uncompressed RGBA8 textures consume 4 bytes per pixel. A 2048x2048 texture uses 16 MB, and with mipmaps that grows to ~21 MB. GPU-compressed formats like ASTC and ETC2 reduce this by 4-16x with minimal visual quality loss, and the GPU decompresses them on the fly during sampling at no performance cost, which also reduces bandwidth consumption.
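The memory figures above follow from simple arithmetic, sketched here for both uncompressed and block-compressed formats. The function names are hypothetical; the 16 bytes per block figure is the fixed ASTC block size.

```typescript
// Hypothetical helpers for estimating texture memory.
// For uncompressed formats, treat each pixel as a 1x1 "block".
function levelBytes(width: number, height: number, bytesPerBlock: number, blockW = 1, blockH = 1): number {
  return Math.ceil(width / blockW) * Math.ceil(height / blockH) * bytesPerBlock;
}

/** Sums every mip level from the base size down to 1x1. */
function mipChainBytes(width: number, height: number, bytesPerBlock: number, blockW = 1, blockH = 1): number {
  let total = 0;
  let w = width;
  let h = height;
  for (;;) {
    total += levelBytes(w, h, bytesPerBlock, blockW, blockH);
    if (w === 1 && h === 1) break;
    w = Math.max(1, w >> 1);
    h = Math.max(1, h >> 1);
  }
  return total;
}

const MIB = 1024 * 1024;
const rgba8Base = levelBytes(2048, 2048, 4) / MIB;            // 16 MiB
const rgba8Chain = mipChainBytes(2048, 2048, 4) / MIB;        // about 21.3 MiB
const astc4x4Base = levelBytes(2048, 2048, 16, 4, 4) / MIB;   // 4 MiB (4:1)
const astc8x8Base = levelBytes(2048, 2048, 16, 8, 8) / MIB;   // 1 MiB (16:1)
```

Mip levels smaller than one block still occupy a full block, which is why the compressed chain is marginally larger than one third more than the base level.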

ASTC (Adaptive Scalable Texture Compression) is the preferred format. It is supported on nearly all mobile GPUs shipped since roughly 2015, including all modern Adreno, Mali, and Apple GPUs. Check for the WEBGL_compressed_texture_astc extension at startup. ASTC 4x4 provides the best quality (8 bits per pixel, 4:1 compression versus RGBA8), while ASTC 6x6 (3.56 bpp) and 8x8 (2 bpp) offer progressively higher compression for textures where quality can be sacrificed, such as noise maps, terrain, or backgrounds.

ETC2 is the fallback. It is a required format in OpenGL ES 3.0 and supported in WebGL 2 via WEBGL_compressed_texture_etc. ETC2 stores RGB at 4 bits per pixel and RGBA at 8 bits per pixel, which is 8:1 and 4:1 compression respectively versus RGBA8, with decent quality for most game art. If a device supports neither ASTC nor ETC2, fall back to uncompressed textures at reduced resolution.
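The ASTC, then ETC2, then uncompressed fallback chain can be sketched as a startup check. The ExtensionSource interface is a minimal stand-in for the part of the WebGL context used here, and the returned labels are hypothetical.

```typescript
// Minimal stand-in for the one context method this sketch needs.
interface ExtensionSource {
  getExtension(name: string): unknown;
}

type TextureFormatChoice = "astc" | "etc2" | "uncompressed";

/** Picks the best available texture format, preferring ASTC over ETC2. */
function chooseTextureFormat(gl: ExtensionSource): TextureFormatChoice {
  if (gl.getExtension("WEBGL_compressed_texture_astc")) return "astc";
  if (gl.getExtension("WEBGL_compressed_texture_etc")) return "etc2";
  return "uncompressed"; // load reduced-resolution RGBA8 assets instead
}

// Browser usage (not executed here):
//   const gl = canvas.getContext("webgl2");
//   const format = gl ? chooseTextureFormat(gl) : "uncompressed";
```

Running the check once at startup and storing the result lets the asset loader request the matching file variant for every texture.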

Use offline tools like astcenc (ARM's open-source ASTC encoder) or basisu (Basis Universal, which can transcode to ASTC, ETC2, and other formats at runtime) to prepare your textures. Basis Universal is especially useful for web games because you ship a single compressed file that the runtime transcodes to whatever format the device supports.

Minimize Render Target Switches

On tile-based mobile GPUs, switching the active framebuffer object (FBO) is expensive. When you bind a different render target, the GPU must write the current tile buffer back to system memory and load the new target's data. This flush-and-load cycle consumes bandwidth and serializes work that would otherwise be pipelined.

Audit your rendering pipeline for unnecessary render target switches. Common culprits include multi-pass post-processing chains (bloom, blur, tone mapping each using their own FBO), shadow map rendering to a separate depth texture, and deferred rendering with a G-buffer using multiple render targets. Each switch adds overhead that is proportionally higher on mobile than on desktop.

Consolidate post-processing passes where possible. Instead of separate passes for bloom threshold, horizontal blur, vertical blur, and compositing, consider combining operations into fewer passes even if each pass does slightly more work. The reduced FBO switching often more than offsets the cost of a more complex shader.

In WebGL 2, call gl.invalidateFramebuffer() when you are done with a render target's contents and will not read from it again. This tells the TBDR GPU that it does not need to write the tile buffer back to system memory, saving bandwidth. This call is essentially free and should be used whenever you are done with an FBO attachment.
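A common case is a depth buffer that is only needed while an offscreen pass is being drawn. The sketch below invalidates it before the next render target is bound; InvalidateCapableGL is a minimal stand-in shaped after the WebGL 2 members it uses, and createDepthDiscard is a hypothetical helper name.

```typescript
// Minimal stand-in for the WebGL 2 members this sketch needs.
interface InvalidateCapableGL {
  readonly FRAMEBUFFER: number;
  readonly DEPTH_ATTACHMENT: number;
  invalidateFramebuffer(target: number, attachments: number[]): void;
}

/**
 * Call once at initialization. The returned function is meant to be called
 * each frame while the offscreen FBO is still bound, after its last draw call.
 */
function createDepthDiscard(gl: InvalidateCapableGL): () => void {
  const attachments = [gl.DEPTH_ATTACHMENT]; // allocated once, reused every frame
  return () => gl.invalidateFramebuffer(gl.FRAMEBUFFER, attachments);
}

// Browser usage with a WebGL 2 context (not executed here):
//   const discardDepth = createDepthDiscard(gl);
//   gl.bindFramebuffer(gl.FRAMEBUFFER, sceneFbo);
//   drawScene();
//   discardDepth();                          // depth is not needed after this pass
//   gl.bindFramebuffer(gl.FRAMEBUFFER, null);
```

The color attachment is left alone here because a later pass samples it; only contents that will never be read again should be invalidated.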

Eliminate Per-Frame Allocations

JavaScript's garbage collector can pause script execution while it runs. In a game targeting 60 fps, where each frame has a budget of about 16.7 ms, even a 5 ms GC pause can cause a visible frame hitch. Games that create temporary objects every frame, such as new Float32Array instances, new vector or matrix objects, or closure functions, accumulate garbage quickly and trigger frequent collections.

Pre-allocate all math objects (vectors, matrices, quaternions) at initialization time and reuse them throughout the game's lifetime. Instead of let result = new Vec3(x, y, z) every frame, use a module-level _tempVec3 that gets overwritten each time. Pool entities like bullets, particles, and enemy objects so that "creating" one means resetting a pooled instance rather than allocating new memory.
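Both ideas, a reusable scratch vector and an object pool, are sketched below. The names tempVec3, addScaledInto, PooledParticle, and ParticlePool are hypothetical, and the linear free-slot search is an assumption kept for brevity.

```typescript
// Module-level scratch vector, overwritten on every use instead of reallocated.
const tempVec3 = new Float32Array(3);

/** Writes a + b * s into `out` and returns `out`; allocates nothing. */
function addScaledInto(out: Float32Array, a: Float32Array, b: Float32Array, s: number): Float32Array {
  out[0] = a[0] + b[0] * s;
  out[1] = a[1] + b[1] * s;
  out[2] = a[2] + b[2] * s;
  return out;
}

class PooledParticle {
  x = 0;
  y = 0;
  life = 0;
  active = false;
}

class ParticlePool {
  private readonly items: PooledParticle[] = [];

  constructor(capacity: number) {
    // All allocation happens here, once, at initialization time.
    for (let i = 0; i < capacity; i++) this.items.push(new PooledParticle());
  }

  /** "Creates" a particle by resetting a free instance; returns null when the pool is full. */
  spawn(x: number, y: number, life: number): PooledParticle | null {
    for (let i = 0; i < this.items.length; i++) {
      const p = this.items[i];
      if (!p.active) {
        p.x = x;
        p.y = y;
        p.life = life;
        p.active = true;
        return p;
      }
    }
    return null;
  }

  release(p: PooledParticle): void {
    p.active = false;
  }
}
```

Returning null when the pool is exhausted keeps memory bounded; whether to drop the new particle or recycle the oldest one is a gameplay decision.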

Avoid creating closures inside your render loop. Each () => {} or function() {} inside a per-frame callback allocates a new closure object. If you need callbacks, define them once at initialization and reference them by variable. Similarly, avoid Array.map(), Array.filter(), and other functional array methods in hot loops because they create new arrays on every call. Use traditional for loops with pre-allocated output arrays instead.
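As an illustration, a per-frame visibility filter can be written as a plain loop that fills a reusable output array instead of calling filter(). The CullEntity shape and the one-dimensional bounds test are simplifying assumptions.

```typescript
// Hypothetical entity shape with a one-dimensional extent, for brevity.
interface CullEntity {
  x: number;
  radius: number;
}

/**
 * Writes visible entities into `out` and returns how many were written.
 * Only the first `count` slots of `out` are valid after the call.
 */
function collectVisible(
  entities: readonly CullEntity[],
  minX: number,
  maxX: number,
  out: CullEntity[],
): number {
  let count = 0;
  for (let i = 0; i < entities.length; i++) {
    const e = entities[i];
    if (e.x + e.radius >= minX && e.x - e.radius <= maxX) {
      out[count++] = e;
    }
  }
  return count;
}

// Allocated once; it grows to the largest visible set seen and is then reused.
const visibleScratch: CullEntity[] = [];
```

Callers iterate from 0 to the returned count rather than over out.length, since slots beyond the count may still hold entries from an earlier frame.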

Profile on Real Devices

Desktop Chrome's device emulator simulates screen size and touch events, but it runs on your desktop GPU with desktop bandwidth and desktop thermal behavior. It cannot reproduce the performance characteristics of actual mobile hardware. The only way to know how your game performs on mobile is to test on a real phone.

Connect an Android phone via USB with developer mode and USB debugging enabled, open chrome://inspect on your desktop Chrome, and use the DevTools Performance panel to record a trace while playing the game on the device. The trace shows JavaScript execution time, rendering time, compositing time, and GPU timing (if available) for each frame. Look for frames that exceed the 16.7 ms budget of a 60 fps target and drill into the flame chart to find the cause.

The Rendering tab in DevTools lets you overlay an FPS counter, highlight paint regions, and show layer borders directly on the device. The EXT_disjoint_timer_query_webgl2 extension, when available, allows you to add GPU timing queries to your code to measure how long specific groups of draw calls take on the GPU, independent of CPU time. Wrap your shadow pass, your main scene pass, and your post-processing in separate timer queries to see exactly where GPU time is spent.
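A sketch of wrapping one pass in a timer query follows. The two interfaces are minimal stand-ins shaped after the WebGL 2 and extension members used, and the helper names and the "pending" and "discarded" result labels are hypothetical. Results arrive asynchronously, so the query is polled on later frames.

```typescript
// Minimal stand-ins for the context and extension members this sketch needs.
interface TimerQueryExt {
  readonly TIME_ELAPSED_EXT: number;
  readonly GPU_DISJOINT_EXT: number;
}
interface TimerQueryGL {
  readonly QUERY_RESULT: number;
  readonly QUERY_RESULT_AVAILABLE: number;
  getExtension(name: string): unknown;
  createQuery(): unknown;
  beginQuery(target: number, query: unknown): void;
  endQuery(target: number): void;
  getQueryParameter(query: unknown, pname: number): unknown;
  getParameter(pname: number): unknown;
  deleteQuery(query: unknown): void;
}

function getTimerExt(gl: TimerQueryGL): TimerQueryExt | null {
  return gl.getExtension("EXT_disjoint_timer_query_webgl2") as TimerQueryExt | null;
}

/** Starts timing; issue the draw calls to measure, then call endGpuTimer. */
function beginGpuTimer(gl: TimerQueryGL, ext: TimerQueryExt): unknown {
  const query = gl.createQuery();
  gl.beginQuery(ext.TIME_ELAPSED_EXT, query);
  return query;
}

function endGpuTimer(gl: TimerQueryGL, ext: TimerQueryExt): void {
  gl.endQuery(ext.TIME_ELAPSED_EXT);
}

/** Poll on a later frame. Returns milliseconds once the result is ready. */
function pollGpuTimerMs(gl: TimerQueryGL, ext: TimerQueryExt, query: unknown): number | "pending" | "discarded" {
  if (!gl.getQueryParameter(query, gl.QUERY_RESULT_AVAILABLE)) return "pending";
  // A disjoint event (for example a GPU power-state change) invalidates the timing.
  const disjoint = Boolean(gl.getParameter(ext.GPU_DISJOINT_EXT));
  const nanoseconds = Number(gl.getQueryParameter(query, gl.QUERY_RESULT));
  gl.deleteQuery(query);
  return disjoint ? "discarded" : nanoseconds / 1e6;
}
```

Only one TIME_ELAPSED_EXT query can be active at a time, so the shadow, scene, and post-processing passes each get their own begin and end pair in sequence rather than nested.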

Build a testing matrix that covers at least three device tiers: a recent flagship (high-end Adreno or Apple GPU), a mid-range phone from the current year, and a budget phone from 2-3 years ago. The budget device will surface bottlenecks that are invisible on high-end hardware, and the optimizations you make for it will improve performance across your entire audience.

Key Takeaway

The highest-impact WebGL mobile optimizations are resolution scaling, shader precision, and draw call batching. Implement these three first, then move to texture compression and render target management. Always profile on real devices because desktop simulation cannot reproduce mobile hardware behavior.
