Reducing Draw Calls and Overdraw
On a desktop GPU, games can typically submit 2000-5000 draw calls per frame without issue. On mobile, through WebGL in a browser, the practical limit drops to roughly 200-500 draw calls before the CPU becomes the bottleneck. The budget is lower because every draw call passes through the browser's security validation, a translation layer such as ANGLE, and the native driver, each adding microseconds of overhead that compound across hundreds of calls. Overdraw (shading fragments that end up hidden behind other geometry) is the second cost: it wastes the mobile GPU's limited memory bandwidth and fragment processing capacity. Addressing both problems together yields the largest frame rate improvements on mobile devices.
Audit Your Current Draw Call Count
Before optimizing, measure. You need to know exactly how many draw calls your game submits per frame and which rendering systems are responsible for each batch. Without this data, you risk spending time optimizing areas that are not the actual bottleneck.
The simplest approach is adding a counter to your rendering code. Increment a variable every time you call gl.drawElements() or gl.drawArrays() (or their instanced variants), and display the total in a debug overlay. Group the counts by system: scene geometry, particles, UI elements, shadow maps, post-processing, and any other rendering passes. You will often discover that one system dominates the count, such as a particle system submitting 150 draw calls for individual particles, or a UI layer drawing each text label separately.
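A per-system counter along these lines can be sketched as a thin wrapper around the context's draw methods. The names instrumentDrawCalls, setSystem, and endFrame are hypothetical, and the DrawContext interface is a minimal stand-in for the real WebGL context so the sketch stays self-contained.

```typescript
// Minimal shape of the two draw methods being wrapped. A real
// WebGLRenderingContext satisfies this interface structurally.
interface DrawContext {
  drawArrays(mode: number, first: number, count: number): void;
  drawElements(mode: number, count: number, type: number, offset: number): void;
}

interface DrawCallCounter {
  setSystem(name: string): void;       // label subsequent draws, e.g. "particles"
  endFrame(): Record<string, number>;  // return per-system counts and reset them
}

// Hypothetical helper: replaces the draw methods on `gl` so that every call
// is attributed to whichever system label is currently active.
// Instanced draw methods are not wrapped here; they would follow the same pattern.
function instrumentDrawCalls(gl: DrawContext): DrawCallCounter {
  let counts: Record<string, number> = {};
  let current = "unlabeled";
  const origArrays = gl.drawArrays.bind(gl);
  const origElements = gl.drawElements.bind(gl);

  const bump = (): void => {
    counts[current] = (counts[current] || 0) + 1;
  };

  gl.drawArrays = (mode, first, count) => {
    bump();
    origArrays(mode, first, count);
  };
  gl.drawElements = (mode, count, type, offset) => {
    bump();
    origElements(mode, count, type, offset);
  };

  return {
    setSystem(name) {
      current = name;
    },
    endFrame() {
      const snapshot = counts;
      counts = {};
      return snapshot;
    },
  };
}
```

In use, each rendering system would call setSystem("particles"), setSystem("ui"), and so on before issuing its draws, and the debug overlay would display the object returned by endFrame() once per frame.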
Browser tooling can also help. The Spector.js extension (available for Chrome and Firefox) captures WebGL command streams and shows every API call for a single frame, including draw calls, state changes, and texture bindings. This gives you a complete picture of what the browser's WebGL layer is processing. Look for patterns of repeated state changes (binding the same shader program multiple times) and small draw calls that could be merged.
Set a target based on your device support range. If you need to support budget Android phones from 2023-2024, keep total draw calls under 150-200 per frame. If your minimum target is a flagship from 2025 or later, you can budget 300-500. Write these limits down and enforce them during development so they do not creep upward without anyone noticing.
Build and Use Texture Atlases
Texture binding is one of the more expensive state changes in WebGL. Each time you bind a different texture, the driver must update internal state and the GPU may need to flush texture cache entries. If every sprite, tile, or UI icon uses its own texture, the renderer has to rebind between them, so each one typically ends up in its own draw call.
A texture atlas combines many small images into a single large texture. Instead of binding a separate 64x64 pixel texture for each game sprite, you pack all sprites into a 2048x2048 atlas and use UV coordinates to select which region of the atlas each quad draws from. Since all sprites share the same texture, they can all be drawn in a single draw call with a single vertex buffer containing all their quad geometry.
Build atlases offline as part of your asset pipeline using tools like TexturePacker, free-tex-packer, or a custom script that reads source images and outputs a packed atlas plus a JSON file mapping sprite names to UV rectangles. Size each atlas to the smallest power-of-two dimensions that hold your sprites, and stay at or below 2048x2048 for maximum mobile compatibility (some older devices report a maximum texture size of 2048; you can query the limit at runtime with gl.getParameter(gl.MAX_TEXTURE_SIZE)). If you need more sprites than fit in one atlas, use 2-4 atlases grouped by rendering context: one for game sprites, one for UI elements, one for terrain tiles.
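As an illustration of the lookup step, the sketch below converts a sprite's pixel rectangle into normalized UV coordinates. The manifest shape (width, height, frames) is an assumption made for this example; real packers export their own JSON layouts, so the field names would need adapting.

```typescript
// Pixel rectangle of one sprite inside the atlas image.
interface AtlasFrame { x: number; y: number; w: number; h: number; }

// Assumed manifest shape for this sketch; real packer output differs by tool.
interface AtlasManifest {
  width: number;   // atlas width in pixels
  height: number;  // atlas height in pixels
  frames: Record<string, AtlasFrame>;
}

interface UVRect { u0: number; v0: number; u1: number; v1: number; }

// Convert a named sprite's pixel rectangle into 0..1 texture coordinates.
function spriteUVs(atlas: AtlasManifest, name: string): UVRect {
  const f = atlas.frames[name];
  if (!f) {
    throw new Error("Unknown sprite: " + name);
  }
  return {
    u0: f.x / atlas.width,
    v0: f.y / atlas.height,
    u1: (f.x + f.w) / atlas.width,
    v1: (f.y + f.h) / atlas.height,
  };
}
```

The four values go into the UV attributes of the sprite's quad corners. With linear filtering or mipmaps, sprites usually need a pixel or two of padding in the atlas so neighboring images do not bleed into each other.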
For 3D games, the equivalent technique is a material atlas: combining multiple material textures (diffuse, normal, roughness) into shared atlas textures so that objects with visually different surfaces can share a single material binding and be batched together.
Merge Static Geometry into Combined Meshes
Any scene object that never moves relative to other objects can be merged at load time. Instead of submitting 50 draw calls for 50 static crates, combine their vertex positions, normals, UVs, and indices into a single vertex buffer and a single index buffer. One draw call renders all 50 crates.
The merging process is straightforward. For each static object, transform its vertex positions from local space into world space (applying the object's position, rotation, and scale), and transform its normals by the inverse transpose of the upper 3x3 of the same matrix so that lighting stays correct under non-uniform scale. Append the transformed vertices to a combined buffer, offsetting each object's index values by the number of vertices already in that buffer. Group objects by material, since only objects sharing the same shader and texture bindings can be merged into one draw call.
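A minimal sketch of that merge, handling positions and indices only, might look like the following. The StaticMesh shape and the mergeStaticMeshes name are hypothetical, matrices are assumed to be 4x4 column-major (the layout WebGL uniforms use), and normals and UVs are left out to keep the example short.

```typescript
interface StaticMesh {
  positions: Float32Array; // x, y, z per vertex, in local space
  indices: Uint16Array;    // triangle indices into this mesh's own vertices
  world: Float32Array;     // 4x4 column-major local-to-world matrix
}

interface MergedMesh {
  positions: Float32Array; // all vertices, already in world space
  indices: Uint32Array;    // 32-bit: native in WebGL 2, OES_element_index_uint in WebGL 1
}

// Assumes every input mesh uses the same material.
function mergeStaticMeshes(meshes: StaticMesh[]): MergedMesh {
  let floatCount = 0;
  let indexCount = 0;
  for (const m of meshes) {
    floatCount += m.positions.length;
    indexCount += m.indices.length;
  }

  const positions = new Float32Array(floatCount);
  const indices = new Uint32Array(indexCount);
  let vOut = 0; // write cursor into positions, in floats
  let iOut = 0; // write cursor into indices

  for (const m of meshes) {
    const baseVertex = vOut / 3; // vertices already written by earlier meshes
    const w = m.world;
    for (let i = 0; i < m.positions.length; i += 3) {
      const x = m.positions[i];
      const y = m.positions[i + 1];
      const z = m.positions[i + 2];
      // Affine transform of a point by a column-major matrix.
      positions[vOut++] = w[0] * x + w[4] * y + w[8] * z + w[12];
      positions[vOut++] = w[1] * x + w[5] * y + w[9] * z + w[13];
      positions[vOut++] = w[2] * x + w[6] * y + w[10] * z + w[14];
    }
    for (let i = 0; i < m.indices.length; i++) {
      indices[iOut++] = m.indices[i] + baseVertex;
    }
  }
  return { positions, indices };
}
```

The result is uploaded once at load time and drawn with a single gl.drawElements() call per material.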
This technique works best for environment geometry: walls, floors, props, decorations, and other static scene elements. It does not work for objects that need to move independently, animate, or be culled individually. For large levels, split the merged geometry by spatial region (chunks or sectors) so that you can still cull entire regions when they are off-screen, rather than submitting the entire merged level geometry every frame.
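One way to organize merging by region and material is to bucket objects under a combined key before merging each bucket. The sketch below assumes a uniform grid on the XZ plane; the chunk size, the key format, and the names chunkKey and groupForMerging are all illustrative choices rather than a prescribed scheme.

```typescript
interface Placed {
  x: number;          // world-space position used to pick the chunk
  z: number;
  materialId: number; // objects can only merge with others sharing this
}

// Key combining material and grid cell; Math.floor keeps negative
// coordinates in their own cells instead of folding them into cell 0.
function chunkKey(x: number, z: number, chunkSize: number, materialId: number): string {
  const cx = Math.floor(x / chunkSize);
  const cz = Math.floor(z / chunkSize);
  return materialId + ":" + cx + ":" + cz;
}

// Bucket objects so that each bucket can become one merged mesh.
function groupForMerging<T extends Placed>(objects: T[], chunkSize: number): Record<string, T[]> {
  const groups: Record<string, T[]> = {};
  for (const obj of objects) {
    const key = chunkKey(obj.x, obj.z, chunkSize, obj.materialId);
    (groups[key] || (groups[key] = [])).push(obj);
  }
  return groups;
}
```

Each bucket then yields one merged mesh with its own bounding volume, so a whole chunk can be skipped when it is off-screen.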
The cost is flexibility. Merged geometry cannot be moved, hidden, or colored individually without rebuilding the combined buffer. For truly static scenery this tradeoff is worth it. For semi-static objects (like doors that open or lights that turn off), keep them as separate draw calls and focus batching effort on the truly immovable geometry.
Use Instanced Rendering for Repeated Objects
Instanced rendering draws multiple copies of the same mesh in a single draw call, with per-instance data (transform, color, scale, animation frame) stored in a separate vertex attribute buffer. WebGL 2 supports this natively through gl.drawArraysInstanced(), gl.drawElementsInstanced(), and gl.vertexAttribDivisor(); WebGL 1 exposes the same functionality through the ANGLE_instanced_arrays extension.
The ideal use cases are systems that render many instances of the same geometry with varying attributes: particle systems (same quad, different positions and colors), foliage (same grass blade mesh, different positions and rotations), crowds or enemies (same character mesh, different transforms), and debris or projectiles (same mesh, different trajectories).
To implement instancing, create a per-instance attribute buffer containing the data that varies between instances. For particles, this might be a buffer with 5 floats (20 bytes) per instance: x, y, z position, RGBA color packed into a single 4-byte slot, and scale. Set the divisor of each per-instance attribute to 1 (meaning the attribute advances once per instance rather than once per vertex). In the vertex shader, read the instance attributes to position and color each instance.
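The layout described above (position, packed color, scale) occupies five 4-byte slots per instance. The sketch below packs it with typed-array views over one ArrayBuffer and then describes the attributes. It assumes the color is stored as four unsigned bytes read as a normalized attribute, which is one common reading of "packed". The InstancingContext interface is a minimal stand-in for a WebGL 2 context, and the function names and attribute locations are hypothetical.

```typescript
interface Particle {
  x: number; y: number; z: number;
  r: number; g: number; b: number; a: number; // 0..255 each
  scale: number;
}

const INSTANCE_STRIDE = 20;     // bytes: 3 floats + 4 color bytes + 1 float
const GL_UNSIGNED_BYTE = 0x1401; // same values as gl.UNSIGNED_BYTE / gl.FLOAT
const GL_FLOAT = 0x1406;

// Pack all instances into one interleaved buffer, in native byte order.
function packInstances(particles: Particle[]): ArrayBuffer {
  const buffer = new ArrayBuffer(particles.length * INSTANCE_STRIDE);
  const floats = new Float32Array(buffer);
  const bytes = new Uint8Array(buffer);
  particles.forEach((p, i) => {
    const f = i * 5;               // index of this instance's first float
    const b = i * INSTANCE_STRIDE; // byte offset of this instance
    floats[f] = p.x;
    floats[f + 1] = p.y;
    floats[f + 2] = p.z;
    bytes[b + 12] = p.r;           // color shares the fourth 4-byte slot
    bytes[b + 13] = p.g;
    bytes[b + 14] = p.b;
    bytes[b + 15] = p.a;
    floats[f + 4] = p.scale;
  });
  return buffer;
}

// The subset of WebGL 2 methods this sketch needs.
interface InstancingContext {
  enableVertexAttribArray(index: number): void;
  vertexAttribPointer(index: number, size: number, type: number,
                      normalized: boolean, stride: number, offset: number): void;
  vertexAttribDivisor(index: number, divisor: number): void;
}

// Assumes the instance buffer is already bound to ARRAY_BUFFER.
function describeInstanceAttributes(gl: InstancingContext,
                                    posLoc: number, colorLoc: number, scaleLoc: number): void {
  gl.enableVertexAttribArray(posLoc);
  gl.vertexAttribPointer(posLoc, 3, GL_FLOAT, false, INSTANCE_STRIDE, 0);
  gl.vertexAttribDivisor(posLoc, 1); // advance once per instance

  gl.enableVertexAttribArray(colorLoc);
  gl.vertexAttribPointer(colorLoc, 4, GL_UNSIGNED_BYTE, true, INSTANCE_STRIDE, 12);
  gl.vertexAttribDivisor(colorLoc, 1);

  gl.enableVertexAttribArray(scaleLoc);
  gl.vertexAttribPointer(scaleLoc, 1, GL_FLOAT, false, INSTANCE_STRIDE, 16);
  gl.vertexAttribDivisor(scaleLoc, 1);
}
```

After uploading the packed buffer with gl.bufferData(), a single gl.drawElementsInstanced() call with the instance count draws every particle. In the shader the color arrives as a vec4 in the 0..1 range because the attribute is normalized.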
A particle system using instancing can render 1000 particles in a single draw call instead of 1000 individual calls. The GPU handles the repetition in hardware, which is vastly more efficient than the CPU issuing separate commands for each particle. On mobile, this can be the difference between a particle system consuming 60% of the frame budget and consuming 2%.
One caveat: instancing has a setup cost per call that makes it inefficient for very small instance counts (under 10-20). For small counts, geometry merging into a single buffer is simpler and equally fast.
Sort Draw Calls by State
Even after batching and instancing reduce the total number of draw calls, the remaining calls can be ordered to minimize the cost of state changes between them. Every state change (switching the active shader program, binding a different texture, changing blend mode, or modifying depth test settings) carries a cost that accumulates across hundreds of calls.
Shader program changes are typically the most expensive single state change because they can require the driver to flush the current pipeline state and load a completely new one. Sort all draw calls so that every object using shader A is drawn before every object using shader B. Within each shader group, sort by texture binding so that all objects using texture 1 draw before objects using texture 2. Within each texture group, sort by secondary state like blend mode or depth write.
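A practical sort key is a single integer where the high bits encode the shader program ID, the middle bits encode the texture ID, and the low bits encode secondary state flags. Native engines often use a 64-bit integer for this; in JavaScript, a full 64-bit key needs BigInt, while a plain number is exact only up to 53 bits and bitwise operators work on 32 bits, so size the fields accordingly. Sort your draw queue by this key each frame. The sort itself is O(n log n) where n is the number of draw calls, which is negligible for counts under 500.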
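A compact version of such a key can be sketched as follows. Because JavaScript bitwise operators work on 32-bit integers, this example uses a 30-bit key instead of 64 bits. The field widths (10 bits of shader ID, 12 bits of texture ID, 8 bits of flags) and the DrawItem shape are assumptions chosen for illustration.

```typescript
interface DrawItem {
  shaderId: number;   // 0..1023  (10 bits, most significant)
  textureId: number;  // 0..4095  (12 bits)
  stateFlags: number; // 0..255   (8 bits, least significant)
}

// Pack the three fields so that numeric order equals
// "shader first, then texture, then secondary state".
function makeSortKey(d: DrawItem): number {
  if (d.shaderId < 0 || d.shaderId > 1023 ||
      d.textureId < 0 || d.textureId > 4095 ||
      d.stateFlags < 0 || d.stateFlags > 255) {
    throw new RangeError("sort key field out of range");
  }
  return (d.shaderId << 20) | (d.textureId << 8) | d.stateFlags;
}

// Returns a new array sorted by state key; the input queue is left untouched.
function sortByState<T extends DrawItem>(queue: T[]): T[] {
  return queue.slice().sort((a, b) => makeSortKey(a) - makeSortKey(b));
}
```

If the key is recomputed inside the comparator as shown, it is evaluated several times per item; caching it on each item when the item is queued avoids that at the cost of one extra field.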
For opaque geometry, sort front-to-back within each state group. This maximizes the GPU's ability to reject fragments early using the depth buffer. For transparent geometry, sort back-to-front to ensure correct blending. Most renderers maintain separate queues for opaque and transparent objects for this reason.
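The two-queue arrangement can be sketched as below. The Renderable shape is hypothetical: stateKey stands for any precomputed state sort key, and viewDepth is assumed to be the distance from the camera, with larger values farther away.

```typescript
interface Renderable {
  stateKey: number;     // precomputed state sort key (shader, texture, flags)
  viewDepth: number;    // distance from the camera; larger means farther away
  transparent: boolean;
}

interface RenderQueues<T> {
  opaque: T[];
  transparent: T[];
}

function buildQueues<T extends Renderable>(items: T[]): RenderQueues<T> {
  // Opaque: group by state first, then nearest-first inside each group
  // so the depth buffer can reject hidden fragments early.
  const opaque = items
    .filter((i) => !i.transparent)
    .sort((a, b) => a.stateKey - b.stateKey || a.viewDepth - b.viewDepth);

  // Transparent: farthest-first so blending composites in the right order.
  // Depth takes priority over state here, since ordering affects correctness.
  const transparent = items
    .filter((i) => i.transparent)
    .sort((a, b) => b.viewDepth - a.viewDepth);

  return { opaque, transparent };
}
```

The renderer draws the whole opaque queue first, then the transparent queue, usually with depth writes disabled for the second pass.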
Reduce Overdraw with Depth Sorting and Culling
Overdraw occurs when multiple fragments are computed for the same pixel, with only the frontmost one surviving. On desktop GPUs, overdraw is wasteful but rarely a critical problem because desktop fragment shading capacity is enormous. On mobile, overdraw is more damaging because it wastes limited fragment processing throughput and precious memory bandwidth.
For opaque geometry, rendering front-to-back allows the early depth test to reject fragments that would be hidden by closer objects. The GPU checks the depth buffer before running the fragment shader, and if the current fragment is behind an already-written depth value, the fragment is discarded without running the shader. This rejection costs nothing extra on most mobile GPUs, provided the fragment shader does not use discard or write gl_FragDepth (either one typically disables early depth testing), and it can eliminate 30-60% of fragment shader invocations in scenes with significant depth overlap.
Frustum culling, skipping objects that are entirely outside the camera's view frustum, prevents the GPU from processing geometry that will produce no visible pixels. Implement frustum culling on the CPU by testing each object's bounding sphere or bounding box against the six frustum planes. Objects that fail the test are not submitted for rendering at all, saving both draw call overhead and GPU processing time.
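The bounding-sphere test can be sketched in a few lines. The sketch assumes each plane is stored with a unit-length normal pointing into the frustum, so that a positive signed distance means "inside". Extracting the six planes from the view-projection matrix is left out, and the type and function names are illustrative.

```typescript
// Plane in the form nx*x + ny*y + nz*z + d = 0, with a unit-length
// normal pointing toward the inside of the frustum.
interface Plane { nx: number; ny: number; nz: number; d: number; }

interface Sphere { cx: number; cy: number; cz: number; radius: number; }

// Returns false only when the sphere lies entirely outside at least one plane.
// Spheres that straddle a plane count as visible.
function sphereInFrustum(planes: Plane[], s: Sphere): boolean {
  for (const p of planes) {
    const signedDistance = p.nx * s.cx + p.ny * s.cy + p.nz * s.cz + p.d;
    if (signedDistance < -s.radius) {
      return false;
    }
  }
  return true;
}

// Keep only the objects whose bounding spheres pass the test.
function cull<T extends { bounds: Sphere }>(planes: Plane[], objects: T[]): T[] {
  return objects.filter((o) => sphereInFrustum(planes, o.bounds));
}
```

The test is conservative: a sphere near a frustum corner can pass even though it is just outside, which costs an unnecessary draw call but never drops a visible object.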
For 2D games, overdraw typically comes from overlapping sprites, full-screen backgrounds drawn behind the game world, and transparent UI layers on top. Avoid drawing large background quads that are fully covered by game content. If your background is completely hidden by the game world in most frames, skip rendering it entirely and let the clear color serve as the background.
Measure overdraw by rendering the scene in a debug mode where each fragment adds to a per-pixel counter (in WebGL, usually additive blending with the depth test disabled) and the final image maps the count to a heat map color. Pixels at the hot end of the scale (white or red) are being drawn 4 or more times; those are the areas where overdraw is highest and optimization effort should focus.
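For a 2D game built from axis-aligned sprites, the same measurement can be approximated on the CPU without touching the GPU. The sketch below counts how many rectangles cover each pixel and maps the count to a color. The palette is an arbitrary choice for illustration, and the rectangles are assumed to use integer pixel coordinates.

```typescript
interface Rect { x: number; y: number; w: number; h: number; }

type RGB = [number, number, number];

// Count how many rectangles cover each pixel of a width x height target.
function countOverdraw(width: number, height: number, rects: Rect[]): Uint8Array {
  const counts = new Uint8Array(width * height);
  for (const r of rects) {
    // Clip the rectangle to the target bounds.
    const x0 = Math.max(0, r.x);
    const y0 = Math.max(0, r.y);
    const x1 = Math.min(width, r.x + r.w);
    const y1 = Math.min(height, r.y + r.h);
    for (let y = y0; y < y1; y++) {
      for (let x = x0; x < x1; x++) {
        const i = y * width + x;
        if (counts[i] < 255) {
          counts[i]++; // saturate instead of wrapping around
        }
      }
    }
  }
  return counts;
}

// Illustrative palette: black = untouched, blue = 1, green = 2,
// yellow = 3, red = 4 or more layers.
function overdrawColor(count: number): RGB {
  if (count <= 0) return [0, 0, 0];
  if (count === 1) return [0, 0, 255];
  if (count === 2) return [0, 255, 0];
  if (count === 3) return [255, 255, 0];
  return [255, 0, 0];
}
```

Running this at a reduced resolution (for example one cell per 8x8 pixel block) keeps the cost low enough to leave enabled in debug builds.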
Reducing draw calls and overdraw together yields the largest performance improvement on mobile. Batch aggressively with texture atlases and merged geometry, use instancing for repeated objects, sort by state to minimize transitions, and cull early to avoid wasting fragments. Target under 200 draw calls per frame for broad mobile device support.