How to Profile a Mobile Web Game
The cardinal rule of performance optimization is to measure before you change anything. Intuition about where performance is lost is frequently wrong, even for experienced developers. A shader that looks complex might execute in microseconds while a seemingly simple JavaScript loop dominates the frame budget. Profiling replaces intuition with data and ensures that every optimization addresses a real bottleneck rather than a perceived one.
Build a Frame Time Logger
The simplest and most important profiling tool is a frame time logger built into your game. At the start of each requestAnimationFrame callback, record performance.now(). Compute the delta from the previous frame's timestamp. This delta is your frame time, the total time from the start of one frame to the start of the next.
Display the current frame time and a rolling average in a debug overlay that you can toggle during development. Color-code it: green for frame times under 16.6 ms (60 fps), yellow for 16.6-33.3 ms (30-60 fps), and red for anything over 33.3 ms (under 30 fps). This gives you instant visual feedback while testing.
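The logger and color thresholds described above could be sketched roughly as follows. The names FrameTimer and frameTimeColor, the 60-frame window, and the overlay element in the comment are assumptions made for illustration, not part of any library.

```javascript
// Rolling frame-time tracker. Feed it the timestamp that
// requestAnimationFrame passes to its callback (or performance.now()).
class FrameTimer {
  constructor(windowSize = 60) {
    this.samples = new Float32Array(windowSize); // circular buffer of recent deltas
    this.count = 0;     // number of valid samples (at most windowSize)
    this.index = 0;     // next write position in the buffer
    this.sum = 0;       // running sum of the valid samples
    this.last = null;   // previous timestamp, null before the first frame
    this.frameTime = 0; // most recent delta in milliseconds
  }

  tick(now) {
    if (this.last !== null) {
      this.frameTime = now - this.last;
      if (this.count === this.samples.length) {
        this.sum -= this.samples[this.index]; // drop the oldest sample
      } else {
        this.count++;
      }
      this.samples[this.index] = this.frameTime;
      this.sum += this.samples[this.index];
      this.index = (this.index + 1) % this.samples.length;
    }
    this.last = now;
    return this.frameTime;
  }

  get average() {
    return this.count === 0 ? 0 : this.sum / this.count;
  }
}

// Thresholds from the text: the 60 fps and 30 fps frame budgets.
function frameTimeColor(ms) {
  if (ms < 1000 / 60) return 'green';
  if (ms <= 1000 / 30) return 'yellow';
  return 'red';
}

// Browser wiring (overlay is a hypothetical DOM element):
// const timer = new FrameTimer();
// function frame(now) {
//   timer.tick(now);
//   overlay.textContent = `${timer.frameTime.toFixed(1)} ms`;
//   overlay.style.color = frameTimeColor(timer.average);
//   requestAnimationFrame(frame);
// }
// requestAnimationFrame(frame);
```

Coloring by the rolling average rather than the latest frame keeps the overlay from flickering on a single slow frame; either choice is reasonable.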
For extended profiling sessions (15+ minutes to capture thermal throttling), store frame times in a pre-allocated Float32Array rather than pushing to a regular JavaScript array. A Float32Array of 100,000 entries consumes only 400 KB and stores nearly 28 minutes of data at 60 fps (100,000 frames / 60 fps is about 1,667 seconds). After the session, dump the array to the console or offer a download button so you can analyze it offline.
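A minimal sketch of such a recorder is shown below. FrameTimeRecorder, downloadCSV, and the CSV layout are illustrative assumptions; the download helper relies on standard browser APIs (Blob, URL.createObjectURL) and is only meant to be called in a page.

```javascript
// Fixed-capacity recorder: one allocation up front, none while recording.
class FrameTimeRecorder {
  constructor(capacity = 100000) {
    this.data = new Float32Array(capacity); // 4 bytes per entry
    this.length = 0;                        // number of frames recorded so far
  }

  // Returns false once the buffer is full so the caller can stop recording.
  record(ms) {
    if (this.length >= this.data.length) return false;
    this.data[this.length++] = ms;
    return true;
  }

  // Serialize the recorded portion as "frame,ms" rows for offline analysis.
  toCSV() {
    const lines = ['frame,ms'];
    for (let i = 0; i < this.length; i++) {
      lines.push(`${i},${this.data[i].toFixed(2)}`);
    }
    return lines.join('\n');
  }
}

// Browser-only helper for a "download" button in the debug overlay.
function downloadCSV(recorder, filename = 'frame-times.csv') {
  const blob = new Blob([recorder.toCSV()], { type: 'text/csv' });
  const url = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = url;
  link.download = filename;
  link.click();
  URL.revokeObjectURL(url);
}
```

Building the CSV string does allocate, so it is best done after the session ends rather than during gameplay.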
Plot the data as a time series. The X axis is time (or frame number), the Y axis is frame time in milliseconds. Draw a horizontal line at your target frame time (16.6 ms or 33.3 ms). Healthy performance shows a flat line at or below the target. Thermal throttling appears as a gradual upward drift after several minutes. GC pauses appear as sharp spikes. Consistent frame time above the target indicates a sustained bottleneck.
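The same patterns can be checked numerically before plotting. The sketch below is one possible heuristic, not an established method: the "spike" rule (more than twice the median and over the target) and the drift measure (median of the last 10% minus median of the first 10%) are assumptions chosen for illustration.

```javascript
// Median of an array or typed array (copies before sorting).
function median(values) {
  const sorted = Array.from(values).sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Summarize a recorded session against a target frame time.
function analyzeFrameTimes(frameTimes, targetMs = 1000 / 60) {
  const n = frameTimes.length;
  if (n === 0) return null;

  const overall = median(frameTimes);
  let spikes = 0;     // isolated hitches, e.g. GC pauses
  let overBudget = 0; // frames that missed the target

  for (let i = 0; i < n; i++) {
    const ms = frameTimes[i];
    if (ms > targetMs) overBudget++;
    if (ms > targetMs && ms > 2 * overall) spikes++;
  }

  // Gradual upward drift (possible thermal throttling): compare the
  // start of the session with the end of it.
  const window = Math.max(1, Math.floor(n / 10));
  const startMedian = median(Array.from(frameTimes).slice(0, window));
  const endMedian = median(Array.from(frameTimes).slice(n - window));

  return {
    median: overall,
    spikes,
    overBudgetRatio: overBudget / n,
    driftMs: endMedian - startMedian,
  };
}
```

A clearly positive driftMs over a long session points toward throttling, while a high spike count with a flat median points toward GC or other intermittent work.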
This logger costs essentially nothing in performance and should be present in every build, including production builds (just hidden from the player by default). It is your first line of defense against performance regressions and the easiest way to demonstrate thermal throttling to stakeholders who ask "why does the game slow down after a few minutes?"
Record a Chrome DevTools Performance Trace
Chrome DevTools Performance panel provides the most detailed view of what happens during each frame. To profile on a real Android device, connect the phone via USB with developer mode and USB debugging enabled, open chrome://inspect/#devices on your desktop Chrome, and click "inspect" next to the tab running your game.
In the DevTools window, open the Performance panel. Click the gear icon and ensure "Screenshots" is enabled (to see visual output per frame) and "Web Vitals" is disabled (not relevant for games). Click Record, then play your game on the phone for 5-15 seconds during a representative gameplay moment (a busy scene with particles, enemies, and effects). Click Stop.
The recorded trace shows multiple tracks. The Frames track shows each frame as a colored bar. In recent Chrome versions, green frames were rendered on time, yellow frames were only partially presented, and red frames were dropped. Click any frame to see its breakdown. The Main track shows JavaScript execution as a flame chart, where wider bars indicate longer function calls. The GPU track (when available) shows GPU command processing time. The Compositor track shows how long the browser took to composite the final frame.
Focus on red frames first, since these are the worst offenders. Click a red frame and examine the flame chart beneath it. Look for functions that consume more than 2-3 ms individually. Common findings include render loops that submit draw calls one at a time, matrix math functions called thousands of times per frame, event handler callbacks that trigger unnecessary DOM updates, and long garbage collection pauses labeled "Minor GC" or "Major GC."
Analyze the Flame Chart for CPU Bottlenecks
The flame chart is a visualization of the call stack over time. Each horizontal bar represents a function call, with wider bars taking more time. Bars stacked vertically show the call hierarchy: the top-level bar called the function beneath it, which called the function beneath that, and so on.
Start at the widest bars and work downward. If the widest bar is your main update() function and it consumes 12 ms, drill into its children to find which subsystem dominates. Is it physics (collision detection, rigid body solving)? Is it rendering (draw call submission, buffer updates)? Is it AI (pathfinding, decision trees)? Is it animation (skeletal updates, blend tree evaluation)?
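When a trace is not at hand, a rough per-subsystem breakdown can also be collected in-game and shown in the debug overlay. The sketch below is an assumption-laden illustration: SectionTimer and the subsystem names in the usage comment are hypothetical, and it measures only CPU time spent inside each callback.

```javascript
// Accumulates CPU time per named section using performance.now().
class SectionTimer {
  constructor() {
    this.totals = new Map(); // section name -> accumulated milliseconds
  }

  // Runs fn, adds its duration to the named section, and returns fn's result.
  time(name, fn) {
    const start = performance.now();
    try {
      return fn();
    } finally {
      const elapsed = performance.now() - start;
      this.totals.set(name, (this.totals.get(name) || 0) + elapsed);
    }
  }

  // [name, ms] pairs, most expensive section first.
  report() {
    return [...this.totals.entries()].sort((a, b) => b[1] - a[1]);
  }

  reset() {
    this.totals.clear();
  }
}

// Usage inside a frame (updatePhysics etc. are hypothetical game functions):
// sections.time('physics', () => updatePhysics(dt));
// sections.time('ai', () => updateAI(dt));
// sections.time('render', () => submitDrawCalls());
// overlay.textContent = sections.report().map(([n, ms]) => `${n}: ${ms.toFixed(2)}`).join('  ');
// sections.reset();
```

The arrow functions in the usage comment allocate a closure per call, so this instrumentation is best kept behind a debug flag.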
Look for repeated patterns. If you see 200 narrow bars for gl.bindTexture, gl.useProgram, gl.drawElements repeating, your draw call count is too high. If you see a single wide bar for a function like updateAllParticles, the implementation of that function is the bottleneck, not the draw calls. If you see periodic wide bars labeled "GC" every few hundred frames, garbage collection is causing hitches.
The Bottom-Up and Call Tree tabs in DevTools aggregate time across the entire trace rather than showing it frame-by-frame. Bottom-Up sorts functions by total self-time (time spent in the function itself, not its children), which surfaces the innermost functions that consume the most CPU. Call Tree shows the same data organized by caller hierarchy. Use Bottom-Up to find the hottest functions, then Call Tree to understand where they are called from.
Common CPU bottleneck patterns on mobile:
- Too many draw calls: Hundreds of small gl.bindX/gl.drawX sequences. Solution: batching, instancing, texture atlases.
- Expensive JavaScript logic: Physics loops, pathfinding, or AI taking 5+ ms per frame. Solution: WebAssembly, algorithmic optimization, or reducing entity count.
- GC pauses: 5-15 ms pauses every few seconds. Solution: object pooling, pre-allocation, avoiding closures in hot loops.
- Layout thrashing: Reading DOM properties (offsetWidth, getBoundingClientRect) between writes forces the browser to recalculate layout. Solution: batch DOM reads before DOM writes, or avoid DOM interaction in the render loop entirely.
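As one illustration of the GC-related fix in the list above, a generic object pool might look like the following sketch. The Pool class and its create/reset callbacks are hypothetical names, not an API from any engine.

```javascript
// Reuses objects instead of allocating new ones every frame.
class Pool {
  // create: () => object, called when the pool is empty
  // reset:  (object) => void, called when an object is returned
  constructor(create, reset, initialSize = 0) {
    this.create = create;
    this.reset = reset;
    this.free = [];
    for (let i = 0; i < initialSize; i++) {
      this.free.push(create()); // pre-allocate outside the hot path
    }
  }

  // Hands out a recycled object, or a new one if none are free.
  acquire() {
    return this.free.length > 0 ? this.free.pop() : this.create();
  }

  // Returns an object to the pool; the caller must drop its reference.
  release(obj) {
    this.reset(obj);
    this.free.push(obj);
  }
}

// Example: a pool of particle records sized for the expected peak.
// const particles = new Pool(
//   () => ({ x: 0, y: 0, vx: 0, vy: 0, life: 0 }),
//   (p) => { p.x = p.y = p.vx = p.vy = p.life = 0; },
//   512
// );
```

Sizing the pool for peak load during loading means acquire() never allocates in the middle of gameplay.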
Add GPU Timing Queries
Chrome DevTools shows CPU time for JavaScript execution and draw call submission, but it does not show how long the GPU takes to execute those draw calls. To measure GPU time, use the EXT_disjoint_timer_query_webgl2 extension (WebGL 2) or the timestamp query feature (WebGPU).
In WebGL 2, check for the extension at startup:
const ext = gl.getExtension('EXT_disjoint_timer_query_webgl2');
If available, create a query object, begin the query with the extension's TIME_ELAPSED_EXT target before a rendering pass, end it after, then read the result in a later frame (GPU queries are asynchronous, so the result is not available immediately). The result is the GPU execution time in nanoseconds for the commands between begin and end.
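The begin, end, and deferred-read sequence could be wrapped roughly as below. GpuPassTimer is a hypothetical helper; it assumes a WebGL 2 context and a non-null extension object, and it discards results when the extension reports a disjoint event (for example, a GPU clock change) because those timings are unreliable.

```javascript
// gl:  a WebGL2RenderingContext
// ext: the object returned by gl.getExtension('EXT_disjoint_timer_query_webgl2')
class GpuPassTimer {
  constructor(gl, ext) {
    this.gl = gl;
    this.ext = ext;
    this.query = null;  // query currently being recorded
    this.pending = [];  // finished queries awaiting results, oldest first
    this.lastMs = null; // most recent valid GPU time in milliseconds
  }

  // Call before the pass. Only one TIME_ELAPSED query may be active at a
  // time, so passes must be timed one after another, not nested.
  begin() {
    this.query = this.gl.createQuery();
    this.gl.beginQuery(this.ext.TIME_ELAPSED_EXT, this.query);
  }

  // Call after the pass.
  end() {
    this.gl.endQuery(this.ext.TIME_ELAPSED_EXT);
    this.pending.push(this.query);
    this.query = null;
  }

  // Call once per frame; returns the latest known GPU time (or null).
  poll() {
    const gl = this.gl;
    // True if something invalidated timings since the last check.
    const disjoint = gl.getParameter(this.ext.GPU_DISJOINT_EXT);
    while (this.pending.length > 0) {
      const query = this.pending[0];
      if (!gl.getQueryParameter(query, gl.QUERY_RESULT_AVAILABLE)) break;
      if (!disjoint) {
        // QUERY_RESULT is in nanoseconds; convert to milliseconds.
        this.lastMs = gl.getQueryParameter(query, gl.QUERY_RESULT) / 1e6;
      }
      gl.deleteQuery(query);
      this.pending.shift();
    }
    return this.lastMs;
  }
}
```

Creating one timer per rendering pass and polling each once per frame gives a per-pass GPU cost that lags the display by a frame or two.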
Wrap each major rendering pass in its own query: shadow map rendering, main scene opaque pass, main scene transparent pass, post-processing. This tells you exactly how much GPU time each pass consumes. A common discovery is that a single post-processing pass (like bloom with multiple blur iterations) consumes more GPU time than the entire main scene render, which immediately tells you where to focus optimization.
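For WebGPU, timestamp queries are an optional feature: check that the adapter reports "timestamp-query" and request it when creating the device. Create a GPUQuerySet with type "timestamp" and have each pass write timestamps at its boundaries through the timestampWrites property of the render or compute pass descriptor. (Earlier drafts of the API exposed writeTimestamp() on the command encoder, but it has since been removed from the specification.) Resolve the query set into a buffer with resolveQuerySet(), copy it to a mappable buffer, and read the results.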
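A hedged sketch of the WebGPU flow follows. It assumes a current implementation in which pass-boundary timestamps are requested through the pass descriptor's timestampWrites property, and a device created with the "timestamp-query" feature. The helper names (createPassTimer, timestampWritesFor, resolveTimestamps, readPassTimeMs, elapsedMs) are hypothetical.

```javascript
// Query set plus the two buffers needed to read timestamps back.
function createPassTimer(device) {
  const querySet = device.createQuerySet({ type: 'timestamp', count: 2 });
  const resolveBuffer = device.createBuffer({
    size: 16, // two 64-bit timestamps
    usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC,
  });
  const readBuffer = device.createBuffer({
    size: 16,
    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
  });
  return { querySet, resolveBuffer, readBuffer };
}

// Value for the `timestampWrites` field of a render or compute pass descriptor.
function timestampWritesFor(timer) {
  return {
    querySet: timer.querySet,
    beginningOfPassWriteIndex: 0,
    endOfPassWriteIndex: 1,
  };
}

// Record after pass.end() and before encoder.finish(). Skip this while
// readBuffer is still mapped from a previous frame.
function resolveTimestamps(encoder, timer) {
  encoder.resolveQuerySet(timer.querySet, 0, 2, timer.resolveBuffer, 0);
  encoder.copyBufferToBuffer(timer.resolveBuffer, 0, timer.readBuffer, 0, 16);
}

// Timestamps are 64-bit nanosecond values; convert the difference to ms.
function elapsedMs(timestamps) {
  return Number(timestamps[1] - timestamps[0]) / 1e6;
}

// Call after device.queue.submit(); resolves once the GPU work is done.
async function readPassTimeMs(timer) {
  await timer.readBuffer.mapAsync(GPUMapMode.READ);
  const ms = elapsedMs(new BigUint64Array(timer.readBuffer.getMappedRange()));
  timer.readBuffer.unmap();
  return ms;
}
```

Browsers may coarsen timestamp resolution for security reasons, so very short passes can report zero or heavily quantized durations.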
Note that GPU timing queries are not available on all devices. Some mobile GPUs or browser implementations do not expose the extension, particularly on iOS where Safari does not support disjoint timer queries. On devices without timing query support, you can approximate GPU time by measuring the interval between submitting commands and the next requestAnimationFrame callback, though this approximation includes compositing and display sync latency.
Profile Memory Usage
Memory problems on mobile manifest as silent tab crashes rather than error messages, which makes them particularly insidious. Profile memory proactively to catch overruns before players encounter them.
In Chrome DevTools Memory panel (connected to an Android device), take a heap snapshot at various points: after initial load, after loading a level, during peak gameplay, and after transitioning between levels. Each snapshot shows total JavaScript heap size and a breakdown by object type. Look for unexpected growth between snapshots, which indicates retained references or memory leaks.
Compare the heap size after loading Level 1 and then transitioning to Level 2. If the heap is larger after transitioning than it was after initially loading Level 2 directly, assets or objects from Level 1 are being retained. Use the "Comparison" view in the heap snapshot to see exactly which objects were added between two snapshots.
For GPU texture memory, which is not visible in JavaScript heap snapshots, maintain a running total in your asset manager. Every time you call gl.texImage2D() or gl.compressedTexImage2D(), add the texture's calculated memory size to your total. Every time you call gl.deleteTexture(), subtract it. Display this total in your debug overlay alongside the JavaScript heap size from performance.memory.usedJSHeapSize (a non-standard API available only in Chromium-based browsers).
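The running total could be kept with a small tracker like the sketch below. TextureMemoryTracker and textureBytes are hypothetical names, and the size estimate is an approximation: it assumes an uncompressed format with a fixed number of bytes per pixel and treats a full mip chain as roughly one third extra. Actual driver allocations may differ.

```javascript
// Estimated size of an uncompressed texture in bytes.
function textureBytes(width, height, bytesPerPixel = 4, mipmapped = false) {
  const base = width * height * bytesPerPixel;
  // A full mip chain adds about 1/3 on top of the base level.
  return mipmapped ? Math.ceil((base * 4) / 3) : base;
}

// Keeps a running total keyed by texture object.
class TextureMemoryTracker {
  constructor() {
    this.sizes = new Map(); // texture -> bytes
    this.totalBytes = 0;
  }

  // Call next to gl.texImage2D() / gl.compressedTexImage2D().
  add(texture, bytes) {
    this.remove(texture); // re-uploading replaces the previous size
    this.sizes.set(texture, bytes);
    this.totalBytes += bytes;
  }

  // Call next to gl.deleteTexture().
  remove(texture) {
    const bytes = this.sizes.get(texture);
    if (bytes !== undefined) {
      this.totalBytes -= bytes;
      this.sizes.delete(texture);
    }
  }

  get totalMB() {
    return this.totalBytes / (1024 * 1024);
  }
}
```

For compressed textures, pass the byte length of the uploaded data to add() instead of using textureBytes().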
The Allocation Timeline recording mode (in the Memory panel) shows memory allocations over time as you play the game. Start a recording, play for 30-60 seconds, and stop. The timeline shows when allocations occurred and which functions triggered them. This is invaluable for finding per-frame allocations that create GC pressure: if you see a steady stream of small allocations during gameplay, trace them to the allocating function and convert them to pooled or pre-allocated objects.
Determine CPU-Bound vs GPU-Bound
Knowing whether your game is CPU-bound or GPU-bound on a given device tells you where to focus optimization effort. Optimizing shaders when the CPU is the bottleneck wastes time, and reducing draw calls when the GPU is saturated does not help either.
Compare two measurements: CPU frame time (how long your JavaScript update and render submission takes) and GPU frame time (how long the GPU takes to execute those commands). If CPU frame time exceeds GPU frame time, you are CPU-bound. If GPU frame time exceeds CPU frame time, you are GPU-bound. If both are close to the frame budget, you are balanced, which is the ideal state.
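The comparison can be expressed as a small helper. This is only a sketch: classifyBottleneck is a hypothetical function, and the 10% tolerance used to call the two times "balanced" is an arbitrary assumption, not a value from the text.

```javascript
// cpuMs: JavaScript update + render submission time for a frame
// gpuMs: GPU execution time for the same frame (e.g. from timer queries)
function classifyBottleneck(cpuMs, gpuMs, budgetMs = 1000 / 60, tolerance = 0.1) {
  const larger = Math.max(cpuMs, gpuMs);
  const difference = Math.abs(cpuMs - gpuMs);

  let bound;
  if (difference <= tolerance * larger) {
    bound = 'balanced'; // neither side clearly dominates
  } else {
    bound = cpuMs > gpuMs ? 'cpu' : 'gpu';
  }

  // The frame fits the budget only if the slower side does.
  return { bound, withinBudget: larger <= budgetMs };
}
```

Averaging cpuMs and gpuMs over a second or more before classifying avoids flip-flopping between states on noisy frames.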
Without GPU timing queries, you can approximate the distinction with a simple test. Reduce your rendering resolution to the minimum (e.g., 100x100 pixels) while keeping everything else the same. If frame rate improves significantly, you were GPU fragment-bound. If frame rate barely changes, fragment work was not the bottleneck, and the CPU most likely was. Similarly, reduce draw call count by half (render only half the scene). If frame rate improves significantly, you were probably CPU-bound on draw call submission.
On mobile, the most common scenario is CPU-bound on draw call submission, especially in WebGL games with many individual objects. This is because the browser's per-draw-call overhead is proportionally higher on mobile CPUs than on desktop CPUs. The solution is consistent: batch more aggressively, use instancing, and reduce draw call count through texture atlases and merged geometry.
The second most common scenario is GPU fragment-bound, which happens in games that render at high resolution with complex shaders or heavy post-processing. The solution is resolution scaling, shader simplification (mediump precision, fewer texture samples), and reducing post-processing passes.
Iterate and Verify
Profiling is not a one-time activity. It is a cycle: measure, identify the bottleneck, make a targeted change, re-measure to verify improvement, and repeat. Each cycle should address the single largest bottleneck identified in the previous measurement.
After making an optimization, profile again on the same device with the same scene. Compare the new trace to the old one. Did frame time decrease? Did the bottleneck shift to a different system? If the optimization reduced frame time from 22 ms to 18 ms, the bottleneck may have moved from draw call submission to shader execution, and the next optimization should target shaders rather than continuing to reduce draw calls.
Keep a log of each optimization, what was changed, the frame time before and after, and the device tested on. This log prevents repeated work (re-optimizing something that was already addressed) and provides evidence that the changes had measurable impact. It also helps when deciding between competing optimization strategies: pick the one with demonstrated results.
Profile on all devices in your testing matrix after significant changes. An optimization that improves performance on a flagship Adreno GPU might have no effect on a budget Mali GPU, or vice versa. GPU architectures handle different workloads differently, and an optimization that reduces Adreno's bottleneck (say, reducing overdraw) might not affect Mali (which handles overdraw differently through its own hidden surface removal). Cross-device profiling ensures optimizations benefit your entire audience.
Set performance budgets as part of your development workflow. Define maximum acceptable frame times for each device tier (e.g., 16.6 ms on flagship, 20 ms on mid-range, 33.3 ms on budget). Run automated performance tests (even just launching the game and logging frame times for 60 seconds) as part of your CI pipeline. Flag any build that regresses past the budget so it can be investigated before merging.
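A budget check suitable for a CI step might be sketched as follows. The tier budgets come from the example figures above; judging a run by its 95th-percentile frame time (rather than the mean or the worst frame) is an assumption of this sketch, and BUDGETS_MS, percentile, and checkBudget are hypothetical names.

```javascript
// Example budgets per device tier, in milliseconds.
const BUDGETS_MS = { flagship: 16.6, midrange: 20, budget: 33.3 };

// Nearest-rank percentile of an array or typed array.
function percentile(values, p) {
  const sorted = Array.from(values).sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Compares a recorded run against the budget for its device tier.
function checkBudget(frameTimes, tier, p = 95) {
  const budgetMs = BUDGETS_MS[tier];
  if (budgetMs === undefined) throw new Error(`Unknown tier: ${tier}`);
  const observedMs = percentile(frameTimes, p);
  return { tier, budgetMs, observedMs, pass: observedMs <= budgetMs };
}

// In a CI script, a failing result would typically end the process with a
// non-zero exit code so the build is flagged:
// const result = checkBudget(recordedFrameTimes, 'midrange');
// if (!result.pass) process.exit(1);
```

A percentile tolerates a handful of unavoidable hitches (such as shader compilation on first use) while still catching sustained regressions.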
Profile before optimizing, using frame time logging for sustained behavior, Chrome DevTools for CPU breakdowns, GPU timing queries for rendering pass costs, and memory snapshots for leak detection. Determine whether you are CPU-bound or GPU-bound before choosing which optimization to apply, and always verify improvements with re-measurement on real devices.