Performance Analysis Plotter
Visualize llama.cpp telemetry by turning one or more server logs into plots for prompt tokens, input tokens, output tokens, time to first token (TTFT), tokens per second (TPS), and a prefill vs decode time split.
Requirements
- Python 3.8+
- matplotlib (pip install matplotlib)
Run the Plotter (GAIA CLI)
- Pass multiple log files to compare runs; each plot adds one line per log with a legend.
Collecting llama.cpp Logs
The script expects llama.cpp server logs. With Lemonade, you can capture telemetry like this:
Outputs
Images are written to the directory where you run the script:
- prompt_token_counts.png: prompt token totals per call
- input_token_counts.png: input token counts
- output_token_counts.png: output token counts
- ttft_seconds.png: time to first token
- tps.png: tokens per second
- prefill_decode_split.png: one pie per log showing prefill (TTFT) vs decode (output tokens / TPS) time
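The prefill vs decode split can be illustrated with a small worked example. This is only a sketch of the arithmetic described above (prefill time taken as TTFT, decode time as output tokens divided by TPS), not the plotter's actual code. The function name `prefill_decode_split`, the choice to sum over all calls in a log, and the sample numbers are assumptions made for illustration.

```python
def prefill_decode_split(ttft_seconds, output_tokens, tps):
    """Return (prefill_s, decode_s, prefill_fraction) for one log.

    Hypothetical helper: assumes per-call values are summed across the log.
    """
    # Prefill time is approximated by time to first token.
    prefill = sum(ttft_seconds)
    # Decode time per call is output tokens / tokens per second;
    # calls with a non-positive TPS are skipped to avoid division by zero.
    decode = sum(n / rate for n, rate in zip(output_tokens, tps) if rate > 0)
    total = prefill + decode
    fraction = prefill / total if total > 0 else 0.0
    return prefill, decode, fraction


# Made-up sample values for two calls, not real telemetry.
prefill, decode, fraction = prefill_decode_split(
    ttft_seconds=[0.5, 1.5],
    output_tokens=[100, 200],
    tps=[50.0, 100.0],
)
print(f"prefill={prefill:.2f}s decode={decode:.2f}s prefill share={fraction:.0%}")
```

With these sample values, prefill is 2.0 s and decode is 4.0 s, so prefill would take up one third of the pie.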