Performance Analysis Plotter

Visualize llama.cpp telemetry by turning one or more server logs into plots for prompt tokens, input tokens, output tokens, time-to-first-token (TTFT), tokens-per-second (TPS), and a prefill vs decode time split.

Requirements

Python 3.8+
matplotlib (pip install matplotlib)

Run the Plotter (GAIA CLI)

gaia perf-vis <log_file> [<log_file> ...]

Pass multiple log files to compare runs; each plot adds one line per log with a legend.

Collecting llama.cpp Logs

The script expects llama.cpp server logs. With Lemonade, you can capture telemetry like this:

lemonade-server serve --ctx-size 32768 2>&1 | tee agent.log
gaia perf-vis agent.log

Outputs

Images are written to the current working directory, i.e. the directory you run the command from:

prompt_token_counts.png — prompt token totals per call
input_token_counts.png — input token counts
output_token_counts.png — output token counts
ttft_seconds.png — time to first token
tps.png — tokens per second
prefill_decode_split.png — one pie per log showing prefill (TTFT) vs decode (output tokens / TPS) time
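The prefill vs decode split above reduces to simple arithmetic: prefill time is the TTFT, and decode time is the output token count divided by TPS. The sketch below is illustrative only and is not the plotter's actual code. The function name `prefill_decode_split`, the tuple layout, and the choice to sum over all calls in a log are assumptions.

```python
from typing import Iterable, Tuple

# Each call is (ttft_seconds, output_tokens, tokens_per_second).
# This tuple layout is an assumption made for the example.
Call = Tuple[float, int, float]


def prefill_decode_split(calls: Iterable[Call]) -> Tuple[float, float]:
    """Return (prefill_seconds, decode_seconds) summed over all calls.

    Prefill time is taken to be the TTFT; decode time is estimated as
    output_tokens / tokens_per_second, as described in the Outputs list.
    """
    prefill = 0.0
    decode = 0.0
    for ttft, output_tokens, tps in calls:
        prefill += ttft
        if tps > 0:  # skip the decode estimate when TPS is missing or zero
            decode += output_tokens / tps
    return prefill, decode


# Made-up numbers for two calls from one hypothetical log.
example_calls = [(0.5, 100, 40.0), (0.25, 60, 30.0)]
prefill_s, decode_s = prefill_decode_split(example_calls)
prefill_share = prefill_s / (prefill_s + decode_s)
```

With these made-up numbers, prefill totals 0.75 s and decode totals 4.5 s, so a pie for this log would be dominated by decode time.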

When multiple logs are provided, every plot includes one line per log (or one pie per log for the prefill/decode split), with a legend mapping each series to its log filename.
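To illustrate the one-line-per-log layout, here is a minimal matplotlib sketch that draws one series per log and labels each with its filename. It is not the plotter's implementation: `label_series`, `plot_metric`, the input dictionary shape, and the sample values are all hypothetical.

```python
import os


def label_series(runs):
    """Pair each log's values with a legend label (the log's filename)."""
    return [(os.path.basename(path), list(values)) for path, values in runs.items()]


def plot_metric(runs, ylabel, out_path):
    """Draw one line per log and save the figure to out_path."""
    # Imported here so label_series stays usable without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, values in label_series(runs):
        # x axis is the 1-based call index within each log
        ax.plot(range(1, len(values) + 1), values, marker="o", label=label)
    ax.set_xlabel("call index")
    ax.set_ylabel(ylabel)
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)


# Made-up TTFT values for two hypothetical logs.
example_runs = {
    "runs/baseline.log": [0.42, 0.39, 0.45],
    "runs/tuned.log": [0.31, 0.30, 0.33],
}
```

Calling `plot_metric(example_runs, "TTFT (s)", "ttft_seconds.png")` would write a single image with two labeled lines, one per log.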
