Code Performance Testing Technical Specification
1. Introduction
This document outlines the technical specification for performance test design within the Foundation project. The goal is to create a robust and reliable framework for performance testing that produces meaningful and actionable results. This will help us avoid flaky tests, accurately identify performance regressions, and ensure the system meets its performance targets.
2. Guiding Principles
- Isolate and Measure with Precision: Performance tests must measure the specific component under test, avoiding noise from unrelated system activity.
- Realistic Scenarios: Tests should reflect realistic usage patterns and workloads.
- Statistical Rigor: Performance metrics should be analyzed statistically to distinguish true regressions from random fluctuations.
- Actionable Failures: Test failures should provide clear, actionable information to developers.
3. Performance Test Design Framework
3.1. Test Structure
All performance tests should follow a consistent structure:
- Setup: Initialize the system to a known baseline state. This includes starting necessary services and warming up caches.
- Measurement: Execute the code under test and measure the relevant performance metrics.
- Teardown: Clean up any resources created during the test.
- Analysis: Compare the measured metrics against a baseline or a predefined threshold.
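A minimal ExUnit skeleton illustrating this structure is shown below. `Component` and its `warm_up/1` and `do_work/1` functions are placeholders, and the 5ms threshold is illustrative:

```elixir
defmodule MyApp.ComponentPerfTest do
  # async: false keeps other test modules from adding scheduler noise
  use ExUnit.Case, async: false

  setup do
    # Setup: initialize the system to a known baseline state and warm up
    pid = start_supervised!(Component)
    Component.warm_up(pid)
    # Teardown: start_supervised!/1 stops the component when the test exits
    %{pid: pid}
  end

  test "does work within the latency threshold", %{pid: pid} do
    # Measurement: time the code under test (microseconds)
    {elapsed_us, _result} = :timer.tc(fn -> Component.do_work(pid) end)

    # Analysis: compare against a predefined threshold
    assert elapsed_us < 5_000
  end
end
```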
3.2. Memory Testing
- Measure Process Memory, Not System Memory: When testing the memory usage of a specific process or set of processes, use `Process.info(pid, :memory)` to get the memory usage of each specific process. Avoid `:erlang.memory(:total)`, which covers the entire VM and is susceptible to noise from unrelated activity.
- Account for Garbage Collection: The BEAM’s garbage collector can introduce variability. To mitigate this:
  - Force garbage collection before and after the measurement phase using `:erlang.garbage_collect/1`.
  - Introduce a small `Process.sleep/1` after garbage collection to give the collector time to run.
- Use Statistical Baselines: Instead of asserting a fixed reduction in memory, establish a baseline memory usage for the component under test. The test should then assert that the memory usage does not exceed the baseline by a statistically significant margin.
Example:

```elixir
test "component memory usage does not grow over time" do
  # 1. Setup and warm-up
  {:ok, pid} = Component.start_link()
  :erlang.garbage_collect(pid)
  Process.sleep(10)
  {:memory, initial_memory} = Process.info(pid, :memory)

  # 2. Run workload
  for _ <- 1..100 do
    Component.do_work(pid)
  end

  # 3. Measure final memory
  :erlang.garbage_collect(pid)
  Process.sleep(10)
  {:memory, final_memory} = Process.info(pid, :memory)

  # 4. Analysis
  memory_growth = final_memory - initial_memory
  # Allow for 10KB of growth
  assert memory_growth < 1024 * 10
end
```
3.3. Latency and Throughput Testing
- Use Benchmarking Tools: For latency and throughput testing, use dedicated benchmarking tools like `Benchee`. These tools provide statistical analysis of the results, making them more reliable than manual timing.
- Define Service Level Objectives (SLOs): Each performance test should have a clear SLO. For example, “the 99th percentile latency for API endpoint X should be less than 100ms”. A sketch combining these first two points follows this list.
- Isolate External Dependencies: When testing the performance of a specific component, mock or stub any external dependencies so that network latency and other external factors do not influence the results.
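As a sketch of the first two points, the following benchmarks a hypothetical `ApiClient.endpoint_x/0` call with `Benchee.run/2` and checks the SLO against the 99th percentile, which Benchee 1.x computes by default and reports in nanoseconds. The assertion would live inside an ExUnit test:

```elixir
suite =
  Benchee.run(
    %{"api_endpoint_x" => fn -> ApiClient.endpoint_x() end},
    warmup: 2,
    time: 10
  )

# Benchee 1.x computes the 50th and 99th percentiles by default.
[scenario] = suite.scenarios
p99_ns = scenario.run_time_data.statistics.percentiles[99]

# SLO: 99th percentile latency for endpoint X below 100ms.
assert p99_ns < 100 * 1_000_000
```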
3.4. Property-Based Performance Testing
Property-based testing can be a powerful tool for performance testing, but it requires careful design:
- Focus on Relative Properties: Instead of asserting absolute performance numbers, focus on relative properties. For example, “doubling the workload should not more than double the execution time”.
- Model Realistic Behavior: The properties should model the expected performance behavior of the system. Avoid overly simplistic or aggressive assertions.
- Use Statistical Assertions: Instead of a simple `assert` on a single measurement, use statistical methods to verify the property. For example, collect multiple measurements and assert that the mean or median falls within an acceptable range. The sketch below combines this with a relative scaling property.
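Putting these ideas together, the sketch below uses `ExUnitProperties` (from the `stream_data` package) to check that doubling a hypothetical `Component.do_work_batch/1` workload does not more than double its median runtime. The repetition count, workload range, and 2.5x slack factor are illustrative assumptions:

```elixir
defmodule MyApp.ScalingPropertyTest do
  use ExUnit.Case, async: false
  use ExUnitProperties

  # Median of repeated timings is more robust to outliers than the mean.
  defp median(samples) do
    samples |> Enum.sort() |> Enum.at(div(length(samples), 2))
  end

  # Run the workload several times and take the median duration (µs).
  defp measure_us(n, repetitions \\ 5) do
    1..repetitions
    |> Enum.map(fn _ ->
      {us, _result} = :timer.tc(fn -> Component.do_work_batch(n) end)
      us
    end)
    |> median()
  end

  property "doubling the workload does not more than double the runtime" do
    check all n <- integer(100..1_000), max_runs: 10 do
      small = measure_us(n)
      large = measure_us(2 * n)

      # Allow 2.5x rather than a strict 2x to absorb scheduler noise,
      # and guard against a zero-microsecond small measurement.
      assert large <= max(small, 1) * 2.5
    end
  end
end
```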
4. Review and Maintenance
Performance tests should be reviewed regularly to ensure they remain relevant and reliable. When a performance test fails, the first step should be to analyze the test itself for flaws before investigating the code under test.