Changelog
Version 2.0.0 (2020-03-19)
Drop Python 2 support. Use pyperf 1.7.1 if you still need Python 2.7 support.
Remove python_unicode metadata.
pyperf.perf_counter() is now deprecated: use time.perf_counter() directly.
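For code that called the deprecated function, the standard library clock can be used in the same way. The following is a minimal sketch; busy_work() and time_call() are hypothetical helpers for illustration, not part of pyperf:

```python
import time


def busy_work(n):
    """Hypothetical workload used only for illustration."""
    return sum(i * i for i in range(n))


def time_call(func, *args):
    """Return (result, elapsed seconds) measured with time.perf_counter()."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


result, elapsed = time_call(busy_work, 10_000)
print(f"busy_work took {elapsed:.6f} sec")
```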
Version 1.7.1 (2020-03-09)
Support Python 3.8: time.clock() no longer exists.
Version 1.7.0 (2019-12-17)
metadata: add python_compiler
Windows: inherit SystemDrive environment variable by default. Contribution by Steve Dower.
Fix tests on ARM and PPC: cpu_model_name metadata is no longer required on Linux.
tests: Do not allow test suite to execute without unittest2 on Python 2, otherwise many failures occur due to missing ‘assertRegex’. Contribution by John Vandenberg.
doc: Update old/dead links.
Travis CI: drop Python 3.4 support.
Version 1.6.1 (2019-05-21)
The project name changes to “pyperf” from “perf”, to avoid confusion with the Linux perf project which has a Python binding called “perf” as well.
Version 1.6.0 (2019-01-11)
Add teardown optional parameter to Runner.timeit and --teardown option to the perf timeit command. Patch by Alex Khomchenko.
Runner.timeit(stmt) can now be used to use the statement as the benchmark name.
Port system tune command to Python 2 (use lseek+read/write instead of pread/pwrite which aren’t available on Python 2). Patch by Stefan Talpalaru.
perf collect_metadata now also supports reading CPU frequencies on IBM Z.
Version 1.5.1 (2018-01-10)
Fix --track-memory option of the Runner.bench_command() command.
Version 1.5 (2018-01-09)
Fix --track-memory and --tracemalloc options. Add non regression tests.
Remove the --max-time option of Runner, it was ignored.
Project moved from https://github.com/haypo/perf to https://github.com/vstinner/perf
system command: if the system is not ready for benchmarking, system show now exits with return code 2, so shell scripts can call ‘python -m perf system show’ directly without grepping its output. Contributed by Boris Feld.
On Windows: enable high priority for processes when benchmarking (REALTIME_PRIORITY_CLASS). Contributed by Steve Dower.
Version 1.4 (2017-07-06)
Fix parse_cpu_list(): strip also NUL characters
Add examples to the README file. Contributed by Alex Willmer.
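To illustrate the parse_cpu_list() fix, here is a simplified sketch of a CPU list parser that drops NUL characters before parsing. It only assumes the usual "0-3,5" notation; the real pyperf function may differ in signature and behavior:

```python
def parse_cpu_list(text):
    """Parse a CPU list such as "0-3,5" into a sorted list of integers.

    Simplified sketch: NUL characters and surrounding whitespace are
    removed before the list is parsed.
    """
    text = text.replace("\x00", "").strip()
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Range such as "0-3": both bounds are included.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0-3,5\x00\n"))
```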
Version 1.3 (2017-05-29)
Add get_loops() and get_inner_loops() methods to Run and Benchmark classes
Documentation: add export_csv.py and plot.py examples
Rewrite warmup calibration for PyPy:
Use Q1, Q3 and stdev, rather than mean and checking if the first value is an outlier
Always use a sample of 10 values, rather than using a sample of a variable size starting with 3 values
Use lazy import for most imports of the largest modules to reduce the number of imported modules on ‘import perf’.
Fix handling of broken pipe error to prevent logging the error: “Exception ignored in: … BrokenPipeError: …”
collect_metadata gets more metadata on FreeBSD:
use os.getloadavg() if /proc/loadavg is not available (ex: FreeBSD)
use psutil.boot_time() if /proc/stat is not available (ex: FreeBSD) to get boot_time and uptime metadata
The Runner constructor now raises an exception if more than one instance is created.
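The load average fallback described for collect_metadata can be sketched with the standard library alone. The function name get_load_avg_1min() is hypothetical and the error handling is an assumption, not the actual pyperf code:

```python
import os


def get_load_avg_1min():
    """Return the 1 minute load average, or None if it is unavailable."""
    # Preferred source on Linux: first field of /proc/loadavg.
    try:
        with open("/proc/loadavg", encoding="ascii") as fp:
            return float(fp.readline().split()[0])
    except (OSError, ValueError, IndexError):
        pass
    # Fallback for platforms without /proc/loadavg (ex: FreeBSD).
    try:
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        # os.getloadavg() is missing on some platforms or can fail.
        return None


print(get_load_avg_1min())
```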
Version 1.2 (2017-04-10)
stats command: count the number of outliers
Rewrite the calibration code to support PyPy:
On PyPy, calibrate also the number of warmups
On PyPy, recalibrate the number of loops and warmups
Loop calibration now uses the number of warmups and values instead of 1 to compute warmup values
A worker process cannot calibrate the number of loops and compute values. These two operations now require two worker processes.
Command line interface (CLI): the --benchmark, --include-benchmark and --exclude-benchmark options can now be specified multiple times.
Rewrite dump command:
Writes one value per line
Now display also metadata of calibration runs
Enhance formatting of calibration runs
Display number of warmup, value and loop
Add new run metadata:
calibrate_loops, recalibrate_loops: number of loops of loop calibration/recalibration runs
calibrate_warmups, recalibrate_warmups: number of warmups of warmup calibration/recalibration runs
Version 1.1 (2017-03-27)
Add a new “perf command” command to measure the timing of a program
Runner.bench_command() now measures also the maximum RSS memory if available.
Fix Windows 32bit issue on Python 2.7, fix by yattom.
Runner.bench_func() now uses functools.partial() if the function has arguments. Calling partial() is now 1.07x faster (-6%) than calling func(*args).
Store memory values as integers, not float, when tracking memory usage (--track-memory and --tracemalloc options)
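The functools.partial() approach mentioned for Runner.bench_func() can be sketched as follows. time_func() is a hypothetical helper written for this illustration; it is not the pyperf implementation and makes no claim about the speedup quoted above:

```python
import functools
import time


def time_func(loops, func, *args):
    """Time `loops` calls of func(*args), binding the arguments once."""
    if args:
        # Bind the arguments up front so the timed loop makes a plain call.
        func = functools.partial(func, *args)
    start = time.perf_counter()
    for _ in range(loops):
        func()
    return time.perf_counter() - start


elapsed = time_func(1000, pow, 2, 10)
print(f"1000 calls: {elapsed:.6f} sec")
```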
Version 1.0 (2017-03-17)
Enhancements:
stats command now displays percentiles
hist command now also checks the benchmark stability by default
dump command now displays raw value of calibration runs.
Add Benchmark.percentile() method
Backward incompatible changes:
Remove the compare command to only keep the compare_to command which is better defined
Run warmup values must now be normalized per loop iteration.
Remove format() and __str__() methods from Benchmark. These methods were too opinionated.
Rename --name=NAME option to --benchmark=NAME
Remove perf.monotonic_clock() since it wasn’t monotonic on Python 2.7.
Remove is_significant() from the public API
Other changes:
check command now only complains if min/max is 50% smaller/larger than the mean, instead of 25%.
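One possible reading of the 50% rule of the check command is sketched below. The function check_min_max(), its threshold parameter and the warning texts are assumptions made for illustration; the exact comparison used by the tool is not stated here:

```python
import statistics


def check_min_max(values, threshold=0.50):
    """Return warnings if the minimum or maximum is far from the mean.

    Assumed rule: warn if min < mean * (1 - threshold) or
    max > mean * (1 + threshold).
    """
    mean = statistics.mean(values)
    warnings = []
    if min(values) < mean * (1.0 - threshold):
        warnings.append("minimum is much smaller than the mean")
    if max(values) > mean * (1.0 + threshold):
        warnings.append("maximum is much larger than the mean")
    return warnings


print(check_min_max([1.0, 1.1, 0.9]))
print(check_min_max([1.0, 1.0, 1.0, 5.0]))
```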
Version 0.9.6 (2017-03-15)
Major change:
Display Mean +- std dev instead of Median +- std dev
Enhancements:
Add a new Runner.bench_command() method to measure the execution time of a command.
Add mean(), median_abs_dev() and stdev() methods to Benchmark
check command: test also minimum and maximum compared to the mean
Major API change, rename “sample” to “value”:
Rename attributes and methods:
Benchmark.bench_sample_func() => Benchmark.bench_time_func().
Run.samples => Run.values
Benchmark.get_samples() => Benchmark.get_values()
get_nsample() => get_nvalue()
Benchmark.format_sample() => Benchmark.format_value()
Benchmark.format_samples() => Benchmark.format_values()
Rename Runner command line options:
--samples => --values
--debug-single-sample => --debug-single-value
Changes:
convert: Remove --remove-outliers option
check command now tests stdev/mean, instead of testing stdev/median
setup.py: statistics dependency is now installed using extras_require to support setuptools 18 and newer
Add setup.cfg to enable universal builds: same wheel package for Python 2 and Python 3
Add perf.VERSION constant: tuple of int
JSON version 6: write metadata common to all benchmarks (common to all runs of all benchmarks) at the root; rename ‘samples’ to ‘values’ in runs.
Version 0.9.5 (2017-03-06)
Add --python-names option to the Runner CLI
system show command now checks if the system is ready for benchmarking
Fix --compare-to option: the benchmark was run twice with the reference Python, instead of being run first with reference Python and then changed Python.
Runner now raises an exception if a benchmark name is not unique.
compare_to command now keeps the original order of benchmarks, only sort if --by-speed option is used.
Fix system command on macOS on non-existent /proc and /sys pseudo-files.
Fix system bugs on systems with more than 32 processors.
Version 0.9.4 (2017-03-01)
New features:
Add --compare-to option to the Runner CLI
compare_to command: Add --table option to render a table
Bugfixes:
Fix the abs_executable() function used to find the absolute path to the Python program. Don’t follow symbolic links to support correctly virtual environments.
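The difference between making a path absolute and resolving symbolic links can be sketched as follows. This is only an illustration of the idea behind the fix: the body below is an assumption and not the actual abs_executable() of the project.

```python
import os
import shutil
import sys


def abs_executable(program):
    """Return an absolute path to program without resolving symlinks."""
    if not os.path.dirname(program):
        # Bare command name such as "python3": look it up in PATH.
        found = shutil.which(program)
        if found:
            program = found
    # os.path.abspath() normalizes the path but, unlike
    # os.path.realpath(), leaves symbolic links untouched.
    return os.path.abspath(program)


print(abs_executable(sys.executable))
```

In a virtual environment the interpreter is commonly a symbolic link; keeping the link path lets the interpreter find the modules installed in that environment.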
Version 0.9.3 (2017-01-16)
Fix the Windows support.
system: Don’t try to read or write CPU frequency when the /sys/devices/system/cpu/cpu0/cpufreq/ directory doesn’t exist. For example, virtual machines don’t have this directory.
Fix a ResourceWarning in BenchmarkSuite.dump() for gzip files.
Version 0.9.2 (2016-12-15)
Issue #15: Added --no-locale command line option and locale environment variables are now inherited by default.
Add Runner.timeit() method.
Fix stats command: display again statistics on the whole benchmark suite.
Fix a ResourceWarning if interrupted: Runner now kills the worker process when interrupted.
compare and compare_to: add percent difference to faster/slower
Rewrite timeit internally: copy code from CPython 3.7 and adapt it to PyPy.
Version 0.9.1 (2016-11-18)
system tune now also sets the maximum sample rate of perf event.
system show command now also displays advice, not only system tune
system now detects when running on a laptop with the power cable unplugged.
system tune now handles errors when /dev/cpu/N/msr device is missing: log an error suggesting to load the msr kernel module
Fix a ResourceWarning in Runner._spawn_worker_suite(): wait until the worker completes.
Version 0.9.0 (2016-11-07)
Enhancements:
Runner doesn’t ignore worker stdout and stderr anymore. Regular print() now works as expected.
system command: Add a new --affinity command line option
check and system emit a warning if nohz_full is used with the intel_pstate driver.
collect_metadata: On CPUs not using the intel_pstate driver, don’t run the cpupower command anymore to check if Turbo Boost is enabled. This avoids spawning N processes in each worker process, where N is the number of CPUs used by the worker process. The system command can be used to tune Turbo Boost correctly, or just to check the state of Turbo Boost.
Changes:
system: tune stops the irqbalance service and sets the CPU affinity of interruptions (IRQ).
The --stdout internal option has been removed, replaced by a new --pipe option. Workers can now use stdout for regular messages.
get_dates() methods now return None rather than an empty tuple if runs don’t have the date metadata.
Version 0.8.3 (2016-11-03)
Enhancement:
New system tune command to tune the system for benchmarks: disable Turbo Boost, check isolated CPUs, set CPU frequency, set CPU scaling governor to “performance”, etc.
Support reading and writing JSON files compressed by gzip: use gzip if the filename ends with .gz
The detection of isolated CPUs now works also on Linux older than 4.2: /proc/cmdline is now parsed to read the isolcpus= option if /sys/devices/system/cpu/isolated sysfs doesn’t exist.
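The filename-based gzip selection can be sketched with the standard library. The helper names open_json(), dump_json() and load_json() are hypothetical and only illustrate the ".gz" rule described above:

```python
import gzip
import json


def open_json(filename, mode="r"):
    """Open filename as UTF-8 text, through gzip if it ends with ".gz"."""
    if filename.endswith(".gz"):
        # gzip.open() needs an explicit "t" to work in text mode.
        return gzip.open(filename, mode + "t", encoding="utf-8")
    return open(filename, mode, encoding="utf-8")


def dump_json(data, filename):
    """Write data as JSON, compressed or not depending on the filename."""
    with open_json(filename, "w") as fp:
        json.dump(data, fp)


def load_json(filename):
    """Read JSON data, compressed or not depending on the filename."""
    with open_json(filename) as fp:
        return json.load(fp)
```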
Backward incompatible changes:
JSON file produced by perf 0.8.3 cannot be read by perf 0.8.2 anymore.
Remove the Metadata class: values of get_metadata() are directly metadata values.
Drop support for JSON produced with perf 0.7.3 and older. Use perf 0.8.2 to convert old JSON to new JSON.
Optimizations:
Loading a large JSON file is now 10x faster (5 sec => 500 ms).
Optimize Benchmark.add_run(): don’t recompute common metadata at each call, but update existing common metadata.
Don’t store dates of metadata as datetime.datetime but strings to optimize Benchmark.load()
Version 0.8.2 (2016-10-19)
Fix formatting of benchmark which only contains calibration runs.
Version 0.8.1 (2016-10-19)
Rename metadata command to collect_metadata
Add new commands: metadata (display metadata of benchmarks files) and check (check if benchmarks seem stable)
timeit: add --duplicate option to reduce the overhead of the outer loop.
BenchmarkSuite constructor now requires a non-empty sequence of Benchmark objects.
Store date in metadata with microsecond resolution.
collect_metadata: add --output command line option.
Bugfix: don’t follow symbolic links when getting the absolute path to a Python executable. The venv module requires to use the symlink to get the modules installed in a virtual environment.
Version 0.8.0 (2016-10-14)
The API was redesigned to support running multiple benchmarks with a single Runner object.
Enhancements:
--loops command line argument now accepts x^y syntax. For example, --loops=2^8 uses 256 iterations
Calibration is now done in a dedicated process to avoid side effects on the first process. This change is important if Python has a JIT compiler, to get more reliable timings on the first worker computing samples.
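The x^y syntax accepted by --loops could be parsed along these lines. parse_loops() is a hypothetical helper for illustration, and the rejection of values below 1 is an assumption:

```python
def parse_loops(text):
    """Parse a loop count given as "N" or as "x^y" (x to the power y)."""
    if "^" in text:
        base, exponent = text.split("^", 1)
        value = int(base) ** int(exponent)
    else:
        value = int(text)
    if value < 1:
        raise ValueError("the number of loops must be >= 1")
    return value


print(parse_loops("2^8"))
```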
Incompatible API changes:
Benchmark constructor now requires a non-empty sequence of Run objects.
A benchmark must now have a name: all runs must have a name metadata.
Remove name argument from Runner constructor and add name parameter to Benchmark.bench_func() and Benchmark.bench_sample_func()
perf.text_runner.TextRunner becomes simply perf.Runner. Remove the perf.text_runner module.
TextRunner.program_args attribute becomes a parameter of Runner constructor. program_args must no longer start with sys.executable which is automatically added, since the executable can now be overridden by the --python command line option.
The TextRunner.prepare_subprocess_args attribute becomes a new add_cmdline_args parameter of Runner constructor which is called with different arguments than the old prepare_subprocess_args callback.
Changes:
Add show_name optional parameter to Runner. The runner now displays the benchmark name by default.
The calibration is now done after starting tracing memory
Run constructor now accepts an empty list of samples. Moreover, it also accepts int and long number types for warmup sample values, not only float.
Add a new private --worker-task command line option to only execute a specific benchmark function by its identifier.
Runner now supports calling more than one benchmark function using --worker-task internally.
Benchmark.dump() and BenchmarkSuite.dump() now fail by default if the file already exists. Set the new replace parameter to true to allow replacing an existing file.
Version 0.7.12 (2016-09-30)
Add --python command line option
timeit: add --name, --inner-loops and --compare-to options
TextRunner doesn’t set CPU affinity of the main process, only on worker processes. It may help a little bit when using NOHZ_FULL.
metadata: add boot_time and uptime on Linux
metadata: add idle driver to cpu_config
Version 0.7.11 (2016-09-19)
Fix metadata when NOHZ is not used: when /sys/devices/system/cpu/nohz_full contains “ (null)\n”
Version 0.7.10 (2016-09-17)
Fix metadata when there is no isolated CPU
Fix collecting metadata when /sys/devices/system/cpu/nohz_full doesn’t exist
Version 0.7.9 (2016-09-17)
Add Benchmark.get_unit() method
Add BenchmarkSuite.get_metadata() method
metadata: add nohz_full and isolated to cpu_config
add --affinity option to the metadata command
convert: fix --remove-all-metadata, keep the unit
metadata: fix regex to get the Mercurial revision for python_version, support also locally modified source code (revision ending with “+”)
Version 0.7.8 (2016-09-10)
Worker child processes are now run in a fresh environment: environment variables are removed, to enhance reproducibility.
Add --inherit-environ command line argument.
metadata: add python_cflags, fix python_version for PyPy and add also the Mercurial version into python_version (if available)
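Running a worker in a fresh environment can be sketched with subprocess. The run_worker() helper and the KEEP allow-list below are assumptions chosen for illustration; which variables must be preserved depends on the platform, and this is not the list used by the project:

```python
import os
import subprocess
import sys

# Illustrative allow-list (an assumption, platform-dependent).
KEEP = ("PATH", "HOME", "TEMP", "LD_LIBRARY_PATH", "SYSTEMROOT", "SYSTEMDRIVE")


def run_worker(code, inherit=()):
    """Run Python code in a child process with a reduced environment."""
    names = KEEP + tuple(inherit)
    # Copy only the selected variables; everything else is dropped.
    env = {name: os.environ[name] for name in names if name in os.environ}
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return proc.stdout


os.environ["DEMO_VAR"] = "1"
check = "import os; print('DEMO_VAR' in os.environ)"
print(run_worker(check).strip())
print(run_worker(check, inherit=("DEMO_VAR",)).strip())
```

The inherit parameter plays a role similar to the --inherit-environ option: variables are removed unless they are explicitly listed.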
Version 0.7.7 (2016-09-07)
Reintroduce TextRunner._spawn_worker_suite() as a temporary workaround to fix the pybench benchmark of the performance module.
Version 0.7.6 (2016-09-02)
Tracking memory usage now works correctly on Linux and Windows. The calibration is now done in the first worker process.
--tracemalloc and --track-memory now use the memory peak as the unique sample for the run.
Rewrite code to track memory usage on Windows. Add mem_peak_pagefile_usage metadata. The win32api module is no longer needed, the code now uses the ctypes module.
convert: add --remove-all-metadata and --update-metadata commands
Add unit metadata: byte, integer or second.
Run samples can now be integer (not only float).
Don’t round samples to 1 nanosecond anymore: with a large number of loops (ex: 2^24), rounding reduces the accuracy.
The benchmark calibration is now done by the first worker process
Version 0.7.5 (2016-09-01)
Add Benchmark.update_metadata() method
Warmup samples can now be zero. TextRunner now raises an error if a sample function returns zero for a sample, except for calibration and warmup samples.
Version 0.7.4 (2016-08-18)
Support PyPy
metadata: add mem_max_rss and python_hash_seed
Add perf.python_implementation() and perf.python_has_jit() functions
In workers, calibration samples are now stored as warmup samples.
With a JIT (PyPy), the calibration is now done in each worker. The warmup step can compute more warmup samples if a raw sample is shorter than the minimum time.
Warmups of Run objects are now lists of (loops, raw_sample) rather than lists of samples. This change requires a change in the JSON format.
Version 0.7.3 (2016-08-17)
add a new slowest command
convert: add --extract-metadata=NAME
add --tracemalloc option: use the tracemalloc module to track Python memory allocation and get the peak of memory usage in metadata (tracemalloc_peak)
add --track-memory option: run a thread reading the memory usage every millisecond and store the peak as mem_peak metadata
compare_to: add --group-by-speed (-G) and --min-speed options
metadata: add runnable_threads
Fix issues on ppc64le Power8
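Reading a memory peak with the tracemalloc module, as the --tracemalloc option is described to do, can be sketched like this. measure_peak() is a hypothetical helper; how the tool stores the value in the tracemalloc_peak metadata is not shown:

```python
import tracemalloc


def measure_peak(func, *args):
    """Return (result, peak traced memory in bytes) for func(*args)."""
    tracemalloc.start()
    try:
        result = func(*args)
        # get_traced_memory() returns (current, peak) in bytes.
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


result, peak = measure_peak(lambda n: [0] * n, 100_000)
print(f"peak: {peak} bytes")
```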
Version 0.7.2 (2016-07-21)
Add start/end dates and duration to the stats command
Fix the program name: pyperf, not pybench!
Fix the -b command line option of show/stats/… commands
Fix metadata: load_avg_1min=0.0 is valid!
Version 0.7.1 (2016-07-18)
Fix the --append command line option
Version 0.7 (2016-07-18)
Add a new pybench program, similar to python3 -m perf
Most perf CLI commands now support multiple files and support benchmark suites.
Add a new dump command to the perf CLI and a --dump option to the TextRunner CLI
convert command: add --indent and --remove-warmups options
replace --json option with -o/--output
New metadata:
cpu_config
cpu_freq
cpu_temp
load_avg_1min
Changes:
New add_runs() function.
Once again, rewrite Run and Benchmark API. Benchmark name is now optional.
New Run class: it now stores normalized samples rather than raw samples
Metadata are now stored in Run, no longer in Benchmark. Benchmark.get_metadata() returns metadata common to all runs.
Metadata become typed (can have a different type than string), the new Metadata class formats them.
Version 0.6 (2016-07-06)
Major change: perf now supports benchmark suites. A benchmark suite is made of multiple benchmarks. perf commands now accept benchmark suites as well.
New features:
New convert command
Add new command line options to TextRunner:
--fast, --rigorous
--hist, --stats
--json-append
--quiet
Changes:
Remove --max-time option of TextRunner
Replace --raw option with --worker
Replace --json with --stdout
Replace --json-file with --json
New perf convert command to convert or modify a benchmark suite
Remove perf hist_scipy command, replaced with an example in the doc
Add back “Mean +- Std dev” to the stats command
Add get_loops() method to Benchmark
Replace python3 -m perf.timeit (with dot) CLI with -m perf timeit (without dot)
Add perf.BenchmarkSuite class
name is now mandatory: it must be a non-empty string in Benchmark and TextRunner.
A single JSON file can now contain multiple benchmarks
Add a dependency to the six module
Benchmark.add_run() now raises an exception if a sample is zero.
Benchmark.name becomes a property and is now stored in metadata
TextRunner now uses powers of 2, rather than powers of 10, to calibrate the number of loops
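Calibrating the number of loops with powers of 2 can be sketched as a doubling loop. The function calibrate_loops(), the min_time default of 0.1 second and the max_loops cap are assumptions for illustration, not the TextRunner implementation:

```python
import time


def calibrate_loops(func, min_time=0.1, timer=time.perf_counter,
                    max_loops=2 ** 32):
    """Double the number of loops until one run takes at least min_time."""
    loops = 1
    while True:
        start = timer()
        for _ in range(loops):
            func()
        elapsed = timer() - start
        if elapsed >= min_time or loops >= max_loops:
            return loops
        # Try the next power of 2.
        loops *= 2


loops = calibrate_loops(lambda: None, min_time=0.001)
print(f"calibrated to {loops} loops")
```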
Version 0.5 (2016-06-29)
Changes:
The hist command now accepts multiple files
hist and hist_scipy commands got a new --bins option
Replace mean with median
Add perf.Benchmark.median() method, remove Benchmark.mean() method
Benchmark.get_metadata() method removed: use directly the perf.Benchmark.metadata attribute
Add timer metadata.
python_version now also contains the architecture (32 or 64 bits).
Version 0.4 (2016-06-15)
New features:
New hist and hist_scipy commands: display a histogram (text or graphical mode)
New stats command: display statistics on a benchmark result
New --affinity=CPU_LIST command line option
Emit a warning or an error in English if the standard deviation is larger than 10% and/or the shortest sample is shorter than 1 ms
Emit a warning or an error if the shortest sample took less than 1 ms
Add perf_version, duration metadata. Moreover, the date metadata is now displayed.
API:
The API deeply changed to minimize duplications of data and make the JSON files more compact
Changes:
The command line interface also changed. For example, perf.metadata command becomes perf metadata.
On Python 2, psutil optional dependency is now used for CPU affinity. It ensures that CPU affinity is set for loop calibration too.
On Python 2, add dependency to the backported statistics module
perf.mean() and perf.stdev() functions have been removed: use the statistics module (which is available on Python 2.7 and Python 3)
New optional dependency on boltons (boltons.statsutils) to compute even more statistics in the stats and hist_scipy commands
Version 0.3 (2016-06-10)
Add compare and compare_to commands to the -m perf CLI
TextRunner is now able to spawn child processes, parse command arguments and more features
If TextRunner detects isolated CPUs, it sets automatically the CPU affinity to these isolated CPUs
Add --json-file command line option
Add TextRunner.bench_sample_func() method
Add examples of the API to the documentation. Split also the documentation into subpages.
Add metadata cpu_affinity
Add perf.is_significant() function
Move metadata from Benchmark to RunResult
Rename the Results class to Benchmark
Add inner_loops attribute to TextRunner, used for microbenchmarks when an instruction is manually duplicated multiple times
Version 0.2 (2016-06-07)
use JSON to exchange results between processes
new python3 -m perf CLI
new TextRunner class
huge enhancement of the timeit module
timeit has a better output format in verbose mode and now also supports a -vv (very verbose) mode. Minimum and maximum are no longer shown in verbose mode, only in very verbose mode.
metadata: add python_implementation and aslr
Version 0.1 (2016-06-02)
First public release