6. Callgrind: a call-graph generating cache and branch prediction profiler
To use this tool, you must specify --tool=callgrind on the Valgrind command line.
6.1. Overview
Callgrind is a profiling tool that records the call history among functions in a program’s run as a call-graph. By default,
the collected data consists of the number of instructions executed, their relationship to source lines, the caller/callee
relationship between functions, and the number of such calls. Optionally, cache simulation and/or branch prediction
(similar to Cachegrind) can produce further information about the runtime behavior of an application.
The profile data is written out to a file at program termination. For presentation of the data and interactive control of
the profiling, two command-line tools are provided:
callgrind_annotate
This command reads in the profile data and prints a sorted list of functions, optionally with source annotation.
For graphical visualization of the data, try KCachegrind, which is a KDE/Qt-based GUI that makes it easy to navigate
the large amount of data that Callgrind produces.
callgrind_control
This command enables you to interactively observe and control the status of a program currently running under
Callgrind’s control, without stopping the program. You can get statistics as well as the current stack trace, and you
can request zeroing of counters or dumping of profile data.
6.1.1. Functionality
Cachegrind collects flat profile data: event counts (data reads, cache misses, etc.) are attributed directly to the function
they occurred in. This cost attribution mechanism is called self or exclusive attribution.
Callgrind extends this functionality by propagating costs across function call boundaries. If function foo calls bar,
the costs from bar are added into foo’s costs. When applied to the program as a whole, this builds up a picture of
so-called inclusive costs, that is, where the cost of each function includes the costs of all functions it called,
directly or indirectly.
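The difference between self and inclusive cost can be illustrated with a small model. The sketch below is not how Callgrind is implemented; it assumes a hypothetical call tree (main calls foo and baz, foo calls bar) with invented self costs, and computes each function's inclusive cost as its own self cost plus the inclusive cost of everything it calls.

```python
# Invented self (exclusive) costs, e.g. instruction counts; not real Callgrind output.
SELF_COST = {"main": 10, "foo": 100, "bar": 300, "baz": 50}

# Hypothetical call tree: main calls foo and baz; foo calls bar.
# Assumes no recursion and that every function has exactly one caller.
CALLEES = {"main": ["foo", "baz"], "foo": ["bar"], "bar": [], "baz": []}


def inclusive_cost(fn: str) -> int:
    """Self cost of fn plus the inclusive cost of each callee, so costs of
    direct and indirect callees are both counted."""
    return SELF_COST[fn] + sum(inclusive_cost(callee) for callee in CALLEES[fn])


if __name__ == "__main__":
    for fn in SELF_COST:
        print(f"{fn:5} self={SELF_COST[fn]:4} inclusive={inclusive_cost(fn):4}")
```

Here foo's inclusive cost is 100 + 300 = 400, and main's is 10 + 400 + 50 = 460. The model only works because each function has a single caller; when a function is called from several places, its cost has to be split per caller/callee arc instead of being added in full to every caller.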
As an example, the inclusive cost of main should be almost 100 percent of the total program cost. Because of costs
arising before main is run, such as initialization of the run-time linker and construction of global C++ objects, the
inclusive cost of main is not exactly 100 percent of the total program cost.
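A worked version of that arithmetic, as a minimal sketch: the counts below are invented, and the only cost outside main is assumed to be what arises before main runs.

```python
def main_share_percent(pre_main_cost: int, main_inclusive_cost: int) -> float:
    """Percentage of the total program cost covered by main's inclusive cost,
    assuming the only cost outside main is incurred before main runs."""
    total = pre_main_cost + main_inclusive_cost
    return 100 * main_inclusive_cost / total


# Invented instruction counts, for illustration only.
PRE_MAIN = 120_000          # e.g. run-time linker initialization, global C++ constructors
MAIN_INCLUSIVE = 9_880_000  # main plus everything it calls, directly or indirectly

if __name__ == "__main__":
    print(f"main: {main_share_percent(PRE_MAIN, MAIN_INCLUSIVE):.1f}% of total cost")
```

With these numbers the total is 10,000,000 and main accounts for 98.8 percent of it: close to, but not exactly, 100 percent.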
Together with the call graph, this allows you to find the specific call chains starting from main in which the majority
of the program’s costs occur. Caller/callee cost attribution is also useful for profiling functions that are called from
multiple call sites, and for cases where optimization opportunities depend on changing code in the callers, in
particular by reducing the call count.
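To see why per-caller attribution matters, consider a hypothetical function lookup called from two call sites. A flat profile would only report lookup's total cost; splitting that cost by caller/callee arc shows which caller is responsible. The names, call counts and costs below are invented, and the data layout is an assumption for illustration, not Callgrind's file format.

```python
# Hypothetical per-arc data for a function "lookup" called from two call sites:
# caller -> (number of calls, inclusive cost of lookup along that arc).
LOOKUP_ARCS = {
    "parse": (1_000, 40_000),
    "render": (10, 400),
}


def caller_shares(arcs):
    """Return each caller's share (in percent) of the callee's total cost."""
    total = sum(cost for _calls, cost in arcs.values())
    return {caller: 100 * cost / total for caller, (_calls, cost) in arcs.items()}


if __name__ == "__main__":
    for caller, share in caller_shares(LOOKUP_ARCS).items():
        calls, cost = LOOKUP_ARCS[caller]
        print(f"{caller} -> lookup: {calls} calls, cost {cost} ({share:.1f}%)")
```

Almost all of lookup's 40,400 cost units come through the 1,000 calls from parse. If the per-call cost stays around 40, halving the number of calls made in parse (for instance, by reusing earlier results in the caller) would save roughly 20,000 units, an opportunity that lies in the caller rather than in lookup itself.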
Callgrind’s cache simulation is based on that of Cachegrind. Read the documentation for “Cachegrind: a cache and
branch-prediction profiler” first. The material below describes the features supported in addition to Cachegrind’s
features.
Callgrind’s ability to detect function calls and returns depends on the instruction set of the platform it is run on.
It works best on x86 and amd64, and unfortunately currently does not work so well on PowerPC, ARM, Thumb or MIPS