145
|
1 @c Copyright (C) 2010-2020 Free Software Foundation, Inc.
|
68
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
2 @c This is part of the GCC manual.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
3 @c For copying conditions, see the file gcc.texi.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
4 @c Contributed by Jan Hubicka <jh@suse.cz> and
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
5 @c Diego Novillo <dnovillo@google.com>
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
6
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
7 @node LTO
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
8 @chapter Link Time Optimization
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
9 @cindex lto
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
10 @cindex whopr
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
11 @cindex wpa
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
12 @cindex ltrans
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
13
|
111
|
14 Link Time Optimization (LTO) gives GCC the capability of
|
|
15 dumping its internal representation (GIMPLE) to disk,
|
|
16 so that all the different compilation units that make up
|
|
17 a single executable can be optimized as a single module.
|
|
18 This expands the scope of inter-procedural optimizations
|
|
19 to encompass the whole program (or, rather, everything
|
|
20 that is visible at link time).
|
|
21
|
|
22 @menu
|
|
23 * LTO Overview:: Overview of LTO.
|
|
24 * LTO object file layout:: LTO file sections in ELF.
|
|
25 * IPA:: Using summary information in IPA passes.
|
|
26 * WHOPR:: Whole program assumptions,
|
|
27 linker plugin and symbol visibilities.
|
|
28 * Internal flags:: Internal flags controlling @code{lto1}.
|
|
29 @end menu
|
|
30
|
|
31 @node LTO Overview
|
68
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
32 @section Design Overview
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
33
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
34 Link time optimization is implemented as a GCC front end for a
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
35 bytecode representation of GIMPLE that is emitted in special sections
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
36 of @code{.o} files. Currently, LTO support is enabled in most
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
37 ELF-based systems, as well as darwin, cygwin and mingw systems.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
38
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
39 Since GIMPLE bytecode is saved alongside final object code, object
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
40 files generated with LTO support are larger than regular object files.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
41 This ``fat'' object format makes it easy to integrate LTO into
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
42 existing build systems, as one can, for instance, produce archives of
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
43 the files. Additionally, one might be able to ship one set of fat
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
44 objects which could be used both for development and the production of
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
45 optimized builds. A, perhaps surprising, side effect of this feature
|
111
|
46 is that any mistake in the toolchain leads to LTO information not
|
68
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
47 being used (e.g.@: an older @code{libtool} calling @code{ld} directly).
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
48 This is both an advantage, as the system is more robust, and a
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
49 disadvantage, as the user is not informed that the optimization has
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
50 been disabled.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
51
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
52 The current implementation only produces ``fat'' objects, effectively
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
53 doubling compilation time and increasing file sizes up to 5x the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
54 original size. This hides the problem that some tools, such as
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
55 @code{ar} and @code{nm}, need to understand symbol tables of LTO
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
56 sections. These tools were extended to use the plugin infrastructure,
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
57 and with these problems solved, GCC will also support ``slim'' objects
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
58 consisting of the intermediate code alone.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
59
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
60 At the highest level, LTO splits the compiler in two. The first half
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
61 (the ``writer'') produces a streaming representation of all the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
62 internal data structures needed to optimize and generate code. This
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
63 includes declarations, types, the callgraph and the GIMPLE representation
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
64 of function bodies.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
65
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
66 When @option{-flto} is given during compilation of a source file, the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
67 pass manager executes all the passes in @code{all_lto_gen_passes}.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
68 Currently, this phase is composed of two IPA passes:
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
69
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
70 @itemize @bullet
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
71 @item @code{pass_ipa_lto_gimple_out}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
72 This pass executes the function @code{lto_output} in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
73 @file{lto-streamer-out.c}, which traverses the call graph encoding
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
74 every reachable declaration, type and function. This generates a
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
75 memory representation of all the file sections described below.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
76
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
77 @item @code{pass_ipa_lto_finish_out}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
78 This pass executes the function @code{produce_asm_for_decls} in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
79 @file{lto-streamer-out.c}, which takes the memory image built in the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
80 previous pass and encodes it in the corresponding ELF file sections.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
81 @end itemize
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
82
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
83 The second half of LTO support is the ``reader''. This is implemented
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
84 as the GCC front end @file{lto1} in @file{lto/lto.c}. When
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
85 @file{collect2} detects a link set of @code{.o}/@code{.a} files with
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
86 LTO information and the @option{-flto} is enabled, it invokes
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
87 @file{lto1} which reads the set of files and aggregates them into a
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
88 single translation unit for optimization. The main entry point for
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
89 the reader is @file{lto/lto.c}:@code{lto_main}.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
90
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
91 @subsection LTO modes of operation
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
92
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
93 One of the main goals of the GCC link-time infrastructure was to allow
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
94 effective compilation of large programs. For this reason GCC implements two
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
95 link-time compilation modes.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
96
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
97 @enumerate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
98 @item @emph{LTO mode}, in which the whole program is read into the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
99 compiler at link-time and optimized in a similar way as if it
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
100 were a single source-level compilation unit.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
101
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
102 @item @emph{WHOPR or partitioned mode}, designed to utilize multiple
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
103 CPUs and/or a distributed compilation environment to quickly link
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
104 large applications. WHOPR stands for WHOle Program optimizeR (not to
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
105 be confused with the semantics of @option{-fwhole-program}). It
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
106 partitions the aggregated callgraph from many different @code{.o}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
107 files and distributes the compilation of the sub-graphs to different
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
108 CPUs.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
109
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
110 Note that distributed compilation is not implemented yet, but since
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
111 the parallelism is facilitated via generating a @code{Makefile}, it
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
112 would be easy to implement.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
113 @end enumerate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
114
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
115 WHOPR splits LTO into three main stages:
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
116 @enumerate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
117 @item Local generation (LGEN)
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
118 This stage executes in parallel. Every file in the program is compiled
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
119 into the intermediate language and packaged together with the local
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
120 call-graph and summary information. This stage is the same for both
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
121 the LTO and WHOPR compilation mode.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
122
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
123 @item Whole Program Analysis (WPA)
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
124 WPA is performed sequentially. The global call-graph is generated, and
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
125 a global analysis procedure makes transformation decisions. The global
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
126 call-graph is partitioned to facilitate parallel optimization during
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
127 phase 3. The results of the WPA stage are stored into new object files
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
128 which contain the partitions of program expressed in the intermediate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
129 language and the optimization decisions.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
130
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
131 @item Local transformations (LTRANS)
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
132 This stage executes in parallel. All the decisions made during phase 2
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
133 are implemented locally in each partitioned object file, and the final
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
134 object code is generated. Optimizations which cannot be decided
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
135 efficiently during the phase 2 may be performed on the local
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
136 call-graph partitions.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
137 @end enumerate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
138
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
139 WHOPR can be seen as an extension of the usual LTO mode of
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
140 compilation. In LTO, WPA and LTRANS are executed within a single
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
141 execution of the compiler, after the whole program has been read into
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
142 memory.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
143
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
144 When compiling in WHOPR mode, the callgraph is partitioned during
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
145 the WPA stage. The whole program is split into a given number of
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
146 partitions of roughly the same size. The compiler tries to
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
147 minimize the number of references which cross partition boundaries.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
148 The main advantage of WHOPR is to allow the parallel execution of
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
149 LTRANS stages, which are the most time-consuming part of the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
150 compilation process. Additionally, it avoids the need to load the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
151 whole program into memory.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
152
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
153
|
111
|
154 @node LTO object file layout
|
68
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
155 @section LTO file sections
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
156
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
157 LTO information is stored in several ELF sections inside object files.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
158 Data structures and enum codes for sections are defined in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
159 @file{lto-streamer.h}.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
160
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
161 These sections are emitted from @file{lto-streamer-out.c} and mapped
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
162 in all at once from @file{lto/lto.c}:@code{lto_file_read}. The
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
163 individual functions dealing with the reading/writing of each section
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
164 are described below.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
165
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
166 @itemize @bullet
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
167 @item Command line options (@code{.gnu.lto_.opts})
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
168
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
169 This section contains the command line options used to generate the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
170 object files. This is used at link time to determine the optimization
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
171 level and other settings when they are not explicitly specified at the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
172 linker command line.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
173
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
174 Currently, GCC does not support combining LTO object files compiled
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
175 with different set of the command line options into a single binary.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
176 At link time, the options given on the command line and the options
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
177 saved on all the files in a link-time set are applied globally. No
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
178 attempt is made at validating the combination of flags (other than the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
179 usual validation done by option processing). This is implemented in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
180 @file{lto/lto.c}:@code{lto_read_all_file_options}.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
181
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
182
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
183 @item Symbol table (@code{.gnu.lto_.symtab})
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
184
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
185 This table replaces the ELF symbol table for functions and variables
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
186 represented in the LTO IL. Symbols used and exported by the optimized
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
187 assembly code of ``fat'' objects might not match the ones used and
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
188 exported by the intermediate code. This table is necessary because
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
189 the intermediate code is less optimized and thus requires a separate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
190 symbol table.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
191
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
192 Additionally, the binary code in the ``fat'' object will lack a call
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
193 to a function, since the call was optimized out at compilation time
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
194 after the intermediate language was streamed out. In some special
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
195 cases, the same optimization may not happen during link-time
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
196 optimization. This would lead to an undefined symbol if only one
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
197 symbol table was used.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
198
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
199 The symbol table is emitted in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
200 @file{lto-streamer-out.c}:@code{produce_symtab}.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
201
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
202
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
203 @item Global declarations and types (@code{.gnu.lto_.decls})
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
204
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
205 This section contains an intermediate language dump of all
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
206 declarations and types required to represent the callgraph, static
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
207 variables and top-level debug info.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
208
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
209 The contents of this section are emitted in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
210 @file{lto-streamer-out.c}:@code{produce_asm_for_decls}. Types and
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
211 symbols are emitted in a topological order that preserves the sharing
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
212 of pointers when the file is read back in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
213 (@file{lto.c}:@code{read_cgraph_and_symbols}).
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
214
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
215
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
216 @item The callgraph (@code{.gnu.lto_.cgraph})
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
217
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
218 This section contains the basic data structure used by the GCC
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
219 inter-procedural optimization infrastructure. This section stores an
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
220 annotated multi-graph which represents the functions and call sites as
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
221 well as the variables, aliases and top-level @code{asm} statements.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
222
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
223 This section is emitted in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
224 @file{lto-streamer-out.c}:@code{output_cgraph} and read in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
225 @file{lto-cgraph.c}:@code{input_cgraph}.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
226
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
227
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
228 @item IPA references (@code{.gnu.lto_.refs})
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
229
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
230 This section contains references between function and static
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
231 variables. It is emitted by @file{lto-cgraph.c}:@code{output_refs}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
232 and read by @file{lto-cgraph.c}:@code{input_refs}.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
233
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
234
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
235 @item Function bodies (@code{.gnu.lto_.function_body.<name>})
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
236
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
237 This section contains function bodies in the intermediate language
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
238 representation. Every function body is in a separate section to allow
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
239 copying of the section independently to different object files or
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
240 reading the function on demand.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
241
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
242 Functions are emitted in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
243 @file{lto-streamer-out.c}:@code{output_function} and read in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
244 @file{lto-streamer-in.c}:@code{input_function}.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
245
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
246
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
247 @item Static variable initializers (@code{.gnu.lto_.vars})
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
248
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
249 This section contains all the symbols in the global variable pool. It
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
250 is emitted by @file{lto-cgraph.c}:@code{output_varpool} and read in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
251 @file{lto-cgraph.c}:@code{input_cgraph}.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
252
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
253 @item Summaries and optimization summaries used by IPA passes
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
254 (@code{.gnu.lto_.<xxx>}, where @code{<xxx>} is one of @code{jmpfuncs},
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
255 @code{pureconst} or @code{reference})
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
256
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
257 These sections are used by IPA passes that need to emit summary
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
258 information during LTO generation to be read and aggregated at
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
259 link time. Each pass is responsible for implementing two pass manager
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
260 hooks: one for writing the summary and another for reading it in. The
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
261 format of these sections is entirely up to each individual pass. The
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
262 only requirement is that the writer and reader hooks agree on the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
263 format.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
264 @end itemize
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
265
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
266
|
111
|
267 @node IPA
|
68
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
268 @section Using summary information in IPA passes
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
269
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
270 Programs are represented internally as a @emph{callgraph} (a
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
271 multi-graph where nodes are functions and edges are call sites)
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
272 and a @emph{varpool} (a list of static and external variables in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
273 the program).
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
274
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
275 The inter-procedural optimization is organized as a sequence of
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
276 individual passes, which operate on the callgraph and the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
277 varpool. To make the implementation of WHOPR possible, every
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
278 inter-procedural optimization pass is split into several stages
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
279 that are executed at different times during WHOPR compilation:
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
280
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
281 @itemize @bullet
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
282 @item LGEN time
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
283 @enumerate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
284 @item @emph{Generate summary} (@code{generate_summary} in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
285 @code{struct ipa_opt_pass_d}). This stage analyzes every function
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
286 body and variable initializer is examined and stores relevant
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
287 information into a pass-specific data structure.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
288
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
289 @item @emph{Write summary} (@code{write_summary} in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
290 @code{struct ipa_opt_pass_d}). This stage writes all the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
291 pass-specific information generated by @code{generate_summary}.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
292 Summaries go into their own @code{LTO_section_*} sections that
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
293 have to be declared in @file{lto-streamer.h}:@code{enum
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
294 lto_section_type}. A new section is created by calling
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
295 @code{create_output_block} and data can be written using the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
296 @code{lto_output_*} routines.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
297 @end enumerate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
298
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
299 @item WPA time
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
300 @enumerate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
301 @item @emph{Read summary} (@code{read_summary} in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
302 @code{struct ipa_opt_pass_d}). This stage reads all the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
303 pass-specific information in exactly the same order that it was
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
304 written by @code{write_summary}.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
305
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
306 @item @emph{Execute} (@code{execute} in @code{struct
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
307 opt_pass}). This performs inter-procedural propagation. This
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
308 must be done without actual access to the individual function
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
309 bodies or variable initializers. Typically, this results in a
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
310 transitive closure operation over the summary information of all
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
311 the nodes in the callgraph.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
312
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
313 @item @emph{Write optimization summary}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
314 (@code{write_optimization_summary} in @code{struct
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
315 ipa_opt_pass_d}). This writes the result of the inter-procedural
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
316 propagation into the object file. This can use the same data
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
317 structures and helper routines used in @code{write_summary}.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
318 @end enumerate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
319
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
320 @item LTRANS time
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
321 @enumerate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
322 @item @emph{Read optimization summary}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
323 (@code{read_optimization_summary} in @code{struct
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
324 ipa_opt_pass_d}). The counterpart to
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
325 @code{write_optimization_summary}. This reads the interprocedural
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
326 optimization decisions in exactly the same format emitted by
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
327 @code{write_optimization_summary}.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
328
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
329 @item @emph{Transform} (@code{function_transform} and
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
330 @code{variable_transform} in @code{struct ipa_opt_pass_d}).
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
331 The actual function bodies and variable initializers are updated
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
332 based on the information passed down from the @emph{Execute} stage.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
333 @end enumerate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
334 @end itemize
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
335
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
336 The implementation of the inter-procedural passes are shared
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
337 between LTO, WHOPR and classic non-LTO compilation.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
338
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
339 @itemize
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
340 @item During the traditional file-by-file mode every pass executes its
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
341 own @emph{Generate summary}, @emph{Execute}, and @emph{Transform}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
342 stages within the single execution context of the compiler.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
343
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
344 @item In LTO compilation mode, every pass uses @emph{Generate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
345 summary} and @emph{Write summary} stages at compilation time,
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
346 while the @emph{Read summary}, @emph{Execute}, and
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
347 @emph{Transform} stages are executed at link time.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
348
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
349 @item In WHOPR mode all stages are used.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
350 @end itemize
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
351
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
352 To simplify development, the GCC pass manager differentiates
|
145
|
353 between normal inter-procedural passes (@pxref{Regular IPA passes}),
|
|
354 small inter-procedural passes (@pxref{Small IPA passes})
|
|
355 and late inter-procedural passes (@pxref{Late IPA passes}).
|
|
356 A small or late IPA pass (@code{SIMPLE_IPA_PASS}) does
|
|
357 everything at once and thus cannot be executed during WPA in
|
68
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
358 WHOPR mode. It defines only the @emph{Execute} stage and during
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
359 this stage it accesses and modifies the function bodies. Such
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
360 passes are useful for optimization at LGEN or LTRANS time and are
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
361 used, for example, to implement early optimization before writing
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
362 object files. The simple inter-procedural passes can also be used
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
363 for easier prototyping and development of a new inter-procedural
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
364 pass.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
365
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
366
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
367 @subsection Virtual clones
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
368
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
369 One of the main challenges of introducing the WHOPR compilation
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
370 mode was addressing the interactions between optimization passes.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
371 In LTO compilation mode, the passes are executed in a sequence,
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
372 each of which consists of analysis (or @emph{Generate summary}),
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
373 propagation (or @emph{Execute}) and @emph{Transform} stages.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
374 Once the work of one pass is finished, the next pass sees the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
375 updated program representation and can execute. This makes the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
376 individual passes dependent on each other.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
377
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
378 In WHOPR mode all passes first execute their @emph{Generate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
379 summary} stage. Then summary writing marks the end of the LGEN
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
380 stage. At WPA time,
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
381 the summaries are read back into memory and all passes run the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
382 @emph{Execute} stage. Optimization summaries are streamed and
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
383 sent to LTRANS, where all the passes execute the @emph{Transform}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
384 stage.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
385
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
386 Most optimization passes split naturally into analysis,
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
387 propagation and transformation stages. But some do not. The
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
388 main problem arises when one pass performs changes and the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
389 following pass gets confused by seeing different callgraphs
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
390 between the @emph{Transform} stage and the @emph{Generate summary}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
391 or @emph{Execute} stage. This means that the passes are required
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
392 to communicate their decisions with each other.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
393
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
394 To facilitate this communication, the GCC callgraph
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
395 infrastructure implements @emph{virtual clones}, a method of
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
396 representing the changes performed by the optimization passes in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
397 the callgraph without needing to update function bodies.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
398
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
399 A @emph{virtual clone} in the callgraph is a function that has no
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
400 associated body, just a description of how to create its body based
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
401 on a different function (which itself may be a virtual clone).
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
402
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
403 The description of function modifications includes adjustments to
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
404 the function's signature (which allows, for example, removing or
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
405 adding function arguments), substitutions to perform on the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
406 function body, and, for inlined functions, a pointer to the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
407 function that it will be inlined into.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
408
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
409 It is also possible to redirect any edge of the callgraph from a
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
410 function to its virtual clone. This implies updating of the call
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
411 site to adjust for the new function signature.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
412
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
413 Most of the transformations performed by inter-procedural
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
414 optimizations can be represented via virtual clones. For
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
415 instance, a constant propagation pass can produce a virtual clone
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
416 of the function which replaces one of its arguments by a
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
417 constant. The inliner can represent its decisions by producing a
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
418 clone of a function whose body will be later integrated into
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
419 a given function.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
420
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
421 Using @emph{virtual clones}, the program can be easily updated
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
422 during the @emph{Execute} stage, solving most of pass interactions
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
423 problems that would otherwise occur during @emph{Transform}.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
424
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
425 Virtual clones are later materialized in the LTRANS stage and
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
426 turned into real functions. Passes executed after the virtual
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
427 clone were introduced also perform their @emph{Transform} stage
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
428 on new functions, so for a pass there is no significant
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
429 difference between operating on a real function or a virtual
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
430 clone introduced before its @emph{Execute} stage.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
431
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
432 Optimization passes then work on virtual clones introduced before
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
433 their @emph{Execute} stage as if they were real functions. The
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
434 only difference is that clones are not visible during the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
435 @emph{Generate Summary} stage.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
436
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
437 To keep function summaries updated, the callgraph interface
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
438 allows an optimizer to register a callback that is called every
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
439 time a new clone is introduced as well as when the actual
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
440 function or variable is generated or when a function or variable
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
441 is removed. These hooks are registered in the @emph{Generate
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
442 summary} stage and allow the pass to keep its information intact
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
443 until the @emph{Execute} stage. The same hooks can also be
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
444 registered during the @emph{Execute} stage to keep the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
445 optimization summaries updated for the @emph{Transform} stage.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
446
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
447 @subsection IPA references
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
448
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
449 GCC represents IPA references in the callgraph. For a function
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
450 or variable @code{A}, the @emph{IPA reference} is a list of all
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
451 locations where the address of @code{A} is taken and, when
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
452 @code{A} is a variable, a list of all direct stores and reads
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
453 to/from @code{A}. References represent an oriented multi-graph on
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
454 the union of nodes of the callgraph and the varpool. See
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
455 @file{ipa-reference.c}:@code{ipa_reference_write_optimization_summary}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
456 and
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
457 @file{ipa-reference.c}:@code{ipa_reference_read_optimization_summary}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
458 for details.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
459
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
460 @subsection Jump functions
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
461 Suppose that an optimization pass sees a function @code{A} and it
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
462 knows the values of (some of) its arguments. The @emph{jump
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
463 function} describes the value of a parameter of a given function
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
464 call in function @code{A} based on this knowledge.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
465
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
466 Jump functions are used by several optimizations, such as the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
467 inter-procedural constant propagation pass and the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
468 devirtualization pass. The inliner also uses jump functions to
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
469 perform inlining of callbacks.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
470
|
111
|
471 @node WHOPR
|
68
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
472 @section Whole program assumptions, linker plugin and symbol visibilities
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
473
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
474 Link-time optimization gives relatively minor benefits when used
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
475 alone. The problem is that propagation of inter-procedural
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
476 information does not work well across functions and variables
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
477 that are called or referenced by other compilation units (such as
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
478 from a dynamically linked library). We say that such functions
|
111
|
479 and variables are @emph{externally visible}.
|
68
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
480
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
481 To make the situation even more difficult, many applications
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
482 organize themselves as a set of shared libraries, and the default
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
483 ELF visibility rules allow one to overwrite any externally
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
484 visible symbol with a different symbol at runtime. This
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
485 basically disables any optimizations across such functions and
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
486 variables, because the compiler cannot be sure that the function
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
487 body it is seeing is the same function body that will be used at
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
488 runtime. Any function or variable not declared @code{static} in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
489 the sources degrades the quality of inter-procedural
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
490 optimization.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
491
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
492 To avoid this problem the compiler must assume that it sees the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
493 whole program when doing link-time optimization. Strictly
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
494 speaking, the whole program is rarely visible even at link-time.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
495 Standard system libraries are usually linked dynamically or not
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
496 provided with the link-time information. In GCC, the whole
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
497 program option (@option{-fwhole-program}) asserts that every
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
498 function and variable defined in the current compilation
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
499 unit is static, except for function @code{main} (note: at
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
500 link time, the current unit is the union of all objects compiled
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
501 with LTO). Since some functions and variables need to
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
502 be referenced externally, for example by another DSO or from an
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
503 assembler file, GCC also provides the function and variable
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
504 attribute @code{externally_visible} which can be used to disable
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
505 the effect of @option{-fwhole-program} on a specific symbol.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
506
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
507 The whole program mode assumptions are slightly more complex in
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
508 C++, where inline functions in headers are put into @emph{COMDAT}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
509 sections. COMDAT function and variables can be defined by
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
510 multiple object files and their bodies are unified at link-time
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
511 and dynamic link-time. COMDAT functions are changed to local only
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
512 when their address is not taken and thus un-sharing them with a
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
513 library is not harmful. COMDAT variables always remain externally
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
514 visible, however for readonly variables it is assumed that their
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
515 initializers cannot be overwritten by a different value.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
516
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
517 GCC provides the function and variable attribute
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
518 @code{visibility} that can be used to specify the visibility of
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
519 externally visible symbols (or alternatively an
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
520 @option{-fdefault-visibility} command line option). ELF defines
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
521 the @code{default}, @code{protected}, @code{hidden} and
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
522 @code{internal} visibilities.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
523
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
524 The most commonly used is visibility is @code{hidden}. It
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
525 specifies that the symbol cannot be referenced from outside of
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
526 the current shared library. Unfortunately, this information
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
527 cannot be used directly by the link-time optimization in the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
528 compiler since the whole shared library also might contain
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
529 non-LTO objects and those are not visible to the compiler.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
530
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
531 GCC solves this problem using linker plugins. A @emph{linker
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
532 plugin} is an interface to the linker that allows an external
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
533 program to claim the ownership of a given object file. The linker
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
534 then performs the linking procedure by querying the plugin about
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
535 the symbol table of the claimed objects and once the linking
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
536 decisions are complete, the plugin is allowed to provide the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
537 final object file before the actual linking is made. The linker
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
538 plugin obtains the symbol resolution information which specifies
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
539 which symbols provided by the claimed objects are bound from the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
540 rest of a binary being linked.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
541
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
542 GCC is designed to be independent of the rest of the toolchain
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
543 and aims to support linkers without plugin support. For this
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
544 reason it does not use the linker plugin by default. Instead,
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
545 the object files are examined by @command{collect2} before being
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
546 passed to the linker and objects found to have LTO sections are
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
547 passed to @command{lto1} first. This mode does not work for
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
548 library archives. The decision on what object files from the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
549 archive are needed depends on the actual linking and thus GCC
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
550 would have to implement the linker itself. The resolution
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
551 information is missing too and thus GCC needs to make an educated
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
552 guess based on @option{-fwhole-program}. Without the linker
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
553 plugin GCC also assumes that symbols are declared @code{hidden}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
554 and not referred by non-LTO code by default.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
555
|
111
|
556 @node Internal flags
|
68
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
557 @section Internal flags controlling @code{lto1}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
558
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
559 The following flags are passed into @command{lto1} and are not
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
560 meant to be used directly from the command line.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
561
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
562 @itemize
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
563 @item -fwpa
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
564 @opindex fwpa
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
565 This option runs the serial part of the link-time optimizer
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
566 performing the inter-procedural propagation (WPA mode). The
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
567 compiler reads in summary information from all inputs and
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
568 performs an analysis based on summary information only. It
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
569 generates object files for subsequent runs of the link-time
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
570 optimizer where individual object files are optimized using both
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
571 summary information from the WPA mode and the actual function
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
572 bodies. It then drives the LTRANS phase.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
573
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
574 @item -fltrans
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
575 @opindex fltrans
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
576 This option runs the link-time optimizer in the
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
577 local-transformation (LTRANS) mode, which reads in output from a
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
578 previous run of the LTO in WPA mode. In the LTRANS mode, LTO
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
579 optimizes an object and produces the final assembly.
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
580
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
581 @item -fltrans-output-list=@var{file}
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
582 @opindex fltrans-output-list
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
583 This option specifies a file to which the names of LTRANS output
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
584 files are written. This option is only meaningful in conjunction
|
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
585 with @option{-fwpa}.
|
111
|
586
|
|
587 @item -fresolution=@var{file}
|
|
588 @opindex fresolution
|
|
589 This option specifies the linker resolution file. This option is
|
|
590 only meaningful in conjunction with @option{-fwpa} and as option
|
|
591 to pass through to the LTO linker plugin.
|
68
Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
592 @end itemize
|