annotate gcc/doc/passes.texi @ 145:1830386684a0
gcc-9.2.0
author | anatofuz |
---|---|
date | Thu, 13 Feb 2020 11:34:05 +0900 |
parents | 84e7813d76e9 |
children |
rev | line source |
---|---|
111 | 1 @c markers: BUG TODO |
0 | 2 |
145 | 3 @c Copyright (C) 1988-2020 Free Software Foundation, Inc. |
0 | 4 @c This is part of the GCC manual. |
5 @c For copying conditions, see the file gcc.texi. | |
6 | |
7 @node Passes | |
8 @chapter Passes and Files of the Compiler | |
9 @cindex passes and files of the compiler | |
10 @cindex files and passes of the compiler | |
11 @cindex compiler passes and files | |
111 | 12 @cindex pass dumps |
0 | 13 |
14 This chapter is dedicated to giving an overview of the optimization and | |
15 code generation passes of the compiler. In the process, it describes | |
16 some of the language front end interface, though this description is | |
17 nowhere near complete. | |
18 | |
19 @menu | |
20 * Parsing pass:: The language front end turns text into bits. | |
21 * Gimplification pass:: The bits are turned into something we can optimize. | |
22 * Pass manager:: Sequencing the optimization passes. | |
145 | 23 * IPA passes:: Inter-procedural optimizations. |
24 * Tree SSA passes:: Optimizations on a high-level representation. |
0 | 25 * RTL passes:: Optimizations on a low-level representation. |
111 | 26 * Optimization info:: Dumping optimization information from passes. |
0 | 27 @end menu |
28 | |
29 @node Parsing pass | |
30 @section Parsing pass | |
31 @cindex GENERIC | |
32 @findex lang_hooks.parse_file | |
33 The language front end is invoked only once, via | |
34 @code{lang_hooks.parse_file}, to parse the entire input. The language | |
35 front end may use any intermediate language representation deemed | |
111 | 36 appropriate. The C front end uses GENERIC trees (@pxref{GENERIC}), plus |
0 | 37 a double handful of language specific tree codes defined in |
38 @file{c-common.def}. The Fortran front end uses a completely different | |
39 private representation. | |
40 | |
41 @cindex GIMPLE | |
42 @cindex gimplification | |
43 @cindex gimplifier | |
44 @cindex language-independent intermediate representation | |
45 @cindex intermediate representation lowering | |
46 @cindex lowering, language-dependent intermediate representation | |
47 At some point the front end must translate the representation used in the | |
48 front end to a representation understood by the language-independent | |
49 portions of the compiler. Current practice takes one of two forms. | |
111 | 50 The C front end manually invokes the gimplifier (@pxref{GIMPLE}) on each function, |
0 | 51 and uses the gimplifier callbacks to convert the language-specific tree |
111 | 52 nodes directly to GIMPLE before passing the function off to be compiled. |
0 | 53 The Fortran front end converts from a private representation to GENERIC, |
54 which is later lowered to GIMPLE when the function is compiled. Which | |
55 route to choose probably depends on how well GENERIC (plus extensions) | |
56 can be made to match up with the source language and necessary parsing | |
57 data structures. | |
58 | |
59 BUG: Gimplification must occur before nested function lowering, | |
60 and nested function lowering must be done by the front end before | |
61 passing the data off to cgraph. | |
62 | |
63 TODO: Cgraph should control nested function lowering. It would | |
64 only be invoked when it is certain that the outer-most function | |
65 is used. | |
66 | |
67 TODO: Cgraph needs a gimplify_function callback. It should be | |
68 invoked when (1) it is certain that the function is used, (2) | |
69 warning flags specified by the user require some amount of | |
70 compilation in order to honor, (3) the language indicates that | |
71 semantic analysis is not complete until gimplification occurs. | |
72 Hum@dots{} this sounds overly complicated. Perhaps we should just | |
73 have the front end gimplify always; in most cases it's only one | |
74 function call. | |
75 | |
76 The front end needs to pass all function definitions and top level | |
77 declarations off to the middle-end so that they can be compiled and | |
78 emitted to the object file. For a simple procedural language, it is | |
79 usually most convenient to do this as each top level declaration or | |
80 definition is seen. There is also a distinction to be made between | |
81 generating functional code and generating complete debug information. | |
82 The only thing that is absolutely required for functional code is that | |
83 function and data @emph{definitions} be passed to the middle-end. For | |
84 complete debug information, function, data and type declarations | |
85 should all be passed as well. | |
86 | |
87 @findex rest_of_decl_compilation | |
88 @findex rest_of_type_compilation | |
89 @findex cgraph_finalize_function | |
90 In any case, the front end should pass each complete top-level function or | |
91 data declaration, and each data definition, to | |
92 @code{rest_of_decl_compilation}. Each complete type definition should | |
93 be passed to @code{rest_of_type_compilation}. Each function definition | |
94 should be passed to @code{cgraph_finalize_function}. | |
95 | |
96 TODO: I know rest_of_compilation currently has all sorts of | |
97 RTL generation semantics. I plan to move all code generation |
98 bits (both Tree and RTL) to compile_function. Should we hide |
0 | 99 cgraph from the front ends and move back to rest_of_compilation |
100 as the official interface? Possibly we should rename all three | |
101 interfaces such that the names match in some meaningful way and | |
102 that is more descriptive than "rest_of". | |
103 | |
104 The middle-end will, at its option, emit the function and data | |
105 definitions immediately or queue them for later processing. | |
106 | |
107 @node Gimplification pass | |
108 @section Gimplification pass | |
109 | |
110 @cindex gimplification | |
111 @cindex GIMPLE | |
112 @dfn{Gimplification} is a whimsical term for the process of converting | |
113 the intermediate representation of a function into the GIMPLE language | |
111 | 114 (@pxref{GIMPLE}). The term stuck, and so words like ``gimplification'', |
0 | 115 ``gimplify'', ``gimplifier'' and the like are sprinkled throughout this |
116 section of code. | |
117 | |
118 While a front end may certainly generate GIMPLE directly if | |
119 it chooses, this can be a moderately complex process unless the | |
120 intermediate language used by the front end is already fairly simple. | |
121 Usually it is easier to generate GENERIC trees plus extensions | |
122 and let the language-independent gimplifier do most of the work. | |
123 | |
124 @findex gimplify_function_tree | |
125 @findex gimplify_expr | |
126 @findex lang_hooks.gimplify_expr | |
127 The main entry point to this pass is @code{gimplify_function_tree} | |
128 located in @file{gimplify.c}. From here we process the entire | |
129 function gimplifying each statement in turn. The main workhorse | |
130 for this pass is @code{gimplify_expr}. Approximately everything | |
131 passes through here at least once, and it is from here that we | |
132 invoke the @code{lang_hooks.gimplify_expr} callback. | |
133 | |
134 The callback should examine the expression in question and return | |
135 @code{GS_UNHANDLED} if the expression is not a language specific | |
136 construct that requires attention. Otherwise it should alter the | |
137 expression in some way such that forward progress is made toward | |
138 producing valid GIMPLE@. If the callback is certain that the | |
139 transformation is complete and the expression is valid GIMPLE, it | |
140 should return @code{GS_ALL_DONE}. Otherwise it should return | |
141 @code{GS_OK}, which will cause the expression to be processed again. | |
142 If the callback encounters an error during the transformation (because | |
143 the front end is relying on the gimplification process to finish | |
144 semantic checks), it should return @code{GS_ERROR}. | |
145 | |
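The return-code protocol described above can be illustrated with a small
self-contained C++ model. This is only a sketch of the contract, not GCC
code: @code{toy_expr}, @code{toy_lang_hook} and @code{toy_gimplify} are
hypothetical names, the numeric values of the status codes are arbitrary,
and the sketch simply stops on @code{GS_UNHANDLED}, whereas the real
gimplifier would continue with its language-independent handling.

```cpp
#include <cassert>
#include <functional>

// Status codes named after the ones in the text; the numeric values are
// arbitrary and chosen only for this sketch.
enum gimplify_status { GS_ERROR, GS_UNHANDLED, GS_OK, GS_ALL_DONE };

// Hypothetical stand-in for an expression: 'sugar' counts the remaining
// language-specific layers that still have to be lowered.
struct toy_expr {
  int sugar;
  bool malformed;
};

// A toy language hook that follows the protocol described in the text.
inline gimplify_status toy_lang_hook(toy_expr &e) {
  if (e.malformed)
    return GS_ERROR;      // a semantic check failed during lowering
  if (e.sugar == 0)
    return GS_UNHANDLED;  // nothing language-specific needs attention
  --e.sugar;              // make forward progress toward valid GIMPLE
  return e.sugar == 0 ? GS_ALL_DONE : GS_OK;
}

// Driver loop: the expression is processed again for as long as the hook
// reports GS_OK; any other status ends the loop and is returned.
inline gimplify_status
toy_gimplify(toy_expr &e,
             const std::function<gimplify_status(toy_expr &)> &hook) {
  assert(e.sugar >= 0);
  gimplify_status status;
  do
    status = hook(e);
  while (status == GS_OK);
  return status;
}
```

For an expression with three language-specific layers, the driver calls the
hook three times: the first two calls return @code{GS_OK} and the third
returns @code{GS_ALL_DONE}.
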
146 @node Pass manager | |
147 @section Pass manager | |
148 | |
149 The pass manager is located in @file{passes.c}, @file{tree-optimize.c} | |
150 and @file{tree-pass.h}. | |
111 | 151 It processes passes as described in @file{passes.def}. |
0 | 152 Its job is to run all of the individual passes in the correct order, |
153 and take care of standard bookkeeping that applies to every pass. | |
154 | |
155 The theory of operation is that each pass defines a structure that | |
156 represents everything we need to know about that pass---when it | |
157 should be run, how it should be run, what intermediate language | |
158 form or on-the-side data structures it needs. We register the pass | |
159 to be run in some particular order, and the pass manager arranges | |
160 for everything to happen in the correct order. | |
161 | |
162 The actuality doesn't completely live up to the theory at present. | |
163 Command-line switches and @code{timevar_id_t} enumerations must still | |
164 be defined elsewhere. The pass manager validates constraints but does | |
165 not attempt to (re-)generate data structures or lower intermediate | |
166 language form based on the requirements of the next pass. Nevertheless, | |
167 what is present is useful, and a far sight better than nothing at all. | |
168 | |
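The division of labor described above, where each pass declares what it
needs and the pass manager validates those constraints without trying to
repair them, can be sketched in a few lines of C++. The structure and
property bits below (@code{toy_pass}, @code{TOY_PROP_cfg},
@code{TOY_PROP_ssa}, @code{toy_run_passes}) are hypothetical
simplifications for illustration and are not the declarations found in
@file{tree-pass.h}.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical property bits; GCC defines its own set.
enum : unsigned { TOY_PROP_cfg = 1u << 0, TOY_PROP_ssa = 1u << 1 };

// Everything the toy manager needs to know about a pass.
struct toy_pass {
  std::string name;
  unsigned required;   // properties that must hold before the pass runs
  unsigned provided;   // properties established by the pass
  unsigned destroyed;  // properties invalidated by the pass
};

// Run the passes in registration order and return the names of those that
// were executed.  The manager only validates: if a requirement is not met
// it stops instead of trying to (re-)generate the missing property.
inline std::vector<std::string>
toy_run_passes(const std::vector<toy_pass> &passes, unsigned props = 0) {
  std::vector<std::string> ran;
  for (const toy_pass &p : passes) {
    if ((props & p.required) != p.required)
      break;  // constraint violated
    props = (props | p.provided) & ~p.destroyed;
    ran.push_back(p.name);
  }
  return ran;
}
```

Registering a pass that requires the SSA property ahead of the pass that
provides it makes the toy manager stop immediately, which mirrors the
remark that constraints are checked but never satisfied automatically.
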
169 Each pass should have a unique name. |
0 | 170 Each pass may have its own dump file (for GCC debugging purposes). |
171 Passes with a name starting with a star do not dump anything. |
172 Sometimes passes are supposed to share a dump file / option name. |
173 To still give these unique names, you can use a prefix that is delimited |
174 by a space from the part that is used for the dump file / option name. |
175 For example, when the pass name is "ud dce", the name used for the dump |
176 file and option is "dce". |
0 | 177 |
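The naming rules above can be captured in a short helper. The following
C++ sketch is an illustration only: @code{toy_dump_name} is a hypothetical
function, not part of GCC, and it assumes that the prefix ends at the first
space in the pass name and that an empty result is an acceptable way to
signal ``no dump''.

```cpp
#include <cassert>
#include <string>

// Map a pass name to the name used for its dump file and option.
// Returns an empty string for passes that do not dump anything.
inline std::string toy_dump_name(const std::string &pass_name) {
  // Names starting with a star do not dump anything.
  if (!pass_name.empty() && pass_name[0] == '*')
    return std::string();
  // An optional prefix, delimited by a space, only makes the name unique.
  // Assumption of this sketch: the prefix ends at the first space.
  const std::string::size_type space = pass_name.find(' ');
  if (space == std::string::npos)
    return pass_name;
  return pass_name.substr(space + 1);
}
```

With this helper, the pass names "ud dce" and "dce" both map to the dump
name "dce", while a name that starts with a star maps to the empty string.
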
178 TODO: describe the global variables set up by the pass manager, | |
179 and a brief description of how a new pass should use it. | |
180 I need to look at what info RTL passes use first@enddots{} |
0 | 181 |
145 | 182 @node IPA passes |
183 @section Inter-procedural optimization passes | |
184 @cindex IPA passes | |
185 @cindex inter-procedural optimization passes | |
186 | |
187 The inter-procedural optimization (IPA) passes use call graph | |
188 information to perform transformations across function boundaries. | |
189 IPA is a critical part of link-time optimization (LTO) and | |
190 whole-program (WHOPR) optimization, and these passes are structured | |
191 with the needs of LTO and WHOPR in mind by dividing their operations | |
192 into stages. For detailed discussion of the LTO/WHOPR IPA pass stages | |
193 and interfaces, see @ref{IPA}. | |
194 | |
195 The following briefly describes the inter-procedural optimization (IPA) | |
196 passes, which are split into small IPA passes, regular IPA passes, | |
197 and late IPA passes, according to the LTO/WHOPR processing model. | |
198 | |
199 @menu | |
200 * Small IPA passes:: | |
201 * Regular IPA passes:: | |
202 * Late IPA passes:: | |
203 @end menu | |
204 | |
205 @node Small IPA passes | |
206 @subsection Small IPA passes | |
207 @cindex small IPA passes | |
208 A small IPA pass is a pass derived from @code{simple_ipa_opt_pass}. | |
209 As described in @ref{IPA}, it does everything at once and | |
210 defines only the @emph{Execute} stage. During this | |
211 stage it accesses and modifies the function bodies. | |
212 No @code{generate_summary}, @code{read_summary}, or @code{write_summary} | |
213 hooks are defined. | |
214 | |
215 @itemize @bullet | |
216 @item IPA free lang data | |
217 | |
218 This pass frees resources that are used by the front end but are | |
219 not needed once it is done. It is located in @file{tree.c} and is described by | |
220 @code{pass_ipa_free_lang_data}. | |
221 | |
222 @item IPA function and variable visibility | |
223 | |
224 This is a local function pass handling visibilities of all symbols. This | |
225 happens before LTO streaming, so @option{-fwhole-program} should be ignored | |
226 at this level. It is located in @file{ipa-visibility.c} and is described by | |
227 @code{pass_ipa_function_and_variable_visibility}. | |
228 | |
229 @item IPA remove symbols | |
230 | |
231 This pass performs reachability analysis and reclaims all unreachable nodes. | |
232 It is located in @file{passes.c} and is described by | |
233 @code{pass_ipa_remove_symbols}. | |
234 | |
235 @item IPA OpenACC | |
236 | |
237 This is a pass group for OpenACC processing. It is located in | |
238 @file{tree-ssa-loop.c} and is described by @code{pass_ipa_oacc}. | |
239 | |
240 @item IPA points-to analysis | |
241 | |
242 This is a tree-based points-to analysis pass. The idea behind this analyzer | |
243 is to generate set constraints from the program, then solve the resulting | |
244 constraints in order to generate the points-to sets. It is located in | |
245 @file{tree-ssa-structalias.c} and is described by @code{pass_ipa_pta}. | |
246 | |
247 @item IPA OpenACC kernels | |
248 | |
249 This is a pass group for processing OpenACC kernels regions. It is a | |
250 subpass of the IPA OpenACC pass group that runs on offloaded functions | |
251 containing OpenACC kernels loops. It is located in | |
252 @file{tree-ssa-loop.c} and is described by | |
253 @code{pass_ipa_oacc_kernels}. | |
254 | |
255 @item Target clone | |
256 | |
257 This is a pass for parsing functions with multiple target attributes. | |
258 It is located in @file{multiple_target.c} and is described by | |
259 @code{pass_target_clone}. | |
260 | |
261 @item IPA auto profile | |
262 | |
263 This pass uses AutoFDO profiling data to annotate the control flow graph. | |
264 It is located in @file{auto-profile.c} and is described by | |
265 @code{pass_ipa_auto_profile}. | |
266 | |
267 @item IPA tree profile | |
268 | |
269 This pass does profiling for all functions in the call graph. | |
270 It calculates branch | |
271 probabilities and basic block execution counts. It is located | |
272 in @file{tree-profile.c} and is described by @code{pass_ipa_tree_profile}. | |
273 | |
274 @item IPA free function summary | |
275 | |
276 This pass is a small IPA pass when argument @code{small_p} is true. | |
277 It releases inline function summaries and call summaries. | |
278 It is located in @file{ipa-fnsummary.c} and is described by | |
279 @code{pass_ipa_free_fn_summary}. | |
280 | |
281 @item IPA increase alignment | |
282 | |
283 This pass increases the alignment of global arrays to improve | |
284 vectorization. It is located in @file{tree-vectorizer.c} | |
285 and is described by @code{pass_ipa_increase_alignment}. | |
286 | |
287 @item IPA transactional memory | |
288 | |
289 This pass is for transactional memory support. | |
290 It is located in @file{trans-mem.c} and is described by | |
291 @code{pass_ipa_tm}. | |
292 | |
293 @item IPA lower emulated TLS | |
294 | |
295 This pass lowers thread-local storage (TLS) operations | |
296 to emulation functions provided by libgcc. | |
297 It is located in @file{tree-emutls.c} and is described by | |
298 @code{pass_ipa_lower_emutls}. | |
299 | |
300 @end itemize | |
301 | |
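The reachability analysis mentioned for the IPA remove symbols pass in the
list above is, at its core, a worklist traversal of the symbol reference
graph. The C++ sketch below shows the general technique on a toy symbol
table. The names @code{toy_symtab}, @code{toy_reachable} and
@code{toy_remove_unreachable} are hypothetical, and the real pass operates
on GCC's own symbol table and takes many more details into account.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy symbol table: each symbol maps to the symbols it references.
using toy_symtab = std::map<std::string, std::vector<std::string>>;

// Worklist traversal: collect every symbol reachable from the roots.
inline std::set<std::string>
toy_reachable(const toy_symtab &refs, const std::vector<std::string> &roots) {
  std::set<std::string> seen(roots.begin(), roots.end());
  std::vector<std::string> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    const std::string sym = worklist.back();
    worklist.pop_back();
    const auto it = refs.find(sym);
    if (it == refs.end())
      continue;  // symbol without a body in this table
    for (const std::string &target : it->second)
      if (seen.insert(target).second)
        worklist.push_back(target);  // first visit: explore it later
  }
  return seen;
}

// Reclaim every node that the traversal did not reach.
inline void toy_remove_unreachable(toy_symtab &refs,
                                   const std::vector<std::string> &roots) {
  const std::set<std::string> live = toy_reachable(refs, roots);
  for (auto it = refs.begin(); it != refs.end();)
    it = live.count(it->first) ? std::next(it) : refs.erase(it);
}
```

Starting from a single root, a symbol that is referenced only by
unreachable symbols is removed together with them, while anything the root
references directly or indirectly is kept.
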
302 @node Regular IPA passes | |
303 @subsection Regular IPA passes | |
304 @cindex regular IPA passes | |
305 | |
306 A regular IPA pass is a pass derived from @code{ipa_opt_pass_d} that | |
307 is executed in WHOPR compilation. Regular IPA passes may have summary | |
308 hooks implemented in any of the LGEN, WPA or LTRANS stages (@pxref{IPA}). | |
309 | |
310 @itemize @bullet | |
311 @item IPA whole program visibility | |
312 | |
313 This pass performs various optimizations involving symbol visibility | |
314 with @option{-fwhole-program}, including symbol privatization, | |
315 discovering local functions, and dismantling comdat groups. It is | |
316 located in @file{ipa-visibility.c} and is described by | |
317 @code{pass_ipa_whole_program_visibility}. | |
318 | |
319 @item IPA profile | |
320 | |
321 The IPA profile pass propagates profiling frequencies across the call | |
322 graph. It is located in @file{ipa-profile.c} and is described by | |
323 @code{pass_ipa_profile}. | |
324 | |
325 @item IPA identical code folding | |
326 | |
327 This is the inter-procedural identical code folding pass. | |
328 The goal of this transformation is to discover functions | |
329 and read-only variables that have exactly the same semantics. It is | |
330 located in @file{ipa-icf.c} and is described by @code{pass_ipa_icf}. | |
331 | |
332 @item IPA devirtualization | |
333 | |
334 This pass performs speculative devirtualization based on the type | |
335 inheritance graph. When a polymorphic call has only one likely target | |
336 in the unit, it is turned into a speculative call. It is located in | |
337 @file{ipa-devirt.c} and is described by @code{pass_ipa_devirt}. | |
338 | |
339 @item IPA constant propagation | |
340 | |
341 The goal of this pass is to discover functions that are always invoked | |
342 with some arguments with the same known constant values and to modify | |
343 the functions accordingly. It can also do partial specialization and | |
344 type-based devirtualization. It is located in @file{ipa-cp.c} and is | |
345 described by @code{pass_ipa_cp}. | |
346 | |
347 @item IPA scalar replacement of aggregates | |
348 | |
349 This pass can replace an aggregate parameter with a set of other parameters | |
350 representing part of the original, turning those passed by reference | |
351 into new ones which pass the value directly. It also removes unused | |
352 function return values and unused function parameters. This pass is | |
353 located in @file{ipa-sra.c} and is described by @code{pass_ipa_sra}. | |
354 | |
355 @item IPA constructor/destructor merge | |
356 | |
357 This pass merges multiple constructors and destructors for static | |
358 objects into single functions. It's only run at LTO time unless the | |
359 target doesn't support constructors and destructors natively. The | |
360 pass is located in @file{ipa.c} and is described by | |
361 @code{pass_ipa_cdtor_merge}. | |
362 | |
363 @item IPA HSA | |
364 | |
365 This pass is part of the GCC support for HSA (Heterogeneous System | |
366 Architecture) accelerators. It is responsible for creation of HSA | |
367 clones and emitting HSAIL instructions for them. It is located in | |
368 @file{ipa-hsa.c} and is described by @code{pass_ipa_hsa}. | |
369 | |
370 @item IPA function summary | |
371 | |
372 This pass provides function analysis for inter-procedural passes. | |
373 It collects estimates of function body size, execution time, and frame | |
374 size for each function. It also estimates information about function | |
375 calls: call statement size, time and how often the parameters change | |
376 for each call. It is located in @file{ipa-fnsummary.c} and is | |
377 described by @code{pass_ipa_fn_summary}. | |
378 | |
379 @item IPA inline | |
380 | |
381 The IPA inline pass handles function inlining with whole-program | |
382 knowledge. Small functions that are candidates for inlining are | |
383 ordered in increasing badness, bounded by unit growth parameters. | |
384 Unreachable functions are removed from the call graph. Functions called | |
385 once and not exported from the unit are inlined. This pass is located in | |
386 @file{ipa-inline.c} and is described by @code{pass_ipa_inline}. | |
387 | |
388 @item IPA pure/const analysis | |
389 | |
390 This pass marks functions as being either const (@code{TREE_READONLY}) or | |
391 pure (@code{DECL_PURE_P}). The per-function information is produced | |
392 by @code{pure_const_generate_summary}, then the global information is computed | |
393 by performing a transitive closure over the call graph. It is located in | |
394 @file{ipa-pure-const.c} and is described by @code{pass_ipa_pure_const}. | |
395 | |
396 @item IPA free function summary | |
397 | |
398 This pass is a regular IPA pass when argument @code{small_p} is false. | |
399 It releases inline function summaries and call summaries. | |
400 It is located in @file{ipa-fnsummary.c} and is described by | |
401 @code{pass_ipa_free_fn_summary}. | |
402 | |
403 @item IPA reference | |
404 | |
405 This pass gathers information about how variables whose scope is | |
406 confined to the compilation unit are used. It is located in | |
407 @file{ipa-reference.c} and is described by @code{pass_ipa_reference}. | |
408 | |
409 @item IPA single use | |
410 | |
411 This pass checks whether variables are used by a single function. | |
412 It is located in @file{ipa.c} and is described by | |
413 @code{pass_ipa_single_use}. | |
414 | |
415 @item IPA comdats | |
416 | |
417 This pass looks for static symbols that are used exclusively | |
418 within one comdat group, and moves them into that comdat group. It is | |
419 located in @file{ipa-comdats.c} and is described by | |
420 @code{pass_ipa_comdats}. | |
421 | |
422 @end itemize | |
423 | |
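The two-step scheme described for the IPA pure/const analysis in the list
above, per-function summaries followed by a transitive closure over the
call graph, can be modeled as a simple fixpoint iteration. The C++ sketch
below is an illustration under stated assumptions: @code{toy_purity},
@code{toy_fn_summary} and @code{toy_propagate_purity} are hypothetical
names, callees without a summary are conservatively treated as neither
pure nor const, and the plain iteration used here is not claimed to be the
order in which GCC visits the call graph.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Ordered from strongest to weakest guarantee.
enum toy_purity { TOY_CONST = 0, TOY_PURE = 1, TOY_NEITHER = 2 };

// Per-function summary: what the body alone allows, plus its callees.
struct toy_fn_summary {
  toy_purity local;
  std::vector<std::string> callees;
};

// A function can be no purer than the least pure function it calls.
// Iterate until no state changes; states only ever get weaker, so the
// loop terminates.
inline std::map<std::string, toy_purity>
toy_propagate_purity(const std::map<std::string, toy_fn_summary> &fns) {
  std::map<std::string, toy_purity> state;
  for (const auto &kv : fns)
    state[kv.first] = kv.second.local;
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto &kv : fns) {
      toy_purity worst = state[kv.first];
      for (const std::string &callee : kv.second.callees) {
        const auto it = state.find(callee);
        // Assumption: a callee without a summary may do anything.
        worst = std::max(worst, it == state.end() ? TOY_NEITHER : it->second);
      }
      if (worst != state[kv.first]) {
        state[kv.first] = worst;
        changed = true;
      }
    }
  }
  return state;
}
```

A function whose own body looks const but which calls a merely pure
function ends up pure, and one that calls a function with no summary ends
up neither pure nor const.
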
424 @node Late IPA passes | |
425 @subsection Late IPA passes | |
426 @cindex late IPA passes | |
427 | |
428 Late IPA passes are simple IPA passes executed after | |
429 the regular passes. In WHOPR mode the passes are executed after | |
430 partitioning and thus see just parts of the compiled unit. | |
431 | |
432 @itemize @bullet | |
433 @item Materialize all clones | |
434 | |
435 Once all functions from the compilation unit are in memory, produce all clones | |
436 and update all calls. It is located in @file{ipa.c} and is described by | |
437 @code{pass_materialize_all_clones}. | |
438 | |
439 @item IPA points-to analysis | |
440 | |
441 Points-to analysis; this is the same as the points-to analysis pass | |
442 run with the small IPA passes (@pxref{Small IPA passes}). | |
443 | |
444 @item OpenMP simd clone | |
445 | |
446 This is the OpenMP constructs' SIMD clone pass. It creates the appropriate | |
447 SIMD clones for functions tagged as elemental SIMD functions. | |
448 It is located in @file{omp-simd-clone.c} and is described by | |
449 @code{pass_omp_simd_clone}. | |
450 | |
451 @end itemize | |
452 | |
453 @node Tree SSA passes |
454 @section Tree SSA passes |
0 | 455 |
456 The following briefly describes the Tree optimization passes that are |
0 | 457 run after gimplification and what source files they are located in. |
458 | |
459 @itemize @bullet | |
460 @item Remove useless statements | |
461 | |
462 This pass is an extremely simple sweep across the gimple code in which | |
463 we identify obviously dead code and remove it. Here we do things like | |
464 simplify @code{if} statements with constant conditions, remove | |
465 exception handling constructs surrounding code that obviously cannot | |
466 throw, remove lexical bindings that contain no variables, and other | |
467 assorted simplistic cleanups. The idea is to get rid of the obvious | |
468 stuff quickly rather than wait until later when it's more work to get | |
469 rid of it. This pass is located in @file{tree-cfg.c} and described by | |
470 @code{pass_remove_useless_stmts}. | |
471 | |
472 @item OpenMP lowering | |
473 | |
474 If OpenMP generation (@option{-fopenmp}) is enabled, this pass lowers | |
475 OpenMP constructs into GIMPLE. | |
476 | |
477 Lowering of OpenMP constructs involves creating replacement | |
478 expressions for local variables that have been mapped using data | |
479 sharing clauses, exposing the control flow of most synchronization | |
480 directives and adding region markers to facilitate the creation of the | |
481 control flow graph. The pass is located in @file{omp-low.c} and is | |
482 described by @code{pass_lower_omp}. | |
483 | |
484 @item OpenMP expansion | |
485 | |
486 If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands | |
487 parallel regions into their own functions to be invoked by the thread | |
488 library. The pass is located in @file{omp-low.c} and is described by | |
489 @code{pass_expand_omp}. | |
490 | |
491 @item Lower control flow | |
492 | |
493 This pass flattens @code{if} statements (@code{COND_EXPR}) | |
494 and moves lexical bindings (@code{BIND_EXPR}) out of line. After | |
495 this pass, each @code{if} statement will have exactly two @code{goto} | |
496 statements in its @code{then} and @code{else} arms. Lexical binding | |
497 information for each statement will be found in @code{TREE_BLOCK} rather | |
498 than being inferred from its position under a @code{BIND_EXPR}. This | |
499 pass is found in @file{gimple-low.c} and is described by | |
500 @code{pass_lower_cf}. | |
501 | |
502 @item Lower exception handling control flow | |
503 | |
504 This pass decomposes high-level exception handling constructs | |
505 (@code{TRY_FINALLY_EXPR} and @code{TRY_CATCH_EXPR}) into a form | |
506 that explicitly represents the control flow involved. After this | |
507 pass, @code{lookup_stmt_eh_region} will return a non-negative | |
508 number for any statement that may have EH control flow semantics; | |
509 examine @code{tree_can_throw_internal} or @code{tree_can_throw_external} | |
510 for exact semantics. Exact control flow may be extracted from | |
511 @code{foreach_reachable_handler}. The EH region nesting tree is defined | |
512 in @file{except.h} and built in @file{except.c}. The lowering pass | |
513 itself is in @file{tree-eh.c} and is described by @code{pass_lower_eh}. | |
514 | |
515 @item Build the control flow graph | |
516 | |
517 This pass decomposes a function into basic blocks and creates all of | |
518 the edges that connect them. It is located in @file{tree-cfg.c} and | |
519 is described by @code{pass_build_cfg}. | |
520 | |
521 @item Find all referenced variables | |
522 | |
523 This pass walks the entire function and collects an array of all | |
524 variables referenced in the function, @code{referenced_vars}. The | |
525 index at which a variable is found in the array is used as a UID | |
526 for the variable within this function. This data is needed by the | |
527 SSA rewriting routines. The pass is located in @file{tree-dfa.c} | |
528 and is described by @code{pass_referenced_vars}. | |
529 | |
530 @item Enter static single assignment form | |
531 | |
532 This pass rewrites the function such that it is in SSA form. After | |
533 this pass, all @code{is_gimple_reg} variables will be referenced by | |
534 @code{SSA_NAME}, and all occurrences of other variables will be | |
535 annotated with @code{VDEFS} and @code{VUSES}; PHI nodes will have | |
536 been inserted as necessary for each basic block. This pass is | |
537 located in @file{tree-ssa.c} and is described by @code{pass_build_ssa}. | |
538 | |
539 @item Warn for uninitialized variables | |
540 | |
541 This pass scans the function for uses of @code{SSA_NAME}s that | |
542 are fed by a default definition. For non-parameter variables, such | |
543 uses are uninitialized. The pass is run twice, before and after | |
544 optimization (if turned on). In the first pass we only warn for uses that are | |
545 positively uninitialized; in the second pass we warn for uses that | |
546 are possibly uninitialized. The pass is located in @file{tree-ssa.c} | |
547 and is defined by @code{pass_early_warn_uninitialized} and | |
548 @code{pass_late_warn_uninitialized}. | |
549 | |
550 @item Dead code elimination | |
551 | |
552 This pass scans the function for statements without side effects whose | |
553 result is unused. It does not do memory life analysis, so any value | |
554 that is stored in memory is considered used. The pass is run multiple | |
555 times throughout the optimization process. It is located in | |
556 @file{tree-ssa-dce.c} and is described by @code{pass_dce}. | |
557 | |
558 @item Dominator optimizations | |
559 | |
560 This pass performs trivial dominator-based copy and constant propagation, | |
561 expression simplification, and jump threading. It is run multiple times | |
562 throughout the optimization process. It is located in @file{tree-ssa-dom.c} |
0 | 563 and is described by @code{pass_dominator}. |
564 | |
565 @item Forward propagation of single-use variables | |
566 | |
567 This pass attempts to remove redundant computation by substituting | |
568 variables that are used once into the expression that uses them and | |
569 seeing if the result can be simplified. It is located in | |
570 @file{tree-ssa-forwprop.c} and is described by @code{pass_forwprop}. | |
571 | |
572 @item Copy Renaming | |
573 | |
574 This pass attempts to change the name of compiler temporaries involved in | |
575 copy operations such that SSA->normal can coalesce the copy away. When compiler | |
576 temporaries are copies of user variables, it also renames the compiler | |
577 temporary to the user variable resulting in better use of user symbols. It is | |
578 located in @file{tree-ssa-copyrename.c} and is described by | |
579 @code{pass_copyrename}. | |
580 | |
581 @item PHI node optimizations | |
582 | |
583 This pass recognizes forms of PHI inputs that can be represented as | |
584 conditional expressions and rewrites them into straight line code. | |
585 It is located in @file{tree-ssa-phiopt.c} and is described by | |
586 @code{pass_phiopt}. | |
587 | |
588 @item May-alias optimization | |
589 | |
590 This pass performs a flow sensitive SSA-based points-to analysis. | |
591 The resulting may-alias, must-alias, and escape analysis information | |
592 is used to promote variables from in-memory addressable objects to | |
593 non-aliased variables that can be renamed into SSA form. We also | |
594 update the @code{VDEF}/@code{VUSE} memory tags for non-renameable | |
595 aggregates so that we get fewer false kills. The pass is located | |
596 in @file{tree-ssa-alias.c} and is described by @code{pass_may_alias}. | |
597 | |
598 Interprocedural points-to information is located in | |
599 @file{tree-ssa-structalias.c} and described by @code{pass_ipa_pta}. | |
600 | |
601 @item Profiling | |
602 | |
111 | 603 This pass instruments the function in order to collect runtime block |
0 | 604 and value profiling data. Such data may be fed back into the compiler |
605 on a subsequent run so as to allow optimization based on expected | |
111 | 606 execution frequencies. The pass is located in @file{tree-profile.c} and |
607 is described by @code{pass_ipa_tree_profile}. | |
608 | |
609 @item Static profile estimation | |
610 | |
611 This pass implements a series of heuristics to guess the probabilities | |
612 of branches. The resulting predictions are turned into an edge profile | |
613 by propagating branches across the control flow graphs. | |
614 The pass is located in @file{tree-profile.c} and is described by | |
615 @code{pass_profile}. | |
0 | 616 |
617 @item Lower complex arithmetic | |
618 | |
619 This pass rewrites complex arithmetic operations into their component | |
620 scalar arithmetic operations. The pass is located in @file{tree-complex.c} | |
621 and is described by @code{pass_lower_complex}. | |
622 | |
623 @item Scalar replacement of aggregates | |
624 | |
625 This pass rewrites suitable non-aliased local aggregate variables into | |
626 a set of scalar variables. The resulting scalar variables are | |
627 rewritten into SSA form, which allows subsequent optimization passes | |
628 to do a significantly better job with them. The pass is located in | |
629 @file{tree-sra.c} and is described by @code{pass_sra}. | |
630 | |
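The effect can be pictured with the following C sketch.  It is an
assumption-laden illustration: the names are hypothetical, and the
pass itself operates on GIMPLE rather than on source code.

```c
struct point
{
  int x;
  int y;
};

/* Before: a local aggregate whose address never escapes.  */
int
sum_with_aggregate (int a, int b)
{
  struct point p;
  p.x = a;
  p.y = b;
  return p.x + p.y;
}

/* After (conceptually): each field is an independent scalar that can
   be renamed into SSA form and optimized like any other variable.  */
int
sum_with_scalars (int a, int b)
{
  int p_x = a;
  int p_y = b;
  return p_x + p_y;
}
```
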
631 @item Dead store elimination | |
632 | |
633 This pass eliminates stores to memory that are subsequently overwritten | |
634 by another store, without any intervening loads. The pass is located | |
635 in @file{tree-ssa-dse.c} and is described by @code{pass_dse}. | |
636 | |
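A minimal sketch of the situation this pass targets, using
hypothetical function names:

```c
/* Before: the first store is overwritten before any load can
   observe it, so it is dead.  */
void
store_twice (int *p, int v)
{
  *p = 0;   /* dead store */
  *p = v;
}

/* After: only the final store remains.  */
void
store_once (int *p, int v)
{
  *p = v;
}
```
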
637 @item Tail recursion elimination | |
638 | |
639 This pass transforms tail recursion into a loop. It is located in | |
640 @file{tree-tailcall.c} and is described by @code{pass_tail_recursion}. | |
641 | |
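The following sketch shows the transformation at the source level.
The function names are hypothetical; the loop form is what the pass
conceptually produces for a self-recursive call in tail position.

```c
/* Before: the recursive call is the last action of the function.  */
unsigned long
sum_to_recursive (unsigned long n, unsigned long acc)
{
  if (n == 0)
    return acc;
  return sum_to_recursive (n - 1, acc + n);
}

/* After: the call is replaced by updating the parameters and
   jumping back to the start of the function.  */
unsigned long
sum_to_loop (unsigned long n, unsigned long acc)
{
  while (n != 0)
    {
      acc += n;
      n -= 1;
    }
  return acc;
}
```
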
642 @item Forward store motion | |
643 | |
644 This pass sinks stores and assignments down the flowgraph closer to their | |
645 use point. The pass is located in @file{tree-ssa-sink.c} and is | |
646 described by @code{pass_sink_code}. | |
647 | |
648 @item Partial redundancy elimination | |
649 | |
650 This pass eliminates partially redundant computations, as well as | |
651 performing load motion. The pass is located in @file{tree-ssa-pre.c} | |
652 and is described by @code{pass_pre}. | |
653 | |
654 Just before partial redundancy elimination, if | |
655 @option{-funsafe-math-optimizations} is on, GCC tries to convert | |
656 divisions to multiplications by the reciprocal. The pass is located | |
657 in @file{tree-ssa-math-opts.c} and is described by | |
658 @code{pass_cse_reciprocal}. | |
659 | |
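To make the term ``partially redundant'' concrete, here is a hedged C
sketch with hypothetical names.  In the first function @code{a + b} is
recomputed after the join although it is already available on one of
the two incoming paths.

```c
/* Before: A + B is redundant only along the path where FLAG is set.  */
int
pre_before (int a, int b, int flag)
{
  int x = 0;
  if (flag)
    x = a + b;
  int y = a + b;
  return x + y;
}

/* After: the computation is inserted on the path that lacked it, so
   the one after the join becomes fully redundant and reuses T.  */
int
pre_after (int a, int b, int flag)
{
  int x = 0;
  int t;
  if (flag)
    {
      t = a + b;
      x = t;
    }
  else
    t = a + b;
  int y = t;
  return x + y;
}
```

Each path through @code{pre_after} evaluates the addition exactly
once.
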
660 @item Full redundancy elimination | |
661 | |
662 This is a simpler form of PRE that only eliminates redundancies that | |
111 | 663 occur on all paths. It is located in @file{tree-ssa-pre.c} and |
0 | 664 described by @code{pass_fre}. |
665 | |
666 @item Loop optimization | |
667 | |
668 The main driver of the pass is placed in @file{tree-ssa-loop.c} | |
669 and described by @code{pass_loop}. | |
670 | |
671 The optimizations performed by this pass are: | |
672 | |
673 Loop invariant motion. This pass moves only invariants that | |
19 | 674 would be hard to handle on RTL level (function calls, operations that expand to |
0 | 675 nontrivial sequences of insns). With @option{-funswitch-loops} it also moves |
676 operands of conditions that are invariant out of the loop, so that we can use | |
677 just trivial invariantness analysis in loop unswitching. The pass also includes | |
678 store motion. The pass is implemented in @file{tree-ssa-loop-im.c}. | |
679 | |
680 Canonical induction variable creation. This pass creates a simple counter | |
681 for number of iterations of the loop and replaces the exit condition of the | |
682 loop using it, in case a complicated analysis is necessary to determine | |
683 the number of iterations. Later optimizations then may determine the number | |
684 easily. The pass is implemented in @file{tree-ssa-loop-ivcanon.c}. | |
685 | |
686 Induction variable optimizations. This pass performs standard induction | |
687 variable optimizations, including strength reduction, induction variable | |
688 merging and induction variable elimination. The pass is implemented in | |
689 @file{tree-ssa-loop-ivopts.c}. | |
690 | |
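Strength reduction, one of the optimizations listed here, can be
sketched as follows.  The names are hypothetical and unsigned
arithmetic is used so that wrap-around is well defined.

```c
/* Before: every iteration multiplies the induction variable.  */
void
fill_by_multiply (unsigned *out, unsigned n, unsigned stride)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = i * stride;
}

/* After: the multiplication is replaced by a second induction
   variable that is advanced by an addition.  */
void
fill_by_add (unsigned *out, unsigned n, unsigned stride)
{
  unsigned v = 0;
  for (unsigned i = 0; i < n; i++)
    {
      out[i] = v;
      v += stride;
    }
}
```
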
691 Loop unswitching. This pass moves the conditional jumps that are invariant | |
692 out of the loops. To achieve this, a duplicate of the loop is created for | |
693 each possible outcome of conditional jump(s). The pass is implemented in | |
111 | 694 @file{tree-ssa-loop-unswitch.c}. |
695 | |
696 Loop splitting. If a loop contains a conditional statement that is | |
697 always true for one part of the iteration space and false for the other, | |
698 this pass splits the loop into two, one dealing with one side, the other | |
699 only with the other, thereby removing one inner-loop conditional. The | |
700 pass is implemented in @file{tree-ssa-loop-split.c}. | |
0 | 701 |
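A hedged sketch of loop splitting, with hypothetical names: the
condition @code{i < m} holds for a prefix of the iteration space and
fails for the rest, so the loop can be cut at that point.

```c
/* Before: the conditional is evaluated in every iteration.  */
long
sum_unsplit (const int *a, const int *b, int n, int m)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    {
      if (i < m)
        s += a[i];
      else
        s += b[i];
    }
  return s;
}

/* After: two loops, each without the inner conditional.  The split
   point is M clamped to the range [0, N].  */
long
sum_split (const int *a, const int *b, int n, int m)
{
  long s = 0;
  int mid = m < n ? m : n;
  if (mid < 0)
    mid = 0;
  int i = 0;
  for (; i < mid; i++)
    s += a[i];
  for (; i < n; i++)
    s += b[i];
  return s;
}
```
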
702 The optimizations also use various utility functions contained in | |
703 @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and | |
704 @file{cfgloopmanip.c}. | |
705 | |
706 Vectorization. This pass transforms loops to operate on vector types | |
707 instead of scalar types. Data parallelism across loop iterations is exploited | |
111 | 708 to group data elements from consecutive iterations into a vector and operate |
709 on them in parallel. Depending on available target support the loop is | |
0 | 710 conceptually unrolled by a factor @code{VF} (vectorization factor), which is |
111 | 711 the number of elements operated upon in parallel in each iteration, and the |
0 | 712 @code{VF} copies of each scalar operation are fused to form a vector operation. |
713 Additional loop transformations such as peeling and versioning may take place | |
111 | 714 to align the number of iterations, and to align the memory accesses in the |
55 | 715 loop. |
716 The pass is implemented in @file{tree-vectorizer.c} (the main driver), | |
111 | 717 @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts |
718 and general loop utilities), @file{tree-vect-slp.c} (loop-aware SLP | |
55 | 719 functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}. |
0 | 720 Analysis of data references is in @file{tree-data-ref.c}. |
721 | |
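The conceptual unrolling by @code{VF} can be pictured with scalar C
code.  This is only a sketch: the value of @code{VF} and the function
names are assumptions made for illustration, and a real vectorized
loop uses vector operations rather than an inner scalar loop.

```c
#define VF 4   /* assumed vectorization factor */

/* Original scalar loop.  */
void
add_scalar (int *c, const int *a, const int *b, int n)
{
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}

/* Conceptually unrolled loop.  Each iteration of the main loop
   handles VF consecutive elements; the vectorizer would fuse these
   VF scalar additions into one vector addition.  */
void
add_unrolled_by_vf (int *c, const int *a, const int *b, int n)
{
  int i = 0;
  for (; i + VF <= n; i += VF)
    for (int lane = 0; lane < VF; lane++)
      c[i + lane] = a[i + lane] + b[i + lane];

  /* Epilogue: leftover iterations when N is not a multiple of VF.  */
  for (; i < n; i++)
    c[i] = a[i] + b[i];
}
```

The epilogue loop corresponds to the peeling mentioned in the text,
which adjusts the number of iterations to a multiple of @code{VF}.
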
55 | 722 SLP Vectorization. This pass performs vectorization of straight-line code. The |
723 pass is implemented in @file{tree-vectorizer.c} (the main driver), | |
111 | 724 @file{tree-vect-slp.c}, @file{tree-vect-stmts.c} and |
55 | 725 @file{tree-vect-data-refs.c}. |
726 | |
0 | 727 Autoparallelization. This pass splits the loop iteration space to run |
728 into several threads. The pass is implemented in @file{tree-parloops.c}. | |
729 | |
55 | 730 Graphite is a loop transformation framework based on the polyhedral |
731 model. Graphite stands for Gimple Represented as Polyhedra. The | |
732 internals of this infrastructure are documented in | |
733 @w{@uref{http://gcc.gnu.org/wiki/Graphite}}. The passes working on | |
734 this representation are implemented in the various @file{graphite-*} | |
735 files. | |
736 | |
0 | 737 @item Tree level if-conversion for vectorizer |
738 | |
739 This pass applies if-conversion to simple loops to help the vectorizer. | |
740 We identify if-convertible loops, if-convert statements and merge | |
741 basic blocks into one big block. The idea is to present the loop in such | |
742 a form that the vectorizer can have a one-to-one mapping between statements | |
111 | 743 and available vector operations. This pass is located in |
67 | 744 @file{tree-if-conv.c} and is described by @code{pass_if_conversion}. |
0 | 745 |
746 @item Conditional constant propagation | |
747 | |
748 This pass relaxes a lattice of values in order to identify those | |
749 that must be constant even in the presence of conditional branches. | |
750 The pass is located in @file{tree-ssa-ccp.c} and is described | |
751 by @code{pass_ccp}. | |
752 | |
753 A related pass that works on memory loads and stores, and not just | |
754 register values, is located in @file{tree-ssa-ccp.c} and described by | |
755 @code{pass_store_ccp}. | |
756 | |
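The word ``conditional'' matters here, as the following hedged sketch
(hypothetical names) tries to show.  Because @code{x} is known to be
4, the @code{else} arm is unreachable, so its non-constant assignment
does not prevent @code{y} from being treated as a constant.

```c
/* Before: Y looks non-constant if both arms are considered.  */
int
ccp_before (int flag)
{
  int x = 4;
  int y;
  if (x > 3)      /* always true, since X is the constant 4 */
    y = 1;
  else
    y = flag;     /* unreachable */
  return y + x;
}

/* After: the branch folds away and the result is a constant.  */
int
ccp_after (int flag)
{
  (void) flag;
  return 5;
}
```
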
757 @item Conditional copy propagation | |
758 | |
759 This is similar to constant propagation but the lattice of values is | |
760 the ``copy-of'' relation. It eliminates redundant copies from the | |
761 code. The pass is located in @file{tree-ssa-copy.c} and described by | |
762 @code{pass_copy_prop}. | |
763 | |
764 A related pass that works on memory copies, and not just register | |
765 copies, is located in @file{tree-ssa-copy.c} and described by | |
766 @code{pass_store_copy_prop}. | |
767 | |
768 @item Value range propagation | |
769 | |
770 This transformation is similar to constant propagation but | |
771 instead of propagating single constant values, it propagates | |
772 known value ranges. The implementation is based on Patterson's | |
773 range propagation algorithm (Accurate Static Branch Prediction by | |
774 Value Range Propagation, J. R. C. Patterson, PLDI '95). In | |
775 contrast to Patterson's algorithm, this implementation does not | |
776 propagate branch probabilities, nor does it use more than a single | |
777 range per SSA name. This means that the current implementation | |
778 cannot be used for branch prediction (though adapting it would | |
779 not be difficult). The pass is located in @file{tree-vrp.c} and is | |
780 described by @code{pass_vrp}. | |
781 | |
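A small sketch of what a known value range allows, with hypothetical
function names:

```c
/* Before: inside the outer test the range of I is [0, 9], so the
   inner comparison can never be true.  */
int
vrp_before (int i)
{
  if (i >= 0 && i < 10)
    {
      if (i > 20)
        return -1;
      return i;
    }
  return 0;
}

/* After: the inner test has been folded away using the range.  */
int
vrp_after (int i)
{
  if (i >= 0 && i < 10)
    return i;
  return 0;
}
```
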
782 @item Folding built-in functions | |
783 | |
784 This pass simplifies built-in functions, as applicable, with constant | |
785 arguments or with inferable string lengths. It is located in | |
786 @file{tree-ssa-ccp.c} and is described by @code{pass_fold_builtins}. | |
787 | |
788 @item Split critical edges | |
789 | |
790 This pass identifies critical edges and inserts empty basic blocks | |
791 such that the edge is no longer critical. The pass is located in | |
792 @file{tree-cfg.c} and is described by @code{pass_split_crit_edges}. | |
793 | |
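An edge is critical when its source block has more than one successor
and its destination block has more than one predecessor.  The
following sketch models a control flow graph as a plain array of
edges.  The data structure and function names are hypothetical and
much simpler than GCC's own CFG representation.

```c
#include <stddef.h>

struct sketch_edge
{
  int src;    /* index of the source block */
  int dest;   /* index of the destination block */
};

/* Return nonzero if edge E is critical.  */
int
sketch_edge_is_critical (const struct sketch_edge *edges, size_t n_edges,
                         size_t e)
{
  size_t succs = 0, preds = 0;
  for (size_t i = 0; i < n_edges; i++)
    {
      if (edges[i].src == edges[e].src)
        succs++;
      if (edges[i].dest == edges[e].dest)
        preds++;
    }
  return succs > 1 && preds > 1;
}

/* Split edge E by routing it through the fresh, empty block
   NEW_BLOCK.  The edge from NEW_BLOCK to the old destination is
   written to *ADDED.  */
void
sketch_split_edge (struct sketch_edge *edges, size_t e, int new_block,
                   struct sketch_edge *added)
{
  added->src = new_block;
  added->dest = edges[e].dest;
  edges[e].dest = new_block;
}
```

For an @code{if} without an @code{else} (edges 0->1, 0->2 and 1->2),
only the edge 0->2 is critical.  After it is split, no edge in the
graph is critical.
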
794 @item Control dependence dead code elimination | |
795 | |
796 This pass is a stronger form of dead code elimination that can | |
797 eliminate unnecessary control flow statements. It is located | |
798 in @file{tree-ssa-dce.c} and is described by @code{pass_cd_dce}. | |
799 | |
800 @item Tail call elimination | |
801 | |
802 This pass identifies function calls that may be rewritten into | |
803 jumps. No code transformation is actually applied here, but the | |
804 data and control flow problem is solved. The code transformation | |
805 requires target support, and so is delayed until RTL@. In the | |
806 meantime @code{CALL_EXPR_TAILCALL} is set indicating the possibility. | |
807 The pass is located in @file{tree-tailcall.c} and is described by | |
808 @code{pass_tail_calls}. The RTL transformation is handled by | |
809 @code{fixup_tail_calls} in @file{calls.c}. | |
810 | |
811 @item Warn for function return without value | |
812 | |
813 For non-void functions, this pass locates return statements that do | |
814 not specify a value and issues a warning. Such a statement may have | |
815 been injected by falling off the end of the function. This pass is | |
816 run last so that we have as much time as possible to prove that the | |
817 statement is not reachable. It is located in @file{tree-cfg.c} and | |
818 is described by @code{pass_warn_function_return}. | |
819 | |
820 @item Leave static single assignment form | |
821 | |
822 This pass rewrites the function such that it is in normal form. At | |
823 the same time, we eliminate as many single-use temporaries as possible, | |
824 so the intermediate language is no longer GIMPLE, but GENERIC@. The | |
825 pass is located in @file{tree-outof-ssa.c} and is described by | |
826 @code{pass_del_ssa}. | |
827 | |
828 @item Merge PHI nodes that feed into one another | |
829 | |
830 This is part of the CFG cleanup passes. It attempts to join PHI nodes | |
831 from a forwarder CFG block into another block with PHI nodes. The | |
832 pass is located in @file{tree-cfgcleanup.c} and is described by | |
833 @code{pass_merge_phi}. | |
834 | |
835 @item Return value optimization | |
836 | |
837 If a function always returns the same local variable, and that local | |
838 variable is an aggregate type, then the variable is replaced with the | |
839 return value for the function (i.e., the function's @code{DECL_RESULT}). This | |
840 is equivalent to the C++ named return value optimization applied to | |
841 GIMPLE@. The pass is located in @file{tree-nrv.c} and is described by | |
842 @code{pass_nrv}. | |
843 | |
844 @item Return slot optimization | |
845 | |
846 If a function returns a memory object and is called as @code{var = | |
847 foo()}, this pass tries to change the call so that the address of | |
848 @code{var} is sent to the callee to avoid an extra memory copy. This | |
849 pass is located in @file{tree-nrv.c} and is described by | |
850 @code{pass_return_slot}. | |
851 | |
852 @item Optimize calls to @code{__builtin_object_size} | |
853 | |
854 This is a propagation pass similar to CCP that tries to remove calls | |
855 to @code{__builtin_object_size} when the size of the object can be | |
856 computed at compile-time. This pass is located in | |
857 @file{tree-object-size.c} and is described by | |
858 @code{pass_object_sizes}. | |
859 | |
860 @item Loop invariant motion | |
861 | |
862 This pass moves expensive loop-invariant computations out of loops. | |
863 The pass is located in @file{tree-ssa-loop.c} and described by | |
864 @code{pass_lim}. | |
865 | |
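A minimal sketch of the idea, with hypothetical names and unsigned
arithmetic so that hoisting the product is safe even when the loop
does not execute:

```c
/* Before: A * B is recomputed in every iteration although neither
   operand changes inside the loop.  */
void
lim_before (unsigned *out, int n, unsigned a, unsigned b)
{
  for (int i = 0; i < n; i++)
    out[i] = a * b + (unsigned) i;
}

/* After: the invariant product is computed once, ahead of the loop.  */
void
lim_after (unsigned *out, int n, unsigned a, unsigned b)
{
  unsigned t = a * b;
  for (int i = 0; i < n; i++)
    out[i] = t + (unsigned) i;
}
```
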
866 @item Loop nest optimizations | |
867 | |
868 This is a family of loop transformations that works on loop nests. It | |
869 includes loop interchange, scaling, skewing and reversal and they are | |
870 all geared to the optimization of data locality in array traversals | |
871 and the removal of dependencies that hamper optimizations such as loop | |
872 parallelization and vectorization. The pass is located in | |
873 @file{tree-loop-linear.c} and described by | |
874 @code{pass_linear_transform}. | |
875 | |
876 @item Removal of empty loops | |
877 | |
878 This pass removes loops with no code in them. The pass is located in | |
879 @file{tree-ssa-loop-ivcanon.c} and described by | |
880 @code{pass_empty_loop}. | |
881 | |
882 @item Unrolling of small loops | |
883 | |
884 This pass completely unrolls loops with few iterations. The pass | |
885 is located in @file{tree-ssa-loop-ivcanon.c} and described by | |
886 @code{pass_complete_unroll}. | |
887 | |
888 @item Predictive commoning | |
889 | |
890 This pass makes the code reuse the computations from the previous | |
891 iterations of the loops, especially loads and stores to memory. | |
892 It does so by storing the values of these computations to a bank | |
893 of temporary variables that are rotated at the end of the loop. To avoid | |
894 the need for this rotation, the loop is then unrolled and the copies | |
895 of the loop body are rewritten to use the appropriate version of | |
896 the temporary variable. This pass is located in @file{tree-predcom.c} | |
897 and described by @code{pass_predcom}. | |
898 | |
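The two steps described here, rotating temporaries and then unrolling
to remove the rotation, can be sketched in C as follows.  The function
names are hypothetical and the code is an illustration of the idea,
not of the output GCC actually produces.

```c
/* Original loop: each iteration reloads two values that earlier
   iterations have just computed and stored.  */
void
fill_plain (int *a, int n)
{
  for (int i = 0; i + 2 < n; i++)
    a[i + 2] = a[i] + a[i + 1];
}

/* Step 1: keep the reused values in temporaries and rotate them at
   the end of each iteration.  */
void
fill_rotating (int *a, int n)
{
  if (n < 3)
    return;
  int t0 = a[0], t1 = a[1];
  for (int i = 0; i + 2 < n; i++)
    {
      int t2 = t0 + t1;
      a[i + 2] = t2;
      t0 = t1;   /* rotate the bank of temporaries */
      t1 = t2;
    }
}

/* Step 2: unroll by two so that each copy of the body uses the
   appropriate temporary and no rotation is needed.  */
void
fill_unrolled (int *a, int n)
{
  if (n < 3)
    return;
  int t0 = a[0], t1 = a[1];
  int i = 0;
  for (; i + 3 < n; i += 2)
    {
      t0 = t0 + t1;   /* value of a[i + 2] */
      a[i + 2] = t0;
      t1 = t1 + t0;   /* value of a[i + 3] */
      a[i + 3] = t1;
    }
  if (i + 2 < n)      /* leftover iteration for an odd trip count */
    a[i + 2] = t0 + t1;
}
```
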
899 @item Array prefetching | |
900 | |
901 This pass issues prefetch instructions for array references inside | |
902 loops. The pass is located in @file{tree-ssa-loop-prefetch.c} and | |
903 described by @code{pass_loop_prefetch}. | |
904 | |
905 @item Reassociation | |
906 | |
907 This pass rewrites arithmetic expressions to enable optimizations that | |
908 operate on them, like redundancy elimination and vectorization. The | |
909 pass is located in @file{tree-ssa-reassoc.c} and described by | |
910 @code{pass_reassoc}. | |
911 | |
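As a hedged example of why a canonical association helps, consider
the sketch below (hypothetical names, unsigned arithmetic so that
reordering is always valid).  Once both sums are written in the same
order, redundancy elimination can see that they are equal.

```c
/* Before: X and Y are the same sum, associated differently.  */
unsigned
reassoc_before (unsigned a, unsigned b, unsigned c)
{
  unsigned x = (a + b) + c;
  unsigned y = (c + a) + b;
  return x * y;
}

/* After reassociation and redundancy elimination: one sum remains.  */
unsigned
reassoc_after (unsigned a, unsigned b, unsigned c)
{
  unsigned t = (a + b) + c;
  return t * t;
}
```
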
912 @item Optimization of @code{stdarg} functions | |
913 | |
914 This pass tries to avoid the saving of register arguments into the | |
915 stack on entry to @code{stdarg} functions. If the function doesn't | |
916 use any @code{va_start} macros, no registers need to be saved. If | |
917 @code{va_start} macros are used but the @code{va_list} variables don't | |
918 escape the function, it is only necessary to save registers that will | |
919 be used in @code{va_arg} macros. For instance, if @code{va_arg} is | |
920 only used with integral types in the function, floating point | |
921 registers don't need to be saved. This pass is located in | |
922 @file{tree-stdarg.c} and described by @code{pass_stdarg}. | |
923 | |
924 @end itemize | |
925 | |
926 @node RTL passes | |
927 @section RTL passes | |
928 | |
19 | 929 The following briefly describes the RTL generation and optimization |
930 passes that are run after the Tree optimization passes. | |
0 | 931 |
932 @itemize @bullet | |
933 @item RTL generation | |
934 | |
935 @c Avoiding overfull is tricky here. | |
936 The source files for RTL generation include | |
937 @file{stmt.c}, | |
938 @file{calls.c}, | |
939 @file{expr.c}, | |
940 @file{explow.c}, | |
941 @file{expmed.c}, | |
942 @file{function.c}, | |
943 @file{optabs.c} | |
944 and @file{emit-rtl.c}. | |
945 Also, the file | |
946 @file{insn-emit.c}, generated from the machine description by the | |
947 program @code{genemit}, is used in this pass. The header file | |
948 @file{expr.h} is used for communication within this pass. | |
949 | |
950 @findex genflags | |
951 @findex gencodes | |
952 The header files @file{insn-flags.h} and @file{insn-codes.h}, | |
953 generated from the machine description by the programs @code{genflags} | |
954 and @code{gencodes}, tell this pass which standard names are available | |
955 for use and which patterns correspond to them. | |
956 | |
19 | 957 @item Generation of exception landing pads |
0 | 958 |
959 This pass generates the glue that handles communication between the | |
960 exception handling library routines and the exception handlers within | |
961 the function. Entry points in the function that are invoked by the | |
962 exception handling library are called @dfn{landing pads}. The code | |
19 | 963 for this pass is located in @file{except.c}. |
0 | 964 |
19 | 965 @item Control flow graph cleanup |
0 | 966 |
967 This pass removes unreachable code, simplifies jumps to next, jumps to | |
968 jump, jumps across jumps, etc. The pass is run multiple times. | |
969 For historical reasons, it is occasionally referred to as the ``jump | |
970 optimization pass''. The bulk of the code for this pass is in | |
971 @file{cfgcleanup.c}, and there are support routines in @file{cfgrtl.c} | |
972 and @file{jump.c}. | |
973 | |
974 @item Forward propagation of single-def values | |
975 | |
976 This pass attempts to remove redundant computation by substituting | |
977 variables that come from a single definition, and | |
978 seeing if the result can be simplified. It performs copy propagation | |
979 and addressing mode selection. The pass is run twice, with values | |
19 | 980 being propagated into loops only on the second run. The code is |
981 located in @file{fwprop.c}. | |
0 | 982 |
983 @item Common subexpression elimination | |
984 | |
985 This pass removes redundant computation within basic blocks, and | |
986 optimizes addressing modes based on cost. The pass is run twice. | |
19 | 987 The code for this pass is located in @file{cse.c}. |
0 | 988 |
19 | 989 @item Global common subexpression elimination |
0 | 990 |
991 This pass performs two | |
992 different types of GCSE depending on whether you are optimizing for | |
993 size or not (LCM based GCSE tends to increase code size for a gain in | |
994 speed, while Morel-Renvoise based GCSE does not). | |
995 When optimizing for size, GCSE is done using Morel-Renvoise Partial | |
996 Redundancy Elimination, with the exception that it does not try to move | |
997 invariants out of loops---that is left to the loop optimization pass. | |
998 If MR PRE GCSE is done, code hoisting (aka unification) is also done, as | |
999 well as load motion. | |
1000 If you are optimizing for speed, LCM (lazy code motion) based GCSE is | |
1001 done. LCM is based on the work of Knoop, Ruthing, and Steffen. LCM | |
1002 based GCSE also does loop invariant code motion. We also perform load | |
1003 and store motion when optimizing for speed. | |
1004 Regardless of which type of GCSE is used, the GCSE pass also performs | |
1005 global constant and copy propagation. | |
1006 The source file for this pass is @file{gcse.c}, and the LCM routines | |
1007 are in @file{lcm.c}. | |
1008 | |
1009 @item Loop optimization | |
1010 | |
1011 This pass performs several loop related optimizations. | |
1012 The source files @file{cfgloopanal.c} and @file{cfgloopmanip.c} contain | |
1013 generic loop analysis and manipulation code. Initialization and finalization | |
1014 of loop structures is handled by @file{loop-init.c}. | |
1015 A loop invariant motion pass is implemented in @file{loop-invariant.c}. | |
111 | 1016 Basic block level optimizations---unrolling, and peeling loops--- |
1017 are implemented in @file{loop-unroll.c}. | |
0 | 1018 Replacing of the exit condition of loops by special machine-dependent |
1019 instructions is handled by @file{loop-doloop.c}. | |
1020 | |
1021 @item Jump bypassing | |
1022 | |
1023 This pass is an aggressive form of GCSE that transforms the control | |
1024 flow graph of a function by propagating constants into conditional | |
1025 branch instructions. The source file for this pass is @file{gcse.c}. | |
1026 | |
1027 @item If conversion | |
1028 | |
1029 This pass attempts to replace conditional branches and surrounding | |
1030 assignments with arithmetic, boolean value producing comparison | |
1031 instructions, and conditional move instructions. In the very last | |
111 | 1032 invocation after reload/LRA, it will generate predicated instructions |
19 | 1033 when supported by the target. The code is located in @file{ifcvt.c}. |
0 | 1034 |
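At the source level, the replacement of a branch by a comparison that
produces a boolean value can be sketched like this.  The names are
hypothetical, and the real pass works on RTL instructions.

```c
/* Before: a conditional branch around an increment.  */
int
count_negative_branchy (int count, int x)
{
  if (x < 0)
    count++;
  return count;
}

/* After: the comparison yields 0 or 1, which is added directly, so
   no branch is needed.  */
int
count_negative_branchless (int count, int x)
{
  return count + (x < 0);
}
```
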
1035 @item Web construction | |
1036 | |
1037 This pass splits independent uses of each pseudo-register. This can | |
1038 improve the effect of other transformations, such as CSE or register | |
19 | 1039 allocation. The code for this pass is located in @file{web.c}. |
0 | 1040 |
1041 @item Instruction combination | |
1042 | |
1043 This pass attempts to combine groups of two or three instructions that | |
1044 are related by data flow into single instructions. It combines the | |
1045 RTL expressions for the instructions by substitution, simplifies the | |
1046 result using algebra, and then attempts to match the result against | |
19 | 1047 the machine description. The code is located in @file{combine.c}. |
0 | 1048 |
19 | 1049 @item Mode switching optimization |
0 | 1050 |
1051 This pass looks for instructions that require the processor to be in a | |
1052 specific ``mode'' and minimizes the number of mode changes required to | |
1053 satisfy all users. What these modes are, and what they apply to are | |
19 | 1054 completely target-specific. The code for this pass is located in |
1055 @file{mode-switching.c}. | |
0 | 1056 |
1057 @cindex modulo scheduling | |
1058 @cindex sms, swing, software pipelining | |
1059 @item Modulo scheduling | |
1060 | |
1061 This pass looks at innermost loops and reorders their instructions | |
1062 by overlapping different iterations. Modulo scheduling is performed | |
19 | 1063 immediately before instruction scheduling. The code for this pass is |
1064 located in @file{modulo-sched.c}. | |
0 | 1065 |
1066 @item Instruction scheduling | |
1067 | |
1068 This pass looks for instructions whose output will not be available by | |
1069 the time that it is used in subsequent instructions. Memory loads and | |
1070 floating point instructions often have this behavior on RISC machines. | |
1071 It re-orders instructions within a basic block to try to separate the | |
1072 definition and use of items that otherwise would cause pipeline | |
1073 stalls. This pass is performed twice, before and after register | |
19 | 1074 allocation. The code for this pass is located in @file{haifa-sched.c}, |
0 | 1075 @file{sched-deps.c}, @file{sched-ebb.c}, @file{sched-rgn.c} and |
1076 @file{sched-vis.c}. | |
1077 | |
1078 @item Register allocation | |
1079 | |
1080 These passes make sure that all occurrences of pseudo registers are | |
1081 eliminated, either by allocating them to a hard register, replacing | |
1082 them by an equivalent expression (e.g.@: a constant) or by placing | |
1083 them on the stack. This is done in several subpasses: | |
1084 | |
1085 @itemize @bullet | |
1086 @item | |
1087 The integrated register allocator (@acronym{IRA}). It is called | |
1088 integrated because coalescing, register live range splitting, and hard | |
1089 register preferencing are done on-the-fly during coloring. It also | |
111 | 1090 has better integration with the reload/LRA pass. Pseudo-registers spilled |
1091 by the allocator or by reload/LRA still have a chance to get | |
1092 hard-registers if reload/LRA evicts some pseudo-registers from | |
0 | 1093 hard-registers. The allocator helps to choose better pseudos for |
1094 spilling based on their live ranges and to coalesce stack slots | |
1095 allocated for the spilled pseudo-registers. IRA is a regional | |
1096 register allocator which is transformed into a Chaitin-Briggs allocator | |
1097 if there is one region. By default, IRA chooses regions using | |
1098 register pressure but the user can force it to use one region or | |
1099 regions corresponding to all loops. | |
1100 | |
1101 Source files of the allocator are @file{ira.c}, @file{ira-build.c}, | |
1102 @file{ira-costs.c}, @file{ira-conflicts.c}, @file{ira-color.c}, | |
1103 @file{ira-emit.c}, @file{ira-lives.c}, plus header files @file{ira.h} | |
1104 and @file{ira-int.h} used for the communication between the allocator | |
1105 and the rest of the compiler and between the IRA files. | |
1106 | |
1107 @cindex reloading | |
1108 @item | |
1109 Reloading. This pass renumbers pseudo registers with the hardware | |
1110 register numbers they were allocated. Pseudo registers that did not | |
1111 get hard registers are replaced with stack slots. Then it finds | |
1112 instructions that are invalid because a value has failed to end up in | |
1113 a register, or has ended up in a register of the wrong kind. It fixes | |
1114 up these instructions by reloading the problematical values | |
1115 temporarily into registers. Additional instructions are generated to | |
1116 do the copying. | |
1117 | |
1118 The reload pass also optionally eliminates the frame pointer and inserts | |
1119 instructions to save and restore call-clobbered registers around calls. | |
1120 | |
1121 Source files are @file{reload.c} and @file{reload1.c}, plus the header | |
1122 @file{reload.h} used for communication between them. | |
111 | 1123 |
1124 @cindex Local Register Allocator (LRA) | |
1125 @item | |
1126 This pass is a modern replacement of the reload pass. Source files | |
1127 are @file{lra.c}, @file{lra-assign.c}, @file{lra-coalesce.c}, | |
1128 @file{lra-constraints.c}, @file{lra-eliminations.c}, | |
1129 @file{lra-lives.c}, @file{lra-remat.c}, @file{lra-spills.c}, the | |
1130 header @file{lra-int.h} used for communication between them, and the | |
1131 header @file{lra.h} used for communication between LRA and the rest of | |
1132 the compiler. | |
1133 | |
1134 Unlike the reload pass, intermediate LRA decisions are reflected in | |
1135 RTL as much as possible. This reduces the number of target-dependent | |
1136 macros and hooks, leaving instruction constraints as the primary | |
1137 source of control. | |
1138 | |
1139 LRA is run on targets for which @code{TARGET_LRA_P} returns true. | |
0 | 1140 @end itemize |
1141 | |
1142 @item Basic block reordering | |
1143 | |
1144 This pass implements profile guided code positioning. If profile | |
1145 information is not available, various types of static analysis are | |
1146 performed to make the predictions normally coming from the profile | |
1147 feedback (i.e., execution frequency, branch probability, etc.). It is | |
1148 implemented in the file @file{bb-reorder.c}, and the various | |
1149 prediction routines are in @file{predict.c}. | |
1150 | |
1151 @item Variable tracking | |
1152 | |
1153 This pass computes where the variables are stored at each | |
1154 position in code and generates notes describing the variable locations | |
1155 to RTL code. The location lists are then generated according to these | |
1156 notes to debug information if the debugging information format supports | |
19 | 1157 location lists. The code is located in @file{var-tracking.c}. |
0 | 1158 |
1159 @item Delayed branch scheduling | |
1160 | |
1161 This optional pass attempts to find instructions that can go into the | |
1162 delay slots of other instructions, usually jumps and calls. The code |
1163 for this pass is located in @file{reorg.c}. |
0 | 1164 |
1165 @item Branch shortening | |
1166 | |
1167 On many RISC machines, branch instructions have a limited range. | |
1168 Thus, longer sequences of instructions must be used for long branches. | |
1169 In this pass, the compiler figures out how far each instruction | |
1170 will be from each other instruction, and therefore whether the usual | |
1171 instructions, or the longer sequences, must be used for each branch. | |
1172 The code for this pass is located in @file{final.c}. |
0 | 1173 |
1174 @item Register-to-stack conversion | |
1175 | |
1176 Conversion from usage of some hard registers to usage of a register | |
1177 stack may be done at this point. Currently, this is supported only | |
1178 for the floating-point registers of the Intel 80387 coprocessor. The |
1179 code for this pass is located in @file{reg-stack.c}. |
0 | 1180 |
1181 @item Final | |
1182 | |
1183 This pass outputs the assembler code for the function. The source files | |
1184 are @file{final.c} plus @file{insn-output.c}; the latter is generated | |
1185 automatically from the machine description by the tool @file{genoutput}. | |
1186 The header file @file{conditions.h} is used for communication between | |
111 | 1187 these files. |
0 | 1188 |
1189 @item Debugging information output | |
1190 | |
1191 This is run after final because it must output the stack slot offsets | |
1192 for pseudo registers that did not get hard registers. Source files | |
131 | 1193 are @file{dbxout.c} for DBX symbol table format, @file{dwarfout.c} for |
1194 DWARF symbol table format, files @file{dwarf2out.c} and @file{dwarf2asm.c} | |
1195 for DWARF2 symbol table format, and @file{vmsdbgout.c} for VMS debug | |
1196 symbol table format. | |
0 | 1197 |
1198 @end itemize | |
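
The branch shortening item above describes choosing, for each branch,
between the usual instruction and a longer sequence depending on how far
away the target is. The following is a minimal sketch of that idea as a
fixed-point iteration; it is not the code in @file{final.c}. All names
(@code{struct insn}, @code{shorten_branches_sketch}) and the constants
for encoding sizes and branch range are hypothetical and chosen only for
illustration. The sketch also assumes that distance is measured between
the start addresses of the branch and its target, which real targets may
define differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical sizes (in bytes) and reach of a short branch.  */
enum { SHORT_SIZE = 2, LONG_SIZE = 6, SHORT_RANGE = 8 };

/* Hypothetical instruction record used only by this sketch.  */
struct insn
{
  int size;    /* Current encoded size in bytes.  */
  int target;  /* Index of the branch target, or -1 if not a branch.  */
  int addr;    /* Start address, recomputed on every pass.  */
};

/* Lay the instructions out sequentially using their current sizes.  */
static void
assign_addresses (struct insn *insns, int n)
{
  int addr = 0;
  for (int i = 0; i < n; i++)
    {
      insns[i].addr = addr;
      addr += insns[i].size;
    }
}

/* Start from the optimistic assumption that every branch is short and
   grow any branch whose target is out of range.  Growing one branch
   moves later instructions, so repeat until nothing changes.  Sizes
   only ever grow, so the loop terminates.  Returns the number of
   passes made over the instruction list.  */
static int
shorten_branches_sketch (struct insn *insns, int n)
{
  int passes = 0;
  bool changed = true;

  while (changed)
    {
      changed = false;
      passes++;
      assign_addresses (insns, n);

      for (int i = 0; i < n; i++)
        {
          if (insns[i].target < 0 || insns[i].size == LONG_SIZE)
            continue;
          assert (insns[i].target < n);

          int distance = abs (insns[insns[i].target].addr - insns[i].addr);
          if (distance > SHORT_RANGE)
            {
              insns[i].size = LONG_SIZE;
              changed = true;
            }
        }
    }
  return passes;
}
```

The point of iterating is that the decisions interact: lengthening one
branch can push the target of another branch, previously in range, out
of range, so a single pass over the instructions is not enough in
general.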
111 | 1199 |
1200 @node Optimization info | |
1201 @section Optimization info | |
1202 @include optinfo.texi |