0
|
1 @c markers: CROSSREF BUG TODO
|
|
2
|
|
3 @c Copyright (C) 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
|
|
4 @c 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free Software
|
|
5 @c Foundation, Inc.
|
|
6 @c This is part of the GCC manual.
|
|
7 @c For copying conditions, see the file gcc.texi.
|
|
8
|
|
9 @node Passes
|
|
10 @chapter Passes and Files of the Compiler
|
|
11 @cindex passes and files of the compiler
|
|
12 @cindex files and passes of the compiler
|
|
13 @cindex compiler passes and files
|
|
14
|
|
15 This chapter is dedicated to giving an overview of the optimization and
|
|
16 code generation passes of the compiler. In the process, it describes
|
|
17 some of the language front end interface, though this description is no
|
|
18 where near complete.
|
|
19
|
|
20 @menu
|
|
21 * Parsing pass:: The language front end turns text into bits.
|
|
22 * Gimplification pass:: The bits are turned into something we can optimize.
|
|
23 * Pass manager:: Sequencing the optimization passes.
|
|
24 * Tree-SSA passes:: Optimizations on a high-level representation.
|
|
25 * RTL passes:: Optimizations on a low-level representation.
|
|
26 @end menu
|
|
27
|
|
28 @node Parsing pass
|
|
29 @section Parsing pass
|
|
30 @cindex GENERIC
|
|
31 @findex lang_hooks.parse_file
|
|
32 The language front end is invoked only once, via
|
|
33 @code{lang_hooks.parse_file}, to parse the entire input. The language
|
|
34 front end may use any intermediate language representation deemed
|
|
35 appropriate. The C front end uses GENERIC trees (CROSSREF), plus
|
|
36 a double handful of language specific tree codes defined in
|
|
37 @file{c-common.def}. The Fortran front end uses a completely different
|
|
38 private representation.
|
|
39
|
|
40 @cindex GIMPLE
|
|
41 @cindex gimplification
|
|
42 @cindex gimplifier
|
|
43 @cindex language-independent intermediate representation
|
|
44 @cindex intermediate representation lowering
|
|
45 @cindex lowering, language-dependent intermediate representation
|
|
46 At some point the front end must translate the representation used in the
|
|
47 front end to a representation understood by the language-independent
|
|
48 portions of the compiler. Current practice takes one of two forms.
|
|
49 The C front end manually invokes the gimplifier (CROSSREF) on each function,
|
|
50 and uses the gimplifier callbacks to convert the language-specific tree
|
|
51 nodes directly to GIMPLE (CROSSREF) before passing the function off to
|
|
52 be compiled.
|
|
53 The Fortran front end converts from a private representation to GENERIC,
|
|
54 which is later lowered to GIMPLE when the function is compiled. Which
|
|
55 route to choose probably depends on how well GENERIC (plus extensions)
|
|
56 can be made to match up with the source language and necessary parsing
|
|
57 data structures.
|
|
58
|
|
59 BUG: Gimplification must occur before nested function lowering,
|
|
60 and nested function lowering must be done by the front end before
|
|
61 passing the data off to cgraph.
|
|
62
|
|
63 TODO: Cgraph should control nested function lowering. It would
|
|
64 only be invoked when it is certain that the outer-most function
|
|
65 is used.
|
|
66
|
|
67 TODO: Cgraph needs a gimplify_function callback. It should be
|
|
68 invoked when (1) it is certain that the function is used, (2)
|
|
69 warning flags specified by the user require some amount of
|
|
70 compilation in order to honor, (3) the language indicates that
|
|
71 semantic analysis is not complete until gimplification occurs.
|
|
72 Hum@dots{} this sounds overly complicated. Perhaps we should just
|
|
73 have the front end gimplify always; in most cases it's only one
|
|
74 function call.
|
|
75
|
|
76 The front end needs to pass all function definitions and top level
|
|
77 declarations off to the middle-end so that they can be compiled and
|
|
78 emitted to the object file. For a simple procedural language, it is
|
|
79 usually most convenient to do this as each top level declaration or
|
|
80 definition is seen. There is also a distinction to be made between
|
|
81 generating functional code and generating complete debug information.
|
|
82 The only thing that is absolutely required for functional code is that
|
|
83 function and data @emph{definitions} be passed to the middle-end. For
|
|
84 complete debug information, function, data and type declarations
|
|
85 should all be passed as well.
|
|
86
|
|
87 @findex rest_of_decl_compilation
|
|
88 @findex rest_of_type_compilation
|
|
89 @findex cgraph_finalize_function
|
|
90 In any case, the front end needs each complete top-level function or
|
|
91 data declaration, and each data definition should be passed to
|
|
92 @code{rest_of_decl_compilation}. Each complete type definition should
|
|
93 be passed to @code{rest_of_type_compilation}. Each function definition
|
|
94 should be passed to @code{cgraph_finalize_function}.
|
|
95
|
|
96 TODO: I know rest_of_compilation currently has all sorts of
|
|
97 rtl-generation semantics. I plan to move all code generation
|
|
98 bits (both tree and rtl) to compile_function. Should we hide
|
|
99 cgraph from the front ends and move back to rest_of_compilation
|
|
100 as the official interface? Possibly we should rename all three
|
|
101 interfaces such that the names match in some meaningful way and
|
|
102 that is more descriptive than "rest_of".
|
|
103
|
|
104 The middle-end will, at its option, emit the function and data
|
|
105 definitions immediately or queue them for later processing.
|
|
106
|
|
107 @node Gimplification pass
|
|
108 @section Gimplification pass
|
|
109
|
|
110 @cindex gimplification
|
|
111 @cindex GIMPLE
|
|
112 @dfn{Gimplification} is a whimsical term for the process of converting
|
|
113 the intermediate representation of a function into the GIMPLE language
|
|
114 (CROSSREF). The term stuck, and so words like ``gimplification'',
|
|
115 ``gimplify'', ``gimplifier'' and the like are sprinkled throughout this
|
|
116 section of code.
|
|
117
|
|
118 @cindex GENERIC
|
|
119 While a front end may certainly choose to generate GIMPLE directly if
|
|
120 it chooses, this can be a moderately complex process unless the
|
|
121 intermediate language used by the front end is already fairly simple.
|
|
122 Usually it is easier to generate GENERIC trees plus extensions
|
|
123 and let the language-independent gimplifier do most of the work.
|
|
124
|
|
125 @findex gimplify_function_tree
|
|
126 @findex gimplify_expr
|
|
127 @findex lang_hooks.gimplify_expr
|
|
128 The main entry point to this pass is @code{gimplify_function_tree}
|
|
129 located in @file{gimplify.c}. From here we process the entire
|
|
130 function gimplifying each statement in turn. The main workhorse
|
|
131 for this pass is @code{gimplify_expr}. Approximately everything
|
|
132 passes through here at least once, and it is from here that we
|
|
133 invoke the @code{lang_hooks.gimplify_expr} callback.
|
|
134
|
|
135 The callback should examine the expression in question and return
|
|
136 @code{GS_UNHANDLED} if the expression is not a language specific
|
|
137 construct that requires attention. Otherwise it should alter the
|
|
138 expression in some way to such that forward progress is made toward
|
|
139 producing valid GIMPLE@. If the callback is certain that the
|
|
140 transformation is complete and the expression is valid GIMPLE, it
|
|
141 should return @code{GS_ALL_DONE}. Otherwise it should return
|
|
142 @code{GS_OK}, which will cause the expression to be processed again.
|
|
143 If the callback encounters an error during the transformation (because
|
|
144 the front end is relying on the gimplification process to finish
|
|
145 semantic checks), it should return @code{GS_ERROR}.
|
|
146
|
|
147 @node Pass manager
|
|
148 @section Pass manager
|
|
149
|
|
150 The pass manager is located in @file{passes.c}, @file{tree-optimize.c}
|
|
151 and @file{tree-pass.h}.
|
|
152 Its job is to run all of the individual passes in the correct order,
|
|
153 and take care of standard bookkeeping that applies to every pass.
|
|
154
|
|
155 The theory of operation is that each pass defines a structure that
|
|
156 represents everything we need to know about that pass---when it
|
|
157 should be run, how it should be run, what intermediate language
|
|
158 form or on-the-side data structures it needs. We register the pass
|
|
159 to be run in some particular order, and the pass manager arranges
|
|
160 for everything to happen in the correct order.
|
|
161
|
|
162 The actuality doesn't completely live up to the theory at present.
|
|
163 Command-line switches and @code{timevar_id_t} enumerations must still
|
|
164 be defined elsewhere. The pass manager validates constraints but does
|
|
165 not attempt to (re-)generate data structures or lower intermediate
|
|
166 language form based on the requirements of the next pass. Nevertheless,
|
|
167 what is present is useful, and a far sight better than nothing at all.
|
|
168
|
|
169 Each pass may have its own dump file (for GCC debugging purposes).
|
|
170 Passes without any names, or with a name starting with a star, do not
|
|
171 dump anything.
|
|
172
|
|
173 TODO: describe the global variables set up by the pass manager,
|
|
174 and a brief description of how a new pass should use it.
|
|
175 I need to look at what info rtl passes use first@enddots{}
|
|
176
|
|
177 @node Tree-SSA passes
|
|
178 @section Tree-SSA passes
|
|
179
|
|
180 The following briefly describes the tree optimization passes that are
|
|
181 run after gimplification and what source files they are located in.
|
|
182
|
|
183 @itemize @bullet
|
|
184 @item Remove useless statements
|
|
185
|
|
186 This pass is an extremely simple sweep across the gimple code in which
|
|
187 we identify obviously dead code and remove it. Here we do things like
|
|
188 simplify @code{if} statements with constant conditions, remove
|
|
189 exception handling constructs surrounding code that obviously cannot
|
|
190 throw, remove lexical bindings that contain no variables, and other
|
|
191 assorted simplistic cleanups. The idea is to get rid of the obvious
|
|
192 stuff quickly rather than wait until later when it's more work to get
|
|
193 rid of it. This pass is located in @file{tree-cfg.c} and described by
|
|
194 @code{pass_remove_useless_stmts}.
|
|
195
|
|
196 @item Mudflap declaration registration
|
|
197
|
|
198 If mudflap (@pxref{Optimize Options,,-fmudflap -fmudflapth
|
|
199 -fmudflapir,gcc,Using the GNU Compiler Collection (GCC)}) is
|
|
200 enabled, we generate code to register some variable declarations with
|
|
201 the mudflap runtime. Specifically, the runtime tracks the lifetimes of
|
|
202 those variable declarations that have their addresses taken, or whose
|
|
203 bounds are unknown at compile time (@code{extern}). This pass generates
|
|
204 new exception handling constructs (@code{try}/@code{finally}), and so
|
|
205 must run before those are lowered. In addition, the pass enqueues
|
|
206 declarations of static variables whose lifetimes extend to the entire
|
|
207 program. The pass is located in @file{tree-mudflap.c} and is described
|
|
208 by @code{pass_mudflap_1}.
|
|
209
|
|
210 @item OpenMP lowering
|
|
211
|
|
212 If OpenMP generation (@option{-fopenmp}) is enabled, this pass lowers
|
|
213 OpenMP constructs into GIMPLE.
|
|
214
|
|
215 Lowering of OpenMP constructs involves creating replacement
|
|
216 expressions for local variables that have been mapped using data
|
|
217 sharing clauses, exposing the control flow of most synchronization
|
|
218 directives and adding region markers to facilitate the creation of the
|
|
219 control flow graph. The pass is located in @file{omp-low.c} and is
|
|
220 described by @code{pass_lower_omp}.
|
|
221
|
|
222 @item OpenMP expansion
|
|
223
|
|
224 If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands
|
|
225 parallel regions into their own functions to be invoked by the thread
|
|
226 library. The pass is located in @file{omp-low.c} and is described by
|
|
227 @code{pass_expand_omp}.
|
|
228
|
|
229 @item Lower control flow
|
|
230
|
|
231 This pass flattens @code{if} statements (@code{COND_EXPR})
|
|
232 and moves lexical bindings (@code{BIND_EXPR}) out of line. After
|
|
233 this pass, all @code{if} statements will have exactly two @code{goto}
|
|
234 statements in its @code{then} and @code{else} arms. Lexical binding
|
|
235 information for each statement will be found in @code{TREE_BLOCK} rather
|
|
236 than being inferred from its position under a @code{BIND_EXPR}. This
|
|
237 pass is found in @file{gimple-low.c} and is described by
|
|
238 @code{pass_lower_cf}.
|
|
239
|
|
240 @item Lower exception handling control flow
|
|
241
|
|
242 This pass decomposes high-level exception handling constructs
|
|
243 (@code{TRY_FINALLY_EXPR} and @code{TRY_CATCH_EXPR}) into a form
|
|
244 that explicitly represents the control flow involved. After this
|
|
245 pass, @code{lookup_stmt_eh_region} will return a non-negative
|
|
246 number for any statement that may have EH control flow semantics;
|
|
247 examine @code{tree_can_throw_internal} or @code{tree_can_throw_external}
|
|
248 for exact semantics. Exact control flow may be extracted from
|
|
249 @code{foreach_reachable_handler}. The EH region nesting tree is defined
|
|
250 in @file{except.h} and built in @file{except.c}. The lowering pass
|
|
251 itself is in @file{tree-eh.c} and is described by @code{pass_lower_eh}.
|
|
252
|
|
253 @item Build the control flow graph
|
|
254
|
|
255 This pass decomposes a function into basic blocks and creates all of
|
|
256 the edges that connect them. It is located in @file{tree-cfg.c} and
|
|
257 is described by @code{pass_build_cfg}.
|
|
258
|
|
259 @item Find all referenced variables
|
|
260
|
|
261 This pass walks the entire function and collects an array of all
|
|
262 variables referenced in the function, @code{referenced_vars}. The
|
|
263 index at which a variable is found in the array is used as a UID
|
|
264 for the variable within this function. This data is needed by the
|
|
265 SSA rewriting routines. The pass is located in @file{tree-dfa.c}
|
|
266 and is described by @code{pass_referenced_vars}.
|
|
267
|
|
268 @item Enter static single assignment form
|
|
269
|
|
270 This pass rewrites the function such that it is in SSA form. After
|
|
271 this pass, all @code{is_gimple_reg} variables will be referenced by
|
|
272 @code{SSA_NAME}, and all occurrences of other variables will be
|
|
273 annotated with @code{VDEFS} and @code{VUSES}; PHI nodes will have
|
|
274 been inserted as necessary for each basic block. This pass is
|
|
275 located in @file{tree-ssa.c} and is described by @code{pass_build_ssa}.
|
|
276
|
|
277 @item Warn for uninitialized variables
|
|
278
|
|
279 This pass scans the function for uses of @code{SSA_NAME}s that
|
|
280 are fed by default definition. For non-parameter variables, such
|
|
281 uses are uninitialized. The pass is run twice, before and after
|
|
282 optimization (if turned on). In the first pass we only warn for uses that are
|
|
283 positively uninitialized; in the second pass we warn for uses that
|
|
284 are possibly uninitialized. The pass is located in @file{tree-ssa.c}
|
|
285 and is defined by @code{pass_early_warn_uninitialized} and
|
|
286 @code{pass_late_warn_uninitialized}.
|
|
287
|
|
288 @item Dead code elimination
|
|
289
|
|
290 This pass scans the function for statements without side effects whose
|
|
291 result is unused. It does not do memory life analysis, so any value
|
|
292 that is stored in memory is considered used. The pass is run multiple
|
|
293 times throughout the optimization process. It is located in
|
|
294 @file{tree-ssa-dce.c} and is described by @code{pass_dce}.
|
|
295
|
|
296 @item Dominator optimizations
|
|
297
|
|
298 This pass performs trivial dominator-based copy and constant propagation,
|
|
299 expression simplification, and jump threading. It is run multiple times
|
|
300 throughout the optimization process. It it located in @file{tree-ssa-dom.c}
|
|
301 and is described by @code{pass_dominator}.
|
|
302
|
|
303 @item Forward propagation of single-use variables
|
|
304
|
|
305 This pass attempts to remove redundant computation by substituting
|
|
306 variables that are used once into the expression that uses them and
|
|
307 seeing if the result can be simplified. It is located in
|
|
308 @file{tree-ssa-forwprop.c} and is described by @code{pass_forwprop}.
|
|
309
|
|
310 @item Copy Renaming
|
|
311
|
|
312 This pass attempts to change the name of compiler temporaries involved in
|
|
313 copy operations such that SSA->normal can coalesce the copy away. When compiler
|
|
314 temporaries are copies of user variables, it also renames the compiler
|
|
315 temporary to the user variable resulting in better use of user symbols. It is
|
|
316 located in @file{tree-ssa-copyrename.c} and is described by
|
|
317 @code{pass_copyrename}.
|
|
318
|
|
319 @item PHI node optimizations
|
|
320
|
|
321 This pass recognizes forms of PHI inputs that can be represented as
|
|
322 conditional expressions and rewrites them into straight line code.
|
|
323 It is located in @file{tree-ssa-phiopt.c} and is described by
|
|
324 @code{pass_phiopt}.
|
|
325
|
|
326 @item May-alias optimization
|
|
327
|
|
328 This pass performs a flow sensitive SSA-based points-to analysis.
|
|
329 The resulting may-alias, must-alias, and escape analysis information
|
|
330 is used to promote variables from in-memory addressable objects to
|
|
331 non-aliased variables that can be renamed into SSA form. We also
|
|
332 update the @code{VDEF}/@code{VUSE} memory tags for non-renameable
|
|
333 aggregates so that we get fewer false kills. The pass is located
|
|
334 in @file{tree-ssa-alias.c} and is described by @code{pass_may_alias}.
|
|
335
|
|
336 Interprocedural points-to information is located in
|
|
337 @file{tree-ssa-structalias.c} and described by @code{pass_ipa_pta}.
|
|
338
|
|
339 @item Profiling
|
|
340
|
|
341 This pass rewrites the function in order to collect runtime block
|
|
342 and value profiling data. Such data may be fed back into the compiler
|
|
343 on a subsequent run so as to allow optimization based on expected
|
|
344 execution frequencies. The pass is located in @file{predict.c} and
|
|
345 is described by @code{pass_profile}.
|
|
346
|
|
347 @item Lower complex arithmetic
|
|
348
|
|
349 This pass rewrites complex arithmetic operations into their component
|
|
350 scalar arithmetic operations. The pass is located in @file{tree-complex.c}
|
|
351 and is described by @code{pass_lower_complex}.
|
|
352
|
|
353 @item Scalar replacement of aggregates
|
|
354
|
|
355 This pass rewrites suitable non-aliased local aggregate variables into
|
|
356 a set of scalar variables. The resulting scalar variables are
|
|
357 rewritten into SSA form, which allows subsequent optimization passes
|
|
358 to do a significantly better job with them. The pass is located in
|
|
359 @file{tree-sra.c} and is described by @code{pass_sra}.
|
|
360
|
|
361 @item Dead store elimination
|
|
362
|
|
363 This pass eliminates stores to memory that are subsequently overwritten
|
|
364 by another store, without any intervening loads. The pass is located
|
|
365 in @file{tree-ssa-dse.c} and is described by @code{pass_dse}.
|
|
366
|
|
367 @item Tail recursion elimination
|
|
368
|
|
369 This pass transforms tail recursion into a loop. It is located in
|
|
370 @file{tree-tailcall.c} and is described by @code{pass_tail_recursion}.
|
|
371
|
|
372 @item Forward store motion
|
|
373
|
|
374 This pass sinks stores and assignments down the flowgraph closer to their
|
|
375 use point. The pass is located in @file{tree-ssa-sink.c} and is
|
|
376 described by @code{pass_sink_code}.
|
|
377
|
|
378 @item Partial redundancy elimination
|
|
379
|
|
380 This pass eliminates partially redundant computations, as well as
|
|
381 performing load motion. The pass is located in @file{tree-ssa-pre.c}
|
|
382 and is described by @code{pass_pre}.
|
|
383
|
|
384 Just before partial redundancy elimination, if
|
|
385 @option{-funsafe-math-optimizations} is on, GCC tries to convert
|
|
386 divisions to multiplications by the reciprocal. The pass is located
|
|
387 in @file{tree-ssa-math-opts.c} and is described by
|
|
388 @code{pass_cse_reciprocal}.
|
|
389
|
|
390 @item Full redundancy elimination
|
|
391
|
|
392 This is a simpler form of PRE that only eliminates redundancies that
|
|
393 occur an all paths. It is located in @file{tree-ssa-pre.c} and
|
|
394 described by @code{pass_fre}.
|
|
395
|
|
396 @item Loop optimization
|
|
397
|
|
398 The main driver of the pass is placed in @file{tree-ssa-loop.c}
|
|
399 and described by @code{pass_loop}.
|
|
400
|
|
401 The optimizations performed by this pass are:
|
|
402
|
|
403 Loop invariant motion. This pass moves only invariants that
|
|
404 would be hard to handle on rtl level (function calls, operations that expand to
|
|
405 nontrivial sequences of insns). With @option{-funswitch-loops} it also moves
|
|
406 operands of conditions that are invariant out of the loop, so that we can use
|
|
407 just trivial invariantness analysis in loop unswitching. The pass also includes
|
|
408 store motion. The pass is implemented in @file{tree-ssa-loop-im.c}.
|
|
409
|
|
410 Canonical induction variable creation. This pass creates a simple counter
|
|
411 for number of iterations of the loop and replaces the exit condition of the
|
|
412 loop using it, in case when a complicated analysis is necessary to determine
|
|
413 the number of iterations. Later optimizations then may determine the number
|
|
414 easily. The pass is implemented in @file{tree-ssa-loop-ivcanon.c}.
|
|
415
|
|
416 Induction variable optimizations. This pass performs standard induction
|
|
417 variable optimizations, including strength reduction, induction variable
|
|
418 merging and induction variable elimination. The pass is implemented in
|
|
419 @file{tree-ssa-loop-ivopts.c}.
|
|
420
|
|
421 Loop unswitching. This pass moves the conditional jumps that are invariant
|
|
422 out of the loops. To achieve this, a duplicate of the loop is created for
|
|
423 each possible outcome of conditional jump(s). The pass is implemented in
|
|
424 @file{tree-ssa-loop-unswitch.c}. This pass should eventually replace the
|
|
425 rtl-level loop unswitching in @file{loop-unswitch.c}, but currently
|
|
426 the rtl-level pass is not completely redundant yet due to deficiencies
|
|
427 in tree level alias analysis.
|
|
428
|
|
429 The optimizations also use various utility functions contained in
|
|
430 @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
|
|
431 @file{cfgloopmanip.c}.
|
|
432
|
|
433 Vectorization. This pass transforms loops to operate on vector types
|
|
434 instead of scalar types. Data parallelism across loop iterations is exploited
|
|
435 to group data elements from consecutive iterations into a vector and operate
|
|
436 on them in parallel. Depending on available target support the loop is
|
|
437 conceptually unrolled by a factor @code{VF} (vectorization factor), which is
|
|
438 the number of elements operated upon in parallel in each iteration, and the
|
|
439 @code{VF} copies of each scalar operation are fused to form a vector operation.
|
|
440 Additional loop transformations such as peeling and versioning may take place
|
|
441 to align the number of iterations, and to align the memory accesses in the loop.
|
|
442 The pass is implemented in @file{tree-vectorizer.c} (the main driver and general
|
|
443 utilities), @file{tree-vect-analyze.c} and @file{tree-vect-transform.c}.
|
|
444 Analysis of data references is in @file{tree-data-ref.c}.
|
|
445
|
|
446 Autoparallelization. This pass splits the loop iteration space to run
|
|
447 into several threads. The pass is implemented in @file{tree-parloops.c}.
|
|
448
|
|
449 @item Tree level if-conversion for vectorizer
|
|
450
|
|
451 This pass applies if-conversion to simple loops to help vectorizer.
|
|
452 We identify if convertible loops, if-convert statements and merge
|
|
453 basic blocks in one big block. The idea is to present loop in such
|
|
454 form so that vectorizer can have one to one mapping between statements
|
|
455 and available vector operations. This patch re-introduces COND_EXPR
|
|
456 at GIMPLE level. This pass is located in @file{tree-if-conv.c} and is
|
|
457 described by @code{pass_if_conversion}.
|
|
458
|
|
459 @item Conditional constant propagation
|
|
460
|
|
461 This pass relaxes a lattice of values in order to identify those
|
|
462 that must be constant even in the presence of conditional branches.
|
|
463 The pass is located in @file{tree-ssa-ccp.c} and is described
|
|
464 by @code{pass_ccp}.
|
|
465
|
|
466 A related pass that works on memory loads and stores, and not just
|
|
467 register values, is located in @file{tree-ssa-ccp.c} and described by
|
|
468 @code{pass_store_ccp}.
|
|
469
|
|
470 @item Conditional copy propagation
|
|
471
|
|
472 This is similar to constant propagation but the lattice of values is
|
|
473 the ``copy-of'' relation. It eliminates redundant copies from the
|
|
474 code. The pass is located in @file{tree-ssa-copy.c} and described by
|
|
475 @code{pass_copy_prop}.
|
|
476
|
|
477 A related pass that works on memory copies, and not just register
|
|
478 copies, is located in @file{tree-ssa-copy.c} and described by
|
|
479 @code{pass_store_copy_prop}.
|
|
480
|
|
481 @item Value range propagation
|
|
482
|
|
483 This transformation is similar to constant propagation but
|
|
484 instead of propagating single constant values, it propagates
|
|
485 known value ranges. The implementation is based on Patterson's
|
|
486 range propagation algorithm (Accurate Static Branch Prediction by
|
|
487 Value Range Propagation, J. R. C. Patterson, PLDI '95). In
|
|
488 contrast to Patterson's algorithm, this implementation does not
|
|
489 propagate branch probabilities nor it uses more than a single
|
|
490 range per SSA name. This means that the current implementation
|
|
491 cannot be used for branch prediction (though adapting it would
|
|
492 not be difficult). The pass is located in @file{tree-vrp.c} and is
|
|
493 described by @code{pass_vrp}.
|
|
494
|
|
495 @item Folding built-in functions
|
|
496
|
|
497 This pass simplifies built-in functions, as applicable, with constant
|
|
498 arguments or with inferable string lengths. It is located in
|
|
499 @file{tree-ssa-ccp.c} and is described by @code{pass_fold_builtins}.
|
|
500
|
|
501 @item Split critical edges
|
|
502
|
|
503 This pass identifies critical edges and inserts empty basic blocks
|
|
504 such that the edge is no longer critical. The pass is located in
|
|
505 @file{tree-cfg.c} and is described by @code{pass_split_crit_edges}.
|
|
506
|
|
507 @item Control dependence dead code elimination
|
|
508
|
|
509 This pass is a stronger form of dead code elimination that can
|
|
510 eliminate unnecessary control flow statements. It is located
|
|
511 in @file{tree-ssa-dce.c} and is described by @code{pass_cd_dce}.
|
|
512
|
|
513 @item Tail call elimination
|
|
514
|
|
515 This pass identifies function calls that may be rewritten into
|
|
516 jumps. No code transformation is actually applied here, but the
|
|
517 data and control flow problem is solved. The code transformation
|
|
518 requires target support, and so is delayed until RTL@. In the
|
|
519 meantime @code{CALL_EXPR_TAILCALL} is set indicating the possibility.
|
|
520 The pass is located in @file{tree-tailcall.c} and is described by
|
|
521 @code{pass_tail_calls}. The RTL transformation is handled by
|
|
522 @code{fixup_tail_calls} in @file{calls.c}.
|
|
523
|
|
524 @item Warn for function return without value
|
|
525
|
|
526 For non-void functions, this pass locates return statements that do
|
|
527 not specify a value and issues a warning. Such a statement may have
|
|
528 been injected by falling off the end of the function. This pass is
|
|
529 run last so that we have as much time as possible to prove that the
|
|
530 statement is not reachable. It is located in @file{tree-cfg.c} and
|
|
531 is described by @code{pass_warn_function_return}.
|
|
532
|
|
533 @item Mudflap statement annotation
|
|
534
|
|
535 If mudflap is enabled, we rewrite some memory accesses with code to
|
|
536 validate that the memory access is correct. In particular, expressions
|
|
537 involving pointer dereferences (@code{INDIRECT_REF}, @code{ARRAY_REF},
|
|
538 etc.) are replaced by code that checks the selected address range
|
|
539 against the mudflap runtime's database of valid regions. This check
|
|
540 includes an inline lookup into a direct-mapped cache, based on
|
|
541 shift/mask operations of the pointer value, with a fallback function
|
|
542 call into the runtime. The pass is located in @file{tree-mudflap.c} and
|
|
543 is described by @code{pass_mudflap_2}.
|
|
544
|
|
545 @item Leave static single assignment form
|
|
546
|
|
547 This pass rewrites the function such that it is in normal form. At
|
|
548 the same time, we eliminate as many single-use temporaries as possible,
|
|
549 so the intermediate language is no longer GIMPLE, but GENERIC@. The
|
|
550 pass is located in @file{tree-outof-ssa.c} and is described by
|
|
551 @code{pass_del_ssa}.
|
|
552
|
|
553 @item Merge PHI nodes that feed into one another
|
|
554
|
|
555 This is part of the CFG cleanup passes. It attempts to join PHI nodes
|
|
556 from a forwarder CFG block into another block with PHI nodes. The
|
|
557 pass is located in @file{tree-cfgcleanup.c} and is described by
|
|
558 @code{pass_merge_phi}.
|
|
559
|
|
560 @item Return value optimization
|
|
561
|
|
562 If a function always returns the same local variable, and that local
|
|
563 variable is an aggregate type, then the variable is replaced with the
|
|
564 return value for the function (i.e., the function's DECL_RESULT). This
|
|
565 is equivalent to the C++ named return value optimization applied to
|
|
566 GIMPLE@. The pass is located in @file{tree-nrv.c} and is described by
|
|
567 @code{pass_nrv}.
|
|
568
|
|
569 @item Return slot optimization
|
|
570
|
|
571 If a function returns a memory object and is called as @code{var =
|
|
572 foo()}, this pass tries to change the call so that the address of
|
|
573 @code{var} is sent to the caller to avoid an extra memory copy. This
|
|
574 pass is located in @code{tree-nrv.c} and is described by
|
|
575 @code{pass_return_slot}.
|
|
576
|
|
577 @item Optimize calls to @code{__builtin_object_size}
|
|
578
|
|
579 This is a propagation pass similar to CCP that tries to remove calls
|
|
580 to @code{__builtin_object_size} when the size of the object can be
|
|
581 computed at compile-time. This pass is located in
|
|
582 @file{tree-object-size.c} and is described by
|
|
583 @code{pass_object_sizes}.
|
|
584
|
|
585 @item Loop invariant motion
|
|
586
|
|
587 This pass removes expensive loop-invariant computations out of loops.
|
|
588 The pass is located in @file{tree-ssa-loop.c} and described by
|
|
589 @code{pass_lim}.
|
|
590
|
|
591 @item Loop nest optimizations
|
|
592
|
|
593 This is a family of loop transformations that works on loop nests. It
|
|
594 includes loop interchange, scaling, skewing and reversal and they are
|
|
595 all geared to the optimization of data locality in array traversals
|
|
596 and the removal of dependencies that hamper optimizations such as loop
|
|
597 parallelization and vectorization. The pass is located in
|
|
598 @file{tree-loop-linear.c} and described by
|
|
599 @code{pass_linear_transform}.
|
|
600
|
|
601 @item Removal of empty loops
|
|
602
|
|
603 This pass removes loops with no code in them. The pass is located in
|
|
604 @file{tree-ssa-loop-ivcanon.c} and described by
|
|
605 @code{pass_empty_loop}.
|
|
606
|
|
607 @item Unrolling of small loops
|
|
608
|
|
609 This pass completely unrolls loops with few iterations. The pass
|
|
610 is located in @file{tree-ssa-loop-ivcanon.c} and described by
|
|
611 @code{pass_complete_unroll}.
|
|
612
|
|
613 @item Predictive commoning
|
|
614
|
|
615 This pass makes the code reuse the computations from the previous
|
|
616 iterations of the loops, especially loads and stores to memory.
|
|
617 It does so by storing the values of these computations to a bank
|
|
618 of temporary variables that are rotated at the end of loop. To avoid
|
|
619 the need for this rotation, the loop is then unrolled and the copies
|
|
620 of the loop body are rewritten to use the appropriate version of
|
|
621 the temporary variable. This pass is located in @file{tree-predcom.c}
|
|
622 and described by @code{pass_predcom}.
|
|
623
|
|
624 @item Array prefetching
|
|
625
|
|
626 This pass issues prefetch instructions for array references inside
|
|
627 loops. The pass is located in @file{tree-ssa-loop-prefetch.c} and
|
|
628 described by @code{pass_loop_prefetch}.
|
|
629
|
|
630 @item Reassociation
|
|
631
|
|
632 This pass rewrites arithmetic expressions to enable optimizations that
|
|
633 operate on them, like redundancy elimination and vectorization. The
|
|
634 pass is located in @file{tree-ssa-reassoc.c} and described by
|
|
635 @code{pass_reassoc}.
|
|
636
|
|
637 @item Optimization of @code{stdarg} functions
|
|
638
|
|
639 This pass tries to avoid the saving of register arguments into the
|
|
640 stack on entry to @code{stdarg} functions. If the function doesn't
|
|
641 use any @code{va_start} macros, no registers need to be saved. If
|
|
642 @code{va_start} macros are used, the @code{va_list} variables don't
|
|
643 escape the function, it is only necessary to save registers that will
|
|
644 be used in @code{va_arg} macros. For instance, if @code{va_arg} is
|
|
645 only used with integral types in the function, floating point
|
|
646 registers don't need to be saved. This pass is located in
|
|
647 @code{tree-stdarg.c} and described by @code{pass_stdarg}.
|
|
648
|
|
649 @end itemize
|
|
650
|
|
651 @node RTL passes
|
|
652 @section RTL passes
|
|
653
|
|
654 The following briefly describes the rtl generation and optimization
|
|
655 passes that are run after tree optimization.
|
|
656
|
|
657 @itemize @bullet
|
|
658 @item RTL generation
|
|
659
|
|
660 @c Avoiding overfull is tricky here.
|
|
661 The source files for RTL generation include
|
|
662 @file{stmt.c},
|
|
663 @file{calls.c},
|
|
664 @file{expr.c},
|
|
665 @file{explow.c},
|
|
666 @file{expmed.c},
|
|
667 @file{function.c},
|
|
668 @file{optabs.c}
|
|
669 and @file{emit-rtl.c}.
|
|
670 Also, the file
|
|
671 @file{insn-emit.c}, generated from the machine description by the
|
|
672 program @code{genemit}, is used in this pass. The header file
|
|
673 @file{expr.h} is used for communication within this pass.
|
|
674
|
|
675 @findex genflags
|
|
676 @findex gencodes
|
|
677 The header files @file{insn-flags.h} and @file{insn-codes.h},
|
|
678 generated from the machine description by the programs @code{genflags}
|
|
679 and @code{gencodes}, tell this pass which standard names are available
|
|
680 for use and which patterns correspond to them.
|
|
681
|
|
682 @item Generate exception handling landing pads
|
|
683
|
|
684 This pass generates the glue that handles communication between the
|
|
685 exception handling library routines and the exception handlers within
|
|
686 the function. Entry points in the function that are invoked by the
|
|
687 exception handling library are called @dfn{landing pads}. The code
|
|
688 for this pass is located within @file{except.c}.
|
|
689
|
|
690 @item Cleanup control flow graph
|
|
691
|
|
692 This pass removes unreachable code, simplifies jumps to next, jumps to
|
|
693 jump, jumps across jumps, etc. The pass is run multiple times.
|
|
694 For historical reasons, it is occasionally referred to as the ``jump
|
|
695 optimization pass''. The bulk of the code for this pass is in
|
|
696 @file{cfgcleanup.c}, and there are support routines in @file{cfgrtl.c}
|
|
697 and @file{jump.c}.
|
|
698
|
|
699 @item Forward propagation of single-def values
|
|
700
|
|
701 This pass attempts to remove redundant computation by substituting
|
|
702 variables that come from a single definition, and
|
|
703 seeing if the result can be simplified. It performs copy propagation
|
|
704 and addressing mode selection. The pass is run twice, with values
|
|
705 being propagated into loops only on the second run. It is located in
|
|
706 @file{fwprop.c}.
|
|
707
|
|
708 @item Common subexpression elimination
|
|
709
|
|
710 This pass removes redundant computation within basic blocks, and
|
|
711 optimizes addressing modes based on cost. The pass is run twice.
|
|
712 The source is located in @file{cse.c}.
|
|
713
|
|
714 @item Global common subexpression elimination.
|
|
715
|
|
716 This pass performs two
|
|
717 different types of GCSE depending on whether you are optimizing for
|
|
718 size or not (LCM based GCSE tends to increase code size for a gain in
|
|
719 speed, while Morel-Renvoise based GCSE does not).
|
|
720 When optimizing for size, GCSE is done using Morel-Renvoise Partial
|
|
721 Redundancy Elimination, with the exception that it does not try to move
|
|
722 invariants out of loops---that is left to the loop optimization pass.
|
|
723 If MR PRE GCSE is done, code hoisting (aka unification) is also done, as
|
|
724 well as load motion.
|
|
725 If you are optimizing for speed, LCM (lazy code motion) based GCSE is
|
|
726 done. LCM is based on the work of Knoop, Ruthing, and Steffen. LCM
|
|
727 based GCSE also does loop invariant code motion. We also perform load
|
|
728 and store motion when optimizing for speed.
|
|
729 Regardless of which type of GCSE is used, the GCSE pass also performs
|
|
730 global constant and copy propagation.
|
|
731 The source file for this pass is @file{gcse.c}, and the LCM routines
|
|
732 are in @file{lcm.c}.
|
|
733
|
|
734 @item Loop optimization
|
|
735
|
|
736 This pass performs several loop related optimizations.
|
|
737 The source files @file{cfgloopanal.c} and @file{cfgloopmanip.c} contain
|
|
738 generic loop analysis and manipulation code. Initialization and finalization
|
|
739 of loop structures is handled by @file{loop-init.c}.
|
|
740 A loop invariant motion pass is implemented in @file{loop-invariant.c}.
|
|
741 Basic block level optimizations---unrolling, peeling and unswitching loops---
|
|
742 are implemented in @file{loop-unswitch.c} and @file{loop-unroll.c}.
|
|
743 Replacing of the exit condition of loops by special machine-dependent
|
|
744 instructions is handled by @file{loop-doloop.c}.
|
|
745
|
|
746 @item Jump bypassing
|
|
747
|
|
748 This pass is an aggressive form of GCSE that transforms the control
|
|
749 flow graph of a function by propagating constants into conditional
|
|
750 branch instructions. The source file for this pass is @file{gcse.c}.
|
|
751
|
|
752 @item If conversion
|
|
753
|
|
754 This pass attempts to replace conditional branches and surrounding
|
|
755 assignments with arithmetic, boolean value producing comparison
|
|
756 instructions, and conditional move instructions. In the very last
|
|
757 invocation after reload, it will generate predicated instructions
|
|
758 when supported by the target. The pass is located in @file{ifcvt.c}.
|
|
759
|
|
760 @item Web construction
|
|
761
|
|
762 This pass splits independent uses of each pseudo-register. This can
|
|
763 improve effect of the other transformation, such as CSE or register
|
|
764 allocation. Its source files are @file{web.c}.
|
|
765
|
|
766 @item Life analysis
|
|
767
|
|
768 This pass computes which pseudo-registers are live at each point in
|
|
769 the program, and makes the first instruction that uses a value point
|
|
770 at the instruction that computed the value. It then deletes
|
|
771 computations whose results are never used, and combines memory
|
|
772 references with add or subtract instructions to make autoincrement or
|
|
773 autodecrement addressing. The pass is located in @file{flow.c}.
|
|
774
|
|
775 @item Instruction combination
|
|
776
|
|
777 This pass attempts to combine groups of two or three instructions that
|
|
778 are related by data flow into single instructions. It combines the
|
|
779 RTL expressions for the instructions by substitution, simplifies the
|
|
780 result using algebra, and then attempts to match the result against
|
|
781 the machine description. The pass is located in @file{combine.c}.
|
|
782
|
|
783 @item Register movement
|
|
784
|
|
785 This pass looks for cases where matching constraints would force an
|
|
786 instruction to need a reload, and this reload would be a
|
|
787 register-to-register move. It then attempts to change the registers
|
|
788 used by the instruction to avoid the move instruction.
|
|
789 The pass is located in @file{regmove.c}.
|
|
790
|
|
791 @item Optimize mode switching
|
|
792
|
|
793 This pass looks for instructions that require the processor to be in a
|
|
794 specific ``mode'' and minimizes the number of mode changes required to
|
|
795 satisfy all users. What these modes are, and what they apply to are
|
|
796 completely target-specific.
|
|
797 The source is located in @file{mode-switching.c}.
|
|
798
|
|
799 @cindex modulo scheduling
|
|
800 @cindex sms, swing, software pipelining
|
|
801 @item Modulo scheduling
|
|
802
|
|
803 This pass looks at innermost loops and reorders their instructions
|
|
804 by overlapping different iterations. Modulo scheduling is performed
|
|
805 immediately before instruction scheduling.
|
|
806 The pass is located in (@file{modulo-sched.c}).
|
|
807
|
|
808 @item Instruction scheduling
|
|
809
|
|
810 This pass looks for instructions whose output will not be available by
|
|
811 the time that it is used in subsequent instructions. Memory loads and
|
|
812 floating point instructions often have this behavior on RISC machines.
|
|
813 It re-orders instructions within a basic block to try to separate the
|
|
814 definition and use of items that otherwise would cause pipeline
|
|
815 stalls. This pass is performed twice, before and after register
|
|
816 allocation. The pass is located in @file{haifa-sched.c},
|
|
817 @file{sched-deps.c}, @file{sched-ebb.c}, @file{sched-rgn.c} and
|
|
818 @file{sched-vis.c}.
|
|
819
|
|
820 @item Register allocation
|
|
821
|
|
822 These passes make sure that all occurrences of pseudo registers are
|
|
823 eliminated, either by allocating them to a hard register, replacing
|
|
824 them by an equivalent expression (e.g.@: a constant) or by placing
|
|
825 them on the stack. This is done in several subpasses:
|
|
826
|
|
827 @itemize @bullet
|
|
828 @item
|
|
829 Register move optimizations. This pass makes some simple RTL code
|
|
830 transformations which improve the subsequent register allocation. The
|
|
831 source file is @file{regmove.c}.
|
|
832
|
|
833 @item
|
|
834 The integrated register allocator (@acronym{IRA}). It is called
|
|
835 integrated because coalescing, register live range splitting, and hard
|
|
836 register preferencing are done on-the-fly during coloring. It also
|
|
837 has better integration with the reload pass. Pseudo-registers spilled
|
|
838 by the allocator or the reload have still a chance to get
|
|
839 hard-registers if the reload evicts some pseudo-registers from
|
|
840 hard-registers. The allocator helps to choose better pseudos for
|
|
841 spilling based on their live ranges and to coalesce stack slots
|
|
842 allocated for the spilled pseudo-registers. IRA is a regional
|
|
843 register allocator which is transformed into Chaitin-Briggs allocator
|
|
844 if there is one region. By default, IRA chooses regions using
|
|
845 register pressure but the user can force it to use one region or
|
|
846 regions corresponding to all loops.
|
|
847
|
|
848 Source files of the allocator are @file{ira.c}, @file{ira-build.c},
|
|
849 @file{ira-costs.c}, @file{ira-conflicts.c}, @file{ira-color.c},
|
|
850 @file{ira-emit.c}, @file{ira-lives}, plus header files @file{ira.h}
|
|
851 and @file{ira-int.h} used for the communication between the allocator
|
|
852 and the rest of the compiler and between the IRA files.
|
|
853
|
|
854 @cindex reloading
|
|
855 @item
|
|
856 Reloading. This pass renumbers pseudo registers with the hardware
|
|
857 registers numbers they were allocated. Pseudo registers that did not
|
|
858 get hard registers are replaced with stack slots. Then it finds
|
|
859 instructions that are invalid because a value has failed to end up in
|
|
860 a register, or has ended up in a register of the wrong kind. It fixes
|
|
861 up these instructions by reloading the problematical values
|
|
862 temporarily into registers. Additional instructions are generated to
|
|
863 do the copying.
|
|
864
|
|
865 The reload pass also optionally eliminates the frame pointer and inserts
|
|
866 instructions to save and restore call-clobbered registers around calls.
|
|
867
|
|
868 Source files are @file{reload.c} and @file{reload1.c}, plus the header
|
|
869 @file{reload.h} used for communication between them.
|
|
870 @end itemize
|
|
871
|
|
872 @item Basic block reordering
|
|
873
|
|
874 This pass implements profile guided code positioning. If profile
|
|
875 information is not available, various types of static analysis are
|
|
876 performed to make the predictions normally coming from the profile
|
|
877 feedback (IE execution frequency, branch probability, etc). It is
|
|
878 implemented in the file @file{bb-reorder.c}, and the various
|
|
879 prediction routines are in @file{predict.c}.
|
|
880
|
|
881 @item Variable tracking
|
|
882
|
|
883 This pass computes where the variables are stored at each
|
|
884 position in code and generates notes describing the variable locations
|
|
885 to RTL code. The location lists are then generated according to these
|
|
886 notes to debug information if the debugging information format supports
|
|
887 location lists.
|
|
888
|
|
889 @item Delayed branch scheduling
|
|
890
|
|
891 This optional pass attempts to find instructions that can go into the
|
|
892 delay slots of other instructions, usually jumps and calls. The
|
|
893 source file name is @file{reorg.c}.
|
|
894
|
|
895 @item Branch shortening
|
|
896
|
|
897 On many RISC machines, branch instructions have a limited range.
|
|
898 Thus, longer sequences of instructions must be used for long branches.
|
|
899 In this pass, the compiler figures out what how far each instruction
|
|
900 will be from each other instruction, and therefore whether the usual
|
|
901 instructions, or the longer sequences, must be used for each branch.
|
|
902
|
|
903 @item Register-to-stack conversion
|
|
904
|
|
905 Conversion from usage of some hard registers to usage of a register
|
|
906 stack may be done at this point. Currently, this is supported only
|
|
907 for the floating-point registers of the Intel 80387 coprocessor. The
|
|
908 source file name is @file{reg-stack.c}.
|
|
909
|
|
910 @item Final
|
|
911
|
|
912 This pass outputs the assembler code for the function. The source files
|
|
913 are @file{final.c} plus @file{insn-output.c}; the latter is generated
|
|
914 automatically from the machine description by the tool @file{genoutput}.
|
|
915 The header file @file{conditions.h} is used for communication between
|
|
916 these files. If mudflap is enabled, the queue of deferred declarations
|
|
917 and any addressed constants (e.g., string literals) is processed by
|
|
918 @code{mudflap_finish_file} into a synthetic constructor function
|
|
919 containing calls into the mudflap runtime.
|
|
920
|
|
921 @item Debugging information output
|
|
922
|
|
923 This is run after final because it must output the stack slot offsets
|
|
924 for pseudo registers that did not get hard registers. Source files
|
|
925 are @file{dbxout.c} for DBX symbol table format, @file{sdbout.c} for
|
|
926 SDB symbol table format, @file{dwarfout.c} for DWARF symbol table
|
|
927 format, files @file{dwarf2out.c} and @file{dwarf2asm.c} for DWARF2
|
|
928 symbol table format, and @file{vmsdbgout.c} for VMS debug symbol table
|
|
929 format.
|
|
930
|
|
931 @end itemize
|