Mercurial > hg > CbC > CbC_gcc
comparison gcc/doc/tree-ssa.texi @ 0:a06113de4d67
first commit
author | kent <kent@cr.ie.u-ryukyu.ac.jp> |
---|---|
date | Fri, 17 Jul 2009 14:47:48 +0900 |
parents | |
children | 77e2b8dfacca |
comparison
equal
deleted
inserted
replaced
-1:000000000000 | 0:a06113de4d67 |
---|---|
1 @c Copyright (c) 2004, 2005, 2007, 2008 Free Software Foundation, Inc. | |
2 @c Free Software Foundation, Inc. | |
3 @c This is part of the GCC manual. | |
4 @c For copying conditions, see the file gcc.texi. | |
5 | |
6 @c --------------------------------------------------------------------- | |
7 @c Tree SSA | |
8 @c --------------------------------------------------------------------- | |
9 | |
10 @node Tree SSA | |
11 @chapter Analysis and Optimization of GIMPLE tuples | |
12 @cindex Tree SSA | |
13 @cindex Optimization infrastructure for GIMPLE | |
14 | |
15 GCC uses three main intermediate languages to represent the program | |
16 during compilation: GENERIC, GIMPLE and RTL@. GENERIC is a | |
17 language-independent representation generated by each front end. It | |
18 is used to serve as an interface between the parser and optimizer. | |
19 GENERIC is a common representation that is able to represent programs | |
20 written in all the languages supported by GCC@. | |
21 | |
22 GIMPLE and RTL are used to optimize the program. GIMPLE is used for | |
23 target and language independent optimizations (e.g., inlining, | |
24 constant propagation, tail call elimination, redundancy elimination, | |
25 etc). Much like GENERIC, GIMPLE is a language independent, tree based | |
26 representation. However, it differs from GENERIC in that the GIMPLE | |
27 grammar is more restrictive: expressions contain no more than 3 | |
28 operands (except function calls), it has no control flow structures | |
29 and expressions with side-effects are only allowed on the right hand | |
30 side of assignments. See the chapter describing GENERIC and GIMPLE | |
31 for more details. | |
32 | |
33 This chapter describes the data structures and functions used in the | |
34 GIMPLE optimizers (also known as ``tree optimizers'' or ``middle | |
35 end''). In particular, it focuses on all the macros, data structures, | |
36 functions and programming constructs needed to implement optimization | |
37 passes for GIMPLE@. | |
38 | |
39 @menu | |
40 * Annotations:: Attributes for variables. | |
41 * SSA Operands:: SSA names referenced by GIMPLE statements. | |
42 * SSA:: Static Single Assignment representation. | |
43 * Alias analysis:: Representing aliased loads and stores. | |
44 @end menu | |
45 | |
46 @node Annotations | |
47 @section Annotations | |
48 @cindex annotations | |
49 | |
50 The optimizers need to associate attributes with variables during the | |
51 optimization process. For instance, we need to know whether a | |
52 variable has aliases. All these attributes are stored in data | |
53 structures called annotations which are then linked to the field | |
54 @code{ann} in @code{struct tree_common}. | |
55 | |
56 Presently, we define annotations for variables (@code{var_ann_t}). | |
57 Annotations are defined and documented in @file{tree-flow.h}. | |
58 | |
59 | |
60 @node SSA Operands | |
61 @section SSA Operands | |
62 @cindex operands | |
63 @cindex virtual operands | |
64 @cindex real operands | |
65 @findex update_stmt | |
66 | |
67 Almost every GIMPLE statement will contain a reference to a variable | |
68 or memory location. Since statements come in different shapes and | |
69 sizes, their operands are going to be located at various spots inside | |
70 the statement's tree. To facilitate access to the statement's | |
71 operands, they are organized into lists associated inside each | |
72 statement's annotation. Each element in an operand list is a pointer | |
73 to a @code{VAR_DECL}, @code{PARM_DECL} or @code{SSA_NAME} tree node. | |
74 This provides a very convenient way of examining and replacing | |
75 operands. | |
76 | |
77 Data flow analysis and optimization is done on all tree nodes | |
78 representing variables. Any node for which @code{SSA_VAR_P} returns | |
79 nonzero is considered when scanning statement operands. However, not | |
80 all @code{SSA_VAR_P} variables are processed in the same way. For the | |
81 purposes of optimization, we need to distinguish between references to | |
82 local scalar variables and references to globals, statics, structures, | |
83 arrays, aliased variables, etc. The reason is simple, the compiler | |
84 can gather complete data flow information for a local scalar. On the | |
85 other hand, a global variable may be modified by a function call, it | |
86 may not be possible to keep track of all the elements of an array or | |
87 the fields of a structure, etc. | |
88 | |
89 The operand scanner gathers two kinds of operands: @dfn{real} and | |
90 @dfn{virtual}. An operand for which @code{is_gimple_reg} returns true | |
91 is considered real, otherwise it is a virtual operand. We also | |
92 distinguish between uses and definitions. An operand is used if its | |
93 value is loaded by the statement (e.g., the operand at the RHS of an | |
94 assignment). If the statement assigns a new value to the operand, the | |
95 operand is considered a definition (e.g., the operand at the LHS of | |
96 an assignment). | |
97 | |
98 Virtual and real operands also have very different data flow | |
99 properties. Real operands are unambiguous references to the | |
100 full object that they represent. For instance, given | |
101 | |
102 @smallexample | |
103 @{ | |
104 int a, b; | |
105 a = b | |
106 @} | |
107 @end smallexample | |
108 | |
109 Since @code{a} and @code{b} are non-aliased locals, the statement | |
110 @code{a = b} will have one real definition and one real use because | |
111 variable @code{b} is completely modified with the contents of | |
112 variable @code{a}. Real definition are also known as @dfn{killing | |
113 definitions}. Similarly, the use of @code{a} reads all its bits. | |
114 | |
115 In contrast, virtual operands are used with variables that can have | |
116 a partial or ambiguous reference. This includes structures, arrays, | |
117 globals, and aliased variables. In these cases, we have two types of | |
118 definitions. For globals, structures, and arrays, we can determine from | |
119 a statement whether a variable of these types has a killing definition. | |
120 If the variable does, then the statement is marked as having a | |
121 @dfn{must definition} of that variable. However, if a statement is only | |
122 defining a part of the variable (i.e.@: a field in a structure), or if we | |
123 know that a statement might define the variable but we cannot say for sure, | |
124 then we mark that statement as having a @dfn{may definition}. For | |
125 instance, given | |
126 | |
127 @smallexample | |
128 @{ | |
129 int a, b, *p; | |
130 | |
131 if (@dots{}) | |
132 p = &a; | |
133 else | |
134 p = &b; | |
135 *p = 5; | |
136 return *p; | |
137 @} | |
138 @end smallexample | |
139 | |
140 The assignment @code{*p = 5} may be a definition of @code{a} or | |
141 @code{b}. If we cannot determine statically where @code{p} is | |
142 pointing to at the time of the store operation, we create virtual | |
143 definitions to mark that statement as a potential definition site for | |
144 @code{a} and @code{b}. Memory loads are similarly marked with virtual | |
145 use operands. Virtual operands are shown in tree dumps right before | |
146 the statement that contains them. To request a tree dump with virtual | |
147 operands, use the @option{-vops} option to @option{-fdump-tree}: | |
148 | |
149 @smallexample | |
150 @{ | |
151 int a, b, *p; | |
152 | |
153 if (@dots{}) | |
154 p = &a; | |
155 else | |
156 p = &b; | |
157 # a = VDEF <a> | |
158 # b = VDEF <b> | |
159 *p = 5; | |
160 | |
161 # VUSE <a> | |
162 # VUSE <b> | |
163 return *p; | |
164 @} | |
165 @end smallexample | |
166 | |
167 Notice that @code{VDEF} operands have two copies of the referenced | |
168 variable. This indicates that this is not a killing definition of | |
169 that variable. In this case we refer to it as a @dfn{may definition} | |
170 or @dfn{aliased store}. The presence of the second copy of the | |
171 variable in the @code{VDEF} operand will become important when the | |
172 function is converted into SSA form. This will be used to link all | |
173 the non-killing definitions to prevent optimizations from making | |
174 incorrect assumptions about them. | |
175 | |
176 Operands are updated as soon as the statement is finished via a call | |
177 to @code{update_stmt}. If statement elements are changed via | |
178 @code{SET_USE} or @code{SET_DEF}, then no further action is required | |
179 (i.e., those macros take care of updating the statement). If changes | |
180 are made by manipulating the statement's tree directly, then a call | |
181 must be made to @code{update_stmt} when complete. Calling one of the | |
182 @code{bsi_insert} routines or @code{bsi_replace} performs an implicit | |
183 call to @code{update_stmt}. | |
184 | |
185 @subsection Operand Iterators And Access Routines | |
186 @cindex Operand Iterators | |
187 @cindex Operand Access Routines | |
188 | |
189 Operands are collected by @file{tree-ssa-operands.c}. They are stored | |
190 inside each statement's annotation and can be accessed through either the | |
191 operand iterators or an access routine. | |
192 | |
193 The following access routines are available for examining operands: | |
194 | |
195 @enumerate | |
196 @item @code{SINGLE_SSA_@{USE,DEF,TREE@}_OPERAND}: These accessors will return | |
197 NULL unless there is exactly one operand matching the specified flags. If | |
198 there is exactly one operand, the operand is returned as either a @code{tree}, | |
199 @code{def_operand_p}, or @code{use_operand_p}. | |
200 | |
201 @smallexample | |
202 tree t = SINGLE_SSA_TREE_OPERAND (stmt, flags); | |
203 use_operand_p u = SINGLE_SSA_USE_OPERAND (stmt, SSA_ALL_VIRTUAL_USES); | |
204 def_operand_p d = SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_ALL_DEFS); | |
205 @end smallexample | |
206 | |
207 @item @code{ZERO_SSA_OPERANDS}: This macro returns true if there are no | |
208 operands matching the specified flags. | |
209 | |
210 @smallexample | |
211 if (ZERO_SSA_OPERANDS (stmt, SSA_OP_ALL_VIRTUALS)) | |
212 return; | |
213 @end smallexample | |
214 | |
215 @item @code{NUM_SSA_OPERANDS}: This macro Returns the number of operands | |
216 matching 'flags'. This actually executes a loop to perform the count, so | |
217 only use this if it is really needed. | |
218 | |
219 @smallexample | |
220 int count = NUM_SSA_OPERANDS (stmt, flags) | |
221 @end smallexample | |
222 @end enumerate | |
223 | |
224 | |
225 If you wish to iterate over some or all operands, use the | |
226 @code{FOR_EACH_SSA_@{USE,DEF,TREE@}_OPERAND} iterator. For example, to print | |
227 all the operands for a statement: | |
228 | |
229 @smallexample | |
230 void | |
231 print_ops (tree stmt) | |
232 @{ | |
233 ssa_op_iter; | |
234 tree var; | |
235 | |
236 FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_ALL_OPERANDS) | |
237 print_generic_expr (stderr, var, TDF_SLIM); | |
238 @} | |
239 @end smallexample | |
240 | |
241 | |
242 How to choose the appropriate iterator: | |
243 | |
244 @enumerate | |
245 @item Determine whether you are need to see the operand pointers, or just the | |
246 trees, and choose the appropriate macro: | |
247 | |
248 @smallexample | |
249 Need Macro: | |
250 ---- ------- | |
251 use_operand_p FOR_EACH_SSA_USE_OPERAND | |
252 def_operand_p FOR_EACH_SSA_DEF_OPERAND | |
253 tree FOR_EACH_SSA_TREE_OPERAND | |
254 @end smallexample | |
255 | |
256 @item You need to declare a variable of the type you are interested | |
257 in, and an ssa_op_iter structure which serves as the loop controlling | |
258 variable. | |
259 | |
260 @item Determine which operands you wish to use, and specify the flags of | |
261 those you are interested in. They are documented in | |
262 @file{tree-ssa-operands.h}: | |
263 | |
264 @smallexample | |
265 #define SSA_OP_USE 0x01 /* @r{Real USE operands.} */ | |
266 #define SSA_OP_DEF 0x02 /* @r{Real DEF operands.} */ | |
267 #define SSA_OP_VUSE 0x04 /* @r{VUSE operands.} */ | |
268 #define SSA_OP_VMAYUSE 0x08 /* @r{USE portion of VDEFS.} */ | |
269 #define SSA_OP_VDEF 0x10 /* @r{DEF portion of VDEFS.} */ | |
270 | |
271 /* @r{These are commonly grouped operand flags.} */ | |
272 #define SSA_OP_VIRTUAL_USES (SSA_OP_VUSE | SSA_OP_VMAYUSE) | |
273 #define SSA_OP_VIRTUAL_DEFS (SSA_OP_VDEF) | |
274 #define SSA_OP_ALL_USES (SSA_OP_VIRTUAL_USES | SSA_OP_USE) | |
275 #define SSA_OP_ALL_DEFS (SSA_OP_VIRTUAL_DEFS | SSA_OP_DEF) | |
276 #define SSA_OP_ALL_OPERANDS (SSA_OP_ALL_USES | SSA_OP_ALL_DEFS) | |
277 @end smallexample | |
278 @end enumerate | |
279 | |
280 So if you want to look at the use pointers for all the @code{USE} and | |
281 @code{VUSE} operands, you would do something like: | |
282 | |
283 @smallexample | |
284 use_operand_p use_p; | |
285 ssa_op_iter iter; | |
286 | |
287 FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, (SSA_OP_USE | SSA_OP_VUSE)) | |
288 @{ | |
289 process_use_ptr (use_p); | |
290 @} | |
291 @end smallexample | |
292 | |
293 The @code{TREE} macro is basically the same as the @code{USE} and | |
294 @code{DEF} macros, only with the use or def dereferenced via | |
295 @code{USE_FROM_PTR (use_p)} and @code{DEF_FROM_PTR (def_p)}. Since we | |
296 aren't using operand pointers, use and defs flags can be mixed. | |
297 | |
298 @smallexample | |
299 tree var; | |
300 ssa_op_iter iter; | |
301 | |
302 FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_VUSE) | |
303 @{ | |
304 print_generic_expr (stderr, var, TDF_SLIM); | |
305 @} | |
306 @end smallexample | |
307 | |
308 @code{VDEF}s are broken into two flags, one for the | |
309 @code{DEF} portion (@code{SSA_OP_VDEF}) and one for the USE portion | |
310 (@code{SSA_OP_VMAYUSE}). If all you want to look at are the | |
311 @code{VDEF}s together, there is a fourth iterator macro for this, | |
312 which returns both a def_operand_p and a use_operand_p for each | |
313 @code{VDEF} in the statement. Note that you don't need any flags for | |
314 this one. | |
315 | |
316 @smallexample | |
317 use_operand_p use_p; | |
318 def_operand_p def_p; | |
319 ssa_op_iter iter; | |
320 | |
321 FOR_EACH_SSA_MAYDEF_OPERAND (def_p, use_p, stmt, iter) | |
322 @{ | |
323 my_code; | |
324 @} | |
325 @end smallexample | |
326 | |
327 There are many examples in the code as well, as well as the | |
328 documentation in @file{tree-ssa-operands.h}. | |
329 | |
330 There are also a couple of variants on the stmt iterators regarding PHI | |
331 nodes. | |
332 | |
333 @code{FOR_EACH_PHI_ARG} Works exactly like | |
334 @code{FOR_EACH_SSA_USE_OPERAND}, except it works over @code{PHI} arguments | |
335 instead of statement operands. | |
336 | |
337 @smallexample | |
338 /* Look at every virtual PHI use. */ | |
339 FOR_EACH_PHI_ARG (use_p, phi_stmt, iter, SSA_OP_VIRTUAL_USES) | |
340 @{ | |
341 my_code; | |
342 @} | |
343 | |
344 /* Look at every real PHI use. */ | |
345 FOR_EACH_PHI_ARG (use_p, phi_stmt, iter, SSA_OP_USES) | |
346 my_code; | |
347 | |
348 /* Look at every PHI use. */ | |
349 FOR_EACH_PHI_ARG (use_p, phi_stmt, iter, SSA_OP_ALL_USES) | |
350 my_code; | |
351 @end smallexample | |
352 | |
353 @code{FOR_EACH_PHI_OR_STMT_@{USE,DEF@}} works exactly like | |
354 @code{FOR_EACH_SSA_@{USE,DEF@}_OPERAND}, except it will function on | |
355 either a statement or a @code{PHI} node. These should be used when it is | |
356 appropriate but they are not quite as efficient as the individual | |
357 @code{FOR_EACH_PHI} and @code{FOR_EACH_SSA} routines. | |
358 | |
359 @smallexample | |
360 FOR_EACH_PHI_OR_STMT_USE (use_operand_p, stmt, iter, flags) | |
361 @{ | |
362 my_code; | |
363 @} | |
364 | |
365 FOR_EACH_PHI_OR_STMT_DEF (def_operand_p, phi, iter, flags) | |
366 @{ | |
367 my_code; | |
368 @} | |
369 @end smallexample | |
370 | |
371 @subsection Immediate Uses | |
372 @cindex Immediate Uses | |
373 | |
374 Immediate use information is now always available. Using the immediate use | |
375 iterators, you may examine every use of any @code{SSA_NAME}. For instance, | |
376 to change each use of @code{ssa_var} to @code{ssa_var2} and call fold_stmt on | |
377 each stmt after that is done: | |
378 | |
379 @smallexample | |
380 use_operand_p imm_use_p; | |
381 imm_use_iterator iterator; | |
382 tree ssa_var, stmt; | |
383 | |
384 | |
385 FOR_EACH_IMM_USE_STMT (stmt, iterator, ssa_var) | |
386 @{ | |
387 FOR_EACH_IMM_USE_ON_STMT (imm_use_p, iterator) | |
388 SET_USE (imm_use_p, ssa_var_2); | |
389 fold_stmt (stmt); | |
390 @} | |
391 @end smallexample | |
392 | |
393 There are 2 iterators which can be used. @code{FOR_EACH_IMM_USE_FAST} is | |
394 used when the immediate uses are not changed, i.e., you are looking at the | |
395 uses, but not setting them. | |
396 | |
397 If they do get changed, then care must be taken that things are not changed | |
398 under the iterators, so use the @code{FOR_EACH_IMM_USE_STMT} and | |
399 @code{FOR_EACH_IMM_USE_ON_STMT} iterators. They attempt to preserve the | |
400 sanity of the use list by moving all the uses for a statement into | |
401 a controlled position, and then iterating over those uses. Then the | |
402 optimization can manipulate the stmt when all the uses have been | |
403 processed. This is a little slower than the FAST version since it adds a | |
404 placeholder element and must sort through the list a bit for each statement. | |
405 This placeholder element must be also be removed if the loop is | |
406 terminated early. The macro @code{BREAK_FROM_IMM_USE_SAFE} is provided | |
407 to do this : | |
408 | |
409 @smallexample | |
410 FOR_EACH_IMM_USE_STMT (stmt, iterator, ssa_var) | |
411 @{ | |
412 if (stmt == last_stmt) | |
413 BREAK_FROM_SAFE_IMM_USE (iter); | |
414 | |
415 FOR_EACH_IMM_USE_ON_STMT (imm_use_p, iterator) | |
416 SET_USE (imm_use_p, ssa_var_2); | |
417 fold_stmt (stmt); | |
418 @} | |
419 @end smallexample | |
420 | |
421 There are checks in @code{verify_ssa} which verify that the immediate use list | |
422 is up to date, as well as checking that an optimization didn't break from the | |
423 loop without using this macro. It is safe to simply 'break'; from a | |
424 @code{FOR_EACH_IMM_USE_FAST} traverse. | |
425 | |
426 Some useful functions and macros: | |
427 @enumerate | |
428 @item @code{has_zero_uses (ssa_var)} : Returns true if there are no uses of | |
429 @code{ssa_var}. | |
430 @item @code{has_single_use (ssa_var)} : Returns true if there is only a | |
431 single use of @code{ssa_var}. | |
432 @item @code{single_imm_use (ssa_var, use_operand_p *ptr, tree *stmt)} : | |
433 Returns true if there is only a single use of @code{ssa_var}, and also returns | |
434 the use pointer and statement it occurs in, in the second and third parameters. | |
435 @item @code{num_imm_uses (ssa_var)} : Returns the number of immediate uses of | |
436 @code{ssa_var}. It is better not to use this if possible since it simply | |
437 utilizes a loop to count the uses. | |
438 @item @code{PHI_ARG_INDEX_FROM_USE (use_p)} : Given a use within a @code{PHI} | |
439 node, return the index number for the use. An assert is triggered if the use | |
440 isn't located in a @code{PHI} node. | |
441 @item @code{USE_STMT (use_p)} : Return the statement a use occurs in. | |
442 @end enumerate | |
443 | |
444 Note that uses are not put into an immediate use list until their statement is | |
445 actually inserted into the instruction stream via a @code{bsi_*} routine. | |
446 | |
447 It is also still possible to utilize lazy updating of statements, but this | |
448 should be used only when absolutely required. Both alias analysis and the | |
449 dominator optimizations currently do this. | |
450 | |
451 When lazy updating is being used, the immediate use information is out of date | |
452 and cannot be used reliably. Lazy updating is achieved by simply marking | |
453 statements modified via calls to @code{mark_stmt_modified} instead of | |
454 @code{update_stmt}. When lazy updating is no longer required, all the | |
455 modified statements must have @code{update_stmt} called in order to bring them | |
456 up to date. This must be done before the optimization is finished, or | |
457 @code{verify_ssa} will trigger an abort. | |
458 | |
459 This is done with a simple loop over the instruction stream: | |
460 @smallexample | |
461 block_stmt_iterator bsi; | |
462 basic_block bb; | |
463 FOR_EACH_BB (bb) | |
464 @{ | |
465 for (bsi = bsi_start (bb); !bsi_end_p (bsi); bsi_next (&bsi)) | |
466 update_stmt_if_modified (bsi_stmt (bsi)); | |
467 @} | |
468 @end smallexample | |
469 | |
470 @node SSA | |
471 @section Static Single Assignment | |
472 @cindex SSA | |
473 @cindex static single assignment | |
474 | |
475 Most of the tree optimizers rely on the data flow information provided | |
476 by the Static Single Assignment (SSA) form. We implement the SSA form | |
477 as described in @cite{R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and | |
478 K. Zadeck. Efficiently Computing Static Single Assignment Form and the | |
479 Control Dependence Graph. ACM Transactions on Programming Languages | |
480 and Systems, 13(4):451-490, October 1991}. | |
481 | |
482 The SSA form is based on the premise that program variables are | |
483 assigned in exactly one location in the program. Multiple assignments | |
484 to the same variable create new versions of that variable. Naturally, | |
485 actual programs are seldom in SSA form initially because variables | |
486 tend to be assigned multiple times. The compiler modifies the program | |
487 representation so that every time a variable is assigned in the code, | |
488 a new version of the variable is created. Different versions of the | |
489 same variable are distinguished by subscripting the variable name with | |
490 its version number. Variables used in the right-hand side of | |
491 expressions are renamed so that their version number matches that of | |
492 the most recent assignment. | |
493 | |
494 We represent variable versions using @code{SSA_NAME} nodes. The | |
495 renaming process in @file{tree-ssa.c} wraps every real and | |
496 virtual operand with an @code{SSA_NAME} node which contains | |
497 the version number and the statement that created the | |
498 @code{SSA_NAME}. Only definitions and virtual definitions may | |
499 create new @code{SSA_NAME} nodes. | |
500 | |
501 @cindex PHI nodes | |
502 Sometimes, flow of control makes it impossible to determine the | |
503 most recent version of a variable. In these cases, the compiler | |
504 inserts an artificial definition for that variable called | |
505 @dfn{PHI function} or @dfn{PHI node}. This new definition merges | |
506 all the incoming versions of the variable to create a new name | |
507 for it. For instance, | |
508 | |
509 @smallexample | |
510 if (@dots{}) | |
511 a_1 = 5; | |
512 else if (@dots{}) | |
513 a_2 = 2; | |
514 else | |
515 a_3 = 13; | |
516 | |
517 # a_4 = PHI <a_1, a_2, a_3> | |
518 return a_4; | |
519 @end smallexample | |
520 | |
521 Since it is not possible to determine which of the three branches | |
522 will be taken at runtime, we don't know which of @code{a_1}, | |
523 @code{a_2} or @code{a_3} to use at the return statement. So, the | |
524 SSA renamer creates a new version @code{a_4} which is assigned | |
525 the result of ``merging'' @code{a_1}, @code{a_2} and @code{a_3}. | |
526 Hence, PHI nodes mean ``one of these operands. I don't know | |
527 which''. | |
528 | |
529 The following macros can be used to examine PHI nodes | |
530 | |
531 @defmac PHI_RESULT (@var{phi}) | |
532 Returns the @code{SSA_NAME} created by PHI node @var{phi} (i.e., | |
533 @var{phi}'s LHS)@. | |
534 @end defmac | |
535 | |
536 @defmac PHI_NUM_ARGS (@var{phi}) | |
537 Returns the number of arguments in @var{phi}. This number is exactly | |
538 the number of incoming edges to the basic block holding @var{phi}@. | |
539 @end defmac | |
540 | |
541 @defmac PHI_ARG_ELT (@var{phi}, @var{i}) | |
542 Returns a tuple representing the @var{i}th argument of @var{phi}@. | |
543 Each element of this tuple contains an @code{SSA_NAME} @var{var} and | |
544 the incoming edge through which @var{var} flows. | |
545 @end defmac | |
546 | |
547 @defmac PHI_ARG_EDGE (@var{phi}, @var{i}) | |
548 Returns the incoming edge for the @var{i}th argument of @var{phi}. | |
549 @end defmac | |
550 | |
551 @defmac PHI_ARG_DEF (@var{phi}, @var{i}) | |
552 Returns the @code{SSA_NAME} for the @var{i}th argument of @var{phi}. | |
553 @end defmac | |
554 | |
555 | |
556 @subsection Preserving the SSA form | |
557 @findex update_ssa | |
558 @cindex preserving SSA form | |
559 Some optimization passes make changes to the function that | |
560 invalidate the SSA property. This can happen when a pass has | |
561 added new symbols or changed the program so that variables that | |
562 were previously aliased aren't anymore. Whenever something like this | |
563 happens, the affected symbols must be renamed into SSA form again. | |
564 Transformations that emit new code or replicate existing statements | |
565 will also need to update the SSA form@. | |
566 | |
567 Since GCC implements two different SSA forms for register and virtual | |
568 variables, keeping the SSA form up to date depends on whether you are | |
569 updating register or virtual names. In both cases, the general idea | |
570 behind incremental SSA updates is similar: when new SSA names are | |
571 created, they typically are meant to replace other existing names in | |
572 the program@. | |
573 | |
574 For instance, given the following code: | |
575 | |
576 @smallexample | |
577 1 L0: | |
578 2 x_1 = PHI (0, x_5) | |
579 3 if (x_1 < 10) | |
580 4 if (x_1 > 7) | |
581 5 y_2 = 0 | |
582 6 else | |
583 7 y_3 = x_1 + x_7 | |
584 8 endif | |
585 9 x_5 = x_1 + 1 | |
586 10 goto L0; | |
587 11 endif | |
588 @end smallexample | |
589 | |
590 Suppose that we insert new names @code{x_10} and @code{x_11} (lines | |
591 @code{4} and @code{8})@. | |
592 | |
593 @smallexample | |
594 1 L0: | |
595 2 x_1 = PHI (0, x_5) | |
596 3 if (x_1 < 10) | |
597 4 x_10 = @dots{} | |
598 5 if (x_1 > 7) | |
599 6 y_2 = 0 | |
600 7 else | |
601 8 x_11 = @dots{} | |
602 9 y_3 = x_1 + x_7 | |
603 10 endif | |
604 11 x_5 = x_1 + 1 | |
605 12 goto L0; | |
606 13 endif | |
607 @end smallexample | |
608 | |
609 We want to replace all the uses of @code{x_1} with the new definitions | |
610 of @code{x_10} and @code{x_11}. Note that the only uses that should | |
611 be replaced are those at lines @code{5}, @code{9} and @code{11}. | |
612 Also, the use of @code{x_7} at line @code{9} should @emph{not} be | |
613 replaced (this is why we cannot just mark symbol @code{x} for | |
614 renaming)@. | |
615 | |
616 Additionally, we may need to insert a PHI node at line @code{11} | |
617 because that is a merge point for @code{x_10} and @code{x_11}. So the | |
618 use of @code{x_1} at line @code{11} will be replaced with the new PHI | |
619 node. The insertion of PHI nodes is optional. They are not strictly | |
620 necessary to preserve the SSA form, and depending on what the caller | |
621 inserted, they may not even be useful for the optimizers@. | |
622 | |
623 Updating the SSA form is a two step process. First, the pass has to | |
624 identify which names need to be updated and/or which symbols need to | |
625 be renamed into SSA form for the first time. When new names are | |
626 introduced to replace existing names in the program, the mapping | |
627 between the old and the new names are registered by calling | |
628 @code{register_new_name_mapping} (note that if your pass creates new | |
629 code by duplicating basic blocks, the call to @code{tree_duplicate_bb} | |
630 will set up the necessary mappings automatically). On the other hand, | |
631 if your pass exposes a new symbol that should be put in SSA form for | |
632 the first time, the new symbol should be registered with | |
633 @code{mark_sym_for_renaming}. | |
634 | |
635 After the replacement mappings have been registered and new symbols | |
636 marked for renaming, a call to @code{update_ssa} makes the registered | |
637 changes. This can be done with an explicit call or by creating | |
638 @code{TODO} flags in the @code{tree_opt_pass} structure for your pass. | |
639 There are several @code{TODO} flags that control the behavior of | |
640 @code{update_ssa}: | |
641 | |
642 @itemize @bullet | |
643 @item @code{TODO_update_ssa}. Update the SSA form inserting PHI nodes | |
644 for newly exposed symbols and virtual names marked for updating. | |
645 When updating real names, only insert PHI nodes for a real name | |
646 @code{O_j} in blocks reached by all the new and old definitions for | |
647 @code{O_j}. If the iterated dominance frontier for @code{O_j} | |
648 is not pruned, we may end up inserting PHI nodes in blocks that | |
649 have one or more edges with no incoming definition for | |
650 @code{O_j}. This would lead to uninitialized warnings for | |
651 @code{O_j}'s symbol@. | |
652 | |
653 @item @code{TODO_update_ssa_no_phi}. Update the SSA form without | |
654 inserting any new PHI nodes at all. This is used by passes that | |
655 have either inserted all the PHI nodes themselves or passes that | |
656 need only to patch use-def and def-def chains for virtuals | |
657 (e.g., DCE)@. | |
658 | |
659 | |
660 @item @code{TODO_update_ssa_full_phi}. Insert PHI nodes everywhere | |
661 they are needed. No pruning of the IDF is done. This is used | |
662 by passes that need the PHI nodes for @code{O_j} even if it | |
663 means that some arguments will come from the default definition | |
664 of @code{O_j}'s symbol (e.g., @code{pass_linear_transform})@. | |
665 | |
666 WARNING: If you need to use this flag, chances are that your | |
667 pass may be doing something wrong. Inserting PHI nodes for an | |
668 old name where not all edges carry a new replacement may lead to | |
669 silent codegen errors or spurious uninitialized warnings@. | |
670 | |
671 @item @code{TODO_update_ssa_only_virtuals}. Passes that update the | |
672 SSA form on their own may want to delegate the updating of | |
673 virtual names to the generic updater. Since FUD chains are | |
674 easier to maintain, this simplifies the work they need to do. | |
675 NOTE: If this flag is used, any OLD->NEW mappings for real names | |
676 are explicitly destroyed and only the symbols marked for | |
677 renaming are processed@. | |
678 @end itemize | |
679 | |
680 @subsection Preserving the virtual SSA form | |
681 @cindex preserving virtual SSA form | |
682 | |
683 The virtual SSA form is harder to preserve than the non-virtual SSA form | |
684 mainly because the set of virtual operands for a statement may change at | |
685 what some would consider unexpected times. In general, statement | |
686 modifications should be bracketed between calls to | |
687 @code{push_stmt_changes} and @code{pop_stmt_changes}. For example, | |
688 | |
689 @smallexample | |
690 munge_stmt (tree stmt) | |
691 @{ | |
692 push_stmt_changes (&stmt); | |
693 @dots{} rewrite STMT @dots{} | |
694 pop_stmt_changes (&stmt); | |
695 @} | |
696 @end smallexample | |
697 | |
698 The call to @code{push_stmt_changes} saves the current state of the | |
699 statement operands and the call to @code{pop_stmt_changes} compares | |
700 the saved state with the current one and does the appropriate symbol | |
701 marking for the SSA renamer. | |
702 | |
703 It is possible to modify several statements at a time, provided that | |
704 @code{push_stmt_changes} and @code{pop_stmt_changes} are called in | |
705 LIFO order, as when processing a stack of statements. | |
706 | |
707 Additionally, if the pass discovers that it did not need to make | |
708 changes to the statement after calling @code{push_stmt_changes}, it | |
709 can simply discard the topmost change buffer by calling | |
710 @code{discard_stmt_changes}. This will avoid the expensive operand | |
711 re-scan operation and the buffer comparison that determines if symbols | |
712 need to be marked for renaming. | |
713 | |
714 @subsection Examining @code{SSA_NAME} nodes | |
715 @cindex examining SSA_NAMEs | |
716 | |
717 The following macros can be used to examine @code{SSA_NAME} nodes | |
718 | |
719 @defmac SSA_NAME_DEF_STMT (@var{var}) | |
720 Returns the statement @var{s} that creates the @code{SSA_NAME} | |
721 @var{var}. If @var{s} is an empty statement (i.e., @code{IS_EMPTY_STMT | |
722 (@var{s})} returns @code{true}), it means that the first reference to | |
723 this variable is a USE or a VUSE@. | |
724 @end defmac | |
725 | |
726 @defmac SSA_NAME_VERSION (@var{var}) | |
727 Returns the version number of the @code{SSA_NAME} object @var{var}. | |
728 @end defmac | |
729 | |
730 | |
731 @subsection Walking use-def chains | |
732 | |
733 @deftypefn {Tree SSA function} void walk_use_def_chains (@var{var}, @var{fn}, @var{data}) | |
734 | |
735 Walks use-def chains starting at the @code{SSA_NAME} node @var{var}. | |
736 Calls function @var{fn} at each reaching definition found. Function | |
737 @var{FN} takes three arguments: @var{var}, its defining statement | |
738 (@var{def_stmt}) and a generic pointer to whatever state information | |
739 that @var{fn} may want to maintain (@var{data}). Function @var{fn} is | |
740 able to stop the walk by returning @code{true}, otherwise in order to | |
741 continue the walk, @var{fn} should return @code{false}. | |
742 | |
743 Note, that if @var{def_stmt} is a @code{PHI} node, the semantics are | |
744 slightly different. For each argument @var{arg} of the PHI node, this | |
745 function will: | |
746 | |
747 @enumerate | |
748 @item Walk the use-def chains for @var{arg}. | |
749 @item Call @code{FN (@var{arg}, @var{phi}, @var{data})}. | |
750 @end enumerate | |
751 | |
752 Note how the first argument to @var{fn} is no longer the original | |
753 variable @var{var}, but the PHI argument currently being examined. | |
754 If @var{fn} wants to get at @var{var}, it should call | |
755 @code{PHI_RESULT} (@var{phi}). | |
756 @end deftypefn | |
757 | |
758 @subsection Walking the dominator tree | |
759 | |
760 @deftypefn {Tree SSA function} void walk_dominator_tree (@var{walk_data}, @var{bb}) | |
761 | |
762 This function walks the dominator tree for the current CFG calling a | |
763 set of callback functions defined in @var{struct dom_walk_data} in | |
764 @file{domwalk.h}. The call back functions you need to define give you | |
765 hooks to execute custom code at various points during traversal: | |
766 | |
767 @enumerate | |
768 @item Once to initialize any local data needed while processing | |
769 @var{bb} and its children. This local data is pushed into an | |
770 internal stack which is automatically pushed and popped as the | |
771 walker traverses the dominator tree. | |
772 | |
773 @item Once before traversing all the statements in the @var{bb}. | |
774 | |
775 @item Once for every statement inside @var{bb}. | |
776 | |
777 @item Once after traversing all the statements and before recursing | |
778 into @var{bb}'s dominator children. | |
779 | |
780 @item It then recurses into all the dominator children of @var{bb}. | |
781 | |
782 @item After recursing into all the dominator children of @var{bb} it | |
783 can, optionally, traverse every statement in @var{bb} again | |
784 (i.e., repeating steps 2 and 3). | |
785 | |
786 @item Once after walking the statements in @var{bb} and @var{bb}'s | |
787 dominator children. At this stage, the block local data stack | |
788 is popped. | |
789 @end enumerate | |
790 @end deftypefn | |
791 | |
792 @node Alias analysis | |
793 @section Alias analysis | |
794 @cindex alias | |
795 @cindex flow-sensitive alias analysis | |
796 @cindex flow-insensitive alias analysis | |
797 | |
798 Alias analysis proceeds in 4 main phases: | |
799 | |
800 @enumerate | |
801 @item Structural alias analysis. | |
802 | |
803 This phase walks the types for structure variables, and determines which | |
804 of the fields can overlap using offset and size of each field. For each | |
805 field, a ``subvariable'' called a ``Structure field tag'' (SFT)@ is | |
806 created, which represents that field as a separate variable. All | |
807 accesses that could possibly overlap with a given field will have | |
808 virtual operands for the SFT of that field. | |
809 | |
810 @smallexample | |
811 struct foo | |
812 @{ | |
813 int a; | |
814 int b; | |
815 @} | |
816 struct foo temp; | |
817 int bar (void) | |
818 @{ | |
819 int tmp1, tmp2, tmp3; | |
820 SFT.0_2 = VDEF <SFT.0_1> | |
821 temp.a = 5; | |
822 SFT.1_4 = VDEF <SFT.1_3> | |
823 temp.b = 6; | |
824 | |
825 VUSE <SFT.1_4> | |
826 tmp1_5 = temp.b; | |
827 VUSE <SFT.0_2> | |
828 tmp2_6 = temp.a; | |
829 | |
830 tmp3_7 = tmp1_5 + tmp2_6; | |
831 return tmp3_7; | |
832 @} | |
833 @end smallexample | |
834 | |
835 If you copy the symbol tag for a variable for some reason, you probably | |
836 also want to copy the subvariables for that variable. | |
837 | |
838 @item Points-to and escape analysis. | |
839 | |
840 This phase walks the use-def chains in the SSA web looking for | |
841 three things: | |
842 | |
843 @itemize @bullet | |
844 @item Assignments of the form @code{P_i = &VAR} | |
845 @item Assignments of the form P_i = malloc() | |
846 @item Pointers and ADDR_EXPR that escape the current function. | |
847 @end itemize | |
848 | |
849 The concept of `escaping' is the same one used in the Java world. | |
850 When a pointer or an ADDR_EXPR escapes, it means that it has been | |
851 exposed outside of the current function. So, assignment to | |
852 global variables, function arguments and returning a pointer are | |
853 all escape sites. | |
854 | |
855 This is where we are currently limited. Since not everything is | |
856 renamed into SSA, we lose track of escape properties when a | |
857 pointer is stashed inside a field in a structure, for instance. | |
858 In those cases, we are assuming that the pointer does escape. | |
859 | |
860 We use escape analysis to determine whether a variable is | |
861 call-clobbered. Simply put, if an ADDR_EXPR escapes, then the | |
862 variable is call-clobbered. If a pointer P_i escapes, then all | |
863 the variables pointed-to by P_i (and its memory tag) also escape. | |
864 | |
865 @item Compute flow-sensitive aliases | |
866 | |
867 We have two classes of memory tags. Memory tags associated with | |
868 the pointed-to data type of the pointers in the program. These | |
869 tags are called ``symbol memory tag'' (SMT)@. The other class are | |
870 those associated with SSA_NAMEs, called ``name memory tag'' (NMT)@. | |
871 The basic idea is that when adding operands for an INDIRECT_REF | |
872 *P_i, we will first check whether P_i has a name tag, if it does | |
873 we use it, because that will have more precise aliasing | |
874 information. Otherwise, we use the standard symbol tag. | |
875 | |
876 In this phase, we go through all the pointers we found in | |
877 points-to analysis and create alias sets for the name memory tags | |
878 associated with each pointer P_i. If P_i escapes, we mark | |
879 call-clobbered the variables it points to and its tag. | |
880 | |
881 | |
882 @item Compute flow-insensitive aliases | |
883 | |
884 This pass will compare the alias set of every symbol memory tag and | |
885 every addressable variable found in the program. Given a symbol | |
886 memory tag SMT and an addressable variable V@. If the alias sets | |
887 of SMT and V conflict (as computed by may_alias_p), then V is | |
888 marked as an alias tag and added to the alias set of SMT@. | |
889 | |
890 Every language that wishes to perform language-specific alias analysis | |
891 should define a function that computes, given a @code{tree} | |
892 node, an alias set for the node. Nodes in different alias sets are not | |
893 allowed to alias. For an example, see the C front-end function | |
894 @code{c_get_alias_set}. | |
895 @end enumerate | |
896 | |
897 For instance, consider the following function: | |
898 | |
899 @smallexample | |
900 foo (int i) | |
901 @{ | |
902 int *p, *q, a, b; | |
903 | |
904 if (i > 10) | |
905 p = &a; | |
906 else | |
907 q = &b; | |
908 | |
909 *p = 3; | |
910 *q = 5; | |
911 a = b + 2; | |
912 return *p; | |
913 @} | |
914 @end smallexample | |
915 | |
916 After aliasing analysis has finished, the symbol memory tag for | |
917 pointer @code{p} will have two aliases, namely variables @code{a} and | |
918 @code{b}. | |
919 Every time pointer @code{p} is dereferenced, we want to mark the | |
920 operation as a potential reference to @code{a} and @code{b}. | |
921 | |
922 @smallexample | |
923 foo (int i) | |
924 @{ | |
925 int *p, a, b; | |
926 | |
927 if (i_2 > 10) | |
928 p_4 = &a; | |
929 else | |
930 p_6 = &b; | |
931 # p_1 = PHI <p_4(1), p_6(2)>; | |
932 | |
933 # a_7 = VDEF <a_3>; | |
934 # b_8 = VDEF <b_5>; | |
935 *p_1 = 3; | |
936 | |
937 # a_9 = VDEF <a_7> | |
938 # VUSE <b_8> | |
939 a_9 = b_8 + 2; | |
940 | |
941 # VUSE <a_9>; | |
942 # VUSE <b_8>; | |
943 return *p_1; | |
944 @} | |
945 @end smallexample | |
946 | |
947 In certain cases, the list of may aliases for a pointer may grow | |
948 too large. This may cause an explosion in the number of virtual | |
949 operands inserted in the code. Resulting in increased memory | |
950 consumption and compilation time. | |
951 | |
952 When the number of virtual operands needed to represent aliased | |
953 loads and stores grows too large (configurable with @option{--param | |
954 max-aliased-vops}), alias sets are grouped to avoid severe | |
955 compile-time slow downs and memory consumption. The alias | |
956 grouping heuristic proceeds as follows: | |
957 | |
958 @enumerate | |
959 @item Sort the list of pointers in decreasing number of contributed | |
960 virtual operands. | |
961 | |
962 @item Take the first pointer from the list and reverse the role | |
963 of the memory tag and its aliases. Usually, whenever an | |
964 aliased variable Vi is found to alias with a memory tag | |
965 T, we add Vi to the may-aliases set for T@. Meaning that | |
966 after alias analysis, we will have: | |
967 | |
968 @smallexample | |
969 may-aliases(T) = @{ V1, V2, V3, @dots{}, Vn @} | |
970 @end smallexample | |
971 | |
972 This means that every statement that references T, will get | |
973 @code{n} virtual operands for each of the Vi tags. But, when | |
974 alias grouping is enabled, we make T an alias tag and add it | |
975 to the alias set of all the Vi variables: | |
976 | |
977 @smallexample | |
978 may-aliases(V1) = @{ T @} | |
979 may-aliases(V2) = @{ T @} | |
980 @dots{} | |
981 may-aliases(Vn) = @{ T @} | |
982 @end smallexample | |
983 | |
984 This has two effects: (a) statements referencing T will only get | |
985 a single virtual operand, and, (b) all the variables Vi will now | |
986 appear to alias each other. So, we lose alias precision to | |
987 improve compile time. But, in theory, a program with such a high | |
988 level of aliasing should not be very optimizable in the first | |
989 place. | |
990 | |
991 @item Since variables may be in the alias set of more than one | |
992 memory tag, the grouping done in step (2) needs to be extended | |
993 to all the memory tags that have a non-empty intersection with | |
994 the may-aliases set of tag T@. For instance, if we originally | |
995 had these may-aliases sets: | |
996 | |
997 @smallexample | |
998 may-aliases(T) = @{ V1, V2, V3 @} | |
999 may-aliases(R) = @{ V2, V4 @} | |
1000 @end smallexample | |
1001 | |
1002 In step (2) we would have reverted the aliases for T as: | |
1003 | |
1004 @smallexample | |
1005 may-aliases(V1) = @{ T @} | |
1006 may-aliases(V2) = @{ T @} | |
1007 may-aliases(V3) = @{ T @} | |
1008 @end smallexample | |
1009 | |
1010 But note that now V2 is no longer aliased with R@. We could | |
1011 add R to may-aliases(V2), but we are in the process of | |
1012 grouping aliases to reduce virtual operands so what we do is | |
1013 add V4 to the grouping to obtain: | |
1014 | |
1015 @smallexample | |
1016 may-aliases(V1) = @{ T @} | |
1017 may-aliases(V2) = @{ T @} | |
1018 may-aliases(V3) = @{ T @} | |
1019 may-aliases(V4) = @{ T @} | |
1020 @end smallexample | |
1021 | |
1022 @item If the total number of virtual operands due to aliasing is | |
1023 still above the threshold set by max-alias-vops, go back to (2). | |
1024 @end enumerate |