diff gcc/doc/passes.texi @ 111:04ced10e8804
gcc 7
author | kono |
---|---|
date | Fri, 27 Oct 2017 22:46:09 +0900 |
parents | f6334be47118 |
children | 84e7813d76e9 |
```diff
--- a/gcc/doc/passes.texi Sun Aug 21 07:07:55 2011 +0900
+++ b/gcc/doc/passes.texi Fri Oct 27 22:46:09 2017 +0900
@@ -1,8 +1,6 @@
-@c markers: CROSSREF BUG TODO
+@c markers: BUG TODO
 
-@c Copyright (C) 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
-@c 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
-@c Free Software Foundation, Inc.
+@c Copyright (C) 1988-2017 Free Software Foundation, Inc.
 @c This is part of the GCC manual.
 @c For copying conditions, see the file gcc.texi.
 
@@ -11,6 +9,7 @@
 @cindex passes and files of the compiler
 @cindex files and passes of the compiler
 @cindex compiler passes and files
+@cindex pass dumps
 
 This chapter is dedicated to giving an overview of the optimization and
 code generation passes of the compiler. In the process, it describes
@@ -19,10 +18,12 @@
 
 @menu
 * Parsing pass:: The language front end turns text into bits.
+* Cilk Plus Transformation:: Transform Cilk Plus Code to equivalent C/C++.
 * Gimplification pass:: The bits are turned into something we can optimize.
 * Pass manager:: Sequencing the optimization passes.
 * Tree SSA passes:: Optimizations on a high-level representation.
 * RTL passes:: Optimizations on a low-level representation.
+* Optimization info:: Dumping optimization information from passes.
 @end menu
 
 @node Parsing pass
@@ -32,7 +33,7 @@
 The language front end is invoked only once, via
 @code{lang_hooks.parse_file}, to parse the entire input. The language
 front end may use any intermediate language representation deemed
-appropriate. The C front end uses GENERIC trees (CROSSREF), plus
+appropriate. The C front end uses GENERIC trees (@pxref{GENERIC}), plus
 a double handful of language specific tree codes defined in
 @file{c-common.def}. The Fortran front end uses a completely different
 private representation.
@@ -46,10 +47,9 @@
 At some point the front end must translate the representation used in the
 front end to a representation understood by the language-independent
 portions of the compiler. Current practice takes one of two forms.
-The C front end manually invokes the gimplifier (CROSSREF) on each function,
-and uses the gimplifier callbacks to convert the language-specific tree
-nodes directly to GIMPLE (CROSSREF) before passing the function off to
-be compiled.
+The C front end manually invokes the gimplifier (@pxref{GIMPLE}) on each function,
+and uses the gimplifier callbacks to convert the language-specific tree
+nodes directly to GIMPLE before passing the function off to be compiled.
 The Fortran front end converts from a private representation to GENERIC,
 which is later lowered to GIMPLE when the function is compiled. Which
 route to choose probably depends on how well GENERIC (plus extensions)
@@ -104,6 +104,68 @@
 The middle-end will, at its option, emit the function and data
 definitions immediately or queue them for later processing.
 
+@node Cilk Plus Transformation
+@section Cilk Plus Transformation
+@cindex CILK_PLUS
+
+If Cilk Plus generation (flag @option{-fcilkplus}) is enabled, all the Cilk
+Plus code is transformed into equivalent C and C++ functions. Majority of this
+transformation occurs toward the end of the parsing and right before the
+gimplification pass.
+
+These are the major components to the Cilk Plus language extension:
+@itemize @bullet
+@item Array Notations:
+During parsing phase, all the array notation specific information is stored in
+@code{ARRAY_NOTATION_REF} tree using the function
+@code{c_parser_array_notation}. During the end of parsing, we check the entire
+function to see if there are any array notation specific code (using the
+function @code{contains_array_notation_expr}). If this function returns
+true, then we expand them using either @code{expand_array_notation_exprs} or
+@code{build_array_notation_expr}. For the cases where array notations are
+inside conditions, they are transformed using the function
+@code{fix_conditional_array_notations}. The C language-specific routines are
+located in @file{c/c-array-notation.c} and the equivalent C++ routines are in
+the file @file{cp/cp-array-notation.c}. Common routines such as functions to
+initialize built-in functions are stored in @file{array-notation-common.c}.
+
+@item Cilk keywords:
+@itemize @bullet
+@item @code{_Cilk_spawn}:
+The @code{_Cilk_spawn} keyword is parsed and the function it contains is marked
+as a spawning function. The spawning function is called the spawner. At
+the end of the parsing phase, appropriate built-in functions are
+added to the spawner that are defined in the Cilk runtime. The appropriate
+locations of these functions, and the internal structures are detailed in
+@code{cilk_init_builtins} in the file @file{cilk-common.c}. The pointers to
+Cilk functions and fields of internal structures are described
+in @file{cilk.h}. The built-in functions are described in
+@file{cilk-builtins.def}.
+
+During gimplification, a new "spawn-helper" function is created.
+The spawned function is replaced with a spawn helper function in the spawner.
+The spawned function-call is moved into the spawn helper. The main function
+that does these transformations is @code{gimplify_cilk_spawn} in
+@file{c-family/cilk.c}. In the spawn-helper, the gimplification function
+@code{gimplify_call_expr}, inserts a function call @code{__cilkrts_detach}.
+This function is expanded by @code{builtin_expand_cilk_detach} located in
+@file{c-family/cilk.c}.
+
+@item @code{_Cilk_sync}:
+@code{_Cilk_sync} is parsed like a keyword. During gimplification,
+the function @code{gimplify_cilk_sync} in @file{c-family/cilk.c}, will replace
+this keyword with a set of functions that are stored in the Cilk runtime.
+One of the internal functions inserted during gimplification,
+@code{__cilkrts_pop_frame} must be expanded by the compiler and is
+done by @code{builtin_expand_cilk_pop_frame} in @file{cilk-common.c}.
+
+@end itemize
+@end itemize
+
+Documentation about Cilk Plus and language specification is provided under the
+"Learn" section in @w{@uref{https://www.cilkplus.org}}. It is worth mentioning
+that the current implementation follows ABI 1.1.
+
 @node Gimplification pass
 @section Gimplification pass
 
@@ -111,11 +173,10 @@
 @cindex GIMPLE
 @dfn{Gimplification} is a whimsical term for the process of converting
 the intermediate representation of a function into the GIMPLE language
-(CROSSREF). The term stuck, and so words like ``gimplification'',
+(@pxref{GIMPLE}). The term stuck, and so words like ``gimplification'',
 ``gimplify'', ``gimplifier'' and the like are sprinkled throughout this
 section of code.
 
-@cindex GENERIC
 While a front end may certainly choose to generate GIMPLE directly if
 it chooses, this can be a moderately complex process unless the
 intermediate language used by the front end is already fairly simple.
@@ -149,6 +210,7 @@
 
 The pass manager is located in @file{passes.c}, @file{tree-optimize.c}
 and @file{tree-pass.h}.
+It processes passes as described in @file{passes.def}.
 Its job is to run all of the individual passes in the correct order,
 and take care of standard bookkeeping that applies to every pass.
 
@@ -198,20 +260,6 @@
 rid of it. This pass is located in @file{tree-cfg.c} and described by
 @code{pass_remove_useless_stmts}.
 
-@item Mudflap declaration registration
-
-If mudflap (@pxref{Optimize Options,,-fmudflap -fmudflapth
--fmudflapir,gcc,Using the GNU Compiler Collection (GCC)}) is
-enabled, we generate code to register some variable declarations with
-the mudflap runtime. Specifically, the runtime tracks the lifetimes of
-those variable declarations that have their addresses taken, or whose
-bounds are unknown at compile time (@code{extern}). This pass generates
-new exception handling constructs (@code{try}/@code{finally}), and so
-must run before those are lowered. In addition, the pass enqueues
-declarations of static variables whose lifetimes extend to the entire
-program. The pass is located in @file{tree-mudflap.c} and is described
-by @code{pass_mudflap_1}.
-
 @item OpenMP lowering
 
 If OpenMP generation (@option{-fopenmp}) is enabled, this pass lowers
@@ -343,11 +391,19 @@
 
 @item Profiling
 
-This pass rewrites the function in order to collect runtime block
+This pass instruments the function in order to collect runtime block
 and value profiling data. Such data may be fed back into the compiler
 on a subsequent run so as to allow optimization based on expected
-execution frequencies. The pass is located in @file{predict.c} and
-is described by @code{pass_profile}.
+execution frequencies. The pass is located in @file{tree-profile.c} and
+is described by @code{pass_ipa_tree_profile}.
+
+@item Static profile estimation
+
+This pass implements series of heuristics to guess propababilities
+of branches. The resulting predictions are turned into edge profile
+by propagating branches across the control flow graphs.
+The pass is located in @file{tree-profile.c} and is described by
+@code{pass_profile}.
 
 @item Lower complex arithmetic
 
@@ -395,7 +451,7 @@
 @item Full redundancy elimination
 
 This is a simpler form of PRE that only eliminates redundancies that
-occur an all paths. It is located in @file{tree-ssa-pre.c} and
+occur on all paths. It is located in @file{tree-ssa-pre.c} and
 described by @code{pass_fre}.
 
 @item Loop optimization
@@ -426,10 +482,13 @@
 Loop unswitching. This pass moves the conditional jumps that are invariant
 out of the loops. To achieve this, a duplicate of the loop is created for
 each possible outcome of conditional jump(s). The pass is implemented in
-@file{tree-ssa-loop-unswitch.c}. This pass should eventually replace the
-RTL level loop unswitching in @file{loop-unswitch.c}, but currently
-the RTL level pass is not completely redundant yet due to deficiencies
-in tree level alias analysis.
+@file{tree-ssa-loop-unswitch.c}.
+
+Loop splitting. If a loop contains a conditional statement that is
+always true for one part of the iteration space and false for the other
+this pass splits the loop into two, one dealing with one side the other
+only with the other, thereby removing one inner-loop conditional. The
+pass is implemented in @file{tree-ssa-loop-split.c}.
 
 The optimizations also use various utility functions contained in
 @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
@@ -437,23 +496,23 @@
 
 Vectorization. This pass transforms loops to operate on vector types
 instead of scalar types. Data parallelism across loop iterations is exploited
-to group data elements from consecutive iterations into a vector and operate
-on them in parallel. Depending on available target support the loop is
+to group data elements from consecutive iterations into a vector and operate
+on them in parallel. Depending on available target support the loop is
 conceptually unrolled by a factor @code{VF} (vectorization factor), which is
-the number of elements operated upon in parallel in each iteration, and the
+the number of elements operated upon in parallel in each iteration, and the
 @code{VF} copies of each scalar operation are fused to form a vector operation.
 Additional loop transformations such as peeling and versioning may take place
-to align the number of iterations, and to align the memory accesses in the
+to align the number of iterations, and to align the memory accesses in the
 loop.
 The pass is implemented in @file{tree-vectorizer.c} (the main driver),
-@file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts
-and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
+@file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts
+and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
 functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
 Analysis of data references is in @file{tree-data-ref.c}.
 
 SLP Vectorization. This pass performs vectorization of straight-line code.
 The pass is implemented in @file{tree-vectorizer.c} (the main driver),
-@file{tree-vect-slp.c}, @file{tree-vect-stmts.c} and
+@file{tree-vect-slp.c}, @file{tree-vect-stmts.c} and
 @file{tree-vect-data-refs.c}.
 
 Autoparallelization. This pass splits the loop iteration space to run
@@ -472,7 +531,7 @@
 We identify if convertible loops, if-convert statements and merge
 basic blocks in one big block. The idea is to present loop in such
 form so that vectorizer can have one to one mapping between statements
-and available vector operations. This pass is located in
+and available vector operations. This pass is located in
 @file{tree-if-conv.c} and is described by @code{pass_if_conversion}.
 
 @item Conditional constant propagation
@@ -549,18 +608,6 @@
 statement is not reachable. It is located in @file{tree-cfg.c} and
 is described by @code{pass_warn_function_return}.
 
-@item Mudflap statement annotation
-
-If mudflap is enabled, we rewrite some memory accesses with code to
-validate that the memory access is correct. In particular, expressions
-involving pointer dereferences (@code{INDIRECT_REF}, @code{ARRAY_REF},
-etc.) are replaced by code that checks the selected address range
-against the mudflap runtime's database of valid regions. This check
-includes an inline lookup into a direct-mapped cache, based on
-shift/mask operations of the pointer value, with a fallback function
-call into the runtime. The pass is located in @file{tree-mudflap.c} and
-is described by @code{pass_mudflap_2}.
-
 @item Leave static single assignment form
 
 This pass rewrites the function such that it is in normal form. At
@@ -757,8 +804,8 @@
 generic loop analysis and manipulation code. Initialization and finalization
 of loop structures is handled by @file{loop-init.c}. A loop invariant motion
 pass is implemented in @file{loop-invariant.c}.
-Basic block level optimizations---unrolling, peeling and unswitching loops---
-are implemented in @file{loop-unswitch.c} and @file{loop-unroll.c}.
+Basic block level optimizations---unrolling, and peeling loops---
+are implemented in @file{loop-unroll.c}.
 Replacing of the exit condition of loops by special machine-dependent
 instructions is handled by @file{loop-doloop.c}.
 
@@ -773,7 +820,7 @@
 This pass attempts to replace conditional branches and surrounding
 assignments with arithmetic, boolean value producing comparison
 instructions, and conditional move instructions. In the very last
-invocation after reload, it will generate predicated instructions
+invocation after reload/LRA, it will generate predicated instructions
 when supported by the target. The code is located in @file{ifcvt.c}.
 
 @item Web construction
@@ -790,14 +837,6 @@
 result using algebra, and then attempts to match the result against
 the machine description. The code is located in @file{combine.c}.
 
-@item Register movement
-
-This pass looks for cases where matching constraints would force an
-instruction to need a reload, and this reload would be a
-register-to-register move. It then attempts to change the registers
-used by the instruction to avoid the move instruction. The code is
-located in @file{regmove.c}.
-
 @item Mode switching optimization
 
 This pass looks for instructions that require the processor to be in a
@@ -836,17 +875,12 @@
 
 @itemize @bullet
 @item
-Register move optimizations. This pass makes some simple RTL code
-transformations which improve the subsequent register allocation. The
-source file is @file{regmove.c}.
-
-@item
 The integrated register allocator (@acronym{IRA}). It is called
 integrated because coalescing, register live range splitting, and hard
 register preferencing are done on-the-fly during coloring. It also
-has better integration with the reload pass. Pseudo-registers spilled
-by the allocator or the reload have still a chance to get
-hard-registers if the reload evicts some pseudo-registers from
+has better integration with the reload/LRA pass. Pseudo-registers spilled
+by the allocator or the reload/LRA have still a chance to get
+hard-registers if the reload/LRA evicts some pseudo-registers from
 hard-registers. The allocator helps to choose better pseudos for
 spilling based on their live ranges and to coalesce stack slots
 allocated for the spilled pseudo-registers. IRA is a regional
@@ -877,6 +911,23 @@
 
 Source files are @file{reload.c} and @file{reload1.c}, plus the header
 @file{reload.h} used for communication between them.
+
+@cindex Local Register Allocator (LRA)
+@item
+This pass is a modern replacement of the reload pass. Source files
+are @file{lra.c}, @file{lra-assign.c}, @file{lra-coalesce.c},
+@file{lra-constraints.c}, @file{lra-eliminations.c},
+@file{lra-lives.c}, @file{lra-remat.c}, @file{lra-spills.c}, the
+header @file{lra-int.h} used for communication between them, and the
+header @file{lra.h} used for communication between LRA and the rest of
+compiler.
+
+Unlike the reload pass, intermediate LRA decisions are reflected in
+RTL as much as possible. This reduces the number of target-dependent
+macros and hooks, leaving instruction constraints as the primary
+source of control.
+
+LRA is run on targets for which TARGET_LRA_P returns true.
 @end itemize
 
 @item Basic block reordering
@@ -924,10 +975,7 @@
 are @file{final.c} plus @file{insn-output.c}; the latter is generated
 automatically from the machine description by the tool @file{genoutput}.
 The header file @file{conditions.h} is used for communication between
-these files. If mudflap is enabled, the queue of deferred declarations
-and any addressed constants (e.g., string literals) is processed by
-@code{mudflap_finish_file} into a synthetic constructor function
-containing calls into the mudflap runtime.
+these files.
 
 @item Debugging information output
 
@@ -940,3 +988,7 @@
 format.
 
 @end itemize
+
+@node Optimization info
+@section Optimization info
+@include optinfo.texi
```