diff gcc/doc/passes.texi @ 111:04ced10e8804

gcc 7
author kono
date Fri, 27 Oct 2017 22:46:09 +0900
parents f6334be47118
children 84e7813d76e9
--- a/gcc/doc/passes.texi	Sun Aug 21 07:07:55 2011 +0900
+++ b/gcc/doc/passes.texi	Fri Oct 27 22:46:09 2017 +0900
@@ -1,8 +1,6 @@
-@c markers: CROSSREF BUG TODO
+@c markers: BUG TODO
 
-@c Copyright (C) 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
-@c 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
-@c Free Software Foundation, Inc.
+@c Copyright (C) 1988-2017 Free Software Foundation, Inc.
 @c This is part of the GCC manual.
 @c For copying conditions, see the file gcc.texi.
 
@@ -11,6 +9,7 @@
 @cindex passes and files of the compiler
 @cindex files and passes of the compiler
 @cindex compiler passes and files
+@cindex pass dumps
 
 This chapter is dedicated to giving an overview of the optimization and
 code generation passes of the compiler.  In the process, it describes
@@ -19,10 +18,12 @@
 
 @menu
 * Parsing pass::         The language front end turns text into bits.
+* Cilk Plus Transformation:: Transform Cilk Plus Code to equivalent C/C++.
 * Gimplification pass::  The bits are turned into something we can optimize.
 * Pass manager::         Sequencing the optimization passes.
 * Tree SSA passes::      Optimizations on a high-level representation.
 * RTL passes::           Optimizations on a low-level representation.
+* Optimization info::    Dumping optimization information from passes.
 @end menu
 
 @node Parsing pass
@@ -32,7 +33,7 @@
 The language front end is invoked only once, via
 @code{lang_hooks.parse_file}, to parse the entire input.  The language
 front end may use any intermediate language representation deemed
-appropriate.  The C front end uses GENERIC trees (CROSSREF), plus
+appropriate.  The C front end uses GENERIC trees (@pxref{GENERIC}), plus
 a double handful of language specific tree codes defined in
 @file{c-common.def}.  The Fortran front end uses a completely different
 private representation.
@@ -46,10 +47,9 @@
 At some point the front end must translate the representation used in the
 front end to a representation understood by the language-independent
 portions of the compiler.  Current practice takes one of two forms.
-The C front end manually invokes the gimplifier (CROSSREF) on each function,
+The C front end manually invokes the gimplifier (@pxref{GIMPLE}) on each function,
 and uses the gimplifier callbacks to convert the language-specific tree
-nodes directly to GIMPLE (CROSSREF) before passing the function off to
-be compiled.
+nodes directly to GIMPLE before passing the function off to be compiled.
 The Fortran front end converts from a private representation to GENERIC,
 which is later lowered to GIMPLE when the function is compiled.  Which
 route to choose probably depends on how well GENERIC (plus extensions)
@@ -104,6 +104,68 @@
 The middle-end will, at its option, emit the function and data
 definitions immediately or queue them for later processing.
 
+@node Cilk Plus Transformation
+@section Cilk Plus Transformation
+@cindex CILK_PLUS
+
+If Cilk Plus generation (flag @option{-fcilkplus}) is enabled, all the Cilk 
+Plus code is transformed into equivalent C and C++ functions.  The majority of
+this transformation occurs toward the end of parsing, right before the
+gimplification pass.
+
+These are the major components to the Cilk Plus language extension:
+@itemize @bullet
+@item Array Notations:
+During the parsing phase, all array-notation-specific information is stored in
+an @code{ARRAY_NOTATION_REF} tree using the function
+@code{c_parser_array_notation}.  At the end of parsing, we check the entire
+function for array-notation-specific code (using the function
+@code{contains_array_notation_expr}).  If this function returns true,
+then we expand that code using either @code{expand_array_notation_exprs} or
+@code{build_array_notation_expr}.  For the cases where array notations are 
+inside conditions, they are transformed using the function 
+@code{fix_conditional_array_notations}.  The C language-specific routines are 
+located in @file{c/c-array-notation.c} and the equivalent C++ routines are in 
+the file @file{cp/cp-array-notation.c}.  Common routines such as functions to 
+initialize built-in functions are stored in @file{array-notation-common.c}.
+
+@item Cilk keywords:
+@itemize @bullet 
+@item @code{_Cilk_spawn}:
+The @code{_Cilk_spawn} keyword is parsed and the function it contains is marked 
+as a spawning function.  The spawning function is called the spawner.  At
+the end of the parsing phase, the appropriate built-in functions defined
+in the Cilk runtime are added to the spawner.  The appropriate
+locations of these functions and the internal structures are detailed in
+@code{cilk_init_builtins} in the file @file{cilk-common.c}.  The pointers to 
+Cilk functions and fields of internal structures are described 
+in @file{cilk.h}.  The built-in functions are described in 
+@file{cilk-builtins.def}.
+
+During gimplification, a new ``spawn helper'' function is created.
+In the spawner, the call to the spawned function is replaced with a call to
+the spawn helper, and the spawned function call is moved into the spawn helper.
+The main function that does these transformations is @code{gimplify_cilk_spawn}
+in @file{c-family/cilk.c}.  In the spawn helper, the gimplification function
+@code{gimplify_call_expr} inserts a call to @code{__cilkrts_detach}.
+This function is expanded by @code{builtin_expand_cilk_detach} located in
+@file{c-family/cilk.c}.
+
+@item @code{_Cilk_sync}:
+@code{_Cilk_sync} is parsed like a keyword.  During gimplification,
+the function @code{gimplify_cilk_sync} in @file{c-family/cilk.c} replaces
+this keyword with a set of calls to functions defined in the Cilk runtime.
+One of the internal functions inserted during gimplification,
+@code{__cilkrts_pop_frame}, must be expanded by the compiler; this is
+done by @code{builtin_expand_cilk_pop_frame} in @file{cilk-common.c}.
+
+@end itemize
+@end itemize
+
+Documentation about Cilk Plus and its language specification is provided under the
+"Learn" section in @w{@uref{https://www.cilkplus.org}}.  It is worth mentioning
+that the current implementation follows ABI 1.1.
+
 @node Gimplification pass
 @section Gimplification pass
 
@@ -111,11 +173,10 @@
 @cindex GIMPLE
 @dfn{Gimplification} is a whimsical term for the process of converting
 the intermediate representation of a function into the GIMPLE language
-(CROSSREF).  The term stuck, and so words like ``gimplification'',
+(@pxref{GIMPLE}).  The term stuck, and so words like ``gimplification'',
 ``gimplify'', ``gimplifier'' and the like are sprinkled throughout this
 section of code.
 
-@cindex GENERIC
 While a front end may certainly choose to generate GIMPLE directly if
 it chooses, this can be a moderately complex process unless the
 intermediate language used by the front end is already fairly simple.
@@ -149,6 +210,7 @@
 
 The pass manager is located in @file{passes.c}, @file{tree-optimize.c}
 and @file{tree-pass.h}.
+It processes passes as described in @file{passes.def}.
 Its job is to run all of the individual passes in the correct order,
 and take care of standard bookkeeping that applies to every pass.
 
@@ -198,20 +260,6 @@
 rid of it.  This pass is located in @file{tree-cfg.c} and described by
 @code{pass_remove_useless_stmts}.
 
-@item Mudflap declaration registration
-
-If mudflap (@pxref{Optimize Options,,-fmudflap -fmudflapth
--fmudflapir,gcc,Using the GNU Compiler Collection (GCC)}) is
-enabled, we generate code to register some variable declarations with
-the mudflap runtime.  Specifically, the runtime tracks the lifetimes of
-those variable declarations that have their addresses taken, or whose
-bounds are unknown at compile time (@code{extern}).  This pass generates
-new exception handling constructs (@code{try}/@code{finally}), and so
-must run before those are lowered.  In addition, the pass enqueues
-declarations of static variables whose lifetimes extend to the entire
-program.  The pass is located in @file{tree-mudflap.c} and is described
-by @code{pass_mudflap_1}.
-
 @item OpenMP lowering
 
 If OpenMP generation (@option{-fopenmp}) is enabled, this pass lowers
@@ -343,11 +391,19 @@
 
 @item Profiling
 
-This pass rewrites the function in order to collect runtime block
+This pass instruments the function in order to collect runtime block
 and value profiling data.  Such data may be fed back into the compiler
 on a subsequent run so as to allow optimization based on expected
-execution frequencies.  The pass is located in @file{predict.c} and
-is described by @code{pass_profile}.
+execution frequencies.  The pass is located in @file{tree-profile.c} and
+is described by @code{pass_ipa_tree_profile}.
+
+@item Static profile estimation
+
+This pass implements a series of heuristics to guess the probabilities
+of branches.  The resulting predictions are turned into an edge profile
+by propagating branches across the control flow graphs.
+The pass is located in @file{tree-profile.c} and is described by
+@code{pass_profile}.
 
 @item Lower complex arithmetic
 
@@ -395,7 +451,7 @@
 @item Full redundancy elimination
 
 This is a simpler form of PRE that only eliminates redundancies that
-occur an all paths.  It is located in @file{tree-ssa-pre.c} and
+occur on all paths.  It is located in @file{tree-ssa-pre.c} and
 described by @code{pass_fre}.
 
 @item Loop optimization
@@ -426,10 +482,13 @@
 Loop unswitching.  This pass moves the conditional jumps that are invariant
 out of the loops.  To achieve this, a duplicate of the loop is created for
 each possible outcome of conditional jump(s).  The pass is implemented in
-@file{tree-ssa-loop-unswitch.c}.  This pass should eventually replace the
-RTL level loop unswitching in @file{loop-unswitch.c}, but currently
-the RTL level pass is not completely redundant yet due to deficiencies
-in tree level alias analysis.
+@file{tree-ssa-loop-unswitch.c}.
+
+Loop splitting.  If a loop contains a conditional statement that is
+always true for one part of the iteration space and false for the other,
+this pass splits the loop into two, one dealing with one side and the other
+dealing only with the other, thereby removing one inner-loop conditional.  The
+pass is implemented in @file{tree-ssa-loop-split.c}.
 
 The optimizations also use various utility functions contained in
 @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
@@ -437,23 +496,23 @@
 
 Vectorization.  This pass transforms loops to operate on vector types
 instead of scalar types.  Data parallelism across loop iterations is exploited
-to group data elements from consecutive iterations into a vector and operate 
-on them in parallel.  Depending on available target support the loop is 
+to group data elements from consecutive iterations into a vector and operate
+on them in parallel.  Depending on available target support the loop is
 conceptually unrolled by a factor @code{VF} (vectorization factor), which is
-the number of elements operated upon in parallel in each iteration, and the 
+the number of elements operated upon in parallel in each iteration, and the
 @code{VF} copies of each scalar operation are fused to form a vector operation.
 Additional loop transformations such as peeling and versioning may take place
-to align the number of iterations, and to align the memory accesses in the 
+to align the number of iterations, and to align the memory accesses in the
 loop.
 The pass is implemented in @file{tree-vectorizer.c} (the main driver),
-@file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts 
-and general loop utilities), @file{tree-vect-slp} (loop-aware SLP 
+@file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts
+and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
 functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
 Analysis of data references is in @file{tree-data-ref.c}.
 
 SLP Vectorization.  This pass performs vectorization of straight-line code. The
 pass is implemented in @file{tree-vectorizer.c} (the main driver),
-@file{tree-vect-slp.c}, @file{tree-vect-stmts.c} and 
+@file{tree-vect-slp.c}, @file{tree-vect-stmts.c} and
 @file{tree-vect-data-refs.c}.
 
 Autoparallelization.  This pass splits the loop iteration space to run
@@ -472,7 +531,7 @@
 We identify if convertible loops, if-convert statements and merge
 basic blocks in one big block.  The idea is to present loop in such
 form so that vectorizer can have one to one mapping between statements
-and available vector operations.  This pass is located in 
+and available vector operations.  This pass is located in
 @file{tree-if-conv.c} and is described by @code{pass_if_conversion}.
 
 @item Conditional constant propagation
@@ -549,18 +608,6 @@
 statement is not reachable.  It is located in @file{tree-cfg.c} and
 is described by @code{pass_warn_function_return}.
 
-@item Mudflap statement annotation
-
-If mudflap is enabled, we rewrite some memory accesses with code to
-validate that the memory access is correct.  In particular, expressions
-involving pointer dereferences (@code{INDIRECT_REF}, @code{ARRAY_REF},
-etc.) are replaced by code that checks the selected address range
-against the mudflap runtime's database of valid regions.  This check
-includes an inline lookup into a direct-mapped cache, based on
-shift/mask operations of the pointer value, with a fallback function
-call into the runtime.  The pass is located in @file{tree-mudflap.c} and
-is described by @code{pass_mudflap_2}.
-
 @item Leave static single assignment form
 
 This pass rewrites the function such that it is in normal form.  At
@@ -757,8 +804,8 @@
 generic loop analysis and manipulation code.  Initialization and finalization
 of loop structures is handled by @file{loop-init.c}.
 A loop invariant motion pass is implemented in @file{loop-invariant.c}.
-Basic block level optimizations---unrolling, peeling and unswitching loops---
-are implemented in @file{loop-unswitch.c} and @file{loop-unroll.c}.
+Basic block level optimizations---unrolling and peeling loops---
+are implemented in @file{loop-unroll.c}.
 Replacing of the exit condition of loops by special machine-dependent
 instructions is handled by @file{loop-doloop.c}.
 
@@ -773,7 +820,7 @@
 This pass attempts to replace conditional branches and surrounding
 assignments with arithmetic, boolean value producing comparison
 instructions, and conditional move instructions.  In the very last
-invocation after reload, it will generate predicated instructions
+invocation after reload/LRA, it will generate predicated instructions
 when supported by the target.  The code is located in @file{ifcvt.c}.
 
 @item Web construction
@@ -790,14 +837,6 @@
 result using algebra, and then attempts to match the result against
 the machine description.  The code is located in @file{combine.c}.
 
-@item Register movement
-
-This pass looks for cases where matching constraints would force an
-instruction to need a reload, and this reload would be a
-register-to-register move.  It then attempts to change the registers
-used by the instruction to avoid the move instruction.  The code is
-located in @file{regmove.c}.
-
 @item Mode switching optimization
 
 This pass looks for instructions that require the processor to be in a
@@ -836,17 +875,12 @@
 
 @itemize @bullet
 @item
-Register move optimizations.  This pass makes some simple RTL code
-transformations which improve the subsequent register allocation.  The
-source file is @file{regmove.c}.
-
-@item
 The integrated register allocator (@acronym{IRA}).  It is called
 integrated because coalescing, register live range splitting, and hard
 register preferencing are done on-the-fly during coloring.  It also
-has better integration with the reload pass.  Pseudo-registers spilled
-by the allocator or the reload have still a chance to get
-hard-registers if the reload evicts some pseudo-registers from
+has better integration with the reload/LRA pass.  Pseudo-registers spilled
+by the allocator or the reload/LRA still have a chance to get
+hard-registers if the reload/LRA evicts some pseudo-registers from
 hard-registers.  The allocator helps to choose better pseudos for
 spilling based on their live ranges and to coalesce stack slots
 allocated for the spilled pseudo-registers.  IRA is a regional
@@ -877,6 +911,23 @@
 
 Source files are @file{reload.c} and @file{reload1.c}, plus the header
 @file{reload.h} used for communication between them.
+
+@cindex Local Register Allocator (LRA)
+@item
+This pass is a modern replacement of the reload pass.  Source files
+are @file{lra.c}, @file{lra-assign.c}, @file{lra-coalesce.c},
+@file{lra-constraints.c}, @file{lra-eliminations.c},
+@file{lra-lives.c}, @file{lra-remat.c}, @file{lra-spills.c}, the
+header @file{lra-int.h} used for communication between them, and the
+header @file{lra.h} used for communication between LRA and the rest of
+the compiler.
+
+Unlike the reload pass, intermediate LRA decisions are reflected in
+RTL as much as possible.  This reduces the number of target-dependent
+macros and hooks, leaving instruction constraints as the primary
+source of control.
+
+LRA is run on targets for which @code{TARGET_LRA_P} returns true.
 @end itemize
 
 @item Basic block reordering
@@ -924,10 +975,7 @@
 are @file{final.c} plus @file{insn-output.c}; the latter is generated
 automatically from the machine description by the tool @file{genoutput}.
 The header file @file{conditions.h} is used for communication between
-these files.  If mudflap is enabled, the queue of deferred declarations
-and any addressed constants (e.g., string literals) is processed by
-@code{mudflap_finish_file} into a synthetic constructor function
-containing calls into the mudflap runtime.
+these files.
 
 @item Debugging information output
 
@@ -940,3 +988,7 @@
 format.
 
 @end itemize
+
+@node Optimization info
+@section Optimization info
+@include optinfo.texi
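
The loop splitting entry added in this change (implemented in tree-ssa-loop-split.c) describes splitting a loop whose condition is true for one part of the iteration space and false for the rest. A minimal hand-written C sketch of that idea follows. The function names are hypothetical and do not come from GCC, and the second function only shows by hand the shape the transformation aims for, not the code the pass actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: the condition i < split is true for the first part of
   the iteration space and false for the rest, yet it is tested on every
   iteration.  (Hypothetical example function, not GCC code.)  */
long
sum_unsplit (const int *a, const int *b, size_t n, size_t split)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (i < split)
        sum += a[i];
      else
        sum += b[i];
    }
  return sum;
}

/* Hand-split form: two loops, each covering one side of the condition,
   with no conditional left in either loop body.  */
long
sum_split (const int *a, const int *b, size_t n, size_t split)
{
  long sum = 0;
  /* Clamp the split point so the first loop never runs past n.  */
  size_t mid = split < n ? split : n;
  assert (mid <= n);

  size_t i = 0;
  for (; i < mid; i++)
    sum += a[i];
  for (; i < n; i++)
    sum += b[i];
  return sum;
}

/* Returns 1 when both forms agree for every split point from 0 to 7,
   which includes one split point past the end of the 6-element arrays.  */
int
split_forms_agree (void)
{
  const int a[6] = { 1, 2, 3, 4, 5, 6 };
  const int b[6] = { 10, 20, 30, 40, 50, 60 };
  for (size_t split = 0; split <= 7; split++)
    if (sum_unsplit (a, b, 6, split) != sum_split (a, b, 6, split))
      return 0;
  return 1;
}
```

Both functions compute the same sum for any split point; the split form simply moves the decision out of the loop bodies, which is the "removing one inner-loop conditional" effect the text describes.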