comparison gcc/doc/md.texi @ 131:84e7813d76e9
gcc-8.2
author | mir3636 |
---|---|
date | Thu, 25 Oct 2018 07:37:49 +0900 |
parents | 04ced10e8804 |
children | 1830386684a0 |
111:04ced10e8804 | 131:84e7813d76e9 |
---|---|
1 @c Copyright (C) 1988-2017 Free Software Foundation, Inc. | 1 @c Copyright (C) 1988-2018 Free Software Foundation, Inc. |
2 @c This is part of the GCC manual. | 2 @c This is part of the GCC manual. |
3 @c For copying conditions, see the file gcc.texi. | 3 @c For copying conditions, see the file gcc.texi. |
4 | 4 |
5 @ifset INTERNALS | 5 @ifset INTERNALS |
6 @node Machine Desc | 6 @node Machine Desc |
113 | 113 |
114 A @code{define_insn} is an RTL expression containing four or five operands: | 114 A @code{define_insn} is an RTL expression containing four or five operands: |
115 | 115 |
116 @enumerate | 116 @enumerate |
117 @item | 117 @item |
118 An optional name. The presence of a name indicates that this instruction | 118 An optional name @var{n}. When a name is present, the compiler |
119 pattern can perform a certain standard job for the RTL-generation | 119 automatically generates a C++ function @samp{gen_@var{n}} that takes |
120 pass of the compiler. This pass knows certain names and will use | 120 the operands of the instruction as arguments and returns the instruction's |
121 the instruction patterns with those names, if the names are defined | 121 rtx pattern. The compiler also assigns the instruction a unique code |
122 in the machine description. | 122 @samp{CODE_FOR_@var{n}}, with all such codes belonging to an enum |
123 called @code{insn_code}. | |
124 | |
125 These names serve one of two purposes. The first is to indicate that the | |
126 instruction performs a certain standard job for the RTL-generation | |
127 pass of the compiler, such as a move, an addition, or a conditional | |
128 jump. The second is to help the target generate certain target-specific | |
129 operations, such as when implementing target-specific intrinsic functions. | |
130 | |
131 It is better to prefix target-specific names with the name of the | |
132 target, to avoid any clash with current or future standard names. | |
123 | 133 |
124 The absence of a name is indicated by writing an empty string | 134 The absence of a name is indicated by writing an empty string |
125 where the name should go. Nameless instruction patterns are never | 135 where the name should go. Nameless instruction patterns are never |
126 used for generating RTL code, but they may permit several simpler insns | 136 used for generating RTL code, but they may permit several simpler insns |
127 to be combined later on. | 137 to be combined later on. |
128 | |
129 Names that are not thus known and used in RTL-generation have no | |
130 effect; they are equivalent to no name at all. | |
131 | 138 |
132 For the purpose of debugging the compiler, you may also specify a | 139 For the purpose of debugging the compiler, you may also specify a |
133 name beginning with the @samp{*} character. Such a name is used only | 140 name beginning with the @samp{*} character. Such a name is used only |
134 for identifying the instruction in RTL dumps; it is equivalent to having | 141 for identifying the instruction in RTL dumps; it is equivalent to having |
135 a nameless pattern for all other purposes. Names beginning with the | 142 a nameless pattern for all other purposes. Names beginning with the |
136 @samp{*} character are not required to be unique. | 143 @samp{*} character are not required to be unique. |
144 | |
145 The name may also have the form @samp{@@@var{n}}. This has the same | |
146 effect as a name @samp{@var{n}}, but in addition tells the compiler to | |
147 generate further helper functions; see @ref{Parameterized Names} for details. | |
137 | 148 |
138 @item | 149 @item |
139 The @dfn{RTL template}: This is a vector of incomplete RTL expressions | 150 The @dfn{RTL template}: This is a vector of incomplete RTL expressions |
140 which describe the semantics of the instruction (@pxref{RTL Template}). | 151 which describe the semantics of the instruction (@pxref{RTL Template}). |
141 It is incomplete because it may contain @code{match_operand}, | 152 It is incomplete because it may contain @code{match_operand}, |
865 @end defun | 876 @end defun |
866 | 877 |
867 @defun push_operand | 878 @defun push_operand |
868 This predicate allows a memory reference suitable for pushing a value | 879 This predicate allows a memory reference suitable for pushing a value |
869 onto the stack. This will be a @code{MEM} which refers to | 880 onto the stack. This will be a @code{MEM} which refers to |
870 @code{stack_pointer_rtx}, with a side-effect in its address expression | 881 @code{stack_pointer_rtx}, with a side effect in its address expression |
871 (@pxref{Incdec}); which one is determined by the | 882 (@pxref{Incdec}); which one is determined by the |
872 @code{STACK_PUSH_CODE} macro (@pxref{Frame Layout}). | 883 @code{STACK_PUSH_CODE} macro (@pxref{Frame Layout}). |
873 @end defun | 884 @end defun |
874 | 885 |
875 @defun pop_operand | 886 @defun pop_operand |
876 This predicate allows a memory reference suitable for popping a value | 887 This predicate allows a memory reference suitable for popping a value |
877 off the stack. Again, this will be a @code{MEM} referring to | 888 off the stack. Again, this will be a @code{MEM} referring to |
878 @code{stack_pointer_rtx}, with a side-effect in its address | 889 @code{stack_pointer_rtx}, with a side effect in its address |
879 expression. However, this time @code{STACK_POP_CODE} is expected. | 890 expression. However, this time @code{STACK_POP_CODE} is expected. |
880 @end defun | 891 @end defun |
881 | 892 |
882 @noindent | 893 @noindent |
883 The fourth category of predicates allows some combination of the above | 894 The fourth category of predicates allows some combination of the above |
1089 operand can be a memory reference, and which kinds of address; whether the | 1100 operand can be a memory reference, and which kinds of address; whether the |
1090 operand may be an immediate constant, and which possible values it may | 1101 operand may be an immediate constant, and which possible values it may |
1091 have. Constraints can also require two operands to match. | 1102 have. Constraints can also require two operands to match. |
1092 Side-effects aren't allowed in operands of inline @code{asm}, unless | 1103 Side-effects aren't allowed in operands of inline @code{asm}, unless |
1093 @samp{<} or @samp{>} constraints are used, because there is no guarantee | 1104 @samp{<} or @samp{>} constraints are used, because there is no guarantee |
1094 that the side-effects will happen exactly once in an instruction that can update | 1105 that the side effects will happen exactly once in an instruction that can update |
1095 the addressing register. | 1106 the addressing register. |
1096 | 1107 |
1097 @ifset INTERNALS | 1108 @ifset INTERNALS |
1098 @menu | 1109 @menu |
1099 * Simple Constraints:: Basic use of constraints. | 1110 * Simple Constraints:: Basic use of constraints. |
1170 @cindex @samp{<} in constraint | 1181 @cindex @samp{<} in constraint |
1171 @item @samp{<} | 1182 @item @samp{<} |
1172 A memory operand with autodecrement addressing (either predecrement or | 1183 A memory operand with autodecrement addressing (either predecrement or |
1173 postdecrement) is allowed. In inline @code{asm} this constraint is only | 1184 postdecrement) is allowed. In inline @code{asm} this constraint is only |
1174 allowed if the operand is used exactly once in an instruction that can | 1185 allowed if the operand is used exactly once in an instruction that can |
1175 handle the side-effects. Not using an operand with @samp{<} in constraint | 1186 handle the side effects. Not using an operand with @samp{<} in constraint |
1176 string in the inline @code{asm} pattern at all or using it in multiple | 1187 string in the inline @code{asm} pattern at all or using it in multiple |
1177 instructions isn't valid, because the side-effects wouldn't be performed | 1188 instructions isn't valid, because the side effects wouldn't be performed |
1178 or would be performed more than once. Furthermore, on some targets | 1189 or would be performed more than once. Furthermore, on some targets |
1179 the operand with @samp{<} in constraint string must be accompanied by | 1190 the operand with @samp{<} in constraint string must be accompanied by |
1180 special instruction suffixes like @code{%U0} instruction suffix on PowerPC | 1191 special instruction suffixes like @code{%U0} instruction suffix on PowerPC |
1181 or @code{%P0} on IA-64. | 1192 or @code{%P0} on IA-64. |
1182 | 1193 |
1733 @table @code | 1744 @table @code |
1734 @item k | 1745 @item k |
1735 The stack pointer register (@code{SP}) | 1746 The stack pointer register (@code{SP}) |
1736 | 1747 |
1737 @item w | 1748 @item w |
1738 Floating point or SIMD vector register | 1749 Floating point register, Advanced SIMD vector register or SVE vector register |
1750 | |
1751 @item Upl | |
1752 One of the low eight SVE predicate registers (@code{P0} to @code{P7}) | |
1753 | |
1754 @item Upa | |
1755 Any of the SVE predicate registers (@code{P0} to @code{P15}) | |
1739 | 1756 |
1740 @item I | 1757 @item I |
1741 Integer constant that is valid as an immediate operand in an @code{ADD} | 1758 Integer constant that is valid as an immediate operand in an @code{ADD} |
1742 instruction | 1759 instruction |
1743 | 1760 |
2112 Check for 64 bits wide constants for add/sub instructions | 2129 Check for 64 bits wide constants for add/sub instructions |
2113 | 2130 |
2114 @item G | 2131 @item G |
2115 Floating point constant that is legal for store immediate | 2132 Floating point constant that is legal for store immediate |
2116 @end table | 2133 @end table |
2134 | |
2135 @item C-SKY---@file{config/csky/constraints.md} | |
2136 @table @code | |
2137 | |
2138 @item a | |
2139 The mini registers r0 - r7. | |
2140 | |
2141 @item b | |
2142 The low registers r0 - r15. | |
2143 | |
2144 @item c | |
2145 C register. | |
2146 | |
2147 @item y | |
2148 HI and LO registers. | |
2149 | |
2150 @item l | |
2151 LO register. | |
2152 | |
2153 @item h | |
2154 HI register. | |
2155 | |
2156 @item v | |
2157 Vector registers. | |
2158 | |
2159 @item z | |
2160 Stack pointer register (SP). | |
2161 @end table | |
2162 | |
2163 @ifset INTERNALS | |
2164 The C-SKY back end supports a large set of additional constraints | |
2165 that are only useful for instruction selection or splitting rather | |
2166 than inline asm, such as constraints representing constant integer | |
2167 ranges accepted by particular instruction encodings. | |
2168 Refer to the source code for details. | |
2169 @end ifset | |
2117 | 2170 |
2118 @item Epiphany---@file{config/epiphany/constraints.md} | 2171 @item Epiphany---@file{config/epiphany/constraints.md} |
2119 @table @code | 2172 @table @code |
2120 @item U16 | 2173 @item U16 |
2121 An unsigned 16-bit constant. | 2174 An unsigned 16-bit constant. |
2958 | 3011 |
2959 @item d | 3012 @item d |
2960 Odd numbered general registers (R1, R3, R5). These are used for | 3013 Odd numbered general registers (R1, R3, R5). These are used for |
2961 16-bit multiply operations. | 3014 16-bit multiply operations. |
2962 | 3015 |
3016 @item D | |
3017 A memory reference that is encoded within the opcode, but not | |
3018 auto-increment or auto-decrement. | |
3019 | |
2963 @item f | 3020 @item f |
2964 Any of the floating point registers (AC0 through AC5). | 3021 Any of the floating point registers (AC0 through AC5). |
2965 | 3022 |
2966 @item G | 3023 @item G |
2967 Floating point constant 0. | 3024 Floating point constant 0. |
3025 | |
3026 @item h | |
3027 Floating point registers AC4 and AC5. These cannot be loaded from/to | |
3028 memory with a single instruction. | |
2968 | 3029 |
2969 @item I | 3030 @item I |
2970 An integer constant that fits in 16 bits. | 3031 An integer constant that fits in 16 bits. |
2971 | 3032 |
2972 @item J | 3033 @item J |
2984 | 3045 |
2985 @item N | 3046 @item N |
2986 The integer constant 0. | 3047 The integer constant 0. |
2987 | 3048 |
2988 @item O | 3049 @item O |
2989 Integer constants @minus{}4 through @minus{}1 and 1 through 4; shifts by these | 3050 Integer constants 0 through 3; shifts by these |
2990 amounts are handled as multiple single-bit shifts rather than a single | 3051 amounts are handled as multiple single-bit shifts rather than a single |
2991 variable-length shift. | 3052 variable-length shift. |
2992 | 3053 |
2993 @item Q | 3054 @item Q |
2994 A memory reference which requires an additional word (address or | 3055 A memory reference which requires an additional word (address or |
4181 VSIB address operand. | 4242 VSIB address operand. |
4182 | 4243 |
4183 @item Ts | 4244 @item Ts |
4184 Address operand without segment register. | 4245 Address operand without segment register. |
4185 | 4246 |
4186 @item Ti | |
4187 MPX address operand without index. | |
4188 | |
4189 @item Tb | |
4190 MPX address operand without base. | |
4191 | |
4192 @end table | 4247 @end table |
4193 | 4248 |
4194 @item Xstormy16---@file{config/stormy16/stormy16.h} | 4249 @item Xstormy16---@file{config/stormy16/stormy16.h} |
4195 @table @code | 4250 @table @code |
4196 @item a | 4251 @item a |
4847 instruction for some mode @var{n}, it also supports unaligned | 4902 instruction for some mode @var{n}, it also supports unaligned |
4848 loads for vectors of mode @var{n}. | 4903 loads for vectors of mode @var{n}. |
4849 | 4904 |
4850 This pattern is not allowed to @code{FAIL}. | 4905 This pattern is not allowed to @code{FAIL}. |
4851 | 4906 |
4907 @cindex @code{vec_mask_load_lanes@var{m}@var{n}} instruction pattern | |
4908 @item @samp{vec_mask_load_lanes@var{m}@var{n}} | |
4909 Like @samp{vec_load_lanes@var{m}@var{n}}, but takes an additional | |
4910 mask operand (operand 2) that specifies which elements of the destination | |
4911 vectors should be loaded. Other elements of the destination | |
4912 vectors are set to zero. The operation is equivalent to: | |
4913 | |
4914 @smallexample | |
4915 int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n}); | |
4916 for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++) | |
4917 if (operand2[j]) | |
4918 for (i = 0; i < c; i++) | |
4919 operand0[i][j] = operand1[j * c + i]; | |
4920 else | |
4921 for (i = 0; i < c; i++) | |
4922 operand0[i][j] = 0; | |
4923 @end smallexample | |
4924 | |
4925 This pattern is not allowed to @code{FAIL}. | |
4926 | |
4852 @cindex @code{vec_store_lanes@var{m}@var{n}} instruction pattern | 4927 @cindex @code{vec_store_lanes@var{m}@var{n}} instruction pattern |
4853 @item @samp{vec_store_lanes@var{m}@var{n}} | 4928 @item @samp{vec_store_lanes@var{m}@var{n}} |
4854 Equivalent to @samp{vec_load_lanes@var{m}@var{n}}, with the memory | 4929 Equivalent to @samp{vec_load_lanes@var{m}@var{n}}, with the memory |
4855 and register operands reversed. That is, the instruction is | 4930 and register operands reversed. That is, the instruction is |
4856 equivalent to: | 4931 equivalent to: |
4863 @end smallexample | 4938 @end smallexample |
4864 | 4939 |
4865 for a memory operand 0 and register operand 1. | 4940 for a memory operand 0 and register operand 1. |
4866 | 4941 |
4867 This pattern is not allowed to @code{FAIL}. | 4942 This pattern is not allowed to @code{FAIL}. |
4943 | |
4944 @cindex @code{vec_mask_store_lanes@var{m}@var{n}} instruction pattern | |
4945 @item @samp{vec_mask_store_lanes@var{m}@var{n}} | |
4946 Like @samp{vec_store_lanes@var{m}@var{n}}, but takes an additional | |
4947 mask operand (operand 2) that specifies which elements of the source | |
4948 vectors should be stored. The operation is equivalent to: | |
4949 | |
4950 @smallexample | |
4951 int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n}); | |
4952 for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++) | |
4953 if (operand2[j]) | |
4954 for (i = 0; i < c; i++) | |
4955 operand0[j * c + i] = operand1[i][j]; | |
4956 @end smallexample | |
4957 | |
4958 This pattern is not allowed to @code{FAIL}. | |
4959 | |
4960 @cindex @code{gather_load@var{m}} instruction pattern | |
4961 @item @samp{gather_load@var{m}} | |
4962 Load several separate memory locations into a vector of mode @var{m}. | |
4963 Operand 1 is a scalar base address and operand 2 is a vector of | |
4964 offsets from that base. Operand 0 is a destination vector with the | |
4965 same number of elements as the offset. For each element index @var{i}: | |
4966 | |
4967 @itemize @bullet | |
4968 @item | |
4969 extend the offset element @var{i} to address width, using zero | |
4970 extension if operand 3 is 1 and sign extension if operand 3 is zero; | |
4971 @item | |
4972 multiply the extended offset by operand 4; | |
4973 @item | |
4974 add the result to the base; and | |
4975 @item | |
4976 load the value at that address into element @var{i} of operand 0. | |
4977 @end itemize | |
4978 | |
4979 The value of operand 3 does not matter if the offsets are already | |
4980 address width. | |
4981 | |
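The per-element steps listed for @samp{gather_load@var{m}} can be sketched as a scalar C loop. This is only an illustrative model, assuming 32-bit elements and 32-bit offsets; the function name `model_gather_load_i32` and its parameter names are hypothetical and are not part of GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of gather_load for 32-bit elements.
   dest        ~ operand 0 (destination vector)
   base        ~ operand 1 (scalar base address)
   offsets     ~ operand 2 (vector of offsets)
   zero_extend ~ operand 3 (1 = zero extension, 0 = sign extension)
   scale       ~ operand 4 (multiplier applied to each offset)  */
void
model_gather_load_i32 (int32_t *dest, const unsigned char *base,
                       const uint32_t *offsets, int zero_extend,
                       ptrdiff_t scale, size_t nunits)
{
  assert (nunits == 0 || (dest != NULL && base != NULL && offsets != NULL));
  for (size_t i = 0; i < nunits; i++)
    {
      /* Extend offset element i to address width.  */
      ptrdiff_t ext = zero_extend
                      ? (ptrdiff_t) offsets[i]
                      : (ptrdiff_t) (int32_t) offsets[i];
      /* Scale the offset, add it to the base and load element i.  */
      memcpy (&dest[i], base + ext * scale, sizeof dest[i]);
    }
}
```

With sign extension, an offset element whose bits are all ones addresses the element just before the base, which is why the extension flag matters whenever the offsets are narrower than an address.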
4982 @cindex @code{mask_gather_load@var{m}} instruction pattern | |
4983 @item @samp{mask_gather_load@var{m}} | |
4984 Like @samp{gather_load@var{m}}, but takes an extra mask operand as | |
4985 operand 5. Bit @var{i} of the mask is set if element @var{i} | |
4986 of the result should be loaded from memory and clear if element @var{i} | |
4987 of the result should be set to zero. | |
4988 | |
4989 @cindex @code{scatter_store@var{m}} instruction pattern | |
4990 @item @samp{scatter_store@var{m}} | |
4991 Store a vector of mode @var{m} into several distinct memory locations. | |
4992 Operand 0 is a scalar base address and operand 1 is a vector of offsets | |
4993 from that base. Operand 4 is the vector of values that should be stored, | |
4994 which has the same number of elements as the offset. For each element | |
4995 index @var{i}: | |
4996 | |
4997 @itemize @bullet | |
4998 @item | |
4999 extend the offset element @var{i} to address width, using zero | |
5000 extension if operand 2 is 1 and sign extension if operand 2 is zero; | |
5001 @item | |
5002 multiply the extended offset by operand 3; | |
5003 @item | |
5004 add the result to the base; and | |
5005 @item | |
5006 store element @var{i} of operand 4 to that address. | |
5007 @end itemize | |
5008 | |
5009 The value of operand 2 does not matter if the offsets are already | |
5010 address width. | |
5011 | |
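The @samp{scatter_store@var{m}} steps can be modelled in the same way. The sketch below assumes 32-bit elements and 32-bit offsets; `model_scatter_store_i32` is a hypothetical name chosen for illustration, and the parameter order simply mirrors the operand numbering in the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of scatter_store for 32-bit elements.
   base        ~ operand 0 (scalar base address)
   offsets     ~ operand 1 (vector of offsets)
   zero_extend ~ operand 2 (1 = zero extension, 0 = sign extension)
   scale       ~ operand 3 (multiplier applied to each offset)
   values      ~ operand 4 (vector of values to store)  */
void
model_scatter_store_i32 (unsigned char *base, const uint32_t *offsets,
                         int zero_extend, ptrdiff_t scale,
                         const int32_t *values, size_t nunits)
{
  assert (nunits == 0 || (base != NULL && offsets != NULL && values != NULL));
  for (size_t i = 0; i < nunits; i++)
    {
      /* Extend offset element i to address width.  */
      ptrdiff_t ext = zero_extend
                      ? (ptrdiff_t) offsets[i]
                      : (ptrdiff_t) (int32_t) offsets[i];
      /* Scale the offset, add it to the base and store element i.  */
      memcpy (base + ext * scale, &values[i], sizeof values[i]);
    }
}
```

Locations that no offset refers to are left untouched by the model.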
5012 @cindex @code{mask_scatter_store@var{m}} instruction pattern | |
5013 @item @samp{mask_scatter_store@var{m}} | |
5014 Like @samp{scatter_store@var{m}}, but takes an extra mask operand as | |
5015 operand 5. Bit @var{i} of the mask is set if element @var{i} | |
5016 of the result should be stored to memory. | |
4868 | 5017 |
4869 @cindex @code{vec_set@var{m}} instruction pattern | 5018 @cindex @code{vec_set@var{m}} instruction pattern |
4870 @item @samp{vec_set@var{m}} | 5019 @item @samp{vec_set@var{m}} |
4871 Set given field in the vector value. Operand 0 is the vector to modify, | 5020 Set given field in the vector value. Operand 0 is the vector to modify, |
4872 operand 1 is the new value of the field and operand 2 specifies the field index. | 5021 operand 1 is the new value of the field and operand 2 specifies the field index. |
4885 Initialize the vector to given values. Operand 0 is the vector to initialize | 5034 Initialize the vector to given values. Operand 0 is the vector to initialize |
4886 and operand 1 is parallel containing values for individual fields. The | 5035 and operand 1 is parallel containing values for individual fields. The |
4887 @var{n} mode is the mode of the elements, should be either element mode of | 5036 @var{n} mode is the mode of the elements, should be either element mode of |
4888 the vector mode @var{m}, or a vector mode with the same element mode and | 5037 the vector mode @var{m}, or a vector mode with the same element mode and |
4889 smaller number of elements. | 5038 smaller number of elements. |
5039 | |
5040 @cindex @code{vec_duplicate@var{m}} instruction pattern | |
5041 @item @samp{vec_duplicate@var{m}} | |
5042 Initialize vector output operand 0 so that each element has the value given | |
5043 by scalar input operand 1. The vector has mode @var{m} and the scalar has | |
5044 the mode appropriate for one element of @var{m}. | |
5045 | |
5046 This pattern only handles duplicates of non-constant inputs. Constant | |
5047 vectors go through the @code{mov@var{m}} pattern instead. | |
5048 | |
5049 This pattern is not allowed to @code{FAIL}. | |
5050 | |
5051 @cindex @code{vec_series@var{m}} instruction pattern | |
5052 @item @samp{vec_series@var{m}} | |
5053 Initialize vector output operand 0 so that element @var{i} is equal to | |
5054 operand 1 plus @var{i} times operand 2. In other words, create a linear | |
5055 series whose base value is operand 1 and whose step is operand 2. | |
5056 | |
5057 The vector output has mode @var{m} and the scalar inputs have the mode | |
5058 appropriate for one element of @var{m}. This pattern is not used for | |
5059 floating-point vectors, in order to avoid having to specify the | |
5060 rounding behavior for @var{i} > 1. | |
5061 | |
5062 This pattern is not allowed to @code{FAIL}. | |
5063 | |
5064 @cindex @code{while_ult@var{m}@var{n}} instruction pattern | |
5065 @item @code{while_ult@var{m}@var{n}} | |
5066 Set operand 0 to a mask that is true while incrementing operand 1 | |
5067 gives a value that is less than operand 2. Operand 0 has mode @var{n} | |
5068 and operands 1 and 2 are scalar integers of mode @var{m}. | |
5069 The operation is equivalent to: | |
5070 | |
5071 @smallexample | |
5072 operand0[0] = operand1 < operand2; | |
5073 for (i = 1; i < GET_MODE_NUNITS (@var{n}); i++) | |
5074 operand0[i] = operand0[i - 1] && (operand1 + i < operand2); | |
5075 @end smallexample | |
4890 | 5076 |
4891 @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern | 5077 @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern |
4892 @item @samp{vec_cmp@var{m}@var{n}} | 5078 @item @samp{vec_cmp@var{m}@var{n}} |
4893 Output a vector comparison. Operand 0 of mode @var{n} is the destination for | 5079 Output a vector comparison. Operand 0 of mode @var{n} is the destination for |
4894 predicate in operand 1 which is a signed vector comparison with operands of | 5080 predicate in operand 1 which is a signed vector comparison with operands of |
4970 @samp{vec_perm} pattern for mode @var{m}, but there is for mode @var{q} | 5156 @samp{vec_perm} pattern for mode @var{m}, but there is for mode @var{q} |
4971 where @var{q} is a vector of @code{QImode} of the same width as @var{m}, | 5157 where @var{q} is a vector of @code{QImode} of the same width as @var{m}, |
4972 the middle-end will lower the mode @var{m} @code{VEC_PERM_EXPR} to | 5158 the middle-end will lower the mode @var{m} @code{VEC_PERM_EXPR} to |
4973 mode @var{q}. | 5159 mode @var{q}. |
4974 | 5160 |
4975 @cindex @code{vec_perm_const@var{m}} instruction pattern | 5161 See also @code{TARGET_VECTORIZER_VEC_PERM_CONST}, which performs |
4976 @item @samp{vec_perm_const@var{m}} | 5162 the analogous operation for constant selectors. |
4977 Like @samp{vec_perm} except that the permutation is a compile-time | |
4978 constant. That is, operand 3, the @dfn{selector}, is a @code{CONST_VECTOR}. | |
4979 | |
4980 Some targets cannot perform a permutation with a variable selector, | |
4981 but can efficiently perform a constant permutation. Further, the | |
4982 target hook @code{vec_perm_ok} is queried to determine if the | |
4983 specific constant permutation is available efficiently; the named | |
4984 pattern is never expanded without @code{vec_perm_ok} returning true. | |
4985 | |
4986 There is no need for a target to supply both @samp{vec_perm@var{m}} | |
4987 and @samp{vec_perm_const@var{m}} if the former can trivially implement | |
4988 the operation with, say, the vector constant loaded into a register. | |
4989 | 5163 |
4990 @cindex @code{push@var{m}1} instruction pattern | 5164 @cindex @code{push@var{m}1} instruction pattern |
4991 @item @samp{push@var{m}1} | 5165 @item @samp{push@var{m}1} |
4992 Output a push instruction. Operand 0 is value to push. Used only when | 5166 Output a push instruction. Operand 0 is value to push. Used only when |
4993 @code{PUSH_ROUNDING} is defined. For historical reasons, this pattern may be | 5167 @code{PUSH_ROUNDING} is defined. For historical reasons, this pattern may be |
5139 @item @samp{reduc_plus_scal_@var{m}} | 5313 @item @samp{reduc_plus_scal_@var{m}} |
5140 Compute the sum of the elements of a vector. The vector is operand 1, and | 5314 Compute the sum of the elements of a vector. The vector is operand 1, and |
5141 operand 0 is the scalar result, with mode equal to the mode of the elements of | 5315 operand 0 is the scalar result, with mode equal to the mode of the elements of |
5142 the input vector. | 5316 the input vector. |
5143 | 5317 |
5318 @cindex @code{reduc_and_scal_@var{m}} instruction pattern | |
5319 @item @samp{reduc_and_scal_@var{m}} | |
5320 @cindex @code{reduc_ior_scal_@var{m}} instruction pattern | |
5321 @itemx @samp{reduc_ior_scal_@var{m}} | |
5322 @cindex @code{reduc_xor_scal_@var{m}} instruction pattern | |
5323 @itemx @samp{reduc_xor_scal_@var{m}} | |
5324 Compute the bitwise @code{AND}/@code{IOR}/@code{XOR} reduction of the elements | |
5325 of a vector of mode @var{m}. Operand 1 is the vector input and operand 0 | |
5326 is the scalar result. The mode of the scalar result is the same as one | |
5327 element of @var{m}. | |
5328 | |
5329 @cindex @code{extract_last_@var{m}} instruction pattern | |
5330 @item @code{extract_last_@var{m}} | |
5331 Find the last set bit in mask operand 1 and extract the associated element | |
5332 of vector operand 2. Store the result in scalar operand 0. Operand 2 | |
5333 has vector mode @var{m} while operand 0 has the mode appropriate for one | |
5334 element of @var{m}. Operand 1 has the usual mask mode for vectors of mode | |
5335 @var{m}; see @code{TARGET_VECTORIZE_GET_MASK_MODE}. | |
5336 | |
5337 @cindex @code{fold_extract_last_@var{m}} instruction pattern | |
5338 @item @code{fold_extract_last_@var{m}} | |
5339 If any bits of mask operand 2 are set, find the last set bit, extract | |
5340 the associated element from vector operand 3, and store the result | |
5341 in operand 0. Store operand 1 in operand 0 otherwise. Operand 3 | |
5342 has mode @var{m} and operands 0 and 1 have the mode appropriate for | |
5343 one element of @var{m}. Operand 2 has the usual mask mode for vectors | |
5344 of mode @var{m}; see @code{TARGET_VECTORIZE_GET_MASK_MODE}. | |
5345 | |
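The selection rule of @samp{fold_extract_last_@var{m}} can be sketched as a scalar loop. This is a minimal model, assuming 32-bit integer elements and a mask stored as one `bool` per element; the name `model_fold_extract_last_i32` is hypothetical. When at least one mask bit is set, it picks the same element that the description of @samp{extract_last_@var{m}} selects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of fold_extract_last.
   fallback ~ operand 1 (value used when no mask bit is set)
   mask     ~ operand 2 (one flag per vector element)
   vec      ~ operand 3 (vector to extract from)
   The return value plays the role of operand 0.  */
int32_t
model_fold_extract_last_i32 (int32_t fallback, const bool *mask,
                             const int32_t *vec, size_t nunits)
{
  assert (nunits == 0 || (mask != NULL && vec != NULL));
  int32_t result = fallback;
  /* A later active element overrides an earlier one, so the final
     value comes from the last set mask bit.  */
  for (size_t i = 0; i < nunits; i++)
    if (mask[i])
      result = vec[i];
  return result;
}
```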
5346 @cindex @code{fold_left_plus_@var{m}} instruction pattern | |
5347 @item @code{fold_left_plus_@var{m}} | |
5348 Take scalar operand 1 and successively add each element from vector | |
5349 operand 2. Store the result in scalar operand 0. The vector has | |
5350 mode @var{m} and the scalars have the mode appropriate for one | |
5351 element of @var{m}. The operation is strictly in-order: there is | |
5352 no reassociation. | |
5353 | |
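The strictly in-order behaviour of @samp{fold_left_plus_@var{m}} can be illustrated with a scalar loop. The sketch assumes `float` elements, and `model_fold_left_plus_f32` is a hypothetical name; the point is only that each element is added to the running value in index order, with no regrouping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of fold_left_plus.
   init ~ operand 1 (scalar starting value)
   vec  ~ operand 2 (vector whose elements are added in order)
   The return value plays the role of operand 0.  */
float
model_fold_left_plus_f32 (float init, const float *vec, size_t nunits)
{
  assert (nunits == 0 || vec != NULL);
  float acc = init;
  /* Evaluate ((init + vec[0]) + vec[1]) + ... exactly in that order.  */
  for (size_t i = 0; i < nunits; i++)
    acc += vec[i];
  return acc;
}
```

The order matters for floating point: starting from 1e20f and adding the elements -1e20f and 1.0f in order, the large values cancel first, whereas grouping the two vector elements together first would lose the 1.0f to rounding.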
5144 @cindex @code{sdot_prod@var{m}} instruction pattern | 5354 @cindex @code{sdot_prod@var{m}} instruction pattern |
5145 @item @samp{sdot_prod@var{m}} | 5355 @item @samp{sdot_prod@var{m}} |
5146 @cindex @code{udot_prod@var{m}} instruction pattern | 5356 @cindex @code{udot_prod@var{m}} instruction pattern |
5147 @itemx @samp{udot_prod@var{m}} | 5357 @itemx @samp{udot_prod@var{m}} |
5148 Compute the sum of the products of two signed/unsigned elements. | 5358 Compute the sum of the products of two signed/unsigned elements. |
5168 Operands 0 and 2 are of the same mode, which is wider than the mode of | 5378 Operands 0 and 2 are of the same mode, which is wider than the mode of |
5169 operand 1. Add operand 1 to operand 2 and place the widened result in | 5379 operand 1. Add operand 1 to operand 2 and place the widened result in |
5170 operand 0. (This is used to express accumulation of elements into an accumulator | 5380 operand 0. (This is used to express accumulation of elements into an accumulator |
5171 of a wider mode.) | 5381 of a wider mode.) |
5172 | 5382 |
5383 @cindex @code{vec_shl_insert_@var{m}} instruction pattern | |
5384 @item @samp{vec_shl_insert_@var{m}} | |
5385 Shift the elements in vector input operand 1 left one element (i.e. | |
5386 away from element 0) and fill the vacated element 0 with the scalar | |
5387 in operand 2. Store the result in vector output operand 0. Operands | |
5388 0 and 1 have mode @var{m} and operand 2 has the mode appropriate for | |
5389 one element of @var{m}. | |
5390 | |
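The element movement described for @samp{vec_shl_insert_@var{m}} can be sketched as follows. This is a minimal model, assuming 32-bit integer elements; `model_vec_shl_insert_i32` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vec_shl_insert.
   out    ~ operand 0 (vector result)
   in     ~ operand 1 (vector input)
   scalar ~ operand 2 (value placed in element 0)  */
void
model_vec_shl_insert_i32 (int32_t *out, const int32_t *in,
                          int32_t scalar, size_t nunits)
{
  assert (nunits > 0 && out != NULL && in != NULL);
  /* Move every element one position away from element 0.  Walking
     from the top down keeps the model correct when out aliases in.  */
  for (size_t i = nunits - 1; i > 0; i--)
    out[i] = in[i - 1];
  /* Fill the vacated element 0 with the scalar.  */
  out[0] = scalar;
}
```

The highest-numbered input element has no destination and is dropped.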
5173 @cindex @code{vec_shr_@var{m}} instruction pattern | 5391 @cindex @code{vec_shr_@var{m}} instruction pattern |
5174 @item @samp{vec_shr_@var{m}} | 5392 @item @samp{vec_shr_@var{m}} |
5175 Whole vector right shift in bits, i.e. towards element 0. | 5393 Whole vector right shift in bits, i.e. towards element 0. |
5176 Operand 1 is a vector to be shifted. | 5394 Operand 1 is a vector to be shifted. |
5177 Operand 2 is an integer shift amount in bits. | 5395 Operand 2 is an integer shift amount in bits. |
5198 @cindex @code{vec_pack_ufix_trunc_@var{m}} instruction pattern | 5416 @cindex @code{vec_pack_ufix_trunc_@var{m}} instruction pattern |
5199 @item @samp{vec_pack_sfix_trunc_@var{m}}, @samp{vec_pack_ufix_trunc_@var{m}} | 5417 @item @samp{vec_pack_sfix_trunc_@var{m}}, @samp{vec_pack_ufix_trunc_@var{m}} |
5200 Narrow, convert to signed/unsigned integral type and merge the elements | 5418 Narrow, convert to signed/unsigned integral type and merge the elements |
5201 of two vectors. Operands 1 and 2 are vectors of the same mode having N | 5419 of two vectors. Operands 1 and 2 are vectors of the same mode having N |
5202 floating point elements of size S@. Operand 0 is the resulting vector | 5420 floating point elements of size S@. Operand 0 is the resulting vector |
5421 in which 2*N elements of size S/2 are concatenated. | |
5422 | |
5423 @cindex @code{vec_packs_float_@var{m}} instruction pattern | |
5424 @cindex @code{vec_packu_float_@var{m}} instruction pattern | |
5425 @item @samp{vec_packs_float_@var{m}}, @samp{vec_packu_float_@var{m}} | |
5426 Narrow, convert to floating point type and merge the elements | |
5427 of two vectors. Operands 1 and 2 are vectors of the same mode having N | |
5428 signed/unsigned integral elements of size S@. Operand 0 is the resulting vector | |
5203 in which 2*N elements of size S/2 are concatenated. | 5429 in which 2*N elements of size S/2 are concatenated. |
5204 | 5430 |
5205 @cindex @code{vec_unpacks_hi_@var{m}} instruction pattern | 5431 @cindex @code{vec_unpacks_hi_@var{m}} instruction pattern |
5206 @cindex @code{vec_unpacks_lo_@var{m}} instruction pattern | 5432 @cindex @code{vec_unpacks_lo_@var{m}} instruction pattern |
5207 @item @samp{vec_unpacks_hi_@var{m}}, @samp{vec_unpacks_lo_@var{m}} | 5433 @item @samp{vec_unpacks_hi_@var{m}}, @samp{vec_unpacks_lo_@var{m}} |
5227 @itemx @samp{vec_unpacku_float_hi_@var{m}}, @samp{vec_unpacku_float_lo_@var{m}} | 5453 @itemx @samp{vec_unpacku_float_hi_@var{m}}, @samp{vec_unpacku_float_lo_@var{m}} |
5228 Extract, convert to floating point type and widen the high/low part of a | 5454 Extract, convert to floating point type and widen the high/low part of a |
5229 vector of signed/unsigned integral elements. The input vector (operand 1) | 5455 vector of signed/unsigned integral elements. The input vector (operand 1) |
5230 has N elements of size S@. Convert the high/low elements of the vector using | 5456 has N elements of size S@. Convert the high/low elements of the vector using |
5231 floating point conversion and place the resulting N/2 values of size 2*S in | 5457 floating point conversion and place the resulting N/2 values of size 2*S in |
5458 the output vector (operand 0). | |
5459 | |
5460 @cindex @code{vec_unpack_sfix_trunc_hi_@var{m}} instruction pattern | |
5461 @cindex @code{vec_unpack_sfix_trunc_lo_@var{m}} instruction pattern | |
5462 @cindex @code{vec_unpack_ufix_trunc_hi_@var{m}} instruction pattern | |
5463 @cindex @code{vec_unpack_ufix_trunc_lo_@var{m}} instruction pattern | |
5464 @item @samp{vec_unpack_sfix_trunc_hi_@var{m}}, | |
5465 @itemx @samp{vec_unpack_sfix_trunc_lo_@var{m}} | |
5466 @itemx @samp{vec_unpack_ufix_trunc_hi_@var{m}} | |
5467 @itemx @samp{vec_unpack_ufix_trunc_lo_@var{m}} | |
5468 Extract, convert to signed/unsigned integer type and widen the high/low part of a | |
5469 vector of floating point elements. The input vector (operand 1) | |
5470 has N elements of size S@. Convert the high/low elements of the vector | |
5471 to integers and place the resulting N/2 values of size 2*S in | |
5232 the output vector (operand 0). | 5472 the output vector (operand 0). |
5233 | 5473 |
5234 @cindex @code{vec_widen_umult_hi_@var{m}} instruction pattern | 5474 @cindex @code{vec_widen_umult_hi_@var{m}} instruction pattern |
5235 @cindex @code{vec_widen_umult_lo_@var{m}} instruction pattern | 5475 @cindex @code{vec_widen_umult_lo_@var{m}} instruction pattern |
5236 @cindex @code{vec_widen_smult_hi_@var{m}} instruction pattern | 5476 @cindex @code{vec_widen_smult_hi_@var{m}} instruction pattern |
5404 @cindex @code{vrotr@var{m}3} instruction pattern | 5644 @cindex @code{vrotr@var{m}3} instruction pattern |
5405 @item @samp{vashl@var{m}3}, @samp{vashr@var{m}3}, @samp{vlshr@var{m}3}, @samp{vrotl@var{m}3}, @samp{vrotr@var{m}3} | 5645 @item @samp{vashl@var{m}3}, @samp{vashr@var{m}3}, @samp{vlshr@var{m}3}, @samp{vrotl@var{m}3}, @samp{vrotr@var{m}3} |
5406 Vector shift and rotate instructions that take vectors as operand 2 | 5646 Vector shift and rotate instructions that take vectors as operand 2 |
5407 instead of a scalar type. | 5647 instead of a scalar type. |
5408 | 5648 |
5649 @cindex @code{avg@var{m}3_floor} instruction pattern | |
5650 @cindex @code{uavg@var{m}3_floor} instruction pattern | |
5651 @item @samp{avg@var{m}3_floor} | |
5652 @itemx @samp{uavg@var{m}3_floor} | |
5653 Signed and unsigned average instructions. These instructions add | |
5654 operands 1 and 2 without truncation, divide the result by 2, | |
5655 round towards -Inf, and store the result in operand 0. This is | |
5656 equivalent to the C code: | |
5657 @smallexample | |
5658 narrow op0, op1, op2; | |
5659 @dots{} | |
5660 op0 = (narrow) (((wide) op1 + (wide) op2) >> 1); | |
5661 @end smallexample | |
5662 where the sign of @samp{narrow} determines whether this is a signed | |
5663 or unsigned operation. | |
5664 | |
5665 @cindex @code{avg@var{m}3_ceil} instruction pattern | |
5666 @cindex @code{uavg@var{m}3_ceil} instruction pattern | |
5667 @item @samp{avg@var{m}3_ceil} | |
5668 @itemx @samp{uavg@var{m}3_ceil} | |
5669 Like @samp{avg@var{m}3_floor} and @samp{uavg@var{m}3_floor}, but round | |
5670 towards +Inf. This is equivalent to the C code: | |
5671 @smallexample | |
5672 narrow op0, op1, op2; | |
5673 @dots{} | |
5674 op0 = (narrow) (((wide) op1 + (wide) op2 + 1) >> 1); | |
5675 @end smallexample | |
5676 | |
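The difference between the floor and ceil variants only shows up when the sum of the two inputs is odd. The following is a minimal sketch of both forms for 8-bit elements, assuming @samp{narrow} is an 8-bit type and @samp{wide} is @code{int}; the function names are hypothetical and chosen for illustration only. It also assumes that @code{>>} on a negative value is an arithmetic shift, which is what GCC implements but which ISO C leaves implementation-defined.

```c
#include <stdint.h>

/* Signed average, rounding towards -Inf (models avgM3_floor).  */
static int8_t avg_floor_i8(int8_t op1, int8_t op2)
{
    /* Widen before adding so that the sum cannot overflow.  */
    return (int8_t)(((int)op1 + (int)op2) >> 1);
}

/* Signed average, rounding towards +Inf (models avgM3_ceil).  */
static int8_t avg_ceil_i8(int8_t op1, int8_t op2)
{
    return (int8_t)(((int)op1 + (int)op2 + 1) >> 1);
}

/* Unsigned counterparts (model uavgM3_floor and uavgM3_ceil).  */
static uint8_t uavg_floor_u8(uint8_t op1, uint8_t op2)
{
    return (uint8_t)(((unsigned)op1 + (unsigned)op2) >> 1);
}

static uint8_t uavg_ceil_u8(uint8_t op1, uint8_t op2)
{
    return (uint8_t)(((unsigned)op1 + (unsigned)op2 + 1) >> 1);
}
```

For inputs whose sum is even, the floor and ceil forms return the same value; the widening step is what keeps inputs such as 255 and 254 from wrapping around.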
5409 @cindex @code{bswap@var{m}2} instruction pattern | 5677 @cindex @code{bswap@var{m}2} instruction pattern |
5410 @item @samp{bswap@var{m}2} | 5678 @item @samp{bswap@var{m}2} |
5411 Reverse the order of bytes of operand 1 and store the result in operand 0. | 5679 Reverse the order of bytes of operand 1 and store the result in operand 0. |
5412 | 5680 |
5413 @cindex @code{neg@var{m}2} instruction pattern | 5681 @cindex @code{neg@var{m}2} instruction pattern |
6160 Similar to @samp{mov@var{mode}cc} but for conditional addition. Conditionally | 6428 Similar to @samp{mov@var{mode}cc} but for conditional addition. Conditionally |
6161 move operand 2 or (operand 2 + operand 3) into operand 0 according to the | 6429 move operand 2 or (operand 2 + operand 3) into operand 0 according to the |
6162 comparison in operand 1. If the comparison is false, operand 2 is moved into | 6430 comparison in operand 1. If the comparison is false, operand 2 is moved into |
6163 operand 0, otherwise (operand 2 + operand 3) is moved. | 6431 operand 0, otherwise (operand 2 + operand 3) is moved. |
6164 | 6432 |
6433 @cindex @code{cond_add@var{mode}} instruction pattern | |
6434 @cindex @code{cond_sub@var{mode}} instruction pattern | |
6435 @cindex @code{cond_mul@var{mode}} instruction pattern | |
6436 @cindex @code{cond_div@var{mode}} instruction pattern | |
6437 @cindex @code{cond_udiv@var{mode}} instruction pattern | |
6438 @cindex @code{cond_mod@var{mode}} instruction pattern | |
6439 @cindex @code{cond_umod@var{mode}} instruction pattern | |
6440 @cindex @code{cond_and@var{mode}} instruction pattern | |
6441 @cindex @code{cond_ior@var{mode}} instruction pattern | |
6442 @cindex @code{cond_xor@var{mode}} instruction pattern | |
6443 @cindex @code{cond_smin@var{mode}} instruction pattern | |
6444 @cindex @code{cond_smax@var{mode}} instruction pattern | |
6445 @cindex @code{cond_umin@var{mode}} instruction pattern | |
6446 @cindex @code{cond_umax@var{mode}} instruction pattern | |
6447 @item @samp{cond_add@var{mode}} | |
6448 @itemx @samp{cond_sub@var{mode}} | |
6449 @itemx @samp{cond_mul@var{mode}} | |
6450 @itemx @samp{cond_div@var{mode}} | |
6451 @itemx @samp{cond_udiv@var{mode}} | |
6452 @itemx @samp{cond_mod@var{mode}} | |
6453 @itemx @samp{cond_umod@var{mode}} | |
6454 @itemx @samp{cond_and@var{mode}} | |
6455 @itemx @samp{cond_ior@var{mode}} | |
6456 @itemx @samp{cond_xor@var{mode}} | |
6457 @itemx @samp{cond_smin@var{mode}} | |
6458 @itemx @samp{cond_smax@var{mode}} | |
6459 @itemx @samp{cond_umin@var{mode}} | |
6460 @itemx @samp{cond_umax@var{mode}} | |
6461 When operand 1 is true, perform an operation on operands 2 and 3 and | |
6462 store the result in operand 0, otherwise store operand 4 in operand 0. | |
6463 The operation works elementwise if the operands are vectors. | |
6464 | |
6465 The scalar case is equivalent to: | |
6466 | |
6467 @smallexample | |
6468 op0 = op1 ? op2 @var{op} op3 : op4; | |
6469 @end smallexample | |
6470 | |
6471 while the vector case is equivalent to: | |
6472 | |
6473 @smallexample | |
6474 for (i = 0; i < GET_MODE_NUNITS (@var{m}); i++) | |
6475 op0[i] = op1[i] ? op2[i] @var{op} op3[i] : op4[i]; | |
6476 @end smallexample | |
6477 | |
6478 where, for example, @var{op} is @code{+} for @samp{cond_add@var{mode}}. | |
6479 | |
6480 When defined for floating-point modes, the contents of @samp{op3[i]} | |
6481 are not interpreted if @samp{op1[i]} is false, just like they would not | |
6482 be in a normal C @samp{?:} condition. | |
6483 | |
6484 Operands 0, 2, 3 and 4 all have mode @var{m}. Operand 1 is a scalar | |
6485 integer if @var{m} is scalar, otherwise it has the mode returned by | |
6486 @code{TARGET_VECTORIZE_GET_MASK_MODE}. | |
6487 | |
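As an illustration of the elementwise semantics, the following is a minimal sketch of the vector form of @samp{cond_add@var{mode}} written as plain C. It assumes the mask can be modelled as an array of @code{bool} with one entry per element, which is a simplification of the mask mode a real target would use; the function name and the explicit length parameter are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of the vector case: op0[i] = op1[i] ? op2[i] + op3[i] : op4[i].
   op1 is the per-element mask; op4 supplies the fallback values.  */
static void cond_add_i32(int *op0, const bool *op1, const int *op2,
                         const int *op3, const int *op4, size_t nunits)
{
    for (size_t i = 0; i < nunits; i++)
        op0[i] = op1[i] ? op2[i] + op3[i] : op4[i];
}
```

Elements whose mask entry is false take their value from operand 4 and never read the corresponding element of operand 3, matching the behavior of a C @samp{?:} expression.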
6488 @cindex @code{cond_fma@var{mode}} instruction pattern | |
6489 @cindex @code{cond_fms@var{mode}} instruction pattern | |
6490 @cindex @code{cond_fnma@var{mode}} instruction pattern | |
6491 @cindex @code{cond_fnms@var{mode}} instruction pattern | |
6492 @item @samp{cond_fma@var{mode}} | |
6493 @itemx @samp{cond_fms@var{mode}} | |
6494 @itemx @samp{cond_fnma@var{mode}} | |
6495 @itemx @samp{cond_fnms@var{mode}} | |
6496 Like @samp{cond_add@var{m}}, except that the conditional operation | |
6497 takes 3 operands rather than two. For example, the vector form of | |
6498 @samp{cond_fma@var{mode}} is equivalent to: | |
6499 | |
6500 @smallexample | |
6501 for (i = 0; i < GET_MODE_NUNITS (@var{m}); i++) | |
6502 op0[i] = op1[i] ? fma (op2[i], op3[i], op4[i]) : op5[i]; | |
6503 @end smallexample | |
6504 | |
6165 @cindex @code{neg@var{mode}cc} instruction pattern | 6505 @cindex @code{neg@var{mode}cc} instruction pattern |
6166 @item @samp{neg@var{mode}cc} | 6506 @item @samp{neg@var{mode}cc} |
6167 Similar to @samp{mov@var{mode}cc} but for conditional negation. Conditionally | 6507 Similar to @samp{mov@var{mode}cc} but for conditional negation. Conditionally |
6168 move the negation of operand 2 or the unchanged operand 3 into operand 0 | 6508 move the negation of operand 2 or the unchanged operand 3 into operand 0 |
6169 according to the comparison in operand 1. If the comparison is true, the negation | 6509 according to the comparison in operand 1. If the comparison is true, the negation |
6402 The @samp{tablejump} insn is always the last insn before the jump | 6742 The @samp{tablejump} insn is always the last insn before the jump |
6403 table it uses. Its assembler code normally has no need to use the | 6743 table it uses. Its assembler code normally has no need to use the |
6404 second operand, but you should incorporate it in the RTL pattern so | 6744 second operand, but you should incorporate it in the RTL pattern so |
6405 that the jump optimizer will not delete the table as unreachable code. | 6745 that the jump optimizer will not delete the table as unreachable code. |
6406 | 6746 |
6407 | |
6408 @cindex @code{decrement_and_branch_until_zero} instruction pattern | |
6409 @item @samp{decrement_and_branch_until_zero} | |
6410 Conditional branch instruction that decrements a register and | |
6411 jumps if the register is nonzero. Operand 0 is the register to | |
6412 decrement and test; operand 1 is the label to jump to if the | |
6413 register is nonzero. @xref{Looping Patterns}. | |
6414 | |
6415 This optional instruction pattern is only used by the combiner, | |
6416 typically for loops reversed by the loop optimizer when strength | |
6417 reduction is enabled. | |
6418 | 6747 |
6419 @cindex @code{doloop_end} instruction pattern | 6748 @cindex @code{doloop_end} instruction pattern |
6420 @item @samp{doloop_end} | 6749 @item @samp{doloop_end} |
6421 Conditional branch instruction that decrements a register and | 6750 Conditional branch instruction that decrements a register and |
6422 jumps if the register is nonzero. Operand 0 is the register to | 6751 jumps if the register is nonzero. Operand 0 is the register to |
6747 @item @samp{memory_barrier} | 7076 @item @samp{memory_barrier} |
6748 If the target memory model is not fully synchronous, then this pattern | 7077 If the target memory model is not fully synchronous, then this pattern |
6749 should be defined to an instruction that orders both loads and stores | 7078 should be defined to an instruction that orders both loads and stores |
6750 before the instruction with respect to loads and stores after the instruction. | 7079 before the instruction with respect to loads and stores after the instruction. |
6751 This pattern has no operands. | 7080 This pattern has no operands. |
7081 | |
7082 @cindex @code{speculation_barrier} instruction pattern | |
7083 @item @samp{speculation_barrier} | |
7084 If the target can support speculative execution, then this pattern should | |
7085 be defined to an instruction that will block subsequent execution until | |
7086 any prior speculation conditions have been resolved. The pattern must also | |
7087 ensure that the compiler cannot move memory operations past the barrier, | |
7088 so it needs to be an UNSPEC_VOLATILE pattern. The pattern has no | |
7089 operands. | |
7090 | |
7091 If this pattern is not defined then the default expansion of | |
7092 @code{__builtin_speculation_safe_value} will emit a warning. You can | |
7093 suppress this warning by defining this pattern with a final condition | |
7094 of @code{0} (zero), which tells the compiler that a speculation | |
7095 barrier is not needed for this target. | |
6752 | 7096 |
6753 @cindex @code{sync_compare_and_swap@var{mode}} instruction pattern | 7097 @cindex @code{sync_compare_and_swap@var{mode}} instruction pattern |
6754 @item @samp{sync_compare_and_swap@var{mode}} | 7098 @item @samp{sync_compare_and_swap@var{mode}} |
6755 This pattern, if defined, emits code for an atomic compare-and-swap | 7099 This pattern, if defined, emits code for an atomic compare-and-swap |
6756 operation. Operand 1 is the memory on which the atomic operation is | 7100 operation. Operand 1 is the memory on which the atomic operation is |
7237 mark the top and end of a loop and to count the number of loop | 7581 mark the top and end of a loop and to count the number of loop |
7238 iterations. This avoids the need for fetching and executing a | 7582 iterations. This avoids the need for fetching and executing a |
7239 @samp{dbra}-like instruction and avoids pipeline stalls associated with | 7583 @samp{dbra}-like instruction and avoids pipeline stalls associated with |
7240 the jump. | 7584 the jump. |
7241 | 7585 |
7242 GCC has three special named patterns to support low overhead looping. | 7586 GCC has two special named patterns to support low overhead looping. |
7243 They are @samp{decrement_and_branch_until_zero}, @samp{doloop_begin}, | 7587 They are @samp{doloop_begin} and @samp{doloop_end}. These are emitted |
7244 and @samp{doloop_end}. The first pattern, | 7588 by the loop optimizer for certain well-behaved loops with a finite |
7245 @samp{decrement_and_branch_until_zero}, is not emitted during RTL | 7589 number of loop iterations using information collected during strength |
7246 generation but may be emitted during the instruction combination phase. | 7590 reduction. |
7247 This requires the assistance of the loop optimizer, using information | |
7248 collected during strength reduction, to reverse a loop to count down to | |
7249 zero. Some targets also require the loop optimizer to add a | |
7250 @code{REG_NONNEG} note to indicate that the iteration count is always | |
7251 positive. This is needed if the target performs a signed loop | |
7252 termination test. For example, the 68000 uses a pattern similar to the | |
7253 following for its @code{dbra} instruction: | |
7254 | |
7255 @smallexample | |
7256 @group | |
7257 (define_insn "decrement_and_branch_until_zero" | |
7258 [(set (pc) | |
7259 (if_then_else | |
7260 (ge (plus:SI (match_operand:SI 0 "general_operand" "+d*am") | |
7261 (const_int -1)) | |
7262 (const_int 0)) | |
7263 (label_ref (match_operand 1 "" "")) | |
7264 (pc))) | |
7265 (set (match_dup 0) | |
7266 (plus:SI (match_dup 0) | |
7267 (const_int -1)))] | |
7268 "find_reg_note (insn, REG_NONNEG, 0)" | |
7269 "@dots{}") | |
7270 @end group | |
7271 @end smallexample | |
7272 | |
7273 Note that since the insn is both a jump insn and has an output, it must | |
7274 deal with its own reloads, hence the `m' constraints. Also note that | |
7275 since this insn is generated by the instruction combination phase | |
7276 combining two sequential insns together into an implicit parallel insn, | |
7277 the iteration counter needs to be biased by the same amount as the | |
7278 decrement operation, in this case @minus{}1. Note that the following similar | |
7279 pattern will not be matched by the combiner. | |
7280 | |
7281 @smallexample | |
7282 @group | |
7283 (define_insn "decrement_and_branch_until_zero" | |
7284 [(set (pc) | |
7285 (if_then_else | |
7286 (ge (match_operand:SI 0 "general_operand" "+d*am") | |
7287 (const_int 1)) | |
7288 (label_ref (match_operand 1 "" "")) | |
7289 (pc))) | |
7290 (set (match_dup 0) | |
7291 (plus:SI (match_dup 0) | |
7292 (const_int -1)))] | |
7293 "find_reg_note (insn, REG_NONNEG, 0)" | |
7294 "@dots{}") | |
7295 @end group | |
7296 @end smallexample | |
7297 | |
7298 The other two special looping patterns, @samp{doloop_begin} and | |
7299 @samp{doloop_end}, are emitted by the loop optimizer for certain | |
7300 well-behaved loops with a finite number of loop iterations using | |
7301 information collected during strength reduction. | |
7302 | 7591 |
7303 The @samp{doloop_end} pattern describes the actual looping instruction | 7592 The @samp{doloop_end} pattern describes the actual looping instruction |
7304 (or the implicit looping operation) and the @samp{doloop_begin} pattern | 7593 (or the implicit looping operation) and the @samp{doloop_begin} pattern |
7305 is an optional companion pattern that can be used for initialization | 7594 is an optional companion pattern that can be used for initialization |
7306 needed for some low-overhead looping instructions. | 7595 needed for some low-overhead looping instructions. |
7316 additional labels can be emitted at this point. In addition, if the | 7605 additional labels can be emitted at this point. In addition, if the |
7317 desired special iteration counter register was not allocated, this | 7606 desired special iteration counter register was not allocated, this |
7318 machine dependent reorg pass could emit a traditional compare and jump | 7607 machine dependent reorg pass could emit a traditional compare and jump |
7319 instruction pair. | 7608 instruction pair. |
7320 | 7609 |
7321 The essential difference between the | 7610 For the @samp{doloop_end} pattern, the loop optimizer allocates an |
7322 @samp{decrement_and_branch_until_zero} and the @samp{doloop_end} | 7611 additional pseudo register as an iteration counter. This pseudo |
7323 patterns is that the loop optimizer allocates an additional pseudo | 7612 register cannot be used within the loop (i.e., general induction |
7324 register for the latter as an iteration counter. This pseudo register | 7613 variables cannot be derived from it), however, in many cases the loop |
7325 cannot be used within the loop (i.e., general induction variables cannot | 7614 induction variable may become redundant and removed by the flow pass. |
7326 be derived from it), however, in many cases the loop induction variable | 7615 |
7327 may become redundant and removed by the flow pass. | 7616 The @samp{doloop_end} pattern must have a specific structure to be |
7328 | 7617 handled correctly by GCC. The example below is taken (slightly |
7618 simplified) from the PDP-11 target: | |
7619 | |
7620 @smallexample | |
7621 @group | |
7622 (define_expand "doloop_end" | |
7623 [(parallel [(set (pc) | |
7624 (if_then_else | |
7625 (ne (match_operand:HI 0 "nonimmediate_operand" "+r,!m") | |
7626 (const_int 1)) | |
7627 (label_ref (match_operand 1 "" "")) | |
7628 (pc))) | |
7629 (set (match_dup 0) | |
7630 (plus:HI (match_dup 0) | |
7631 (const_int -1)))])] | |
7632 "" | |
7633 "@{ | |
7634 if (GET_MODE (operands[0]) != HImode) | |
7635 FAIL; | |
7636 @}") | |
7637 | |
7638 (define_insn "doloop_end_insn" | |
7639 [(set (pc) | |
7640 (if_then_else | |
7641 (ne (match_operand:HI 0 "nonimmediate_operand" "+r,!m") | |
7642 (const_int 1)) | |
7643 (label_ref (match_operand 1 "" "")) | |
7644 (pc))) | |
7645 (set (match_dup 0) | |
7646 (plus:HI (match_dup 0) | |
7647 (const_int -1)))] | |
7648 "" | |
7649 | |
7650 @{ | |
7651 if (which_alternative == 0) | |
7652 return "sob %0,%l1"; | |
7653 | |
7654 /* emulate sob */ | |
7655 output_asm_insn ("dec %0", operands); | |
7656 return "bne %l1"; | |
7657 @}) | |
7658 @end group | |
7659 @end smallexample | |
7660 | |
7661 The first part of the pattern describes the branch condition. GCC | |
7662 supports three cases for the way the target machine handles the loop | |
7663 counter: | |
7664 @itemize @bullet | |
7665 @item Loop terminates when the loop register decrements to zero. This | |
7666 is represented by a @code{ne} comparison of the register (its old value) | |
7667 with constant 1 (as in the example above). | |
7668 @item Loop terminates when the loop register decrements to @minus{}1. | |
7669 This is represented by a @code{ne} comparison of the register with | |
7670 constant zero. | |
7671 @item Loop terminates when the loop register decrements to a negative | |
7672 value. This is represented by a @code{ge} comparison of the register | |
7673 with constant zero. For this case, GCC will attach a @code{REG_NONNEG} | |
7674 note to the @code{doloop_end} insn if it can determine that the register | |
7675 will be non-negative. | |
7676 @end itemize | |
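The three conventions differ in how many times the loop body runs for a given initial counter value. The following is a rough C model of that difference, not GCC code: it assumes a bottom-tested loop in which the counter is decremented once per iteration and the termination test is applied to the decremented value. The enum and function names are hypothetical.

```c
/* The three termination conventions described in the list above.  */
enum doloop_kind {
    UNTIL_ZERO,       /* stop when the counter decrements to zero */
    UNTIL_MINUS_ONE,  /* stop when the counter decrements to -1 */
    UNTIL_NEGATIVE    /* stop when the counter decrements below zero */
};

/* Return how many times the loop body executes when the counter
   register starts at COUNTER (assumed non-negative, and at least 1
   for UNTIL_ZERO, so that the loop terminates).  */
static int body_executions(enum doloop_kind kind, int counter)
{
    int executed = 0;

    for (;;) {
        int done;

        executed++;   /* the loop body runs here */
        counter--;    /* the looping instruction decrements the register */

        switch (kind) {
        case UNTIL_ZERO:
            done = (counter == 0);
            break;
        case UNTIL_MINUS_ONE:
            done = (counter == -1);
            break;
        default:
            done = (counter < 0);
            break;
        }
        if (done)
            return executed;
    }
}
```

Under this model, a body that must run four times needs an initial counter of 4 in the first case and 3 in the other two, which is why the initial value loaded into the register depends on the form of the comparison.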
7677 | |
7678 Since the @code{doloop_end} insn is a jump insn that also has an output, | |
7679 the reload pass does not handle the output operand. Therefore, the | |
7680 constraint must allow for that operand to be in memory rather than a | |
7681 register. In the example shown above, that is handled (in the | |
7682 @code{doloop_end_insn} pattern) by using a loop instruction sequence | |
7683 that can handle memory operands when the memory alternative appears. | |
7684 | |
7685 GCC does not check the mode of the loop register operand when generating | |
7686 the @code{doloop_end} pattern. If the pattern is only valid for some | |
7687 modes but not others, the pattern should be a @code{define_expand} | |
7688 pattern that checks the operand mode in the preparation code, and issues | |
7689 @code{FAIL} if an unsupported mode is found. The example above does | |
7690 this, since the machine instruction to be used only exists for | |
7691 @code{HImode}. | |
7692 | |
7693 If the @code{doloop_end} pattern is a @code{define_expand}, there must | |
7694 also be a @code{define_insn} or @code{define_insn_and_split} matching | |
7695 the generated pattern. Otherwise, the compiler will fail during loop | |
7696 optimization. | |
7329 | 7697 |
7330 @end ifset | 7698 @end ifset |
7331 @ifset INTERNALS | 7699 @ifset INTERNALS |
7332 @node Insn Canonicalizations | 7700 @node Insn Canonicalizations |
7333 @section Canonicalization of Instructions | 7701 @section Canonicalization of Instructions |
7781 and are executed before the new RTL is generated to prepare for the | 8149 and are executed before the new RTL is generated to prepare for the |
7782 generated code or emit some insns whose pattern is not fixed. Unlike | 8150 generated code or emit some insns whose pattern is not fixed. Unlike |
7783 those in @code{define_expand}, however, these statements must not | 8151 those in @code{define_expand}, however, these statements must not |
7784 generate any new pseudo-registers. Once reload has completed, they also | 8152 generate any new pseudo-registers. Once reload has completed, they also |
7785 must not allocate any space in the stack frame. | 8153 must not allocate any space in the stack frame. |
8154 | |
8155 There are two special macros defined for use in the preparation statements: | |
8156 @code{DONE} and @code{FAIL}. Use them with a following semicolon, | |
8157 as a statement. | |
8158 | |
8159 @table @code | |
8160 | |
8161 @findex DONE | |
8162 @item DONE | |
8163 Use the @code{DONE} macro to end RTL generation for the splitter. The | |
8164 only RTL insns generated as replacement for the matched input insn will | |
8165 be those already emitted by explicit calls to @code{emit_insn} within | |
8166 the preparation statements; the replacement pattern is not used. | |
8167 | |
8168 @findex FAIL | |
8169 @item FAIL | |
8170 Make the @code{define_split} fail on this occasion. When a @code{define_split} | |
8171 fails, it means that the splitter was not truly available for the inputs | |
8172 it was given, and the input insn will not be split. | |
8173 @end table | |
8174 | |
8175 If the preparation falls through (invokes neither @code{DONE} nor | |
8176 @code{FAIL}), then the @code{define_split} uses the replacement | |
8177 template. | |
7786 | 8178 |
7787 Patterns are matched against @var{insn-pattern} in two different | 8179 Patterns are matched against @var{insn-pattern} in two different |
7788 circumstances. If an insn needs to be split for delay slot scheduling | 8180 circumstances. If an insn needs to be split for delay slot scheduling |
7789 or insn scheduling, the insn is already known to be valid, which means | 8181 or insn scheduling, the insn is already known to be valid, which means |
7790 that it must have been matched by some @code{define_insn} and, if | 8182 that it must have been matched by some @code{define_insn} and, if |
7888 definitions, one for the insns that are valid and one for the insns that | 8280 definitions, one for the insns that are valid and one for the insns that |
7889 are not valid. | 8281 are not valid. |
7890 | 8282 |
7891 The splitter is allowed to split jump instructions into sequence of | 8283 The splitter is allowed to split jump instructions into sequence of |
7892 jumps or create new jumps in while splitting non-jump instructions. As | 8284 jumps or create new jumps in while splitting non-jump instructions. As |
7893 the central flowgraph and branch prediction information needs to be updated, | 8285 the control flow graph and branch prediction information needs to be updated, |
7894 several restriction apply. | 8286 several restriction apply. |
7895 | 8287 |
7896 Splitting of a jump instruction into a sequence that ends with another jump | 8288 Splitting of a jump instruction into a sequence that ends with another jump |
7897 instruction is always valid, as the compiler expects identical behavior of the new | 8289 instruction is always valid, as the compiler expects identical behavior of the new |
7898 jump. When the new sequence contains multiple jump instructions or new labels, | 8290 jump. When the new sequence contains multiple jump instructions or new labels, |
8336 (set (match_dup 0) (match_dup 4)) | 8728 (set (match_dup 0) (match_dup 4)) |
8337 (set (match_dup 2) (match_dup 4)) | 8729 (set (match_dup 2) (match_dup 4)) |
8338 (set (match_dup 3) (match_dup 4))] | 8730 (set (match_dup 3) (match_dup 4))] |
8339 "") | 8731 "") |
8340 @end smallexample | 8732 @end smallexample |
8733 | |
8734 There are two special macros defined for use in the preparation statements: | |
8735 @code{DONE} and @code{FAIL}. Use them with a following semicolon, | |
8736 as a statement. | |
8737 | |
8738 @table @code | |
8739 | |
8740 @findex DONE | |
8741 @item DONE | |
8742 Use the @code{DONE} macro to end RTL generation for the peephole. The | |
8743 only RTL insns generated as replacement for the matched input insn will | |
8744 be those already emitted by explicit calls to @code{emit_insn} within | |
8745 the preparation statements; the replacement pattern is not used. | |
8746 | |
8747 @findex FAIL | |
8748 @item FAIL | |
8749 Make the @code{define_peephole2} fail on this occasion. When a @code{define_peephole2} | |
8750 fails, it means that the replacement was not truly available for the | |
8751 particular inputs it was given. In that case, GCC may still apply a | |
8752 later @code{define_peephole2} that also matches the given insn pattern. | |
8753 (Note that this is different from @code{define_split}, where @code{FAIL} | |
8754 prevents the input insn from being split at all.) | |
8755 @end table | |
8756 | |
8757 If the preparation falls through (invokes neither @code{DONE} nor | |
8758 @code{FAIL}), then the @code{define_peephole2} uses the replacement | |
8759 template. | |
8341 | 8760 |
8342 @noindent | 8761 @noindent |
8343 If we had not added the @code{(match_dup 4)} in the middle of the input | 8762 If we had not added the @code{(match_dup 4)} in the middle of the input |
8344 sequence, it might have been the case that the register we chose at the | 8763 sequence, it might have been the case that the register we chose at the |
8345 beginning of the sequence is killed by the first or second @code{set}. | 8764 beginning of the sequence is killed by the first or second @code{set}. |
9615 All simple integer insns can be executed in any integer pipeline and | 10034 All simple integer insns can be executed in any integer pipeline and |
9616 their result is ready in two cycles. The simple integer insns are | 10035 their result is ready in two cycles. The simple integer insns are |
9617 issued into the first pipeline unless it is reserved, otherwise they | 10036 issued into the first pipeline unless it is reserved, otherwise they |
9618 are issued into the second pipeline. Integer division and | 10037 are issued into the second pipeline. Integer division and |
9619 multiplication insns can be executed only in the second integer | 10038 multiplication insns can be executed only in the second integer |
9620 pipeline and their results are ready correspondingly in 8 and 4 | 10039 pipeline and their results are ready correspondingly in 9 and 4 |
9621 cycles. The integer division is not pipelined, i.e.@: the subsequent | 10040 cycles. The integer division is not pipelined, i.e.@: the subsequent |
9622 integer division insn cannot be issued until the current division | 10041 integer division insn cannot be issued until the current division |
9623 insn has finished. Floating point insns are fully pipelined and their | 10042 insn has finished. Floating point insns are fully pipelined and their |
9624 results are ready in 3 cycles. Where the result of a floating point | 10043 results are ready in 3 cycles. Where the result of a floating point |
9625 insn is used by an integer insn, an additional delay of one cycle is | 10044 insn is used by an integer insn, an additional delay of one cycle is |
9632 "(i0_pipeline | i1_pipeline), (port0 | port1)") | 10051 "(i0_pipeline | i1_pipeline), (port0 | port1)") |
9633 | 10052 |
9634 (define_insn_reservation "mult" 4 (eq_attr "type" "mult") | 10053 (define_insn_reservation "mult" 4 (eq_attr "type" "mult") |
9635 "i1_pipeline, nothing*2, (port0 | port1)") | 10054 "i1_pipeline, nothing*2, (port0 | port1)") |
9636 | 10055 |
9637 (define_insn_reservation "div" 8 (eq_attr "type" "div") | 10056 (define_insn_reservation "div" 9 (eq_attr "type" "div") |
9638 "i1_pipeline, div*7, div + (port0 | port1)") | 10057 "i1_pipeline, div*7, div + (port0 | port1)") |
9639 | 10058 |
9640 (define_insn_reservation "float" 3 (eq_attr "type" "float") | 10059 (define_insn_reservation "float" 3 (eq_attr "type" "float") |
9641 "f_pipeline, nothing, (port0 | port1)") | 10060 "f_pipeline, nothing, (port0 | port1)") |
9642 | 10061 |
9934 @code{match_dup N} is used in the output template to be replaced with | 10353 @code{match_dup N} is used in the output template to be replaced with |
9935 the expression from the original pattern, which matched | 10354 the expression from the original pattern, which matched |
9936 @code{match_operand N} from the input pattern. As a consequence, | 10355 @code{match_operand N} from the input pattern. As a consequence, |
9937 @code{match_dup} cannot be used to point to @code{match_operand}s from | 10356 @code{match_dup} cannot be used to point to @code{match_operand}s from |
9938 the output pattern, it should always refer to a @code{match_operand} | 10357 the output pattern, it should always refer to a @code{match_operand} |
9939 from the input pattern. | 10358 from the input pattern. If a @code{match_dup N} occurs more than once |
10359 in the output template, its first occurrence is replaced with the | |
10360 expression from the original pattern, and the subsequent expressions | |
10361 are replaced with @code{match_dup N}, i.e., a reference to the first | |
10362 expression. | |
9940 | 10363 |
9941 In the output template one can refer to the expressions from the | 10364 In the output template one can refer to the expressions from the |
9942 original pattern and create new ones. For instance, some operands could | 10365 original pattern and create new ones. For instance, some operands could |
9943 be added by means of standard @code{match_operand}. | 10366 be added by means of standard @code{match_operand}. |
9944 | 10367 |
10142 @menu | 10565 @menu |
10143 * Mode Iterators:: Generating variations of patterns for different modes. | 10566 * Mode Iterators:: Generating variations of patterns for different modes. |
10144 * Code Iterators:: Doing the same for codes. | 10567 * Code Iterators:: Doing the same for codes. |
10145 * Int Iterators:: Doing the same for integers. | 10568 * Int Iterators:: Doing the same for integers. |
10146 * Subst Iterators:: Generating variations of patterns for define_subst. | 10569 * Subst Iterators:: Generating variations of patterns for define_subst. |
10570 * Parameterized Names:: Specifying iterator values in C++ code. | |
10147 @end menu | 10571 @end menu |
10148 | 10572 |
10149 @node Mode Iterators | 10573 @node Mode Iterators |
10150 @subsection Mode Iterators | 10574 @subsection Mode Iterators |
10151 @cindex mode iterators in @file{.md} files | 10575 @cindex mode iterators in @file{.md} files |
10537 replaced in the first copy of the original RTL-template. | 10961 replaced in the first copy of the original RTL-template. |
10538 | 10962 |
10539 @var{subst-applied-value} is a value with which subst-attribute would be | 10963 @var{subst-applied-value} is a value with which subst-attribute would be |
10540 replaced in the second copy of the original RTL-template. | 10964 replaced in the second copy of the original RTL-template. |
10541 | 10965 |
10966 @node Parameterized Names | |
10967 @subsection Parameterized Names | |
10968 @cindex @samp{@@} in instruction pattern names | |
10969 Ports sometimes need to apply iterators using C++ code, in order to | |
10970 get the code or RTL pattern for a specific instruction. For example, | |
10971 suppose we have the @samp{neon_vq<absneg><mode>} pattern given above: | |
10972 | |
10973 @smallexample | |
10974 (define_int_iterator QABSNEG [UNSPEC_VQABS UNSPEC_VQNEG]) | |
10975 | |
10976 (define_int_attr absneg [(UNSPEC_VQABS "abs") (UNSPEC_VQNEG "neg")]) | |
10977 | |
10978 (define_insn "neon_vq<absneg><mode>" | |
10979 [(set (match_operand:VDQIW 0 "s_register_operand" "=w") | |
10980 (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w") | |
10981 (match_operand:SI 2 "immediate_operand" "i")] | |
10982 QABSNEG))] | |
10983 @dots{} | |
10984 ) | |
10985 @end smallexample | |
10986 | |
10987 A port might need to generate this pattern for a variable | |
10988 @samp{QABSNEG} value and a variable @samp{VDQIW} mode. There are two | |
10989 ways of doing this. The first is to build the rtx for the pattern | |
10990 directly from C++ code; this is a valid technique and avoids any risk | |
10991 of combinatorial explosion. The second is to prefix the instruction | |
10992 name with the special character @samp{@@}, which tells GCC to generate | |
10993 the four additional functions below. In each case, @var{name} is the | |
10994 name of the instruction without the leading @samp{@@} character, | |
10995 without the @samp{<@dots{}>} placeholders, and with any underscore | |
10996 before a @samp{<@dots{}>} placeholder removed if keeping it would | |
10997 lead to a double or trailing underscore. | |
10998 | |
10999 @table @samp | |
11000 @item insn_code maybe_code_for_@var{name} (@var{i1}, @var{i2}, @dots{}) | |
11001 See whether replacing the first @samp{<@dots{}>} placeholder with | |
11002 iterator value @var{i1}, the second with iterator value @var{i2}, and | |
11003 so on, gives a valid instruction. Return its code if so, otherwise | |
11004 return @code{CODE_FOR_nothing}. | |
11005 | |
11006 @item insn_code code_for_@var{name} (@var{i1}, @var{i2}, @dots{}) | |
11007 Same, but abort the compiler if the requested instruction does not exist. | |
11008 | |
11009 @item rtx maybe_gen_@var{name} (@var{i1}, @var{i2}, @dots{}, @var{op0}, @var{op1}, @dots{}) | |
11010 Check for a valid instruction in the same way as | |
11011 @code{maybe_code_for_@var{name}}. If the instruction exists, | |
11012 generate an instance of it using the operand values given by @var{op0}, | |
11013 @var{op1}, and so on, otherwise return null. | |
11014 | |
11015 @item rtx gen_@var{name} (@var{i1}, @var{i2}, @dots{}, @var{op0}, @var{op1}, @dots{}) | |
11016 Same, but abort the compiler if the requested instruction does not exist, | |
11017 or if the instruction generator invoked the @code{FAIL} macro. | |
11018 @end table | |
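The rule for deriving @var{name} from a pattern name can be sketched in standalone C++. This is only an illustration of the rule as worded above, not GCC's own implementation. The function name @code{overloaded_base_name} is hypothetical, and the input is assumed to be the raw pattern name as it appears in the @file{.md} file, with a single leading @samp{@@}.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of GCC): derive the base name used for the
// generated functions from a pattern name such as "@neon_vq<absneg><mode>".
std::string overloaded_base_name(const std::string &pattern_name) {
  std::string result;
  std::size_t i = 0;

  // Drop the leading '@' marker, if present.
  if (!pattern_name.empty() && pattern_name[0] == '@')
    i = 1;

  while (i < pattern_name.size()) {
    if (pattern_name[i] == '<') {
      std::size_t close = pattern_name.find('>', i);
      if (close == std::string::npos) {
        // Unterminated placeholder: copy the remainder unchanged.
        result.append(pattern_name, i, std::string::npos);
        break;
      }
      // Skip the whole <...> placeholder.
      i = close + 1;

      // Remove an underscore before the placeholder if keeping it would
      // leave a double underscore or a trailing underscore.
      bool at_end = (i == pattern_name.size());
      bool next_is_underscore = !at_end && pattern_name[i] == '_';
      if (!result.empty() && result.back() == '_' &&
          (at_end || next_is_underscore))
        result.pop_back();
    } else {
      result.push_back(pattern_name[i]);
      ++i;
    }
  }
  return result;
}
```

Under these assumptions, the pattern name for the @samp{neon_vq<absneg><mode>} example maps to @samp{neon_vq}, which matches the function names shown for that pattern.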
11019 | |
11020 For example, changing the pattern above to: | |
11021 | |
11022 @smallexample | |
11023 (define_insn "@@neon_vq<absneg><mode>" | |
11024 [(set (match_operand:VDQIW 0 "s_register_operand" "=w") | |
11025 (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w") | |
11026 (match_operand:SI 2 "immediate_operand" "i")] | |
11027 QABSNEG))] | |
11028 @dots{} | |
11029 ) | |
11030 @end smallexample | |
11031 | |
11032 would define the same patterns as before, but in addition would generate | |
11033 the four functions below: | |
11034 | |
11035 @smallexample | |
11036 insn_code maybe_code_for_neon_vq (int, machine_mode); | |
11037 insn_code code_for_neon_vq (int, machine_mode); | |
11038 rtx maybe_gen_neon_vq (int, machine_mode, rtx, rtx, rtx); | |
11039 rtx gen_neon_vq (int, machine_mode, rtx, rtx, rtx); | |
11040 @end smallexample | |
11041 | |
11042 Calling @samp{code_for_neon_vq (UNSPEC_VQABS, V8QImode)} | |
11043 would then give @code{CODE_FOR_neon_vqabsv8qi}. | |
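To make the lookup behaviour concrete, here is a standalone mock of the two @samp{code_for} functions. Everything in it is a stand-in: the enums are small illustrative subsets declared locally rather than GCC's real @code{insn_code} and @code{machine_mode} types, and the bodies are hand-written guesses at equivalent logic, not the code GCC generates.

```cpp
#include <cstdlib>

// Stand-in declarations (assumptions for this sketch only).
enum insn_code {
  CODE_FOR_nothing,
  CODE_FOR_neon_vqabsv8qi,
  CODE_FOR_neon_vqnegv8qi,
  CODE_FOR_neon_vqabsv4si,
  CODE_FOR_neon_vqnegv4si
};
enum unspec_value { UNSPEC_VQABS, UNSPEC_VQNEG };
enum machine_mode { V8QImode, V4SImode, DImode };

// Return the instruction code for the given iterator values, or
// CODE_FOR_nothing if no instruction matches that combination.
insn_code maybe_code_for_neon_vq(int unspec, machine_mode mode) {
  if (unspec == UNSPEC_VQABS && mode == V8QImode)
    return CODE_FOR_neon_vqabsv8qi;
  if (unspec == UNSPEC_VQNEG && mode == V8QImode)
    return CODE_FOR_neon_vqnegv8qi;
  if (unspec == UNSPEC_VQABS && mode == V4SImode)
    return CODE_FOR_neon_vqabsv4si;
  if (unspec == UNSPEC_VQNEG && mode == V4SImode)
    return CODE_FOR_neon_vqnegv4si;
  return CODE_FOR_nothing;
}

// Same lookup, but abort if the requested instruction does not exist.
insn_code code_for_neon_vq(int unspec, machine_mode mode) {
  insn_code code = maybe_code_for_neon_vq(unspec, mode);
  if (code == CODE_FOR_nothing)
    std::abort();
  return code;
}
```

In this mock, a caller that is unsure whether a mode is supported would use @code{maybe_code_for_neon_vq} and compare the result with @code{CODE_FOR_nothing}. A caller that already knows the combination is valid can use @code{code_for_neon_vq} directly.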
11044 | |
11045 It is possible to have multiple @samp{@@} patterns with the same | |
11046 name and same types of iterator. For example: | |
11047 | |
11048 @smallexample | |
11049 (define_insn "@@some_arithmetic_op<mode>" | |
11050 [(set (match_operand:INTEGER_MODES 0 "register_operand") @dots{})] | |
11051 @dots{} | |
11052 ) | |
11053 | |
11054 (define_insn "@@some_arithmetic_op<mode>" | |
11055 [(set (match_operand:FLOAT_MODES 0 "register_operand") @dots{})] | |
11056 @dots{} | |
11057 ) | |
11058 @end smallexample | |
11059 | |
11060 would produce a single set of functions that handles both | |
11061 @code{INTEGER_MODES} and @code{FLOAT_MODES}. | |
11062 | |
10542 @end ifset | 11063 @end ifset |