Mercurial > hg > CbC > CbC_gcc
comparison gcc/jit/docs/cp/intro/tutorial04.rst @ 111:04ced10e8804
gcc 7
author | kono |
---|---|
date | Fri, 27 Oct 2017 22:46:09 +0900 |
parents | |
children | 84e7813d76e9 |
comparison
equal
deleted
inserted
replaced
68:561a7518be6b | 111:04ced10e8804 |
---|---|
1 .. Copyright (C) 2014-2017 Free Software Foundation, Inc. | |
2 Originally contributed by David Malcolm <dmalcolm@redhat.com> | |
3 | |
4 This is free software: you can redistribute it and/or modify it | |
5 under the terms of the GNU General Public License as published by | |
6 the Free Software Foundation, either version 3 of the License, or | |
7 (at your option) any later version. | |
8 | |
9 This program is distributed in the hope that it will be useful, but | |
10 WITHOUT ANY WARRANTY; without even the implied warranty of | |
11 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | |
12 General Public License for more details. | |
13 | |
14 You should have received a copy of the GNU General Public License | |
15 along with this program. If not, see | |
16 <http://www.gnu.org/licenses/>. | |
17 | |
18 .. default-domain:: cpp | |
19 | |
20 Tutorial part 4: Adding JIT-compilation to a toy interpreter | |
21 ------------------------------------------------------------ | |
22 In this example we construct a "toy" interpreter, and add JIT-compilation | |
23 to it. | |
24 | |
25 Our toy interpreter | |
26 ******************* | |
27 | |
28 It's a stack-based interpreter, and is intended as a (very simple) example | |
29 of the kind of bytecode interpreter seen in dynamic languages such as | |
30 Python, Ruby etc. | |
31 | |
32 For the sake of simplicity, our toy virtual machine is very limited: | |
33 | |
34 * The only data type is `int` | |
35 | |
36 * It can only work on one function at a time (so that the only | |
37 function call that can be made is to recurse). | |
38 | |
39 * Functions can only take one parameter. | |
40 | |
41 * Functions have a stack of `int` values. | |
42 | |
43 * We'll implement function call within the interpreter by calling a | |
44 function in our implementation, rather than implementing our own | |
45 frame stack. | |
46 | |
47 * The parser is only good enough to get the examples to work. | |
48 | |
49 Naturally, a real interpreter would be much more complicated that this. | |
50 | |
51 The following operations are supported: | |
52 | |
53 ====================== ======================== =============== ============== | |
54 Operation Meaning Old Stack New Stack | |
55 ====================== ======================== =============== ============== | |
56 DUP Duplicate top of stack. ``[..., x]`` ``[..., x, x]`` | |
57 ROT Swap top two elements ``[..., x, y]`` ``[..., y, x]`` | |
58 of stack. | |
59 BINARY_ADD Add the top two elements ``[..., x, y]`` ``[..., (x+y)]`` | |
60 on the stack. | |
61 BINARY_SUBTRACT Likewise, but subtract. ``[..., x, y]`` ``[..., (x-y)]`` | |
62 BINARY_MULT Likewise, but multiply. ``[..., x, y]`` ``[..., (x*y)]`` | |
63 BINARY_COMPARE_LT Compare the top two ``[..., x, y]`` ``[..., (x<y)]`` | |
64 elements on the stack | |
65 and push a nonzero/zero | |
66 if (x<y). | |
67 RECURSE Recurse, passing the top ``[..., x]`` ``[..., fn(x)]`` | |
68 of the stack, and | |
69 popping the result. | |
70 RETURN Return the top of the ``[x]`` ``[]`` | |
71 stack. | |
72 PUSH_CONST `arg` Push an int const. ``[...]`` ``[..., arg]`` | |
73 JUMP_ABS_IF_TRUE `arg` Pop; if top of stack was ``[..., x]`` ``[...]`` | |
74 nonzero, jump to | |
75 ``arg``. | |
76 ====================== ======================== =============== ============== | |
77 | |
78 Programs can be interpreted, disassembled, and compiled to machine code. | |
79 | |
80 The interpreter reads ``.toy`` scripts. Here's what a simple recursive | |
81 factorial program looks like, the script ``factorial.toy``. | |
82 The parser ignores lines beginning with a `#`. | |
83 | |
84 .. literalinclude:: ../../examples/tut04-toyvm/factorial.toy | |
85 :lines: 1- | |
86 :language: c | |
87 | |
88 The interpreter is a simple infinite loop with a big ``switch`` statement | |
89 based on what the next opcode is: | |
90 | |
91 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
92 :start-after: /* Execute the given function. */ | |
93 :end-before: /* JIT compilation. */ | |
94 :language: c++ | |
95 | |
96 Compiling to machine code | |
97 ************************* | |
98 We want to generate machine code that can be cast to this type and | |
99 then directly executed in-process: | |
100 | |
101 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
102 :start-after: /* Functions are compiled to this function ptr type. */ | |
103 :end-before: enum opcode | |
104 :language: c++ | |
105 | |
106 Our compiler isn't very sophisticated; it takes the implementation of | |
107 each opcode above, and maps it directly to the operations supported by | |
108 the libgccjit API. | |
109 | |
110 How should we handle the stack? In theory we could calculate what the | |
111 stack depth will be at each opcode, and optimize away the stack | |
112 manipulation "by hand". We'll see below that libgccjit is able to do | |
113 this for us, so we'll implement stack manipulation | |
114 in a direct way, by creating a ``stack`` array and ``stack_depth`` | |
115 variables, local within the generated function, equivalent to this C code: | |
116 | |
117 .. code-block:: c | |
118 | |
119 int stack_depth; | |
120 int stack[MAX_STACK_DEPTH]; | |
121 | |
122 We'll also have local variables ``x`` and ``y`` for use when implementing | |
123 the opcodes, equivalent to this: | |
124 | |
125 .. code-block:: c | |
126 | |
127 int x; | |
128 int y; | |
129 | |
130 This means our compiler has the following state: | |
131 | |
132 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
133 :start-after: /* State. */ | |
134 :end-before: }; | |
135 :language: c++ | |
136 | |
137 Setting things up | |
138 ***************** | |
139 | |
140 First we create our types: | |
141 | |
142 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
143 :start-after: /* Create types. */ | |
144 :end-before: /* The constant value 1. */ | |
145 :language: c++ | |
146 | |
147 along with extracting a useful `int` constant: | |
148 | |
149 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
150 :start-after: /* The constant value 1. */ | |
151 :end-before: /* Create locations. */ | |
152 :language: c++ | |
153 | |
154 We'll implement push and pop in terms of the ``stack`` array and | |
155 ``stack_depth``. Here are helper functions for adding statements to | |
156 a block, implementing pushing and popping values: | |
157 | |
158 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
159 :start-after: /* Stack manipulation. */ | |
160 :end-before: /* Create the context. */ | |
161 :language: c++ | |
162 | |
163 We will support single-stepping through the generated code in the | |
164 debugger, so we need to create :type:`gccjit::location` instances, one | |
165 per operation in the source code. These will reference the lines of | |
166 e.g. ``factorial.toy``. | |
167 | |
168 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
169 :start-after: /* Create locations. */ | |
170 :end-before: /* Creating the function. */ | |
171 :language: c++ | |
172 | |
173 Let's create the function itself. As usual, we create its parameter | |
174 first, then use the parameter to create the function: | |
175 | |
176 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
177 :start-after: /* Creating the function. */ | |
178 :end-before: /* Create stack lvalues. */ | |
179 :language: c++ | |
180 | |
181 We create the locals within the function. | |
182 | |
183 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
184 :start-after: /* Create stack lvalues. */ | |
185 :end-before: /* 1st pass: create blocks, one per opcode. | |
186 :language: c++ | |
187 | |
188 Populating the function | |
189 *********************** | |
190 | |
191 There's some one-time initialization, and the API treats the first block | |
192 you create as the entrypoint of the function, so we need to create that | |
193 block first: | |
194 | |
195 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
196 :start-after: first. */ | |
197 :end-before: /* Create a block per operation. */ | |
198 :language: c++ | |
199 | |
200 We can now create blocks for each of the operations. Most of these will | |
201 be consolidated into larger blocks when the optimizer runs. | |
202 | |
203 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
204 :start-after: /* Create a block per operation. */ | |
205 :end-before: /* Populate the initial block. */ | |
206 :language: c++ | |
207 | |
208 Now that we have a block it can jump to when it's done, we can populate | |
209 the initial block: | |
210 | |
211 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
212 :start-after: /* Populate the initial block. */ | |
213 :end-before: /* 2nd pass: fill in instructions. */ | |
214 :language: c++ | |
215 | |
216 We can now populate the blocks for the individual operations. We loop | |
217 through them, adding instructions to their blocks: | |
218 | |
219 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
220 :start-after: /* 2nd pass: fill in instructions. */ | |
221 :end-before: /* Helper macros. */ | |
222 :language: c++ | |
223 | |
224 We're going to have another big ``switch`` statement for implementing | |
225 the opcodes, this time for compiling them, rather than interpreting | |
226 them. It's helpful to have macros for implementing push and pop, so that | |
227 we can make the ``switch`` statement that's coming up look as much as | |
228 possible like the one above within the interpreter: | |
229 | |
230 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
231 :start-after: /* Helper macros. */ | |
232 :end-before: block.add_comment | |
233 :language: c++ | |
234 | |
235 .. note:: | |
236 | |
237 A particularly clever implementation would have an *identical* | |
238 ``switch`` statement shared by the interpreter and the compiler, with | |
239 some preprocessor "magic". We're not doing that here, for the sake | |
240 of simplicity. | |
241 | |
242 When I first implemented this compiler, I accidentally missed an edit | |
243 when copying and pasting the ``Y_EQUALS_POP`` macro, so that popping the | |
244 stack into ``y`` instead erroneously assigned it to ``x``, leaving ``y`` | |
245 uninitialized. | |
246 | |
247 To track this kind of thing down, we can use | |
248 :func:`gccjit::block::add_comment` to add descriptive comments | |
249 to the internal representation. This is invaluable when looking through | |
250 the generated IR for, say ``factorial``: | |
251 | |
252 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
253 :start-after: PUSH_RVALUE (y) | |
254 :end-before: /* Handle the individual opcodes. */ | |
255 :language: c++ | |
256 | |
257 We can now write the big ``switch`` statement that implements the | |
258 individual opcodes, populating the relevant block with statements: | |
259 | |
260 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
261 :start-after: /* Handle the individual opcodes. */ | |
262 :end-before: /* Go to the next block. */ | |
263 :language: c++ | |
264 | |
265 Every block must be terminated, via a call to one of the | |
266 ``gccjit::block::end_with_`` entrypoints. This has been done for two | |
267 of the opcodes, but we need to do it for the other ones, by jumping | |
268 to the next block. | |
269 | |
270 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
271 :start-after: /* Go to the next block. */ | |
272 :end-before: /* end of loop on PC locations. */ | |
273 :language: c++ | |
274 | |
275 This is analogous to simply incrementing the program counter. | |
276 | |
277 Verifying the control flow graph | |
278 ******************************** | |
279 Having finished looping over the blocks, the context is complete. | |
280 | |
281 As before, we can verify that the control flow and statements are sane by | |
282 using :func:`gccjit::function::dump_to_dot`: | |
283 | |
284 .. code-block:: c++ | |
285 | |
286 fn.dump_to_dot ("/tmp/factorial.dot"); | |
287 | |
288 and viewing the result. Note how the label names, comments, and | |
289 variable names show up in the dump, to make it easier to spot | |
290 errors in our compiler. | |
291 | |
292 .. figure:: ../../intro/factorial.png | |
293 :alt: image of a control flow graph | |
294 | |
295 Compiling the context | |
296 ********************* | |
297 Having finished looping over the blocks and populating them with | |
298 statements, the context is complete. | |
299 | |
300 We can now compile it, extract machine code from the result, and | |
301 run it: | |
302 | |
303 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
304 :start-after: /* Wrapper around a gcc_jit_result *. */ | |
305 :end-before: /* Functions are compiled to this function ptr type. */ | |
306 :language: c++ | |
307 | |
308 .. literalinclude:: ../../examples/tut04-toyvm/toyvm.cc | |
309 :start-after: /* JIT-compilation. */ | |
310 :end-before: return 0; | |
311 :language: c++ | |
312 | |
313 Single-stepping through the generated code | |
314 ****************************************** | |
315 | |
316 It's possible to debug the generated code. To do this we need to both: | |
317 | |
318 * Set up source code locations for our statements, so that we can | |
319 meaningfully step through the code. We did this above by | |
320 calling :func:`gccjit::context::new_location` and using the | |
321 results. | |
322 | |
323 * Enable the generation of debugging information, by setting | |
324 :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` on the | |
325 :type:`gccjit::context` via | |
326 :func:`gccjit::context::set_bool_option`: | |
327 | |
328 .. code-block:: c++ | |
329 | |
330 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DEBUGINFO, 1); | |
331 | |
332 Having done this, we can put a breakpoint on the generated function: | |
333 | |
334 .. code-block:: console | |
335 | |
336 $ gdb --args ./toyvm factorial.toy 10 | |
337 (gdb) break factorial | |
338 Function "factorial" not defined. | |
339 Make breakpoint pending on future shared library load? (y or [n]) y | |
340 Breakpoint 1 (factorial) pending. | |
341 (gdb) run | |
342 Breakpoint 1, factorial (arg=10) at factorial.toy:14 | |
343 14 DUP | |
344 | |
345 We've set up location information, which references ``factorial.toy``. | |
346 This allows us to use e.g. ``list`` to see where we are in the script: | |
347 | |
348 .. code-block:: console | |
349 | |
350 (gdb) list | |
351 9 | |
352 10 # Initial state: | |
353 11 # stack: [arg] | |
354 12 | |
355 13 # 0: | |
356 14 DUP | |
357 15 # stack: [arg, arg] | |
358 16 | |
359 17 # 1: | |
360 18 PUSH_CONST 2 | |
361 | |
362 and to step through the function, examining the data: | |
363 | |
364 .. code-block:: console | |
365 | |
366 (gdb) n | |
367 18 PUSH_CONST 2 | |
368 (gdb) n | |
369 22 BINARY_COMPARE_LT | |
370 (gdb) print stack | |
371 $5 = {10, 10, 2, 0, -7152, 32767, 0, 0} | |
372 (gdb) print stack_depth | |
373 $6 = 3 | |
374 | |
375 You'll see that the parts of the ``stack`` array that haven't been | |
376 touched yet are uninitialized. | |
377 | |
378 .. note:: | |
379 | |
380 Turning on optimizations may lead to unpredictable results when | |
381 stepping through the generated code: the execution may appear to | |
382 "jump around" the source code. This is analogous to turning up the | |
383 optimization level in a regular compiler. | |
384 | |
385 Examining the generated code | |
386 **************************** | |
387 | |
388 How good is the optimized code? | |
389 | |
390 We can turn up optimizations, by calling | |
391 :cpp:func:`gccjit::context::set_int_option` with | |
392 :c:macro:`GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL`: | |
393 | |
394 .. code-block:: c++ | |
395 | |
396 ctxt.set_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 3); | |
397 | |
398 One of GCC's internal representations is called "gimple". A dump of the | |
399 initial gimple representation of the code can be seen by setting: | |
400 | |
401 .. code-block:: c++ | |
402 | |
403 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE, 1); | |
404 | |
405 With optimization on and source locations displayed, this gives: | |
406 | |
407 .. We'll use "c" for gimple dumps | |
408 | |
409 .. code-block:: c | |
410 | |
411 factorial (signed int arg) | |
412 { | |
413 <unnamed type> D.80; | |
414 signed int D.81; | |
415 signed int D.82; | |
416 signed int D.83; | |
417 signed int D.84; | |
418 signed int D.85; | |
419 signed int y; | |
420 signed int x; | |
421 signed int stack_depth; | |
422 signed int stack[8]; | |
423 | |
424 try | |
425 { | |
426 initial: | |
427 stack_depth = 0; | |
428 stack[stack_depth] = arg; | |
429 stack_depth = stack_depth + 1; | |
430 goto instr0; | |
431 instr0: | |
432 /* DUP */: | |
433 stack_depth = stack_depth + -1; | |
434 x = stack[stack_depth]; | |
435 stack[stack_depth] = x; | |
436 stack_depth = stack_depth + 1; | |
437 stack[stack_depth] = x; | |
438 stack_depth = stack_depth + 1; | |
439 goto instr1; | |
440 instr1: | |
441 /* PUSH_CONST */: | |
442 stack[stack_depth] = 2; | |
443 stack_depth = stack_depth + 1; | |
444 goto instr2; | |
445 | |
446 /* etc */ | |
447 | |
448 You can see the generated machine code in assembly form via: | |
449 | |
450 .. code-block:: c++ | |
451 | |
452 ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE, 1); | |
453 result = ctxt.compile (); | |
454 | |
455 which shows that (on this x86_64 box) the compiler has unrolled the loop | |
456 and is using MMX instructions to perform several multiplications | |
457 simultaneously: | |
458 | |
459 .. code-block:: gas | |
460 | |
461 .file "fake.c" | |
462 .text | |
463 .Ltext0: | |
464 .p2align 4,,15 | |
465 .globl factorial | |
466 .type factorial, @function | |
467 factorial: | |
468 .LFB0: | |
469 .file 1 "factorial.toy" | |
470 .loc 1 14 0 | |
471 .cfi_startproc | |
472 .LVL0: | |
473 .L2: | |
474 .loc 1 26 0 | |
475 cmpl $1, %edi | |
476 jle .L13 | |
477 leal -1(%rdi), %edx | |
478 movl %edx, %ecx | |
479 shrl $2, %ecx | |
480 leal 0(,%rcx,4), %esi | |
481 testl %esi, %esi | |
482 je .L14 | |
483 cmpl $9, %edx | |
484 jbe .L14 | |
485 leal -2(%rdi), %eax | |
486 movl %eax, -16(%rsp) | |
487 leal -3(%rdi), %eax | |
488 movd -16(%rsp), %xmm0 | |
489 movl %edi, -16(%rsp) | |
490 movl %eax, -12(%rsp) | |
491 movd -16(%rsp), %xmm1 | |
492 xorl %eax, %eax | |
493 movl %edx, -16(%rsp) | |
494 movd -12(%rsp), %xmm4 | |
495 movd -16(%rsp), %xmm6 | |
496 punpckldq %xmm4, %xmm0 | |
497 movdqa .LC1(%rip), %xmm4 | |
498 punpckldq %xmm6, %xmm1 | |
499 punpcklqdq %xmm0, %xmm1 | |
500 movdqa .LC0(%rip), %xmm0 | |
501 jmp .L5 | |
502 # etc - edited for brevity | |
503 | |
504 This is clearly overkill for a function that will likely overflow the | |
505 ``int`` type before the vectorization is worthwhile - but then again, this | |
506 is a toy example. | |
507 | |
508 Turning down the optimization level to 2: | |
509 | |
510 .. code-block:: c++ | |
511 | |
512 ctxt.set_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 2); | |
513 | |
514 yields this code, which is simple enough to quote in its entirety: | |
515 | |
516 .. code-block:: gas | |
517 | |
518 .file "fake.c" | |
519 .text | |
520 .p2align 4,,15 | |
521 .globl factorial | |
522 .type factorial, @function | |
523 factorial: | |
524 .LFB0: | |
525 .cfi_startproc | |
526 .L2: | |
527 cmpl $1, %edi | |
528 jle .L8 | |
529 movl $1, %edx | |
530 jmp .L4 | |
531 .p2align 4,,10 | |
532 .p2align 3 | |
533 .L6: | |
534 movl %eax, %edi | |
535 .L4: | |
536 .L5: | |
537 leal -1(%rdi), %eax | |
538 imull %edi, %edx | |
539 cmpl $1, %eax | |
540 jne .L6 | |
541 .L3: | |
542 .L7: | |
543 imull %edx, %eax | |
544 ret | |
545 .L8: | |
546 movl %edi, %eax | |
547 movl $1, %edx | |
548 jmp .L7 | |
549 .cfi_endproc | |
550 .LFE0: | |
551 .size factorial, .-factorial | |
552 .ident "GCC: (GNU) 4.9.0 20131023 (Red Hat 0.2-%{gcc_release})" | |
553 .section .note.GNU-stack,"",@progbits | |
554 | |
555 Note that the stack pushing and popping have been eliminated, as has the | |
556 recursive call (in favor of an iteration). | |
557 | |
558 Putting it all together | |
559 *********************** | |
560 | |
561 The complete example can be seen in the source tree at | |
562 ``gcc/jit/docs/examples/tut04-toyvm/toyvm.cc`` | |
563 | |
564 along with a Makefile and a couple of sample .toy scripts: | |
565 | |
566 .. code-block:: console | |
567 | |
568 $ ls -al | |
569 drwxrwxr-x. 2 david david 4096 Sep 19 17:46 . | |
570 drwxrwxr-x. 3 david david 4096 Sep 19 15:26 .. | |
571 -rw-rw-r--. 1 david david 615 Sep 19 12:43 factorial.toy | |
572 -rw-rw-r--. 1 david david 834 Sep 19 13:08 fibonacci.toy | |
573 -rw-rw-r--. 1 david david 238 Sep 19 14:22 Makefile | |
574 -rw-rw-r--. 1 david david 16457 Sep 19 17:07 toyvm.cc | |
575 | |
576 $ make toyvm | |
577 g++ -Wall -g -o toyvm toyvm.cc -lgccjit | |
578 | |
579 $ ./toyvm factorial.toy 10 | |
580 interpreter result: 3628800 | |
581 compiler result: 3628800 | |
582 | |
583 $ ./toyvm fibonacci.toy 10 | |
584 interpreter result: 55 | |
585 compiler result: 55 | |
586 | |
587 Behind the curtain: How does our code get optimized? | |
588 **************************************************** | |
589 | |
590 Our example is done, but you may be wondering about exactly how the | |
591 compiler turned what we gave it into the machine code seen above. | |
592 | |
593 We can examine what the compiler is doing in detail by setting: | |
594 | |
595 .. code-block:: c++ | |
596 | |
597 state.ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_DUMP_EVERYTHING, 1); | |
598 state.ctxt.set_bool_option (GCC_JIT_BOOL_OPTION_KEEP_INTERMEDIATES, 1); | |
599 | |
600 This will dump detailed information about the compiler's state to a | |
601 directory under ``/tmp``, and keep it from being cleaned up. | |
602 | |
603 The precise names and their formats of these files is subject to change. | |
604 Higher optimization levels lead to more files. | |
605 Here's what I saw (edited for brevity; there were almost 200 files): | |
606 | |
607 .. code-block:: console | |
608 | |
609 intermediate files written to /tmp/libgccjit-KPQbGw | |
610 $ ls /tmp/libgccjit-KPQbGw/ | |
611 fake.c.000i.cgraph | |
612 fake.c.000i.type-inheritance | |
613 fake.c.004t.gimple | |
614 fake.c.007t.omplower | |
615 fake.c.008t.lower | |
616 fake.c.011t.eh | |
617 fake.c.012t.cfg | |
618 fake.c.014i.visibility | |
619 fake.c.015i.early_local_cleanups | |
620 fake.c.016t.ssa | |
621 # etc | |
622 | |
623 The gimple code is converted into Static Single Assignment form, | |
624 with annotations for use when generating the debuginfo: | |
625 | |
626 .. code-block:: console | |
627 | |
628 $ less /tmp/libgccjit-KPQbGw/fake.c.016t.ssa | |
629 | |
630 .. code-block:: c | |
631 | |
632 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) | |
633 | |
634 factorial (signed int arg) | |
635 { | |
636 signed int stack[8]; | |
637 signed int stack_depth; | |
638 signed int x; | |
639 signed int y; | |
640 <unnamed type> _20; | |
641 signed int _21; | |
642 signed int _38; | |
643 signed int _44; | |
644 signed int _51; | |
645 signed int _56; | |
646 | |
647 initial: | |
648 stack_depth_3 = 0; | |
649 # DEBUG stack_depth => stack_depth_3 | |
650 stack[stack_depth_3] = arg_5(D); | |
651 stack_depth_7 = stack_depth_3 + 1; | |
652 # DEBUG stack_depth => stack_depth_7 | |
653 # DEBUG instr0 => NULL | |
654 # DEBUG /* DUP */ => NULL | |
655 stack_depth_8 = stack_depth_7 + -1; | |
656 # DEBUG stack_depth => stack_depth_8 | |
657 x_9 = stack[stack_depth_8]; | |
658 # DEBUG x => x_9 | |
659 stack[stack_depth_8] = x_9; | |
660 stack_depth_11 = stack_depth_8 + 1; | |
661 # DEBUG stack_depth => stack_depth_11 | |
662 stack[stack_depth_11] = x_9; | |
663 stack_depth_13 = stack_depth_11 + 1; | |
664 # DEBUG stack_depth => stack_depth_13 | |
665 # DEBUG instr1 => NULL | |
666 # DEBUG /* PUSH_CONST */ => NULL | |
667 stack[stack_depth_13] = 2; | |
668 | |
669 /* etc; edited for brevity */ | |
670 | |
671 We can perhaps better see the code by turning off | |
672 :c:macro:`GCC_JIT_BOOL_OPTION_DEBUGINFO` to suppress all those ``DEBUG`` | |
673 statements, giving: | |
674 | |
675 .. code-block:: console | |
676 | |
677 $ less /tmp/libgccjit-1Hywc0/fake.c.016t.ssa | |
678 | |
679 .. code-block:: c | |
680 | |
681 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) | |
682 | |
683 factorial (signed int arg) | |
684 { | |
685 signed int stack[8]; | |
686 signed int stack_depth; | |
687 signed int x; | |
688 signed int y; | |
689 <unnamed type> _20; | |
690 signed int _21; | |
691 signed int _38; | |
692 signed int _44; | |
693 signed int _51; | |
694 signed int _56; | |
695 | |
696 initial: | |
697 stack_depth_3 = 0; | |
698 stack[stack_depth_3] = arg_5(D); | |
699 stack_depth_7 = stack_depth_3 + 1; | |
700 stack_depth_8 = stack_depth_7 + -1; | |
701 x_9 = stack[stack_depth_8]; | |
702 stack[stack_depth_8] = x_9; | |
703 stack_depth_11 = stack_depth_8 + 1; | |
704 stack[stack_depth_11] = x_9; | |
705 stack_depth_13 = stack_depth_11 + 1; | |
706 stack[stack_depth_13] = 2; | |
707 stack_depth_15 = stack_depth_13 + 1; | |
708 stack_depth_16 = stack_depth_15 + -1; | |
709 y_17 = stack[stack_depth_16]; | |
710 stack_depth_18 = stack_depth_16 + -1; | |
711 x_19 = stack[stack_depth_18]; | |
712 _20 = x_19 < y_17; | |
713 _21 = (signed int) _20; | |
714 stack[stack_depth_18] = _21; | |
715 stack_depth_23 = stack_depth_18 + 1; | |
716 stack_depth_24 = stack_depth_23 + -1; | |
717 x_25 = stack[stack_depth_24]; | |
718 if (x_25 != 0) | |
719 goto <bb 4> (instr9); | |
720 else | |
721 goto <bb 3> (instr4); | |
722 | |
723 instr4: | |
724 /* DUP */: | |
725 stack_depth_26 = stack_depth_24 + -1; | |
726 x_27 = stack[stack_depth_26]; | |
727 stack[stack_depth_26] = x_27; | |
728 stack_depth_29 = stack_depth_26 + 1; | |
729 stack[stack_depth_29] = x_27; | |
730 stack_depth_31 = stack_depth_29 + 1; | |
731 stack[stack_depth_31] = 1; | |
732 stack_depth_33 = stack_depth_31 + 1; | |
733 stack_depth_34 = stack_depth_33 + -1; | |
734 y_35 = stack[stack_depth_34]; | |
735 stack_depth_36 = stack_depth_34 + -1; | |
736 x_37 = stack[stack_depth_36]; | |
737 _38 = x_37 - y_35; | |
738 stack[stack_depth_36] = _38; | |
739 stack_depth_40 = stack_depth_36 + 1; | |
740 stack_depth_41 = stack_depth_40 + -1; | |
741 x_42 = stack[stack_depth_41]; | |
742 _44 = factorial (x_42); | |
743 stack[stack_depth_41] = _44; | |
744 stack_depth_46 = stack_depth_41 + 1; | |
745 stack_depth_47 = stack_depth_46 + -1; | |
746 y_48 = stack[stack_depth_47]; | |
747 stack_depth_49 = stack_depth_47 + -1; | |
748 x_50 = stack[stack_depth_49]; | |
749 _51 = x_50 * y_48; | |
750 stack[stack_depth_49] = _51; | |
751 stack_depth_53 = stack_depth_49 + 1; | |
752 | |
753 # stack_depth_1 = PHI <stack_depth_24(2), stack_depth_53(3)> | |
754 instr9: | |
755 /* RETURN */: | |
756 stack_depth_54 = stack_depth_1 + -1; | |
757 x_55 = stack[stack_depth_54]; | |
758 _56 = x_55; | |
759 stack ={v} {CLOBBER}; | |
760 return _56; | |
761 | |
762 } | |
763 | |
764 Note in the above how all the :type:`gccjit::block` instances we | |
765 created have been consolidated into just 3 blocks in GCC's internal | |
766 representation: ``initial``, ``instr4`` and ``instr9``. | |
767 | |
768 Optimizing away stack manipulation | |
769 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
770 Recall our simple implementation of stack operations. Let's examine | |
771 how the stack operations are optimized away. | |
772 | |
773 After a pass of constant-propagation, the depth of the stack at each | |
774 opcode can be determined at compile-time: | |
775 | |
776 .. code-block:: console | |
777 | |
778 $ less /tmp/libgccjit-1Hywc0/fake.c.021t.ccp1 | |
779 | |
780 .. code-block:: c | |
781 | |
782 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) | |
783 | |
784 factorial (signed int arg) | |
785 { | |
786 signed int stack[8]; | |
787 signed int stack_depth; | |
788 signed int x; | |
789 signed int y; | |
790 <unnamed type> _20; | |
791 signed int _21; | |
792 signed int _38; | |
793 signed int _44; | |
794 signed int _51; | |
795 | |
796 initial: | |
797 stack[0] = arg_5(D); | |
798 x_9 = stack[0]; | |
799 stack[0] = x_9; | |
800 stack[1] = x_9; | |
801 stack[2] = 2; | |
802 y_17 = stack[2]; | |
803 x_19 = stack[1]; | |
804 _20 = x_19 < y_17; | |
805 _21 = (signed int) _20; | |
806 stack[1] = _21; | |
807 x_25 = stack[1]; | |
808 if (x_25 != 0) | |
809 goto <bb 4> (instr9); | |
810 else | |
811 goto <bb 3> (instr4); | |
812 | |
813 instr4: | |
814 /* DUP */: | |
815 x_27 = stack[0]; | |
816 stack[0] = x_27; | |
817 stack[1] = x_27; | |
818 stack[2] = 1; | |
819 y_35 = stack[2]; | |
820 x_37 = stack[1]; | |
821 _38 = x_37 - y_35; | |
822 stack[1] = _38; | |
823 x_42 = stack[1]; | |
824 _44 = factorial (x_42); | |
825 stack[1] = _44; | |
826 y_48 = stack[1]; | |
827 x_50 = stack[0]; | |
828 _51 = x_50 * y_48; | |
829 stack[0] = _51; | |
830 | |
831 instr9: | |
832 /* RETURN */: | |
833 x_55 = stack[0]; | |
834 x_56 = x_55; | |
835 stack ={v} {CLOBBER}; | |
836 return x_56; | |
837 | |
838 } | |
839 | |
840 Note how, in the above, all those ``stack_depth`` values are now just | |
841 constants: we're accessing specific stack locations at each opcode. | |
842 | |
843 The "esra" pass ("Early Scalar Replacement of Aggregates") breaks | |
844 out our "stack" array into individual elements: | |
845 | |
846 .. code-block:: console | |
847 | |
848 $ less /tmp/libgccjit-1Hywc0/fake.c.024t.esra | |
849 | |
850 .. code-block:: c | |
851 | |
852 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) | |
853 | |
854 Created a replacement for stack offset: 0, size: 32: stack$0 | |
855 Created a replacement for stack offset: 32, size: 32: stack$1 | |
856 Created a replacement for stack offset: 64, size: 32: stack$2 | |
857 | |
858 Symbols to be put in SSA form | |
859 { D.89 D.90 D.91 } | |
860 Incremental SSA update started at block: 0 | |
861 Number of blocks in CFG: 5 | |
862 Number of blocks to update: 4 ( 80%) | |
863 | |
864 | |
865 factorial (signed int arg) | |
866 { | |
867 signed int stack$2; | |
868 signed int stack$1; | |
869 signed int stack$0; | |
870 signed int stack[8]; | |
871 signed int stack_depth; | |
872 signed int x; | |
873 signed int y; | |
874 <unnamed type> _20; | |
875 signed int _21; | |
876 signed int _38; | |
877 signed int _44; | |
878 signed int _51; | |
879 | |
880 initial: | |
881 stack$0_45 = arg_5(D); | |
882 x_9 = stack$0_45; | |
883 stack$0_39 = x_9; | |
884 stack$1_32 = x_9; | |
885 stack$2_30 = 2; | |
886 y_17 = stack$2_30; | |
887 x_19 = stack$1_32; | |
888 _20 = x_19 < y_17; | |
889 _21 = (signed int) _20; | |
890 stack$1_28 = _21; | |
891 x_25 = stack$1_28; | |
892 if (x_25 != 0) | |
893 goto <bb 4> (instr9); | |
894 else | |
895 goto <bb 3> (instr4); | |
896 | |
897 instr4: | |
898 /* DUP */: | |
899 x_27 = stack$0_39; | |
900 stack$0_22 = x_27; | |
901 stack$1_14 = x_27; | |
902 stack$2_12 = 1; | |
903 y_35 = stack$2_12; | |
904 x_37 = stack$1_14; | |
905 _38 = x_37 - y_35; | |
906 stack$1_10 = _38; | |
907 x_42 = stack$1_10; | |
908 _44 = factorial (x_42); | |
909 stack$1_6 = _44; | |
910 y_48 = stack$1_6; | |
911 x_50 = stack$0_22; | |
912 _51 = x_50 * y_48; | |
913 stack$0_1 = _51; | |
914 | |
915 # stack$0_52 = PHI <stack$0_39(2), stack$0_1(3)> | |
916 instr9: | |
917 /* RETURN */: | |
918 x_55 = stack$0_52; | |
919 x_56 = x_55; | |
920 stack ={v} {CLOBBER}; | |
921 return x_56; | |
922 | |
923 } | |
924 | |
925 Hence at this point, all those pushes and pops of the stack are now | |
926 simply assignments to specific temporary variables. | |
927 | |
928 After some copy propagation, the stack manipulation has been completely | |
929 optimized away: | |
930 | |
931 .. code-block:: console | |
932 | |
933 $ less /tmp/libgccjit-1Hywc0/fake.c.026t.copyprop1 | |
934 | |
935 .. code-block:: c | |
936 | |
937 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) | |
938 | |
939 factorial (signed int arg) | |
940 { | |
941 signed int stack$2; | |
942 signed int stack$1; | |
943 signed int stack$0; | |
944 signed int stack[8]; | |
945 signed int stack_depth; | |
946 signed int x; | |
947 signed int y; | |
948 <unnamed type> _20; | |
949 signed int _21; | |
950 signed int _38; | |
951 signed int _44; | |
952 signed int _51; | |
953 | |
954 initial: | |
955 stack$0_39 = arg_5(D); | |
956 _20 = arg_5(D) <= 1; | |
957 _21 = (signed int) _20; | |
958 if (_21 != 0) | |
959 goto <bb 4> (instr9); | |
960 else | |
961 goto <bb 3> (instr4); | |
962 | |
963 instr4: | |
964 /* DUP */: | |
965 _38 = arg_5(D) + -1; | |
966 _44 = factorial (_38); | |
967 _51 = arg_5(D) * _44; | |
968 stack$0_1 = _51; | |
969 | |
970 # stack$0_52 = PHI <arg_5(D)(2), _51(3)> | |
971 instr9: | |
972 /* RETURN */: | |
973 stack ={v} {CLOBBER}; | |
974 return stack$0_52; | |
975 | |
976 } | |
977 | |
978 Later on, another pass finally eliminated ``stack_depth`` local and the | |
979 unused parts of the `stack`` array altogether: | |
980 | |
981 .. code-block:: console | |
982 | |
983 $ less /tmp/libgccjit-1Hywc0/fake.c.036t.release_ssa | |
984 | |
985 .. code-block:: c | |
986 | |
987 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) | |
988 | |
989 Released 44 names, 314.29%, removed 44 holes | |
990 factorial (signed int arg) | |
991 { | |
992 signed int stack$0; | |
993 signed int mult_acc_1; | |
994 <unnamed type> _5; | |
995 signed int _6; | |
996 signed int _7; | |
997 signed int mul_tmp_10; | |
998 signed int mult_acc_11; | |
999 signed int mult_acc_13; | |
1000 | |
1001 # arg_9 = PHI <arg_8(D)(0)> | |
1002 # mult_acc_13 = PHI <1(0)> | |
1003 initial: | |
1004 | |
1005 <bb 5>: | |
1006 # arg_4 = PHI <arg_9(2), _7(3)> | |
1007 # mult_acc_1 = PHI <mult_acc_13(2), mult_acc_11(3)> | |
1008 _5 = arg_4 <= 1; | |
1009 _6 = (signed int) _5; | |
1010 if (_6 != 0) | |
1011 goto <bb 4> (instr9); | |
1012 else | |
1013 goto <bb 3> (instr4); | |
1014 | |
1015 instr4: | |
1016 /* DUP */: | |
1017 _7 = arg_4 + -1; | |
1018 mult_acc_11 = mult_acc_1 * arg_4; | |
1019 goto <bb 5>; | |
1020 | |
1021 # stack$0_12 = PHI <arg_4(5)> | |
1022 instr9: | |
1023 /* RETURN */: | |
1024 mul_tmp_10 = mult_acc_1 * stack$0_12; | |
1025 return mul_tmp_10; | |
1026 | |
1027 } | |
1028 | |
1029 | |
1030 Elimination of tail recursion | |
1031 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
1032 Another significant optimization is the detection that the call to | |
1033 ``factorial`` is tail recursion, which can be eliminated in favor of | |
1034 an iteration: | |
1035 | |
1036 .. code-block:: console | |
1037 | |
1038 $ less /tmp/libgccjit-1Hywc0/fake.c.030t.tailr1 | |
1039 | |
1040 .. code-block:: c | |
1041 | |
1042 ;; Function factorial (factorial, funcdef_no=0, decl_uid=53, symbol_order=0) | |
1043 | |
1044 | |
1045 Symbols to be put in SSA form | |
1046 { D.88 } | |
1047 Incremental SSA update started at block: 0 | |
1048 Number of blocks in CFG: 5 | |
1049 Number of blocks to update: 4 ( 80%) | |
1050 | |
1051 | |
1052 factorial (signed int arg) | |
1053 { | |
1054 signed int stack$2; | |
1055 signed int stack$1; | |
1056 signed int stack$0; | |
1057 signed int stack[8]; | |
1058 signed int stack_depth; | |
1059 signed int x; | |
1060 signed int y; | |
1061 signed int mult_acc_1; | |
1062 <unnamed type> _20; | |
1063 signed int _21; | |
1064 signed int _38; | |
1065 signed int mul_tmp_44; | |
1066 signed int mult_acc_51; | |
1067 | |
1068 # arg_5 = PHI <arg_39(D)(0), _38(3)> | |
1069 # mult_acc_1 = PHI <1(0), mult_acc_51(3)> | |
1070 initial: | |
1071 _20 = arg_5 <= 1; | |
1072 _21 = (signed int) _20; | |
1073 if (_21 != 0) | |
1074 goto <bb 4> (instr9); | |
1075 else | |
1076 goto <bb 3> (instr4); | |
1077 | |
1078 instr4: | |
1079 /* DUP */: | |
1080 _38 = arg_5 + -1; | |
1081 mult_acc_51 = mult_acc_1 * arg_5; | |
1082 goto <bb 2> (initial); | |
1083 | |
1084 # stack$0_52 = PHI <arg_5(2)> | |
1085 instr9: | |
1086 /* RETURN */: | |
1087 stack ={v} {CLOBBER}; | |
1088 mul_tmp_44 = mult_acc_1 * stack$0_52; | |
1089 return mul_tmp_44; | |
1090 | |
1091 } |