\input texinfo @c -*-texinfo-*-

@c %**start of header
@setfilename libitm.info
@settitle GNU libitm
@c %**end of header


@copying
Copyright @copyright{} 2011-2020 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled ``GNU
Free Documentation License''.
@end copying

@ifinfo
@dircategory GNU Libraries
@direntry
* libitm: (libitm).                    GNU Transactional Memory Library
@end direntry

This manual documents the GNU Transactional Memory Library.

@insertcopying
@end ifinfo


@setchapternewpage odd

@titlepage
@title The GNU Transactional Memory Library
@page
@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version*
@sp 1
@insertcopying
@end titlepage

@summarycontents
@contents
@page


@node Top
@top Introduction
@cindex Introduction

This manual documents the usage and internals of libitm, the GNU Transactional
Memory Library. It provides transaction support for accesses to a process'
memory, enabling easy-to-use synchronization of accesses to shared memory by
several threads.


@comment
@comment When you add a new menu item, please keep the right hand
@comment aligned to the same column. Do not use tabs. This provides
@comment better formatting.
@comment
@menu
* Enabling libitm::            How to enable libitm for your applications.
* C/C++ Language Constructs for TM::
                               Notes on the language-level interface supported
                               by gcc.
* The libitm ABI::             Notes on the external ABI provided by libitm.
* Internals::                  Notes on libitm's internal synchronization.
* GNU Free Documentation License::
                               How you can copy and share this manual.
* Library Index::              Index of this documentation.
@end menu


@c ---------------------------------------------------------------------
@c Enabling libitm
@c ---------------------------------------------------------------------

@node Enabling libitm
@chapter Enabling libitm

To activate support for TM in C/C++, the compile-time flag @option{-fgnu-tm}
must be specified. This enables TM language-level constructs such as
transaction statements (e.g., @code{__transaction_atomic}, @pxref{C/C++
Language Constructs for TM} for details).

@c ---------------------------------------------------------------------
@c C/C++ Language Constructs for TM
@c ---------------------------------------------------------------------

@node C/C++ Language Constructs for TM
@chapter C/C++ Language Constructs for TM

Transactions are supported in C++ and C in the form of transaction statements,
transaction expressions, and function transactions. In the following example,
both @code{a} and @code{b} will be read and the difference will be written to
@code{c}, all atomically and isolated from other transactions:

@example
__transaction_atomic @{ c = a - b; @}
@end example

Therefore, another thread can use the following code to concurrently update
@code{b} without ever causing @code{c} to hold a negative value (and without
having to use other synchronization constructs such as locks or C++11
atomics):

@example
__transaction_atomic @{ if (a > b) b++; @}
@end example

GCC follows the @uref{https://sites.google.com/site/tmforcplusplus/, Draft
Specification of Transactional Language Constructs for C++ (v1.1)} in its
implementation of transactions.

The precise semantics of transactions are defined in terms of the C++11/C11
memory model (see the specification). Roughly, transactions provide
synchronization guarantees that are similar to what would be guaranteed when
using a single global lock as a guard for all transactions. Note that like
other synchronization constructs in C/C++, transactions rely on a
data-race-free program (e.g., a nontransactional write that is concurrent
with a transactional read to the same memory location is a data race).
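
As a rough illustration of the single-global-lock reading described above, the
following C sketch guards both example transactions with one lock. It is only
a mental model, not how libitm executes transactions: @code{txn_begin},
@code{txn_end}, and @code{global_txn_lock} are hypothetical names, and the
initial values of @code{a} and @code{b} are arbitrary.

```c
#include <assert.h>
#include <stdatomic.h>

/* Shared data from the two transaction examples (initial values arbitrary). */
static int a = 5, b = 3, c = 0;

/* Hypothetical single global lock that stands in for "all transactions". */
static atomic_flag global_txn_lock = ATOMIC_FLAG_INIT;

static void txn_begin(void)
{
    /* Spin until the flag was previously clear, i.e., until we own the lock. */
    while (atomic_flag_test_and_set_explicit(&global_txn_lock,
                                             memory_order_acquire))
        ;
}

static void txn_end(void)
{
    atomic_flag_clear_explicit(&global_txn_lock, memory_order_release);
}

/* Roughly corresponds to: __transaction_atomic { c = a - b; } */
void compute_difference(void)
{
    txn_begin();
    c = a - b;
    txn_end();
}

/* Roughly corresponds to: __transaction_atomic { if (a > b) b++; } */
void bump_b(void)
{
    txn_begin();
    if (a > b)
        b++;
    txn_end();
}

/* Reads c under the same lock so that the read is not a data race. */
int current_difference(void)
{
    txn_begin();
    int value = c;
    txn_end();
    return value;
}
```

Because @code{b} is only incremented while it is smaller than @code{a}, and
every access runs under the same lock, @code{c} cannot become negative in this
model as long as @code{a} does not change.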

@c ---------------------------------------------------------------------
@c The libitm ABI
@c ---------------------------------------------------------------------

@node The libitm ABI
@chapter The libitm ABI

The ABI provided by libitm is basically equal to the Linux variant of Intel's
current TM ABI specification document (Revision 1.1, May 6 2009) but with the
differences listed in this chapter. It would be good if these changes were
eventually merged into a future version of this specification. To ease
look-up, the following subsections mirror the structure of this specification.

@section [No changes] Objectives
@section [No changes] Non-objectives

@section Library design principles
@subsection [No changes] Calling conventions
@subsection [No changes] TM library algorithms
@subsection [No changes] Optimized load and store routines
@subsection [No changes] Aligned load and store routines

@subsection Data logging functions

The memory locations accessed with transactional loads and stores and the
memory locations whose values are logged must not overlap. This required
separation only extends to the scope of the execution of one transaction
including all the executions of all nested transactions.

The compiler must be consistent (within the scope of a single transaction)
about which memory locations are shared and which are not shared with other
threads (i.e., data must be accessed either transactionally or
nontransactionally). Otherwise, non-write-through TM algorithms would not work.

For memory locations on the stack, this requirement extends to only the
lifetime of the stack frame that the memory location belongs to (or the
lifetime of the transaction, whichever is shorter). Thus, memory that is
reused for several stack frames could be the target of both data logging and
transactional accesses; however, this is harmless because these stack frames'
lifetimes will end before the transaction finishes.

@subsection [No changes] Scatter/gather calls
@subsection [No changes] Serial and irrevocable mode
@subsection [No changes] Transaction descriptor
@subsection Store allocation

There is no @code{getTransaction} function.

@subsection [No changes] Naming conventions

@subsection Function pointer encryption

Currently, this is not implemented.


@section Types and macros list

@code{_ITM_codeProperties} has changed, @pxref{txn-code-properties,,Starting a
transaction}.
@code{_ITM_srcLocation} is not used.


@section Function list

@subsection Initialization and finalization functions
These functions are not part of the ABI.

@subsection [No changes] Version checking
@subsection [No changes] Error reporting
@subsection [No changes] inTransaction call

@subsection State manipulation functions
There is no @code{getTransaction} function. Transaction identifiers for
nested transactions will be ordered but not necessarily sequential (i.e., for
a nested transaction's identifier @var{IN} and its enclosing transaction's
identifier @var{IE}, it is guaranteed that @math{IN >= IE}).

@subsection [No changes] Source locations

@subsection Starting a transaction

@subsubsection Transaction code properties

@anchor{txn-code-properties}
The bit @code{hasNoXMMUpdate} is instead called @code{hasNoVectorUpdate}.
Iff it is set, vector register save/restore is not necessary for any target
machine.

The @code{hasNoFloatUpdate} bit (@code{0x0010}) is new. Iff it is set, floating
point register save/restore is not necessary for any target machine.

@code{undoLogCode} is not supported and a fatal runtime error will be raised
if this bit is set. The ABI does not properly define why barriers
other than undo logging are not present: are they not necessary (e.g., a
transaction operating purely on thread-local data) or have they been omitted by
the compiler because it thinks that some kind of global synchronization
(e.g., serial mode) might perform better? The specification suggests that the
latter might be the case, but the former seems to be more useful.

The @code{readOnly} bit (@code{0x4000}) is new. @strong{TODO} Lexical or dynamic
scope?

@code{hasNoRetry} is not supported. If this bit is not set, but
@code{hasNoAbort} is set, the library can assume that transaction
rollback will not be requested.

It would be useful if the absence of externally-triggered rollbacks were
reported for the dynamic scope as well, not just for the lexical scope
(@code{hasNoAbort}). Without this, a library cannot exploit this together
with flat nesting.

@code{exceptionBlock} is not supported because exception blocks are not used.
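
The two new bits can be tested with ordinary mask operations. The following C
sketch is an illustration only: it uses just the two values stated in this
section (@code{0x0010} and @code{0x4000}), and the constant and function names
are hypothetical, not identifiers from the libitm headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only these two bit values are stated in this section; the names are
   hypothetical stand-ins for the corresponding code property bits. */
enum {
    HAS_NO_FLOAT_UPDATE = 0x0010, /* hasNoFloatUpdate */
    READ_ONLY           = 0x4000  /* readOnly */
};

/* Floating point registers need save/restore unless hasNoFloatUpdate is set. */
bool needs_float_save(uint32_t props)
{
    return (props & HAS_NO_FLOAT_UPDATE) == 0;
}

/* True if the compiler marked the transaction as read-only. */
bool is_read_only(uint32_t props)
{
    return (props & READ_ONLY) != 0;
}
```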

@subsubsection [No changes] Windows exception state
@subsubsection [No changes] Other machine state

@subsubsection [No changes] Results from beginTransaction

@subsection Aborting a transaction

@code{_ITM_rollbackTransaction} is not supported. @code{_ITM_abortTransaction}
is supported but the abort reasons @code{exceptionBlockAbort},
@code{TMConflict}, and @code{userRetry} are not supported. There are no
exception blocks in general, so the related cases also do not have to be
considered. To encode @code{__transaction_cancel [[outer]]}, compilers must
set the new @code{outerAbort} bit (@code{0x10}) in addition to the
@code{userAbort} bit in the abort reason.
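
A minimal C sketch of how a compiler might build the abort reason for the two
forms of @code{__transaction_cancel}. The value @code{0x10} for
@code{outerAbort} is taken from the text above; the value used for
@code{userAbort} is an assumption that should be checked against the ABI
headers, and all identifiers here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    USER_ABORT  = 0x01, /* assumed value; not stated in this chapter */
    OUTER_ABORT = 0x10  /* outerAbort, as stated above */
};

/* Returns the abort reason for __transaction_cancel (outer == false)
   or __transaction_cancel [[outer]] (outer == true). */
unsigned cancel_reason(bool outer)
{
    unsigned reason = USER_ABORT;
    if (outer)
        reason |= OUTER_ABORT; /* set in addition to userAbort */
    return reason;
}
```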

@subsection Committing a transaction

The exception handling (EH) scheme is different. The Intel ABI requires the
@code{_ITM_tryCommitTransaction} function that will return even when the
commit failed and will have to be matched with calls to either
@code{_ITM_abortTransaction} or @code{_ITM_commitTransaction}. In contrast,
gcc relies on transactional wrappers for the functions of the Exception
Handling ABI and on one additional commit function (shown below). This allows
the TM to keep track of EH internally and thus it does not have to embed the
cleanup of EH state into the existing EH code in the program.
@code{_ITM_tryCommitTransaction} is not supported.
@code{_ITM_commitTransactionToId} is also not supported because the
propagation of thrown exceptions will not bypass commits of nested
transactions.

@example
void _ITM_commitTransactionEH(void *exc_ptr) ITM_REGPARM;
void *_ITM_cxa_allocate_exception (size_t);
void _ITM_cxa_free_exception (void *exc_ptr);
void _ITM_cxa_throw (void *obj, void *tinfo, void (*dest) (void *));
void *_ITM_cxa_begin_catch (void *exc_ptr);
void _ITM_cxa_end_catch (void);
@end example

The EH scheme changed in version 6 of GCC. Previously, the compiler
added a call to @code{_ITM_commitTransactionEH} to commit a transaction if
an exception could be in flight at this position in the code; @code{exc_ptr} is
the address of the current exception and must be non-zero. Now, the
compiler must catch all exceptions that are about to be thrown out of a
transaction and call @code{_ITM_commitTransactionEH} from the catch clause,
with @code{exc_ptr} being zero.

Note that the old EH scheme never worked completely in GCC's implementation;
libitm currently does not try to be compatible with the old scheme.

The @code{_ITM_cxa...} functions are transactional wrappers for the respective
@code{__cxa...} functions and must be called instead of these in transactional
code. @code{_ITM_cxa_free_exception} is new in GCC 6.

To support this EH scheme, libstdc++ needs to provide one additional function
(@code{__cxa_tm_cleanup}), which is used by the TM to clean up the exception
handling state while rolling back a transaction:

@example
void __cxa_tm_cleanup (void *unthrown_obj, void *cleanup_exc,
                       unsigned int caught_count);
@end example

Since GCC 6, @code{unthrown_obj} is no longer used and is always null;
prior to that, @code{unthrown_obj} was non-null if the program had called
@code{__cxa_allocate_exception} for this exception but had not yet called
@code{__cxa_throw} for it. @code{cleanup_exc} is non-null if the program is
currently processing a cleanup along an exception path but has not caught this
exception yet. @code{caught_count} is the nesting depth of
@code{__cxa_begin_catch} within the transaction (which can be counted by the TM
using @code{_ITM_cxa_begin_catch} and @code{_ITM_cxa_end_catch});
@code{__cxa_tm_cleanup} then performs rollback by essentially performing
@code{__cxa_end_catch} that many times.
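
The role of @code{caught_count} can be sketched as follows. This is a
simplified model under stated assumptions, not the libitm or libstdc++
implementation: @code{struct txn_eh_state} and all functions below are
hypothetical, and @code{fake_end_catch} merely counts calls where the real
cleanup would perform @code{__cxa_end_catch}.

```c
#include <assert.h>

/* Hypothetical per-transaction EH bookkeeping kept by the TM runtime. */
struct txn_eh_state {
    unsigned caught_count;    /* current nesting depth of begin_catch */
    unsigned end_catch_calls; /* how often the __cxa_end_catch stand-in ran */
};

/* What a wrapper like _ITM_cxa_begin_catch could record. */
void on_begin_catch(struct txn_eh_state *s)
{
    s->caught_count++;
}

/* What a wrapper like _ITM_cxa_end_catch could record. */
void on_end_catch(struct txn_eh_state *s)
{
    assert(s->caught_count > 0);
    s->caught_count--;
}

/* Stand-in for __cxa_end_catch; it only counts invocations. */
static void fake_end_catch(struct txn_eh_state *s)
{
    s->end_catch_calls++;
}

/* On rollback, leave every catch block that is still open. */
void rollback_eh(struct txn_eh_state *s)
{
    while (s->caught_count > 0) {
        fake_end_catch(s);
        s->caught_count--;
    }
}
```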



@subsection Exception handling support

Currently, there is no support for functionality like
@code{__transaction_cancel throw} as described in the C++ TM specification.
Supporting this should be possible with the EH scheme explained previously
because via the transactional wrappers for the EH ABI, the TM is able to
observe and intercept EH.


@subsection [No changes] Transition to serial--irrevocable mode
@subsection [No changes] Data transfer functions
@subsection [No changes] Transactional memory copies

@subsection Transactional versions of memmove

If either the source or destination memory region is to be accessed
nontransactionally, then source and destination regions must not be
overlapping. The respective @code{_ITM_memmove} functions are still
available but a fatal runtime error will be raised if such regions do overlap.
To support this functionality, the ABI would have to specify how the
intersection of the regions has to be accessed (i.e., transactionally or
nontransactionally).

@subsection [No changes] Transactional versions of memset
@subsection [No changes] Logging functions

@subsection User-registered commit and undo actions

Commit actions will get executed in the same order in which the respective
calls to @code{_ITM_addUserCommitAction} happened. Only
@code{_ITM_noTransactionId} is allowed as value for the
@code{resumingTransactionId} argument. Commit actions get executed after
privatization safety has been ensured.

Undo actions will get executed in reverse order compared to the order in which
the respective calls to @code{_ITM_addUserUndoAction} happened. The ordering of
undo actions w.r.t. the roll-back of other actions (e.g., data transfers or
memory allocations) is undefined.
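
The two orderings can be modeled with a simple array of callbacks. The sketch
below is an illustration only and does not use the libitm API: the types and
functions (@code{struct action_list}, @code{add_action},
@code{run_commit_actions}, @code{run_undo_actions}, @code{record}) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 16

typedef void (*action_fn)(void *arg);

struct action {
    action_fn fn;
    void *arg;
};

/* Hypothetical list of user-registered actions, in registration order. */
struct action_list {
    struct action items[MAX_ACTIONS];
    size_t count;
};

/* Appends an action; returns 0 on success and -1 if the list is full. */
int add_action(struct action_list *l, action_fn fn, void *arg)
{
    if (l->count == MAX_ACTIONS)
        return -1;
    l->items[l->count].fn = fn;
    l->items[l->count].arg = arg;
    l->count++;
    return 0;
}

/* Commit actions run in the order in which they were registered. */
void run_commit_actions(struct action_list *l)
{
    for (size_t i = 0; i < l->count; i++)
        l->items[i].fn(l->items[i].arg);
    l->count = 0;
}

/* Undo actions run in reverse registration order. */
void run_undo_actions(struct action_list *l)
{
    while (l->count > 0) {
        l->count--;
        l->items[l->count].fn(l->items[l->count].arg);
    }
}

/* Example action: appends the character that arg points to to a trace. */
char trace[MAX_ACTIONS + 1];
size_t trace_len;

void record(void *arg)
{
    if (trace_len < MAX_ACTIONS) {
        trace[trace_len++] = *(const char *)arg;
        trace[trace_len] = '\0';
    }
}
```

Registering @code{record} with the arguments @code{"A"} and then @code{"B"}
as commit actions appends the characters in that order, whereas the same two
registrations as undo actions append them in the opposite order.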

@code{_ITM_getThreadnum} is not supported currently because its only purpose
is to provide a thread ID that matches some assumed performance tuning output,
but this output is not part of the ABI nor further defined by it.

@code{_ITM_dropReferences} is not supported currently because its semantics and
the intention behind it are not entirely clear. The
specification suggests that this function is necessary because of certain
orderings of data transfer undos and the releasing of memory regions (i.e.,
privatization). However, this ordering is never defined, nor is the ordering of
dropping references w.r.t. other events.

@subsection [New] Transactional indirect calls

Indirect calls (i.e., calls through a function pointer) within transactions
should execute the transactional clone of the original function (i.e., a clone
of the original that has been fully instrumented to use the TM runtime), if
such a clone is available. The runtime provides two functions to
register/deregister clone tables:

@example
struct clone_entry
@{
  void *orig, *clone;
@};

void _ITM_registerTMCloneTable (clone_entry *table, size_t entries);
void _ITM_deregisterTMCloneTable (clone_entry *table);
@end example

Registered tables must be writable by the TM runtime, and must be live
throughout the life-time of the TM runtime.

@strong{TODO} The intention was always to drop the registration functions
entirely, and create a new ELF Phdr describing the linker-sorted table, much
like what currently happens for @code{PT_GNU_EH_FRAME}.
This work kept getting bogged down in how to represent the @var{N} different
code generation variants. We clearly needed at least two---SW and HW
transactional clones---but there was always a suggestion of more variants for
different TM assumptions/invariants.

The compiler can then use two TM runtime functions to perform indirect calls in
transactions:
@example
void *_ITM_getTMCloneOrIrrevocable (void *function) ITM_REGPARM;
void *_ITM_getTMCloneSafe (void *function) ITM_REGPARM;
@end example

If there is a registered clone for the supplied function, both will return a
pointer to the clone. If not, the first runtime function will attempt to switch
to serial--irrevocable mode and return the original pointer, whereas the second
will raise a fatal runtime error.
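
The lookup behaviour described above can be sketched in C as follows. This is
a simplified model, not the libitm implementation: it assumes a single
registered table and a linear search, it uses a flag
(@code{serial_irrevocable}) as a stand-in for the mode switch, and all
function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct clone_entry {
    void *orig, *clone;
};

/* Hypothetical runtime state: one registered table and a mode flag. */
static struct clone_entry *registered_table;
static size_t registered_entries;
bool serial_irrevocable; /* set when the fallback path is taken */

void register_table(struct clone_entry *table, size_t entries)
{
    registered_table = table;
    registered_entries = entries;
}

/* Linear search for simplicity; returns NULL if no clone is registered. */
static void *find_clone(void *function)
{
    for (size_t i = 0; i < registered_entries; i++)
        if (registered_table[i].orig == function)
            return registered_table[i].clone;
    return NULL;
}

/* Models the described behaviour of _ITM_getTMCloneOrIrrevocable. */
void *get_clone_or_irrevocable(void *function)
{
    void *clone = find_clone(function);
    if (clone != NULL)
        return clone;
    serial_irrevocable = true; /* stand-in for switching modes */
    return function;
}

/* Models the described behaviour of _ITM_getTMCloneSafe. */
void *get_clone_safe(void *function)
{
    void *clone = find_clone(function);
    if (clone == NULL)
        abort(); /* stand-in for the fatal runtime error */
    return clone;
}
```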

@subsection [New] Transactional dynamic memory management

@example
void *_ITM_malloc (size_t)
       __attribute__((__malloc__)) ITM_PURE;
void *_ITM_calloc (size_t, size_t)
       __attribute__((__malloc__)) ITM_PURE;
void _ITM_free (void *) ITM_PURE;
@end example

These functions are essentially transactional wrappers for @code{malloc},
@code{calloc}, and @code{free}. Within transactions, the compiler should
replace calls to the original functions with calls to the wrapper functions.

libitm also provides transactional clones of C++ memory management functions
such as global operator new and delete. They are part of libitm for historic
reasons but do not need to be part of this ABI.


@section [No changes] Future Enhancements to the ABI

@section Sample code

The code examples might not be correct w.r.t. the current version of the ABI,
especially everything related to exception handling.


@section [New] Memory model

The ABI should define a memory model and the ordering that is guaranteed for
data transfers and commit/undo actions, or at least refer to another memory
model that needs to be preserved. Without that, the compiler cannot ensure the
memory model specified on the level of the programming language (e.g., by the
C++ TM specification).

For example, if a transactional load is ordered before another load/store, then
the TM runtime must also ensure this ordering when accessing shared state. If
not, this might break the kind of publication safety used in the C++ TM
specification. Likewise, the TM runtime must ensure privatization safety.



@c ---------------------------------------------------------------------
@c Internals
@c ---------------------------------------------------------------------

@node Internals
@chapter Internals

@section TM methods and method groups

libitm supports several ways of synchronizing transactions with each other.
These TM methods (or TM algorithms) are implemented in the form of
subclasses of @code{abi_dispatch}, which provide methods for
transactional loads and stores as well as callbacks for rollback and commit.
All methods that are compatible with each other (i.e., that let concurrently
running transactions still synchronize correctly even if different methods
are used) belong to the same TM method group. Pointers to TM methods can be
obtained using the factory methods prefixed with @code{dispatch_} in
@file{libitm_i.h}. There are two special methods, @code{dispatch_serial} and
@code{dispatch_serialirr}, that are compatible with all methods because they
run transactions completely in serial mode.

@subsection TM method life cycle

The state of TM methods does not change after construction, but they do alter
the state of transactions that use this method. However, because
per-transaction data gets used by several methods, @code{gtm_thread} is
responsible for setting an initial state that is useful for all methods.
After that, methods are responsible for resetting/clearing this state on each
rollback or commit (of outermost transactions), so that the transaction
executed next is not affected by the previous transaction.

There is also global state associated with each method group, which is
initialized and shut down (@code{method_group::init()} and @code{fini()})
when switching between method groups (see @file{retry.cc}).

@subsection Selecting the default method

The default method that libitm uses for freshly started transactions (but
not necessarily for restarted transactions) can be set via an environment
variable (@env{ITM_DEFAULT_METHOD}), whose value should be equal to the name
of one of the factory methods returning @code{abi_dispatch} subclasses but
without the @code{dispatch_} prefix (e.g., @samp{serialirr} instead of
@code{GTM::dispatch_serialirr()}).

Note that this environment variable is only a hint for libitm and might not
be supported in the future.


@section Nesting: flat vs. closed

We support two different kinds of nesting of transactions. In the case of
@emph{flat nesting}, the nesting structure is flattened and all nested
transactions are subsumed by the enclosing transaction. In contrast,
with @emph{closed nesting}, nested transactions that have not yet committed
can be rolled back separately from the enclosing transactions; when they
commit, they are subsumed by the enclosing transaction, and their effects
will be finally committed when the outermost transaction commits.
@emph{Open nesting} (where nested transactions can commit independently of the
enclosing transactions) is not supported.

Flat nesting is the default nesting mode, but closed nesting is supported and
used when transactions contain user-controlled aborts
(@code{__transaction_cancel} statements). We assume that user-controlled
aborts are rare in typical code and used mostly in exceptional situations.
Thus, it makes more sense to use flat nesting by default to avoid the
performance overhead of the additional checkpoints required for closed
nesting. User-controlled aborts will correctly abort the innermost enclosing
transaction, whereas the whole (i.e., outermost) transaction will be restarted
otherwise (e.g., when a transaction encounters data conflicts during
optimistic execution).
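
The difference between the two nesting modes can be illustrated with a toy
undo log. The sketch below assumes a write-through design in which every
transactional store logs the old value first; this is an assumption chosen for
brevity, not a description of any particular libitm method, and all names are
hypothetical. A closed-nested transaction records a checkpoint (the current
log length) when it begins, which is the extra cost that flat nesting avoids.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_CAPACITY 32
#define MAX_DEPTH 8

struct undo_entry {
    int *addr;
    int old_value;
};

/* Hypothetical transaction state: an undo log plus nesting checkpoints. */
struct txn {
    struct undo_entry log[LOG_CAPACITY];
    size_t log_len;
    size_t checkpoints[MAX_DEPTH]; /* log length at each nested begin */
    size_t depth;
};

/* Transactional store: log the old value, then write through. */
void txn_write(struct txn *t, int *addr, int value)
{
    assert(t->log_len < LOG_CAPACITY);
    t->log[t->log_len].addr = addr;
    t->log[t->log_len].old_value = *addr;
    t->log_len++;
    *addr = value;
}

/* Undo all stores logged after position mark, newest first. */
static void rollback_to(struct txn *t, size_t mark)
{
    while (t->log_len > mark) {
        t->log_len--;
        *t->log[t->log_len].addr = t->log[t->log_len].old_value;
    }
}

/* Closed nesting: remember where the nested transaction started. */
void nested_begin(struct txn *t)
{
    assert(t->depth < MAX_DEPTH);
    t->checkpoints[t->depth++] = t->log_len;
}

/* Nested commit: drop the checkpoint; the encloser subsumes the effects. */
void nested_commit(struct txn *t)
{
    assert(t->depth > 0);
    t->depth--;
}

/* User-controlled abort: undo only the innermost nested transaction. */
void nested_cancel(struct txn *t)
{
    assert(t->depth > 0);
    t->depth--;
    rollback_to(t, t->checkpoints[t->depth]);
}

/* Restart of the whole (outermost) transaction, e.g., after a conflict. */
void outermost_rollback(struct txn *t)
{
    rollback_to(t, 0);
    t->depth = 0;
}
```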


@section Locking conventions

This section documents the locking scheme and rules for all uses of locking
in libitm. We have to support serial(-irrevocable) mode, which is implemented
using a global lock as explained next (called the @emph{serial lock}). To
simplify the overall design, we use the same lock as catch-all locking
mechanism for other infrequent tasks such as (de)registering clone tables or
threads. Besides the serial lock, there are @emph{per-method-group locks} that
are managed by specific method groups (i.e., groups of similar TM concurrency
control algorithms), and lock-like constructs for quiescence-based operations
such as ensuring privatization safety.

Thus, the actions that participate in the libitm-internal locking are
@emph{active transactions} that do not run in serial mode, @emph{serial
transactions} (which run, or are about to run, in serial mode), and management
tasks that do not execute within a transaction but have acquired the serial mode
like a serial transaction would do (e.g., to be able to register threads with
libitm). Transactions become active as soon as they have successfully used the
serial lock to announce this globally (@pxref{serial-lock-impl,,Serial lock
implementation}). Likewise, transactions become serial transactions as soon as
they have acquired the exclusive rights provided by the serial lock (i.e.,
serial mode, which also means that there are no other concurrent active or
serial transactions). Note that active transactions can become serial
transactions when they enter serial mode during the runtime of the
transaction.

@subsection State-to-lock mapping

Application data is protected by the serial lock if there is a serial
transaction and no concurrently running active transaction (i.e., non-serial).
Otherwise, application data is protected by the currently selected method
group, which might use per-method-group locks or other mechanisms. Also note
that application data that is about to be privatized might not be allowed to be
accessed by nontransactional code until privatization safety has been ensured;
the details of this are handled by the current method group.

libitm-internal state is either protected by the serial lock or accessed
through custom concurrent code. The latter applies to the public/shared part
of a transaction object and most typical method-group-specific state.

The former category (protected by the serial lock) includes:
@itemize @bullet
@item The list of active threads that have used transactions.
@item The tables that map functions to their transactional clones.
@item The current selection of which method group to use.
@item Some method-group-specific data, or invariants of this data. For example,
resetting a method group to its initial state is handled by switching to the
same method group, so the serial lock protects such resetting as well.
@end itemize
In general, such state is immutable whenever there exists an active
(non-serial) transaction. If there is no active transaction, a serial
transaction (or a thread that is not currently executing a transaction but has
acquired the serial lock) is allowed to modify this state (but must of course
be careful to not surprise the current method group's implementation with such
modifications).
|
|
574
|
|
575 @subsection Lock acquisition order
|
|
576
|
|
577 To prevent deadlocks, locks acquisition must happen in a globally agreed-upon
|
|
578 order. Note that this applies to other forms of blocking too, but does not
|
|
579 necessarily apply to lock acquisitions that do not block (e.g., trylock()
|
|
580 calls that do not get retried forever). Note that serial transactions are
|
|
581 never return back to active transactions until the transaction has committed.
|
|
582 Likewise, active transactions stay active until they have committed.
|
|
583 Per-method-group locks are typically also not released before commit.
|
|
584
|
|
585 Lock acquisition / blocking rules:
|
|
586 @itemize @bullet
|
|
587
|
|
588 @item Transactions must become active or serial before they are allowed to
|
|
589 use method-group-specific locks or blocking (i.e., the serial lock must be
|
|
590 acquired before those other locks, either in serial or nonserial mode).
|
|
591
|
|
592 @item Any number of threads that do not currently run active transactions can
|
|
593 block while trying to get the serial lock in exclusive mode. Note that active
|
|
594 transactions must not block when trying to upgrade to serial mode unless there
|
|
595 is no other transaction that is trying that (the latter is ensured by the
|
|
596 serial lock implementation.
|
|
597
|
|
598 @item Method groups must prevent deadlocks on their locks. In particular, they
|
|
599 must also be prepared for another active transaction that has acquired
|
|
600 method-group-specific locks but is blocked during an attempt to upgrade to
|
|
601 being a serial transaction. See below for details.
|
|
602
|
|
603 @item Serial transactions can acquire method-group-specific locks because there
|
|
604 will be no other active nor serial transaction.
|
|
605
|
|
606 @end itemize
|
|
607
|
|
608 There is no single rule for per-method-group blocking because this depends on
|
|
609 when a TM method might acquire locks. If no active transaction can upgrade to
|
|
610 being a serial transaction after it has acquired per-method-group locks (e.g.,
|
|
611 when those locks are only acquired during an attempt to commit), then the TM
|
|
612 method does not need to consider a potential deadlock due to serial mode.

If there can be upgrades to serial mode after the acquisition of
per-method-group locks, then TM methods need to avoid those deadlocks:
@itemize @bullet
@item When upgrading to a serial transaction, after acquiring exclusive rights
to the serial lock but before waiting for concurrent active transactions to
finish (@pxref{serial-lock-impl,,Serial lock implementation} for details),
we have to wake up all active transactions waiting on the upgrader's
per-method-group locks.
@item Active transactions blocking on per-method-group locks need to check the
serial lock and abort if there is a pending serial transaction.
@item Lost wake-ups have to be prevented (e.g., by changing a bit in each
per-method-group lock before doing the wake-up, and only blocking on this lock
using a futex if this bit is not set).
@end itemize
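
The second rule above can be illustrated with a minimal sketch. It is not
libitm code: the names @code{serial_lock_state}, @code{group_lock},
@code{acquire_group_lock}, and @code{release_group_lock} are hypothetical, and
the waiter spins instead of blocking on a futex, so the wake-up and
lost-wake-up handling from the first and third rules is not shown.

@example
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the serial lock: only the flag that tells
// active transactions that an upgrader owns the exclusive side.
struct serial_lock_state @{
  std::atomic<bool> serial_pending@{false@};
@};

// Hypothetical per-method-group lock.
struct group_lock @{
  std::atomic<bool> held@{false@};
@};

enum class acquire_result @{ acquired, must_abort @};

// Spin on a per-method-group lock, but give up as soon as a serial
// transaction is pending so that the upgrader never waits for us.
inline acquire_result
acquire_group_lock (group_lock &l, const serial_lock_state &s)
@{
  for (;;)
    @{
      if (s.serial_pending.load (std::memory_order_acquire))
        return acquire_result::must_abort;
      bool expected = false;
      if (l.held.compare_exchange_weak (expected, true,
                                        std::memory_order_acquire))
        return acquire_result::acquired;
    @}
@}

inline void
release_group_lock (group_lock &l)
@{
  assert (l.held.load (std::memory_order_relaxed));
  l.held.store (false, std::memory_order_release);
@}
@end example

A caller that receives @code{must_abort} would roll back and restart, which
releases any per-method-group locks it already holds and lets the pending
serial transaction proceed.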

@strong{TODO}: Can we reuse the serial lock for gl-*? And if we can, does it
make sense to introduce further complexity in the serial lock? For gl-*, we can
really only avoid an abort if we do -wb and -vbv.


@subsection Serial lock implementation
@anchor{serial-lock-impl}

The serial lock implementation is optimized for the case that serial
transactions are infrequent. However, the performance
of entering serial mode can matter because when only a few transactions are run
concurrently or if there are few threads, then it can be efficient to run
transactions serially.

The serial lock is similar to a multi-reader-single-writer lock in that there
can be several active transactions but only one serial transaction. However,
we do want to avoid contention (in the lock implementation) between active
transactions, so we split up the reader side of the lock into per-transaction
flags that are true iff the transaction is active. The exclusive writer side
remains a shared single flag, which is acquired using a CAS, for example.
On the fast-path, the serial lock then works similarly to Dekker's algorithm
but with several reader flags that a serial transaction would have to check.
A serial transaction thus requires a list of all threads with potentially
active transactions; we can use the serial lock itself to protect this list
(i.e., only threads that have acquired the serial lock can modify this list).

We want starvation-freedom for the serial lock to allow for using it to ensure
progress for potentially starved transactions (@pxref{progress-guarantees,,
Progress Guarantees} for details). However, this is currently not enforced by
the implementation of the serial lock.

Here is pseudo-code for the read/write fast paths of acquiring the serial
lock (read-to-write upgrade is similar to write_lock):
@example
// read_lock:
tx->shared_state |= active;
__sync_synchronize(); // or STLD membar, or C++0x seq-cst fence
while (serial_lock.exclusive) // spin while a serial transaction holds the lock
  if (spinning_for_too_long) goto slowpath;

// write_lock:
if (CAS(&serial_lock.exclusive, 0, this) != 0)
  goto slowpath; // writer-writer contention
// need a membar here, but CAS already has full membar semantics
bool need_blocking = false;
for (t: all txns)
  @{
    for (;t->shared_state & active;)
      if (spinning_for_too_long) @{ need_blocking = true; break; @}
  @}
if (need_blocking) goto slowpath;
@end example

Releasing a lock in this spin-lock version then just consists of resetting
@code{tx->shared_state} to inactive (for a reader) or clearing
@code{serial_lock.exclusive} (for a writer).
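
As an illustration only, the spin-only fast paths and the matching release
operations could be written with C++11 atomics roughly as follows. The names
(@code{tx_state}, @code{spin_serial_lock}, @code{try_read_lock},
@code{try_write_lock}) and the @code{spin_limit} parameter are hypothetical and
do not mirror libitm's actual classes. Where the pseudo-code jumps to the slow
path, this sketch simply backs out and returns @code{false}; read-to-write
upgrades and the registration of transactions in the list are not covered.

@example
#include <atomic>
#include <cassert>
#include <vector>

struct tx_state @{
  std::atomic<bool> active@{false@};   // per-transaction reader flag
@};

struct spin_serial_lock @{
  std::atomic<tx_state *> exclusive@{nullptr@};  // single writer flag
  std::vector<tx_state *> txns;  // only modified while holding exclusive
@};

// Reader fast path.  All atomic operations use the default seq-cst
// ordering, which provides the store-load ordering that Dekker-style
// synchronization needs.
inline bool
try_read_lock (spin_serial_lock &l, tx_state &tx)
@{
  tx.active.store (true);
  if (l.exclusive.load () == nullptr)
    return true;
  tx.active.store (false);   // a writer is present: back out
  return false;
@}

inline void
read_unlock (tx_state &tx)
@{
  tx.active.store (false);
@}

// Writer fast path: claim the exclusive flag, then wait for readers.
inline bool
try_write_lock (spin_serial_lock &l, tx_state &self, int spin_limit)
@{
  tx_state *expected = nullptr;
  if (!l.exclusive.compare_exchange_strong (expected, &self))
    return false;            // writer-writer contention
  for (tx_state *t : l.txns)
    for (int spins = 0; t->active.load (); ++spins)
      if (spins >= spin_limit)
        @{
          l.exclusive.store (nullptr);   // spun for too long: back out
          return false;
        @}
  return true;
@}

inline void
write_unlock (spin_serial_lock &l)
@{
  assert (l.exclusive.load () != nullptr);
  l.exclusive.store (nullptr);
@}
@end example

Using seq-cst operations throughout keeps the sketch easy to reason about; a
tuned implementation would likely use weaker orderings plus an explicit fence,
as the pseudo-code suggests.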

However, we can't rely on a pure spinlock because we need to get the OS
involved at some point (e.g., when there are more threads than CPUs to run on).
Therefore, the real implementation falls back to a blocking slow path, either
based on pthread mutexes or Linux futexes.


@subsection Reentrancy

libitm has to consider the following cases of reentrancy:
@itemize @bullet

@item Transaction calls unsafe code that starts a new transaction: The outer
transaction will become a serial transaction before executing unsafe code.
Therefore, nesting within serial transactions must work, even if the nested
transaction is called from within uninstrumented code.

@item Transaction calls either a transactional wrapper or safe code, which in
turn starts a new transaction: It is not yet defined in the specification
whether this is allowed. Thus, it is undefined whether libitm supports this.

@item Code that starts new transactions might be called from within any part
of libitm: This kind of reentrancy would likely be rather complex and can
probably be avoided. Therefore, it is not supported.

@end itemize

@subsection Privatization safety

Privatization safety is ensured by libitm using a quiescence-based approach.
Basically, a privatizing transaction waits until all concurrent active
transactions have either finished (are not active anymore) or operate on
a sufficiently recent snapshot to not access the privatized data anymore. This
happens after the privatizing transaction has stopped being an active
transaction, so waiting for quiescence does not contribute to deadlocks.
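
The quiescence condition can be sketched as follows. This is an illustration
under assumptions, not libitm's implementation: it assumes a timestamp-based
method group in which every transaction publishes the time of its current
snapshot, and the names @code{tx_shared}, @code{tx_inactive}, and
@code{wait_for_quiescence} as well as the @code{max_spins} bound are
hypothetical.

@example
#include <atomic>
#include <cstdint>
#include <vector>

// Snapshot value published by a transaction that is not active.  It is
// the largest value, so an inactive transaction passes every check.
constexpr std::uint64_t tx_inactive = UINT64_MAX;

struct tx_shared @{
  std::atomic<std::uint64_t> snapshot@{tx_inactive@};
@};

// Returns true once every transaction in OTHERS is inactive or runs on a
// snapshot that is at least as recent as COMMIT_TIME.  Returns false if
// some transaction is still not quiescent after MAX_SPINS polls; a real
// implementation would then block instead of giving up.
inline bool
wait_for_quiescence (const std::vector<const tx_shared *> &others,
                     std::uint64_t commit_time, unsigned max_spins)
@{
  for (const tx_shared *t : others)
    for (unsigned spins = 0;
         t->snapshot.load (std::memory_order_acquire) < commit_time;
         ++spins)
      if (spins >= max_spins)
        return false;
  return true;
@}
@end example

Because the privatizer calls such a function only after it has stopped being
an active transaction, no other transaction can be waiting on it while it
polls.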

In method groups that need to ensure privatization safety explicitly, active
transactions maintain a flag or timestamp in the public/shared part of the
transaction descriptor. Before blocking, privatizers need to let the other
transactions know that they should wake up the privatizer.

@strong{TODO} How to implement the waiters? Should those flags be
per-transaction or at a central place? We want to avoid one wake/wait call
per active transaction, so we might want to use either a tree or combining
to reduce the syscall overhead, or rather spin for a long amount of time
instead of blocking. Also, it would be good if only the last transaction
that the privatizer waits for would do the wake-up.

@subsection Progress guarantees
@anchor{progress-guarantees}

Transactions that do not make progress when using the current TM method will
eventually try to execute in serial mode. Thus, the serial lock's progress
guarantees determine the progress guarantees of the whole TM. Obviously, we at
least need deadlock-freedom for the serial lock, but it would also be good to
provide starvation-freedom (informally, all threads will finish executing a
transaction eventually iff they get enough cycles).
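
The fallback to serial mode can be pictured with the following sketch. It is
a simplification under assumptions: the names @code{exec_mode} and
@code{run_transaction} and the fixed @code{max_retries} threshold are
hypothetical, and the actual policy for when to switch to serial mode is not
specified here.

@example
#include <cassert>

enum class exec_mode @{ speculative, serial @};

// ATTEMPT runs the transaction body once in the given mode and returns
// true if it committed.  After MAX_RETRIES failed speculative attempts
// the body is run once more in serial mode, where it is assumed to
// always commit because no other transaction runs concurrently.
template <typename F>
exec_mode
run_transaction (F attempt, int max_retries)
@{
  for (int i = 0; i < max_retries; ++i)
    if (attempt (exec_mode::speculative))
      return exec_mode::speculative;
  // A real runtime would acquire the serial lock in exclusive mode here
  // and release it after the commit.
  bool committed = attempt (exec_mode::serial);
  assert (committed);
  (void) committed;
  return exec_mode::serial;
@}
@end example

In this picture, every transaction that keeps failing ends up waiting for the
serial lock, which is why the lock's guarantees bound those of the whole TM.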

However, the scheduling of transactions (e.g., thread scheduling by the OS)
also affects the handling of progress guarantees by the TM. First, the TM
can only guarantee deadlock-freedom if threads do not get stopped. Likewise,
low-priority threads can starve if they do not get scheduled when other
high-priority threads get those cycles instead.

If all threads get scheduled eventually, correct lock implementations will
provide deadlock-freedom, but might not provide starvation-freedom. We can
either enforce the latter in the TM's lock implementation, or assume that
the scheduling is sufficiently random to yield a probabilistic guarantee that
no thread will starve (because eventually, a transaction will encounter a
scheduling that will allow it to run). This can indeed work well in practice
but is not necessarily guaranteed to work (e.g., simple spin locks can be
pretty efficient).

Because enforcing stronger progress guarantees in the TM has a higher runtime
overhead, we focus on deadlock-freedom right now and assume that the threads
will get scheduled eventually by the OS (but don't consider threads with
different priorities). We should support starvation-freedom for serial
transactions in the future. Everything beyond that is highly related to proper
contention management across all of the TM (including which TM method to
choose), and is future work.

@strong{TODO} Handling thread priorities: We want to avoid priority inversion
but it's unclear how often that actually matters in practice. Workloads that
have threads with different priorities will likely also require lower latency
or higher throughput for high-priority threads. Therefore, it probably does
not make much sense (except for eventual progress guarantees) to use
priority inheritance until the TM has priority-aware contention management.


@c ---------------------------------------------------------------------
@c GNU Free Documentation License
@c ---------------------------------------------------------------------

@include fdl.texi

@c ---------------------------------------------------------------------
@c Index
@c ---------------------------------------------------------------------

@node Library Index
@unnumbered Library Index

@printindex cp

@bye