diff libgomp/libgomp.texi @ 145:1830386684a0
gcc-9.2.0
author | anatofuz |
---|---|
date | Thu, 13 Feb 2020 11:34:05 +0900 |
parents | 84e7813d76e9 |
children |
--- a/libgomp/libgomp.texi Thu Oct 25 07:37:49 2018 +0900 +++ b/libgomp/libgomp.texi Thu Feb 13 11:34:05 2020 +0900 @@ -7,7 +7,7 @@ @copying -Copyright @copyright{} 2006-2018 Free Software Foundation, Inc. +Copyright @copyright{} 2006-2020 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or @@ -95,10 +95,12 @@ @comment @menu * Enabling OpenMP:: How to enable OpenMP for your applications. -* Runtime Library Routines:: The OpenMP runtime application programming +* OpenMP Runtime Library Routines: Runtime Library Routines. + The OpenMP runtime application programming interface. -* Environment Variables:: Influencing runtime behavior with environment - variables. +* OpenMP Environment Variables: Environment Variables. + Influencing OpenMP runtime behavior with + environment variables. * Enabling OpenACC:: How to enable OpenACC for your applications. * OpenACC Runtime Library Routines:: The OpenACC runtime application @@ -109,6 +111,7 @@ asynchronous operations. * OpenACC Library Interoperability:: OpenACC library interoperability with the NVIDIA CUBLAS library. +* OpenACC Profiling Interface:: * The libgomp ABI:: Notes on the external ABI presented by libgomp. * Reporting Bugs:: How to report bugs in the GNU Offloading and Multi Processing Runtime Library. @@ -144,11 +147,11 @@ @c --------------------------------------------------------------------- -@c Runtime Library Routines +@c OpenMP Runtime Library Routines @c --------------------------------------------------------------------- @node Runtime Library Routines -@chapter Runtime Library Routines +@chapter OpenMP Runtime Library Routines The runtime routines described here are defined by Section 3 of the OpenMP specification in version 4.5. 
The routines are structured in following @@ -1327,11 +1330,11 @@ @c --------------------------------------------------------------------- -@c Environment Variables +@c OpenMP Environment Variables @c --------------------------------------------------------------------- @node Environment Variables -@chapter Environment Variables +@chapter OpenMP Environment Variables The environment variables which beginning with @env{OMP_} are defined by section 4 of the OpenMP specification in version 4.5, while those @@ -1724,9 +1727,9 @@ @ref{OMP_STACKSIZE} @item @emph{Reference}: -@uref{http://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html, +@uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html, GCC Patches Mailinglist}, -@uref{http://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html, +@uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html, GCC Patches Mailinglist} @end table @@ -1808,7 +1811,7 @@ To activate the OpenACC extensions for C/C++ and Fortran, the compile-time flag @option{-fopenacc} must be specified. This enables the OpenACC directive -@code{#pragma acc} in C/C++ and @code{!$accp} directives in free form, +@code{#pragma acc} in C/C++ and @code{!$acc} directives in free form, @code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form, @code{!$} conditional compilation sentinels in free form and @code{c$}, @code{*$} and @code{!$} sentinels in fixed form, for Fortran. The flag also @@ -1817,7 +1820,7 @@ A complete description of all OpenACC directives accepted may be found in the @uref{https://www.openacc.org, OpenACC} Application Programming -Interface manual, version 2.0. +Interface manual, version 2.6. Note that this is an experimental feature and subject to change in future versions of GCC. See @@ -1833,7 +1836,7 @@ @chapter OpenACC Runtime Library Routines The runtime routines described here are defined by section 3 of the OpenACC -specifications in version 2.0. +specifications in version 2.6. 
They have C linkage, and do not throw exceptions. Generally, they are available only for the host, with the exception of @code{acc_on_device}, which is available for both the host and the @@ -1846,13 +1849,14 @@ * acc_get_device_type:: Get type of device accelerator to be used. * acc_set_device_num:: Set device number to use. * acc_get_device_num:: Get device number to be used. +* acc_get_property:: Get device property. * acc_async_test:: Tests for completion of a specific asynchronous operation. -* acc_async_test_all:: Tests for completion of all asychronous +* acc_async_test_all:: Tests for completion of all asynchronous operations. * acc_wait:: Wait for completion of a specific asynchronous operation. -* acc_wait_all:: Waits for completion of all asyncrhonous +* acc_wait_all:: Waits for completion of all asynchronous operations. * acc_wait_all_async:: Wait for completion of all asynchronous operations. @@ -1884,10 +1888,12 @@ host address. * acc_hostptr:: Get host pointer associated with specific device address. -* acc_is_present:: Indiciate whether host variable / array is +* acc_is_present:: Indicate whether host variable / array is present on device. * acc_memcpy_to_device:: Copy host memory to device memory. * acc_memcpy_from_device:: Copy device memory to host memory. +* acc_attach:: Let device pointer point to device-pointer target. +* acc_detach:: Let device pointer point to host-pointer target. API routines for target platforms. @@ -1895,6 +1901,13 @@ * acc_get_current_cuda_context::Get CUDA context handle. * acc_get_cuda_stream:: Get CUDA stream handle. * acc_set_cuda_stream:: Set CUDA stream handle. + +API routines for the OpenACC Profiling Interface. + +* acc_prof_register:: Register callbacks. +* acc_prof_unregister:: Unregister callbacks. +* acc_prof_lookup:: Obtain inquiry functions. +* acc_register_library:: Library registration. 
@end menu @@ -1918,7 +1931,7 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section +@uref{https://www.openacc.org, OpenACC specification v2.6}, section 3.2.1. @end table @@ -1928,7 +1941,7 @@ @section @code{acc_set_device_type} -- Set type of device accelerator to use. @table @asis @item @emph{Description} -This function indicates to the runtime library which device typr, specified +This function indicates to the runtime library which device type, specified in @var{devicetype}, to use when executing a parallel or kernels region. @item @emph{C/C++}: @@ -1943,7 +1956,7 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section +@uref{https://www.openacc.org, OpenACC specification v2.6}, section 3.2.2. @end table @@ -1968,7 +1981,7 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section +@uref{https://www.openacc.org, OpenACC specification v2.6}, section 3.2.3. @end table @@ -1979,7 +1992,7 @@ @table @asis @item @emph{Description} This function will indicate to the runtime which device number, -specified by @var{num}, associated with the specifed device +specified by @var{num}, associated with the specified device type @var{devicetype}. @item @emph{C/C++}: @@ -1995,7 +2008,7 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section +@uref{https://www.openacc.org, OpenACC specification v2.6}, section 3.2.4. @end table @@ -2022,20 +2035,58 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section +@uref{https://www.openacc.org, OpenACC specification v2.6}, section 3.2.5. @end table +@node acc_get_property +@section @code{acc_get_property} -- Get device property. 
+@cindex acc_get_property +@cindex acc_get_property_string +@table @asis +@item @emph{Description} +These routines return the value of the specified @var{property} for the +device being queried according to @var{devicenum} and @var{devicetype}. +Integer-valued and string-valued properties are returned by +@code{acc_get_property} and @code{acc_get_property_string} respectively. +The Fortran @code{acc_get_property_string} subroutine returns the string +retrieved in its fourth argument while the remaining entry points are +functions, which pass the return value as their result. + +@item @emph{C/C++}: +@multitable @columnfractions .20 .80 +@item @emph{Prototype}: @tab @code{size_t acc_get_property(int devicenum, acc_device_t devicetype, acc_device_property_t property);} +@item @emph{Prototype}: @tab @code{const char *acc_get_property_string(int devicenum, acc_device_t devicetype, acc_device_property_t property);} +@end multitable + +@item @emph{Fortran}: +@multitable @columnfractions .20 .80 +@item @emph{Interface}: @tab @code{function acc_get_property(devicenum, devicetype, property)} +@item @emph{Interface}: @tab @code{subroutine acc_get_property_string(devicenum, devicetype, property, string)} +@item @tab @code{integer devicenum} +@item @tab @code{integer(kind=acc_device_kind) devicetype} +@item @tab @code{integer(kind=acc_device_property) property} +@item @tab @code{integer(kind=acc_device_property) acc_get_property} +@item @tab @code{character(*) string} +@end multitable + +@item @emph{Reference}: +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.6. +@end table + + + @node acc_async_test @section @code{acc_async_test} -- Test for completion of a specific asynchronous operation. @table @asis @item @emph{Description} -This function tests for completion of the asynchrounous operation specified +This function tests for completion of the asynchronous operation specified in @var{arg}. 
In C/C++, a non-zero value will be returned to indicate the specified asynchronous operation has completed. While Fortran will return -a @code{true}. If the asynchrounous operation has not completed, C/C++ returns +a @code{true}. If the asynchronous operation has not completed, C/C++ returns a zero and Fortran returns a @code{false}. @item @emph{C/C++}: @@ -2051,8 +2102,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.6. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.9. @end table @@ -2061,7 +2112,7 @@ @section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations. @table @asis @item @emph{Description} -This function tests for completion of all asynchrounous operations. +This function tests for completion of all asynchronous operations. In C/C++, a non-zero value will be returned to indicate all asynchronous operations have completed. While Fortran will return a @code{true}. If any asynchronous operation has not completed, C/C++ returns a zero and @@ -2079,8 +2130,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.7. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.10. @end table @@ -2107,8 +2158,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.8. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.11. @end table @@ -2132,8 +2183,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.10. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.13. @end table @@ -2158,8 +2209,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.11. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.14. 
@end table @@ -2183,8 +2234,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.9. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.12. @end table @@ -2208,8 +2259,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.12. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.7. @end table @@ -2233,8 +2284,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.13. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.8. @end table @@ -2245,7 +2296,7 @@ @item @emph{Description}: This function returns whether the program is executing on a particular device specified in @var{devicetype}. In C/C++ a non-zero value is -returned to indicate the device is execiting on the specified device type. +returned to indicate the device is executing on the specified device type. In Fortran, @code{true} will be returned. If the program is not executing on the specified device type C/C++ will return a zero, while Fortran will return @code{false}. @@ -2264,8 +2315,8 @@ @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.14. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.17. @end table @@ -2283,8 +2334,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.15. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.18. @end table @@ -2301,8 +2352,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.16. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.19. 
@end table @@ -2322,6 +2373,7 @@ @item @emph{C/C++}: @multitable @columnfractions .20 .80 @item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);} +@item @emph{Prototype}: @tab @code{void *acc_copyin_async(h_void *a, size_t len, int async);} @end multitable @item @emph{Fortran}: @@ -2331,11 +2383,18 @@ @item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)} @item @tab @code{type, dimension(:[,:]...) :: a} @item @tab @code{integer len} +@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer(acc_handle_kind) :: async} +@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, len, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer len} +@item @tab @code{integer(acc_handle_kind) :: async} @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.17. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.20. @end table @@ -2344,7 +2403,7 @@ @section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory. @table @asis @item @emph{Description} -This function tests if the host data specifed by @var{a} and of length +This function tests if the host data specified by @var{a} and of length @var{len} is present or not. If it is not present, then device memory will be allocated and the host memory copied. The device address of the newly allocated device memory is returned. @@ -2353,6 +2412,9 @@ a contiguous array section. The second form @var{a} specifies a variable or array element and @var{len} specifies the length in bytes. +Note that @code{acc_present_or_copyin} and @code{acc_pcopyin} exist for +backward compatibility with OpenACC 2.0; use @ref{acc_copyin} instead. 
+ @item @emph{C/C++}: @multitable @columnfractions .20 .80 @item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);} @@ -2374,8 +2436,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.18. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.20. @end table @@ -2395,6 +2457,7 @@ @item @emph{C/C++}: @multitable @columnfractions .20 .80 @item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);} +@item @emph{Prototype}: @tab @code{void *acc_create_async(h_void *a, size_t len, int async);} @end multitable @item @emph{Fortran}: @@ -2404,11 +2467,18 @@ @item @emph{Interface}: @tab @code{subroutine acc_create(a, len)} @item @tab @code{type, dimension(:[,:]...) :: a} @item @tab @code{integer len} +@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer(acc_handle_kind) :: async} +@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, len, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer len} +@item @tab @code{integer(acc_handle_kind) :: async} @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.19. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.21. @end table @@ -2417,7 +2487,7 @@ @section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory. @table @asis @item @emph{Description} -This function tests if the host data specifed by @var{a} and of length +This function tests if the host data specified by @var{a} and of length @var{len} is present or not. If it is not present, then device memory will be allocated and mapped to host memory. In C/C++, the device address of the newly allocated device memory is returned. 
@@ -2426,6 +2496,8 @@ a contiguous array section. The second form @var{a} specifies a variable or array element and @var{len} specifies the length in bytes. +Note that @code{acc_present_or_create} and @code{acc_pcreate} exist for +backward compatibility with OpenACC 2.0; use @ref{acc_create} instead. @item @emph{C/C++}: @multitable @columnfractions .20 .80 @@ -2448,8 +2520,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.20. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.21. @end table @@ -2468,6 +2540,9 @@ @item @emph{C/C++}: @multitable @columnfractions .20 .80 @item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);} +@item @emph{Prototype}: @tab @code{acc_copyout_async(h_void *a, size_t len, int async);} +@item @emph{Prototype}: @tab @code{acc_copyout_finalize(h_void *a, size_t len);} +@item @emph{Prototype}: @tab @code{acc_copyout_finalize_async(h_void *a, size_t len, int async);} @end multitable @item @emph{Fortran}: @@ -2477,11 +2552,30 @@ @item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)} @item @tab @code{type, dimension(:[,:]...) :: a} @item @tab @code{integer len} +@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer(acc_handle_kind) :: async} +@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, len, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer len} +@item @tab @code{integer(acc_handle_kind) :: async} +@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a, len)} +@item @tab @code{type, dimension(:[,:]...) 
:: a} +@item @tab @code{integer len} +@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer(acc_handle_kind) :: async} +@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, len, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer len} +@item @tab @code{integer(acc_handle_kind) :: async} @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.21. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.22. @end table @@ -2500,6 +2594,9 @@ @item @emph{C/C++}: @multitable @columnfractions .20 .80 @item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);} +@item @emph{Prototype}: @tab @code{acc_delete_async(h_void *a, size_t len, int async);} +@item @emph{Prototype}: @tab @code{acc_delete_finalize(h_void *a, size_t len);} +@item @emph{Prototype}: @tab @code{acc_delete_finalize_async(h_void *a, size_t len, int async);} @end multitable @item @emph{Fortran}: @@ -2509,11 +2606,30 @@ @item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)} @item @tab @code{type, dimension(:[,:]...) :: a} @item @tab @code{integer len} +@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer(acc_handle_kind) :: async} +@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, len, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer len} +@item @tab @code{integer(acc_handle_kind) :: async} +@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a, len)} +@item @tab @code{type, dimension(:[,:]...) 
:: a} +@item @tab @code{integer len} +@item @emph{Interface}: @tab @code{subroutine acc_delete_async_finalize(a, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer(acc_handle_kind) :: async} +@item @emph{Interface}: @tab @code{subroutine acc_delete_async_finalize(a, len, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer len} +@item @tab @code{integer(acc_handle_kind) :: async} @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.22. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.23. @end table @@ -2533,6 +2649,7 @@ @item @emph{C/C++}: @multitable @columnfractions .20 .80 @item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);} +@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len, async);} @end multitable @item @emph{Fortran}: @@ -2542,11 +2659,18 @@ @item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)} @item @tab @code{type, dimension(:[,:]...) :: a} @item @tab @code{integer len} +@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer(acc_handle_kind) :: async} +@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, len, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer len} +@item @tab @code{integer(acc_handle_kind) :: async} @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.23. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.24. 
@end table @@ -2566,6 +2690,7 @@ @item @emph{C/C++}: @multitable @columnfractions .20 .80 @item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);} +@item @emph{Prototype}: @tab @code{acc_update_self_async(h_void *a, size_t len, int async);} @end multitable @item @emph{Fortran}: @@ -2575,11 +2700,18 @@ @item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)} @item @tab @code{type, dimension(:[,:]...) :: a} @item @tab @code{integer len} +@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer(acc_handle_kind) :: async} +@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, len, async)} +@item @tab @code{type, dimension(:[,:]...) :: a} +@item @tab @code{integer len} +@item @tab @code{integer(acc_handle_kind) :: async} @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.24. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.25. @end table @@ -2598,8 +2730,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.25. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.26. @end table @@ -2617,8 +2749,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.26. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.27. @end table @@ -2636,8 +2768,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.27. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.28. @end table @@ -2655,8 +2787,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.28. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.29. 
@end table @@ -2694,8 +2826,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.29. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.30. @end table @@ -2714,8 +2846,8 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.30. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.31. @end table @@ -2734,8 +2866,50 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -3.2.31. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.32. +@end table + + + +@node acc_attach +@section @code{acc_attach} -- Let device pointer point to device-pointer target. +@table @asis +@item @emph{Description} +This function updates a pointer on the device from pointing to a host-pointer +address to pointing to the corresponding device data. + +@item @emph{C/C++}: +@multitable @columnfractions .20 .80 +@item @emph{Prototype}: @tab @code{acc_attach(h_void **ptr);} +@item @emph{Prototype}: @tab @code{acc_attach_async(h_void **ptr, int async);} +@end multitable + +@item @emph{Reference}: +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.34. +@end table + + + +@node acc_detach +@section @code{acc_detach} -- Let device pointer point to host-pointer target. +@table @asis +@item @emph{Description} +This function updates a pointer on the device from pointing to a device-pointer +address to pointing to the corresponding host data. 
+ +@item @emph{C/C++}: +@multitable @columnfractions .20 .80 +@item @emph{Prototype}: @tab @code{acc_detach(h_void **ptr);} +@item @emph{Prototype}: @tab @code{acc_detach_async(h_void **ptr, int async);} +@item @emph{Prototype}: @tab @code{acc_detach_finalize(h_void **ptr);} +@item @emph{Prototype}: @tab @code{acc_detach_finalize_async(h_void **ptr, int async);} +@end multitable + +@item @emph{Reference}: +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +3.2.35. @end table @@ -2753,7 +2927,7 @@ @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section +@uref{https://www.openacc.org, OpenACC specification v2.6}, section A.2.1.1. @end table @@ -2768,11 +2942,11 @@ @item @emph{C/C++}: @multitable @columnfractions .20 .80 -@item @emph{Prototype}: @tab @code{acc_get_current_cuda_context(void);} +@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);} @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section +@uref{https://www.openacc.org, OpenACC specification v2.6}, section A.2.1.2. @end table @@ -2782,16 +2956,16 @@ @section @code{acc_get_cuda_stream} -- Get CUDA stream handle. @table @asis @item @emph{Description} -This function returns the CUDA stream handle. This handle is the same -as used by the CUDA Runtime or Driver API's. +This function returns the CUDA stream handle for the queue @var{async}. +This handle is the same as used by the CUDA Runtime or Driver API's. @item @emph{C/C++}: @multitable @columnfractions .20 .80 -@item @emph{Prototype}: @tab @code{acc_get_cuda_stream(void);} +@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);} @end multitable @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section +@uref{https://www.openacc.org, OpenACC specification v2.6}, section A.2.1.3. 
@end table @@ -2802,16 +2976,105 @@ @table @asis @item @emph{Description} This function associates the stream handle specified by @var{stream} with -the asynchronous value specified by @var{async}. +the queue @var{async}. + +This cannot be used to change the stream handle associated with +@code{acc_async_sync}. + +The return value is not specified. + +@item @emph{C/C++}: +@multitable @columnfractions .20 .80 +@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);} +@end multitable + +@item @emph{Reference}: +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +A.2.1.4. +@end table + + + +@node acc_prof_register +@section @code{acc_prof_register} -- Register callbacks. +@table @asis +@item @emph{Description}: +This function registers callbacks. + +@item @emph{C/C++}: +@multitable @columnfractions .20 .80 +@item @emph{Prototype}: @tab @code{void acc_prof_register (acc_event_t, acc_prof_callback, acc_register_t);} +@end multitable + +@item @emph{See also}: +@ref{OpenACC Profiling Interface} + +@item @emph{Reference}: +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +5.3. +@end table + + + +@node acc_prof_unregister +@section @code{acc_prof_unregister} -- Unregister callbacks. +@table @asis +@item @emph{Description}: +This function unregisters callbacks. @item @emph{C/C++}: @multitable @columnfractions .20 .80 -@item @emph{Prototype}: @tab @code{acc_set_cuda_stream(int async void *stream);} +@item @emph{Prototype}: @tab @code{void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t);} @end multitable +@item @emph{See also}: +@ref{OpenACC Profiling Interface} + @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section -A.2.1.4. +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +5.3. +@end table + + + +@node acc_prof_lookup +@section @code{acc_prof_lookup} -- Obtain inquiry functions. 
+@table @asis +@item @emph{Description}: +Function to obtain inquiry functions. + +@item @emph{C/C++}: +@multitable @columnfractions .20 .80 +@item @emph{Prototype}: @tab @code{acc_query_fn acc_prof_lookup (const char *);} +@end multitable + +@item @emph{See also}: +@ref{OpenACC Profiling Interface} + +@item @emph{Reference}: +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +5.3. +@end table + + + +@node acc_register_library +@section @code{acc_register_library} -- Library registration. +@table @asis +@item @emph{Description}: +Function for library registration. + +@item @emph{C/C++}: +@multitable @columnfractions .20 .80 +@item @emph{Prototype}: @tab @code{void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);} +@end multitable + +@item @emph{See also}: +@ref{OpenACC Profiling Interface}, @ref{ACC_PROFLIB} + +@item @emph{Reference}: +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +5.3. @end table @@ -2825,11 +3088,14 @@ The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} are defined by section 4 of the OpenACC specification in version 2.0. +The variable @env{ACC_PROFLIB} +is defined by section 4 of the OpenACC specification in version 2.6. The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes. @menu * ACC_DEVICE_TYPE:: * ACC_DEVICE_NUM:: +* ACC_PROFLIB:: * GCC_ACC_NOTIFY:: @end menu @@ -2839,7 +3105,7 @@ @section @code{ACC_DEVICE_TYPE} @table @asis @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section +@uref{https://www.openacc.org, OpenACC specification v2.6}, section 4.1. @end table @@ -2849,12 +3115,25 @@ @section @code{ACC_DEVICE_NUM} @table @asis @item @emph{Reference}: -@uref{https://www.openacc.org, OpenACC specification v2.0}, section +@uref{https://www.openacc.org, OpenACC specification v2.6}, section 4.2. 
@end table +@node ACC_PROFLIB +@section @code{ACC_PROFLIB} +@table @asis +@item @emph{See also}: +@ref{acc_register_library}, @ref{OpenACC Profiling Interface} + +@item @emph{Reference}: +@uref{https://www.openacc.org, OpenACC specification v2.6}, section +4.3. +@end table + + + @node GCC_ACC_NOTIFY @section @code{GCC_ACC_NOTIFY} @table @asis @@ -2879,7 +3158,7 @@ streams@footnote{See "Stream Management" in "CUDA Driver API", TRM-06703-001, Version 5.5, for additional information}. -The primary means by that the asychronous functionality is accessed +The primary means by that the asynchronous functionality is accessed is through the use of those OpenACC directives which make use of the @code{async} and @code{wait} clauses. When the @code{async} clause is first used with a directive, it creates a CUDA stream. If an @@ -3066,7 +3345,311 @@ @code{acc_set_device_num()}@footnote{More complete information about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in sections 4.1 and 4.2 of the @uref{https://www.openacc.org, OpenACC} -Application Programming Interface”, Version 2.0.} +Application Programming Interface”, Version 2.6.} + + + +@c --------------------------------------------------------------------- +@c OpenACC Profiling Interface +@c --------------------------------------------------------------------- + +@node OpenACC Profiling Interface +@chapter OpenACC Profiling Interface + +@section Implementation Status and Implementation-Defined Behavior + +We're implementing the OpenACC Profiling Interface as defined by the +OpenACC 2.6 specification. We're clarifying some aspects here as +@emph{implementation-defined behavior}, while they're still under +discussion within the OpenACC Technical Committee. + +This implementation is tuned to keep the performance impact as low as +possible for the (very common) case that the Profiling Interface is +not enabled. 
This is relevant, as the Profiling Interface affects all +the @emph{hot} code paths (in the target code, not in the offloaded +code). Users of the OpenACC Profiling Interface can be expected to +understand that performance will be impacted to some degree once the +Profiling Interface has gotten enabled: for example, because of the +@emph{runtime} (libgomp) calling into a third-party @emph{library} for +every event that has been registered. + +We're not yet accounting for the fact that @cite{OpenACC events may +occur during event processing}. + +We're not yet implementing initialization via a +@code{acc_register_library} function that is either statically linked +in, or dynamically via @env{LD_PRELOAD}. +Initialization via @code{acc_register_library} functions dynamically +loaded via the @env{ACC_PROFLIB} environment variable does work, as +does directly calling @code{acc_prof_register}, +@code{acc_prof_unregister}, @code{acc_prof_lookup}. + +As currently there are no inquiry functions defined, calls to +@code{acc_prof_lookup} will always return @code{NULL}. + +There aren't separate @emph{start}, @emph{stop} events defined for the +event types @code{acc_ev_create}, @code{acc_ev_delete}, +@code{acc_ev_alloc}, @code{acc_ev_free}. It's not clear if these +should be triggered before or after the actual device-specific call is +made. We trigger them after. + +Remarks about data provided to callbacks: + +@table @asis + +@item @code{acc_prof_info.event_type} +It's not clear if for @emph{nested} event callbacks (for example, +@code{acc_ev_enqueue_launch_start} as part of a parent compute +construct), this should be set for the nested event +(@code{acc_ev_enqueue_launch_start}), or if the value of the parent +construct should remain (@code{acc_ev_compute_construct_start}). In +this implementation, the value will generally correspond to the +innermost nested event type. 
+ +@item @code{acc_prof_info.device_type} +@itemize + +@item +For @code{acc_ev_compute_construct_start}, and in presence of an +@code{if} clause with @emph{false} argument, this will still refer to +the offloading device type. +It's not clear if that's the expected behavior. + +@item +Complementary to the item before, for +@code{acc_ev_compute_construct_end}, this is set to +@code{acc_device_host} in presence of an @code{if} clause with +@emph{false} argument. +It's not clear if that's the expected behavior. + +@end itemize + +@item @code{acc_prof_info.thread_id} +Always @code{-1}; not yet implemented. + +@item @code{acc_prof_info.async} +@itemize + +@item +Not yet implemented correctly for +@code{acc_ev_compute_construct_start}. + +@item +In a compute construct, for host-fallback +execution/@code{acc_device_host} it will always be +@code{acc_async_sync}. +It's not clear if that's the expected behavior. + +@item +For @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}, +it will always be @code{acc_async_sync}. +It's not clear if that's the expected behavior. + +@end itemize + +@item @code{acc_prof_info.async_queue} +There is no @cite{limited number of asynchronous queues} in libgomp. +This will always have the same value as @code{acc_prof_info.async}. + +@item @code{acc_prof_info.src_file} +Always @code{NULL}; not yet implemented. + +@item @code{acc_prof_info.func_name} +Always @code{NULL}; not yet implemented. + +@item @code{acc_prof_info.line_no} +Always @code{-1}; not yet implemented. + +@item @code{acc_prof_info.end_line_no} +Always @code{-1}; not yet implemented. + +@item @code{acc_prof_info.func_line_no} +Always @code{-1}; not yet implemented. + +@item @code{acc_prof_info.func_end_line_no} +Always @code{-1}; not yet implemented. 
+ +@item @code{acc_event_info.event_type}, @code{acc_event_info.*.event_type} +Relating to @code{acc_prof_info.event_type} discussed above, in this +implementation, this will always be the same value as +@code{acc_prof_info.event_type}. + +@item @code{acc_event_info.*.parent_construct} +@itemize + +@item +Will be @code{acc_construct_parallel} for all OpenACC compute +constructs as well as many OpenACC Runtime API calls; should be the +one matching the actual construct, or +@code{acc_construct_runtime_api}, respectively. + +@item +Will be @code{acc_construct_enter_data} or +@code{acc_construct_exit_data} when processing variable mappings +specified in OpenACC @emph{declare} directives; should be +@code{acc_construct_declare}. + +@item +For implicit @code{acc_ev_device_init_start}, +@code{acc_ev_device_init_end}, and explicit as well as implicit +@code{acc_ev_alloc}, @code{acc_ev_free}, +@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}, +@code{acc_ev_enqueue_download_start}, and +@code{acc_ev_enqueue_download_end}, will be +@code{acc_construct_parallel}; should reflect the real parent +construct. + +@end itemize + +@item @code{acc_event_info.*.implicit} +For @code{acc_ev_alloc}, @code{acc_ev_free}, +@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}, +@code{acc_ev_enqueue_download_start}, and +@code{acc_ev_enqueue_download_end}, this currently will be @code{1} +also for explicit usage. + +@item @code{acc_event_info.data_event.var_name} +Always @code{NULL}; not yet implemented. + +@item @code{acc_event_info.data_event.host_ptr} +For @code{acc_ev_alloc}, and @code{acc_ev_free}, this is always +@code{NULL}. + +@item @code{typedef union acc_api_info} +@dots{} as printed in @cite{5.2.3. Third Argument: API-Specific +Information}. This should obviously be @code{typedef @emph{struct} +acc_api_info}. 
+ +@item @code{acc_api_info.device_api} +Possibly not yet implemented correctly for +@code{acc_ev_compute_construct_start}, +@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}: +will always be @code{acc_device_api_none} for these event types. +For @code{acc_ev_enter_data_start}, it will be +@code{acc_device_api_none} in some cases. + +@item @code{acc_api_info.device_type} +Always the same as @code{acc_prof_info.device_type}. + +@item @code{acc_api_info.vendor} +Always @code{-1}; not yet implemented. + +@item @code{acc_api_info.device_handle} +Always @code{NULL}; not yet implemented. + +@item @code{acc_api_info.context_handle} +Always @code{NULL}; not yet implemented. + +@item @code{acc_api_info.async_handle} +Always @code{NULL}; not yet implemented. + +@end table + +Remarks about certain event types: + +@table @asis + +@item @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end} +@itemize + +@item +@c See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in +@c 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c', +@c 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'. +When a compute construct triggers implicit +@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end} +events, they currently aren't @emph{nested within} the corresponding +@code{acc_ev_compute_construct_start} and +@code{acc_ev_compute_construct_end}, but they're currently observed +@emph{before} @code{acc_ev_compute_construct_start}. +It's not clear what to do: the standard asks us to provide a lot of +details to the @code{acc_ev_compute_construct_start} callback, without +(implicitly) initializing a device before? + +@item +Callbacks for these event types will not be invoked for calls to the +@code{acc_set_device_type} and @code{acc_set_device_num} functions. +It's not clear if they should be.
+ +@end itemize + +@item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end}, @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end} +@itemize + +@item +Callbacks for these event types will also be invoked for OpenACC +@emph{host_data} constructs. +It's not clear if they should be. + +@item +Callbacks for these event types will also be invoked when processing +variable mappings specified in OpenACC @emph{declare} directives. +It's not clear if they should be. + +@end itemize + +@end table + +Callbacks for the following event types will be invoked, but dispatch +and information provided therein has not yet been thoroughly reviewed: + +@itemize +@item @code{acc_ev_alloc} +@item @code{acc_ev_free} +@item @code{acc_ev_update_start}, @code{acc_ev_update_end} +@item @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end} +@item @code{acc_ev_enqueue_download_start}, @code{acc_ev_enqueue_download_end} +@end itemize + +During device initialization, and finalization, respectively, +callbacks for the following event types will not yet be invoked: + +@itemize +@item @code{acc_ev_alloc} +@item @code{acc_ev_free} +@end itemize + +Callbacks for the following event types have not yet been implemented, +so currently won't be invoked: + +@itemize +@item @code{acc_ev_device_shutdown_start}, @code{acc_ev_device_shutdown_end} +@item @code{acc_ev_runtime_shutdown} +@item @code{acc_ev_create}, @code{acc_ev_delete} +@item @code{acc_ev_wait_start}, @code{acc_ev_wait_end} +@end itemize + +For the following runtime library functions, not all expected +callbacks will be invoked (mostly concerning implicit device +initialization): + +@itemize +@item @code{acc_get_num_devices} +@item @code{acc_set_device_type} +@item @code{acc_get_device_type} +@item @code{acc_set_device_num} +@item @code{acc_get_device_num} +@item @code{acc_init} +@item @code{acc_shutdown} +@end itemize + +Aside from implicit device initialization, for the following runtime +library functions, no 
callbacks will be invoked for shared-memory +offloading devices (it's not clear if they should be): + +@itemize +@item @code{acc_malloc} +@item @code{acc_free} +@item @code{acc_copyin}, @code{acc_present_or_copyin}, @code{acc_copyin_async} +@item @code{acc_create}, @code{acc_present_or_create}, @code{acc_create_async} +@item @code{acc_copyout}, @code{acc_copyout_async}, @code{acc_copyout_finalize}, @code{acc_copyout_finalize_async} +@item @code{acc_delete}, @code{acc_delete_async}, @code{acc_delete_finalize}, @code{acc_delete_finalize_async} +@item @code{acc_update_device}, @code{acc_update_device_async} +@item @code{acc_update_self}, @code{acc_update_self_async} +@item @code{acc_map_data}, @code{acc_unmap_data} +@item @code{acc_memcpy_to_device}, @code{acc_memcpy_to_device_async} +@item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async} +@end itemize @@ -3477,7 +4060,7 @@ @chapter Reporting Bugs Bugs in the GNU Offloading and Multi Processing Runtime Library should -be reported via @uref{http://gcc.gnu.org/bugzilla/, Bugzilla}. Please add +be reported via @uref{https://gcc.gnu.org/bugzilla/, Bugzilla}. Please add "openacc", or "openmp", or both to the keywords field in the bug report, as appropriate.
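To illustrate the ACC_PROFLIB path that the OpenACC Profiling Interface chapter in this diff describes as working, here is a minimal sketch of a profiling library in C. It assumes a GCC installation that ships acc_prof.h with libgomp. The file name prof_sketch.c, the on_event callback, and the events_seen counter are hypothetical names chosen for this example, and the two event types it registers are an arbitrary choice. Because the chapter lists several callback fields as not yet implemented (thread_id, src_file, line_no, and others), the sketch reads only event_type and device_type.

```c
/* prof_sketch.c (hypothetical file name): a minimal profiling library.  */
#include <acc_prof.h>   /* OpenACC profiling types; assumed to be installed */
#include <stdio.h>

/* Illustrative counter; not part of the OpenACC API.  */
static unsigned long events_seen;

/* Callback with the acc_prof_callback signature.  It only counts and
   prints, reading fields that the chapter does not list as unimplemented.  */
static void
on_event (acc_prof_info *prof_info, acc_event_info *event_info,
          acc_api_info *api_info)
{
  (void) event_info;   /* unused in this sketch */
  (void) api_info;     /* unused in this sketch */
  ++events_seen;
  fprintf (stderr, "OpenACC event type %d on device type %d\n",
           (int) prof_info->event_type, (int) prof_info->device_type);
}

/* Entry point that the runtime calls after loading the library named
   by ACC_PROFLIB.  The first argument registers callbacks, the second
   unregisters them, the third looks up inquiry functions.  */
void
acc_register_library (acc_prof_reg reg, acc_prof_reg unreg,
                      acc_prof_lookup_func lookup)
{
  (void) unreg;    /* nothing to unregister here */
  (void) lookup;   /* no inquiry functions are defined, per the chapter */
  reg (acc_ev_compute_construct_start, on_event, acc_reg);
  reg (acc_ev_compute_construct_end, on_event, acc_reg);
}
```

On a typical GNU/Linux setup (an assumption, not something this diff states), such a file would be built as a shared object, for example with gcc -shared -fPIC, and the resulting library path would be placed in ACC_PROFLIB before running the OpenACC program. Linking the library in statically or loading it through LD_PRELOAD is described in the chapter as not yet implemented, so the sketch relies on ACC_PROFLIB only.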