27
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
1 title: Gears OS
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
2 author: Tatsuki IHA
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
3 profile:
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
4 lang: Japanese
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
5 code-engine: coderay
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
6
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
7 ## 研究目的
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
8 - 当研究室では 処理の単位を Code Gear、 データの単位を Data Gear を用いて 信頼性が高い並列処理を行う Gears OS を開発している
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
9 - Gears OS では Task を Code Gear と実行するときに必要な Input Data Gear と出力するための Output Data Gear の組で表現される。 Input Data Gear/Output Data Gear によって依存関係が決定し、それにそって並列実行を行う.
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
10 - 信頼性の確保はモデルチェック、検証等を使用して行う。この信頼性のための計算は通常の計算とは別の階層のメタ計算として記述する。
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
11 - また、 メタ計算は信頼性の他に CPU, GPU などの実行環境の切り替え, データ拡張等の柔軟性を提供する。
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
12 - 本研究では、 Gears OS の並列処理機構の実装を行う。また、並列処理の検証をメタ計算として記述することで、 並列処理の信頼性を保証する。
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
13
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
14 ## 今週
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
15 - Gears での GPU の実行(結構無理やり)
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
16 - bitonicSort
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
17
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
18 ## GPUWorker
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
19 - cudaInit() はもともと, createGPUWorker() で行っている
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
20 - しかし, 実際 CUDA の API を呼び出すのは, createGPUWorker を呼んだスレッドではなく, createGPUWorker で作られた Thread
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
21 - そうすると code=201(CUDA_ERROR_INVALID_CONTEXT) がでる
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
22 - cuda の API は cuInit() を呼んだ thread でしか呼び出せない
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
23 - thread を入れ替える技の痕跡があるが, thread 作った先の startCUDAWorker で cudaInit することに
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
24
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
25 ``` c
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
26 static void startCUDAWorker(Worker* worker) {
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
27 struct CUDAWorker* cudaWorker = &worker->worker->CUDAWorker;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
28 cudaInit(cudaWorker, 0);
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
29 cudaWorker->context = NEW(struct Context);
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
30 initContext(cudaWorker->context);
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
31 Gearef(cudaWorker->context, Worker)->worker = (union Data*)worker;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
32 goto meta(cudaWorker->context, worker->taskReceive);
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
33 }
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
34 ```
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
35
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
36 ## MultiDimIterator
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
37 - 取り敢えず Iterator は全て CUDAWorker(作られていれば) にで実行するように
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
38 - 将来的には meta部分で切り替えられるようにする
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
39 - CUDA に対応させるために, 前作った 1次元の iterator を 3次元に対応
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
40
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
41 ``` c
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
42 Iterator* createMultiDimIterator(struct Context* context, int x, int y, int z) {
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
43 struct Iterator* iterator = new Iterator();
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
44 struct MultiDimIterator* multiDimIterator = new MultiDimIterator();
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
45 iterator->iterator = (union Data*)multiDimIterator;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
46 iterator->exec = C_execMultiDimIterator;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
47 iterator->barrier = C_barrierMultiDimIterator;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
48 multiDimIterator->x = x;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
49 multiDimIterator->y = y;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
50 multiDimIterator->z = z;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
51 multiDimIterator->count = x * y * z;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
52 multiDimIterator->counterX = 0;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
53 multiDimIterator->counterY = 0;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
54 multiDimIterator->counterZ = 0;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
55 return iterator;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
56 }
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
57 ```
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
58
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
59 - exec時は, GPU があれば, task に gpu を設定して spawn する
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
60
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
61 ``` c
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
62 __code execMultiDimIterator(struct MultiDimIterator* iterator, struct TaskManager* taskManager, struct Context* task, int numGPU, __code next(...)) {
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
63 // No GPU device
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
64 if (numGPU == 0) {
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
65 goto meta(context, C_execMultiDimIterator1);
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
66 }
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
67 task->iterate = 1;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
68 task->gpu = 1;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
69 taskManager->taskManager = (union Data*)task->taskManager;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
70 taskManager->context = task;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
71 taskManager->next = next;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
72 goto meta(context, task->taskManager->spawn);
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
73 }
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
74 ```
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
75
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
76 ## TaskManager
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
77 - Iterator でgpu がセットされていれば, CUDAWorker に投げる
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
78
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
79 ```
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
80 if (task->gpu) {
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
81 task->workerId = taskManagerImpl->sendGPUWorkerIndex;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
82 if(++taskManagerImpl->sendGPUWorkerIndex >= taskManager->cpu) {
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
83 taskManagerImpl->sendGPUWorkerIndex = taskManager->gpu;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
84 }
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
85 } else {
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
86 task->workerId = taskManagerImpl->sendCPUWorkerIndex;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
87 if(++taskManagerImpl->sendCPUWorkerIndex >= taskManager->maxCPU) {
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
88 taskManagerImpl->sendCPUWorkerIndex = taskManager->cpu;
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
89 }
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
90 }
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
91 ```
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
92
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
93 ## CUDAExec()
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
94 - Task(Context) から Input&output DataGear をGPU に送信するために変換 & Task の実行をしている部分
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
95 - まだここは実行するタスクの内容で書き換えているのでCeriumのように一般化してデータを送信する必要あり
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
96 - meta 部分で全部押し込んじゃう?
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
97 - 必要なのはGPUに送るデータのサイズなので, meta部分でサイズを持たせて, それを使うようにする
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
98 - block の数は単純に iterator の値を適応(この辺は Cerium と一緒), 次にBenchmarkを載せるが, 遅い
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
99 - http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-hierarchy
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
100 - http://horacio9573.no-ip.org/cuda/group__CUDA__EXEC_gb8f3dc3031b40da29d5f9a7139e52e15.html
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
101
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
102 ``` c
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
103 checkCudaErrors(cuLaunchKernel(context->function,
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
104 iterator->x, iterator->y, iterator->z,
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
105 1, 1, 1,
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
106 0, NULL, args, NULL));
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
107 ```
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
108
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
109 ## 一応Benchmark的なもの
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
110 - BitonicSort を GPUで動かしてみました
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
111 - 環境 firefly
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
112 - GPU Quadro k5000
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
113 - サイズは 2^23
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
114 - swap のタスク数は 2^22 にしたかったが, CUDA\_ERROR\_INVALID\_VALUE が出てしまった, 2^15=32768 だと Error なしだったのでそれで動かしてみました
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
115 - cpu は 2^15 で分割するとその分Contextを作ってしまい, とても遅いので32個のTaskに分割
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
116
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
117 <table border="1" align='center' width='50%'>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
118 <tbody>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
119 <tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
120 <td style="text-align: center;">Processors</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
121 <td style="text-align: center;">Time(ms)</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
122 </tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
123 <tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
124 <td style="text-align: center;">1 CPU</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
125 <td style="text-align: right;">31161</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
126 </tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
127 <tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
128 <td style="text-align: center;">2 CPUs</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
129 <td style="text-align: right;">16277</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
130 </tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
131 <tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
132 <td style="text-align: center;">4 CPUs</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
133 <td style="text-align: right;">8466</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
134 </tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
135 <tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
136 <td style="text-align: center;">8 CPUs</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
137 <td style="text-align: right;">4457</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
138 </tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
139 <tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
140 <td style="text-align: center;">12 CPUs</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
141 <td style="text-align: right;">3299</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
142 </tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
143 <tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
144 <td style="text-align: center;">GPU(CUDA)</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
145 <td style="text-align: right;">12083</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
146 </tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
147 </tbody>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
148 </table>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
149
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
150 ## ちょっと GPU が遅いので
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
151 - block 内の Threadも指定してみる
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
152 - 指定値は 取り敢えず最大値の 1024
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
153 - Task 数は 2^22
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
154
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
155 ```
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
156 checkCudaErrors(cuLaunchKernel(context->function,
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
157 iterator->x/1024, iterator->y, iterator->z,
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
158 1024, 1, 1,
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
159 0, NULL, args, NULL));
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
160 ```
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
161
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
162 <table border="1" align='center' width='50%'>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
163 <tbody>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
164 <tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
165 <td style="text-align: center;">Processors</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
166 <td style="text-align: center;">Time(ms)</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
167 </tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
168 <tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
169 <td style="text-align: center;">4 CPUs</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
170 <td style="text-align: right;">8466</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
171 </tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
172 <tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
173 <td style="text-align: center;">8 CPUs</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
174 <td style="text-align: right;">4457</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
175 </tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
176 <tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
177 <td style="text-align: center;">12 CPUs</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
178 <td style="text-align: right;">3299</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
179 </tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
180 <tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
181 <td style="text-align: center;">GPU(CUDA)</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
182 <td style="text-align: right;">12083</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
183 </tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
184 <tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
185 <td style="text-align: center;">GPU(CUDA Block Threads)</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
186 <td style="text-align: right;">5642</td>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
187 </tr>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
188 </tbody>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
189 </table>
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
190
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
191 ## いろいろtips
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
192 - cuda sample の ./deviceQuery で, GPU の性能を見れる
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
193
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
194 ```
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
195 firefly@one$ /Developer/NVIDIA/CUDA-8.0/samples/1_Utilities/deviceQuery/deviceQuery
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
196 /Developer/NVIDIA/CUDA-8.0/samples/1_Utilities/deviceQuery/deviceQuery Starting...
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
197
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
198 CUDA Device Query (Runtime API) version (CUDART static linking)
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
199
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
200 Detected 1 CUDA Capable device(s)
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
201
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
202 Device 0: "Quadro K5000"
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
203 CUDA Driver Version / Runtime Version 8.0 / 8.0
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
204 CUDA Capability Major/Minor version number: 3.0
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
205 Total amount of global memory: 4096 MBytes (4294508544 bytes)
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
206 ( 8) Multiprocessors, (192) CUDA Cores/MP: 1536 CUDA Cores
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
207 GPU Max Clock rate: 706 MHz (0.71 GHz)
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
208 Memory Clock rate: 2700 Mhz
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
209 Memory Bus Width: 256-bit
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
210 L2 Cache Size: 524288 bytes
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
211 Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
212 Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
213 Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
214 Total amount of constant memory: 65536 bytes
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
215 Total amount of shared memory per block: 49152 bytes
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
216 Total number of registers available per block: 65536
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
217 Warp size: 32
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
218 Maximum number of threads per multiprocessor: 2048
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
219 Maximum number of threads per block: 1024
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
220 Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
221 Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
222 Maximum memory pitch: 2147483647 bytes
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
223 Texture alignment: 512 bytes
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
224 Concurrent copy and kernel execution: Yes with 2 copy engine(s)
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
225 Run time limit on kernels: Yes
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
226 Integrated GPU sharing Host Memory: No
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
227 Support host page-locked memory mapping: Yes
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
228 Alignment requirement for Surfaces: Yes
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
229 Device has ECC support: Disabled
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
230 Device supports Unified Addressing (UVA): Yes
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
231 Device PCI Domain ID / Bus ID / location ID: 0 / 5 / 0
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
232 Compute Mode:
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
233 < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
234
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
235 deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = Quadro K5000
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
236 ```
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
237
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
238 ## 自分のpc に cuda を入れたときにちょっとハマった場所
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
239 - cuda version
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
240 - V8.0.61
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
241 - Apple clang version
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
242 - Apple LLVM version 8.1.0 (clang-802.0.42)
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
243 - The version ('80100') of the host compiler ('Apple clang') is not supported と怒られる
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
244 - https://github.com/arrayfire/arrayfire/issues/1384#issuecomment-291471533 を見ると, 8.0.61 では xcode 8.3 の CLT は not support(8.1.0 は xocde8.3 用?になってる) なので 8.2用を入れることに
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
245 - 8.2 を入れるとApple LLVM version 8.0.0 (clang-800.0.42.1)になりました
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
246 - 複数入れられるかなぁ
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
247 - xcode 経由で入れると pathが変わるらしい(dmg 経由で入れると /Library/Developer/CommandLineTools/usr/bin)
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
248 - xcode 経由と言うより, xcode にくっついてる感じ preference -> location から変更できる
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
249 - ``sudo xcode-select -s path/to`` で切り替え
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
250 - めでたくcompileできた
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
251 - 最近 CUDA の Versionが 9 になっており, Apple LLVM 8.1.0 でも動くようになっていた
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
252 - http://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x/index.html#system-requirements
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
253 - しかし, xcode の version も 9.0 にあがっており, Apple LLVM の version 9.0.0に上がっているため, 動くかどうか不明
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
254
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
255 ## 次は
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
256 - RBTree の deletion
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
257 - cuda.c の抽象化
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
258 - Perl script をどうにかする
|
Tatsuki IHA <innparusu@cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
259 - 並列処理の検証をどうするか
|