annotate paper/data_parallel.tex @ 0:9e88a388ec83

first commit
author Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
date Tue, 05 Nov 2013 23:18:04 +0900
parents
children f4b3de446113
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
1 \section{Ceriumにおけるデータ並列}\label{data_parallel}
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
2 OpenCLで充分な並列度を得るには、データ並列による実行をサポートした方が良い。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
3 CeriumでOpenCLのデータ並列を使うために、iterateというAPIを用意した。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
4
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
5 ベンチマークをとるために、まずはCPU(many core)上でデータ並列の機構を実装した。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
6 OpenCLでデータ並列を行う際は、NDRangeの引数でワークアイテムのサイズを設定し、以下のようにkernelを書けばよい。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
7
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
8 \begin{Verbatim}[fontsize=\footnotesize,xleftmargin=1cm]
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
9 __kernel void
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
10 multi(__global const float *i_data1,
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
11 __global const float *i_data2,
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
12 __global float *o_data)
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
13 {
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
14 int i = get_global_id(0);
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
15 o_data[i] = i_data1[i]*i_data2[i];
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
16 }
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
17
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
18 \end{Verbatim}
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
19 kernelを複数生成し、各kernelは自分が担当するindexをget\_global\_id APIで取得し、
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
20 その部分だけ計算を行う。CPUで実行する場合もGPU実行時のkernelとなるべく近い形式で記述できるようにする。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
21
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
22 \subsection{データ並列実行の機構}
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
23 データ並列で実行する場合はspawn APIではなく、iterate APIでTaskを生成すればよい。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
24 Scheduler内で引数分のTaskを生成し、それぞれに自分が担当するindexをパラメタとして設定していく。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
25 iterateにはlengthを引数として渡し、lengthの値と渡したlengthの個数で次元数や
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
26 ワークアイテムのサイズをSchedulerが計算する。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
27 CPU実行時のkernelは以下のように記述する。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
28
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
29 \begin{Verbatim}[fontsize=\footnotesize,xleftmargin=1cm]
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
30 static int // kernel
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
31 run(SchedTask *s,void *rbuf, void *wbuf)
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
32 {
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
33 float *indata1,*indata2,*outdata;
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
34
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
35 indata1 = (float*)s->get_input(rbuf, 0);
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
36 indata2 = (float*)s->get_input(rbuf, 1);
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
37 outdata = (float*)s->get_output(wbuf, 0);
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
38
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
39 long i = (long)s->get_param(0);
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
40 outdata[i]=indata1[i]*indata2[i];
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
41 return 0;
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
42 }
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
43 \end{Verbatim}
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
44
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
45 \subsection{データ並列におけるindex割り当ての実装}
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
46 Taskを生成するとき、dimensionとワークアイテムのサイズをもとに各Taskが担当するindexを計算し、set\_paramする。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
47 kernelはget\_paramでそのindexを取得してデータ並列で実行する。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
48 get\_param APIがOpenCLのget\_global\_id APIに相当する。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
49
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
50 例として、cpu数4、一次元で10個のdataにたいしてデータ並列実行を行った場合、
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
51 各CPUが担当するindexは表:\ref{table:data_parallel_index}のようになる。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
52
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
53 この例だと各CPUに対するindexの割り当ては、
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
54 CPU0はindex0、4、8、
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
55 CPU1はindex1、5、9、
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
56 CPU2はindex2、6、
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
57 CPU3はindex3、7となっている。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
58
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
59 \begin{tiny}
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
60 \begin{table}[h]
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
61 \begin{center}
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
62 \caption{data並列実行時のindexの割り当て}
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
63 \label{table:data_parallel_index}
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
64 \small
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
65 \begin{tabular}[t]{c||c|c|c|c}
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
66 \hline
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
67 stage&CPU0& CPU1&CPU2&CPU3 \\
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
68 \hline
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
69 1&0&1&2&3 \\
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
70 \hline
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
71 2&4&5&6&7 \\
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
72 \hline
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
73 3&8&9& & \\
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
74 \hline
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
75 \end{tabular}
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
76 \end{center}
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
77 \end{table}
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
78
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
79 \end{tiny}
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
80 この実装により、Ceriumでデータ並列の実行が可能になった。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
81 並列プログラミングだと、並列化するTaskが全部同一であるという事は少なくない。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
82 その際、Taskを生成する部分をループで回すことなく、簡単なsyntaxで記述できる。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
83
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
84 データ並列で実行する場合は、inputとoutputを各Taskで共有するため、少ないコピーですむ。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
85 CPUならメモリ領域がTaskとmanagerで同じなので、dataのコピーで大きいオーバーヘッドにはならない。
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
86 しかしCellとGPUはメモリ領域が異なるため、dataコピーのオーバーヘッドが大きく、
9e88a388ec83 first commit
Shohei KOKUBO <e105744@ie.u-ryukyu.ac.jp>
parents:
diff changeset
87 データ並列による高速化が見込める。