annotate paper/benchmark.tex @ 6:1702e6278518

add English Abstract
author Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
date Wed, 06 May 2015 17:12:15 +0900
parents 0127effb8fcd
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
1 \section{Benchmark}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
2 本章では、WordCount, FFT を例題として用い、本研究で実装した GpuScheduler および CudaScheduler の測定を行う。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
3
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
4 測定環境
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
5 \begin{itemize}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
6 \item OS : MacOS 10.9.2
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
7 \item CPU : 2*2.66GHz 6-Core Intel Xeon
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
8 \item GPU : NVIDIA Quadro K5000 4096MB
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
9 \item Memory : 16GB 1333MHz DDR3
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
10 \item Compiler : Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
11 \end{itemize}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
12
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
13 \section{WordCount}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
14 今回は 100MB のテキストファイルに対して WordCount を行なった。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
15 表:\ref{table:wordcount}は実行結果である。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
16
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
17 \begin{table}[!h]
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
18 \begin{center}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
19 \small
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
20 \begin{tabular}[t]{c||r} \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
21 & Run Time \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
22 1 CPU & 0.73s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
23 2 CPU & 0.38s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
24 4 CPU & 0.21s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
25 8 CPU & 0.12s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
26 OpenCL(no pipeline) & 48.32s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
27 OpenCL(pipeline) & 46.74s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
28 OpenCL Data Parallel & 0.38s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
29 CUDA(no pipeline) & 55.71s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
30 CUDA(pipeline) & 10.26s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
31 CUDA Data Parallel & 0.71s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
32 \end{tabular}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
33 \caption{WordCount}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
34 \label{table:wordcount}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
35 \end{center}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
36 \end{table}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
37
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
38 パイプライン処理を行うことで CUDA では5.4倍の性能向上が見られた。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
39 しかし、OpenCL ではパイプライン処理による性能向上が見られなかった。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
40 OpenCL と CUDA を用いたそれぞれの Scheduler はほぼ同等な実装である。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
41 OpenCL でパイプライン処理を行うために実行機構を見直す必要がある。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
42 一方で、データ並列による実行は 1CPU に対して OpenCL では1.9倍、CUDA では1.02倍という結果になった。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
43 どちらもタスク並列による実行よりは優れた結果になっている。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
44 CUDA によるデータ並列実行の機構を見直す必要がある。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
45
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
46 \subsection{FFT}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
47 次に、フーリエ変換と周波数フィルタによる画像処理を行う例題を利用し測定を行う。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
48 使用する画像のサイズは512*512で、画像に対して High Pass Filter をかけて変換を行う。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
49 表:\ref{table:fft}は実行結果である。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
50
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
51 \begin{table}[!h]
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
52 \begin{center}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
53 \small
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
54 \begin{tabular}[t]{c||r} \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
55 & Run Time \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
56 1 CPU & 0.48s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
57 2 CPU & 0.26s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
58 4 CPU & 0.17s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
59 8 CPU & 0.11s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
60 OpenCL & 0.09s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
61 CUDA & 0.21s \\ \hline
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
62 \end{tabular}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
63 \label{table:fft}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
64 \caption{FFT}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
65 \end{center}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
66 \end{table}
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
67
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
68 1CPU に対して OpenCL ではの5.3倍、CUDA では2.2倍の性能向上が見られた。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
69 しかし、WordCount の場合と同様に OpenCL と CUDA で差がある。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
70 WordCount と FFT の結果から CudaScheduler によるデータ並列実行機構を見直す必要がある。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
71 また、FFT の OpenCL の kernel は cl\_float2 というベクター型を用いている。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
72 CUDA では cl\_float2 を float に変換して演算している。
0127effb8fcd first commit
Nozomi Teruya <e125769@ie.u-ryukyu.ac.jp>
parents:
diff changeset
73 OpenCL ではベクターの演算なので、その部分に最適化がかかっており結果が良くなっている可能性がある。