annotate Paper/paper.tex @ 1:fa9cfac50776 draft
add section for Cerium Task Manager
author | Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp> |
---|---|
date | Sun, 22 Jul 2012 14:24:53 +0900 |
parents | c0689037215f |
children | 7efb3ef94295 |
rev | line source |
---|---|
\documentclass[twocolumn,twoside,9.5pt]{article}
\usepackage[dvipdfmx]{graphicx}
\usepackage{url}
\usepackage{picins}
\usepackage{fancyhdr}
\pagestyle{fancy}
\lhead{\parpic{\includegraphics[height=1zw,clip,keepaspectratio]{pic/emblem-bitmap.eps}}Technical Reading \& Writing}
\rhead{}
\cfoot{}

\setlength{\topmargin}{-1in \addtolength{\topmargin}{15mm}}
\setlength{\headheight}{0mm}
\setlength{\headsep}{5mm}
\setlength{\oddsidemargin}{-1in \addtolength{\oddsidemargin}{15mm}}
\setlength{\evensidemargin}{-1in \addtolength{\evensidemargin}{15mm}}
\setlength{\textwidth}{181mm}
\setlength{\textheight}{261mm}
\setlength{\footskip}{0mm}
\pagestyle{empty}

\begin{document}
\title{Implementation of Cerium Parallel Task Manager on Multi-core}
\author{128569G Daichi TOMA}
\date{}
\maketitle
\thispagestyle{fancy}

\section{Introduction}
We have developed Cerium Task Manager\cite{gongo:2008a}, a game framework for the PlayStation 3/Cell\cite{cell}.
Cerium Task Manager now supports parallel execution on Mac OS X and Linux.
In this paper, we describe the implementation of the existing Cerium Task Manager and the new parallel execution mechanism.

\section{Cerium Task Manager}\label{section:cerium}

Cerium Task Manager is a game framework developed for the Cell, and it includes a rendering engine.
In Cerium Task Manager, parallel processing is described in terms of tasks.
A task usually consists of a function or subroutine, together with its data inputs, data outputs, and dependencies.
Cerium Task Manager manages these tasks and executes them.

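The task description above can be illustrated with a small sketch. This is not the Cerium API: the names \texttt{Task}, \texttt{sum\_task}, and \texttt{run\_all} are hypothetical, and the sketch runs tasks sequentially, only to show how a function, its data inputs and outputs, and its dependencies might fit together.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical task record: a function plus its inputs, outputs, and dependencies.
struct Task {
    std::function<void(const std::vector<int>&, std::vector<int>&)> run;
    std::vector<int> input;
    std::vector<int> output;
    std::vector<std::size_t> wait_for;  // indices of tasks that must finish first
    bool done = false;
};

// Example task body (an assumption for illustration): writes the sum of the inputs.
void sum_task(const std::vector<int>& in, std::vector<int>& out) {
    int s = 0;
    for (int x : in) s += x;
    out.push_back(s);
}

// Repeatedly runs every task whose dependencies have finished.
// Returns the indices of the tasks in the order they were executed.
// Tasks caught in a dependency cycle are never run.
std::vector<std::size_t> run_all(std::vector<Task>& tasks) {
    std::vector<std::size_t> order;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (std::size_t i = 0; i < tasks.size(); ++i) {
            Task& t = tasks[i];
            if (t.done) continue;
            bool ready = true;
            for (std::size_t d : t.wait_for) ready = ready && tasks[d].done;
            if (!ready) continue;
            t.run(t.input, t.output);  // compute data outputs from data inputs
            t.done = true;
            order.push_back(i);
            progressed = true;
        }
    }
    return order;
}
```

In this sketch a task that lists another task in \texttt{wait\_for} is held back until that task has finished, which is the role the dependencies play in the description above.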
Cerium Task Manager is available on PlayStation 3, Linux, and Mac OS X,
and the same programs run on each platform.
Therefore, it is possible to write programs that do not depend on the architecture.

Cerium Task Manager configures pipelines at various levels of the program,
which improves performance (Figure \ref{fig:scheduler}).

A task itself is very simple because it only calculates data outputs from data inputs.
Nevertheless, switching those data inputs and outputs with double buffering
and generating tasks gradually so as to obtain concurrency is very complicated.

Additionally, this data management requires operations specialized for the architecture on which the parallel execution runs\cite{yutaka:2011b}.
Cerium Task Manager takes care of such operations,
so programmers can concentrate on implementing the parallel computation.

\begin{figure}[h]
\begin{center}
\includegraphics[scale=0.4]{./pic/scheduler.pdf}
\end{center}
\caption{Scheduler}
\label{fig:scheduler}
\end{figure}

\section{Mechanism for Parallel Execution on Multi-core}\label{section:impl}

On the PlayStation 3/Cell, tasks are assigned to the individual SPEs and executed in parallel.

We have now made parallel execution possible on Mac OS X and Linux as well.
This was done by porting the mechanism to the Mac OS X and Linux versions of Cerium Task Manager,
using a Synchronized Queue as the counterpart of the Mailbox on the PlayStation 3/Cell.
The queue is guarded by a binary semaphore so that only one thread operates on it at a time.
Each thread has two Synchronized Queues, one for input and one for output,
and receives tasks from the management thread and executes them in parallel.

In addition, unlike on the PlayStation 3/Cell, all CPUs share the same memory space,
so we replaced the DMA transfers with pointer passing to improve speed.

% \subsection{Mailbox}
% The Mailbox is one of the features of the Cell.
% It allows 32-bit messages to be passed in both directions between the PPE and the SPEs,
% and is structured as a FIFO queue.


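The queue arrangement described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the Cerium source: the class and function names (\texttt{BinarySemaphore}, \texttt{SynchronizedQueue}, \texttt{worker}, \texttt{run\_demo}) are hypothetical, the semaphore is built from a C++11 mutex and condition variable, the "task" is simply an integer to be squared, and a negative value is used as a stop signal.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Binary semaphore built from a mutex and a condition variable (C++11).
class BinarySemaphore {
public:
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return available_; });
        available_ = false;
    }
    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            available_ = true;
        }
        cv_.notify_one();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool available_ = true;
};

// FIFO queue on which only one thread operates at a time.
template <typename T>
class SynchronizedQueue {
public:
    void send(T value) {
        guard_.acquire();
        items_.push_back(std::move(value));
        guard_.release();
    }
    // Non-blocking receive; returns false when the queue is empty.
    bool try_receive(T& out) {
        guard_.acquire();
        bool ok = !items_.empty();
        if (ok) {
            out = std::move(items_.front());
            items_.pop_front();
        }
        guard_.release();
        return ok;
    }
private:
    BinarySemaphore guard_;
    std::deque<T> items_;
};

// Worker thread: takes tasks from its input queue and posts results to its
// output queue. It polls for simplicity; a real implementation would block.
void worker(SynchronizedQueue<int>& in, SynchronizedQueue<int>& out) {
    for (;;) {
        int task;
        if (!in.try_receive(task)) { std::this_thread::yield(); continue; }
        if (task < 0) break;   // stop signal (an assumption of this sketch)
        out.send(task * task); // the "task": square the input
    }
}

// Management thread: hands tasks 0..n_tasks-1 to the workers round-robin,
// then collects and sums the results.
long run_demo(std::size_t n_workers, int n_tasks) {
    std::vector<SynchronizedQueue<int>> in(n_workers);
    std::vector<SynchronizedQueue<int>> out(n_workers);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n_workers; ++i)
        threads.emplace_back(worker, std::ref(in[i]), std::ref(out[i]));
    for (int t = 0; t < n_tasks; ++t)
        in[static_cast<std::size_t>(t) % n_workers].send(t);
    for (std::size_t i = 0; i < n_workers; ++i) in[i].send(-1);
    for (std::thread& th : threads) th.join();
    long sum = 0;
    for (std::size_t i = 0; i < n_workers; ++i) {
        int v;
        while (out[i].try_receive(v)) sum += v;
    }
    return sum;
}
```

Because every thread shares one address space, the queues could equally carry pointers to task data instead of copies, which corresponds to the pointer passing that replaces DMA transfers in the text above.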
\section{Benchmark}
We measured performance using three examples: Word Count, Sort, and Prime Counter.
The inputs are, respectively, counting the words in a 100 MB text file, sorting 100,000 items, and counting all the primes up to 1,000,000.
For comparison, we ran the same examples on the PlayStation 3/Cell.
In both cases, the optimization level was set to the maximum.

Table \ref{table:benchmark} shows the results.

{\bf Experimental environment}

92 | |
93 CentOS/Xeon | |
94 \begin{small} | |
95 \begin{itemize}\small | |
96 \item OS : CentOS 6.0 | |
97 \item CPU : Intel\textregistered Xeon\textregistered X5650 @2.67GHz * 2 | |
98 \item Memory : 128GB | |
99 \item Compiler : GCC 4.4.4 | |
100 \end{itemize} | |
101 \end{small} | |
102 | |
103 PlayStation 3/Cell | |
104 \begin{small} | |
105 \begin{itemize}\small | |
106 \item OS : Yellow Dog Linux 6.1 | |
107 \item CPU : Cell Broadband Engine @ 3.2GHz | |
108 \item Memory : 256MB | |
109 \item Compiler : GCC 4.1.2 | |
110 \end{itemize} | |
111 \end{small} | |
112 | |

\begin{tiny}
\begin{table}[h]
\caption{Benchmark}
\label{table:benchmark}
\small
\begin{tabular}[t]{c||r|r|r}
\hline
& Word Count & Sort & Prime Counter\\
\hline\hline
1 CPU (Cell)& 2381 ms & 6244 ms & 2081 ms \\
\hline
6 CPU (Cell)& 1268 ms & 1111 ms & 604 ms\\
\hline
1 CPU (Xeon)& 354 ms & 846 ms & 266 ms\\
\hline
6 CPU (Xeon)& 70 ms & 163 ms & 50 ms\\
\hline
12 CPU (Xeon)& 48 ms & 127 ms & 36 ms\\
\hline
24 CPU (Xeon)& 40 ms & 100 ms & 31 ms\\
\hline
\end{tabular}
\end{table}
\end{tiny}

% Word Count 354 / 70 = 5.0571
% Sort 846 / 163 = 5.1901
% Prime Counter 266 / 50 = 5.32
Table \ref{table:benchmark} shows that on CentOS, using 6 CPUs gives a speedup over 1 CPU
of about 5.1 times for Word Count, about 5.2 times for Sort, and about 5.3 times for Prime Counter.
However, although 24 CPUs are still faster than 12 CPUs, the rate of improvement drops.
From Amdahl's law\cite{amdahl}, we attribute this to a low parallelization ratio, which prevents the additional CPUs from being used effectively and causes the speedup to level off.
Improving the parallelization ratio is future work.

% Moreover, Figure \ref{fig:multi_result} confirms the scaling with the number of processors.

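As a rough illustration of this argument (a back-of-the-envelope estimate from Table \ref{table:benchmark}, not a measurement of the actual parallelization ratio), Amdahl's law gives the speedup on $N$ CPUs for a parallel fraction $p$ as
\begin{equation}
S(N) = \frac{1}{(1 - p) + p/N} .
\end{equation}
Solving for $p$ and inserting the Word Count result on 24 CPUs, $S(24) = 354/40 \approx 8.85$, yields
\begin{equation}
p = \frac{1 - 1/S(N)}{1 - 1/N} = \frac{1 - 40/354}{1 - 1/24} \approx 0.926 .
\end{equation}
Under this simple model, which assumes a single fixed $p$ and ignores all other overheads, the speedup could not exceed $1/(1 - p) \approx 13.4$ however many CPUs are added.
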
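\section{Conclusion}
In this paper, we described the implementation of the existing Cerium Task Manager and the new mechanism for parallel execution.
With the newly implemented mechanism, Cerium Task Manager supports multiprocessor environments on Mac OS X and Linux.

As future work, we will raise the parallelization ratio and improve the speedup obtained as the number of processors grows.
In addition, the current Cerium Task Manager has drawbacks: the number of Task types grows, and descriptions are more cumbersome than even in OpenCL\cite{opencl}.
We believe this can be solved by having the system, rather than the user, describe the dependencies between Tasks.
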
\nocite{cell_abi}
% \nocite{yutaka:2010a, cell_abi, cell_cpp, cell_sdk, libspe2, ydl, clay200912, fix200609}
\bibliographystyle{junsrt}
\bibliography{cerium.bib,book.bib}

\end{document}