Mercurial > hg > Members > amothic > TRW
annotate Paper/paper.tex @ 4:03e644cc3366 draft
add section of cell
author | Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp> |
---|---|
date | Mon, 23 Jul 2012 06:51:08 +0900 |
parents | 4fc34730ac45 |
children | 17c01f69db69 |
rev | line source |
---|---|
1
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
1 \documentclass[twocolumn,twoside,9.5pt]{article} |
0 | 2 \usepackage[dvipdfmx]{graphicx} |
3 \usepackage{url} | |
4 \usepackage{picins} | |
5 \usepackage{fancyhdr} | |
6 \pagestyle{fancy} | |
7 \lhead{\parpic{\includegraphics[height=1zw,clip,keepaspectratio]{pic/emblem-bitmap.eps}}Technical Reading \& Writing} | |
8 \rhead{} | |
9 \cfoot{} | |
10 | |
11 \setlength{\topmargin}{-1in \addtolength{\topmargin}{15mm}} | |
12 \setlength{\headheight}{0mm} | |
13 \setlength{\headsep}{5mm} | |
14 \setlength{\oddsidemargin}{-1in \addtolength{\oddsidemargin}{15mm}} | |
15 \setlength{\evensidemargin}{-1in \addtolength{\evensidemargin}{15mm}} | |
16 \setlength{\textwidth}{181mm} | |
17 \setlength{\textheight}{261mm} | |
18 \setlength{\footskip}{0mm} | |
19 \pagestyle{empty} | |
20 | |
21 \begin{document} | |
22 \title{Implementation of Cerium Parallel Task Manager on Multi-core} | |
23 \author{128569G Daichi TOMA} | |
24 \date{} | |
25 \maketitle | |
26 \thispagestyle{fancy} | |
27 | |
1
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
28 \section{Introduction} |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
29 We have developed Cerium Task Manager\cite{gongo:2008a} that is a Game Framework on the PlayStation 3/Cell\cite{cell}. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
30 Cerium Task Manager new supporting parallel execution on Mac OS X and Linux. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
31 In this paper, we described implementation of existing Cerium Task Manager and a new parallel execution. |
0 | 32 |
33 \section{Cerium Task Manager}\label{section:cerium} | |
34 | |
1
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
35 Cerium Task Manager is a game framework has been developed for the Cell, and include the Rendering Engine. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
36 In Cerium Task Manager, parallel processing is described as a task. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
37 The task usually consists of a function or subroutine. also the task is setted data inputs, data outputs and dependencies. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
38 Cerium Task Manager managed those tasks, and execute. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
39 |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
40 Cerium Task Manager is available on PlayStaiton 3, Linux, Max OSX, |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
41 furthermore run the same programs on each platform. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
42 Therefore, to write a programs that does not depend on the architecture is possible. |
0 | 43 |
1
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
44 Cerium Task Manager configure pipeline at various levels of the program, |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
45 thus performance improvement. (Figure \ref{fig:scheduler}). |
0 | 46 |
1
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
47 The task is very simple because only calculate data outputs from data inputs; |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
48 nevertheless to switch to those data inputs and outputs as double buffering, |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
49 To generate gradually so as to obtain concurrency is very complicate. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
50 |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
51 Additionally, these data management, it is necessary to the operation that specializes in architecture using parallel execution.\cite{yutaka:2011b} |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
52 Cerium Task Manager helps to do to such operation, |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
53 therefore be able to concentrate on the implementation of parallel computation. |
0 | 54 |
55 \begin{figure}[h] | |
56 \begin{center} | |
57 \includegraphics[scale=0.4]{./pic/scheduler.pdf} | |
58 \end{center} | |
59 \caption{Scheduler} | |
60 \label{fig:scheduler} | |
61 \end{figure} | |
62 | |
4 | 63 \subsection{Cell Broadband Engine} |
64 Cell Broadband Engine is a microprocessor architecture jointly developed by Sony, Sony Computer Entertainment, Toshiba, and IBM. | |
65 The first major commercial application of Cell Broadband Engine was in Sony's PlayStation 3 game console. | |
66 The Cell processor can be split into four components: | |
67 external input and output strctures, the main processor called the Power Processing Element (PPE), | |
68 eight fully functional co-processors called the Synergistic Processing Elements or SPEs, | |
69 and a specialized high-bandwidth circular data bus connecting the PPE, input/output elements and the SPEs, | |
70 called the Element Interconnect Bus or EIB (Figure \ref{fig:cell_arch}). | |
71 | |
72 | |
73 \begin{figure}[htb] | |
74 \begin{center} | |
75 \includegraphics[scale=0.4]{./pic/cell-main.pdf} | |
76 \end{center} | |
77 \caption{Cell Broadband Engine Architecture} | |
78 \label{fig:cell_arch} | |
79 \end{figure} | |
80 | |
2
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
81 % Cell の説明いれる |
0 | 82 |
83 % \subsection{Mailbox} | |
84 % Mailbox は, Cell の機能の1つである. | |
85 % Mailbox は, PPE と SPE の間を双方向で, 32 bit メッセージの受け渡しが可能であり, | |
86 % FIFO キュー構造になっている. | |
87 | |
2
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
88 \section{mechanism of parallel execution on multi-core}\label{section:impl} |
0 | 89 |
2
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
90 If on a PlayStation 3, Task is assigned to each SPE, then to be executed in parallel. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
91 Cerium Task Manager possible to be executed in parallel on Mac OSX and Linux anew. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
92 |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
93 We implement a synchronized queue on Mac OS X and Linux. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
94 The synchronized queue corresponds to the Mailbox on Playstation 3. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
95 For only one thread use the synchronized queue, that was managed by a binary semaphore. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
96 Each threads has two synchronized queues for input and output, |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
97 be able to execute in parallel tasks was received under managment thread. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
98 |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
99 Furthermore, because multicore available the same memory space in comparison with Playstation 3, |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
100 we modified to pass the pointer a spots that were using the transfer DMA, aimed to improve the speed. |
0 | 101 |
2
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
102 \section{Benchmark} |
0 | 103 |
2
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
104 Performance was measured using the example of Word Count, Sort and Prime Counter. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
105 Word Count is to count number of words in the 100MBtext file. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
106 Sort is to sort in one hundred thousand pieces of numeric. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
107 Prime Counter is to enumerate all the prime numbers in the range of up to one million. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
108 for comparsion performance was measured using the same example in PlayStation 3. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
109 Both the optimization level is at the maximum. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
110 |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
111 The results are shown in Table \ref{table:benchmark}. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
112 |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
113 {\bf Experiment environment} |
0 | 114 |
115 CentOS/Xeon | |
116 \begin{small} | |
117 \begin{itemize}\small | |
118 \item OS : CentOS 6.0 | |
119 \item CPU : Intel\textregistered Xeon\textregistered X5650 @2.67GHz * 2 | |
120 \item Memory : 128GB | |
121 \item Compiler : GCC 4.4.4 | |
122 \end{itemize} | |
123 \end{small} | |
124 | |
125 PlayStation 3/Cell | |
126 \begin{small} | |
127 \begin{itemize}\small | |
128 \item OS : Yellow Dog Linux 6.1 | |
129 \item CPU : Cell Broadband Engine @ 3.2GHz | |
130 \item Memory : 256MB | |
131 \item Compiler : GCC 4.1.2 | |
132 \end{itemize} | |
133 \end{small} | |
134 | |
135 | |
136 \begin{tiny} | |
137 \begin{table}[h] | |
138 \caption{Benchmark} | |
139 \label{table:benchmark} | |
140 \small | |
141 \begin{tabular}[t]{c||r|r|r} | |
142 \hline | |
143 & Word Count & Sort & Prime Counter\\ | |
144 \hline\hline | |
145 1 CPU (Cell)& 2381 ms & 6244 ms & 2081 ms \\ | |
146 \hline | |
147 6 CPU (Cell)& 1268 ms & 1111 ms & 604 ms\\ | |
148 \hline | |
149 1 CPU (Xeon)& 354 ms & 846 ms & 266 ms\\ | |
150 \hline | |
151 6 CPU (Xeon)& 70 ms & 163 ms & 50 ms\\ | |
152 \hline | |
153 12 CPU (Xeon)& 48 ms & 127 ms & 36 ms\\ | |
154 \hline | |
155 24 CPU (Xeon)& 40 ms & 100 ms & 31 ms\\ | |
156 \hline | |
157 \end{tabular} | |
158 \end{table} | |
159 \end{tiny} | |
160 | |
161 % Word Count 354 / 70 = 5.0571 | |
162 % Sort 846 / 163 = 5.1901 | |
163 % Prime Counter 266 / 50 = 5.32 | |
2
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
164 |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
165 We use 6 CPU on CentOS, as compared with the case using 1 CPU, |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
166 about 5.1 times the speed improvement in the example of WordCount, |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
167 about 5.2 times the speed improvement in the example of Sort, |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
168 about 5.3 times the speed improvement in the example of Prime Counter. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
169 If we use 24 CPU, the speed is rising as compared with the case using 12 CPU, however, the speed improvement rate is down. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
170 This is probably concurrency is low, and that seems to be grinding to a halt speed improvement from Amdahl's law\cite{amdahl}. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
171 Improvement of parallelization rate is a challenge for the future. |
0 | 172 |
173 % また, 図\ref{fig:multi_result}より, 台数効果が確認できる. | |
174 | |
3
4fc34730ac45
add section of conclusions
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
2
diff
changeset
|
175 \section{Conclusions} |
4fc34730ac45
add section of conclusions
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
2
diff
changeset
|
176 In this paper, we describe a new mechanism of parallel execution and implementation of existing Cerium Task Manager. |
4fc34730ac45
add section of conclusions
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
2
diff
changeset
|
177 By using a new implementation mechanism of parallel execution, You can correspond to a multi-core processor environment on Mac OSX and Linux. |
0 | 178 |
3
4fc34730ac45
add section of conclusions
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
2
diff
changeset
|
179 To improve the rate of speed as future work when the number of processors has increased. |
4fc34730ac45
add section of conclusions
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
2
diff
changeset
|
180 In addition, Cerium Task Manager has many type of task, is a drawback of such description. |
4fc34730ac45
add section of conclusions
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
2
diff
changeset
|
181 This can be solved by the system description the dependency of the task rather than on the user side. |
0 | 182 |
4 | 183 \nocite{cell_abi, opencl, clay200912} |
0 | 184 % \nocite{yutaka:2010a, cell_abi, cell_cpp, cell_sdk, libspe2, ydl, clay200912, fix200609} |
185 \bibliographystyle{junsrt} | |
186 \bibliography{cerium.bib,book.bib} | |
187 | |
188 \end{document} |