Mercurial > hg > Members > amothic > TRW
annotate Paper/paper.tex @ 3:4fc34730ac45 draft
add section of conclusions
author | Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp> |
---|---|
date | Mon, 23 Jul 2012 00:23:02 +0900 |
parents | 7efb3ef94295 |
children | 03e644cc3366 |
rev | line source |
---|---|
1
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
1 \documentclass[twocolumn,twoside,9.5pt]{article} |
0 | 2 \usepackage[dvipdfmx]{graphicx} |
3 \usepackage{url} | |
4 \usepackage{picins} | |
5 \usepackage{fancyhdr} | |
6 \pagestyle{fancy} | |
7 \lhead{\parpic{\includegraphics[height=1zw,clip,keepaspectratio]{pic/emblem-bitmap.eps}}Technical Reading \& Writing} | |
8 \rhead{} | |
9 \cfoot{} | |
10 | |
11 \setlength{\topmargin}{-1in \addtolength{\topmargin}{15mm}} | |
12 \setlength{\headheight}{0mm} | |
13 \setlength{\headsep}{5mm} | |
14 \setlength{\oddsidemargin}{-1in \addtolength{\oddsidemargin}{15mm}} | |
15 \setlength{\evensidemargin}{-1in \addtolength{\evensidemargin}{15mm}} | |
16 \setlength{\textwidth}{181mm} | |
17 \setlength{\textheight}{261mm} | |
18 \setlength{\footskip}{0mm} | |
19 \pagestyle{empty} | |
20 | |
21 \begin{document} | |
22 \title{Implementation of Cerium Parallel Task Manager on Multi-core} | |
23 \author{128569G Daichi TOMA} | |
24 \date{} | |
25 \maketitle | |
26 \thispagestyle{fancy} | |
27 | |
1
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
28 \section{Introduction} |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
29 We have developed Cerium Task Manager\cite{gongo:2008a} that is a Game Framework on the PlayStation 3/Cell\cite{cell}. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
30 Cerium Task Manager new supporting parallel execution on Mac OS X and Linux. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
31 In this paper, we described implementation of existing Cerium Task Manager and a new parallel execution. |
0 | 32 |
33 \section{Cerium Task Manager}\label{section:cerium} | |
34 | |
1
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
35 Cerium Task Manager is a game framework has been developed for the Cell, and include the Rendering Engine. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
36 In Cerium Task Manager, parallel processing is described as a task. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
37 The task usually consists of a function or subroutine. also the task is setted data inputs, data outputs and dependencies. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
38 Cerium Task Manager managed those tasks, and execute. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
39 |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
40 Cerium Task Manager is available on PlayStaiton 3, Linux, Max OSX, |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
41 furthermore run the same programs on each platform. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
42 Therefore, to write a programs that does not depend on the architecture is possible. |
0 | 43 |
1
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
44 Cerium Task Manager configure pipeline at various levels of the program, |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
45 thus performance improvement. (Figure \ref{fig:scheduler}). |
0 | 46 |
1
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
47 The task is very simple because only calculate data outputs from data inputs; |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
48 nevertheless to switch to those data inputs and outputs as double buffering, |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
49 To generate gradually so as to obtain concurrency is very complicate. |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
50 |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
51 Additionally, these data management, it is necessary to the operation that specializes in architecture using parallel execution.\cite{yutaka:2011b} |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
52 Cerium Task Manager helps to do to such operation, |
fa9cfac50776
add section for Cerium Task Manager
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
0
diff
changeset
|
53 therefore be able to concentrate on the implementation of parallel computation. |
0 | 54 |
55 \begin{figure}[h] | |
56 \begin{center} | |
57 \includegraphics[scale=0.4]{./pic/scheduler.pdf} | |
58 \end{center} | |
59 \caption{Scheduler} | |
60 \label{fig:scheduler} | |
61 \end{figure} | |
62 | |
2
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
63 % Cell の説明いれる |
0 | 64 |
65 % \subsection{Mailbox} | |
66 % Mailbox は, Cell の機能の1つである. | |
67 % Mailbox は, PPE と SPE の間を双方向で, 32 bit メッセージの受け渡しが可能であり, | |
68 % FIFO キュー構造になっている. | |
69 | |
2
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
70 \section{mechanism of parallel execution on multi-core}\label{section:impl} |
0 | 71 |
2
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
72 If on a PlayStation 3, Task is assigned to each SPE, then to be executed in parallel. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
73 Cerium Task Manager possible to be executed in parallel on Mac OSX and Linux anew. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
74 |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
75 We implement a synchronized queue on Mac OS X and Linux. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
76 The synchronized queue corresponds to the Mailbox on Playstation 3. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
77 For only one thread use the synchronized queue, that was managed by a binary semaphore. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
78 Each threads has two synchronized queues for input and output, |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
79 be able to execute in parallel tasks was received under managment thread. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
80 |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
81 Furthermore, because multicore available the same memory space in comparison with Playstation 3, |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
82 we modified to pass the pointer a spots that were using the transfer DMA, aimed to improve the speed. |
0 | 83 |
2
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
84 \section{Benchmark} |
0 | 85 |
2
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
86 Performance was measured using the example of Word Count, Sort and Prime Counter. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
87 Word Count is to count number of words in the 100MBtext file. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
88 Sort is to sort in one hundred thousand pieces of numeric. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
89 Prime Counter is to enumerate all the prime numbers in the range of up to one million. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
90 for comparsion performance was measured using the same example in PlayStation 3. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
91 Both the optimization level is at the maximum. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
92 |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
93 The results are shown in Table \ref{table:benchmark}. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
94 |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
95 {\bf Experiment environment} |
0 | 96 |
97 CentOS/Xeon | |
98 \begin{small} | |
99 \begin{itemize}\small | |
100 \item OS : CentOS 6.0 | |
101 \item CPU : Intel\textregistered Xeon\textregistered X5650 @2.67GHz * 2 | |
102 \item Memory : 128GB | |
103 \item Compiler : GCC 4.4.4 | |
104 \end{itemize} | |
105 \end{small} | |
106 | |
107 PlayStation 3/Cell | |
108 \begin{small} | |
109 \begin{itemize}\small | |
110 \item OS : Yellow Dog Linux 6.1 | |
111 \item CPU : Cell Broadband Engine @ 3.2GHz | |
112 \item Memory : 256MB | |
113 \item Compiler : GCC 4.1.2 | |
114 \end{itemize} | |
115 \end{small} | |
116 | |
117 | |
118 \begin{tiny} | |
119 \begin{table}[h] | |
120 \caption{Benchmark} | |
121 \label{table:benchmark} | |
122 \small | |
123 \begin{tabular}[t]{c||r|r|r} | |
124 \hline | |
125 & Word Count & Sort & Prime Counter\\ | |
126 \hline\hline | |
127 1 CPU (Cell)& 2381 ms & 6244 ms & 2081 ms \\ | |
128 \hline | |
129 6 CPU (Cell)& 1268 ms & 1111 ms & 604 ms\\ | |
130 \hline | |
131 1 CPU (Xeon)& 354 ms & 846 ms & 266 ms\\ | |
132 \hline | |
133 6 CPU (Xeon)& 70 ms & 163 ms & 50 ms\\ | |
134 \hline | |
135 12 CPU (Xeon)& 48 ms & 127 ms & 36 ms\\ | |
136 \hline | |
137 24 CPU (Xeon)& 40 ms & 100 ms & 31 ms\\ | |
138 \hline | |
139 \end{tabular} | |
140 \end{table} | |
141 \end{tiny} | |
142 | |
143 % Word Count 354 / 70 = 5.0571 | |
144 % Sort 846 / 163 = 5.1901 | |
145 % Prime Counter 266 / 50 = 5.32 | |
2
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
146 |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
147 We use 6 CPU on CentOS, as compared with the case using 1 CPU, |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
148 about 5.1 times the speed improvement in the example of WordCount, |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
149 about 5.2 times the speed improvement in the example of Sort, |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
150 about 5.3 times the speed improvement in the example of Prime Counter. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
151 If we use 24 CPU, the speed is rising as compared with the case using 12 CPU, however, the speed improvement rate is down. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
152 This is probably concurrency is low, and that seems to be grinding to a halt speed improvement from Amdahl's law\cite{amdahl}. |
7efb3ef94295
add a section of benchmark
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
1
diff
changeset
|
153 Improvement of parallelization rate is a challenge for the future. |
0 | 154 |
155 % また, 図\ref{fig:multi_result}より, 台数効果が確認できる. | |
156 | |
3
4fc34730ac45
add section of conclusions
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
2
diff
changeset
|
157 \section{Conclusions} |
4fc34730ac45
add section of conclusions
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
2
diff
changeset
|
158 In this paper, we describe a new mechanism of parallel execution and implementation of existing Cerium Task Manager. |
4fc34730ac45
add section of conclusions
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
2
diff
changeset
|
159 By using a new implementation mechanism of parallel execution, You can correspond to a multi-core processor environment on Mac OSX and Linux. |
0 | 160 |
3
4fc34730ac45
add section of conclusions
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
2
diff
changeset
|
161 To improve the rate of speed as future work when the number of processors has increased. |
4fc34730ac45
add section of conclusions
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
2
diff
changeset
|
162 In addition, Cerium Task Manager has many type of task, is a drawback of such description. |
4fc34730ac45
add section of conclusions
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
2
diff
changeset
|
163 This can be solved by the system description the dependency of the task rather than on the user side. |
0 | 164 |
3
4fc34730ac45
add section of conclusions
Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
parents:
2
diff
changeset
|
165 \nocite{cell_abi, opencl} |
0 | 166 % \nocite{yutaka:2010a, cell_abi, cell_cpp, cell_sdk, libspe2, ydl, clay200912, fix200609} |
167 \bibliographystyle{junsrt} | |
168 \bibliography{cerium.bib,book.bib} | |
169 | |
170 \end{document} |