% Paper/paper.tex, revision 5:17c01f69db69 (finish)
% Author: Daichi TOMA <toma@cr.ie.u-ryukyu.ac.jp>
% Date: Mon, 23 Jul 2012 11:58:20 +0900
\documentclass[twocolumn,twoside,11pt]{article}
\usepackage[dvipdfmx]{graphicx}
\usepackage{url}
\usepackage{picins}
\usepackage{fancyhdr}
\pagestyle{fancy}
\lhead{\parpic{\includegraphics[height=1zw,clip,keepaspectratio]{pic/emblem-bitmap.eps}}Technical Reading \& Writing}
\rhead{}
\cfoot{}

\setlength{\topmargin}{-1in \addtolength{\topmargin}{20mm}}
\setlength{\headheight}{0mm}
\setlength{\headsep}{5mm}
\setlength{\oddsidemargin}{-1in \addtolength{\oddsidemargin}{20mm}}
\setlength{\evensidemargin}{-1in \addtolength{\evensidemargin}{20mm}}
\setlength{\textwidth}{171mm}
\setlength{\textheight}{256mm}
\setlength{\footskip}{0mm}
\pagestyle{empty}

\begin{document}
\title{Implementation of Cerium Parallel Task Manager on Multi-core}
\author{128569G Daichi TOMA}
\date{}
\maketitle
\thispagestyle{fancy}

\section{Introduction}
We have developed Cerium Task Manager\cite{gongo:2008a}, a game framework for the PlayStation 3/Cell\cite{cell}.
Cerium Task Manager now supports parallel execution on Mac OS X and Linux.
In this paper, we describe the implementation of the existing Cerium Task Manager and the new parallel execution mechanism.

\section{Cerium Task Manager}\label{section:cerium}

Cerium Task Manager is a game framework developed for the Cell, and it includes a rendering engine.
In Cerium Task Manager, each unit of parallel processing is described as a task.
A task usually consists of a function or subroutine, and it is given data inputs, data outputs, and dependencies on other tasks.
Cerium Task Manager manages these tasks and executes them.

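As a minimal sketch of this task model, the following C++ fragment represents a task as a function together with its data input, data output, and dependencies, and runs each task once the tasks it waits for have finished.
The names \texttt{Task}, \texttt{wait\_for}, and \texttt{run\_all} are hypothetical and chosen only for illustration; they are not the Cerium API, and the sequential loop stands in for a real scheduler.

\begin{verbatim}
```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical task record: a function plus its input, output, and
// dependencies. Illustrative only; this is not the Cerium API.
struct Task {
    std::function<void(const std::vector<int>&, std::vector<int>&)> run;
    const std::vector<int>* input = nullptr;   // data input
    std::vector<int>* output = nullptr;        // data output
    std::vector<std::size_t> wait_for;         // tasks that must finish first
    bool done = false;
};

// Repeatedly run every task whose dependencies are all done.
// Returns the execution order; tasks blocked by a dependency cycle
// are never run and do not appear in the result.
std::vector<std::size_t> run_all(std::vector<Task>& tasks) {
    std::vector<std::size_t> order;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (std::size_t i = 0; i < tasks.size(); ++i) {
            Task& t = tasks[i];
            if (t.done) continue;
            bool ready = true;
            for (std::size_t dep : t.wait_for) {
                if (!tasks[dep].done) { ready = false; break; }
            }
            if (!ready) continue;
            t.run(*t.input, *t.output);
            t.done = true;
            order.push_back(i);
            progressed = true;
        }
    }
    return order;
}
```
\end{verbatim}

If task 0 lists task 1 in \texttt{wait\_for} and reads the buffer that task 1 writes, \texttt{run\_all} executes task 1 first and then task 0.
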
Cerium Task Manager is available on PlayStation 3, Linux, and Mac OS X,
and the same programs run on each platform.
Therefore, it is possible to write programs that do not depend on the architecture.

Cerium Task Manager configures pipelines at various levels of the program,
which improves performance (Figure \ref{fig:scheduler}).

A task itself is very simple because it only calculates data outputs from data inputs.
Nevertheless, switching those data inputs and outputs by double buffering,
and generating tasks gradually so as to obtain concurrency, is very complicated.

In addition, this data management requires operations that are specialized for the architecture executing the tasks in parallel\cite{yutaka:2011b}.
Cerium Task Manager takes care of such operations,
so the programmer can concentrate on implementing the parallel computation.
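
To illustrate why double buffering adds bookkeeping around an otherwise simple task, the following C++ sketch processes data chunk by chunk with two buffers: while one buffer is being processed, the next chunk is loaded into the other.
This is a sketch under stated assumptions, not Cerium code: \texttt{load\_chunk}, \texttt{process}, and \texttt{sum\_double\_buffered} are hypothetical names, and \texttt{std::async} stands in for an asynchronous data transfer such as a DMA read.

\begin{verbatim}
```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Stand-in for a data transfer: copy one chunk of `src`.
std::vector<int> load_chunk(const std::vector<int>& src,
                            std::size_t begin, std::size_t len) {
    const std::size_t end = std::min(src.size(), begin + len);
    return std::vector<int>(src.begin() + begin, src.begin() + end);
}

// Stand-in for a task body: compute an output from an input buffer.
long process(const std::vector<int>& buffer) {
    long sum = 0;
    for (int x : buffer) sum += x;
    return sum;
}

// Process `src` chunk by chunk with two buffers: while the current
// buffer is processed, the next chunk is loaded into the other one.
long sum_double_buffered(const std::vector<int>& src, std::size_t chunk) {
    if (src.empty() || chunk == 0) return 0;
    const std::size_t n_chunks = (src.size() + chunk - 1) / chunk;
    std::vector<int> buffers[2];
    buffers[0] = load_chunk(src, 0, chunk);
    int current = 0;
    long total = 0;
    for (std::size_t i = 0; i < n_chunks; ++i) {
        std::future<std::vector<int>> next;
        if (i + 1 < n_chunks) {
            // Start loading the next chunk in the background.
            next = std::async(std::launch::async, load_chunk,
                              std::cref(src), (i + 1) * chunk, chunk);
        }
        total += process(buffers[current]);     // overlaps with the load
        if (next.valid()) {
            buffers[1 - current] = next.get();  // wait, then swap buffers
            current = 1 - current;
        }
    }
    return total;
}
```
\end{verbatim}

The task body \texttt{process} stays trivial; the loop around it, which starts the next load, waits for it, and swaps buffers, is the part that a framework can take over.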

\begin{figure}[h]
\begin{center}
\includegraphics[scale=0.4]{./pic/scheduler.pdf}
\end{center}
\caption{Scheduler}
\label{fig:scheduler}
\end{figure}

\subsection{Cell Broadband Engine}
Cell Broadband Engine is a microprocessor architecture jointly developed by Sony, Sony Computer Entertainment, Toshiba, and IBM.
The first major commercial application of Cell Broadband Engine was in Sony's PlayStation 3 game console.
The Cell processor can be split into four components:
external input and output structures, the main processor called the Power Processing Element (PPE),
eight fully functional co-processors called the Synergistic Processing Elements or SPEs,
and a specialized high-bandwidth circular data bus connecting the PPE, input/output elements and the SPEs,
called the Element Interconnect Bus or EIB (Figure \ref{fig:cell_arch}).


\begin{figure}[htb]
\begin{center}
\includegraphics[scale=0.4]{./pic/cell-main.pdf}
\end{center}
\caption{Cell Broadband Engine Architecture}
\label{fig:cell_arch}
\end{figure}

The Cell processor marries the SPEs and the PPE via EIB to give access,
via fully cache coherent DMA (direct memory access), to both main memory and to other external data storage.
To make the best of EIB, and to overlap computation and data transfer,
each of the nine processing elements (PPE and SPEs) is equipped with a DMA engine.
Since the SPE's load/store instructions can only access its own local memory,
each SPE entirely depends on DMAs to transfer data to and from the main memory and other SPEs' local memories.
A DMA operation can transfer either a single block area of size up to 16KB, or a list of 2 to 2048 such blocks.
One of the major design decisions in the architecture of Cell is the use of DMAs as a central means of intra-chip data transfer,
with a view to enabling maximal asynchrony and concurrency in data processing inside a chip\cite{2006:CMC}.

The PPE, which is capable of running a conventional operating system,
has control over the SPEs and can start, stop, interrupt, and schedule processes running on the SPEs.
To this end the PPE has additional instructions relating to control of the SPEs.
Unlike SPEs, the PPE can read and write the main memory and the local memories of SPEs through the standard load/store instructions.
Despite having Turing complete architectures,
the SPEs are not fully autonomous and require the PPE to prime them before they can do any useful work.
Though most of the ``horsepower'' of the system comes from the synergistic processing elements,
the use of DMA as a method of data transfer and the limited local memory footprint of each SPE pose a major challenge
to software developers who wish to make the most of this horsepower,
demanding careful hand-tuning of programs to extract maximal performance from this CPU.

The PPE and bus architecture includes various modes of operation giving different levels of memory protection,
allowing areas of memory to be protected from access by specific processes running on the SPEs or the PPE.

Both the PPE and SPE are RISC architectures with a fixed-width 32-bit instruction format.
The PPE contains a 64-bit general purpose register set (GPR), a 64-bit floating point register set (FPR),
and a 128-bit Altivec register set. The SPE contains 128-bit registers only.
These can be used for scalar data types ranging from 8 bits to 128 bits
in size or for SIMD computations on a variety of integer and floating point formats.
System memory addresses for both the PPE and SPE are expressed as 64-bit values
for a theoretical address range of $2^{64}$ bytes (16 exabytes or 16,777,216 terabytes).
In practice, not all of these bits are implemented in hardware.
Local store addresses internal to the SPU processor are expressed as a 32-bit word.
In documentation relating to Cell a word is always taken to mean 32 bits, a doubleword means 64 bits, and a quadword means 128 bits.


\subsubsection{Power Processor Element (PPE)}
The PPE (Figure \ref{fig:ppe}) is the Power Architecture based,
two-way multithreaded core acting as the controller for the eight SPEs,
which handle most of the computational workload. The PPE will work
with conventional operating systems due to its similarity to other 64-bit PowerPC processors,
while the SPEs are designed for vectorized floating point code execution.
The PPE contains a 64 KiB level 1 cache (32 KiB instruction and 32 KiB data) and a 512 KiB level 2 cache.
The size of a cache line is 128 bytes.
The PPE can complete two double precision operations per clock cycle using a scalar fused multiply-add instruction,
which translates to 6.4 GFLOPS at 3.2 GHz;
or eight single precision operations per clock cycle with a vector fused multiply-add instruction,
which translates to 25.6 GFLOPS at 3.2 GHz.
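
Both peak figures follow from multiplying the number of floating point operations completed per clock cycle by the clock frequency. As a worked check that uses only the numbers quoted in this section:
\begin{eqnarray*}
2 \times 3.2 \times 10^{9}\ \mathrm{s^{-1}} &=& 6.4\ \mathrm{GFLOPS}, \\
8 \times 3.2 \times 10^{9}\ \mathrm{s^{-1}} &=& 25.6\ \mathrm{GFLOPS}.
\end{eqnarray*}
The factor of eight is consistent with a 128-bit vector holding four 32-bit values, each of which takes part in one multiply and one add per fused instruction.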

\begin{figure}[htb]
\begin{center}
\includegraphics[scale=0.4]{./pic/PPE.pdf}
\end{center}
\caption{PPE (Power Processor Element)}
\label{fig:ppe}
\end{figure}

\subsubsection{Synergistic Processing Elements (SPE)}
Each SPE (Figure \ref{fig:spe}) is composed of a ``Synergistic Processing Unit'' (SPU) and a ``Memory Flow Controller'' (MFC; DMA, MMU, and bus interface)\cite{cell-ibm}.
An SPE is a RISC processor with 128-bit SIMD organization\cite{cell-ieee} for single and double precision instructions.
With the current generation of the Cell, each SPE contains a 256 KiB embedded SRAM for instructions and data,
called ``Local Storage'' (not to be mistaken for ``Local Memory'' in Sony's documents, which refers to the VRAM),
which is visible to the PPE and can be addressed directly by software. Each SPE can support up to 4 GiB of local store memory.
The local store does not operate like a conventional CPU cache since it is neither transparent
to software nor does it contain hardware structures that predict which data to load. Each SPE contains a 128-bit,
128-entry register file and measures 14.5 mm$^2$ on a 90 nm process.
An SPE can operate on sixteen 8-bit integers, eight 16-bit integers, four 32-bit integers,
or four single-precision floating-point numbers in a single clock cycle, as well as a memory operation.
Note that the SPU cannot directly access system memory;
the 64-bit virtual memory addresses formed by the SPU must be passed from the SPU
to the SPE memory flow controller (MFC) to set up a DMA operation within the system address space.
At 3.2 GHz, each SPE gives a theoretical 25.6 GFLOPS of single precision performance.

\begin{figure}[htb]
\begin{center}
\includegraphics[scale=0.5]{./pic/SPE.pdf}
\end{center}
\caption{SPE (Synergistic Processing Element)}
\label{fig:spe}
\end{figure}

% Add a description of the Cell here

% \subsection{Mailbox}
% The Mailbox is one of the features of the Cell.
% A Mailbox can pass 32-bit messages in both directions between the PPE and an SPE,
% and it is structured as a FIFO queue.

\section{Mechanism of parallel execution on multi-core}\label{section:impl}

On a PlayStation 3, tasks are assigned to the SPEs and executed in parallel.
Cerium Task Manager can now also execute tasks in parallel on Mac OS X and Linux.

We implemented a synchronized queue on Mac OS X and Linux.
The synchronized queue corresponds to the Mailbox on PlayStation 3.
The queue is guarded by a binary semaphore so that only one thread can use it at a time.
Each thread has two synchronized queues, one for input and one for output,
and executes in parallel the tasks that it receives from the management thread.

Furthermore, unlike on PlayStation 3, the cores of a multi-core processor share the same memory space,
so we replaced the DMA transfers with pointer passing in order to improve speed.

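The mechanism described in this section can be sketched in C++ as follows.
This is an illustration under stated assumptions rather than the Cerium implementation: the names \texttt{SyncQueue}, \texttt{SumTask}, \texttt{worker}, and \texttt{parallel\_sum} are hypothetical, and a \texttt{std::mutex} with a condition variable plays the role of the binary semaphore.
Each worker thread owns an input queue and an output queue, and tasks travel through the queues as pointers, so the data itself is never copied.

\begin{verbatim}
```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical synchronized queue: the mutex lets only one thread
// at a time touch `items_`.
template <typename T>
class SyncQueue {
public:
    void send(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(value);
        }
        not_empty_.notify_one();
    }

    // Blocks until a value is available.
    T recv() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = items_.front();
        items_.pop_front();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::deque<T> items_;
};

// A task refers to its input by pointer: the worker reads the caller's
// data in the shared address space instead of receiving a copy.
struct SumTask {
    const std::vector<int>* input;
    long result;
};

// Worker loop: receive task pointers, execute them, report completion.
// A null pointer asks the worker to stop.
void worker(SyncQueue<SumTask*>& in, SyncQueue<SumTask*>& out) {
    for (;;) {
        SumTask* task = in.recv();
        if (task == nullptr) break;
        long sum = 0;
        for (int x : *task->input) sum += x;
        task->result = sum;
        out.send(task);
    }
}

// Management thread: one worker per chunk, each with its own queue pair.
long parallel_sum(const std::vector<std::vector<int>>& chunks) {
    const std::size_t n = chunks.size();
    std::vector<SyncQueue<SumTask*>> in(n), out(n);
    std::vector<SumTask> tasks(n);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n; ++i) {
        tasks[i].input = &chunks[i];
        tasks[i].result = 0;
        threads.emplace_back(worker, std::ref(in[i]), std::ref(out[i]));
    }
    for (std::size_t i = 0; i < n; ++i) {
        in[i].send(&tasks[i]);  // pass a pointer, not the data
        in[i].send(nullptr);    // then ask the worker to stop
    }
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += out[i].recv()->result;
    for (std::thread& t : threads) t.join();
    return total;
}
```
\end{verbatim}

Because only pointers pass through the queues, the cost of handing over a task does not depend on the size of its data, which is the effect that replacing DMA transfers with pointer passing aims for.
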
\section{Benchmark}

We measured performance using three examples: Word Count, Sort, and Prime Counter.
Word Count counts the number of words in a 100 MB text file.
Sort sorts one hundred thousand numbers.
Prime Counter enumerates all the prime numbers up to one million.
For comparison, we measured the same examples on PlayStation 3.
In both environments, the compiler optimization level was set to the maximum.

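To give an idea of how an example such as Prime Counter can be divided among CPUs, the following C++ sketch counts the primes up to a limit by giving each thread one contiguous block of the range.
It is an assumption made for illustration and not the benchmark program itself: the function names, the use of trial division, and the block partitioning are all hypothetical choices.

\begin{verbatim}
```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Trial division; adequate for a sketch with limits around one million.
bool is_prime(unsigned n) {
    if (n < 2) return false;
    for (unsigned d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Count the primes in [2, limit] by splitting the range into one
// contiguous block per thread; each thread writes only its own slot.
std::size_t count_primes(unsigned limit, unsigned n_threads) {
    if (n_threads == 0) n_threads = 1;
    std::vector<std::size_t> partial(n_threads, 0);
    std::vector<std::thread> threads;
    const unsigned block = limit / n_threads + 1;
    for (unsigned t = 0; t < n_threads; ++t) {
        const unsigned begin = t * block;
        const unsigned end = begin + block;  // exclusive
        threads.emplace_back([&partial, t, begin, end, limit] {
            std::size_t count = 0;
            for (unsigned n = begin; n < end && n <= limit; ++n) {
                if (is_prime(n)) ++count;
            }
            partial[t] = count;
        });
    }
    for (std::thread& th : threads) th.join();
    std::size_t total = 0;
    for (std::size_t c : partial) total += c;
    return total;
}
```
\end{verbatim}

Equal-sized blocks do not take equal time, because larger numbers need more trial divisions; uneven blocks of this kind are one possible reason why a speedup stays below the number of CPUs.
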
The results are shown in Table \ref{table:benchmark}.

{\bf Experiment environment}

CentOS/Xeon
\begin{small}
\begin{itemize}\small
\item OS : CentOS 6.0
\item CPU : Intel\textregistered\ Xeon\textregistered\ X5650 @ 2.67GHz * 2
\item Memory : 128GB
\item Compiler : GCC 4.4.4
\end{itemize}
\end{small}

PlayStation 3/Cell
\begin{small}
\begin{itemize}\small
\item OS : Yellow Dog Linux 6.1
\item CPU : Cell Broadband Engine @ 3.2GHz
\item Memory : 256MB
\item Compiler : GCC 4.1.2
\end{itemize}
\end{small}


\begin{table}[h]
\caption{Benchmark results (execution time)}
\label{table:benchmark}
{\scriptsize
\begin{tabular}[t]{c||r|r|r}
\hline
& Word Count & Sort & Prime Counter\\
\hline\hline
1 CPU (Cell)& 2381 ms & 6244 ms & 2081 ms \\
\hline
6 CPU (Cell)& 1268 ms & 1111 ms & 604 ms\\
\hline
1 CPU (Xeon)& 354 ms & 846 ms & 266 ms\\
\hline
6 CPU (Xeon)& 70 ms & 163 ms & 50 ms\\
\hline
12 CPU (Xeon)& 48 ms & 127 ms & 36 ms\\
\hline
24 CPU (Xeon)& 40 ms & 100 ms & 31 ms\\
\hline
\end{tabular}}
\end{table}

% Word Count 354 / 70 = 5.0571
% Sort 846 / 163 = 5.1901
% Prime Counter 266 / 50 = 5.32

With 6 CPUs on CentOS, compared with 1 CPU,
the speed improves by a factor of about 5.1 for Word Count,
about 5.2 for Sort,
and about 5.3 for Prime Counter.
With 24 CPUs the execution is faster than with 12 CPUs; however, the rate of speed improvement declines.
This is probably because the degree of concurrency is low, so that the speedup levels off as Amdahl's law\cite{amdahl} predicts.
Improving the parallelization rate is future work.
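
The speedup figures quoted above can be reproduced from Table \ref{table:benchmark}.
As a worked example, let $T(n)$ be the measured execution time on $n$ CPUs; the symbols $T$, $S$, $E$, and $p$ are introduced here only for illustration.
The speedup $S(n)$ and the parallel efficiency $E(n)$ are
\[
S(n) = \frac{T(1)}{T(n)}, \qquad E(n) = \frac{S(n)}{n}.
\]
For Word Count on the Xeon this gives
\begin{eqnarray*}
S(6) &=& 354/70 \approx 5.06, \quad E(6) \approx 0.84, \\
S(12) &=& 354/48 \approx 7.38, \quad E(12) \approx 0.61, \\
S(24) &=& 354/40 = 8.85, \quad E(24) \approx 0.37.
\end{eqnarray*}
The speedup keeps growing, but the efficiency falls as CPUs are added.
Amdahl's law describes this kind of saturation with a single parameter $p$, the fraction of the execution time that can be parallelized:
\[
S(n) = \frac{1}{(1 - p) + p/n} \le \frac{1}{1 - p}.
\]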

% In addition, Figure \ref{fig:multi_result} confirms the scaling effect with the number of processors.

\section{Conclusions}
In this paper, we described a new mechanism for parallel execution and the implementation of the existing Cerium Task Manager.
With the new parallel execution mechanism, Cerium Task Manager supports multi-core processor environments on Mac OS X and Linux.

Improving the speedup as the number of processors increases is future work.
In addition, Cerium Task Manager has many types of tasks, which is a drawback of this style of description.
This can be solved by having the system, rather than the user, describe the dependencies between tasks.

\nocite{cell_abi, opencl, clay200912, cell_wiki, cell_cpp, cell_sdk, libspe2}
% \nocite{yutaka:2010a, cell_abi, cell_cpp, cell_sdk, libspe2, ydl, clay200912, fix200609}
\bibliographystyle{junsrt}
\bibliography{cerium.bib,book.bib}

\end{document}