\chapter{Cell Broadband Engine}
This chapter describes the Cell architecture, which served as the experimental platform for this study.

The Cell Broadband Engine \cite{cell} is a multi-core CPU developed jointly by
Sony Computer Entertainment, Sony, IBM, and Toshiba.
Cell consists of one control-oriented processor core, the PPE (PowerPC Processor Element), and
eight compute-oriented processor cores, the SPEs (Synergistic Processor Elements).
The processor cores are connected by a high-speed bus called the
EIB (Element Interconnect Bus). The EIB is also connected to main memory and
external I/O devices, and
each processor core accesses data through the EIB.
On the PS3Linux system (Yellow Dog Linux 6.2) used in this study, six SPEs
are available (\figref{cell_arch}).

The programmer is responsible for dividing work appropriately between these
two kinds of CPU core, the PPE and the SPEs, according to the task at hand.

\begin{figure}[htb]
\begin{center}
\includegraphics[scale=0.8]{./images/cell-main.pdf}
\end{center}
\caption{Cell Broadband Engine Architecture}
\label{fig:cell_arch}
\end{figure}


\section{PPE (PowerPC Processor Element)}

The PPE is the main processor of the Cell Broadband Engine: a general-purpose
processor that can use multiple SPEs as compute cores.
It is responsible for I/O to main memory and external devices and for controlling the SPEs.
The PPU (PowerPC Processor Unit) is the unit that
performs the PPE's computation and has an instruction set based on the
PowerPC architecture. The PPSS (PowerPC Processor Storage Subsystem) is
the unit that controls data access from the PPU to main memory
(\figref{ppe}).

\begin{figure}[htb]
\begin{center}
\includegraphics[scale=0.6]{./images/PPE.pdf}
\end{center}
\caption{PPE (PowerPC Processor Element)}
\label{fig:ppe}
\end{figure}

\section{SPE (Synergistic Processor Element)} \label{sec:spe}

Each SPE has a 256 KB memory area called the Local Store (LS), which is the only
memory the SPE can reference directly. This allows the SPEs to compute in parallel
without placing load on the bus. An SPE cannot access main memory
directly. Instead, it sends DMA (Direct Memory Access) commands
through channels to the MFC (Memory Flow Controller),
one of the components of the SPE (\figref{spe}).

\begin{figure}[htb]
\begin{center}
\includegraphics[scale=0.8]{./images/SPE.pdf}
\end{center}
\caption{SPE (Synergistic Processor Element)}
\label{fig:spe}
\end{figure}

\section{Basic Capabilities of Cell}

\subsection{DMA (Direct Memory Access)}

As described in Section~\ref{sec:spe}, an SPE cannot directly access any memory
other than its LS. To access data in the main memory used by the PPE,
it uses DMA (Direct Memory Access).
A DMA transfer moves data between a peripheral device and memory without going through the CPU.
On Cell, data is transferred between main memory and the LS. The procedure is as follows.

\begin{enumerate}
\item The SPE program issues a DMA transfer command to the
MFC (Memory Flow Controller).
\item The MFC starts the DMA transfer via the DMA Controller.
The SPE program does not stall during the transfer.
\item If the result of the transfer is needed, the SPE program explicitly waits for the transfer to complete.
\end{enumerate}

DMA transfers are subject to several restrictions on data size and addresses.
If the transfer is 16 bytes or larger, the data size must be a multiple of 16 bytes, and
the source and destination addresses must be aligned on 16-byte boundaries.
If the transfer is smaller than 16 bytes, the data size must be 1, 2, 4, or 8 bytes, and
the addresses must be naturally aligned for the transfer size (that is, aligned on a
boundary equal to the transfer size in bytes).
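
The size and alignment rules above can be expressed as a small check. The following is a minimal sketch in portable C, not SPU code: the function name dma\_args\_are\_valid and its parameters (ls\_addr for the Local Store address, ea for the main-memory address) are hypothetical names introduced here for illustration, and the check covers only the rules stated in the text.

\begin{verbatim}
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when addr is a multiple of align (align must be a power of two). */
static bool is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}

/*
 * Checks the size and alignment rules described in the text.
 * ls_addr: address in the Local Store, ea: address in main memory.
 * Both names are hypothetical and used only for this illustration.
 */
bool dma_args_are_valid(uint64_t ls_addr, uint64_t ea, size_t size)
{
    if (size >= 16) {
        /* 16 bytes or more: size is a multiple of 16, both ends 16-byte aligned */
        return size % 16 == 0 && is_aligned(ls_addr, 16) && is_aligned(ea, 16);
    }
    if (size == 1 || size == 2 || size == 4 || size == 8) {
        /* under 16 bytes: natural alignment for the transfer size */
        return is_aligned(ls_addr, size) && is_aligned(ea, size);
    }
    return false;   /* any other size (including 0) is rejected */
}
\end{verbatim}

For example, a 64-byte transfer between 16-byte-aligned addresses passes the check, while a 24-byte transfer is rejected because 24 is not a multiple of 16.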

\subsection{SIMD (Single Instruction Multiple Data)}
On Cell, SIMD operations can be performed using the 128-bit registers
implemented in the SPEs. SIMD is a computation scheme in which a single instruction
operates on multiple data elements (\figref{simd}).

\begin{figure}[htb]
\begin{center}
\includegraphics[scale=0.8]{./images/SIMD.pdf}
\end{center}
\caption{SIMD (Single Instruction Multiple Data)}
\label{fig:simd}
\end{figure}

The scalar operation in \figref{simd} corresponds to the following code.

\begin{verbatim}
int a[4] = {1, 2, 3, 4};
int b[4] = {5, 6, 7, 8};
int c[4];

/* one scalar addition per iteration: four additions in total */
for (int i = 0; i < 4; i++) {
    c[i] = a[i] + b[i];
}
\end{verbatim}

In contrast, the SIMD operation is written as follows.

\begin{verbatim}
vector signed int va = {1, 2, 3, 4};
vector signed int vb = {5, 6, 7, 8};
vector signed int vc;

/* a single operation adds all four elements */
vc = spu_add(va, vb);
\end{verbatim}

SIMD operations on Cell use variables of vector type.

In this way, a computation that would normally take four operations can be done in one.
On the other hand, every operation is carried out on 128 bits, so code must be
arranged to use the full register width as effectively as possible.

\begin{verbatim}
int a, b, c;

c = a + b;
\end{verbatim}

Even a computation like this is performed as an operation on 128-bit values, so most of the register width is wasted.
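
The same idea can be tried on a general-purpose compiler. The sketch below assumes GCC or Clang and uses their vector\_size extension rather than the SPU intrinsics. The type v4si and the functions add4 and add\_scalar\_in\_vector are hypothetical names chosen for illustration. It shows both the lane-wise addition and the scalar case in which only one of the four lanes carries useful data.

\begin{verbatim}
#include <stdint.h>

/* Four 32-bit lanes packed into 16 bytes (GCC/Clang extension, not SPU-specific). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Lane-wise addition: the counterpart of spu_add(va, vb) in the text. */
v4si add4(v4si va, v4si vb)
{
    return va + vb;
}

/* Scalar addition carried in a vector: only lane 0 holds useful data. */
int32_t add_scalar_in_vector(int32_t a, int32_t b)
{
    v4si va = {a, 0, 0, 0};
    v4si vb = {b, 0, 0, 0};
    v4si vc = va + vb;   /* still a 128-bit operation; three lanes are unused */
    return vc[0];
}
\end{verbatim}

Packing four independent additions into one call to add4 uses every lane, whereas add\_scalar\_in\_vector performs the same amount of hardware work for a single result.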

\subsection{Mailbox} \label{sec:mailbox}

A mailbox is a FIFO queue inside the MFC of an SPE and is used to exchange 32-bit
messages between the PPE and the SPE. Three kinds of mailbox
are provided (\figref{mailbox}).

\begin{figure}[htb]
\begin{center}
\includegraphics[scale=0.8]{./images/Mailbox.pdf}
\end{center}
\caption{Mailbox}
\label{fig:mailbox}
\end{figure}

\begin{enumerate}
\item SPU Inbound Mailbox \\
A queue for passing data from the PPE to the SPE. The number of queue entries is
implementation dependent \cite{cell}; in our research environment it can hold up to four entries.
When the queue is empty, an SPE read instruction stalls until data is written
to the mailbox. Data is guaranteed to be read in the order in which it was written.
\item SPU Outbound Mailbox \\
A queue for passing data from the SPE to the PPE. In our research environment it can
hold only one entry.
\item SPU Outbound Interrupt Mailbox \\
Almost identical to the SPU Outbound Mailbox, except that when the SPE
writes data to this queue, an interrupt event is raised on the PPE,
notifying it that data is ready to be read.
\end{enumerate}
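
The queue behaviour described above can be modelled in software. The following is a simplified sketch in portable C, not the hardware interface: mailbox\_t, mbox\_init, mbox\_try\_write, and mbox\_try\_read are hypothetical names, and the capacity of four entries follows the inbound depth observed in our environment. Unlike the real SPE read, which stalls on an empty queue, the model simply reports failure.

\begin{verbatim}
#include <stdbool.h>
#include <stdint.h>

#define MBOX_CAPACITY 4   /* inbound depth in our environment; an assumption elsewhere */

/* Hypothetical software model of a mailbox holding 32-bit messages. */
typedef struct {
    uint32_t slots[MBOX_CAPACITY];
    unsigned head;    /* index of the oldest message */
    unsigned count;   /* number of queued messages */
} mailbox_t;

void mbox_init(mailbox_t *m)
{
    m->head = 0;
    m->count = 0;
}

/* Writer side: returns false when the queue is full. */
bool mbox_try_write(mailbox_t *m, uint32_t msg)
{
    if (m->count == MBOX_CAPACITY) {
        return false;
    }
    m->slots[(m->head + m->count) % MBOX_CAPACITY] = msg;
    m->count++;
    return true;
}

/* Reader side: returns false when empty (the real SPE read would stall instead). */
bool mbox_try_read(mailbox_t *m, uint32_t *out)
{
    if (m->count == 0) {
        return false;
    }
    *out = m->slots[m->head];
    m->head = (m->head + 1) % MBOX_CAPACITY;
    m->count--;
    return true;
}
\end{verbatim}

Messages come out in the order they were written, which mirrors the ordering guarantee of the SPU Inbound Mailbox.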

\section{Development Environment}

\subsection{libSPE2}

libSPE2 is a set of libraries that allows the PPE to work with SPEs \cite{libspe2}.
It is organized into the following basic components: SPE Context Creation, SPE Program Image Handling,
SPE Run Control, SPE Event Handling, SPE MFC Problem State Facilities, and
Direct SPE Access for Applications.
A basic Cell program has the following structure.

\begin{enumerate}
\item Create N SPE contexts
\item Load the appropriate SPE executable object into each SPE context's local store
\item Create N threads
\item Wait for all N threads to terminate
\end{enumerate}
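
The four steps above can be sketched with POSIX threads alone. The code below is an illustration under stated assumptions, not libSPE2 code: fake\_context\_t, square, and run\_all\_contexts are hypothetical stand-ins, with a function pointer playing the role of the loaded SPE executable. In an actual libSPE2 program these steps would use calls such as spe\_context\_create, spe\_program\_load, and spe\_context\_run, with one PPE thread per context.

\begin{verbatim}
#include <pthread.h>
#include <stddef.h>

#define N_WORKERS 6   /* assumption: the six SPEs available on PS3Linux */

/* Hypothetical stand-in for an SPE context plus its loaded program. */
typedef struct {
    int id;
    int (*program)(int);   /* stands in for the SPE executable image */
    int result;
} fake_context_t;

/* Placeholder "SPE program". */
static int square(int x)
{
    return x * x;
}

/* Thread body: runs the program bound to one context. */
static void *run_context(void *arg)
{
    fake_context_t *ctx = arg;
    ctx->result = ctx->program(ctx->id);
    return NULL;
}

/* Follows the four steps; returns the sum of all results, or -1 on failure. */
int run_all_contexts(void)
{
    fake_context_t ctx[N_WORKERS];
    pthread_t threads[N_WORKERS];
    int sum = 0;

    for (int i = 0; i < N_WORKERS; i++) {
        ctx[i].id = i;              /* step 1: create N contexts */
        ctx[i].program = square;    /* step 2: load a program into each context */
        ctx[i].result = 0;
    }
    for (int i = 0; i < N_WORKERS; i++) {   /* step 3: create N threads */
        if (pthread_create(&threads[i], NULL, run_context, &ctx[i]) != 0) {
            for (int j = 0; j < i; j++) {
                pthread_join(threads[j], NULL);   /* reap threads already started */
            }
            return -1;
        }
    }
    for (int i = 0; i < N_WORKERS; i++) {   /* step 4: wait for all N threads */
        pthread_join(threads[i], NULL);
        sum += ctx[i].result;
    }
    return sum;
}
\end{verbatim}

One thread per context is used because running a context blocks the calling thread until the program stops, so the PPE needs a separate thread for each SPE it wants to keep busy.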

\subsection{SPU C/C++ Language Extensions}
In addition to the basic features of the C language, Cell-specific extensions are available on the SPE
\cite{cell_cpp}.
\tabref{cell_cpp} lists the main APIs.

\begin{table}[htb]
\begin{center}
\caption{SPU C/C++ Language Extension APIs}
\label{tab:cell_cpp}
\begin{tabular}{|l|l|}
\hline
spu\_mfcdma32 & Starts a DMA transfer \\
\hline
spu\_read\_in\_mbox & Reads mail from the PPE \\
\hline
spu\_write\_out\_mbox & Sends mail to the PPE \\
\hline
spu\_add, spu\_sub, spu\_mul & SIMD operations (addition, subtraction, multiplication) \\
\hline
\end{tabular}
\end{center}
\end{table}

To use the SPEs efficiently, the programmer needs to learn Cell-specific APIs such as those in \tabref{cell_cpp},
as well as the SPE assembly instructions.

\subsection{SPURS}

This section describes SPURS, a development environment for Cell that has been publicly announced.

SPURS \cite{spurs} is a system designed to run programs as efficiently as possible
in the Cell environment, which can be regarded as a closed parallel distributed system.

Exploiting the full performance of Cell means keeping the SPEs fully utilized:
the SPEs must run without stopping, and synchronization must be kept to a minimum.
To use the SPEs efficiently, SPURS therefore adopts two principles: SPE code is
selected and executed without depending on the PPE, and functionality is deliberately limited in favor of efficiency.
For this reason, a kernel is embedded on each SPE.

When an application is run on multiple SPEs, the application program is
divided into units that are as small as possible (tasks), and the tasks are
linked by their dependencies using a communication library.
A kernel resident in the LS selects runnable tasks and executes them one after another
(\figref{spurs}).
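
The idea of a resident kernel that repeatedly picks runnable tasks can be illustrated as follows. This is a hypothetical sketch in portable C and does not reflect the actual SPURS implementation, which is not public: task\_t and run\_ready\_tasks are names introduced here, and a task counts as runnable once all of its prerequisite tasks have finished.

\begin{verbatim}
#include <stdbool.h>

#define MAX_TASKS 8

/* Hypothetical task record used only for this illustration. */
typedef struct {
    int  pending_deps;            /* number of unfinished prerequisite tasks */
    bool done;                    /* true once the task has been executed */
    int  successors[MAX_TASKS];   /* indices of tasks that wait on this one */
    int  n_successors;
} task_t;

/*
 * Executes every task whose dependencies are satisfied until no further
 * task can run. Writes the execution order into order[] and returns the
 * number of tasks executed.
 */
int run_ready_tasks(task_t *tasks, int n, int *order)
{
    int executed = 0;
    bool progress = true;

    while (progress) {
        progress = false;
        for (int i = 0; i < n; i++) {
            if (tasks[i].done || tasks[i].pending_deps > 0) {
                continue;   /* already finished, or still blocked */
            }
            order[executed++] = i;   /* "execute" task i */
            tasks[i].done = true;
            for (int s = 0; s < tasks[i].n_successors; s++) {
                tasks[tasks[i].successors[s]].pending_deps--;   /* release waiters */
            }
            progress = true;
        }
    }
    return executed;
}
\end{verbatim}

If the return value is smaller than the number of tasks, the remaining tasks are blocked on dependencies that can never be satisfied, for example because of a cycle.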

\begin{figure}[htb]
\begin{center}
\includegraphics[scale=0.6]{./images/spurs_task.pdf}
\end{center}
\caption{SPURS Task}
\label{fig:spurs}
\end{figure}

Because these tasks process data, SPURS executes them in a pipelined fashion
(\figref{spurs_pipeline}).

\begin{figure}[htb]
\begin{center}
\includegraphics[scale=0.6]{./images/spurs_pipeline.pdf}
\end{center}
\caption{SPURS Pipeline}
\label{fig:spurs_pipeline}
\end{figure}

From the above, SPURS appears to be an excellent library for using multiple SPEs
efficiently, but it is not currently available to the general public.