annotate paper/c5.tex @ 71:c01a514d33f7

add bm_search
author Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
date Wed, 17 Feb 2016 00:07:04 +0900
parents 9c16f6b18100
children 69742d52fd7d
\chapter{Benchmark}
The experiments in this chapter were run in the following environment.
\begin{itemize}
\item Mac OS X 10.10.5
\item 2 $\times$ 2.66 GHz 6-Core Intel Xeon
\item Memory: 16 GB 1333 MHz DDR3
\item 1 TB HDD
\end{itemize}

We compared Word Count implemented on Cerium with the Mac wc command, and our regular expression implementation with the Mac egrep command.
Each comparison also includes results obtained with our I/O implementation for parallel processing.

\section{Word Count}
The input file is about 500 MB and contains about 6.5 million lines and about 83 million words.

Table \ref{table:IOwordcount} shows the Word Count results including file reading.
The Mac wc takes 10.59 seconds to process this file, whereas Cerium Word Count is faster than the Mac wc in every configuration, with both mmap and Blocked Read.
With 12 CPUs and Blocked Read, Cerium Word Count finishes in 7.82 seconds, about 1.4 times faster than the Mac wc.

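As a quick check of this ratio, assuming the 12-CPU Blocked Read time of 7.82 seconds listed in Table \ref{table:IOwordcount}:
\[
\frac{10.59}{7.82} \approx 1.35 ,
\]
which rounds to the stated factor of about 1.4.
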
With mmap, the OS controls when data is read, and the programmer cannot control it.
In addition, file access becomes random while Word Count is running.
We believe that mmap is not designed with random access in mind, and that this causes the variation seen in the mmap results.
With Blocked Read, the programmer controls reading, and the file is read sequentially from its beginning.
We therefore expect the results that include file reading to vary less.

\begin{tiny}
\begin{table}[ht]
\begin{center}
\begin{tabular}[t]{|r|r|r|r|}
\hline
Number of CPUs / Method & Mac (wc) & mmap & Blocked Read\\
\hline
1 & 10.59 & 9.96 & 9.33 \\
\hline
4 & --- & 8.63 & 8.52 \\
\hline
8 & --- & 10.35 & 8.04 \\
\hline
12 & --- & 9.26 & 7.82 \\
\hline
\end{tabular}
\caption{Word Count including file reading (time in seconds)}
\label{table:IOwordcount}
\end{center}
\end{table}
\end{tiny}

\newpage
Table \ref{fig:wordcount} shows the Word Count results excluding file reading.

The Mac wc takes 4.08 seconds to process this file, whereas Cerium Word Count takes 3.70 seconds on 1 CPU and 0.40 seconds on 12 CPUs.

On 1 CPU it is about 1.1 times faster than the Mac wc, and on 12 CPUs it is about 10.2 times faster.
Going from 1 CPU to 12 CPUs makes Cerium Word Count about 9.25 times faster.

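The ratios above follow directly from the times in Table \ref{fig:wordcount}:
\[
\frac{4.08}{3.70} \approx 1.10, \qquad
\frac{4.08}{0.40} = 10.2, \qquad
\frac{3.70}{0.40} = 9.25 .
\]
As a derived figure that is not part of the measurements themselves, a speedup of 9.25 on 12 CPUs corresponds to a parallel efficiency of roughly $9.25 / 12 \approx 0.77$.
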
Compared with the results that include file reading, the runs that exclude it are roughly 6 to 7 seconds faster.
This shows that, for string processing that reads its input from a file, file reading accounts for about 60\% (1 CPU) to more than 90\% (12 CPUs) of the processing time.

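One way to arrive at this range, assuming that the file-reading time can be estimated as the difference between the Blocked Read column of Table \ref{table:IOwordcount} and the times in Table \ref{fig:wordcount}, is:
\[
\frac{9.33 - 3.70}{9.33} \approx 0.60 \quad (1~\mathrm{CPU}), \qquad
\frac{7.82 - 0.40}{7.82} \approx 0.95 \quad (12~\mathrm{CPUs}).
\]
This is only an estimate, because it subtracts times taken from two different tables.
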
\begin{tiny}
\begin{table}[ht]
\begin{center}
\begin{tabular}[t]{|r|r|}
\hline
Method & Execution time (s)\\
\hline
Mac (wc) & 4.08 \\
\hline
Cerium Word Count (1 CPU) & 3.70\\
\hline
Cerium Word Count (4 CPUs) & 1.00\\
\hline
Cerium Word Count (8 CPUs) & 0.52\\
\hline
Cerium Word Count (12 CPUs) & 0.40\\
\hline
\end{tabular}
\caption{Word Count without file reading}
\label{fig:wordcount}
\end{center}
\end{table}
\end{tiny}

\section{Regular Expressions}
This experiment compares three programs: the Mac egrep; CGrep, a sequential C implementation that follows the state transitions of a DFA; and CeriumGrep, which runs in parallel on Cerium.

Table \ref{table:AZaz} shows the results of matching the regular expression '[A-Z][A-Za-z0-9]*s' against files of 50 MB, 100 MB, 500 MB (about 85 million words), and 1 GB (about 170 million words).

\begin{tiny}
\begin{table}[ht]
\begin{center}
Regular expression: '[A-Z][A-Za-z0-9]*s'
\begin{tabular}[t]{|c|r|r|r|r|}
\hline
Method / File size (matches) & 50 MB (0.54 million) & 100 MB (1.07 million) & 500 MB (5.36 million) & 1 GB (10.72 million) \\
\hline
CGrep & 4.51 & 9.42 & 20.62 & 40.10\\
\hline
CeriumGrep (12 CPUs) mmap & 8.97 & 10.79 & 18.00 & 29.16\\
\hline
CeriumGrep (12 CPUs) Blocked Read & 7.75 & 10.49 & 15.76 & 26.83\\
\hline
egrep & 6.42 & 12.80 & 59.51 & 119.23\\
\hline
\end{tabular}
\caption{Results of each grep for varying file sizes (time in seconds)}
\label{table:AZaz}
\end{center}
\end{table}
\end{tiny}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Table \ref{table:metachar} shows the results of matching the regular expression '[A-Z][A-Za-z0-9]*s' against a 500 MB file (about 85 million words).
It compares the results that include file reading with those that exclude it.
egrep reads the file on every run, so it has no separate measurement without file reading; the table lists the same value in both columns.
\begin{tiny}
\begin{table}[ht]
\begin{center}
\begin{tabular}[t]{|c|r|r|}
\hline
Method & With file reading & Without file reading\\
\hline
CGrep & 21.17 & 16.15\\
\hline
CeriumGrep (2 CPUs) & 27.06 & 15.40\\
\hline
CeriumGrep (12 CPUs) & 12.48 & 7.39\\
\hline
egrep & 59.51 & 59.51 \\
\hline
\end{tabular}
\caption{Results of each grep with and without file reading (time in seconds)}
\label{table:metachar}
\end{center}
\end{table}
\end{tiny}

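Under the assumption that the difference between the two columns of Table \ref{table:metachar} approximates the time spent reading the file, the read cost comes out similar for the sequential version and the 12-CPU version:
\[
21.17 - 16.15 = 5.02~\mathrm{s} \quad (\mathrm{CGrep}), \qquad
12.48 - 7.39 = 5.09~\mathrm{s} \quad (\mathrm{CeriumGrep},\ 12~\mathrm{CPUs}).
\]
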
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Table \ref{table:abab} shows the results of varying the number of states in the regular expression, using a file of about 500 MB (about 24 million words) that contains many occurrences of a and b.
These results include file reading; CeriumGrep was run with Blocked Read for file input and 12 CPUs.

\begin{tiny}
\begin{table}[ht]
\begin{center}
\begin{tabular}[t]{|l|r|r|r|}
\hline
Regular expression & Matches & CeriumGrep time (s) & egrep time (s)\\
\hline
'(a \textbar b)*a(a \textbar b)(a \textbar b)z' & about 100,000 & 26.58 & 70.11 \\
\hline
'(a \textbar b)*a(a \textbar b)(a \textbar b)(a \textbar b)z' & about 10,000 & 27.89 & 76.78 \\
\hline
'(a \textbar b)*a(a \textbar b)(a \textbar b)(a \textbar b)(a \textbar b)z' & about 7,000 & & 81.88 \\
\hline
'(a \textbar b)*a(a \textbar b)(a \textbar b)(a \textbar b)(a \textbar b)(a \textbar b)z' & about 4,000 & & 86.93 \\
\hline
\end{tabular}
\caption{Grep results as the number of states in the regular expression increases}
\label{table:abab}
\end{center}
\end{table}
\end{tiny}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Table \ref{table:nomatch} shows the results of matching the regular expression '(W \textbar w)ord', which matches nothing, against a file of about 500 MB (about 23 million words) that contains many occurrences of a and b.

\begin{tiny}
\begin{table}[ht]
\begin{center}
\begin{tabular}[t]{|c|r|}
\hline
Method & time (s)\\
\hline
CGrep & 27.13\\
\hline
CeriumGrep (12 CPUs) mmap & 21.58\\
\hline
CeriumGrep (12 CPUs) Blocked Read & 19.99\\
\hline
egrep & 28.33\\
\hline
\end{tabular}
\caption{Results of grep with a pattern that never matches}
\label{table:nomatch}
\end{center}
\end{table}
\end{tiny}