# HG changeset patch
# User Masataka Kohagura
# Date 1455110869 -32400
# Node ID cda3a0bc239dc01afd34e12622a50b62ae2a2003
# Parent  cfaafa20942491f9bfbb715f9980a91048291b20
add

diff -r cfaafa209424 -r cda3a0bc239d c4.tex
--- a/c4.tex	Wed Feb 10 18:39:39 2016 +0900
+++ b/c4.tex	Wed Feb 10 22:27:49 2016 +0900
@@ -561,17 +561,27 @@
 \end{figure}
 \newpage
 
-\subsection{Pattern matching based on the DFA}
-After the NFA and DFA are generated, pattern matching is performed on the loaded file. In this example,
+\subsection{Maintaining consistency during parallel processing}
+When a file is split so that regular expression matching can run in parallel, text that would otherwise match can fail to match because of the split.
+
+Figure~\ref{fig:regexdivide} shows an example. The set of strings matched by the regular expression ab*c is {ac,abc,abbc,ab..bc}.
+Before the split, the string abbbbc matches the regular expression ab*c without any problem.
+
+In parallel processing, pattern matching is applied to each piece of the split file, so neither the abb at the end of the first piece nor the bbc at the start of the second piece matches.
+The string matches before the split, but in this case the match is missed.
+To solve this, the position in the file where the regular expression started matching is recorded.
+Then, if the first piece ends in the middle of a state transition, the regular expression is matched again from that recorded start position when the results are aggregated.
 
-\begin{itemize}
-\item After the DFA is generated (for an NFA, after Subset Construction), the states and their transition targets are checked sequentially.
-\item DFA をkh
-\end{itemize}
-on the fly
-with subset construction, states that are never used need not be generated
+\begin{figure}[htpb]
+  \begin{center}
+    \includegraphics[scale=0.3]{images/regex/regexdivide.pdf}
+  \end{center}
+  \caption{Handling a regular expression match that spans a split boundary}
+  \label{fig:regexdivide}
+\end{figure}
+\newpage
 
 \subsection{(On including Word in the nodes)}
diff -r cfaafa209424 -r cda3a0bc239d c5.tex
--- a/c5.tex	Wed Feb 10 18:39:39 2016 +0900
+++ b/c5.tex	Wed Feb 10 22:27:49 2016 +0900
@@ -1,4 +1,10 @@
 \chapter{Evaluation and Discussion}
+
 \section{I/O Measurement}
 \section{Word Count}
 \section{Regular Expressions}
+
+\begin{itemize}
+\item After the DFA is generated (for an NFA, after Subset Construction), the input is checked against the DFA sequentially.
+\item In parallel processing, the NFA/DFA is distributed to the split Tasks and each Task performs the matching. If a state turns out to be an NFA state during matching, Subset Construction is applied on the spot to generate the DFA.
+\end{itemize}
diff -r cfaafa209424 -r cda3a0bc239d images/image.graffle
--- a/images/image.graffle	Wed Feb 10 18:39:39 2016 +0900
+++ b/images/image.graffle	Wed Feb 10 22:27:49 2016 +0900
@@ -26,7 +26,7 @@
 MasterSheets
 ModificationDate
-2016-02-10 06:56:47 +0000
+2016-02-10 12:49:11 +0000
 Modifier
 MasaKoha
 NotesVisible
@@ -59453,6 +59453,932 @@
 GraphicsList
+[926 added lines of OmniGraffle plist data for the new figure (regexdivide) omitted: text boxes, rectangles, and arrows. Recoverable labels: "Regular expression: ab*c"; "abbbbc"; "abb"; "bbc"; "no match" (twice); "match"; "Only abb can be read, so it does not match the regular expression, but it ends in the middle of a state transition"; "Only bbc can be read, so it does not match the regular expression"; "The matching part is split"; "If it ended in the middle of a state transition, remember the start position where matching began and match again."]
 Class
 LineGraphic
 FontInfo
@@ -63567,9 +64493,9 @@
 TopSlabHeight
 682
 VisibleRegion
-{{0, -295}, {1318.9655172413793, 1374.1379310344828}}
+{{1053.8461925010017, 70.979023582516049}, {534.96505458743627, 557.34267778586491}}
 Zoom
-0.57999999999999996
+1.4299999475479126
 ZoomValues
@@ -63654,8 +64580,8 @@
 cctree
-0.57999999999999996
-0.59000000000000008
+1.4299999475479126
+1.4300000000000002
diff -r cfaafa209424 -r cda3a0bc239d master_paper.pdf
Binary file master_paper.pdf has changed
diff -r cfaafa209424 -r cda3a0bc239d memo/result.txt
--- a/memo/result.txt	Wed Feb 10 18:39:39 2016 +0900
+++ b/memo/result.txt	Wed Feb 10 22:27:49 2016 +0900
@@ -1,10 +1,29 @@
 Wed Feb 10 11:06:12 JST 2016
-egrep -o '(a|b)*a(a|b)(a|b)' file/ab500MB.txt > /dev/null
+cpu 6
+
+./cerium/ceriumGrep -regex '(a|b)*a(a|b)(a|b)' -br -file file/ab500MB.txt -cp  32.90s user 1.28s system 99% cpu 34.514 total
+cache 17.625
+
+./cerium/ceriumGrep -regex '(a|b)*a(a|b)(a|b)(a|b)' -br -file file/ab500MB.tx  31.77s user 1.18s system 109% cpu 30.167 total
+cache 19.153
+
+./cerium/ceriumGrep -regex '(a|b)*a(a|b)(a|b)(a|b)(a|b)' -br -file -cpu 6 >  31.82s user 1.17s system 113% cpu 29.160 total
+17.193
+
+./cerium/ceriumGrep -regex '(a|b)*a(a|b)(a|b)(a|b)(a|b)(a|b)' -br -file -cpu  33.98s user 1.36s system 97% cpu 36.390 total
+19.701
+
   82.88s user 0.20s system 99% cpu 1:23.09 total
 egrep 83.09
-./cerium/ceriumGrep -regex '(a|b)*a(a|b)(a|b)' -file file/ab500MB.txt -cpu 8  32.82s user 0.85s system 202% cpu 16.595 total
+Tried increasing the number of states, but it does not seem to affect the speed??
+
+egrep -o '(a|b)*a(a|b)(a|b)' file/ab500MB.txt > /dev/null
+
+./cerium/ceriumGrep -regex '(a|b)*a(a|b)(a|b)' -file file/ab500MB.txt -cpu 6  32.82s user 0.85s system 202% cpu 16.595 total
+
+
 [with cache]
 cpu time
 1 29.053
@@ -28,6 +47,10 @@
 ./cerium/ceriumGrep -regex '[A-Z][a-zA-Z0-9_]*' -file file/500MB.txt >   25.29s user 0.53s system 100% cpu 25.721 total
+egrep -o '[A-Z][a-zA-Z0-9_]*' 500MB.txt > /dev/null
+56.34s user 0.16s system 99% cpu 56.506 total
+line:13260580
+
 [with cache]
 cpu time
 1 25.721
@@ -39,6 +62,7 @@
 7 8.551
 8 8.180
+egrep 56.506
 
 [without cache : bread]
 1 30.682
 2 19.841
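
Note on the boundary handling added to c4.tex in this changeset: the idea of remembering where a match started and rematching from there when a piece ends mid-transition can be sketched roughly as below. This is only an illustrative sketch under stated assumptions, not the Cerium implementation. The names TRANSITIONS, scan_chunk, match_from, and parallel_grep are hypothetical, the DFA is hard-coded for ab*c, and the restart-on-failure logic is a simplification that happens to be sufficient for this particular pattern.

```python
from typing import List, Optional, Tuple

# Hypothetical hand-written DFA for ab*c (not generated by Cerium):
# state 0 = start, state 1 = read "a" followed by zero or more "b", state 2 = accept.
TRANSITIONS = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
START, ACCEPT = 0, 2


def scan_chunk(text: str, offset: int) -> Tuple[List[Tuple[int, int]], Optional[int]]:
    """Scan one piece of the file independently.

    Returns the matches found inside the piece as (start, end) absolute
    offsets, plus the absolute start of a match attempt that was still in
    progress when the piece ended (None if the piece ended in the start state).
    """
    matches = []
    state, begin = START, None
    i = 0
    while i < len(text):
        nxt = TRANSITIONS.get((state, text[i]))
        if nxt is None:
            if state != START:
                # Attempt failed: go back to the start state and look at
                # the same character again (enough for ab*c).
                state, begin = START, None
            else:
                i += 1
            continue
        if state == START:
            begin = offset + i  # remember where the match started
        state = nxt
        i += 1
        if state == ACCEPT:
            matches.append((begin, offset + i))
            state, begin = START, None
    return matches, begin


def match_from(data: str, start: int) -> Optional[int]:
    """Rematch from a remembered start; return the end offset or None."""
    state = START
    for i in range(start, len(data)):
        state = TRANSITIONS.get((state, data[i]))
        if state is None:
            return None
        if state == ACCEPT:
            return i + 1
    return None


def parallel_grep(data: str, chunk_size: int) -> List[Tuple[int, int]]:
    """Scan fixed-size pieces independently, then repair boundary matches."""
    results = []
    for off in range(0, len(data), chunk_size):  # each piece could be one task
        matches, pending = scan_chunk(data[off:off + chunk_size], off)
        results.extend(matches)
        if pending is not None:
            # The piece ended in the middle of a state transition:
            # rematch across the boundary from the remembered position.
            end = match_from(data, pending)
            if end is not None:
                results.append((pending, end))
    return sorted(results)


print(parallel_grep("abbbbc", 3))  # [(0, 6)]
```

With a piece size of 3, the pieces abb and bbc produce no match on their own, as in the figure; the match is recovered only because the first piece reports position 0 as a pending start.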