Mercurial > hg > Applications > Grep
annotate regexParser/TODO @ 302:27414e6fb33c
retrying blocked search
fix for CbC support
author | Shinji KONO <kono@ie.u-ryukyu.ac.jp> |
---|---|
date | Mon, 08 Feb 2016 08:59:38 +0900 |
parents | 63213964502a |
children | c48a8671ce34 |
rev | line source |
---|---|
302
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
1 Sat Feb 6 19:50:04 JST 2016 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
2 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
3 ちょっとあれだけど、 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
4 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
5 各blockはstate 1から始める |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
6 終わりの状態が1でなかったら、そこだけやりなおす |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
7 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
8 ってのが簡単。最悪、全部やり直す可能性があるが... |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
9 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
10 Wed Feb 3 21:15:49 JST 2016 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
11 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
12 blockedSearch だと一つはoverrapさせる必要がある。 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
13 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
14 (aaa|aaabb) |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
15 state : 1 [a-a] (14) |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
16 state : 2* |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
17 state : 4 [a-a] (8) |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
18 state : 8 [a-a] (2) |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
19 state : 10 [a-a] (20) |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
20 state : 20 [a-a] (40) |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
21 state : 40 [b-b] (80) |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
22 state : 80 [b-b] (2) |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
23 state : 14 [a-a] (28) |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
24 state : 28 [a-a] (42) |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
25 state : 42* [b-b] (80) |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
26 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
27 a | a | a bbb |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
28 prev 14 28 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
29 curret 7F ... .. |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
30 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
31 a a | a | a bbb |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
32 prev 14 28 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
33 curret 7F ... .. |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
34 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
35 false positive がある → 再判定 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
36 maxmum match による見落としがある (元々そういうものはあるのだが...) |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
37 なくそうと思うと、ちょっと大変(可能な resultを全部推移させる必要がある) |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
38 内部の非決定性がなければ、こういう問題は出ない |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
39 |
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
40 |
298 | 41 Wed Feb 3 08:20:06 JST 2016 |
42 | |
43 state : 1 [w-w] (4) | |
44 state : 4 [o-o] (8) | |
45 state : 8 [r-r] (10) | |
46 node : a 10 -> 2 [d-d] (2) | |
47 | |
48 w | o r d | |
49 4 8 10 2 | |
50 | |
51 x | w o r d | |
52 1 4 8 10 2 | |
53 | |
54 Tue Feb 2 11:21:14 JST 2016 kono | |
295 | 55 |
56 あとは word の処理だけだ | |
57 charClassMergeをなおさないといけない | |
58 merge で文字列のlistにする | |
59 長いものは分割 | |
60 部分文字列は分解する? | |
61 | |
296 | 62 Cerirum 側で、最初のmatchが表示されてない |
63 | |
298 | 64 Tue Feb 2 09:55:40 JST 2016 kono |
293
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
65 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
66 % ./regexParser -subst -regex '(a|b)*a(a|b)(a|b)' |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
67 ---Print Node---- |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
68 a(1)->(1) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
69 | |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
70 b(1)->(1) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
71 * |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
72 + |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
73 a(4)->(4) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
74 + |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
75 a(4)->(8) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
76 | |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
77 b(4)->(8) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
78 + |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
79 a(8)->(2) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
80 | |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
81 b(8)->(2) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
82 ----------------- |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
83 state : 1 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
84 node : + 1 -> 1 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
85 [a-a] (5) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
86 [b-b] (1) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
87 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
88 state : 2* |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
89 node : e 2 -> 1 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
90 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
91 state : 4 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
92 node : | 4 -> 1 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
93 [a-a] (8) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
94 [b-b] (8) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
95 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
96 state : 8 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
97 node : | 8 -> 1 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
98 [a-a] (2) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
99 [b-b] (2) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
100 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
101 state : 5 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
102 [a-a] (1) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
103 [b-b] (9) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
104 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
105 state : 9 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
106 [a-a] (1) <---- 間違い 2 とmergeしているはずだが... |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
107 [b-b] (3) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
108 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
109 state : 3* |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
110 [a-a] (5) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
111 [b-b] (1) |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
112 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
113 やはり charClassMerge のbugだった。 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
114 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
115 createCharClassRangeで、同じものだったら新しく作らないってのがあると良い |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
116 charClassMerg が同じものを返す場合があるってことね |
295 | 117 同じレンジで同じ状態のものだけなので、それほどあるとは思えないが。 |
293
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
118 |
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
119 |
289 | 120 Mon Feb 1 01:51:10 JST 2016 kono |
121 | |
122 非決定性がある時の maxmum match がよろしくない | |
123 これ以上拡張できないという終了条件の実現は? | |
124 | |
125 ./regexParser -ts -subset -regex '(a|b)*a' -file ahoaho.txt | |
126 | |
127 で、bの後にaが来なくなると、bの手前までをacceptする | |
128 | |
291 | 129 subset construction はいじらない方針で。 |
130 | |
131 | |
132 state : 1 | |
133 node : + 1 -> 1 | |
134 [a-a] (3) | |
135 [b-b] (1) | |
136 | |
137 state : 2* | |
138 node : e 2 -> 1 | |
139 | |
140 state : 3* | |
141 [a-a] (3) | |
142 [b-b] (1) | |
143 | |
144 * はaccept state。 | |
145 | |
146 [a-a] (3) で stateMatch で良いが、maxmum だと match している間は stateMatch はしない。 | |
147 現状は、*の付いているstateで、条件にmatchしない時に stateMatch してる。 | |
148 これだと state 3 で b で satete 1 に行ってしまい、b 以降に a がない時に失敗する。b に行く前の state 3 で stateMatchするべき。 | |
149 | |
150 matchする可能性がなくなったところで、前の部分でmatchさせる必要がある。 | |
151 * match してなければ、match top をupdate | |
152 * match している間は直前matchをupdate | |
153 * match fail したところで、直前のmatch があれば、それを返す | |
154 という感じか? | |
155 | |
156 minimum match は | |
157 * match してなければ、match top をupdate | |
158 * match したところで、直前のmatch があれば、それを返す | |
159 か? | |
160 | |
161 ソース生成を CbC に対応させる。(でないと動かないらしい) | |
289 | 162 |
163 | |
284 | 164 Sun Jan 31 20:37:49 JST 2016 masa |
289 | 165 並列処理時のバグ Ok |
166 (mili|have) のsubset construction のミス Ok | |
167 tSearch の segv Ok | |
284 | 168 |
289 | 169 '(main|int) ' .. Ok |
170 '(main|int)\(' .. Ok | |
287 | 171 |
172 とかが動かない。 | |
173 | |
291 | 174 start state に accept flag が立っていると''にmatchしてしまう。それは別に生成する。 |
175 | |
221 | 176 Sat Jan 2 15:29:16 JST 2016 kono |
177 | |
178 stateよりもstate transitionの方が大きいので、subset contructionで CharClassWalkするのは良くない。 | |
179 mergeTransition した時に、state listに新しいものを接続してやれば、CharClassWalkの必要はない。 | |
180 その時に、stateArray には入れないでおく。sateArrayは処理済みなので。 | |
181 | |
182 EOF stateには cc がないので特別扱いする必要がある。 | |
183 | |
184 Tue Dec 29 17:55:17 JST 2015 kono | |
215 | 185 |
186 Todo は上に付け加えていく。 | |
187 | |
188 abc*d + | |
189 / \ | |
190 + d | |
191 / \ | |
192 + * | |
193 / \ | | |
194 a b c | |
195 | |
196 Parserを書き換えて、 | |
197 | |
198 abc*d + | |
199 / \ | |
200 a + | |
201 / \ | |
202 b + | |
203 / \ | |
204 * d | |
205 | | |
206 c | |
207 | |
208 とすることもできる。たぶん、こっちの方が良い。でも、 | |
209 ((ab)(c*))d | |
210 と書いても良いはずで、しかも、これは abc*d とおなじになるので解決になってない。 | |
211 | |
212 sub treeは、最初の状態を返す必要がある。そうでないと、 | |
213 (ab*|bc*) | |
214 とかがうまく動かない。 | |
215 | |
216 最後が*で終わっている時には、次の式と重ねる必要がある。なので、 | |
217 最後の*があれば、それを持ち歩く | |
218 方式が良いと思います。 | |
219 | |
220 stateAllocateをgenerateTransitionは1 passにすると stateArrayの大きさを徐々に増やす必要がある。 | |
221 少なくともループは一つにした方が間違いが少ないだろう。 | |
222 | |
210
e8aa8a1ea749
add benchmark TODO
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
204
diff
changeset
|
223 |
e8aa8a1ea749
add benchmark TODO
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
204
diff
changeset
|
224 2015年 12月27日 日曜日 19時31分03秒 JST |
e8aa8a1ea749
add benchmark TODO
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
204
diff
changeset
|
225 例題 特定の IP のアクセス数をカウントする |
e8aa8a1ea749
add benchmark TODO
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
204
diff
changeset
|
226 concordance |
e8aa8a1ea749
add benchmark TODO
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
204
diff
changeset
|
227 regex をつかった条件付き concordance |
e8aa8a1ea749
add benchmark TODO
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
204
diff
changeset
|
228 regex をつかった条件付き wordcount |
e8aa8a1ea749
add benchmark TODO
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
204
diff
changeset
|
229 これを行う perl スクリプトと比較 |
215 | 230 |
231 2015年 12月26日 土曜日 18時07分00秒 JST | |
232 TODO CharClassWalker の routine test を作成する | |
233 TODO CharClassMerge の routine test を作成する | |
234 TODO searchBit の routine test を作成する | |
235 TODO subsetConstraction の routine test を作成する | |
236 |