annotate regexParser/TODO @ 291:1b75546ff65f

fix TODO
author Shinji KONO <kono@ie.u-ryukyu.ac.jp>
date Mon, 01 Feb 2016 12:20:16 +0900
parents 20ed7536784f
children 948428caf616
rev   line source
289
20ed7536784f add test file
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 287
1 Mon Feb 1 01:51:10 JST 2016 kono
2
3 Maximum match misbehaves when there is nondeterminism.
4 How do we implement the termination condition "the match cannot be extended any further"?
5
6 ./regexParser -ts -subset -regex '(a|b)*a' -file ahoaho.txt
7
8 With this, once no a follows a b, accept up to just before that b.
9
291
1b75546ff65f fix TODO
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 289
10 The policy is to leave subset construction untouched.
11
12
13 state : 1
14 node : + 1 -> 1
15 [a-a] (3)
16 [b-b] (1)
17
18 state : 2*
19 node : e 2 -> 1
20
21 state : 3*
22 [a-a] (3)
23 [b-b] (1)
24
25 * marks an accept state.
26
27 Doing stateMatch at [a-a] (3) would be fine, but for maximum match we do not stateMatch while the match is still going on.
28 Currently, stateMatch happens in a state marked with * when the input matches none of its conditions.
29 With that, b in state 3 moves to state 1, and the match fails when no a follows the b. stateMatch should happen in state 3, before moving on b.
30
31 Once there is no longer any possibility of a match, the preceding part has to be reported as the match.
32 * if not matching, update the match top
33 * while matching, update the most recent match
34 * when the match fails, return the most recent match if there is one
35 Something like that?
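The three rules above can be sketched as follows. This is only an illustration in Python, not the project's code: the table is transcribed from the state dump for '(a|b)*a' (state 2 is left out because it has no character transitions), and `longest_matches`, `top`, and `last_end` are hypothetical names rather than the real stateMatch interface.

```python
# Transition table transcribed from the state dump above.
# ACCEPT holds the states marked with *.
DFA = {1: {"a": 3, "b": 1}, 3: {"a": 3, "b": 1}}
ACCEPT = {3}
START = 1


def longest_matches(text):
    """Return (begin, end) spans of maximum matches, scanning left to right."""
    spans = []
    top = 0                      # match top: where the current attempt began
    while top < len(text):
        state, last_end = START, None
        pos = top
        while pos < len(text) and text[pos] in DFA[state]:
            state = DFA[state][text[pos]]
            pos += 1
            if state in ACCEPT:
                last_end = pos   # still matching: only remember the latest accept
        # The match failed (or the input ended): report the remembered accept.
        if last_end is not None:
            spans.append((top, last_end))
            top = last_end
        else:
            top += 1             # nothing matched here: move the match top
    return spans
```

For the input "xabab!ba" this returns [(1, 4), (6, 8)]: the first span is "aba", which stops just before the b that is never followed by another a.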
36
37 For minimum match:
38 * if not matching, update the match top
39 * as soon as it matches, return the most recent match if there is one
40 Perhaps?
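A sketch of the minimum-match variant, under the same assumptions as the notes above: the table is transcribed from the state dump for '(a|b)*a', and `shortest_matches` is a hypothetical name. The only difference from a maximum-match scan is that the scan stops at the first accept state instead of remembering it and continuing.

```python
# Transition table transcribed from the state dump for '(a|b)*a'.
DFA = {1: {"a": 3, "b": 1}, 3: {"a": 3, "b": 1}}
ACCEPT = {3}
START = 1


def shortest_matches(text):
    """Return (begin, end) spans of minimum matches, scanning left to right."""
    spans = []
    top = 0                      # match top: where the current attempt began
    while top < len(text):
        state, pos, end = START, top, None
        while pos < len(text) and text[pos] in DFA[state]:
            state = DFA[state][text[pos]]
            pos += 1
            if state in ACCEPT:
                end = pos
                break            # minimum match: stop at the first accept
        if end is not None:
            spans.append((top, end))
            top = end
        else:
            top += 1             # nothing matched here: move the match top
    return spans
```

On "aab" this yields [(0, 1), (1, 2)], two one-character matches, where a maximum-match scan would report the single span (0, 2).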
41
42 Make source generation support CbC. (Apparently it does not work otherwise.)
289
20ed7536784f add test file
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 287
43
44
284
5d23dc02f60d add TODO
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents: 221
45 Sun Jan 31 20:37:49 JST 2016 masa
289
20ed7536784f add test file
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 287
46 Bug in parallel processing: Ok
47 Subset construction mistake for (mili|have): Ok
48 segv in tSearch: Ok
284
5d23dc02f60d add TODO
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents: 221
49
289
20ed7536784f add test file
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 287
50 '(main|int) ' .. Ok
51 '(main|int)\(' .. Ok
287
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 284
52
53 and the like do not work.
54
291
1b75546ff65f fix TODO
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 289
55 If the accept flag is set on the start state, it matches ''. Generate that case separately.
56
221
78174ff2f338 add Todo
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 215
57 Sat Jan 2 15:29:16 JST 2016 kono
58
59 There are more state transitions than states, so doing CharClassWalk during subset construction is not a good idea.
60 If new states are linked onto the state list when mergeTransition runs, CharClassWalk is not needed.
61 When doing so, keep them out of stateArray, because stateArray holds states that have already been processed.
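The idea of linking newly found states onto a list while keeping the processed table separate can be sketched as a worklist-style subset construction. This is an assumption-laden illustration, not the project's mergeTransition: `pending` stands in for the state list, `done` for stateArray, and the NFA is a plain dictionary without epsilon moves or character classes.

```python
from collections import deque


def subset_construction(nfa, start, accepting):
    """nfa maps state -> {symbol: set of next states} (no epsilon moves)."""
    start_set = frozenset([start])
    pending = deque([start_set])   # "state list": discovered, not yet expanded
    queued = {start_set}           # everything ever appended to pending
    done = {}                      # "stateArray": fully processed DFA states
    while pending:
        current = pending.popleft()
        merged = {}                # merged transitions of all member NFA states
        for s in current:
            for sym, targets in nfa.get(s, {}).items():
                merged.setdefault(sym, set()).update(targets)
        table = {}
        for sym, targets in merged.items():
            target = frozenset(targets)
            table[sym] = target
            if target not in queued:   # new DFA state: link it to the list only
                queued.add(target)
                pending.append(target)
        done[current] = table      # enters the processed table once expanded
    accept = {d for d in done if d & accepting}
    return done, accept
```

Each DFA state is expanded exactly once, when it is popped from `pending`, so no separate walk over all transitions is needed to discover new states.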
62
63 The EOF state has no cc, so it needs special handling.
64
65 Tue Dec 29 17:55:17 JST 2015 kono
215
63e9224c7b2b try to fix asterisk
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 212
66
67 New Todo entries are added at the top.
68
69 abc*d           +
70                / \
71               +   d
72              / \
73             +   *
74            / \  |
75           a   b c
76
77 By rewriting the Parser, this could instead become
78
79 abc*d   +
80        / \
81       a   +
82          / \
83         b   +
84            / \
85           *   d
86           |
87           c
88
89 This is probably the better form. However,
90 ((ab)(c*))d
91 should also be valid input, and that comes out the same as abc*d, so this does not solve the problem.
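A minimal sketch of the right-nested form, assuming a pattern made only of single-character literals and postfix *. `parse_concat` and the tuple encoding are hypothetical, not the project's Parser, and the sketch deliberately ignores parentheses, which is exactly the case the note above says remains unsolved.

```python
def parse_concat(pattern):
    """Parse literals with optional postfix * into a right-nested '+' tree."""
    atoms = []
    for ch in pattern:
        if ch == "*":
            atoms[-1] = ("*", atoms[-1])   # star binds to the preceding atom
        else:
            atoms.append(ch)
    tree = atoms[-1]
    for atom in reversed(atoms[:-1]):      # fold from the right
        tree = ("+", atom, tree)
    return tree
```

For "abc*d" the result is ("+", "a", ("+", "b", ("+", ("*", "c"), "d"))), so the * node sits directly beside the expression that follows it.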
92
93 A sub tree has to return its first state. Otherwise,
94 (ab*|bc*)
95 and the like do not work properly.
96
97 When an expression ends with *, it has to be overlapped with the next expression. So I think
98 carrying the trailing * along, if there is one,
99 is a good approach.
100
101 If stateAllocate and generateTransition are merged into one pass, stateArray has to be grown incrementally.
102 At the very least, using a single loop should be less error-prone.
103
210
e8aa8a1ea749 add benchmark TODO
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents: 204
104
105 Sun Dec 27 19:31:03 JST 2015
106 Example tasks: count the number of accesses from a specific IP
107 concordance
108 conditional concordance using a regex
109 conditional wordcount using a regex
110 compare against perl scripts that do the same
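As a rough baseline for two of these tasks, here is a sketch using Python's standard `re` module instead of perl. The log format (client IP as the first whitespace-separated field) and the function names `count_ip` and `word_count` are assumptions made for illustration only.

```python
import re
from collections import Counter


def count_ip(lines, ip):
    """Count log lines whose first field is exactly the given IP."""
    # \s after the escaped IP keeps 10.0.0.1 from also matching 10.0.0.10.
    pattern = re.compile(r"^" + re.escape(ip) + r"\s")
    return sum(1 for line in lines if pattern.match(line))


def word_count(lines, regex):
    """Count every occurrence of each string matched by regex."""
    pattern = re.compile(regex)
    return Counter(m.group(0) for line in lines for m in pattern.finditer(line))
```

Running the same inputs through regexParser and a script like this gives both a correctness check and a timing reference.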
215
63e9224c7b2b try to fix asterisk
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 212
111
112 Sat Dec 26 18:07:00 JST 2015
113 TODO write a routine test for CharClassWalker
114 TODO write a routine test for CharClassMerge
115 TODO write a routine test for searchBit
116 TODO write a routine test for subsetConstraction
117