Mercurial > hg > Members > shinya > pyrect
annotate pyrect/regexp/__init__.py @ 98:4d498b002de5
refactor for data-structure (dict -> Transition).
author | Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp> |
---|---|
date | Sun, 12 Dec 2010 23:04:31 +0900 |
parents | d23f12ce0369 |
children | 8102bf4bbec6 |
rev | line source |
---|---|
43
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
1 #!/usr/bin/env python |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
2 |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
3 from pyrect.regexp.parser import Parser |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
4 from pyrect.regexp.dfa import DFA |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
5 from pyrect.regexp.nfa import NFA |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
6 from pyrect.regexp.nfa_translator import NFATranslator |
87
d23f12ce0369
add suffix-dfa, it's used to be parallel-matching-algorithm (not implement, yet).
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
70
diff
changeset
|
7 from pyrect.regexp.dfa_translator import DFATranslator, SuffixDFATranslator, SuffixTrieTranslator |
57
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
45
diff
changeset
|
8 from pyrect.regexp.analyzer import Analyzer |
70
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
9 from pyrect.regexp.char_collector import CharCollector |
39
43b277a00905
add Lexer/Parser/AST class. it's can parse Regexp to AST. (Pyrect will shift to more flexible/robust system.)
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
10 |
43
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
11 class Regexp(object): |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
12 """Regexp is basic class in Pyrect. |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
13 this contains regexp, dfa, nfa,, actually it's include all. |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
14 >>> regexp = Regexp('(A|B)*C') |
98
4d498b002de5
refactor for data-structure (dict -> Transition).
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
87
diff
changeset
|
15 >>> print(regexp.dfa.transition) |
4d498b002de5
refactor for data-structure (dict -> Transition).
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
87
diff
changeset
|
16 Transition: 0 x 'A' -> 0, 0 x 'B' -> 0, 0 x 'C' -> 1, |
43
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
17 >>> regexp.matches('ABC') |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
18 True |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
19 >>> regexp = Regexp('(a|b)*cd*e') |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
20 >>> regexp.matches('abababcdddde') |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
21 True |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
22 >>> regexp.matches('ababccdeee') |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
23 False |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
24 """ |
45
d29d3470fde7
modify dot translator. add regex as title, and simplify graph.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
43
diff
changeset
|
25 |
43
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
26 def __init__(self, regexp): |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
27 self.regexp = regexp |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
28 self.ast = Parser().parse(regexp) |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
29 self.nfa = NFATranslator().translate(self.ast) |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
30 self.dfa = DFATranslator().translate(self.nfa) |
57
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
45
diff
changeset
|
31 an = Analyzer() |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
45
diff
changeset
|
32 self.max_len, self.min_len, self.must_words =\ |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
45
diff
changeset
|
33 an.analyze(self.ast) |
70
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
34 an = CharCollector() |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
35 self.chars = an.analyze(self.ast) |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
36 |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
37 @classmethod |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
38 def get_chars(cls, regexp): |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
39 an = CharCollector() |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
40 return an.analyze(cls.parse(regexp)) |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
41 |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
42 @classmethod |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
43 def get_analyze(cls, regexp): |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
44 an = Analyzer() |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
45 return an.analyze(cls.parse(regexp)) |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
46 |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
47 @classmethod |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
48 def parse(cls, regexp): |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
49 psr = Parser() |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
50 return psr.parse(regexp) |
39
43b277a00905
add Lexer/Parser/AST class. it's can parse Regexp to AST. (Pyrect will shift to more flexible/robust system.)
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
51 |
43
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
52 def matches(self, string): |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
53 runtime = self.dfa.get_runtime() |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
54 return runtime.accept(string) |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
55 |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
56 |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
57 def test(): |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
58 import doctest |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
59 doctest.testmod() |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
60 |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
61 |
83c69d42faa8
replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
39
diff
changeset
|
62 if __name__ == "__main__": test() |