annotate pyrect/regexp/__init__.py @ 98:4d498b002de5

refactor for data-structure (dict -> Transition).
author Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
date Sun, 12 Dec 2010 23:04:31 +0900
parents d23f12ce0369
children 8102bf4bbec6
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
43
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
1 #!/usr/bin/env python
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
2
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
3 from pyrect.regexp.parser import Parser
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
4 from pyrect.regexp.dfa import DFA
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
5 from pyrect.regexp.nfa import NFA
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
6 from pyrect.regexp.nfa_translator import NFATranslator
87
d23f12ce0369 add suffix-dfa, it's used to be parallel-matching-algorithm (not implement, yet).
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 70
diff changeset
7 from pyrect.regexp.dfa_translator import DFATranslator, SuffixDFATranslator, SuffixTrieTranslator
57
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 45
diff changeset
8 from pyrect.regexp.analyzer import Analyzer
70
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
9 from pyrect.regexp.char_collector import CharCollector
39
43b277a00905 add Lexer/Parser/AST class. it's can parse Regexp to AST. (Pyrect will shift to more flexible/robust system.)
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
10
43
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
11 class Regexp(object):
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
12 """Regexp is basic class in Pyrect.
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
13 this contains regexp, dfa, nfa,, actually it's include all.
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
14 >>> regexp = Regexp('(A|B)*C')
98
4d498b002de5 refactor for data-structure (dict -> Transition).
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 87
diff changeset
15 >>> print(regexp.dfa.transition)
4d498b002de5 refactor for data-structure (dict -> Transition).
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 87
diff changeset
16 Transition: 0 x 'A' -> 0, 0 x 'B' -> 0, 0 x 'C' -> 1,
43
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
17 >>> regexp.matches('ABC')
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
18 True
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
19 >>> regexp = Regexp('(a|b)*cd*e')
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
20 >>> regexp.matches('abababcdddde')
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
21 True
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
22 >>> regexp.matches('ababccdeee')
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
23 False
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
24 """
45
d29d3470fde7 modify dot translator. add regex as title, and simplify graph.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
25
43
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
26 def __init__(self, regexp):
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
27 self.regexp = regexp
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
28 self.ast = Parser().parse(regexp)
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
29 self.nfa = NFATranslator().translate(self.ast)
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
30 self.dfa = DFATranslator().translate(self.nfa)
57
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 45
diff changeset
31 an = Analyzer()
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 45
diff changeset
32 self.max_len, self.min_len, self.must_words =\
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 45
diff changeset
33 an.analyze(self.ast)
70
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
34 an = CharCollector()
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
35 self.chars = an.analyze(self.ast)
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
36
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
37 @classmethod
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
38 def get_chars(cls, regexp):
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
39 an = CharCollector()
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
40 return an.analyze(cls.parse(regexp))
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
41
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
42 @classmethod
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
43 def get_analyze(cls, regexp):
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
44 an = Analyzer()
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
45 return an.analyze(cls.parse(regexp))
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
46
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
47 @classmethod
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
48 def parse(cls, regexp):
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
49 psr = Parser()
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
50 return psr.parse(regexp)
39
43b277a00905 add Lexer/Parser/AST class. it's can parse Regexp to AST. (Pyrect will shift to more flexible/robust system.)
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
51
43
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
52 def matches(self, string):
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
53 runtime = self.dfa.get_runtime()
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
54 return runtime.accept(string)
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
55
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
56
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
57 def test():
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
58 import doctest
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
59 doctest.testmod()
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
60
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
61
83c69d42faa8 replace converting-flow, module dfareg with module regexp. it's is substantial changing in implimentation.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 39
diff changeset
62 if __name__ == "__main__": test()