------------------------------------------------------------------------------
--                                                                          --
--                         GNAT LIBRARY COMPONENTS                          --
--                                                                          --
--                G N A T . S P I T B O L . P A T T E R N S                 --
--                                                                          --
--                                 S p e c                                  --
--                                                                          --
--                     Copyright (C) 1997-2017, AdaCore                     --
--                                                                          --
-- GNAT is free software; you can redistribute it and/or modify it under    --
-- terms of the GNU General Public License as published by the Free Soft-   --
-- ware Foundation; either version 3, or (at your option) any later ver-    --
-- sion. GNAT is distributed in the hope that it will be useful, but WITH-  --
-- OUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY   --
-- or FITNESS FOR A PARTICULAR PURPOSE.                                     --
--                                                                          --
-- As a special exception under Section 7 of GPL version 3, you are granted --
-- additional permissions described in the GCC Runtime Library Exception,   --
-- version 3.1, as published by the Free Software Foundation.               --
--                                                                          --
-- You should have received a copy of the GNU General Public License and    --
-- a copy of the GCC Runtime Library Exception along with this program;     --
-- see the files COPYING3 and COPYING.RUNTIME respectively. If not, see     --
-- <http://www.gnu.org/licenses/>.                                          --
--                                                                          --
-- GNAT was originally developed by the GNAT team at New York University.   --
-- Extensive contributions were provided by Ada Core Technologies Inc.      --
--                                                                          --
------------------------------------------------------------------------------

--  SPITBOL-like pattern construction and matching

--  This child package of GNAT.SPITBOL provides a complete implementation
--  of the SPITBOL-like pattern construction and matching operations. This
--  package is based on Macro-SPITBOL created by Robert Dewar.

------------------------------------------------------------
-- Summary of Pattern Matching Packages in GNAT Hierarchy --
------------------------------------------------------------

--  There are three related packages that perform pattern matching functions.
--  The following is an outline of these packages, to help you determine
--  which is best for your needs.

--     GNAT.Regexp (files g-regexp.ads/g-regexp.adb)
--       This is a simple package providing Unix-style regular expression
--       matching with the restriction that it matches entire strings. It
--       is particularly useful for file name matching, and in particular
--       it provides "globbing patterns" that are useful in implementing
--       Unix or DOS style wild card matching for file names.

--     GNAT.Regpat (files g-regpat.ads/g-regpat.adb)
--       This is a more complete implementation of Unix-style regular
--       expressions, copied from the original V7 style regular expression
--       library written in C by Henry Spencer. It is functionally the
--       same as that library, and uses the same internal data structures
--       stored in a binary compatible manner.

--     GNAT.Spitbol.Patterns (files g-spipat.ads/g-spipat.adb)
--       This is a completely general pattern matching package based on the
--       pattern language of SNOBOL4, as implemented in SPITBOL. The pattern
--       language is modeled on context free grammars, with context sensitive
--       extensions that provide full (type 0) computational capabilities.

with Ada.Strings.Maps; use Ada.Strings.Maps;
with Ada.Text_IO;      use Ada.Text_IO;

package GNAT.Spitbol.Patterns is
   pragma Elaborate_Body;

   -------------------------------
   -- Pattern Matching Tutorial --
   -------------------------------

   --  A pattern matching operation (a call to one of the Match subprograms)
   --  takes a subject string and a pattern, and optionally a replacement
   --  string. The replacement string option is only allowed if the subject
   --  is a variable.

   --  The pattern is matched against the subject string, and either the
   --  match fails, or it succeeds matching a contiguous substring. If a
   --  replacement string is specified, then the subject string is modified
   --  by replacing the matched substring with the given replacement.

   --  Concatenation and Alternation
   --  =============================

   --    A pattern consists of a series of pattern elements. The pattern is
   --    built up using either the concatenation operator:

   --       A & B

   --    which means match A followed immediately by matching B, or the
   --    alternation operator:

   --       A or B

   --    which means first attempt to match A, and then if that does not
   --    succeed, match B.

   --    There is full backtracking, which means that if a given pattern
   --    element fails to match, then previous alternatives are matched.
   --    For example if we have the pattern:

   --      (A or B) & (C or D) & (E or F)

   --    First we attempt to match A, if that succeeds, then we go on to try
   --    to match C, and if that succeeds, we go on to try to match E. If E
   --    fails, then we try F. If F fails, then we go back and try matching
   --    D instead of C. Let's make this explicit using a specific example,
   --    and introducing the simplest kind of pattern element, which is a
   --    literal string. The meaning of this pattern element is simply to
   --    match the characters that correspond to the string characters. Now
   --    let's rewrite the above pattern form with specific string literals
   --    as the pattern elements:

   --      ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ")

   --    The following strings will be attempted in sequence:

   --       ABC . DEF . GH
   --       ABC . DEF . IJ
   --       ABC . CDE . GH
   --       ABC . CDE . IJ
   --       AB  . DEF . GH
   --       AB  . DEF . IJ
   --       AB  . CDE . GH
   --       AB  . CDE . IJ

   --    Here we use the dot simply to separate the pieces of the string
   --    matched by the three separate elements.
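
   --    As a rough sketch of this backtracking order, the following
   --    stand-alone procedure matches three subjects against the example
   --    pattern. The procedure name Alt_Demo and the subject strings are
   --    invented for illustration, and the default unanchored mode is
   --    assumed:

   --       with Ada.Text_IO;           use Ada.Text_IO;
   --       with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;

   --       procedure Alt_Demo is
   --          P : constant Pattern :=
   --            ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ");
   --       begin
   --          --  ABC . DEF . GH is the first combination attempted
   --          Put_Line (Boolean'Image (Match ("ABCDEFGH", P)));  --  TRUE

   --          --  Succeeds only at the seventh combination, AB . CDE . GH
   --          Put_Line (Boolean'Image (Match ("ABCDEGH", P)));   --  TRUE

   --          --  No combination fits at any start point
   --          Put_Line (Boolean'Image (Match ("ABXDEFGH", P)));  --  FALSE
   --       end Alt_Demo;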

   --  Moving the Start Point
   --  ======================

   --    A pattern is not required to match starting at the first character
   --    of the string, and is not required to match to the end of the string.
   --    The first attempt does indeed attempt to match starting at the first
   --    character of the string, trying all the possible alternatives. But
   --    if all alternatives fail, then the starting point of the match is
   --    moved one character, and all possible alternatives are attempted at
   --    the new anchor point.

   --    The entire match fails only when every possible starting point has
   --    been attempted. As an example, suppose that we had the subject
   --    string

   --      "ABABCDEIJKL"

   --    A match using the pattern in the previous example:

   --      ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ")

   --    would succeed, after two anchor point moves:

   --      "ABABCDEIJKL"
   --         ^^^^^^^
   --         matched
   --         section

   --    This mode of pattern matching is called the unanchored mode. It is
   --    also possible to put the pattern matcher into anchored mode by
   --    setting the global variable Anchored_Mode to True. This will cause
   --    all subsequent matches to be performed in anchored mode, where the
   --    match is required to start at the first character.

   --    We will also see later how the effect of an anchored match can be
   --    obtained for a single specified anchor point if this is desired.
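
   --    A minimal sketch of the difference between the two modes, reusing
   --    the pattern and subject string from this section (the procedure
   --    name Anchor_Demo is hypothetical):

   --       with Ada.Text_IO;           use Ada.Text_IO;
   --       with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;

   --       procedure Anchor_Demo is
   --          P : constant Pattern :=
   --            ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ");
   --       begin
   --          --  Unanchored (the default): succeeds after two moves
   --          Put_Line (Boolean'Image (Match ("ABABCDEIJKL", P)));  --  TRUE

   --          --  Anchored: only the first start point is tried
   --          Anchored_Mode := True;
   --          Put_Line (Boolean'Image (Match ("ABABCDEIJKL", P)));  --  FALSE

   --          --  Restore the default, since the setting is global
   --          Anchored_Mode := False;
   --       end Anchor_Demo;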

   --  Other Pattern Elements
   --  ======================

   --    In addition to strings (or single characters), there are many special
   --    pattern elements that correspond to special predefined alternations:

   --      Arb       Matches any string. First it matches the null string, and
   --                then on a subsequent failure, matches one character, and
   --                then two characters, and so on. It fails only once the
   --                entire remaining string has been matched.

   --      Bal       Matches a non-empty string that is parentheses balanced
   --                with respect to ordinary () characters. Examples of
   --                balanced strings are "ABC", "A((B)C)", and "A(B)C(D)E".
   --                Bal matches the shortest possible balanced string on the
   --                first attempt, and if there is a subsequent failure,
   --                attempts to extend the string.

   --      Cancel    Immediately aborts the entire pattern match, signalling
   --                failure. This is a specialized pattern element, which is
   --                useful in conjunction with some of the special pattern
   --                elements that have side effects.

   --      Fail      The null alternation. Matches no possible strings, so it
   --                always signals failure. This is a specialized pattern
   --                element, which is useful in conjunction with some of the
   --                special pattern elements that have side effects.

   --      Fence     Matches the null string at first, and then if a failure
   --                causes alternatives to be sought, aborts the match (like
   --                a Cancel). Note that using Fence at the start of a pattern
   --                has the same effect as matching in anchored mode.

   --      Rest      Matches from the current point to the last character in
   --                the string. This is a specialized pattern element, which
   --                is useful in conjunction with some of the special pattern
   --                elements that have side effects.

   --      Succeed   Repeatedly matches the null string (it is equivalent to
   --                the alternation ("" or "" or "" ....)). This is a special
   --                pattern element, which is useful in conjunction with some
   --                of the special pattern elements that have side effects.
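
   --    As an illustration only, the following sketch exercises Arb, Bal
   --    and Fence. The procedure name Elem_Demo, the constant names and
   --    the subject strings are invented for this example, and the default
   --    unanchored mode is assumed:

   --       with Ada.Text_IO;           use Ada.Text_IO;
   --       with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;

   --       procedure Elem_Demo is
   --          A_To_C : constant Pattern := 'A' & Arb & 'C';
   --          Parens : constant Pattern := '(' & Bal & ')';
   --          Fenced : constant Pattern := Fence & "BC";
   --       begin
   --          --  Arb starts with the null string and grows until 'C' fits
   --          Put_Line (Boolean'Image (Match ("xAbbCx", A_To_C)));  --  TRUE

   --          --  Bal is extended from "a" to "ab" so that ')' can match
   --          Put_Line (Boolean'Image (Match ("(ab)", Parens)));    --  TRUE

   --          --  Fence stops the start point from moving past 'A'
   --          Put_Line (Boolean'Image (Match ("ABC", Fenced)));     --  FALSE
   --          Put_Line (Boolean'Image (Match ("BCD", Fenced)));     --  TRUE
   --       end Elem_Demo;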

   --  Pattern Construction Functions
   --  ==============================

   --    The following functions construct additional pattern elements:

   --      Any(S)    Where S is a string, matches a single character that is
   --                any one of the characters in S. Fails if the current
   --                character is not one of the given set of characters.

   --      Arbno(P)  Where P is any pattern, matches any number of instances
   --                of the pattern, starting with zero occurrences. It is
   --                thus equivalent to ("" or (P & ("" or (P & ("" ....))))).
   --                The pattern P may contain any number of pattern elements
   --                including the use of alternation and concatenation.

   --      Break(S)  Where S is a string, matches a string of zero or more
   --                characters up to but not including a break character
   --                that is one of the characters given in the string S.
   --                Can match the null string, but cannot match the last
   --                character in the string, since a break character is
   --                required to be present.

   --      BreakX(S) Where S is a string, behaves exactly like Break(S) when
   --                it first matches, but if a string is successfully matched,
   --                then a subsequent failure causes an attempt to extend the
   --                matched string.

   --      Fence(P)  Where P is a pattern, attempts to match the pattern P
   --                including trying all possible alternatives of P. If none
   --                of these alternatives succeeds, then the Fence pattern
   --                fails. If one alternative succeeds, then the pattern
   --                match proceeds, but on a subsequent failure, no attempt
   --                is made to search for alternative matches of P. The
   --                pattern P may contain any number of pattern elements
   --                including the use of alternation and concatenation.

   --      Len(N)    Where N is a natural number, matches the given number of
   --                characters. For example, Len(10) matches any string that
   --                is exactly ten characters long.

   --      NotAny(S) Where S is a string, matches a single character that is
   --                not one of the characters of S. Fails if the current
   --                character is one of the given set of characters.

   --      NSpan(S)  Where S is a string, matches a string of zero or more
   --                characters that are among the characters given in the
   --                string. Always matches the longest possible such string.
   --                Always succeeds, since it can match the null string.

   --      Pos(N)    Where N is a natural number, matches the null string
   --                if exactly N characters have been matched so far, and
   --                otherwise fails.

   --      Rpos(N)   Where N is a natural number, matches the null string
   --                if exactly N characters remain to be matched, and
   --                otherwise fails.

   --      Rtab(N)   Where N is a natural number, matches characters from
   --                the current position until exactly N characters remain
   --                to be matched in the string. Fails if fewer than N
   --                unmatched characters remain in the string.

   --      Tab(N)    Where N is a natural number, matches characters from
   --                the current position until exactly N characters have
   --                been matched in all. Fails if more than N characters
   --                have already been matched.

   --      Span(S)   Where S is a string, matches a string of one or more
   --                characters that are among the characters given in the
   --                string. Always matches the longest possible such string.
   --                Fails if the current character is not one of the given
   --                set of characters.
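
   --    The following sketch combines a few of these functions. Pos (0)
   --    and Rpos (0) are used to force a pattern to cover the whole
   --    subject. The names Func_Demo, All_Digits, Three and Col_3 are
   --    hypothetical, and the default unanchored mode is assumed:

   --       with Ada.Text_IO;           use Ada.Text_IO;
   --       with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;

   --       procedure Func_Demo is
   --          All_Digits : constant Pattern :=
   --            Pos (0) & Span ("0123456789") & Rpos (0);
   --          Three      : constant Pattern := Pos (0) & Len (3) & Rpos (0);
   --          Col_3      : constant Pattern := Tab (2) & "cd";
   --       begin
   --          Put_Line (Boolean'Image (Match ("12345", All_Digits)));  --  TRUE
   --          Put_Line (Boolean'Image (Match ("12a45", All_Digits)));  --  FALSE

   --          Put_Line (Boolean'Image (Match ("abc", Three)));         --  TRUE
   --          Put_Line (Boolean'Image (Match ("abcd", Three)));        --  FALSE

   --          --  "cd" must begin right after the second character
   --          Put_Line (Boolean'Image (Match ("abcd", Col_3)));        --  TRUE
   --          Put_Line (Boolean'Image (Match ("abxcd", Col_3)));       --  FALSE
   --       end Func_Demo;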

   --  Recursive Pattern Matching
   --  ==========================

   --    The plus operator (+P) where P is a pattern variable, creates
   --    a recursive pattern that will, at pattern matching time, follow
   --    the pointer to obtain the referenced pattern, and then match this
   --    pattern. This may be used to construct recursive patterns. Consider
   --    for example:

   --       P := ("A" or ("B" & (+P)))

   --    On the first attempt, this pattern attempts to match the string "A".
   --    If this fails, then the alternative matches a "B", followed by an
   --    attempt to match P again. This second attempt first attempts to
   --    match "A", and so on. The result is a pattern that will match a
   --    string of B's followed by a single A.

   --    This particular example could simply be written as NSpan('B') & 'A',
   --    but the use of recursive patterns in the general case can construct
   --    complex patterns which could not otherwise be built.
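
   --    A small sketch of the recursive pattern above in use. The names
   --    Rec_Demo and Whole are invented, and Pos (0) and Rpos (0) are added
   --    here only so that the whole subject has to be covered:

   --       with Ada.Text_IO;           use Ada.Text_IO;
   --       with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;

   --       procedure Rec_Demo is
   --          P     : aliased Pattern;
   --          Whole : Pattern;
   --       begin
   --          --  (+P) is resolved when matching, so P may refer to itself
   --          P     := "A" or ("B" & (+P));
   --          Whole := Pos (0) & P & Rpos (0);

   --          Put_Line (Boolean'Image (Match ("A", Whole)));     --  TRUE
   --          Put_Line (Boolean'Image (Match ("BBBA", Whole)));  --  TRUE
   --          Put_Line (Boolean'Image (Match ("BBAB", Whole)));  --  FALSE
   --       end Rec_Demo;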

   --  Pattern Assignment Operations
   --  =============================

   --    In addition to the overall result of a pattern match, which indicates
   --    success or failure, it is often useful to be able to keep track of
   --    the pieces of the subject string that are matched by individual
   --    pattern elements, or subsections of the pattern.

   --    The pattern assignment operators allow this capability. The first
   --    form is the immediate assignment:

   --       P * S

   --    Here P is an arbitrary pattern, and S is a variable of type VString
   --    that will be set to the substring matched by P. This assignment
   --    happens during pattern matching, so if P matches more than once,
   --    then the assignment happens more than once.

   --    The deferred assignment operation:

   --      P ** S

   --    avoids these multiple assignments by deferring the assignment to the
   --    end of the match. If the entire match is successful, and if the
   --    pattern P was part of the successful match, then at the end of the
   --    matching operation the assignment to S of the string matching P is
   --    performed.

   --    The cursor assignment operation:

   --      Setcur(N'Access)

   --    assigns the current cursor position to the natural variable N. The
   --    cursor position is defined as the count of characters that have been
   --    matched so far (including any start point moves).

   --    Finally the operations * and ** may be used with values of type
   --    Text_IO.File_Access. The effect is to do a Put_Line operation of
   --    the matched substring. These are particularly useful in debugging
   --    pattern matches.
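
   --    The following sketch splits a "key=value" string using deferred
   --    assignment and records the cursor position of the separator. The
   --    names Assign_Demo, Key, Val and Cur are hypothetical, and V and S
   --    are assumed to be the String/VString conversions declared in the
   --    parent package GNAT.Spitbol:

   --       with Ada.Text_IO;           use Ada.Text_IO;
   --       with GNAT.Spitbol;          use GNAT.Spitbol;
   --       with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;

   --       procedure Assign_Demo is
   --          Key, Val : VString;
   --          Cur      : aliased Natural := 0;

   --          P : constant Pattern :=
   --            Break ('=') ** Key & Setcur (Cur'Access) & '=' & Rest ** Val;
   --       begin
   --          if Match ("name=Ada", P) then
   --             Put_Line (S (Key));              --  name
   --             Put_Line (S (Val));              --  Ada
   --             Put_Line (Natural'Image (Cur));  --  4, after a blank
   --          end if;
   --       end Assign_Demo;

   --    Since ** binds more tightly than &, no parentheses are needed
   --    around the two assignments here.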

   --  Deferred Matching
   --  =================

   --    The pattern construction functions (such as Len and Any) all permit
   --    the use of pointers to natural or string values, or functions that
   --    return natural or string values. These forms cause the actual value
   --    to be obtained at pattern matching time. This allows interesting
   --    possibilities for constructing dynamic patterns as illustrated in
   --    the examples section.

   --    In addition the (+S) operator may be used where S is a pointer to
   --    string or function returning string, with a similar deferred effect.

   --    A special use of deferred matching is the construction of predicate
   --    functions. The element (+P) where P is an access to a function that
   --    returns a Boolean value, causes the function to be called at the
   --    time the element is matched. If the function returns True, then the
   --    null string is matched; if the function returns False, then failure
   --    is signalled and previous alternatives are sought.
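
   --    As a sketch of a deferred value, the pattern below is built once
   --    but reads the length from a variable each time it is matched. This
   --    assumes the form of Len that takes an access to a Natural variable.
   --    The names Defer_Demo, N and P are hypothetical:

   --       with Ada.Text_IO;           use Ada.Text_IO;
   --       with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;

   --       procedure Defer_Demo is
   --          N : aliased Natural := 2;
   --          P : constant Pattern := Pos (0) & Len (N'Access) & Rpos (0);
   --       begin
   --          Put_Line (Boolean'Image (Match ("ab", P)));   --  TRUE

   --          --  P itself is unchanged, but now requires three characters
   --          N := 3;
   --          Put_Line (Boolean'Image (Match ("ab", P)));   --  FALSE
   --          Put_Line (Boolean'Image (Match ("abc", P)));  --  TRUE
   --       end Defer_Demo;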

   --  Deferred Replacement
   --  ====================

   --    The simple model given for pattern replacement (where the matched
   --    substring is replaced by the string given as the third argument to
   --    Match) works fine in simple cases, but this approach does not work
   --    in the case where the expression used as the replacement string is
   --    dependent on values set by the match.

   --    For example, suppose we want to find an instance of a parenthesized
   --    character, and replace the parentheses with square brackets. At first
   --    glance it would seem that:

   --      Match (Subject, '(' & Len (1) * Char & ')', '[' & Char & ']');

   --    would do the trick, but that does not work, because the third
   --    argument to Match gets evaluated too early, before the call to
   --    Match, and before the pattern match has had a chance to set Char.

   --    To solve this problem we provide the deferred replacement capability.
   --    This approach, which of course is only needed if the pattern
   --    involved has side effects, is to do the match in two stages. The
   --    call to Match sets a pattern result in a variable of the private
   --    type Match_Result, and then a subsequent Replace operation uses
   --    this Match_Result object to perform the required replacement.

   --    Using this approach, we can now write the above operation properly
   --    in a manner that will work:

   --      M : Match_Result;
   --      ...
   --      Match (Subject, '(' & Len (1) * Char & ')', M);
   --      Replace (M, '[' & Char & ']');

   --    As with other Match cases, there is a function and procedure form
   --    of this match call. A call to Replace after a failed match has no
   --    effect. Note that Subject should not be modified between the calls.
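
   --    Put together as a complete unit, the two-stage form might look as
   --    follows. The names Replace_Demo and Ch are invented, and the
   --    replacement is built through the V and S conversions of
   --    GNAT.Spitbol so that no "&" operators on VString need to be
   --    visible:

   --       with Ada.Text_IO;           use Ada.Text_IO;
   --       with GNAT.Spitbol;          use GNAT.Spitbol;
   --       with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;

   --       procedure Replace_Demo is
   --          Subject : VString := V ("f(x)");
   --          Ch      : VString;
   --          M       : Match_Result;
   --       begin
   --          --  Stage one: match, setting Ch as a side effect
   --          Match (Subject, '(' & Len (1) * Ch & ')', M);

   --          --  Stage two: Ch is now set, so the replacement can use it
   --          Replace (M, V ("[" & S (Ch) & "]"));

   --          Put_Line (S (Subject));  --  f[x]
   --       end Replace_Demo;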

   --  Examples of Pattern Matching
   --  ============================

   --    First a simple example of the use of pattern replacement to remove
   --    a line number from the start of a string. We assume that the line
   --    number has the form of a string of decimal digits followed by a
   --    period, followed by one or more spaces.

   --       Digs : constant Pattern := Span("0123456789");

   --       Lnum : constant Pattern := Pos(0) & Digs & '.' & Span(' ');

   --    Now to use this pattern we simply do a match with a replacement:

   --       Match (Line, Lnum, "");

   --    which replaces the line number by the null string. Note that it is
   --    also possible to use an Ada.Strings.Maps.Character_Set value as an
   --    argument to Span and similar functions, and in particular all the
   --    useful constants in Ada.Strings.Maps.Constants are available. This
   --    means that we could define Digs as:

   --       Digs : constant Pattern := Span(Decimal_Digit_Set);

   --    The style we use here, of defining constant patterns and then using
   --    them is typical. It is possible to build up patterns dynamically,
   --    but it is usually more efficient to build them in pieces in advance
   --    using constant declarations. Note in particular that although it is
   --    possible to construct a pattern directly as an argument for the
   --    Match routine, it is much more efficient to preconstruct the pattern
   --    as we did in this example.
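
   --    For reference, the line number example could be wrapped in a
   --    complete unit along the following lines. The procedure name
   --    Lnum_Demo, the variable Text and its contents are made up for
   --    illustration:

   --       with Ada.Text_IO;           use Ada.Text_IO;
   --       with GNAT.Spitbol;          use GNAT.Spitbol;
   --       with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;

   --       procedure Lnum_Demo is
   --          Digs : constant Pattern := Span ("0123456789");
   --          Lnum : constant Pattern := Pos (0) & Digs & '.' & Span (' ');

   --          Text : VString := V ("258. Words etc");
   --       begin
   --          --  The subject must be a variable, since it is modified
   --          Match (Text, Lnum, "");
   --          Put_Line (S (Text));  --  Words etc
   --       end Lnum_Demo;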

   --    Now let's look at the use of pattern assignment to break a
   --    string into sections. Suppose that the input string has two
   --    unsigned decimal integers, separated by spaces or a comma,
   --    with spaces allowed anywhere. Then we can isolate the two
   --    numbers with the following pattern:

   --       Num1, Num2 : aliased VString;

   --       B : constant Pattern := NSpan(' ');

   --       N : constant Pattern := Span("0123456789");

   --       T : constant Pattern :=
   --             NSpan(' ') & N * Num1 & Span(" ,") & N * Num2;

   --    The match operation Match (" 124, 257 ", T) would assign the
   --    string 124 to Num1 and the string 257 to Num2.

   --    Now let's see how more complex elements can be built from the
   --    set of primitive elements. The following pattern matches strings
   --    that have the syntax of Ada 95 based literals:

   --       Digs  : constant Pattern := Span(Decimal_Digit_Set);
   --       UDigs : constant Pattern := Digs & Arbno('_' & Digs);

   --       Edig  : constant Pattern := Span(Hexadecimal_Digit_Set);
   --       UEdig : constant Pattern := Edig & Arbno('_' & Edig);

   --       Bnum  : constant Pattern := UDigs & '#' & UEdig & '#';

   --    A match against Bnum will now match the desired strings, e.g.
   --    it will match 16#123_abc#, but not a#b#. However, this pattern
   --    is not quite complete, since it does not allow colons to replace
   --    the pound signs. The following is more complete:

   --       Bchar : constant Pattern := Any("#:");
   --       Bnum  : constant Pattern := UDigs & Bchar & UEdig & Bchar;

   --    but that is still not quite right, since it allows # and : to be
   --    mixed, and they are supposed to be used consistently. We solve
   --    this by using a deferred match.

   --       Temp  : aliased VString;

   --       Bnum  : constant Pattern :=
   --                 UDigs & Bchar * Temp & UEdig & (+Temp);

   --    Here the first instance of the base character is stored in Temp, and
   --    then later in the pattern we rematch the value that was assigned.
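
   --    A sketch of the final Bnum pattern in a complete unit, showing that
   --    mixed delimiters are rejected. The procedure name Bnum_Demo and the
   --    subject strings are hypothetical:

   --       with Ada.Strings.Maps.Constants; use Ada.Strings.Maps.Constants;
   --       with Ada.Text_IO;                use Ada.Text_IO;
   --       with GNAT.Spitbol;               use GNAT.Spitbol;
   --       with GNAT.Spitbol.Patterns;      use GNAT.Spitbol.Patterns;

   --       procedure Bnum_Demo is
   --          Temp  : aliased VString;

   --          Digs  : constant Pattern := Span (Decimal_Digit_Set);
   --          UDigs : constant Pattern := Digs & Arbno ('_' & Digs);
   --          Edig  : constant Pattern := Span (Hexadecimal_Digit_Set);
   --          UEdig : constant Pattern := Edig & Arbno ('_' & Edig);
   --          Bchar : constant Pattern := Any ("#:");

   --          Bnum  : constant Pattern :=
   --                    UDigs & Bchar * Temp & UEdig & (+Temp);
   --       begin
   --          Put_Line (Boolean'Image (Match ("16#123_abc#", Bnum)));  --  TRUE
   --          Put_Line (Boolean'Image (Match ("16:FF:", Bnum)));       --  TRUE

   --          --  The closing delimiter must repeat the opening one
   --          Put_Line (Boolean'Image (Match ("16#FF:", Bnum)));       --  FALSE
   --       end Bnum_Demo;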

   --    For an example of a recursive pattern, let's define a pattern
   --    that is like the built in Bal, but the string matched is balanced
   --    with respect to square brackets or curly brackets.

   --    The language for such strings might be defined in extended BNF as

   --       ELEMENT ::= <any character other than [] or {}>
   --                   | '[' BALANCED_STRING ']'
   --                   | '{' BALANCED_STRING '}'

   --       BALANCED_STRING ::= ELEMENT {ELEMENT}

   --    Here we use {} to indicate zero or more occurrences of a term, as
   --    is common practice in extended BNF. Now we can translate the above
   --    BNF into recursive patterns as follows:

   --      Element, Balanced_String : aliased Pattern;
   --      .
   --      .
   --      .
   --      Element := NotAny ("[]{}")
   --                   or
   --                 ('[' & (+Balanced_String) & ']')
   --                   or
   --                 ('{' & (+Balanced_String) & '}');

   --      Balanced_String := Element & Arbno (Element);

   --    Note the important use of + here to refer to a pattern not yet
   --    defined. Note also that we use assignments precisely because we
   --    cannot refer to as yet undeclared variables in initializations.

   --    Now that this pattern is constructed, we can use it as though it
   --    were a new primitive pattern element, and for example, the match:

   --      Match ("xy[ab{cd}]", Balanced_String * Current_Output & Fail);

   --    will generate the output:

   --       x
   --       xy
   --       xy[ab{cd}]
   --       y
   --       y[ab{cd}]
   --       [ab{cd}]
   --       a
   --       ab
   --       ab{cd}
   --       b
   --       b{cd}
   --       {cd}
   --       c
   --       cd
   --       d

   --    Note that the function of the fail here is simply to force the
   --    pattern Balanced_String to match all possible alternatives. Studying
   --    the operation of this pattern in detail is highly instructive.

   --    Finally we give a rather elaborate example of the use of deferred
   --    matching. The following declarations build up a pattern which will
   --    find the longest string of decimal digits in the subject string.

   --       Max, Cur : VString;
   --       Loc      : aliased Natural;

   --       function GtS return Boolean is
   --       begin
   --          return Length (Cur) > Length (Max);
   --       end GtS;

   --       Digit : constant Character_Set := Decimal_Digit_Set;

   --       Digs  : constant Pattern := Span(Digit);

   --       Find : constant Pattern :=
   --         "" * Max & Fence            & -- initialize Max to null
   --         BreakX (Digit)              & -- scan looking for digits
   --         ((Span(Digit) * Cur         & -- assign next string to Cur
   --          (+GtS'Unrestricted_Access) & -- check size(Cur) > Size(Max)
   --          Setcur(Loc'Access))          -- if so, save location
   --                   * Max)            & -- and assign to Max
   --         Fail;                         -- seek all alternatives

   --    As we see from the comments here, complex patterns like this take
   --    on aspects of sequential programs. In fact they are sequential
   --    programs with general backtracking. In this pattern, we first use
   --    a pattern assignment that matches null and assigns it to Max, so
   --    that it is initialized for the new match. Now BreakX scans to the
   --    next digit. Arb would do here, but BreakX will be more efficient.
   --    Once we have found a digit, we scan out the longest string of
   --    digits with Span, and assign it to Cur. The deferred call to GtS
   --    tests if the string we assigned to Cur is the longest so far. If
   --    not, then failure is signalled, and we seek alternatives (this
   --    means that BreakX will extend and look for the next digit string).
   --    If the call to GtS succeeds then the matched string is assigned
   --    as the largest string so far into Max and its location is saved
   --    in Loc. Finally Fail forces the match to fail and seek alternatives,
   --    so that the entire string is searched.

   --    If the pattern Find is matched against a string, the variable Max
   --    at the end of the pattern will have the longest string of digits,
   --    and Loc will be the location of the last character of that string.
   --    For example, Match("ab123cd4657ef23", Find) will assign "4657" to
   --    Max and 11 to Loc (indicating that the string ends with the
   --    eleventh character of the string).

   --    Note: the use of Unrestricted_Access to reference GtS will not
   --    be needed if GtS is defined at the outer level, but definitely
   --    will be necessary if GtS is a nested function (in which case of
   --    course the scope of the pattern Find will be restricted to this
   --    nested scope, and this cannot be checked, i.e. use of the pattern
   --    outside this scope is erroneous). Generally it is a good idea to
   --    define patterns and the functions they call at the outer level
   --    where possible, to avoid such problems.

   --  Correspondence with Pattern Matching in SPITBOL
   --  ===============================================

   --    Generally the Ada syntax and names correspond closely to SPITBOL
   --    syntax for pattern matching construction.

   --      The basic pattern construction operators are renamed as follows:

   --          Spitbol     Ada

   --          (space)      &
   --             |         or
   --             $         *
   --             .         **

   --      The Ada operators were chosen so that the relative precedences of
   --      these operators correspond to those of the Spitbol operators, but
   --      as always, the use of parentheses is advisable to clarify.

   --      The pattern construction operators all have similar names except for

   --          Spitbol      Ada

   --          Abort        Cancel
   --          Rem          Rest

   --      where we have clashes with Ada reserved names.

   --      Ada requires the use of 'Access to refer to functions used in the
   --      pattern match, and often the use of 'Unrestricted_Access may be
   --      necessary to get around the scope restrictions if the functions
   --      are not declared at the outer level.

   --      The actual pattern matching syntax is modified in Ada as follows:

   --          Spitbol      Ada

   --          X Y          Match (X, Y);
   --          X Y = Z      Match (X, Y, Z);

   --      and pattern failure is indicated by returning a Boolean result from
   --      the Match function (True for success, False for failure).

   -----------------------
   -- Type Declarations --
   -----------------------

   type Pattern is private;
   --  Type representing a pattern. This package provides a complete set of
   --  operations for constructing patterns that can be used in the pattern
   --  matching operations provided.

   type Boolean_Func is access function return Boolean;
   --  General Boolean function type. When this type is used as a formal
   --  parameter type in this package, it indicates a deferred predicate
   --  pattern. The function will be called when the pattern element is
   --  matched and failure signalled if False is returned.

   type Natural_Func is access function return Natural;
   --  General Natural function type. When this type is used as a formal
   --  parameter type in this package, it indicates a deferred pattern.
   --  The function will be called when the pattern element is matched
   --  to obtain the currently referenced Natural value.

   type VString_Func is access function return VString;
   --  General VString function type. When this type is used as a formal
   --  parameter type in this package, it indicates a deferred pattern.
   --  The function will be called when the pattern element is matched
   --  to obtain the currently referenced string value.

   subtype PString is String;
   --  This subtype is used in the remainder of the package to indicate a
   --  formal parameter that is converted to its corresponding pattern,
   --  i.e. a pattern that matches the characters of the string.

   subtype PChar is Character;
   --  Similarly, this subtype is used in the remainder of the package to
   --  indicate a formal parameter that is converted to its corresponding
   --  pattern, i.e. a pattern that matches this one character.

   subtype VString_Var is VString;
   subtype Pattern_Var is Pattern;
   --  These synonyms are used as formal parameter types to a function where,
   --  if the language allowed, we would use in out parameters, but we are
   --  not allowed to have in out parameters for functions. Instead we pass
   --  actuals which must be variables, and with a bit of trickery in the
   --  body, manage to interpret them properly as though they were indeed
   --  in out parameters.

   pragma Warnings (Off, VString_Var);
   pragma Warnings (Off, Pattern_Var);
   --  We turn off warnings for these two types so that when variables are used
   --  as arguments in this context, warnings about them not being assigned in
   --  the source program will be suppressed.

   --------------------------------
   -- Basic Pattern Construction --
   --------------------------------

   function "&"  (L : Pattern; R : Pattern) return Pattern;
   function "&"  (L : PString; R : Pattern) return Pattern;
   function "&"  (L : Pattern; R : PString) return Pattern;
   function "&"  (L : PChar;   R : Pattern) return Pattern;
   function "&"  (L : Pattern; R : PChar)   return Pattern;

   --  Pattern concatenation. Matches L followed by R

   function "or" (L : Pattern; R : Pattern) return Pattern;
   function "or" (L : PString; R : Pattern) return Pattern;
   function "or" (L : Pattern; R : PString) return Pattern;
   function "or" (L : PString; R : PString) return Pattern;
|
|
716 function "or" (L : PChar; R : Pattern) return Pattern;
|
|
717 function "or" (L : Pattern; R : PChar) return Pattern;
|
|
718 function "or" (L : PChar; R : PChar) return Pattern;
|
|
719 function "or" (L : PString; R : PChar) return Pattern;
|
|
720 function "or" (L : PChar; R : PString) return Pattern;
|
|
721 -- Pattern alternation. Creates a pattern that will first try to match
|
|
722 -- L and then on a subsequent failure, attempts to match R instead.
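--
-- A small sketch of how concatenation and alternation combine (not part
-- of the original spec; Demo and Animal are hypothetical names). Since
-- "&" binds more tightly than "or" in Ada, the alternation is
-- parenthesized here:
--
--    with Ada.Text_IO;           use Ada.Text_IO;
--    with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;
--
--    procedure Demo is
--       Animal : constant Pattern := ("cat" or "dog") & "s";
--    begin
--       Put_Line (Boolean'Image (Match ("hotdogs", Animal)));  --  TRUE
--       Put_Line (Boolean'Image (Match ("cat", Animal)));      --  FALSE
--    end Demo;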
|
|
723
|
|
724 ----------------------------------
|
|
725 -- Pattern Assignment Functions --
|
|
726 ----------------------------------
|
|
727
|
|
728 function "*" (P : Pattern; Var : VString_Var) return Pattern;
|
|
729 function "*" (P : PString; Var : VString_Var) return Pattern;
|
|
730 function "*" (P : PChar; Var : VString_Var) return Pattern;
|
|
731 -- Matches P, and if the match succeeds, assigns the matched substring
|
|
732 -- to the given VString variable Var. This assignment happens as soon as
|
|
733 -- the substring is matched, and if the pattern P is matched more than
|
|
734 -- once during the course of the match, then the assignment will occur
|
|
735 -- more than once.
|
|
736
|
|
737 function "**" (P : Pattern; Var : VString_Var) return Pattern;
|
|
738 function "**" (P : PString; Var : VString_Var) return Pattern;
|
|
739 function "**" (P : PChar; Var : VString_Var) return Pattern;
|
|
740 -- Like "*" above, except that the assignment happens at most once
|
|
741 -- after the entire match is completed successfully. If the match
|
|
742 -- fails, then no assignment takes place.
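--
-- To illustrate (a sketch only; Demo, Num and Pat are hypothetical
-- names), the deferred form can capture the digits found in a subject:
--
--    with Ada.Text_IO;           use Ada.Text_IO;
--    with GNAT.Spitbol;          use GNAT.Spitbol;
--    with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;
--
--    procedure Demo is
--       Num : VString;
--       Pat : constant Pattern := Span ("0123456789") ** Num;
--    begin
--       if Match ("order 4711 shipped", Pat) then
--          Put_Line (S (Num));  --  4711
--       end if;
--    end Demo;
--
-- With "*" in place of "**", Num would be assigned as soon as Span
-- matched, even if a later element of a larger pattern then failed.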
|
|
743
|
|
744 ----------------------------------
|
|
745 -- Deferred Matching Operations --
|
|
746 ----------------------------------
|
|
747
|
|
748 function "+" (Str : VString_Var) return Pattern;
|
|
749 -- Here Str must be a VString variable. This function constructs a
|
|
750 -- pattern which at pattern matching time will access the current
|
|
751 -- value of this variable, and match against these characters.
|
|
752
|
|
753 function "+" (Str : VString_Func) return Pattern;
|
|
754 -- Constructs a pattern which at pattern matching time calls the given
|
|
755 -- function, and then matches against the string value
|
|
756 -- that is returned by the call.
|
|
757
|
|
758 function "+" (P : Pattern_Var) return Pattern;
|
|
759 -- Here P must be a Pattern variable. This function constructs a
|
|
760 -- pattern which at pattern matching time will access the current
|
|
761 -- value of this variable, and match against the pattern value.
|
|
762
|
|
763 function "+" (P : Boolean_Func) return Pattern;
|
|
764 -- Constructs a predicate pattern function that at pattern matching time
|
|
765 -- calls the given function. If True is returned, then the pattern matches.
|
|
766 -- If False is returned, then failure is signalled.
|
|
767
|
|
768 --------------------------------
|
|
769 -- Pattern Building Functions --
|
|
770 --------------------------------
|
|
771
|
|
772 function Arb return Pattern;
|
|
773 -- Constructs a pattern that will match any string. On the first attempt,
|
|
774 -- the pattern matches a null string, then on each successive failure, it
|
|
775 -- matches one more character, and fails only once it has already matched
|
|
776 -- the entire rest of the string.
|
|
777
|
|
778 function Arbno (P : Pattern) return Pattern;
|
|
779 function Arbno (P : PString) return Pattern;
|
|
780 function Arbno (P : PChar) return Pattern;
|
|
781 -- Pattern repetition. First matches null, then on a subsequent failure
|
|
782 -- attempts to match an additional instance of the given pattern.
|
|
783 -- Equivalent to (but more efficient than) "" or (P & ("" or (P & ...
|
|
784
|
|
785 function Any (Str : String) return Pattern;
|
|
786 function Any (Str : VString) return Pattern;
|
|
787 function Any (Str : Character) return Pattern;
|
|
788 function Any (Str : Character_Set) return Pattern;
|
|
789 function Any (Str : not null access VString) return Pattern;
|
|
790 function Any (Str : VString_Func) return Pattern;
|
|
791 -- Constructs a pattern that matches a single character that is one of
|
|
792 -- the characters in the given argument. The pattern fails if the current
|
|
793 -- character is not in Str.
|
|
794
|
|
795 function Bal return Pattern;
|
|
796 -- Constructs a pattern that will match any non-empty string that is
|
|
797 -- parentheses balanced with respect to the normal parentheses characters.
|
|
798 -- Attempts to extend the string if a subsequent failure occurs.
|
|
799
|
|
800 function Break (Str : String) return Pattern;
|
|
801 function Break (Str : VString) return Pattern;
|
|
802 function Break (Str : Character) return Pattern;
|
|
803 function Break (Str : Character_Set) return Pattern;
|
|
804 function Break (Str : not null access VString) return Pattern;
|
|
805 function Break (Str : VString_Func) return Pattern;
|
|
806 -- Constructs a pattern that matches a (possibly null) string which
|
|
807 -- is immediately followed by a character in the given argument. This
|
|
808 -- character is not part of the matched string. The pattern fails if
|
|
809 -- the remaining characters to be matched do not include any of the
|
|
810 -- characters in Str.
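--
-- For instance, Break is convenient for splitting a subject at a
-- delimiter. A sketch, assuming the hypothetical names Demo, Lhs, Rhs
-- and Pair (none of which are part of this package):
--
--    with Ada.Text_IO;           use Ada.Text_IO;
--    with GNAT.Spitbol;          use GNAT.Spitbol;
--    with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;
--
--    procedure Demo is
--       Lhs, Rhs : VString;
--       Pair     : constant Pattern :=
--                    Break ('=') * Lhs & Len (1) & Rest * Rhs;
--    begin
--       --  Break stops just before '=', Len (1) steps over it, and
--       --  Rest takes everything that remains.
--       if Match ("color=blue", Pair) then
--          Put_Line (S (Lhs) & " / " & S (Rhs));  --  color / blue
--       end if;
--    end Demo;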
|
|
811
|
|
812 function BreakX (Str : String) return Pattern;
|
|
813 function BreakX (Str : VString) return Pattern;
|
|
814 function BreakX (Str : Character) return Pattern;
|
|
815 function BreakX (Str : Character_Set) return Pattern;
|
|
816 function BreakX (Str : not null access VString) return Pattern;
|
|
817 function BreakX (Str : VString_Func) return Pattern;
|
|
818 -- Like Break, but the pattern attempts to extend on a failure to find
|
|
819 -- the next occurrence of a character in Str, and only fails when the
|
|
820 -- last such instance causes a failure.
|
|
821
|
|
822 function Cancel return Pattern;
|
|
823 -- Constructs a pattern that immediately aborts the entire match
|
|
824
|
|
825 function Fail return Pattern;
|
|
826 -- Constructs a pattern that always fails
|
|
827
|
|
828 function Fence return Pattern;
|
|
829 -- Constructs a pattern that matches null on the first attempt, and then
|
|
830 -- causes the entire match to be aborted if a subsequent failure occurs.
|
|
831
|
|
832 function Fence (P : Pattern) return Pattern;
|
|
833 -- Constructs a pattern that first matches P. If P fails, then the
|
|
834 -- constructed pattern fails. If P succeeds, then the match proceeds,
|
|
835 -- but if subsequent failure occurs, alternatives in P are not sought.
|
|
836 -- The idea of Fence is that each time the pattern is matched, just
|
|
837 -- one attempt is made to match P, without trying alternatives.
|
|
838
|
|
839 function Len (Count : Natural) return Pattern;
|
|
840 function Len (Count : not null access Natural) return Pattern;
|
|
841 function Len (Count : Natural_Func) return Pattern;
|
|
842 -- Constructs a pattern that matches exactly the given number of
|
|
843 -- characters. The pattern fails if fewer than this number of characters
|
|
844 -- remain to be matched in the string.
|
|
845
|
|
846 function NotAny (Str : String) return Pattern;
|
|
847 function NotAny (Str : VString) return Pattern;
|
|
848 function NotAny (Str : Character) return Pattern;
|
|
849 function NotAny (Str : Character_Set) return Pattern;
|
|
850 function NotAny (Str : not null access VString) return Pattern;
|
|
851 function NotAny (Str : VString_Func) return Pattern;
|
|
852 -- Constructs a pattern that matches a single character that is not
|
|
853 -- one of the characters in the given argument. The pattern fails if
|
|
854 -- the current character is in Str.
|
|
855
|
|
856 function NSpan (Str : String) return Pattern;
|
|
857 function NSpan (Str : VString) return Pattern;
|
|
858 function NSpan (Str : Character) return Pattern;
|
|
859 function NSpan (Str : Character_Set) return Pattern;
|
|
860 function NSpan (Str : not null access VString) return Pattern;
|
|
861 function NSpan (Str : VString_Func) return Pattern;
|
|
862 -- Constructs a pattern that matches the longest possible string
|
|
863 -- consisting entirely of characters from the given argument. The
|
|
864 -- string may be empty, so this pattern always succeeds.
|
|
865
|
|
866 function Pos (Count : Natural) return Pattern;
|
|
867 function Pos (Count : not null access Natural) return Pattern;
|
|
868 function Pos (Count : Natural_Func) return Pattern;
|
|
869 -- Constructs a pattern that matches the null string if exactly Count
|
|
870 -- characters have already been matched, and otherwise fails.
|
|
871
|
|
872 function Rest return Pattern;
|
|
873 -- Constructs a pattern that always succeeds, matching the remaining
|
|
874 -- unmatched characters in the subject string.
|
|
875
|
|
876 function Rpos (Count : Natural) return Pattern;
|
|
877 function Rpos (Count : not null access Natural) return Pattern;
|
|
878 function Rpos (Count : Natural_Func) return Pattern;
|
|
879 -- Constructs a pattern that matches the null string if exactly Count
|
|
880 -- characters remain to be matched in the string, and otherwise fails.
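--
-- Pos and Rpos are often combined to force a pattern to cover the whole
-- subject. A minimal sketch (Demo and All_Digits are hypothetical
-- names, not part of this package):
--
--    with Ada.Text_IO;           use Ada.Text_IO;
--    with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;
--
--    procedure Demo is
--       All_Digits : constant Pattern :=
--                      Pos (0) & Span ("0123456789") & Rpos (0);
--    begin
--       Put_Line (Boolean'Image (Match ("12345", All_Digits)));  --  TRUE
--       Put_Line (Boolean'Image (Match ("12a45", All_Digits)));  --  FALSE
--    end Demo;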
|
|
881
|
|
882 function Rtab (Count : Natural) return Pattern;
|
|
883 function Rtab (Count : not null access Natural) return Pattern;
|
|
884 function Rtab (Count : Natural_Func) return Pattern;
|
|
885 -- Constructs a pattern that matches from the current location until
|
|
886 -- exactly Count characters remain to be matched in the string. The
|
|
887 -- pattern fails if fewer than Count characters remain to be matched.
|
|
888
|
|
889 function Setcur (Var : not null access Natural) return Pattern;
|
|
890 -- Constructs a pattern that matches the null string and assigns the
|
|
891 -- current cursor position to Var.all. This value is the number of
|
|
892 -- characters matched so far, so it is zero at the start of the match.
|
|
893
|
|
894 function Span (Str : String) return Pattern;
|
|
895 function Span (Str : VString) return Pattern;
|
|
896 function Span (Str : Character) return Pattern;
|
|
897 function Span (Str : Character_Set) return Pattern;
|
|
898 function Span (Str : not null access VString) return Pattern;
|
|
899 function Span (Str : VString_Func) return Pattern;
|
|
900 -- Constructs a pattern that matches the longest possible string
|
|
901 -- consisting entirely of characters from the given argument. The
|
|
902 -- string cannot be empty, so the pattern fails if the current
|
|
903 -- character is not one of the characters in Str.
|
|
904
|
|
905 function Succeed return Pattern;
|
|
906 -- Constructs a pattern that succeeds matching null, both on the first
|
|
907 -- attempt, and on any rematch attempt, i.e. it is equivalent to an
|
|
908 -- infinite alternation of null strings.
|
|
909
|
|
910 function Tab (Count : Natural) return Pattern;
|
|
911 function Tab (Count : not null access Natural) return Pattern;
|
|
912 function Tab (Count : Natural_Func) return Pattern;
|
|
913 -- Constructs a pattern that matches from the current location until Count
|
|
914 -- characters have been matched. The pattern fails if more than Count
|
|
915 -- characters have already been matched.
|
|
916
|
|
917 ---------------------------------
|
|
918 -- Pattern Matching Operations --
|
|
919 ---------------------------------
|
|
920
|
|
921 -- The Match function performs an actual pattern matching operation.
|
|
922 -- The function versions with two parameters perform a match without
|
|
923 -- modifying the subject string and return a Boolean result indicating
|
|
924 -- if the match is successful or not. Set the global variable
|
|
925 -- Anchored_Mode to True to obtain an anchored match, in which the
|
|
926 -- pattern is required to match starting at the first character.
|
|
927
|
|
928 -- In an unanchored match, which is the default, successive attempts are
|
|
929 -- made to match the given pattern at each character of the subject string
|
|
930 -- until a match succeeds, or until all possibilities have failed.
|
|
931
|
|
932 -- Note that pattern assignment functions in the pattern may generate
|
|
933 -- side effects, so these functions are not necessarily pure.
|
|
934
|
|
935 Anchored_Mode : Boolean := False;
|
|
936 -- This global variable can be set True to cause all subsequent pattern
|
|
937 -- matches to operate in anchored mode. In anchored mode, no attempt is
|
|
938 -- made to move the anchor point, so that if the match succeeds it must
|
|
939 -- succeed starting at the first character. Note that the effect of
|
|
940 -- anchored mode may be achieved in individual pattern matches by using
|
|
941 -- Fence or Pos(0) at the start of the pattern.
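--
-- The effect of the switch can be seen in a sketch such as the
-- following (Demo is a hypothetical name; the subject and pattern
-- strings are arbitrary):
--
--    with Ada.Text_IO;           use Ada.Text_IO;
--    with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;
--
--    procedure Demo is
--    begin
--       Put_Line (Boolean'Image (Match ("hello world", "world")));
--       --  TRUE: the unanchored match finds "world" at offset 6
--
--       Anchored_Mode := True;
--       Put_Line (Boolean'Image (Match ("hello world", "world")));
--       --  FALSE: the subject does not start with "world"
--
--       Anchored_Mode := False;  --  restore the default
--    end Demo;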
|
|
942
|
|
943 Pattern_Stack_Overflow : exception;
|
|
944 -- Exception raised if internal pattern matching stack overflows. This
|
|
945 -- is typically the result of runaway pattern recursion. If there is a
|
|
946 -- genuine case of stack overflow, then either the match must be broken
|
|
947 -- down into simpler steps, or the stack limit must be reset.
|
|
948
|
|
949 Stack_Size : constant Positive := 2000;
|
|
950 -- Size used for internal pattern matching stack. Increase this size if
|
|
951 -- complex patterns cause Pattern_Stack_Overflow to be raised.
|
|
952
|
|
953 -- Simple match functions. The subject is matched against the pattern.
|
|
954 -- Any immediate or deferred assignments or writes are executed, and
|
|
955 -- the returned value indicates whether or not the match succeeded.
|
|
956
|
|
957 function Match
|
|
958 (Subject : VString;
|
|
959 Pat : Pattern) return Boolean;
|
|
960
|
|
961 function Match
|
|
962 (Subject : VString;
|
|
963 Pat : PString) return Boolean;
|
|
964
|
|
965 function Match
|
|
966 (Subject : String;
|
|
967 Pat : Pattern) return Boolean;
|
|
968
|
|
969 function Match
|
|
970 (Subject : String;
|
|
971 Pat : PString) return Boolean;
|
|
972
|
|
973 -- Replacement functions. The subject is matched against the pattern.
|
|
974 -- Any immediate or deferred assignments or writes are executed, and
|
|
975 -- the returned value indicates whether or not the match succeeded.
|
|
976 -- If the match succeeds, then the matched part of the subject string
|
|
977 -- is replaced by the given Replace string.
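--
-- Because the result reports whether a replacement took place, these
-- functions can drive a loop. A sketch, assuming the hypothetical names
-- Demo and Text:
--
--    with Ada.Text_IO;           use Ada.Text_IO;
--    with GNAT.Spitbol;          use GNAT.Spitbol;
--    with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;
--
--    procedure Demo is
--       Text : VString := V ("a--b--c");
--    begin
--       --  Each call replaces the first remaining "--"; the loop ends
--       --  when Match returns False.
--       while Match (Text, "--", "-") loop
--          null;
--       end loop;
--
--       Put_Line (S (Text));  --  a-b-c
--    end Demo;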
|
|
978
|
|
979 function Match
|
|
980 (Subject : VString_Var;
|
|
981 Pat : Pattern;
|
|
982 Replace : VString) return Boolean;
|
|
983
|
|
984 function Match
|
|
985 (Subject : VString_Var;
|
|
986 Pat : PString;
|
|
987 Replace : VString) return Boolean;
|
|
988
|
|
989 function Match
|
|
990 (Subject : VString_Var;
|
|
991 Pat : Pattern;
|
|
992 Replace : String) return Boolean;
|
|
993
|
|
994 function Match
|
|
995 (Subject : VString_Var;
|
|
996 Pat : PString;
|
|
997 Replace : String) return Boolean;
|
|
998
|
|
999 -- Simple match procedures. The subject is matched against the pattern.
|
|
1000 -- Any immediate or deferred assignments or writes are executed. No
|
|
1001 -- indication of success or failure is returned.
|
|
1002
|
|
1003 procedure Match
|
|
1004 (Subject : VString;
|
|
1005 Pat : Pattern);
|
|
1006
|
|
1007 procedure Match
|
|
1008 (Subject : VString;
|
|
1009 Pat : PString);
|
|
1010
|
|
1011 procedure Match
|
|
1012 (Subject : String;
|
|
1013 Pat : Pattern);
|
|
1014
|
|
1015 procedure Match
|
|
1016 (Subject : String;
|
|
1017 Pat : PString);
|
|
1018
|
|
1019 -- Replacement procedures. The subject is matched against the pattern.
|
|
1020 -- Any immediate or deferred assignments or writes are executed. No
|
|
1021 -- indication of success or failure is returned. If the match succeeds,
|
|
1022 -- then the matched part of the subject string is replaced by the given
|
|
1023 -- Replace string.
|
|
1024
|
|
1025 procedure Match
|
|
1026 (Subject : in out VString;
|
|
1027 Pat : Pattern;
|
|
1028 Replace : VString);
|
|
1029
|
|
1030 procedure Match
|
|
1031 (Subject : in out VString;
|
|
1032 Pat : PString;
|
|
1033 Replace : VString);
|
|
1034
|
|
1035 procedure Match
|
|
1036 (Subject : in out VString;
|
|
1037 Pat : Pattern;
|
|
1038 Replace : String);
|
|
1039
|
|
1040 procedure Match
|
|
1041 (Subject : in out VString;
|
|
1042 Pat : PString;
|
|
1043 Replace : String);
|
|
1044
|
|
1045 -- Deferred Replacement
|
|
1046
|
|
1047 type Match_Result is private;
|
|
1048 -- Type used to record result of pattern match
|
|
1049
|
|
1050 subtype Match_Result_Var is Match_Result;
|
|
1051 -- This synonym is used as a formal parameter type to a function where,
|
|
1052 -- if the language allowed, we would use an in out parameter, but we are
|
|
1053 -- not allowed to have in out parameters for functions. Instead we pass
|
|
1054 -- actuals which must be variables, and with a bit of trickery in the
|
|
1055 -- body, manage to interpret them properly as though they were indeed
|
|
1056 -- in out parameters.
|
|
1057
|
|
1058 function Match
|
|
1059 (Subject : VString_Var;
|
|
1060 Pat : Pattern;
|
|
1061 Result : Match_Result_Var) return Boolean;
|
|
1062
|
|
1063 procedure Match
|
|
1064 (Subject : in out VString;
|
|
1065 Pat : Pattern;
|
|
1066 Result : out Match_Result);
|
|
1067
|
|
1068 procedure Replace
|
|
1069 (Result : in out Match_Result;
|
|
1070 Replace : VString);
|
|
1071 -- Given a previous call to Match which set Result, performs a pattern
|
|
1072 -- replacement if the match was successful. Has no effect if the match
|
|
1073 -- failed. This call should immediately follow the Match call.
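--
-- Deferred replacement is useful when the replacement text depends on
-- what was matched. A minimal sketch (Demo, Text, Num and M are
-- hypothetical names, not part of this package):
--
--    with Ada.Text_IO;           use Ada.Text_IO;
--    with GNAT.Spitbol;          use GNAT.Spitbol;
--    with GNAT.Spitbol.Patterns; use GNAT.Spitbol.Patterns;
--
--    procedure Demo is
--       Text : VString := V ("price: 100");
--       Num  : VString;
--       M    : Match_Result;
--    begin
--       --  Record where the digits were matched, capturing them in Num
--       Match (Text, Span ("0123456789") * Num, M);
--
--       --  Build the replacement from the captured text
--       Replace (M, V ("<" & S (Num) & ">"));
--       Put_Line (S (Text));  --  price: <100>
--    end Demo;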
|
|
1074
|
|
1075 ------------------------
|
|
1076 -- Debugging Routines --
|
|
1077 ------------------------
|
|
1078
|
|
1079 -- Debugging pattern matching operations can often be quite complex,
|
|
1080 -- since there is no obvious way to trace the progress of the match.
|
|
1081 -- The declarations in this section provide some debugging assistance.
|
|
1082
|
|
1083 Debug_Mode : Boolean := False;
|
|
1084 -- This global variable can be set True to generate debugging on all
|
|
1085 -- subsequent calls to Match. The debugging output is a full trace of
|
|
1086 -- the actions of the pattern matcher, written to Standard_Output. The
|
|
1087 -- level of this information is intended to be comprehensible at the
|
|
1088 -- abstract level of this package declaration. However, note that the
|
|
1089 -- use of this switch often generates large amounts of output.
|
|
1090
|
|
1091 function "*" (P : Pattern; Fil : File_Access) return Pattern;
|
|
1092 function "*" (P : PString; Fil : File_Access) return Pattern;
|
|
1093 function "*" (P : PChar; Fil : File_Access) return Pattern;
|
|
1094 function "**" (P : Pattern; Fil : File_Access) return Pattern;
|
|
1095 function "**" (P : PString; Fil : File_Access) return Pattern;
|
|
1096 function "**" (P : PChar; Fil : File_Access) return Pattern;
|
|
1097 -- These are similar to the corresponding pattern assignment operations
|
|
1098 -- except that instead of setting the value of a variable, the matched
|
|
1099 -- substring is written to the appropriate file. This can be useful in
|
|
1100 -- following the progress of a match without generating the full amount
|
|
1101 -- of information obtained by setting Debug_Mode to True.
|
|
1102
|
|
1103 Terminal : constant File_Access := Standard_Error;
|
|
1104 Output : constant File_Access := Standard_Output;
|
|
1105 -- Two handy synonyms for use with the above pattern write operations
|
|
1106
|
|
1107 -- Finally we have some routines that are useful for determining what
|
|
1108 -- patterns are in use, particularly if they are constructed dynamically.
|
|
1109
|
|
1110 function Image (P : Pattern) return String;
|
|
1111 function Image (P : Pattern) return VString;
|
|
1112 -- These functions yield strings that correspond to the syntax needed
|
|
1113 -- to create the given pattern using the functions in this package. The
|
|
1114 -- form of this string is such that it could actually be compiled and
|
|
1115 -- evaluated to yield the required pattern except for references to
|
|
1116 -- variables and functions, which are output using one of the following
|
|
1117 -- forms:
|
|
1118 --
|
|
1119 --    access Natural     NP(16#...#)
|
|
1120 --    access Pattern     PP(16#...#)
|
|
1121 --    access VString     VP(16#...#)
|
|
1122 --
|
|
1123 --    Natural_Func       NF(16#...#)
|
|
1124 --    VString_Func       VF(16#...#)
|
|
1125 --
|
|
1126 -- where 16#...# is the hex representation of the integer address that
|
|
1127 -- corresponds to the given access value
|
|
1128
|
|
1129 procedure Dump (P : Pattern);
|
|
1130 -- This procedure writes information about the pattern to Standard_Output.
|
|
1131 -- The format of this information is keyed to the internal data structures
|
|
1132 -- used to implement patterns. The information provided by Dump is thus
|
|
1133 -- more precise than that yielded by Image, but is also a bit more obscure
|
|
1134 -- (i.e. it cannot be interpreted solely in terms of this spec; you have
|
|
1135 -- to know something about the data structures).
|
|
1136
|
|
1137 ------------------
|
|
1138 -- Private Part --
|
|
1139 ------------------
|
|
1140
|
|
1141 private
|
|
1142 type PE;
|
|
1143 -- Pattern element. A pattern is a complex structure of PE's. This type
|
|
1144 -- is defined and described in the body of this package.
|
|
1145
|
|
1146 type PE_Ptr is access all PE;
|
|
1147 -- Pattern reference. PE's use PE_Ptr values to reference other PE's
|
|
1148
|
|
1149 type Pattern is new Controlled with record
|
|
1150 Stk : Natural := 0;
|
|
1151 -- Maximum number of stack entries required for matching this
|
|
1152 -- pattern. See description of pattern history stack in body.
|
|
1153
|
|
1154 P : PE_Ptr := null;
|
|
1155 -- Pointer to initial pattern element for pattern
|
|
1156 end record;
|
|
1157
|
|
1158 pragma Finalize_Storage_Only (Pattern);
|
|
1159
|
|
1160 procedure Adjust (Object : in out Pattern);
|
|
1161 -- Adjust routine used to copy pattern objects
|
|
1162
|
|
1163 procedure Finalize (Object : in out Pattern);
|
|
1164 -- Finalization routine used to release storage allocated for a pattern
|
|
1165
|
|
1166 type VString_Ptr is access all VString;
|
|
1167
|
|
1168 type Match_Result is record
|
|
1169 Var : VString_Ptr;
|
|
1170 -- Pointer to subject string. Set to null if match failed
|
|
1171
|
|
1172 Start : Natural := 1;
|
|
1173 -- Starting index position (1's origin) of matched section of
|
|
1174 -- subject string. Only valid if Var is non-null.
|
|
1175
|
|
1176 Stop : Natural := 0;
|
|
1177 -- Ending index position (1's origin) of matched section of
|
|
1178 -- subject string. Only valid if Var is non-null.
|
|
1179
|
|
1180 end record;
|
|
1181
|
|
1182 pragma Volatile (Match_Result);
|
|
1183 -- This ensures that the Result parameter is passed by reference, so
|
|
1184 -- that we can play our games with the bogus Match_Result_Var parameter
|
|
1185 -- in the function case to treat it as though it were an in out parameter.
|
|
1186
|
|
1187 end GNAT.Spitbol.Patterns;
|