pegs
Simple PEG (Parsing expression grammar) matching. Uses no memorization, but uses superoperators and symbol inlining to improve performance. Note: Matching performance is hopefully competitive with optimized regular expression engines.
PEG syntax and semantics
A PEG (Parsing expression grammar) is a simple deterministic grammar, that can be directly used for parsing. The current implementation has been designed as a more powerful replacement for regular expressions. UTF-8 is supported.
The notation used for a PEG is similar to that of EBNF:
notation | meaning |
---|---|
A / ... / Z |
Ordered choice: Apply expressions A , ..., Z , in this order, to the text ahead, until one of them succeeds and possibly consumes some text. Indicate success if one of expressions succeeded. Otherwise do not consume any text and indicate failure. |
A ... Z |
Sequence: Apply expressions A , ..., Z , in this order, to consume consecutive portions of the text ahead, as long as they succeed. Indicate success if all succeeded. Otherwise do not consume any text and indicate failure. The sequence's precedence is higher than that of ordered choice: A B / C means (A B) / Z and not A (B / Z) . |
(E) |
Grouping: Parenthesis can be used to change operator priority. |
{E} |
Capture: Apply expression E and store the substring that matched E into a capture that can be accessed after the matching process. |
$i |
Back reference to the i th capture. i counts from 1. |
$ |
Anchor: Matches at the end of the input. No character is consumed. Same as !. . |
^ |
Anchor: Matches at the start of the input. No character is consumed. |
&E |
And predicate: Indicate success if expression E matches the text ahead; otherwise indicate failure. Do not consume any text. |
!E |
Not predicate: Indicate failure if expression E matches the text ahead; otherwise indicate success. Do not consume any text. |
E+ |
One or more: Apply expression E repeatedly to match the text ahead, as long as it succeeds. Consume the matched text (if any) and indicate success if there was at least one match. Otherwise indicate failure. |
E* |
Zero or more: Apply expression E repeatedly to match the text ahead, as long as it succeeds. Consume the matched text (if any). Always indicate success. |
E? |
Zero or one: If expression E matches the text ahead, consume it. Always indicate success. |
[s] |
Character class: If the character ahead appears in the string s , consume it and indicate success. Otherwise indicate failure. |
[a-b] |
Character range: If the character ahead is one from the range a through b , consume it and indicate success. Otherwise indicate failure. |
's' |
String: If the text ahead is the string s , consume it and indicate success. Otherwise indicate failure. |
i's' |
String match ignoring case. |
y's' |
String match ignoring style. |
v's' |
Verbatim string match: Use this to override a global \i or \y modifier. |
i$j |
String match ignoring case for back reference. |
y$j |
String match ignoring style for back reference. |
v$j |
Verbatim string match for back reference. |
. |
Any character: If there is a character ahead, consume it and indicate success. Otherwise (that is, at the end of input) indicate failure. |
_ |
Any Unicode character: If there is an UTF-8 character ahead, consume it and indicate success. Otherwise indicate failure. |
@E |
Search: Shorthand for (!E .)* E . (Search loop for the pattern E .) |
{@} E |
Captured Search: Shorthand for {(!E .)*} E . (Search loop for the pattern E .) Everything until and exluding E is captured. |
@@ E |
Same as {@} E . |
A <- E |
Rule: Bind the expression E to the nonterminal symbol A . Left recursive rules are not possible and crash the matching engine.
|
\identifier |
Built-in macro for a longer expression. |
\ddd |
Character with decimal code ddd. |
\" , etc |
Literal " , etc. |
Built-in macros
macro | meaning |
---|---|
\d |
any decimal digit: [0-9]
|
\D |
any character that is not a decimal digit: [^0-9]
|
\s |
any whitespace character: [ \9-\13]
|
\S |
any character that is not a whitespace character: [^ \9-\13]
|
\w |
any "word" character: [a-zA-Z0-9_]
|
\W |
any "non-word" character: [^a-zA-Z0-9_]
|
\a |
same as [a-zA-Z]
|
\A |
same as [^a-zA-Z]
|
\n |
any newline combination: \10 / \13\10 / \13
|
\i |
ignore case for matching; use this at the start of the PEG |
\y |
ignore style for matching; use this at the start of the PEG |
\skip pat |
skip pattern pat before trying to match other tokens; this is useful for whitespace skipping, for example: \skip(\s*) {\ident} ':' {\ident} matches key value pairs ignoring whitespace around the ':' . |
\ident |
a standard ASCII identifier: [a-zA-Z_][a-zA-Z_0-9]*
|
\letter |
any Unicode letter |
\upper |
any Unicode uppercase letter |
\lower |
any Unicode lowercase letter |
\title |
any Unicode title letter |
\white |
any Unicode whitespace character |
A backslash followed by a letter is a built-in macro, otherwise it is used for ordinary escaping:
notation | meaning |
---|---|
\\ |
a single backslash |
\* |
same as '*'
|
\t |
not a tabulator, but an (unknown) built-in |
Supported PEG grammar
The PEG parser implements this grammar (written in PEG syntax):
# Example grammar of PEG in PEG syntax. # Comments start with '#'. # First symbol is the start symbol. grammar <- rule* / expr identifier <- [A-Za-z][A-Za-z0-9_]* charsetchar <- "\\" . / [^\]] charset <- "[" "^"? (charsetchar ("-" charsetchar)?)+ "]" stringlit <- identifier? ("\"" ("\\" . / [^"])* "\"" / "'" ("\\" . / [^'])* "'") builtin <- "\\" identifier / [^\13\10] comment <- '#' @ \n ig <- (\s / comment)* # things to ignore rule <- identifier \s* "<-" expr ig identNoArrow <- identifier !(\s* "<-") prefixOpr <- ig '&' / ig '!' / ig '@' / ig '{@}' / ig '@@' literal <- ig identifier? '$' [0-9]+ / '$' / '^' / ig identNoArrow / ig charset / ig stringlit / ig builtin / ig '.' / ig '_' / (ig "(" expr ig ")") postfixOpr <- ig '?' / ig '*' / ig '+' primary <- prefixOpr* (literal postfixOpr*) # Concatenation has higher priority than choice: # ``a b / c`` means ``(a b) / c`` seqExpr <- primary+ expr <- seqExpr (ig "/" expr)*
Note: As a special syntactic extension if the whole PEG is only a single expression, identifiers are not interpreted as non-terminals, but are interpreted as verbatim string:
abc =~ peg"abc" # is true
So it is not necessary to write peg" 'abc' "
in the above example.
Examples
Check if s
matches Nim's "while" keyword:
s =~ peg" y'while'"
Exchange (key, val)-pairs:
"key: val; key2: val2".replacef(peg"{\ident} \s* ':' \s* {\ident}", "$2: $1")
Determine the #include
'ed files of a C file:
for line in lines("myfile.c"): if line =~ peg"""s <- ws '#include' ws '"' {[^"]+} '"' ws comment <- '/*' @ '*/' / '//' .* ws <- (comment / \s+)* """: echo matches[0]
PEG vs regular expression
As a regular expression \[.*\]
matches the longest possible text between '['
and ']'
. As a PEG it never matches anything, because a PEG is deterministic: .*
consumes the rest of the input, so \]
never matches. As a PEG this needs to be written as: \[ ( !\] . )* \]
(or \[ @ \]
).
Note that the regular expression does not behave as intended either: in the example *
should not be greedy, so \[.*?\]
should be used instead.
PEG construction
There are two ways to construct a PEG in Nim code:
- Parsing a string into an AST which consists of
Peg
nodes with thepeg
proc. - Constructing the AST directly with proc calls. This method does not support constructing rules, only simple expressions and is not as convenient. Its only advantage is that it does not pull in the whole PEG parser into your executable.
Imports
Types
PegKind = enum pkEmpty, pkAny, ## any character (.) pkAnyRune, ## any Unicode character (_) pkNewLine, ## CR-LF, LF, CR pkLetter, ## Unicode letter pkLower, ## Unicode lower case letter pkUpper, ## Unicode upper case letter pkTitle, ## Unicode title character pkWhitespace, ## Unicode whitespace character pkTerminal, pkTerminalIgnoreCase, pkTerminalIgnoreStyle, pkChar, ## single character to match pkCharChoice, pkNonTerminal, pkSequence, ## a b c ... --> Internal DSL: peg(a, b, c) pkOrderedChoice, ## a / b / ... --> Internal DSL: a / b or /[a, b, c] pkGreedyRep, ## a* --> Internal DSL: *a ## a+ --> (a a*) pkGreedyRepChar, ## x* where x is a single character (superop) pkGreedyRepSet, ## [set]* (superop) pkGreedyAny, ## .* or _* (superop) pkOption, ## a? --> Internal DSL: ?a pkAndPredicate, ## &a --> Internal DSL: &a pkNotPredicate, ## !a --> Internal DSL: !a pkCapture, ## {a} --> Internal DSL: capture(a) pkBackRef, ## $i --> Internal DSL: backref(i) pkBackRefIgnoreCase, pkBackRefIgnoreStyle, pkSearch, ## @a --> Internal DSL: !*a pkCapturedSearch, ## {@} a --> Internal DSL: !*\a pkRule, ## a <- b pkList, ## a, b pkStartAnchor ## ^ --> Internal DSL: startAnchor()
- Source Edit
NonTerminalFlag = enum ntDeclared, ntUsed
- Source Edit
Peg {...}{.shallow.} = object case kind: PegKind of pkEmpty .. pkWhitespace: nil of pkTerminal, pkTerminalIgnoreCase, pkTerminalIgnoreStyle: term: string of pkChar, pkGreedyRepChar: ch: char of pkCharChoice, pkGreedyRepSet: charChoice: ref set[char] of pkNonTerminal: nt: NonTerminal of pkBackRef .. pkBackRefIgnoreStyle: index: range[0 .. MaxSubpatterns] else: sons: seq[Peg]
- type that represents a PEG Source Edit
NonTerminal = ref NonTerminalObj
- Source Edit
Captures = object matches: array[0 .. 20 - 1, tuple[first, last: int]] ml: int origStart: int
- contains the captured substrings. Source Edit
EInvalidPeg = object of ValueError
- raised if an invalid PEG has been detected Source Edit
Consts
MaxSubpatterns = 20
- defines the maximum number of subpatterns that can be captured. More subpatterns cannot be captured! Source Edit
Procs
proc kind(p: Peg): PegKind {...}{.raises: [], tags: [].}
- Returns the PegKind of a given Peg object. Source Edit
proc term(p: Peg): string {...}{.raises: [], tags: [].}
- Returns the string representation of a given Peg variant object where present. Source Edit
proc ch(p: Peg): char {...}{.raises: [], tags: [].}
- Returns the char representation of a given Peg variant object where present. Source Edit
proc charChoice(p: Peg): ref set[char] {...}{.raises: [], tags: [].}
- Returns the charChoice field of a given Peg variant object where present. Source Edit
proc nt(p: Peg): NonTerminal {...}{.raises: [], tags: [].}
- Returns the NonTerminal object of a given Peg variant object where present. Source Edit
proc index(p: Peg): range[0 .. MaxSubpatterns] {...}{.raises: [], tags: [].}
- Returns the back-reference index of a captured sub-pattern in the Captures object for a given Peg variant object where present. Source Edit
proc name(nt: NonTerminal): string {...}{.raises: [], tags: [].}
- Gets the name of the symbol represented by the parent Peg object variant of a given NonTerminal. Source Edit
proc line(nt: NonTerminal): int {...}{.raises: [], tags: [].}
- Gets the line number of the definition of the parent Peg object variant of a given NonTerminal. Source Edit
proc col(nt: NonTerminal): int {...}{.raises: [], tags: [].}
- Gets the column number of the definition of the parent Peg object variant of a given NonTerminal. Source Edit
proc flags(nt: NonTerminal): set[NonTerminalFlag] {...}{.raises: [], tags: [].}
- Gets the NonTerminalFlag-typed flags field of the parent Peg variant object of a given NonTerminal. Source Edit
proc rule(nt: NonTerminal): Peg {...}{.raises: [], tags: [].}
- Gets the Peg object representing the rule definition of the parent Peg object variant of a given NonTerminal. Source Edit
proc term(t: string): Peg {...}{.noSideEffect, gcsafe, extern: "npegs$1Str", raises: [], tags: [].}
- constructs a PEG from a terminal string Source Edit
proc termIgnoreCase(t: string): Peg {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- constructs a PEG from a terminal string; ignore case for matching Source Edit
proc termIgnoreStyle(t: string): Peg {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- constructs a PEG from a terminal string; ignore style for matching Source Edit
proc term(t: char): Peg {...}{.noSideEffect, gcsafe, extern: "npegs$1Char", raises: [], tags: [].}
- constructs a PEG from a terminal char Source Edit
proc charSet(s: set[char]): Peg {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- constructs a PEG from a character set
s
Source Edit proc `/`(a: varargs[Peg]): Peg {...}{.noSideEffect, gcsafe, extern: "npegsOrderedChoice", raises: [], tags: [].}
- constructs an ordered choice with the PEGs in
a
Source Edit proc sequence(a: varargs[Peg]): Peg {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- constructs a sequence with all the PEGs from
a
Source Edit proc `?`(a: Peg): Peg {...}{.noSideEffect, gcsafe, extern: "npegsOptional", raises: [], tags: [].}
- constructs an optional for the PEG
a
Source Edit proc `*`(a: Peg): Peg {...}{.noSideEffect, gcsafe, extern: "npegsGreedyRep", raises: [], tags: [].}
- constructs a "greedy repetition" for the PEG
a
Source Edit proc `!*`(a: Peg): Peg {...}{.noSideEffect, gcsafe, extern: "npegsSearch", raises: [], tags: [].}
- constructs a "search" for the PEG
a
Source Edit proc `!*\`(a: Peg): Peg {...}{.noSideEffect, gcsafe, extern: "npgegsCapturedSearch", raises: [], tags: [].}
- constructs a "captured search" for the PEG
a
Source Edit proc `+`(a: Peg): Peg {...}{.noSideEffect, gcsafe, extern: "npegsGreedyPosRep", raises: [], tags: [].}
- constructs a "greedy positive repetition" with the PEG
a
Source Edit proc `&`(a: Peg): Peg {...}{.noSideEffect, gcsafe, extern: "npegsAndPredicate", raises: [], tags: [].}
- constructs an "and predicate" with the PEG
a
Source Edit proc `!`(a: Peg): Peg {...}{.noSideEffect, gcsafe, extern: "npegsNotPredicate", raises: [], tags: [].}
- constructs a "not predicate" with the PEG
a
Source Edit proc any(): Peg {...}{.inline, raises: [], tags: [].}
- constructs the PEG any character (
.
) Source Edit proc anyRune(): Peg {...}{.inline, raises: [], tags: [].}
- constructs the PEG any rune (
_
) Source Edit proc newLine(): Peg {...}{.inline, raises: [], tags: [].}
- constructs the PEG newline (
\n
) Source Edit proc unicodeLetter(): Peg {...}{.inline, raises: [], tags: [].}
- constructs the PEG
\letter
which matches any Unicode letter. Source Edit proc unicodeLower(): Peg {...}{.inline, raises: [], tags: [].}
- constructs the PEG
\lower
which matches any Unicode lowercase letter. Source Edit proc unicodeUpper(): Peg {...}{.inline, raises: [], tags: [].}
- constructs the PEG
\upper
which matches any Unicode uppercase letter. Source Edit proc unicodeTitle(): Peg {...}{.inline, raises: [], tags: [].}
- constructs the PEG
\title
which matches any Unicode title letter. Source Edit proc unicodeWhitespace(): Peg {...}{.inline, raises: [], tags: [].}
- constructs the PEG
\white
which matches any Unicode whitespace character. Source Edit proc startAnchor(): Peg {...}{.inline, raises: [], tags: [].}
- constructs the PEG
^
which matches the start of the input. Source Edit proc endAnchor(): Peg {...}{.inline, raises: [], tags: [].}
- constructs the PEG
$
which matches the end of the input. Source Edit proc capture(a: Peg): Peg {...}{.noSideEffect, gcsafe, extern: "npegsCapture", raises: [], tags: [].}
- constructs a capture with the PEG
a
Source Edit proc backref(index: range[1 .. MaxSubpatterns]): Peg {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- constructs a back reference of the given
index
.index
starts counting from 1. Source Edit proc backrefIgnoreCase(index: range[1 .. MaxSubpatterns]): Peg {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- constructs a back reference of the given
index
.index
starts counting from 1. Ignores case for matching. Source Edit proc backrefIgnoreStyle(index: range[1 .. MaxSubpatterns]): Peg {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- constructs a back reference of the given
index
.index
starts counting from 1. Ignores style for matching. Source Edit proc nonterminal(n: NonTerminal): Peg {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- constructs a PEG that consists of the nonterminal symbol Source Edit
proc newNonTerminal(name: string; line, column: int): NonTerminal {...}{. noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- constructs a nonterminal symbol Source Edit
proc `$`(r: Peg): string {...}{.noSideEffect, gcsafe, extern: "npegsToString", raises: [], tags: [].}
- converts a PEG to its string representation Source Edit
proc bounds(c: Captures; i: range[0 .. 20 - 1]): tuple[first, last: int] {...}{. raises: [], tags: [].}
- returns the bounds
[first..last]
of thei
'th capture. Source Edit proc rawMatch(s: string; p: Peg; start: int; c: var Captures): int {...}{. noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- low-level matching proc that implements the PEG interpreter. Use this for maximum efficiency (every other PEG operation ends up calling this proc). Returns -1 if it does not match, else the length of the match Source Edit
proc matchLen(s: string; pattern: Peg; matches: var openArray[string]; start = 0): int {...}{. noSideEffect, gcsafe, extern: "npegs$1Capture", raises: [], tags: [].}
- the same as
match
, but it returns the length of the match, if there is no match, -1 is returned. Note that a match length of zero can happen. It's possible that a suffix ofs
remains that does not belong to the match. Source Edit proc matchLen(s: string; pattern: Peg; start = 0): int {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- the same as
match
, but it returns the length of the match, if there is no match, -1 is returned. Note that a match length of zero can happen. It's possible that a suffix ofs
remains that does not belong to the match. Source Edit proc match(s: string; pattern: Peg; matches: var openArray[string]; start = 0): bool {...}{. noSideEffect, gcsafe, extern: "npegs$1Capture", raises: [], tags: [].}
- returns
true
ifs[start..]
matches thepattern
and the captured substrings in the arraymatches
. If it does not match, nothing is written intomatches
andfalse
is returned. Source Edit proc match(s: string; pattern: Peg; start = 0): bool {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- returns
true
ifs
matches thepattern
beginning fromstart
. Source Edit proc find(s: string; pattern: Peg; matches: var openArray[string]; start = 0): int {...}{. noSideEffect, gcsafe, extern: "npegs$1Capture", raises: [], tags: [].}
- returns the starting position of
pattern
ins
and the captured substrings in the arraymatches
. If it does not match, nothing is written intomatches
and -1 is returned. Source Edit proc findBounds(s: string; pattern: Peg; matches: var openArray[string]; start = 0): tuple[first, last: int] {...}{.noSideEffect, gcsafe, extern: "npegs$1Capture", raises: [], tags: [].}
- returns the starting position and end position of
pattern
ins
and the captured substrings in the arraymatches
. If it does not match, nothing is written intomatches
and (-1,0) is returned. Source Edit proc find(s: string; pattern: Peg; start = 0): int {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- returns the starting position of
pattern
ins
. If it does not match, -1 is returned. Source Edit proc findAll(s: string; pattern: Peg; start = 0): seq[string] {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- returns all matching substrings of
s
that matchpattern
. If it does not match, @[] is returned. Source Edit proc contains(s: string; pattern: Peg; start = 0): bool {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- same as
find(s, pattern, start) >= 0
Source Edit proc contains(s: string; pattern: Peg; matches: var openArray[string]; start = 0): bool {...}{. noSideEffect, gcsafe, extern: "npegs$1Capture", raises: [], tags: [].}
- same as
find(s, pattern, matches, start) >= 0
Source Edit proc startsWith(s: string; prefix: Peg; start = 0): bool {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- returns true if
s
starts with the patternprefix
Source Edit proc endsWith(s: string; suffix: Peg; start = 0): bool {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- returns true if
s
ends with the patternsuffix
Source Edit proc replacef(s: string; sub: Peg; by: string): string {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [ValueError], tags: [].}
- Replaces
sub
ins
by the stringby
. Captures can be accessed inby
with the notation$i
and$#
(see strutils.`%`). Examples:"var1=key; var2=key2".replacef(peg"{\ident}'='{\ident}", "$1<-$2$2")
Results in:
"var1<-keykey; val2<-key2key2"
Source Edit proc replace(s: string; sub: Peg; by = ""): string {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- Replaces
sub
ins
by the stringby
. Captures cannot be accessed inby
. Source Edit proc parallelReplace(s: string; subs: varargs[tuple[pattern: Peg, repl: string]]): string {...}{. noSideEffect, gcsafe, extern: "npegs$1", raises: [ValueError], tags: [].}
- Returns a modified copy of
s
with the substitutions insubs
applied in parallel. Source Edit proc replace(s: string; sub: Peg; cb: proc (match: int; cnt: int; caps: openArray[string]): string): string {...}{. gcsafe, extern: "npegs$1cb", raises: [], tags: [].}
- Replaces
sub
ins
by the resulting strings from the callback. The callback proc receives the index of the current match (starting with 0), the count of captures and an open array with the captures of each match. Examples:proc handleMatches*(m: int, n: int, c: openArray[string]): string = result = "" if m > 0: result.add ", " result.add case n: of 2: c[0].toLower & ": '" & c[1] & "'" of 1: c[0].toLower & ": ''" else: "" let s = "Var1=key1;var2=Key2; VAR3" echo s.replace(peg"{\ident}('='{\ident})* ';'* \s*", handleMatches)
Results in:
"var1: 'key1', var2: 'Key2', var3: ''"
Source Edit proc transformFile(infile, outfile: string; subs: varargs[tuple[pattern: Peg, repl: string]]) {...}{.gcsafe, extern: "npegs$1", raises: [IOError, ValueError], tags: [ReadIOEffect, WriteIOEffect].}
-
reads in the file
infile
, performs a parallel replacement (callsparallelReplace
) and writes back tooutfile
. RaisesIOError
if an error occurs. This is supposed to be used for quick scripting.Note: this proc does not exist while using the JS backend.
Source Edit proc split(s: string; sep: Peg): seq[string] {...}{.noSideEffect, gcsafe, extern: "npegs$1", raises: [], tags: [].}
- Splits the string
s
into substrings. Source Edit proc parsePeg(pattern: string; filename = "pattern"; line = 1; col = 0): Peg {...}{. raises: [ValueError, EInvalidPeg, Exception], tags: [RootEffect].}
- constructs a Peg object from
pattern
.filename
,line
,col
are used for error messages, but they only provide start offsets.parsePeg
keeps track of line and column numbers withinpattern
. Source Edit proc peg(pattern: string): Peg {...}{.raises: [ValueError, EInvalidPeg, Exception], tags: [RootEffect].}
- constructs a Peg object from the
pattern
. The short name has been chosen to encourage its use as a raw string modifier:peg"{\ident} \s* '=' \s* {.*}"
Source Edit proc escapePeg(s: string): string {...}{.raises: [], tags: [].}
- escapes
s
so that it is matched verbatim when used as a peg. Source Edit
Iterators
iterator items(p: Peg): Peg {...}{.inline, raises: [], tags: [].}
- Yields the child nodes of a Peg variant object where present. Source Edit
iterator pairs(p: Peg): (int, Peg) {...}{.inline, raises: [], tags: [].}
- Yields the indices and child nodes of a Peg variant object where present. Source Edit
iterator findAll(s: string; pattern: Peg; start = 0): string {...}{.raises: [], tags: [].}
- yields all matching substrings of
s
that matchpattern
. Source Edit iterator split(s: string; sep: Peg): string {...}{.raises: [], tags: [].}
-
Splits the string
s
into substrings.Substrings are separated by the PEG
sep
. Examples:for word in split("00232this02939is39an22example111", peg"\d+"): writeLine(stdout, word)
Results in:
"this" "is" "an" "example"
Source Edit
Templates
template letters(): Peg
- expands to
charset({'A'..'Z', 'a'..'z'})
Source Edit template digits(): Peg
- expands to
charset({'0'..'9'})
Source Edit template whitespace(): Peg
- expands to
charset({' ', '\9'..'\13'})
Source Edit template identChars(): Peg
- expands to
charset({'a'..'z', 'A'..'Z', '0'..'9', '_'})
Source Edit template identStartChars(): Peg
- expands to
charset({'A'..'Z', 'a'..'z', '_'})
Source Edit template ident(): Peg
- same as
[a-zA-Z_][a-zA-z_0-9]*
; standard identifier Source Edit template natural(): Peg
- same as
\d+
Source Edit template eventParser(pegAst, handlers: untyped): (proc (s: string): int)
- Generates an interpreting event parser proc according to the specified PEG AST and handler code blocks. The proc can be called with a string to be parsed and will execute the handler code blocks whenever their associated grammar element is matched. It returns -1 if the string does not match, else the length of the total match. The following example code evaluates an arithmetic expression defined by a simple PEG:
import strutils, pegs let pegAst = """ Expr <- Sum Sum <- Product (('+' / '-')Product)* Product <- Value (('*' / '/')Value)* Value <- [0-9]+ / '(' Expr ')' """.peg txt = "(5+3)/2-7*22" var pStack: seq[string] = @[] valStack: seq[float] = @[] opStack = "" let parseArithExpr = pegAst.eventParser: pkNonTerminal: enter: pStack.add p.nt.name leave: pStack.setLen pStack.high if length > 0: let matchStr = s.substr(start, start+length-1) case p.nt.name of "Value": try: valStack.add matchStr.parseFloat echo valStack except ValueError: discard of "Sum", "Product": try: let val = matchStr.parseFloat except ValueError: if valStack.len > 1 and opStack.len > 0: valStack[^2] = case opStack[^1] of '+': valStack[^2] + valStack[^1] of '-': valStack[^2] - valStack[^1] of '*': valStack[^2] * valStack[^1] else: valStack[^2] / valStack[^1] valStack.setLen valStack.high echo valStack opStack.setLen opStack.high echo opStack pkChar: leave: if length == 1 and "Value" != pStack[^1]: let matchChar = s[start] opStack.add matchChar echo opStack let pLen = parseArithExpr(txt)
The handlers parameter consists of code blocks for PegKinds, which define the grammar elements of interest. Each block can contain handler code to be executed when the parser enters and leaves text matching the grammar element. An enter handler can access the specific PEG AST node being matched as p, the entire parsed string as s and the position of the matched text segment in s as start. A leave handler can access p, s, start and also the length of the matched text segment as length. For an unsuccessful match, the enter and leave handlers will be executed, with length set to -1.
Symbols declared in an enter handler can be made visible in the corresponding leave handler by annotating them with an inject pragma.
Source Edit template `=~`(s: string; pattern: Peg): bool
- This calls
match
with an implicit declaredmatches
array that can be used in the scope of the=~
call:if line =~ peg"\s* {\w+} \s* '=' \s* {\w+}": # matches a key=value pair: echo("Key: ", matches[0]) echo("Value: ", matches[1]) elif line =~ peg"\s*{'#'.*}": # matches a comment # note that the implicit ``matches`` array is different from the # ``matches`` array of the first branch echo("comment: ", matches[0]) else: echo("syntax error")
Source Edit
Exports
© 2006–2021 Andreas Rumpf
Licensed under the MIT License.
https://nim-lang.org/docs/pegs.html