Text.Parsec
Copyright | (c) Daan Leijen 1999-2001 (c) Paolo Martini 2007 |
---|---|
License | BSD-style (see the LICENSE file) |
Maintainer | [email protected] |
Stability | provisional |
Portability | portable |
Safe Haskell | Safe |
Language | Haskell2010 |
Description
This module includes everything you need to get started writing a parser.
By default this module is set up to parse character data. If you'd like to parse the result of your own tokenizer you should start with the following imports:
import Text.Parsec.Prim
import Text.Parsec.Combinator
Then you can implement your own version of satisfy on top of the tokenPrim primitive.
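As a rough sketch of that setup, the following defines a satisfy-like parser for a custom token type. The names Tok, TokParser, satisfyTok, number, sumExpr and runSum are hypothetical and not part of Parsec, and the position function is a placeholder because these tokens carry no position information.

```haskell
import Text.Parsec.Prim
import Text.Parsec.Combinator (eof)

-- Hypothetical token type, as produced by a hand-written tokenizer.
data Tok = TNum Int | TPlus
  deriving (Eq, Show)

-- A parser over a list of Tok without user state.
type TokParser a = Parsec [Tok] () a

-- A version of satisfy for Tok streams, built on tokenPrim.
-- The position is left unchanged (an assumption made for brevity).
satisfyTok :: (Tok -> Bool) -> TokParser Tok
satisfyTok p = tokenPrim show nextPos test
  where
    nextPos pos _tok _rest = pos
    test t = if p t then Just t else Nothing

-- Accept a number token and return its value.
number :: TokParser Int
number = tokenPrim show (\pos _ _ -> pos) test
  where
    test (TNum n) = Just n
    test _        = Nothing

-- Numbers separated by TPlus, summed up.
sumExpr :: TokParser Int
sumExpr = do
  n  <- number
  ns <- many (satisfyTok (== TPlus) >> number)
  eof
  return (sum (n : ns))

runSum :: [Tok] -> Maybe Int
runSum = either (const Nothing) Just . parse sumExpr "<tokens>"

main :: IO ()
main = print (runSum [TNum 1, TPlus, TNum 2, TPlus, TNum 39])
```

A real tokenizer would normally attach positions to its tokens so that nextPos can report useful locations in error messages.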
Parsers
ParsecT monad transformer and Parsec type
ParsecT s u m a is a parser with stream type s, user state type u, underlying monad m and return type a. Parsec is strict in the user state. If this is undesirable, simply use a data type like data Box a = Box a and the state type Box YourStateType to add a level of indirection.
Instances
MonadState s m => MonadState s (ParsecT s' u m) | |
MonadReader r m => MonadReader r (ParsecT s u m) | |
MonadError e m => MonadError e (ParsecT s u m) | |
Defined in Text.Parsec.Prim MethodsthrowError :: e -> ParsecT s u m a Source catchError :: ParsecT s u m a -> (e -> ParsecT s u m a) -> ParsecT s u m a Source | |
MonadTrans (ParsecT s u) | |
Defined in Text.Parsec.Prim | |
Monad (ParsecT s u m) | |
Functor (ParsecT s u m) | |
MonadFail (ParsecT s u m) | Since: parsec-3.1.12.0 |
Defined in Text.Parsec.Prim | |
Applicative (ParsecT s u m) | |
Defined in Text.Parsec.Prim Methodspure :: a -> ParsecT s u m a Source (<*>) :: ParsecT s u m (a -> b) -> ParsecT s u m a -> ParsecT s u m b Source liftA2 :: (a -> b -> c) -> ParsecT s u m a -> ParsecT s u m b -> ParsecT s u m c Source (*>) :: ParsecT s u m a -> ParsecT s u m b -> ParsecT s u m b Source (<*) :: ParsecT s u m a -> ParsecT s u m b -> ParsecT s u m a Source | |
MonadIO m => MonadIO (ParsecT s u m) | |
Defined in Text.Parsec.Prim | |
Alternative (ParsecT s u m) | |
MonadPlus (ParsecT s u m) | |
MonadCont m => MonadCont (ParsecT s u m) | |
Semigroup a => Semigroup (ParsecT s u m a) |
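The Semigroup instance for ParsecT appends the results of several parsers, for example (many $ char 'a') <> (many $ char 'b'). This parses a string like "aabbb" and returns "aabbb". Compare with (many $ char 'a') >> (many $ char 'b') and (many $ char 'a') *> (many $ char 'b'), which return only "bbb" for the same input. Since: parsec-3.1.12 |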
(Monoid a, Semigroup (ParsecT s u m a)) => Monoid (ParsecT s u m a) |
The Monoid instance for ParsecT serves the same purpose as the Semigroup instance. Since: parsec-3.1.12 |
type Parsec s u = ParsecT s u Identity Source
Arguments
:: Stream s Identity t | |
=> (t -> String) | Token pretty-printing function. |
-> (t -> SourcePos) | Computes the position of a token. |
-> (t -> Maybe a) | Matching function for the token to parse. |
-> Parsec s u a |
The parser token showTok posFromTok testTok accepts a token t with result x when the function testTok t returns Just x. The source position of t should be returned by posFromTok t, and the token can be shown using showTok t.
This combinator is expressed in terms of tokenPrim. It is used to accept user-defined token streams. For example, suppose that we have a stream of basic tokens tupled with source positions. We can then define a parser that accepts single tokens as:
mytoken x
  = token showTok posFromTok testTok
  where
    showTok (pos,t)     = show t
    posFromTok (pos,t)  = pos
    testTok (pos,t)     = if x == t then Just t else Nothing
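A self-contained variant of the mytoken example might look like the sketch below. The PosTok type, the positioned helper (which places every token on line 1 at successive columns) and the letIn parser are assumptions made for illustration; a real tokenizer would record actual positions.

```haskell
import Text.Parsec
import Text.Parsec.Pos (newPos)

-- Tokens paired with their source positions.
type PosTok = (SourcePos, String)

mytoken :: String -> Parsec [PosTok] u String
mytoken x = token showTok posFromTok testTok
  where
    showTok (_, t)      = show t
    posFromTok (pos, _) = pos
    testTok (_, t)      = if x == t then Just t else Nothing

-- Hypothetical helper: fake positions on line 1, columns 1, 2, 3, ...
positioned :: [String] -> [PosTok]
positioned ws = [ (newPos "demo" 1 c, w) | (c, w) <- zip [1 ..] ws ]

-- Accept exactly the token "let" followed by the token "in".
letIn :: Parsec [PosTok] () [String]
letIn = do
  a <- mytoken "let"
  b <- mytoken "in"
  return [a, b]

main :: IO ()
main = do
  print (parse letIn "demo" (positioned ["let", "in"]))
  print (parse letIn "demo" (positioned ["let", "x"]))
```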
tokens :: (Stream s m t, Eq t) => ([t] -> String) -> (SourcePos -> [t] -> SourcePos) -> [t] -> ParsecT s u m [t] Source
runParserT :: Stream s m t => ParsecT s u m a -> u -> SourceName -> s -> m (Either ParseError a) Source
The most general way to run a parser. runParserT p state filePath input runs parser p on the input list of tokens input, obtained from source filePath, with the initial user state state. The filePath is only used in error messages and may be the empty string. Returns a computation in the underlying monad m that returns either a ParseError (Left) or a value of type a (Right).
runParser :: Stream s Identity t => Parsec s u a -> u -> SourceName -> s -> Either ParseError a Source
The most general way to run a parser over the Identity monad. runParser p state filePath input runs parser p on the input list of tokens input, obtained from source filePath, with the initial user state state. The filePath is only used in error messages and may be the empty string. Returns either a ParseError (Left) or a value of type a (Right).
parseFromFile p fname
  = do{ input <- readFile fname
      ; return (runParser p () fname input)
      }
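To compare the two runners, here is a minimal sketch with one parser that is polymorphic in the underlying monad, run purely with runParser and in IO with runParserT. The names countDigits and announce are hypothetical; liftIO is available because ParsecT has a MonadIO instance.

```haskell
import Text.Parsec
import Control.Monad.IO.Class (liftIO)

-- Count leading digits in the user state; works over any underlying monad.
countDigits :: Monad m => ParsecT String Int m Int
countDigits = many (digit >> modifyState (+ 1)) >> getState

-- The same parser specialised to IO, reporting its result as a side effect.
announce :: ParsecT String Int IO Int
announce = do
  n <- countDigits
  liftIO (putStrLn ("digits seen: " ++ show n))
  return n

main :: IO ()
main = do
  -- Pure run over Identity; the initial user state is 0.
  print (runParser countDigits 0 "<pure>" "123abc")
  -- Run in IO; the result is delivered inside the IO monad.
  r <- runParserT announce 0 "<io>" "2024"
  print r
```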
parse :: Stream s Identity t => Parsec s () a -> SourceName -> s -> Either ParseError a Source
parse p filePath input runs a parser p over Identity without user state. The filePath is only used in error messages and may be the empty string. Returns either a ParseError (Left) or a value of type a (Right).
main    = case (parse numbers "" "11, 2, 43") of
           Left err  -> print err
           Right xs  -> print (sum xs)

numbers = commaSep integer
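The example above relies on commaSep and integer, which are not defined in this module. A runnable sketch with simple local stand-ins for both (assumptions, not the lexer-generated versions) could be:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Stand-in for a lexeme-style integer parser: digits, then trailing spaces.
integer :: Parser Integer
integer = read <$> many1 digit <* spaces

-- Stand-in for commaSep: items separated by commas and optional spaces.
commaSep :: Parser a -> Parser [a]
commaSep p = p `sepBy` (char ',' <* spaces)

numbers :: Parser [Integer]
numbers = commaSep integer

-- Parse the whole input and add up the numbers.
sumNumbers :: String -> Either ParseError Integer
sumNumbers input = sum <$> parse (numbers <* eof) "" input

main :: IO ()
main = case sumNumbers "11, 2, 43" of
  Left err -> print err
  Right n  -> print n
```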
parseTest :: (Stream s Identity t, Show a) => Parsec s () a -> s -> IO () Source
The expression parseTest p input
applies a parser p
against input input
and prints the result to stdout. Used for testing parsers.
getPosition :: Monad m => ParsecT s u m SourcePos Source
Returns the current source position. See also SourcePos
.
getInput :: Monad m => ParsecT s u m s Source
Returns the current input.
getState :: Monad m => ParsecT s u m u Source
Returns the current user state.
putState :: Monad m => u -> ParsecT s u m () Source
putState st sets the user state to st.
modifyState :: Monad m => (u -> u) -> ParsecT s u m () Source
modifyState f applies function f to the user state. Suppose that we want to count identifiers in a source; we could use the user state as:
expr  = do{ x <- identifier
          ; modifyState (+1)
          ; return (Id x)
          }
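The counting idea can be sketched as a complete program. The type synonym CountingParser and the parser identifiers are hypothetical names; the user state is an Int initialised to 0 by runParser.

```haskell
import Text.Parsec

-- Parser over String whose user state counts the identifiers seen so far.
type CountingParser a = Parsec String Int a

identifier :: CountingParser String
identifier = do
  x <- many1 letter
  modifyState (+ 1)   -- one more identifier
  spaces
  return x

-- Parse all identifiers and return them together with the final count.
identifiers :: CountingParser ([String], Int)
identifiers = do
  spaces
  xs <- many identifier
  eof
  n <- getState
  return (xs, n)

main :: IO ()
main = print (runParser identifiers 0 "" "foo bar baz")
```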
Combinators
(<|>) :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a infixr 1 Source
This combinator implements choice. The parser p <|> q first applies p. If it succeeds, the value of p is returned. If p fails without consuming any input, parser q is tried. This combinator is defined equal to the mplus member of the MonadPlus class and the (<|>) member of Alternative.
The parser is called predictive since q is only tried when parser p didn't consume any input (i.e. the look ahead is 1). This non-backtracking behaviour allows for both an efficient implementation of the parser combinators and the generation of good error messages.
(<?>) :: ParsecT s u m a -> String -> ParsecT s u m a infix 0 Source
The parser p <?> msg behaves as parser p, but whenever the parser p fails without consuming any input, it replaces expect error messages with the expect error message msg.
This is normally used at the end of a set of alternatives where we want to return an error message in terms of a higher level construct rather than returning all possible characters. For example, if the expr parser from the try example would fail, the error message is: '...: expecting expression'. Without the (<?>) combinator, the message would be like '...: expecting "let" or letter', which is less friendly.
label :: ParsecT s u m a -> String -> ParsecT s u m a Source
A synonym for <?>
, but as a function instead of an operator.
labels :: ParsecT s u m a -> [String] -> ParsecT s u m a Source
try :: ParsecT s u m a -> ParsecT s u m a Source
The parser try p
behaves like parser p
, except that it pretends that it hasn't consumed any input when an error occurs.
This combinator is used whenever arbitrary look ahead is needed. Since it pretends that it hasn't consumed any input when p
fails, the (<|>
) combinator will try its second alternative even when the first parser failed while consuming input.
The try
combinator can for example be used to distinguish identifiers and reserved words. Both reserved words and identifiers are a sequence of letters. Whenever we expect a certain reserved word where we can also expect an identifier we have to use the try
combinator. Suppose we write:
expr        = letExpr <|> identifier <?> "expression"

letExpr     = do{ string "let"; ... }
identifier  = many1 letter
If the user writes "lexical", the parser fails with: unexpected 'x', expecting 't' in "let". Indeed, since the (<|>) combinator only tries alternatives when the first alternative hasn't consumed input, the identifier parser is never tried (because the prefix "le" of the string "let" parser is already consumed). The right behaviour can be obtained by adding the try combinator:
expr        = letExpr <|> identifier <?> "expression"

letExpr     = do{ try (string "let"); ... }
identifier  = many1 letter
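The difference can be checked with a reduced version of the example, in which the elided body of letExpr is dropped and only string "let" remains. The names exprNoTry and exprTry are hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

identifier :: Parser String
identifier = many1 letter

-- Without try: on "lexical", string "let" consumes "le" before failing,
-- so (<|>) never tries identifier.
exprNoTry :: Parser String
exprNoTry = (string "let" <|> identifier) <?> "expression"

-- With try: the failed attempt counts as not having consumed input,
-- so identifier gets its turn.
exprTry :: Parser String
exprTry = (try (string "let") <|> identifier) <?> "expression"

main :: IO ()
main = do
  print (parse exprNoTry "" "lexical")   -- a Left carrying a parse error
  print (parse exprTry   "" "lexical")   -- Right "lexical"
```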
unexpected :: Stream s m t => String -> ParsecT s u m a Source
The parser unexpected msg
always fails with an unexpected error message msg
without consuming any input.
The parsers fail
, (<?>
) and unexpected
are the three parsers used to generate error messages. Of these, only (<?>
) is commonly used. For an example of the use of unexpected
, see the definition of notFollowedBy
.
choice :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m a Source
choice ps
tries to apply the parsers in the list ps
in order, until one of them succeeds. Returns the value of the succeeding parser.
many :: ParsecT s u m a -> ParsecT s u m [a] Source
many p
applies the parser p
zero or more times. Returns a list of the returned values of p
.
identifier  = do{ c  <- letter
                ; cs <- many (alphaNum <|> char '_')
                ; return (c:cs)
                }
many1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m [a] Source
many1 p
applies the parser p
one or more times. Returns a list of the returned values of p
.
word = many1 letter
skipMany :: ParsecT s u m a -> ParsecT s u m () Source
skipMany p
applies the parser p
zero or more times, skipping its result.
spaces = skipMany space
skipMany1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m () Source
skipMany1 p
applies the parser p
one or more times, skipping its result.
count :: Stream s m t => Int -> ParsecT s u m a -> ParsecT s u m [a] Source
count n p parses n occurrences of p. If n is smaller than or equal to zero, the parser is equivalent to return []. Returns a list of n values returned by p.
between :: Stream s m t => ParsecT s u m open -> ParsecT s u m close -> ParsecT s u m a -> ParsecT s u m a Source
between open close p
parses open
, followed by p
and close
. Returns the value returned by p
.
braces = between (symbol "{") (symbol "}")
option :: Stream s m t => a -> ParsecT s u m a -> ParsecT s u m a Source
option x p
tries to apply parser p
. If p
fails without consuming input, it returns the value x
, otherwise the value returned by p
.
priority  = option 0 (do{ d <- digit
                        ; return (digitToInt d)
                        })
optionMaybe :: Stream s m t => ParsecT s u m a -> ParsecT s u m (Maybe a) Source
optionMaybe p tries to apply parser p. If p fails without consuming input, it returns Nothing, otherwise it returns Just the value returned by p.
optional :: Stream s m t => ParsecT s u m a -> ParsecT s u m () Source
optional p
tries to apply parser p
. It will parse p
or nothing. It only fails if p
fails after consuming input. It discards the result of p
.
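A small sketch contrasting option, optionMaybe and optional. The parsers sign and statement are hypothetical examples; priority follows the example given for option.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Data.Char (digitToInt)

-- option: fall back to 0 when no digit is present.
priority :: Parser Int
priority = option 0 (digitToInt <$> digit)

-- optionMaybe: report presence or absence of a sign explicitly.
sign :: Parser (Maybe Char)
sign = optionMaybe (oneOf "+-")

-- optional: accept a trailing semicolon if there is one, discarding it.
statement :: Parser String
statement = many1 letter <* optional (char ';')

main :: IO ()
main = do
  print (parse priority  "" "7")
  print (parse priority  "" "x")
  print (parse sign      "" "-1")
  print (parse statement "" "done;")
```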
sepBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source
sepBy p sep
parses zero or more occurrences of p
, separated by sep
. Returns a list of values returned by p
.
commaSep p = p `sepBy` (symbol ",")
sepBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source
sepBy1 p sep
parses one or more occurrences of p
, separated by sep
. Returns a list of values returned by p
.
endBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source
endBy p sep
parses zero or more occurrences of p
, separated and ended by sep
. Returns a list of values returned by p
.
cStatements = cStatement `endBy` semi
endBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source
endBy1 p sep
parses one or more occurrences of p
, separated and ended by sep
. Returns a list of values returned by p
.
sepEndBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source
sepEndBy p sep parses zero or more occurrences of p, separated and optionally ended by sep, i.e. Haskell-style statements. Returns a list of values returned by p.
haskellStatements = haskellStatement `sepEndBy` semi
sepEndBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source
sepEndBy1 p sep
parses one or more occurrences of p
, separated and optionally ended by sep
. Returns a list of values returned by p
.
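The separator combinators differ mainly in how they treat a trailing separator. The sketch below makes that visible; item, semi and the run helper (which also requires the whole input to be consumed) are hypothetical names.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

item :: Parser String
item = many1 letter

semi :: Parser Char
semi = char ';'

-- Run a list parser over the whole input; Nothing on any parse error.
run :: Parser [String] -> String -> Maybe [String]
run p = either (const Nothing) Just . parse (p <* eof) ""

main :: IO ()
main = do
  print (run (item `sepBy`    semi) "a;b")    -- separator only between items
  print (run (item `sepBy`    semi) "a;b;")   -- trailing separator is rejected
  print (run (item `endBy`    semi) "a;b;")   -- every item must be terminated
  print (run (item `endBy`    semi) "a;b")    -- missing terminator is rejected
  print (run (item `sepEndBy` semi) "a;b")    -- trailing separator optional
  print (run (item `sepEndBy` semi) "a;b;")
```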
chainl :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> a -> ParsecT s u m a Source
chainl p op x
parses zero or more occurrences of p
, separated by op
. Returns a value obtained by a left associative application of all functions returned by op
to the values returned by p
. If there are zero occurrences of p
, the value x
is returned.
chainl1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a Source
chainl1 p op parses one or more occurrences of p, separated by op. Returns a value obtained by a left associative application of all functions returned by op to the values returned by p. This parser can for example be used to eliminate left recursion which typically occurs in expression grammars.
expr    = term   `chainl1` addop
term    = factor `chainl1` mulop
factor  = parens expr <|> integer

mulop   =   do{ symbol "*"; return (*)   }
        <|> do{ symbol "/"; return (div) }

addop   =   do{ symbol "+"; return (+) }
        <|> do{ symbol "-"; return (-) }
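The grammar above uses symbol, parens and integer, which this module does not define. With simple local stand-ins for those helpers (assumptions, not the lexer-generated versions), the example can be run as a small calculator; eval is a hypothetical wrapper.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Local stand-ins for lexer helpers: each skips trailing white space.
symbol :: String -> Parser String
symbol s = string s <* spaces

parens :: Parser a -> Parser a
parens = between (symbol "(") (symbol ")")

integer :: Parser Integer
integer = read <$> many1 digit <* spaces

expr, term, factor :: Parser Integer
expr   = term   `chainl1` addop
term   = factor `chainl1` mulop
factor = parens expr <|> integer

mulop, addop :: Parser (Integer -> Integer -> Integer)
mulop = (symbol "*" >> return (*)) <|> (symbol "/" >> return div)
addop = (symbol "+" >> return (+)) <|> (symbol "-" >> return (-))

-- Evaluate a complete arithmetic expression.
eval :: String -> Either ParseError Integer
eval = parse (spaces *> expr <* eof) ""

main :: IO ()
main = mapM_ (print . eval) ["1 + 2 * 3", "10 - 4 - 3", "(1 + 2) * 3"]
```

Because chainl1 associates to the left, "10 - 4 - 3" is evaluated as (10 - 4) - 3.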
chainr :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> a -> ParsecT s u m a Source
chainr p op x parses zero or more occurrences of p, separated by op. Returns a value obtained by a right associative application of all functions returned by op to the values returned by p. If there are no occurrences of p, the value x is returned.
chainr1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a Source
chainr1 p op parses one or more occurrences of p, separated by op. Returns a value obtained by a right associative application of all functions returned by op to the values returned by p.
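Exponentiation is a typical right associative operator. The sketch below parses the same input with chainr1 and, for comparison, with chainl1; number, powOp, powerR and powerL are hypothetical names.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

number :: Parser Integer
number = read <$> many1 digit

powOp :: Parser (Integer -> Integer -> Integer)
powOp = char '^' >> return (^)

-- Right associative: 2^3^2 is read as 2^(3^2).
powerR :: Parser Integer
powerR = number `chainr1` powOp

-- Left associative, for comparison: 2^3^2 is read as (2^3)^2.
powerL :: Parser Integer
powerL = number `chainl1` powOp

main :: IO ()
main = do
  print (parse powerR "" "2^3^2")
  print (parse powerL "" "2^3^2")
```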
eof :: (Stream s m t, Show t) => ParsecT s u m () Source
This parser only succeeds at the end of the input. This is not a primitive parser but it is defined using notFollowedBy
.
eof = notFollowedBy anyToken <?> "end of input"
notFollowedBy :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m () Source
notFollowedBy p
only succeeds when parser p
fails. This parser does not consume any input. This parser can be used to implement the 'longest match' rule. For example, when recognizing keywords (for example let
), we want to make sure that a keyword is not followed by a legal identifier character, in which case the keyword is actually an identifier (for example lets
). We can program this behaviour as follows:
keywordLet = try (do{ string "let" ; notFollowedBy alphaNum })
NOTE: Currently, notFollowedBy exhibits surprising behaviour when applied to a parser p that doesn't consume any input; specifically
- notFollowedBy . notFollowedBy is not equivalent to lookAhead, and
- notFollowedBy eof never fails.
See haskell/parsec#8 for more details.
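The 'longest match' rule can be tried out with a parser that distinguishes the keyword from identifiers that merely start with it. The parser word and its "<keyword let>" marker are hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- "let" counts as a keyword only if no identifier character follows it.
keywordLet :: Parser ()
keywordLet = try (string "let" >> notFollowedBy alphaNum)

-- Either the keyword (reported with a marker string) or an ordinary word.
word :: Parser String
word = (keywordLet >> return "<keyword let>") <|> many1 alphaNum

main :: IO ()
main = do
  print (parse word "" "let x")
  print (parse word "" "lets")
```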
manyTill :: Stream s m t => ParsecT s u m a -> ParsecT s u m end -> ParsecT s u m [a] Source
manyTill p end
applies parser p
zero or more times until parser end
succeeds. Returns the list of values returned by p
. This parser can be used to scan comments:
simpleComment = do{ string "<!--" ; manyTill anyChar (try (string "-->")) }
Note the overlapping parsers anyChar
and string "-->"
, and therefore the use of the try
combinator.
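A runnable form of the comment scanner, as a minimal sketch:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Returns the text between "<!--" and the first "-->".
simpleComment :: Parser String
simpleComment = do
  _ <- string "<!--"
  manyTill anyChar (try (string "-->"))

main :: IO ()
main = do
  print (parse simpleComment "" "<!-- a - b -->rest")
  print (parse simpleComment "" "<!-- never closed")
```

Without try, a lone '-' inside the comment would make string "-->" fail after consuming input, and the whole parser would fail instead of falling back to anyChar.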
lookAhead :: Stream s m t => ParsecT s u m a -> ParsecT s u m a Source
lookAhead p
parses p
without consuming any input.
If p
fails and consumes some input, so does lookAhead
. Combine with try
if this is undesirable.
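Both points can be illustrated with a short sketch. The parser names and the succeeds helper are hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Peek at the leading digits, then parse the whole word from the same spot.
peekThenParse :: Parser (String, String)
peekThenParse = do
  peeked <- lookAhead (many1 digit)
  whole  <- many1 alphaNum
  return (peeked, whole)

-- On "ac", string "ab" fails after consuming 'a'; lookAhead passes the
-- consumption on, so the second alternative is never tried.
peekNoTry :: Parser String
peekNoTry = lookAhead (string "ab") <|> string "ac"

-- With try, the failed look-ahead consumes nothing.
peekTry :: Parser String
peekTry = lookAhead (try (string "ab")) <|> string "ac"

succeeds :: Parser a -> String -> Bool
succeeds p = either (const False) (const True) . parse p ""

main :: IO ()
main = do
  print (parse peekThenParse "" "123abc")
  print (succeeds peekNoTry "ac", succeeds peekTry "ac")
```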
anyToken :: (Stream s m t, Show t) => ParsecT s u m t Source
The parser anyToken
accepts any kind of token. It is for example used to implement eof
. Returns the accepted token.
Character Parsing
module Text.Parsec.Char
Error messages
data ParseError Source
The abstract data type ParseError
represents parse errors. It provides the source position (SourcePos
) of the error and a list of error messages (Message
). A ParseError
can be returned by the function parse
. ParseError
is an instance of the Show
and Eq
classes.
Instances
Eq ParseError | |
Defined in Text.Parsec.Error | |
Show ParseError | |
Defined in Text.Parsec.Error MethodsshowsPrec :: Int -> ParseError -> ShowS Source show :: ParseError -> String Source showList :: [ParseError] -> ShowS Source |
errorPos :: ParseError -> SourcePos Source
Extracts the source position from the parse error.
Position
The abstract data type SourcePos represents source positions. It contains the name of the source (i.e. file name), a line number and a column number. SourcePos is an instance of the Show, Eq and Ord classes.
Instances
Eq SourcePos | |
Data SourcePos | |
Defined in Text.Parsec.Pos Methodsgfoldl :: (forall d b. Data d => c (d -> b) -> d -> c b) -> (forall g. g -> c g) -> SourcePos -> c SourcePos Source gunfold :: (forall b r. Data b => c (b -> r) -> c r) -> (forall r. r -> c r) -> Constr -> c SourcePos Source toConstr :: SourcePos -> Constr Source dataTypeOf :: SourcePos -> DataType Source dataCast1 :: Typeable t => (forall d. Data d => c (t d)) -> Maybe (c SourcePos) Source dataCast2 :: Typeable t => (forall d e. (Data d, Data e) => c (t d e)) -> Maybe (c SourcePos) Source gmapT :: (forall b. Data b => b -> b) -> SourcePos -> SourcePos Source gmapQl :: (r -> r' -> r) -> r -> (forall d. Data d => d -> r') -> SourcePos -> r Source gmapQr :: forall r r'. (r' -> r -> r) -> r -> (forall d. Data d => d -> r') -> SourcePos -> r Source gmapQ :: (forall d. Data d => d -> u) -> SourcePos -> [u] Source gmapQi :: Int -> (forall d. Data d => d -> u) -> SourcePos -> u Source gmapM :: Monad m => (forall d. Data d => d -> m d) -> SourcePos -> m SourcePos Source gmapMp :: MonadPlus m => (forall d. Data d => d -> m d) -> SourcePos -> m SourcePos Source gmapMo :: MonadPlus m => (forall d. Data d => d -> m d) -> SourcePos -> m SourcePos Source | |
Ord SourcePos | |
Defined in Text.Parsec.Pos | |
Show SourcePos | |
type SourceName = String Source
sourceName :: SourcePos -> SourceName Source
Extracts the name of the source from a source position.
sourceLine :: SourcePos -> Line Source
Extracts the line number from a source position.
sourceColumn :: SourcePos -> Column Source
Extracts the column number from a source position.
incSourceLine :: SourcePos -> Line -> SourcePos Source
Increments the line number of a source position.
incSourceColumn :: SourcePos -> Column -> SourcePos Source
Increments the column number of a source position.
setSourceLine :: SourcePos -> Line -> SourcePos Source
Set the line number of a source position.
setSourceColumn :: SourcePos -> Column -> SourcePos Source
Set the column number of a source position.
setSourceName :: SourcePos -> SourceName -> SourcePos Source
Set the name of the source.
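As an illustration of errorPos together with the SourcePos accessors, the sketch below reports where a parse failed. The function errorLocation and the file name "input.txt" are hypothetical; the grammar (digits followed by a semicolon) is chosen only to provoke an error.

```haskell
import Text.Parsec

-- Source name, line and column of the parse error, if the parse fails.
errorLocation :: String -> Maybe (String, Int, Int)
errorLocation input =
  case parse (many1 digit >> char ';') "input.txt" input of
    Left err ->
      let pos = errorPos err
      in  Just (sourceName pos, sourceLine pos, sourceColumn pos)
    Right _ -> Nothing

main :: IO ()
main = do
  print (errorLocation "12x")
  print (errorLocation "12;")
```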
Debugging
As a more comprehensive alternative for debugging Parsec parsers, there's also the parsec-free package.
parserTrace :: (Show t, Stream s m t) => String -> ParsecT s u m () Source
parserTrace label is an impure function, implemented with Debug.Trace, that prints the remaining parser state to the console at the time it is invoked. It is intended to be used for debugging parsers by inspecting their intermediate states.
*> parseTest (oneOf "aeiou" >> parserTrace "label") "atest"
label: "test"
...
Since: parsec-3.1.12.0
parserTraced :: (Stream s m t, Show t) => String -> ParsecT s u m b -> ParsecT s u m b Source
parserTraced label p is an impure function, implemented with Debug.Trace, that prints the remaining parser state to the console at the time it is invoked. It then continues to apply parser p, and if p fails it indicates that the label has been backtracked. It is intended to be used for debugging parsers by inspecting their intermediate states.
*> parseTest (oneOf "aeiou" >> parserTraced "label" (oneOf "nope")) "atest"
label: "test"
label backtracked
parse error at (line 1, column 2):
...
Since: parsec-3.1.12.0
Low-level operations
manyAccum :: (a -> [a] -> [a]) -> ParsecT s u m a -> ParsecT s u m [a] Source
Arguments
:: Stream s m t | |
=> (t -> String) | Token pretty-printing function. |
-> (SourcePos -> t -> s -> SourcePos) | Next position calculating function. |
-> (t -> Maybe a) | Matching function for the token to parse. |
-> ParsecT s u m a |
The parser tokenPrim showTok nextPos testTok accepts a token t with result x when the function testTok t returns Just x. The token can be shown using showTok t. The position of the next token should be returned when nextPos is called with the current source position pos, the current token t and the rest of the tokens toks, nextPos pos t toks.
This is the most primitive combinator for accepting tokens. For example, the char parser could be implemented as:
char c
  = tokenPrim showChar nextPos testChar
  where
    showChar x        = "'" ++ [x] ++ "'"
    testChar x        = if x == c then Just x else Nothing
    nextPos pos x xs  = updatePosChar pos x
tokenPrimEx :: Stream s m t => (t -> String) -> (SourcePos -> t -> s -> SourcePos) -> Maybe (SourcePos -> t -> s -> u -> u) -> (t -> Maybe a) -> ParsecT s u m a Source
runPT :: Stream s m t => ParsecT s u m a -> u -> SourceName -> s -> m (Either ParseError a) Source
unknownError :: State s u -> ParseError Source
sysUnExpectError :: String -> SourcePos -> Reply s u a Source
mergeErrorReply :: ParseError -> Reply s u a -> Reply s u a Source
getParserState :: Monad m => ParsecT s u m (State s u) Source
Returns the full parser state as a State
record.
setParserState :: Monad m => State s u -> ParsecT s u m (State s u) Source
setParserState st sets the full parser state to st.
updateParserState :: (State s u -> State s u) -> ParsecT s u m (State s u) Source
updateParserState f
applies function f
to the parser state.
class Monad m => Stream s m t | s -> t where Source
An instance of Stream has stream type s, underlying monad m and token type t determined by the stream.
Some rough guidelines for a "correct" instance of Stream:
- unfoldM uncons gives the [t] corresponding to the stream
- A Stream instance is responsible for maintaining the "position within the stream" in the stream state s. This is trivial unless you are using the monad in a non-trivial way.
Instances
Monad m => Stream ByteString m Char | |
Defined in Text.Parsec.Prim Methodsuncons :: ByteString -> m (Maybe (Char, ByteString)) Source | |
Monad m => Stream ByteString m Char | |
Defined in Text.Parsec.Prim Methodsuncons :: ByteString -> m (Maybe (Char, ByteString)) Source | |
Monad m => Stream Text m Char | |
Monad m => Stream Text m Char | |
Monad m => Stream [tok] m tok | |
Defined in Text.Parsec.Prim |
runParsecT :: Monad m => ParsecT s u m a -> State s u -> m (Consumed (m (Reply s u a))) Source
Low-level unpacking of the ParsecT type. To run your parser, please look to runPT, runP, runParserT, runParser and other such functions.
mkPT :: Monad m => (State s u -> m (Consumed (m (Reply s u a)))) -> ParsecT s u m a Source
Low-level creation of the ParsecT type. You really shouldn't have to do this.
runP :: Stream s Identity t => Parsec s u a -> u -> SourceName -> s -> Either ParseError a Source
Instances
Constructors
Ok a !(State s u) ParseError | |
Error ParseError |
Instances
Constructors
State | |
Fields
|
setPosition :: Monad m => SourcePos -> ParsecT s u m () Source
setPosition pos
sets the current source position to pos
.
setInput :: Monad m => s -> ParsecT s u m () Source
setInput input continues parsing with input. The getInput and setInput functions can for example be used to deal with #include files.
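One way this could work is sketched below with an in-memory table standing in for the file system. The '@name' directive syntax, the files table and all parser names are assumptions made for the example. Note that source positions are not adjusted after the splice, so error locations inside included text would be misleading unless setPosition is used as well.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical in-memory "file system": include name -> contents.
files :: [(String, String)]
files = [("defs", "b c")]

-- On "@name", splice the named contents in front of the remaining input.
include :: Parser ()
include = do
  _    <- char '@'
  name <- many1 letter
  rest <- getInput
  case lookup name files of
    Nothing   -> unexpected ("unknown include " ++ name)
    Just body -> setInput (body ++ rest)

word :: Parser String
word = many1 letter <* spaces

-- A sequence of words, with include directives expanded in place.
wordsWithIncludes :: Parser [String]
wordsWithIncludes =
  concat <$> many ((include >> return []) <|> ((: []) <$> word)) <* eof

main :: IO ()
main = print (parse wordsWithIncludes "" "a @defs d")
```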
Other stuff
setState :: Monad m => u -> ParsecT s u m () Source
An alias for putState for backwards compatibility.
updateState :: Monad m => (u -> u) -> ParsecT s u m () Source
An alias for modifyState for backwards compatibility.
parsecMap :: (a -> b) -> ParsecT s u m a -> ParsecT s u m b Source
parserReturn :: a -> ParsecT s u m a Source
parserBind :: ParsecT s u m a -> (a -> ParsecT s u m b) -> ParsecT s u m b Source
parserFail :: String -> ParsecT s u m a Source
parserZero :: ParsecT s u m a Source
parserZero
always fails without consuming any input. parserZero
is defined equal to the mzero
member of the MonadPlus
class and to the empty
member of the Alternative
class.
parserPlus :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a Source
© The University of Glasgow and others
Licensed under a BSD-style license (see top of the page).
https://downloads.haskell.org/~ghc/8.10.2/docs/html/libraries/parsec-3.1.14.0/Text-Parsec.html