[svlug] Some pretty serious parsing
shaeffer at neuralscape.com
Sat Nov 14 13:12:02 PST 2015
On Sat, Nov 14, 2015 at 12:31:16PM -0500, Steve Litt wrote:
> On Sat, 14 Nov 2015 15:41:18 +0000
> Karen Shaeffer <shaeffer at neuralscape.com> wrote:
> > Hi Steve,
> > Parsers are not difficult to write, but you do need to be formal
> > about it to assure correctness for the entire input space of token
> > sequences. The first thing to do is to characterize your language.
> > I'd read up about the Chomsky Hierarchy, which is a containment
> > hierarchy of classes of formal languages.
> > type-3 grammars are regular grammars that can be parsed with regular
> > expressions in the form of a finite state automaton. If the Stylz
> > language is a type-3 grammar, then this is the easiest to parse.
> > type-2 grammars are context-free grammars. The Stylz language might
> > be a context free grammar. The question is about whether a token
> > appearing in a sequence of tokens (statement) affects the syntactical
> > rules of later tokens within that statement. If not, then your
> > language is context-free and rather easy to parse. Just do a google
> > search for context free parsing.
> > type-1 grammars are context-sensitive grammars. Such languages can be
> > processed with linear bounded automatons. Much more complicated than
> > context-free grammars, depending on the extent and number of
> > contextual idoms possible.
> Thanks Karen!
> I knew nothing about anything you wrote, so of course without your
> suggestion I'd have never gone down that road. I'll first learn about
> the Chomsky Hierarchy, probably relying heavily on making myself a
> terminology glossary, because all this stuff is brand new to me: I was
> an office automation programmer (applications). These are the times I
> wish I'd studied Comp-Sci in college.
> But anyway, now at least I know the right direction to follow, and what
> I need to learn, and after that, which way to influence Stylz (toward
> type3 or type2). And given that the project demands I prioritize fast
> and easy authoring over parsability, once I know what's type3 and
> type2, I could always write an informal preprocessor that turns the
> ideal authoring language into a type3 or type2, and when people crash
> the corner cases of my informal preparser, I'll just say "sorry, bug".
> In the long run, if a lot of people use Stylz, Version 2 will be
> written by someone much more knowledgeable than I anyway.
> Thanks for pointing me in the right direction!
Its a great project for you. Be sure to focus on formal grammar specifications
as well, because they formally define the rules that enable the implementation
of a parser.
For a context-free grammar, there are parser generators, such as GNU bison.
If your formal grammar is context-sensitive, then one simplifying strategy is
to break it into two or more context-free parsers, each of which could be
auto-generated for you by GNU bison for example. The idea is analogous to how
level parser finds a specific set of tokens, everything in there is parsed by a
different context-free parser.
Karen Shaeffer Be aware: If you see an obstacle in your path,
Neuralscape Services that obstacle is your path. Zen proverb
More information about the svlug