[svlug] Some pretty serious parsing

Karen Shaeffer shaeffer at neuralscape.com
Sat Nov 14 13:12:02 PST 2015


On Sat, Nov 14, 2015 at 12:31:16PM -0500, Steve Litt wrote:
> On Sat, 14 Nov 2015 15:41:18 +0000
> Karen Shaeffer <shaeffer at neuralscape.com> wrote:
> 
>  
> > Hi Steve,
> > Parsers are not difficult to write, but you do need to be formal
> > about it to assure correctness for the entire input space of token
> > sequences. The first thing to do is to characterize your language.
> > I'd read up about the Chomsky Hierarchy, which is a containment
> > hierarchy of classes of formal languages.
> > 
> > type-3 grammars are regular grammars that can be parsed with regular
> > expressions in the form of a finite state automaton. If the Stylz
> > language is a type-3 grammar, then this is the easiest to parse.
> > 
> > type-2 grammars are context-free grammars. The Stylz language might
> > be a context free grammar. The question is about whether a token
> > appearing in a sequence of tokens (statement) affects the syntactical
> > rules of later tokens within that statement. If not, then your
> > language is context-free and rather easy to parse. Just do a google
> > search for context free parsing.
> > 
> > type-1 grammars are context-sensitive grammars. Such languages can be
> > processed with linear bounded automatons. Much more complicated than
> > context-free grammars, depending on the extent and number of
> > contextual idoms possible.
> 
> Thanks Karen!
> 
> I knew nothing about anything you wrote, so of course without your
> suggestion I'd have never gone down that road. I'll first learn about
> the Chomsky Hierarchy, probably relying heavily on making myself a
> terminology glossary, because all this stuff is brand new to me: I was
> an office automation programmer (applications). These are the times I
> wish I'd studied Comp-Sci in college.
> 
> But anyway, now at least I know the right direction to follow, and what
> I need to learn, and after that, which way to influence Stylz (toward
> type3 or type2). And given that the project demands I prioritize fast
> and easy authoring over parsability, once I know what's type3 and
> type2, I could always write an informal preprocessor that turns the
> ideal authoring language into a type3 or type2, and when people crash
> the corner cases of my informal preparser, I'll just say "sorry, bug".
> In the long run, if a lot of people use Stylz, Version 2 will be
> written by someone much more knowledgeable than I anyway.
> 
> Thanks for pointing me in the right direction!
> 
> SteveT

Hi Steve,
Its a great project for you. Be sure to focus on formal grammar specifications
as well, because they formally define the rules that enable the implementation
of a parser.

For a context-free grammar, there are parser generators, such as GNU bison.

If your formal grammar is context-sensitive, then one simplifying strategy is
to break it into two or more context-free parsers, each of which could be
auto-generated for you by GNU bison for example. The idea is analogous to how
HTML embeds javascript and php and other languages inline. And so when your top
level parser finds a specific set of tokens, everything in there is parsed by a
different context-free parser.

enjoy,
Karen
-- 
Karen Shaeffer                 Be aware: If you see an obstacle in your path,
Neuralscape Services           that obstacle is your path.        Zen proverb



More information about the svlug mailing list