[svlug] Some pretty serious parsing

Karen Shaeffer shaeffer at neuralscape.com
Sat Nov 14 07:41:18 PST 2015


On Sat, Nov 14, 2015 at 08:42:09AM -0500, Steve Litt wrote:
> And *NOW* for my question. What's the best way of parsing my Stylz
> file? I've made most of Stylz pretty parseable, but now, with images
> and links (for want of better words I don't want to wax poetic about),
> I must nest things. Square brackets inside of square brackets inside of
> square brackets.
...
> So what do you all think? What's a good way to parse a fairly complex
> non-XML grammar to convert it to Xhtml?
> 

Hi Steve,
Parsers are not difficult to write, but you do need to be formal about it to
assure correctness for the entire input space of token sequences. The first
thing to do is to characterize your language. I'd read up about the Chomsky
Hierarchy, which is a containment hierarchy of classes of formal languages.

type-3 grammars are regular grammars that can be parsed with regular
expressions in the form of a finite state automaton. If the Stylz language is a
type-3 grammar, then this is the easiest to parse.

type-2 grammars are context-free grammars. The Stylz language might be a context
free grammar. The question is about whether a token appearing in a sequence of
tokens (statement) affects the syntactical rules of later tokens within that
statement. If not, then your language is context-free and rather easy to parse.
Just do a google search for context free parsing.

type-1 grammars are context-sensitive grammars. Such languages can be processed
with linear bounded automatons. Much more complicated than context-free
grammars, depending on the extent and number of contextual idoms possible.

type-0 grammars are unrestricted grammars that are recursively enumerable
languages.

Note recursive languages are characterized somewhere between type-0 and type-1
grammars.

Your goal should be to define the Stylz language to be a type-1 or type-2
grammar, because parsing is easy then. Once you have a formal definition of
your Stylz language, characterizing it as likely a type-2 context free
grammar, then it will be easy to find an appropriate parsing tool to do what
you want. Context free parsers are also quite easy to write in the language
of your choice. Context-sensitive grammars are more complex and a lot more
work, but they are quite common in the real world. E.g., the C++ language is
a context sensitive language.

Good luck with your project.
Karen
-- 
Karen Shaeffer                 Be aware: If you see an obstacle in your path,
Neuralscape Services           that obstacle is your path.        Zen proverb



More information about the svlug mailing list