[svlug] Some pretty serious parsing

Karen Shaeffer shaeffer at neuralscape.com
Sat Nov 14 07:47:39 PST 2015


On Sat, Nov 14, 2015 at 03:41:18PM +0000, Karen Shaeffer wrote:
> On Sat, Nov 14, 2015 at 08:42:09AM -0500, Steve Litt wrote:
> > And *NOW* for my question. What's the best way of parsing my Stylz
> > file? I've made most of Stylz pretty parseable, but now, with images
> > and links (for want of better words I don't want to wax poetic about),
> > I must nest things. Square brackets inside of square brackets inside of
> > square brackets.
> ...
> > So what do you all think? What's a good way to parse a fairly complex
> > non-XML grammar to convert it to Xhtml?
> > 
> 
> Hi Steve,
> Parsers are not difficult to write, but you do need to be formal about it to
> assure correctness for the entire input space of token sequences. The first
> thing to do is to characterize your language. I'd read up about the Chomsky
> Hierarchy, which is a containment hierarchy of classes of formal languages.
> 
> type-3 grammars are regular grammars that can be parsed with regular
> expressions in the form of a finite state automaton. If the Stylz language is a
> type-3 grammar, then this is the easiest to parse.
> 
> type-2 grammars are context-free grammars. The Stylz language might be a context
> free grammar. The question is about whether a token appearing in a sequence of
> tokens (statement) affects the syntactical rules of later tokens within that
> statement. If not, then your language is context-free and rather easy to parse.
> Just do a google search for context free parsing.
> 
> type-1 grammars are context-sensitive grammars. Such languages can be processed
> with linear bounded automatons. Much more complicated than context-free
> grammars, depending on the extent and number of contextual idoms possible.
> 
> type-0 grammars are unrestricted grammars that are recursively enumerable
> languages.
> 
> Note recursive languages are characterized somewhere between type-0 and type-1
> grammars.
> 
> Your goal should be to define the Stylz language to be a type-1 or type-2

Err, I meant to say type-3 or type-2 grammar...  (smiles ;)

> grammar, because parsing is easy then. Once you have a formal definition of
> your Stylz language, characterizing it as likely a type-2 context free
> grammar, then it will be easy to find an appropriate parsing tool to do what
> you want. Context free parsers are also quite easy to write in the language
> of your choice. Context-sensitive grammars are more complex and a lot more
> work, but they are quite common in the real world. E.g., the C++ language is
> a context sensitive language.
> 
> Good luck with your project.
> Karen
> -- 
> Karen Shaeffer                 Be aware: If you see an obstacle in your path,
> Neuralscape Services           that obstacle is your path.        Zen proverb
> 
> _______________________________________________
> svlug mailing list
> svlug at lists.svlug.org
> http://lists.svlug.org/lists/listinfo/svlug
--- end quoted text ---

-- 
Karen Shaeffer                 Be aware: If you see an obstacle in your path,
Neuralscape Services           that obstacle is your path.        Zen proverb



More information about the svlug mailing list