[svlug] Some pretty serious parsing

Steve Litt slitt at troubleshooters.com
Sat Nov 14 05:42:09 PST 2015

Hi all,

I need a fast, easy book authoring system to write books destined for
both PDF/paper and ePub. It does not currently exist in the free
software world.

I've used LyX to write books (to PDF/paper) since 2001, and would
continue to use it if it could write to both PDF and ePub. But it
can't: The (X)HTML that LyX outputs is pidgin HTML, rendering pidgin
ePubs with serious read-time deficiencies and an inability to pass
eBook vendors' standards checks.

I could use Sigil and employ Python plus an XML parser to convert
its ePub native format to PDF. This would be an ideal solution, but
with its menagerie of dependencies, I don't trust Sigil to be around
as time goes by. I don't want my business depending on Sigil.

I could use Markdown or Multimarkdown or Asciidoc or one of the three
million wiki languages out there, but I insist on using nothing but
semantic styles and replacing them with appearances only in the very
last step. This isn't how these wiki languages and lightweight markup
languages work.

I could use Docbook XML, but I don't want to deal with writing (and
matching when something goes wrong) all the end tags. I want something
that writes easy and writes fast. That's not Docbook XML.

I don't want to use XSLT. My IQ isn't high enough.

The preceding six paragraphs were a preface: A preface to show how,
before deciding to create my own bookwriting system, I explored a heck
of a lot of options and found them inadequate for my needs.

So I'm creating a book native format called Stylz. It's very simple:
tags are short and easily typed, and it should be very fast for a touch
typist. Think Plain TeX, but unlike Plain TeX, Stylz is based on
styles, and only styles, not on appearance based stuff like TeX (with
macros thrown in). I'm about 3/4 of the way through defining the Stylz
native format right now, but stuff like links and images are causing

The Stylz cheatsheet is at
http://troubleshooters.com/projects/stylz/cheatsheet.htm . It evolves
every few days: It's still in a state of flux.

The way Stylz will work is this: The author writes the book in Stylz
using his favorite editor. A Stylz parser converts the Stylz file to
Xhtml. From there, Python programs with XML parsers can convert to PDF
(probably via LaTeX), ePub, or any new book format that might come
along later, because styles are preserved till the bitter end.
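
To make the second half of that pipeline concrete, here is a minimal
sketch of a Python program reading the Xhtml and emitting LaTeX, one
style at a time. It assumes (hypothetically) that the Stylz parser
records each style in the element's class attribute; the style names
"chapter", "para" and "warning", the LaTeX templates, and the function
name xhtml_to_latex are all invented for illustration. It does not
escape LaTeX special characters and does not handle styled elements
nested inside other styled elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical style names mapped to LaTeX templates. Appearance is
# decided only here, in the very last step.
LATEX_FOR_STYLE = {
    "chapter": "\\chapter{%s}",
    "para": "%s",
    "warning": "\\begin{quote}%s\\end{quote}",
}


def xhtml_to_latex(xhtml):
    """Convert style-tagged Xhtml to LaTeX, one block per styled element."""
    root = ET.fromstring(xhtml)
    blocks = []
    for element in root.iter():
        style = element.get("class")
        if style in LATEX_FOR_STYLE:
            # Gather all text inside the element, including child text.
            text = "".join(element.itertext()).strip()
            blocks.append(LATEX_FOR_STYLE[style] % text)
    return "\n\n".join(blocks)


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        '<h1 class="chapter">Intro</h1>'
        '<p class="para">Hello</p>'
        "</body></html>"
    )
    print(xhtml_to_latex(sample))
```

An ePub writer would walk the same tree with a different table, which
is the point of keeping styles rather than appearances in the Xhtml.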

And *NOW* for my question. What's the best way of parsing my Stylz
file? I've made most of Stylz pretty parseable, but now, with images
and links (for want of better words I don't want to wax poetic about),
I must nest things. Square brackets inside of square brackets inside of
square brackets.

I spoze I could maintain a level counter and stack, but that seems
barbaric. I'm not very good at recursion, so I won't do that unless I
truly know from the start that it's the right way to do it. I read a
little bit about lex and yacc, but even a Hello World with one of them
is a huge learning step that I wouldn't want to do unless I knew from
the start that they were the right way to go. And everyone says they're
dinosaurs so I wonder what replaced them.
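
For reference, the level-counter-and-stack approach can be quite
small. The following is a minimal sketch in Python, not a Stylz
parser: it only assumes that groups open with "[" and close with "]",
and it ignores tag names, escaping, and everything else on the
cheatsheet. The function name parse_brackets is invented for
illustration. Each bracketed group becomes a Python list, and the
stack depth is the nesting level.

```python
def parse_brackets(text):
    """Parse nested [ ... ] groups into nested lists.

    Plain text runs become strings; each bracketed group becomes a list.
    """
    root = []
    stack = [root]  # stack[-1] is the group currently being filled
    buffer = []     # characters of the current plain-text run

    def flush():
        # Move any pending text into the current group.
        if buffer:
            stack[-1].append("".join(buffer))
            buffer.clear()

    for position, char in enumerate(text):
        if char == "[":
            flush()
            group = []
            stack[-1].append(group)  # attach the new group to its parent
            stack.append(group)      # and descend one level
        elif char == "]":
            flush()
            if len(stack) == 1:
                raise ValueError(f"unmatched ']' at offset {position}")
            stack.pop()              # climb back up one level
        else:
            buffer.append(char)

    flush()
    if len(stack) != 1:
        raise ValueError("missing ']' at end of input")
    return root


if __name__ == "__main__":
    print(parse_brackets("a [b [c] d] e"))
```

The explicit stack does the same job recursion would do, so no
recursive function is needed, and unbalanced brackets are reported
rather than silently accepted.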

So what do you all think? What's a good way to parse a fairly complex
non-XML grammar to convert it to Xhtml?



Steve Litt 
November 2015 featured book: Troubleshooting Techniques
     of the Successful Technologist
