[svlug] Some pretty serious parsing

Akkana Peck akkana at shallowsky.com
Sun Nov 15 08:29:23 PST 2015

> Ivan Sergio Borgonovo <mail at webthatworks.it> wrote:
> > What are ebooks vendors standards?

Steve Litt writes:
> The big publishers just throw their old manuscripts into a meat
> grinder: Let the buyer beware. Some small publishers and
> self-publishers do an excellent job, while others are even more
> atrocious than the big publishers.

So true. Epub books vary tremendously and there doesn't seem to be a
good standard. I've been updating my epub Python module recently to
handle and correct cover images (like all those Project Gutenberg
epubs that use a picture of a Palm Pilot as a cover image, with no
text to tell you what the book is). Finding the cover image is
really a matter of guesswork in most epubs: does it have "cover" in
the filename somewhere and does the extension imply it's an image?
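
As a rough sketch of that filename guesswork (this is not the actual
module code; the function names and the list of image extensions are
assumptions made up for illustration), using only the standard
zipfile module, since an epub is a zip archive:

```python
import posixpath
import zipfile

# Assumed set of extensions that "imply it's an image"; adjust to taste.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".svg"}


def guess_cover_candidates(names):
    """Return the archive member names that look like cover images.

    Purely a filename heuristic: the base name must contain "cover"
    and the extension must be a known image extension.
    """
    candidates = []
    for name in names:
        # Zip member names always use "/" as the separator.
        base = posixpath.basename(name).lower()
        ext = posixpath.splitext(base)[1]
        if "cover" in base and ext in IMAGE_EXTENSIONS:
            candidates.append(name)
    return candidates


def guess_cover_in_epub(epub_path):
    """Open an epub (a zip archive) and list likely cover images."""
    with zipfile.ZipFile(epub_path) as zf:
        return guess_cover_candidates(zf.namelist())
```

This returns every plausible candidate rather than picking one, since
a heuristic like this can easily match several files or none at all.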

> As far as ePub, I
> need to set my sights higher than what the publishers are doing,
> because I swear, their formatting mistakes cut your reading speed in
> half.

(Applause.) One inducement to use something like Stylz would be the
possibility of generating good clean epub (and html and mobi and
whatever other format you need to generate). It sounds like a
terrific project, and one that's needed. Wish I knew more about
parsers and formal grammars, but it looks like Karen was able to
point you in the right direction there.

