[svlug] ePub processing: was Some pretty serious parsing

Akkana Peck akkana at shallowsky.com
Sun Nov 15 13:25:52 PST 2015

Steve Litt writes:
> In save_changes(), I don't see any code to guarantee that file mimetype
> is not compressed. I was under the impression this file must remain
> uncompressed, even if the rest of the .zip is compressed.

Chalk that up to my ignorance of the zip format. I didn't know
individual files in a zip could be compressed separately; I thought
compression applied only to the archive as a whole. Compression in
Python's zipfile module doesn't seem to be well documented, but it
looks like zipfile.ZIP_STORED means uncompressed while
zipfile.ZIP_DEFLATED means deflate-compressed. So I'll add that.
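Something like the following is probably what that fix looks like.
It's a sketch, not the actual save_changes() code: write_epub() and
its src_dir argument (a directory holding the unpacked book) are
hypothetical. As I understand the EPUB container rules, mimetype
should also be the first entry in the archive and contain exactly
"application/epub+zip", so the sketch writes it first with
ZIP_STORED and deflates everything else.

```python
import os
import zipfile

def write_epub(epub_path, src_dir):
    """Hypothetical helper: zip the unpacked EPUB in src_dir into epub_path.

    mimetype is written first and stored uncompressed (ZIP_STORED);
    every other file is compressed with ZIP_DEFLATED.
    """
    with zipfile.ZipFile(epub_path, "w") as zf:
        # mimetype must come first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # deterministic ordering of the archive
            for name in sorted(files):
                full = os.path.join(root, name)
                # Don't zip the output file into itself if it lives in src_dir.
                if os.path.abspath(full) == os.path.abspath(epub_path):
                    continue
                # Zip archive names always use forward slashes.
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```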

> Also, your code in save_changes() quits after finding one .opf file.
> I'm not sure, but I think it's an error condition to have multiple .opf
> files. But of course, it happens, and I think I've had various magical
> ePub writers do it.  You might want to consider a loop at the top of
> save_changes() to find all .opf files, and if more than one is found,
> list them and error out, or at least ask the person which is the right
> one and inform him that two is an error condition (I can't think of a
> reason for multiple .opf files).

Thanks! That's useful information. I had wondered whether I might
ever encounter multiple opf files, but hadn't found any useful
guidelines on what that would mean. I'll make it an error.
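A minimal sketch of that check, following Steve's suggestion of
scanning for every .opf file rather than stopping at the first one.
find_opf() is a hypothetical name, and this version simply refuses
to guess rather than asking the user which one is right.

```python
import zipfile

def find_opf(epub_path):
    """Hypothetical helper: return the archive name of the single .opf file.

    Raises ValueError if there is no .opf file, or if there is more than
    one (listing all of them so the user can see which files are involved).
    """
    with zipfile.ZipFile(epub_path) as zf:
        opfs = [name for name in zf.namelist()
                if name.lower().endswith(".opf")]
    if not opfs:
        raise ValueError("No .opf file found in %s" % epub_path)
    if len(opfs) > 1:
        raise ValueError("Multiple .opf files in %s (this is an error):\n  %s"
                         % (epub_path, "\n  ".join(opfs)))
    return opfs[0]
```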

As for the covers, if I decide to generate a cover from scratch,
using SVG is a fun idea -- mostly because it's a good excuse to
play with python SVG libraries. The current code maintains the old
cover but adds text to it; but really, for those Gutenberg covers,
I can't say I see much point in keeping the Palm Pilot image.
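For a from-scratch cover, even without an SVG library you can get
something usable by building the SVG with the standard library's
xml.etree. This is just a sketch: make_cover_svg(), the dimensions,
colors and font sizes are all made-up placeholders, and a real
version would need to wrap long titles.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def make_cover_svg(title, author, width=600, height=800):
    """Hypothetical helper: return a plain SVG cover as a string.

    Draws a solid background with the title and author centered on it.
    """
    ET.register_namespace("", SVG_NS)  # emit a default xmlns, not ns0:
    svg = ET.Element("{%s}svg" % SVG_NS, {
        "width": str(width), "height": str(height),
        "viewBox": "0 0 %d %d" % (width, height),
    })
    # Full-size background rectangle (placeholder color).
    ET.SubElement(svg, "{%s}rect" % SVG_NS, {
        "x": "0", "y": "0", "width": str(width), "height": str(height),
        "fill": "#f4ecd8",
    })
    # Title at 40% of the height, author at 55%, both centered.
    for text, y, size in ((title, height * 2 // 5, 48),
                          (author, height * 11 // 20, 32)):
        t = ET.SubElement(svg, "{%s}text" % SVG_NS, {
            "x": str(width // 2), "y": str(y),
            "font-size": str(size), "font-family": "serif",
            "text-anchor": "middle",
        })
        t.text = text
    return ET.tostring(svg, encoding="unicode")
```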

