La Vita è Bella

2011-01-22

Fix epubs

The wonderful epub reader for Android, Aldiko, released a major update recently. Besides some great new features/improvements, it also broke something: all Chinese characters in Pro Git Chinese version can't be displayed. But it didn't simply broke Chinese epubs at all. My other Chinese epubs are just fine.

I contacted Aldiko for help, they just replied claiming that I need to embed Chinese font into epub for them to display, and Adobe Digital Edition have the same problem. I tried my epubs with Adobe Digital Edition and it's really the same. But after I looked into my good and broken epubs (they are simply zip archives with metadata, html, css and pictures inside), I can't find embedded Chinese font (in the good ones, of course) at all.

Instead, I found something else interesting: all my good epubs have an "xml:lang" parameter set to "zh-CN" in every html tag, and the broken one didn't.

So I add that parameter to every html file in progit-zh.epub, and repack the file, now it works (in both Aldiko and Adobe Digital Edition)!

To simplify this work, I wrote a python script to do that automatically. It will unpack the epub file into a temp directory, add "xml:lang" parameter to every html file if didn't present, and then pack the epub back again. It will also output the replaced lines into stderr, like:

./progit_split_000.html:
<html xmlns="http://www.w3.org/1999/xhtml">
->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh-CN">

Get it from my script repo on GitHub, and change the LANG constant if needed.

If you have other ideas about it, please let me know.

21:28:08 by fishy - Permanent Link

May the Force be with you. RAmen