Friday, April 8, 2005

Convert CHM contents to normal HTML contents

What do I want?

I have some eBooks in CHM or SRM format, and I want to copy them to my cellphone. The phone supports neither format, so I chose PalmDoc (.pdb) instead.

Yes, I can convert a pack of HTML files into one PDB file. But:
1) Some CHM books don't have a contents page. They rely on the CHM's built-in table of contents. Without a contents page, browsing the resulting PDB file is not a happy experience.
2) SRM books can be exported as CHM files, but none of them has a contents page either.

Then came this simple recipe.

I remember that two or three years ago I used to do this kind of thing in Perl. Perl's regex support is very powerful. The only problem is that after a few days the script becomes unreadable. :-(
Python is different from Perl. This recipe is so simple, isn't it?

#!/usr/bin/env python
# Python 2 script: sgmllib and the print statement are not available in Python 3.

import sys
from sgmllib import SGMLParser


class SiteMapParser(SGMLParser):
    """Walk a CHM contents file (foo.hhc) and report its sections and links."""

    def reset(self):
        SGMLParser.reset(self)
        # state for the <object> entry currently being parsed
        self.level = 0
        self.link_url = ""
        self.link_title = ""

    def start_ul(self, attrs):
        self.on_section_starts()

    def end_ul(self):
        self.on_section_ends()

    def start_param(self, attrs):
        # <param name="Name" value="..."> carries the title,
        # <param name="Local" value="..."> carries the target URL.
        # Look the attributes up by name so their order does not matter.
        params = dict(attrs)
        if params.get('name') == 'Name':
            self.link_title = params.get('value', "")
        elif params.get('name') == 'Local':
            self.link_url = params.get('value', "")

    def start_object(self, attrs):
        # a new entry begins: forget the previous title and URL
        self.link_title = ""
        self.link_url = ""

    def end_object(self):
        self.on_link_found(self.link_title, self.link_url)

    def on_section_starts(self):
        self.level = self.level + 1

    def on_section_ends(self):
        self.level = self.level - 1

    def on_link_found(self, title, url):
        # you can override this
        if title and url:
            print " " * self.level + "%s [%s]" % (title, url)


class ContentParser(SiteMapParser):
    """ A simple class to convert CHM contents (foo.hhc) to a normal HTML contents """

    def reset(self):
        # SGMLParser.__init__ calls reset(), so the header is printed first
        print "<HTML><HEAD></HEAD><BODY>"
        SiteMapParser.reset(self)

    def close(self):
        # flush the parser, then close the tags opened in reset()
        SiteMapParser.close(self)
        print "</BODY></HTML>"

    def on_section_starts(self):
        print "<ul>"

    def on_section_ends(self):
        print "</ul>"

    def on_link_found(self, title, url):
        # skip entries without a title or target (e.g. the site properties object)
        if title and url:
            print '<li><a href="%s">%s</a></li>' % (url, title)


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print "Usage: %s foo.hhc" % sys.argv[0]
        sys.exit(1)

    trans = ContentParser()
    fh = open(sys.argv[1], "r")
    try:
        trans.feed(fh.read())
        trans.close()
    finally:
        fh.close()
# vim:expandtab softtabstop=4
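
The recipe relies on sgmllib, which was removed in Python 3. For readers on Python 3, here is a minimal sketch of the same idea on top of the standard html.parser module. It is not the original script: the names ContentsBuilder, hhc_to_html and SAMPLE_HHC are made up for illustration, the sample contents file is hand-written rather than taken from a real book, and unlike the recipe it matches the param names case-insensitively and HTML-escapes titles and URLs.

```python
from html import escape
from html.parser import HTMLParser


class SiteMapParser(HTMLParser):
    """Sketch of a .hhc walker; subclasses override the on_* hooks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.params = None  # <param> values of the <object> being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self.on_section_starts()
        elif tag == "object":
            self.params = {}
        elif tag == "param" and self.params is not None:
            # Look attributes up by name so their order does not matter.
            attr = dict(attrs)
            if attr.get("name") and attr.get("value") is not None:
                # Assumption: treat "Name"/"Local" case-insensitively.
                self.params[attr["name"].lower()] = attr["value"]

    def handle_endtag(self, tag):
        if tag == "ul":
            self.on_section_ends()
        elif tag == "object" and self.params is not None:
            title = self.params.get("name")
            url = self.params.get("local")
            self.params = None
            if title and url:  # skips the "site properties" object
                self.on_link_found(title, url)

    def on_section_starts(self):
        pass

    def on_section_ends(self):
        pass

    def on_link_found(self, title, url):
        pass


class ContentsBuilder(SiteMapParser):
    """Collects the contents page as a list of HTML lines."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def on_section_starts(self):
        self.lines.append("<ul>")

    def on_section_ends(self):
        self.lines.append("</ul>")

    def on_link_found(self, title, url):
        self.lines.append('<li><a href="%s">%s</a></li>'
                          % (escape(url, quote=True), escape(title)))

    def result(self):
        return "\n".join(["<html><head></head><body>"]
                         + self.lines + ["</body></html>"])


def hhc_to_html(hhc_text):
    """Convert the text of a .hhc file into a plain HTML contents page."""
    builder = ContentsBuilder()
    builder.feed(hhc_text)
    builder.close()
    return builder.result()


# Hand-written sample in the usual .hhc sitemap layout (illustrative only).
SAMPLE_HHC = """<html><body>
<object type="text/site properties">
<param name="ImageType" value="Folder">
</object>
<ul>
<li><object type="text/sitemap">
<param name="Name" value="Chapter 1">
<param name="Local" value="ch1.html">
</object>
<ul>
<li><object type="text/sitemap">
<param name="Name" value="Tips &amp; Tricks">
<param name="Local" value="ch1/tips.html">
</object>
</ul>
</ul>
</body></html>"""
```

Calling print(hhc_to_html(SAMPLE_HHC)) writes a nested list with one link per entry. For a real book, read foo.hhc into a string first and pass that string in; real contents files are often in a legacy encoding, so the right encoding argument for open() depends on the book.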

