What do I want?
I have some eBooks in CHM or SRM format, and I want to read them on my cellphone. The phone supports neither CHM nor SRM, so I chose the PalmDoc (.pdb) format instead.
I can convert a set of HTML files into a single PDB file, but:
1) Some CHM books have no contents page of their own; they rely on the CHM's built-in table of contents. Without a contents page, browsing the resulting PDB file is not a pleasant experience.
2) SRM books can be exported as CHM files, but none of them has a contents page either.
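The CHM contents mentioned above are stored in the CHM as an HTML Help sitemap, the .hhc file: nested `<UL>`/`<LI>` lists whose entries are `<OBJECT type="text/sitemap">` elements, each carrying a `<param name="Name">` (the entry title) and a `<param name="Local">` (the page it points to). As an illustration of that structure, here is a minimal Python 3 sketch that walks it and records each entry with its nesting depth. `HHCOutline` and `SAMPLE_HHC` are hypothetical names, and the sample is a hand-written, trimmed-down contents file, not one taken from a real book.

```python
from html.parser import HTMLParser

# Hypothetical, hand-written sample of a .hhc contents file.
SAMPLE_HHC = """<HTML><BODY>
<OBJECT type="text/site properties">
<param name="ImageType" value="Folder">
</OBJECT>
<UL>
<LI><OBJECT type="text/sitemap">
    <param name="Name" value="Chapter 1">
    <param name="Local" value="ch1.htm">
    </OBJECT>
<UL>
<LI><OBJECT type="text/sitemap">
    <param name="Name" value="Section 1.1">
    <param name="Local" value="ch1/s1.htm">
    </OBJECT>
</UL>
</UL>
</BODY></HTML>"""


class HHCOutline(HTMLParser):
    """Collect (depth, title, url) for every sitemap entry of a .hhc file."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # current <UL> nesting level
        self.entries = []   # collected (depth, title, url) tuples
        self.title = ""
        self.url = ""

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values
        if tag == "ul":
            self.depth += 1
        elif tag == "object":
            self.title = ""
            self.url = ""
        elif tag == "param":
            params = dict(attrs)
            name = (params.get("name") or "").lower()
            value = params.get("value") or ""
            if name == "name":
                self.title = value
            elif name == "local":
                self.url = value

    def handle_endtag(self, tag):
        if tag == "ul":
            self.depth -= 1
        elif tag == "object" and self.title and self.url:
            # entries without a title or page (e.g. site properties) are skipped
            self.entries.append((self.depth, self.title, self.url))


parser = HHCOutline()
parser.feed(SAMPLE_HHC)
parser.close()
for depth, title, url in parser.entries:
    print("  " * depth + "%s [%s]" % (title, url))
```

The printed outline has the same "title [url]" form as the default `on_link_found` of `SiteMapParser` in the recipe.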
Hence this simple recipe: it converts the CHM contents file (foo.hhc) into an ordinary HTML contents page.
Two or three years ago I used to do this kind of thing in Perl. Perl's regular expressions are very powerful; the only problem is that a few days later the script becomes unreadable. :-(
Python is different from Perl. This recipe is simple, isn't it?
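```python
#!/usr/bin/env python
# Convert a CHM contents file (foo.hhc) into a plain HTML contents page.
# Python 2 only: the sgmllib module was removed in Python 3.
import cgi
import sys
from sgmllib import SGMLParser, SGMLParseError


class SiteMapParser(SGMLParser):
    """Walk an HTML Help sitemap (.hhc) and report its sections and links."""

    def reset(self):
        SGMLParser.reset(self)
        # state kept while walking the sitemap
        self.level = 0
        self.link_url = ""
        self.link_title = ""

    def start_ul(self, attrs):
        # every <UL> opens one nesting level of the contents tree
        self.on_section_starts()

    def end_ul(self):
        self.on_section_ends()

    def start_object(self, attrs):
        # every <OBJECT> describes one entry; forget the previous one
        self.link_title = ""
        self.link_url = ""

    def do_param(self, attrs):
        # <PARAM> has no end tag, hence do_param rather than start_param.
        # <param name="Name" value="..."> carries the title and
        # <param name="Local" value="..."> the page; sgmllib lowercases
        # attribute names but not their values.
        params = dict(attrs)
        name = params.get("name", "").lower()
        if name == "name":
            self.link_title = params.get("value", "")
        elif name == "local":
            self.link_url = params.get("value", "")

    def end_object(self):
        self.on_link_found(self.link_title, self.link_url)

    def on_section_starts(self):
        self.level += 1

    def on_section_ends(self):
        self.level -= 1

    def on_link_found(self, title, url):
        # default: print an indented outline; override this in subclasses
        if title and url:
            print "  " * self.level + "%s [%s]" % (title, url)


class ContentParser(SiteMapParser):
    """Convert CHM contents (foo.hhc) into a normal HTML contents page."""

    def on_section_starts(self):
        SiteMapParser.on_section_starts(self)
        print "<ul>"

    def on_section_ends(self):
        SiteMapParser.on_section_ends(self)
        print "</ul>"

    def on_link_found(self, title, url):
        # skip entries lacking a title or a page, such as the
        # <OBJECT type="text/site properties"> block at the top;
        # escape them, since sgmllib decodes entities in attribute values
        if title and url:
            print '<li><a href="%s">%s</a></li>' % (cgi.escape(url, True),
                                                    cgi.escape(title))


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print "Usage: %s foo.hhc > contents.html" % sys.argv[0]
        sys.exit(1)
    print "<html><head></head><body>"
    trans = ContentParser()
    fh = open(sys.argv[1], "r")
    try:
        trans.feed(fh.read())
        trans.close()
    except SGMLParseError as e:
        # keep what was converted before the malformed markup
        sys.stderr.write("warning: %s\n" % e)
    fh.close()
    print "</body></html>"

# vim:expandtab softtabstop=4
```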
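The script above needs Python 2, because `sgmllib` was removed in Python 3; `html.parser.HTMLParser` is the standard-library replacement. Below is a minimal Python 3 sketch of the same `ContentParser` idea, not the original recipe. `HHCToHTML`, `hhc_to_html`, `main`, the script name and the optional encoding argument are all hypothetical, and the UTF-8 default is an assumption: pass another codec name (for example `gbk`) if your .hhc file is not UTF-8.

```python
import html
import sys
from html.parser import HTMLParser


class HHCToHTML(HTMLParser):
    """Turn an HTML Help contents file (.hhc) into a nested <ul> contents page."""

    def __init__(self):
        super().__init__()
        self.lines = ["<html><head></head><body>"]   # output, one tag per line
        self.title = ""
        self.url = ""

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self.lines.append("<ul>")
        elif tag == "object":
            self.title = ""
            self.url = ""
        elif tag == "param":
            # attribute values arrive with entity references already decoded
            params = dict(attrs)
            name = (params.get("name") or "").lower()
            value = params.get("value") or ""
            if name == "name":
                self.title = value
            elif name == "local":
                self.url = value

    def handle_endtag(self, tag):
        if tag == "ul":
            self.lines.append("</ul>")
        elif tag == "object" and self.title and self.url:
            # re-escape, since the decoded title or URL may contain & or <
            self.lines.append('<li><a href="%s">%s</a></li>'
                              % (html.escape(self.url), html.escape(self.title)))


def hhc_to_html(text):
    """Return the HTML contents page for the given .hhc source text."""
    parser = HHCToHTML()
    parser.feed(text)
    parser.close()
    return "\n".join(parser.lines + ["</body></html>"])


def main(argv):
    # hypothetical CLI: python3 hhc2html3.py foo.hhc [encoding] > contents.html
    encoding = argv[2] if len(argv) > 2 else "utf-8"   # assumed default
    with open(argv[1], encoding=encoding, errors="replace") as fh:
        print(hhc_to_html(fh.read()))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

For example, `python3 hhc2html3.py foo.hhc gbk > contents.html`. Building the page as a list of lines inside `hhc_to_html`, instead of printing from the callbacks, keeps the conversion usable from other code.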