Bamanzi's New Kaleidoscope (巴蛮子的新万花筒): chm

Showing posts labeled "chm".

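Tuesday, June 9, 2009

Compiling the pychm module on Windows

Environment: Windows XP SP2, Python 2.5, Visual C++ Toolkit 2003, Platform SDK (2005.04)

1. Build chmlib
The source ships with a Visual Studio project file; building it directly produces the static library libchm.lib.
Rename libchm.lib to chm.lib.

2. Build pychm
1) Run python-build-env.bat (explained in my previous blog post). Its contents are:

@echo off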

set VC_TOOLKIT_DIR=d:\Program Files\Microsoft Visual C++ Toolkit 2003\
call "%VC_TOOLKIT_DIR%\vcvars32.bat"

set DISTUTILS_USE_SDK=1

set MSSDK=d:\Program Files\Microsoft Visual Studio 8\VC\PlatformSDK\
rem set MSSDK=e:\msys\1.0.11\mingw\

set include=%MSSDK%include;%include%
set lib=%MSSDK%lib;%lib%

echo "Now you can continue with 'python setup.py build/install'"

2) Add the chmlib directories to the include and lib environment variables:

set include=e:\python\pychm-0.8.4\chmlib\src;%include%
set lib=e:\python\pychm-0.8.4\chmlib\src\release;%lib%

3) VC has neither inttypes.h nor strings.h, and I could not find any other header that defines types such as uint8_t or functions such as strcasecmp, so I changed the following two lines of extra.c

#include <inttypes.h>
#include <strings.h>

to:

#if !defined(_MSC_VER)
#include <inttypes.h>
#include <strings.h>
#else
    /* MSVC's C compiler doesn't support `inline' */
#define inline
typedef unsigned char   uint8_t;
typedef unsigned short uint16_t;
typedef unsigned   uint32_t;
typedef unsigned long long   uint64_t;
#define strcasecmp stricmp
#define strncasecmp strnicmp
#endif

4) Modify the MSVCCompiler.initialize() function in distutils\msvccompiler.py. Change

self.ldflags_shared = ['/DLL', '/nologo', '/INCREMENTAL:NO']

to

self.ldflags_shared = ['/DLL', '/nologo', '/INCREMENTAL:NO', '/NODEFAULTLIB:libc']

Otherwise some functions in libc and msvcrt conflict with each other at link time.

5) python setup.py install

Done.
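
The /NODEFAULTLIB:libc flag could probably also be passed per extension instead of patching distutils itself, since extra_link_args is a standard argument of distutils' Extension. The following is only a sketch: the helper msvc_link_args is hypothetical, and the module name and source path in the commented usage are assumptions, not taken from pychm's real setup.py. I have not verified that this resolves the conflict in the same way.

```python
import sys


def msvc_link_args(platform=sys.platform):
    """Return extra linker flags for the extension build.

    /NODEFAULTLIB is an MSVC linker option, so other platforms get no flags.
    """
    if platform == "win32":
        return ["/NODEFAULTLIB:libc"]
    return []


# Hypothetical use inside setup.py (names below are assumed, not pychm's own):
#
#   from distutils.core import setup, Extension
#   setup(name="pychm",
#         ext_modules=[Extension("chm.extra", ["chm/extra.c"],
#                                libraries=["chm"],
#                                extra_link_args=msvc_link_args())])

print(msvc_link_args("win32"))
```

Keeping the flag in setup.py leaves the Python installation untouched, at the cost of carrying a local change to the package instead.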

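Tuesday, January 29, 2008

CHM scripts of The X-Files and Friends

I registered an account on box.net and uploaded the CHM files of The X-Files and Friends scripts there.

http://public.box.net/bamanzi

I originally registered this account to upload the clippings saved in ScrapBook: on the ScrapBook website I came across ScrapBox.net, an extension to that extension.

The web pages are a bit slow, but uploads and downloads are not too bad.

From now on I will put non-technical or larger files here. Technical material stays at http://bamanzi.inlsd.org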

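Thursday, August 2, 2007

Made a CHM mirror of xulplanet

I have been studying Mozilla XUL lately and found myself constantly visiting the Mozilla Developer Center, the Mozilla Wiki and XULPlanet to look up documentation, so today I simply mirrored them with Teleport Pro. Since lots of small files waste disk space and a mirror has no index, I decided to go one step further and turn them into CHM files.

XULPlanet was the first one finished:
http://bamanzi.inlsd.org/xul/xulplanet.chm

I wrote a small Python script (html2hhk.py) that collects all XUL element attributes/methods and XPCOM components/interfaces and converts them into the CHM index. (What the script actually does is read each HTML file's title and keywords meta tag and use them as keywords; with small changes it could also output a devhelp keyword list.)

This CHM file still has some problems:
1. There is no table of contents yet. At least the major categories should be listed, but there seems to be no easy way to do this.
2. Every content page has a navigation sidebar on the left, which is useless inside a CHM; it needs to be stripped in bulk with sed or something similar.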
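
The idea behind html2hhk.py (title and keywords meta tag become index keywords) can be sketched roughly as follows. This is not the original script: it is a Python 3 illustration, and the class TitleKeywordParser, the function hhk_entries and the sample page are all made up for the example.

```python
from html import escape
from html.parser import HTMLParser


class TitleKeywordParser(HTMLParser):
    """Collect the <title> text and the keywords meta tag of one HTML page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def hhk_entries(page_url, html_text):
    """Return one sitemap <LI> entry for the title and for each keyword."""
    parser = TitleKeywordParser()
    parser.feed(html_text)
    parser.close()
    names = [parser.title.strip()] + parser.keywords
    entries = []
    for name in names:
        if name:
            entries.append(
                '<LI> <OBJECT type="text/sitemap">\n'
                '<param name="Name" value="%s">\n'
                '<param name="Local" value="%s">\n'
                "</OBJECT>" % (escape(name, quote=True),
                               escape(page_url, quote=True)))
    return entries


# Made-up sample page; a real run would loop over the mirrored files.
sample = ('<html><head><title>textbox</title>'
          '<meta name="keywords" content="maxlength, readonly"></head></html>')
for entry in hhk_entries("ref/textbox.html", sample):
    print(entry)
```

The printed entries would still need to be wrapped in the usual HTML header and a surrounding UL list to form a complete .hhk file.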

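Tuesday, October 25, 2005

gnochm: show me the icons

Last year I grabbed all the The X Files transcripts offered on the Inside the X website and packed them into a CHM file.

Recently I have been rewatching the main story arc of The X-Files (the episodes about the black oil, alien colonization and the super soldiers), which are marked with an asterisk on the Inside the X home page. For convenience, I gave those nodes special icons in the contents tree when I built the CHM.

The problem now is that I watch the discs on Debian while reading the scripts in gnochm, and gnochm's contents tree shows no icons, so there is no way to tell from the tree which episodes belong to the conspiracy arc.

As Chairman Mao said: do it yourself and you will have ample food and clothing. So I patched gnochm once more. I expected adding icons to be troublesome, but it turned out to be very simple. The patch has been submitted upstream.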

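Tuesday, August 2, 2005

Tracking down the gnochm problems

Encouraged by fellow user duh, I tracked down the gnochm index problems I found yesterday.

The index sorting problem is easy to solve: just replace the model of the index TreeView with a sortable one. The patch is shown further below.

The incomplete index is not really gnochm's fault: some keywords contain illegal characters that make HTML parsing fail.
The topics and index in a CHM file are both stored in sitemap format, which is carried in HTML.

gnochm is written in Python and quite naturally uses the HTMLParser class to parse this file, but once it hits an illegal token such as the one in the keyword The "link-selected" signal (note that the quotation marks are not legal here), nothing after it can be read, so many keywords are lost. xchm simply ignores the error and keeps parsing.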
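
To illustrate what "ignore the error and keep going" can look like, here is a small Python 3 sketch that pulls keywords out of a sitemap with a regular expression that tolerates stray quotation marks inside a value. This is a swapped-in technique for illustration only, not what gnochm or xchm actually do; the names PARAM_RE and index_keywords and the sample file paths are made up.

```python
import re

# Match <param name="..." value="..."> and allow stray double quotes inside
# the value: the value only ends at a quote that is followed by ">".
PARAM_RE = re.compile(
    r'<param\s+name="(?P<name>[^"]*)"\s+value="(?P<value>.*?)"\s*/?>',
    re.IGNORECASE | re.DOTALL,
)


def index_keywords(sitemap_text):
    """Return the values of all Name parameters found in a .hhk sitemap."""
    return [m.group("value")
            for m in PARAM_RE.finditer(sitemap_text)
            if m.group("name").lower() == "name"]


# Made-up sitemap fragment containing the problematic keyword.
sample = '''
<UL>
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="The "link-selected" signal">
<param name="Local" value="gtk/GtkHTML.html#link-selected">
</OBJECT>
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="gtk_text_view_get_buffer">
<param name="Local" value="gtk/GtkTextView.html">
</OBJECT>
</UL>
'''
print(index_keywords(sample))
```

Both keywords are returned, including the one with embedded quotation marks, so a single malformed entry no longer hides everything after it.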

The failure to jump to an anchor will have to wait until tomorrow; I am not sure whether it is a gtkhtml2 problem.

--------- 8< ---------------------


[bamanzi@saynomdk ~]$ diff -Nurp /usr/bin/gnochm gnochm
--- /usr/bin/gnochm     2005-03-18 09:27:00.000000000 +0800
+++ gnochm      2005-08-02 23:23:35.000000000 +0800
@@ -811,11 +811,13 @@ class MainApp:
     # Index
     self.imodel = gtk.TreeStore(gobject.TYPE_STRING,
                                 gobject.TYPE_STRING)
+       self.isortmodel = gtk.TreeModelSort(self.imodel)
     self.indexview = self.xml.get_widget('IndexTView')
-        self.indexview.set_model(self.imodel)
+       self.indexview.set_model(self.isortmodel)
     cell2 = gtk.CellRendererText()
     column2 = gtk.TreeViewColumn('Index', cell2, text=0)
     self.indexview.append_column(column2)
+       self.isortmodel.set_sort_column_id(0, gtk.SORT_ASCENDING)
     # Search
     self.smodel = gtk.ListStore(gobject.TYPE_STRING,
                                 gobject.TYPE_STRING)

Comments for post
HTMLParser
nick | 03/08/2005, 13:54

Use SGMLParser then. More fault tolerant.

test with sgml
nick | 03/08/2005, 13:59

$ python /usr/lib/python2.3/sgmllib.py sitemap.html

Shows that no big deal for sgml.

sorted list
nick | 03/08/2005, 14:02

For sorted list, I'd rather sort the list before feeding to the treemodel. Should be faster for big list. Treeview is already slow in itself.

But let the treemodel do the sorting is simpler.

Re: SGMLParser
bamanzi | 03/08/2005, 17:41

Really, SGMLParser works!

Thanks!

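Monday, August 1, 2005

xchm is still the best (was Re: CHM viewers summary)

Not long ago I wrote a summary of the existing CHM viewers. Because I am fond of GNOME, I picked gnochm, whose look and feel matches the desktop, as my first choice. I had only been using it to read e-books, and it felt fine with no problems. But over the last two days I wanted to write some code and needed to look up the gtk2 and perl API references, and only then did I notice that gnochm has quite a few problems:

1. The index feature is poor. For one thing, it cannot show all keywords (with the combined glib2+pango+atk+gtk2 help file it shows only some glib functions, and I am not even sure those are complete). For another, it offers no input box for matching keywords (one could have relied on the type-ahead matching that gtk2 itself provides for list widgets, but gnochm does not even sort the keywords).

2. Anchors in hyperlinks are handled poorly. For example, on the GtkTextView page, trying to jump from the function and signal list at the top to the detailed description of gtk_text_view_get_buffer does not work: it only ever locates the file and never reaches the specific anchor.

Then I installed xchm, and everything just worked!

P.S.

1. The API documentation viewer for the GNOME platform is really devhelp, but in Mandrake the rule is that packages outside the official release get no quality assurance at all; in any case it would not run after I installed it.

2. perldoc is a nice tool and worth learning to use, since it ships with the perl package.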

Saturday, July 30, 2005

glib2,gtk2,pygtk2 reference in CHM format

I modified devhelp2chm a little to work around a problem I found when using gnochm to read the CHM files it generates.

I also updated two CHM files generated by it:
glib2/gtk2: gtk-2.6.8 and glib-2.6.5, including the FAQ and tutorial
(source: packages libgtk2.0-doc, libglib2.0-doc)
pygtk2 reference: pygtk2ref 2.6.0, plus GtkSourceView, GtkSpell, GnomePrint, GnomePrintUI, GtkMozembed
(source: pygtk2 website)

----------------

How to build:

gtk2.chm:

apt-get install libglib2.0-doc libgtk2.0-doc
cd /usr/share/gtk-doc/html/
DIRS="gtk/ gdk/ gdk-pixbuf/ gtk-faq/ gtk-tutorial/ glib/ gobject/"
find $DIRS -name '*.devhelp.gz' | xargs gunzip
for d in $DIRS; do
    (cd $d;
     echo $d
     for f in *.html; do
         sed 's#/usr/share/gtk-doc/html/#../#' $f > $f.tmp
         mv $f.tmp $f
     done)
done
find $DIRS -name '*.devhelp' | xargs ~/bin/devhelp2chm-v2.sh \
    -p gtk2 -T "GTK+ Reference Manual" -t gtk/index.html

...Then use HtmlHelp Workshop to build it.

pygtk2ref.chm

wget http://www.pygtk.org/dist/pygtk2reference.tbz2
tar jxf pygtk2reference.tbz2
find . -name '*.devhelp' | xargs ~/bin/devhelp2chm-v2.sh \
    -p pygtk2ref -T "PyGTK2 Reference" -t pygtk2reference/index.html

Saturday, June 25, 2005

Debian Reference CHM version (link?)

Created from Debian Reference 06/22/05.

English | Simplified Chinese | Traditional Chinese

I wrote a simple script to create the HtmlHelp project files.

Saturday, June 18, 2005

Summary: CHM viewers

I have recently come across a few more, so I might as well write a summary. (Nearly all are based on chmlib.)

Viewer: xchm
Requires: wxWidgets, chmlib
CJK support: fair
Project description: "xCHM is a .chm viewer for UNIX (Linux, *BSD, Solaris). Success stories of xCHM on Mac OS X have also been received, and apparently xCHM even works if compiled under the Cygwin environment in Windows."

Viewer: gnochm
Requires: GNOME, python-gnome
CJK support: good
Project description: "GnoCHM is a CHM file viewer for Gnome2. It uses PyCHM, a set of Python wrappers around the C library libchm."

Viewer: chmsee
Requires: gtk2, gnome-vfs2, gtkhtml3
CJK support: "Simplified Chinese and English encodings only"
Project description: "ChmSee is a program for browsing CHM files, but it only supports CHM files in Simplified Chinese and English encodings; other encodings are not supported yet."
Comments: Developed by a Chinese author (who forgot to put his name on the home page :0)

Viewer: arCHMage
Requires: chmlib, python
CJK support: good
Comments: Actually it is not a real viewer. It is an HTTP server. You need a web browser to view the pages.

Viewer: kio_chm
Requires: KDE3, chmlib
CJK support: good
Comments: kio_chm is a plugin for KDevelop, but when installed, you can view CHM files in Konqueror.

Viewer: kchm
Requires: chmlib, KDevelop3 (kio_chm), Qt3
Project description: "KCHM provides access to MS .chm files (help files) using Chmlib and Qt and KDE libraries. You can read your favourite ebooks on your Linux box!"
Comments: Just a UI front-end for kio_chm. UI written in Qt3.

Viewer: kchmnew
Requires: KDE
Project description: "This is a chm file viewer + corresponding kpart and kio slave for KDE. It based on libchm and libchm++."

Viewer: kchmviewer
Requires: chmlib, Qt3
CJK support: good
Project description: "kchmviewer is a CHM (Winhelp) files viewer written on Qt/KDE. It can be build as a standalone Qt-based application, or a KDE application. The main point of kchmviewer is compatibility with non-English chm files, including most international charsets."

Viewer: chmviewer
Requires: wxGTK, libmspack
Comments: Dead project? Seems no longer active.

Viewer: chm_viewer
Comments: Another chmviewer. Dead project?

I prefer gnochm, as its UI fits better into the GNOME desktop. For a minimalist, and taking CJK support into account, xchm and kchmviewer seem to be good choices. If you don't care about the UI, choose archmage.

Where to download CHM books for GNU tools:

http://lidn.sourceforge.net
http://htmlhelp.berlios.de/ (CHM books just updated on May 31)

How to:

How to convert DevHelp books into CHM format (devhelp2chm, written by myself :-)
How to convert a CHM book into DevHelp format
How to convert a Texinfo document into CHM format
How to compile a CHM file in Linux (with Wine + HHW)

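Saturday, April 30, 2005

CHM version of the Friends scripts

At friends' request I uploaded the output to my home page space, where it is available for download (Friends.chm).
It may not stay there for long, so download it soon if you want it.

It was converted with scripts (see the earlier posts I, II, III), which I wrote as practice while studying the HTML Processing chapter of Dive Into Python (the source was a Word document).

P.S. The Part II script has a problem: I later found that some of the later episodes are not separated by <hr>. Splitting on "End" and "The End" is more accurate, but a few episodes still need to be split by hand. The updated script is in the download area as friends-split2.py.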

Friday, April 8, 2005

Convert CHM contents to normal HTML contents

What do I want?

I have some eBooks (in CHM or SRM format) that I want to copy to my cellphone. Since the phone supports neither CHM nor SRM, I chose the PalmDoc (.pdb) format.

Yes, I can convert a pack of HTML files into one PDB file. But:
1) Some CHM books don't have a contents page; they rely on the CHM contents instead. Without a contents page, browsing the resulting PDB file is not a happy experience.
2) SRM books can be exported as CHM files, and none of them has a contents page either.

Hence this simple recipe.

I remember that two or three years ago I used to do these things in Perl. Perl's regex features are powerful; the only problem is that after a few days the script becomes unreadable. :-(
Python is different from Perl. This recipe is so simple, isn't it?

#!/usr/bin/env python
# Python 2 script (sgmllib no longer exists in Python 3).

from sgmllib import SGMLParser
import sys

class SiteMapParser(SGMLParser):
    """ Walk a CHM sitemap (foo.hhc) and report its sections and links """
    def reset(self):
        SGMLParser.reset(self)
        # some temp variables
        self.level = 0
        self.link_url = ""
        self.link_title = ""

    def start_ul(self, attrs):
        self.on_section_starts()

    def end_ul(self):
        self.on_section_ends()

    def start_param(self, attrs):
        # <param name="Name" value="..."> carries the title,
        # <param name="Local" value="..."> carries the target URL
        if len(attrs)>1:
            if attrs[0][0]=='name':
                if attrs[0][1]=='Name':
                    self.link_title=attrs[1][1]
                elif attrs[0][1]=="Local":
                    self.link_url=attrs[1][1]

    def start_object(self, attrs):
        self.link_title = ""
        self.link_url = ""

    def end_object(self):
        self.on_link_found(self.link_title, self.link_url)

    def on_section_starts(self):
        self.level = self.level + 1

    def on_section_ends(self):
        self.level = self.level - 1

    def on_link_found(self, title, url):
        # you can override this
        if title and url:
            print "  " * self.level + "%s [%s]" % (title, url)

class ContentParser(SiteMapParser):
    """ A simple class to convert CHM contents (foo.hhc) to a normal HTML contents """
    def reset(self):
        print "<HTML><HEAD></HEAD><BODY>"
        SiteMapParser.reset(self)

    def close(self):
        SiteMapParser.close(self)
        # close the tags opened in reset()
        print "</BODY></HTML>"

    def on_section_starts(self):
        print "<ul>"

    def on_section_ends(self):
        print "</ul>"

    def on_link_found(self, title, url):
        print '<li><a href="%s">%s</a></li>' % (url, title)

if __name__=='__main__':
    if len(sys.argv)<2:
        print "Usage: %s foo.hhc" % sys.argv[0]
        sys.exit()

    trans=ContentParser()
    fh=open(sys.argv[1], "r")
    try:
        trans.feed(fh.read())
    except:
        # keep whatever was converted before a parse error
        pass
    trans.close()
    fh.close()
# vim:expandtab softtabstop=4
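
Since sgmllib is gone from Python 3, the same recipe can be approximated there with html.parser. The following is only a sketch under that assumption, not a port of the script above: the class HhcToHtml, the function hhc_to_html and the sample contents are made up for illustration.

```python
from html import escape
from html.parser import HTMLParser


class HhcToHtml(HTMLParser):
    """Turn a .hhc sitemap into a plain nested HTML list."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.title = self.url = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            self.out.append("<ul>")
        elif tag == "object":
            # a new sitemap entry starts: forget the previous one
            self.title = self.url = ""
        elif tag == "param":
            name = (attrs.get("name") or "").lower()
            if name == "name":
                self.title = attrs.get("value") or ""
            elif name == "local":
                self.url = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "ul":
            self.out.append("</ul>")
        elif tag == "object" and self.title and self.url:
            self.out.append('<li><a href="%s">%s</a></li>'
                            % (escape(self.url, quote=True), escape(self.title)))


def hhc_to_html(hhc_text):
    parser = HhcToHtml()
    parser.feed(hhc_text)
    parser.close()
    return ("<html><head></head><body>\n%s\n</body></html>"
            % "\n".join(parser.out))


# Made-up one-entry contents file.
sample = '''<UL>
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="Chapter 1">
<param name="Local" value="ch01.html">
</OBJECT>
</UL>'''
print(hhc_to_html(sample))
```

Unlike the original, this version looks parameters up by attribute name rather than by position, so it does not depend on name coming before value.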

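Monday, March 28, 2005

Got hold of the Knoppix Hacks e-book

It is in CHM format; I found it on eDonkey. I have put a copy here.
Now I can have some real fun with it.

Actually, O'Reilly offers this book for online reading.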

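Thursday, March 10, 2005

QuickCHM is a pretty good tool

I used it for a while in the past and found it too limited. This time I installed the latest version (2.6), and it seems decent.

Had I found it earlier, I would not have needed to write Python scripts to extract HTML titles and generate the hhp/hhc files myself,
and making the CHM versions of the X Files and Friends scripts would have been much easier.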

Old posts:
How to convert Friends.doc into a CHM file
Create a CHM file for PyGTK2 Tutorial

Thursday, November 18, 2004

How to convert Friends.doc into a CHM file (1)

Part 1: Friends.doc -> friends.htm -> friends-thin.html

1. Open Friends.doc with MS Word and save it as an HTML file (friends.htm). I used Office XP,
and due to the garbage added by MS Word, the output file is about 28 MB!

2. Write a simple script (friends-diet.py) to get rid of the garbage attributes
generated by evil MS Word, including 'class', 'style' and 'lang'.
This cuts the size by 75%!
#!/usr/bin/env python
# Python 2 script (sgmllib no longer exists in Python 3).

from sgmllib import SGMLParser
import htmlentitydefs
import os, sys

class FriendsDiet(SGMLParser):
    def reset(self):
        self.output=open("friends-thin.html", "w")
        SGMLParser.reset(self)

    def unknown_starttag(self, tag, attrs):
        if tag=='p':
            # keep paragraphs, but without any attributes
            self.output.write("<p>\n")
        elif tag=='span' or tag=='o':
            pass
        elif tag=='o:SmartTagType' or tag=='SmartTagType':
            print "Ignore",tag
        else:
            # copy the tag, dropping the attributes added by Word
            strattrs=""
            for key, value in attrs:
                if (key!='class' and key!='style' and key!='lang' and key[0:5]!='xmlns'):
                    strattrs = strattrs + ' %s="%s"' % (key, value)

            self.output.write("<%s%s>" % (tag, strattrs))

            if tag=='body':
                self.output.write('\n')

    def unknown_endtag(self, tag):
        if tag!='span' and tag!='p' and tag[0:2]!='o:':
            self.output.write("</%s>\n" % tag)

    def handle_data(self, text):
        if text.strip()!='':
            self.output.write(text+"\n")

    def handle_charref(self, ref):
        self.output.write("&#%s;" % ref)

    def handle_entityref(self, ref):
        semicolon=""
        if htmlentitydefs.entitydefs.has_key(ref):
            semicolon=";"
        self.output.write("&%s%s" % (ref, semicolon))

if __name__=='__main__':
    parser=FriendsDiet()
    #fh=open(sys.argv[1], "r")
    fh=open("friends.htm", "r")
    content=fh.read()
    parser.feed(content)
    fh.close()
    parser.close()
    parser.output.close()
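
For readers on Python 3, where sgmllib is no longer available, the attribute-stripping idea can be sketched with html.parser. This is an illustration only and does not reproduce the script above exactly (for instance, it keeps closing p tags); the names AttributeDiet, slim, DROP_TAGS and DROP_ATTRS and the sample line are made up.

```python
from html import escape
from html.parser import HTMLParser

DROP_TAGS = {"span"}                      # wrappers that only carry styling
DROP_ATTRS = {"class", "style", "lang"}   # attributes added by the Word export


class AttributeDiet(HTMLParser):
    def __init__(self):
        # keep entity and character references as they appear in the input
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _dropped(self, tag):
        return tag in DROP_TAGS or tag.startswith("o:")

    def handle_starttag(self, tag, attrs):
        if self._dropped(tag):
            return
        kept = "".join(' %s="%s"' % (k, escape(v or "", quote=True))
                       for k, v in attrs
                       if k not in DROP_ATTRS and not k.startswith("xmlns"))
        self.parts.append("<%s%s>" % (tag, kept))

    def handle_endtag(self, tag):
        if not self._dropped(tag):
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)


def slim(html_text):
    parser = AttributeDiet()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


# Made-up line in the style of a Word HTML export.
sample = ('<p class="MsoNormal" style="margin:0">'
          '<span lang="EN-US">Joey: Hi &amp; bye</span></p>')
print(slim(sample))
```

For a 28 MB input it would be better to feed the file in chunks and write the output incrementally instead of collecting everything in a list.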

Friday, November 5, 2004

Create a CHM file for PyGTK2 Tutorial

Document source: PyGTK2 Tutorial[1]
[1] http://www.pygtk.org/dist/pygtk2tutorial.tgz

The main problem is to parse the contents from index.html and then write them into an HHC file for MS HtmlHelp Workshop.

You can use this simple script to do this (see below):
pygtk-index2hhc.py pygtk2tut.hhc index.html

P.S.:
As for the PyGTK2 Reference[2], you can download the devhelp index file[3] and then use my devhelp2chm.sh[4] to get the .HHP, .HHC and .HHK files.
[2] http://www.pygtk.org/dist/pygtk2reference.tbz2
[3] http://www.moeraki.com/pygtkreference/pygtk2reference.devhelp.gz
[4] http://www.linuxeden.com/forum/blog/resserver.php?blogId=110848&resource=devhelp2chm.sh

#!/usr/bin/env python

from sgmllib import SGMLParser
import htmlentitydefs

from chmmaker import HHCWriter
import os

class ContentParser(SGMLParser):
    def __init__(self, outputfile):
        self.hhcwriter = HHCWriter(outputfile)
        self.hhcwriter.print_header()
        SGMLParser.__init__(self)

    def __del__(self):
        self.hhcwriter.print_footer()

    def reset(self):
        SGMLParser.reset(self)
        # some temp variables
        self.level=0
        self.title = self.url = ""
        self.in_href = False

    def start_dd(self, attrs):
        self.hhcwriter.hhcfile.write("<UL>")

    def end_dd(self):
        self.hhcwriter.hhcfile.write("</UL>")

    def start_a(self, attrs):
        for attr in attrs:
            if attr[0].lower()=='href':
                self.in_href = True
                self.title = ""
                self.url = attr[1]
                break

    def end_a(self):
        if self.in_href and self.title and self.url:
            self.on_href(self.title.replace('"', ""), self.url)
        self.in_href = False
        self.title = self.url = ""

    def handle_data(self, text):
        self.title += text

    def handle_charref(self, ref):
        if self.in_href:
            self.title += "&#%(ref)s;" % locals()

    def handle_entityref(self, ref):
        if self.in_href:
            self.title += "&%(ref)s" % locals()
            if htmlentitydefs.entitydefs.has_key(ref):
                self.title += ";"

    def on_href(self, title, url):
        target=url
        if url[-3:]=='fig':
            target= ""
        if title and target:
            self.hhcwriter.add_topic(title, target)

if __name__=='__main__':
    import sys
    if len(sys.argv)<3:
        print "Usage: %s hhcfilename index.html" % sys.argv[0]
        sys.exit()

    trans=ContentParser(sys.argv[1])
    fh=open(sys.argv[2], "r")
    try:
        trans.feed(fh.read())
    except:
        pass
    trans.close()
    fh.close()
# vim:expandtab softtabstop=4
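
The chmmaker module used above is not shown in the post. To give an idea of what such a writer has to produce, here is a hypothetical Python 3 stand-in: the method names print_header, add_topic and print_footer mirror the calls in the script, while the class SimpleHHCWriter, the methods begin_children and end_children, and the sample file names are invented for this sketch.

```python
import io
from html import escape

HEADER = ('<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">\n'
          "<HTML><HEAD></HEAD><BODY>\n<UL>\n")
FOOTER = "</UL>\n</BODY></HTML>\n"


class SimpleHHCWriter:
    """Hypothetical stand-in for HHCWriter: writes sitemap entries to a stream."""

    def __init__(self, stream):
        self.stream = stream

    def print_header(self):
        self.stream.write(HEADER)

    def add_topic(self, title, target):
        # one contents entry: a Name (title) and a Local (target page)
        self.stream.write(
            '<LI> <OBJECT type="text/sitemap">\n'
            '<param name="Name" value="%s">\n'
            '<param name="Local" value="%s">\n'
            "</OBJECT>\n" % (escape(title, quote=True),
                             escape(target, quote=True)))

    def begin_children(self):
        self.stream.write("<UL>\n")

    def end_children(self):
        self.stream.write("</UL>\n")

    def print_footer(self):
        self.stream.write(FOOTER)


# Made-up two-level contents.
buf = io.StringIO()
writer = SimpleHHCWriter(buf)
writer.print_header()
writer.add_topic("Chapter 1. Introduction", "ch-Introduction.html")
writer.begin_children()
writer.add_topic("1.1. Exploring PyGTK", "sec-ExploringPygtk.html")
writer.end_children()
writer.print_footer()
print(buf.getvalue())
```

Writing to a stream rather than a file name keeps the sketch easy to try out; passing an open file works the same way.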

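Monday, September 20, 2004

Undocumented options for building CHM files

Add a window definition to the project file (.HHP), then make the corresponding changes:

Add the MSDN menu:
In the window definition, add 0x10000 to the style parameter (the first 0x.... value), e.g. 0x23520 -> 0x33520

Add the Font button:
In the window definition, add 0x100000 to the buttons parameter (the second 0x.... value), e.g. 0x24385e -> 0x34385e

Hide the text on toolbar buttons:
In the window definition, add 0x40 to the style parameter (the first 0x.... value), e.g. 0x23520 -> 0x23560

Toolbar button values (those in parentheses are hidden options, some of which may be obsolete; at least the current HtmlHelp Workshop does not offer them)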

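Button        Value       Notes
Hide/Show     0x0002
Back          0x0004
Forward       0x0008
Stop          0x0010
Refresh       0x0020
Home          0x0040
(Next)        0x0080      next step; purpose unknown
(Prev)        0x0100      previous step; purpose unknown
(Notepad)     0x0200      notes; seems to have no effect
(Contents)    0x0400      contents; seems to have no effect
Locate        0x0800
Options       0x1000
Print         0x2000
(Index)       0x4000      index; seems to have no effect
(Search)      0x8000      search; seems to have no effect
(History)     0x010000    history; seems to have no effect
(Bookmark)    0x020000    bookmarks; seems to have no effect
Jump1         0x040000
Jump2         0x080000
(Fonts)       0x100000    font
(Next)        0x200000    next step; purpose unknown
(Prev)        0x400000    previous step; purpose unknown

The only one that seems to be of any use is the Font button.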
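
The arithmetic on the buttons parameter can be checked with a few lines of Python. This is just a sketch: the bit values are copied from the list above, while the names BUTTONS, add_button and enabled_buttons are made up for the example.

```python
# Button bits copied from the list in this post (hidden ones included
# only where they are used below).
BUTTONS = {
    "Hide/Show": 0x0002, "Back": 0x0004, "Forward": 0x0008, "Stop": 0x0010,
    "Refresh": 0x0020, "Home": 0x0040, "Locate": 0x0800, "Options": 0x1000,
    "Print": 0x2000, "Jump1": 0x040000, "Jump2": 0x080000, "Fonts": 0x100000,
}


def add_button(buttons_value, name):
    """Switch one button on; OR-ing is safe even if the bit is already set."""
    return buttons_value | BUTTONS[name]


def enabled_buttons(buttons_value):
    """List the known buttons whose bit is set in the value."""
    return [name for name, bit in BUTTONS.items() if buttons_value & bit]


new_value = add_button(0x24385e, "Fonts")
print(hex(new_value))   # 0x34385e, the value given in the post
print(enabled_buttons(new_value))
```

Using a bitwise OR instead of plain addition avoids corrupting the value when the bit is already set, which would happen if 0x100000 were added twice.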
