Sunday, December 7, 2008
Weekly Twitter 2008 #49
Tuesday, October 18, 2005
regex compilation in Perl
Yesterday, while writing a simple Perl script to process a fairly large text file, I found the speed disappointing. It occurred to me that repeatedly matching the regular expressions might have some cost; I remembered that a regular expression that never changes can be compiled, and a compiled one should run faster (Python's re module provides a compile function for exactly this). For Perl, though, I couldn't recall anything similar, and after digging through the Perl documentation for quite a while I still found nothing.
In the end Google gave me the answer (perhaps I should have asked it first). It is very simple: just append a modifier to the expression:
o - Compile a regular expression once
If you ever end up with a really long regular expression, you can use this modifier to compile it before it's used. This means that long and complicated expressions don't have to be compiled each time they're used.
The only thing you must remember is that if you use this modifier, you are promising Perl that you won't attempt to change it while the script is running. If you do, it won't be taken into account. There won't be an example of how to use this modifier, since if you're able to write regular expressions this long and complicated, you're way ahead of anything I could tell you in this article!
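As a quick illustration (my own minimal sketch, not from the article quoted above; the pattern and variable names are made up), the modifier simply goes after the closing delimiter:

#!/usr/bin/env perl
# /o asks Perl to compile an interpolated pattern only once.
my $pattern = 'ERROR\s+\d+';    # imagine this comes from a config file

while (my $line = <STDIN>) {
    # Without /o the interpolated pattern may be recompiled on every iteration;
    # with /o it is compiled the first time and then reused.
    print $line if $line =~ /$pattern/o;
}

In more recent Perl the qr// operator is the cleaner way to precompile a pattern once and reuse it.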
Wednesday, August 10, 2005
This language called Perl...
I saw a "global programming language popularity ranking" on /.cn, and what I did not expect was that Perl ranked fourth, behind only Java, C and C++.
As it happens, two weeks ago my former department at the company started a project with BT (British Telecom, not BitTorrent) that was developed in Perl. In that whole big department nobody knew the language, so my former manager called and asked me to come over and help out for two weeks.
I am fairly familiar with Perl 4, but I know rather less about the packages, references and so on that came with Perl 5. Since I had not really used it for a long time (I later moved on to Python), I quickly grabbed two e-books to skim (Advanced Perl Programming and the Perl Cookbook). As I still prefer books made of paper, I also planned to hunt for a couple in the bookstores over the weekend. Who would have thought that after two big bookstores and two small ones there were hardly any Perl books to be found (I only saw O'Reilly's Learning Perl, a Perl for C++ Programmers, and perhaps a CGI Programming with Perl). china-pub and Dangdang had nothing good either. I remember seeing plenty of them in the past; how come... no wonder so few people know the language.
Back to the Perl language itself: it relies on far too many conventions. Conventions, special variables and special syntax are everywhere. As an example, the first chapter of Advanced Perl Programming discusses references:
$s = \('a', 'b', 'c'); # WARNING: probably not what you think
What does $s point to? A reference to the list ('a', 'b', 'c')? Well, not quite:
As it happens, this is identical to
$s = (\'a', \'b', \'c'); # List of references to scalars
An enumerated list always yields the last element in a scalar context (as in C), which means that $s contains a reference to the constant string c.
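For comparison, here is a small sketch of my own showing what one usually wants instead (an anonymous array reference), next to what \(...) really does:

# An anonymous array reference -- probably what was intended:
my $s = ['a', 'b', 'c'];
print $s->[2], "\n";            # prints "c"

# The backslash distributes over a list, so in list context you get
# a list of references to the individual scalars:
my @refs = \('a', 'b', 'c');    # same as (\'a', \'b', \'c')
print ${$refs[0]}, "\n";        # prints "a"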
Writing things in Perl sometimes goes quite smoothly, but debugging them is painful enough, and unless you put in plenty of comments up front you won't understand the code when you come back to it. At least the many small tools I later rewrote in Python became easy to write and easy to read.
-------- BTW: It saddens me to see the popularity of Delphi/Pascal keep falling. A couple of days ago I saw that Bob Swart (Dr. Bob) had put up a "Forever Loyal to Delphi" banner on his website; that it has come to this makes it all the more depressing.
Friday, April 8, 2005
Convert CHM contents to normal HTML contents
I have some eBooks in CHM or SRM format. Now I want to copy them to my cell phone. Since neither CHM nor SRM is supported there, I chose the PalmDoc (.pdb) format.
Yes, I can convert a pack of HTML files into one PDB file. But:
1) Some CHM books don't have a contents page; they rely on the CHM table of contents instead. Without a contents page, browsing the resulting PDB file would not be a pleasant experience.
2) SRM books can be exported as CHM files, and none of them have a contents page either.
Then came this simple recipe.
I remember that two or three years ago I used to do these things in Perl. Perl's regex support is very powerful. The only problem is that after a few days the script tends to become unreadable. :-(
Python is different from Perl. This recipe is quite simple, isn't it?
#!/usr/bin/env python
from sgmllib import SGMLParser
import htmlentitydefs
# from chmmaker import HHCWriter    # not used below
import os

class SiteMapParser(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        # some temp variables
        self.level = 0
        self.link_url = ""
        self.link_title = ""

    def start_ul(self, attrs):
        self.on_section_starts()

    def end_ul(self):
        self.on_section_ends()

    def start_param(self, attrs):
        if len(attrs) > 1:
            if attrs[0][0] == 'name':
                if attrs[0][1] == 'Name':
                    self.link_title = attrs[1][1]
                elif attrs[0][1] == "Local":
                    self.link_url = attrs[1][1]

    def start_object(self, attrs):
        self.link_title = ""
        self.link_url = ""

    def end_object(self):
        self.on_link_found(self.link_title, self.link_url)

    def on_section_starts(self):
        self.level = self.level + 1

    def on_section_ends(self):
        self.level = self.level - 1

    def on_link_found(self, title, url):
        # you can override this
        if title and url:
            print " " * self.level + "%s [%s]" % (title, url)

class ContentParser(SiteMapParser):
    """A simple class to convert CHM contents (foo.hhc) to normal HTML contents."""
    def reset(self):
        print "<HTML><HEAD></HEAD><BODY>"
        SiteMapParser.reset(self)

    def on_section_starts(self):
        print "<ul>"

    def on_section_ends(self):
        print "</ul>"

    def on_link_found(self, title, url):
        print '<li><a href="%s">%s</a></li>' % (url, title)

if __name__ == '__main__':
    import sys
    if len(sys.argv) < 2:
        print "Usage: %s foo.hhc" % sys.argv[0]
        sys.exit()
    trans = ContentParser()
    fh = open(sys.argv[1], "r")
    try:
        trans.feed(fh.read())
    except:
        pass
    trans.close()
    fh.close()
# vim:expandtab softtabstop=4
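For reference, the foo.hhc sitemap the script reads is itself a small HTML-like file. A hand-written sketch (not taken from any real book, just to show the tags the parser looks at: ul, object and the Name/Local params) looks roughly like this:

<HTML><BODY>
<UL>
  <LI><OBJECT type="text/sitemap">
        <param name="Name" value="Chapter 1">
        <param name="Local" value="ch01.html">
      </OBJECT>
  <UL>
    <LI><OBJECT type="text/sitemap">
          <param name="Name" value="Section 1.1">
          <param name="Local" value="ch01s01.html">
        </OBJECT>
  </UL>
</UL>
</BODY></HTML>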
Friday, December 17, 2004
Hyperlink support in html2rtf.pl
Add the following code at line 257:
# now href
$urlobj_data1 = "{\\field{\\*\\fldinst {\\fs24\\insrsid13071880 \\hich\\af1\\dbch\\af13\\loch\\f1 \\hich\\af1\\dbch\\af13\\loch\\f1 HYPERLINK";
$urlobj_data2 = "\\hich\\af1\\dbch\\af13\\loch\\f1 }{\\fs24\\insrsid13071880\\charrsid13071880 {\\*\\datafield 00d0c9ea79f9bace118c8200aa004ba90b0200000003000000e0c9ea79f9bace118c8200aa004ba90b5a0000";
# $urlobj_data3 is the URL (Unicode) in hex code. e.g. http://www.zope.org/Members/Brian/PythonNet/
$urlobj_data3 = "0068007400740070003a002f002f007700770077002e007a006f00700065002e006f00720067002f004d0065006d0062006500720073002f0042007200690061006e002f0050007900740068006f006e004e00650074002f";
$urlobj_data4 = "000000}}}{\\fldrslt {\\cs15\\fs24\\ul\\cf2\\insrsid13071880\\charrsid13071880 \\hich\\af1\\dbch\\af13\\loch\\f1 ";
# turn <a href="...">...</a> pairs into RTF HYPERLINK fields
$instream =~ s/<a href="([^"]+)"[^>]*>/$urlobj_data1 . " \"$1\" " . $urlobj_data2 . &url_str2hex($1) . $urlobj_data4/ige;
$instream =~ s/<\/a>/}}}/ig;
The implementation of url_str2hex is given below; it can be placed anywhere in the script:
# input: http://
# output: 0068007400740070003a002f002f
sub url_str2hex {
    local($s);
    $s = $_[0];
    $out = "";
    $i = 0;
    while ($i < length($s)) {
        $ch = substr($s, $i, 1);
        #printf "%04x\n", ord($ch);
        $out = $out . sprintf("%04x", ord($ch));
        $i++;
    }
    # printf $out;    # debug output only
    return $out;
}
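To illustrate, a quick usage sketch matching the input/output noted in the comments above ($hex is just a throwaway variable name):

$hex = &url_str2hex("http://");
# $hex now holds "0068007400740070003a002f002f"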
html2rtf.pl is available at: http://fresh.t-systems-sfr.com/unix/src/www/.warix/html2rtf.pl.html