[Translation] Regular Expressions: From Newbie to Guru
Author: Jan Goyvaerts
Publish Date: 2 Feb. 2009
Blog entry: http://www.regexguru.com/2009/02/from-regex-newbie-to-regex-guru/
Translated By: Rex (http://iregex.org)
Rex's note: This article is a passage from the preface that Jan Goyvaerts wrote for his book, Regular Expressions Cookbook.
One of my last tasks for the Regular Expressions Cookbook was to write the preface, including my author bio. I told the story of how I went from my first real encounter with regular expressions in 2000, to the expert I am almost a decade later.
My first attempt at writing the bio came out way too long compared with the other sections in the preface. The final bio is only half as long. Rather than let the long bio go to waste, I’m publishing it here, with added links.
In 1996, fresh out of high school, Jan Goyvaerts started a hobby project publishing his own software on his own website. It was less than a year before that Internet access had become available at local call rates to his Belgian hometown. In 1999, he decided that a university degree was only a ticket to joining the rat race, and focused on his ever more successful software development venture. He set up the business that would eventually become Just Great Software in 2000.
At that time, Jan had no idea he would ever become an expert on regular expressions. One of his early successes was a postcardware text editor called EditPad. Since postcards don’t pay the bills, he developed a commercial text editor called EditPad Pro. EditPad Pro, released mid-2000, needed regular expression support to compete. Only the best regex engine would do for “Just Great Software”. Jan decided to go with PCRE.
(Rex's note: postcardware is not software for editing postcards. It means the software is nearly free: you register it simply by sending the author a postcard.)
The regular expression features in EditPad Pro proved quite popular, particularly because PCRE offered a regex syntax compatible with Perl, which was all the rage. Most other text editors, even the big IDEs from Microsoft and Borland, had much simpler regular expression support. (In 2009, Visual Studio and Delphi still use those same old regex flavors. This frustrates Jan, because old and limited regex flavors don’t make good book material, and both IDEs are built on .NET, which provides a very rich regex flavor fully covered in this book.)
(Rex's note: in the Chinese translation, the long parenthetical sentence at the end of this paragraph was reordered to make its logic easier to follow, and one sentence was added to help readers grasp the original meaning.)
Sensing a need for more powerful tools for working with regular expressions on text files, Jan developed PowerGREP. PowerGREP took a slow start in late 2002. Today, it is clearly the most powerful tool on the Microsoft Windows platform for doing anything with regular expressions. One of the differentiating features early on was the inclusion of a detailed regular expression tutorial in the help file. Most other grep tools had only a help topic listing all the syntax features that you could print on a single sheet of paper. PowerGREP had a separate detailed help topic for every feature in PCRE.
Jan didn’t have a big budget to advertise his software. With many internet marketers preaching that content is king on the search engines, Jan set up http://www.regular-expressions.info with the text from the tutorial he had already written for PowerGREP. As he watched the site’s traffic and Google rank rise, ultimately beating the Wikipedia entry at the top, Jan started getting the idea that maybe this could become his area of expertise. Writing his own regular expression engine was still a scary thought.
(Rex's note: unresolved question: does "the Wikipedia entry at the top" mean the entry at the top of the page, or the entry ranked at the top of the results?)
Regular expressions hit the mainstream development community when .NET was released including a set of powerful regex classes. Not much later the Java platform added the same with the JDK 1.4 release. Seeing lots of Windows developers using regular expressions, and a customer base of EditPad Pro and PowerGREP users needing to test their regular expressions, Jan felt there was a need for a comprehensive tool to create, test, and edit regular expressions. RegexBuddy was released in 2004.
The PCRE engine, which had been such a blessing in 2000, was now seriously limiting both PowerGREP and RegexBuddy. PowerGREP needed to search files larger than 2 GB, and RegexBuddy needed to be compatible with all major regex flavors, not just PCRE. Jan bit the bullet and sweated for several months implementing a brand new regular expression engine. The result was a fusion regex flavor that supports almost all the features found in all the regex flavors discussed in this book, and that was fast and flexible enough to meet the needs of PowerGREP’s customers. The new regex engine made the 2005 releases of PowerGREP and RegexBuddy very successful.
By this time, Jan had become very aware of the differences between all the regular expression flavors. While RegexBuddy could now emulate nearly all the abilities of the popular regular expression flavors, it could not emulate their deficiencies. After much research and testing, Jan released RegexBuddy 3 in 2007, which can emulate the features, and lack thereof, of 15 different regular expression flavors.
Having spent so much time researching regular expressions, Jan felt he was ready to write the book on regular expressions. But he didn’t actually set out to do it. It was Steven Levithan, a very enthusiastic RegexBuddy user, who asked him in early 2008 if he wanted to co-write a book on regular expressions. Jan hesitated at first, books being much less profitable than software. After some reflection, he decided he would realize his childhood dream of seeing his name in print, before the printed book becomes obsolete.
The result will be published in May 2009. Enjoy.
Meanwhile, Jan has left cloudy Belgium for tropical Thailand. He now lives with his wife in Phuket, where he enjoys pretending to be a tourist, even though in reality he still spends far too much time flipping the switches on his DataHand.
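Rex's comment: Having read Mastering Regular Expressions (MRE), I am now looking forward to reading this Cookbook. That said, the author has both stated and hinted that RegexBuddy and PowerGREP will take up a fair share of the book, and he admits that writing books pays less than writing software. Compared with Jeffrey Friedl, the author of MRE, Jan comes across more as a businessman, while Jeffrey comes across more as a scholar. I admire seasoned scholars and envy successful businessmen, and Jan is both in one. For understanding the principles behind regular expressions, MRE is a very good textbook. The Cookbook's title suggests that, like a book of recipes, it is a guide to concrete, practical technique. I want to read both in depth to learn the craft. What I want to read even more includes compiler theory: someday I hope to write a compiler of my own and my own regular expression engine, even if only a very simple implementation.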
October 5, 2009 - 10:38 am
Really nice. The book is excellent, the tools are first-rate, and the OP's introduction is spot on.
October 15, 2009 - 12:40 pm
Thanks for the kind words. You're welcome to drop by often.