<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>我爱正则表达式 &#187; translation</title>
	<atom:link href="http://iregex.org/blog/tag/translation/feed" rel="self" type="application/rss+xml" />
	<link>http://iregex.org</link>
	<description>原创、翻译、转载关于正则表达式的文章</description>
	<lastBuildDate>Sun, 27 Jun 2010 04:20:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/><atom:link rel="hub" href="http://www.feedsky.com/api/RPC2"/><atom:link rel="hub" href="http://blogsearch.google.com/ping/RPC2"/><atom:link rel="hub" href="http://blog.yodao.com/ping/RPC2"/><atom:link rel="hub" href="http://www.xianguo.com/xmlrpc/ping.php"/><atom:link rel="hub" href="http://www.zhuaxia.com/rpc/server.php"/><atom:link rel="hub" href="http://rpc.technorati.com/rpc/ping"/><atom:link rel="hub" href="http://rpc.pingomatic.com/"/>	
	<item>
		<title>[译]正则表达式：从菜鸟到大师</title>
		<link>http://iregex.org/blog/from-regex-newbie-to-regex-guru.html</link>
		<comments>http://iregex.org/blog/from-regex-newbie-to-regex-guru.html#comments</comments>
		<pubDate>Tue, 03 Feb 2009 02:06:11 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[翻译]]></category>
		<category><![CDATA[powergrep]]></category>
		<category><![CDATA[regexbuddy]]></category>
		<category><![CDATA[regexguru]]></category>
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=55</guid>
		<description><![CDATA[Author: Jan Goyvaerts Publish Date: 2 Feb. 2009 Blog entry: http://www.regexguru.com/2009/02/from-regex-newbie-to-regex-guru/ Translated By: Rex (http://iregex.org) Rex注：本文是Jan Goyvaerts为自己的著作《Regular Expression Cookbook》写的... ]]></description>
			<content:encoded><![CDATA[<p>Author: Jan Goyvaerts<br />
  <br />Publish Date: 2 Feb. 2009 </p>
<p>Blog entry: <a title="http://www.regexguru.com/2009/02/from-regex-newbie-to-regex-guru/" href="http://www.regexguru.com/2009/02/from-regex-newbie-to-regex-guru/">http://www.regexguru.com/2009/02/from-regex-newbie-to-regex-guru/</a> </p>
<p>Translated By: Rex (<a href="http://iregex.org">http://iregex.org</a>) </p>
<p>Rex's note: this article is a passage from the preface that Jan Goyvaerts wrote for his own book, Regular Expression Cookbook.</p>
<p><font color="#999999">One of my last tasks for the <a href="http://www.regexguru.com/2009/01/regular-expression-cookbook-available-for-pre-order/">Regular Expression Cookbook</a> was to write the preface, including my author bio. I told the story of how I went from my first real encounter with regular expressions in 2000, to the expert I am almost a decade later. </font></p>
<p><a href="http://iregex.org/blog/regular-expression-cookbook-available-for-pre-order.html" target="_blank">《Regular Expression Cookbook》</a>即将完工，剩余的工作之一是作序，包括写我的作者小传。我讲述了自己如何在2000年第一次遭遇正则表达式，并在近乎十年之后才成为专家的经历。</p>
<p><span id="more-55"></span></p>
<p><font color="#999999">My first attempt at writing the bio came out way too long compared with the other sections in the preface. The final bio is only half as long. Rather than let the long bio go to waste, I’m publishing it here, with added links. </font></p>
<p>我写的小传初稿，比序言中其它小节的篇幅要长得多，最终还是压缩到一半的长度。原版的较长自传并没有丢到故纸堆里，我将其发布到这里，并加上了链接。</p>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<p><font color="#999999">In 1996, fresh out of high school, Jan Goyvaerts started a hobby project publishing his own software on his own website. It was less than a year before that Internet access had become available at local call rates to his <a href="http://www.heist-op-den-berg.be">Belgian hometown</a>. In 1999, he decided that a university degree was only a ticket to joining the rat race, and focused on his ever more successful software development venture. He set up the business that would eventually become <a href="http://www.just-great-software.com/">Just Great Software</a> in 2000.</font></p>
<p>1996年，Jan Goyvaerts高中刚毕业，就在其个人网站上发布自编的软件，作为业余爱好项目。就在此前不到一年，他的<a href="http://www.heist-op-den-berg.be" rel="nofollow" target="_blank">比利时老家</a>才能以市话费的价格接入互联网。1999年，他认定大学学位不过是一张加入无休止激烈竞争的门票，于是集中精力经营自己日益成功的软件开发事业。2000年，他成立了自己的公司，该公司是后来的<a href="http://www.just-great-software.com/" target="_blank">Just Great Software（绝佳软件）</a>公司的前身。</p>
<p><font color="#999999">At that time, Jan had no idea he would ever become an expert on regular expressions. One of his early successes was a postcardware text editor called EditPad. Since postcards don’t pay the bills, he developed a commercial text editor called <a href="http://www.editpadpro.com/">EditPad Pro</a>. EditPad Pro, released mid-2000, needed regular expression support to compete. Only the best regex engine would do for “Just Great Software”. Jan decided to go with <a href="http://www.regular-expressions.info/pcre.html">PCRE</a>.</font> </p>
<p></p>
<p>那时，Jan从未料到他有朝一日会成为正则表达式专家。他早期的成功项目之一是一款叫作EditPad的明信片软件（postcardware）文本编辑器。由于明信片不能用来支付账单，他开发了一款商业版的文本编辑器：<a href="http://www.editpadpro.com/">EditPad Pro</a>，发布时间是2000年年中。要想在竞争中立足，该软件必须支持正则表达式。只有最好的正则引擎才配得上“Just Great Software（绝佳软件）”这个招牌。Jan决定采用<a href="http://www.regular-expressions.info/pcre.html">PCRE</a>。</p>
<p><font color="#ff008c">(rex's note: postcardware is not software for editing postcards. It means the software is nearly free: mailing the author a postcard is all it takes to register.)</font></p>
<p></p>
<p><font color="#999999">The regular expression features in EditPad Pro proved quite popular, particularly because PCRE offered a regex syntax compatible with <a href="http://www.regular-expressions.info/perl.html">Perl</a>, which was all the rage. Most other text editors, even the big IDEs from Microsoft and Borland, had much simpler regular expression support. (In 2009, Visual Studio and Delphi still use those same old regex flavors. This frustrates Jan, because old and limited regex flavors don’t make good book material, and both IDEs are built on .NET, which provides a very rich regex flavor fully covered in <a href="http://www.regular-expression-cookbook.com">this book</a>.)</font></p>
<p>事实证明，EditPad Pro中的正则表达式功能极受欢迎，主要是因为PCRE的正则表达式语法与<a href="http://www.regular-expressions.info/perl.html">Perl</a>兼容，而Perl当时正风靡一时。其它大多数的文本编辑器，包括当时Microsoft和Borland公司的大型集成开发环境，也仅仅提供简单得多的正则表达式支持。（时至2009年，Visual Studio和Delphi仍在沿用其古老的正则式风格。两种集成开发环境均基于.NET，它的正则表达式风格异常丰富，这些特性在书中有全面介绍。可惜的是，这两款IDE基于.NET却在编辑器中没有体现出.NET正则式的支持，而古老受限的正则风格无法为<a href="http://www.regular-expression-cookbook.com" target="_blank">本书</a>提供好的素材，这一点让Jan大皱眉头。） </p>
<p><font color="#ff008c">(rex's note: the long parenthetical sentence at the end of this paragraph was reordered in translation to make its logic easier to follow, and one sentence was added to help readers grasp the original meaning.)</font></p>
<p><font color="#999999">Sensing a need for more powerful tools for working with regular expressions on text files, Jan developed <a href="http://www.powergrep.com/">PowerGREP</a>. PowerGREP took a slow start in late 2002. Today, it is clearly the most powerful tool on the Microsoft Windows platform for doing anything with regular expressions. One of the differentiating features early on was the inclusion of a detailed regular expression tutorial in the help file. Most other grep tools had only one help topic listing all the syntax features that you could print on a single sheet of paper. PowerGREP had a separate detailed help topic for every feature in PCRE.</font></p>
<p>Jan感觉到，使用正则式处理文本文件需要更强大的工具，于是他开发了<a href="http://www.powergrep.com/">PowerGREP</a>软件。PowerGREP于2002年底问世，起步缓慢。如今，它无疑是微软Windows平台下处理各种正则表达式工作的最强大的工具。早期它卓尔不群的标志之一，就是在帮助文档中包含了完备翔实的正则表达式教程。其它大多数grep工具只有一个帮助主题，列出所有的语法特性，篇幅不大，在一页纸上就能打印完毕。而PowerGREP对于PCRE中每一个特性都有单独的、详细的帮助主题。</p>
<p><font color="#999999">Jan didn’t have a big budget to advertise his software. With many internet marketers preaching that content is king on the search engines, Jan set up <a href="http://www.regular-expressions.info">http://www.regular-expressions.info</a> with the text from the tutorial he had already written for PowerGREP. As he watched the site’s traffic and Google rank rise, ultimately beating the Wikipedia entry at the top, Jan started getting the idea that maybe this could become his area of expertise. Writing his own regular expression engine was still a scary thought.</font></p>
<p>那时Jan并没有太多的预算来为他的软件打广告。在许多互联网营销专家鼓吹搜索引擎上内容为王的大环境下，Jan建立了<a href="http://www.regular-expressions.info" target="_blank">http://www.regular-expressions.info</a>，内容来源于他已经为PowerGREP写好的教程。他看着网站流量和Google排名不断攀升，最终超过了排在榜首的Wikipedia条目，于是开始意识到，这或许可以成为他的专长领域。不过，编写自己的正则表达式引擎，对于那时的Jan来说，仍然是一个令人生畏的想法。</p>
<p><font color="#ff008c">(rex's note: I am unsure here. Does "at the top" mean the Wikipedia entry at the top of the page, or the one ranked first in the search results?)</font></p>
<p><font color="#999999">Regular expressions hit the mainstream development community when <a href="http://www.regular-expressions.info/dotnet.html">.NET</a> was released including a set of powerful regex classes. Not much later the <a href="http://www.regular-expressions.info/java.html">Java</a> platform added the same with the JDK 1.4 release. Seeing lots of Windows developers using regular expressions, and a customer base of EditPad Pro and PowerGREP users needing to test their regular expressions, Jan felt there was a need for a comprehensive tool to create, test, and edit regular expressions. <a href="http://www.regexbuddy.com/">RegexBuddy</a> was released in 2004.</font></p>
<p>附有一整套正则式类库的<a href="http://www.regular-expressions.info/dotnet.html">.NET</a>发布后，正则表达式成为开发界的主流。不久之后，<a href="http://www.regular-expressions.info/java.html">Java</a>平台也在JDK1.4发行版中加入相同内容。看到众多的Windows开发人员用上了正则表达式，以及相当数量的EditPad Pro和PowerGREP客户群需要对正则表达式进行测试，Jan感觉到，有必要开发一种集创建、测试、编辑正则式于一体的综合工具。2004年，<a href="http://www.regexbuddy.com/">RegexBuddy</a>诞生了。</p>
<p><font color="#999999">The PCRE engine which had been such a blessing in 2000 was now seriously limiting both PowerGREP and RegexBuddy. PowerGREP needed to search files larger than 2 GB, and RegexBuddy needed to be compatible with all major regex flavors, not just PCRE. Jan bit the bullet and sweat several months implementing a brand new regular expression engine. The result was a <a href="http://www.regular-expressions.info/refflavors.html">fusion regex flavor</a> that supports almost all the features found in all the regex flavors discussed in this book, and that was fast and flexible enough to meet the needs of PowerGREP’s customers. The new regex engine made the 2005 releases of PowerGREP and RegexBuddy very successful.</font></p>
<p>对于PowerGREP和RegexBuddy来说，在2000年时PCRE引擎真可谓是来自上天的恩赐，可时至如今，它却成了一种极大的限制了。PowerGREP需要搜索2GB以上的大文件，RegexBuddy也需要兼容所有的主流正则风格，而不仅仅是PCRE风格。于是Jan咬紧牙关，大干数月，终于实现了一种全新的正则式引擎。它是<a href="http://www.regular-expressions.info/refflavors.html" target="_blank">正则式风格集大成者</a>，支持本书中提及的所有正则流派的几乎所有的特性。它足够快速、灵活，满足了PowerGREP客户的需求。新的正则引擎给2005年发布的PowerGREP和RegexBuddy带来极大的成功。</p>
<p><font color="#999999">By this time, Jan had become very aware of the differences between all the regular expression flavors. While RegexBuddy could now emulate nearly all the abilities of the popular regular expression flavors, it could not emulate their deficiencies. After much research and testing, Jan released RegexBuddy 3 in 2007 which can emulate the features, and lack thereof, of 15 different regular expression flavors.</font></p>
<p>到那时，Jan对于所有正则表达式流派之间的区别已经了如指掌。虽然RegexBuddy此时能够模拟流行的正则表达式流派的几乎所有功能，却不能模拟它们的缺陷。经过大量研究和测试之后，Jan于2007年发布了RegexBuddy 3，该版本可以模拟15种不同的正则表达式流派所具备的特性，以及它们所缺失的特性。</p>
<p><font color="#999999">Having spent so much time researching regular expressions, Jan felt he was ready to write the book on regular expressions. But he didn’t actually set out to do it. It was <a href="http://blog.stevenlevithan.com">Steven Levithan</a>, a very enthusiastic RegexBuddy user, who asked him early 2008 if he wanted to co-write a book on regular expressions. Jan hesitated at first, books being much less profitable than software. After some reflection, he decided he would realize his childhood dream of seeing his name in print, before the printed book becomes obsolete.</font></p>
<p>Jan在研究正则表达式上花费的时间如此之久，他觉得自己可以写一本关于正则表达式的书了。但是他并没有动手写作。一位狂热的RegexBuddy用户，<a href="http://blog.stevenlevithan.com">Steven Levithan</a>，在2008年年初询问Jan是否愿意共同写一本关于正则表达式的书。Jan开始有些犹豫，毕竟写书不像写软件那样利润丰厚。再三考虑过之后，他意识到可以实现儿时的梦想，那就是在纸版书绝迹之前，看到自己的名字印到封面。</p>
<p><font color="#999999">The result <a href="http://www.regexguru.com/2009/01/regular-expression-cookbook-available-for-pre-order/">will be published in May 2009</a>. Enjoy.</font></p>
<p>本书将于<a href="http://iregex.org/blog/regular-expression-cookbook-available-for-pre-order.html" target="_blank">2009年5月出版</a>，敬请期待。</p>
<p><font color="#999999">Meanwhile, Jan has left cloudy Belgium for tropical Thailand. He now lives with his wife in <a href="http://www.phuket.me">Phuket</a>, where he enjoys pretending to be a tourist, even though in reality he still spends far too much time flipping the switches on his <a href="http://www.micro-isv.asia/2009/01/datahand-for-sale-again/">DataHand</a>.</font></p>
<p>与此同时，Jan离开了乌云密布的比利时，来到了泰国这个热带国家。他和妻子在普吉岛安顿下来。在这里他喜欢假装成游客，实际上他太多太多的时间还是花费在敲击自己的<a href="http://www.micro-isv.asia/2009/01/datahand-for-sale-again/">DataHand</a>键盘上。</p>
</blockquote>
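<p><font color="#ff008c">Rex's comment: Having read Mastering Regular Expressions, I am now looking forward to reading this Cookbook. The author has made it clear, both openly and by implication, that using RegexBuddy and PowerGREP will take up a fair share of the book, and he frankly admits that writing books pays less than writing software. Compared with Jeffrey Friedl, the author of MRE, Jan comes across more as a businessman and Jeffrey more as a scholar. I respect seasoned scholars and envy successful businessmen, and Jan is both. For understanding how regular expressions work, MRE is a very good textbook. The Cookbook's title suggests that, like a book of recipes, it is a guide to concrete, practical technique. I want to study both books closely to learn the craft. Beyond that, I want to read up on compiler theory, hoping that one day I can truly write a compiler of my own and my own regular expression engine, even a very simple one.</font></p>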
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/from-regex-newbie-to-regex-guru.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Regular Expression Cookbook Available for Pre-Order</title>
		<link>http://iregex.org/blog/regular-expression-cookbook-available-for-pre-order.html</link>
		<comments>http://iregex.org/blog/regular-expression-cookbook-available-for-pre-order.html#comments</comments>
		<pubDate>Wed, 28 Jan 2009 14:07:55 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[新闻]]></category>
		<category><![CDATA[cookbook]]></category>
		<category><![CDATA[regexguru]]></category>
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=54</guid>
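		<description><![CDATA[Regular Expression Cookbook, the book that the regex guru (RegexGuru) has been writing, can now be pre-ordered from Amazon.com, Amazon.co.uk, Amazon.fr, Amazon.de and many other online stores. It is expected to be published on May 15, 2009, with a list price of US$ 39.99. At the time of writing, Am... ]]></description>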
			<content:encoded><![CDATA[<p>Regular Expression Cookbook, the book that the regex guru (RegexGuru) has been writing, can now be pre-ordered from <a ref="nofollow" href="http://www.amazon.com/gp/product/0596520689?ie=UTF8&amp;tag=jgsbookselection&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0596520689">Amazon.com</a>, <a ref="nofollow" href="http://www.amazon.co.uk/gp/product/0596520689?ie=UTF8&amp;tag=jgsbookselect&amp;linkCode=as2&amp;camp=1634&amp;creative=19450&amp;creativeASIN=0596520689">Amazon.co.uk</a>, <a ref="nofollow" href="http://www.amazon.fr/gp/product/0596520689?ie=UTF8&amp;tag=regularexpres-21&amp;linkCode=as2&amp;camp=1642&amp;creative=19458&amp;creativeASIN=0596520689">Amazon.fr</a>, <a ref="nofollow" href="http://www.amazon.de/gp/product/0596520689?ie=UTF8&amp;tag=regularexpr0a-21&amp;linkCode=as2&amp;camp=1638&amp;creative=19454&amp;creativeASIN=0596520689">Amazon.de</a> and many other online stores. It is expected to be published on May 15, 2009, with a list price of US$ 39.99. At the time of writing, Amazon.com offers a 34% discount and Amazon.co.uk a 10% discount.</p>
<p>The deadline for the book is January 31. The authors will submit their final corrections before that date and will set up a new website for the book at <a ref="nofollow" target="_blank" href="http://www.regular-expression-cookbook.com/">http://www.regular-expression-cookbook.com/</a>.
</p>
<p>Source: RegexGuru: <a ref="nofollow" href="http://www.regexguru.com/2009/01/regular-expression-cookbook-available-for-pre-order/" target="_blank">Regular Expression Cookbook Available for Pre-Order</a></p>
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/regular-expression-cookbook-available-for-pre-order.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>[译]从文本中析取有效URL链接</title>
		<link>http://iregex.org/blog/translate-detecting-urls-in-a-block-of-text.html</link>
		<comments>http://iregex.org/blog/translate-detecting-urls-in-a-block-of-text.html#comments</comments>
		<pubDate>Fri, 07 Nov 2008 08:11:42 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[教程]]></category>
		<category><![CDATA[translation]]></category>
		<category><![CDATA[url]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=36</guid>
		<description><![CDATA[Original author: Jan Goyvaerts (Regex Guru); original post: Detecting URLs in a Block of Text. Translator: rex; translator's blog (http://iregex.org). rex's note: URL is short for Uniform Resource Locator (wiki), known in Chinese as 统一资源定位符 (Baidu Baike)... ]]></description>
			<content:encoded><![CDATA[<h5>Original author: Jan Goyvaerts (<a href="http://www.regex-guru.info">Regex Guru</a>); original post: <a href="http://www.regex-guru.info/2008/11/detecting-urls-in-a-block-of-text/">Detecting URLs in a Block of Text</a>.<br />
Translator: rex; translator's blog: <a href="http://iregex.org">http://iregex.org</a>.</h5>
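<p>rex's note: URL is short for Uniform Resource Locator (<a target="_blank" href="http://en.wikipedia.org/wiki/URL">wiki</a>), known in Chinese as <strong>统一资源定位符</strong> (<a target="_blank" href="http://baike.baidu.com/view/1496.htm">Baidu Baike</a>). In brief: every web page on the Internet has a unique name that identifies it, usually called its URL address. Such an address may point to a local disk or to a computer on a local network, but most often it points to a site on the Internet. Put simply, a URL is a Web address, what Chinese speakers commonly call a &#8220;网址&#8221;.</p>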
<p><span id="more-36"></span></p>
<p>In his blog post <a href="http://www.codinghorror.com/blog/archives/001181.html">The Problem with URLs</a>, Jeff Atwood points out some of the issues with trying to detect URLs in a larger body of text using a regular expression.</p>
<p>Jeff Atwood在他的博客文章<a target="_blank" href="http://www.codinghorror.com/blog/archives/001181.html">URL难题</a>中指出了使用正则表达式在大段文本中检测URL时所遇到的一些问题。</p>
<p>The short answer is that it <b>can&#8217;t be done</b>. Pretty much <b>any character is valid in URLs</b>. The very simplistic \bhttp://\S+ not only fails to differentiate between punctuation that&#8217;s part of the URL, and punctuation used to quote the URL. It also fails to match URLs with spaces in them. Yes, spaces are valid in URLs, and I&#8217;ve encountered quite a few web sites that use them over the years. It also forgets other protocols, such as https.</p>
<p>简言之，答案是<strong>做不到</strong>。在多数情况下，<strong>任何字符在URL中都是合法字符</strong>。这条过分简单化的表达式<tt class="regex">\bhttp://\S+</tt>之所以失败，不单因为它无法区分作为URL一部分的标点符号与引用URL的标点符号，还在于它对URL中包含空格的情况也是无能为力。是的，空格在URL中也是合法的，这几年我颇遇到一些网址中包含空格的情况。本条正则式还忽略了其它的网络协议，例如https。</p>
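<p>A minimal Python sketch of the failure modes described above. The sample sentence and the variable names are made up for illustration; only the pattern <tt class="regex">\bhttp://\S+</tt> comes from the article.</p>

```python
import re

# The simplistic pattern quoted in the article.
naive = re.compile(r"\bhttp://\S+")

# Hypothetical sample text: two URLs followed by sentence punctuation,
# plus one https URL.
text = ("See http://example.com/page, or (http://example.com/other). "
        "Also https://example.com/secure")

matches = naive.findall(text)
for url in matches:
    print(url)
# Prints:
#   http://example.com/page,
#   http://example.com/other).
# The trailing punctuation is swallowed, and the https URL is missed.
```

<p>The greedy <tt class="regex">\S+</tt> has no way to tell punctuation that belongs to the URL from punctuation that belongs to the sentence, which is the problem the later regexes try to reduce.</p>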
<p>In <a href="http://www.regexbuddy.com/library.html">RegexBuddy&#8217;s library</a>, you&#8217;ll find this regex if you look up &#8220;URL: Find in full text&#8221;:</p>
<p>在<a href="http://www.regexbuddy.com/library.html">RegexBuddy&#8217;s library</a>(RegexBuddy正则库)中，如果你查找&#8220;URL: Find in full text&#8221;，就会找到这条正则式：</p>
<p><tt class="regex">\b(https?|ftp|file)://[-A-Z0-9+&amp;@#/%?=~_|!:,.;]*[A-Z0-9+&amp;@#/%=~_|]</tt> (case insensitive大小写不敏感)</p>
<p>Like every other regex for extracting URLs, it&#8217;s <b>not perfect</b>. The key benefit of this regex is that it uses a separate character class for the last character in the URL, which allows less punctuation characters than the character class for the other characters in the URL. It excludes punctuation that is unlikely to occur at the end of the URL, and more likely to be punctuation that&#8217;s part of the sentence the URL is quoted in. It does not allow parentheses at all.</p>
<p>与其它试图析取URL的正则式一样，它<strong>并不完美</strong>。但是这条正则式的主要优势是，它为URL的最后一个字符使用单独的字符类，这就使结尾处所允许出现的标点字符种类少于URL的其它部分。它排除了不太可能出现在URL结尾、而更可能属于URL所在句子的标点。它根本不允许出现括号。</p>
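<p>As a rough illustration, the regex above can be tried in Python with the case-insensitive flag. The sample text is an assumption made up for this sketch; <tt>finditer</tt> is used because the pattern contains a capturing group, which would make <tt>findall</tt> return only the protocol.</p>

```python
import re

# The "URL: Find in full text" regex from the article, case insensitive.
url_re = re.compile(
    r"\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|]",
    re.IGNORECASE,
)

# Hypothetical sample text for illustration.
text = ("Visit http://example.com/a?b=1, then https://example.org/path. "
        "See also http://en.wikipedia.org/wiki/Regex_(disambiguation)")

found = [m.group(0) for m in url_re.finditer(text)]
for url in found:
    print(url)
# Prints:
#   http://example.com/a?b=1
#   https://example.org/path
#   http://en.wikipedia.org/wiki/Regex_
```

<p>The separate final character class drops the trailing comma and full stop, but because parentheses are not allowed at all, the Wikipedia-style URL is cut off before its opening parenthesis.</p>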
<p>In <a href="http://www.editpadpro.com/cscs.html">EditPad Pro&#8217;s syntax coloring schemes</a>, which are fully editable and entirely based on regular expressions, you&#8217;ll often find this regex:</p>
<p><a target="_blank" href="http://www.editpadpro.com/cscs.html">EditPad Pro的语法色彩主题</a>是可自定义的，完全基于正则表达式的。在其中，你会经常发现这条正则式：</p>
<p><tt class="regex">\b(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&amp;@#/%=~_|$?!:,.]*[A-Z0-9+&amp;@#/%=~_|$]</tt></p>
<p>(case insensitive大小写不敏感)</p>
<p>The main difference with the previous regex is that this one matches URLs such as www.regex-guru.info <b>without the http:// protocol</b>. People often type URLs that way in their documents and messages, because most browsers accept them that way too.</p>
<p>本条正则式与上一条的主要区别是，它还能匹配<tt class="string">www.regex-guru.info</tt>这类<strong>不带<tt class="string">http://</tt>协议前缀</strong>的URL。人们经常在文件或消息中以这种方式输入网址，因为大多数浏览器也接受这种写法。</p>
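<p>A short Python sketch of the same idea, assuming a made-up sample sentence. It only shows that the <tt class="regex">www\.</tt> and <tt class="regex">ftp\.</tt> alternatives let the regex pick up URLs typed without a protocol.</p>

```python
import re

# The EditPad Pro syntax coloring regex from the article, case insensitive.
url_re = re.compile(
    r"\b(?:(?:https?|ftp|file)://|www\.|ftp\.)"
    r"[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]",
    re.IGNORECASE,
)

# Hypothetical sample text for illustration.
text = ("Docs live at www.regex-guru.info, mirrors at ftp.example.com "
        "and http://example.com/x.")

# The pattern has no capturing groups, so findall returns whole matches.
found = url_re.findall(text)
print(found)
# ['www.regex-guru.info', 'ftp.example.com', 'http://example.com/x']
```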
<p>EditPad&#8217;s built-in &#8220;clickable URLs&#8221; syntax highlighting uses this regex:<br />
  <br />EditPad的内置&#8220;可点击的URL&#8221;语法高亮，是由这条正则式实现的：</p>
<p><tt class="regex">\b(?:(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&amp;@#/%?=~_|$!:,.;]*[-A-Z0-9+&amp;@#/%=~_|$]</tt></p>
<p><tt class="regex">&#160;&#160; | ((?:mailto:)?[A-Z0-9._%+-]+@[A-Z0-9._%-]+\.[A-Z]{2,4})\b)</tt></p>
<p><tt class="regex">|&quot;(?:(?:https?|ftp|file)://|www\.|ftp\.)[^&quot;\r\n]+&quot;?</tt></p>
<p><tt class="regex">|'(?:(?:https?|ftp|file)://|www\.|ftp\.)[^'\r\n]+'?</tt> (free-spacing, case insensitive空格宽松模式，大小写不敏感)</p>
<p>This long regex adds three alternatives to the previous regex. It adds the ability to match <b>email addresses</b>, with or without mailto:, and it matches <b>URLs between single or double quotes</b>. When the URL is quoted, it allows all characters in the URL, except line breaks and the delimiting quote. This way, any URL with weird punctuation can be highlighted correctly by placing it between a pair of quote characters. Because this regex is used to highlight text as you type, the closing quotes are optional. The highlighting will run until the end of the line until you type the closing quote. Remove the question marks after the quote characters if you will use this regex to extract URLs.</p>
<p>这条长长的正则式在前一条的基础上增加了三种备选匹配项。现在它可以匹配<strong>电邮地址</strong>，有无mailto:均可；还可以匹配<strong>单引号或双引号之间的URL</strong>。当URL在引号内时，它允许出现除换行符和起定界作用的引号之外的任意字符。这样一来，任何带有怪异标点的URL，只要放在一对引号之间，就能正确地高亮显示。由于此正则式是在你输入的同时高亮文本，因此结尾的引号是可选的。在你输入结尾引号之前，高亮会一直延续到本行结尾。如果你要使用本正则式析取URL，请删除引号后面的问号。</p>
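<p>The following Python sketch tries the extraction variant of this regex, that is, with the question marks after the closing quotes removed as the paragraph above suggests, and with plain ASCII quote characters as delimiters. Python's <tt>re.VERBOSE</tt> flag stands in for free-spacing mode; the sample text is an assumption made up for illustration.</p>

```python
import re

# Extraction variant of the "clickable URLs" regex: closing quotes required.
# In re.VERBOSE mode whitespace outside character classes is ignored, and
# '#' inside a character class is still a literal character.
clickable_re = re.compile(r"""
    \b(?:(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%?=~_|$!:,.;]*[-A-Z0-9+&@#/%=~_|$]
       | ((?:mailto:)?[A-Z0-9._%+-]+@[A-Z0-9._%-]+\.[A-Z]{2,4})\b)
    | "(?:(?:https?|ftp|file)://|www\.|ftp\.)[^"\r\n]+"
    | '(?:(?:https?|ftp|file)://|www\.|ftp\.)[^'\r\n]+'
""", re.IGNORECASE | re.VERBOSE)

# Hypothetical sample text: an email address, a quoted URL containing a
# space and parentheses, and a bare www URL ending a sentence.
text = ('Mail joe@example.com or see "http://example.com/a b(c)" '
        'and www.example.org.')

found = [m.group(0) for m in clickable_re.finditer(text)]
for item in found:
    print(item)
# Prints:
#   joe@example.com
#   "http://example.com/a b(c)"
#   www.example.org
```

<p>Note that a quoted match includes its delimiting quotes, so an extraction script would probably strip them afterwards.</p>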
<p>So how about Jeff&#8217;s problem?<br />
  <br />我们再来看一下Jeff的问题：</p>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<p>I couldn&#8217;t come up with a way for the regex alone to distinguish between URLs that legitimately end in parens (ala Wikipedia), and URLs that the user has enclosed in parens.<br />
    <br />我实在想不出，怎样仅仅使用正则式，就能正确地区分确实以括号结尾的URL（例如Wikipedia的链接）和被用户用括号括起来的URL。</p>
</blockquote>
<p>That&#8217;s not too hard, if we add the restriction that we only allow unnested pairs of parentheses in URLs. Using the second regex in this article as the starting point, <b>add an alternative for a pair of parentheses to both character classes</b> in that regex:</p>
<p>如果限定URL中只允许出现不嵌套的成对括号，那么该问题不难解决。以本文中第二条正则式为起点，<strong>为正则式中的两个字符类各加上一个匹配一对括号的备选分支</strong>：</p>
<p><tt class="regex">\b(?:(?:https?|ftp|file)://|www\.|ftp\.) </tt><br />
    <br /><tt class="regex">&#160; (?:\([-A-Z0-9+&amp;@#/%=~_|$?!:,.]*\)|[-A-Z0-9+&amp;@#/%=~_|$?!:,.])* </tt></p>
<p><tt class="regex">&#160; (?:\([-A-Z0-9+&amp;@#/%=~_|$?!:,.]*\)|[A-Z0-9+&amp;@#/%=~_|$])</tt> (free-spacing, case insensitive空格宽松模式，大小写不敏感)</p>
<p>This regex allows the same set of characters in the middle of the URL, mixed with zero or more sequences of those characters between parentheses. It allows the URL to end with the same reduced set of characters, or a final run between parentheses. Because we require the opening parenthesis to be in the URL, we don&#8217;t have to do anything complicated to check if any closing parentheses we encounter are part of the URL or not.<br />
  <br />这条正则式允许URL的中间部分使用同一字符集，其间可以夹杂零个或多个由这些字符组成、置于括号之间的序列。它允许URL以同样精简过的字符集结尾，或者以最后一段括号内的内容结尾。由于我们要求开括号必须出现在URL之内，所以就不必再做任何复杂工作来验证之后所遇到的闭括号是不是URL的一部分了。</p>
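<p>A small Python sketch of the parenthesis-aware regex, again using <tt>re.VERBOSE</tt> as a stand-in for free-spacing mode. The sample sentence is made up for illustration: it contains one URL that legitimately ends in parentheses and one that the writer has merely wrapped in parentheses.</p>

```python
import re

# The regex from the article that allows unnested pairs of parentheses.
paren_re = re.compile(r"""
    \b(?:(?:https?|ftp|file)://|www\.|ftp\.)
      (?:\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#/%=~_|$?!:,.])*
      (?:\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|[A-Z0-9+&@#/%=~_|$])
""", re.IGNORECASE | re.VERBOSE)

# Hypothetical sample text for illustration.
text = ("See http://en.wikipedia.org/wiki/PC_Tools_(Central_Point_Software) "
        "(or http://example.com/page).")

# Only non-capturing groups are used, so findall returns whole matches.
found = paren_re.findall(text)
for url in found:
    print(url)
# Prints:
#   http://en.wikipedia.org/wiki/PC_Tools_(Central_Point_Software)
#   http://example.com/page
```

<p>The closing parenthesis after <tt class="string">http://example.com/page</tt> is left out because no opening parenthesis appeared inside that URL.</p>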
<p>It&#8217;s important that you observe that in order to allow any number of pairs of parentheses in the middle of the regex, I <b>moved the star</b> from the character class to the group it is now in. I did <b>not add another star</b> to the group. A double-star combination like <tt class="regex">(a|b*)*</tt> is a sure-fire recipe for <a href="http://www.regular-expressions.info/catastrophic.html">catastrophic backtracking</a>.</p>
<p>请务必注意，为了允许URL中间出现任意多对括号，我将<strong>星号从字符类移到了它现在所在的分组上</strong>。我<strong>并没有</strong>给该分组<strong>再另加一个星号</strong>。像<tt class="regex">(a|b*)*</tt>这样的双星号组合，必然导致<a target="_blank" href="http://www.regular-expressions.info/catastrophic.html">灾难性回溯</a>。</p>
<p></p>
<p>All the regexes in this article will be included in RegexBuddy&#8217;s library with the next free minor update. Current version is 3.2.0.<br />
  <br />本文提及的所有正则式都将在RegexBuddy下一次免费小版本更新中添加到正则库中。当前RegexBuddy的版本是3.2.0。</p>
<p>rex's note: in RegexBuddy, press Alt+7 to bring up the built-in regex library. Type URL and press Enter; the results look like this:</p>
<p>&#160; <br /><a title="我爱正则表达式" target="_blank" href="http://iregex.org"><img style="border-bottom: rgb(255,255,255) 1px solid; border-left: rgb(255,255,255) 1px solid; margin: 0px 10px 10px; padding-left: 0px; clear: both; border-top: rgb(255,255,255) 1px solid; border-right: rgb(255,255,255) 1px solid" src="http://i37.tinypic.com/20sj0j7.jpg" /></a> </p>
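<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Translator's afterword:</h3>
<p>Exploring on my own, I walk with excitement; with someone leading the way, I walk with steady steps.</p>
<p>Zhuafan (抓饭) originally used regular expressions to extract Fanfou messages, but all of that text followed a well-defined format: XML. Illegal characters were of course unavoidable, and they cause errors when the content is displayed as XML. Embedded in HTML, however, the same content displays without trouble.</p>
<p>Fanfou, Jiwai and Zuosa can all parse the URLs in posted messages perfectly and then take the liberty of converting them into their own format. For example, Fanfou converts an entered <a href="http://regex.me">http://regex.me</a> into <a title="http://fanfou.com/linkto/aHR0cDovL3JlZ2V4Lm1l" href="http://fanfou.com/linkto/aHR0cDovL3JlZ2V4Lm1l">http://fanfou.com/linkto/aHR0cDovL3JlZ2V4Lm1l</a>, and adding a Chinese character or two right after <a href="http://regex.me">http://regex.me</a> does not affect the conversion. Twitter is not as capable. Entering &#8220;<a href="http://regex.me">http://regex.me</a>正则表达式交流论坛&#8221; on Twitter produces <a href="http://tinyurl.com/5azr6y">http://tinyurl.com/5azr6y</a>; click to expand it and the URL becomes <a title="http://www.regex.xn--me-y82c39klqi9nmf5umndl14f76g8ol/" href="http://www.regex.me">http://www.regex.me正则表达式交流论坛/</a>, which is clearly a wrong URL. The cause is that Twitter did not correctly extract the URL from the text.</p>
<p><a target="_blank" href="http://fanfou.com/statuses/xWerhzyROOU">撕烤者</a> once said: &#8220;One advantage of Chinese is that we can easily tell it apart from code.&#8221; I quite agree. The difference between Chinese and English is not only obvious to the eye; a program can tell them apart effortlessly as well. The distinct character ranges make it easier to parse the text correctly and find where a URL ends. Unfortunately, Twitter handles this very poorly.</p>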
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/translate-detecting-urls-in-a-block-of-text.html/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
