<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>我爱正则表达式 &#187; 翻译</title>
	<atom:link href="http://iregex.org/blog/category/%e7%bf%bb%e8%af%91/feed" rel="self" type="application/rss+xml" />
	<link>http://iregex.org</link>
	<description>原创、翻译、转载关于正则表达式的文章</description>
	<lastBuildDate>Fri, 30 Mar 2012 12:50:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/><atom:link rel="hub" href="http://www.feedsky.com/api/RPC2"/><atom:link rel="hub" href="http://blogsearch.google.com/ping/RPC2"/><atom:link rel="hub" href="http://blog.yodao.com/ping/RPC2"/><atom:link rel="hub" href="http://www.feedsky.com/api/RPC2"/><atom:link rel="hub" href="http://www.xianguo.com/xmlrpc/ping.php"/><atom:link rel="hub" href="http://www.zhuaxia.com/rpc/server.php"/><atom:link rel="hub" href="http://rpc.technorati.com/rpc/ping"/><atom:link rel="hub" href="http://rpc.pingomatic.com/"/>		<item>
		<title>PHP中的递归正则</title>
		<link>http://iregex.org/blog/recursive-regex-in-php.html</link>
		<comments>http://iregex.org/blog/recursive-regex-in-php.html#comments</comments>
		<pubDate>Mon, 10 May 2010 04:57:28 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[教程]]></category>
		<category><![CDATA[翻译]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[recursive]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=84</guid>
		<description><![CDATA[之前一篇文章翻译了Perl语言中的递归正则表达式. 其实不少语言中的正则都是支持递归的, 例如本文要介绍的PHP正则递归. 虽然, 工作中最常用的正则表达式都很&#8221;正则&#8221;, 只用最基本的语... ]]></description>
			<content:encoded><![CDATA[<p>之前一篇文章翻译了<a title="我爱正则表达式|Perl语言中的递归正则表达式" target="_blank" href="http://iregex.org/blog/recursive-regular-expressions.html">Perl语言中的递归正则表达式</a>. 其实不少语言中的正则都是支持递归的, 例如本文要介绍的PHP正则递归. 虽然, 工作中最常用的正则表达式都很&#8221;正则&#8221;, 只用最基本的语法就能解决85%以上的问题, 而且合理有效地使用普通正则来解决复杂问题也是一门技巧与学问; 但是高级一点的语法的确有它存的价值, 有时不用它还真办不了事儿; 况且学习正则的乐趣也在于<font color="#ff008c">尝试各种各样的可能性, 满足自己无穷无尽的好奇心</font>.</p>
<p><a href="http://iregex.org/blog/recursive-regex-in-php.html" title="我爱正则表达式 | PHP中的递归正则">本文</a>内容, 整理自网文<A HREF="http://www.skdevelopment.com/php-regular-expressions.php">Finer points of PHP regular expressions</A>. 其分析过程剥茧抽丝, 丝丝入扣, 值得一读. 该文系统地列出了PHP中正则表达式常见特性, 我只摘取其中递归部分翻译整理出来. </p>
<p></a><span id="more-84"></span></p>
<h2 style="background-color: rgb(153, 204, 0); line-height: 35px; border: 1px solid rgb(102, 102, 102); color: rgb(0, 0, 0); text-indent: 6px; font-size: 21px;">正文</h2>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">例子</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<p>什么时候会用到递归正则表达式呢? 当然是待匹配的字串中递归地出现某种模式时(貌似废话). 最经典的例子, 就是递归正则处理嵌套括号的问题了. 例子如下.  </p>
<p>假设你的文本中包含了正确配对的嵌套括号. 括号的深度可以是无限层. 你想捕获这样的括号组.</p>
<p>恕我剧透, 标准答案是这样的:</p>
<div class="codecolorer-container php mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br /></div></td><td><div class="php codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #000000; font-weight: bold;">&lt;?php</span><br />
<span style="color: #000088;">$string</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;some text (a(b(c)d)e) more text&quot;</span><span style="color: #339933;">;</span><br />
<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">preg_match</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;/\(([^()]+|(?R))*\)/&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$string</span><span style="color: #339933;">,</span><span style="color: #000088;">$matches</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><br />
<span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; <span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">&quot;&lt;pre&gt;&quot;</span><span style="color: #339933;">;</span> <span style="color: #990000;">print_r</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$matches</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">&quot;&lt;/pre&gt;&quot;</span><span style="color: #339933;">;</span><br />
<span style="color: #009900;">&#125;</span><br />
<span style="color: #000000; font-weight: bold;">?&gt;</span></div></td></tr></tbody></table></div>
<p>其输出结果是:</p>
<div class="codecolorer-container php mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br /></div></td><td><div class="php codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #990000;">Array</span><br />
<span style="color: #009900;">&#40;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=&gt;</span> <span style="color: #009900;">&#40;</span>a<span style="color: #009900;">&#40;</span>b<span style="color: #009900;">&#40;</span>c<span style="color: #009900;">&#41;</span>d<span style="color: #009900;">&#41;</span>e<span style="color: #009900;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=&gt;</span> e &nbsp; &nbsp;<br />
<span style="color: #009900;">&#41;</span></div></td></tr></tbody></table></div>
<p>可见, 我们所需要的文本, 已经捕获到<code class="codecolorer php default"><span class="php"><span style="color: #000088;">$matches</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span></span></code>中了.</p>
</blockquote>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">原理</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<p>现在思考原理.</p>
<p>上面的正则表达式中的关键点是<code class="codecolorer php default"><span class="php"><span style="color: #009900;">&#40;</span>?R<span style="color: #009900;">&#41;</span></span></code>. <code class="codecolorer php default"><span class="php"><span style="color: #009900;">&#40;</span>?R<span style="color: #009900;">&#41;</span></span></code>的作用就是递归地替换它所在的整条正则表达式. 在每次迭代时, PHP 语法分析器都会将<code class="codecolorer php default"><span class="php"><span style="color: #009900;">&#40;</span>?R<span style="color: #009900;">&#41;</span></span></code>替换为&#8221;<code class="codecolorer php default"><span class="php">\<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#91;</span>^<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">+|</span><span style="color: #009900;">&#40;</span>?R<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">*</span>\<span style="color: #009900;">&#41;</span></span></code>&#8220;.</p>
<p>因此, 具体到上述的例子, 其正则表达式等价于:</p>
<p>            <code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;/\(([^()]+|\(([^()]+|\(([^()]+)*\))*\))*\)/&quot;</span></span></code></p>
<p>但是上面的代码只适合深度为3层的括号. 对于未知深度的括号嵌套, 就只好使用这种正则了:</p>
<p>            <code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;/\(([^()]+|(?R))*\)/&quot;</span></span></code> </p>
<p>它不但能够匹配无限深度, 还简化了正则表达式的语法. 功能强大, 语法简洁.</p>
<p>现在来细看一下<code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;/\(([^()]+|(?R))*\)/&quot;</span></span></code>是怎样匹配<code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;(a(b(c)d)e)&quot;</span></span></code>的:</p>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<ol>
<li><code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;(c)&quot;</span></span></code>这部分被正则式 <code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;\(([^()]+)*\)&quot;</span></span></code> 匹配. 请注意, <code class="codecolorer php default"><span class="php"><span style="color: #009900;">&#40;</span>c<span style="color: #009900;">&#41;</span></span></code> 其实就相当于整个递归的一个缩影, 麻雀虽小五脏俱全, 因此它用到了整个正则表达式.<br />换言之, 下一步中的<code class="codecolorer php default"><span class="php"><span style="color: #009900;">&#40;</span>c<span style="color: #009900;">&#41;</span></span></code>, 可以使用<code class="codecolorer php default"><span class="php"><span style="color: #009900;">&#40;</span>?R<span style="color: #009900;">&#41;</span></span></code> 来匹配.</li>
<li><code class="codecolorer php default"><span class="php"><span style="color: #009900;">&#40;</span>b<span style="color: #009900;">&#40;</span>c<span style="color: #009900;">&#41;</span>d<span style="color: #009900;">&#41;</span></span></code>的匹配过程为:
<ol>
<li><code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;\(&quot;</span></span></code>匹配<code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;(&quot;</span></span></code>;</li>
<li><code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;[^()]+&quot;</span></span></code>匹配<code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;b&quot;</span></span></code>;</li>
<li><code class="codecolorer php default"><span class="php">&nbsp;<span style="color: #009900;">&#40;</span>?R<span style="color: #009900;">&#41;</span></span></code>匹配<code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;(c)&quot;</span></span></code>;</li>
<li><code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;[^()]+&quot;</span></span></code>匹配<code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;d&quot;</span></span></code>;</li>
<li><code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;\)&quot;</span></span></code>匹配<code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">&quot;)&quot;</span></span></code>.</li>
</ol>
</li>
</ol>
<p>根据上面的匹配原理, 不难理解为什么数组的第2个元素<code class="codecolorer php default"><span class="php"><span style="color: #000088;">$matches</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span></span></code>与<code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">'e'</span></span></code>等价. 子串<code class="codecolorer php default"><span class="php"><span style="color: #0000ff;">'e'</span></span></code>是在最后一次匹配迭代中被捕获. 匹配过程中, <font color="#ff008c">只有最后一次的捕获结果才会保存到数组中</font>.</p>
<blockquote  style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>rex注: 关于这个特性, 可以自行尝试一下, 看看使用正则式<code class="codecolorer php default"><span class="php"><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#91;</span>a<span style="color: #339933;">-</span>z<span style="color: #009900;">&#93;</span><span style="color: #339933;">+</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">-</span><span style="color: #cc66cc;">9</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">+</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">+</span></span></code>来匹配字串<code class="codecolorer php default"><span class="php">abc123xyz890</span></code>, 其捕获结果<code class="codecolorer php default"><span class="php">$<span style="color:#800080;">1</span></span></code>是什么. 注意, 其结果与 Left Longest 原理并不冲突.</p></blockquote>
</blockquote>
<p>         如果我们只需要捕获 <code class="codecolorer php default"><span class="php"><span style="color: #000088;">$matches</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span></span></code>, 可以这样做:</p>
<div class="codecolorer-container php mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br /></div></td><td><div class="php codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #000000; font-weight: bold;">&lt;?php</span><br />
&nbsp; &nbsp; <span style="color: #000088;">$string</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;some text (a(b(c)d)e) more text&quot;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">preg_match</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;/\((?:[^()]+|(?R))*\)/&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$string</span><span style="color: #339933;">,</span><span style="color: #000088;">$matches</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">&quot;&lt;pre&gt;&quot;</span><span style="color: #339933;">;</span> <span style="color: #990000;">print_r</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$matches</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">&quot;&lt;/pre&gt;&quot;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#125;</span><br />
<span style="color: #000000; font-weight: bold;">?&gt;</span></div></td></tr></tbody></table></div>
<p>产生的结果相同:</p>
<div class="codecolorer-container php mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br /></div></td><td><div class="php codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #990000;">Array</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#40;</span><br />
&nbsp; &nbsp; &nbsp;<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=&gt;</span> <span style="color: #009900;">&#40;</span>a<span style="color: #009900;">&#40;</span>b<span style="color: #009900;">&#40;</span>c<span style="color: #009900;">&#41;</span>d<span style="color: #009900;">&#41;</span>e<span style="color: #009900;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#41;</span></div></td></tr></tbody></table></div>
<p>所做的改动是捕获括号<code class="codecolorer php default"><span class="php"><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span></span></code>改为非捕获捕获括号<code class="codecolorer php default"><span class="php"><span style="color: #009900;">&#40;</span>?<span style="color: #339933;">:</span><span style="color: #009900;">&#41;</span></span></code>了.</p>
<p>还可以进一步完善为:</p>
<div class="codecolorer-container php mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br /></div></td><td><div class="php codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #000000; font-weight: bold;">&lt;?php</span><br />
&nbsp; &nbsp; <span style="color: #000088;">$string</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;some text (a(b(c)d)e) more text&quot;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">preg_match</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;/\((?&gt;[^()]+|(?R))*\)/&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$string</span><span style="color: #339933;">,</span><span style="color: #000088;">$matches</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">&quot;&lt;pre&gt;&quot;</span><span style="color: #339933;">;</span> <span style="color: #990000;">print_r</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$matches</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">&quot;&lt;/pre&gt;&quot;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#125;</span><br />
<span style="color: #000000; font-weight: bold;">?&gt;</span></div></td></tr></tbody></table></div>
<p>这里我们用到了所谓的一次性模式(rex注: 余晟先生译的《精通正则表达式v3.0》中, 谓之&#8221;固化分组&#8221;. 可参考该书.) PHP手册也推荐只要条件允许, 就尽可能使用这种模式, 以便提升正则表达式的速度.</p>
<p>一次性模式很简单, 这里不再详述. 如果感兴趣, 可以参考PHP 官方手册. 如果您想深入学习PERL兼容式正则表达式, 请参考文末链接.<br />
</blockquote>
</blockquote>
<h2  style="background-color: rgb(153, 204, 0); line-height: 35px; border: 1px solid rgb(102, 102, 102); color: rgb(0, 0, 0); text-indent: 6px; font-size: 21px;">提到的链接</h2>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<ul>
<li>原文: <a href="http://www.skdevelopment.com/php-regular-expressions.php">Finer points of PHP regular expressions</a></li>
<li>Perl兼容正则表达式 <a href="http://www.pcre.org">官网</a> <a href="http://pcre.org/pcre.txt">文档</a></li>
<li><a href="http://us3.php.net/manual/en/reference.pcre.pattern.syntax.php">PHP官网的PCRE正则文档</a></li>
</ul>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/recursive-regex-in-php.html/feed</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>[译]递归正则表达式</title>
		<link>http://iregex.org/blog/recursive-regular-expressions.html</link>
		<comments>http://iregex.org/blog/recursive-regular-expressions.html#comments</comments>
		<pubDate>Wed, 16 Dec 2009 15:06:47 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[翻译]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[recursive]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=73</guid>
		<description><![CDATA[原文在此。rex译于2009年12月15～17日，翻译过程中使用的是google docs@prism@firefox@ubuntu 9.10，很爽的体验。感谢余晟老师在正则和翻译方面的悉心指导。 平时我们用到的正则表达式，其实没那么“... ]]></description>
			<content:encoded><![CDATA[<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>原文<a title="在此" href="http://www.catonmat.net/blog/recursive-regular-expressions/" id="eklo">在此</a>。<a href="http://iregex.org">rex</a>译于2009年12月15～17日，翻译过程中使用的是google docs@prism@firefox@ubuntu 9.10，很爽的体验。感谢<a href="http://www.luanxiang.org/blog/">余晟</a>老师在正则和翻译方面的悉心指导。</p></blockquote>
<p><a href="http://iregex.org/blog/recursive-regular-expressions.html" target="_blank"><img src="http://i293.photobucket.com/albums/mm60/zhasm/yo-dawg-regex.jpg" alt="递归正则表达式" border="0"></a></p>
<p>平时我们用到的正则表达式，其实没那么“规则”。多数编程语言所支持的扩展的正则表达式，其运算能力比起<a title="形式语言理论" href="http://en.wikipedia.org/wiki/Formal_language" id="vcy6">形式语言理论</a>所定义的“<a title="规则" href="http://en.wikipedia.org/wiki/Regular_expression" id="gkvv">规则</a>”正则表达式要强得多。</p>
<p><span id="more-73"></span></p>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>rex注：在这里其实可以看到，如果将<b>正则表达式</b>译为<b>正规表达式</b>，一切就都通顺了：<br />
平时我们用到的正规表达式，其实没那么“正规”。多数编程语言所支持的扩展的正规表达式，其运算能力比起正式语言理论所定义的“正规”正规表达式要强得多。</p>
<p>regular expression的日文是“正规表现”，在鸟哥书中，好像也将其称为正规表达式。<a title="via" href="http://fanfou.alwaysdata.net/status&amp;id=5144919240" id="bg:v">via</a></p>
</blockquote>
<p>例如，经常用到的<a title="捕获缓存" href="http://perldoc.perl.org/perlre.html#Capture-buffers" id="emj9">捕获缓存</a>，就是用来帮助我们临时存储任意正则表达式模式，以便重复使用。 又如，“<a title="环视断言" href="http://perldoc.perl.org/perlre.html#Look-Around-Assertions" id="xsz-">环视断言</a>”能让正则表达式引擎在做决定之前先偷偷看看环视一下。这些扩展让正则表达式非常强大，足以描述一些“<a title="上下文无关语法" href="http://en.wikipedia.org/wiki/Context-free_grammar" id="acrs">上下文无关语法</a>”。</p>
<p>Perl语言的正则表达式引擎特性异常丰富，其特征之一是<strong>懒惰正则子表达式</strong>（Lazy regular subexpressions），格式为(??{code})，其中的“code”可以是任意一段perl程序，该子表达式可能匹配时，这段程序就会执行。</p>
<p>我们可以利用这一特征来编写出非常有趣的东西，即将正则表达式自身嵌在它的“code”部分，由此生成<b>递归的正则表达式</b>(a recursive regular expression)！</p>
<p>一直以来，正则表达式无法匹配0<sup>n</sup>1<sup>n</sup>这种表达式，也就是由若干个0以及同等数量的1所组成的字符串。如果使用懒惰正则子表达式，这一经典问题就迎刃而解。</p>
<p>下面是匹配0<sup>n</sup>1<sup>n</sup>字串的perl正则表达式代码。</p>
<div class="codecolorer-container perl mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="perl codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #0000ff;">$regex</span> <span style="color: #339933;">=</span> <span style="color: #009966; font-style: italic;">qr/0(??{$regex})?1/</span><span style="color: #339933;">;</span></div></td></tr></tbody></table></div>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>rex注：紫龙书第四章有云：“文法是比正则表达式表达能力更强的表示方法。每个可以使用正则表达式描述的构造，都可以使用文法来描述，但是反之不成立。换句话说，每个正则语言都是一个上下文无关语言，但是反之不成立。”书中交待的正则，也应该是指的常规正则表达式，而非现代语言中的扩展的正则表达式。一般使用正则表达式来构造小部件，而使用文法来组建语言框架。
</p></blockquote>
<p>此正则表达式匹配一个字符0，之后是正则表达式自身0或1次，之后是字符1。如果正则表达式自身部分不能匹配，那么它只能匹配01；如果自身部分可以匹配，则正则表达式匹配的是00($regex)?11，此时若不能匹配自身则结果是0011，若可以匹配就是000($regex)?111，……依次顺延。</p>
<p>下面是匹配0<sup>50000</sup>1<sup>50000</sup>的Perl程序：</p>
<div class="codecolorer-container perl mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br />8<br />9<br />10<br />11<br /></div></td><td><div class="perl codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #666666; font-style: italic;">#!/usr/bin/perl &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </span><br />
<br />
<span style="color: #0000ff;">$str</span> <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;0&quot;</span>x50000 <span style="color: #339933;">.</span> <span style="color: #ff0000;">&quot;1&quot;</span>x50000<span style="color: #339933;">;</span><br />
<span style="color: #0000ff;">$regex</span> <span style="color: #339933;">=</span> <span style="color: #009966; font-style: italic;">qr/0(??{$regex})*1/</span><span style="color: #339933;">;</span><br />
<br />
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #0000ff;">$str</span> <span style="color: #339933;">=~</span> <span style="color: #009966; font-style: italic;">/^$regex$/</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; <span style="color: #000066;">print</span> <span style="color: #ff0000;">&quot;yes, it matches&quot;</span><span style="color: #339933;">;</span><br />
<span style="color: #009900;">&#125;</span><br />
<span style="color: #b1b100;">else</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; <span style="color: #000066;">print</span> <span style="color: #ff0000;">&quot;no, it doesn't match&quot;</span><span style="color: #339933;">;</span><br />
<span style="color: #009900;">&#125;</span></div></td></tr></tbody></table></div>
<p>现在来看题图所示的Yo Dawg正则表达式。你先猜猜它的作用？正确答案是，它匹配(foo(bar())baz)这样完全嵌套的括号表达式（fully parenthesized expression）或((()()())())这样的平衡括号表达式（balanced parentheses）。</p>
<div class="codecolorer-container perl mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br />8<br />9<br /></div></td><td><div class="perl codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #0000ff;">$regex</span> <span style="color: #339933;">=</span> <span style="color: #000066;">qr</span><span style="color: #339933;">/</span><br />
&nbsp; \<span style="color: #009900;">&#40;</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #666666; font-style: italic;"># (1) match an open paren ( &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; #(1)，此处匹配开括号</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#40;</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #666666; font-style: italic;"># followed by &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;#(2)，紧接着是</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #009900;">&#91;</span><span style="color: #339933;">^</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">+</span> &nbsp; &nbsp; &nbsp; <span style="color: #666666; font-style: italic;"># &nbsp; (3) one or more non-paren character &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;#(3)，1个或多个非括号字符</span><br />
&nbsp; &nbsp; <span style="color: #339933;">|</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #666666; font-style: italic;"># OR &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;#(4)，或</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #009900;">&#40;</span><span style="color: #339933;">??</span><span style="color: #009900;">&#123;</span><span style="color: #0000ff;">$regex</span><span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span> &nbsp; <span style="color: #666666; font-style: italic;"># &nbsp; (5) the regex itself &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; #(5)，正则式自身</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#41;</span><span style="color: #339933;">*</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #666666; font-style: italic;"># (6) repeated zero or more times &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; #(6)，重复0或多次</span><br />
&nbsp; \<span style="color: #009900;">&#41;</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #666666; font-style: italic;"># (7) followed by a close paren ) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; #(7)，紧接着是闭括号</span><br />
<span style="color: #339933;">/</span>x<span style="color: #339933;">;</span></div></td></tr></tbody></table></div>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>rex注：关于Yo Dawg图片的含义，可以参考<a title="这里" href="http://yoyodawgdawg.com/about" id="az.x">这里</a>。基本上是全是“Yo dawg, I herd you like X, so we put a Y in your Y so you can Z while you Z”的结构的配图文字。</p></blockquote>
<p>构造此正则表达式的思路是这样的。对于完全嵌套的括号表达式，它的开始字符是一个开括号。这是最简单的一步，我们直接写出（上面程序中的(1)）。同理，它的结束字符是闭括号，于是得到(7)。现在该动脑筋了，括号中间是什么呢？对，它可以是既不是开括号又不是闭括号的任意字符（第(3)点），<b>也可以是</b>另一个完全嵌套的表达式（即第(5)点）。所有的这些，既可以只匹配0次（第(3)点），以便构造最小的完全嵌套括号表达式()，也可以匹配多次来匹配较复杂的表达式。</p>
<p>去掉 /x 选项（即，不再使用多行风格的注释模式），可以简记为：</p>
<div class="codecolorer-container perl mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="perl codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #0000ff;">$regex</span> <span style="color: #339933;">=</span> <span style="color: #009966; font-style: italic;">qr/\(([^()]+|(??{$regex}))*\)/</span><span style="color: #339933;">;</span></div></td></tr></tbody></table></div>
<p>但是切勿在正式产品中使用这一特性，它太诡异，不好把握。建议使用较稳定成熟的<a href="http://search.cpan.org/dist/Text-Balanced/">Text::Balanced</a> 或 <a href="http://search.cpan.org/dist/Regexp-Common/">Regexp::Common</a> 模块。</p>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>rex注：对于(??{code})，perl<a href="http://perldoc.perl.org/perlre.html#%28??{-code-}%29">官方的提示</a>是：此正则表达式仅作测试使用，可能有更新而不作提示。代码执行时产生的副作用，因版本而异，运行结果或有不同，取决于正则引擎的后期优化。</p></blockquote>
<p>最后提醒大家，在Perl 5.10中已经可以使用<a title="递归捕获缓存" href="http://perldoc.perl.org/perlre.html#%28?PARNO%29-%28?-PARNO%29-%28?+PARNO%29-%28?R%29-%28?0%29" id="vchb">递归捕获缓存</a>来替代懒惰代码子表达式了，运行结果相同。</p>
<p>下面是匹配0<sup>n</sup>1<sup>n</sup>的递归捕获缓存语法(?N)的实现：</p>
<div class="codecolorer-container perl mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="perl codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$rx</span> <span style="color: #339933;">=</span> <span style="color: #009966; font-style: italic;">qr/(0(?1)*1)/</span><span style="color: #339933;">;</span></div></td></tr></tbody></table></div>
<p>(?1)*的含义是“匹配第一组0或多次”，这里的第一组是指整个正则表达式。</p>
<p>请自行动手，重写平衡括号的正则表达式，当作练习。</p>
<p>祝玩得开心！</p>
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/recursive-regular-expressions.html/feed</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>链接：正则表达式术语表</title>
		<link>http://iregex.org/blog/url-for-regex-terms.html</link>
		<comments>http://iregex.org/blog/url-for-regex-terms.html#comments</comments>
		<pubDate>Sat, 19 Sep 2009 09:42:53 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[翻译]]></category>
		<category><![CDATA[余晟]]></category>
		<category><![CDATA[术语]]></category>
		<category><![CDATA[精通正则表达式]]></category>
		<category><![CDATA[链接]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=66</guid>
		<description><![CDATA[8月份回家度假期间，整理了一下余晟老师译的《精通正则表达式》一书中的部分术语，以google docs形式贴出，这里给出链接... ]]></description>
			<content:encoded><![CDATA[<p>8月份回家度假期间，整理了一下余晟老师译的《精通正则表达式》一书中的部分术语，以google docs形式贴出，这里给出<a href="http://spreadsheets.google.com/pub?key=tgPFRpjDJecMrSRbnOTyi5g&#038;single=true&#038;gid=0&#038;output=html" target="_blank">链接</a>。</p>
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/url-for-regex-terms.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>[转]正则表达式高级技巧基本概念实例详解[译]</title>
		<link>http://iregex.org/blog/crucial-concepts-behind-advanced-regular-expressions.html</link>
		<comments>http://iregex.org/blog/crucial-concepts-behind-advanced-regular-expressions.html#comments</comments>
		<pubDate>Tue, 02 Jun 2009 01:03:26 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[翻译]]></category>
		<category><![CDATA[Atomic Groups]]></category>
		<category><![CDATA[回调]]></category>
		<category><![CDATA[引用]]></category>
		<category><![CDATA[懒惰]]></category>
		<category><![CDATA[捕获]]></category>
		<category><![CDATA[贪婪]]></category>
		<category><![CDATA[递归]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=61</guid>
		<description><![CDATA[rex按：本文转自笨人干笨活的文章：正则表达式高级技巧基本概念实例详解[译]，其RSS在这里：http://blog.benhuoer.com/subscribe/。 英文原文来自Smashing Magazine。由笨活儿翻译。转载请注明出处。 正则... ]]></description>
			<content:encoded><![CDATA[<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>rex按：本文转自<a href="http://blog.benhuoer.com/" target="_blank">笨人干笨活</a>的文章：<a href="http://blog.benhuoer.com/posts/crucial-concepts-behind-advanced-regular-expressions.html" target="_blank">正则表达式高级技巧基本概念实例详解[译]</a>，其RSS在这里：<a href="http://blog.benhuoer.com/subscribe/" target="_blank">http://blog.benhuoer.com/subscribe/</a>。 </p></blockquote>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>英文原文来自<a title="Smashing Magazine" href="http://www.smashingmagazine.com/2009/05/06/introduction-to-advanced-regular-expressions/" target="_blank">Smashing Magazine</a>。由<a title="正则表达式高级技巧" href="http://blog.benhuoer.com/posts/crucial-concepts-behind-advanced-regular-expressions.html" target="_blank">笨活儿</a>翻译。转载请注明出处。</p>
</blockquote>
<p>正则表达式(Regular Expression,  <em>abbr. regex</em>) 功能强大，能够用于在一大串字符里找到所需信息。它利用约定俗成的字符结构表达式来发生作用。不幸的是，简单的正则表达式对于一些高级运用，功能远远不够。若要进行筛选的结构比较复杂，你可能就需要用到<strong>高级正则表达式</strong>。</p>
<p>本文为您<strong>介绍正则表达式的高级技巧</strong>。我们筛选出了八个常用的概念，并配上实例解析，每个例子都是满足某种复杂要求的简单写法。如果你对正则的基本概念尚缺乏了解，请先阅读<a title="正则表达式入门" href="http://unibetter.com/deerchao/zhengzhe-biaodashi-jiaocheng-se.htm#introduction" target="_blank">这篇文章</a>，或者<a href="http://www.regexlab.com/zh/deelx/syntax.htm" target="_blank">这个教程</a>，或者<a title="维基百科" href="http://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F" target="_blank">维基条目</a>。</p>
<p>这里的正则语法适用于PHP，与<a title="维基百科" href="http://zh.wikipedia.org/wiki/Perl" target="_blank">Perl</a>兼容。</p>
<p><span id="more-971"></span></p>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">1. 贪婪/懒惰</h3>
<p><img src="http://i27.photobucket.com/albums/c156/jyyjcc/benhuoer-blog/regular-expression/greed.jpg" alt="Greed" width="400" height="300" /></p>
<p>所有能多次限定的正则运算符都是贪婪的。他们<strong>尽可能多</strong>地匹配目标字符串，也就是说匹配结果会<strong>尽可能地长</strong>。不幸的是，这种做法并不总是我们想要的。因此，我们添加“懒惰”限定符来解决问题。在各个贪婪运算符后添加“?”能让表达式只匹配<strong>尽可能短</strong>的长度。另外，修改器“U”也能惰化能多次限定的运算符。理解贪婪与懒惰的区别是运用高级正则表达式的基础。</p>
<h4>贪婪操作符</h4>
<p>操作符 * 匹配之前的表达式零次或零次以上。它是一个贪婪操作符。请看下面的例子：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">preg_match( '/&amp;lt;h1&amp;gt;.*&amp;lt;\/h1&amp;gt;/', '&amp;lt;h1&amp;gt;这是一个标题。&amp;lt;/h1&amp;gt;<br />
&amp;lt;h1&amp;gt;这是另一个。&amp;lt;/h1&amp;gt;', $matches );</div></td></tr></tbody></table></div>
<p>句点(.)能代表除换行符外的任意字符。上面的正则表达式匹配 h1 标签以及标签内的所有内容。它用句点(.)和星号(*)来匹配标签内的所有内容。匹配结果如下：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&amp;lt;h1&amp;gt;这是一个标题。&amp;lt;/h1&amp;gt;&amp;lt;h1&amp;gt;这是另一个。&amp;lt;/h1&amp;gt;</div></td></tr></tbody></table></div>
<p>整个字串都被返回。* 操作符会连续匹配所有内容—— 甚至包括中间的 h1 闭合标签。因为它是贪婪的，匹配整个字串是符合其利益最大化原则。</p>
<h4>懒惰操作符</h4>
<p>把上面的式子稍作修改，加上一个问号(?)，能让表达式变懒惰：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/&amp;lt;h1&amp;gt;.*?&amp;lt;\/h1&amp;gt;/</div></td></tr></tbody></table></div>
<p>这样它会觉得，只需匹配到第一个 h1 结尾标签就完成任务了。</p>
<p>另一个有着类似属性的贪婪操作符是 {n,} 。它代表之前的匹配模式重复n次或n次以上，如果没有加上问号，它会寻找尽可能多的重复次数，加上的话，则会尽可能少重复（当然也就是“重复n次”最少）。</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"># 建立字串<br />
$str = 'hihihi oops hi';<br />
# 使用贪婪的{n,}操作符进行匹配<br />
preg_match( '/(hi){2,}/', $str, $matches ); &nbsp;# matches[0] 将是 'hihihi'<br />
# 使用堕化了的 {n,}? 操作符匹配<br />
preg_match( '/(hi){2,}?/', $str, $matches ); &nbsp;# matches[0] 将是 'hihi'</div></td></tr></tbody></table></div>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">2. 回返引用(Back referencing)</h3>
<p><img src="http://i27.photobucket.com/albums/c156/jyyjcc/benhuoer-blog/regular-expression/back.jpg" alt="Back Referencing" width="300" height="290" /></p>
<h4>有什么用？</h4>
<p><strong>回返引用(Back referencing)</strong>一般被翻译成“反向引用”、“后向引用”、“向后引用”，个人觉得“回返引用”更为贴切[<a title="笨活儿" href="http://blog.benhuoer.com/">笨活儿</a>]。它是在正则表达式内部引用<strong>之前捕获到的内容</strong>的方法。例如，下面这个简单例子的目的是匹配出引号内部的内容：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br />8<br />9<br />10<br />11<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"># 建立匹配数组<br />
$matches = array();<br />
&nbsp;<br />
# 建立字串<br />
$str = &amp;quot;&amp;quot;This is a 'string'&amp;quot;&amp;quot;;<br />
&nbsp;<br />
# 用正则表达式捕捉内容<br />
preg_match( &amp;quot;/(\&amp;quot;|').*?(\&amp;quot;|')/&amp;quot;, $str, $matches );<br />
&nbsp;<br />
# 输出整个匹配字串<br />
echo &nbsp;$matches[0];</div></td></tr></tbody></table></div>
<p>它会输出：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&amp;quot;This is a'</div></td></tr></tbody></table></div>
<p>显然，这并不是我们想要的内容。</p>
<p>这个表达式从开头的双引号开始匹配，遭遇单引号之后就错误地结束了匹配。这是因为表达式里说：</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(&quot;|')</div></td></tr></tbody></table></div>
<p>，也就是双引号（</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&quot;</div></td></tr></tbody></table></div>
<p>）和单引号（</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">'</div></td></tr></tbody></table></div>
<p>）均可。要修正这个问题，你可以用到回返引用。<strong>表达式\1,\2,&#8230;,\9</strong> 是对前面已捕获到的各个子内容的编组序号，能作为对这些编组的“指针”而被引用。在此例中，第一个被匹配的引号就由</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">\1</div></td></tr></tbody></table></div>
<p>代表。</p>
<h4>如何运用？</h4>
<p>将上面的例子中，后面的闭合引号替换为1：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">preg_match( '/(\&amp;quot;|').*?\1/', $str, $matches );</div></td></tr></tbody></table></div>
<p>这会正确地返回字串：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&amp;quot;This is a 'string'&amp;quot;</div></td></tr></tbody></table></div>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p><strong>译注思考题：</strong></p>
<p>如果是中文引号，前引号和后引号不是同一个字符，怎么办？</p>
</blockquote>
<p>还记得PHP函数</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">preg_replace</div></td></tr></tbody></table></div>
<p>吗？其中也有回返引用。只不过我们没有用 \1 … \9，而是用了 $1 … $9 … $n （此处任意数目均可）作为回返指针。例如，如果你想把所有的段落标签</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&amp;lt;p&amp;gt;</div></td></tr></tbody></table></div>
<p>都替换成文本：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$text = preg_replace( '/&amp;lt;p&amp;gt;(.*?)&amp;lt;/p&amp;gt;/',<br />
&amp;quot;&amp;amp;lt;p&amp;amp;gt;$1&amp;amp;lt;/p&amp;amp;gt;&amp;quot;, $html );</div></td></tr></tbody></table></div>
<p>参数$1是一个回返引用，代表段落标签</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&amp;lt;p&amp;gt;</div></td></tr></tbody></table></div>
<p>内部的文字，并插入到替换后的文本里。这种简便易用的表达式写法为我们提供了一个获取已匹配文字的简单方法，甚至在替换文本时也能使用。</p>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">3. 已命名捕获组(Named Groups)</h3>
<p>当在一个表达式内多次用到回调引用时，很容易就把事情搞混淆，要弄清那些数字（1 … 9）都代表哪一个子内容是件很麻烦的事。回调引用的一个替代方法是使用带名字的捕获组（下文简称“有名组”）。有名组使用</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(?P&amp;lt;name&amp;gt;pattern)</div></td></tr></tbody></table></div>
<p>来设定，name代表组名，pattern是配合该有名组的正则结构。请看下面的例子：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/(?P&amp;lt;quote&amp;gt;&amp;quot;|').*?(?P=quote)/</div></td></tr></tbody></table></div>
<p>上式中，quote就是组名，</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&quot;|'</div></td></tr></tbody></table></div>
<p>是改组匹配内容的正则。后面的(?P=quote)是在调用组名为quote的有名组。这个式子的效果和上面的回调引用实例一样，只不过是用了有名组来实现。是不是更加易读易懂了？</p>
<p>有名组也能用于处理已匹配内容之数组的内部数据。赋予特定正则的组名也能作为所匹配到的内容在数组内部的索引词。</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">preg_match( '/(?P&amp;lt;quote&amp;gt;&amp;quot;|\')/', &amp;quot;'String'&amp;quot;, $matches );<br />
&nbsp;<br />
# 下面的语句输出“'”(不包括双引号)<br />
echo $matches[1];<br />
&nbsp;<br />
# 使用组名调用，也会输出“'”<br />
echo $matches['quote'];</div></td></tr></tbody></table></div>
<p>所以，有名组并不只是让写代码更容易，它也能用于组织代码。</p>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">4. 字词边界(Word Boundaries)</h3>
<p><img src="http://i27.photobucket.com/albums/c156/jyyjcc/benhuoer-blog/regular-expression/boundary.jpg" alt="Word Boundaries" width="400" height="284" /></p>
<p><strong>字词边界</strong>是字串里的字词字符（包括字母、数字和下划线，自然也包括汉字）和非字词字符之间的位置。其特殊之处就在于，它并不匹配某个实在的字符。它的长度是<strong>零</strong>。</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">\b</div></td></tr></tbody></table></div>
<p>匹配所有字词边界。</p>
<p>不幸的是，字词边界一般都被忽视掉了，大部分人都没有在意他的现实意义。 例如，如果你想要匹配单词“import”：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/import/</div></td></tr></tbody></table></div>
<p>注意了！正则表达式有时候很调皮的。下面的字串也能和上面的式子匹配成功：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">important</div></td></tr></tbody></table></div>
<p>你或许觉得，只要在import前后加上空格，不就可以匹配这个独立的单词了：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/ import /</div></td></tr></tbody></table></div>
<p>那如果遇上这种情况呢：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">The trader voted for the import</div></td></tr></tbody></table></div>
<p>当 import 这个词在字串开头或者结尾时，修改后的表达式仍然不能用。因此，考虑各种情况是必须的：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/(^import | import | import$)/i</div></td></tr></tbody></table></div>
<p>别慌，还没完呢。如果遇到标点符号了呢？就为了满足这一个单词的匹配，你的正则可能就需要这样写：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/(^import(:|;|,)? | import(:|;|,)? | import(\.|\?|\!)?$)/i</div></td></tr></tbody></table></div>
<p>对于只匹配一个单词来说，这样做实在是有点大动干戈了。正因如此，字词边界才显得意义重大。要适应上述要求，以及<strong>很多其他情况变种</strong>，有了字符边界，我们所需写的代码只是：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/\bimport\b/</div></td></tr></tbody></table></div>
<p>上面所有情况都得到了解决。</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">\b</div></td></tr></tbody></table></div>
<p>的灵活性就在于，它是一个没有长度的匹配。它只匹配两个实际字符之间想象出的位置。它检查两个相邻字符是否是一个为单字，另一个为非单字。情况符合，就返回匹配。如果遇到了单词的开头或结尾，</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">\b</div></td></tr></tbody></table></div>
<p>会把它当成是非单词字符对待。由于import里面的</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">i</div></td></tr></tbody></table></div>
<p>仍然被看成是单词字符，import 就被匹配出来了。</p>
<p>注意，与</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">\b</div></td></tr></tbody></table></div>
<p>相对，我们还有</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">\B</div></td></tr></tbody></table></div>
<p>，此操作符匹配两个单字或者两个非单字之间的位置。因此，如果你想匹配在某个单词内部的‘hi’，可以使用：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">\Bhi\B</div></td></tr></tbody></table></div>
<p>“this”、“hight”，都会返回匹配，而“hi there”则不会返回匹配。</p>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">5. 最小组团(Atomic Groups)</h3>
<p><img src="http://i27.photobucket.com/albums/c156/jyyjcc/benhuoer-blog/regular-expression/groups.jpg" alt="Advanced Operators" width="400" height="266" /></p>
<p><strong>最小组团</strong>是无捕捉的特殊正则表达式分组。通常用来提高正则表达式的效能，也能用于消除特定匹配。一个最小组团可以用(?&gt;pattern) 来定义，其中pattern是匹配式。</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/(?&amp;gt;his|this)/</div></td></tr></tbody></table></div>
<p>当正则引擎针对最小组团进行匹配时，它会跳过组团内标记的回溯位置。以单词“smashing”为例，当用上面的正则表达式匹配时，正则引擎会先尝试在“smashing”里寻找“his”。显然，找不到任何匹配。此时，最小组团就发挥作用了：正则引擎会放弃所有回溯位置。也就是说，它不会尝试再从“smashing”里查找“this”。为什么要这样设置？因为“his”都没有返回匹配结果，包含有“his”的“this”当然就更匹配不了了！</p>
<p>上面的例子并没有什么实用性，我们用</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/t?his?/</div></td></tr></tbody></table></div>
<p>也能达到效果。再看看下面的例子：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/\b(engineer|engrave|end)\b/</div></td></tr></tbody></table></div>
<p>如果把“engineering”拿去匹配，正则引擎会先匹配到“engineer”，但接下来就遇到了字词边界，</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">\b</div></td></tr></tbody></table></div>
<p>，所以匹配不成功。然后，正则引擎又会尝试在字串里寻找下一个匹配内容：engrave。匹配到eng的时候，后面的又对不上了，匹配失败。最后，尝试“end”，结果同样是失败。仔细观察，你会发现，一旦engineer匹配失败，并且都抵达了字词边界，“engrave”和“end”这两个词就已经不可能匹配成功了。这两个词都比engineer短小，正则引擎不应该再多做无谓的尝试。</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/\b(?&amp;gt;engineer|engrave|end)\b/</div></td></tr></tbody></table></div>
<p>上面的替代写法更能节省正则引擎的匹配时间，提高代码的工作效率。</p>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">6. 递归(Recursion)</h3>
<p><img src="http://i27.photobucket.com/albums/c156/jyyjcc/benhuoer-blog/regular-expression/recursion.jpg" alt="Recursion" width="400" height="300" /></p>
<p><strong>递归(Recursion)</strong>用于匹配嵌套结构，例如括弧嵌套， (this (that))，HTML标签嵌套</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&amp;lt;div&amp;gt;</div></td></tr></tbody></table></div>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&amp;lt;div&amp;gt;&amp;lt;/div&amp;gt;</div></td></tr></tbody></table></div>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&amp;lt;/div&amp;gt;</div></td></tr></tbody></table></div>
<p>。我们使用</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(?R)</div></td></tr></tbody></table></div>
<p>来代表递归过程中的子模式。下面是一个匹配嵌套括弧的例子：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/\(((?&amp;gt;[^()]+)|(?R))*\)/</div></td></tr></tbody></table></div>
<p>最外层使用了反义符的括号“</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">\(</div></td></tr></tbody></table></div>
<p>”匹配嵌套结构的开端。然后是一个多选项操作符</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">( * | * )</div></td></tr></tbody></table></div>
<p>，可能匹配除括号外的所有字符 “</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(?&amp;gt;[^()]+)</div></td></tr></tbody></table></div>
<p>”，也可能是通过子模式“</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(?R)</div></td></tr></tbody></table></div>
<p>”来再次匹配整个表达式。请注意，这个操作符会尽量多地匹配所有嵌套。</p>
<p>递归的另一个实例如下：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/&amp;lt;([\w]+).*?&amp;gt;((?&amp;gt;[^&amp;lt;&amp;gt;]+)|((?R)))*&amp;lt;\/\1&amp;gt;/</div></td></tr></tbody></table></div>
<p>以上表达式综合运用了字符分组，贪婪操作符、回溯，以及最小化组团来匹配嵌套标签。第一个括弧内分组</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">([w]+)</div></td></tr></tbody></table></div>
<p>匹配出标签名，用于接下来的应用。若找到这尖括号样式的标签，则尝试寻找标签内容的剩余部分。下一个括弧括起来的子表达式和上一个实例非常相似：要么匹配不包括尖括号的所有字符</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">?&amp;gt;[^&amp;lt;&amp;gt;]+</div></td></tr></tbody></table></div>
<p>，要么递归匹配整个表达式</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(?R)</div></td></tr></tbody></table></div>
<p>。表达式最后的</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&amp;lt;/1&amp;gt;</div></td></tr></tbody></table></div>
<p>代表闭合标签。</p>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">7. 回调(Callbacks)</h3>
<p><img src="http://i27.photobucket.com/albums/c156/jyyjcc/benhuoer-blog/regular-expression/call.jpg" alt="Callbacks" width="400" height="290" /></p>
<p>匹配结果中的特定内容有时可能会需要某种特别的修改。要应用多重而复杂的修改，正则表达式<strong>的回调</strong>就有了用武之地。回调是用于函数</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">preg_replace_callback</div></td></tr></tbody></table></div>
<p>中的动态修改字串的方式。你可以为</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">preg_replace_callback</div></td></tr></tbody></table></div>
<p>指定某个函数为参数，此函数能接收匹配结果数组为参数，并将数组修改后返回，作为替换的结果。</p>
<p>例如，我们想将某字串中的字母全部转变成大写。十分不巧，PHP没有直接转化字母大小写的正则操作符。要完成这项任务，就可以用到正则回调。首先，表达式要匹配出所有需要被大写的字母：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/\b\w/</div></td></tr></tbody></table></div>
<p>上式同时使用了字词边界和字符类。光有这个式子还不够，我们还需要一个回调函数：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">function upper_case( $matches ) {<br />
return strtoupper( $matches[0] );<br />
}</div></td></tr></tbody></table></div>
<p>函数</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">upper_case</div></td></tr></tbody></table></div>
<p>接收匹配结果数组，并将整个匹配结果转化成大写。 在此例中，</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$matches[0]</div></td></tr></tbody></table></div>
<p>代表需要被大写化的字母。然后，我们再利用</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">preg_replace_callback</div></td></tr></tbody></table></div>
<p>实现回调：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">preg_replace_callback( '/\b\w/', &amp;quot;upper_case&amp;quot;, $str );</div></td></tr></tbody></table></div>
<p>一个简单的回调即有这般强大的力量。</p>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">8. 注释(Commenting)</h3>
<p><img src="http://i27.photobucket.com/albums/c156/jyyjcc/benhuoer-blog/regular-expression/comment.jpg" alt="Commenting" width="400" height="300" /></p>
<p><strong>注释</strong>不用来匹配字串，但确实是正则表达式中最重要的部分。当正则越写越深入，越写越复杂，要推译出究竟什么东西被匹配就会变得越来越困难。在正则表达式中间加上注释，是最小化将来的迷糊和困惑的最佳方式。</p>
<p>要在正则表达式内部加上注释，使用</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(?#comment)</div></td></tr></tbody></table></div>
<p>格式。把“comment”替换成你的注释语句：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/(?#数字)\d/</div></td></tr></tbody></table></div>
<p>如果你打算把代码公之于众，为正则表达式加上注释就显得尤为重要。这样别人才能更容易看懂和修改你的代码。和其他场合的注释一样，这样做也能为你重访自己以前写的程序时提供方便。</p>
<p>考虑使用“x”或“(?x)”修改器来格式化注释。这个修改器让正则引擎忽略表达式参数之间的空格。“有用的”空格仍然能够通过</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">[ ]</div></td></tr></tbody></table></div>
<p>或</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">\</div></td></tr></tbody></table></div>
<p>（反义符加空格）来匹配。</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/<br />
\d &nbsp; &nbsp;#digit<br />
[ ] &nbsp; #space<br />
\w+ &nbsp; #word<br />
/</div></td></tr></tbody></table></div>
<p>上面的代码与下面的式子作用一样：</p>
<div class="codecolorer-container text mac-classic brush: php;" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/\d(?#digit)[ ](?#space)\w+(?#word)/</div></td></tr></tbody></table></div>
<p>请时刻注意代码的可读性。</p>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">更多资源（英文）</h3>
<ul>
<li><a href="http://www.regular-expressions.info/" target="_blank">Regular-Expressions.info</a> Comprehensive website on regular expressions</li>
<li><a href="http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/" target="_blank"> Cheat Sheet</a>Informative regular expressions cheat sheet</li>
<li><a href="http://www.jslab.dk/tools.regex.php" target="_blank"> Regex Generator</a>JavaScript regular expressions generator</li>
</ul>
<h4>关于作者</h4>
<p><em>Karthik Viswanathan 是一个喜欢编程和做网站的高中生。你可以到他的博客上查看他的作品：<a href="http://www.lateralcode.com/" target="_blank">Lateral Code</a>。你也可以关注一下他的线上<a href="http://twitter.lateralcode.com/" target="_blank">Twitter应用</a>。</em></p>
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/crucial-concepts-behind-advanced-regular-expressions.html/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>[译]正则表达式：从菜鸟到大师</title>
		<link>http://iregex.org/blog/from-regex-newbie-to-regex-guru.html</link>
		<comments>http://iregex.org/blog/from-regex-newbie-to-regex-guru.html#comments</comments>
		<pubDate>Tue, 03 Feb 2009 02:06:11 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[翻译]]></category>
		<category><![CDATA[powergrep]]></category>
		<category><![CDATA[regexbuddy]]></category>
		<category><![CDATA[regexguru]]></category>
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=55</guid>
		<description><![CDATA[Author: Jan Goyvaerts Publish Date: 2 Feb. 2009 Blog entry: http://www.regexguru.com/2009/02/from-regex-newbie-to-regex-guru/ Translated By: Rex (http://iregex.org) Rex注：本文是Jan Goyvaerts为自己的著作《Regular Expression Cookbook》写的... ]]></description>
			<content:encoded><![CDATA[<p>Author: Jan Goyvaerts<br />
  <br />Publish Date: 2 Feb. 2009 </p>
<p>Blog entry: <a title="http://www.regexguru.com/2009/02/from-regex-newbie-to-regex-guru/" href="http://www.regexguru.com/2009/02/from-regex-newbie-to-regex-guru/">http://www.regexguru.com/2009/02/from-regex-newbie-to-regex-guru/</a> </p>
<p>Translated By: Rex (<a href="http://iregex.org">http://iregex.org</a>) </p>
<p>Rex注：本文是Jan Goyvaerts为自己的著作《Regular Expression Cookbook》写的序言中的一段。</p>
<p><font color="#999999">One of my last tasks for the <a href="http://www.regexguru.com/2009/01/regular-expression-cookbook-available-for-pre-order/">Regular Expression Cookbook</a> was to write the preface, including my author bio. I told the story of how I went from my first real encounter with regular expressions in 2000, to the expert I am almost a decade later. </font></p>
<p><a href="http://iregex.org/blog/regular-expression-cookbook-available-for-pre-order.html" target="_blank">《Regular Expression Cookbook》</a>即将完工，剩余的工作之一是作序，包括写我的作者小传。我讲述了自己如何在2000年第一次遭遇正则表达式，并在近乎十年之后才成为专家的经历。</p>
<p><span id="more-55"></span></p>
<p><font color="#999999">My first attempt at writing the bio came out way too long compared with the other sections in the preface. The final bio is only half as long. Rather than let the long bio go to waste, I’m publishing it here, with added links. </font></p>
<p></font>我写的小传初稿，比序言中其它小节的篇幅要长得多，最终还是压缩到一半的长度。原版的较长自传并没有丢到故纸堆里，我将其发布到这里，并加上了链接。</p>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<p><font color="#999999">In 1996, fresh out of high school, Jan Goyvaerts started a hobby project publishing his own software on his own website. It was less than a year before that Internet access had become available at local call rates to his <a href="http://www.heist-op-den-berg.be">Belgian hometown</a>. In 1999, he decided that a university degree was only a ticket to joining the rat race, and focused on his ever more successful software development venture. He set up the business that would eventually become <a href="http://www.just-great-software.com/">Just Great Software</a> in 2000.</font></p>
<p>1996年，Jan Goyvaerts高中毕业刚毕业，就在其个人网站上发布自编的软件作为业余爱好项目。一年之后，互联网才在他的<a href="http://www.heist-op-den-berg.be" rel="nofollow" target="_blank">比利时老家</a>以市话费的价格普及起来。1999年，他意识到，大学学位充其量不过是一块通向更加残酷的竞争之路的敲门砖，因此他集中精力继续做自己更加成功的软件开发。2000年，他成立了自己的公司，该公司是后来的<a href="http://www.just-great-software.com/" target="_blank">Just Great Software（绝佳软件）</a>公司的前身。</p>
<p><font color="#999999">At that time, Jan had no idea he would ever become an expert on regular expressions. One of his early successes was a postcardware text editor called EditPad. Since postcards don’t pay the bills, he developed a commercial text editor called <a href="http://www.editpadpro.com/">EditPad Pro</a>. EditPad Pro, released mid-2000, needed regular expression support to compete. Only the best regex engine would do for “Just Great Software”. Jan decided to go with <a href="http://www.regular-expressions.info/pcre.html">PCRE</a>.</font> </p>
<p></p>
<p>那时，Jan从未料到他有朝一日会成为正则表达式专家。他早期的成功项目之一是一款叫作EditPad的名信片软件。由于名信片不足以补贴家用开支，他开发了一款收费版的文本编辑器：<a href="http://www.editpadpro.com/">EditPad Pro</a>，发布时间是2000年中期。该软件以正则表达式支持作为卖点。只有最好的正则引擎才配得上“Just Great Software（绝佳软件）”这个招牌。Jan决定采用<a href="http://www.regular-expressions.info/pcre.html">PCRE</a>。</p>
<p><font color="#ff008c">(rex注：名信片软件并不是编辑名信片的软件，而是该软件近乎免费，只要给作者寄一张名信片就可以注册。)</font></p>
<p></p>
<p><font color="#999999">The regular expression features in EditPad Pro proved quite popular, particularly because PCRE offered a regex syntax compatible with <a href="http://www.regular-expressions.info/perl.html">Perl</a>, which was all the rage. Most other text editors, even the big IDEs from Microsoft and Borland, had much simpler regular expression support. (In 2009, Visual Studio and Delphi still use those same old regex flavors. This frustrates Jan, because old and limited regex flavors don’t make good book material, and both IDEs are built on .NET, which provides a very rich regex flavor fully covered in <a href="http://www.regular-expression-cookbook.com">this book</a>.)</font></p>
<p>事实证明，EditPad Pro中的正则表达式风格极受欢迎，主要是因为PRCE中的正则表达式语法与<a href="http://www.regular-expressions.info/perl.html">Perl</a>兼容，这种风格风靡一时。其它大多数的文本编辑器，包括当时的MicroSoft和Borland公司的集成开发环境，也仅仅提供非常简单的正则表达式支持。（时至2009年，Visual Studio和Delphi仍在延用其古老的正则式风格。两种集成开发环境均基于.Net，它的正则表达式风格异常丰富，这些特性在书中都在书中有全面介绍。可惜的是，这两款IDE基于.Net却在编辑器中没有体现出.Net正则式的支持，因此无法为<a href="http://www.regular-expression-cookbook.com" target="_blank">本书</a>提供更好的素材，这一点让Jan大皱眉头。） </p>
<p><font color="#ff008c">(rex注：本段末尾括号中的长句，翻译时调整了顺序，以便于理清逻辑顺序；添加了一句话作为辅助，便于读者掌握原义。)</font></p>
<p><font color="#999999">Sensing a need for more powerful tools for working with regular expressions on text files, Jan developed <a href="http://www.powergrep.com/">PowerGREP</a>. PowerGREP took a slow start in late 2002. Today, it is clearly the most powerful tool on the Microsoft Windows platform for doing anything with regular expressions. One of the differentiating features early on was the inclusion of a detailed regular expression tutorial in the help file. Most other grep tools had only help topic listing all the syntax features that you could print on a single sheet of paper. PowerGREP had a separate detailed help topic for every feature in PCRE.</font></p>
<p>Jan感觉到在使用正则式处理文本文件，需要一款更强大的工具，于是他开发了<a href="http://www.powergrep.com/">PowerGREP</a>软件。PowerGREP项目始于2002年底，起步时进展缓慢。如今，它无疑是微软平台下处理任何正则表达式工作的最强大的工具。原先它卓而不群的标志之一就是在帮助文档中包含了完备翔实的正则表达教程。其它大多数grep工具仅仅列出了所有的语法风格，篇幅不大，在一页纸上就能打印完毕。而PowerGREP对于PCRE中每一个正则式语法要点都言之甚详。</p>
<p><font color="#999999">Jan didn’t have a big budget to advertise his software. With many internet marketers preaching that content is king on the search engines, Jan set up <a href="http://www.regular-expressions.info">http://www.regular-expressions.info</a> with the text from tutorial he had already written for PowerGREP. As he watched the site’s traffic and Google rank rise, ultimately beating the Wikipedia entry at the top, Jan started getting the idea that maybe this could become his area of expertise. Writing his own regular expression engine was still a scary thought.</font></p>
<p>那时Jan并没有太多的预算来为他的软件打广告。在许多互联网营销专家鼓吹内容为王的大环境下，Jan建立了<a href="http://www.regular-expressions.info" target="_blank">http://www.regular-expressions.info</a>，内容来源于他已经为PowerGREP写好的教程。当他看到网站流量和Google Rank值不断攀升，远远超过了顶部的Wikipedia条目，Jan意识到或许这才是他的英雄用武之地。不过，编写自己的正则表达式引擎，对于那时的Jan来说，就像是建造空中楼阁一样不敢想像。</p>
<p><font color="#ff008c">(rex注：存疑：是页面顶部的wikipedia条目，还是排行名列前矛的..?)</font></p>
<p><font color="#999999">Regular expressions hit the mainstream development community when <a href="http://www.regular-expressions.info/dotnet.html">.NET</a> was released including a set of powerful regex classes. Not much later the <a href="http://www.regular-expressions.info/java.html">Java</a> platform added the same with the JDK 1.4 release. Seeing lots of Windows developers using regular expressions, and a customer base of EditPad Pro and PowerGREP users needing to test their regular expressions, Jan felt there was a need for a comprehensive tool to create, test, and edit regular expressions. <a href="http://www.regexbuddy.com/">RegexBuddy</a> was released in 2004.</font></p>
<p>附有一整套正则式类库的<a href="http://www.regular-expressions.info/dotnet.html">.NET</a>发布后，正则表达式成为开发界的主流。不久之后，<a href="http://www.regular-expressions.info/java.html">Java</a>平台也在JDK1.4发行版中加入相同内容。看到众多的Windows开发人员用上了正则表达式，以及相当数量的EditPad Pro和PowerGREP客户群需要对正则表达式进行测试，Jan感觉到，有必要开发一种集创建、测试、编辑正则式于一体的综合工具。2004年，<a href="http://www.regexbuddy.com/">RegexBuddy</a>诞生了。</p>
<p><font color="#999999">The PCRE engine which had been such a blessing in 2000 was now seriously limiting both PowerGREP and RegexBuddy. PowerGREP needed to search files larger than 2 GB, and RegexBuddy needed to be compatible with all major regex flavors, not just PCRE. Jan bit the bullet and sweat several months implementing a brand new regular expression engine. The result was a <a href="http://www.regular-expressions.info/refflavors.html">fusion regex flavor</a> that supports almost all the features found in all the regex flavors discussed in this book, and that was fast and flexible enough to meet the needs of PowerGREP’s customers. The new regex engine made the 2005 releases of PowerGREP and RegexBuddy very successful.</font></p>
<p>对于PowerGREP和RegexBuddy来说，在2000年时PCRE引擎真可谓是来自上天的恩赐，可时至如今，它却成了一种极大的限制了。PowerGREP需要搜索2GB以上的大文件，RegexBuddy也需要兼容所有的主流正则风格，而不仅仅是PCRE风格。于是Jan咬紧牙关，大干数月，终于实现了一种全新的正则式引擎。它是<a href="http://www.regular-expressions.info/refflavors.html" target="_blank">正则式风格集大成者</a>，支持本书中提及的所有正则流派的几乎所有的特性。它足够快速、灵活，满足了PowerGREP客户的需求。新的正则引擎给2005年发布的PowerGREP和RegexBuddy带来极大的成功。</p>
<p><font color="#999999">By this time, Jan had become very aware of the differences between all the regular expression flavors. While RegexBuddy could now emulate nearly all the abilities of the popular regular expression flavors, it could not emulate their deficiencies. After much research and testing, Jan released RegexBuddy 3 in 2007 which can emulate the features, and lack thereof, of 15 different regular expression flavors.</font></p>
<p>到那时，Jan对于所有正则表达式流派之间的区别了如指掌。RegexBuddy能够模拟流行的正则表达式流派的所有性能，却不能模拟它们的缺陷。经过大量研究和测试之后，Jan于2007年发布了RegexBuddy 3，该版本可以模拟15种不同的正则表达式流派的所有风格，包括缺陷。</p>
<p><font color="#999999">Having spent so much time researching regular expressions, Jan felt he was ready to write the book on regular expressions. But he didn’t actually set out to do it. It was <a href="http://blog.stevenlevithan.com">Steven Levithan</a>, a very enthusiastic RegexBuddy user, who asked him early 2008 if he wanted to co-write a book on regular expressions. Jan hesitated at first, books being much less profitable than software. After some reflection, he decided he would realize his childhood dream of seeing his name in print, before the printed book becomes obsolete.</font></p>
<p>Jan在研究正则表达式上花费的时间如此之久，他觉得自己可以写一本关于正则表达式的书了。但是他并没有动手写作。一位狂热的RegexBuddy用户，<a href="http://blog.stevenlevithan.com">Steven Levithan</a>，在2008年年初询问Jan是否愿意共同写一本关于正则表达式的书。Jan开始有些犹豫，毕竟写书不像写软件那样利润丰厚。再三考虑过之后，他意识到可以实现儿时的梦想，那就是在纸版书绝迹之前，看到自己的名字印到封面。</p>
<p><font color="#999999">The result <a href="http://www.regexguru.com/2009/01/regular-expression-cookbook-available-for-pre-order/">will be published in May 2009</a>. Enjoy.</font></p>
<p>本书将于<a href="http://iregex.org/blog/regular-expression-cookbook-available-for-pre-order.html" target="_blank">2009年5月出版</a>，敬请期待。</p>
<p><font color="#999999">Meanwhile, Jan has left cloudy Belgium for tropical Thailand. He now lives with his wife in <a href="http://www.phuket.me">Phuket</a>, where he enjoys pretending to be a tourist, even though in reality he still spends far too much time flipping the switches on his <a href="http://www.micro-isv.asia/2009/01/datahand-for-sale-again/">DataHand</a>.</font></p>
<p>于此同时，Jan离开了乌云密布的比利时，来到了泰国这个热带国家。他和妻子在普吉岛安顿下来。在这里他喜欢假装成游客，实际上他太多太多的时间还是花费在敲击自己的<a href="http://www.micro-isv.asia/2009/01/datahand-for-sale-again/">DataHand</a>键盘上。</p>
</blockquote>
<p><font color="#ff008c">Rex评：读过了《Mastering Regular Expressions》，现在很期待读一下这本CookBook。不过，作者也明示、暗示过了，RegexBuddy和PowerGREP的应用会占用书中的相当篇幅；并坦言写书不如写软件赚钱。与《MRE》的作者Jeffrey Friedl相比，Jan的商人味更浓一些，而Jeffrey的学者味更浓。我敬佩资深的学者，羡慕成功的商人，Jan集二者于一身了。理解正则表达式的原理，MRE是一本很好的教材。而CookBook的书名暗示，它就像是菜谱一样，是实用的具体技术实践的指导。这两本我都想深入读，以便了解其术。我更想读的，还包括编译原理，什么时候能够真正写一款自己的编译器，写出自己的正则表达式引擎，即使是很简单地实现。</font></p>
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/from-regex-newbie-to-regex-guru.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

