<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>我爱正则表达式 &#187; Q&amp;A</title>
	<atom:link href="http://iregex.org/blog/category/qa/feed" rel="self" type="application/rss+xml" />
	<link>http://iregex.org</link>
	<description>Original, translated, and reposted articles about regular expressions</description>
	<lastBuildDate>Sun, 27 Jun 2010 04:20:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/><atom:link rel="hub" href="http://www.feedsky.com/api/RPC2"/><atom:link rel="hub" href="http://blogsearch.google.com/ping/RPC2"/><atom:link rel="hub" href="http://blog.yodao.com/ping/RPC2"/><atom:link rel="hub" href="http://www.feedsky.com/api/RPC2"/><atom:link rel="hub" href="http://www.xianguo.com/xmlrpc/ping.php"/><atom:link rel="hub" href="http://www.zhuaxia.com/rpc/server.php"/><atom:link rel="hub" href="http://rpc.technorati.com/rpc/ping"/><atom:link rel="hub" href="http://rpc.pingomatic.com/"/>	
	<item>
		<title>A Note on "Exclusion Matching"</title>
		<link>http://iregex.org/blog/negate-match.html</link>
		<comments>http://iregex.org/blog/negate-match.html#comments</comments>
		<pubDate>Mon, 24 May 2010 08:46:29 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[Q&A]]></category>
		<category><![CDATA[exclude]]></category>
		<category><![CDATA[lookaround]]></category>
		<category><![CDATA[negate]]></category>
		<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=122</guid>
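		<description><![CDATA[A reader, cfc4n, asked me a regular expression question about (?!). After answering it, I wrote up how Perl can match the "absence" of an element, and I am posting that summary here. The problem. Problem description: given the text below, how can a regular expression pick out the items that have no color option... ]]></description>
			<content:encoded><![CDATA[<p>A reader, cfc4n, asked me a regular expression question about (?!). After answering it, I wrote up how Perl can match the "absence" of an element, and I am posting that summary here.<span id="more-122"></span></p>
<h2 style="background-color: rgb(153, 204, 0); border: 1px solid rgb(102, 102, 102); color: rgb(0, 0, 0); font-size: 21px; line-height: 35px; padding-top: 3px; text-indent: 6px;">The Problem</h2>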
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Problem Description</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>
    Given the text below, how can a regular expression match only <b>the items that have no color option</b>?</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br />8<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&lt;item&gt;<br />
&nbsp; &nbsp; color:red;<br />
&lt;/item&gt;<br />
&lt;item&gt;<br />
&nbsp; &nbsp; size:12;<br />
&nbsp; &nbsp; number:45;<br />
&nbsp; &nbsp; type:good;<br />
&lt;/item&gt;</div></td></tr></tbody></table></div>
</blockquote>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">A Typical Wrong Answer</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<p>Beginners often offer this wrong answer: <code class="codecolorer perl default"><span class="perl"><span style="color: #009999;">&lt;item&gt;</span><span style="color: #339933;">.*?</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">?!</span>color<span style="color: #009900;">&#41;</span><span style="color: #339933;">.*?&lt;/</span>item<span style="color: #339933;">&gt;</span></span></code>. The intent is right: a match is wanted only when color does not appear in the target string. In practice the expression does not behave that way; it matches every <code class="codecolorer text default"><span class="text">&lt;item&gt;...&lt;/item&gt;</span></code>. Why?</p>
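<p>Running the pattern makes the failure visible. The sketch below uses JavaScript, the other language this blog works in, with <code>[\s\S]</code> standing in for a dot that matches newlines. The variable names are illustrative and not part of the original question.</p>

```javascript
// Sample text from the problem statement.
const sampleItems = [
  "<item>",
  "    color:red;",
  "</item>",
  "<item>",
  "    size:12;",
  "    number:45;",
  "    type:good;",
  "</item>",
].join("\n");

// The beginner's pattern, with [\s\S] standing in for a newline-matching dot.
const naiveExclusion = /<item>[\s\S]*?(?!color)[\s\S]*?<\/item>/g;

// (?!color) is tested at a single position: right after "<item>", where the
// next character is a newline. The check passes, and the second [\s\S]*? is
// then free to run straight across "color".
const naiveMatches = sampleItems.match(naiveExclusion);
console.log(naiveMatches.length); // 2: both items match, the color one included
```

<p>The lookahead vetoes one position only, and the lazy quantifier in front of it can always settle on a position that is not followed by color.</p>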
</blockquote>
</blockquote>
<h2 style="background-color: rgb(153, 204, 0); border: 1px solid rgb(102, 102, 102); color: rgb(0, 0, 0); font-size: 21px; line-height: 35px; padding-top: 3px; text-indent: 6px;">Exclusion Matching in Perl</h2>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">The Simplest Exclusion Match</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<p>Matching is <code class="codecolorer perl default"><span class="perl"><span style="color: #339933;">=~</span></span></code>, so not matching is naturally <code class="codecolorer perl default"><span class="perl"><span style="color: #339933;">!~</span></span></code>. Writing this, it occurs to me that the regex notations built with <code class="codecolorer perl default"><span class="perl"><span style="color: #339933;">=</span></span></code> each have a counterpart in which <code class="codecolorer perl default"><span class="perl"><span style="color: #339933;">!</span></span></code> takes its place and expresses the opposite meaning: <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #339933;">?=</span><span style="color: #009900;">&#41;</span></span></code> and <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #339933;">?!</span><span style="color: #009900;">&#41;</span></span></code>, <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #339933;">?&lt;=</span><span style="color: #009900;">&#41;</span></span></code> and <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #339933;">?&lt;!</span><span style="color: #009900;">&#41;</span></span></code>, <code class="codecolorer perl default"><span class="perl"><span style="color: #339933;">=~</span></span></code> and <code class="codecolorer perl default"><span class="perl"><span style="color: #339933;">!~</span></span></code>.</p>
<p>Back to the topic, with an example. To test whether a string contains good, use <code class="codecolorer perl default"><span class="perl"><span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #0000ff;">$string</span> <span style="color: #339933;">=~</span> <span style="color: #009966; font-style: italic;">/good/</span><span style="color: #009900;">&#41;</span></span></code>: the condition is true if <code class="codecolorer perl default"><span class="perl"><span style="color: #0000ff;">$string</span></span></code> contains good and false otherwise.</p>
<p>To test whether a string does <b>not</b> contain good, use <code class="codecolorer perl default"><span class="perl"><span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #0000ff;">$string</span> <span style="color: #339933;">!~</span> <span style="color: #009966; font-style: italic;">/good/</span><span style="color: #009900;">&#41;</span></span></code>: the condition is true if <code class="codecolorer perl default"><span class="perl"><span style="color: #0000ff;">$string</span></span></code> has no good in it and false otherwise.</p>
<p>This kind of test suits searching a large string for a simple pattern and then branching one of two ways on the outcome. It is quick and direct, but for more complex conditions it becomes cumbersome.</p>
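<p>For readers who think in JavaScript, the same pair of tests can be sketched as follows. JavaScript has no <code>=~</code> or <code>!~</code> operators, so <code>RegExp.prototype.test</code> and its negation play those roles; the function names are made up for this illustration.</p>

```javascript
// Equivalent of the Perl test: $string =~ /good/
function containsGood(text) {
  return /good/.test(text);
}

// Equivalent of the Perl test: $string !~ /good/
function lacksGood(text) {
  return !/good/.test(text);
}

console.log(containsGood("type:good;")); // true
console.log(lacksGood("type:good;"));    // false
console.log(lacksGood("color:red;"));    // true
```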
<p>The problem posed at the start can certainly be solved this way: first find every <code class="codecolorer text default"><span class="text">&lt;item&gt;...&lt;/item&gt;</span></code>, then check each one for a color entry:</p>
<div class="codecolorer-container perl mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;height:300px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br />8<br />9<br />10<br />11<br />12<br />13<br />14<br />15<br />16<br />17<br />18<br />19<br />20<br />21<br /></div></td><td><div class="perl codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #666666; font-style: italic;">#!/usr/bin/perl -w</span><br />
<br />
<span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$text</span><span style="color: #339933;">=</span><span style="color: #cc0000; font-style: italic;">&lt;&lt;END;<br />
&lt;item&gt;<br />
&nbsp; &nbsp; color:red;<br />
&lt;/item&gt;<br />
&lt;item&gt;<br />
&nbsp; &nbsp; size:12;<br />
&nbsp; &nbsp; number:45;<br />
&nbsp; &nbsp; type:good;<br />
&lt;/item&gt;<br />
END</span><br />
<br />
<span style="color: #b1b100;">my</span> <span style="color: #0000ff;">@result</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">$text</span><span style="color: #339933;">=~</span> <span style="color: #000066;">m</span><span style="color: #339933;">!</span><span style="color: #009999;">&lt;item&gt;</span><span style="color: #339933;">.*?&lt;/</span>item<span style="color: #339933;">&gt;!</span>sg<span style="color: #339933;">;</span><br />
<span style="color: #b1b100;">foreach</span> <span style="color: #0000ff;">$item</span> <span style="color: #009900;">&#40;</span><span style="color: #0000ff;">@result</span><span style="color: #009900;">&#41;</span><br />
<span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #0000ff;">$item</span> <span style="color: #339933;">!~</span> <span style="color: #009966; font-style: italic;">/color/</span><span style="color: #009900;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000066;">print</span> <span style="color: #ff0000;">&quot;$item&quot;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#125;</span><br />
<span style="color: #009900;">&#125;</span></div></td></tr></tbody></table></div>
<p>The output is:</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&lt;item&gt;<br />
&nbsp; &nbsp; size:12;<br />
&nbsp; &nbsp; number:45;<br />
&nbsp; &nbsp; type:good;<br />
&lt;/item&gt;</div></td></tr></tbody></table></div>
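<p>That works, but it rounds up every candidate first and only then weeds them out one by one. Can we state from the outset that we want <strong>items that do not contain color</strong>? <span style="color:#ff008c">Exclusion matching</span> exists for exactly that.</p>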
</blockquote>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Exclusion Matching</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<p>I should admit that I coined the term "exclusion matching" myself. Other names for it include "negative assertion" and "negative lookaround". Those names describe the matching process; mine describes the result. Concretely, it means using <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #339933;">?!...</span><span style="color: #009900;">&#41;</span></span></code> and <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #339933;">?&lt;!...</span><span style="color: #009900;">&#41;</span></span></code> as auxiliary conditions, which simplifies the regular expression and finds the wanted matches quickly.</p>
<p>The two are used alike: both assert that some pattern <span style="color:#ff008c">does not appear</span> at the current position. The difference is that <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #339933;">?!...</span><span style="color: #009900;">&#41;</span></span></code> looks to the right of the current position, while <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #339933;">?&lt;!</span><span style="color: #009900;">&#41;</span></span></code> looks to the left.</p>
<p>Let me recommend the tutorials translated by <a href="http://anrs.sacredfir.com/" target="_blank" title="我爱正则表达式">Anrs</a>: <a href="http://anrs.sacredfir.com/archives/295" target="_blank" title="我爱正则表达式">Lookaround, Part 1</a> and <a href="http://anrs.sacredfir.com/archives/338" target="_blank" title="我爱正则表达式">Lookaround, Part 2</a>. Reading both carefully and fully understanding lookaround will raise your regular expression skills. The rest of this post assumes you already understand the concept.</p>
<p>An aside. Since "left" and "right" are both vivid and easy to grasp, why do we never see "look-left" or "look-right", only harder terms such as "lookahead/lookbehind" and "forward/backward"? <a href="https://twitter.com/kwl_01_skz/status/14069944812" target="_blank" title="我爱正则表达式">撕烤者</a> asked the same question. My guess is that the terms accommodate users of right-to-left scripts such as Arabic. In any case, calling the direction from <code class="codecolorer perl default"><span class="perl"><span style="color: #339933;">^</span></span></code> to <code class="codecolorer perl default"><span class="perl">$</span></code> "ahead" can never be wrong.</p>
<p>The job of a lookaround is to describe the pattern to the left or right of the current position and so help decide whether the regular expression matches. It describes without consuming characters; it only assists the decision and never stands on its own. In this it is just like <code class="codecolorer perl default"><span class="perl"><span style="color: #339933;">^</span></span></code> and <code class="codecolorer perl default"><span class="perl">$</span></code>.</p>
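<p>The "describes without consuming" point can be checked directly. The following is a small JavaScript sketch; it uses only lookahead, and the sample strings are assumptions chosen for illustration.</p>

```javascript
// A lookahead inspects the text to its right without consuming it.
const withLookahead = "fanfou.net".match(/fanfou(?=\.net)/);
console.log(withLookahead[0]);        // "fanfou": ".net" was checked, not consumed
console.log(withLookahead[0].length); // 6

// The negative form works the same way: it only vetoes the position.
console.log(/fanfou(?!\.com)/.test("fanfou.net")); // true
console.log(/fanfou(?!\.com)/.test("fanfou.com")); // false
```

<p>Because the lookahead contributes nothing to the matched text, whatever follows it in the pattern starts matching at the very position it inspected.</p>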
</blockquote>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">An Example</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<p><strong>Example. </strong>Many site addresses resemble fanfou.com. How would you write a regular expression that matches names whose domain contains fanfou but whose TLD is not .com?</p>
<p><strong>Answer: </strong><code class="codecolorer perl default"><span class="perl"><span style="color: #339933;">/</span><span style="color: #0000ff;">\bfanfou</span>\<span style="color: #339933;">.</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">?!</span>com<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#91;</span>a<span style="color: #339933;">-</span>z<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#123;</span><span style="color: #cc66cc;">2</span><span style="color: #339933;">,</span><span style="color: #cc66cc;">4</span><span style="color: #009900;">&#125;</span><span style="color: #0000ff;">\b</span><span style="color: #339933;">/</span>i</span></code>. Taking this expression apart:</p>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<ul>
<li>It starts with <code class="codecolorer perl default"><span class="perl"><span style="color: #0000ff;">\b</span></span></code> to pin down a word boundary;</li>
<li>the main domain name fanfou is indispensable;</li>
<li><code class="codecolorer perl default"><span class="perl">\<span style="color: #339933;">.</span></span></code> matches a literal dot; do not use the dot metacharacter here;</li>
<li><code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #339933;">?!</span>com<span style="color: #009900;">&#41;</span></span></code> says that the three consecutive characters com must not appear here, that is, immediately to the right of <code class="codecolorer text default"><span class="text">fanfou.</span></code>;</li>
<li><code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#91;</span>a<span style="color: #339933;">-</span>z<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#123;</span><span style="color: #cc66cc;">2</span><span style="color: #339933;">,</span><span style="color: #cc66cc;">4</span><span style="color: #009900;">&#125;</span></span></code> means 2 to 4 Latin letters, because the TLDs considered here run from 2 letters (.au, .us) to 4 (.info, .asia); longer TLDs such as .museum exist and would need a wider range;</li>
<li>the boundary on the right matters just as much, otherwise the {2,4} before it would be wasted;</li>
<li>the i modifier makes the match case-insensitive, which is one of the properties of domain names.</li>
</ul>
</blockquote>
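<p>The breakdown above can be exercised against a few host names. This is a sketch in JavaScript, whose syntax for this particular pattern is the same as Perl's; the candidate list is invented for the test.</p>

```javascript
const notDotCom = /\bfanfou\.(?!com)[a-z]{2,4}\b/i;

const candidates = [
  "fanfou.com",    // rejected by (?!com)
  "fanfou.net",
  "FANFOU.ORG",    // accepted thanks to the i flag
  "fanfou.info",
  "myfanfou.net",  // rejected: no word boundary before "fanfou"
  "fanfou.museum", // rejected: six letters exceed {2,4}, so \b cannot follow
];

const accepted = candidates.filter((host) => notDotCom.test(host));
console.log(accepted); // [ 'fanfou.net', 'FANFOU.ORG', 'fanfou.info' ]
```

<p>Note that <code>(?!com)</code> also rejects any TLD that merely begins with com; writing <code>(?!com\b)</code> would restrict the veto to .com itself.</p>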
</blockquote>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Back to the Original Problem</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>
        Let us build the regular expression step by step from the requirements.</p>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<ul>
<li>The expression matches the structure <code class="codecolorer text default"><span class="text">&lt;item&gt;...&lt;/item&gt;</span></code>, so it starts with <code class="codecolorer perl default"><span class="perl"><span style="color: #009999;">&lt;item&gt;</span></span></code>.</li>
<li>The hard part is that color must not appear between <code class="codecolorer text default"><span class="text">&lt;item&gt;</span></code> and <code class="codecolorer text default"><span class="text">&lt;/item&gt;</span></code>. Since <code class="codecolorer text default"><span class="text">color</span></code> could sit at any point inside the structure, we must require that the word color does not begin at any point inside it. One such point is <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #339933;">?!</span>color<span style="color: #009900;">&#41;</span><span style="color: #339933;">.</span></span></code>. Repeating it one or more times gives <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">?!</span>color<span style="color: #009900;">&#41;</span><span style="color: #339933;">.</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">+</span></span></code>. There is a small trap here: do not write <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #339933;">?!</span>color<span style="color: #009900;">&#41;</span><span style="color: #339933;">.+</span></span></code>, which only says that color must not begin at the leftmost point and leaves the rest unconstrained. Writing <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">?!</span>color<span style="color: #009900;">&#41;</span><span style="color: #339933;">.</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">+</span></span></code> guarantees that color begins at none of the points.</li>
<li>The expression is now <code class="codecolorer perl default"><span class="perl"><span style="color: #009999;">&lt;item&gt;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">?!</span>color<span style="color: #009900;">&#41;</span><span style="color: #339933;">.</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">+?&lt;/</span>item<span style="color: #339933;">&gt;</span></span></code>. To save resources, the parentheses are usually written in non-capturing form, <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #339933;">?:...</span><span style="color: #009900;">&#41;</span></span></code>. To make the dot match newlines, either specify the s modifier or replace the dot metacharacter with <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#91;</span>\<span style="color: #000066;">s</span><span style="color: #0000ff;">\S</span><span style="color: #009900;">&#93;</span></span></code>. We keep the dot here, so the s modifier is required when the expression is used. The expression becomes <code class="codecolorer perl default"><span class="perl"><span style="color: #009999;">&lt;item&gt;</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">?:</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">?!</span>color<span style="color: #009900;">&#41;</span><span style="color: #339933;">.</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">+?&lt;/</span>item<span style="color: #339933;">&gt;</span></span></code>.</li>
</ul>
</blockquote>
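<p>The trap and its fix can be compared side by side. The sketch below is JavaScript rather than Perl and substitutes <code>[\s\S]</code> for the dot, as the last step allows; the constant names are chosen for this illustration only.</p>

```javascript
const itemText = [
  "<item>",
  "    color:red;",
  "</item>",
  "<item>",
  "    size:12;",
  "    number:45;",
  "    type:good;",
  "</item>",
].join("\n");

// Trap: the lookahead guards only the first position after <item>.
const guardFirstOnly = /<item>(?!color)[\s\S]+?<\/item>/g;

// Fix: the lookahead is re-checked before every single character.
const guardEveryChar = /<item>(?:(?!color)[\s\S])+?<\/item>/g;

console.log(itemText.match(guardFirstOnly).length); // 2: the color item slips through

const withoutColor = itemText.match(guardEveryChar);
console.log(withoutColor.length);                   // 1
console.log(withoutColor[0].includes("size:12;"));  // true: only the second item
```

<p>In the fixed pattern the repeated group cannot step onto the c of color, so an attempt that starts at the first &lt;item&gt; never reaches its closing tag and the engine moves on to the next one.</p>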
</blockquote>
</blockquote>
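<p>On the whole, lookaround is more abstract than the basic metacharacters. Once understood and mastered, though, it proves very useful for precise matching and replacement. I hope the analysis above helps. If you have a similar question, feel free to ask.</p>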
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/negate-match.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Removing Comments with Regular Expressions</title>
		<link>http://iregex.org/blog/uncomment-program-with-regex.html</link>
		<comments>http://iregex.org/blog/uncomment-program-with-regex.html#comments</comments>
		<pubDate>Sat, 03 Apr 2010 09:51:56 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[Q&A]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[negative lookaround]]></category>
		<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=83</guid>
		<description><![CDATA[The problem. The following is from a reader's email: Difficulties: JavaScript does not let the dot match newlines, so multi-line matching cannot be done directly; to handle a // that is not preceded by http:, the natural tool is negative lookbehind: &#40;?&#60;!http:&#41;\/\/. Unfortunately JavaScript does not... ]]></description>
			<content:encoded><![CDATA[<h2 style="background-color:#99CC00; font-size:14px; padding-bottom:3px; padding-left:10px; padding-top:3px;  line-height:1.5em; margin:1.5em 0 1em;">The Problem</h2>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">From a reader's email:</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>
<a href="http://iregex.org/blog/uncomment-program-with-regex.html" target="_blank" title="javascript正则中的否定前瞻"><img src="http://i293.photobucket.com/albums/mm60/zhasm/20100402104810-1.png" border="0" alt="javascript正则中的否定前瞻"></a>
</p></blockquote>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Difficulties</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<ol>
<li>JavaScript, as of this writing, has no mode in which the dot matches a newline, so multi-line matching <strong>cannot be done directly</strong>;</li>
<li>To handle a <code class="codecolorer text default"><span class="text">//</span></code> that is not preceded by <code class="codecolorer text default"><span class="text">http:</span></code>, the natural tool is negative lookbehind: <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#40;</span><span style="color: #339933;">?&lt;!</span>http<span style="color: #339933;">:</span><span style="color: #009900;">&#41;</span>\<span style="color: #339933;">/</span>\<span style="color: #339933;">/</span></span></code>. Unfortunately JavaScript does not support it either.<br /><a href="http://iregex.org/blog/uncomment-program-with-regex.html" target="_blank" title="javascript正则中的否定前瞻"><img src="http://i293.photobucket.com/albums/mm60/zhasm/20100401091312.png" border="0" alt="javascript正则中的否定前瞻"></a></li>
</ol>
</blockquote>
<p><span id="more-83"></span></p>
</blockquote>
<h2 style="background-color:#99CC00; font-size:14px; padding-bottom:3px; padding-left:10px; padding-top:3px;  line-height:1.5em; margin:1.5em 0 1em;">Approach</h2>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">On Multi-line Matching</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>I have covered this before. The key is to use <code class="codecolorer perl default"><span class="perl"><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">\S</span>\<span style="color: #000066;">s</span><span style="color: #009900;">&#93;</span></span></code> to emulate a dot that matches newlines. The original post is here: <a href="http://iregex.org/blog/diy-match-all-mode-dot.html">A DIY Match-Anything Wildcard</a>. With that, JavaScript code like the following removes multi-line comments:</p>
<div class="codecolorer-container javascript mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br /></div></td><td><div class="javascript codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #006600; font-style: italic;">//to uncomment C-style multiple line comment</span><br />
<span style="color: #003366; font-weight: bold;">function</span> uncomment_multi<span style="color: #009900;">&#40;</span>str<span style="color: #009900;">&#41;</span><br />
<span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp;<span style="color: #000066; font-weight: bold;">return</span> str.<span style="color: #660066;">replace</span><span style="color: #009900;">&#40;</span><span style="color: #009966; font-style: italic;">/\/\*[\S\s]*?\*\//g</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
<span style="color: #009900;">&#125;</span></div></td></tr></tbody></table></div>
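<p>A short usage sketch may help here. The function is reproduced from the post so the example runs on its own; the sample source text and the greedy variant are assumptions added to show why the lazy <code>*?</code> matters.</p>

```javascript
// The function from the post, reproduced so the example is self-contained.
function uncomment_multi(str) {
  return str.replace(/\/\*[\S\s]*?\*\//g, "");
}

const source = [
  "var a = 1; /* first",
  "   comment */ var b = 2;",
  "/* second */var c = 3;",
].join("\n");

console.log(uncomment_multi(source));
// var a = 1;  var b = 2;
// var c = 3;

// With a greedy * the match runs from the first "/*" to the last "*/",
// swallowing the code between the two comments.
const greedyResult = source.replace(/\/\*[\S\s]*\*\//g, "");
console.log(greedyResult); // var a = 1; var c = 3;
```

<p>Like any purely textual approach, this also removes a <code>/* ... */</code> sequence that happens to sit inside a string literal.</p>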
</blockquote>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Single-line Comments: a JavaScript Implementation (Incomplete)</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
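<p>Single-line comments are not as simple as they look. If you think <code class="codecolorer javascript default"><span class="javascript">str.<span style="color: #660066;">replace</span><span style="color: #009900;">&#40;</span><span style="color: #009966; font-style: italic;">/\/\/.*$/</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;&quot;</span><span style="color: #009900;">&#41;</span></span></code> is all it takes, then the text being processed had better be of the simplest kind, such as:</p>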
<div class="codecolorer-container javascript mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="javascript codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #003366; font-weight: bold;">var</span> pig<span style="color: #339933;">=</span><span style="color: #3366CC;">&quot;ase&quot;</span><span style="color: #339933;">;</span> <span style="color: #006600; font-style: italic;">//this is a comment.</span></div></td></tr></tbody></table></div>
<p>In practice that does not work. Real programs are full of lines like these:</p>
<div class="codecolorer-container javascript mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br /></div></td><td><div class="javascript codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #003366; font-weight: bold;">var</span> url<span style="color: #339933;">=</span><span style="color: #3366CC;">&quot;http://iregex.org&quot;</span><span style="color: #339933;">;</span> <span style="color: #006600; font-style: italic;">//this is my site.</span><br />
<span style="color: #003366; font-weight: bold;">var</span> url<span style="color: #339933;">=</span><span style="color: #3366CC;">&quot;//not real comment here http://iregex.org&quot;</span><span style="color: #339933;">;</span> <span style="color: #006600; font-style: italic;">//this is my site.</span></div></td></tr></tbody></table></div>
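<p>To see concretely what goes wrong, here is a sketch that applies the naive replacement to the three sample lines above. The helper name <code>stripNaive</code> is made up for this illustration.</p>

```javascript
// Naive approach: delete everything from the first "//" to the end of the line.
function stripNaive(line) {
  return line.replace(/\/\/.*$/, "");
}

const simpleLine = stripNaive('var pig="ase"; //this is a comment.');
const urlLine = stripNaive('var url="http://iregex.org"; //this is my site.');
const quotedLine = stripNaive(
  'var url="//not real comment here http://iregex.org"; //this is my site.'
);

console.log(JSON.stringify(simpleLine)); // "var pig=\"ase\"; "  (correct)
console.log(JSON.stringify(urlLine));    // "var url=\"http:"    (URL cut in half)
console.log(JSON.stringify(quotedLine)); // "var url=\""         (string destroyed)
```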
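<p>I tried writing a JavaScript function that emulates negative lookbehind. It handles the <code class="codecolorer text default"><span class="text">http://</span></code> case, but it is not pleasant to look at, and it cannot handle a double slash inside quotes. I was quite disappointed by how bare JavaScript's regular expression support is, so I turned to Perl for the job. First, here is my JavaScript function for removing single-line comments:</p>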
<div class="codecolorer-container javascript mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br />8<br />9<br />10<br />11<br />12<br />13<br />14<br />15<br />16<br />17<br /></div></td><td><div class="javascript codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #003366; font-weight: bold;">function</span> uncomment_single<span style="color: #009900;">&#40;</span>str<span style="color: #009900;">&#41;</span><br />
<span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; <span style="color: #003366; font-weight: bold;">var</span> result<span style="color: #339933;">;</span> <br />
&nbsp; &nbsp; <span style="color: #003366; font-weight: bold;">var</span> single<span style="color: #339933;">=</span><span style="color: #003366; font-weight: bold;">new</span> RegExp<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">&quot;<span style="color: #000099; font-weight: bold;">\/</span><span style="color: #000099; font-weight: bold;">\/</span>.&quot;</span><span style="color: #339933;">,</span><span style="color: #3366CC;">&quot;ig&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #003366; font-weight: bold;">var</span> start<span style="color: #339933;">=</span><span style="color: #CC0000;">0</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #000066; font-weight: bold;">while</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>result<span style="color: #339933;">=</span>single.<span style="color: #660066;">exec</span><span style="color: #009900;">&#40;</span>str<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">!=</span><span style="color: #003366; font-weight: bold;">null</span><span style="color: #009900;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#123;</span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #003366; font-weight: bold;">var</span> part<span style="color: #339933;">=</span>str.<span style="color: #660066;">slice</span><span style="color: #009900;">&#40;</span>start<span style="color: #339933;">,</span>result.<span style="color: #660066;">index</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #003366; font-weight: bold;">var</span> negLeft<span style="color: #339933;">=</span><span style="color: #003366; font-weight: bold;">new</span> RegExp<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">&quot;http:$&quot;</span><span style="color: #339933;">,</span><span style="color: #3366CC;">&quot;i&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000066; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span> negLeft.<span style="color: #660066;">test</span><span style="color: #009900;">&#40;</span>part<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000066; font-weight: bold;">return</span> str.<span style="color: #660066;">slice</span><span style="color: #009900;">&#40;</span><span style="color: #CC0000;">0</span><span style="color: #339933;">,</span>result.<span style="color: #660066;">index</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;">&#125;</span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; start<span style="color: #339933;">=</span>result.<span style="color: #660066;">index</span><span style="color: #339933;">+</span>result<span style="color: #009900;">&#91;</span><span style="color: #CC0000;">0</span><span style="color: #009900;">&#93;</span>.<span style="color: #660066;">length</span><span style="color: #339933;">-</span><span style="color: #CC0000;">1</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#125;</span><br />
&nbsp; &nbsp; <span style="color: #000066; font-weight: bold;">return</span> str<span style="color: #339933;">;</span><br />
<span style="color: #009900;">&#125;</span></div></td></tr></tbody></table></div>
</blockquote>
</blockquote>
<h2 style="background-color:#99CC00; font-size:14px; padding-bottom:3px; padding-left:10px; padding-top:3px;  line-height:1.5em; margin:1.5em 0 1em;">Perl version: approach and source code for removing comments (relatively complete)</h2>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Test text</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>
Well, now that the mighty Perl has been brought out, the toy examples above can step aside. I will use the following, relatively complex text to verify my program:</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&lt;!DOCTYPE h/tml PUBLIC &quot;-//W3C//DTD XHTML\&quot; 1.0 Transitional//EN&quot; &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd&quot;&gt; sdfasdf//real comment here//&quot;</div></td></tr></tbody></table></div>
</blockquote>
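<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">A careful look at what makes a single-line comment</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>
Analyzing these characteristics correctly is a prerequisite for writing a sound and efficient program. Observation shows that single-line comments have the following characteristics:</p>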
<ol>
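<li>A double slash inside quotes (single or double) is not a comment.</li>
<li>Quotes come in pairs, and a quote escaped with a backslash between the two quotes does not end the string. For example, in <code class="codecolorer text default"><span class="text">&quot;hello \&quot; //world&quot;</span></code> the <code class="codecolorer text default"><span class="text">//world</span></code> part must not be treated as a comment.</li>
<li>A run of characters that are neither quotes nor slashes is not a comment either, and neither is a single slash. Why must this run exclude slashes as well as quotes? Because <code class="codecolorer text default"><span class="text">[^'&quot;]+</span></code> would swallow the comment in a case such as <code class="codecolorer text default"><span class="text">abcde//real comment &quot;quoted string in comment&quot;</span></code>, which gives us the condition <code class="codecolorer text default"><span class="text">[^'&quot;/]+</span></code>. And because a lone slash, as in <code class="codecolorer text default"><span class="text">abcde/real comment &quot;quoted string in comment&quot;</span></code>, must still be accepted as ordinary text, we add the rule that a single slash is not a comment. The resulting regex is <code class="codecolorer text default"><span class="text">[^'&quot;/]|(?&lt;!/)/(?!/)</span></code>.</li>
<li>Apart from the above, everything from a double slash to the end of the line is a comment. Because this relies on the notion of the <strong>end of a line</strong>, the regex must use multi-line mode, in which <code class="codecolorer text default"><span class="text">^$</span></code> match at the start and end of each line. This is written as <code class="codecolorer text default"><span class="text">//m</span></code>.</li>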
</ol>
</blockquote>
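<p>The third point can be checked with a minimal JavaScript sketch (not the Perl used in this post; it assumes an engine with lookbehind support, such as a recent Node.js). The names <code>naive</code>, <code>careful</code> and <code>leadingText</code> are made up for illustration:</p>

```javascript
// Patterns for the ordinary (non-comment) text at the start of a line.
// naive:   anything that is not a quote.
// careful: anything that is not a quote or slash, or a single slash.
const naive = /^[^'"]+/;
const careful = /^(?:[^'"\/]|(?<!\/)\/(?!\/))*/;

// Returns the leading run of text that the pattern accepts as non-comment.
function leadingText(pattern, line) {
  const m = pattern.exec(line);
  return m ? m[0] : "";
}

const withComment = 'abcde//real comment "quoted string in comment"';
const withSlash = 'abcde/real comment "quoted string in comment"';

console.log(leadingText(naive, withComment));   // swallows the "//" as well
console.log(leadingText(careful, withComment)); // stops right before "//"
console.log(leadingText(careful, withSlash));   // a lone slash is accepted
```

<p>The careful pattern leaves the double slash untouched, so a later part of the regex can claim it as the start of the comment.</p>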
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Regex implementation</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<div class="codecolorer-container perl mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;height:300px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="perl codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">#!/usr/bin/perl<br />
use strict;<br />
use warnings;<br />
use 5.010; &nbsp; # named captures and \g{name} need Perl 5.10 or later<br />
<br />
# A single-quoted heredoc keeps the backslash in \&quot; intact.<br />
my $str = &lt;&lt;'EOF';<br />
&lt;!DOCTYPE h/tml PUBLIC &quot;-//W3C//DTD XHTML\&quot; 1.0 Transitional//EN&quot; &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd&quot;&gt; sdfasdf//real comment here//&quot; <br />
EOF<br />
<br />
if ($str =~<br />
&nbsp; &nbsp; m%<br />
&nbsp; &nbsp; &nbsp; &nbsp; ^<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (?:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; [^'&quot;/]| &nbsp; # neither a quote nor a slash<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (?&lt;!/)/(?!/)| &nbsp; # a single slash<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (?&lt;quote&gt;['&quot;]) &nbsp; # an opening quote<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (?:\\ \g{quote}| &nbsp; # an escaped quote, or<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (?!\g{quote}).)* &nbsp; # any character other than the quote<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \g{quote} &nbsp; # the closing quote<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; )*<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (?&lt;comment&gt;//.*) &nbsp; # the comment itself<br />
&nbsp; &nbsp; &nbsp; &nbsp; $<br />
&nbsp; &nbsp; %xm) <br />
{<br />
&nbsp; &nbsp; print $+{comment}; <br />
}</div></td></tr></tbody></table></div>
</blockquote>
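<p>For comparison, here is a hedged JavaScript port of the same pattern. It is only a sketch: it assumes an engine that supports lookbehind and named groups (for example a recent Node.js), and the names <code>commentRe</code> and <code>findLineComment</code> are invented for this example. JavaScript has no x modifier, so the pattern is assembled from commented pieces instead:</p>

```javascript
// Build the pattern from parts so that each piece can carry a comment.
const commentRe = new RegExp(
  [
    String.raw`^(?:`,
    String.raw`[^'"/]`,                          // neither a quote nor a slash
    String.raw`|(?<!/)/(?!/)`,                   // a single slash
    String.raw`|(?<quote>['"])`,                 // an opening quote ...
    String.raw`(?:\\\k<quote>|(?!\k<quote>).)*`, // ... escaped quotes or other characters ...
    String.raw`\k<quote>`,                       // ... up to the matching closing quote
    String.raw`)*`,
    String.raw`(?<comment>//.*)$`,               // the comment runs to the end of the line
  ].join(""),
  "m"
);

// Returns the single-line comment found in the text, or null if there is none.
function findLineComment(text) {
  const m = commentRe.exec(text);
  return m ? m.groups.comment : null;
}

// String.raw keeps the backslash in \" so the escaped quote survives.
const line = String.raw`<!DOCTYPE h/tml PUBLIC "-//W3C//DTD XHTML\" 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> sdfasdf//real comment here//"`;

console.log(findLineComment(line));
console.log(findLineComment('var url = "http://example.com";'));
```

<p>The second call is there to show that a double slash inside a quoted string is not reported as a comment.</p>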
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">A few additional notes</h3>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<ul>
<li>This program only runs on Perl 5.10 or later, because it uses named captures such as <code class="codecolorer perl default"><span class="perl">(?&lt;quote&gt;['&quot;])</span></code>, a fairly advanced feature. Of course, there is a way to do without 5.10: numbered captures work just as well, they are only less readable.</li>
<li>After the match, the named captures are stored in the hash <code class="codecolorer text default"><span class="text">%+</span></code>. They can be read conveniently with something like <code class="codecolorer text default"><span class="text">print $+{comment}</span></code>.</li>
<li>The x modifier is specified so that whitespace and line breaks can be added, which gives the regex a visible structure. In fact, for a complex regex, not using the x modifier is extremely unwise.</li>
<li>A heredoc is used so that single and double quotes can be written conveniently in the string. Personally, I find it less convenient than Python's triple quotes.</li>
</ul>
</blockquote>
</blockquote>
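<h2 style="background-color:#99CC00; font-size:14px; padding-bottom:3px; padding-left:10px; padding-top:3px;  line-height:1.5em; margin:1.5em 0 1em;">Summary</h2>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>
As far as regular expressions are concerned, JavaScript is really too weak, although my own limited JavaScript skills are partly to blame. Perl's support for regular expressions is powerful and wholehearted. The implementation above should cover the vast majority of comment cases. If you find a bug while testing, or run into an even nastier string, you are welcome to leave a comment and discuss it.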
</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/uncomment-program-with-regex.html/feed</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>The doubled-word program from Mastering Regular Expressions, explained in detail</title>
		<link>http://iregex.org/blog/double-words.html</link>
		<comments>http://iregex.org/blog/double-words.html#comments</comments>
		<pubDate>Sat, 20 Mar 2010 12:22:30 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[问答]]></category>
		<category><![CDATA[perl mre]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=81</guid>
		<description><![CDATA[A question seen on the chinaunix forum about the program that searches for doubled words (“this this”): why does line 4 need the ^? ... ]]></description>
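			<content:encoded><![CDATA[<p>I came across the following question on the <a href="http://bbs.chinaunix.net/viewthread.php?tid=1678860">chinaunix forum</a>:<br />
It concerns the program that searches for doubled words (“this this”) and asks why line 4 needs the ^:</p>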
<div class="codecolorer-container perl mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br /></div></td><td><div class="perl codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&nbsp; &nbsp; <span style="color: #0000ff;">$/</span> <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;.<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">&lt;&gt;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #b1b100;">next</span> <span style="color: #b1b100;">if</span> <span style="color: #339933;">!</span><span style="color: #000066;">s</span><span style="color: #339933;">/</span><span style="color: #0000ff;">\b</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#91;</span>a<span style="color: #339933;">-</span>z<span style="color: #009900;">&#93;</span><span style="color: #339933;">+</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">?:</span><span style="color: #0000ff;">\s</span><span style="color: #339933;">|&lt;</span><span style="color: #009900;">&#91;</span><span style="color: #339933;">^&gt;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">+&gt;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">+</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#40;</span>\<span style="color: #cc66cc;">1</span><span style="color: #0000ff;">\b</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">/</span><span style="color: #0000ff;">\e</span><span style="color: #009900;">&#91;</span>7m<span style="color: #0000ff;">$1</span><span style="color: #0000ff;">\e</span><span style="color: #009900;">&#91;</span><span style="color: #000066;">m</span><span style="color: #0000ff;">$2</span><span style="color: #0000ff;">\e</span><span style="color: #009900;">&#91;</span>7m<span style="color: #0000ff;">$3</span><span style="color: #0000ff;">\e</span><span style="color: #009900;">&#91;</span><span style="color: #000066;">m</span><span style="color: #339933;">/</span>ig<span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #009966; font-style: italic;">s/^(?:[^\e]*\n)+//mg</span><span style="color: #339933;">;</span> &nbsp; <span style="color: #666666; font-style: italic;"># Remove any unmarked lines. &nbsp; # Why is the ^ needed?</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #009966; font-style: italic;">s/^/$ARGV: /mg</span><span style="color: #339933;">;</span> &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #666666; font-style: italic;"># Ensure lines begin with filename.</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #000066;">print</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#125;</span></div></td></tr></tbody></table></div>
<p>To answer this question, we cannot look at that one line in isolation. The program is short; read it carefully from start to finish and the question resolves itself without a fight.<span id="more-81"></span></p>
<ul>
<li><code class="codecolorer perl default"><span class="perl"><span style="color: #0000ff;">$/</span> <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;.<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span></span></code> sets the input record separator. When a program uses something like <code class="codecolorer perl default"><span class="perl"><span style="color: #b1b100;">while</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">&lt;&gt;</span><span style="color: #009900;">&#41;</span></span></code>, the script can be invoked as <code class="codecolorer bash default"><span class="bash"><span style="color: #c20cb9; font-weight: bold;">perl</span> script.pl file.txt</span></code>, and it then reads the contents of file.txt one record at a time. <code class="codecolorer perl default"><span class="perl"><span style="color: #0000ff;">$/</span></span></code> is the record separator, and its default value is <code class="codecolorer perl default"><span class="perl">newline</span></code>, so by default each iteration of the while loop handles one line of file.txt. If <code class="codecolorer perl default"><span class="perl"><span style="color: #0000ff;">$/</span></span></code> is set to <code class="codecolorer perl default"><span class="perl"><span style="color: #000066;">undef</span></span></code>, that is, if the program starts with <code class="codecolorer perl default"><span class="perl"><span style="color: #000066;">undef</span> <span style="color: #0000ff;">$/</span></span></code>, the entire contents of file.txt are read into the single variable <code class="codecolorer perl default"><span class="perl"><span style="color: #0000ff;">$_</span></span></code>. <br />In this program the separator is set to <code class="codecolorer perl default"><span class="perl"><span style="color: #339933;">.</span><span style="color: #0000ff;">\n</span></span></code>, which means that a period followed by <code class="codecolorer perl default"><span class="perl"><span style="color: #0000ff;">\n</span></span></code> ends a record. Ordinary line wrapping is thereby ignored, which makes the program smarter: it can catch a doubled word where one line ends with this and the next line begins with this. 
For details on <code class="codecolorer perl default"><span class="perl"><span style="color: #0000ff;">$/</span></span></code>, see the <a href="http://perldoc.perl.org/perlvar.html">perl documentation</a>. </li>
<li><code class="codecolorer perl default"><span class="perl"><span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">&lt;&gt;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span></span></code> needs no detailed explanation.</li>
<li><code class="codecolorer perl default"><span class="perl"><span style="color: #b1b100;">next</span> <span style="color: #b1b100;">if</span> <span style="color: #339933;">!</span><span style="color: #000066;">s</span><span style="color: #339933;">/</span><span style="color: #0000ff;">\b</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#91;</span>a<span style="color: #339933;">-</span>z<span style="color: #009900;">&#93;</span><span style="color: #339933;">+</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">?:</span>\<span style="color: #000066;">s</span><span style="color: #339933;">|&lt;</span><span style="color: #009900;">&#91;</span><span style="color: #339933;">^&gt;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">+&gt;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">+</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#40;</span>\<span style="color: #cc66cc;">1</span><span style="color: #0000ff;">\b</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">/</span><span style="color: #0000ff;">\e</span><span style="color: #009900;">&#91;</span>7m<span style="color: #0000ff;">$1</span><span style="color: #0000ff;">\e</span><span style="color: #009900;">&#91;</span><span style="color: #000066;">m</span><span style="color: #0000ff;">$2</span><span style="color: #0000ff;">\e</span><span style="color: #009900;">&#91;</span>7m<span style="color: #0000ff;">$3</span><span style="color: #0000ff;">\e</span><span style="color: #009900;">&#91;</span><span style="color: #000066;">m</span><span style="color: #339933;">/</span>ig<span style="color: #339933;">;</span></span></code> deserves some explanation. Its skeleton is next if !s///, which looks as if it does nothing, but it is not idle at all. It first checks whether the record matches the pattern. If it does, the substitution is carried out (how is explained below); only if it does not is the current iteration cut short and the next one started. <br />
Now let's look at how the substitution works. It looks complicated, but it is not incomprehensible. First strip out the \e[m and \e[7m sequences, which are terminal escape sequences that switch highlighting on and off.<br />
<code class="codecolorer perl default"><span class="perl"><span style="color: #000066;">s</span><span style="color: #339933;">/</span><span style="color: #0000ff;">\b</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#91;</span>a<span style="color: #339933;">-</span>z<span style="color: #009900;">&#93;</span><span style="color: #339933;">+</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">?:</span>\<span style="color: #000066;">s</span><span style="color: #339933;">|&lt;</span><span style="color: #009900;">&#91;</span><span style="color: #339933;">^&gt;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">+&gt;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">+</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#40;</span>\<span style="color: #cc66cc;">1</span><span style="color: #0000ff;">\b</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">/</span><span style="color: #0000ff;">$1</span> <span style="color: #0000ff;">$2</span> <span style="color: #0000ff;">$3</span><span style="color: #339933;">/</span>ig<span style="color: #339933;">;</span></span></code><br />
Much clearer, isn't it? Ignore the $1, $2 and $3 in the replacement part and look only at the regex in the matching part:<br />
<code class="codecolorer perl default"><span class="perl"><span style="color: #0000ff;">\b</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#91;</span>a<span style="color: #339933;">-</span>z<span style="color: #009900;">&#93;</span><span style="color: #339933;">+</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">?:</span>\<span style="color: #000066;">s</span><span style="color: #339933;">|&lt;</span><span style="color: #009900;">&#91;</span><span style="color: #339933;">^&gt;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">+&gt;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">+</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#40;</span>\<span style="color: #cc66cc;">1</span><span style="color: #0000ff;">\b</span><span style="color: #009900;">&#41;</span></span></code><br />
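A word boundary on the left; an English word; one or more whitespace characters or &lt;...&gt; tags; the same English word appearing again; a word boundary on the right.<br />
Perfectly clear!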
</li>
<li><code class="codecolorer perl default"><span class="perl"><span style="color: #000066;">s</span><span style="color: #339933;">/^</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">?:</span><span style="color: #009900;">&#91;</span><span style="color: #339933;">^</span><span style="color: #0000ff;">\e</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">*</span><span style="color: #0000ff;">\n</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">+//</span>mg<span style="color: #339933;">;</span> &nbsp; <span style="color: #666666; font-style: italic;"># Remove any unmarked lines. &nbsp; # Why is the ^ needed?</span></span></code><br />
This statement is terse. It removes the lines that contain no doubled word. Note that &#8220;line&#8221; here means a logical line inside the record (a stretch of text ending in \n), not the record itself; as set at the top of the program, a record is whatever ends in .\n.<br />
<code class="codecolorer perl default"><span class="perl"><span style="color: #339933;">^</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">?:</span><span style="color: #009900;">&#91;</span><span style="color: #339933;">^</span><span style="color: #0000ff;">\e</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">*</span><span style="color: #0000ff;">\n</span><span style="color: #009900;">&#41;</span></span></code> says that a whole logical line, from its start to its \n, consists of zero or more characters that are <strong>not \e</strong>. In other words, no highlighting escape was inserted into the line, so it contained no doubled word. This is why the ^ is needed: without it, the match could begin in the middle of a marked line, right after its last \e, and delete the tail of that line.<br />
The substitution uses the modifiers mg: m lets ^ match at the start of every logical line inside the record, and g replaces globally because the record may contain several such lines.
</li>
<li><code class="codecolorer perl default"><span class="perl"><span style="color: #000066;">s</span><span style="color: #339933;">/^</span><span style="color: #009966; font-style: italic;">/$ARGV: /mg</span><span style="color: #339933;">;</span></span></code> inserts the file name and a colon at the start of every line of the output (again, logical lines). The mg modifiers mean the same as above.</li>
<li><code class="codecolorer perl default"><span class="perl"><span style="color: #000066;">print</span><span style="color: #339933;">;</span></span></code> prints the value of the default variable $_. Every statement above that has no explicit &#8220;subject&#8221; operates on $_ by default, as everyone knows.</li>
</ul>
<p>Now let's look at an example. Take the text file a.txt:</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">this is a apple.<br />
this is a red red apple.<br />
i love regex<br />
i love regular expressions<br />
this is a fat<br />
&lt;very&gt;fat&lt;/very&gt; pig.</div></td></tr></tbody></table></div>
<p>How does the program process it?</p>
<p>First, according to the setting of $/, the file is split into the following records:</p>
<ol>
<li>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">this is a apple.</div></td></tr></tbody></table></div>
</li>
<li>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">this is a red red apple.</div></td></tr></tbody></table></div>
</li>
<li>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">i love regex<br />
i love regular expressions<br />
this is a fat<br />
&lt;very&gt;fat&lt;/very&gt; pig.</div></td></tr></tbody></table></div>
</li>
</ol>
<p>Why are the last four lines grouped together? Because the record separator is .\n.</p>
<p>Processing steps:</p>
<ul>
<li>The first record,
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">this is a apple.</div></td></tr></tbody></table></div>
<p>contains no doubled word, so next skips it and nothing further is done;</p></li>
<li>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">this is a red red apple.</div></td></tr></tbody></table></div>
<p>red red is highlighted and the record is printed.</p></li>
<li>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">i love regex<br />
i love regular expressions<br />
this is a fat<br />
&lt;very&gt;fat&lt;/very&gt; pig.</div></td></tr></tbody></table></div>
<p>fat fat is highlighted; i love regex and i love regular expressions are hit by <code class="codecolorer perl default"><span class="perl"><span style="color: #000066;">s</span><span style="color: #339933;">/^</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">?:</span><span style="color: #009900;">&#91;</span><span style="color: #339933;">^</span><span style="color: #0000ff;">\e</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">*</span><span style="color: #0000ff;">\n</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">+//</span>mg</span></code> and deleted (replaced with the empty string).</p></li>
</ul>
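<p>The processing steps can also be traced with a rough JavaScript counterpart of the Perl loop. This is a sketch under stated assumptions, not the program from the book: <code>markDoubledWords</code> is a made-up name, its <code>fileName</code> parameter stands in for $ARGV, and the record splitting only imitates what $/ does in Perl:</p>

```javascript
const ESC = "\x1b"; // the escape character that Perl writes as \e

// Rough JavaScript counterpart of the Perl loop; fileName stands in for $ARGV.
function markDoubledWords(text, fileName) {
  // Imitate $/ = ".\n": a record runs up to and including the next ".\n".
  const records = text.match(/[\s\S]*?\.\n|[\s\S]+$/g) || [];
  const output = [];
  for (const record of records) {
    // Wrap both copies of a doubled word in reverse-video escape sequences.
    const marked = record.replace(
      /\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)/gi,
      `${ESC}[7m$1${ESC}[m$2${ESC}[7m$3${ESC}[m`
    );
    if (marked === record) continue; // like "next if !s///": nothing doubled
    // Drop the lines inside the record that carry no escape character.
    const kept = marked.replace(/^(?:[^\x1b]*\n)+/gm, "");
    // Prefix every remaining line; the lookahead avoids a match at the very end.
    output.push(kept.replace(/^(?=[\s\S])/gm, `${fileName}: `));
  }
  return output.join("");
}

const sample =
  "this is a apple.\n" +
  "this is a red red apple.\n" +
  "i love regex\n" +
  "i love regular expressions\n" +
  "this is a fat\n" +
  "<very>fat</very> pig.\n";

console.log(markDoubledWords(sample, "a.txt"));
```

<p>One difference from Perl is worth noting: in JavaScript, ^ with the m flag also matches after a trailing newline at the very end of the string, which is why the prefixing step adds a lookahead.</p>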
<p>That completes the explanation.</p>
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/double-words.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Two regex problems about password validation</title>
		<link>http://iregex.org/blog/2-regex-problems-about-password-verification.html</link>
		<comments>http://iregex.org/blog/2-regex-problems-about-password-verification.html#comments</comments>
		<pubDate>Fri, 16 Oct 2009 14:38:39 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[问答]]></category>
		<category><![CDATA[lookaround]]></category>
		<category><![CDATA[密码验证]]></category>
		<category><![CDATA[环视]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=68</guid>
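		<description><![CDATA[On the regex forum, someone asked the following two questions (the original thread is here): Problem 1: password validation: the password consists of, and only of, digits, letters (upper and lower case) and special symbols (@ % &#38;&#8230;), all three kinds are required, and it is at least 8 characters long. Problem 2: a ten... ]]></description>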
			<content:encoded><![CDATA[<p>On the <a target="_blank" href="http://regex.me/" title="正则表达式论坛">regex forum</a>, someone asked the following two questions (the original thread is <a target="_blank" href="http://regex.me/thread-149-page-1.html" title="正则表达式论坛">here</a>):</p>
<div align="left">
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<ul>
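<li><b>Problem 1</b>: Password validation: the password must consist of, and only of, digits, letters (upper and lower case) and special symbols (@ % &amp;&#8230;); all three kinds are required, and the password must be at least 8 characters long.</li>
<li><b>Problem 2</b>: A ten-character password made of digits and letters, containing 4 digits and 6 letters.</li>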
</ul>
</blockquote>
<p>If you are interested, I suggest working out a solution yourself before reading on, so that my approach does not bias yours.</p>
</div>
<h2 style="background-color:#99CC00; font-size:14px; padding-bottom:3px; padding-left:10px; padding-top:3px;  line-height:1.5em; margin:1.5em 0 1em;">Stage0<br />
</h2>
<div align="left"><p>Both problems follow the same pattern: a single string has to satisfy several conditions at once.</p>
<p>For problem 1, the conditions are:</p>
<ul>
<li>at least 8 characters;</li>
<li>at least one digit;</li>
<li>at least one letter;</li>
<li>at least one special character.</li>
</ul>
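<p>For requirements like these, simply matching with [0-9a-zA-Z@%&amp;]{8,} is not enough, because it also matches strings such as 00000000 or 1111aaaaa that contain only one or two kinds of characters. We therefore need stricter conditions to make the match more precise.</p>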
</div>
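<p>A quick JavaScript check illustrates the over-matching. This is only a sketch: the ^ and $ anchors are added here as an assumption so that the pattern has to cover the whole string, and the third sample string is made up:</p>

```javascript
// Stage0 pattern, anchored so that it must cover the whole string.
const stage0 = /^[0-9a-zA-Z@%&]{8,}$/;

// The first two samples lack at least one required kind of character.
const stage0Samples = ["00000000", "1111aaaaa", "abc123@%"];

for (const candidate of stage0Samples) {
  // The pattern cannot tell the invalid samples from the valid one.
  console.log(candidate, stage0.test(candidate));
}
```

<p>All three strings pass, although only the last one contains a digit, a letter and a special character.</p>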
<h2 style="background-color:#99CC00; font-size:14px; padding-bottom:3px; padding-left:10px; padding-top:3px;  line-height:1.5em; margin:1.5em 0 1em;">Stage1</h2>
<div align="left">
<p>A digit must appear at least once, so, seen from any given character position, the string should look like this:</p>
<p>Code:</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">[0-9a-zA-Z@%&amp;]+\d</div></td></tr></tbody></table></div>
<p>A letter must appear at least once, so, seen from any given character position, the string should look like this:</p>
<p>Code:</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">[0-9a-zA-Z@%&amp;]+[a-zA-Z]</div></td></tr></tbody></table></div>
<p>A special character must appear at least once, so, seen from any given character position, the string should look like this:</p>
<p>Code:</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">[0-9a-zA-Z@%&amp;]+[@%&amp;]</div></td></tr></tbody></table></div>
<p>All three conditions must hold at the same time, so:<br />
Code:</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(?=[0-9a-zA-Z@%&amp;]+\d)(?=[0-9a-zA-Z@%&amp;]+[a-zA-Z])(?=[0-9a-zA-Z@%&amp;]+[@%&amp;]).{8,}</div></td></tr></tbody></table></div>
<p>To make sure the whole line is matched, add the anchors ^ and $:<br />
Code:</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">^(?=[0-9a-zA-Z@%&amp;]+\d)(?=[0-9a-zA-Z@%&amp;]+[a-zA-Z])(?=[0-9a-zA-Z@%&amp;]+[@%&amp;]).{8,}$</div></td></tr></tbody></table></div>
<p>It is meant to match strings of 8 or more characters that consist of, and only of, digits, letters and special characters.</p>
</div>
<h2 style="background-color:#99CC00; font-size:14px; padding-bottom:3px; padding-left:10px; padding-top:3px;  line-height:1.5em; margin:1.5em 0 1em;">Stage2</h2>
<div align="left">
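<p>The Stage1 solution can already match what we are after. However, in contrast to the Stage0 pattern [0-9a-zA-Z@%&amp;]{8,}, which matches too much, it matches only some of the valid strings and misses others. See the screenshot:</p>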
<p><a href="http://i293.photobucket.com/albums/mm60/zhasm/iregex/20091016104924_half.png" target="_blank" title="我爱正则表达式|两条与密码验证相关的正则表达式问题"><img src="http://i293.photobucket.com/albums/mm60/zhasm/iregex/20091016104924_half.png" alt="我爱正则表达式|两条与密码验证相关的正则表达式问题" border="0" /></a></p>
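<p>In the screenshot above, the colored parts of the Test area are the strings matched by the regular expression. The first three entries meet the requirements, yet they are not matched. The cause is the <font color="#ff008c">+ quantifier</font> inside the lookahead conditions. A lookahead consumes no characters, but [0-9a-zA-Z@%&amp;]+ demands at least one allowed character before the required one, so a string whose only digit, letter, or special character sits in the first position fails the check and is wrongly rejected.</p>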
<p>The problem lies in +, so we switch to the <font color="#ff008c">* quantifier</font>, which also allows zero characters before the required one. That works much better. The regular expression becomes:</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">^(?=[0-9a-zA-Z@%&amp;]*\d)(?=[0-9a-zA-Z@%&amp;]*[a-zA-Z])(?=[0-9a-zA-Z@%&amp;]*[@%&amp;]).{8,}$</div></td></tr></tbody></table></div>
<p>The matching result is shown below:</p>
<p><a href="http://i293.photobucket.com/albums/mm60/zhasm/iregex/20091016net.png" target="_blank" title="我爱正则表达式|两条与密码验证相关的正则表达式问题"><img src="http://i293.photobucket.com/albums/mm60/zhasm/iregex/20091016net.png" alt="我爱正则表达式|两条与密码验证相关的正则表达式问题" border="0" /></a></div>
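<p>To see the difference between the two quantifiers in action, here is a minimal JavaScript sketch that runs the stage 1 and stage 2 patterns side by side. The sample strings are made up for illustration and are not the ones shown in the screenshots.</p>

```javascript
// Stage 1: "+" demands at least one allowed character BEFORE the required one.
const stage1 = /^(?=[0-9a-zA-Z@%&]+\d)(?=[0-9a-zA-Z@%&]+[a-zA-Z])(?=[0-9a-zA-Z@%&]+[@%&]).{8,}$/;
// Stage 2: "*" also accepts the required character in the very first position.
const stage2 = /^(?=[0-9a-zA-Z@%&]*\d)(?=[0-9a-zA-Z@%&]*[a-zA-Z])(?=[0-9a-zA-Z@%&]*[@%&]).{8,}$/;

// Hypothetical samples: the first has its only digit up front,
// the last has its only special character up front.
const samples = ["1abcdef@", "abc123@%", "@abcdefg1"];

const results = samples.map((text) => ({
  text,
  stage1: stage1.test(text),
  stage2: stage2.test(text),
}));

for (const row of results) {
  console.log(row.text, "stage1:", row.stage1, "stage2:", row.stage2);
}
```

<p>Only the middle sample passes stage 1, while all three pass stage 2, because the other two keep a required character in position 0.</p>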
<h2 style="background-color:#99CC00; font-size:14px; padding-bottom:3px; padding-left:10px; padding-top:3px;  line-height:1.5em; margin:1.5em 0 1em;">Stage3</h2>
<p>But a problem remains. Testing shows that strings like the following are matched too, even though they are clearly not valid passwords:</p>
<p><a href="http://s293.photobucket.com/albums/mm60/zhasm/iregex/?action=view&amp;current=screenshot_001.png" target="_blank" title="我爱正则表达式|两条与密码验证相关的正则表达式问题"><img src="http://i293.photobucket.com/albums/mm60/zhasm/iregex/screenshot_001.png" alt="Photobucket" border="0" /></a></p>
<p>The problem comes from this part of the stage 2 pattern:</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">.{8,}$</div></td></tr></tbody></table></div>
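<p>The constraints so painstakingly defined with [0-9a-zA-Z@%&amp;] are effortlessly undone here. Stage 2 really only checks the leading part of the string: each lookahead verifies just the run of allowed characters up to one required character, and once those checks pass, .{8,} accepts any character at all.</p>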
<p>With that in mind, I wrote one long regular expression:</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">^(?=[0-9a-zA-Z@%&amp;]*\d)(?=[0-9a-zA-Z@%&amp;]*[a-zA-Z])(?=[0-9a-zA-Z@%&amp;]*[@%&amp;])[0-9a-zA-Z@%&amp;]{8,}$</div></td></tr></tbody></table></div>
<p>But this regular expression is too complicated. Can it be shorter? Of course. As shown above, the lookaheads do not need to restrict the character set; it is enough to enforce it once in the final part. So the regular expression can be changed to:</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">^(?=.*\d)(?=.*[a-zA-Z])(?=.*[@%&amp;])[0-9a-zA-Z@%&amp;]{8,}$</div></td></tr></tbody></table></div>
<p>This gives us the most concise solution to this problem so far.</p>
<p>By the same reasoning, the solution to the second problem is:</p>
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">^(?=.*\d)(?=.*[a-zA-Z])(?=.*[@%&amp;])[0-9a-zA-Z@%&amp;]{8,}$</div></td></tr></tbody></table></div>
<p>No further explanation is needed.</p>
<p>While working on this problem I was helped by <a target="_blank" href="http://hi.baidu.com/jyf1987">创亿无限</a>, who tested stage 2, and by <a target="_blank" href="http://www.luanxiang.org/blog/">余晟</a> (Yu Sheng), who gave guidance on stage 3. Mr. Yu is currently writing a beginner's book on regular expressions; visit <a target="_blank" href="http://www.luanxiang.org/blog/">his blog</a> for details.</p>
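<p>As a quick check of the reasoning in this stage, the sketch below compares the stage 2 pattern with the final one in JavaScript. The function name isValidPassword and the sample strings are assumptions made for this illustration.</p>

```javascript
// Stage 2 pattern: the trailing ".{8,}" lets any character through.
const stage2 = /^(?=[0-9a-zA-Z@%&]*\d)(?=[0-9a-zA-Z@%&]*[a-zA-Z])(?=[0-9a-zA-Z@%&]*[@%&]).{8,}$/;
// Final pattern: loose lookaheads, strict character class for the body.
const finalPattern = /^(?=.*\d)(?=.*[a-zA-Z])(?=.*[@%&])[0-9a-zA-Z@%&]{8,}$/;

// Hypothetical helper wrapping the final pattern.
function isValidPassword(text) {
  return finalPattern.test(text);
}

// "a1@#######" starts with a valid prefix but then uses "#",
// which is outside the allowed character set.
const leaky = "a1@#######";
console.log("stage 2 accepts it:", stage2.test(leaky));
console.log("final pattern accepts it:", isValidPassword(leaky));

// A few more hypothetical inputs for the final pattern.
for (const text of ["abc123@%", "abcd1234", "a1@"]) {
  console.log(text, isValidPassword(text));
}
```

<p>The design point is that the lookaheads only need to prove that each kind of character exists somewhere, while the single character class at the end decides which characters are allowed at all.</p>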
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/2-regex-problems-about-password-verification.html/feed</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Matching a random ABCD string with a regular expression</title>
		<link>http://iregex.org/blog/regex-against-abcd.html</link>
		<comments>http://iregex.org/blog/regex-against-abcd.html#comments</comments>
		<pubDate>Fri, 05 Dec 2008 02:03:21 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[问答]]></category>
		<category><![CDATA[string]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=37</guid>
		<description><![CDATA[Some time ago I came across this question on the chinaunix forum: the four letters abcd must appear consecutively, each letter exactly once, in any order; that is, match abcd adbc bcda and so on. Here is how I approached it. First... ]]></description>
			<content:encoded><![CDATA[<p>Some time ago I came across this question on the <a href="http://bbs.chinaunix.net/thread-1300693-1-1.html" target="_blank" rel="nofollow">chinaunix forum</a>:</p>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>The four letters abcd must appear consecutively, each letter exactly once, in any order; that is, match <tt class="string">abcd adbc bcda</tt> and so on.</p></blockquote>
<p>Here is how I approached it.</p>
<p><span id="more-37"></span></p>
<ul>
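<li>The first letter is any one of <tt class="string">abcd</tt>, so the pattern <tt class="regex">[abcd]</tt> is enough. The first letter is the easy part.</li>
<li>The second letter is also any one of <tt class="string">abcd</tt>, but it must not be the letter already used in the first step. So we go back a step and wrap the first letter in parentheses so that it can be referenced: <tt class="regex">([abcd])</tt>. "Must not be <tt class="regex">\1</tt>" is written <tt class="regex">(?!\1)</tt>, which asserts that <tt class="regex">\1</tt>, the first letter, does not appear at the current position. The pattern so far: <tt class="regex">([abcd])(?!\1)([abcd])</tt>.</li>
<li>The third letter works the same way as the second, except that it may repeat neither the first nor the second letter. The pattern becomes: <tt class="regex">([abcd])(?!\1)([abcd])(?!\1|\2)([abcd])</tt>.</li>
<li>The fourth step needs no further comment. The complete pattern is: <tt class="regex">([abcd])(?!\1)([abcd])(?!\1|\2)([abcd])(?!\1|\2|\3)([abcd])</tt>.</li>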
</ul>
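<p>The steps above can be checked with a short JavaScript sketch. It tries the complete pattern on every four-letter string over abcd and counts the matches; if the pattern is right, exactly the 24 permutations should match. The variable names are chosen for this illustration only.</p>

```javascript
// The complete pattern built up in the list above.
const permutation = /([abcd])(?!\1)([abcd])(?!\1|\2)([abcd])(?!\1|\2|\3)([abcd])/;

// Enumerate all 4^4 = 256 four-letter strings over a, b, c, d.
const letters = ["a", "b", "c", "d"];
const matched = [];
for (const w of letters) {
  for (const x of letters) {
    for (const y of letters) {
      for (const z of letters) {
        const candidate = w + x + y + z;
        if (permutation.test(candidate)) matched.push(candidate);
      }
    }
  }
}
console.log(matched.length, matched.join(" "));

// The pattern is not anchored, so it also finds a permutation inside a longer string.
const embedded = "xxdcbaxx".match(permutation);
console.log(embedded ? embedded[0] : null);
```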
<p>Looking back at the pattern, <tt class="regex">(?!\1)</tt> may be the part that beginners find hard, so here is a brief introduction. The form <tt class="regex">(?!...)</tt> is a kind of <a href="http://www.regular-expressions.info/lookaround.html">look around</a>. Like <tt class="regex">^</tt> or <tt class="regex">$</tt>, it <b>matches only a position</b> and <b>does not actually consume characters</b>. To express "some string not followed by some other string", the negative lookahead <tt class="regex">(?!)</tt> is indispensable. For example, to match a <tt class="regex">q</tt> that is not followed by <tt class="regex">u</tt>, use <tt class="regex">q(?!u)</tt>. It finds a match in strings such as <tt class="match">"Iraq"</tt>, <tt class="match">"icq"</tt>, and <tt class="match">"qq"</tt>.</p>
<p>If your head is not spinning yet, think about how it differs from <tt class="regex">q[^u]</tt>.</p>
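<p>One way to explore that question is to run both patterns on a few strings. The JavaScript sketch below is only an illustration; the helper name firstMatch and the choice of sample words are assumptions.</p>

```javascript
const lookahead = /q(?!u)/;     // asserts "no u next" without consuming anything
const negatedClass = /q[^u]/;   // must consume one character that is not u

// Hypothetical helper: return the matched text, or null when nothing matches.
function firstMatch(pattern, text) {
  const found = text.match(pattern);
  return found ? found[0] : null;
}

for (const word of ["Iraq", "qq", "quit"]) {
  console.log(word, firstMatch(lookahead, word), firstMatch(negatedClass, word));
}
```

<p>In "Iraq" the q is the last character, so q[^u] has nothing left to consume and fails, while the lookahead succeeds at the end of the string. In "qq" both succeed, but q[^u] swallows the second q as part of the match.</p>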
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/regex-against-abcd.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>An ASP regular expression for matching usernames (including Chinese)</title>
		<link>http://iregex.org/blog/regular-expressions-to-match-chinese-username-in-asp.html</link>
		<comments>http://iregex.org/blog/regular-expressions-to-match-chinese-username-in-asp.html#comments</comments>
		<pubDate>Sun, 13 Jul 2008 14:01:17 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[问答]]></category>
		<category><![CDATA[asp]]></category>
		<category><![CDATA[chinese]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=18</guid>
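		<description><![CDATA[Someone posted this question on 正则表达式中文站: Looking for an ASP username expression. The username is 2 to 20 characters long and consists of Chinese characters, upper- and lowercase letters, digits, hyphens (-), and underscores (_). This is not a hard problem; one line of core code is enough to... ]]></description>
			<content:encoded><![CDATA[<p>Someone posted <a href="http://www.regex.net.cn/redirect.php?tid=30&amp;goto=lastpost#lastpost" target="_blank">this question</a> on <a href="http://www.regex.net.cn" target="_blank">正则表达式中文站</a>:</p>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Looking for an ASP username expression</h3>
<p>The username is 2 to 20 characters long and consists of Chinese characters, upper- and lowercase letters, digits, hyphens (-), and underscores (_).</p></blockquote>
<p>This is not a hard problem; the single line of core code below does the job:</p>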
<div class="codecolorer-container perl mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="perl codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff0000;">&quot;^[-_a-zA-Z0-9<span style="color: #000099; font-weight: bold;">\u</span>4e00-<span style="color: #000099; font-weight: bold;">\u</span>9fa5]{2,20}$&quot;</span></div></td></tr></tbody></table></div>
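<p>The real obstacle was that I had never used ASP. I set up an ASP environment following the hints on <span style="color: #6666cc;"><a href="http://www.webase.net.cn/html/Program/Asp/200711/29.html" target="_blank">this page</a></span>. After consulting some introductory online ASP tutorials, I came up with the following:<span id="more-19"></span></p>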
<div class="codecolorer-container asp mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br /></div></td><td><div class="asp codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #006600; font-weight: bold;">&lt;</span>form action<span style="color: #006600; font-weight: bold;">=</span><span style="color: #cc0000;">&quot;verify.asp&quot;</span> method<span style="color: #006600; font-weight: bold;">=</span><span style="color: #cc0000;">&quot;post&quot;</span><span style="color: #006600; font-weight: bold;">&gt;</span><br />
姓名：<br />
<span style="color: #006600; font-weight: bold;">&lt;</span>input name<span style="color: #006600; font-weight: bold;">=</span><span style="color: #cc0000;">&quot;name&quot;</span> type<span style="color: #006600; font-weight: bold;">=</span><span style="color: #cc0000;">&quot;text&quot;</span> <span style="color: #006600; font-weight: bold;">/&gt;</span><br />
<br />
<span style="color: #006600; font-weight: bold;">&lt;</span>input name<span style="color: #006600; font-weight: bold;">=</span><span style="color: #cc0000;">&quot;Submit&quot;</span> type<span style="color: #006600; font-weight: bold;">=</span><span style="color: #cc0000;">&quot;submit&quot;</span> value<span style="color: #006600; font-weight: bold;">=</span><span style="color: #cc0000;">&quot;提交&quot;</span> <span style="color: #006600; font-weight: bold;">/&gt;</span><br />
<span style="color: #006600; font-weight: bold;">&lt;</span>input name<span style="color: #006600; font-weight: bold;">=</span><span style="color: #cc0000;">&quot;Submit2&quot;</span> type<span style="color: #006600; font-weight: bold;">=</span><span style="color: #cc0000;">&quot;reset&quot;</span> value<span style="color: #006600; font-weight: bold;">=</span><span style="color: #cc0000;">&quot;重置&quot;</span> <span style="color: #006600; font-weight: bold;">/&gt;</span><br />
<span style="color: #006600; font-weight: bold;">&lt;/</span>form<span style="color: #006600; font-weight: bold;">&gt;</span></div></td></tr></tbody></table></div>
<p>It submits to the following verify.asp file:</p>
<div class="codecolorer-container asp mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br />8<br />9<br />10<br />11<br />12<br />13<br />14<br />15<br />16<br />17<br />18<br />19<br />20<br /></div></td><td><div class="asp codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #006600; font-weight: bold;">&lt;%</span><br />
<span style="color: #0000ff; font-weight: bold;">Function</span> RegExpTest<span style="color: #006600; font-weight:bold;">&#40;</span>patrn, strng<span style="color: #006600; font-weight:bold;">&#41;</span><br />
<span style="color: #990099; font-weight: bold;">Dim</span> regEx, retVal <span style="color: #008000;">' Declare the variables.</span><br />
<span style="color: #990099; font-weight: bold;">Set</span> regEx <span style="color: #006600; font-weight: bold;">=</span> <span style="color: #0000ff; font-weight: bold;">New</span> RegExp <span style="color: #008000;">' Create the regular expression object.</span><br />
regEx.<span style="color: #9900cc;">Pattern</span> <span style="color: #006600; font-weight: bold;">=</span> patrn <span style="color: #008000;">' Set the pattern.</span><br />
regEx.<span style="color: #9900cc;">IgnoreCase</span> <span style="color: #006600; font-weight: bold;">=</span> <span style="color: #0000ff; font-weight: bold;">False</span> <span style="color: #008000;">' Match case-sensitively.</span><br />
retVal <span style="color: #006600; font-weight: bold;">=</span> regEx.<span style="color: #9900cc;">Test</span><span style="color: #006600; font-weight:bold;">&#40;</span>strng<span style="color: #006600; font-weight:bold;">&#41;</span> <span style="color: #008000;">' Run the test against the string.</span><br />
<span style="color: #990099; font-weight: bold;">If</span> retVal <span style="color: #990099; font-weight: bold;">Then</span><br />
RegExpTest <span style="color: #006600; font-weight: bold;">=</span> <span style="color: #cc0000;">&quot;合法用户名。&quot;</span><br />
<span style="color: #990099; font-weight: bold;">Else</span><br />
RegExpTest <span style="color: #006600; font-weight: bold;">=</span> <span style="color: #cc0000;">&quot;非法用户名。&quot;</span><br />
<span style="color: #990099; font-weight: bold;">End</span> <span style="color: #990099; font-weight: bold;">If</span><br />
<span style="color: #990099; font-weight: bold;">End</span> <span style="color: #0000ff; font-weight: bold;">Function</span><br />
<br />
name<span style="color: #006600; font-weight: bold;">=</span><span style="color: #990099; font-weight: bold;">request</span>.<span style="color: #330066;">form</span><span style="color: #006600; font-weight:bold;">&#40;</span><span style="color: #cc0000;">&quot;name&quot;</span><span style="color: #006600; font-weight:bold;">&#41;</span><br />
psw<span style="color: #006600; font-weight: bold;">=</span><span style="color: #990099; font-weight: bold;">request</span>.<span style="color: #330066;">form</span><span style="color: #006600; font-weight:bold;">&#40;</span><span style="color: #cc0000;">&quot;psw&quot;</span><span style="color: #006600; font-weight:bold;">&#41;</span><br />
sex<span style="color: #006600; font-weight: bold;">=</span><span style="color: #990099; font-weight: bold;">request</span>.<span style="color: #330066;">form</span><span style="color: #006600; font-weight:bold;">&#40;</span><span style="color: #cc0000;">&quot;sex&quot;</span><span style="color: #006600; font-weight:bold;">&#41;</span><br />
city<span style="color: #006600; font-weight: bold;">=</span><span style="color: #990099; font-weight: bold;">request</span>.<span style="color: #330066;">form</span><span style="color: #006600; font-weight:bold;">&#40;</span><span style="color: #cc0000;">&quot;city&quot;</span><span style="color: #006600; font-weight:bold;">&#41;</span><br />
<span style="color: #990099; font-weight: bold;">Response</span>.<span style="color: #330066;">write</span> RegExpTest<span style="color: #006600; font-weight:bold;">&#40;</span><span style="color: #cc0000;">&quot;^[-_a-zA-Z0-9\u4e00-\u9fa5]{2,20}$&quot;</span>, name<span style="color: #006600; font-weight:bold;">&#41;</span><br />
<span style="color: #006600; font-weight: bold;">%&gt;</span></div></td></tr></tbody></table></div>
<p>The attached screenshot shows the page in action.<img style="max-width: 800px;" src="http://i3.6.cn/cvbnm/83/1a/5c/bc56d8b70e9fcc5f9565b47cc651def5.jpg" alt="" /></p>
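<p>For readers without an ASP environment, the same check can be sketched in JavaScript, whose regex syntax accepts the pattern unchanged. This is only an illustration of the pattern's behaviour, not part of the ASP solution; the function name isValidUsername and the sample names are assumptions.</p>

```javascript
// Same pattern as in verify.asp: hyphen, underscore, ASCII letters, digits,
// and the range \u4e00-\u9fa5, between 2 and 20 characters in total.
const usernamePattern = /^[-_a-zA-Z0-9\u4e00-\u9fa5]{2,20}$/;

// Hypothetical helper mirroring what RegExpTest decides.
function isValidUsername(name) {
  return usernamePattern.test(name);
}

const candidates = ["rex", "正则", "rex-正则_01", "a", "bad name", "a".repeat(21)];
for (const name of candidates) {
  console.log(JSON.stringify(name), isValidUsername(name));
}
```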
<p>In addition, here are some more <a href="http://iregex.org">regular expressions</a> for reference:</p>
<p>A <a href="http://iregex.org">regular expression</a> that matches Chinese characters:</p>
<div class="codecolorer-container asp mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="asp codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #006600; font-weight:bold;">&#91;</span>\u4e00-\u9fa5<span style="color: #006600; font-weight:bold;">&#93;</span></div></td></tr></tbody></table></div>
<p>A <a href="http://iregex.org">regular expression</a> that matches double-byte characters (Chinese characters included):</p>
<div class="codecolorer-container asp mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="asp codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #006600; font-weight:bold;">&#91;</span>^\x00-\xff<span style="color: #006600; font-weight:bold;">&#93;</span></div></td></tr></tbody></table></div>
<p>A <a href="http://iregex.org">regular expression</a> that matches blank lines:</p>
<div class="codecolorer-container asp mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="asp codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">\n<span style="color: #006600; font-weight:bold;">&#91;</span>\s<span style="color: #006600; font-weight: bold;">|</span> &nbsp; <span style="color: #006600; font-weight:bold;">&#93;</span><span style="color: #006600; font-weight: bold;">*</span>\r</div></td></tr></tbody></table></div>
<p>A <a href="http://iregex.org">regular expression</a> that matches HTML tags:</p>
<div class="codecolorer-container asp mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="asp codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #006600; font-weight: bold;">/</span> <span style="color: #006600; font-weight: bold;">&amp;</span>lt<span style="color: #006600; font-weight: bold;">;</span><span style="color: #006600; font-weight:bold;">&#40;</span>.<span style="color: #006600; font-weight: bold;">*</span><span style="color: #006600; font-weight:bold;">&#41;</span><span style="color: #006600; font-weight: bold;">&amp;</span>gt<span style="color: #006600; font-weight: bold;">;</span> .<span style="color: #006600; font-weight: bold;">*</span> <span style="color: #006600; font-weight: bold;">&amp;</span>lt<span style="color: #006600; font-weight: bold;">;</span>\<span style="color: #006600; font-weight: bold;">/</span>\<span style="color: #800000;">1</span><span style="color: #006600; font-weight: bold;">&amp;</span>gt<span style="color: #006600; font-weight: bold;">;</span> <span style="color: #006600; font-weight: bold;">|</span> <span style="color: #006600; font-weight: bold;">&amp;</span>lt<span style="color: #006600; font-weight: bold;">;</span><span style="color: #006600; font-weight:bold;">&#40;</span>.<span style="color: #006600; font-weight: bold;">*</span><span style="color: #006600; font-weight:bold;">&#41;</span> &nbsp; \<span style="color: #006600; font-weight: bold;">/&amp;</span>gt<span style="color: #006600; font-weight: bold;">;</span> <span style="color: #006600; font-weight: bold;">/</span></div></td></tr></tbody></table></div>
<p>A <a href="http://iregex.org">regular expression</a> that matches leading and trailing whitespace:</p>
<div class="codecolorer-container asp mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="asp codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #006600; font-weight:bold;">&#40;</span>^\s<span style="color: #006600; font-weight: bold;">*</span><span style="color: #006600; font-weight:bold;">&#41;</span><span style="color: #006600; font-weight: bold;">|</span><span style="color: #006600; font-weight:bold;">&#40;</span>\s<span style="color: #006600; font-weight: bold;">*</span>$<span style="color: #006600; font-weight:bold;">&#41;</span></div></td></tr></tbody></table></div>
<p>Using a <a href="http://iregex.org">regular expression</a> to restrict input to Chinese characters only:</p>
<div class="codecolorer-container asp mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="asp codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">onkeyup<span style="color: #006600; font-weight: bold;">=</span> <span style="color: #cc0000;">&quot;value=value.replace(/[^\u4E00-\u9FA5]/g, ' ') &quot;</span> &nbsp; onbeforepaste<span style="color: #006600; font-weight: bold;">=</span> <span style="color: #cc0000;">&quot;clipboardData.setData( 'text ',clipboardData.getData( 'text ').replace(/[^\u4E00-\u9FA5]/g, ' ')) &quot;</span></div></td></tr></tbody></table></div>
<p>Using a <a href="http://iregex.org">regular expression</a> to restrict input to full-width characters only:</p>
<div class="codecolorer-container asp mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="asp codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">onkeyup<span style="color: #006600; font-weight: bold;">=</span> <span style="color: #cc0000;">&quot;value=value.replace(/[^\uFF00-\uFFFF]/g, ' ') &quot;</span> &nbsp; onbeforepaste<span style="color: #006600; font-weight: bold;">=</span> <span style="color: #cc0000;">&quot;clipboardData.setData( 'text ',clipboardData.getData( 'text ').replace(/[^\uFF00-\uFFFF]/g, ' ')) &quot;</span></div></td></tr></tbody></table></div>
<p>Using a <a href="http://iregex.org">regular expression</a> to restrict input to digits only:</p>
<div class="codecolorer-container asp mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="asp codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">onkeyup<span style="color: #006600; font-weight: bold;">=</span> <span style="color: #cc0000;">&quot;value=value.replace(/[^\d]/g, ' ') &nbsp; &quot;</span>onbeforepaste<span style="color: #006600; font-weight: bold;">=</span> <span style="color: #cc0000;">&quot;clipboardData.setData( 'text ',clipboardData.getData( 'text ').replace(/[^\d]/g, ' ')) &quot;</span></div></td></tr></tbody></table></div>
<p>Using a <a href="http://iregex.org">regular expression</a> to restrict input to digits and English letters only:</p>
<div class="codecolorer-container asp mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="asp codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">onkeyup<span style="color: #006600; font-weight: bold;">=</span> <span style="color: #cc0000;">&quot;value=value.replace(/[\W]/g, ' ') &nbsp; &quot;</span>onbeforepaste<span style="color: #006600; font-weight: bold;">=</span> <span style="color: #cc0000;">&quot;clipboardData.setData( 'text ',clipboardData.getData( 'text ').replace(/[\W]/g, ' ')) &quot;</span></div></td></tr></tbody></table></div>
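<p>A few of the patterns listed above can be tried out directly in JavaScript. The sketch below is a hedged illustration with made-up input strings; note that it replaces with an empty string, which is a choice made for this sketch rather than what the handlers above do.</p>

```javascript
// Trim leading and trailing whitespace with the pattern from the list above.
const trimmed = "  hello world  ".replace(/(^\s*)|(\s*$)/g, "");

// Collect the Chinese characters found in a mixed string.
const chinese = "regex 正则 abc".match(/[\u4e00-\u9fa5]/g);

// Keep only digits by removing everything that is not a digit.
const digitsOnly = "a1b2c3".replace(/[^\d]/g, "");

// Drop every character outside the range \x00-\xff.
const singleByteOnly = "abc正则".replace(/[^\x00-\xff]/g, "");

console.log(JSON.stringify(trimmed), chinese, digitsOnly, singleByteOnly);
```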
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/regular-expressions-to-match-chinese-username-in-asp.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>[Old post, tidied up] How to extract word roots from English sentences with a regular expression</title>
		<link>http://iregex.org/blog/%e8%80%81%e8%b4%b4%e6%95%b4%e7%90%86%e5%a6%82%e4%bd%95%e4%bd%bf%e7%94%a8%e6%ad%a3%e5%88%99%e5%bc%8f%e4%bb%8e%e8%8b%b1%e6%96%87%e5%8f%a5%e5%ad%90%e9%87%8c%e6%8f%90%e5%8f%96%e8%af%8d%e6%a0%b9.html</link>
		<comments>http://iregex.org/blog/%e8%80%81%e8%b4%b4%e6%95%b4%e7%90%86%e5%a6%82%e4%bd%95%e4%bd%bf%e7%94%a8%e6%ad%a3%e5%88%99%e5%bc%8f%e4%bb%8e%e8%8b%b1%e6%96%87%e5%8f%a5%e5%ad%90%e9%87%8c%e6%8f%90%e5%8f%96%e8%af%8d%e6%a0%b9.html#comments</comments>
		<pubDate>Fri, 25 Apr 2008 08:07:09 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[问答]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[root]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=3</guid>
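		<description><![CDATA[I once answered this question on chinaunix, and it called for regular expressions (in fact, I think a regex is the most suitable tool for this kind of problem). I have some example sentences for learning English; each sentence contains several words sharing the same root, for example She swears to wear the pearls that appear ... ]]></description>
			<content:encoded><![CDATA[<p>I once answered <a href="http://bbs.chinaunix.net/viewthread.php?tid=1021624" target="_blank">this question</a> on chinaunix, and it called for regular expressions (in fact, I think a regex is the most suitable tool for this kind of problem).</p>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>I have some example sentences for learning English. Each sentence contains several words sharing the same root, for example: She swears to wear the pearls that appear to be pears. The root is not necessarily the same from sentence to sentence. I would like to mark every word that contains the root. How should I write this?</p>
<p>The <strong>root</strong> here is not a root in the strict linguistic sense, just a sequence of letters, for example</p>
<p>9. The dust in the industrial zone frustrated the industrious man.</p>
<p>The root is dust or ust</p>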
<p>10. The just budget judge just justifies the adjustment of justice.</p>
<p>The root is just</p>
<p>11. I used to abuse the unusual usage, but now I&#8217;m not used to doing so.</p>
<p>The root is use, with inflected forms</p>
<p>12. The lace placed in the palace is replaced first, and displaced later.</p>
<p>The root is lace</p>
<p>13. I paced in the peaceful spacecraft.</p>
<p>The root is pace</p>
<p>14. Sir, your bird stirred my girlfriend&#8217;s birthday party.</p>
<p>The root is ir</p></blockquote>
<p>If this problem interests you, please think it over on your own before reading the solution offered here.</p>
<p><span id="more-5"></span></p>
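<p>My idea: since every sentence has the same structure, a loop takes care of all of them, so it is enough to analyze a single sentence. Within a sentence, each word has to be analyzed in turn.</p>
<p>Let us dissect the first sentence.</p>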
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>She swears to wear the pearls that appear to be pears.</p></blockquote>
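<p>The human eye sees at once that ear is the root. But, as with the deceptively deep equation 2+2=? in <i>1984</i>, the real question is how to prove the answer.</p>
<p>I imagine myself as a regex robot. I can read the text one sentence at a time (Perl syntax: while(&lt;FILE&gt;)) and then take each word for analysis (Perl syntax: \w+ matches a word). For every run of N consecutive letters in a word (N is at least 3 and at most the length of the word; call the run $matchstr), I count how often it occurs in the whole sentence and store this "root" together with its count in a hash. The hash is convenient here because a root that has no entry yet gets one created automatically and incremented to 1.</p>
<p>The approach is as follows.</p>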
<ol>
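<li>For each line</li>
<li>For each word</li>
<li>For every run of 3 to N consecutive letters in the word, count its frequency M in the line and record it in the hash.</li>
<li>Sort the hash by value, take the largest entry, and print it.</li>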
</ol>
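<p>Before turning to the Perl script, the steps above can be sketched in a few lines of JavaScript. This is not the script from the original answer, only an illustration under stated assumptions: the function name findCommonRoot is made up, "frequency" is taken to mean the number of words in the sentence that contain the candidate, and ties are broken in favour of the longer candidate.</p>

```javascript
// Hypothetical sketch: find the letter run shared by the most words in a sentence.
function findCommonRoot(sentence, minLength = 3) {
  // Lowercase the sentence and split it into words (runs of letters).
  const words = sentence.toLowerCase().match(/[a-z]+/g) || [];
  const counts = new Map();

  for (const word of words) {
    // Try every run of minLength..word.length consecutive letters.
    for (let size = word.length; size >= minLength; size -= 1) {
      for (let start = word.length - size; start >= 0; start -= 1) {
        const candidate = word.slice(start, start + size);
        if (counts.has(candidate)) continue; // already counted
        // Assumption: frequency = number of words containing the candidate.
        const hits = words.filter((w) => w.includes(candidate)).length;
        counts.set(candidate, hits);
      }
    }
  }

  // Pick the most frequent candidate; on a tie prefer the longer one.
  let best = null;
  for (const [candidate, hits] of counts) {
    const better =
      best === null ||
      hits > best.hits ||
      (hits === best.hits && candidate.length > best.root.length);
    if (better) best = { root: candidate, hits };
  }
  return best;
}

const answer = findCommonRoot("She swears to wear the pearls that appear to be pears.");
console.log(answer);
```

<p>For the first example sentence this reports ear, found in five words. For other sentences the most frequent run is not always the root a human would name, so the result should be treated as a candidate rather than a proof.</p>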
<div class="codecolorer-container perl mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;height:300px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br />2<br />3<br />4<br />5<br />6<br />7<br />8<br />9<br />10<br />11<br />12<br />13<br />14<br />15<br />16<br />17<br />18<br />19<br />20<br />21<br />22<br />23<br />24<br />25<br />26<br />27<br />28<br />29<br />30<br />31<br />32<br />33<br />34<br />35<br />36<br />37<br />38<br />39<br />40<br />41<br />42<br />43<br />44<br />45<br /></div></td><td><div class="perl codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #666666; font-style: italic;">#!/usr/bin/perl -w</span><br />
<span style="color: #0000ff;">$/</span> <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;.<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span><br />
<br />
<span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&lt;&gt;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><br />
<span style="color: #b1b100;">my</span> <span style="color: #0000ff;">@array</span><span style="color: #339933;">=</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
<span style="color: #b1b100;">my</span> <span style="color: #0000ff;">%myhash</span><span style="color: #339933;">=</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
<span style="color: #000066;">print</span> <span style="color: #ff0000;">&quot;---------------------------<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span><br />
<span style="color: #000066;">print</span> <span style="color: #ff0000;">&quot;the line is :$_<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span><br />
<span style="color: #b1b100;">while</span><span style="color: #009900;">&#40;</span><span style="color: #009966; font-style: italic;">/^\w+/</span><span style="color: #009900;">&#41;</span><br />
<span style="color: #009900;">&#123;</span><br />
<span style="color: #009966; font-style: italic;">s/^(\w+)\W+(.*)$/$2/</span><span style="color: #339933;">;</span><br />
<span style="color: #000066;">push</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">@array</span><span style="color: #339933;">,</span><span style="color: #000066;">lc</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">$1</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> &nbsp; &nbsp;<span style="color: #666666; font-style: italic;">#save all the words(in lower case format) into array.</span><br />
<span style="color: #009900;">&#125;</span><br />
<span style="color: #0000ff;">@b</span><span style="color: #339933;">=</span><span style="color: #0000ff;">@array</span><span style="color: #339933;">;</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #666666; font-style: italic;">#copy the array to @b; each word is compared against the words in @b</span><br />
<br />
<span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$len</span><span style="color: #339933;">;</span><br />
<br />
<span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$matchlen</span><span style="color: #339933;">;</span><br />
<br />
<span style="color: #b1b100;">foreach</span> <span style="color: #0000ff;">$item</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">@array</span><span style="color: #009900;">&#41;</span><br />
<br />
<span style="color: #009900;">&#123;</span><br />
<span style="color: #0000ff;">$len</span><span style="color: #339933;">=</span><span style="color: #000066;">length</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">$item</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
<span style="color: #b1b100;">for</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">$matchlen</span><span style="color: #339933;">=</span><span style="color: #0000ff;">$len</span><span style="color: #339933;">;</span><span style="color: #0000ff;">$matchlen</span><span style="color: #0000ff;">&amp;gt</span><span style="color: #339933;">;=</span><span style="color: #cc66cc;">3</span><span style="color: #339933;">;</span><span style="color: #0000ff;">$matchlen</span><span style="color: #339933;">--</span><span style="color: #009900;">&#41;</span><br />
<span style="color: #009900;">&#123;</span><br />
<span style="color: #b1b100;">for</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">$i</span><span style="color: #339933;">=</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span><span style="color: #0000ff;">$i</span><span style="color: #0000ff;">&amp;lt</span><span style="color: #339933;">;=</span><span style="color: #0000ff;">$len</span><span style="color: #339933;">-</span><span style="color: #0000ff;">$matchlen</span><span style="color: #339933;">;</span><span style="color: #0000ff;">$i</span><span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span><br />
<span style="color: #009900;">&#123;</span><br />
<span style="color: #0000ff;">$matchstr</span><span style="color: #339933;">=</span><span style="color: #000066;">substr</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">$item</span><span style="color: #339933;">,</span><span style="color: #0000ff;">$i</span><span style="color: #339933;">,</span><span style="color: #0000ff;">$matchlen</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> &nbsp;<span style="color: #666666; font-style: italic;">#the candidate substring to search for in the other words</span><br />
<span style="color: #b1b100;">foreach</span> <span style="color: #0000ff;">$pig</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">@b</span><span style="color: #009900;">&#41;</span><br />
<span style="color: #009900;">&#123;</span><br />
<span style="color: #b1b100;">next</span> <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">$item</span> <span style="color: #b1b100;">eq</span> <span style="color: #0000ff;">$pig</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #666666; font-style: italic;">#a word must not be matched against itself</span><br />
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">$pig</span> <span style="color: #339933;">=~</span> <span style="color: #009966; font-style: italic;">/$matchstr/</span><span style="color: #009900;">&#41;</span><br />
<span style="color: #009900;">&#123;</span><br />
<span style="color: #0000ff;">$myhash</span><span style="color: #009900;">&#123;</span><span style="color: #0000ff;">$matchstr</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">++;</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #666666; font-style: italic;">#on a match, increment the count for this substring</span><br />
<span style="color: #009900;">&#125;</span><br />
<span style="color: #009900;">&#125;</span><br />
<span style="color: #009900;">&#125;</span><br />
<span style="color: #009900;">&#125;</span><br />
<span style="color: #009900;">&#125;</span><br />
<span style="color: #b1b100;">foreach</span> <span style="color: #009900;">&#40;</span><span style="color: #000066;">keys</span> <span style="color: #0000ff;">%myhash</span><span style="color: #009900;">&#41;</span><br />
<span style="color: #009900;">&#123;</span><br />
<span style="color: #000066;">print</span> <span style="color: #ff0000;">&quot;$_:$myhash{$_};<span style="color: #000099; font-weight: bold;">\t</span>&quot;</span><span style="color: #339933;">;</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #666666; font-style: italic;">#print all the successful match records.</span><br />
<span style="color: #009900;">&#125;</span><br />
<span style="color: #000066;">print</span> <span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span><br />
<span style="color: #009900;">&#125;</span></div></td></tr></tbody></table></div>
<p>Note: this site uses the WP-CODEBOX plugin; you can follow <a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank">the format shown here</a> to include code in your comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/%e8%80%81%e8%b4%b4%e6%95%b4%e7%90%86%e5%a6%82%e4%bd%95%e4%bd%bf%e7%94%a8%e6%ad%a3%e5%88%99%e5%bc%8f%e4%bb%8e%e8%8b%b1%e6%96%87%e5%8f%a5%e5%ad%90%e9%87%8c%e6%8f%90%e5%8f%96%e8%af%8d%e6%a0%b9.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
