<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>我爱正则表达式 &#187; firefox</title>
	<atom:link href="http://iregex.org/blog/tag/firefox/feed" rel="self" type="application/rss+xml" />
	<link>http://iregex.org</link>
	<description>Original, translated, and reposted articles about regular expressions</description>
	<lastBuildDate>Sun, 27 Jun 2010 04:20:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/><atom:link rel="hub" href="http://www.feedsky.com/api/RPC2"/><atom:link rel="hub" href="http://blogsearch.google.com/ping/RPC2"/><atom:link rel="hub" href="http://blog.yodao.com/ping/RPC2"/><atom:link rel="hub" href="http://www.xianguo.com/xmlrpc/ping.php"/><atom:link rel="hub" href="http://www.zhuaxia.com/rpc/server.php"/><atom:link rel="hub" href="http://rpc.technorati.com/rpc/ping"/><atom:link rel="hub" href="http://rpc.pingomatic.com/"/>	
	<item>
		<title>Parsing Fanfou Messages: From minidom to XPath</title>
		<link>http://iregex.org/blog/fanfou-message-extractor-from-minidom-to-xpath.html</link>
		<comments>http://iregex.org/blog/fanfou-message-extractor-from-minidom-to-xpath.html#comments</comments>
		<pubDate>Tue, 14 Oct 2008 10:00:58 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[Tutorial]]></category>
		<category><![CDATA[fanfou]]></category>
		<category><![CDATA[firefox]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xpath]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=35</guid>
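		<description><![CDATA[Casting a brick to attract jade: why not XPath, and what is XPath? I recently picked up an old pet project again. After I published my previous post on improving it, a reply from "that someone" caught my interest. He asked, "Why not use XPath?" What on earth is XPath? I asked back. Before... ]]></description>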
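			<content:encoded><![CDATA[<h2 style="background-color:#99CC00; font-size:14px; padding-bottom:3px; padding-left:10px; padding-top:3px;  line-height:1.5em; margin:1.5em 0 1em;">Casting a Brick to Attract Jade: Why Not XPath, and What Is XPath?</h2>
<p>I recently picked up an old pet project again. After I published <a href="http://iregex.org/blog/fanfou-message-extractor-regex-vs-xml.html">my previous post</a> on improving it, a reply from &#8220;that someone&#8221; caught my interest. He asked, &#8220;Why not use XPath?&#8221;</p>
<p>What on earth is XPath? I asked back. Before asking, of course, I googled it first, so as not to... well, you know.<br />
<span id="more-35"></span><br />
The first thing that came up was the <a href="http://www.w3c.org/TR/xpath">W3C</a> page, which introduces XPath as follows:</p>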
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>XPath is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer. </p></blockquote>
<p>Rendered literally in Chinese, that is:</p>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>XPath 是一种语言，用于在XML文档中定位各部分内容，可由XSLT或XPointer调用。</p></blockquote>
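<p>I also found an <a href="http://www.zvon.org/xxl/XPathTutorial/General/examples.html">XPath tutorial</a>. I skimmed it but did not pay much attention at the time.</p>
<p>Even so, Python's minidom module can do the same job. Why insist on XPath? All the more so because XPath needs an extra install in Python, whereas minidom ships with the standard library and runs anywhere.</p>
<p>I talked it over with that someone again, and his verdict was still &#8220;strongly recommended&#8221;. He also advised me to read the <a href="http://www.zvon.org/xxl/XPathTutorial/General/examples.html">tutorial</a> closely and to use the <a href="https://addons.mozilla.org/zh-CN/firefox/addon/1095">XPath Checker</a> add-on in Firefox.</p>
<p>So I did.</p>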
<h2 style="background-color:#99CC00; font-size:14px; padding-bottom:3px; padding-left:10px; padding-top:3px;  line-height:1.5em; margin:1.5em 0 1em;">A Freshly Honed Blade: Its Sharpness Shows at Once</h2>
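<p>One try of XPath Checker and I was stunned. After selecting some text on a web page, choose &#8220;View XPath&#8221; from the context menu, and the XPath of that node is displayed immediately, clearly layered and precisely located. The only catch was that I did not yet understand the syntax. So I read the tutorial closely, learning as I went; half an hour later I could already apply it to the Fanfou message scraper I had written earlier. Writing the code was still a bit of a struggle, but the approach was clear, and I no longer got bogged down in details.</p>
<p>That someone also pointed out that ordinary HTML documents are usually not well-formed XML, so it is best to tidy them up before parsing them with XPath.</p>
<p>I had noticed this problem too. The useful content in a Fanfou HTML page is only a small part of the whole; the rest just slows things down and makes extraction harder.</p>
<p>After some experiments, I revised the original code as follows:</p>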
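<p>1. Still use the minidom module to download and parse the document, keeping only the part between &lt;ol&gt; and &lt;/ol&gt;. This part is saved as a string for later use. Keeping only what is needed makes the structure clear and shallow.</p>
<p>2. Use XPath to parse the string extracted in the previous step.</p>
<p>By now, /, //, @, [], = and the rest have each gone from meaningless to helpful. Every symbol has found its place in my toolbox, ready to use whenever I need it. I have become a beneficiary of XPath, and only now do I see how interesting and useful it is to learn.</p>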
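<p>The two steps and the path symbols mentioned above can be sketched with nothing but the Python standard library. This is a minimal sketch under assumptions, not the program from this post: it swaps in xml.etree.ElementTree, which supports only a subset of XPath, for the 4Suite API, and the sample markup (class names, URLs, text) is invented for illustration.</p>

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a downloaded page.
PAGE = """<html><body>
<div id="header">navigation, ads, and so on</div>
<ol>
  <li><a class="time" href="/statuses/abc" title="2008-10-14 10:00">1 hour ago</a>
      <span class="method">via web</span></li>
  <li><a class="time" href="/statuses/def" title="2008-10-14 09:00">2 hours ago</a>
      <span class="method">via sms</span></li>
</ol>
</body></html>"""

# Step 1: minidom isolates the first <ol> and serializes it to a string.
ol_node = minidom.parseString(PAGE).getElementsByTagName("ol")[0]
ol_xml = ol_node.toxml()

# Step 2: parse only that fragment and address parts of it by path.
ol = ET.fromstring(ol_xml)

# "/" steps to direct children.
items = ol.findall("./li")
# "//" searches all descendants; "[@class='...']" filters on an attribute value.
methods = [span.text for span in ol.findall(".//span[@class='method']")]
# "[2]" selects by position, counting from 1.
second_link = ol.find("./li[2]/a")
# ElementTree has no trailing "/@href" step, so read the attribute from the element.
href = second_link.get("href")

print(len(items), methods, href)
```

<p>Parsing only the extracted fragment keeps the paths short: every expression starts at the ol element instead of at the root of the whole page.</p>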
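<p>There is still one small problem that I cannot solve with pure XPath syntax. It goes like this:</p>
<p>XPath, as I am using it, only yields the text content; it cannot swallow a node &#8220;whole&#8221;, markup and all. For example:</p>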
<div class="codecolorer-container xml mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="xml codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;li<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
&nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;a</span> <span style="color: #000066;">href</span>=<span style="color: #ff0000;">'http://a.com'</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>hello world<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/a<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><br />
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/li<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></div></td></tr></tbody></table></div>
<p>In View XPath, the expression /li/a returns</p>
<div class="codecolorer-container xml mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="xml codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;a</span> <span style="color: #000066;">href</span>=<span style="color: #ff0000;">'http://a.com'</span><span style="color: #000000; font-weight: bold;">&gt;</span></span>hello world<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/a<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></div></td></tr></tbody></table></div>
<p>that is, the entire element;</p>
<p>but in Python, using</p>
<div class="codecolorer-container python mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">method=doc.<span style="color: black;">xpath</span><span style="color: black;">&#40;</span>u<span style="color: #483d8b;">''</span><span style="color: #483d8b;">'string(/li/a)'</span><span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span></div></td></tr></tbody></table></div>
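<p>I only get hello world. XPath has wiped out everything inside the &lt;&gt;, which looked very odd to me at first. It is in fact what the XPath string() function is defined to do: it returns the node's string-value, the concatenated text of its descendants, with no markup.</p>
<p>Admittedly, /li/a/@href does return &#8216;http://a.com&#8217;.</p>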
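<p>When this happens and I want the whole message, I fall back on list.childNodes[index-1].firstChild.toxml()[22:-7] as a workaround. That said, I think the earlier doc = Parse(str(list.toxml())) works rather well and is a small &#8220;invention&#8221; of my own, so using the traditional XML parsing approach once more in the program is nothing to be ashamed of. Of course, it would be best if all of this could be handled within XPath.</p>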
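<p>The difference between an element's text and its full markup can be seen in a small standard-library sketch. It uses xml.etree.ElementTree rather than the 4Suite API used in this post, so the calls are stand-ins and not the post's own method: itertext() plays the role of string(), and ET.tostring() serializes the whole node.</p>

```python
import xml.etree.ElementTree as ET

li = ET.fromstring("<li><a href='http://a.com'>hello world</a></li>")
a = li.find("./a")

# Like XPath string(): the text of the node and its descendants, no tags.
text_only = "".join(a.itertext())
# Like /li/a/@href: the attribute value.
href = a.get("href")
# The element serialized together with its markup.
whole = ET.tostring(a, encoding="unicode")

print(text_only)
print(href)
print(whole)
```

<p>Serializing the selected node is a separate operation from evaluating a path, which is why a path expression alone never returns the tags.</p>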
<p>After a bit of patching and polishing, the final Fanfou message program looks like this (core code only):</p>
<div class="codecolorer-container python mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;height:300px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> __getMsgByPage__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>,page<span style="color: black;">&#41;</span>:<br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; url=<span style="color: #483d8b;">&quot;http://fanfou.com/&quot;</span>+<span style="color: #008000;">self</span>.<span style="color: #dc143c;">user</span>+<span style="color: #483d8b;">&quot;/p.&quot;</span>+<span style="color: #008000;">str</span><span style="color: black;">&#40;</span>page<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; node = minidom.<span style="color: black;">parse</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">urllib2</span>.<span style="color: black;">urlopen</span><span style="color: black;">&#40;</span>url<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">list</span> = node.<span style="color: black;">getElementsByTagName</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;ol&quot;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; doc = Parse<span style="color: black;">&#40;</span><span style="color: #008000;">str</span><span style="color: black;">&#40;</span><span style="color: #008000;">list</span>.<span style="color: black;">toxml</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; cu=<span style="color: #008000;">self</span>.<span style="color: black;">sql</span>.<span style="color: black;">cursor</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;"># count() yields an XPath number; XPath positions start at 1, so the loop below runs over 1..count.</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">max</span>=doc.<span style="color: black;">xpath</span><span style="color: black;">&#40;</span>u<span style="color: #483d8b;">''</span><span style="color: #483d8b;">'count(/ol/li)'</span><span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">max</span>=<span style="color: #008000;">int</span><span style="color: black;">&#40;</span><span style="color: #008000;">max</span><span style="color: black;">&#41;</span>+<span style="color: #ff4500;">1</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">max</span>==<span style="color: #ff4500;">1</span>: &nbsp;<span style="color: #808080; font-style: italic;"># no messages on this page</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff4500;">0</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">for</span> index <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>,<span style="color: #008000;">max</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; method=doc.<span style="color: black;">xpath</span><span style="color: black;">&#40;</span>u<span style="color: #483d8b;">''</span><span style="color: #483d8b;">'string(/ol/li[%d]//span[@class='</span>method<span style="color: #483d8b;">'])'</span><span style="color: #483d8b;">''</span> <span style="color: #66cc66;">%</span> index<span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span>:<span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; method=method.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">' '</span>,<span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span><br />
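&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">if</span> method==<span style="color: #483d8b;">&quot;彩信&quot;</span>: &nbsp;<span style="color: #808080; font-style: italic;"># &quot;彩信&quot; means MMS, i.e. a photo message</span><br />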
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #dc143c;">time</span>=doc.<span style="color: black;">xpath</span><span style="color: black;">&#40;</span>u<span style="color: #483d8b;">''</span><span style="color: #483d8b;">'string(/ol/li[%d]//span[@class=&quot;time&quot;]/@title)'</span><span style="color: #483d8b;">''</span>\<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">%</span> index<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; uuid=doc.<span style="color: black;">xpath</span><span style="color: black;">&#40;</span>u<span style="color: #483d8b;">''</span><span style="color: #483d8b;">'string(/ol/li[%d]//a[@class='</span>photo<span style="color: #483d8b;">']/@href)'</span><span style="color: #483d8b;">''</span>\<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">%</span> index<span style="color: black;">&#41;</span> <span style="color: black;">&#91;</span>-<span style="color: #ff4500;">11</span>:<span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #dc143c;">time</span>=doc.<span style="color: black;">xpath</span><span style="color: black;">&#40;</span>u<span style="color: #483d8b;">''</span><span style="color: #483d8b;">'string(/ol/li[%d]//a[@class='</span><span style="color: #dc143c;">time</span><span style="color: #483d8b;">']/@title)'</span><span style="color: #483d8b;">''</span>\<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #66cc66;">%</span> index<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; uuid=doc.<span style="color: black;">xpath</span><span style="color: black;">&#40;</span>u<span style="color: #483d8b;">''</span><span style="color: #483d8b;">'string(/ol/li[%d]//a[@class='</span><span style="color: #dc143c;">time</span><span style="color: #483d8b;">']/@href)'</span><span style="color: #483d8b;">''</span> <span style="color: #66cc66;">%</span> index<span style="color: black;">&#41;</span><span style="color: black;">&#91;</span>-<span style="color: #ff4500;">11</span>:<span style="color: black;">&#93;</span><br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; content = <span style="color: #008000;">list</span>.<span style="color: black;">childNodes</span><span style="color: black;">&#91;</span>index-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>.<span style="color: black;">firstChild</span>.<span style="color: black;">toxml</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">22</span>:-<span style="color: #ff4500;">7</span><span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;"># content, uuid, time, method are now available for further use.</span></div></td></tr></tbody></table></div>
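<p>The loop above counts the items and then addresses each one by position. One possible alternative, sketched here with the standard library's ElementTree (a swapped-in technique, not the 4Suite code above), iterates over the li elements directly, so there is no XPath position and childNodes index to keep in step. The markup, the extract_messages helper, and the choice to detect photo messages by the presence of the photo link rather than by the method text are all assumptions made for illustration.</p>

```python
import xml.etree.ElementTree as ET

# Invented sample: the first item is an ordinary message, the second a photo message.
OL = """<ol>
  <li><span class="content">first</span>
      <a class="time" href="/statuses/abcdefghijk" title="2008-10-14 10:00">1h</a>
      <span class="method">via web</span></li>
  <li><a class="photo" href="/photo/lmnopqrstuv">img</a>
      <span class="time" title="2008-10-14 09:00">2h</span>
      <span class="method">via mms</span></li>
</ol>"""


def extract_messages(ol_xml):
    """Return one dict per <li>; hypothetical helper, not from the original program."""
    messages = []
    for li in ET.fromstring(ol_xml).findall("./li"):
        photo = li.find(".//a[@class='photo']")
        if photo is not None:
            # Photo message: the time sits on a <span>, the id on the photo link.
            time_node = li.find(".//span[@class='time']")
            link = photo
        else:
            # Ordinary message: one <a class="time"> carries both time and id.
            time_node = li.find(".//a[@class='time']")
            link = time_node
        messages.append({
            "method": li.findtext(".//span[@class='method']", default="").strip(),
            "time": time_node.get("title"),
            "uuid": link.get("href")[-11:],  # same slice as the original code
        })
    return messages


for message in extract_messages(OL):
    print(message)
```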
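<p>The key code is only a few lines, which spares me the long-winded coding of the earlier version. Performance is decent too: I batch-downloaded nearly 3,000 of my own Fanfou messages, over 150 pages in all, in 86 seconds. The Fanfou server was kind enough not to block me midway.</p>
<p><strong>To sum up</strong>: XPath is well suited to locating parts of an XML document. It is precise and highly descriptive, a sharp search tool for XML. If you parse XML often, give it a try.</p>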
<h2 style="background-color:#99CC00; font-size:14px; padding-bottom:3px; padding-left:10px; padding-top:3px;  line-height:1.5em; margin:1.5em 0 1em;">Personal Reflections</h2>
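<p>Going from hand-written regular expressions, to minidom, and then to XPath may look like a detour, but I gained a lot along the way. Moving from doing everything myself and controlling every detail precisely (working with my hands), to letting a tool handle part of the job (hands and brain together), to handing the whole job to the right tool (working with my brain) seems to be a healthy path of development. I can say with some pride that, having done the parsing with nothing but regular expressions, I can attack or fall back as needed even when the available tools do not suit me. I know the details of parsing, so existing tools (just pretty wrappers, after all) cannot fool me: however nicely packaged, the pure-Python XML library code I once read still had regular expressions working as the engine underneath (thank you, open source). Moving from chasing an implementation (it works!) to chasing an excellent one (the excellent solution) is also a natural step forward. I am not saying that using regular expressions is low-level. I have never said anything of the kind, about regular expressions or about the people who use them; in fact, regular expressions have always been the flying blade in my case, and I love regular expressions! I am only saying that different tools serve differently in the right settings. It is not enough to know a tool's weaknesses so as to avoid them; it is more important to know its strengths so as to play to them. Only then can you deploy your forces with ease, with no tool at hand going unused.</p>
<p>Related links:</p>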
<ul>
<li><a href="http://www.w3.org/TR/xpath">W3C's introduction to XPath</a></li>
<li><a href="http://www.zvon.org/xxl/XPathTutorial/General/examples.html">XPath tutorial</a>; a Chinese version is available, well illustrated and easy to follow.</li>
<li><a href="http://4suite.org">4suite</a>, an XPath toolkit for Python</li>
<li><a href="http://search.cpan.org/~samtregar/Class-XPath-1.4/XPath.pm">Perl has XPath too, actually</a>. Not tested.</li>
<li><a href="https://addons.mozilla.org/zh-CN/firefox/addon/1095">XPath Checker</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/fanfou-message-extractor-from-minidom-to-xpath.html/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Giving the Fox Wings: /Find Bar/, a Regular Expression Search Add-on for Firefox 3</title>
		<link>http://iregex.org/blog/find-bar-for-firefox-and-thunderbird-as-regular-expression-searching-engine.html</link>
		<comments>http://iregex.org/blog/find-bar-for-firefox-and-thunderbird-as-regular-expression-searching-engine.html#comments</comments>
		<pubDate>Mon, 28 Jul 2008 02:18:57 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[firefox]]></category>
		<category><![CDATA[plugin]]></category>
		<category><![CDATA[thunderbird]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=20</guid>
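		<description><![CDATA[I used to use an add-on called FindBar in Firefox 2.x, and it was excellent. After I upgraded to Firefox 3.0, it went into hibernation. Today I finally saw an update and can keep using it in Firefox; I am thrilled and am writing this post to mark the occasion. Below is a loose translation of the author's description. The add-on's... ]]></description>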
			<content:encoded><![CDATA[<p>I used to use an add-on called FindBar in Firefox 2.x, and it was excellent. After I upgraded to Firefox 3.0, it went into hibernation. Today I finally saw an update and can keep using it in Firefox; I am thrilled and am writing this post to mark the occasion.<br />
<span id="more-20"></span><br />
The add-on author's description follows, each passage paired with my loose Chinese translation. The add-on's home page and the original text are <a href="http://www.oxymoronical.com/web/firefox/FindBarRX" target="_blank">here</a>.</p>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<a href="http://iregex.org" target="_blank">regular expression</a> divider&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;</p>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">/Find Bar/</h3>
<p>/Find Bar/ is a new extension for Firefox that is still a little bit in progress. I&#8217;ve always believed that one of the best features of Firefox is it&#8217;s quick find bar. I probably use it about 20 times a day, if not more. But it has to be said its a fairly simple beast. When it comes to more powerful searches you&#8217;re just out of luck. This extension adds a whole new dimension to the find bar, regular expressions. The regular expressions are implemented using the JavaScript engine so check the JavaScript RegExp syntax for the full details.</p>
<p>/Find Bar/是FireFox的新插件，目前仍在完善中。对于firefox，我一直觉得它方便快捷的搜索栏是其亮点之一。我每天使用该功能20次以上。但是必须承认，其搜索功能太过薄弱。当你要搜索更复杂的内容时，它就无能为力了。</p>
<p>本插件为搜索栏添加了新选项：<a href="http://iregex.org" target="_blank">正则表达式</a>。它使用了JavaScript正则式引擎。请查询<a href="http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Guide:Regular_Expressions" target="_blank">JavaScript RegExp文档</a>来了解语法细节内容。</p>
<p>Two issues remain to be resolved:</p>
<ul>
<li> Whitespace. HTML is made up a lot of this, most of it is ignored by the browser and not visible on the page. At the moment this extension doesn’t ignore anything so you may find there are more spaces between words than you expected. Should I err on the side of accuracy as it is now, or collapse all whitespace?<br />
空白符（水平制表符、空格等）。HTML中包含了许多空白符，其中一大部分被浏览器忽略掉，不在页面上显示出来。目前该插件没有忽略任何字符，因此你会发现，单词之间往往有更多的空白符。我是该精确地显示出每个空白字符呢，还是该将其全部压缩？</li>
<li> Block content. The standard find bar wont find searches that span blocks (paragraphs in human terms). This extension does, which while potentially useful also causes some issues. One thought is to make each paragraph like a line then you can use line breaks to match paragraphs as you might expect.<br />
区块内容。标准的搜索栏不会跨区块（亦即段落）搜索，而本插件搜索时却会跨越区块，这既是便利之处，但是也有可能带来副作用。一种思路是将每一段落处理为一个文本行，这样您就可以使用换行符来匹配段落。</li>
</ul>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Compatibility</h3>
<p style="text-align: center; font-weight: bold; font-size: 107%;">
<table border="1" cellspacing="1" cellpadding="2" width="281">
<tbody>
<tr>
<td width="101" valign="top">FireFox</td>
<td width="52" valign="top"><img style="vertical-align: middle;" title="Firefox" src="http://www.oxymoronical.com/shared/images/firefox.png" alt="Firefox" /></td>
<td width="122" valign="top">2.0b1 &#8211; 3.1a2pre</td>
</tr>
<tr>
<td width="103" valign="top">ThunderBird</td>
<td width="54" valign="top"><img style="vertical-align: middle;" title="Thunderbird" src="http://www.oxymoronical.com/shared/images/thunderbird.png" alt="Thunderbird" /></td>
<td width="122" valign="top">2.0 &#8211; 3.0a1</td>
</tr>
</tbody>
</table>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Download</h3>
<p style="text-align: center; font-weight: bold; font-size: 107%;"><a href="https://addons.mozilla.org/downloads/file/34434/_find_bar_-1.0.1-fx+tb.xpi"><img style="vertical-align: middle;" src="https://addons.mozilla.org/img/addon-icn.png" alt="/Find Bar/" /></a> <a href="https://addons.mozilla.org/downloads/file/34434/_find_bar_-1.0.1-fx+tb.xpi">Install v1.0.1</a></p>
<p style="text-align: center; font-size: 84%;">(Installs from <a href="https://addons.mozilla.org/addon/6534">addons.mozilla.org</a>)</p>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<a href="http://iregex.org" target="_blank">regular expression</a> divider&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;</p>
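<p>I do not think searching across blocks is a big problem. As for whitespace, my solution is this: whenever I search with a <a href="http://iregex.org" target="_blank">regular expression</a>, I write \s+ wherever I mean a space. In terms of kind, it stands for any whitespace character; in terms of quantity, it stands for at least one, with no upper limit. That way no real match is wrongly rejected just because the whitespace in the source differs from what is shown on the page.</p>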
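<p>The effect of \s+ can be illustrated with a short sketch. The add-on itself uses the JavaScript regular expression engine; Python's re module is used here only as a stand-in, on the assumption that \s treats spaces, tabs, and newlines the same way in both. The sample text is invented.</p>

```python
import re

# Text as it might appear in page source: a newline plus indentation, and a tab,
# where the rendered page would show single spaces.
source = "regular\n    expression search\tengine"

# A pattern typed with literal single spaces.
literal = re.compile(r"regular expression search engine")
# The same pattern with every space written as \s+.
flexible = re.compile(r"regular\s+expression\s+search\s+engine")

print(literal.search(source) is not None)
print(flexible.search(source) is not None)
```

<p>The literal pattern fails because a single space cannot match a newline followed by indentation, while \s+ accepts any run of whitespace.</p>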
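<p>The thought this prompts: what the human eye can see of characters is limited, and what it sees does not always match the truth. A program, by contrast, is exact and scrutinizes every bit. (For example, in assembly language the number 0 and the ASCII character &#8216;0&#8217; are entirely different, although in Perl the two are treated as the same thing when they appear alone.)</p>
<p>The add-on also works with the find bar's existing Highlight All option, which makes it easy to see every match. You will find this very handy when searching text in English or other foreign languages.</p>
<p>One more wish, greedy as it may be: it would be nice if the add-on could save frequently used searches, for example searches for email addresses, URLs, dates, and the like. Even so, the add-on is already very useful, and I strongly recommend it.</p>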
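<p>As a rough idea of what saved searches could look like, here is a minimal Python sketch. It is not a feature of /Find Bar/; the SAVED_SEARCHES table, the run_saved_search helper, the sample text, and the patterns themselves are all hypothetical, and the patterns are deliberately loose rather than complete validators.</p>

```python
import re

# Illustrative, deliberately loose patterns; real addresses, URLs, and dates
# have many more valid forms than these cover.
SAVED_SEARCHES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"https?://[^\s<>]+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def run_saved_search(name, text):
    """Return every match of the named pattern in text."""
    return SAVED_SEARCHES[name].findall(text)


sample = "Posted 2008-07-28 by rex@example.com, see http://iregex.org for more."
for name in SAVED_SEARCHES:
    print(name, run_saved_search(name, sample))
```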
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/find-bar-for-firefox-and-thunderbird-as-regular-expression-searching-engine.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
