<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>我爱正则表达式 &#187; curl</title>
	<atom:link href="http://iregex.org/blog/tag/curl/feed" rel="self" type="application/rss+xml" />
	<link>http://iregex.org</link>
	<description>Original, translated, and reposted articles about regular expressions</description>
	<lastBuildDate>Sun, 27 Jun 2010 04:20:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/><atom:link rel="hub" href="http://www.feedsky.com/api/RPC2"/><atom:link rel="hub" href="http://blogsearch.google.com/ping/RPC2"/><atom:link rel="hub" href="http://blog.yodao.com/ping/RPC2"/><atom:link rel="hub" href="http://www.feedsky.com/api/RPC2"/><atom:link rel="hub" href="http://www.xianguo.com/xmlrpc/ping.php"/><atom:link rel="hub" href="http://www.zhuaxia.com/rpc/server.php"/><atom:link rel="hub" href="http://rpc.technorati.com/rpc/ping"/><atom:link rel="hub" href="http://rpc.pingomatic.com/"/>	
	<item>
		<title>Hotlinking MP3s from SkyDrive</title>
		<link>http://iregex.org/blog/skydrive-mp3-with-google-player.html</link>
		<comments>http://iregex.org/blog/skydrive-mp3-with-google-player.html#comments</comments>
		<pubDate>Sun, 10 Jan 2010 12:29:48 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[应用]]></category>
		<category><![CDATA[curl]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[mp3]]></category>
		<category><![CDATA[skydrive]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=74</guid>
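		<description><![CDATA[Independent bloggers on shared hosting, myself included, sometimes want to upload MP3s to their own space but worry about copyright (US hosts are getting stricter) and bandwidth (being crawled by Xunlei can be very costly). After comparing options, I found SkyDrive quite good: 25 GB of space,... ]]></description>
			<content:encoded><![CDATA[<p>Independent bloggers on shared hosting, myself included, sometimes want to upload MP3s to their own space but worry about copyright (US hosts are getting stricter) and bandwidth (being crawled by Xunlei can be very costly). After comparing options, I found SkyDrive quite good: 25 GB of space, with support for external links. Its one drawback is that it is fiddly to use, and the ordinary approach makes it hard to extract external MP3 links in bulk. This afternoon I worked out a simple method that fetches the absolute URLs of the MP3 files in a public SkyDrive folder and generates Google Player embed code for them (so you no longer need to install any of the various WordPress MP3 player plugins). The PHP source is posted below for anyone who wants to study it. Discussion of the <a title="我爱正则表达式" target="_blank" href="http://iregex.org" id="n2fe">regular expression</a> side is welcome in the comments; I am afraid I will not reply to other questions.<br />
<span id="more-74"></span><br />
The final result looks like this:</p>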
<p><a href="http://iregex.org/blog/skydrive-mp3-with-google-player.html" target="_blank"><img src="http://i293.photobucket.com/albums/mm60/zhasm/20100110_190109.png" alt="我爱正则表式|mp3+Skydrive+GooglePlayer" border="0"></a></p>
<p><a href="http://iregex.org/blog/skydrive-mp3-with-google-player.html" target="_blank"><img src="http://i293.photobucket.com/albums/mm60/zhasm/20100110_183937.png" alt="我爱正则表式|mp3+Skydrive+GooglePlayer" border="0"></a></p>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Uploading</h3>
<p>Sign in <a title="Skydrive, 25G space!" target="_blank" href="http://skydrive.live.com/" id="nu96">here</a> with your Live ID, then create a <span style="color: rgb(255, 0, 255);">public</span> folder. It has to be public because the MP3s will be played on your blog; if the folder is private, nobody else can listen to them.</p>
<p>The screenshots below show how to change the permissions:<br />
<a href="http://iregex.org/blog/skydrive-mp3-with-google-player.html" target="_blank"><img src="http://i293.photobucket.com/albums/mm60/zhasm/20100110_184249.png" alt="我爱正则表式|mp3+Skydrive+GooglePlayer" border="0"></a></p>
<p><a href="http://iregex.org/blog/skydrive-mp3-with-google-player.html" target="_blank"><img src="http://i293.photobucket.com/albums/mm60/zhasm/20100110_184319.png" alt="我爱正则表式|mp3+Skydrive+GooglePlayer" border="0"></a><br />
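When uploading from Internet Explorer, you will be prompted to install a plugin. Installing it is recommended, because you can then drag files over and upload them in bulk. Each file may be at most 50 MB; the total size of the files is not limited.</p>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Specifying the files to link</h3>
<p>You can generate code for a whole folder or for a single file. Either way, each file gets its own snippet; the files are not combined into one playlist. First note down the page URL of the file or folder, then generate the code from that URL.</p>
<p>Getting the URL of a single file:<br />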
<a href="http://iregex.org/blog/skydrive-mp3-with-google-player.html" target="_blank"><img src="http://i293.photobucket.com/albums/mm60/zhasm/20100110_185001.png" alt="我爱正则表式|mp3+Skydrive+GooglePlayer" border="0"></a></p>
<p>Getting the URL of a folder:<br />
<a href="http://iregex.org/blog/skydrive-mp3-with-google-player.html" target="_blank"><img src="http://i293.photobucket.com/albums/mm60/zhasm/20100110_184937.png" alt="我爱正则表式|mp3+Skydrive+GooglePlayer" border="0"></a></p>
<p>Copy the page URL for the next step.</p>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Generating the player code</h3>
<p>Go to:<br />
<a title="我爱正则表达式" target="_blank" href="http://zh-en.org/livemp3/" id="ikzb">http://zh-en.org/livemp3/</a></p>
<p><a href="http://iregex.org/blog/skydrive-mp3-with-google-player.html" target="_blank"><img src="http://i293.photobucket.com/albums/mm60/zhasm/20100110201229.png" alt="我爱正则表式|mp3+Skydrive+GooglePlayer" border="0"></a></p>
<p>Enter the page URL from the previous step and click OK. After about two seconds you will see something like this:</p>
<p>&nbsp;<br />
<a href="http://iregex.org/blog/skydrive-mp3-with-google-player.html" target="_blank"><img src="http://i293.photobucket.com/albums/mm60/zhasm/20100110201150.png" alt="我爱正则表式|mp3+Skydrive+GooglePlayer" border="0"></a></p>
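<p>Paste the generated source into WordPress and the player appears.</p>
<h3 style="color: #127ADB; font-size:14px; padding-bottom:3px; padding-top:3px; margin:1.5em 0 1em;">Source code</h3>
<p>The program is simple: take the page URL, download the page with curl, extract the absolute URL with regular expressions, and generate the player code. That is all there is to it. I came across the Google Player embed code while reading the Pirate Radio posts of <a href="http://www.baibanbao.net/">白板报</a> in Google Reader.</p>
<p>If you are interested, the same approach can be extended to use SkyDrive as an image host. The principle is identical, so I will not go into it.<br />
<br />
The PHP code is as follows:</p>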
<div class="codecolorer-container php mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;height:300px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="php codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #000000; font-weight: bold;">&lt;?php</span><br />
<span style="color: #666666; font-style: italic;">//use: &nbsp;get mp3, wav, wmv direct links from skydrive's public folder, and generate google player code for that.</span><br />
<span style="color: #666666; font-style: italic;">//author's email&amp;gtalk: &nbsp; rex [at] zhasm [dot] com</span><br />
<span style="color: #666666; font-style: italic;">//last edit: &nbsp; &nbsp;20100110 18:14</span><br />
<br />
<span style="color: #666666; font-style: italic;">//get the curl handle</span><br />
<span style="color: #666666; font-style: italic;">//</span><br />
<span style="color: #000000; font-weight: bold;">function</span> init_curl<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><br />
<span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; <span style="color: #000088;">$ch</span> <span style="color: #339933;">=</span> <span style="color: #990000;">curl_init</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <br />
&nbsp; &nbsp; <span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_RETURNTRANSFER<span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_BINARYTRANSFER<span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_REFERER<span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;http://skydrive.live.com/&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <br />
&nbsp; &nbsp; <span style="color: #666666; font-style: italic;">//curl_setopt($ch, CURLOPT_POST, 1);</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span style="color: #b1b100;">return</span> <span style="color: #000088;">$ch</span><span style="color: #339933;">;</span><br />
<span style="color: #009900;">&#125;</span><br />
<span style="color: #666666; font-style: italic;">// extract mp3 from the given root page;</span><br />
<span style="color: #000000; font-weight: bold;">function</span> get_list<span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span><span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><br />
<span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; extract_mp3<span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span><span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
<br />
&nbsp; &nbsp; <span style="color: #666666; font-style: italic;">//echo $url;</span><br />
&nbsp; &nbsp; <span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_URL<span style="color: #339933;">,</span> <span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #000088;">$output</span> <span style="color: #339933;">=</span> <span style="color: #990000;">curl_exec</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #666666; font-style: italic;">//trim the unnecessary parts, for safety</span><br />
<br />
&nbsp; &nbsp; <span style="color: #000088;">$links</span> <span style="color: #339933;">=</span> <span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/^.*?(?=&lt;div id=[\'&quot;]tileView[\'&quot;] class=[\'&quot;]tvContainer[\'&quot;]&gt;)|&lt;div class=&quot;bpViewPermissionsLink&quot;.*$/si'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">''</span><span style="color: #339933;">,</span> <span style="color: #000088;">$output</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #666666; font-style: italic;">//you can add your own music filter </span><br />
&nbsp; &nbsp; <span style="color: #990000;">preg_match_all</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/(?&lt;=&lt;a class=&quot;tvLink&quot;)[^&lt;&gt;]+href=&quot;([^&quot;]+)(?&lt;=mp3|wav|wmv)&quot;/si'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$links</span><span style="color: #339933;">,</span> <span style="color: #000088;">$result</span><span style="color: #339933;">,</span> PREG_PATTERN_ORDER<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #000088;">$result</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$result</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #b1b100;">foreach</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$result</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$r</span><span style="color: #009900;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; extract_mp3<span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span><span style="color: #000088;">$r</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#125;</span><br />
<span style="color: #009900;">&#125;</span><br />
&nbsp;<br />
<span style="color: #666666; font-style: italic;">// extract mp3 from the given sub page, generate output code.</span><br />
<span style="color: #000000; font-weight: bold;">function</span> extract_mp3<span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span><span style="color: #000088;">$link</span><span style="color: #009900;">&#41;</span><br />
<span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; <span style="color: #990000;">curl_setopt</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span> CURLOPT_URL<span style="color: #339933;">,</span> <span style="color: #000088;">$link</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #000088;">$output</span> <span style="color: #339933;">=</span> <span style="color: #990000;">curl_exec</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
<span style="color: #666666; font-style: italic;">//&nbsp; &lt;a id=&quot;spPreviewLink&quot; href=</span><br />
&nbsp; &nbsp; <span style="color: #990000;">preg_match_all</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'%(?&lt;=&lt;title&gt;)[^&gt;&lt;]+(?= - Windows Live&lt;/title&gt;)%'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$output</span><span style="color: #339933;">,</span> <span style="color: #000088;">$result</span><span style="color: #339933;">,</span> PREG_PATTERN_ORDER<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #000088;">$title</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$result</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span style="color: #990000;">preg_match_all</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/(?&lt;=&lt;a\sid=&quot;spPreviewLink&quot;\shref=&quot;)[^&quot;]+(?=&amp;#63;download&quot;)/'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$output</span><span style="color: #339933;">,</span> <span style="color: #000088;">$result</span><span style="color: #339933;">,</span> PREG_PATTERN_ORDER<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #000088;">$result</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$result</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span> <br />
&nbsp; &nbsp; <span style="color: #b1b100;">foreach</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$result</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$r</span><span style="color: #009900;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#123;</span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000088;">$demo</span><span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot; &nbsp; &nbsp;&lt;div class=<span style="color: #000099; font-weight: bold;">\&quot;</span>audio-player-placeholder<span style="color: #000099; font-weight: bold;">\&quot;</span>&gt;<br />
&nbsp; &nbsp; &lt;embed classname=<span style="color: #000099; font-weight: bold;">\&quot;</span>audio-player-embed<span style="color: #000099; font-weight: bold;">\&quot;</span> type=<span style="color: #000099; font-weight: bold;">\&quot;</span>application/x-shockwave-flash<span style="color: #000099; font-weight: bold;">\&quot;</span> src=<span style="color: #000099; font-weight: bold;">\&quot;</span>https://www.google.com/reader/ui/3247397568-audio-player.swf?audioUrl=&quot;</span><span style="color: #339933;">.</span><span style="color: #000088;">$r</span><span style="color: #339933;">.</span><span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\&quot;</span> allowscriptaccess=<span style="color: #000099; font-weight: bold;">\&quot;</span>never<span style="color: #000099; font-weight: bold;">\&quot;</span> allowfullscreen=<span style="color: #000099; font-weight: bold;">\&quot;</span>true<span style="color: #000099; font-weight: bold;">\&quot;</span> quality=<span style="color: #000099; font-weight: bold;">\&quot;</span>best<span style="color: #000099; font-weight: bold;">\&quot;</span> bgcolor=<span style="color: #000099; font-weight: bold;">\&quot;</span>#ffffff<span style="color: #000099; font-weight: bold;">\&quot;</span> wmode=<span style="color: #000099; font-weight: bold;">\&quot;</span>transparent<span style="color: #000099; font-weight: bold;">\&quot;</span> flashvars=<span style="color: #000099; font-weight: bold;">\&quot;</span>playerMode=embedded<span style="color: #000099; font-weight: bold;">\&quot;</span> pluginspage=<span style="color: #000099; font-weight: bold;">\&quot;</span>http://www.macromedia.com/go/getflashplayer<span style="color: #000099; font-weight: bold;">\&quot;</span> height=<span style="color: #000099; font-weight: bold;">\&quot;</span>27px<span style="color: #000099; font-weight: bold;">\&quot;</span> width=<span style="color: #000099; font-weight: bold;">\&quot;</span>400px<span style="color: 
#000099; font-weight: bold;">\&quot;</span>&gt;<br />
&nbsp; &nbsp; &lt;/div&gt;&quot;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">&quot;File: &quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$title</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&lt;br /&gt;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">&quot;Preview:&lt;br /&gt;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #b1b100;">echo</span> <span style="color: #000088;">$demo</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&lt;br /&gt;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">&quot;Code:&lt;br /&gt;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">'&lt;textarea cols=&quot;50&quot; rows=&quot;10&quot;&gt;'</span><span style="color: #339933;">,</span><span style="color: #000088;">$demo</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'&lt;/textarea&gt;'</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;&lt;br /&gt;<span style="color: #000099; font-weight: bold;">\n</span>&lt;br /&gt;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&#125;</span><br />
<span style="color: #009900;">&#125;</span><br />
&nbsp;<br />
<span style="color: #666666; font-style: italic;">//get user input</span><br />
<span style="color: #000088;">$url</span><span style="color: #339933;">=@</span><span style="color: #000088;">$_GET</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">&quot;url&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span><br />
<br />
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span><span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span><br />
&nbsp; &nbsp; <span style="color: #990000;">exit</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><br />
<span style="color: #009900;">&#125;</span><br />
&nbsp;<br />
&nbsp;<br />
<span style="color: #000088;">$ch</span><span style="color: #339933;">=</span>init_curl<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <br />
<span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">'&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html;charset=utf-8&quot; /&gt;'</span><span style="color: #339933;">;</span><br />
<br />
<span style="color: #000088;">$info</span><span style="color: #339933;">=</span>get_list<span style="color: #009900;">&#40;</span><span style="color: #000088;">$ch</span><span style="color: #339933;">,</span><span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <br />
&nbsp;<br />
<span style="color: #000000; font-weight: bold;">?&gt;</span></div></td></tr></tbody></table></div>
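<p>For readers who prefer Python, the extraction step can be sketched roughly as follows. This is only an illustration under stated assumptions, not a port of the whole script: the two lookaround patterns are taken from the PHP listing, while the sample page, the example.com address, and the helper names extract_tracks and player_code are made up here. Unlike the PHP listing, which concatenates the raw address, this sketch URL-encodes the audioUrl value.</p>

```python
import re
from html import escape
from urllib.parse import quote

# Lookaround patterns adapted from the PHP listing; the markup they target
# is an assumption about how SkyDrive file pages looked at the time.
TITLE_RE = re.compile(r"(?<=<title>)[^<>]+(?= - Windows Live</title>)")
LINK_RE = re.compile(r'(?<=<a\sid="spPreviewLink"\shref=")[^"]+(?=&#63;download")')

PLAYER_SWF = "https://www.google.com/reader/ui/3247397568-audio-player.swf"


def extract_tracks(page_html):
    """Return (title, direct_url) pairs found in one file page."""
    title_match = TITLE_RE.search(page_html)
    title = title_match.group(0) if title_match else "(untitled)"
    return [(title, url) for url in LINK_RE.findall(page_html)]


def player_code(direct_url):
    """Build a reduced embed snippet, URL-encoding the audio address."""
    src = PLAYER_SWF + "?audioUrl=" + quote(direct_url, safe="")
    return ('<embed type="application/x-shockwave-flash" src="%s" '
            'height="27" width="400">' % escape(src, quote=True))


# Hypothetical sample page, made up for illustration only.
SAMPLE_PAGE = (
    "<html><head><title>song.mp3 - Windows Live</title></head><body>"
    '<a id="spPreviewLink" href="http://example.com/y1p/song.mp3&#63;download">'
    "Download</a></body></html>"
)

for title, url in extract_tracks(SAMPLE_PAGE):
    print(title, url)
    print(player_code(url))
```

<p>Both patterns use lookbehind and lookahead so that the match itself is just the title or the address, with no capture groups to unpack. A page without the expected markup simply yields an empty list.</p>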
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/skydrive-mp3-with-google-player.html/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>A one-line command for grabbing the images on a page</title>
		<link>http://iregex.org/blog/download-images-with-single-line-command.html</link>
		<comments>http://iregex.org/blog/download-images-with-single-line-command.html#comments</comments>
		<pubDate>Mon, 07 Dec 2009 11:58:42 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[应用]]></category>
		<category><![CDATA[cmdline]]></category>
		<category><![CDATA[curl]]></category>
		<category><![CDATA[grep]]></category>
		<category><![CDATA[wget]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=72</guid>
		<description><![CDATA[The command: curl -s $URL &#124;perl -nle &#34;print for m{http://[^\&#34;]+(?:jpg&#124;png&#124;gif)}g;&#34;&#124;sort -u &#124;xargs wget How it works: the page containing the image links (e.g. http://www.flickr.com/photos/anyaanja/4165312465/sizes/o/ ... ]]></description>
			<content:encoded><![CDATA[<div>The command:<br />
</p>
<div class="codecolorer-container bash mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">curl <span style="color: #660033;">-s</span> <span style="color: #007800;">$URL</span> <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">perl</span> <span style="color: #660033;">-nle</span> <span style="color: #ff0000;">&quot;print for m{http://[^<span style="color: #000099; font-weight: bold;">\&quot;</span>]+(?:jpg|png|gif)}g;&quot;</span><span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">sort</span> <span style="color: #660033;">-u</span> <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">xargs</span> <span style="color: #c20cb9; font-weight: bold;">wget</span></div></td></tr></tbody></table></div>
</div>
<p><span id="more-72"></span><br />
How it works:</p>
<ul>
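<li>Download the page that contains the image links (for example http://www.flickr.com/photos/anyaanja/4165312465/sizes/o/) so that the image URLs can be extracted. The command is curl -s $URL; replace $URL by hand with the address you need. curl's -s option selects silent mode, which suppresses the progress meter and error messages so that only the page content goes down the pipe.
</li>
<li>Parse the downloaded page with perl, finding image URLs that start with http and end in jpg, png, or gif. The set of image types is arbitrary and can be extended or trimmed using the same syntax. perl's -nle options mean: loop over the input line by line (-n), handle line endings automatically (-l), and run the given code (-e), which here prints every match found on each line. See <a title="perl one liners" target="_blank" href="http://sial.org/howto/perl/one-liner/" id="x85m">perl one liners</a> for details.</li>
<li>perl serves as the page parser here. awk should be able to do the same job, though I personally find awk's <a href="http://iregex.org/blog/download-images-with-single-line-command.html">regular expression</a> support rather weak.
</li>
<li>Sort the resulting URLs with sort -u. Duplicates are collapsed into a single entry so that nothing is downloaded twice.</li>
<li>Download the images into the current directory with wget. By default wget does not read URLs from standard input, so xargs passes them along as arguments.</li>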
</ul>
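<p>The extraction and deduplication steps above can also be sketched in Python. This is a rough equivalent for illustration, not a replacement for the one-liner: the regular expression is the one from the perl command, while the function name image_urls and the sample fragment with example.com addresses are made up here. Downloading is left out so the sketch runs without network access.</p>

```python
import re

# Same pattern as the perl one-liner: starts with http://, contains no
# double quote, and ends in one of the listed extensions.
IMAGE_URL_RE = re.compile(r'http://[^"]+(?:jpg|png|gif)')


def image_urls(page_html):
    """Return the distinct image URLs in the page, ordered like `sort -u`."""
    found = []
    for line in page_html.splitlines():      # perl -n: one line at a time
        found.extend(IMAGE_URL_RE.findall(line))
    return sorted(set(found))                # sort -u: order and deduplicate


# Hypothetical page fragment, made up for illustration.
SAMPLE = '''<img src="http://example.com/b.png">
<a href="http://example.com/a.jpg"><img src="http://example.com/a.jpg"></a>
<a href="http://example.com/page.html">not an image</a>'''

for url in image_urls(SAMPLE):
    print(url)
```

<p>Excluding the double quote with [^"]+ is what keeps each match inside a single attribute value, so several links on one line are still found separately.</p>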
<p>
<span style="color: rgb(255, 0, 255);">2009120</span> update:</p>
<ul>
<li>Using</p>
<div class="codecolorer-container bash mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">curl <span style="color: #660033;">-s</span> <span style="color: #007800;">$URL</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">grep</span> <span style="color: #660033;">-o</span> <span style="color: #ff0000;">&quot;http://.*\?\(png\|jpg\)&quot;</span> <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">sort</span> <span style="color: #660033;">-u</span> <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">xargs</span> <span style="color: #c20cb9; font-weight: bold;">wget</span></div></td></tr></tbody></table></div>
<p>or</p>
<div class="codecolorer-container bash mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">curl <span style="color: #660033;">-s</span> <span style="color: #007800;">$URL</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">grep</span> <span style="color: #660033;">-E</span> <span style="color: #660033;">-o</span> <span style="color: #ff0000;">&quot;http://.*?(png|jpg)&quot;</span> <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">sort</span> <span style="color: #660033;">-u</span> <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">xargs</span> <span style="color: #c20cb9; font-weight: bold;">wget</span></div></td></tr></tbody></table></div>
<p>achieves the same effect. Here -o prints only the matched part rather than the whole line (printing the whole line is the default); -E selects extended regular expressions, in which the question mark, parentheses, and vertical bar can be used directly without a preceding backslash.</li>
<li>With perl the regular expressions are more powerful, but the command is bulkier; grep is small and nimble, but may not be able to handle complex regular expressions.</li>
</ul>
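<p>One caveat about the grep variants deserves a small demonstration. As far as I know, grep's basic and extended regular expressions have no lazy quantifiers, so a ? placed after .* only marks it as optional and the match stays greedy; lazy matching is Perl-style syntax. The sketch below uses Python's re module as a stand-in to show the difference; the sample line and its example.com addresses are made up for illustration.</p>

```python
import re

# Hypothetical line with two image URLs, made up for illustration.
LINE = '<img src="http://example.com/a.png"> <img src="http://example.com/b.jpg">'

# Greedy .* is how the grep patterns behave: the match runs from the first
# http:// to the last png/jpg on the line.
greedy = re.findall(r"http://.*(?:png|jpg)", LINE)

# Lazy .*? (Perl-style syntax, also supported by Python's re) stops at the
# first png/jpg after each http://.
lazy = re.findall(r"http://.*?(?:png|jpg)", LINE)

print(greedy)
print(lazy)
```

<p>On pages where each image link sits on its own line the two behave alike, which is probably why the grep commands work in practice. When several links share a line, the perl version with [^"]+ is the safer choice.</p>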
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/download-images-with-single-line-command.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Writing a program to bulk-fetch Fanfou messages with the new Fanfou API</title>
		<link>http://iregex.org/blog/fanfou-msg-extractor-via-new-api.html</link>
		<comments>http://iregex.org/blog/fanfou-msg-extractor-via-new-api.html#comments</comments>
		<pubDate>Tue, 06 Jan 2009 02:27:50 +0000</pubDate>
		<dc:creator>rex</dc:creator>
				<category><![CDATA[杂项]]></category>
		<category><![CDATA[curl]]></category>
		<category><![CDATA[fanfou]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xpath]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=52</guid>
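		<description><![CDATA[I have been working on and off on a program for fetching Fanfou messages. The planned features are: downloading and updating Fanfou messages, search, and statistics. Fanfou recently released an official search feature that lets you search your own past messages by keyword. Building an offline Fanfou message manager... ]]></description>
			<content:encoded><![CDATA[<p><img style="display: inline; margin-left: 0px; margin-right: 0px" align="right" src="http://static.fanfou.com/img/fanfou.png"> I have been working on and off on a program for fetching Fanfou messages. The planned features are: downloading and updating Fanfou messages, search, and statistics. </p>
<p>Fanfou recently released an official search feature that lets you search your own past messages by keyword, so an offline Fanfou message manager may seem unnecessary. Still, some users like to list their Fanfou messages on their blogs, so my program remains useful. </p>
<p>In my earlier program, most of the time went into downloading and parsing the messages. Fortunately, the new Fanfou API serves messages by arbitrary page number, which makes fetching far simpler, so writing a Fanfou message manager is no longer hard. Using Python as the example, I describe my approach here for anyone with similar interests.</p>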
<p><span id="more-52"></span></p>
<ol>
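<li><strong>Comparing the two export methods: (page scraping|Fanfou API).</strong>
<ol>
<li><strong>Difficulty</strong>: Page scraping is undoubtedly the more complicated route, whether you parse with regular expressions or with an XML parser. Fanfou now provides a complete API that can export nearly all messages by page number, which brings the difficulty of writing an exporter to a new low.
<li><strong>Reliability</strong>: With hand-written page scraping I control every step and detail, so I consider its results the most reliable. In practice, I have found that the API still drops messages now and then.
<li><strong>Coverage</strong>: Page scraping can fetch ordinary Fanfou messages, MMS messages, &#8220;Fanfou share&#8221; messages, and so on; it can also fetch only shares, only private messages, or only @me messages. The API only allows fetching ordinary Fanfou messages. </li>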
</ol>
<li><strong>Downloading Fanfou messages.</strong>
<ol>
<li>
<p><strong>Using curl on the command line. <br /></strong>The official Fanfou API documentation (<a target="_blank" href="http://help.fanfou.com/api.html">old Fanfou API</a>, <a target="_blank" href="http://code.google.com/p/fanfou-api/wiki/ApiDocumentation">new Fanfou API</a>) contains this sentence: </p>
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>If you have cURL on your system, you can use these APIs in a very simple way. </p></blockquote>
<p>It was this sentence that introduced me to curl, which has since played a major role in my programs. cURL is available for Windows and Linux, has bindings for PHP, Python, and Perl, and is a download tool I strongly recommend. I usually use the http://api.fanfou.com/statuses/user_timeline.[json|xml|rss] API to download Fanfou messages. Because it supports the id, since_id, and page parameters, the following command is all I need to download my own messages:</p>
<div class="codecolorer-container txt mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><br />[The command listing is missing from the source.]<br /></div>
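<p>It downloads pages 1 through 180 of the messages of the user whose id is zhasm and saves each page as &#8220;N.xml&#8221;, where N is the page number: page 1 becomes 1.xml, and so on. </p>
<p>After that, cat *.xml &gt; complete.xml merges all the messages into complete.xml, ready for the parsing step. Note that the merged file has several root elements, so it is no longer well-formed XML; that is fine for the regex parser used later, but an XML parser would reject it. </p>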
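<p>A minimal Python 3 sketch of the same page-by-page download, assuming the user_timeline.xml endpoint and its id and page parameters behave as described above. The function names are hypothetical, and this is not the original curl command.</p>

```python
from urllib.parse import urlencode
from urllib.request import urlretrieve

API = "http://api.fanfou.com/statuses/user_timeline.xml"


def page_url(user_id, page):
    """Build the timeline URL for one page of one user's messages."""
    return API + "?" + urlencode({"id": user_id, "page": page})


def page_jobs(user_id, first=1, last=180):
    """Return (url, filename) pairs: page 1 goes to 1.xml, page 2 to 2.xml, ..."""
    return [(page_url(user_id, n), "%d.xml" % n)
            for n in range(first, last + 1)]


def fetch_all(user_id, first=1, last=180):
    """Download every page to its own file (performs network requests)."""
    for url, filename in page_jobs(user_id, first, last):
        urlretrieve(url, filename)
```

<p>Calling fetch_all("zhasm") would request pages 1 to 180 in turn; page_jobs can be inspected first to check the URLs without touching the network.</p>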
<li>
<p><strong>Downloading from a program</strong> <br />Python, Perl, and PHP make little difference here. I still prefer to use the curl module; in Python, for example: </p>
<div class="codecolorer-container python mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;">[Listing missing: the syntax highlighter failed to render it. See the download function in the full program at the end of this post.]</div>
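<p>This Python function takes a Fanfou user ID, a page number (page), and any other parameters, and downloads one page of messages. Note that it only downloads the whole page; it does not parse it. </p></li>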
</ol>
<li><strong>Parsing Fanfou messages</strong>
<ol>
<li><strong>Message format <br /></strong>Let's look at the format of a Fanfou message before dissecting it:
<div class="codecolorer-container xml mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;height:300px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="xml codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;statuses<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;status<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;created_at<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>Mon Jan 05 05:56:36 +0000 2009<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/created_at<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;id<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>M6pa52Ykb1s<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/id<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;text<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>[抓饭]由于饭否释出新的API，我用python重写了抓饭工具，共150行（包括注释）。功能：下载、同步、输出饭否消息（不重复下载旧消息；不处理彩信、分享）。命令行版已经写完。GUI太烦琐了。现在网速慢，今晚还要聚会，只好明晚上传程序。<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/text<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;source<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>网页<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/source<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;truncated<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>false<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/truncated<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;in_reply_to_status_id<span style="color: #000000; font-weight: bold;">&gt;</span></span><span style="color: #000000; font-weight: bold;">&lt;/in_reply_to_status_id<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;in_reply_to_user_id<span style="color: #000000; font-weight: bold;">&gt;</span></span><span style="color: #000000; font-weight: bold;">&lt;/in_reply_to_user_id<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;favorited<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>false<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/favorited<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;in_reply_to_screen_name<span style="color: #000000; font-weight: bold;">&gt;</span></span><span style="color: #000000; font-weight: bold;">&lt;/in_reply_to_screen_name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;user<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;id<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>zhasm<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/id<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>.rex<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;screen_name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>.rex<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/screen_name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;location<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>北京<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/location<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>?【内测ing】好玩、有用的饭否批量处理程序： <br />
<br />
http://code.google.com/p/fanfoufans/?<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/description<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;profile_image_url<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>http://avatar.fanfou.com/s0/00/57/sg.jpg?1225428475<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/profile_image_url<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;url<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>http://fanfou.com/zhasm<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/url<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;protected<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>false<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/protected<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;followers_count<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>229<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/followers_count<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/user<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/status<span style="color: #000000; font-weight: bold;">&gt;</span></span></span> <br />
&nbsp; &nbsp; ... <br />
&nbsp; &nbsp; <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/statuses<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></div></td></tr></tbody></table></div>
<li><strong>Parsing as XML</strong> <br />This is relatively simple, because XPath can be used. For example, the expression //statuses/status/text selects the message text, //statuses/status/created_at selects the time it was sent, and so on.
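<p>A minimal sketch of the XPath-style approach in Python 3, using the standard library's xml.etree.ElementTree, which supports only a subset of XPath. The sample text and source values are made up for illustration, and the function name is hypothetical.</p>

```python
import xml.etree.ElementTree as ET

SAMPLE = """<statuses>
    <status>
        <created_at>Mon Jan 05 05:56:36 +0000 2009</created_at>
        <id>M6pa52Ykb1s</id>
        <text>hello fanfou</text>
        <source>web</source>
    </status>
</statuses>"""


def extract(xml_text):
    """Return (created_at, id, text, source) for every status element."""
    root = ET.fromstring(xml_text)            # root is the statuses element
    rows = []
    for status in root.findall("./status"):   # plays the role of //statuses/status
        rows.append(tuple(status.findtext(tag, default="")
                          for tag in ("created_at", "id", "text", "source")))
    return rows
```

<p>ET.fromstring needs a well-formed document with a single root element, so it works on one downloaded page at a time.</p>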
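<li><strong>Regular expressions (Python version)</strong> <br />This is a little more complex than XPath, but it is still a fairly simple use of regular expressions, because the text to parse is extremely “regular”. The pattern is: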
<div class="codecolorer-container python mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">p=<span style="color: #dc143c;">re</span>.<span style="color: #008000;">compile</span><span style="color: black;">&#40;</span> <br />
&nbsp; &nbsp; &nbsp; &nbsp; r<span style="color: #483d8b;">&quot;&quot;&quot;&lt;created_at&gt;([^&lt;]+)&lt;/created_at&gt;<span style="color: #000099; font-weight: bold;">\s</span>* <br />
&nbsp; &nbsp; &nbsp; &nbsp; &lt;id&gt;([^&lt;]+)&lt;/id&gt;<span style="color: #000099; font-weight: bold;">\s</span>* <br />
&nbsp; &nbsp; &nbsp; &nbsp; &lt;text&gt;(.*?)&lt;/text&gt;<span style="color: #000099; font-weight: bold;">\s</span>* <br />
&nbsp; &nbsp; &nbsp; &nbsp; &lt;source&gt;([^&lt;]+)&lt;/source&gt;<span style="color: #000099; font-weight: bold;">\s</span>*&quot;&quot;&quot;</span>, <span style="color: #dc143c;">re</span>.<span style="color: black;">DOTALL</span> | <span style="color: #dc143c;">re</span>.<span style="color: black;">VERBOSE</span><span style="color: black;">&#41;</span></div></td></tr></tbody></table></div>
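<p><strong>Notes:</strong> </p>
<ul>
<li>re.VERBOSE turns on free-spacing mode, so a long pattern can be written across several lines;
<li>re.DOTALL makes the dot &#8220;.&#8221; match any character, including newlines. The Fanfou text field can contain special characters; the regex copes with them, whereas an XML parser fails. When I used XPath, handling special characters took a lot of effort, while in the regex a single dot does the job.
<li>The other fields, such as created_at and source, contain only a small, predictable set of characters, so I match and capture them with ([^&lt;]+), which captures everything up to the next &lt;.
<li>Because an arbitrary amount of whitespace (zero or more characters) can appear between a &gt; and the following &lt;, I added \s* to match it. </li>
</ul>
<p>With the pattern written, parsing takes just two lines:</p>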
<div class="codecolorer-container python mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">p.<span style="color: black;">match</span><span style="color: black;">&#40;</span>text<span style="color: black;">&#41;</span><br />
<span style="color: #ff7700;font-weight:bold;">return</span> p.<span style="color: black;">findall</span><span style="color: black;">&#40;</span>text<span style="color: black;">&#41;</span></div></td></tr></tbody></table></div>
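<p>To show what findall returns, here is a small self-contained Python 3 sketch that applies the same pattern to a made-up status element (the sample text and source values are illustrative, not real API output).</p>

```python
import re

PATTERN = re.compile(
    r"""<created_at>([^<]+)</created_at>\s*
        <id>([^<]+)</id>\s*
        <text>(.*?)</text>\s*
        <source>([^<]+)</source>\s*""",
    re.DOTALL | re.VERBOSE)

SAMPLE = """<status>
    <created_at>Mon Jan 05 05:56:36 +0000 2009</created_at>
    <id>M6pa52Ykb1s</id>
    <text>line one
line two</text>
    <source>web</source>
</status>"""

# One 4-tuple per message, in the order (created_at, id, text, source).
rows = PATTERN.findall(SAMPLE)
```

<p>Here rows holds a single tuple, and the text field keeps its embedded newline thanks to re.DOTALL. Since the return value of p.match(text) is discarded in the two-line version, findall on its own produces the same list.</p>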
</li>
</ol>
<li><strong>Storage</strong>
<ol>
<li><strong>Creating the table</strong> <br />I use the SQLite library to handle the data: store first, output later. The SQLite statement is:
<div class="codecolorer-container sql mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="sql codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">cu<span style="color: #66cc66;">.</span>execute<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;&quot;</span><span style="color: #ff0000;">&quot;create table if not exists msg( <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; content Text, <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; uuid Varchar(12) NOT NULL PRIMARY KEY, <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; time Time, <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; tool Text <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; )&quot;</span><span style="color: #ff0000;">&quot;&quot;</span><span style="color: #66cc66;">&#41;</span></div></td></tr></tbody></table></div>
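<p>The statement first checks whether the table exists and creates it only if it does not.&nbsp;&nbsp; </p>
<li><strong>Storing: </strong><br />After each page (20 messages) is parsed, I store it and call commit() once, which is convenient and efficient. 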
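<p>A minimal sketch of the store-then-commit step, written for Python 3 and the standard sqlite3 module rather than the pysqlite2 module the full program imports. The function names and the use of "insert or ignore" to skip messages that are already stored are assumptions for illustration, not necessarily what the original program does.</p>

```python
import sqlite3


def init_db(path=":memory:"):
    """Open the database and create the msg table if it does not exist."""
    db = sqlite3.connect(path)
    db.execute("""create table if not exists msg(
                      content Text,
                      uuid Varchar(12) NOT NULL PRIMARY KEY,
                      time Time,
                      tool Text)""")
    return db


def store_page(db, rows):
    """Store one parsed page of (time, id, msg, tool) tuples, then commit once."""
    db.executemany(
        "insert or ignore into msg(time, uuid, content, tool) values (?, ?, ?, ?)",
        rows)
    db.commit()


db = init_db()
page = [("2009-01-05 13:56:36", "M6pa52Ykb1s", "hello", "web")]
store_page(db, page)
store_page(db, page)   # same uuid again: skipped because uuid is the primary key
count = db.execute("select count(*) from msg").fetchone()[0]
```

<p>Committing once per page keeps the number of transactions small while still saving progress after every 20 messages.</p>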
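<li><strong>Incremental sync</strong><br />Nobody wants every download to start at message 1 and run all the way to the current message 3333. Once you have posted up to message 3344, only the newest 11 need fetching; there is no point downloading the first 3333 again. This saves download time for the user and load for Fanfou's servers.
<p>Here is the API parameter Fanfou released for this purpose, since_id:&nbsp;<br />
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;">* since_id (optional) &#8211; return only messages with an ID greater than this one. Example: <a href="http://api.fanfou.com/statuses/user_timeline.xml?since_id=6IAZmgy1TzA1">http://api.fanfou.com/statuses/user_timeline.xml?since_id=6IAZmgy1TzA1</a></p></blockquote>
<p>With this parameter, things get much easier:</p>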
<div class="codecolorer-container bash mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">curl -<span style="color: #666666; font-style: italic;">#1 http://api.fanfou.com/statuses/user_timeline.xml?id=zhasm&amp;page=[N]&amp;since_id=6IAZmgy1TzA1 (N varies; since_id stays fixed.)</span></div></td></tr></tbody></table></div>
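<p>This way the download can continue until it reaches the message stored by the previous update. My exit condition is that the download function returns zero messages: the page no longer returns anything new, so the sync is finished. <br />How do I find the point where the previous update stopped? I use this SQL statement:</p>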
<div class="codecolorer-container text mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">select distinct uuid from msg order by time DESC limit 1 <br />
-- In the msg table, order by time and return the single most recent uuid.</div></td></tr></tbody></table></div>
<ul>
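<li>If one exists (the table is not empty), I generate a condition of the form &amp;since_id=uuid and append it to the curl request.
<li>If none exists (a newly created table), the condition is left empty.&nbsp; </li>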
</ul>
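<p>The sync logic described above can be sketched in Python 3 as follows. The fetch_page callable is a hypothetical stand-in for "download one page and parse it", and the other function names are illustrative as well; only the SQL statement and the zero-messages exit condition come from the text.</p>

```python
import sqlite3


def latest_uuid(db):
    """Return the uuid of the newest stored message, or '' for an empty table."""
    row = db.execute(
        "select distinct uuid from msg order by time DESC limit 1").fetchone()
    return row[0] if row else ""


def since_clause(uuid):
    """Build the optional since_id part of the query string."""
    return "&since_id=%s" % uuid if uuid else ""


def sync(db, fetch_page):
    """Fetch pages until one comes back empty; return all new rows.

    fetch_page(page, other) is a hypothetical callable that downloads and
    parses one page into a list of (time, id, msg, tool) tuples.
    """
    other = since_clause(latest_uuid(db))
    page, new_rows = 1, []
    while True:
        rows = fetch_page(page, other)
        if not rows:          # exit condition: the page returned no messages
            return new_rows
        new_rows.extend(rows)
        page += 1
```

<p>The since_id value is computed once before the loop and stays fixed while the page number increases, matching the curl example.</p>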
</li>
</ol>
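<li><strong>Details</strong> <br />A few details remain that the programmer has to take care of; you cannot leave them to the program's users.
<ol>
<li><strong>Time zone conversion</strong> <br />In the text the Fanfou API returns, the created_at field gives the time in this format:<br />
<blockquote style="border-left:2px solid #DDDDDD; margin:15px 30px 0 10px; padding-left:20px;"><p>Mon Jan 05 11:35:27 +0000 2009 </p></blockquote>
<p>It means Monday, January 5, 2009, 11:35:27, in the UTC+0 time zone. <br />However, the vast majority of Fanfou users live in UTC+8, so both the format and the time zone need adjusting. The function I wrote is:</p>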
<div class="codecolorer-container python mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff7700;font-weight:bold;">def</span> time_from_0_to_8<span style="color: black;">&#40;</span>timestr,timezone=<span style="color: #ff4500;">8</span><span style="color: black;">&#41;</span>: <br />
<br />
&nbsp; &nbsp; TIMEFORMAT=<span style="color: #483d8b;">&quot;%a %b %d %X +0000 %Y&quot;</span> <br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">#Sat Jan 03 23:08:54 +0000 2009 </span><br />
&nbsp; &nbsp; ISOTIMEFORMAT=<span style="color: #483d8b;">'%Y-%m-%d %X'</span> <br />
&nbsp; &nbsp; x=<span style="color: #dc143c;">time</span>.<span style="color: black;">strptime</span><span style="color: black;">&#40;</span>timestr, TIMEFORMAT<span style="color: black;">&#41;</span> <br />
&nbsp; &nbsp; m=<span style="color: #dc143c;">time</span>.<span style="color: black;">mktime</span><span style="color: black;">&#40;</span>x<span style="color: black;">&#41;</span>+<span style="color: #ff4500;">60</span><span style="color: #66cc66;">*</span><span style="color: #ff4500;">60</span><span style="color: #66cc66;">*</span>timezone <br />
&nbsp; &nbsp; p=<span style="color: #dc143c;">time</span>.<span style="color: black;">strftime</span><span style="color: black;">&#40;</span>ISOTIMEFORMAT,<span style="color: #dc143c;">time</span>.<span style="color: black;">localtime</span><span style="color: black;">&#40;</span>m<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> <br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> p</div></td></tr></tbody></table></div>
<p>The default value of timezone is 8 (for UTC+8); you can of course replace it with the offset you need. </p>
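<p>The function above goes through time.mktime and time.localtime, both of which interpret times in the machine's own time zone. As an alternative sketch in Python 3 (the function name is hypothetical), calendar.timegm and time.gmtime keep the whole calculation in UTC, so the result should not depend on where the program runs. It also spells out %H:%M:%S instead of the locale-dependent %X.</p>

```python
import calendar
import time


def utc_to_offset(timestr, offset_hours=8):
    """Convert 'Mon Jan 05 11:35:27 +0000 2009' to 'YYYY-MM-DD HH:MM:SS'
    in the zone that is offset_hours ahead of UTC."""
    parsed = time.strptime(timestr, "%a %b %d %H:%M:%S +0000 %Y")
    epoch = calendar.timegm(parsed)             # treat the parsed time as UTC
    shifted = time.gmtime(epoch + 3600 * offset_hours)
    return time.strftime("%Y-%m-%d %H:%M:%S", shifted)
```

<p>For example, utc_to_offset("Mon Jan 05 11:35:27 +0000 2009") gives "2009-01-05 19:35:27".</p>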
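<li><strong>Escaped characters</strong><br />To keep Fanfou messages safe as HTML, many characters are replaced by their escape sequences; for example, the less-than sign &lt; becomes &amp;lt; so that it cannot be confused with the &lt; that HTML markup uses. I take advantage of this instead of converting the text back myself: I write the messages out as HTML, so the escaped characters show up in their original form in the browser. Since Fanfou messages are encoded as UTF-8 by default, I also add this to the output page: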
<div class="codecolorer-container html4strict mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="html4strict codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">meta</span> <span style="color: #000066;">http-equiv</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;Content-Type&quot;</span> <span style="color: #000066;">content</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;text/html; charset=utf-8&quot;</span> <span style="color: #66cc66;">/</span>&gt;</span></div></td></tr></tbody></table></div>
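<p>A minimal Python 3 sketch of that output step, assuming the rows are (time, id, msg, tool) tuples whose msg text is still escaped exactly as the API delivered it. The function name and the page layout are hypothetical.</p>

```python
def render_page(rows):
    """Build an HTML page from (time, id, msg, tool) tuples.

    The msg text is inserted unchanged, because it is already HTML-escaped.
    """
    items = "\n".join(
        "<li>%s %s (%s)</li>" % (when, msg, tool)
        for when, _uuid, msg, tool in rows)
    return (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
        "</head><body><ul>\n" + items + "\n</ul></body></html>")


html_page = render_page(
    [("2009-01-05 13:56:36", "M6pa52Ykb1s", "1 &lt; 2", "web")])
html_bytes = html_page.encode("utf-8")   # the bytes to write to the output file
```

<p>Encoding the page as UTF-8 when writing it keeps the file consistent with the charset declared in the meta tag.</p>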
</li>
</ol>
</li>
</ol>
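<p>That covers downloading, parsing, and output. With Fanfou's powerful API behind you, the bar for writing Fanfou programs, especially ones built on downloading messages, is lower than ever. What applications you build with it is up to each of you to show what you can do. My own program is attached for reference. I am not releasing the compiled command-line version for now; I am currently working on the GUI. </p>
<p>Appendix: the Python program. It depends on a few third-party modules (see the pysqlite2 and pycurl imports), which you need to download and install yourself.</p>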
<div class="codecolorer-container python mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;height:300px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #808080; font-style: italic;">#!/usr/bin/env python</span><br />
<span style="color: #808080; font-style: italic;"># -*- coding: utf-8 -*-</span><br />
<br />
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span><br />
<span style="color: #008000;">reload</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">sys</span><span style="color: black;">&#41;</span><br />
<span style="color: #dc143c;">sys</span>.<span style="color: black;">setdefaultencoding</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;utf-8&quot;</span><span style="color: black;">&#41;</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">#ensure the utf8 encoding</span><br />
<span style="color: #ff7700;font-weight:bold;">import</span> pysqlite2.<span style="color: black;">dbapi2</span> <span style="color: #ff7700;font-weight:bold;">as</span> sqlite &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">#sqlite3 </span><br />
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">re</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">#regular expression to parse msg</span><br />
<span style="color: #ff7700;font-weight:bold;">import</span> pycurl &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">#downloading engine </span><br />
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">StringIO</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">#to &nbsp;receive the downloaded text</span><br />
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">time</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">#time zone conversion</span><br />
<br />
<span style="color: #808080; font-style: italic;"># important regex to parse the xml file</span><br />
p=<span style="color: #dc143c;">re</span>.<span style="color: #008000;">compile</span><span style="color: black;">&#40;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; r<span style="color: #483d8b;">&quot;&quot;&quot;&lt;created_at&gt;([^&lt;]+)&lt;/created_at&gt;<span style="color: #000099; font-weight: bold;">\s</span>*<br />
&nbsp; &nbsp; &nbsp; &nbsp; &lt;id&gt;([^&lt;]+)&lt;/id&gt;<span style="color: #000099; font-weight: bold;">\s</span>*<br />
&nbsp; &nbsp; &nbsp; &nbsp; &lt;text&gt;(.*?)&lt;/text&gt;<span style="color: #000099; font-weight: bold;">\s</span>*<br />
&nbsp; &nbsp; &nbsp; &nbsp; &lt;source&gt;([^&lt;]+)&lt;/source&gt;<span style="color: #000099; font-weight: bold;">\s</span>*&quot;&quot;&quot;</span>, <span style="color: #dc143c;">re</span>.<span style="color: black;">DOTALL</span> | <span style="color: #dc143c;">re</span>.<span style="color: black;">VERBOSE</span><span style="color: black;">&#41;</span><br />
<br />
<span style="color: #808080; font-style: italic;">###############################################################################</span><br />
<span style="color: #ff7700;font-weight:bold;">def</span> time_from_0_to_8<span style="color: black;">&#40;</span>timestr,timezone=<span style="color: #ff4500;">8</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'convert fanfou +0000 time string to local Chinese time string.<br />
&nbsp; &nbsp; &nbsp;if you live in another timezone, please modify the timezone parameter.<br />
&nbsp; &nbsp; '</span><span style="color: #483d8b;">''</span><br />
&nbsp; &nbsp; TIMEFORMAT=<span style="color: #483d8b;">&quot;%a %b %d %X +0000 %Y&quot;</span><br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">#Sat Jan 03 23:08:54 +0000 2009</span><br />
&nbsp; &nbsp; ISOTIMEFORMAT=<span style="color: #483d8b;">'%Y-%m-%d %X'</span><br />
&nbsp; &nbsp; x=<span style="color: #dc143c;">time</span>.<span style="color: black;">strptime</span><span style="color: black;">&#40;</span>timestr, TIMEFORMAT<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; m=<span style="color: #dc143c;">time</span>.<span style="color: black;">mktime</span><span style="color: black;">&#40;</span>x<span style="color: black;">&#41;</span>+<span style="color: #ff4500;">60</span><span style="color: #66cc66;">*</span><span style="color: #ff4500;">60</span><span style="color: #66cc66;">*</span>timezone<br />
&nbsp; &nbsp; p=<span style="color: #dc143c;">time</span>.<span style="color: black;">strftime</span><span style="color: black;">&#40;</span>ISOTIMEFORMAT,<span style="color: #dc143c;">time</span>.<span style="color: black;">localtime</span><span style="color: black;">&#40;</span>m<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> p <br />
<br />
<span style="color: #808080; font-style: italic;">###############################################################################</span><br />
<span style="color: #ff7700;font-weight:bold;">def</span> download<span style="color: black;">&#40;</span><span style="color: #008000;">id</span>,page=<span style="color: #ff4500;">1</span>,other=<span style="color: #483d8b;">&quot;&quot;</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #483d8b;">&quot;&quot;&quot;<br />
&nbsp; &nbsp; to download user id's message by page number. the default <br />
&nbsp; &nbsp; page is the 1st one. <br />
&nbsp; &nbsp; &quot;&quot;&quot;</span><br />
&nbsp; &nbsp; c = pycurl.<span style="color: black;">Curl</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; url=<span style="color: #483d8b;">&quot;http://api.fanfou.com/statuses/user_timeline.xml?id=%s%s&amp;page=%d&quot;</span><span style="color: #66cc66;">%</span><span style="color: black;">&#40;</span><span style="color: #008000;">id</span>,other,page<span style="color: black;">&#41;</span> <br />
&nbsp; &nbsp; c.<span style="color: black;">setopt</span><span style="color: black;">&#40;</span>pycurl.<span style="color: black;">URL</span>, url<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; c.<span style="color: black;">setopt</span><span style="color: black;">&#40;</span>pycurl.<span style="color: black;">HTTPHEADER</span>, <span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;Accept:&quot;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; b = <span style="color: #dc143c;">StringIO</span>.<span style="color: #dc143c;">StringIO</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; c.<span style="color: black;">setopt</span><span style="color: black;">&#40;</span>pycurl.<span style="color: black;">WRITEFUNCTION</span>, b.<span style="color: black;">write</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; c.<span style="color: black;">perform</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> b.<span style="color: black;">getvalue</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
<br />
<br />
<span style="color: #ff7700;font-weight:bold;">def</span> parsemsg<span style="color: black;">&#40;</span>text,p<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'<br />
&nbsp; &nbsp; parse all the messages from the given text, <br />
&nbsp; &nbsp; return the message timestamp, msg text, and uuid.<br />
&nbsp; &nbsp; the structure of the returned list:<br />
&nbsp; &nbsp; list[(time,id,msg,tool),(time,id,msg,tool)...]<br />
&nbsp; &nbsp; '</span><span style="color: #483d8b;">''</span><br />
&nbsp; &nbsp; p.<span style="color: black;">match</span><span style="color: black;">&#40;</span>text<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> p.<span style="color: black;">findall</span><span style="color: black;">&#40;</span>text<span style="color: black;">&#41;</span><br />
<br />
<span style="color: #808080; font-style: italic;">###############################################################################</span><br />
<span style="color: #ff7700;font-weight:bold;">def</span> initdb<span style="color: black;">&#40;</span><span style="color: #008000;">id</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'<br />
&nbsp; &nbsp; &nbsp; &nbsp; init the database, create if not exists.<br />
&nbsp; &nbsp; '</span><span style="color: #483d8b;">''</span><br />
&nbsp; &nbsp; dbname=<span style="color: #008000;">id</span>+<span style="color: #483d8b;">'.db3'</span><br />
&nbsp; &nbsp; cx=sqlite.<span style="color: black;">connect</span><span style="color: black;">&#40;</span>dbname<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; cu=cx.<span style="color: black;">cursor</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; cu.<span style="color: black;">execute</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;&quot;&quot;create table if not exists msg(<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; content Text,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; uuid Varchar(12) NOT NULL PRIMARY KEY,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; time Time,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; tool Text<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; )&quot;&quot;&quot;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> cx<br />
<br />
<span style="color: #ff7700;font-weight:bold;">def</span> latest_uid<span style="color: black;">&#40;</span>db<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; cu=db.<span style="color: black;">cursor</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; cu.<span style="color: black;">execute</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'select distinct uuid from msg order by time DESC limit 1'</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; rs=cu.<span style="color: black;">fetchone</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">if</span> rs:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> rs<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">''</span><br />
<br />
<span style="color: #ff7700;font-weight:bold;">def</span> store<span style="color: black;">&#40;</span><span style="color: #008000;">list</span>,db<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #483d8b;">''</span><span style="color: #483d8b;">'<br />
&nbsp; &nbsp; list[(time,id,msg,tool),(time,id,msg,tool)...]<br />
&nbsp; &nbsp; '</span><span style="color: #483d8b;">''</span><br />
&nbsp; &nbsp; cu=db.<span style="color: black;">cursor</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; index=<span style="color: #ff4500;">0</span> <br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">for</span> item <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">list</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #dc143c;">time</span>=time_from_0_to_8<span style="color: black;">&#40;</span>item<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">id</span>=item<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; msg=item<span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; tool=item<span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">try</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;"># bind the values as parameters so quotes inside a message cannot break the statement</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; cu.<span style="color: black;">execute</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'insert into msg values(?,?,?,?)'</span>, <span style="color: black;">&#40;</span>msg,<span style="color: #008000;">id</span>,<span style="color: #dc143c;">time</span>,tool<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; index+=<span style="color: #ff4500;">1</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">except</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'insert error'</span> <br />
&nbsp; &nbsp; db.<span style="color: black;">commit</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;%d messages parsed&quot;</span> <span style="color: #66cc66;">%</span> index<br />
<br />
<span style="color: #ff7700;font-weight:bold;">def</span> printmsg<span style="color: black;">&#40;</span>db,index,sep=<span style="color: #483d8b;">&quot;　&quot;</span><span style="color: black;">&#41;</span>: <br />
&nbsp; &nbsp; cu=db.<span style="color: black;">cursor</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; cu.<span style="color: black;">execute</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'select content, time from msg where 1 order by time'</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; rs=cu.<span style="color: black;">fetchone</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; result=<span style="color: #483d8b;">&quot;&quot;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">while</span> rs:<br />
&nbsp; &nbsp; &nbsp; &nbsp; result+=<span style="color: #008000;">str</span><span style="color: black;">&#40;</span>index<span style="color: black;">&#41;</span>+sep+rs<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>+sep+rs<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>+<span style="color: #483d8b;">&quot;&lt;br /&gt;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; rs=cu.<span style="color: black;">fetchone</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; index+=<span style="color: #ff4500;">1</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> result<br />
&nbsp;<span style="color: #808080; font-style: italic;">###############################################################################</span><br />
<span style="color: #808080; font-style: italic;"># check the argument count before reading sys.argv[1], otherwise a missing id raises IndexError</span><br />
<span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#41;</span><span style="color: #66cc66;">&lt;</span><span style="color: #ff4500;">2</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;please start this program with your id&quot;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;for example: ff.exe zhasm, where zhasm is the fanfou id&quot;</span><br />
&nbsp; &nbsp; <span style="color: #dc143c;">sys</span>.<span style="color: black;">exit</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><br />
<span style="color: #008000;">id</span>=<span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><br />
<br />
db=initdb<span style="color: black;">&#40;</span><span style="color: #008000;">id</span><span style="color: black;">&#41;</span><br />
since=latest_uid<span style="color: black;">&#40;</span>db<span style="color: black;">&#41;</span><br />
<span style="color: #ff7700;font-weight:bold;">if</span> since:<br />
&nbsp; &nbsp; condition=<span style="color: #483d8b;">&quot;&amp;since_id=&quot;</span>+since<br />
<span style="color: #ff7700;font-weight:bold;">else</span>:<br />
&nbsp; &nbsp; condition=<span style="color: #483d8b;">''</span><br />
page=<span style="color: #ff4500;">160</span><br />
<span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: #ff4500;">1</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'downloading page'</span>,page<br />
&nbsp; &nbsp; msg=download<span style="color: black;">&#40;</span><span style="color: #008000;">id</span>,page,condition<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #008000;">list</span>=parsemsg<span style="color: black;">&#40;</span>msg,p<span style="color: black;">&#41;</span> &nbsp; &nbsp;<br />
&nbsp; &nbsp; store<span style="color: black;">&#40;</span><span style="color: #008000;">list</span>,db<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: #008000;">list</span><span style="color: black;">&#41;</span>==<span style="color: #ff4500;">0</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">break</span><br />
&nbsp; &nbsp; page+=<span style="color: #ff4500;">1</span><br />
filename=<span style="color: #008000;">id</span>+<span style="color: #483d8b;">&quot;.html&quot;</span><br />
<span style="color: #008000;">file</span> = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>filename,<span style="color: #483d8b;">&quot;w&quot;</span><span style="color: black;">&#41;</span><br />
<span style="color: #008000;">file</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">''</span><span style="color: #483d8b;">'<br />
&lt;html&gt;<br />
&lt;head&gt;<br />
&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html; charset=utf-8&quot; /&gt;<br />
&lt;/head&gt;<br />
&lt;body&gt;'</span><span style="color: #483d8b;">''</span> <span style="color: black;">&#41;</span><br />
<span style="color: #008000;">file</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span>printmsg<span style="color: black;">&#40;</span>db,<span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
<span style="color: #008000;">file</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">''</span><span style="color: #483d8b;">'<br />
&nbsp; &nbsp; &lt;/body&gt;<br />
&nbsp; &nbsp; &lt;/html&gt;'</span><span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span><br />
<span style="color: #008000;">file</span>.<span style="color: black;">close</span><span style="color: black;">&#40;</span> <span style="color: black;">&#41;</span></div></td></tr></tbody></table></div>
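<p>The storage step in the listing can also be sketched on its own with the standard <code>sqlite3</code> module. This is a minimal sketch, not the script's actual code: the names <code>init_db</code> and <code>store_rows</code> and the sample rows are hypothetical, and it assumes the same <code>msg</code> table layout as above. Values are bound as parameters, so a message containing quotes is stored unchanged, and a repeated <code>uuid</code> is skipped instead of aborting the run.</p>

```python
import sqlite3


def init_db(path=":memory:"):
    # Hypothetical helper: same table layout as the msg table in the listing.
    cx = sqlite3.connect(path)
    cx.execute("""create table if not exists msg(
        content Text,
        uuid Varchar(12) NOT NULL PRIMARY KEY,
        time Time,
        tool Text)""")
    return cx


def store_rows(db, rows):
    """Insert (time, uuid, msg, tool) tuples; return how many were stored."""
    inserted = 0
    for when, uuid, msg, tool in rows:
        try:
            # Parameter binding keeps quotes in msg from breaking the SQL.
            db.execute("insert into msg values(?,?,?,?)",
                       (msg, uuid, when, tool))
            inserted += 1
        except sqlite3.IntegrityError:
            # The uuid is already in the table: skip the duplicate.
            pass
    db.commit()
    return inserted


# Made-up sample data for illustration only.
db = init_db()
rows = [
    ("2010-01-01 08:00:00", "abc123", 'he said "hi"', "web"),
    ("2010-01-01 08:00:00", "abc123", "duplicate of the first uuid", "web"),
]
count = store_rows(db, rows)
```

<p>Catching only <code>sqlite3.IntegrityError</code>, rather than every exception, means that other failures such as a missing table still surface instead of being reported as an insert error.</p>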
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/fanfou-msg-extractor-via-new-api.html/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
