<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>我爱正则表达式 (I Love Regular Expressions) &#187; wget</title>
	<atom:link href="http://iregex.org/blog/tag/wget/feed" rel="self" type="application/rss+xml" />
	<link>http://iregex.org</link>
	<description>Original, translated, and reposted articles about regular expressions</description>
	<lastBuildDate>Fri, 30 Mar 2012 12:50:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/><atom:link rel="hub" href="http://www.feedsky.com/api/RPC2"/><atom:link rel="hub" href="http://blogsearch.google.com/ping/RPC2"/><atom:link rel="hub" href="http://blog.yodao.com/ping/RPC2"/><atom:link rel="hub" href="http://www.xianguo.com/xmlrpc/ping.php"/><atom:link rel="hub" href="http://www.zhuaxia.com/rpc/server.php"/><atom:link rel="hub" href="http://rpc.technorati.com/rpc/ping"/><atom:link rel="hub" href="http://rpc.pingomatic.com/"/>		<item>
		<title>A One-Line Command for Downloading the Images on a Page</title>
		<link>http://iregex.org/blog/download-images-with-single-line-command.html</link>
		<comments>http://iregex.org/blog/download-images-with-single-line-command.html#comments</comments>
		<pubDate>Mon, 07 Dec 2009 11:58:42 +0000</pubDate>
		<dc:creator>rex</dc:creator>
		<category><![CDATA[Applications]]></category>
		<category><![CDATA[cmdline]]></category>
		<category><![CDATA[curl]]></category>
		<category><![CDATA[grep]]></category>
		<category><![CDATA[wget]]></category>

		<guid isPermaLink="false">http://iregex.org/?p=72</guid>
		<description><![CDATA[The command is: curl -s $URL &#124;perl -nle &#34;print for m{http://[^\&#34;]+(?:jpg&#124;png&#124;gif)}g;&#34;&#124;sort -u &#124;xargs wget How it works: take the page that contains the image links (for example, http://www.flickr.com/photos/anyaanja/4165312465/sizes/o/ ... ]]></description>
			<content:encoded><![CDATA[<div>The command is:<br />
</p>
<div class="codecolorer-container bash mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">curl <span style="color: #660033;">-s</span> <span style="color: #007800;">$URL</span> <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">perl</span> <span style="color: #660033;">-nle</span> <span style="color: #ff0000;">&quot;print for m{http://[^<span style="color: #000099; font-weight: bold;">\&quot;</span>]+(?:jpg|png|gif)}g;&quot;</span><span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">sort</span> <span style="color: #660033;">-u</span> <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">xargs</span> <span style="color: #c20cb9; font-weight: bold;">wget</span></div></td></tr></tbody></table></div>
</div>
<p><span id="more-72"></span><br />
How it works:</p>
<ul>
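<li>Download the page that contains the image links (for example, http://www.flickr.com/photos/anyaanja/4165312465/sizes/o/) so that the image URLs can be extracted. The command for this is curl -s $URL, where $URL has to be replaced by hand with the address you need. curl's -s option selects silent mode, which suppresses the progress meter and error messages so that only the page body goes down the pipe.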
</li>
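<li>Use perl to parse the page that was just downloaded and find the image URLs that begin with http:// and end with jpg, png, or gif. The list of image types is arbitrary; it can be extended or trimmed using the same syntax. Of perl's -nle options, -n loops over the input line by line, -l strips the line ending on input and adds one back on print, and -e takes the code from the command line; the code here prints every match found on each line. For details, see <a title="perl one liners" target="_blank" href="http://sial.org/howto/perl/one-liner/" id="x85m">perl one liners</a>.</li>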
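<li>perl serves as the page parser here. awk should be able to do the same job, but my personal impression is that awk's <a href="http://iregex.org/blog/download-images-with-single-line-command.html">regular expression</a> support is comparatively weak.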
</li>
<li>Use sort -u to sort the resulting URLs. If there are duplicates, only one copy is kept, so that nothing is downloaded twice.</li>
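<li>Use wget to download the images into the current directory. Because wget does not read URLs from standard input by default, xargs is used as the go-between to pass them as arguments.</li>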
</ul>
<p>
<span style="color: rgb(255, 0, 255);">2009120</span> update:</p>
<ul>
<li>Using</p>
<div class="codecolorer-container bash mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">curl <span style="color: #660033;">-s</span> <span style="color: #007800;">$URL</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">grep</span> <span style="color: #660033;">-o</span> <span style="color: #ff0000;">&quot;http://.*\?\(png\|jpg\)&quot;</span> <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">sort</span> <span style="color: #660033;">-u</span> <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">xargs</span> <span style="color: #c20cb9; font-weight: bold;">wget</span></div></td></tr></tbody></table></div>
<p>or</p>
<div class="codecolorer-container bash mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><table cellspacing="0" cellpadding="0"><tbody><tr><td style="padding:5px;text-align:center;color:#888888;background-color:#EEEEEE;border-right: 1px solid #9F9F9F;font: normal 12px/1.4em Monaco, Lucida Console, monospace;"><div>1<br /></div></td><td><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">curl <span style="color: #660033;">-s</span> <span style="color: #007800;">$URL</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">grep</span> <span style="color: #660033;">-E</span> <span style="color: #660033;">-o</span> <span style="color: #ff0000;">&quot;http://.*?(png|jpg)&quot;</span> <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">sort</span> <span style="color: #660033;">-u</span> <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">xargs</span> <span style="color: #c20cb9; font-weight: bold;">wget</span></div></td></tr></tbody></table></div>
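<p>achieves the same result. Here, -o prints only the matched part rather than the whole line (printing the whole line is the default); -E selects extended regular expressions, in which the question mark, parentheses, and vertical bar can be used directly, without a preceding backslash.</li>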
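<li>With perl, the regular expression engine is more powerful, but the command is bulky; grep is small and flexible, but it may not be able to handle complex regular expressions. For example, grep's basic and extended syntaxes have no lazy quantifiers, so the question mark after .* in the grep commands above does not make the match non-greedy, and a line containing several image URLs can come back as one long match.</li>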
</ul>
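<p>Since grep -E has no lazy quantifier, one way to keep each match to a single URL is to borrow the bounded character class from the perl version. The following is a sketch under that assumption: the function name extract_with_grep and the sample line are made up for illustration.</p>

```shell
# Sketch: a grep -E -o variant that uses a bounded character class.
# extract_with_grep is a hypothetical name; the sample line is made up.
extract_with_grep() {
    # [^"]* cannot cross a double quote, so each match stays inside one
    # attribute value even when a line holds several URLs.
    grep -E -o 'http://[^"]*(png|jpg)' | sort -u
}

sample_line='<img src="http://example.com/a.png"> <img src="http://example.com/b.jpg">'

printf '%s\n' "$sample_line" | extract_with_grep

# Real use, mirroring the commands in the post:
# curl -s "$URL" | extract_with_grep | xargs wget
```

<p>This assumes the URLs sit inside double-quoted attributes; pages that use single quotes or unquoted attributes would need a different character class.</p>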
]]></content:encoded>
			<wfw:commentRss>http://iregex.org/blog/download-images-with-single-line-command.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

