Building a Multi-Process CommandlineFu Crawler with Shell | Linux 中国

Posted by evilven, 5 months ago (05-03)

— Lujun9972


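CommandlineFu[1] is a website that collects shell snippets; each snippet comes with a description of what it does and a set of tags. My goal is to write a multi-process crawler in shell that saves these snippets to an org file.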

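Defining the parameters

The script should accept a -n option that sets the number of concurrent crawler processes (default: the number of CPU cores) and a -f option that sets the path of the org file to write (default: stdout).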

<ol class="linenums list-paddingleft-2" style="margin-left: 2em;margin-right: 2em;">
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">!</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">/usr/</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">bin</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">/</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">env</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">bash</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">proc_num</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">(</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">nproc</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">)</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">store_file</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">/dev/</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">stdout</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">while</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> getopts </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">n</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">f</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> OPT</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">;</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">do</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">case</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> $OPT </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">in</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        n</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|+</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">n</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">)</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">            proc_num</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$OPTARG"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">            </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">;;</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        f</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|+</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">f</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">)</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">            store_file</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$OPTARG"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">            </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">;;</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">*)</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">            </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">echo</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"usage: ${0##*/} [+-n proc_num] [+-f org_file] [--]"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">            </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">exit</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">2</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">esac</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">done</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">shift $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">((</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> OPTIND </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">1</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">))</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">OPTIND</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">1</span></code></section></li>
</ol>
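To try the option handling in isolation, the same getopts loop can be wrapped in a function and called with different argument lists. This is only a sketch: `parse_args` is a hypothetical name that does not appear in the original script, and the fallback to 1 when `nproc` is unavailable is an assumption added for portability.

```shell
#!/usr/bin/env bash
# parse_args is a hypothetical wrapper, used here only to exercise the loop.
parse_args() {
    # Local OPTIND lets the function be called more than once.
    local OPT OPTARG OPTIND=1
    proc_num=$(nproc 2>/dev/null || echo 1)   # assumed fallback when nproc is missing
    store_file=/dev/stdout
    while getopts :n:f: OPT; do
        case $OPT in
            n) proc_num=$OPTARG ;;
            f) store_file=$OPTARG ;;
            *)  # unknown option, or an option missing its argument
                echo "usage: parse_args [-n proc_num] [-f org_file]" >&2
                return 2
                ;;
        esac
    done
    echo "proc_num=${proc_num} store_file=${store_file}"
}

parse_args -n 4 -f /tmp/commandlinefu.org
# prints: proc_num=4 store_file=/tmp/commandlinefu.org
```

The leading colon in `:n:f:` puts getopts into silent mode, so both an unknown option and a missing argument fall through to the `*)` branch instead of producing bash's own error message.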

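Parsing the command browse pages

We need one process that extracts the URL of each snippet from CommandlineFu's browse listing and puts those URLs into a queue. The crawler processes then read URLs from the queue, fetch each page, extract the snippet, its description, and its tags, and write them to the org file.

This raises three problems:

1. How to implement the queue that the processes communicate through
2. How to extract the URLs, snippets, descriptions, tags, and other information from the pages
3. How to avoid garbled ordering when several processes read and write the same file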
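To make the third problem concrete, here is a minimal sketch of one common way to serialize writers, using flock(1) from util-linux. Whether the crawler itself uses this approach is not stated in this part of the article; `write_record`, the lock file, and the record layout are all assumptions made for the illustration.

```shell
#!/usr/bin/env bash
out=$(mktemp)              # stand-in for the shared org file
lock="${out}.lock"         # separate lock file (an assumption of this sketch)
trap 'rm -f "$out" "$lock"' EXIT

# write_record is a hypothetical helper: it appends a two-line record while
# holding an exclusive lock, so records from different writers cannot interleave.
write_record() {
    local id=$1
    (
        flock 9                              # wait for the lock held on fd 9
        printf '* record %s\n' "$id"
        printf '  body of record %s\n' "$id"
    ) 9>"$lock" >>"$out"
}

# Start four writers in parallel and wait for all of them.
for id in 1 2 3 4; do
    write_record "$id" &
done
wait

cat "$out"
```

The order of the records depends on scheduling, but each heading line is always followed directly by its own body line.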

Implementing the inter-process queue

This one is fairly easy to solve: a named pipe will do:

<ol class="linenums list-paddingleft-2" style="margin-left: 2em;margin-right: 2em;">
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">queue</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">(</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">mktemp</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">--</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">dry</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">run</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">)</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">mkfifo</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">queue</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">exec</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">99</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">&lt;&gt;</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">queue</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">trap </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"rm ${queue} 2>/dev/null"</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> EXIT</span></code></section></li>
</ol>
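The following sketch shows how such a queue might be used once file descriptor 99 is open: one side writes lines into the pipe, the other side reads them back in order. The placeholder URLs, the `drain_queue` helper, and the empty line used as an end-of-queue marker are assumptions made for this example, not part of the original script.

```shell
#!/usr/bin/env bash
# Create a FIFO at a fresh temporary path and open it read-write on fd 99.
queue=$(mktemp -u)         # -u is the short form of --dry-run
mkfifo "$queue"
exec 99<>"$queue"
trap 'rm -f "$queue"' EXIT

# Producer side: push three placeholder URLs, then an empty line that
# serves as the end-of-queue marker in this sketch.
printf '%s\n' \
    "https://example.com/a" \
    "https://example.com/b" \
    "https://example.com/c" \
    "" >&99

# Consumer side (hypothetical helper): pop lines until the marker shows up.
drain_queue() {
    local url
    while read -r -u 99 url && [[ -n $url ]]; do
        echo "fetched: $url"
    done
}

drained=$(drain_queue)
echo "$drained"
```

Opening the FIFO with `<>` keeps both ends open in the same process, so neither the write nor the read blocks waiting for a peer, as long as the data fits in the pipe buffer.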

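Extracting the information we want from a page

There are two main ways to pull element content out of a page:

1. For simple HTML pages, we can use tools such as sed, grep, and awk to extract information from the HTML by regular-expression matching.
2. We can use hxselect[3] from the html-xml-utils[2] tool set to extract elements by CSS selector.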
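For comparison, the first approach could look roughly like the sketch below. The sample markup is a heavily simplified, hypothetical stand-in for a browse page, and both the `/commands/view/` path pattern and the `extract_hrefs_with_regex` helper are assumptions made for the illustration.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified sample of a browse page.
sample_html='<ul>
<li class="list-group-item"><div><div><a href="/commands/view/1/first-command">first</a></div></div></li>
<li class="list-group-item"><div><div><a href="/commands/view/2/second-command">second</a></div></div></li>
</ul>'

# Hypothetical helper: reads HTML on stdin and prints absolute snippet URLs.
extract_hrefs_with_regex() {
    # Keep only href attributes that point at /commands/view/ (assumed pattern),
    # strip the href="..." wrapper, then prepend the site address.
    grep -o 'href="/commands/view/[^"]*"' |
        sed -e 's/^href="//' -e 's/"$//' -e 's@^@https://www.commandlinefu.com@'
}

printf '%s\n' "$sample_html" | extract_hrefs_with_regex
# prints:
# https://www.commandlinefu.com/commands/view/1/first-command
# https://www.commandlinefu.com/commands/view/2/second-command
```

This works only while the markup stays this regular; a selector-based tool is less sensitive to attribute order and line breaks.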

Here we use the html-xml-utils tools for the extraction:

<ol class="linenums list-paddingleft-2" style="margin-left: 2em;margin-right: 2em;">
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">function</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> extract_views_from_browse_page</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">()</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">if</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">[[</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> $</span><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">eq </span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">0</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">]];</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">then</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> html</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">(</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">cat</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-)</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">else</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> html</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$*"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">fi</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">echo</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> "$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">html</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">" </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">hxclean </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">hxselect </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">c </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">s </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"\n"</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"li.list-group-item > div:nth-child(1) > div:nth-child(1) > a:nth-child(1)::attr(href)"</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">sed</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">'s@^@https://www.commandlinefu.com/@'</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">function</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> extract_nextpage_from_browse_page</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">()</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">if</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">[[</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> $</span><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">eq </span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">0</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">]];</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">then</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> html</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">(</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">cat</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-)</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">else</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> html</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$*"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">fi</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">echo</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"${html}"</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">hxclean </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">hxselect </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">s </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"\n"</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"li.list-group-item:nth-child(26) > a"</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">grep</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">'>'</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">hxselect </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">c </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"::attr(href)"</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">sed</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">'s@^@https://www.commandlinefu.com/@'</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
</ol>

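Note that hxselect expects its input to be well-formed XML, so the HTML has to be normalized with hxclean before hxselect parses it. In addition, to keep a large HTML document from exceeding the argument-list length limit, the function also accepts the HTML content through a pipe.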
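The "arguments or pipe" pattern described above can be tried in isolation. The following is a minimal sketch, not the article's code: `html_length` is a hypothetical helper that only reports the length of whatever it received, so that the two input paths can be compared without needing hxclean or hxselect.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the original script): accepts its input
# either as arguments or, when called without arguments, from stdin.
function html_length()
{
    local html
    if [[ $# -eq 0 ]]; then
        html=$(cat -)      # no arguments: read the whole document from the pipe
    else
        html="$*"          # arguments given: join them into one string
    fi
    echo "${#html}"        # print the number of characters received
}

from_args=$(html_length "<p>hi</p>")
from_pipe=$(echo "<p>hi</p>" | html_length)
echo "via arguments: ${from_args}"
echo "via pipe:      ${from_pipe}"
```

Both calls see the same content; the pipe form is the one to prefer when the document may be large.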

Looping over the browse pages and writing the extracted snippet URLs to the queue

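This step addresses the third problem mentioned above: how do we keep the data from being interleaved when multiple processes read from and write to the pipe? The answer is to lock the file before writing to it and to unlock it once the write has finished. In shell, flock can be used to lock a file. For flock usage and caveats, see another post, "Usage and caveats of the Linux shell flock file lock"[4].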
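As a rough illustration of that lock-write-unlock cycle, the sketch below lets four background processes append a two-line record each to the same file. It assumes the util-linux `flock` command and a `sleep` that accepts fractional seconds; `write_record` and the `begin`/`end` record layout are made up for this example and are not part of the crawler.

```shell
#!/usr/bin/env bash

out=$(mktemp)

# Hypothetical writer: appends a two-line record while holding an exclusive
# lock on the output file itself (the file doubles as the lock file).
function write_record()
{
    local id="$1"
    (
        flock 9                   # block until this process holds the lock
        echo "begin ${id}" >&9
        sleep 0.1                 # widen the window in which others could interleave
        echo "end ${id}" >&9
    ) 9>>"${out}"                 # the lock is released when fd 9 is closed
}

for id in 1 2 3 4; do
    write_record "${id}" &
done
wait

# Every "begin N" line must be followed directly by its "end N" line.
verdict=$(awk '/^begin /{id=$2; getline; if ($0 != "end " id) bad=1}
               END{print (bad ? "interleaved" : "intact")}' "${out}")
echo "${verdict}"
```

Removing the `flock 9` line should make the records interleave, which is the disorder the lock is meant to prevent.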

Because the function extract_views_from_browse_page is used inside the child process started by flock, it has to be exported first:

<ol class="linenums list-paddingleft-2" style="margin-left: 2em;margin-right: 2em;"><li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">export</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">f extract_views_from_browse_page</span></code></section></li></ol>
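To see why the export matters, here is a small sketch with a hypothetical function `shout` standing in for extract_views_from_browse_page. One assumption is worth spelling out: `flock -c` runs the command through the shell named in `$SHELL` (falling back to `/bin/sh`), and only bash imports exported functions, so this sketch starts `bash -c` explicitly instead of relying on `-c`.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for extract_views_from_browse_page.
function shout()
{
    tr 'a-z' 'A-Z'
}
# Without this line the child bash started by flock would not know "shout".
export -f shout

lockfile=$(mktemp)

# flock takes the lock on ${lockfile}, then runs the child bash, which
# inherits stdin from the pipe and finds "shout" in its environment.
result=$(echo "hello" | flock "${lockfile}" bash -c 'shout')
echo "${result}"

rm -f "${lockfile}"
```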

Because of network problems, fetching content with curl may fail, so the request has to be retried until it succeeds:

<ol class="linenums list-paddingleft-2" style="margin-left: 2em;margin-right: 2em;">
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">function</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> fetch</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">()</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> url</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$1"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">while</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">!</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> curl </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">L </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"${url}"</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">2</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">></span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">/dev/</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">null</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">;</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">do</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">done</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
</ol>
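The loop above retries forever, which is fine for a demo but can hang if a URL never becomes reachable. One possible variation, sketched here under the assumption that giving up after a fixed number of attempts is acceptable, is a bounded retry wrapper. `retry` and `flaky` are hypothetical names; `flaky` simulates a command that fails twice before succeeding, so that the sketch runs without network access.

```shell
#!/usr/bin/env bash

# retry MAX CMD [ARGS...]: run CMD until it succeeds, at most MAX times.
function retry()
{
    local max="$1"; shift
    local attempt
    for ((attempt=1; attempt<=max; attempt++)); do
        if "$@"; then
            return 0          # the command succeeded, stop retrying
        fi
    done
    return 1                  # every attempt failed
}

# Hypothetical flaky command: keeps its attempt counter in a temp file so the
# state survives the subshell created by the command substitution below.
counter_file=$(mktemp)
echo 0 > "${counter_file}"
function flaky()
{
    local n=$(( $(cat "${counter_file}") + 1 ))
    echo "${n}" > "${counter_file}"
    [[ ${n} -ge 3 ]] && echo "ok after ${n} attempts"
}

output=$(retry 5 flaky)
echo "${output}"
rm -f "${counter_file}"
```

In the crawler this would be called as `retry 5 curl -L "${url}"`, with the caller deciding what to do when it returns non-zero.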

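collector starts from the seed URL, collects the URLs that still have to be crawled, and writes them to the pipe file; while a write is in progress, the pipe file also serves as the lock file: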

<ol class="linenums list-paddingleft-2" style="margin-left: 2em;margin-right: 2em;">
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">function</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> collector</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">()</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    url</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$*"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">while</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">[[</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">n $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">url</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">]];</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">do</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">echo</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"Extracting from $url"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        html</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">(</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">fetch </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"${url}"</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">)</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">echo</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"${html}"</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">flock $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">queue</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">c </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"extract_views_from_browse_page >${queue}"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        url</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">(</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">echo</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"${html}"</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">extract_nextpage_from_browse_page</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">)</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">done</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">Let the crawler processes that parse the snippets exit normally instead of blocking.</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">for</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">((</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">i</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">0</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">;</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">i</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"><</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">proc_num</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">};</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">i</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">++))</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">do</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">echo</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">></span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">queue</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">done</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
</ol>

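Note that once no next-page URL can be found, a for loop writes proc_num empty lines to the queue. Each empty line acts as a stop signal for one of the crawler processes that parse the snippets later, so that they exit normally instead of blocking on the queue.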
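The empty-line trick can be reproduced on a small scale. The sketch below is an assumption-laden stand-in for the real crawler: `worker` is hypothetical, the queue is a FIFO opened read-write on fd 3 (which does not block on Linux), and reads are serialized with flock so that two workers never split a line between them.

```shell
#!/usr/bin/env bash

proc_num=2
workdir=$(mktemp -d)
queue="${workdir}/queue"
lock="${workdir}/lock"
out="${workdir}/out"
mkfifo "${queue}"
: > "${out}"
exec 3<>"${queue}"    # read-write open: neither readers nor the writer block on open

# Hypothetical worker: takes one line at a time and stops at the first empty line.
function worker()
{
    local line
    while :; do
        {
            flock 9                   # only one worker reads at a time
            IFS= read -r line <&3
        } 9>"${lock}"
        [[ -z "${line}" ]] && break   # empty line: no more work, exit the loop
        echo "${line}" >> "${out}"
    done
}

for ((i=0; i<proc_num; i++)); do
    worker &
done

# Producer: real items first, then one empty line per worker.
for item in a b c d e; do
    echo "${item}" >&3
done
for ((i=0; i<proc_num; i++)); do
    echo >&3
done

wait
result=$(sort "${out}" | paste -sd, -)
echo "${result}"

exec 3>&-
rm -rf "${workdir}"
```

Because each worker consumes exactly one empty line before exiting, writing fewer than proc_num of them would leave at least one worker blocked in `read`.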

Parsing the snippet pages

We need to extract the title, the code snippet, the description, and the tags from each snippet page, and write them to the output file in org-mode format.

<ol class="linenums list-paddingleft-2" style="margin-left: 2em;margin-right: 2em;">
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">  </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">function</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> view_page_handler</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">()</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">  </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> url</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$1"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> html</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$(fetch "</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">url</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">")"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      </span><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> headline</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> headline</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$(echo "${html}" |hxclean |hxselect -c -s "</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">\n</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">" "</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">.</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">col</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">md</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">8</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">></span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> h1</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">nth</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">child</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">(</span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">1</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">)</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">")"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      </span><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> command</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> command</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$(echo "${html}" |hxclean |hxselect -c -s "</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">\n</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">" "</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">.</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">col</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">md</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">8</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">></span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> div</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">nth</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">child</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">(</span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">2</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">)</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">></span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> span</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">nth</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">child</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">(</span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">2</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">)</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"|pandoc -f html -t org)"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      </span><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> description</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> description</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$(echo "${html}" |hxclean |hxselect -c -s "</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">\n</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">" "</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">.</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">col</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">md</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">8</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">></span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> div</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">.</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">description</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"|pandoc -f html -t org)"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      </span><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> tags</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> tags</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$(echo "${html}" |hxclean |hxselect -c -s "</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">" "</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">.</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">functions </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">></span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> a</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">")"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">if</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">[[</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">n </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"${tags}"</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">]];</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">then</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">          tags</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">":${tags}"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">fi</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      </span><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> build org content</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">cat</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"><<</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">EOF </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">flock </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">x $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">store_file</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">tee</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">a $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 
184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">store_file</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">*</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">headline</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">      $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">tags</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">PROPERTIES</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">URL</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">       $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">url</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">END</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">description</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#+</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">begin_src shell</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">command</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#+</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">end_src</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">EOF</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">  </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
</ol>
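
To see how the tag string built in the function above ends up as valid org tag syntax, here is a minimal sketch. It assumes, as the hxselect manual describes, that `-s ":"` prints the separator after every match, so two tags come out as `awk:sed:`. The helpers `join_tags` and `org_headline` are hypothetical stand-ins for the real hxselect call and the heredoc; they are not part of the original script.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for `hxselect -c -s ":" ".functions > a"`:
# print every argument followed by ":" (assumed hxselect behaviour).
join_tags() {
    local tag
    for tag in "$@"; do
        printf '%s:' "${tag}"
    done
}

# Build the org headline line. The leading ":" is added only when
# at least one tag exists, mirroring the `[[ -n "${tags}" ]]` check.
org_headline() {
    local headline="$1"
    shift
    local tags
    tags="$(join_tags "$@")"
    if [[ -n "${tags}" ]]; then
        tags=":${tags}"
    fi
    printf '* %s      %s\n' "${headline}" "${tags}"
}

org_headline "Count lines" awk sed
org_headline "No tags here"
```

The first call should print a headline ending in `:awk:sed:`, which org-mode reads as two tags. The second call prints the headline with no tag suffix, since an empty tag string is left untouched.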

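The extraction in this function follows the same approach as above. However, the code snippet and the description may contain HTML markup, so pandoc is used to convert them into org format.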
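
The last step of the function above, piping a heredoc into `flock -x ... tee -a ...`, can be tried on its own. The following is a minimal sketch, assuming `flock` from util-linux is installed. The `append_record` helper, the record text, and the temporary `store_file` are made up for illustration, and `tee`'s standard output is discarded here only to keep the demo quiet.

```shell
#!/usr/bin/env bash

# Illustrative store file; the real script takes this path from -f.
store_file="$(mktemp)"

# Append one multi-line record while holding an exclusive lock on the
# store file. flock runs `tee -a`, so the append happens under the lock.
append_record() {
    local id="$1"
    cat <<EOF | flock -x "${store_file}" tee -a "${store_file}" >/dev/null
* record ${id}
body of record ${id}

EOF
}

# Start several writers in parallel, then wait for all of them.
for id in 1 2 3 4 5 6 7 8; do
    append_record "${id}" &
done
wait

# Count the headlines that made it into the file.
grep -c '^\* record ' "${store_file}"
```

With eight writers the final `grep -c` should report 8. Each record stays contiguous because only one `tee -a` holds the lock at a time.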

Note that the final step, which formats the org-mode output and writes it to the store file, must not be written like this:

<ol class="linenums list-paddingleft-2" style="margin-left: 2em;margin-right: 2em;">
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    flock </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">x $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">store_file</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">cat</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"><<</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">EOF </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">></span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">store_file</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">*</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">headline</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">\t\t $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">tags</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">description</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#+</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">begin_src shell</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">command</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#+</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">end_src</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">EOF</span></code></section></li>
</ol>

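It means that flock locks only the cat command, and the output of the whole flock command is then redirected to the storage file. The redirection itself is not protected by the lock, because the shell opens the target file before flock even starts.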
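
The following is a minimal sketch of one way to keep the write itself under the lock, assuming `flock` from util-linux is available. The temporary file and the `append_record` function are hypothetical names for this demo, and `tee -a` is used only as an illustration; it is not necessarily what the full script does.

```shell
#!/usr/bin/env bash
# Sketch: several processes append multi-line records to one file.
store_file=$(mktemp)    # hypothetical output file for this demo

append_record() {
    # flock holds an exclusive lock on the file while tee runs, and tee
    # opens and appends to the file itself, so the write is under the lock.
    flock -x "${store_file}" tee -a "${store_file}" >/dev/null <<EOF
* record $1
body of record $1
EOF
}

# Start five concurrent writers, then wait for all of them.
for i in 1 2 3 4 5; do
    append_record "${i}" &
done
wait

line_count=$(wc -l <"${store_file}")
headline_count=$(grep -c '^\* record' "${store_file}")
echo "lines: ${line_count}, headlines: ${headline_count}"
rm -f "${store_file}"
```

Because the file is opened by the locked command rather than by a shell redirection, each record should be appended as one uninterrupted unit.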

spider reads the URLs to crawl from the pipe file and then performs the actual crawl.

<ol class="linenums list-paddingleft-2" style="margin-left: 2em;margin-right: 2em;">
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">function</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> spider</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">()</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">while</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">do</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">if</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">!</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> url</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">(</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">flock $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">queue</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">c </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">'read -t 1 -u 99 url && echo $url'</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">)</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">then</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">            </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">sleep</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">1</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">            </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">continue</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">fi</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">if</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">[[</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">z </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$url"</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">]];</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">then</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">            </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">break</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">fi</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        view_page_handler $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">url</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">done</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
</ol>

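Note that, to avoid deadlock, reading a URL from the pipe uses a timeout. A timeout means the producer process cannot keep up with the consumer processes, so the consumer sleeps for one second and then checks the queue for URLs again.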
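
The timeout behaviour can be seen in isolation with the sketch below. It assumes the queue is a FIFO opened read-write on file descriptor 99 (the descriptor number comes from the spider code; the FIFO path and the example URL are made up for this demo).

```shell
#!/usr/bin/env bash
queue=$(mktemp -u)      # hypothetical FIFO path for this demo
mkfifo "${queue}"
exec 99<>"${queue}"     # read-write open, so opening the FIFO does not block

# Empty queue: read gives up after one second and returns a status above 128.
if read -r -t 1 -u 99 url; then
    first_status=0
else
    first_status=$?
fi
echo "status on empty queue: ${first_status}"

# Once the producer has written a line, the same read succeeds immediately.
echo "https://example.com/view/1" >&99
read -r -t 1 -u 99 url
echo "got: ${url}"

exec 99>&-
rm -f "${queue}"
```

Without `-t`, the first `read` would block indefinitely on the empty queue, and inside the spider it would do so while still holding the lock.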

Putting it together

<ol class="linenums list-paddingleft-2" style="margin-left: 2em;margin-right: 2em;">
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">collector </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"https://www.commandlinefu.com/commands/browse"</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">&</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">for</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">((</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">i</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">0</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">;</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">i</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"><</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">proc_num</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">};</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">i</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">++))</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">do</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    spider </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">&</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">done</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">wait</span></code></section></li>
</ol>
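
The same producer/consumer layout can be tried without any network access. The toy pipeline below is only a sketch: `producer`, `worker`, the temporary files, and the `item-N` strings are all hypothetical, and it assumes the producer signals completion by writing one empty line per worker, matching the empty-URL check in spider.

```shell
#!/usr/bin/env bash
# Toy producer/consumer pipeline; every name here is hypothetical.
proc_num=3
queue=$(mktemp -u); mkfifo "${queue}"
exec 99<>"${queue}"
lock=$(mktemp)          # lock file guarding reads from the queue
out=$(mktemp)           # results, one line per processed item

producer() {
    for i in 1 2 3 4 5 6; do
        echo "item-${i}" >&99
    done
    # One empty line per worker tells each of them to stop.
    for ((i = 0; i < proc_num; i++)); do
        echo >&99
    done
}

worker() {
    local item
    while :; do
        # Take the lock on fd 9, then read one line with a timeout.
        if ! item=$( ( flock 9; read -r -t 1 -u 99 line && echo "${line}" ) 9>>"${lock}" ); then
            sleep 1
            continue
        fi
        [[ -z "${item}" ]] && break
        flock -x "${out}" tee -a "${out}" >/dev/null <<<"done ${item}"
    done
}

producer &
for ((i = 0; i < proc_num; i++)); do
    worker &
done
wait    # block until the producer and every worker have exited

processed=$(sort "${out}" | tr '\n' ',')
echo "${processed}"
exec 99>&-
rm -f "${queue}" "${lock}" "${out}"
```

Each worker consumes exactly one empty line before exiting, so `wait` returns only after all six items have been handled.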

Crawling other sites

By redefining the functions extract_views_from_browse_page, extract_nextpage_from_browse_page, and view_page_handler, and supplying a new seed URL, this script can easily be adapted into a multi-process crawler for other sites.

For example, the following code crawls the comics on xkcd[5]:

<ol class="linenums list-paddingleft-2" style="margin-left: 2em;margin-right: 2em;">
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">function</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> extract_views_from_browse_page</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">()</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">if</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">[[</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> $</span><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">eq </span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">0</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">]];</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">then</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> html</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">(</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">cat</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-)</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">else</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">        </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> html</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$*"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">fi</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    max</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">(</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">echo</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"${html}"</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">hxclean </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">hxselect </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">c </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">s </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"\n"</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"#middleContainer"</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">grep</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"Permanent link to this comic"</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">|</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">awk </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">F </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"/"</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">'{print $4}'</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">)</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    seq </span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">1</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">max</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}|</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">sed</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">'s@^@https://xkcd.com/@'</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">function</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> extract_nextpage_from_browse_page</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">()</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">echo</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">""</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">function</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> view_page_handler</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">()</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> url</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$1"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> html</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"$(fetch "</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">$</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">url</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}/</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">")"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">local</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> image</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">=</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"https:$(echo ${html} |hxclean |hxselect -c -s "</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">\n</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">" "</span><span class="com" style="overflow-wrap: break-word;color: rgb(174, 174, 174);font-style: italic;">#</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">comic </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">></span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> img</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">:</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">nth</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">-</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">child</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">(</span><span class="lit" style="overflow-wrap: break-word;color: rgb(51, 135, 204);">1</span><span class="pun" 
style="overflow-wrap: break-word;color: rgb(184, 255, 184);">)::</span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">attr</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">(</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">src</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">)</span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">")"</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">echo</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">image</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">    </span><span class="kwd" style="overflow-wrap: break-word;color: rgb(226, 137, 100);">wget</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> $</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">{</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">image</span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">}</span></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"></code></section></li>
<li><section style="overflow-wrap: break-word;width: 1200px;max-width: 1200px !important;"><code style="overflow-wrap: break-word;background: none;color: rgb(33, 150, 243);line-height: 1.2em;padding-left: 10px !important;border-radius: 0px !important;margin-top: 1em !important;margin-bottom: 1em !important;border-width: initial !important;border-style: none !important;border-color: initial !important;"><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">collector </span><span class="str" style="overflow-wrap: break-word;color: rgb(101, 176, 66);">"https://xkcd.com/"</span><span class="pln" style="overflow-wrap: break-word;color: rgb(184, 255, 184);"> </span><span class="pun" style="overflow-wrap: break-word;color: rgb(184, 255, 184);">&</span></code></section></li>
</ol>
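
Two steps in the listing above can be tried without touching the network: building the list of view-page URLs with `seq` and `sed`, and prefixing the image `src` with `https:`. The following is a minimal sketch of just those two steps. The function names `list_view_urls` and `absolute_image_url` and the sample `src` value are hypothetical and are not part of the original script; the sketch also assumes that the `src` attribute is protocol-relative (starts with `//`), which is what the `https:` prefix in the listing suggests.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print one view-page URL per comic number, 1..max.
# Mirrors the `seq 1 ${max} | sed ...` line in the listing above.
function list_view_urls()
{
    local max="$1"
    seq 1 "${max}" | sed 's@^@https://xkcd.com/@'
}

# Hypothetical helper: turn a protocol-relative src ("//host/path")
# into a full URL, as view_page_handler does by prepending "https:".
function absolute_image_url()
{
    local src="$1"
    echo "https:${src}"
}

list_view_urls 3
# The src value below is made up for illustration only.
absolute_image_url "//imgs.xkcd.com/comics/example.png"
```

Quoting `"${max}"` and `"${src}"` keeps the helpers well-behaved if a value is ever empty or contains whitespace, which is a reasonable habit to apply to the original listing as well.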



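Hacking For Fun. All rights reserved. Unless otherwise noted, all content is original; this site is licensed under the BY-NC-SA license.
When reposting, please cite the original link: 使用 shell 构建多进程的 CommandlineFu 爬虫 | Linux 中国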