Commit 98125d6d · Author: TRHX

Site updated: 2019-09-28 11:47:06

Parent 91cb494f
......@@ -483,10 +483,10 @@
<section class="article typo">
<div class="article-entry" itemprop="articleBody">
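<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">https://maoyan.com/board/4</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>
<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">Maoyan Movies TOP100 board</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>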
<hr>
<a id="more"></a>
<p>Look at the Maoyan Movies TOP100 board. The request URL is <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">https://maoyan.com/board/4</a>, each page lists 10 films, and the URL changes as you page through<br>Page 2: <a href="https://maoyan.com/board/4?offset=10" target="_blank" rel="noopener">https://maoyan.com/board/4?offset=10</a><br>Page 3: <a href="https://maoyan.com/board/4?offset=20" target="_blank" rel="noopener">https://maoyan.com/board/4?offset=20</a><br>There are 10 pages in total, so a for loop that takes every tenth value from 0 up to (but not including) 100 and appends it to the URL crawls every page in turn</p>
<p>Look at the Maoyan Movies TOP100 board. The request URL is <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">https://maoyan.com/board/4</a><br>Each page lists 10 films, and the URL changes as you page through:<br>Page 1: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">https://maoyan.com/board/4</a><br>Page 2: <a href="https://maoyan.com/board/4?offset=10" target="_blank" rel="noopener">https://maoyan.com/board/4?offset=10</a><br>Page 3: <a href="https://maoyan.com/board/4?offset=20" target="_blank" rel="noopener">https://maoyan.com/board/4?offset=20</a><br>There are 10 pages in total, so a for loop that takes every tenth value from 0 up to (but not including) 100 and appends it to the URL crawls every page in turn</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">index_page</span><span class="params">(number)</span>:</span></span><br><span class="line"> url = <span class="string">'https://maoyan.com/board/4?offset=%s'</span> % number</span><br><span class="line"> response = requests.get(url=url, headers=headers)</span><br><span class="line"> <span class="keyword">return</span> response.text</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">'__main__'</span>:</span><br><span class="line"> <span class="keyword">for</span> i <span class="keyword">in</span> range(<span class="number">0</span>, <span class="number">100</span>, <span class="number">10</span>):</span><br><span class="line"> index = index_page(i)</span><br></pre></td></tr></table></figure>
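<p>Define a page-parsing function, <code>parse_page()</code>, that uses the XPath methods of the lxml parsing library to extract, in order, the film rank (ranking), film title (movie_name), lead actors (performer), release date (releasetime), score (score), and cover image URL (movie_img)</p>
<p>The extracted lead-actor strings contain extra spaces and newlines, so loop over the performer list and call <code>strip()</code> on each string to remove the leading and trailing spaces and newlines</p>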
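<p>The whitespace clean-up step might look roughly like the sketch below. It is not the post's actual code: the helper name <code>clean_performers</code> and the sample strings are hypothetical, chosen only to mimic the padded text that an XPath <code>text()</code> query typically returns.</p>

```python
# Hypothetical sample data: stand-ins for the padded strings an XPath
# text() query might return for the "performer" field.
raw_performers = [
    "\n                Starring: Actor A, Actor B\n        ",
    "\n                Starring: Actor C\n        ",
]


def clean_performers(performers):
    """Return a new list with leading/trailing whitespace stripped from each item."""
    # str.strip() with no arguments removes spaces, tabs and newlines
    # from both ends, but leaves whitespace inside the string untouched.
    return [item.strip() for item in performers]


if __name__ == "__main__":
    for line in clean_performers(raw_performers):
        print(line)
```

<p>Building a new list rather than modifying the original in place keeps the raw XPath result available for debugging; whether the post does the same is not shown in this diff.</p>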
......@@ -510,10 +510,10 @@
<div class='new-meta-box'>
<div class="new-meta-item date" itemprop="dateUpdated" datetime="2019-09-24T20:38:11+08:00">
<div class="new-meta-item date" itemprop="dateUpdated" datetime="2019-09-28T00:58:23+08:00">
<a class='notlink'>
<i class="fas fa-clock" aria-hidden="true"></i>
<p>Last updated September 24, 2019</p>
<p>Last updated September 28, 2019</p>
</a>
</div>
......@@ -527,7 +527,7 @@
<a class="-mob-share-qq" title="QQ好友" rel="external nofollow noopener noreferrer"
href="http://connect.qq.com/widget/shareqq/index.html?url=https://www.itrhx.com/2019/09/24/A51-pyspider-combat-maoyan/&title=Python3 爬虫实战 — 猫眼电影TOP100 | TRHX'S BLOG&pics=https://cdn.jsdelivr.net/gh/TRHX/CDN-for-itrhx.com@2.1.9/images/trhx.png&summary=爬取时间:2019-09-24爬取难度:★☆☆☆☆请求链接:https://maoyan.com/board/4爬取目标:猫眼 TOP100 的电影名称、排名、主演、上映时间、评分、封面图地址,数据保存为 CSV 文件涉及知识:请求库 requests、解析库 lxml、Xpath 语法、CSV 文件储存
href="http://connect.qq.com/widget/shareqq/index.html?url=https://www.itrhx.com/2019/09/24/A51-pyspider-combat-maoyan/&title=Python3 爬虫实战 — 猫眼电影TOP100 | TRHX'S BLOG&pics=https://cdn.jsdelivr.net/gh/TRHX/CDN-for-itrhx.com@2.1.9/images/trhx.png&summary=爬取时间:2019-09-24爬取难度:★☆☆☆☆☆请求链接:猫眼电影TOP100榜爬取目标:猫眼 TOP100 的电影名称、排名、主演、上映时间、评分、封面图地址,数据保存为 CSV 文件涉及知识:请求库 requests、解析库 lxml、Xpath 语法、CSV 文件储存
"
>
......@@ -540,7 +540,7 @@
<a class="-mob-share-qzone" title="QQ空间" rel="external nofollow noopener noreferrer"
href="https://sns.qzone.qq.com/cgi-bin/qzshare/cgi_qzshare_onekey?url=https://www.itrhx.com/2019/09/24/A51-pyspider-combat-maoyan/&title=Python3 爬虫实战 — 猫眼电影TOP100 | TRHX'S BLOG&pics=https://cdn.jsdelivr.net/gh/TRHX/CDN-for-itrhx.com@2.1.9/images/trhx.png&summary=爬取时间:2019-09-24爬取难度:★☆☆☆☆请求链接:https://maoyan.com/board/4爬取目标:猫眼 TOP100 的电影名称、排名、主演、上映时间、评分、封面图地址,数据保存为 CSV 文件涉及知识:请求库 requests、解析库 lxml、Xpath 语法、CSV 文件储存
href="https://sns.qzone.qq.com/cgi-bin/qzshare/cgi_qzshare_onekey?url=https://www.itrhx.com/2019/09/24/A51-pyspider-combat-maoyan/&title=Python3 爬虫实战 — 猫眼电影TOP100 | TRHX'S BLOG&pics=https://cdn.jsdelivr.net/gh/TRHX/CDN-for-itrhx.com@2.1.9/images/trhx.png&summary=爬取时间:2019-09-24爬取难度:★☆☆☆☆☆请求链接:猫眼电影TOP100榜爬取目标:猫眼 TOP100 的电影名称、排名、主演、上映时间、评分、封面图地址,数据保存为 CSV 文件涉及知识:请求库 requests、解析库 lxml、Xpath 语法、CSV 文件储存
"
>
......@@ -561,7 +561,7 @@
<a class="-mob-share-weibo" title="微博" rel="external nofollow noopener noreferrer"
href="http://service.weibo.com/share/share.php?url=https://www.itrhx.com/2019/09/24/A51-pyspider-combat-maoyan/&title=Python3 爬虫实战 — 猫眼电影TOP100 | TRHX'S BLOG&pics=https://cdn.jsdelivr.net/gh/TRHX/CDN-for-itrhx.com@2.1.9/images/trhx.png&summary=爬取时间:2019-09-24爬取难度:★☆☆☆☆请求链接:https://maoyan.com/board/4爬取目标:猫眼 TOP100 的电影名称、排名、主演、上映时间、评分、封面图地址,数据保存为 CSV 文件涉及知识:请求库 requests、解析库 lxml、Xpath 语法、CSV 文件储存
href="http://service.weibo.com/share/share.php?url=https://www.itrhx.com/2019/09/24/A51-pyspider-combat-maoyan/&title=Python3 爬虫实战 — 猫眼电影TOP100 | TRHX'S BLOG&pics=https://cdn.jsdelivr.net/gh/TRHX/CDN-for-itrhx.com@2.1.9/images/trhx.png&summary=爬取时间:2019-09-24爬取难度:★☆☆☆☆☆请求链接:猫眼电影TOP100榜爬取目标:猫眼 TOP100 的电影名称、排名、主演、上映时间、评分、封面图地址,数据保存为 CSV 文件涉及知识:请求库 requests、解析库 lxml、Xpath 语法、CSV 文件储存
"
>
......
......@@ -494,7 +494,7 @@
<section class="article typo">
<div class="article-entry" itemprop="articleBody">
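<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">https://maoyan.com/board/4</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>
<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">Maoyan Movies TOP100 board</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>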
<hr>
<div class="readmore">
......
......@@ -494,7 +494,7 @@
<section class="article typo">
<div class="article-entry" itemprop="articleBody">
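<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">https://maoyan.com/board/4</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>
<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">Maoyan Movies TOP100 board</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>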
<hr>
<div class="readmore">
......
This diff is collapsed.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.itrhx.com/2019/09/24/A51-pyspider-combat-maoyan/</loc>
<lastmod>2019-09-27</lastmod>
</url> <url>
<loc>https://www.itrhx.com/2019/08/23/A34-UserAgent/</loc>
<lastmod>2019-09-24</lastmod>
</url> <url>
......@@ -93,9 +96,6 @@
</url> <url>
<loc>https://www.itrhx.com/2019/08/23/A29-Python3-spider-C01/</loc>
<lastmod>2019-09-24</lastmod>
</url> <url>
<loc>https://www.itrhx.com/2019/09/24/A51-pyspider-combat-maoyan/</loc>
<lastmod>2019-09-24</lastmod>
</url> <url>
<loc>https://www.itrhx.com/2019/08/01/A27-image-hosting/</loc>
<lastmod>2019-09-19</lastmod>
......
......@@ -496,7 +496,7 @@
<section class="article typo">
<div class="article-entry" itemprop="articleBody">
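<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">https://maoyan.com/board/4</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>
<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">Maoyan Movies TOP100 board</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>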
<hr>
<div class="readmore">
......
......@@ -478,7 +478,7 @@
<section class="article typo">
<div class="article-entry" itemprop="articleBody">
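<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">https://maoyan.com/board/4</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>
<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">Maoyan Movies TOP100 board</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>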
<hr>
<div class="readmore">
......
This diff is collapsed.
......@@ -435,7 +435,7 @@
<div class="new-meta-item date">
<a class='notlink'>
<i class="fas fa-calendar-alt" aria-hidden="true"></i>
<p>2019-09-25</p>
<p>2019-09-28</p>
</a>
</div>
......@@ -735,7 +735,7 @@
</a>
<a class='friend-card' style='background:#71BCFF; color:#fff'
target="_blank" rel="external nofollow noopener noreferrer" href='https://attack204.com/'>
target="_blank" rel="external nofollow noopener noreferrer" href='http://attack204.com/'>
<div class='friend-left'>
<img class='avatar' src='https://cdn.jsdelivr.net/gh/TRHX/CDN-for-itrhx.com@2.1.9/images/bitmap.gif' data-echo='https://cdn.jsdelivr.net/gh/TRHX/ImageHosting/ITRHX-LINKS/attack204.png'/>
</div>
......@@ -1017,6 +1017,23 @@
</div>
</a>
<a class='friend-card' style='background:#34A853; color:#fff'
target="_blank" rel="external nofollow noopener noreferrer" href='https://wangzhijuno.com'>
<div class='friend-left'>
<img class='avatar' src='https://cdn.jsdelivr.net/gh/TRHX/CDN-for-itrhx.com@2.1.9/images/bitmap.gif' data-echo='https://cdn.jsdelivr.net/gh/TRHX/ImageHosting/ITRHX-LINKS/wangzhijuno.png'/>
</div>
<div class='friend-right'>
<p class="friend-name">土豆先生</p>
<div class='friend-tags-wrapper'>
<p class="tags"><i class="fas fa-hashtag fa-fw" aria-hidden="true"></i>A code monkey who really loves potatoes</p>
</div>
</div>
</a>
</div>
</div>
......
......@@ -397,6 +397,12 @@
......@@ -413,14 +419,14 @@
<section class='meta'>
<a title='Enabling HTTPS for a GitHub Pages-Based Hexo Blog Using Official Support' href='/2019/08/11/A28-hexo-add-https/'><img class='thumbnail' src='https://cdn.jsdelivr.net/gh/TRHX/ImageHosting/ITRHX-PIC/thumbnail/hexo.png'></a>
<a title='Github+jsDelivr+PicGo: Building a Stable, Fast, Efficient, and Free Image Host' href='/2019/08/01/A27-image-hosting/'><img class='thumbnail' src='https://cdn.jsdelivr.net/gh/TRHX/ImageHosting/ITRHX-PIC/thumbnail/img.png'></a>
<div class="meta" id="header-meta">
<h2 class="title">
<a href="/2019/08/11/A28-hexo-add-https/">
Enabling HTTPS for a GitHub Pages-Based Hexo Blog Using Official Support
<a href="/2019/08/01/A27-image-hosting/">
Github+jsDelivr+PicGo: Building a Stable, Fast, Efficient, and Free Image Host
</a>
</h2>
......@@ -448,7 +454,7 @@
<div class="new-meta-item date">
<a class='notlink'>
<i class="fas fa-calendar-alt" aria-hidden="true"></i>
<p>2019-08-11</p>
<p>2019-08-01</p>
</a>
</div>
......@@ -458,9 +464,9 @@
<div class='new-meta-item category'>
<a href='/categories/Hexo/' rel="nofollow">
<a href='/categories/图床/' rel="nofollow">
<i class="fas fa-folder-open" aria-hidden="true"></i>
<p>Hexo</p>
<p>Image hosting</p>
</a>
</div>
......@@ -499,12 +505,10 @@
<section class="article typo">
<div class="article-entry" itemprop="articleBody">
<blockquote>
<p>Enabling HTTPS for a GitHub Pages-based Hexo blog using official support</p>
</blockquote>
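An image host is an indispensable tool for every blogger, yet image hosts that are stable, fast, efficient, and free are increasingly rare. Github+jsDelivr+PicGo is a good option!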
<div class="readmore">
<a href="/2019/08/11/A28-hexo-add-https/" class="flat-box">
<a href="/2019/08/01/A27-image-hosting/" class="flat-box">
<i class="fas fa-book-open fa-fw" aria-hidden="true"></i>
Read more
</a>
......@@ -514,9 +518,11 @@
<div class="full-width auto-padding tags">
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>Hexo</a>
<a href="/tags/jsDelivr/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>jsDelivr</a>
<a href="/tags/HTTPS/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>HTTPS</a>
<a href="/tags/图床/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>Image hosting</a>
<a href="/tags/PicGo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>PicGo</a>
</div>
......@@ -537,14 +543,14 @@
<section class='meta'>
<a title='Github+jsDelivr+PicGo: Building a Stable, Fast, Efficient, and Free Image Host' href='/2019/08/01/A27-image-hosting/'><img class='thumbnail' src='https://cdn.jsdelivr.net/gh/TRHX/ImageHosting/ITRHX-PIC/thumbnail/img.png'></a>
<a title='Enabling HTTPS for a GitHub Pages-Based Hexo Blog Using Official Support' href='/2019/08/11/A28-hexo-add-https/'><img class='thumbnail' src='https://cdn.jsdelivr.net/gh/TRHX/ImageHosting/ITRHX-PIC/thumbnail/hexo.png'></a>
<div class="meta" id="header-meta">
<h2 class="title">
<a href="/2019/08/01/A27-image-hosting/">
Github+jsDelivr+PicGo: Building a Stable, Fast, Efficient, and Free Image Host
<a href="/2019/08/11/A28-hexo-add-https/">
Enabling HTTPS for a GitHub Pages-Based Hexo Blog Using Official Support
</a>
</h2>
......@@ -572,7 +578,7 @@
<div class="new-meta-item date">
<a class='notlink'>
<i class="fas fa-calendar-alt" aria-hidden="true"></i>
<p>2019-08-01</p>
<p>2019-08-11</p>
</a>
</div>
......@@ -582,9 +588,9 @@
<div class='new-meta-item category'>
<a href='/categories/图床/' rel="nofollow">
<a href='/categories/Hexo/' rel="nofollow">
<i class="fas fa-folder-open" aria-hidden="true"></i>
<p>Image hosting</p>
<p>Hexo</p>
</a>
</div>
......@@ -623,10 +629,12 @@
<section class="article typo">
<div class="article-entry" itemprop="articleBody">
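An image host is an indispensable tool for every blogger, yet image hosts that are stable, fast, efficient, and free are increasingly rare. Github+jsDelivr+PicGo is a good option!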
<blockquote>
<p>Enabling HTTPS for a GitHub Pages-based Hexo blog using official support</p>
</blockquote>
<div class="readmore">
<a href="/2019/08/01/A27-image-hosting/" class="flat-box">
<a href="/2019/08/11/A28-hexo-add-https/" class="flat-box">
<i class="fas fa-book-open fa-fw" aria-hidden="true"></i>
Read more
</a>
......@@ -636,11 +644,9 @@
<div class="full-width auto-padding tags">
<a href="/tags/jsDelivr/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>jsDelivr</a>
<a href="/tags/图床/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>Image hosting</a>
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>Hexo</a>
<a href="/tags/PicGo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>PicGo</a>
<a href="/tags/HTTPS/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>HTTPS</a>
</div>
......@@ -713,12 +719,6 @@
......@@ -943,7 +943,7 @@
<section class="article typo">
<div class="article-entry" itemprop="articleBody">
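<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">https://maoyan.com/board/4</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>
<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">Maoyan Movies TOP100 board</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>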
<hr>
<div class="readmore">
......
This diff is collapsed.
......@@ -4,7 +4,14 @@
<url>
<loc>https://www.itrhx.com/friends/index.html</loc>
<lastmod>2019-09-25T13:34:44.380Z</lastmod>
<lastmod>2019-09-28T03:45:27.530Z</lastmod>
</url>
<url>
<loc>https://www.itrhx.com/2019/09/24/A51-pyspider-combat-maoyan/</loc>
<lastmod>2019-09-27T16:58:23.850Z</lastmod>
</url>
......@@ -239,13 +246,6 @@
</url>
<url>
<loc>https://www.itrhx.com/2019/09/24/A51-pyspider-combat-maoyan/</loc>
<lastmod>2019-09-24T12:38:11.631Z</lastmod>
</url>
<url>
<loc>https://www.itrhx.com/404.html</loc>
......
......@@ -496,7 +496,7 @@
<section class="article typo">
<div class="article-entry" itemprop="articleBody">
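<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">https://maoyan.com/board/4</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>
<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">Maoyan Movies TOP100 board</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>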
<hr>
<div class="readmore">
......
......@@ -478,7 +478,7 @@
<section class="article typo">
<div class="article-entry" itemprop="articleBody">
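<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">https://maoyan.com/board/4</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>
<p>Crawl date: 2019-09-24<br>Crawl difficulty: ★☆☆☆☆<br>Request URL: <a href="https://maoyan.com/board/4" target="_blank" rel="noopener">Maoyan Movies TOP100 board</a><br>Crawl target: the title, rank, lead actors, release date, score, and cover image URL of each film on the Maoyan TOP100 board, saved to a CSV file<br>Topics covered: the requests HTTP library, the lxml parsing library, XPath syntax, CSV file storage</p>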
<hr>
<div class="readmore">
......