Scraping Doutula Images with Python
Published: 2019-06-23


Code
# -*- coding: utf-8 -*-
# pip install requests
# pip install beautifulsoup4
# pip install lxml  (parser used by BeautifulSoup)
import os

import requests
from bs4 import BeautifulSoup


class doutuSpider(object):
    headers = {
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"
    }
    # Root folder for downloads; one subfolder is created per article.
    base_dir = 'E:\\img'

    def get_url(self, url):
        """Visit every article linked from one list page."""
        data = requests.get(url, headers=self.headers)
        soup = BeautifulSoup(data.content, 'lxml')
        totals = soup.find_all("a", {"class": "list-group-item"})
        for one in totals:
            sub_url = one.get('href')
            if not sub_url:
                continue
            # Name the folder after the last URL segment of the article.
            path = os.path.join(self.base_dir, sub_url.split('/')[-1])
            if not os.path.isdir(path):
                os.makedirs(path)
            try:
                self.get_img_url(sub_url, path)
            except Exception as e:
                print('skip {}: {}'.format(sub_url, e))

    def get_img_url(self, url, path):
        """Download every image found on one article page."""
        data = requests.get(url, headers=self.headers)
        soup = BeautifulSoup(data.content, 'lxml')
        totals = soup.find_all('div', {'class': 'artile_des'})
        for one in totals:
            img = one.find('img')
            # Some blocks hold no <img>, or an <img> without a src attribute.
            img_url = img.get('src') if img is not None else None
            if not img_url:
                continue
            try:
                self.get_img(img_url, path)
            except Exception:
                # Print the URL that failed and move on to the next image.
                print(img_url)

    def get_img(self, url, path):
        """Save one image into path, named after the last URL segment."""
        filename = url.split('/')[-1]
        img_path = os.path.join(path, filename)
        img = requests.get(url, headers=self.headers)
        with open(img_path, 'wb') as f:
            f.write(img.content)

    def create(self):
        # Crawl list pages 1 through 9.
        for count in range(1, 10):
            url = 'https://www.doutula.com/article/list/?page={}'.format(count)
            print('download {} page'.format(count))
            self.get_url(url)


if __name__ == '__main__':
    doutu = doutuSpider()
    doutu.create()
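
The script names both the per-article folder and each image file with url.split('/')[-1]. That expression keeps any query string in the name and returns an empty string when the URL ends in a slash. Below is a minimal sketch of a more defensive variant, assuming Python 3 and using only the standard library. The helper names filename_from_url and article_dir, the fallback name "unnamed", and the example.com URLs are all hypothetical and not part of the original script.

```python
import os
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url, default="unnamed"):
    """Return the last path segment of url, ignoring query and fragment."""
    path = urlsplit(url).path  # drops "?..." and "#..."
    name = posixpath.basename(path.rstrip("/"))
    # Fall back to a fixed name when the URL has no usable path segment.
    return name or default


def article_dir(base_dir, article_url):
    """Build the per-article download folder under base_dir."""
    return os.path.join(base_dir, filename_from_url(article_url))


if __name__ == "__main__":
    # Placeholder URLs, for illustration only.
    print(filename_from_url("https://example.com/images/a.gif?size=large"))
    print(article_dir("img", "https://example.com/article/detail/12345/"))
```

In the spider, these helpers could stand in for the two split('/')[-1] expressions in get_url and get_img. Whether the site's URLs ever carry query strings or trailing slashes is not something the original post states.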

Reposted from: http://lokwo.baihongyu.com/
