How to Scrape Douban Images with Scrapy in Python (python, scrapy, development techniques)


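1. First, on the command line, change into the directory where the project should be created and run `scrapy startproject banciyuan` to create the Scrapy project. This generates the standard layout: a `scrapy.cfg` file plus a `banciyuan` package containing `items.py`, `middlewares.py`, `pipelines.py`, `settings.py`, and a `spiders` directory.

2. To make the Scrapy project easy to run from PyCharm, create a new `main.py`:

    from scrapy import cmdline
    cmdline.execute("scrapy crawl banciyuan"... ...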
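The `main.py` excerpt above is cut off in the source. A minimal sketch of such a launcher, assuming the common idiom of passing an argv-style list to `scrapy.cmdline.execute`, might look like the following. The helper names `crawl_argv` and `run` are illustrative and do not come from the original article.

```python
import shlex


def crawl_argv(spider_name):
    """Build the argv-style list for a `scrapy crawl <spider>` invocation."""
    return shlex.split(f"scrapy crawl {spider_name}")


def run(spider_name="banciyuan"):
    """Start the crawl from a plain Python script (for example from PyCharm)."""
    # Imported lazily so the module can be loaded without Scrapy installed.
    from scrapy import cmdline

    # execute() expects a list of arguments and exits the process when done.
    cmdline.execute(crawl_argv(spider_name))
```

Calling `run()` at the bottom of `main.py`, with the project root as the working directory, should behave like typing `scrapy crawl banciyuan` in a terminal.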


3. The spider

    from scrapy import Spider
    import scrapy

    from banciyuan.items import BanciyuanItem


    class BanciyuanSpider(Spider):
        name = 'banciyuan'
        allowed_domains = ['movie.douban.com']
        start_urls = ["https://movie.douban.com/celebrity/1025156/photos/"]
        url = "https://movie.douban.com/celebrity/1025156/photos/"

        def parse(self, response):
            # The last link in the paginator carries the total number of pages.
            num = response.xpath('//div[@class="paginator"]/a[last()]/text()').extract_first('')
            print(num)
            for i in range(int(num)):
                # The "start" offset advances by 30 photos per listing page.
                suffix = '?type=C&start=' + str(i * 30) + '&sortby=like&size=a&subtype=a'
                yield scrapy.Request(url=self.url + suffix, callback=self.get_page)

        def get_page(self, response):
            # Collect the detail-page link of every photo on this listing page.
            href_list = response.xpath('//div[@class="article"]//div[@class="cover"]/a/@href').extract()
            # print(href_list)
            for href in href_list:
                yield scrapy.Request(url=href, callback=self.get_info)

        def get_info(self, response):
            # Extract the full-size image URL and the page title from the detail page.
            src = response.xpath('//div[@class="article"]//div[@class="photo-show"]//div[@class="photo-wp"]/a[1]/img/@src').extract_first('')
            title = response.xpath('//div[@id="content"]/h2/text()').extract_first('')
            # print(response.body)
            item = BanciyuanItem()
            item['title'] = title
            item['src'] = [src]
            yield item
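The `parse` callback builds one listing URL per page by advancing the `start` offset in steps of 30. The following standalone sketch isolates that step so it can be checked without running a crawl. The function name `page_urls` and the fallback to a single page when the paginator text is not a number are assumptions added here; the original code calls `int(num)` directly, which raises `ValueError` if the paginator is missing.

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/celebrity/1025156/photos/"
PAGE_SIZE = 30  # step used for the "start" offset in the spider


def page_urls(last_page_text, base_url=BASE_URL):
    """Return the listing-page URLs for the page count shown in the paginator."""
    try:
        pages = int(last_page_text.strip())
    except ValueError:
        # Assumption: with no usable paginator text, crawl only the first page.
        pages = 1

    urls = []
    for i in range(pages):
        params = {
            "type": "C",
            "start": i * PAGE_SIZE,
            "sortby": "like",
            "size": "a",
            "subtype": "a",
        }
        urls.append(f"{base_url}?{urlencode(params)}")
    return urls
```

Inside the spider, `parse` could then yield one `scrapy.Request` per URL returned by `page_urls(num)`.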

4.items.py

    # Define here the models for your scraped items
    #
    # See documentation in:
    # https://docs.scrapy.org/en/latest/topics/items.html

    import scrapy


    class BanciyuanItem(scrapy.Item):
        # define the fields for your item here like:
        src = scrapy.Field()
        title = scrapy.Field()

pipelines.py

    # Define your item pipelines here
    #
    # Don't forget to add your pipeline to the ITEM_PIPELINES setting
    # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html


    # useful for handling different item types with a single interface
    from itemadapter import ItemAdapter
    from scrapy.pipelines.images import ImagesPipeline
    import scrapy


    class BanciyuanPipeline(ImagesPipeline):
        def get_media_requests(self, item, info):
            # Request the image and pass the item along so file_path can read it.
            yield scrapy.Request(url=item['src'][0], meta={'item': item})

        def file_path(self, request, response=None, info=None, *, item=None):
            item = request.meta['item']
            # Use the last URL segment as the file name.
            image_name = item['src'][0].split('/')[-1]
            # image_name.replace('.webp','.jpg')
            # Store the image in a folder named after the first word of the title.
            path = '%s/%s' % (item['title'].split(' ')[0], image_name)
            return path
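The `file_path` method decides where each image is stored under `IMAGES_STORE`: a folder taken from the first word of the title, then the file name taken from the image URL. The sketch below reproduces that naming rule as a plain function so it can be tried on sample values. The function name `image_path`, the `"untitled"` fallback for an empty title, and the example URLs in the usage line are assumptions for illustration, not part of the original pipeline. It also splits on any whitespace and drops a query string, which is slightly more forgiving than the original.

```python
from posixpath import basename
from urllib.parse import urlparse


def image_path(title, src):
    """Return the relative storage path '<first word of title>/<file name>'."""
    # split() without arguments ignores leading whitespace and newlines.
    tokens = title.split()
    # Assumption: fall back to a fixed folder when the title is empty.
    folder = tokens[0] if tokens else "untitled"

    # Take the last path segment of the URL, ignoring any query string.
    filename = basename(urlparse(src).path)
    return f"{folder}/{filename}"
```

For example, `image_path("Name photos", "https://img.example.com/public/p1.webp")` yields a path inside a `Name` folder.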

settings.py

    # Scrapy settings for banciyuan project
    #
    # For simplicity, this file contains only settings considered important or
    # commonly used. You can find more settings consulting the documentation:
    #
    #     https://docs.scrapy.org/en/latest/topics/settings.html
    #     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
    #     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

    BOT_NAME = 'banciyuan'

    SPIDER_MODULES = ['banciyuan.spiders']
    NEWSPIDER_MODULE = 'banciyuan.spiders'


    # Crawl responsibly by identifying yourself (and your website) on the user-agent
    # Note: Scrapy expects USER_AGENT to be a plain string, not a headers dict.
    USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36'

    # Obey robots.txt rules
    ROBOTSTXT_OBEY = False

    # Configure maximum concurrent requests performed by Scrapy (default: 16)
    #CONCURRENT_REQUESTS = 32

    # Configure a delay for requests for the same website (default: 0)
    # See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
    # See also autothrottle settings and docs
    #DOWNLOAD_DELAY = 3
    # The download delay setting will honor only one of:
    #CONCURRENT_REQUESTS_PER_DOMAIN = 16
    #CONCURRENT_REQUESTS_PER_IP = 16

    # Disable cookies (enabled by default)
    #COOKIES_ENABLED = False

    # Disable Telnet Console (enabled by default)
    #TELNETCONSOLE_ENABLED = False

    # Override the default request headers:
    #DEFAULT_REQUEST_HEADERS = {
    #   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    #   'Accept-Language': 'en',
    #}

    # Enable or disable spider middlewares
    # See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
    #SPIDER_MIDDLEWARES = {
    #    'banciyuan.middlewares.BanciyuanSpiderMiddleware': 543,
    #}

    # Enable or disable downloader middlewares
    # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
    #DOWNLOADER_MIDDLEWARES = {
    #    'banciyuan.middlewares.BanciyuanDownloaderMiddleware': 543,
    #}

    # Enable or disable extensions
    # See https://docs.scrapy.org/en/latest/topics/extensions.html
    #EXTENSIONS = {
    #    'scrapy.extensions.telnet.TelnetConsole': None,
    #}

    # Configure item pipelines
    # See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
    ITEM_PIPELINES = {
        'banciyuan.pipelines.BanciyuanPipeline': 1,
    }
    # Directory where the images pipeline stores downloaded files.
    IMAGES_STORE = './images'

    # Enable and configure the AutoThrottle extension (disabled by default)
    # See https://docs.scrapy.org/en/latest/topics/autothrottle.html
    #AUTOTHROTTLE_ENABLED = True
    # The initial download delay
    #AUTOTHROTTLE_START_DELAY = 5
    # The maximum download delay to be set in case of high latencies
    #AUTOTHROTTLE_MAX_DELAY = 60
    # The average number of requests Scrapy should be sending in parallel to
    # each remote server
    #AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
    # Enable showing throttling stats for every response received:
    #AUTOTHROTTLE_DEBUG = False

    # Enable and configure HTTP caching (disabled by default)
    # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
    #HTTPCACHE_ENABLED = True
    #HTTPCACHE_EXPIRATION_SECS = 0
    #HTTPCACHE_DIR = 'httpcache'
    #HTTPCACHE_IGNORE_HTTP_CODES = []
    #HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
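The pipeline enabled in `ITEM_PIPELINES` subclasses Scrapy's `ImagesPipeline`, which depends on the Pillow library. If Pillow cannot be imported, the images pipeline is, as far as I know, disabled with a logged warning instead of stopping the crawl, so a missing dependency can look like "no images were saved". A small pre-flight check, with the hypothetical helper name `pillow_available`, could be run before starting the spider:

```python
import importlib.util


def pillow_available():
    """Return True if the PIL package (provided by Pillow) can be imported."""
    return importlib.util.find_spec("PIL") is not None


def preflight_message():
    """Return a short human-readable status line for the images pipeline."""
    if pillow_available():
        return "Pillow found: the images pipeline can be used."
    return "Pillow missing: install it with 'pip install Pillow' before crawling."
```

Printing `preflight_message()` from `main.py` before launching the crawl gives an early hint when the dependency is missing.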

5. Crawl results
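Since downloaded files land under `IMAGES_STORE` (`./images` in the settings), one way to inspect the outcome of a crawl is to count the files per subfolder. This is a small standard-library sketch; the function name `count_images` is illustrative, and it assumes the folder layout produced by the pipeline's `file_path` method.

```python
from collections import Counter
from pathlib import Path


def count_images(store="./images"):
    """Return a dict mapping each subfolder under `store` to its file count."""
    root = Path(store)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            # Key by the folder path relative to the store root.
            counts[path.parent.relative_to(root).as_posix()] += 1
    return dict(counts)
```

Running `print(count_images())` from the project root after the crawl lists how many images were saved in each folder.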
