How to Fetch Data and Build Animated Charts with Python
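Development tools
Python version: 3.6.4
Required modules:
re;
requests;
urllib;
pandas;
plus a few other modules from the Python standard library.
Environment setup
Install Python, add it to the PATH environment variable, and install the required modules with pip. The scripts below also import execjs (provided by the PyExecJS package, which needs a JavaScript runtime such as Node.js) and bar_chart_race.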
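Before running the scripts, it can help to confirm that every module is importable. The following is a minimal sketch: the helper name `missing_modules` is hypothetical, and the module list is simply collected from the imports used in this article.

```python
import importlib.util

# Top-level modules imported by the scripts in this article.
REQUIRED = ["re", "requests", "urllib", "pandas", "execjs", "bar_chart_race"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Anything reported as missing can then be installed with pip.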
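Take a look at Bilibili's "Data Visualization" section in 2023: the top video has more than 2 million views and over 40,000 bullet comments.
Baidu Index
To get Baidu Index data, you first need to log in to your Baidu account.
Take the keyword 「王者荣耀」 (Honor of Kings) as an example, with a custom date range of 2020-10-01 to 2020-10-10.
Using the browser's developer tools, we can find the data API behind the curve chart.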
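To see which parameters the API takes, the request URL observed in the developer tools can be taken apart with the standard library. This is a small sketch; the URL is the example one for this keyword and date range, and the variable names are only illustrative.

```python
import json
from urllib.parse import urlparse, parse_qs

# URL of the curve-data request as seen in the developer tools.
observed_url = (
    "http://index.baidu.com/api/SearchApi/index?area=0"
    "&word=[[%7B%22name%22:%22%E7%8E%8B%E8%80%85%E8%8D%A3%E8%80%80%22,"
    "%22wordType%22:1%7D]]"
    "&startDate=2020-10-01&endDate=2020-10-10"
)

# parse_qs percent-decodes the values and returns a list for every key.
query = parse_qs(urlparse(observed_url).query)

# The "word" parameter is itself a JSON document: a list of keyword groups.
word = json.loads(query["word"][0])

print(query["area"][0], query["startDate"][0], query["endDate"][0])
print(word)
```

The `word` parameter is a nested JSON list, which is why a scraper has to serialize it with `json.dumps` before URL-encoding it.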
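However, the response returned by this request contains no readable data, because the values are encrypted and only decrypted on the page by JavaScript.
With a workaround in place, the data can be scraped successfully. The code is as follows: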
import time
import json
import execjs
import datetime
import requests
from urllib.parse import urlencode


def get_data(keywords, startDate, endDate, area):
    """Fetch the encrypted index data and the key used to decrypt it."""
    # Example request URL:
    # data_url = "http://index.baidu.com/api/SearchApi/index?area=0&word=[[%7B%22name%22:%22%E7%8E%8B%E8%80%85%E8%8D%A3%E8%80%80%22,%22wordType%22:1%7D]]&startDate=2020-10-01&endDate=2020-10-10"
    params = {
        'word': json.dumps([[{'name': keyword, 'wordType': 1}] for keyword in keywords]),
        'startDate': startDate,
        'endDate': endDate,
        'area': area
    }
    data_url = 'http://index.baidu.com/api/SearchApi/index?' + urlencode(params)
    # print(data_url)
    headers = {
        # Paste the cookie copied from a logged-in session
        "Cookie": 'your cookie',
        "Referer": "http://index.baidu.com/v2/main/index.html",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36"
    }
    # Get the encrypted data and the uniqid
    res = requests.get(url=data_url, headers=headers).json()
    data = res["data"]["userIndexes"][0]["all"]["data"]
    uniqid = res["data"]["uniqid"]
    # Get the key passed to the JS function, e.g. t = "ev-fxk9T8V1lwAL6,51348+.9270-%"
    t_url = "http://index.baidu.com/Interface/ptbk?uniqid={}".format(uniqid)
    rep = requests.get(url=t_url, headers=headers).json()
    t = rep["data"]
    return {"data": data, "t": t}


def get_search_index(word, startDate, endDate, area):
    """Return the decrypted index values as a comma-separated string."""
    # Call get_data to get the encrypted data and the key
    res = get_data(word, startDate, endDate, area)
    e = res["data"]
    t = res["t"]
    # Read the JS file
    with open('parsing_data_function.js', encoding='utf-8') as f:
        js = f.read()
    # Compile the source into a JS context
    docjs = execjs.compile(js)
    # Call the JS function to get the index values
    res = docjs.call('decrypt', t, e)
    # print(res)
    return res


def get_date_list(begin_date, end_date):
    """Return every date from begin_date to end_date (inclusive) as strings."""
    dates = []
    dt = datetime.datetime.strptime(begin_date, "%Y-%m-%d")
    date = begin_date[:]
    while date <= end_date:
        dates.append(date)
        dt += datetime.timedelta(days=1)
        date = dt.strftime("%Y-%m-%d")
    return dates


def get_area():
    """Scrape the daily index of one keyword for every province-level area."""
    areas = {
        "901": "山东", "902": "贵州", "903": "江西", "904": "重庆", "905": "内蒙古", "906": "湖北",
        "907": "辽宁", "908": "湖南", "909": "福建", "910": "上海", "911": "北京", "912": "广西",
        "913": "广东", "914": "四川", "915": "云南", "916": "江苏", "917": "浙江", "918": "青海",
        "919": "宁夏", "920": "河北", "921": "黑龙江", "922": "吉林", "923": "天津", "924": "陕西",
        "925": "甘肃", "926": "新疆", "927": "河南", "928": "安徽", "929": "山西", "930": "海南",
        "931": "台湾", "932": "西藏", "933": "香港", "934": "澳门"
    }
    for value in areas.keys():
        try:
            word = ['王者荣耀']
            time.sleep(1)
            startDate = '2020-10-01'
            endDate = '2020-10-10'
            area = value
            res = get_search_index(word, startDate, endDate, area)
            result = res.split(',')
            dates = get_date_list(startDate, endDate)
            for num, date in zip(result, dates):
                print(areas[value], num, date)
                with open('area.csv', 'a+', encoding='utf-8') as f:
                    f.write(areas[value] + ',' + str(num) + ',' + date + '\n')
        except Exception as exc:
            # Report the failure and move on to the next area
            print(areas[value], 'failed:', exc)


def get_word():
    """Scrape the nationwide daily index for several keywords."""
    words = ['诸葛大力', '张伟', '胡一菲', '吕子乔', '陈美嘉', '赵海棠', '咖喱酱', '曾小贤', '秦羽墨']
    for word in words:
        try:
            time.sleep(2)
            startDate = '2020-10-01'
            endDate = '2020-10-10'
            area = 0
            # get_data expects a list of keywords, so wrap the single keyword
            res = get_search_index([word], startDate, endDate, area)
            result = res.split(',')
            dates = get_date_list(startDate, endDate)
            for num, date in zip(result, dates):
                print(word, num, date)
                with open('word.csv', 'a+', encoding='utf-8') as f:
                    f.write(word + ',' + str(num) + ',' + date + '\n')
        except Exception as exc:
            # Report the failure and move on to the next keyword
            print(word, 'failed:', exc)


if __name__ == '__main__':
    get_area()
    get_word()
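The script relies on a helper file, parsing_data_function.js, whose contents are not reproduced in this article. As a rough sketch of what such a `decrypt` function might do, the version below assumes a simple substitution scheme: the first half of the key lists the cipher characters and the second half lists the plain characters at the same positions. This scheme is an assumption, not something the article confirms, and the encrypted sample string is made up for illustration.

```python
def decrypt(key, data):
    """Map every character of `data` through the table encoded in `key`.

    Assumption: the first half of `key` holds the cipher characters and the
    second half holds the matching plain characters, position by position.
    """
    half = len(key) // 2
    table = dict(zip(key[:half], key[half:]))
    return "".join(table[ch] for ch in data)


# Sample key taken from the scraping script; the encrypted text is made up.
key = "ev-fxk9T8V1lwAL6,51348+.9270-%"
print(decrypt(key, "fxvfkw"))
```

If the assumption holds, the result is a comma-separated list of daily values, which is the shape the script expects when it calls `res.split(',')`.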
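The resulting CSV files are shown below. The data comes in two forms.
One is the daily index for several keywords; the other is the daily index for a single keyword in each province and city.
With the data in hand, we can use Python to make an animated chart.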
import pandas as pd
import bar_chart_race as bcr

# Read the data
# df = pd.read_csv('word.csv', encoding='utf-8', header=None, names=['name', 'number', 'day'])
df = pd.read_csv('area.csv', encoding='utf-8', header=None, names=['name', 'number', 'day'])
# Reshape with a pivot table: one row per day, one column per name
df_result = pd.pivot_table(df, values='number', index=['day'], columns=['name'], fill_value=0)
# Generate the GIF
# bcr.bar_chart_race(df_result, filename='word.gif', title='爱情公寓5演职人员热度排行')
bcr.bar_chart_race(df_result, filename='area.gif', title='国内各省市王者荣耀热度排行')
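The pivot step turns the long name, number, day rows into one row per day and one column per name, which is the wide layout that bar_chart_race works with. The reshaping itself can be sketched without pandas. The function name `pivot_rows` and the sample numbers below are made up, and unlike `pd.pivot_table`, which averages duplicate entries by default, this sketch simply keeps the last value seen.

```python
from collections import defaultdict


def pivot_rows(rows, fill_value=0):
    """Reshape (name, number, day) rows into {day: {name: number}}."""
    names = sorted({name for name, _, _ in rows})
    table = defaultdict(dict)
    for name, number, day in rows:
        table[day][name] = number
    # Fill the gaps so that every day has a value for every name.
    return {
        day: {name: table[day].get(name, fill_value) for name in names}
        for day in sorted(table)
    }


# Made-up sample rows in the same column order as the CSV files.
rows = [
    ("A", 10, "2020-10-01"),
    ("B", 7, "2020-10-01"),
    ("A", 12, "2020-10-02"),
]
wide = pivot_rows(rows)
print(wide)
```

Missing combinations are filled with 0, mirroring the `fill_value=0` argument used in the chart script.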
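Weibo Index
Search Baidu for Sina's Weibo Index and open the site; the desktop web version turns out to be unusable.
Simply open the developer tools, switch the browser to mobile device emulation, and refresh the page.
As you can see, the Weibo Index interface now appears.
Add a keyword and look for the data API that serves the index.
The request uses the POST method and does not require logging in to a Weibo account.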
import re
import time
import json
import requests

# Request headers, copied from the browser
headers = """accept: application/json
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9
content-length: 50
content-type: application/x-www-form-urlencoded
cookie: 'your cookie'
origin: https://data.weibo.com
referer: https://data.weibo.com/index/newindex?visit_type=trend&wid=1011224685661
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1
x-requested-with: XMLHttpRequest"""

# Convert the header string into a dict
headers = dict(line.split(": ", 1) for line in headers.strip().split("\n"))
print(headers)

# Data API
url = 'https://data.weibo.com/index/ajax/newindex/getchartdata'

# Keywords to look up
names = ['汤唯', '朱亚文', '邓家佳', '乔振宇', '王学圻', '张艺兴', '俞灏明', '吴越', '梁冠华', '李昕亮', '苏可', '孙骁骁', '赵韩樱子', '孙耀琦', '魏巍']

# Fetch the Weibo Index data
for name in names:
    try:
        # Get the keyword ID
        url_id = 'https://data.weibo.com/index/ajax/newindex/searchword'
        data_id = {'word': name}
        html_id = requests.post(url=url_id, data=data_id, headers=headers)
        pattern = re.compile(r'li wid=\\\"(.*?)\\\" word')
        wid = pattern.findall(html_id.text)[0]
        # API parameters
        data = {'wid': wid, 'dateGroup': '1month'}
        time.sleep(2)
        # Request the data
        html = requests.post(url=url, data=data, headers=headers)
        result = json.loads(html.text)
        # Process the data
        if result['data']:
            values = result['data'][0]['trend']['s']
            dates = result['data'][0]['trend']['x']
            # Save the data
            for value, date in zip(values, dates):
                print(name, value, date)
                with open('weibo.csv', 'a+', encoding='utf-8') as f:
                    f.write(name + ',' + str(value) + ',' + date + '\n')
    except Exception as exc:
        # Report the failure and move on to the next keyword
        print(name, 'failed:', exc)
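The keyword ID is pulled out of the search response with a regular expression that matches backslash-escaped quotes. The sketch below shows why the escaping matters, using a made-up response body: the `code` and `html` field names and the markup are assumptions for illustration, not the documented format of the Weibo API. It also shows an alternative that decodes the JSON first.

```python
import json
import re

# Hypothetical response body: a JSON document whose "html" field holds markup.
sample = r'{"code": 100, "html": "<li wid=\"1011224685661\" word=\"demo\">demo<\/li>"}'

# Option 1: match the backslash-escaped quotes directly in the raw text.
raw_pattern = re.compile(r'li wid=\\\"(.*?)\\\" word')
wid_from_raw = raw_pattern.findall(sample)[0]

# Option 2: decode the JSON first, then match ordinary quotes.
html = json.loads(sample)["html"]
wid_from_json = re.search(r'wid="(\d+)"', html).group(1)

print(wid_from_raw, wid_from_json)
```

Decoding first keeps the pattern simpler, but it depends on knowing which field carries the markup.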
The retrieved data looks like this.
Let's generate an animated chart from it as well.
import pandas as pd
import bar_chart_race as bcr

# Read the data
df = pd.read_csv('weibo.csv', encoding='utf-8', header=None, names=['name', 'number', 'day'])
# Reshape with a pivot table: one row per day, one column per name
df_result = pd.pivot_table(df, values='number', index=['day'], columns=['name'], fill_value=0)
# print(df_result[:10])
# Generate the GIF from the first 10 days
bcr.bar_chart_race(df_result[:10], filename='weibo.gif', title='大明风华演职人员热度排行')