How to Scrape the Best Chinese Universities Ranking with a Web Crawler

jnPvp958 · Posted on 2019-6-28 12:28:23

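Let's look at how a crawler can scrape the Best Chinese Universities Ranking. First, open the Zuihaodaxue site (最好大学网) and, under the Chinese university rankings, select the 2019 social reputation ranking. The fields we mainly want are the rank, the school name, the province or city, and the related columns in the same table.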

The code is as follows:

import csv

import requests
from bs4 import BeautifulSoup

# Request headers; replace the placeholder with a real User-Agent string
headers = {'User-Agent': "your User-Agent string"}


def getInfo(url):
    # Fetch the page source
    html = requests.get(url, headers=headers, timeout=10).content.decode('utf-8', 'ignore')
    soup = BeautifulSoup(html, 'lxml')

    # Get the table header
    theadList = []
    thead = soup.select('thead th')
    for head in thead:
        theadList.append(head.text)
    print(theadList)

    # Open the CSV once instead of reopening it for every row.
    # newline='' avoids blank rows on Windows; utf-8-sig keeps the Chinese
    # text readable when the file is opened in Excel.
    with open('ranking.csv', 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(theadList)

        # Get the list of schools
        schoolList = soup.select('tr.alt')
        for school in schoolList:
            # Rank
            ranking = school.select('td:nth-of-type(1)')[0].text
            # School name
            schoolName = school.select('td:nth-of-type(2)')[0].text
            # Province / city
            schoolAddress = school.select('td:nth-of-type(3)')[0].text
            # Social donation income (thousand yuan)
            socialIncome = school.select('td:nth-of-type(4)')[0].text
            # Overall ranking
            compreRanking = school.select('td:nth-of-type(5)')[0].text
            if len(compreRanking) == 0:
                compreRanking = 'no data'
            data = [ranking, schoolName, schoolAddress, socialIncome, compreRanking]
            print(data)
            # Write the row to the CSV
            writer.writerow(data)


if __name__ == '__main__':
    url = 'http://www.zuihaodaxue.com/shehuishengyupaiming2019.html'
    getInfo(url)
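
To see what the selectors `thead th`, `tr.alt` and `td:nth-of-type(n)` pick up without hitting the site, here is a small offline sketch. The HTML in `SAMPLE_HTML` and the helper name `parse_rows` are made up for illustration, so the real page's markup may well differ; the sketch also uses the built-in `html.parser` rather than `lxml` so that it has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for illustration only.
# The real ranking page may be structured differently.
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>Rank</th><th>School</th><th>Province</th><th>Income</th><th>Overall</th></tr>
  </thead>
  <tbody>
    <tr class="alt"><td>1</td><td>School A</td><td>Beijing</td><td>1000</td><td>1</td></tr>
    <tr class="alt"><td>2</td><td>School B</td><td>Zhejiang</td><td>900</td><td></td></tr>
  </tbody>
</table>
"""


def parse_rows(html):
    """Return (header, rows) using the same CSS selectors as getInfo."""
    soup = BeautifulSoup(html, 'html.parser')
    header = [th.text for th in soup.select('thead th')]
    rows = []
    for tr in soup.select('tr.alt'):
        # td:nth-of-type(i) matches the i-th <td> inside this row
        cells = [tr.select('td:nth-of-type(%d)' % i)[0].text for i in range(1, 6)]
        # An empty fifth cell is replaced by a placeholder, as in the script
        if len(cells[4]) == 0:
            cells[4] = 'no data'
        rows.append(cells)
    return header, rows


header, rows = parse_rows(SAMPLE_HTML)
print(header)
for row in rows:
    print(row)
```

If the selectors come back empty on the live page, the table markup has probably changed and the selectors need adjusting.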

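With the code above, we can retrieve the ranking data from the Best Chinese Universities Ranking. Hema Crawler Proxy (河马爬虫代理) is a data collection service provider offering faster and more reliable service.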
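
A quick way to check the result is to read the CSV back. The sketch below is only an illustration: the helper `read_ranking` and the demo file are hypothetical, and it assumes the file is UTF-8 encoded (with or without a BOM). It runs against a throwaway file with made-up rows so that it works without running the scraper first.

```python
import csv
import os
import tempfile


def read_ranking(path):
    """Read a CSV like the one the scraper writes; return (header, rows)."""
    # utf-8-sig reads UTF-8 files both with and without a BOM
    with open(path, newline='', encoding='utf-8-sig') as f:
        reader = csv.reader(f)
        header = next(reader, [])
        # Skip empty rows, which can appear if the file was written without newline=''
        rows = [row for row in reader if row]
    return header, rows


# Demonstration on a temporary file with made-up rows
demo_path = os.path.join(tempfile.mkdtemp(), 'ranking_demo.csv')
with open(demo_path, 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['Rank', 'School', 'Province'])
    writer.writerow(['1', 'School A', 'Beijing'])

header, rows = read_ranking(demo_path)
print(header, len(rows))
```

To check the real output, call `read_ranking('ranking.csv')` after the scraper has run.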

