How to Traverse a Document Object with Jsoup's DOM Methods in a Java Crawler (document, java, jsoup, development techniques)

Time: 2024-04-28 06:06:04 Author: 石家庄SEO Category: Development Techniques

First, the address of the web page:

https://wall.alphacoders.com/featured.php?lang=Chinese

Main steps:

Use Jsoup's connect method to fetch the page and obtain the Document object:

String url = "https://wall.alphacoders.com/featured.php?lang=Chinese";
Document doc = Jsoup.connect(url).get();

The fetched document is too long to show here.

We take this part of it as an example:

<ul class="nav nav-pills">
  <li><a href="https://alphacoders.com/site/about-us" rel="nofollow">About Us</a></li>
  <li><a href="https://alphacoders.com/site/faq" rel="nofollow">FAQ</a></li>
  <li><a href="https://alphacoders.com/site/privacy" rel="nofollow">Privacy Policy</a></li>
  <li><a href="https://alphacoders.com/site/tos" rel="nofollow">Terms Of Service</a></li>
  <li><a href="https://alphacoders.com/site/acceptable_use" rel="nofollow">Acceptable Use</a></li>
  <li><a href="https://alphacoders.com/site/etiquette" rel="nofollow">Etiquette</a></li>
  <li><a href="https://alphacoders.com/site/advertising" rel="nofollow">Advertise With Us</a></li>
  <li><a id="change_consent">Change Consent</a></li>
</ul>

First, we find all the ul elements:

Elements elements = doc.getElementsByTag("ul");

The output is as follows:

<ul class="nav navbar-nav center">
  <li><a title="Submit Wallpapers" href="https://alphacoders.com/site/submit-wallpaper"><i class="el el-circle-arrow-up"></i> 提交</a></li>
  <li><a href="https://alphacoders.com/contest"><i class="el el-gift"></i> 精美奖品</a></li>
</ul>
<ul class="nav navbar-nav navbar-right center">
  <li><a href="language.php?lang=Chinese"><img src="https://qixn-bj.oss-cn-beijing.aliyuncs.com/seosjz/uploadfile/all/png/cxz4yddtz2t.png" alt="Chinese-flag"> 中文</a></li>
  <li><a href="https://alphacoders.com/users/login"><i class="el el-user"></i> 登录</a></li>
  <li><a href="https://alphacoders.com/users/register"><i class="el el-edit"></i> 注册</a></li>
</ul>
<ul class="pagination">
  <li class="active"><a id="prev_page" href="#">&lt;上一页</a></li>
  <li class="active"><a>1</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=2">2</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=3">3</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=4">4</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=5">5</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=6">6</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=7">7</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=8">8</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=9">9</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=10">10</a></li>
  <li><a>...</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=319">319</a></li>
  <li><a id="next_page" href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=2">下一页&gt;</a></li>
</ul>
<ul class="pagination">
  <li class="active"><a href="#">&lt;上一页</a></li>
  <li class="active"><a>1</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=2">2</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=3">3</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=4">4</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=5">5</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=6">6</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=7">7</a></li>
  <li><a>...</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=319">319</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=2">下一页&gt;</a></li>
</ul>
<ul class="pagination">
  <li class="active"><a href="#">&lt;&lt;</a></li>
  <li class="active"><a href="#">&lt;上一页</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=2">下一页&gt;</a></li>
  <li><a title="末页(319)" href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=319">&gt;&gt;</a></li>
</ul>
<ul class="pagination">
  <li class="active"><a href="#">1</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=2">2</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=3">3</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=4">4</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=5">5</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=6">6</a></li>
  <li><a href="https://wall.alphacoders.com/featured.php?lang=Chinese&amp;page=7">7</a></li>
</ul>
<ul class="nav nav-pills">
  <li><a href="https://alphacoders.com/site/about-us" rel="nofollow">About Us</a></li>
  <li><a href="https://alphacoders.com/site/faq" rel="nofollow">FAQ</a></li>
  <li><a href="https://alphacoders.com/site/privacy" rel="nofollow">Privacy Policy</a></li>
  <li><a href="https://alphacoders.com/site/tos" rel="nofollow">Terms Of Service</a></li>
  <li><a href="https://alphacoders.com/site/acceptable_use" rel="nofollow">Acceptable Use</a></li>
  <li><a href="https://alphacoders.com/site/etiquette" rel="nofollow">Etiquette</a></li>
  <li><a href="https://alphacoders.com/site/advertising" rel="nofollow">Advertise With Us</a></li>
  <li><a id="change_consent">Change Consent</a></li>
</ul>

Only one ul in this output has the class "nav nav-pills", so we pick it out:

Elements elements = doc.getElementsByTag("ul");
// System.out.println(elements);
Element tempElement = null;
for (Element element : elements) {
    // className() returns the full class attribute as one string
    if (element.className().equals("nav nav-pills")) {
        tempElement = element;
        // System.out.println(element.className());
        break;
    }
}
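Because className() returns the whole class attribute as a single string, the equals test above depends on the exact order and spacing of the class names. The following is a minimal, dependency-free sketch of an order-insensitive check. The helper name hasAllClasses is hypothetical (it is not part of Jsoup), and it only assumes that the class attribute is a whitespace-separated list. For a single class name, Jsoup's own Element.hasClass does a similar job.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ClassNameMatch {

    // Hypothetical helper: true if every wanted class name occurs in the
    // whitespace-separated class attribute, regardless of order.
    static boolean hasAllClasses(String classAttr, String... wanted) {
        Set<String> present = new HashSet<>(Arrays.asList(classAttr.trim().split("\\s+")));
        return present.containsAll(Arrays.asList(wanted));
    }

    public static void main(String[] args) {
        // Same classes in a different order still match.
        System.out.println(hasAllClasses("nav nav-pills", "nav-pills", "nav"));
        // A pagination list does not match.
        System.out.println(hasAllClasses("pagination", "nav", "nav-pills"));
    }
}
```

In the loop above, such a helper could be called as hasAllClasses(element.className(), "nav", "nav-pills") in place of the equals test.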

Loop over this ul and print the href and rel attributes of every a element inside every li:

Elements items = tempElement.getElementsByTag("li");
for (Element item : items) {
    Elements links = item.getElementsByTag("a");
    for (Element link : links) {
        String hrefString = link.attr("href");
        String relString = link.attr("rel");
        // attr() returns an empty string when the attribute is absent,
        // so links without href or rel are skipped
        if (!hrefString.isEmpty() && !relString.isEmpty()) {
            System.out.println("href=" + hrefString + " " + "rel=" + relString);
        }
    }
}
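A side note on the emptiness check in the loop above: in Java, == and != on strings compare object references rather than contents, so isEmpty() is the safer way to test for an empty attribute value. The sketch below illustrates the difference in plain Java. The class and method names are hypothetical and exist only for this illustration.

```java
final class EmptyCheck {

    // Hypothetical helper: true only when both attribute values are non-empty.
    static boolean bothPresent(String href, String rel) {
        return !href.isEmpty() && !rel.isEmpty();
    }

    public static void main(String[] args) {
        String empty = new String("");        // a distinct object that is still empty
        System.out.println(empty != "");      // reference comparison: prints true
        System.out.println(empty.isEmpty());  // content check: prints true
        System.out.println(bothPresent("https://alphacoders.com/site/faq", "nofollow"));
        System.out.println(bothPresent("", "nofollow"));
    }
}
```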

Final result:

href=https://alphacoders.com/site/about-us rel=nofollow
href=https://alphacoders.com/site/faq rel=nofollow
href=https://alphacoders.com/site/privacy rel=nofollow
href=https://alphacoders.com/site/tos rel=nofollow
href=https://alphacoders.com/site/acceptable_use rel=nofollow
href=https://alphacoders.com/site/etiquette rel=nofollow
href=https://alphacoders.com/site/advertising rel=nofollow
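To experiment with the same traversal offline, the idea can be sketched without Jsoup or a network connection. Note the swap: this sketch uses the JDK's built-in W3C DOM parser (getElementsByTagName, getAttribute) instead of Jsoup, and it parses a shortened, hand-written, well-formed copy of the footer list rather than the live page. The class name, the method name hrefAndRel, and the snippet itself are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class W3cDomTraversal {

    // Shortened copy of the footer list (assumption: only three items are kept).
    static final String SNIPPET =
          "<ul class=\"nav nav-pills\">"
        + "<li><a href=\"https://alphacoders.com/site/about-us\" rel=\"nofollow\">About Us</a></li>"
        + "<li><a href=\"https://alphacoders.com/site/faq\" rel=\"nofollow\">FAQ</a></li>"
        + "<li><a id=\"change_consent\">Change Consent</a></li>"
        + "</ul>";

    // Collects "href=... rel=..." for every link inside the nav-pills list.
    static List<String> hrefAndRel(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("snippet is not well-formed XML", e);
        }
        List<String> lines = new ArrayList<>();
        NodeList lists = doc.getElementsByTagName("ul");
        for (int i = 0; i < lists.getLength(); i++) {
            Element ul = (Element) lists.item(i);
            if (!ul.getAttribute("class").equals("nav nav-pills")) {
                continue;
            }
            NodeList anchors = ul.getElementsByTagName("a");
            for (int j = 0; j < anchors.getLength(); j++) {
                Element a = (Element) anchors.item(j);
                String href = a.getAttribute("href"); // "" when the attribute is absent
                String rel = a.getAttribute("rel");
                if (!href.isEmpty() && !rel.isEmpty()) {
                    lines.add("href=" + href + " rel=" + rel);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        hrefAndRel(SNIPPET).forEach(System.out::println);
    }
}
```

As in the result above, the Change Consent link is skipped because it has neither href nor rel. Unlike Jsoup, the XML parser used here rejects malformed markup, so this only works on well-formed snippets.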

Complete code:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * @ClassName: Jsoup_Test
 * @description:
 * @author: KI
 * @Date: August 17, 2020, 8:15:14 PM
 */
public class Jsoup_Test {

    public static void main(String[] args) throws IOException {
        String url = "https://wall.alphacoders.com/featured.php?lang=Chinese";
        // Fetch the page and parse it into a Document
        Document doc = Jsoup.connect(url).get();
        System.out.println(doc);

        // Find every ul, then pick the one whose class is "nav nav-pills"
        Elements elements = doc.getElementsByTag("ul");
        // System.out.println(elements);
        Element tempElement = null;
        for (Element element : elements) {
            if (element.className().equals("nav nav-pills")) {
                tempElement = element;
                // System.out.println(element.className());
                break;
            }
        }
        System.out.println(tempElement);
        if (tempElement == null) {
            // The list was not found (for example, the page layout changed)
            return;
        }

        // Print href and rel for every a inside every li of that ul
        Elements items = tempElement.getElementsByTag("li");
        for (Element item : items) {
            Elements links = item.getElementsByTag("a");
            for (Element link : links) {
                String hrefString = link.attr("href");
                String relString = link.attr("rel");
                // attr() returns an empty string when the attribute is absent
                if (!hrefString.isEmpty() && !relString.isEmpty()) {
                    System.out.println("href=" + hrefString + " " + "rel=" + relString);
                }
            }
        }
    }
}