The Java Web Scraping Handbook

Introduction:

Web scraping, or crawling, is the art of fetching data from a third-party website by downloading and parsing its HTML code to extract the data you want. It can be hard: bad HTML code, heavy JavaScript use, and anti-bot techniques often make it tricky.

Many companies use it for purposes such as tracking competitor prices, news aggregation, and lead generation.

This book will teach you how to extract data from any website, how to deal with AJAX- and JavaScript-heavy websites, break CAPTCHAs, deploy your scrapers in the cloud, and many other advanced techniques.

The book comes with the complete Java source code of six example apps that you can download or run directly from our web server. Whether the book contains more lines of code than lines of text remains an open question.

Title: The Java Web Scraping Handbook
Chinese title: Java Web 抓取手册
Language: English
Year: 2018
Pages: 115
Size: 4.55 MB
Tags: Java
Download: The Java Web Scraping Handbook.pdf
Password: 65536

Last updated: 2025-04-12 23:54:37
