The Data Engineer's Guide to Apache Spark

Description:

This book is for data engineers looking to leverage the immense growth of Apache Spark to build faster and more reliable data pipelines.

Apache Spark is a fast, scalable, and flexible open-source distributed processing engine for big data systems, and it is one of the most active open-source big data projects to date. This book helps you build practical big data solutions that leverage Spark's speed, scalability, simplicity, and versatility.

This book's straightforward, step-by-step approach shows you how to deploy, program, optimize, manage, integrate, and extend Spark, now and for years to come. You'll discover how to create powerful solutions encompassing cloud computing, real-time stream processing, machine learning, and more. Every lesson builds on what you've already learned, giving you a rock-solid foundation for real-world success.

Language
English
Read online
The Data Engineer's Guide to Apache Spark (Databricks)

Last updated: 2025-03-15 16:47:05
