Spark: The Definitive Guide: Big Data Processing Made Simple

  • Create Date:2021-08-20 06:54:59
  • Update Date:2025-09-06
  • Author:Bill Chambers
  • ISBN:1491912219

Summary

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.

You'll explore the basic operations and common functions of Spark's structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark's scalable machine-learning library.


Get a gentle overview of big data and Spark
Learn about DataFrames, SQL, and Datasets--Spark's core APIs--through worked examples
Dive into Spark's low-level APIs, RDDs, and execution of SQL and DataFrames
Understand how Spark runs on a cluster
Debug, monitor, and tune Spark clusters and applications
Learn the power of Structured Streaming, Spark's stream-processing engine
Learn how you can apply MLlib to a variety of problems, including classification or recommendation

Reviews

Dylan Meeus

Definitely a good introduction to Spark. I've used this book (along with "Learning Spark") to pass my Spark Databricks Certification exam. In Chapter I, II and IV it covers enough ground to get a good grip on the architecture and "how it works", but, as with anything code related, the best way to really learn the DataSet / DataFrame API is to just start using it. There was a bit of repetition that perhaps wasn't needed though. +1 for having Scala and Python examples! What could definitely be better are some of the architecture drawings. Using just random shapes and assigning them in text as "executor" / "driver" is much less clear than just labeling them explicitly. Also some of the figures had no legend. Stuff like that :-)

Maxim

I've learnt a lot from the book. In my opinion, it's much better than "Learning Spark 2nd edition".

Ankit

Unnecessarily long. Could have been written in 40 pages. You don't have to explain things again and again; we get it, it's easier than Hadoop. Anyway, since it's so low level, it didn't help me much at all. I have to write code with annotations, so the low-level understanding didn't help me much, because in a collaborative structured project there's no need to find optimizations in Spark. (Everyone uses HiveQL anyway but shhhh! Don't let the buyers know that.)

LIUF

This is a good entry-level book for learning Spark SQL. After finishing this book, I could write satisfying Spark DataFrame code for production use.

Gary Bake

Clear and concise.

Joaquín Chemile

An interesting book. It's not bad. But the main problem is that it tries to cover multiple complex topics: from machine learning, building ETLs, Spark packages, and putting applications into production, to streaming. That leaves the book halfway between a cookbook and a theoretical text. The overview in the first chapters is very interesting, although a little out of date, judging by the specialized blogs and the Spark API documentation I have been consulting. I think it would be more useful to have two books: one covering the design of data-consuming applications from a practical angle, serving as the Spark cookbook for the book "Designing Data Intensive Applications"; and another covering the available machine learning libraries. I would remove the Streaming part (Part V) in order to go deeper into the Low-Level APIs (Part III).

If you are interested in learning about data processing with Spark:
I. Gentle Overview of Big Data and Spark
II. Structured APIs: DataFrames, SQL, and Datasets
IV. Production Applications

If you are interested in understanding how machine learning algorithms are implemented in Spark (this is not a book for learning machine learning from scratch, but for learning how to do it in Spark):
I. Gentle Overview of Big Data and Spark
VI. Advanced Analytics and Machine Learning

Michael David Cobb

Snore. I'll stick with KSQL. Streams are going to be much more interesting.

Carlos

Very good if you are coming from pandas and want to scale up.

Kalyan Tirunahari

Lives up to the expectations of the title. Suitable for readers of all levels, but familiarity with Scala or Python is needed. Examples are given in both languages. Tuning and optimization should have been covered in more detail.

Alan

There are a lot of typos in this book.

Rupesh Agarwal

Awesome

Gourav Sengupta

If you can use additional data sets from the internet, then this makes for brilliant reading. The examples are just introductory, so working through different scenarios with additional data sets will really help.

Johnny

This has to be the most poorly edited book I've ever read. Some examples: there is a figure with boxes that represent two different kinds of components. The way to tell the components apart is by their shading. However, the shading for all the boxes in the figure is exactly the same. There are long-running code examples that could not possibly compile, and that violate basic principles of Scala programming (e.g. case classes treated as mutable objects). And there are TODO-style notes still present in the text, such as something like "talk to the people at O'Reilly to get the names of some books to fill out this section." Many sections of the book provide only surface-level treatment of a topic, or leave out discussion of topics entirely as "outside the scope of this book." Not that there is anything wrong with this in general, but it is definitely not what I expect from a book entitled "The Definitive Guide." Setting aside these two flaws, this is an excellent book. I sincerely hope they do more proofing and editing for the next edition.

Keyton

A real thriller!

Alp Oz

A must-have for understanding the root mechanisms of Spark, but bear in mind that all major APIs are continuously changing, so always check the version.

f1yegor

Broad. I used it to systematize my knowledge for the Databricks certification and the Spark 2.2 update.

Alex Ott

Really between 4 & 5 stars because of some discrepancies in examples, etc. But it's a really good book about the current version of Spark (2.2, with some mentions of 2.3). The book concentrates mostly on DataFrames, in contrast with other Spark books that mostly talk about RDDs. A lot of useful information, including Structured Streaming, machine learning, and even a short description of GraphFrames. Highly recommended.

Gavin

It's fine, covers everything shallowly. The API changes so frequently that you probably need this book: 95% of the Google hits for a given Spark feature are now either wrong or suboptimal.

Delhi Irc

Location: GG5 IRC, GG6 IRC, GG7 IRC, ND6 IRC
Accession No: DL029894-903

Wojtekwalczak

Breadth over depth, but as an overview of Spark and its ecosystem, the book will do.