Data Pipelines Pocket Reference: Moving and Processing Data for Analytics

  • Downloads: 9791
  • Type: Epub, TXT, PDF, Mobi
  • Create Date: 2022-01-22 08:55:04
  • Update Date: 2025-09-06
  • Status: finish
  • Author: James Densmore
  • ISBN: 1492087831
  • Environment: PC/Android/iPhone/iPad/Kindle

Summary

Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in today's modern data stack.

You'll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions.

You'll learn:


  • What a data pipeline is and how it works
  • How data is moved and processed on modern data infrastructure, including cloud platforms
  • Common tools and products used by data engineers to build pipelines
  • How pipelines support analytics and reporting needs
  • Considerations for pipeline maintenance, testing, and alerting

Reviews

Diaa Mohammed

Nice introduction to many data pipeline technologies, and to the point... Good book to get started as a data engineer, given you are already a senior developer.

Mi Lia

Very well written small book. The only reason I've put 4/5 and not 5/5 was that... for some weird reason I hadn't realized that the book would be so short. So it was a small disappointment. But surely the content of the book was worthwhile, if only there was more...

Florent

The data engineering industry suffers from a lack of good books. This one is very practical and ELT-focused. It nicely complements theoretical books like Kimball's and Designing Data-Intensive Applications. It is still far from being perfect though. Some parts are already outdated. The focus on ELT, without extensive discussion of its tradeoffs, is highly questionable.

Evan Oman

Nice, quick overview of ELT pipelines focusing on SQL and Airflow. It covers a pretty narrow slice of the data engineering world but was still a useful read.

Scott Haines

This is a good book for data engineers looking to work with CDC ETL & EtLT data moving systems. It covers a little orchestration management with Airflow too. Nice book for the desk of anyone working in the data industry.

Sebastian Gebski

I don't think that 'Pocket Reference' is the proper way to describe this book. An example? Sample path? One of the ways to do it? A representative case? A bit of theory, some SQL, a basic introduction to how to structure a processing pipeline - that's what you can get out of this book. It's probably OK if you want to figure out what actually powers (under the hood) modern data processing pipelines, but I wouldn't say it's useful if you want to set a solid foundation for more thorough research.

Yi Zheng

Practical. Good book, very practical. Teaches you how to set up and manage a pipeline entirely.

Mikhail Erofeev

I read this book to get up to speed with modern software data engineering. I think I achieved the goal, although I finished with a knowledge of how much I do not know, rather than with the confidence to build the solutions myself. James seems to take an opinionated approach by using cloud warehouse databases (Redshift and Snowflake). The use cases and computations are well suited to them, and I would need to read other resources to see how the patterns mentioned play with other technologies. The price/ops complexity of possible stacks is not mentioned. The chapters with SQL examples look great. I learned a bunch there. There are also enough mentions of various technologies and books throughout the book -- I learned about Kimball modeling, dbt, Airflow, Atlas... It would be great to extend the reasoning about production and operations, pitfalls and risks -- such as schema migration, scaling, schema registry, deployment, versioning, durability risks, retention, backups, recomputing... Validation, metrics collection, and Slack notifications are presented and I would like to hear more about some visualization. Overall it is a good book and I only wish every chapter of it would be bigger. Oh wait, there is "Pocket" in the name. Never mind, then.

Stephanie

Some technologies are already a bit outdated, but the book serves as a good overview of the whole ETL/ELT processes.