Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python

  • Downloads: 8914
  • Type: Epub+TxT+PDF+Mobi
  • Create Date: 2021-03-09 03:19:55
  • Update Date: 2025-09-07
  • Status: finish
  • Author: Peter Bruce
  • ISBN: 149207294X
  • Environment: PC/Android/iPhone/iPad/Kindle

Summary

Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not.

Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you're familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

With this book, you'll learn:

  • Why exploratory data analysis is a key preliminary step in data science
  • How random sampling can reduce bias and yield a higher-quality dataset, even with big data
  • How the principles of experimental design yield definitive answers to questions
  • How to use regression to estimate outcomes and detect anomalies
  • Key classification techniques for predicting which categories a record belongs to
  • Statistical machine learning methods that "learn" from data
  • Unsupervised learning methods for extracting meaning from unlabeled data

Reviews

Niklas Emanuelsson

Great book that works both as a reference and for learning about different statistical techniques. Readers will benefit from having some prior knowledge of statistics for much of the material in the book. Great with both Python and R code examples. I primarily program in Python in my work, but I also use R because I find it easier to learn new statistical calculations and visualizations with R.

Manish Choudhary

Very nice for Python and R programmers.

marc

Should probably be advertised as more of a 'primer'. Covers a large breadth of material, and is probably readable cover to cover in about a week. Inclusion of both Python and R for nearly all examples is a nice plus. Despite the brevity of many sections, there are suggested textbook readings for nearly all major areas. Some sections have more depth to them: the Decision Tree / Random Forest area is a really impressive breakdown w/ detailed algorithmic pseudocode. Same goes for the linkage discussion w/ Agglomerative Clustering. Only gripe is that now and again there are some errors that are a bit head-scratching, e.g., in the RF section it calls out in the recursive tree splitting process: "For the first split, sample p < P variables at random without replacement." 99% sure that's wrong and what's happening is that you randomly select any available p from P at each split step (w/ replacement), e.g., see ESLR p. 589 https://web.stanford.edu/~hastie/Elem...
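
For readers following the point about random forest splits, here is a minimal Python sketch of one common reading of it. It is not taken from the book or from the cited text. It assumes the usual convention that a fresh subset of candidate variables is drawn at every split, so the variables are distinct within one split but any variable can be drawn again at a later split. The function names `sample_split_features` and `candidates_per_split` are hypothetical and exist only for illustration.

```python
import random


def sample_split_features(n_features, n_candidates, rng):
    """Pick the candidate feature indices for ONE split.

    Within a single split the candidates are distinct
    (random.Random.sample draws without replacement).
    """
    if not 0 < n_candidates <= n_features:
        raise ValueError("n_candidates must be between 1 and n_features")
    return sorted(rng.sample(range(n_features), n_candidates))


def candidates_per_split(n_features, n_candidates, n_splits, seed=0):
    """Draw an independent candidate set for each of n_splits splits.

    Every split samples from the full set of features again, so a
    feature used as a candidate at one split stays available later.
    """
    rng = random.Random(seed)
    return [
        sample_split_features(n_features, n_candidates, rng)
        for _ in range(n_splits)
    ]


if __name__ == "__main__":
    # Illustrative numbers only: 6 features, 2 candidates per split.
    for i, feats in enumerate(candidates_per_split(6, 2, 4), start=1):
        print(f"split {i}: candidate features {feats}")
```

Run as a script, this prints one candidate set per split. Over enough splits the same feature index shows up more than once, which is the behaviour the reviewer expected the book's pseudocode to describe.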