R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
by Hadley Wickham (Author), Garrett Grolemund (Author)
Description
"This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience"--Page 4 of cover.
Recommendations
Member Recommendations
sergiouribe Data science + data visualization
Member Reviews
If the above quote is the mission of this book, consider the task accomplished. Where most books in computer science fall down in trying to be cute while communicating an educational message, this book addresses the task of education about R squarely, and it does so in a manner that engages the mind with interesting problems.
Usually, I skip the exercise sections of most computer books because, well, the challenges they offer are underwhelming. Recall is all that is required to answer them. I can usually work them out in my head, so I don't have to waste my time looking up the answers or writing example code to check whether I'm right or where I err.
Not so for Hadley Wickham. Many of his questions awakened my curiosity and had me applying my new knowledge in RStudio immediately. In fact, the only way I could satisfy my burning curiosity was to write code to test my hypotheses.
Rare is the computer book that is a page turner. This book qualifies as just that if one has the aptitude in statistics to embrace the challenges. R is an ideal language to handle these challenges in statistics, and Wickham and Grolemund fill the role of ideal apostles/evangelists to share this free fruit.
The fun part about R is that it is free, creative, and well-supplied with packages to solve interesting statistical problems. This book carries that message squarely to my lap (and then to my brain) in an engaging manner.
This is one of the best O'Reilly books I've read. For context, I'm a graphics programmer who fell into sci vis, e.g., visualizing fluid simulations, and am now pivoting into info vis.
Part I: Explore gives an overview of using R+ggplot2+some tidyverse to do exploratory data analysis. It is one of the best intro overview dives I've come across for any type of programming. Most dives of this sort have at least one or two gaps in material or unclear motivation or try to do too much. This was perfectly crafted to lead someone into the tidyverse.
Part II: Wrangle is a more thorough look at the tidyverse. I recommend supplementing this by reading Wickham's original paper on tidy data.
Part III: Program was a little tedious because I already have decades of programming experience, though the coverage of purrr is interesting.
Part IV: Model covers building linear and non-linear models. I don't have a statistics background but even so found this easy to follow and very clear.
Part V: Communicate is a smorgasbord of R Markdown and options building on top of it. I thought this section had a bit of a conflicting message to end on, because after 400-some pages of doing work in RStudio with .R script files, the authors all of a sudden seem to say to forget all that and do everything as R Markdown. Which is fine, but if that's their recommendation I think introducing that earlier would have been better.
There are some copy editing issues; luckily, Wickham has an updated online edition with corrections. Some of the exercises weren't entirely clear as to intent, but that could be entirely due to my lack of a stats background. (Plenty of people have posted solutions online if you get stuck.)
Like a week-long workshop with the authors, this book presents data analysis in terms of the R packages in the tidyverse. I don't think you can read it and fail to learn a lot. It has an especially nice organized approach to data import and non-tidy data. I think I would recommend it to almost anyone who does some data analysis. My only caveat would be that although you could start learning R with this book, it might be a difficult and non-traditional path for some complete beginners.
The best book on data science, by Wickham, creator of a whole new language for reshaping, visualizing, and summarizing data in order to extract information from it.
I have taken several courses by Grolemund, and what stands out is that he goes from the simple to the complex. For example, the HarvardX course starts with... FUNCTIONS. Some of us use R to process small amounts of data, as in epidemiological or clinical studies, compared with those who process data from Facebook or Google, which runs to terabytes of information. In this book, functions come in part 15. In other words, this book really teaches from less to more, starting with the easy and simple and working up to the difficult and complex, but usually more useful.
For example, in base R, sorting would be something like
df[order(df$recuento,decreasing=TRUE), ]
whereas with dplyr it would be
arrange(df, desc(recuento))
which a human can read: arrange (the data frame, in descending order by the count variable).
Being able to do without the [] now makes writing any code much faster.
The quality of the book is perfect, with several colors highlighting different parts of the code to show how it works.
It is an indispensable book for anyone who has to analyze data.
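The two sorting calls quoted in this review can be tried side by side. What follows is only a minimal sketch: it assumes the dplyr package is installed, and the data frame `df`, its `grupo` column, and all of its values are made up for illustration (only the `recuento` column name comes from the review).

```r
# Minimal sketch: sort the same data frame with base R and with dplyr.
# `df` and its contents are hypothetical illustration data.
library(dplyr)

df <- data.frame(
  grupo    = c("a", "b", "c", "d"),
  recuento = c(3, 10, 1, 7)
)

# Base R: order() returns row indices, which are then used to subset with [ ].
base_sorted <- df[order(df$recuento, decreasing = TRUE), ]

# dplyr: arrange() takes the data frame and the sorting expression directly.
dplyr_sorted <- arrange(df, desc(recuento))

# Both approaches put the rows in the same order.
stopifnot(identical(as.character(base_sorted$grupo),
                    as.character(dplyr_sorted$grupo)))
print(dplyr_sorted)
```

The base R result keeps the original row names while the dplyr result may not, so the check above compares column values rather than the whole data frames.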
if ur gonna use R, its probably the best resource out there, aside from wickham's advanced R guide; but the language is so antiquated and outdated, with so many issues in its fundamental data structures, that its frustrating to pretend it can b an elegant front end for research development; it seems like wickham's energy would b better directed towards developing an R2 (couldn't u just ship a wrapper to make R1 packages compatible?) rather than trying to patch R as it is w more and more packages to try to smooth things over
Author Information

Hadley Wickham is Chief Scientist at RStudio, an Adjunct Professor at Stanford University and the University of Auckland, and a member of the R Foundation. He is the lead developer of the tidyverse, a collection of R packages, including ggplot2 and dplyr, designed to support data science. He is also the author of R for Data Science (with Garrett Grolemund), R Packages, and ggplot2: Elegant Graphics for Data Analysis.
Garrett Grolemund is a statistician, teacher, and R developer who works as a data scientist and Master Instructor at RStudio. Garrett received his PhD at Rice University, where his research traced the origins of data analysis as a cognitive process and identified how attentional and epistemological concerns guide every data analysis.
Common Knowledge
- Original publication date
- 2017
- Original language
- English
Classifications
- Genres
- Technology, Nonfiction, General Nonfiction
- DDC/MDS
- 006.312: Computer science, information & general works > Computer science, knowledge & systems > Special computer methods (AI, barcoding, VR, web design, social media) > Artificial Intelligence > Machine Learning > Data mining
- LCC
- QA276.45 .R3 .W53: Science > Mathematics > Mathematics > Probabilities. Mathematical statistics
- BISAC
Statistics
- Members
- 294
- Popularity
- 109,253
- Reviews
- 5
- Rating
- (4.56)
- Languages
- English, German
- Media
- Paper, Ebook
- ISBNs
- 12
- ASINs
- 4