
Reading 3

This reading introduces the process of cleaning and organizing data—often called data “wrangling” or “munging.” We’ll focus primarily on spreadsheet-style (tabular) data, since it’s often the first type of data you’ll encounter in future courses and it’s ubiquitous in real-world work.

While “tidy data” principles also apply to more complex data types—like images, text, audio, and video—the core skills and ideas presented here are foundational. Learning how to structure and tidy tabular data is an essential first step toward wrangling messy, high-dimensional, or unstructured data later on.

Reading Guide

  • R3 Reading Guide coming soon! ↗
    • This reading guide highlights the major ideas to focus on in the paper. You will not turn in the guide—it’s simply here to help you distill the paper and keep track of the main points. Use it before you read (to preview what to look for), while you read (to take notes), and after you read (to review). It will also be a helpful reference when preparing for the reading quiz/exam!

Additional Resources

Here is an example ↗ from Garrett Grolemund showing how a programming language such as R can be used to quickly turn a dataset into tidy data. The walk-through offers some insight into how data is wrangled, as well as some of the benefits an analyst gains from working with tidy data.
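To make the idea of tidying a bit more concrete, here is a minimal sketch in Python. It is not taken from the walk-through above (which uses R). It uses only the standard library, and the table contents, the column names, and the `to_tidy` helper are all made up for illustration. The sketch reshapes a "wide" table, where each year is its own column, into a "long" table with one observation per row.

```python
# Hypothetical "wide" table: one row per country, one column per year.
# All values here are invented purely for illustration.
wide_rows = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def to_tidy(rows, id_column, var_name, value_name):
    """Reshape wide rows into long rows: one observation per row.

    Every column other than `id_column` is treated as a measurement;
    its header becomes a value in `var_name` and its cell a value
    in `value_name`.
    """
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on each output row
            tidy.append(
                {id_column: row[id_column], var_name: column, value_name: value}
            )
    return tidy


tidy_rows = to_tidy(wide_rows, "country", "year", "cases")
for r in tidy_rows:
    print(r)
```

In the reshaped table, each variable (country, year, cases) has its own column and each row holds a single observation, which tends to make filtering, grouping, and plotting more straightforward.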