Underutilized functions for data exploration

Tips from exploring hundreds of variables

Eric Leung

2018-06-02

Given data…

Orange data set on the growth of orange trees

The Orange data frame, built into R's datasets package, has 35 rows and 3 columns (Tree, age, and circumference) recording the growth of orange trees.

Typical exploratory functions

dim() - number of rows and columns

head() - first 6 rows by default

summary() - summary statistics for each column

str() - compact display of an object's structure
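A minimal sketch running the four functions on Orange. Everything here is base R, and Orange is available without loading any extra packages.

```r
# Typical first look at the built-in Orange data frame
dim(Orange)           # 35 rows, 3 columns
head(Orange)          # first 6 rows; use head(Orange, n = 10) for more
summary(Orange)       # per-column summaries: level counts for Tree, quantiles for age and circumference
str(Orange)           # column types plus the object's classes and attributes
```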

Review functions: str()

str(Orange)
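str() also works on a single column, and its give.attr argument hides attribute listings. Orange originally came from the nlme package and still carries grouped-data attributes (a formula, labels, and units), so str(Orange) prints more than the three columns. The calls below are one way to trim that output; they are a sketch, not the only option.

```r
# Inspect one column: Tree is an ordered factor with one level per tree
str(Orange$Tree)

# Same overview as str(Orange), but without the attribute listing
str(Orange, give.attr = FALSE)

# Class of each column; Tree has two classes ("ordered", "factor"), so a list is returned
lapply(Orange, class)
```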

Frustrations and laziness

Three fruitful functions

Skim through your data quickly

library(skimr)
skim(Orange)
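To see roughly what skim() reports for numeric columns (missing count, mean, standard deviation, and percentiles p0 to p100), here is a base-R imitation. quick_skim() is a hypothetical helper written for illustration, not part of skimr, and it covers numeric columns only, without skim()'s inline histograms or factor summaries.

```r
# Hypothetical helper, NOT part of skimr: a rough base-R imitation of
# the numeric-column table that skimr::skim() prints
quick_skim <- function(df) {
  num <- Filter(is.numeric, as.data.frame(df))  # keep numeric columns only
  stats <- t(vapply(num, function(x) {
    q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
    c(n_missing = sum(is.na(x)),
      mean = mean(x, na.rm = TRUE),
      sd = sd(x, na.rm = TRUE),
      p0 = q[1], p25 = q[2], p50 = q[3], p75 = q[4], p100 = q[5])
  }, numeric(8)))
  # One row per numeric column, one column per statistic
  data.frame(variable = rownames(stats), stats, row.names = NULL)
}

quick_skim(Orange)
```

The real skim() is still the better tool; the sketch only shows that its numbers are ordinary summary statistics computed column by column.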

Keep grounded with the basics

stem(Orange$age)
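stem() prints a text stem-and-leaf display to the console and returns NULL invisibly. Since age takes only seven distinct values, each recorded once per tree, its display is mostly repeated leaves; a frequency table confirms this, and circumference gives a more varied picture. The scale argument controls the length of the display, so a larger value gives a longer, finer-grained view.

```r
# Each of the 7 measurement ages appears once for each of the 5 trees
table(Orange$age)

# A variable with more distinct values gives a more informative display
stem(Orange$circumference)

# A larger scale stretches the display over more stems
stem(Orange$circumference, scale = 2)
```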

Better description of your data

library(Hmisc)
describe(Orange)
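For a numeric variable, describe() reports quantities such as the non-missing and missing counts, the number of distinct values, the mean, quantiles from .05 to .95, and the lowest and highest values (or a frequency table when there are only a few distinct values). The base-R sketch below computes a few of these for a single vector to show what they mean. describe_num() is a hypothetical helper, not part of Hmisc, and it skips measures such as Info and Gmd.

```r
# Hypothetical helper, NOT part of Hmisc: a few of the quantities
# Hmisc::describe() reports for a numeric variable
describe_num <- function(x) {
  vals <- sort(unique(x[!is.na(x)]))  # distinct non-missing values
  list(
    n         = sum(!is.na(x)),
    missing   = sum(is.na(x)),
    distinct  = length(vals),
    mean      = mean(x, na.rm = TRUE),
    quantiles = quantile(x, probs = c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95),
                         na.rm = TRUE),
    lowest    = head(vals, 5),  # five smallest distinct values
    highest   = tail(vals, 5)   # five largest distinct values
  )
}

describe_num(Orange$circumference)
```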

Summary and thanks!

Hmisc::describe()

base::stem()

skimr::skim()


Eric Leung

@erictleung

https://erictleung.com