Mahbubul Majumder, PhD
Sep 9, 2014
Learn about the underlying structure of the data
Identify the variables (some could be hidden!)
Spot potential outliers or anomalies in the data
Detect missing values or systematic patterns in the data
Discover the unexpected
Judge how valuable the data are for answering the question at hand
Identify features the model can't find
Check whether the assumptions of a model are satisfied
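Several of these goals can be checked with a few lines of base R before any plotting. The sketch below is illustrative only: it uses the built-in airquality data as a stand-in (an assumption, not the data used in these slides) and the common 1.5 * IQR boxplot rule as one possible way to flag outliers.

```r
# Assumption: airquality (ships with R) stands in for the data under study.
dat <- airquality

# Structure: variable names, types, and the first few values
str(dat)

# Missing values: number of NAs in each variable
na_counts <- colSums(is.na(dat))
print(na_counts)

# Potential outliers: values beyond 1.5 * IQR from the hinges (boxplot rule)
ozone_outliers <- boxplot.stats(dat$Ozone)$out
print(ozone_outliers)
```

Flagged values are candidates for a closer look, not automatic deletions.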
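# Load the tips data
tips <- read.csv("http://www.ggobi.org/book/data/tips.csv")
# Tip as a fraction of the total bill
tips$rate <- with(tips, tip / totbill)
# Regress the tipping rate on party size
model <- lm(rate ~ size, data = tips)
summary(model)$coefficients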
             Estimate  Std. Error  t value   Pr(>|t|)
(Intercept)  0.184375    0.011191   16.475  2.094e-41
size        -0.009173    0.004085   -2.245  2.565e-02
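To read the coefficient table, it can help to plug the estimates back into the fitted line. The sketch below copies the two estimates from the output above and evaluates the predicted tip rate for party sizes 1 to 6. That range of sizes is an assumption made for illustration.

```r
# Estimates copied from the coefficient table above
intercept <- 0.184375
slope <- -0.009173

# Assumed party sizes, for illustration only
party_size <- 1:6

# Fitted line: rate = intercept + slope * size
predicted_rate <- intercept + slope * party_size
print(data.frame(party_size, predicted_rate = round(predicted_rate, 3)))
```

Under this model, each additional diner lowers the predicted tip rate by a little under one percentage point.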
library(ggplot2)
# Histogram of tips with one-dollar bins
ggplot(tips) +
  geom_histogram(aes(tip), binwidth = 1) +
  scale_x_continuous(breaks = seq(2, 10, by = 2))
# Same histogram with ten-cent bins
ggplot(tips) +
  geom_histogram(aes(tip), binwidth = 0.1) +
  scale_x_continuous(breaks = seq(2, 10, by = 2))
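# Scatterplot of tip against total bill
ggplot(tips, aes(totbill, tip)) +
  geom_point()
# Add a least-squares line without its confidence band
ggplot(tips, aes(totbill, tip)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
# One panel per combination of sex (rows) and smoker (columns)
ggplot(tips, aes(totbill, tip)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_grid(sex ~ smoker)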
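With facet_grid(), geom_smooth(method = "lm") fits a separate line inside each panel. A rough base R equivalent is to split the data by the two faceting variables and fit one regression per subset. The sketch below assumes the built-in mtcars data as a stand-in, with am and vs playing the roles of sex and smoker. Those substitutions are illustrative and not part of the original analysis.

```r
# Assumption: mtcars stands in for tips; am and vs stand in for sex and smoker.
groups <- split(mtcars, list(mtcars$am, mtcars$vs))

# One simple linear regression per panel: mpg ~ wt within each subset
panel_fits <- lapply(groups, function(d) coef(lm(mpg ~ wt, data = d)))

# Rows are panels; columns are the intercept and the slope
coef_table <- do.call(rbind, panel_fits)
print(round(coef_table, 2))
```

Comparing the rows shows how much the fitted line changes from one panel to the next, which a single pooled regression would hide.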
Exploratory study reveals more interesting features
The model did not give those insights
How was that possible?
Let us learn it now or never
Install in R: install.packages("ggplot2")
Load the package: library(ggplot2)
Note: One variable can be mapped to multiple aesthetics
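# Base plot: tip against total bill
p <- ggplot(data = tips, aes(x = totbill, y = tip))
p + geom_point(size = 3) + geom_smooth(method = "lm", se = FALSE)
# Map sex to color: points and fitted lines are now drawn per group
p <- ggplot(data = tips, aes(x = totbill, y = tip, color = sex))
p + geom_point(size = 3) + geom_smooth(method = "lm", se = FALSE)
# Split the same plot into one panel per level of time
p + geom_point(size = 3) + geom_smooth(method = "lm", se = FALSE) + facet_wrap(~time)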
Geometric functions: geom_point(), geom_line(), geom_bar(), geom_histogram(), geom_density(), geom_boxplot()
Statistics: stat_smooth(), stat_quantile()
Faceting: facet_wrap(), facet_grid()
Scaling: scale_x_continuous(), scale_y_continuous()
Theme: theme_bw(), theme_gray()
For more please visit http://docs.ggplot2.org/current/index.html
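These components are combined with +, one layer at a time. The following sketch puts one function from each group into a single plot. It assumes ggplot2 is installed and uses the built-in mtcars data rather than tips (an assumption, chosen so the example runs without a download).

```r
library(ggplot2)

# Assumption: mtcars stands in for the tips data.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +        # data and aesthetic mapping
  geom_point() +                                   # geometric function
  stat_smooth(method = "lm", se = FALSE) +         # statistic
  facet_wrap(~cyl) +                               # faceting
  scale_x_continuous(breaks = seq(2, 5, by = 1)) + # scaling
  theme_bw()                                       # theme

print(p)
```

Because the plot is stored in p, further layers can be added later with p + ... without rebuilding it.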
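# Histogram on the density scale with a kernel density curve on top
# (recent ggplot2 versions prefer after_stat(density) to ..density.. and linewidth to size for lines)
ggplot(tips, aes(totbill)) +
  geom_histogram(aes(y = ..density..)) +
  geom_density(size = 1.5, color = "steelblue") +
  theme_bw(18)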
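Mapping y to the density rescales the bars so that their areas sum to one, which puts the histogram on the same scale as the density curve. A minimal base R check of that property follows, using mtcars$mpg as a stand-in for tips$totbill (an assumption made so the sketch runs without a download).

```r
# Assumption: mtcars$mpg stands in for tips$totbill.
x <- mtcars$mpg

# Histogram on the density scale: bar height = proportion / bin width
h <- hist(x, plot = FALSE)
hist_area <- sum(h$density * diff(h$breaks))

# Kernel density estimate, integrated with a simple Riemann sum
d <- density(x)
kde_area <- sum(d$y) * (d$x[2] - d$x[1])

print(c(histogram = hist_area, density_curve = kde_area))
```

Both areas should be close to one, which is why the two layers can share a y axis.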
# Boxplots of tip by party size; factor() turns the numeric size into discrete groups
ggplot(tips, aes(factor(size), tip, color = factor(size))) +
  geom_boxplot(aes(fill = factor(size)), alpha = 0.3) +
  xlab("Party size") +
  theme(legend.position = "none")
The Grammar of Graphics, by Leland Wilkinson.
ggplot2: Elegant Graphics for Data Analysis, by Hadley Wickham.
ggplot2 web site with nice documentation: http://docs.ggplot2.org/current/