Mahbubul Majumder, PhD
Dec 9, 2014
time sex dose
1 M 3
1 F 5
2 M 10
2 F 11
3 F 10
library(reshape2)
dcast(dat, time ~ sex, value.var = "dose")
time F M
1 1 5 3
2 2 11 10
3 3 10 NA
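To reproduce the output above, `dat` can be built by hand. This is a minimal sketch that assumes the five rows shown are the whole data set; the name `wide` is introduced here only for illustration. The point to notice is that the (time 3, M) combination is simply absent from the long data and only shows up as an explicit NA once the table is cast to wide form.

```r
library(reshape2)

# Long-format data copied from the table above (assumed to be complete)
dat <- data.frame(
  time = c(1, 1, 2, 2, 3),
  sex  = c("M", "F", "M", "F", "F"),
  dose = c(3, 5, 10, 11, 10)
)

# Cast to wide form: one row per time, one column per sex.
# There is no row for (time = 3, sex = "M"), so that cell is filled with NA.
wide <- dcast(dat, time ~ sex, value.var = "dose")
wide

# Count the implicit missing values that the cast made explicit
sum(is.na(wide))
```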
Missing Not At Random (MNAR)
Missing At Random (MAR)
Missing Completely At Random (MCAR)
When data are missing not at random (MNAR), the missingness itself may be informative
The missingness mechanism can be ignored if the data are MAR or MCAR
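A small simulation may help make the three mechanisms concrete. This is only a sketch under invented assumptions: the variables `age` and `income`, the sample size, and the logistic missingness models are all hypothetical. Under MCAR the observed mean should stay close to the full-data mean; under MAR the raw observed mean drifts, but an analysis that conditions on the observed covariate (here a regression on `age`) should recover it; under MNAR the missingness depends on the unobserved value itself and the observed mean stays biased.

```r
set.seed(42)
n <- 10000

# Hypothetical data: income depends on age
d <- data.frame(age = rnorm(n, mean = 40, sd = 10))
d$income <- 20 + 0.5 * d$age + rnorm(n, mean = 0, sd = 5)

# TRUE means the income value is missing
mcar <- runif(n) < 0.3                              # unrelated to anything
mar  <- runif(n) < plogis((d$age - 40) / 5)         # depends on observed age
mnar <- runif(n) < plogis((d$income - 40) / 5)      # depends on income itself

# Mean income using only the records that remain observed
est <- c(
  full = mean(d$income),
  mcar = mean(d$income[!mcar]),
  mar  = mean(d$income[!mar]),
  mnar = mean(d$income[!mnar])
)

# Under MAR, conditioning on age: fit on observed rows, predict for everyone
fit <- lm(income ~ age, data = d[!mar, ])
mar_adjusted <- mean(predict(fit, newdata = d))

round(c(est, mar_adjusted = mar_adjusted), 2)
```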
Perform a complete case analysis: ignore or delete all observations with missing values.
Last observation carried forward: usually for longitudinal data
Imputation of missing data
Multiple imputation fills each missing value several times with draws from a distribution such as Normal(mean, sd)
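The options listed above can be sketched in base R. The data frame `visits` and the helper `locf()` are hypothetical names made up for this illustration. The imputation step is deliberately simplified: it draws from a normal distribution using the observed mean and sd and pools by averaging, and it ignores the uncertainty in those parameters that a full multiple imputation procedure would account for.

```r
# Hypothetical longitudinal measurements for one subject, with gaps
visits <- data.frame(
  visit = 1:6,
  score = c(4.1, NA, 5.3, 6.0, NA, 4.8)
)

# 1. Complete case analysis: keep only rows without missing values
complete <- visits[complete.cases(visits), ]

# 2. Last observation carried forward (LOCF)
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # number of observed values seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # leading NAs (idx == 0) stay NA
}
visits$score_locf <- locf(visits$score)

# 3. Simplified multiple imputation: m filled-in copies, then pool
set.seed(1)
m      <- 5
obs    <- visits$score[!is.na(visits$score)]
n_miss <- sum(is.na(visits$score))

imputed_means <- replicate(m, {
  filled <- visits$score
  filled[is.na(filled)] <- rnorm(n_miss, mean = mean(obs), sd = sd(obs))
  mean(filled)                    # the estimate from this imputed copy
})
pooled_mean <- mean(imputed_means)  # pooled estimate across the m copies
```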
It's messy; the variables are hard to identify.
[1] "*$তথ্য*$%&*$кики%&%&19ки19#%/>[|]<7272%&#%তথ্য[|]>@*72#%#%*$<>72@*#%*$@*/*$<[|]кики><<19#%*$/#%#%*$<ки1972%&*$@**$@*19[|][|]%&киতথ্য/%&@*%&তথ্য@*//%&*$[|]/%&%&/ки<[|]%&90তথ্য<ки"
The meeting will be held on Tuesday the 9th December, 2014. Next meeting is scheduled on Wednesday the 17th December 2014
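As one illustration of pulling a variable out of free text like the sentence above, the dates can be extracted with a regular expression. This is a sketch only: the pattern is an assumption that covers the "9th December, 2014" style seen here and nothing else, and the names `txt`, `pattern`, `hits` and `dates` are made up for the example.

```r
txt <- paste(
  "The meeting will be held on Tuesday the 9th December, 2014.",
  "Next meeting is scheduled on Wednesday the 17th December 2014"
)

# day + ordinal suffix, month name, optional comma, four-digit year
pattern <- "([0-9]{1,2})(st|nd|rd|th) ([A-Za-z]+),? ([0-9]{4})"

# All date-like substrings in the text
hits <- regmatches(txt, gregexpr(pattern, txt))[[1]]

# Split each hit into its captured groups:
# p[1] = full match, p[2] = day, p[3] = suffix, p[4] = month, p[5] = year
parts <- regmatches(hits, regexec(pattern, hits))

# Build ISO date strings; month.name is the built-in English month vector
iso <- vapply(parts, function(p) {
  sprintf("%s-%02d-%02d", p[5], match(p[4], month.name), as.integer(p[2]))
}, character(1))

dates <- as.Date(iso)
dates
```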
<input type="hidden" name="title" value="Special:SearchWiki">
<input type="hidden" name="uselang" value="en">
<input type="hidden" name="searchproject" value="p">
time | Friday | Saturday | Sunday | Monday | Tuesday | Wednesday | Thursday |
---|---|---|---|---|---|---|---|
1 | 22 | 24 | 14 | 21 | 20 | 18 | 17 |
2 | 15 | 22 | 22 | 15 | 19 | 17 | 22 |
3 | 24 | 17 | 11 | 28 | 30 | 21 | 11 |
4 | 11 | 20 | 29 | 13 | 16 | 15 | 21 |
5 | 21 | 22 | 20 | 12 | 22 | 17 | 16 |
6 | 22 | 21 | 16 | 19 | 23 | 19 | 10 |
7 | 15 | 15 | 11 | 22 | 21 | 20 | 25 |
8 | 18 | 29 | 16 | 22 | 24 | 18 | 14 |
9 | 18 | 24 | 20 | 19 | 20 | 12 | 19 |
10 | 27 | 14 | 21 | 18 | 22 | 19 | 34 |
Delivery | Amount |
---|---|
On Sunday | |
10:30 | 43 |
12:30 | 12 |
12:35 | 30 |
On Monday | |
11:30 | 29 |
11:57 | 87 |
11.59 | 63 |
On Tuesday | |
11:33 | 19 |
11:15 | 27 |
12.59 | 54 |
Days | times | Amount |
---|---|---|
Sunday | 10:30 | 43 |
Sunday | 12:30 | 12 |
Sunday | 12:35 | 30 |
Monday | 11:30 | 29 |
Monday | 11:57 | 87 |
Monday | 11.59 | 63 |
Tuesday | 11:33 | 19 |
Tuesday | 11:15 | 27 |
Tuesday | 12.59 | 54 |
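One possible way to get from the messy Delivery/Amount layout to the tidy Days/times/Amount table is sketched below in base R. It assumes the day header rows always start with "On " and have an empty Amount; the object names `raw`, `is_header`, `day` and `tidy` are introduced only for this example.

```r
# The messy table: day names are mixed into the Delivery column
raw <- data.frame(
  Delivery = c("On Sunday",  "10:30", "12:30", "12:35",
               "On Monday",  "11:30", "11:57", "11.59",
               "On Tuesday", "11:33", "11:15", "12.59"),
  Amount   = c(NA, 43, 12, 30, NA, 29, 87, 63, NA, 19, 27, 54),
  stringsAsFactors = FALSE
)

# Rows that carry a day name rather than a delivery time
is_header <- grepl("^On ", raw$Delivery)

# Carry each day name down to the rows below it:
# cumsum(is_header) is 1 for the first day's rows, 2 for the second, ...
day <- sub("^On ", "", raw$Delivery[is_header])[cumsum(is_header)]

# One row per delivery, one column per variable
tidy <- data.frame(
  Days   = day[!is_header],
  times  = raw$Delivery[!is_header],
  Amount = raw$Amount[!is_header],
  stringsAsFactors = FALSE
)
tidy
```

The times are kept exactly as recorded, including the entries written with a dot. If a consistent separator is wanted, something like `sub(".", ":", tidy$times, fixed = TRUE)` could be applied as a further cleaning step.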
Please refer to Hadley's lecture on tidy data
http://stat405.had.co.nz/lectures/18-tidy-data.pdf
For more details, refer to the documentation of the R
package tidyr
install.packages("tidyr")
library(tidyr)
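A brief sketch of what tidyr can do with a wide table such as the weekday table shown earlier. Only the first three rows and three weekday columns are copied here to keep it short, and the column names `day` and `value` are assumptions, since the slides do not say what the numbers measure. `gather()` and `spread()` are the functions from the 2014 release of tidyr; newer versions of the package also offer `pivot_longer()` and `pivot_wider()` for the same job.

```r
library(tidyr)

# A small piece of the wide table: one column per weekday
wide <- data.frame(
  time     = 1:3,
  Friday   = c(22, 15, 24),
  Saturday = c(24, 22, 17),
  Sunday   = c(14, 22, 11)
)

# Wide to long: every column except `time` becomes a (day, value) pair
long <- gather(wide, key = "day", value = "value", -time)
long

# Long back to wide: one column per distinct value of `day`
back <- spread(long, key = day, value = value)
back
```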
The amount of data is large; no single machine can handle the cleaning
Character encoding problems in text data
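The encoding problem can be illustrated on a single string in base R. This sketch assumes the text arrived as Latin-1 bytes while the rest of the pipeline expects UTF-8; in practice the source encoding is often unknown and has to be guessed or documented. The variable names `x` and `y` are made up for the example.

```r
# "café" stored as Latin-1 bytes: the byte 0xE9 alone is not valid UTF-8
x <- "caf\xe9"
validUTF8(x)

# Convert from the (assumed) source encoding to UTF-8
y <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(y)

# After conversion the string has four characters, as expected
nchar(y, type = "chars")
```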
grunt> dat = LOAD 'input.txt' AS (myText:chararray);
grunt> out = FOREACH dat GENERATE REGEX_EXTRACT(myText, '(\\w+)', 1);
grunt> dump out;
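For comparison, the same extraction (the first run of word characters on each line) can be sketched in R for data small enough to fit on one machine. The sample lines and the names `txt`, `m` and `first_word` are hypothetical; lines without any match get NA, which roughly mirrors the null that Pig returns in that case.

```r
# Hypothetical input lines standing in for the contents of input.txt
txt <- c("hello world", "  data cleaning is fun", "!!!")

# Position of the first run of word characters in each line (-1 if none)
m <- regexpr("\\w+", txt)

# regmatches() returns matches only for lines that matched,
# so fill them into a vector that defaults to NA
first_word <- rep(NA_character_, length(txt))
first_word[m > 0] <- regmatches(txt, m)
first_word
```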
R Little and D Rubin (2002) Statistical Analysis with Missing Data, Second Edition
Hadley's lecture on tidy data
http://stat405.had.co.nz/lectures/18-tidy-data.pdf
RStudio blog introducing the tidyr package
http://blog.rstudio.org/2014/07/22/introducing-tidyr/
Demonstration of tidyr by Hadley
https://rpubs.com/m_dev/tidyr-intro-and-demos