Mahbubul Majumder, PhD
Nov 13, 2014
What is an ideal computing environment for a data scientist?
High-performance computing platforms are often non-Windows
We have Windows workstations
R, RStudio and MySQL
MySQL from R and dplyr
You can connect to the high-performance supercomputing facilities at PKI
MySQL, R and Hadoop are available for use
Hadoop and MapReduce will run using at least 10 nodes
For convenience, we will use virtual machines to learn to work on a non-Windows platform
VMware Player
datascienceVM is a Linux virtual machine with R, MySQL and Hadoop installed
Hadoop and MapReduce will run using a single node
Common Linux distributions
Ubuntu, CentOS, Red Hat Linux
To learn which distribution you are working on, use the following command
cat /etc/issue
CentOS release 6.2 (Final)
Kernel \r on an \m
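Not every distribution keeps its name in `/etc/issue`. Many newer systems also provide `/etc/os-release`, while an older release such as CentOS 6.2 may only have `/etc/issue`. The sketch below tries both files and falls back to `uname`. The function name `show_distribution` is made up for this example.

```shell
# show_distribution is a hypothetical helper name, not a standard command.
show_distribution() {
  if [ -r /etc/os-release ]; then
    # Source the file in a subshell so its variables do not leak out.
    ( . /etc/os-release; echo "${PRETTY_NAME:-${NAME:-unknown}}" )
  elif [ -r /etc/issue ]; then
    # Older systems: the first line of /etc/issue names the release.
    head -n 1 /etc/issue
  else
    # Last resort: print the kernel name only.
    uname -s
  fi
}

show_distribution
```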
who
mmajumder console Nov 11 19:07
mmajumder ttys000 Nov 13 17:01
Each machine in the lab has a Linux virtual machine (LVM)
R, MySQL and Hadoop are installed
Always shut down the virtual machine (VM) properly
cross to close the VM window
For all applications in the datascienceVM
| commands | functions | examples |
|---|---|---|
| ls | lists the files and directories in the current location | ls -l or ls myFileName |
| pwd | displays the path of the current working directory | pwd |
| cd | change the directory | cd .. to go up one level or cd myFolder |
| mkdir | create a directory | mkdir newFolder |
| rm | remove a file. Be cautious, it can’t be undone | rm myTempFile |
| vi | open a file in the text editor. | vi newFile.txt |
| cat | view the content of a file without opening it | cat myFile.txt |
| cp | copy file or folder to a different destination | cp sourceFile destinationPath |
| ps | display currently running processes | ps -a |
| man | displays the help about a command | man cp |
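The commands in the table can be tried together in one short session. This is a minimal sketch that works in a throwaway directory created with `mktemp`, so nothing important is touched. The file and folder names follow the examples in the table, and the variable name `workdir` is only illustrative.

```shell
# Create a scratch directory and move into it.
workdir=$(mktemp -d)
cd "$workdir"

pwd                         # show the current working directory
mkdir newFolder             # create a directory
echo "hello" > myFile.txt   # create a small file to experiment with
ls -l                       # long listing of the current location
cat myFile.txt              # print the file content to the terminal
cp myFile.txt newFolder/    # copy the file into the new directory
cd newFolder                # go into the directory
ls                          # the copy is listed here
cd ..                       # go back up one level (note the space)
rm myFile.txt               # remove the original; this cannot be undone
ls                          # only newFolder remains
```

Run `man cp` or `man rm` in the VM before using a command whose options you are unsure about.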
pwd
/Users/mmajumder/Box Sync/Teaching/stat4410-8416-Data-Science/lectures/21-datascience-lab
library(knitr)
R Markdown syntax is used
R code, or even Python or Linux code
The complete source code of this presentation can be found in the GitHub repository
https://github.com/mamajumder/html-presentation
Authoring R presentation
https://support.rstudio.com/hc/en-us/articles/200486468
For markdown documentation
http://daringfireball.net/projects/markdown/syntax
Details about ioslides presentation using markdown
http://rmarkdown.rstudio.com/ioslides_presentation_format.html