Mahbubul Majumder, PhD
Nov 13, 2014
What is an ideal computing environment for a data scientist?
High-performance computing platforms are often non-Windows
We have Windows workstations
R, RStudio and MySQL
MySQL from R and dplyr
You can connect to the high-performance supercomputing facilities at PKI
MySQL, R and hadoop are available for use
hadoop and mapreduce will run using at least 10 nodes
For convenience, we will use virtual machines to learn to work on a non-Windows platform
VMware Player
datascienceVM is a Linux virtual machine with R, MySQL and Hadoop installed
hadoop and mapreduce will run using a single node
Common Linux distributions
Ubuntu, CentOS, Red Hat Linux
To find out which distribution you are working on, use the following command
cat /etc/issue
CentOS release 6.2 (Final)
Kernel \r on an \m
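On newer distributions `/etc/issue` may be missing or customized, while `/etc/os-release` is usually present. The following is a minimal sketch, assuming a POSIX shell, that tries `/etc/os-release` first and falls back to `/etc/issue`. The function name `show_distro` is hypothetical and not a standard command.

```shell
#!/bin/sh
# show_distro is a hypothetical helper name used only for illustration.
show_distro() {
    name=""
    if [ -r /etc/os-release ]; then
        # os-release holds KEY=value lines; PRETTY_NAME is a readable name
        name=$(grep '^PRETTY_NAME=' /etc/os-release | cut -d= -f2- | tr -d '"')
    fi
    if [ -z "$name" ] && [ -r /etc/issue ]; then
        # older systems such as CentOS 6 provide /etc/issue
        name=$(head -n 1 /etc/issue)
    fi
    if [ -z "$name" ]; then
        # last resort: print the kernel name only
        name=$(uname -s)
    fi
    printf '%s\n' "$name"
}

show_distro
```

On the CentOS 6.2 machine shown above, this sketch would be expected to take the `/etc/issue` branch, since that release predates `/etc/os-release`.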
who
mmajumder console Nov 11 19:07
mmajumder ttys000 Nov 13 17:01
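The `who` output above has one line per login session: the user name, the terminal, and the login time. As a small sketch, the commands below print the current account name and reduce the `who` listing to distinct user names. The `awk` and `sort` pipeline is an illustrative addition, not part of the original slides.

```shell
# print the name of the account you are logged in as
id -un

# list each logged-in user once, however many sessions they have open
who | awk '{print $1}' | sort -u
```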
Each machine in the lab has a Linux virtual machine (LVM)
R, MySQL and Hadoop are installed
Always shut down the virtual machine (VM) properly
Do not click the cross to close the VM window
For all applications in the datascienceVM
command | function | example |
---|---|---|
ls | list the files and directories in the current location | ls -l or ls myFileName |
pwd | display the path of the current working directory | type pwd and press Enter |
cd | change the directory | cd .. to go up one level or cd myFolder |
mkdir | create a directory | mkdir newFolder |
rm | remove a file; be cautious, this cannot be undone | rm myTempFile |
vi | open a file in the vi text editor | vi newFile.txt |
cat | print the content of a file without opening it in an editor | cat myFile.txt |
cp | copy a file (or a folder, with -r) to a different destination | cp sourceFile destinationPath |
ps | display currently running processes | ps -a |
man | display the manual page for a command | man cp |
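
The commands in the table can be tried together in a throwaway directory. This is a minimal sketch, assuming a Linux shell where `mktemp -d` is available. The names `scratch`, `newFolder`, `myFile.txt` and `copy.txt` are placeholders chosen for illustration.

```shell
#!/bin/sh
# create a scratch directory so no existing files are touched
scratch=$(mktemp -d)
cd "$scratch" || exit 1

pwd                            # show the current working directory
mkdir newFolder                # create a directory
cd newFolder || exit 1         # move into it
printf 'hello\n' > myFile.txt  # create a small text file
ls -l                          # long listing of the directory
cat myFile.txt                 # print the file content
cp myFile.txt copy.txt         # copy the file
rm myFile.txt                  # remove the original; this cannot be undone
ls                             # only copy.txt remains
cd ..                          # go up one level (note the space before ..)

# clean up: leave the scratch directory, then delete it recursively
cd / || exit 1
rm -r "$scratch"
```

Working inside a temporary directory keeps `rm` from touching anything you care about while you practice.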
pwd
/Users/mmajumder/Box Sync/Teaching/stat4410-8416-Data-Science/lectures/21-datascience-lab
library(knitr)
Rmarkdown syntax is used
R code, or even Python or Linux code
The complete source code of these presentation slides can be found in the GitHub repository
https://github.com/mamajumder/html-presentation
Authoring R presentation
https://support.rstudio.com/hc/en-us/articles/200486468
For markdown documentation
http://daringfireball.net/projects/markdown/syntax
Details about ioslides presentation using markdown
http://rmarkdown.rstudio.com/ioslides_presentation_format.html