Running virtual machine using the VMWare player.

This document is prepared for data science lab users taking the class Introduction to Data Science at the University of Nebraska at Omaha. The class website can be visited here. For convenience, a Linux virtual machine has been created with all the data science tools configured for training purposes. You will need two things to use this resource.

Windows users need VMWare player to run the virtual machine. You can download VMWare player from here. Mac users need VirtualBox, which can be downloaded from here.
The data science virtual machine, datascienceVM. You can copy it from any of the lab machines.

The virtual machine is configured with the following tools

R
MySQL
Hadoop

This document provides detailed instructions on how to work on the data science virtual machine. There will be separate documentation on how to work with the data science tools. If you have any questions, please contact the course instructor, Dr. Mahbubul Majumder.

Starting data science virtual machine

Log in to the lab machine with the password training123
Open the VMWare player by clicking its icon on the task bar. The virtual machine gives us a terminal window where we can use Linux commands.
A window then lists the virtual machines you can select. If the virtual machine you are looking for is not in the list, open it with the option Open a Virtual Machine
Double-click the name of a virtual machine to open it. Open the datascienceVM
Enter the password of the virtual machine to open and use it. The password is training123
You are now in the datascienceVM virtual machine, where you will find the applications on the task bar on the left side.
Click the terminal icon to open the terminal window. The terminal window looks like the following image.

Once the terminal window is open, we can run the following commands.

Most Commonly used UNIX/LINUX commands

Contents

ls
pwd
cd
vi
ps
man
mkdir

rmdir
mv
cp
sudo
kill
head
tail

tar
who
cat
clear
rm
top
touch

grep
history
cal
apt
locate
free
passwd

pipe |
>
exit

The ls command displays the contents of the current working directory. Use it whenever you want to know which files and directories are present in the directory you are working in. It is one of the most important and commonly used commands in Linux.
```
training@ubuntu:~$ ls
Desktop    Downloads         mathlab  Pictures  R          test456
Documents  examples.desktop  Music    Public    Templates  Videos
```

The ls command has several options. For example, ls -l displays detailed information about each entry: its permissions, number of links, owner, group, size, and modification time. The following example gives a clearer picture of its usage.

training@ubuntu:~$ ls -l
total 52
drwxr-xr-x 2 training training 4096 Oct 16 17:09 Desktop
drwxr-xr-x 2 training training 4096 Oct 16 17:09 Documents
drwxr-xr-x 3 training training 4096 Nov  4 12:18 Downloads
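
Another frequently used option is -a, which also lists hidden entries (names that start with a dot). The sketch below is only an illustration: it works in a scratch directory created with mktemp, and the file names notes.txt and .hidden are made up for the example.

```shell
# Work in a scratch directory so nothing in the home directory is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# notes.txt and .hidden are hypothetical example files.
touch notes.txt .hidden

# Plain ls skips names that start with a dot.
visible=$(ls)

# -a lists every entry, including hidden files and the . and .. entries.
all=$(ls -a)

echo "$visible"
echo "$all"
```

The options can be combined, so ls -la gives the detailed listing for hidden entries as well.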

pwd [print working directory], as the name suggests, displays the full pathname of the directory we are currently working in. This is the exact place where our commands operate.
```
training@ubuntu:~$ pwd
/home/training
```

The cd [change directory] command moves us from one directory to another. It is one of the most frequently used commands.

training@ubuntu:~$ cd Downloads/
training@ubuntu:~/Downloads$ ls
hadoop-1.2.1  hadoop-1.2.1.tar.gz  jdk-8u25-linux-x64.tar.gz  rstudio-0.98.1087-amd64.deb
training@ubuntu:~/Downloads$ cd hadoop-1.2.1/
training@ubuntu:~/Downloads/hadoop-1.2.1$ ls
bin          conf                  hadoop-client-1.2.1.jar       hadoop-test-1.2.1.jar   lib          NOTICE.txt  src
build.xml    contrib               hadoop-core-1.2.1.jar         hadoop-tools-1.2.1.jar  libexec      README.txt  webapps
c++          docs                  hadoop-examples-1.2.1.jar     ivy                     LICENSE.txt  sbin
CHANGES.txt  hadoop-ant-1.2.1.jar  hadoop-minicluster-1.2.1.jar  ivy.xml                 logs         share
training@ubuntu:~/Downloads/hadoop-1.2.1$
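
A few shortcuts make moving around quicker. The following is a minimal sketch, assuming a scratch directory tree with the made-up names project and data; it is not part of the virtual machine setup.

```shell
# Build a small directory tree in a scratch location (names are hypothetical).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/project/data"

cd "$demo_dir/project/data"
cd ..                      # .. means the parent directory, so we are now in project
parent=$(basename "$PWD")

cd data                    # a relative path: back down into data
cd - > /dev/null           # - means the previous directory, so we return to project
previous=$(basename "$PWD")

echo "$parent $previous"

cd                         # with no argument, cd goes to the home directory
```

Both variables hold project: the first after moving up one level, the second after jumping back with cd -.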

vi is a text editor. The command vi [filename] opens the file for editing, or starts a new file with that name if it does not exist yet. Below is an example showing the syntax for creating a text file or a C file.
```
vi [filename].txt which creates a Text file
vi [filename].c   which creates a C file.
```

You can create different kinds of files by giving the proper extension after the file name. For example, to create a C file the extension should be .c

The ps command displays a snapshot of the currently running processes. For each process it shows the process ID (PID), the terminal it is attached to (TTY), the CPU time it has used (TIME), and the command name (CMD).
```
training@ubuntu:~$ ps
  PID TTY          TIME CMD
 2903 pts/13   00:00:00 bash
 4483 pts/13   00:00:00 ps
```

The man command displays the help information (the manual page) of any command we use. For example, the command man mkdir gives us the information about the mkdir command. Press q to leave the manual page.

MKDIR(1)                                                   User Commands                                                  MKDIR(1)

NAME
       mkdir - make directories

SYNOPSIS
       mkdir [OPTION]... DIRECTORY...

DESCRIPTION
       Create the DIRECTORY(ies), if they do not already exist.

       Mandatory arguments to long options are mandatory for short options too.
       -m, --mode=MODE
              set file mode (as in chmod), not a=rwx - umask
       -p, --parents
              no error if existing, make parent directories as needed
       -v, --verbose
:

mkdir [make directory] creates a new directory. We create a new directory with the command mkdir [directory_name] and use it to store the files that belong together.
- For example, we create a directory called java and check that it was created by using the ls command.
```
training@ubuntu:~$ mkdir java
training@ubuntu:~$ ls
Desktop    Downloads         java     Music     Public  Templates  Videos
Documents  examples.desktop  mathlab  Pictures  R       test456
training@ubuntu:~$ cd java
training@ubuntu:~/java$ ls
training@ubuntu:~/java$ 
```
  The above example confirms that the directory java is created when we use the mkdir command.

rmdir [remove directory] removes an empty directory. To delete a directory, use rmdir [directory_name], where [directory_name] is the name of the directory to be deleted.
- Below is an example which shows the working of the rmdir command.
```
training@ubuntu:~$ ls
Desktop  Documents  Downloads  examples.desktop  java  mathlab  Music  Pictures  Public  R  Templates  test456  Videos
training@ubuntu:~$ rmdir java
training@ubuntu:~$ ls
Desktop  Documents  Downloads  examples.desktop  mathlab  Music  Pictures  Public  R  Templates  test456  Videos
training@ubuntu:~$ 
```
- We confirm that the directory java is deleted. Hence we can use the rmdir command to delete unused directories.
- rmdir removes only empty directories; it returns an error if the directory still contains files.
- It also returns an error if we run rmdir java from inside the java directory, because the name java is looked up relative to the current directory and no such entry exists there. Move to the parent directory first.
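
The two commands can be tried together on a nested directory. This is a minimal sketch in a scratch directory; the path java/src/util is made up for the illustration.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"

# -p creates the missing parent directories in one step.
mkdir -p java/src/util

# rmdir refuses to remove java while it still contains src.
if rmdir java 2>/dev/null; then
    result="removed"
else
    result="refused"
fi
echo "rmdir java: $result"

# Remove the tree from the innermost directory outwards.
rmdir java/src/util java/src java
```

Listing the directories innermost first lets rmdir succeed, because each one is empty by the time it is reached.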

The mv command moves a file or directory from one place to another; it is also used to rename a file. The syntax is the command followed by the source, i.e. the file to be moved, and the destination, i.e. where the file should be moved.
```
mv [source] [destination]
```
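
What mv does depends on the destination. The sketch below shows both cases in a scratch directory; the names report.txt, final.txt and archive are hypothetical.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
echo "draft" > report.txt
mkdir archive

# Destination is a new name in the same directory: the file is renamed.
mv report.txt final.txt

# Destination is an existing directory: the file is moved into it.
mv final.txt archive/

ls archive
```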

The cp command copies files or directories from one place to another. It copies the content from the source to the destination and keeps it in both places, unlike the mv command, which removes the content from the source. Below is an example of how to use the cp command:
```
cp [source] [destination] 
example:
cp NOTICE.txt NOTICE2.txt
```
This example copies the contents of the NOTICE.txt file into NOTICE2.txt. Here NOTICE.txt is the source and NOTICE2.txt is the destination. All the data in NOTICE.txt can be found in NOTICE2.txt when we open NOTICE2.txt with the command:
```
vi NOTICE2.txt
```

There are many options for the cp command. Some of them are:

-i [interactive] prompts before overwriting the destination file. Whenever a destination file is about to be overwritten, cp asks a question as shown in the example below. Answer y if you want the destination file to be overwritten or n if you don’t.
```
training@ubuntu:~/Downloads/hadoop-1.2.1$ cp -i README.txt NOTICE.txt
cp: overwrite 'NOTICE.txt'? n
training@ubuntu:~/Downloads/hadoop-1.2.1$ vi NOTICE2.txt
```

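-b [backup] makes a backup of each existing destination file before overwriting it. By default the backup has the same name with a ~ appended. Unlike the long form --backup, -b does not accept an argument.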

training@ubuntu:~/Downloads/hadoop-1.2.1$ cp -b NOTICE.txt NOTICE2.txt
training@ubuntu:~/Downloads/hadoop-1.2.1$ vi NOTICE2.txt

-r [recursive] copies directories recursively, i.e. together with all the files and subdirectories they contain.

training@ubuntu:~/Downloads/hadoop-1.2.1$ mkdir 123
training@ubuntu:~/Downloads/hadoop-1.2.1$ cp -r hadoop-core-1.2.1.jar 123
training@ubuntu:~/Downloads/hadoop-1.2.1$ cd 123
training@ubuntu:~/Downloads/hadoop-1.2.1/123$ ls
hadoop-core-1.2.1.jar

-n [no-clobber] prevents cp from overwriting an existing destination file.

training@ubuntu:~/Downloads/hadoop-1.2.1$ cp -n NOTICE.txt NOTICE2.txt
training@ubuntu:~/Downloads/hadoop-1.2.1$ vi NOTICE2.txt

-f [force] is used when an existing destination file cannot be opened: cp removes it and tries again.

-s [symbolic-link] makes a symbolic link to the source file instead of copying it. In the example below, 123.txt is a link that points to NOTICE2.txt, so opening 123.txt shows the same contents as NOTICE2.txt.

training@ubuntu:~/Downloads/hadoop-1.2.1$ cp -s NOTICE2.txt 123.txt
training@ubuntu:~/Downloads/hadoop-1.2.1$ vi 123.txt
training@ubuntu:~/Downloads/hadoop-1.2.1$ ls
123        c++          docs                     hadoop-examples-1.2.1.jar     ivy      LICENSE.txt   NOTICE.txt  src
123.txt    CHANGES.txt  hadoop-ant-1.2.1.jar     hadoop-minicluster-1.2.1.jar  ivy.xml  logs          README.txt  webapps
bin        conf         hadoop-client-1.2.1.jar  hadoop-test-1.2.1.jar         lib      NOTICE2.txt   sbin
build.xml  contrib      hadoop-core-1.2.1.jar    hadoop-tools-1.2.1.jar        libexec  NOTICE2.txt~  share

-t [target-directory] copies all the source arguments into the given directory.

training@ubuntu:~/Downloads/hadoop-1.2.1$ cp -t 123 NOTICE2.txt
training@ubuntu:~/Downloads/hadoop-1.2.1$ vi NOTICE2.txt
training@ubuntu:~/Downloads/hadoop-1.2.1$ cd 123
training@ubuntu:~/Downloads/hadoop-1.2.1/123$ ls
hadoop-core-1.2.1.jar  NOTICE2.txt

-v [verbose] describes what is actually being done. For example, when we copy hadoop-core-1.2.1.jar into the directory 123, cp reports the operation it performs:
```
training@ubuntu:~/Downloads/hadoop-1.2.1$ cp -v hadoop-core-1.2.1.jar 123
'hadoop-core-1.2.1.jar' -> '123/hadoop-core-1.2.1.jar'
```
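
The -r option matters most when the source is a directory rather than a single file. A minimal sketch, assuming a made-up directory conf with one file inside it:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"

# conf/site/server.properties is a hypothetical example file.
mkdir -p conf/site
echo "port=8080" > conf/site/server.properties

# Without -r, cp refuses to copy a directory and reports an error.
cp conf conf_backup 2>/dev/null || echo "cp without -r did not copy the directory"

# With -r, the directory and everything below it is copied.
cp -r conf conf_backup
cat conf_backup/site/server.properties
```

Because conf_backup does not exist beforehand, cp -r creates it as a copy of conf.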

The sudo command allows a permitted user to execute a command as the superuser or another user. By default, sudo requires users to authenticate themselves with a password. Below is an example of how we use the sudo command.
```
training@ubuntu:~$ sudo mkdir directory
[sudo] password for screenshot: 
training@ubuntu:~$ ls
Desktop  directory  Documents  Downloads  examples.desktop  Music  Pictures  Public  R  Templates  Videos
```
- In the above example we create a directory named directory with the sudo command. sudo first prompts for the password.
- Once the password is given, the directory is created. This can be verified with the ls command, which lists the files and directories as discussed above.

kill is a command that is used in several popular operating systems to send signals to running processes in order to request the termination of the process. Below are some examples of different kill commands:
- killall sends a signal to every process with the given name. For example, to send SIGKILL:
```
killall -SIGKILL [process-name]
```
- pkill is similar to killall, but it allows extended regular expression patterns and other matching criteria. Below is an example:
```
pkill -9 unity
```
  Here the command kills all the processes whose name matches the pattern unity.
- pkill can also kill all the processes that belong to a user. For example
```
pkill -9 -u USERNAME
```

The kill -l command lists the available signals

training@ubuntu:~/Downloads/hadoop-1.2.1$ kill -l
 1) SIGHUP   2) SIGINT   3) SIGQUIT  4) SIGILL   5) SIGTRAP
 6) SIGABRT  7) SIGBUS   8) SIGFPE   9) SIGKILL 10) SIGUSR1
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG  24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF 28) SIGWINCH    29) SIGIO   30) SIGPWR
31) SIGSYS  34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

Here we can observe that all the signal names begin with SIG, which stands for signal.
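
The basic form of the command is kill followed by a process ID. The sketch below is an illustration only: sleep 300 stands in for a real long-running job, and the variable names are made up.

```shell
# Start a long-running process in the background; $! holds its process ID.
sleep 300 &
pid=$!

# With no signal given, kill sends SIGTERM (signal 15), asking the process to stop.
kill "$pid"

# wait collects the exit status of the background process.
# A process ended by a signal reports 128 + the signal number.
status=0
wait "$pid" || status=$?
echo "exit status: $status"
```

A different signal can be chosen by number or name, for example kill -9 or kill -KILL, which the process cannot ignore.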

The head command prints the first ten lines of a given file. The basic syntax of the head command is:
```
head [file-name]
```
- Here is an example of using the head command

LINUX/UNIX@ubuntu:~/Downloads/hadoop-1.2.1$ head LICENSE.txt

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/
    
    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,

The head command also works on multiple files. Give the head command followed by the names of the files you want to read. Below is an example.

hadoopinstall@ubuntu:~/Downloads/hadoop-1.2.1$ head LICENSE.txt ivy.xml
==> LICENSE.txt <==

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,

==> ivy.xml <==
<?xml version="1.0" ?>

<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

In the above example we asked head to read the first lines of the LICENSE.txt and ivy.xml files. The output of each file is preceded by a header with its name.
To retrieve more lines than the default 10, give a hyphen followed by the number of lines, without spaces. The same count can also be written as head -n 15. Below is an example that prints 15 lines.

hadoopinstall@ubuntu:~/Downloads/hadoop-1.2.1$ head -15 build.xml
<?xml version="1.0"?>

<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

The tail command allows us to view the last 10 lines of a file. It works the same way as the head command; the only difference is that head displays the first 10 lines whereas tail displays the last 10 lines. Everything else remains the same, including the options. The syntax of the tail command is shown below:
```
tail [filenames]
```
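
To see head and tail side by side, the following sketch builds a made-up 20-line file with seq and reads three lines from each end. The file name numbers.txt is an assumption for the illustration.

```shell
demo_dir=$(mktemp -d)

# seq writes the numbers 1 to 20, one per line, into an example file.
seq 1 20 > "$demo_dir/numbers.txt"

# -n sets how many lines to print instead of the default 10.
first_three=$(head -n 3 "$demo_dir/numbers.txt")
last_three=$(tail -n 3 "$demo_dir/numbers.txt")

echo "$first_three"
echo "$last_three"
```

The first echo prints 1, 2 and 3; the second prints 18, 19 and 20.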

tar [tape archive] combines a large group of files into a single archive. This lets us move the archive as a single unit, which is easier because everything is in one place.

Here is an example command that archives two files using the tar command.
```
tar -cf archive.tar file1 file2
```
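
Creating an archive is usually followed by listing or extracting it. A minimal sketch, assuming two small made-up files and a scratch directory:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
echo "one" > file1
echo "two" > file2

# -c creates an archive, -f names the archive file.
tar -cf archive.tar file1 file2

# -t lists the archive contents without extracting anything.
listing=$(tar -tf archive.tar)
echo "$listing"

# -x extracts; -C chooses the directory to extract into.
mkdir restored
tar -xf archive.tar -C restored
cat restored/file1 restored/file2
```

The original files stay in place; tar only reads them when it builds the archive.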

The who command displays who is logged on to the system. The output of the who command looks as follows:

training@ubuntu:~$ who
screenshot :0           2014-11-11 16:15 (:0)
screenshot pts/13       2014-11-11 16:15 (:0)
training@ubuntu:~$

The cat command lets us view the contents of files. The command cat [file-name] prints the whole contents of that file.
- For example, to view the contents of LICENSE.txt, which is in the Hadoop folder inside the Downloads folder, we give the command cat LICENSE.txt. It produces the output:

training@ubuntu:~/Downloads/hadoop-1.2.1$ cat LICENSE.txt

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
:

There are other uses of the cat command. Some of them are:

It is used to concatenate any number of files together. Here is an example:

hadoopinstall@ubuntu:~$ echo 'Hello' > 1
hadoopinstall@ubuntu:~$ echo 'Welcome to' > 2
hadoopinstall@ubuntu:~$ echo 'Data Science Lab' > 3
hadoopinstall@ubuntu:~$ cat 1 2 3 > 4
hadoopinstall@ubuntu:~$ cat 4
Hello
Welcome to
Data Science Lab

The cat command can also copy a file from a source to a destination

hadoopinstall@ubuntu:~$ vi make.txt
//(data in the file)  i can study the use of CAT command
hadoopinstall@ubuntu:~$ cat make.txt > make1.txt
hadoopinstall@ubuntu:~$ cat make1.txt
i can study the use of CAT command

The clear command clears the screen. When the screen is filled up with commands, give the command clear, which clears the screen by moving all the commands off the top of the page. We won’t lose the previous commands; we can still see them by scrolling upwards.
```
clear
```

The rm command deletes existing files. It is similar to the earlier command rmdir, but rmdir deletes directories whereas rm deletes files. Below is the syntax of the rm command:

rm [file-name] where [file-name] is the name of the file that should be deleted.

An example of using the rm command can be seen below:

training@ubuntu:~$ ls
create.c  Desktop  directory  Documents  Downloads  examples.desktop  Music  Pictures  Public  R  Templates  Videos
training@ubuntu:~$ rm create.c
training@ubuntu:~$ ls
Desktop  directory  Documents  Downloads  examples.desktop  Music  Pictures  Public  R  Templates  Videos
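
rm can also delete a directory together with its contents when given the -r option. The sketch below uses a made-up directory old_project in a scratch location; files deleted this way cannot be recovered, so the option deserves care.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"

# old_project and its files are hypothetical examples.
mkdir -p old_project/src
touch old_project/src/main.c old_project/notes.txt

# rmdir would refuse because old_project is not empty;
# rm -r deletes the directory together with everything inside it.
rm -r old_project

# The scratch directory is empty again, so this prints nothing.
ls -A
```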

The top command displays the processes running on the system and the resources they use, such as CPU and memory. An example of using the top command can be seen below. To stop it, press the letter q

hadoopinstall@ubuntu:~$ top

top - 13:45:05 up  2:04,  2 users,  load average: 0.23, 0.14, 0.09
Tasks: 312 total,   2 running, 310 sleeping,   0 stopped,   0 zombie
%Cpu(s): 28.9 us, 11.3 sy,  0.0 ni, 42.5 id, 16.9 wa,  0.3 hi,  0.0 si,  0.0 st
KiB Mem:   1010460 total,   940180 used,    70280 free,    68588 buffers
KiB Swap:  1046524 total,    17692 used,  1028832 free.   338844 cached Mem

PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                            
3861 hadoopi+  20   0  592080  19156   9480 S 13.7  1.9   0:00.55 unity-scope-loa                                                    
 2481 hadoopi+  20   0 1187328  67072  24388 S 11.1  6.6   0:34.82 compiz                                                             
 1190 root      20   0  315072  41816  11404 S  5.9  4.1   0:13.11 Xorg                                                               
:

In Linux every file is associated with timestamps: the last access time, the last modification time and the last change time. Whenever we create a new file or access or modify an existing file, the timestamps of that file are automatically updated. touch is a command for Unix/Linux operating systems that is used to create files and to change and modify the timestamps of a file. Let us now look at the usage of the touch command.
```
touch [file-name1] [file-name2] [file-name3]
```

Given several names, touch creates multiple empty files at a time.
There are different options for the touch command. Let us have a look at them
- -a –> It changes only the last access time of a file, setting it to the current date and time. If the file does not exist, touch creates a new empty file with that name. It is used as follows:
```
touch -a [file-name]
```
- -c –> With the -c option, touch does not create the file if it does not already exist. It is used as follows:
```
touch -c [file-name]
```
- -d –> It sets the access and modification times from a date string given as its argument instead of using the current time.
- -m –> It changes only the last modification time. It is used as follows:
```
touch -m [file-name]
```
- -r –> It uses the access and modification times of a reference file instead of the current time. In the example below, [file-name2] receives the timestamps of [file-name1]:
```
touch -r [file-name1] [file-name2]
```
- -t –> It sets the timestamps to a specified time instead of the current time, creating the file if it does not exist. Its pattern is shown in the example below.
```
touch -t YYMMDDHHMM.SS [file-name]
```
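
The effect of these options can be checked by comparing file times. This is a minimal sketch with made-up file names; it relies on the -nt (newer than) test, which bash and most other shells provide.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Create an empty file whose timestamps are set to 1 January 2014, 12:00.
touch -t 201401011200 old.txt

# A plain touch creates new.txt stamped with the current time.
touch new.txt

# -r copies the timestamps of old.txt onto copycat.txt.
touch -r old.txt copycat.txt

# -nt compares modification times.
[ new.txt -nt old.txt ] && echo "new.txt is newer than old.txt"
[ new.txt -nt copycat.txt ] && echo "new.txt is newer than copycat.txt"
```

Both messages are printed, because copycat.txt carries the 2014 timestamp of old.txt even though it was created just now.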

If you are looking for a particular string or pattern in a file but have no idea where to start looking, there is a command you can use. It is called grep. grep is a powerful pattern searcher that comes with every distribution of Linux. If it is not installed on your system, you can easily install it via your package manager. The command for this purpose is shown below.
```
sudo apt-get install grep
```
grep is also very useful for filtering the output of other commands on the fly. For example, to find a specific file in the output of an ls command, we can use grep. The following example keeps only the lines of the listing that contain the string metric.

ls -l | grep 'metric'
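
grep also searches files directly and has a few options worth knowing. The sketch below uses a made-up file fruits.txt in a scratch directory to show three of them.

```shell
demo_dir=$(mktemp -d)

# fruits.txt is a hypothetical example file with four lines.
cat > "$demo_dir/fruits.txt" <<'EOF'
apple
Banana
cherry
banana split
EOF

# -i ignores case, so both "Banana" and "banana split" match.
grep -i 'banana' "$demo_dir/fruits.txt"

# -n prefixes each match with its line number.
grep -n 'cherry' "$demo_dir/fruits.txt"

# -c prints only the number of matching lines.
count=$(grep -ic 'banana' "$demo_dir/fruits.txt")
echo "$count"
```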

The history command lists the commands previously used in the terminal, each with a number. Use it when you want to recall an earlier command.
```
training@ubuntu:~$ history
1  ls
2  cd Downloads/
3  ls
4  cd ..
5  cd
6  cd Downloads/
7  ls
8  cd
9  cd ../..
:
```

cal shows us the calendar of the current month

training@ubuntu:~$ cal
   November 2014      
Su Mo Tu We Th Fr Sa  
                   1  
 2  3  4  5  6  7  8  
 9 10 11 12 13 14 15  
16 17 18 19 20 21 22  
23 24 25 26 27 28 29  
30

apt [advanced package tool] allows us to search, install, and update packages on a GNU/Linux system from the command line, automatically resolving their dependencies. The grep section above uses it in the form sudo apt-get install grep.

The locate command searches for files and directories by name and prints each matching path on a new line. It can be used as follows:
```
locate [file/directory]
```
The output is the full path of every file and directory whose name matches.

The free command shows how much RAM and swap space is used and how much is available. This helps the user to monitor memory usage. The usage of the free command is shown below:

training@ubuntu:~$ free
         total       used       free     shared    buffers     cached
Mem:       1010460     902112     108348       7548     126016     212432
-/+ buffers/cache:     563664     446796
Swap:      1046524      14064    1032460
training@ubuntu:~$

The passwd command enables the user to change the password. For example, a user who feels the present password is too short can change it using this command. Run passwd with no arguments; it prompts for the current password and then for the new password twice.

We can connect multiple commands together with pipes, represented by the symbol |. With pipes, the standard output of one command is fed into the standard input of another.
```
[me@linuxbox me]$ ls -l | less
```
In this example, the output of the ls command is fed into less. By using “| less”, you can make any command have scrolling output
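
Pipes can chain more than two commands. A small sketch, using made-up file names in a scratch directory, that counts the .txt files:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Four hypothetical files, three of which end in .txt.
touch report.txt notes.txt data.csv summary.txt

# ls lists the names, grep keeps those ending in .txt, wc -l counts the lines.
txt_count=$(ls | grep '\.txt$' | wc -l)
echo "$txt_count"
```

Each command only sees the output of the one before it, so the count is 3.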

The > symbol redirects the output of a command into a file instead of the screen. The following example gets the head of file1.txt and creates a new file2.txt to save the output.

  head file1.txt > file2.txt

If file2.txt already exists, its previous contents are replaced.

More detailed examples are shown in the cat command
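
Redirection with > replaces the target file, while the related operator >> appends to it. A minimal sketch, assuming a made-up file log.txt in a scratch directory:

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"

echo "first" > log.txt     # > creates log.txt and writes one line
echo "second" > log.txt    # > again: the earlier content is replaced
echo "third" >> log.txt    # >> adds to the end instead of replacing

cat log.txt
```

The file ends up with two lines, second and third; the line first was overwritten by the second redirection.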

The exit command is used to leave the terminal
```
exit
```

Stopping data science virtual machine

Never close the VMWare machine with the x button as we generally do for closing other windows. Instead, always shut down the virtual machine properly so that all your work is saved.

You can find the complete documentation of everything covered here in the Data Science Tools folder on the desktop of the lab machines. Open the LinuxCommands.html file for the complete documentation.