This document is prepared for data science lab users who are taking the class Introduction to Data Science at the University of Nebraska at Omaha. The class website can be visited here. For convenience, a Linux virtual machine has been created with all the data science tools configured for training purposes. You will need two things to be able to use this resource.
The virtual machine is configured with the following tools
This document provides detailed instructions on how to work on the data science virtual machine. There will be separate documentation on how to work with the data science tools. If you have any questions, please contact the course instructor Dr. Mahbubul Majumder.
Log in to the lab machine with the password training123
Then open the VMware Player, where you can use Linux commands in the terminal window of the virtual machine. To do that, click on its icon on your task bar.
A window then lists the virtual machines available to use. If you don’t find the virtual machine you are looking for in the list, you may need to open it with the option Open a Virtual Machine
To select a virtual machine, double click on its name, which opens the virtual machine. Open the datascienceVM
Now you need to enter the password of the virtual machine to open and use it. The password is training123
Now you are in the datascienceVM virtual machine, where you will find all the commands on the left side task bar.
Now open the terminal window by clicking on it. The terminal window looks like the following image.
The ls command displays the contents of your current working directory. Whenever you want to know which files and directories are present in the directory you are working in, use this command. It is one of the most important and commonly used commands in the Linux operating system.
training@ubuntu:~$ ls
Desktop Downloads mathlab Pictures R test456
Documents examples.desktop Music Public Templates Videos
There are several options for the ls command. If you provide ls -l, the output displays detailed information about the files. The following example gives a clearer picture of how the command is used.
training@ubuntu:~$ ls -l
total 52
drwxr-xr-x 2 training training 4096 Oct 16 17:09 Desktop
drwxr-xr-x 2 training training 4096 Oct 16 17:09 Documents
drwxr-xr-x 3 training training 4096 Nov 4 12:18 Downloads
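A few other commonly used ls options can be tried safely in a scratch directory. The sketch below is only an illustration: the names notes.txt, .hidden and data are made up for this example and are not part of the lab virtual machine, and wc (used here to count lines) is not covered elsewhere in this document.

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch notes.txt .hidden    # one visible file and one hidden file (hypothetical names)
mkdir data                 # one visible directory

ls        # lists data and notes.txt
ls -a     # also lists hidden entries, plus . and ..
ls -lh    # long listing with human-readable sizes

# Count entries without and with hidden files (-A omits . and ..).
visible=$(( $(ls | wc -l) ))
with_hidden=$(( $(ls -A | wc -l) ))
echo "visible=$visible with_hidden=$with_hidden"

cd / && rm -rf "$workdir"   # clean up the scratch directory
```

Files whose names start with a dot are hidden from a plain ls, which is why the two counts differ.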
The pwd [print working directory] command, as the name suggests, displays the pathname of the directory we are currently working in. This shows the exact place where we are performing operations.
training@ubuntu:~$ pwd
/home/training
The cd command enables us to move from one directory to another. It is one of the most frequently used commands.
training@ubuntu:~$ cd Downloads/
training@ubuntu:~/Downloads$ ls
hadoop-1.2.1 hadoop-1.2.1.tar.gz jdk-8u25-linux-x64.tar.gz rstudio-0.98.1087-amd64.deb
training@ubuntu:~/Downloads$ cd hadoop-1.2.1/
training@ubuntu:~/Downloads/hadoop-1.2.1$ ls
bin conf hadoop-client-1.2.1.jar hadoop-test-1.2.1.jar lib NOTICE.txt src
build.xml contrib hadoop-core-1.2.1.jar hadoop-tools-1.2.1.jar libexec README.txt webapps
c++ docs hadoop-examples-1.2.1.jar ivy LICENSE.txt sbin
CHANGES.txt hadoop-ant-1.2.1.jar hadoop-minicluster-1.2.1.jar ivy.xml logs share
training@ubuntu:~/Downloads/hadoop-1.2.1$
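Besides moving into a named directory, cd has a few shortcuts that are worth knowing. The following is a minimal sketch in a scratch directory; the directory names a and b are assumptions chosen for the example.

```shell
# Build a small directory tree to move around in (hypothetical names a and b).
workdir=$(mktemp -d)
mkdir -p "$workdir/a/b"

cd "$workdir/a/b" || exit 1
cd ..                          # go up one level, into a
up_one=$(basename "$(pwd)")
cd - > /dev/null               # return to the previous directory, b
back=$(basename "$(pwd)")
echo "after 'cd ..': $up_one, after 'cd -': $back"

cd                             # with no argument, cd goes to your home directory
cd / && rm -rf "$workdir"      # clean up the scratch directory
```

Running pwd after each cd is an easy way to confirm where you are.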
The vi command is used to create or edit a file. Whenever you wish to create a file, follow this syntax. Below is an example showing the syntax for creating a text file or a C file.
vi [filename].txt which creates a text file
vi [filename].c which creates a C file.
For a C file, your extension should be .c
The ps command displays a snapshot of the currently running processes. It also provides further information about them, such as the process ID and the terminal they run in.
training@ubuntu:~$ ps
PID TTY TIME CMD
2903 pts/13 00:00:00 bash
4483 pts/13 00:00:00 ps
The man command displays the help information for any command we use. Let's see an example with the command man mkdir, which gives us information about that command.
MKDIR(1) User Commands MKDIR(1)
NAME
mkdir - make directories
SYNOPSIS
mkdir [OPTION]... DIRECTORY...
DESCRIPTION
Create the DIRECTORY(ies), if they do not already exist.
Mandatory arguments to long options are mandatory for short options too.
-m, --mode=MODE
set file mode (as in chmod), not a=rwx - umask
-p, --parents
no error if existing, make parent directories as needed
-v, --verbose
:
The mkdir [make directory] command is used to create a new directory. We create a new directory with the command mkdir [directory_name] and use it to store related files.
For example, we create a directory called java with the mkdir command and check whether it was created by using the ls command.
training@ubuntu:~$ mkdir java
training@ubuntu:~$ ls
Desktop Downloads java Music Public Templates Videos
Documents examples.desktop mathlab Pictures R test456
training@ubuntu:~$ cd java
training@ubuntu:~/java$ ls
training@ubuntu:~/java$
The above example confirms that the java directory is created when we use the mkdir command.
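The -p option listed in the mkdir manual page is handy when several nested directories are needed at once. A small sketch, assuming a hypothetical path projects/lab1/data inside a scratch directory:

```shell
# Work in a scratch directory so nothing real is changed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p projects/lab1/data   # creates projects, lab1 and data in one step
mkdir -p projects/lab1/data   # with -p, repeating the command is not an error

if [ -d projects/lab1/data ]; then nested_created=yes; else nested_created=no; fi
echo "nested directory created: $nested_created"

cd / && rm -rf "$workdir"     # clean up the scratch directory
```

Without -p, mkdir would stop with an error because the parent directory projects does not exist yet.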
The rmdir [remove directory] command removes an empty directory. To delete a particular directory, use rmdir [directory_name], where the directory name is the name of the directory to be deleted.
Below is an example which shows the working of the rmdir command.
training@ubuntu:~$ ls
Desktop Documents Downloads examples.desktop java mathlab Music Pictures Public R Templates test456 Videos
training@ubuntu:~$ rmdir java
training@ubuntu:~$ ls
Desktop Documents Downloads examples.desktop mathlab Music Pictures Public R Templates test456 Videos
training@ubuntu:~$
Use the rmdir command to delete unused directories; it only removes directories that are empty. For example, if we run rmdir java from inside the java directory, the command does not work, because the name is looked up relative to the current directory.
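Because rmdir refuses to remove a directory that still contains files, it helps to see both cases side by side. The sketch below uses made-up names (empty_dir, full_dir) and also uses rm -r, an option of the rm command that this document does not otherwise describe, to remove a directory together with its contents.

```shell
# Work in a scratch directory so nothing real is deleted.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir empty_dir full_dir
touch full_dir/file.txt

rmdir empty_dir                       # succeeds: the directory is empty

# rmdir fails on a directory that still has contents.
if rmdir full_dir 2>/dev/null; then full_removed=yes; else full_removed=no; fi
echo "rmdir removed full_dir: $full_removed"

rm -r full_dir                        # removes the directory and everything in it
if [ -e full_dir ]; then remaining=present; else remaining=gone; fi
echo "full_dir is now: $remaining"

cd / && rm -rf "$workdir"             # clean up the scratch directory
```

Use rm -r with care, since it deletes everything under the named directory without moving it to a trash folder.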
The mv command is used to move a file or directory from one place to another. The syntax is the command followed by the source, i.e. the file to be moved, and then the destination, i.e. where the file should be moved.
mv [source] [destination]
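The same syntax covers both renaming and moving. Here is a minimal sketch with hypothetical file names (report.txt, final.txt, archive), run in a scratch directory:

```shell
# Work in a scratch directory so nothing real is moved.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "draft" > report.txt
mkdir archive

mv report.txt final.txt     # destination is a new name: the file is renamed
mv final.txt archive/       # destination is a directory: the file is moved into it

moved=$(ls archive)
if [ -e report.txt ] || [ -e final.txt ]; then left_behind=yes; else left_behind=no; fi
echo "archive contains: $moved, copy left behind: $left_behind"

cd / && rm -rf "$workdir"   # clean up the scratch directory
```

After mv, the file exists only at the destination.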
The cp command copies files or directories from one place to another. It copies the content from the source to the destination and keeps it in both places, unlike the mv command, which removes the content from the source. Below is an example of how to use the cp command:
cp [source] [destination]
example:
cp NOTICE.txt NOTICE2.txt
This example copies the contents of the NOTICE.txt file into NOTICE2.txt. Here NOTICE.txt is the source and NOTICE2.txt is the destination. All the data in NOTICE.txt can be found in NOTICE2.txt when we open it with the command:
vi NOTICE2.txt
There are several options for the cp command. Some of them are:
The -i [interactive] option prompts before overwriting the destination file. Whenever a destination file is about to be overwritten, it prompts us with a question as shown in the example below. Answer y if you want the destination file to be overwritten or n if you don’t.
training@ubuntu:~/Downloads/hadoop-1.2.1$ cp -i README.txt NOTICE.txt
cp: overwrite 'NOTICE.txt'? n
training@ubuntu:~/Downloads/hadoop-1.2.1$ vi NOTICE2.txt
The -b [backup] option makes a backup of each existing destination file. Unlike the --backup option, it does not accept an argument.
training@ubuntu:~/Downloads/hadoop-1.2.1$ cp -b NOTICE.txt NOTICE2.txt
training@ubuntu:~/Downloads/hadoop-1.2.1$ vi NOTICE2.txt
The -r [recursive] option is used to copy directories recursively.
training@ubuntu:~/Downloads/hadoop-1.2.1$ mkdir 123
training@ubuntu:~/Downloads/hadoop-1.2.1$ cp -r hadoop-core-1.2.1.jar 123
training@ubuntu:~/Downloads/hadoop-1.2.1$ cd 123
training@ubuntu:~/Downloads/hadoop-1.2.1/123$ ls
hadoop-core-1.2.1.jar
The -n [no clobber] option prevents an existing file from being overwritten.
training@ubuntu:~/Downloads/hadoop-1.2.1$ cp -n NOTICE.txt NOTICE2.txt
training@ubuntu:~/Downloads/hadoop-1.2.1$ vi NOTICE2.txt
The -f [force] option applies when an existing destination file can’t be opened: cp removes it and tries again.
The -s [symbolic-link] option makes a symbolic link to the file instead of copying it. In the example below, 123.txt is a link to NOTICE2.txt, so opening it shows the same contents as NOTICE2.txt.
training@ubuntu:~/Downloads/hadoop-1.2.1$ cp -s NOTICE2.txt 123.txt
training@ubuntu:~/Downloads/hadoop-1.2.1$ vi 123.txt
training@ubuntu:~/Downloads/hadoop-1.2.1$ ls
123 c++ docs hadoop-examples-1.2.1.jar ivy LICENSE.txt NOTICE.txt src
123.txt CHANGES.txt hadoop-ant-1.2.1.jar hadoop-minicluster-1.2.1.jar ivy.xml logs README.txt webapps
bin conf hadoop-client-1.2.1.jar hadoop-test-1.2.1.jar lib NOTICE2.txt sbin
build.xml contrib hadoop-core-1.2.1.jar hadoop-tools-1.2.1.jar libexec NOTICE2.txt~ share
The -t [target-directory] option copies all the source arguments into the given directory.
training@ubuntu:~/Downloads/hadoop-1.2.1$ cp -t 123 NOTICE2.txt
training@ubuntu:~/Downloads/hadoop-1.2.1$ vi NOTICE2.txt
training@ubuntu:~/Downloads/hadoop-1.2.1$ cd 123
training@ubuntu:~/Downloads/hadoop-1.2.1/123$ ls
hadoop-core-1.2.1.jar NOTICE2.txt
The -v [verbose] option reports what is actually being done. For example, when we copy hadoop-core-1.2.1.jar to another directory 123, the operation is reported as follows:
training@ubuntu:~/Downloads/hadoop-1.2.1$ cp -v hadoop-core-1.2.1.jar 123
'hadoop-core-1.2.1.jar' -> '123/hadoop-core-1.2.1.jar'
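The -r option matters most when the source is a directory rather than a single file. The following sketch copies a small directory tree; the names src, sub, a.txt and backup are assumptions made for the example.

```shell
# Work in a scratch directory so nothing real is copied over.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p src/sub
echo "hello" > src/sub/a.txt

cp -r src backup             # copies src and everything under it to backup

copied=$(cat backup/sub/a.txt)
if [ -f src/sub/a.txt ]; then source_kept=yes; else source_kept=no; fi
echo "copied content: $copied, source still present: $source_kept"

cd / && rm -rf "$workdir"    # clean up the scratch directory
```

Without -r, cp skips a directory given as the source and reports that it was omitted.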
The sudo command allows a permitted user to execute a command as the superuser or another user. By default, sudo requires users to authenticate themselves with a password. Below is an example of how we use the sudo command.
training@ubuntu:~$ sudo mkdir directory
[sudo] password for screenshot:
training@ubuntu:~$ ls
Desktop directory Documents Downloads examples.desktop Music Pictures Public R Templates Videos
In the example above, we create a directory named directory with the sudo command. The immediate step is the prompt for the password. We then use the ls command, which lists the files and directories as discussed above.
kill is a command that is used in several popular operating systems to send signals to running processes, most often to request the termination of the process. Below are some examples of different kill commands:
The killall command kills processes by name:
killall -SIGKILL [process-name]
The pkill command is similar to killall, except that pkill allows the use of extended regular expression patterns and other matching criteria. Below is an example:
pkill -9 unity
Here the command kills all the processes whose names match the pattern unity.
The pkill command can also be used to kill all the processes owned by a user. For example
pkill -9 -u USERNAME
The kill -l command lists the kill signals:
training@ubuntu:~/Downloads/hadoop-1.2.1$ kill -l
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
16) SIGSTKFLT 17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO 30) SIGPWR
31) SIGSYS 34) SIGRTMIN 35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3
38) SIGRTMIN+4 39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7
58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
63) SIGRTMAX-1 64) SIGRTMAX
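To see a signal in action without disturbing anything important, one option is to start a harmless background process and terminate it. This is only a sketch: sleep is used as a stand-in for a real program, and the shell features $! and wait are not covered elsewhere in this document.

```shell
sleep 300 &                 # start a harmless background process
pid=$!                      # $! holds the process ID of the last background command

kill -15 "$pid"             # send SIGTERM (signal 15 in the list above)
wait "$pid" 2>/dev/null     # wait for the process to finish
status=$?                   # a process ended by a signal exits with 128 + signal number
echo "exit status: $status"

# kill -0 sends no signal; it only checks whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then still_running=yes; else still_running=no; fi
echo "still running: $still_running"
```

SIGTERM asks a process to stop and lets it clean up, whereas SIGKILL (signal 9) stops it immediately, so it is usually better to try SIGTERM first.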
The head command displays the first ten lines of any given file. The basic syntax of the head command is:
head [file-name]
Below is an example of the head command:
LINUX/UNIX@ubuntu:~/Downloads/hadoop-1.2.1$ head LICENSE.txt
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
If you want to use the head command on multiple files, there is an option for that too. Just give the head command followed by the file names whose first lines you want to read. Below is an example of using it.
hadoopinstall@ubuntu:~/Downloads/hadoop-1.2.1$ head LICENSE.txt ivy.xml
==> LICENSE.txt <==
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
==> ivy.xml <==
<?xml version="1.0" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
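This displays the first lines of both the LICENSE.txt and ivy.xml files, as shown above. To display a different number of lines, give the number as an option, as in the following example, which asks for the first 15 lines of build.xml:
hadoopinstall@ubuntu:~/Downloads/hadoop-1.2.1$ head -15 build.xml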
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
The tail command allows us to view the last 10 lines of any file. It works the same way as the head command; the only difference is that head displays the first 10 lines whereas tail displays the last 10 lines. Everything else remains the same, including the options. The usage of the tail command is shown below:
tail [filenames]
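A file with numbered lines makes the behaviour of head and tail easy to check. The sketch below assumes a made-up file numbers.txt created with seq, a command that prints a sequence of numbers and is not covered elsewhere in this document.

```shell
# Create a 20-line file in a scratch directory (hypothetical file name).
workdir=$(mktemp -d)
seq 1 20 > "$workdir/numbers.txt"     # the numbers 1 to 20, one per line

head -n 3 "$workdir/numbers.txt"      # prints 1, 2 and 3
tail "$workdir/numbers.txt"           # prints 11 through 20
tail -n 3 "$workdir/numbers.txt"      # prints 18, 19 and 20

default_count=$(( $(tail "$workdir/numbers.txt" | wc -l) ))
last_line=$(tail -n 1 "$workdir/numbers.txt")
echo "tail prints $default_count lines by default; the last line is $last_line"

rm -rf "$workdir"                     # clean up the scratch directory
```

The -n option takes the number of lines to show and works the same way for both commands.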
tar stands for tape archive, i.e. it allows us to combine a large group of files into an archive. This enables us to move the archive as a single unit, which makes it easier to use because everything is in one place.
The who command displays who is logged on to the system. The output is displayed as follows when the command who is given:
training@ubuntu:~$ who
screenshot :0 2014-11-11 16:15 (:0)
screenshot pts/13 2014-11-11 16:15 (:0)
training@ubuntu:~$
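For the tar command introduced above, a typical round trip is to create an archive, list its contents, and extract it somewhere else. The following is a minimal sketch; the directory project and the archive name project.tar.gz are assumptions chosen for illustration.

```shell
# Work in a scratch directory so nothing real is archived or overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir project
echo "alpha" > project/a.txt
echo "beta" > project/b.txt

tar -czf project.tar.gz project      # c = create, z = compress with gzip, f = archive file name
tar -tzf project.tar.gz              # t = list the contents without extracting

mkdir restore
tar -xzf project.tar.gz -C restore   # x = extract, -C = extract into the given directory

restored=$(cat restore/project/a.txt)
echo "restored content: $restored"

cd / && rm -rf "$workdir"            # clean up the scratch directory
```

The hadoop-1.2.1.tar.gz file in the Downloads directory is an archive of this kind.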
The cat command enables us to view the contents of files. When we give the command cat [file-name], it prints the whole contents of that file.
For example, we give the command cat LICENSE.txt and get the following output:
training@ubuntu:~/Downloads/hadoop-1.2.1$ cat LICENSE.txt
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
:
There are several uses of the cat command. Some of them are:
It is used to concatenate any number of files together. Here is an example:
hadoopinstall@ubuntu:~$ echo 'Hello' > 1
hadoopinstall@ubuntu:~$ echo 'Welcome to' > 2
hadoopinstall@ubuntu:~$ echo 'Data Science Lab' > 3
hadoopinstall@ubuntu:~$ cat 1 2 3 > 4
hadoopinstall@ubuntu:~$ cat 4
Hello
Welcome to
Data Science Lab
The cat command is also used to copy a file from source to destination:
hadoopinstall@ubuntu:~$ vi make.txt
//(data in the file) i can study the use of CAT command
hadoopinstall@ubuntu:~$ cat make.txt > make1.txt
hadoopinstall@ubuntu:~$ cat make1.txt
i can study the use of CAT command
The clear command clears the screen. If the screen becomes filled up with commands, simply give the command clear, which clears the screen by moving all the previous commands up out of view. We won’t lose the previous commands; we can still see them by scrolling upwards.
clear
The rm command deletes existing files. It is similar to the earlier command rmdir, which deletes directories, whereas rm is used to delete files. Below is the syntax of the rm command:
rm [file-name]
Here [file-name] is the name of the file that should be deleted.
An example of using the rm command can be seen below:
training@ubuntu:~$ ls
create.c Desktop directory Documents Downloads examples.desktop Music Pictures Public R Templates Videos
training@ubuntu:~$ rm create.c
training@ubuntu:~$ ls
Desktop directory Documents Downloads examples.desktop Music Pictures Public R Templates Videos
The top command displays the resources being used by your system. An example of using the top command can be seen below. When you want to stop, just enter the letter q
hadoopinstall@ubuntu:~$ top
top - 13:45:05 up 2:04, 2 users, load average: 0.23, 0.14, 0.09
Tasks: 312 total, 2 running, 310 sleeping, 0 stopped, 0 zombie
%Cpu(s): 28.9 us, 11.3 sy, 0.0 ni, 42.5 id, 16.9 wa, 0.3 hi, 0.0 si, 0.0 st
KiB Mem: 1010460 total, 940180 used, 70280 free, 68588 buffers
KiB Swap: 1046524 total, 17692 used, 1028832 free. 338844 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3861 hadoopi+ 20 0 592080 19156 9480 S 13.7 1.9 0:00.55 unity-scope-loa
2481 hadoopi+ 20 0 1187328 67072 24388 S 11.1 6.6 0:34.82 compiz
1190 root 20 0 315072 41816 11404 S 5.9 4.1 0:13.11 Xorg
:
In Linux every file is associated with timestamps: each file stores its last access time, last modification time and last change time. So whenever we create a new file, or access or modify an existing one, the timestamps of that file are automatically updated. touch is a command for Unix/Linux operating systems that is used to create files and to change and modify their timestamps. Let us now look at the usage of the touch command.
touch [file-name1] [file-name2] [file-name3]
There are several options for the touch command. Let us have a look at them.
-a –> It is used to change only the last access time of a file. It sets the access time to the current time and date. If the file does not exist, touch creates a new empty file with that name. It can be used as follows:
touch -a [file-name]
-c –> Using the -c option with the touch command avoids creating new files. It can be used as follows:
touch -c [file-name]
-d –> It updates the access and modification times using a given date string instead of the current time.
-m –> It is used to change only the last modification time. It can be used as follows:
touch -m [file-name]
-r –> It uses the access and modification times of a reference file instead of the current time. In the usage below, [file-name2] receives the timestamps of [file-name1]:
touch -r [file-name1] [file-name2]
-t –> It sets the timestamps to a specified time instead of the current time, creating the file if it does not exist. Its pattern is shown in the example below.
touch -t YYMMDDHHMM.SS [file-name]
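The touch options can be tried together in a scratch directory. This is a sketch under stated assumptions: the file names new.txt, missing.txt and copy.txt are made up, and the timestamp is written in the longer CCYYMMDDhhmm.ss form, which also includes the century.

```shell
# Work in a scratch directory so no real timestamps are changed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch new.txt                       # creates an empty file
touch -c missing.txt                # -c: does not create the file if it is absent
touch -t 201411111615.00 new.txt    # sets the time to 2014-11-11 16:15:00
touch -r new.txt copy.txt           # copy.txt gets the timestamps of new.txt

ls -l                               # the listing shows the 2014 date on both files

if [ -e missing.txt ]; then missing_created=yes; else missing_created=no; fi
# Neither file is newer than the other when the modification times match.
if [ new.txt -nt copy.txt ] || [ copy.txt -nt new.txt ]; then same_time=no; else same_time=yes; fi
echo "missing.txt created: $missing_created, timestamps match: $same_time"

cd / && rm -rf "$workdir"           # clean up the scratch directory
```

Setting timestamps this way is mostly useful for testing tools that depend on file age, such as backup scripts.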
If you are looking for a particular string or pattern in a file but have no idea where to start looking, there is a command you can make use of. It is called grep. grep is a powerful pattern searcher that comes with virtually every distribution of Linux. If it is not installed on your system, you can easily install it via your package manager. The command for this purpose is shown below.
sudo apt-get install grep
grep is also very useful for finding text on the fly in the output of other commands. For example, suppose we want to find a specific file in the output of an ls command; we can use grep. Notice what the following example is doing.
ls -l | grep 'metric'
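grep can also search inside a file directly, and a few options change how matches are reported. The sketch below uses a made-up file results.txt with three lines of sample text; the contents are purely illustrative.

```shell
# Create a small sample file in a scratch directory (hypothetical contents).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'accuracy metric: 0.91\nloss: 0.30\nMetric summary done\n' > results.txt

grep 'metric' results.txt       # case-sensitive: matches only the first line
grep -i 'metric' results.txt    # -i ignores case: matches the first and third lines
grep -n 'loss' results.txt      # -n prefixes each match with its line number

matches=$(grep -ci 'metric' results.txt)   # -c prints the number of matching lines
numbered=$(grep -n 'loss' results.txt)
echo "lines mentioning metric: $matches, numbered match: $numbered"

cd / && rm -rf "$workdir"       # clean up the scratch directory
```

Quoting the pattern keeps the shell from interpreting any special characters in it before grep sees them.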
The history command shows us the command history. When someone wants to know which commands were used previously, this command can be used.
training@ubuntu:~$ history
1 ls
2 cd Downloads/
3 ls
4 cd ..
5 cd
6 cd Downloads/
7 ls
8 cd
9 cd ../..
:
The cal command shows us the calendar.
training@ubuntu:~$ cal
November 2014
Su Mo Tu We Th Fr Sa
1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30
The apt (Advanced Package Tool) command allows us to automatically and intelligently search, install, update and resolve dependencies of packages on a GNU/Linux system from the command line.
The locate command is used to search for files and directories by name, showing each match on a new line. It can be used as follows:
locate [file/directory]
This produces output giving the full path of each matching file and directory.
The free command shows information about the RAM that is used and available. This helps the user monitor memory usage. The usage of the free command is shown below:
training@ubuntu:~$ free
total used free shared buffers cached
Mem: 1010460 902112 108348 7548 126016 212432
-/+ buffers/cache: 563664 446796
Swap: 1046524 14064 1032460
training@ubuntu:~$
The passwd command enables the user to change the password. When the user feels the present password is too short, for example, it can be changed using this command. Run passwd and follow the prompts for the current and the new password.
We can connect multiple commands together with what are called pipes, represented by the symbol |. With pipes, the standard output of one command is fed into the standard input of another.
[me@linuxbox me]$ ls -l | less
In this example, the output of the ls command is fed into less. By using “| less”, you can make any command have scrolling output.
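Pipes are not limited to two commands; several can be chained so that each one filters the output of the one before it. A minimal sketch using only ls, grep and head, with made-up file names:

```shell
# Create a few empty files in a scratch directory (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt b.txt c.txt notes.md

# List the files, keep only names ending in .txt, then show the first two.
ls | grep '\.txt$' | head -n 2

first=$(ls | grep '\.txt$' | head -n 1)
txt_count=$(ls | grep -c '\.txt$')
echo "first .txt file: $first, number of .txt files: $txt_count"

cd / && rm -rf "$workdir"   # clean up the scratch directory
```

Each command in the chain does one small job, which is why short pipelines like this are often easier to read than a single complicated command.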
The > symbol is used to redirect the output of a command into a file. The following example gets the head of file1.txt and creates a new file2.txt to save the output.
head file1.txt > file2.txt
Used with the cat command, > can concatenate files together into one big file, for example one named “file1 and file2.txt”.
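One detail worth trying out is that > replaces the contents of an existing file. The sketch below contrasts it with >>, an appending form that this document does not otherwise cover; the file names log.txt and top.txt are assumptions for the example.

```shell
# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "first line" > log.txt       # > creates the file, or replaces its contents
echo "second line" >> log.txt     # >> appends to the end of the file instead
line_count=$(( $(cat log.txt | wc -l) ))

head -n 1 log.txt > top.txt       # save the output of head in a new file
top=$(cat top.txt)

echo "overwritten" > log.txt      # the two earlier lines are replaced
after=$(cat log.txt)
echo "lines before: $line_count, top.txt: $top, log.txt now: $after"

cd / && rm -rf "$workdir"         # clean up the scratch directory
```

Because > overwrites without asking, it is worth double-checking the file name on the right-hand side before pressing Enter.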
The exit command is used to leave the terminal:
exit
Never close the VMware machine with the x button as we generally do for closing other things; instead, always shut down properly so that all your work is saved.