Using Linux from the command line
Files and directories in Linux
The file systems in Linux machines are based on a hierarchical directory tree. There is one root directory which you can refer to with the forward slash symbol (/). All the files and directories are located in the subdirectories of this directory, so that each file has a unique combination of a name and a directory path. Also, the commands that the user gives are executed in the directory that the user currently is. It is called the current working directory.
Normally, you do not need to know the explicit directory paths when you work in the CSC environment. It is enough to know the locations of the files in the user's own disk areas. The user- and the project-specific disk areas are presented in chapter 3. However, you should remember that many disk areas in the CSC environment can be accessed from several different servers (e.g., from the user's home directory) while some areas are server-specific. In the case of shared disk areas, the path to a certain file may be different on different servers.
Structure of Linux commands
Once the terminal connection to CSC, e.g., to [Puhti-shell], has been opened, the remote server is operated by using Linux commands. The standard structure of a command is:
command -options argument1 argument2 ...
The command is executed by pressing the return key (Enter). The names and functions of the options and the arguments depend on the Linux command. In many cases, you can run the command without any options and arguments. Options are used to modify the actions that the command performs. Arguments are used to define the files, directories and values that are used as input parameters and to define where the output is written.
For example, the command ls
can be used as such or with several options
and arguments. Running a plain command ls
lists the content of a
directory in alphabetical order. You can modify the output of the
command, for example, by using the option -t
. With this option, the
directory content list is ordered according to the age of the file (timestamp).
If no argument is given, ls
prints the content of the current
working directory. By giving an argument to the ls
command, the user can
define a directory whose content should be listed. For example, the command
ls -t /scratch/project_2979797
will list the content of directory /scratch/project_2979797. In case there aer errors in the command, the arguments or the options, the command will not be executed when the return key is pressed. Instead, an error message is printed to the screen. Thus, having errors in the command often does not cause any major problems. Issues may arise when using undefined shell variables. The output of commands vary from one command to the other but in many cases, no output means that command was successfully executed.
Most of the Linux commands have their own manual page that can be
studied using the man
command. For example the manual page of ls
command is shown using the command
`man ls`
Manual pages can be very detailed and technical. However, often you do not need to read and understand all the details given there, but instead, you can just see what command-line options are available for the particular command, which are of interest to you and then start testing/using them in practice.
There are thousands of Linux commands, though, you do not need to know
the majority of them in order to get started.
Below, we introduce the most frequently used ones. You can also use
the command apropos
to find a suitable command.
Apropos lists those Linux commands whose short description
lines match the text that is given as a command argument. For example,
if you want to look for commands that are processing pdf files, you
could type
`apropos pdf`
Note that the listing that apropos prints includes only Linux
commands but not program names. Thus, the sample command
above would produce a list that contains many pdf conversion commands but
no pdf viewers such as acroread
or evince
.
Basic commands for using directories
The table below lists the commands that are most frequently used for moving in the directory hierarchy and managing it. Below are some examples of directory related commands.
Basic directory commands
Name | Argument | Description |
---|---|---|
cd | directory | Change current working directory |
ls | directory | List the content of a directory |
pwd | Print the directory path of current working directory | |
mkdir | directory | Create a new directory |
rmdir | directory | Remove a directory |
When you log in to a server at CSC, you will be taken to your home
directory. You can check your location, i.e., the path of the current
working directory with the command pwd
(abbreviation from Print
current Working Directory). However, you do not have to remember the
location of your home directory (see the cd
command).
The content of the directory can be listed with the ls
command.
The plain ls
command just lists the names of the files and directories in
your current directory. You can get more information about the files and
directories with command ls -la
. The -l
option produces a long
directory listing that, in addition to the name, contains also information
about the access permissions, the size and the modification time of files and
directories. The option -a
defines that all files, including also
hidden files that start with dot (.) character, are listed. Below
is a sample output for ls -la
command:
kkayttaj@c305:~>ls -la
total 26914
drwx------+ 3 kkayttaj csc 10 Dec 22 09:12 .
drwxr-xr-x 20 root root 0 Dec 22 09:12 ..
drwx------+ 42 kkayttaj csc 472 Dec 22 09:07 ..
-rwxr-x---+ 1 kkayttaj csc 1648 Dec 22 09:01 .cshrc
-rw-------+ 1 kkayttaj csc 93 Dec 22 09:01 .my.cnf
-rw-------+ 1 kkayttaj csc 48 Dec 22 09:05 Test.txt
-rw-------+ 1 kkayttaj csc 878849 Jan 19 2009 input.table
drwxr-xr-x+ 2 kkayttaj csc 2 Dec 22 09:11 project1
-rw-------+ 1 kkayttaj csc 26432051 Dec 22 09:08 results.out
-rw-------+ 1 kkayttaj csc 25 Mar 27 2009 sample.data
-rw-------+ 1 kkayttaj csc 49 Mar 27 2009 test.txt
The first output row: total 26914 tells that the total size of the
files in the directory is 26914 KB. In the list, the first character
tells if the item is a file (-) or a directory(d). The next nine
characters display the access permissions of the files (see the
chmod
command for more details). The next columns show the number of
links pointing to the item, owner, user group, size in bytes,
modification time and the name of the file or directory.
By default, files are presented in an alphabetical order. You can order
the results by modification time with the option -t
or by size with the
option -S
(note: uppercase S, not lowercase s). Two other
frequently used options are -h
(Human readable) which prints out the
sizes of large files in megabytes or gigabytes, and -r
which means
reverse sorting order. For example, the command:
ls -ltrh
is very handy when you want to check what files have recently been
modified or created. The ls
and pwd
commands do not modify files in
any way so you can use them always when you want to know where you are
and what files your current directory contains.
The command cd
directory_name moves you from the current
directory to a directory you specified. For example, the user kkayttaj
could go to his local temporary directory using the command
cd /local_scratch/kkayttaj
or
cd $TMPDIR
In the latter command, the automatically defined environment variable $TMPDIR that contains the explicit directory path is used to define the target directory.
New directories can be created with the command mkdir
directory_name. For example, the command:
mkdir project1
Creates a new directory called project1. You can use the ls
command to
check that the directory was created. Now you can go to this directory
with the command
cd project1
You can come back from the project1 directory using the command
cd ..
Note the space between cd
and the dots in the command. One dot (.
) and
two dots (..
) have a special meaning in Linux. One dot (.
)
means the current directory, and two dots (..
) mean the directory that
is one level up in the directory tree, i.e., the directory where the current
directory resides. Executing the cd
command without any arguments will
always move you back to your home directory, regardless of where you are in
the directory tree. An empty directory can be removed with the command
rmdir
directory_name. For example:
`rmdir project1`
Basic commands for files
On a fundamental level, a file in a Linux system is just a string of bytes, were a byte consists of eight bits. So-called text files contain only bytes that can be interpreted as text characters using the ASCII encoding rules. Thus, these files can be considered as consisting of lines of text. In the so-called binary files, also non-ASCII bytes are used, and they cannot and are not intended to be convertible to readable text. Typical examples of binary files are compiled programs, images or compressed files. Normally, users work mostly with text files, and also in the examples of this guide, we normally assume that the files contain some kind of text data: letters or numbers.
Each file has a name. The name can, in principle, be any combination of
characters. However, several characters have a special meaning, e.g.,
?
, *
and #
, see below and
thus, using these characters in filenames may cause problems. We
recommend that you only use ordinary letters (lower- or uppercase),
numbers, dots (.
), dashes (-
) or underscores (_
) in
file- and directory names. Also, using space characters in filenames
may cause problems. We recommend that space characters are
replaced by underscores, for example, new_file.txt instead of new file.txt . Note that Linux
is case-sensitive: lower- and uppercase characters are not considered
equal unlike in Windows computers, and, e.g., the names New_File.txt and new_file.txt refer to
different files.
In Linux, the usage of filename extensions such as .doc or .txt is not obligatory. Most Linux tools do not require an extension to be specified. However, in the long run, using systematic naming conventions, including illustrative name extensions, makes file management easier.
File-processing commands:
Name |
Arguments |
Description |
---|---|---|
cat | file_names |
Print the content of the specified file of files to the standard output (your screen) |
chmod | file_names |
Change the access permissions of a file |
cp | file_name1 file_name2 |
Copy the file content to a new file or new location |
grep | search_string file_name |
Pick from the file the rows that contain a specific string |
head | file_name |
Print the first rows of the file to the standard output (your screen) |
less | file_name |
Show the content of a file one screenfull at a time |
more | file_name |
Show the content of a file one screenfull at a time |
ln | file_name1 link_name |
Create a link to a file |
mv | file_name1 file_name2 |
Rename a file or move it to another location |
rm | file_names |
Remove files |
tail | file_name |
Show the last rows of a file |
You can explore the content of text files using the commands cat
,
more
and less
. These commands are safe to use as
they do not modify the files in any way. For example, you could read the
.bashrc file in your home directory with the commands
cat .bashrc
more .bashrc
less .bashrc
The cat
command (abbreviation from concatenate) prints the content of
the specified file or files to the standard output which by default
means your screen. The pager programs less
and more
are often more
useful tools for studying text files as they allow the user to see the
content of the file one screenfull at a time. In both programs, more
and less
, you
can move forward one line at a time by pressing the Return key or one
screenfull at a time by pressing Space. You can exit from the file preview
by pressing q
.
The less
pager program is more advanced than more
. For example, less
can browse
text also backwards, either one row at a time by pressing k
or
one screenfull at a time by pressing b
. You can also search a text
string from the document by using slash (/
) character. For example,
to locate the string ABC in a file opened with less
, type /ABC
and the press Return. Actually, the man
command uses less
as its pager.
The commands head
and tail
can be used to see just the
first and the last rows of a file, respectively. By default, these commands
print 10 lines, but you can change this by giving the number of rows
to be printed as an argument to a head
or tail
command. For example,
to check the last 30 rows of a file called run1.log, give the command
tail -30 run1.log
Copying files to a new file or to another directory is done using the command
cp
(copy). Below you can find two examples of copy commands:
cp output.dat output_copy.dat
cp output1.dat output2.dat results/
The first command makes a copy of file output.dat to a new file called
output_copy.dat. In the second example, the files output1.dat
and output2.dat are copied to an existing directory called results.
The command mv
(move) is used to rename or move a file to
another location. For example:
mv output.dat output_copy.dat
mv output1.dat output2.dat results/
would create the same new files as the cp
example commands. However, in the
case of mv
, the original files output.dat, output1.dat and
output2.dat would be removed from the current working directory.
Files are removed using the command rm filename
. By default, the file is deleted immediately.
If you would like to first confirm file deletion, you can add a parameter to the command: rm -i
.
To make this behaviour permanent, you can set an alias in your .bashrc:
alias rm="rm -i"
Note that after editting your .bashrc file, you need to use the source ~/.bashrc
command or open a new shell.
After that, the rm
command will always ask the user to confirm that he/she really wants to remove the
file:
kkayttaj@c305:~>rm output_copy.dat
rm: remove output_copy.dat (yes/no)?
You can answer y (yes) or n (no). Note that this confirmation step
is not necessary in use in your local Linux environment or currently in Puhti.
You can skip
the confirmation query using the option -f
, standing for force. However, you should use this
option with caution as the rm
command will remove the file immediately
and permanently.
Special characters
Some characters have special functions in Linux. In the following paragraphs, we present the characters that are used for redirecting standard input and output or used as the so-called wildcard characters.
The $
sign, that serves as an indicator of a variable name, the #
symbol that is used to place comments, and the different kind of quotation
marks are discussed later on in the linux scripting chapter.
Commonly used special characters
Character | Function |
---|---|
$ | Indicates a shell or environmental variable |
| | Pipes standard output to the standard input of the next command |
# | Starts a comment |
& | Executes a process in the background |
? | Matches one character |
* | Matches any string (including an empty string) |
> | Output redirection operator |
< | Input redirection operator |
>> | Output redirection operator (to append to a file) |
\ | ignore the possible special function of the following character |
Wildcard characters
In Linux, a question mark (?
) and an sterisk (*
) are used as the so-called
wildcard characters. They can be used to define arguments that
match many files or directories. When given as command arguments, the
?
sign is interpreted as any single character and *
sign as any
string of characters. For example,
ls test?.input
would produce a list of files that has a name: testcharacter.input. Thus, the files with names testA.input and test4.input would be listed, but filenames like test10.input or testOld.input would be ignored. Instead, the command
ls test*.input
would list all of the files mentioned above as *
matches any string.
Here, the only limitations would be that the command must start with the
string test and end with string .input.
Redirecting standard input and output
The characters less than (<
), greater than (>
), >>
and pipe (|
) are used to control the standard input and output. The
less than symbol (<
) instructs the command to read data
from a file defined after that < character instead of reading from the keyboard.
The greater than character ('>') would redirect the output of a command to a new file instead of the display (standard output). For example, the command
ls test*.input > input_files
would produce a new file called input_files that would contain the
names of files that start with the string test and end with .input.
Using two greater than signs with no space between them (>>
)
would append the results of a command to the end of an existing file or,
if the file does not exist yet, direct the output to a new file with the specified filename.
The pipe character (|
) redirects the output of the command
as input for the next command. In this way, you can
combine several Linux commands into a command chain. For example, if your
file listing does not fit to one screen, you could redirect it to less
so that you can browse it one screenfull at a time.
This kind of redirecting could be done like this:
ls -l | less
As another example, we could use the grep
command as a postprocessor for
the output of the ls
command and pick, e.g., those files that have been created on
August 14th:
ls -l | grep "Aug 14"
Further practice
We recommend that you look up examples of the practical usage of Linux commands online. There are lots of resources showing efficient use of the toolset that comes with Linux, so that you can save your time and produce neat results faster.
Defining own aliases is one of the ways of speeding up your work, and they are
recommended. However, sometimes there might be name clashes and some commands
might end up behaving unexpectedly. We have the tool called csc-env
which
shows how your environment differs from the default one. For more information,
check this documentation entry.