Instructor Notes: Python Essentials for GIS Learners Day 1 (Unix Shell)¶
This lesson is an introduction to using the Unix shell and the Bash scripting language to create and navigate through files and directories (folders). It is an alternative to “clicking” and provides some significant advantages in documenting processes, automating tasks, and enabling reproducible workflows.
The second part of the lesson runs a script called map.py that draws a simple map using the Turtle graphics package. Finally, learners are introduced to Jupyter Lab and using Jupyter notebooks to write and compile code.
The lesson material on Unix Shell is based upon the Software Carpentries lesson on The Unix Shell, the Introduction to the Command Line for Genomics lesson from Data Carpentries, and an Intro to the Shell workshop from softwarecarpentry.org.
The learning goals for this lesson are:
computers can be used to record and automate iterative tasks
scripts are a tool for reproducibility
code is reusable and will consistently produce the same result from the same data file
code can be used with different data files
JupyterLab and Jupyter notebooks are important tools for writing and sharing code
Participants using Windows OS will need to install Git for Windows following the instructions here before Day 1 begins.
The default shell in some versions of macOS is Bash, and Bash is available in all versions, so no need to install anything. You access Bash from the Terminal. The easiest way to find it is by using the magnifying glass symbol at the top right corner of your page and search “Terminal”. You should open it up and it’s a good idea to right click the icon at the bottom of your screen and choose Options > Keep in Dock.
Data and script for Day 1¶
Files for the Unix Shell lesson are contained in the
pre_workshop_files/ directory that participants should download and save this folder on their Desktop keeping the name
The shell is a program that enables us to send commands to the computer and receive output. It is also referred to as the terminal or command line.
Humans and computers commonly interact in many different ways, such as through a keyboard and mouse, touch screen interfaces, or using speech recognition systems. The most widely used way to interact with personal computers is called a graphical user interface (GUI). With a GUI, we give instructions by clicking a mouse and using menu-driven interactions.
While the visual aid of a GUI makes it intuitive to learn, this way of delivering instructions to a computer scales very poorly. Imagine the following task: copying image files one at a time from one location to another. Using a GUI, you would not only be clicking at your desk for several hours, but you could potentially also commit an error in the process of completing this repetitive task. This is where we take advantage of the Unix shell. The Unix shell is both a command-line interface (CLI) and a scripting language, allowing such repetitive tasks to be done automatically and fast. With the proper commands, the shell can repeat tasks with or without some modification as many times as we want. Using the shell, the task in the example can be accomplished in seconds.
The most popular Unix shell is Bash (the Bourne Again SHell — so-called because it’s derived from a shell written by Stephen Bourne). Bash is the default shell on most modern implementations of Unix and in most packages that provide Unix-like tools for Windows.
Using the shell will take some effort and some time to learn. While a GUI presents you with choices to select, CLI choices are not automatically presented to you, so you must learn a few commands like new vocabulary in a language you’re studying. However, unlike a spoken language, a small number of “words” (i.e. commands) gets you a long way, and we’ll cover some of those essentials today.
The grammar of a shell allows you to combine existing tools into powerful pipelines and handle large volumes of data automatically. Sequences of commands can be written into a script, improving the reproducibility of workflows. The Bash commands, notations and tools taught in this lesson are:
cd- change directory
ls- list (files and directories)
ls -F- list files and directories with a /
mkdir- make directory
cat- show file contents
..- move “up one” directory level
pwd- print working directory
*- wildcard used to match filenames
>- used to save output from a command in a file
awk- used for pattern matching and processing
nano text editor
The Jupyter commands taught in this lesson are:
Step 1. Create a new directory and folder structure
Run your first Bash commands¶
Change Directory to Desktop: `$ cd Desktop
List files and directories on Desktop:
my_Desktop $ ls
Try a command error:
my_Desktop $ ks
Create the new directories¶
Make a workshop folder called
$ mkdir py_workshop_2021
Change directory to
$ cd py_workshop_2021
List contents of
$ ls(it should be empty)
Set up the project folder structure¶
Let’s set up a folder structure for our project that is generally useful for any project working with data and code. It’s good practice to make separate folders each for raw data, data outputs, scripts and documents in the context of any project you’re working on. For now, let’s create two new directories, data/ and scripts/ inside the
Make sure you are in the
Create two new directories,
scripts/inside the py_workshop_2021/ folder:
$ mkdir data scripts
Try on your own
Try creating two new folders for
mkdir documentation demos
ls -Fto verify that your new subdirectories were created.
Step 2: Move data and script files to the project folder\n”,¶
The data and scripts we want to use are located in the
pre_workshop_files/ folder that we sent out prior to this workshop and you should have saved on your Desktop. Let’s take a look at the contents of that folder by navigating there in the shell. If you are in your
py_workshop_2021/ folder, you can use a special notation that means “move up one level”: ..
Move back to the Desktop and directly into the
$ cd ../pre_workshop_files
Use ls to take a look at the files in the
Take a look at the contents of the documentation.txt file:
$ cat documentation.txt
Move files from the
pre_workshop_files/ folder to the
$ mv documentation.txt ../py_workshop_2021/documentation
lsto see that
documentation.txtis no longer in the
Note how we use the
../ notation to move one folder “up” from
pre_workshop_files/ to Desktop, and from there we can specify the exact path to the new folder. Note that we didn’t actually move there in the terminal - just our files did.
pre_workshop_files/ You should also see a file called
map.py - this extension lets us know that this is a Python script. We can
cat into it the same way we did to other files because it is simply a text file with a very particular syntax that Python can read and understand. We don’t know much else about it at this moment, and that’s ok - let’s just move it for now.
Try on your own
Move map.py to the right subfolder within the
$ mv map.py ../py_workshop_2021/scripts
Use wildcards to access multiple files at once¶
We have three .csv files in the
pre_workshop_files/ folder -
oregonmap.csv. The last two are both data files that can be run with map.py. We could move each file individually, or type both names in the
mv command, but there’s another way to do this that saves us time.
* is a wildcard, which matches zero or more characters. Let’s consider the
*.csv matches every file that ends with ‘.csv’. On the other hand,
o*.csv only matches oregonmap.csv because the ‘o’ at the front only matches filenames that begin with the letter ‘o’. Since we have one file airports_edit.csv that matches the extension but cannot be run with map.py, we want to avoid copying that. How could we do that?
You could move the two .csv data files from
$ mv *map.csv ../py_workshop_2021/data
But, we want all the .csv files to go in the data folder.
$ mv *.csv ../py_workshop_2021/data
Now, let’s check that everything worked the way we wanted it to. The
pre_workshop_files/ directory should now contain two folders called
qgis/. Let’s move these folders to the
py_workshop_2021/demos/ directories, respectively.
lsto check that the
pre_workshop_files/directory is empty as it should be.
Change directory to
Try on your own
Check in each of the subdirectories to see if your files are there.
ls -F to see the subdirectories, then
Step 3: Use nano to add information to the
Nano is a text editor that you can use directly in the command line. Let’s add some information about ourselves in the
Move into the
Use nano to open the
Add your name and information about this workshop to that file. Use the arrow keys to navigate and type directly in the editor.
Save the changes and write out the file: Control + O; Enter
Exit the nano editor: Control + X (you should be back in the terminal)
See that your changes were saved:
Counting the World’s Tallest Mountains¶
mountain_data/ has an assortment of files that describe the world’s 10 tallest mountains. We are going to demonstrate pipes and pattern matching to count and concatenate files.
cd into mountain_data: cd ../data/mountain_data
Check out what is there: ls (see mostly text files that follow a naming convention of mountainname-country.txt and two image files)
README.md file shows what is there in the dataset
cat into one of the files to see it: cat annapurna-np.txt
Use pattern matching to list all the files that end with -np.txt: ls *-np.txt
Count how many mountains are in Nepal: ls *-np.txt | wc -l
We know there are 10 mountains in this dataset which means that 8/10 of the tallest mountains on earth are in Nepal.
Saving the Nepalese tallest mountains in a file (concatenating)¶
Right now, the data for each mountain are stored in separate files. So we can see the list of mountains in Nepal, but we want one file with all mountains in Nepal. Use the awk command, for pattern processing and matching. The 1 flag automatically inserts a new line between each file.
See the list of mountains in Nepal: awk 1 *-np.txt
Save the list using “>”: awk 1 *-np.txt > nepal_mountains.txt