More file operations & argument parsing

Class outline:

  • Review
  • Working with different types of files
  • Working with directories (the OS module)
  • Passing & parsing arguments

Review: CSV

Opening files for reading/writing

open(file, mode='r', ...): Returns a file object for the specified file and mode.

The with keyword tells Python to automatically close the file when the indented block finishes, even if an error occurs inside it.


                    with open("myinfo", "r") as my_file:
                        # do something with the file, e.g. read its contents
                        contents = my_file.read()
                    
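A quick way to see the automatic closing in action is to check the file object's closed attribute inside and after the with block. This is a minimal sketch: it uses Python's tempfile module (not mentioned on the slide) so it doesn't leave files behind, and reuses the slide's "myinfo" filename.

```python
import os
import tempfile

# A throwaway directory keeps the example from touching real files
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "myinfo")

    with open(path, "w") as my_file:
        my_file.write("some info\n")
        closed_inside = my_file.closed  # False: still open inside the block

    closed_after = my_file.closed  # True: closed automatically when the block ended

print(closed_inside, closed_after)  # False True
```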

open() modes

A few of the possible modes:

  • r (Read): The default mode if no mode is specified. If the file does not exist, this raises a FileNotFoundError.
  • w (Write): If the file does not exist, this mode creates it. If the file already exists, this mode erases (truncates) all of its data and writes new data from the beginning of the file.
  • a (Append): If the file does not exist, this mode creates it. If the file already exists, this mode adds new data to the end of the existing data.

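The behavior of the three modes can be checked side by side. This sketch assumes a hypothetical file called modes_demo.txt inside a temporary directory created with the tempfile module, so nothing on disk is overwritten.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical filename

    # "r" on a file that doesn't exist raises FileNotFoundError
    try:
        with open(path, "r") as f:
            f.read()
    except FileNotFoundError:
        print("r: file does not exist yet")

    # "w" creates the file
    with open(path, "w") as f:
        f.write("first\n")

    # "w" again truncates the file: "first" is erased
    with open(path, "w") as f:
        f.write("second\n")

    # "a" adds to the end of whatever is already there
    with open(path, "a") as f:
        f.write("third\n")

    with open(path, "r") as f:
        final_contents = f.read()

print(repr(final_contents))  # 'second\nthird\n'
```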
Writing files with CSV module

The csv module supports writing CSV files (docs)

In Python, CSV data is commonly represented as a list of lists, with each inner list representing one row of the CSV.


                        import csv
                        my_list = [
                                ["Day", "Breakfast", "Lunch", "Dinner"],
                                ["Mon", "Oatmeal", "Salad", "Pasta"],
                                ["Tues", "Waffles", "Soup", "Tacos"],
                                ["Wed", "Cereal", "Sandwich", "Pizza"]
                            ]
                        # open a new or existing file for writing;
                        # newline='' lets the csv module control line endings
                        with open('fun_spreadsheet.csv', 'w', newline='') as my_file:
                            # prepare the CSV for writing
                            writer = csv.writer(my_file)
                            # write data from a list to the CSV file
                            writer.writerows(my_list)
                    

Reading files with CSV module

We can also read files with the CSV module, using nested for loops to get values from rows and cells.


                        import csv
                        with open('fun_spreadsheet.csv', newline='') as my_file:
                            # ',' is already the default delimiter; shown here for clarity
                            csv_reader = csv.reader(my_file, delimiter=',')
                            for row in csv_reader:
                                print(row)
                                for cell in row:
                                    print(cell)
                    

Reading CSV files with headers

Use csv.DictReader to read a CSV file with headers


                        # import the csv module
                        import csv
                        # open the file
                        with open('fun_spreadsheet.csv', newline='') as my_file:
                            # create a reader that uses the first row as column headers
                            csv_reader = csv.DictReader(my_file)
                            # each row is a dict with the column headers as keys!
                            # now we can loop through the rows in the file
                            # and get the values from specific columns
                            for row in csv_reader:
                                print(row)
                                print(row['Day'], row['Dinner'])
                    
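To show how header-based lookup works without needing fun_spreadsheet.csv on disk, this sketch feeds csv.DictReader an in-memory io.StringIO object (an assumption for self-containment) holding the first rows of the meal data from the writing slide.

```python
import csv
import io

# io.StringIO stands in for a real file; the rows mirror the earlier meal data
csv_text = (
    "Day,Breakfast,Lunch,Dinner\n"
    "Mon,Oatmeal,Salad,Pasta\n"
    "Tues,Waffles,Soup,Tacos\n"
)

dinners = {}
with io.StringIO(csv_text) as my_file:
    csv_reader = csv.DictReader(my_file)
    # The first row supplies the keys for every following row
    headers = csv_reader.fieldnames
    for row in csv_reader:
        # Each row is a dict, so columns are looked up by header name
        dinners[row["Day"]] = row["Dinner"]

print(headers)  # ['Day', 'Breakfast', 'Lunch', 'Dinner']
print(dinners)  # {'Mon': 'Pasta', 'Tues': 'Tacos'}
```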

Exercise review

More on files

Types of files

  • Text: Contains lines of text characters, encoded according to a character encoding (like UTF-8 or ASCII). Each line ends with a line-ending control character (or pair of characters):
      • Unix/Linux and modern macOS: Line feed (\n)
      • Classic Mac OS (before Mac OS X): Carriage return (\r)
      • Windows: Carriage return followed by line feed (\r\n)
  • Binary: Any file that is not a text file. The bytes are meant to be interpreted as something other than text, according to the file's format (usually indicated by its extension).
    e.g. images (GIF/JPG/PNG), audio files (WAV/MP3), video files (MOV/MP4), compressed files (ZIP/RAR).

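The text/binary distinction also shows up in how Python opens a file. In this sketch (using a hypothetical file in a tempfile directory), the same bytes are read once in binary mode ("rb"), which returns the raw bytes, and once in text mode ("r"), which decodes them to a str and, by default, translates \r\n and \r line endings into \n.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "line_endings.txt")  # hypothetical filename

    # Write raw bytes: one Windows-style line ending, one Unix-style
    with open(path, "wb") as f:
        f.write(b"first\r\nsecond\n")

    # Binary mode returns the exact bytes as a bytes object
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode decodes to str and normalizes line endings to \n
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

print(raw)         # b'first\r\nsecond\n'
print(repr(text))  # 'first\nsecond\n'
```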
File formats

The Python standard library has support for parsing:

  • Plain text (.txt)
  • CSV (.csv)
  • JSON (.json)
  • HTML (.html)
  • XML (.xml)

It also supports ZIP and tar archives (via the zipfile and tarfile modules).

Many third-party modules for other formats are available on PyPI!
e.g. PDF, DOCX, Google Docs, Markdown

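As one example of a standard-library format, the json module can write a Python dict to a .json file and parse it back. This is a minimal sketch: the meals data and the meals.json filename are hypothetical, and the file lives in a tempfile directory.

```python
import json
import os
import tempfile

# Hypothetical data, loosely based on the meal-plan CSV
meals = {
    "Mon": {"Breakfast": "Oatmeal", "Dinner": "Pasta"},
    "Tues": {"Breakfast": "Waffles", "Dinner": "Tacos"},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "meals.json")  # hypothetical filename

    # json.dump() serializes a Python object and writes it to an open file
    with open(path, "w") as f:
        json.dump(meals, f, indent=2)

    # json.load() reads the file and parses it back into Python objects
    with open(path, "r") as f:
        loaded = json.load(f)

print(loaded["Tues"]["Dinner"])  # Tacos
```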
Writing to text files

Use the built-in file.write() method to write to a text file:

Write items in a list to a new file:


                    authors = ["Ursula K. Le Guin", "N. K. Jemisin", "Octavia E. Butler"]
                    with open("authors.txt", "w") as file:
                        for author in authors:
                            file.write(author + "\n")
                    

Note that we have to add a line ending ("\n") to each item: write() doesn't add one automatically.

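The effect of leaving out "\n" is easy to see by writing the same list twice and reading the results back. This sketch uses a shortened author list and a temporary directory (via tempfile) as assumptions.

```python
import os
import tempfile

authors = ["Ursula K. Le Guin", "N. K. Jemisin"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "authors.txt")

    # Without "\n", every write() continues on the same line
    with open(path, "w") as file:
        for author in authors:
            file.write(author)
    with open(path) as file:
        without_newlines = file.read()

    # With "\n" added, each author ends up on its own line
    with open(path, "w") as file:
        for author in authors:
            file.write(author + "\n")
    with open(path) as file:
        with_newlines = file.read()

print(repr(without_newlines))  # 'Ursula K. Le GuinN. K. Jemisin'
print(repr(with_newlines))     # 'Ursula K. Le Guin\nN. K. Jemisin\n'
```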
Writing to text files

Add lines to an existing file:


                    more_authors = ["Toni Morrison", "Zora Neale Hurston"]
                    with open("authors.txt", "a") as file:
                        for author in more_authors:
                            file.write(author + "\n")
                    

Reading a whole text file

Use the built-in read() method to receive the entire contents of the file as a string.


                    with open("authors.txt", "r") as file:
                        authors = file.read()
                        print(authors)
                    

Reading a text file line by line

Use the readlines() method to receive the entire file as a list of strings (one per line). Each string keeps its line ending ("\n").


                    with open("authors.txt", "r") as file:
                        authors = file.readlines()
                        print(authors)
                    
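readlines() loads the whole file at once. A common alternative is to loop over the file object itself, which reads one line at a time, and strip the line endings. This sketch writes a temporary authors.txt first (via tempfile) so it runs on its own.

```python
import os
import tempfile

authors = ["Ursula K. Le Guin", "N. K. Jemisin", "Octavia E. Butler"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "authors.txt")
    with open(path, "w") as file:
        for author in authors:
            file.write(author + "\n")

    # readlines(): the whole file as a list, "\n" still attached
    with open(path, "r") as file:
        raw_lines = file.readlines()

    # Looping over the file object reads one line at a time,
    # which avoids loading a large file into memory all at once
    cleaned = []
    with open(path, "r") as file:
        for line in file:
            cleaned.append(line.rstrip("\n"))

print(raw_lines)  # ['Ursula K. Le Guin\n', 'N. K. Jemisin\n', 'Octavia E. Butler\n']
print(cleaned)    # ['Ursula K. Le Guin', 'N. K. Jemisin', 'Octavia E. Butler']
```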

Working with directories
(the OS module)

What happens if we try to write to a file in a directory (folder) that doesn't exist?


                    authors = ["Ursula K. Le Guin", "N. K. Jemisin", "Octavia E. Butler"]
                    with open("data/authors.txt", "a") as file:
                        for author in authors:
                            file.write(author + "\n")

                    FileNotFoundError: [Errno 2] No such file or directory: 'data/authors.txt'
                    

OS module

The os module is part of the standard Python library and contains loads of functions for interacting with a computer's operating system, including:

  • Creating directories
  • Renaming/deleting files & folders
  • Listing files in a directory
  • Setting file permissions
  • Getting and setting environment variables
  • Managing processes

Create directories

Create a directory using os.mkdir()


                        import os
                        os.mkdir('./data')
                        # now we can create a new file inside our dir
                        authors = ["Ursula K. Le Guin", "N. K. Jemisin", "Octavia E. Butler"]
                        with open("data/authors.txt", "a") as file:
                            for author in authors:
                                file.write(author + "\n")
                    

List files in a directory

Check what's inside a directory using os.listdir().
This function returns a list of the names (as strings) of the files and subdirectories inside it.


                        import os
                        files = os.listdir('./data')
                        for file in files:
                            print(file)
                    
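Two details of os.listdir() often trip people up: it returns bare names rather than full paths, and the order is not guaranteed. This sketch builds a hypothetical directory (two files and one subdirectory, inside a tempfile directory), sorts the listing, and uses os.path.join() plus os.path.isfile() to keep only the files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # Hypothetical contents: two files and one subdirectory
    for name in ["b.txt", "a.txt"]:
        with open(os.path.join(data_dir, name), "w") as f:
            f.write("hello\n")
    os.mkdir(os.path.join(data_dir, "archive"))

    # Names only, in no guaranteed order, so sort them
    entries = sorted(os.listdir(data_dir))

    files_only = []
    for name in entries:
        # Join with the directory to get a usable path
        full_path = os.path.join(data_dir, name)
        if os.path.isfile(full_path):
            files_only.append(name)

print(entries)     # ['a.txt', 'archive', 'b.txt']
print(files_only)  # ['a.txt', 'b.txt']
```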

os.path module

The os.path module is also part of the standard Python library and contains handy functions for getting information about files, including:

  • Checking if a file/directory exists
  • Checking if a thing is a file or a directory
  • Extracting parts of a filepath, such as the parent directory, the filename or the file extension

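A few of these functions in one place: os.path.exists() and os.path.isdir() check the filesystem, while os.path.dirname() and os.path.basename() only inspect the path string. The data directory and the data/reports/march.csv path are hypothetical, and tempfile is used so the example cleans up after itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    data_dir = os.path.join(tmp_dir, "data")  # hypothetical directory name
    exists_before = os.path.exists(data_dir)  # False: not created yet
    os.mkdir(data_dir)
    exists_after = os.path.exists(data_dir)   # True
    is_directory = os.path.isdir(data_dir)    # True

# dirname() and basename() only look at the string; nothing needs to exist
path = os.path.join("data", "reports", "march.csv")  # hypothetical path
parent = os.path.dirname(path)
filename = os.path.basename(path)

print(exists_before, exists_after, is_directory)  # False True True
print(parent)    # data/reports (data\reports on Windows)
print(filename)  # march.csv
```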
Check if a file/directory exists

If we try to create a directory that already exists, we get an error (FileExistsError).
It's good practice to check whether a directory exists before creating it, using os.path.exists()

os.path.exists() returns True or False


                        import os
                        if not os.path.exists('./data'):
                            os.mkdir('./data')
                    
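Another option, not covered on this slide, is os.makedirs() with exist_ok=True: it creates any missing parent directories in one call and does nothing if the directory already exists. This sketch uses a hypothetical nested path inside a tempfile directory and also confirms that plain os.mkdir() raises FileExistsError on an existing directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    nested = os.path.join(tmp_dir, "data", "2024", "q1")  # hypothetical path

    # Creates data/, data/2024/ and data/2024/q1/ in one call
    os.makedirs(nested, exist_ok=True)
    # Calling it again is harmless because exist_ok=True
    os.makedirs(nested, exist_ok=True)
    created = os.path.isdir(nested)

    # os.mkdir() on a directory that already exists raises FileExistsError
    try:
        os.mkdir(nested)
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)  # True True
```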

Get the file extension

Sometimes you need just the file name or just the extension. Use os.path.splitext() to split a path into a tuple with 2 items: the name (everything before the last extension) and the extension, including its leading dot.


                        import os
                        files = os.listdir('./data')
                        for file in files:
                            # splitext() returns a (name, extension) tuple
                            name, extension = os.path.splitext(file)
                            print(name)       # e.g. 'authors'
                            print(extension)  # e.g. '.txt' (includes the dot)
                    
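Extensions are handy for picking out one kind of file, such as monthly CSV reports. This sketch filters a hypothetical list of names (standing in for what os.listdir() might return) and shows two edge cases: only the last extension is split off, and a leading dot on its own does not count as an extension.

```python
import os

# Hypothetical directory listing
filenames = ["january.csv", "february.csv", "notes.txt", "archive.tar.gz", ".gitignore"]

csv_files = []
for filename in filenames:
    name, extension = os.path.splitext(filename)
    # lower() also matches names like "MARCH.CSV"
    if extension.lower() == ".csv":
        csv_files.append(filename)

print(csv_files)                           # ['january.csv', 'february.csv']
print(os.path.splitext("archive.tar.gz"))  # ('archive.tar', '.gz')
print(os.path.splitext(".gitignore"))      # ('.gitignore', '')
```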

Passing command-line arguments

What are arguments?

We often need to pass bits of data into a script so that the script can serve different use cases without needing to edit its code.
For example, we may want to run the same script on files in different directories.

To do this, we use command-line arguments that we can customize for a given script.


                        python main.py -d my_dir
                    

argparse module

The argparse module is part of the standard Python library. It lets a script accept and process command-line arguments, and it generates help text for users (for example, when the script is run with -h).

Defining arguments

add_argument() is used to define a new argument.
A name is required, and additional parameters can also be set, such as:

  • type: automatically converts the argument to the given type (str, int, float, etc). Default is str.
  • required: whether an optional (dashed) argument must be supplied (True or False). Default is False.
  • help: a message for the script user describing the argument
  • choices: limits values to a specific set of choices (for example a list like ['foo', 'bar'] or range(1, 13))

Defining arguments


                        import argparse
                        # initialize the argument parser
                        parser = argparse.ArgumentParser()
                        # define an argument
                        parser.add_argument('-d', '--directory',
                                            type=str,
                                            required=True,
                                            help='The directory that report files are located in')
                    

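Note: one argument can have multiple names. It's common to support both a short name (-d) and a long name (--directory). The leading - or -- is what makes an argument optional (a flag); a name without dashes defines a positional argument, which is matched by its position instead. Using one dash for short names and two for long names is the common convention in CLI commands. The value is stored under the long name, so --directory becomes args.directory.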

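To see the difference between names with and without leading dashes, here is a minimal sketch with one positional argument and one optional argument. The names directory and --output-file are hypothetical, and passing a list to parse_args() simulates the command line so the example runs on its own. Note that dashes inside a long name become underscores in the attribute name.

```python
import argparse

parser = argparse.ArgumentParser()
# No leading dash: a positional argument, matched by its position
parser.add_argument('directory', help='Directory to scan')
# Leading dashes: an optional argument with a short and a long name
parser.add_argument('-o', '--output-file', default='report.csv',
                    help='Where to write the report')

# Simulates: python main.py my_dir -o q1.csv
args = parser.parse_args(['my_dir', '-o', 'q1.csv'])

print(args.directory)    # my_dir
# '--output-file' is stored as output_file
print(args.output_file)  # q1.csv
```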
Using argument values

parse_args() is used to extract values passed in as arguments


                        import argparse
                        # initialize the argument parser
                        parser = argparse.ArgumentParser()
                        # define an argument
                        parser.add_argument('-d', '--directory',
                                            type=str,
                                            required=True,
                                            help='The directory that report files are located in')
                        # parse the arguments
                        args = parser.parse_args()
                        # get a specific argument value
                        directory = args.directory
                    
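parse_args() normally reads the real command line (sys.argv[1:]), but it also accepts an explicit list of strings, which makes it easy to try things out. This sketch reuses the slide's -d/--directory argument and adds a hypothetical -q/--quarter argument to show type and choices working together: the value is converted to an int first and then checked against the allowed choices.

```python
import argparse

parser = argparse.ArgumentParser(description='Build a quarterly report')
# Same argument as on the slide
parser.add_argument('-d', '--directory',
                    type=str,
                    required=True,
                    help='The directory that report files are located in')
# Hypothetical extra argument, for illustration only
parser.add_argument('-q', '--quarter',
                    type=int,
                    choices=[1, 2, 3, 4],
                    default=1,
                    help='Which quarter to report on (1-4)')

# Simulates: python main.py -d my_dir --quarter 3
args = parser.parse_args(['-d', 'my_dir', '--quarter', '3'])

print(args.directory)  # my_dir
print(args.quarter)    # 3 (an int, because of type=int)
```

Running it for real would look like python main.py -d my_dir -q 3; leaving out -d, or passing -q 5, makes argparse print an error and exit.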

Passing arguments in Replit

The Replit Run button does the equivalent of typing python main.py into a command line interface (CLI).

We can access the CLI in Replit by switching the right panel from Console to Shell, then typing the command with its arguments, e.g. python main.py -d my_dir.

Passing arguments in Replit

[Screenshot: the Replit Shell panel]

Exercise

Let's update our quarterly report exercise to get the monthly report files automatically.