open(file, mode='r', ...): Returns a file object for the specified file and mode.
The with keyword tells Python to automatically close the file when we're finished.
with open("myinfo", "r") as my_file:
    # do something with the file, for example read its contents
    contents = my_file.read()
A few of the possible modes:
Character | Mode | Description |
---|---|---|
r | Read | Default mode used if a mode isn't specified. If the file does not exist, this raises an error. |
w | Write | If the file does not exist, this mode creates it. If the file already exists, this mode deletes all data and writes new data from the beginning of the file. |
a | Append | If the file does not exist, this mode creates it. If the file already exists, this mode appends data to the end of the existing data. |
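To see the three modes side by side, here is a minimal sketch. It assumes a throwaway directory from the standard tempfile module and a made-up file name (notes.txt), so it leaves nothing behind on disk.

```python
import os
import tempfile

# Work in a temporary directory that is deleted when the block ends.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    # "w" creates the file (or empties it if it already exists).
    with open(path, "w") as my_file:
        my_file.write("first\n")

    # "a" keeps the existing data and adds to the end.
    with open(path, "a") as my_file:
        my_file.write("second\n")

    # "r" is the default mode, so it can be left out when reading.
    with open(path) as my_file:
        after_append = my_file.read()

    # Opening with "w" again discards everything written so far.
    with open(path, "w") as my_file:
        my_file.write("third\n")

    with open(path) as my_file:
        after_rewrite = my_file.read()

print(repr(after_append))
print(repr(after_rewrite))
```

The second read shows only the last write, because "w" starts the file over from the beginning.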
The csv module has support for writing CSV files.
A CSV is essentially a list of lists, with each list representing a row in the CSV.
import csv
my_list = [
    ["Day", "Breakfast", "Lunch", "Dinner"],
    ["Mon", "Oatmeal", "Salad", "Pasta"],
    ["Tues", "Waffles", "Soup", "Tacos"],
    ["Wed", "Cereal", "Sandwich", "Pizza"]
]
# open a new or existing file for writing
# (newline='' lets the csv module control line endings itself)
with open('fun_spreadsheet.csv', 'w', newline='') as my_file:
    # prepare the CSV for writing
    writer = csv.writer(my_file)
    # write data from a list to the CSV file
    writer.writerows(my_list)
We can also read files with the CSV module, using nested for loops to get values from rows and cells.
import csv
with open('fun_spreadsheet.csv') as my_file:
    csv_reader = csv.reader(my_file, delimiter=',')
    for row in csv_reader:
        print(row)
        for cell in row:
            print(cell)
Use csv.DictReader to read a CSV file with headers.
# import the csv module
import csv
# open the file
with open('fun_spreadsheet.csv') as my_file:
    # create a reader that uses the first row as column headers
    csv_reader = csv.DictReader(my_file)
    # each row is a dict with column headers as keys!
    # now we can loop through the rows in the file
    # and get the values from specific columns
    for row in csv_reader:
        print(row)
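Since each row from csv.DictReader behaves like a dict, we can pull out single columns by header name. The sketch below uses io.StringIO as a stand-in for an opened file so it runs without fun_spreadsheet.csv on disk; the dinners variable is just an illustrative name.

```python
import csv
import io

# Stand-in for the contents of fun_spreadsheet.csv (first two data rows).
csv_text = (
    "Day,Breakfast,Lunch,Dinner\n"
    "Mon,Oatmeal,Salad,Pasta\n"
    "Tues,Waffles,Soup,Tacos\n"
)

dinners = {}
# io.StringIO behaves like a file object opened in text mode.
csv_reader = csv.DictReader(io.StringIO(csv_text))
for row in csv_reader:
    # Look up values by column header instead of by position.
    dinners[row["Day"]] = row["Dinner"]

print(dinners)
```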
Type | Description |
---|---|
Text | Contains one or more lines of text characters, encoded according to a character encoding (like UTF-8 or ASCII). Each line ends with a control character (a line ending, such as \n). |
Binary | Any file that is not a text file. The bytes are typically intended to be interpreted as something other than text, according to the file extension, e.g. images (GIF/JPG/PNG), audio files (WAV/MP3), video files (MOV/MP4), compressed files (ZIP/RAR). |
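The difference shows up in Python as the difference between str and bytes. As a small sketch (the file name greeting.txt and the sample word are made up for illustration), we can write one file and read it back in text mode ("r") and in binary mode ("rb"):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "greeting.txt")  # hypothetical file name

    # Write text using an explicit character encoding.
    with open(path, "w", encoding="utf-8") as file:
        file.write("héllo")

    # Text mode decodes the bytes into a str.
    with open(path, "r", encoding="utf-8") as file:
        as_text = file.read()

    # Binary mode ("rb") returns the raw bytes without decoding.
    with open(path, "rb") as file:
        as_bytes = file.read()

print(type(as_text), len(as_text))
print(type(as_bytes), len(as_bytes))
```

The byte count is one higher than the character count because "é" takes two bytes in UTF-8.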
The Python standard library has support for parsing:
It also supports zip and tar archives.
Many third-party modules are available on PyPI for other formats: PDF, docx, Google Docs, Markdown, etc.
Use the built-in file.write() method to write to a text file:
Write items in a list to a new file:
authors = ["Ursula K. Le Guin", "N. K. Jemisin", "Octavia E. Butler"]
with open("authors.txt", "w") as file:
    for author in authors:
        file.write(author + "\n")
Note that we have to add a line ending to each item; Python doesn't add it automatically.
Add lines to an existing file:
more_authors = ["Toni Morrison", "Zora Neale Hurston"]
with open("authors.txt", "a") as file:
    for author in more_authors:
        file.write(author + "\n")
Use the built-in read() method to get the entire contents of the file as a string.
with open("authors.txt", "r") as file:
    authors = file.read()
    print(authors)
Use the readlines() method to get the entire file as a list of strings (one for each line).
with open("authors.txt", "r") as file:
    authors = file.readlines()
    print(authors)
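One detail worth seeing: the strings returned by readlines() keep their line endings. A minimal sketch, using io.StringIO as a stand-in for authors.txt so it runs anywhere, compares that with looping over the file object and stripping each line:

```python
import io

# Stand-in for the contents of authors.txt.
text = "Ursula K. Le Guin\nN. K. Jemisin\nOctavia E. Butler\n"

# readlines() keeps the trailing "\n" on every line.
with io.StringIO(text) as file:
    raw_lines = file.readlines()

# A file object can also be looped over line by line;
# rstrip("\n") removes the line ending from each one.
with io.StringIO(text) as file:
    clean_lines = [line.rstrip("\n") for line in file]

print(raw_lines)
print(clean_lines)
```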
What happens if we try to write to a file in a directory (folder) that doesn't exist?
authors = ["Ursula K. Le Guin", "N. K. Jemisin", "Octavia E. Butler"]
with open("data/authors.txt", "a") as file:
    for author in authors:
        file.write(author + "\n")
FileNotFoundError: [Errno 2] No such file or directory: 'data/authors.txt'
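If we would rather handle this error than let the script crash, one option is a try/except block. This is only a sketch: the outcome variable and the temporary directory are illustrative choices, and the data folder is deliberately never created.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # The "data" folder inside tmp_dir does not exist.
    missing_path = os.path.join(tmp_dir, "data", "authors.txt")

    try:
        with open(missing_path, "a") as file:
            file.write("Ursula K. Le Guin\n")
        outcome = "written"
    except FileNotFoundError as error:
        # Append mode can create a file, but not a missing directory.
        outcome = "missing directory"
        print("Could not open:", error.filename)

print(outcome)
```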
The os module is part of the Python standard library and contains many functions for interacting with a computer's operating system, including:
Create a directory using os.mkdir()
import os
os.mkdir('./data')
# now we can create a new file inside our dir
authors = ["Ursula K. Le Guin", "N. K. Jemisin", "Octavia E. Butler"]
with open("data/authors.txt", "a") as file:
    for author in authors:
        file.write(author + "\n")
Check what's inside a directory using os.listdir().
This function returns a list of filenames as strings.
import os
files = os.listdir('./data')
for file in files:
    print(file)
The os.path module is also part of the Python standard library and contains handy functions for getting information about files, including:
If we try to create a directory that already exists, we get an error.
It's good practice to check whether a directory exists before creating a new one using os.path.exists().
os.path.exists() returns True or False.
import os
if not os.path.exists('./data'):
    os.mkdir('./data')
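As an alternative to checking first, the os module also has os.makedirs(), which accepts exist_ok=True and creates any missing parent folders along the way. A minimal sketch, using a temporary directory and a hypothetical nested path:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    data_dir = os.path.join(tmp_dir, "data", "reports")  # hypothetical nested path

    # Creates "data" and "reports" in one call.
    os.makedirs(data_dir, exist_ok=True)
    # Calling it again is fine: exist_ok=True suppresses the error.
    os.makedirs(data_dir, exist_ok=True)
    created = os.path.isdir(data_dir)

    # Plain os.mkdir() on an existing directory raises FileExistsError.
    try:
        os.mkdir(data_dir)
        mkdir_failed = False
    except FileExistsError:
        mkdir_failed = True

print(created, mkdir_failed)
```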
Sometimes you need to get just file names or extensions. Use os.path.splitext() to return a tuple with 2 items: the file name and the extension (including the leading dot).
import os
files = os.listdir('./data')
for file in files:
    split = os.path.splitext(file)
    name = split[0]
    print(name)
    extension = split[1]
    print(extension)
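To make the return value concrete, here is a short sketch that runs os.path.splitext() on a few made-up file names. It works on plain strings, so no files need to exist.

```python
import os

# Hypothetical file names, chosen to show a few different cases.
filenames = ["authors.txt", "report.final.csv", "README"]

results = []
for filename in filenames:
    # Unpack the 2-item result directly into two variables.
    name, extension = os.path.splitext(filename)
    results.append((name, extension))
    print(name, extension)
```

Only the last dot counts as the start of the extension, and a name without a dot gets an empty string as its extension.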
We often need to pass bits of data into a script, so that the script can serve different use cases without needing to edit the code in the script.
For example, we may want to run a script that processes files in different directories.
To do this, we use command-line arguments that we can customize for a given script.
python main.py -d my_dir
The argparse module is part of the standard Python library and contains functions that allow accepting and processing command line arguments, as well as providing help text to users.
add_argument() is used to define a new argument.
A name is required, and additional parameters can also be set, such as:
import argparse
# initialize the argument parser
parser = argparse.ArgumentParser()
# define an argument
parser.add_argument('-d', '--directory',
                    type=str,
                    required=True,
                    help='The directory that report files are located in')
Note: 1 argument can have multiple names. It's common to support both a short (-d) and a long (--directory) version of the name.
The - and -- prefixes follow a common convention in CLI commands. In argparse they also mark the argument as an option; a name without a leading dash defines a positional argument instead.
parse_args() is used to extract the values passed in as arguments.
import argparse
# initialize the argument parser
parser = argparse.ArgumentParser()
# define an argument
parser.add_argument('-d', '--directory',
                    type=str,
                    required=True,
                    help='The directory that report files are located in')
# parse the arguments
args = parser.parse_args()
# get a specific argument value
directory = args.directory
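To try the parser without typing anything in a shell, parse_args() also accepts a list of strings in place of the real command line. The sketch below reuses the same argument definition; the values my_dir and other_dir are just sample inputs.

```python
import argparse

# initialize the argument parser and define the same argument as above
parser = argparse.ArgumentParser()
parser.add_argument('-d', '--directory',
                    type=str,
                    required=True,
                    help='The directory that report files are located in')

# Equivalent to running: python main.py -d my_dir
args = parser.parse_args(['-d', 'my_dir'])
print(args.directory)

# The long name stores its value under the same attribute.
long_args = parser.parse_args(['--directory', 'other_dir'])
print(long_args.directory)
```

Running the script with -h prints the help text that argparse builds from the help strings.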
The Replit Run button does the equivalent of typing python main.py into a command line interface (CLI).
We can access the CLI in Replit by switching the right panel from Console to Shell.
Let's update our quarterly report exercise to get the monthly report files automatically.
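As a starting point, here is one possible sketch of finding report files in a directory. The exercise files are not shown here, so everything specific is an assumption: the helper name find_report_files, the .csv extension, and the sample file names are all hypothetical. In a real script the directory would come from args.directory.

```python
import os
import tempfile

def find_report_files(directory, extension=".csv"):
    """Return sorted paths of files in directory that end with extension."""
    report_files = []
    for filename in sorted(os.listdir(directory)):
        # Keep only files whose extension matches.
        if os.path.splitext(filename)[1] == extension:
            report_files.append(os.path.join(directory, filename))
    return report_files

# Demo with a temporary directory standing in for the reports folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    for filename in ["jan.csv", "feb.csv", "notes.txt"]:  # made-up files
        with open(os.path.join(tmp_dir, filename), "w") as file:
            file.write("")
    found_names = [os.path.basename(path) for path in find_report_files(tmp_dir)]

print(found_names)
```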