Pathlib: my new favorite module

October 24, 2018

Though pathlib was introduced in python 3.4 to some praise, I didn’t “get” it. Like many things in python, I needed some time to come around and tinker with it before I realized the power within. I can’t remember when pathlib started “clicking” for me, but I’m sure it was an accidental rediscovering of it via the Dash documentation application.More on Dash in an upcoming post

I’ve found it incredibly helpful in a number of ways, and I hope to pass along some helpful tips if you haven’t started using it yet.

You don’t have to think as hard

Pretty much every time I needed to configure media or static folders for Django or read some config files, I would have a mental block, frequently asking myself, “What are the os.path commands for getting the absolute path of a folder?”

I had to look it up every time, and it just wouldn’t stick, but this is usually how I would do it:

local_project_dir = os.path.abspath(os.path.dirname(__file__))

Now, pathlib allows me to find a folder in a similar way that I would through the command line:

local_project_path = Path(__file__, '..').resolve()

I also appreciate the flexibility pathlib gives you to do the same thing in a couple of ways, to give you freedom to find what works better for you.

All these return a path to the same place:


Path(__file__ , '..').resolve()
(Path(__file__) / '..').resolve()
Path(__file__).parent.absolute()

I find I don’t have to think as hard to process this syntax, as compared to the os.path syntax above.

Built-in conveniences

I also love how pathlib bundles actions into a Path . In particular, one doesn’t need to create a indented block of code to read its contents. I’m thankful to not have to think about whether I want to iterate over a file’s lines, or just read it in.

with open('readfile.txt') as f:
    data = f.read()

Now, you can read all the data in one linedocumentation :

data = Path('readfile.txt').read_text()

Don’t get me wrong, I think the open block is still very much needed, especially when you want to have fine control over when a file is open, or if you need to iterate over a large file. But when you just need to get the decoded contents,Or if you need the bytes, you can do that too. it sure is nice to have a quick method to do so.

A real-world example

Today, I found out that some of the images in one of my projects were not aligning with the content. I realized I could create a contact sheet of images to help us find the problem images. However, the images were named after the id of the content they were relating to, and there’s no easy way for a human to connect those ids to the product.

But we have an API that takes an id and returns the details of the product, including its name. So I was confident I could quickly whip up a little python script to convert the names of each photo to its product name.

from pathlib import Path
import requests


IMG_DIR = Path('C:/Users/chmay/Desktop/pdfcovers')


def id_to_name(id: str):
    url = f'https://stage.projectloc.net/api/v1/products/{id}'
    headers = {
        'accept': 'application/json'
    }
    r = requests.get(url, headers=headers)
    r.raise_for_status()
    return r.json()['name']


def main(path:Path):
    for item in path.glob('*.jpg'):
        id = Path(item).stem
        name = id_to_name(id)
        item.rename(Path(f'{IMG_DIR}/{name}.jpg'))


if __name__ == '__main__':
    main(IMG_DIR)

For those learning python, I’ll explain what’s going on.

The top imports the dependencies, in this case the Path object from pathlib , and the requests module that I installed into my project’s virtual environment.Reach out to me if this is hard to understand. Virtual environments are hard to understand when starting out… and many times after that.

Next, I’m setting a variable ( IMG_DIR ) that points to the folder the images are in. This script uses this variable as a default setting that I could overwrite it if I wanted to use it again somewhere else.

Following that is the function called id_to_name . It takes in the id of the product, uses requests to get the information, and returns the product’s name.

Next up is main , where pathlib shines.

  • I set up a loop to iterate over all the jpg files in the folderglob docs
  • Then get the id from the file namestem docs .
  • Pass that id in to the id_to_name function
  • And then rename the photorename docs .

Finally the last two lines of code are what kicks off the process, but only when I run this file from the command line like this:

python rename_images.py

It’s your turn!

I have shown this module to a number of people at work, and like me, none of them realized the greatness that lies within. So I encourage you to open up the documentation,Or if you’re on a Mac purchase Dash. a REPL or your favorite IDE, and mess around with it.

Thanks!

Thanks go to Trey Hunner, who sent out an email in praise of pathlib the other day, and it was a reminder that I hadn’t finished this entry.

← Read other articles