29 Feb 2020

Slow python imports

Some imports in my python code are slow. How can I figure out which ones are the source of slowness?

Python (3.7 and later) provides a useful option that reports how long each import took. Pass the -X importtime argument to the python interpreter when you run your script. For each import, it prints both the self time (excluding nested imports) and the cumulative time (including nested imports), in microseconds. The report is written to stderr.

python -X importtime your-script.py
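You may want to collect the report from Python (for example in a CI job) rather than reading it in a terminal. A minimal sketch is to run a child interpreter with subprocess and keep the stderr lines that start with "import time:". The helper name importtime_report is hypothetical.

```python
import subprocess
import sys


def importtime_report(code):
    """Run `code` in a fresh interpreter with -X importtime and return the
    report lines, which the interpreter writes to stderr."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stderr.splitlines()
            if line.startswith("import time:")]


report = importtime_report("import argparse")
for line in report[-5:]:  # the last lines belong to argparse and its dependencies
    print(line)
```

Running it in a fresh interpreter matters: imports already done by the current process are cached and would not be measured again.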

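Running python -X importtime my-script.py on an empty script prints the following (on Windows 7, Python 3.7.5). Even an empty script triggers the imports the interpreter performs at startup, such as encodings, io, os and site: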

import time: self [us] | cumulative | imported package
import time:        52 |         52 | zipimport
import time:       367 |        367 | _frozen_importlib_external
import time:        55 |         55 |     _codecs
import time:       530 |        585 |   codecs
import time:       520 |        520 |   encodings.aliases
import time:      1107 |       2210 | encodings
import time:       328 |        328 | encodings.utf_8
import time:        39 |         39 | _signal
import time:       357 |        357 | encodings.latin_1
import time:        34 |         34 |     _abc
import time:       312 |        345 |   abc
import time:       474 |        819 | io
import time:       113 |        113 |       _stat
import time:       264 |        377 |     stat
import time:       269 |        269 |       genericpath
import time:       794 |       1062 |     ntpath
import time:       871 |        871 |     _collections_abc
import time:       915 |       3223 |   os
import time:       490 |        490 |   _sitebuiltins
import time:        57 |         57 |     _locale
import time:       934 |        991 |   _bootlocale
import time:       421 |        421 |   encodings.cp1252
import time:       472 |        472 |   types
import time:       394 |        394 |       warnings
import time:       440 |        834 |     importlib
import time:       263 |        263 |       importlib.machinery
import time:       554 |        816 |     importlib.abc
import time:        64 |         64 |           _operator
import time:       792 |        856 |         operator
import time:       343 |        343 |         keyword
import time:        43 |         43 |           _heapq
import time:       405 |        447 |         heapq
import time:        85 |         85 |         itertools
import time:       328 |        328 |         reprlib
import time:        69 |         69 |         _collections
import time:      1460 |       3585 |       collections
import time:        44 |         44 |         _functools
import time:       596 |        640 |       functools
import time:       784 |       5008 |     contextlib
import time:       651 |       7308 |   importlib.util
import time:      1095 |       1095 |   pywin32_bootstrap
import time:       231 |        231 |   sitecustomize
import time:     10744 |      24972 | site

For a script containing only import argparse, I get the following output:

import time: self [us] | cumulative | imported package
import time:        70 |         70 | zipimport
import time:       341 |        341 | _frozen_importlib_external
import time:        54 |         54 |     _codecs
import time:       457 |        511 |   codecs
import time:       456 |        456 |   encodings.aliases
import time:      1030 |       1997 | encodings
import time:       215 |        215 | encodings.utf_8
import time:        38 |         38 | _signal
import time:       268 |        268 | encodings.latin_1
import time:        33 |         33 |     _abc
import time:       398 |        431 |   abc
import time:       311 |        741 | io
import time:        87 |         87 |       _stat
import time:       271 |        357 |     stat
import time:       196 |        196 |       genericpath
import time:       416 |        612 |     ntpath
import time:       714 |        714 |     _collections_abc
import time:       610 |       2292 |   os
import time:       229 |        229 |   _sitebuiltins
import time:        48 |         48 |     _locale
import time:       246 |        293 |   _bootlocale
import time:       217 |        217 |   encodings.cp1252
import time:       488 |        488 |   types
import time:       279 |        279 |       warnings
import time:       461 |        740 |     importlib
import time:       269 |        269 |       importlib.machinery
import time:       557 |        825 |     importlib.abc
import time:        63 |         63 |           _operator
import time:       808 |        871 |         operator
import time:       336 |        336 |         keyword
import time:        41 |         41 |           _heapq
import time:       336 |        376 |         heapq
import time:        69 |         69 |         itertools
import time:       341 |        341 |         reprlib
import time:        70 |         70 |         _collections
import time:      1136 |       3197 |       collections
import time:        69 |         69 |         _functools
import time:       642 |        710 |       functools
import time:       801 |       4708 |     contextlib
import time:       688 |       6959 |   importlib.util
import time:       934 |        934 |   pywin32_bootstrap
import time:       224 |        224 |   sitecustomize
import time:      9323 |      20954 | site
import time:       698 |        698 |     enum
import time:        55 |         55 |       _sre
import time:       417 |        417 |         sre_constants
import time:       372 |        789 |       sre_parse
import time:       443 |       1286 |     sre_compile
import time:       323 |        323 |     copyreg
import time:       716 |       3021 |   re
import time:       725 |        725 |     locale
import time:       948 |       1673 |   gettext
import time:       986 |       5678 | argparse

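The packages are listed in the order in which their import finishes, which is why nested imports appear above the package that imported them. In argparse's case, os and sys were already loaded during startup, so it first loads re, then gettext, each with its own dependencies. Once both are loaded, argparse has finished loading.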

The cumulative column is the package's own self time plus the self times of all its nested imports, that is, the lines directly above it that are indented deeper. For example, for the io package:

import time:        33 |         33 |     _abc
import time:       398 |        431 |   abc
import time:       311 |        741 | io

311 + 398 + 33 = 742

The sum (742) differs by one from the reported cumulative time (741). This is probably because the timings are measured with more precision than they are displayed, and each displayed value is rounded separately.
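The rule can also be checked mechanically. The sketch below uses a hypothetical helper, recompute_cumulative, that parses report lines. It infers each package's nesting depth from its indentation (two spaces per level, as in the output above) and adds the package's self time to the cumulative times of the children listed just above it.

```python
def recompute_cumulative(report_lines):
    """Rebuild the cumulative column of an -X importtime report from the self column.

    Each line is printed when its import finishes, so nested imports come
    before the package that imported them, indented two spaces per level.
    """
    pending = {}  # depth -> cumulative time of finished imports waiting for their parent
    rows = []
    for line in report_lines:
        if not line.startswith("import time:"):
            continue
        try:
            self_field, cum_field, name_field = line[len("import time:"):].split("|", 2)
            self_us, reported_us = int(self_field), int(cum_field)
        except ValueError:
            continue  # header line (or a malformed line)
        name_field = name_field.rstrip("\r\n")
        # One separator space, then two spaces per nesting level.
        depth = (len(name_field) - len(name_field.lstrip(" ")) - 1) // 2
        computed_us = self_us + pending.pop(depth + 1, 0)
        pending[depth] = pending.get(depth, 0) + computed_us
        rows.append((name_field.strip(), self_us, reported_us, computed_us))
    return rows


excerpt = [
    "import time: self [us] | cumulative | imported package",
    "import time:        33 |         33 |     _abc",
    "import time:       398 |        431 |   abc",
    "import time:       311 |        741 | io",
]
rows = recompute_cumulative(excerpt)
for name, self_us, reported_us, computed_us in rows:
    print(f"{name:>5}: reported {reported_us}, recomputed {computed_us}")
```

For the io excerpt, it recomputes 742 against the reported 741. This is the same one-microsecond difference as the hand calculation.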

Note that the import time of a package can differ from one script to another. Python caches imported modules in sys.modules, so dependencies that were already imported are not loaded again. They then do not appear under that package, nor count toward its cumulative time.
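A quick way to observe this caching is to time the same import twice. The sketch below writes a throwaway module with the hypothetical name demo_slow_module, whose top-level code sleeps for 0.2 s. It then imports the module twice: only the first import executes the module body.

```python
import importlib
import sys
import tempfile
import time
from pathlib import Path

# Throwaway module (hypothetical name) whose top-level code is deliberately slow.
tmp_dir = tempfile.mkdtemp()  # left on disk; fine for a demo
Path(tmp_dir, "demo_slow_module.py").write_text("import time\ntime.sleep(0.2)\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()  # make sure the finders notice the new file


def timed_import(name):
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start


first = timed_import("demo_slow_module")   # executes the module body
second = timed_import("demo_slow_module")  # found in sys.modules, not re-executed
print(f"first: {first:.3f}s, second: {second:.6f}s")
print("cached:", "demo_slow_module" in sys.modules)
```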

Reading the text output might be enough for you. If you prefer something visual, a tool called tuna can read this output and render it as an icicle plot, which makes the slowest imports easy to spot.
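If you would rather not install another tool, a minimal stdlib-only sketch can rank the imports from a saved report. It assumes the report was redirected to a file, since it goes to stderr (e.g. python -X importtime your-script.py 2> import.log). slowest_imports is a hypothetical helper name, and the sample reuses the argparse lines from the output above.

```python
import re

# self time, cumulative time, indentation, package name
LINE = re.compile(r"^import time:\s+(\d+) \|\s+(\d+) \| (\s*)(\S+)")


def slowest_imports(report_text, top=5, key="self"):
    """Return the `top` imports as (name, self_us, cumulative_us),
    sorted by self time (key="self") or cumulative time (any other key)."""
    rows = []
    for line in report_text.splitlines():
        match = LINE.match(line)
        if match:  # the header line has no numbers and is skipped
            self_us, cumulative_us, _indent, name = match.groups()
            rows.append((name, int(self_us), int(cumulative_us)))
    index = 1 if key == "self" else 2
    return sorted(rows, key=lambda row: row[index], reverse=True)[:top]


sample = """\
import time: self [us] | cumulative | imported package
import time:       698 |        698 |     enum
import time:        55 |         55 |       _sre
import time:       417 |        417 |         sre_constants
import time:       372 |        789 |       sre_parse
import time:       443 |       1286 |     sre_compile
import time:       323 |        323 |     copyreg
import time:       716 |       3021 |   re
import time:       725 |        725 |     locale
import time:       948 |       1673 |   gettext
import time:       986 |       5678 | argparse
"""

for name, self_us, cumulative_us in slowest_imports(sample, top=3):
    print(f"{name}: self {self_us} us, cumulative {cumulative_us} us")
```

To analyze a real run, pass the contents of the saved log instead of sample. Sorting by self time highlights modules that are slow on their own. Sorting by cumulative time (key="cumulative") highlights imports that are expensive once their dependencies are included.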

29 Feb 2020

Smarter people than you

How can you tell when you've reached the top of your job and that you won't find smarter people?

When people who challenge you bring arguments you've already considered. When the suggestions of others are stale or uninteresting. When you don't find excitement in your work. When you aren't challenged anymore. When others can't offer elegant solutions to the problems you are facing. When the problems you are facing have become so niche that only a few people on Earth can discuss them or help you solve them.

28 Feb 2020

Determining if you are a low performer

How can you tell if you are a low performer?

I always prefer to compare myself against my prior self rather than against others. Thus, I would consider myself a low performer if my throughput is lower than my past average. This may happen for many reasons. It may be because I'm learning something new, so I'm spending a good chunk of my time learning and less executing. It might also be because I'm trying different ideas to find the best one, since I'm working on something I've never worked on before.

It's generally easy for programmers to tell whether they've been more or less productive than the previous week. It is mostly based on feelings: you feel good when you are productive and less so when you're not making progress or are running into issues.

If you think and feel that you are performing poorly, start recording what you are working on more thoroughly. Identify when you start and finish working on a task, and when you get blocked, write down why. After a few weeks, look at what you wrote and assess what might be causing you to feel like a low performer. Is it because you're working on a task you are not good at? Is it because of a lack of motivation for the task you've been assigned?

With more information in hand to determine why you feel that you are a low performer, you will be able to devise a plan so that you can once again feel like a high performer.

28 Feb 2020

Pytest with test files with the same name

I have two test files with the same name and pytest complains. How do I make it work without changing the test filenames?

Example directory structure:

/path/to/project/tests
├── a/
│   └── test_a.py
└── b/
    └── test_a.py

Error message:

import file mismatch:
imported module 'test_a' has this __file__ attribute:
  /path/to/project/tests/a/test_a.py
which is not the same as the test file we want to collect:
  /path/to/project/tests/b/test_a.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules

Add an __init__.py file to each directory that contains test files sharing the same name. Technically, you only need an __init__.py file in one of the two directories. That directory then becomes a package, so its test module is imported under a package-qualified name (e.g. a.test_a) and no longer clashes with the other test_a module. Adding it to both directories simply prevents the issue from reappearing if you later add a third test_a.py file.

/path/to/project/tests
├── a/
│   ├── __init__.py
│   └── test_a.py
└── b/
    ├── __init__.py
    └── test_a.py

27 Feb 2020

Identifying python files with no coverage

I use pytest with coverage and I want to see the files that have no coverage.

It appears that pytest with pytest-cov will not list some of the files that are under namespace packages, while it works fine for files in regular packages (see PEP 420 on implicit namespace packages).

To fix this problem, one solution is to add __init__.py files in all of your directories in order to create regular packages.
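Adding the files by hand gets tedious in a deep tree. The sketch below lists every directory under a package root that holds Python files, directly or in a subdirectory, but has no __init__.py. It only creates the missing files when dry_run is False. The helper names are hypothetical. Point it at your package or tests directory rather than at a folder containing a virtualenv. Only use it if you don't intentionally rely on namespace packages, since it turns them into regular packages.

```python
from pathlib import Path


def missing_init_dirs(package_root):
    """Directories under package_root (inclusive) that hold Python files,
    directly or in a subdirectory, but have no __init__.py."""
    root = Path(package_root)
    needed = set()
    for py_file in root.rglob("*.py"):
        directory = py_file.parent
        # Every directory between the file and the package root must be a package.
        while True:
            needed.add(directory)
            if directory == root:
                break
            directory = directory.parent
    return sorted(d for d in needed if not (d / "__init__.py").exists())


def add_init_files(package_root, dry_run=True):
    """Create the missing __init__.py files; with dry_run=True only list them."""
    missing = missing_init_dirs(package_root)
    if not dry_run:
        for directory in missing:
            (directory / "__init__.py").touch()
    return missing
```

Running add_init_files("src/yourpackage") first shows what would change. Running it again with dry_run=False creates the files.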

If you are using PyCharm Professional, you can simply run your tests with coverage. This lets you identify all the files that currently have no coverage, as they appear with 0% coverage.