Profiling Python with cProfile (yrmichael.com)
53 points by ymichael on March 8, 2014 | 16 comments


I'd say you had it right the first time with the list comprehension. List comprehensions in Python are way faster than while/append loops because they're implemented in C. The issue is that you had the "not in", which is O(n) over a list (making the comprehension O(n^2)). If the documents are hashable, I'd suggest making results a set: "not in" is constant time on a set. If a document is something not hashable, you can make results a dictionary and get the same constant-time access.
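
For concreteness, a minimal sketch of the difference (the names and data here are made up for illustration, not the original code):

    # Hypothetical stand-ins for the author's documents and matched results
    all_docs = list(range(10000))
    matched = list(range(0, 10000, 2))

    # O(n^2): "not in" scans the whole list on every iteration
    slow = [doc for doc in all_docs if doc not in matched]

    # O(n): membership tests against a set are constant time on average
    matched_set = set(matched)
    fast = [doc for doc in all_docs if doc not in matched_set]

    assert slow == fast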

In Python, function/method calls are expensive, so avoid them at all costs inside tight loops.


If a document is something not hashable, you can make results a dictionary

Huh? Dictionary keys have to be hashable. Oh, maybe you mean coming up with a hashable key for each result, then just taking the values at the end. Something like:

    # A dict mapping each document's key to the document
    all_docs = dict((doc.key, doc) for doc in dictionary.all_docs())

    # A set of the keys of documents matching `operand`
    results = set(doc.key for doc in get_results(operand, dictionary, pfile, force_list=True))

    return [all_docs[key] for key in all_docs if key not in results]

(This loses the "sorted" property the author has in the original comments. If that's important, just make all_docs an OrderedDict.)


Yeah, that was what I was talking about. Without typing up an example it's cumbersome to explain. But the idea is to use a constant-time-access data structure. The easiest would be a set, but if (as I suspect) the documents are dicts, you'd need to look up by a hashable key.

Thanks for giving the example


It may be faster to change the last line to:

    return [all_docs[key] for key in set(all_docs) - results]


Thumbs up to this guy.

Function calls in tight loops are a killer. Property and global lookups are also "much" less efficient than stuff in the local scope.
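
A common way around it is to bind the lookups to locals once, outside the loop. A rough sketch (the function names here are made up for illustration):

    import math

    # Global and attribute lookups happen on every iteration
    def slow(values):
        out = []
        for v in values:
            out.append(math.sqrt(v))  # resolves `math`, `.sqrt`, and `out.append` each time
        return out

    # Hoist the lookups into locals before the loop
    def fast(values):
        out = []
        sqrt = math.sqrt     # one attribute lookup
        append = out.append  # one bound-method lookup
        for v in values:
            append(sqrt(v))
        return out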


For fairly small values of "implemented in C". In CPython 2.7.5, the list comprehension

    [doc for doc in all_docs if doc not in results]
generates this bytecode (output by the dis module):

          0 BUILD_LIST               0
          3 LOAD_GLOBAL              0 (all_docs)
          6 GET_ITER
    >>    7 FOR_ITER                24 (to 34)
         10 STORE_FAST               0 (doc)
         13 LOAD_FAST                0 (doc)
         16 LOAD_GLOBAL              1 (results)
         19 COMPARE_OP               7 (not in)
         22 POP_JUMP_IF_FALSE        7
         25 LOAD_FAST                0 (doc)
         28 LIST_APPEND              2
         31 JUMP_ABSOLUTE            7
         34 (whatever comes next in the program)
Let's go through it. Instruction 0 (BUILD_LIST) makes an empty list and puts it on the stack. Instruction 3 (LOAD_GLOBAL) does a global variable lookup and puts it on the stack. (If this occurs in a function it would be a local variable lookup, but this isn't important.) Then we start iterating over an iterable with GET_ITER and FOR_ITER. In the body of the loop we use STORE_FAST to store the topmost stack value into the local variable "doc", then LOAD_FAST to grab its value and put it atop the stack, then LOAD_GLOBAL to get "results", then evaluate the "not in" with a single COMPARE_OP bytecode, then either go to the beginning of the loop, or load doc onto the stack again and then append it to the list we're building. And then go to the beginning of the loop.

Now, let's compare with the obvious non-list-comprehension version:

    docs = []
    for doc in all_docs:
        if doc not in results:
            docs.append(doc)
This generates somewhat similar bytecode:

         0 BUILD_LIST               0
         3 STORE_FAST               0 (docs)

         6 SETUP_LOOP              42 (to 51)
         9 LOAD_GLOBAL              0 (all_docs)
        12 GET_ITER
   >>   13 FOR_ITER                34 (to 50)
        16 STORE_FAST               1 (doc)

        19 LOAD_FAST                1 (doc)
        22 LOAD_GLOBAL              1 (results)
        25 COMPARE_OP               7 (not in)
        28 POP_JUMP_IF_FALSE       13

        31 LOAD_FAST                0 (docs)
        34 LOAD_ATTR                2 (append)
        37 LOAD_FAST                1 (doc)
        40 CALL_FUNCTION            1
        43 POP_TOP
        44 JUMP_ABSOLUTE           13
        47 JUMP_ABSOLUTE           13
   >>   50 POP_BLOCK
   >>   51 (whatever comes next in the program)
The main difference here is that there's a lot more loading and saving of variables (e.g. the "docs" variable which was implicit in the list comprehension version), and things like the docs.append() method have to be loaded each time through the loop and then called, which slows things down some compared to using a single LIST_APPEND opcode.

I'm not really going anywhere with this; I just think that interpreter internals are fun.
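
(If you want to poke at this yourself, the dis module in the standard library will disassemble any code object. Opcodes and offsets vary between CPython versions, and on Python 3 the comprehension body lives in a nested code object, so the output won't match the 2.7 listing above exactly.)

    import dis

    # Compile the comprehension as a standalone expression and disassemble it
    code = compile("[doc for doc in all_docs if doc not in results]", "<example>", "eval")
    dis.dis(code)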


Nice! I knew comprehensions were faster, but apparently was mistaken as to why.


er. "as long as you use CPython", yes. If you use PyPy (which you really should for such an example), there are no such restrictions.


I'm glad that there wasn't a "complain about the GIL" or "write everything in Go because Python is too slow" step in this optimization workflow.


Very helpful to have a UI for digging through profiler results. I use the Django debug toolbar, but I think your profiler will be incredibly helpful for finding bottlenecks. Thanks for sharing.
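
For anyone who hasn't used cProfile directly, the standard-library workflow this kind of UI sits on top of looks roughly like this (a sketch; work() is just a stand-in for your own code):

    import cProfile
    import pstats

    def work():
        # stand-in for the code you actually want to profile
        return sum(i * i for i in range(100000))

    # Dump profile data to a file...
    cProfile.run("work()", "work.prof")

    # ...then load it and print the 10 most expensive calls by cumulative time
    stats = pstats.Stats("work.prof")
    stats.sort_stats("cumulative").print_stats(10)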


Then you might really enjoy runsnakerun: http://www.vrplumber.com/programming/runsnakerun/


kernprof and line_profiler are also helpful.
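
Typical usage is something like the sketch below: run it with `kernprof -l -v myscript.py`. kernprof injects a `profile` decorator into builtins; the fallback keeps the script runnable with plain `python` too.

    try:
        profile  # defined by kernprof when run under it
    except NameError:
        def profile(func):  # no-op fallback so the script runs standalone
            return func

    @profile
    def hot_loop(items):
        total = 0
        for item in items:
            total += item * item
        return total

    if __name__ == "__main__":
        hot_loop(range(100000))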


You'll perhaps take this as snark, but I am not going to read light gray text on a white background with a poorly rendered font (Windows 7, Firefox 27, 47-year-old eyes).


For less snark and more helpfulness, consider removing "I am not going to read". I think you'll be happier and less likely to be perceived as snarky.


FWIW, I've had good luck with using the "select all" keyboard shortcut in those circumstances. You have an additional advantage in that Firefox has an option [1] to deactivate a page's stylesheets (one at a time, IIRC), which can also get you back to normal styling without a great deal of effort.

[1] Or it used to, but without it in front of me I can't promise it still does.


Ahh, sorry about that. I'm like the most terrible UI person ever... Thanks for the feedback; made the text darker.



