Manually dealing with threads and processes is useful if you want to build a framework or a very complex workflow. But chances are you just want to run stuff concurrently, in the background.
In that case (which is most people's case), you really want to use one of the stdlib pools: they take care of synchronization, serialization, communication over queues, worker life cycle, task distribution, etc. for you.
Plus, in that case, waiting for results is super easy:
import time
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

# the work to distribute
def hello():
    seconds = random.randint(0, 5)
    print(f'Hi {seconds}s')
    time.sleep(seconds)
    print(f'Bye {seconds}s')
    return seconds

# max concurrency is 2
executor = ThreadPoolExecutor(max_workers=2)

# submit the work
a = executor.submit(hello)
b = executor.submit(hello)

# and here we wait for results
for future in as_completed((a, b)):
    print(future.result())
Want multiple processes instead? It's the same API:
import time
import random
from concurrent.futures import ProcessPoolExecutor, as_completed

def hello():
    seconds = random.randint(0, 5)
    print(f'Hi {seconds}s')
    time.sleep(seconds)
    print(f'Bye {seconds}s')
    return seconds

# Don't forget this for processes, or you'll get in trouble
if __name__ == "__main__":
    executor = ProcessPoolExecutor(max_workers=2)
    a = executor.submit(hello)
    b = executor.submit(hello)
    for future in as_completed((a, b)):
        print(future.result())
This is Python. Don't make your life harder than it needs to be.
This example still involves a lot of manual work. It's often even easier:
from concurrent.futures import ProcessPoolExecutor
import string

def hello() -> int:
    seconds = random.randint(0, 5)
    print(f'Hi {seconds}s')
    time.sleep(seconds)
    print(f'Bye {seconds}s')
    return seconds

# Don't forget this for processes, or you'll get in trouble
if __name__ == "__main__":
    inputs = list(string.printable)
    results = []
    # You can sub out ProcessPool with ThreadPool.
    with ProcessPoolExecutor() as executor:
        results += executor.map(hello, inputs)
    [print(s) for s in results]
The code needs some minor changes (missing imports, and a parameter on hello() to receive the mapped inputs) to make it runnable:
from concurrent.futures import ProcessPoolExecutor
import string
import random
import time

def hello(output) -> int:
    seconds = random.randint(0, 5)
    print(f'Hi {output} {seconds}s')
    time.sleep(seconds)
    print(f'Bye {seconds}s')
    return seconds

# Don't forget this for processes, or you'll get in trouble
if __name__ == "__main__":
    inputs = list(string.printable)
    results = []
    # You can sub out ProcessPool with ThreadPool.
    with ProcessPoolExecutor() as executor:
        results += executor.map(hello, inputs)
    [print(s) for s in results]
EDIT: I also struggle with proper code indentation on HN.
Quintessential HN: the top comment totally disapproves of the posted article. Thanks! I was just having an issue with this and was excited to see some gains from the linked article, and then finding out there's an even better way is terrific!
If you wouldn't mind going into a little more detail about what you're doing I'd really appreciate it!
That is true. Python has better ways to deal with concurrency. As I wrote elsewhere in the comments, I started reading up on asyncio, but found that, for a newbie, this article is good for grasping the basic concepts.
Only if you really need to. For 99% of network needs, using pool executors is simpler and easier than asyncio, and it's one less only-sort-of-useful thing to have to learn.
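For what it's worth, here's a minimal sketch of the kind of network task that covers: fetching a few pages concurrently with a thread pool. The URLs and the fetch() helper are made up for illustration; only the stdlib is used.

from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# placeholder URLs, just for illustration
urls = ['https://example.com', 'https://example.org']

def fetch(url):
    # download the page and report its size
    with urlopen(url) as response:
        return url, len(response.read())

# threads are a good fit here: the work is I/O bound
with ThreadPoolExecutor(max_workers=10) as executor:
    for url, size in executor.map(fetch, urls):
        print(f'{url}: {size} bytes')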
It should make at least the most common use case of asyncio way easier.
The biggest problem is that I have yet to see a tutorial that properly explains how to use asyncio.
They all talk about the loop, futures, etc.
First, they should all tell you that asyncio should only be used with Python 3.7+. Don't even bother before that. Not that it's impossible, but I've done it, and it's not worth the trouble.
Then, all tutorials should mention wait() or gather(), which are the most important functions of the whole framework. It kills me to never see those explained.
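For illustration, here's a minimal sketch of gather(), reusing the hello() style from the pool examples above (asyncio.run() requires Python 3.7+):

import asyncio
import random

async def hello():
    seconds = random.randint(0, 5)
    print(f'Hi {seconds}s')
    await asyncio.sleep(seconds)
    print(f'Bye {seconds}s')
    return seconds

async def main():
    # gather() schedules both coroutines concurrently and
    # returns their results in the order they were passed in
    results = await asyncio.gather(hello(), hello())
    print(results)

asyncio.run(main())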
With just that knowledge, you can script happily at least as easily as with the pools I just showcased.
Now, I really hope that we are going to get trio's nurseries imported into the stdlib. Yury Selivanov is working on it from the uvloop side, so I have good hopes.
I did a proof of concept as an asyncio lib and it works decently, but having a standard would be much, much better.
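For context, the nursery pattern under discussion looks roughly like this in trio itself (a minimal sketch; the child() task is made up):

import trio

async def child(name, seconds):
    await trio.sleep(seconds)
    print(f'{name} done after {seconds}s')

async def main():
    # the nursery block doesn't exit until every task
    # it started has finished (or raised)
    async with trio.open_nursery() as nursery:
        nursery.start_soon(child, 'a', 1)
        nursery.start_soon(child, 'b', 2)
    print('all children finished')

trio.run(main)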
I don't. I have a huge knowledge base I keep on my computers for those kinds of things. E.g., this snippet is almost verbatim from a file on my laptop that I wrote months ago and kept so I don't have to write it again.