Introduction to Multithreading in Python
Got slow Python code?
Don't want to refactor to some other "low-level" language?
Want to impress your peers?
We've got all that here!
Now that you are actually here, today we will cover ways to optimize our Python code specifically using Multithreading and
In this article, you will learn:
- Different methods to speed up Python
- What concurrency and parallelism are
- How to choose the appropriate speed-up method
- How to use the Python
This article assumes that you know the basics of Python and have at least Python 3.7 installed to run the examples, since they rely on asyncio.run(), which was added in Python 3.7.
How to Speed Up Python Code?
Let's imagine you have slow Python code.
What do you do in this situation?
First of all, profile your code so that you have raw numbers to compare against after applying optimizations.
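As a rough sketch of what that profiling step could look like, here is one way to do it with the standard library's cProfile and pstats modules. The functions slow_sum_of_squares and profile_call are made-up names for illustration, not something from a real project.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Hypothetical workload: builds a full list before summing it."""
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the five most expensive entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(slow_sum_of_squares, 100_000)
print(result)
print(report)
```

Keep the report from the first run around. After each optimization you can run the same call again and compare the numbers instead of guessing.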
Second, if your code contains business logic, make sure that it is well tested, because we don't want our optimizations breaking things.
Third, try to optimize the code itself, for example by using proper algorithms and data structures.
You can check out similar optimizations here:
If that is still not enough and you don't want to rewrite your code in a faster low-level language, then let's optimize our code to use more of the hardware.
But what does that actually mean?
In layman's terms, it's multitasking.
But there are two different ways of doing that.
Concurrency is fake multitasking, meaning that you don't run things simultaneously. Instead, tasks take turns, which makes it look like they are running simultaneously. But you might ask: what do I "run" exactly? It differs based on which technique you use.
The two options are threads and processes.
If you look at it from a high-level perspective, they look the same. They are simply blocks of code waiting to be run. But if you dig deeper, you will find that they are very different.
A process is an instance of a running program with all its code, memory, data and other resources, while a thread is a sequence of code that is executed within the scope of a process. You can have multiple threads running in a single process.
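To make the "threads live inside a process" idea a bit more concrete, here is a small sketch using the standard threading module. The names shared_results and record are invented for this example. Every thread writes to the same list and reports the same process id, because they all share one process.

```python
import os
import threading

shared_results = []               # lives in the process, visible to every thread
results_lock = threading.Lock()   # protects the shared list


def record(worker_id):
    # Each thread stores its id together with the id of the process it runs in.
    with results_lock:
        shared_results.append((worker_id, os.getpid()))


threads = [threading.Thread(target=record, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

process_ids = {pid for _, pid in shared_results}
print(len(shared_results), "entries written from", len(process_ids), "process")
```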
Multithreading in Python
Multithreading in Python is somewhat "different" because of the Python Global Interpreter Lock or GIL.
The GIL allows only one thread to execute Python code at a time, which means that CPU-bound operations see no real benefit from multithreading in Python.
On the other hand, if your bottleneck comes from Input/Output (IO), then you would benefit from multithreading in Python, because a thread that is waiting on IO releases the GIL and lets another thread run.
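Here is a minimal sketch of that IO-bound case. It assumes that time.sleep is a fair stand-in for a slow network or disk call, and fake_download is a hypothetical helper, not a real API. The same work is done first sequentially and then with a thread pool from concurrent.futures.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item):
    """Stand-in for a network call; time.sleep releases the GIL while waiting."""
    time.sleep(0.2)
    return item * 2


items = [1, 2, 3, 4]

# One call after another: the waits add up.
start = time.perf_counter()
sequential = [fake_download(i) for i in items]
sequential_seconds = time.perf_counter() - start

# One thread per item: the waits overlap.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(fake_download, items))
threaded_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f}s, threaded: {threaded_seconds:.2f}s")
```

If you swap the sleep for a pure-Python calculation, the threaded version should stop being faster, because then the threads compete for the GIL instead of waiting.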
There are two ways to implement multithreading in Python: the threading library and the asyncio library.
So what's the difference between the two?
The threading library creates actual OS-level threads, but only one can run at a time due to Python's GIL. On the other hand, asyncio uses the concept of coroutines, which are much more lightweight than threads. They take less memory, and it takes much less time to switch between coroutines. However, you need to program specifically for asyncio and use libraries that leverage asyncio as well. Threading is less scalable, but you get to keep your "old" libraries and style of programming.
Use asyncio when you can, threading when you must.
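To get a feel for how lightweight coroutines are, here is a small sketch that starts a thousand of them at once. wait_and_return and run_many are made-up names, and asyncio.sleep stands in for real IO.

```python
import asyncio
import time


async def wait_and_return(task_id):
    # await hands control back to the event loop while this coroutine waits.
    await asyncio.sleep(0.1)
    return task_id


async def run_many(count):
    # gather runs all coroutines concurrently and keeps the results in order.
    return await asyncio.gather(*(wait_and_return(i) for i in range(count)))


start = time.perf_counter()
results = asyncio.run(run_many(1000))
elapsed = time.perf_counter() - start
print(f"{len(results)} coroutines finished in {elapsed:.2f}s")
```

Run one after another, those waits would add up to about 100 seconds. Starting a thousand OS threads for the same job would work too, but each thread costs noticeably more memory than a coroutine.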
Parallelism is true multitasking, meaning that you are literally running processes simultaneously. This is done using multiprocessing, where you use multiple CPU cores to distribute tasks accordingly. This doesn't "break" up the code into parts; each core has a complete running copy of your program.
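A minimal sketch of that idea with the standard multiprocessing module could look like the following. sum_of_squares and run_in_processes are hypothetical names, and the workload is just a placeholder for real CPU-heavy code.

```python
import multiprocessing


def sum_of_squares(n):
    """CPU-bound stand-in: pure Python arithmetic that would hold the GIL."""
    return sum(i * i for i in range(n))


def run_in_processes(inputs):
    # Each worker is a separate Python process with its own interpreter and GIL.
    with multiprocessing.Pool() as pool:
        return pool.map(sum_of_squares, inputs)


if __name__ == "__main__":
    # The guard stops worker processes from re-running this block
    # when they import the module.
    print(run_in_processes([10, 1_000_000, 2_000_000]))
```

Starting processes and sending data between them has a cost, so this tends to pay off only when each task does a meaningful amount of work.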
So which method do I choose?
Like everything on the internet, it depends. But it's pretty simple figuring out which method to use:
- If you have a CPU-bound problem, then you would benefit from running on multiple cores, hence the Python multiprocessing library will help.
- If you have an IO-bound problem, then use the asyncio library, or threading if asyncio is not compatible.
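When most of your code can be asynchronous but one dependency only offers blocking calls, one possible middle ground is to push just that call into a thread. The sketch below assumes a hypothetical blocking_read function standing in for such a library and uses the event loop's run_in_executor method.

```python
import asyncio
import time


def blocking_read(name):
    """Hypothetical stand-in for a library call with no asyncio support."""
    time.sleep(0.2)
    return f"{name}: done"


async def main():
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default thread pool executor.
    futures = [
        loop.run_in_executor(None, blocking_read, name)
        for name in ("a", "b", "c")
    ]
    return await asyncio.gather(*futures)


results = asyncio.run(main())
print(results)
```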
Asyncio Code Example
asyncio is pretty simple, if you ever used
Let's start with a classic, a simple hello world program:
    import asyncio

    async def main():
        print('hello')
        await asyncio.sleep(1)
        print('world')

    asyncio.run(main())
We put async before our main function to tell Python that this is an asynchronous function. Inside the main function, we put await before the sleep call to tell Python to wait until the sleep finishes.
Finally, we call the function using asyncio.run(), which starts an event loop and runs main() until it completes.
If we run the program, we get this response:

    hello
    world
This is a pretty simple example, so let's look at something more "real-world".
Let's imagine you were tasked with scraping a website, but a synchronous Python web client is pretty slow because it waits for each response before sending the next request. Doing it asynchronously is much faster!
We will use the asynchronous HTTP client from the aiohttp library.
    import aiohttp
    import asyncio

    async def main():
        async with aiohttp.ClientSession() as session:
            async with session.get('http://python.org') as response:
                print("Status:", response.status)
                print("Content-type:", response.headers['content-type'])
                html = await response.text()
                print("Body:", html[:15], "...")

    asyncio.run(main())
These are the bare basics of asyncio. If you want to learn more, I would recommend checking out this article and tech talk.
Benefits and Downsides of Multithreading
Everything in software is relative, meaning that there are pros and cons to everything, and it's up to us as software engineers to decide whether a technology is useful for our use case.
Multithreading in Python is no different.
Let's start with the benefits.
If it's an IO-bound problem, multithreading will significantly improve performance.
That's pretty much it?
Well, asynchronous programming is a lot different from sequential programming. For some domains, it might be very beneficial to switch over to asynchronous programming, for others not so much.
The downside of multithreading is that it makes code a lot more complicated, and complicated code is harder to maintain. Another big thing is that it's hard to test, because timing-dependent bugs make tests flaky, and hard to debug.
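A classic example of that extra complexity is shared state. In the sketch below, SafeCounter is an invented class for illustration. Several threads increment the same counter, and the lock is what keeps the final number correct.

```python
import threading


class SafeCounter:
    """Counter that several threads can increment without losing updates."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, the read-modify-write in "+= 1" can interleave
        # between threads and silently drop some increments.
        with self._lock:
            self.value += 1


def add_many(counter, times):
    for _ in range(times):
        counter.increment()


counter = SafeCounter()
threads = [
    threading.Thread(target=add_many, args=(counter, 10_000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)
```

The tricky part is that a version without the lock may still print the right number on most runs, which is exactly the kind of flakiness that makes threaded code hard to test.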
At the end of the day, it all depends on your use case, so think wisely before committing.
Speeding up Python code can be a painful experience.
But at least now you know a trick or two on how to speed things up.
Today you learned:
- Multithreaded programming is when a program uses multiple threads to improve performance.
- Due to Python's GIL, only one thread can run Python code at a time, which is why multithreading is regarded more as asynchronous programming in Python.
- If you have a CPU-bound problem, then your best bet is to use multiprocessing, which bypasses the GIL by running separate processes.
- If you got an IO problem, then use the
- Asynchronous programming is pretty complicated, so think a lot before committing to use it.
I hope you enjoyed this article. If you have any questions, feel free to reach out to me on Twitter.
Thanks for reading