Introduction to Multithreading in Python


Got slow Python code?

Don't want to refactor to some other "low-level" language?

Want to impress your peers?

We got all that here!!


Now that you are here, today we will cover ways to speed up our Python code, specifically with multithreading and asyncio.

In this article, you will learn:

  • Different methods to speed up Python
  • What concurrency and parallelism are
  • How to choose the appropriate speed-up method
  • How to use the Python asyncio library

Prerequisites

This article assumes that you know the basics of Python and have at least Python 3.7 installed to run the examples, since they rely on asyncio.run, which was added in Python 3.7.

How to Speed Up Python Code?

Let's imagine you have slow Python code.

What to do in this situation?

First of all, profile your code so that you have baseline numbers to compare against after applying optimizations.
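As a rough sketch of what collecting that raw data could look like, here is one way to do it with the standard library's cProfile and pstats modules. The slow_sum function and the profile_call helper are made up for illustration; swap in your own slow function.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical stand-in for a slow function in your own code."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Render the stats, sorted by cumulative time, into a string.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_call(slow_sum, 100_000)
    print(result)
    print(report)
```

Keep the report from before your changes around, so you can tell whether an optimization actually helped.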

Second, if your code encompasses some business logic, make sure that it is well tested, because we don't want our optimizations breaking things.

Third, try to optimize the code itself, for example by using appropriate algorithms and data structures.
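To give one small example of a data structure choice, the sketch below times a membership check on a list against the same check on a set, using the standard timeit module. The sizes and names are arbitrary and only there for illustration.

```python
import timeit

items_list = list(range(100_000))
items_set = set(items_list)  # same values, but with hash-based lookup

target = 99_999  # worst case for the list: it is the last element

# A list compares elements one by one; a set does a single hash lookup.
list_time = timeit.timeit(lambda: target in items_list, number=200)
set_time = timeit.timeit(lambda: target in items_set, number=200)

print(f"list lookup: {list_time:.4f}s")
print(f"set lookup:  {set_time:.4f}s")
```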

You can check out similar optimizations here:

How to Make Python Code Run Incredibly Fast - KDnuggets
In this article, I have explained some tips and tricks to optimize and speed up Python code.

If that's still not enough and you don't want to rewrite your code in a faster low-level language, then let's optimize our code to use more of the hardware.

But what does that actually mean?

In layman's terms, it's multitasking.

But there are two different ways of doing that.

Concurrency

Source: https://luminousmen.com/post/concurrency-and-parallelism-are-different

Concurrency is fake multitasking, meaning that you don't run things simultaneously. Instead, tasks take turns, which makes it look like they are running simultaneously. But you might ask, what exactly do I "run"? It depends on which technique you use: threads or processes.

If you look at them from a high-level perspective, they look the same: they are simply blocks of code waiting to be run. But if you dig deeper, you will find that they are very different.

Source: https://sites.google.com/site/sureshdevang/thread-vs-process

A process is an instance of a running program with all its code, memory, data and other resources, while a thread is a sequence of code that is executed within the scope of a process. You can have multiple threads running in a single process, hence the name multithreaded programming.
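A small sketch can make the "threads live inside a process" idea more concrete. In the example below (the names shared and worker are made up), every thread reports the same process ID and writes to the same list, because they all share the memory of one process.

```python
import os
import threading

shared = []  # lives in the process's memory, visible to every thread


def worker(name):
    # Every thread sees the same process ID and appends to the same list.
    shared.append((name, os.getpid()))


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared)
```

The order of the entries can vary between runs, but the process ID in each entry is always the same.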

Multithreading in Python

Different techniques of concurrency in Python. Source: Adapted from Anderson 2019 

Multithreading in Python is somewhat "different" because of the Python Global Interpreter Lock or GIL.

In standard CPython, the GIL allows only one thread to execute Python bytecode at a time. This means that CPU-bound operations see little to no benefit from multithreading in Python.

On the other hand, if your bottleneck comes from Input/Output (IO) then you would benefit from multithreading in Python.
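Here is a rough sketch of the IO-bound case, assuming time.sleep is a fair stand-in for waiting on a network or disk call. The fake_io function is hypothetical; the thread pool comes from the standard concurrent.futures module.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id):
    """Pretend to wait on a network or disk call (hypothetical stand-in)."""
    time.sleep(0.2)  # sleeping releases the GIL, like real blocking IO
    return task_id


def run_sequential(n):
    start = time.perf_counter()
    results = [fake_io(i) for i in range(n)]
    return results, time.perf_counter() - start


def run_threaded(n):
    start = time.perf_counter()
    # One worker per task, so all the waits overlap.
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(fake_io, range(n)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, seq = run_sequential(5)
    _, thr = run_threaded(5)
    print(f"sequential: {seq:.2f}s, threaded: {thr:.2f}s")
```

The threaded version should finish in roughly the time of a single wait, because the threads spend their time waiting rather than competing for the GIL.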

But there are two ways to implement multithreading in Python: the threading library and the asyncio library.

But what's the difference between the two?

The threading library creates actual OS-level threads, but only one of them can execute Python code at a time due to Python's GIL. On the other hand, asyncio uses the concept of coroutines, which are much more lightweight than threads. They take less memory, and it takes much less time to switch between coroutines. However, you need to program specifically for asyncio and use libraries that support asyncio as well. Threading is less scalable, but you get to keep your "old" libraries and style of programming.
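The two approaches can also be mixed. As a sketch under the assumption that you are stuck with a blocking library, the example below hands a blocking function to the event loop's default thread pool with loop.run_in_executor, so the rest of the program can stay in asyncio. The blocking_call function is a made-up placeholder.

```python
import asyncio
import time


def blocking_call(n):
    """Hypothetical stand-in for an "old" blocking library function."""
    time.sleep(0.1)
    return n * 2


async def main():
    loop = asyncio.get_running_loop()
    # Passing None uses the event loop's default thread pool executor.
    futures = [loop.run_in_executor(None, blocking_call, i) for i in range(3)]
    # gather returns the results in the same order as the inputs.
    return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(main()))
```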

In general:

Use asyncio when you can, threading when you must.

Parallelism

Source: https://avishkabalasuriya980330.medium.com/python-multiprocessing-for-beginners-cde6bc520217

Parallelism is true multitasking, meaning that you are literally running processes simultaneously. This is done using multiprocessing, where tasks are distributed across multiple CPU cores. This doesn't "break" up the code into parts. Instead, each worker process is a complete running copy of your program, and the operating system can run those processes on different cores.
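A minimal sketch of this with the standard multiprocessing module is shown below. The count_primes function is a made-up example of CPU-bound work, and the limits are arbitrary.

```python
import multiprocessing


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000, 50_000]
    # By default, Pool() starts one worker process per available CPU.
    with multiprocessing.Pool() as pool:
        results = pool.map(count_primes, limits)
    for limit, primes in zip(limits, results):
        print(f"{primes} primes below {limit}")
```

The `if __name__ == "__main__":` guard matters here: on platforms that start workers by importing the main module, leaving it out would make every worker try to create its own pool.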

So which method do I choose?

Like everything on the internet, it depends. But figuring out which method to use is pretty simple:

  • If you have a CPU-bound problem, then you would benefit from running on multiple cores, so the Python multiprocessing library will help.
  • If you have an IO-bound problem, then use asyncio, or the threading library if your code or libraries are not compatible with asyncio.

Asyncio Code Example

Using asyncio is pretty simple. If you have ever used async/await in JavaScript, the syntax is almost the same.

Let's start with a classic, a simple hello world program:

import asyncio

async def main():
	print('hello')
	await asyncio.sleep(1)
	print('world')

asyncio.run(main())

We declared async before our main function to tell Python that this is an asynchronous function (a coroutine). Inside the main function, we put await before the call to sleep to pause main until the sleep finishes. While main is paused, the event loop is free to run other tasks.

Finally, we call the function using the run method that asyncio provides.

If we run the program, we get this output:

hello
world
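With a single coroutine there is nothing else to run during the sleep, so the benefit is not visible yet. As a small sketch of where it shows up, the example below runs two coroutines together with asyncio.gather. The say_after helper is made up for illustration.

```python
import asyncio
import time


async def say_after(delay, word):
    await asyncio.sleep(delay)  # yields control to the event loop while waiting
    return word


async def main():
    start = time.perf_counter()
    # Both coroutines wait at the same time, so the total is about 0.5s, not 1s.
    words = await asyncio.gather(say_after(0.5, "hello"), say_after(0.5, "world"))
    elapsed = time.perf_counter() - start
    return words, elapsed


if __name__ == "__main__":
    words, elapsed = asyncio.run(main())
    print(words, f"{elapsed:.2f}s")
```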

This is a pretty simple example, so let's look at something more "real-world".

Let's imagine you were tasked to scrape a website, but using a synchronous Python web client is pretty slow. Doing it asynchronously is much faster!

We will use the asynchronous HTTP client from the aiohttp library, a third-party package that you can install with pip install aiohttp.

import aiohttp
import asyncio

async def main():

    async with aiohttp.ClientSession() as session:
        async with session.get('http://python.org') as response:

            print("Status:", response.status)
            print("Content-type:", response.headers['content-type'])

            html = await response.text()
            print("Body:", html[:15], "...")

asyncio.run(main())
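When you scrape many pages, you usually want to limit how many requests are in flight at once. The sketch below shows one common way to do that with asyncio.Semaphore. To keep it runnable without a network connection, fake_fetch is a hypothetical stand-in for the aiohttp request, and the URLs are placeholders.

```python
import asyncio


async def fake_fetch(url):
    """Hypothetical stand-in for an aiohttp request; no network involved."""
    await asyncio.sleep(0.05)
    return f"<html>{url}</html>"


async def fetch_limited(url, semaphore):
    # At most max_concurrent coroutines are inside this block at any moment.
    async with semaphore:
        return await fake_fetch(url)


async def scrape(urls, max_concurrent=3):
    semaphore = asyncio.Semaphore(max_concurrent)
    # gather keeps the results in the same order as the input URLs.
    return await asyncio.gather(*(fetch_limited(u, semaphore) for u in urls))


if __name__ == "__main__":
    urls = [f"http://example.com/page/{i}" for i in range(10)]
    pages = asyncio.run(scrape(urls))
    print(len(pages), "pages fetched")
```

In a real scraper you would replace the body of fake_fetch with a session.get call like the one in the aiohttp example.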

These are the bare basics of asyncio. If you want to learn more, I would recommend checking out this article and tech talk.

Async python in real life 🐍🔀
Await Async Python applied with real examples. I show a slow API server and a slow database, and explain why async is not parallel but concurrent....
Advanced Asyncio: Solving Real World Production Problems
By building a simplified chaos monkey service, we will walk through how to create a good foundation for an asyncio-based service, including graceful shutdown...

Benefits and Downsides of Multithreading

Everything in software is relative, meaning that there are pros and cons to everything, and it's up to us as software engineers to decide whether a technology is useful for our use case.

Multithreading in Python is no different.

Let's start with the benefits.

If it's an IO-bound problem, multithreading will significantly improve performance.

That's pretty much it?

Well, asynchronous programming is a lot different from sequential programming. For some domains, it might be very beneficial to switch over to asynchronous programming, for others not so much.

The downside of multithreading is that it makes code a lot more complicated, and complicated code is harder to maintain. Another big issue is that multithreaded code is hard to test, because bugs that depend on timing make tests flaky, and hard to debug.
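A typical source of that flakiness is several threads updating the same data. As a sketch, the example below protects a shared counter with threading.Lock. The names counter, increment and run are made up for illustration.

```python
import threading

counter = 0
lock = threading.Lock()


def increment(times):
    global counter
    for _ in range(times):
        # Without the lock, the read-modify-write below can interleave
        # between threads and silently lose updates.
        with lock:
            counter += 1


def run(num_threads=4, times=10_000):
    """Reset the counter, run the threads to completion, return the total."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=increment, args=(times,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())  # prints 40000
```

The extra care shown here (deciding what is shared, and guarding every access to it) is exactly the kind of complexity that sequential code does not need.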

At the end of the day, it all depends on your use case, so think wisely before committing.  

Conclusion

Speeding up Python code can be a painful experience.

But at least now you know a trick or two on how to speed things up.

Today you learned:

  • Multithreaded programming is when the program utilizes multiple threads to improve performance.
  • Due to Python's GIL, only one thread can execute Python code at a time, which is why multithreading in Python is regarded more as concurrent (asynchronous) programming than as parallel programming.
  • If you have a CPU-bound problem, then your best bet is to use multiprocessing, which bypasses the GIL.
  • If you have an IO-bound problem, then use the asyncio library, or threading when asyncio is not an option.
  • Asynchronous programming is pretty complicated, so think carefully before committing to it.

I hope you enjoyed this article. If you have any questions, feel free to reach out to me on Twitter.

Thanks for reading
