Numba is a powerful JIT (Just-In-Time) compiler used to accelerate large numerical calculations in Python.
It uses the industry-standard LLVM library to compile Python code to optimized machine code at runtime.
Numba enables certain numerical algorithms in Python to reach the speed of compiled languages like C or FORTRAN.
It is an easy-to-use compiler that has several advantages such as:
- Optimizing scientific code – Numba can be used along with NumPy to optimize the performance of mathematical calculations. For different types of numerical algorithms, arrays and layouts used, Numba generates specially optimized code for better performance.
- Use across various platform configurations – Numba is tested and maintained across 200 platform configurations. It offers great flexibility as the main code can be written in Python while Numba handles the specifics for compilation at runtime. It supports Windows/Mac/Linux OS, Python 3.7-3.10, and processors such as Intel and AMD x86.
- Parallelization – Numba can be used for running NumPy on multiple cores and to write parallel GPU algorithms in Python.
Python is used across a variety of disciplines such as Machine Learning, Artificial Intelligence, Data Science, etc., and across various industries such as finance, healthcare, etc.
Using large data sets is the norm in such disciplines and Numba can help address the slow runtime speed due to the interpreted nature of Python.
Installing Numba
You can install Numba using pip by running pip install numba in your terminal.
If pip on your system is invoked as pip3 (for Python 3), use pip3 install numba instead.
All the dependencies required by Numba are installed along with it. You can also install it using conda, with conda install numba.
If you need to install Numba from source, clone the repo with git clone https://github.com/numba/numba.git and install it by running the following command from the cloned directory:
python setup.py install
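To confirm that the installation worked, one option is to query the installed package versions. The sketch below is a minimal check that uses only the standard library; the helper name installed_version is made up for this illustration, and llvmlite is listed because Numba depends on it.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing.

    Hypothetical helper for this illustration; it is not part of Numba.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Numba relies on llvmlite and NumPy, so all three should show a version.
for name in ("numba", "llvmlite", "numpy"):
    print(name, installed_version(name) or "not installed")
```

If numba is reported as "not installed", the pip or conda command was most likely run for a different Python environment than the one executing this script.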
Use Numba with Python
Numba performs best when it is used with NumPy arrays and to optimize constructs such as loops and functions.
Using it on simple mathematical operations will not realize the library’s full potential.
The most common way of using Numba with Python code is to use Numba’s decorators to compile your Python functions.
The most common of these decorators is the @jit decorator.
Numba’s @jit decorator operates in two compilation modes: the nopython mode and the object mode.
The nopython mode is selected by setting the nopython parameter of the jit decorator to True. In this mode, the entire function is compiled into machine code at run time and executed without the involvement of the Python interpreter.
If the nopython parameter is not set to True, the Numba versions covered here (those before 0.59, where nopython=True became the default) fall back to the object mode whenever nopython compilation fails.
This mode identifies and compiles the loops in the function at run time while the rest of the function is executed by the Python interpreter.
It is generally not recommended to use the object mode.
In fact, the nopython mode is so popular that there is a separate decorator called @njit which defaults to this mode, so you don’t need to specify the nopython parameter separately.
from numba import jit
import numpy as np

arr = np.random.random(size=(40,25))

@jit(nopython=True)  # tells Numba to compile the following function
def numba_xlogx(x):
    log_x = np.zeros_like(x)  # array to store log values
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            log_x[i][j] = np.log(x[i][j])
    return x * log_x

arr_l = numba_xlogx(arr)
print(arr[:5,:5],"\n")
print(arr_l[:5,:5])
Output:
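The @njit shorthand described above can be sketched as follows. This is an illustrative variant, not the article's original code: the function name xlogx_sum is made up, and the try/except fallback is an assumption added so the sketch still runs (uncompiled) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: without Numba, run the plain Python function.
    def njit(func):
        return func


@njit  # shorthand for @jit(nopython=True)
def xlogx_sum(x):
    """Sum x * log(x) over all elements of a 2D array using explicit loops."""
    total = 0.0
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            total += x[i, j] * np.log(x[i, j])
    return total


# Shift the samples away from zero so that log() is always finite.
arr = np.random.random(size=(40, 25)) + 0.1
result = xlogx_sum(arr)
reference = float(np.sum(arr * np.log(arr)))  # vectorized NumPy reference
print("results agree:", np.isclose(result, reference))
```

Comparing against a vectorized NumPy expression is a cheap way to check that a compiled loop still computes what was intended.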
Recursion in Numba
Numba supports self-recursive functions, either with an explicit type annotation or, as in the example below, through type inference when the function has a path that returns without recursing.
The example below implements the Fibonacci series using recursive calls.
The function fibonacci_rec calls itself, which makes it a self-recursive function.
Since Numba supports this kind of self-recursion, the code executes without a hitch.
from numba import jit
import numpy as np

@jit(nopython=True)
def fibonacci_rec(n):
    if n <= 1:
        return n
    else:
        return(fibonacci_rec(n-1) + fibonacci_rec(n-2))

num = 5
print("Fibonacci series:")
for i in range(num):
    print(fibonacci_rec(i))
Output:
Running a mutual recursion of two functions, however, is a bit tricky.
The code below demonstrates mutual recursion. The function second calls the function one within its body and vice versa.
The type inference of function second depends on the type inference of function one, and that of one depends on second.
Naturally, this leads to a cyclic dependency: type inference for a function is suspended while it waits for the type of the called function, so neither can be resolved.
This will thus throw an error when running with Numba.
from numba import jit
import numpy as np
import time

@jit(nopython=True)
def second(y):
    if y > 0:
        return one(y)
    else:
        return 1

def one(y):
    return second(y - 1)

second(4)
print('done')
Output:
It is, however, possible to implement mutually recursive functions if one of the functions has a return statement that does not involve a recursive call and terminates the function.
This function needs to be compiled first for the program to execute successfully with Numba; otherwise there will be an error.
In the code below, the function terminating_func has such a return statement without a recursive call, so it needs to be compiled first by Numba to ensure the successful execution of the program.
Although the functions are recursive, this trick will throw no error.
from numba import jit
import numpy as np

@jit
def terminating_func(x):
    if x > 0:
        return other1(x)
    else:
        return 1

@jit
def other1(x):
    return other2(x)

@jit
def other2(x):
    return terminating_func(x - 1)

terminating_func(5)
print("done")
Output:
Numba vs Python – Speed comparison
The whole purpose of using Numba is to generate a compiled version of Python code and thus gain significant improvement in speed of execution over pure Python interpreted code.
Let us compare one of the code samples used above with and without Numba’s @jit decorator in nopython mode.
Let us first run the code in pure Python and measure its time.
from numba import jit
import numpy as np

arr = np.random.random(size=(1000,1000))

def python_xlogx(x):  # the function defined in pure Python, without Numba
    log_x = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            log_x[i][j] = np.log(x[i][j])
    return x * log_x

Now that we have defined the function, let’s measure its execution time.
%%timeit -r 5 -n 10
arr_l = python_xlogx(arr)
Output:
Note that here we are using the %%timeit magic command of Jupyter notebooks.
You can place this command at the top of any code cell to measure its speed of execution.
It runs the same code several times and computes the mean and standard deviation of the execution time.
You can additionally specify the number of runs and the number of loops in each run using the -r and -n options respectively.
Now let us apply Numba’s jit to the same function (with a different name) and measure its speed.
@jit(nopython=True)  # now using Numba
def numba_xlogx(x):
    log_x = np.zeros_like(x)  # array to store log values
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            log_x[i][j] = np.log(x[i][j])
    return x * log_x

Time to call this function and measure its performance!
%%timeit -r 5 -n 10
arr_l = numba_xlogx(arr)
Output:
As can be seen from the two outputs above, while Python takes an average of 2.96s to execute the function code, the Numba compiled code of the same function takes just about 22ms on average, thus giving us a speed-up of more than 100 times!
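Outside a Jupyter notebook, a similar measurement can be sketched with the standard timeit module. Because Numba compiles a function the first time it is called, the sketch makes one warm-up call before timing so that compilation time is not counted. The array is smaller than the one above to keep the run short, and the fallback decorator is an assumption so that the script also runs without Numba (in which case it simply times pure Python).

```python
import timeit

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: without Numba, time the plain Python function.
    def njit(func):
        return func


@njit
def xlogx(x):
    log_x = np.zeros_like(x)  # array to store log values
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            log_x[i, j] = np.log(x[i, j])
    return x * log_x


# Shift the samples away from zero so that log() is always finite.
arr = np.random.random(size=(100, 100)) + 0.1

xlogx(arr)  # warm-up call: triggers JIT compilation before timing starts

# 5 runs of 10 loops each, mirroring "%%timeit -r 5 -n 10".
timings = timeit.repeat(lambda: xlogx(arr), repeat=5, number=10)
per_call_ms = [t / 10 * 1e3 for t in timings]
print(f"best of 5 runs: {min(per_call_ms):.3f} ms per call")
```

Reporting the best run rather than the mean is a common choice with timeit.repeat, since slower runs usually reflect interference from other processes rather than the code itself.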
Using Numba with CUDA
Most modern computation-intensive applications rely on increasingly powerful GPUs to parallelize their computations with the help of large memories on GPUs and get the results much faster.
For example, training a complex neural network that takes weeks or months on CPUs, can be accelerated with GPUs to do the same training in just a few days or hours.
Nvidia provides a powerful toolkit or API called ‘CUDA’ for programming on their GPUs.
Most modern Deep Learning frameworks such as PyTorch, TensorFlow, etc. make use of the CUDA toolkit and provide the option to switch any computation between CPUs and GPUs.
Numba is not far behind: it can make use of any available CUDA-supported GPU to further accelerate our computations.
It has the cuda module to enable computations on the GPU.
But before using it, you need to additionally install the CUDA toolkit with pip3 install cudatoolkit or conda install cudatoolkit.
First of all, let’s find out if we have any available CUDA GPU on our machine that we can use with Numba.
from numba import cuda

print("number of gpus:", len(cuda.gpus))
print("list of gpus:", cuda.gpus.lst)
Output:
Note that if there are no GPUs on our machine, we will get the CudaSupportError exception with the CUDA_ERROR_NO_DEVICE error.
So it’s a good idea to put such code in try-except blocks.
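A guarded version of the GPU query could look like the sketch below. The helper name count_cuda_gpus is hypothetical, and catching the broad Exception class is a simplification chosen so that the sketch does not depend on where Numba defines its CUDA error types.

```python
def count_cuda_gpus():
    """Return the number of CUDA GPUs visible to Numba, or 0 if none can be used.

    Hypothetical helper for this illustration; it is not part of Numba.
    """
    try:
        from numba import cuda
    except ImportError:
        print("Numba is not installed.")
        return 0

    # is_available() reports False instead of raising when no driver or GPU is found.
    if not cuda.is_available():
        print("No usable CUDA GPU found.")
        return 0

    try:
        return len(cuda.gpus)
    except Exception as exc:  # broad on purpose, e.g. CudaSupportError
        print("Could not query GPUs:", exc)
        return 0


n_gpus = count_cuda_gpus()
print("number of gpus:", n_gpus)
```

Returning 0 instead of re-raising lets the calling code fall back to a CPU implementation.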
Next, depending on how many GPUs we have and which one is currently free for use (i.e., not being used by other users/processes), we can select/activate a certain GPU for Numba operations using the select_device method.
We can verify our selection using the numba.cuda.gpus.current attribute.
from numba import cuda

print("GPU available:", cuda.is_available())
print("currently active gpu:", cuda.gpus.current)

# selecting device
cuda.select_device(0)
print("currently active gpu:", cuda.gpus.current)
Output:
You can also optionally describe the GPU hardware by calling the numba.cuda.detect() method.
from numba import cuda

print(cuda.detect())
Output:
Now let us try to accelerate a complex operation involving a series of element-wise matrix multiplications using the powerful combination of Numba and CUDA.
We can apply the @numba.cuda.jit decorator to our function to instruct Numba to use the currently active CUDA GPU for the function.
Functions defined to run on the GPU are called kernels, and they are invoked in a special way. We define ‘number_of_blocks’ and ‘threads_per_block’ and use them to invoke the kernel. The number of threads running the code will be equal to the product of these two values.
Also note that the kernels cannot return a value, so any value that we expect from the function should be written in a mutable data structure passed as a parameter to the kernel function.
from numba import cuda, jit
import numpy as np

a = np.random.random(size=(50,100,100))  # defining 50 2D arrays
b = np.random.random(size=(50,100,100))  # another 50 2D arrays
result = np.zeros((50,))  # array to store the result

def mutiply_python(a,b, result):
    n,h,w = a.shape
    for i in range(n):
        result[i] = 0  # computing sum of elements of product
        for j in range(h):
            for k in range(w):
                result[i] += a[i,j,k]*b[i,j,k]

@cuda.jit()
def mutiply_numba_cuda(a,b, result):
    n,h,w = a.shape
    for i in range(n):
        result[i] = 0  # computing sum of elements of product
        for j in range(h):
            for k in range(w):
                result[i] += a[i,j,k]*b[i,j,k]
Now let’s run each of the two functions and measure their time.
Note that the code used here may not be the best candidate for GPU parallelization, and so the speed-up over pure Python code may not be representative of the best gain we can achieve through CUDA.
%%timeit -n 5 -r 10
mutiply_python(a,b,result)
Output:
%%timeit -n 5 -r 10
n_block, n_thread = 10,50
mutiply_numba_cuda[n_block, n_thread](a,b,result)
Output:
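In the kernel above, every launched thread loops over all 50 matrices. A variant in which each thread handles a single matrix can be sketched with cuda.grid(1), which gives a thread its absolute index within the launch. The kernel name multiply_per_thread and the block size of 32 are choices made for this illustration; the launch is skipped when no CUDA GPU is available.

```python
import math

import numpy as np

try:
    from numba import cuda
    HAVE_GPU = cuda.is_available()
except ImportError:
    cuda = None
    HAVE_GPU = False

rng = np.random.default_rng(0)
a = rng.random((50, 100, 100))
b = rng.random((50, 100, 100))
expected = (a * b).sum(axis=(1, 2))  # NumPy reference result
result = np.zeros(50)

if HAVE_GPU:
    @cuda.jit
    def multiply_per_thread(a, b, result):
        i = cuda.grid(1)  # absolute index of this thread in the 1D grid
        if i < a.shape[0]:  # guard: the grid may hold more threads than matrices
            total = 0.0
            for j in range(a.shape[1]):
                for k in range(a.shape[2]):
                    total += a[i, j, k] * b[i, j, k]
            result[i] = total  # kernels cannot return values, so write to the output array

    threads_per_block = 32
    n_blocks = math.ceil(a.shape[0] / threads_per_block)
    multiply_per_thread[n_blocks, threads_per_block](a, b, result)
    print("matches NumPy:", np.allclose(result, expected))
else:
    print("No CUDA GPU available; skipping the kernel launch.")
```

Rounding the block count up and guarding on the index is the usual way to cover an array whose length is not a multiple of the block size.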
Note that a lot of Python methods and NumPy operations are still not supported by CUDA with Numba. An exhaustive list of supported Python features can be found here.
Numba import error: Numba needs numpy 1.21 or less
Since Numba depends extensively on NumPy, it works only with certain versions of NumPy.
The Numba release used here works with NumPy versions up to 1.21. If you have a newer NumPy version and you try to import Numba, you will get the above error.
You can check your current NumPy version using numpy.__version__:
import numpy as np

print(f"Current NumPy version: {np.__version__}")

from numba import jit
Output:
As you can see, I have NumPy version 1.23.1 installed, and so I get an error when I import numba.jit.
To circumvent this error, you can downgrade NumPy using pip, for example with pip3 install numpy==1.21.
Once this installation is successful, your Numba imports will work fine.
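The version check can also be done programmatically before importing Numba. The sketch below is a minimal illustration: the limit (1, 21) is taken from the error message discussed in this section and will differ for other Numba releases, and the helper names are made up for this example.

```python
# Assumption: the limit comes from the "numpy 1.21 or less" error message above.
MAX_SUPPORTED = (1, 21)


def major_minor(version_string):
    """Extract (major, minor) from a version string such as '1.23.1'."""
    parts = version_string.split(".")
    return int(parts[0]), int(parts[1])


def numpy_is_compatible(version_string, limit=MAX_SUPPORTED):
    """Return True if the given NumPy version does not exceed the limit."""
    return major_minor(version_string) <= limit


try:
    import numpy as np
    current = np.__version__
except ImportError:
    current = None

if current is None:
    print("NumPy is not installed.")
elif numpy_is_compatible(current):
    print(f"NumPy {current} is within the supported range.")
else:
    print(f"NumPy {current} is newer than {MAX_SUPPORTED[0]}.{MAX_SUPPORTED[1]}; consider downgrading.")
```

Comparing (major, minor) tuples avoids the pitfalls of comparing version strings directly, where "1.9" would sort after "1.21".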