Channel: Like Geeks

Normalization using NumPy norm (Simple Examples)


Normalization of a vector or a matrix is a common operation performed in a variety of scientific, mathematical, and programming applications.
In this tutorial, we will understand what normalization is, and how to compute the same in Python.
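We will cover the following topics on normalization using NumPy: the norm function, norms of the rows and columns of a matrix, norms of n-dimensional arrays, why norms are needed, the L1 norm, arrays with nan values, Euclidean distance, and a performance comparison.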

 

 

Introduction

NumPy arrays are most commonly used to represent vectors or matrices of numbers.
A 1-dimensional or a 1-D array is used for representing a vector and a 2-D array is used to define a matrix (where each row/column is a vector).

These vectors and matrices have interesting mathematical properties.
A vector, as we know it, is an entity in space. It has a magnitude and a direction.

Normalization of a vector is the transformation of a vector, obtained by performing certain mathematical operations on it. To perform normalization, we calculate a value called `norm` of a vector.

This value represents some property of the vector; for example, the L2 norm of a vector denotes its length.
There are various types of norms, but in this tutorial we are going to focus on the most popular ones, namely the L2 norm and the L1 norm.

 

NumPy norm

NumPy has a dedicated submodule called linalg for functions related to Linear Algebra.
This submodule is a collection of Python functions used for performing various common Linear Algebraic operations such as vector products, eigenvalues calculation, determinant of a matrix, solving equations, etc.

The function used for finding norms of vectors and matrices is called norm and can be called in Python as numpy.linalg.norm(x).
The function returns different results depending on the value passed for the argument x. Generally, x is a vector or a matrix, i.e. a 1-D or a 2-D NumPy array.
This function takes a second parameter called ord, which determines the type of norm to be calculated on the array. The default value for this is None, in which case we get the 2-norm (popularly known as the ‘L2 norm’ or ‘Euclidean norm’) of a vector.
The L2 norm or Euclidean norm of a vector x with n elements is calculated using the following formula:

$$\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$$

Note that we will use the default value for the ord parameter for most of our code examples.
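To see how the ord parameter changes the result, here is a minimal sketch on a small vector. The vector v and the variable names are chosen here for illustration; ord=np.inf (the largest absolute value) is one more option that np.linalg.norm accepts for vectors, although this tutorial does not use it elsewhere.

```python
import numpy as np

v = np.array([3, -4])

l2 = np.linalg.norm(v)                  # default ord=None: sqrt(3^2 + (-4)^2)
l1 = np.linalg.norm(v, ord=1)           # sum of absolute values: |3| + |-4|
l_inf = np.linalg.norm(v, ord=np.inf)   # largest absolute value: max(|3|, |-4|)

print(f"L2 = {l2}, L1 = {l1}, L-infinity = {l_inf}")
```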

 

norm of an array

Let us now use the norm function to find the norm of a NumPy array.

import numpy as np

a = np.array([1, 2, 3, 4, 5])

a_norm = np.linalg.norm(a)

print(a_norm)

Output:

output of the NumPy norm on array a

Since the values in array a are 1, 2, 3, 4 and 5, the L2 norm of the array has been calculated as:

$$\|a\|_2 = \sqrt{1^2 + 2^2 + 3^2 + 4^2 + 5^2} = \sqrt{55} \approx 7.416$$
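The same arithmetic can be reproduced step by step as a quick sanity check. This is only a sketch; the name manual_norm is introduced here for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# Square each element, add the squares up, then take the square root
manual_norm = np.sqrt(np.sum(a ** 2))

print(manual_norm)
# Compare against NumPy's built-in function
print(np.isclose(manual_norm, np.linalg.norm(a)))
```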

Let us now see how the function behaves on a matrix, i.e. a 2-D NumPy array.

b = np.array([[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]])

print(f"b:\n{b}")

b_norm = np.linalg.norm(b)

print(f"norm of b = {b_norm}")

Output:

output of the NumPy norm on matrix b

As we can see, when we pass a matrix to the norm function, it still returns a single real value.
This is called the ‘Frobenius norm’ of a matrix. It is the square root of the sum of squares of all elements in the matrix.
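As a quick check of this definition, the sketch below computes the Frobenius norm in three ways: with the default arguments, with ord='fro' passed explicitly, and by hand. The variable names are illustrative.

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

fro_default = np.linalg.norm(b)              # default for a 2-D array
fro_explicit = np.linalg.norm(b, ord='fro')  # same norm, requested by name
fro_manual = np.sqrt(np.sum(b ** 2))         # square root of the sum of squares

print(fro_default, fro_explicit, fro_manual)
```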

 

Norms of columns and rows of a matrix

As we saw in the previous section, if we pass a matrix to the norm function, it calculates the square root of the sum of squares of all elements and returns a single value.
But often we need to normalize each column or row of a matrix separately. The rows/columns of a matrix are, after all, 1-D vectors.
This can be achieved by specifying the ‘axis’ parameter of the norm function.

For finding the norm of the columns, we pass the value 0 to the axis parameter, and for row norms, we pass the value 1.
Let us look at examples of each of them.

x = np.arange(9) - 4

x = x.reshape(3,3)

print(f"x:\n{x}")

x_norm_col = np.linalg.norm(x, axis=0)

print("\nColumn wise norm:")

print(x_norm_col)

x_norm_row = np.linalg.norm(x, axis=1)

print("\nRow wise norm:")

print(x_norm_row)

Output:

row and column wise normalization output

Since there are 3 rows in our matrix, we get 3 norm values for the row-wise norms (axis=1).
Similarly, for each of the 3 columns, we get 3 norm values when we pass axis=0.
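To make the meaning of the axis argument concrete, the following sketch recomputes the column-wise and row-wise norms by hand and compares them with the results of np.linalg.norm. The names col_manual and row_manual are illustrative.

```python
import numpy as np

x = (np.arange(9) - 4).reshape(3, 3)

# axis=0 collapses the rows, leaving one value per column
col_manual = np.sqrt(np.sum(x ** 2, axis=0))
# axis=1 collapses the columns, leaving one value per row
row_manual = np.sqrt(np.sum(x ** 2, axis=1))

print(np.allclose(col_manual, np.linalg.norm(x, axis=0)))
print(np.allclose(row_manual, np.linalg.norm(x, axis=1)))
```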

 

Norm of an n-dimensional array

We have so far seen the calculation of norms on vectors and 2-D arrays. Let us now understand how to find the norm of n-dimensional arrays.
Let us construct a 3-dimensional array of the shape (4,2,2).

a = np.arange(16).reshape(4, 2, 2)

print(a)

Output:

3-dimensional matrix a

Now we can find the norm of this array along its first axis by passing the value of ‘axis’ as 0.
This will give us a matrix of size 2×2, where each entry is the norm of the values found across the four 2×2 matrices at positions (0,0), (0,1), (1,0) and (1,1).

a_norm = np.linalg.norm(a, axis=0)

print(a_norm)

Output:

norm of 3-dimensional matrix a
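The axis argument can also be a pair of axes. In that case np.linalg.norm treats those two axes as holding 2-D matrices and returns one matrix norm (the Frobenius norm by default) per matrix. The sketch below shows both forms on the same array; the variable names are illustrative.

```python
import numpy as np

a = np.arange(16).reshape(4, 2, 2)

# One norm per position (i, j), taken across the four stacked matrices
across_matrices = np.linalg.norm(a, axis=0)
print(across_matrices.shape)   # (2, 2)

# One Frobenius norm per 2x2 matrix in the stack
per_matrix = np.linalg.norm(a, axis=(1, 2))
print(per_matrix.shape)        # (4,)
```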

 

Why do we need norms?

As stated in the introduction, normalization is a very common operation in a variety of applications.
One important use of norm is to transform a given vector into a unit-length vector, that is, making the magnitude of vector = 1, while still preserving its direction.
This is achieved by dividing each element in a vector by its length i.e its L2-norm.
Normalization is also an important pre-processing step in many machine learning algorithms.

Let us normalize a vector and a matrix (a collection of vectors).

a = np.array([5, 2, 0, 1, 9])

a_norm = np.linalg.norm(a)

a_normalized = a/a_norm

print(f"a = {a}")

print(f"L2 norm of a = {a_norm}")

print(f"normalized a = {a_normalized}")

Output:

normalization of vector a

We now have a transformed vector whose length is 1. We can verify this by calculating the L2 norm of the normalized vector.

l = np.linalg.norm(a_normalized)

print(f"Length of normalized vector = {l}")

Output:

length of normalized vector
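One case to watch for is the zero vector: its norm is 0, so dividing by it produces nan values. A minimal sketch of a guard is shown below. The helper normalize_l2 is a hypothetical function written for this example, not part of NumPy, and returning the zero vector unchanged is just one possible convention.

```python
import numpy as np

def normalize_l2(v):
    """Scale v to unit L2 length; a zero vector is returned unchanged."""
    v = np.asarray(v, dtype=float)
    length = np.linalg.norm(v)
    if length == 0:
        # Nothing to scale: avoid dividing by zero
        return v
    return v / length

unit = normalize_l2([5, 2, 0, 1, 9])
zero = normalize_l2([0, 0, 0])

print(np.linalg.norm(unit))
print(zero)
```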

Similarly, we can also normalize matrices.
This is especially useful when we need to normalize tabular data in a machine learning application, where each row represents a sample, and each column, an attribute or feature of the sample.
To normalize such data, we perform L2-normalization on the columns of the matrix i.e with axis = 0.

Let us suppose we have 5 samples of human data, where each sample represents a person’s height in cm, weight in kg, age in years, and monthly salary in USD.
Let’s construct the matrix to represent this data.

data = np.array([[150, 60, 23, 5000],
                [165, 65, 29, 2300],
                [155, 85, 35, 7500],
                [135, 72, 54, 1800],
                [170, 91, 24, 1500]])

In this example, we are representing 4 attributes of 5 individuals, and we need to normalize each of these attributes/features before feeding it to an ML algorithm.
Let us calculate the norms of each column, and then divide the respective columns by these norms.

feature_norms = np.linalg.norm(data, axis = 0)

print(f"Norms of features of data = {feature_norms}\n")

data_normalized = data/feature_norms
    
print("Normalized data:")

print(data_normalized)

Output:

normalization of 2-D data
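We can confirm that every column now has unit length. The sketch below also shows the row-wise variant (one unit-length vector per sample), which needs keepdims=True so that the division broadcasts correctly. The names cols_normalized, sample_norms and rows_normalized are illustrative.

```python
import numpy as np

data = np.array([[150, 60, 23, 5000],
                 [165, 65, 29, 2300],
                 [155, 85, 35, 7500],
                 [135, 72, 54, 1800],
                 [170, 91, 24, 1500]])

# Column-wise: the norms have shape (4,), which broadcasts across the rows
feature_norms = np.linalg.norm(data, axis=0)
cols_normalized = data / feature_norms
print(np.linalg.norm(cols_normalized, axis=0))

# Row-wise: keepdims=True gives shape (5, 1), which broadcasts across the columns
sample_norms = np.linalg.norm(data, axis=1, keepdims=True)
rows_normalized = data / sample_norms
print(np.linalg.norm(rows_normalized, axis=1))
```

Without keepdims=True the row norms would have shape (5,), and dividing a (5, 4) array by it would raise a broadcasting error.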

 

L1 norm of a vector

Another popular type of norm is the L1 norm of a vector. It is equal to the sum of the magnitudes of elements of a vector.

$$\|x\|_1 = \sum_{i=1}^{n} |x_i|$$

We can find the L1 norm of an array in Python using the same function that we used for the L2 norm, i.e. np.linalg.norm, except this time we’ll pass the value of the parameter ‘ord’ as 1.

a = [1,2,-1,3,4,-2]

norm_a_l1 = np.linalg.norm(a, ord=1)

print(f"a = {a}\n")

print(f"L1 norm of a = {norm_a_l1}")

Output:

l1 norm of a in Python

As is evident, the sum of magnitudes of values in a (i.e sum of all absolute values in a) is equal to 13.
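The L1 norm can be checked by hand, and it can be used for normalization in the same way as the L2 norm: dividing by it makes the absolute values sum to 1. This is a short sketch with illustrative variable names.

```python
import numpy as np

a = np.array([1, 2, -1, 3, 4, -2])

# Sum of absolute values, computed without np.linalg.norm
l1_manual = np.sum(np.abs(a))
print(l1_manual)

# L1-normalize: afterwards the absolute values add up to 1
a_l1_normalized = a / np.linalg.norm(a, ord=1)
print(np.sum(np.abs(a_l1_normalized)))
```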

Note that another interesting use of these two norms, i.e. the L1 norm and the L2 norm, is in the regularisation terms that are added to the loss in regularised regression.
The squared L2 norm of the weights is the penalty used in the famous ‘Ridge’ regression, and the L1 norm of the weights is the penalty used in ‘Lasso’ regression.
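As a rough illustration of how the two norms enter such a loss, the sketch below computes only the penalty terms for a made-up weight vector. The names weights and lam, and their values, are assumptions for this example; a real model would add the penalty to its data-fitting loss.

```python
import numpy as np

weights = np.array([0.5, -1.5, 2.0])   # hypothetical model coefficients
lam = 0.1                              # hypothetical regularisation strength

# Lasso-style penalty: lam times the L1 norm of the weights
lasso_penalty = lam * np.linalg.norm(weights, ord=1)

# Ridge-style penalty: lam times the squared L2 norm of the weights
ridge_penalty = lam * np.linalg.norm(weights) ** 2

print(f"Lasso penalty = {lasso_penalty}")
print(f"Ridge penalty = {ridge_penalty}")
```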

 

NumPy norm of arrays with nan values

While processing real-world data, we often encounter missing or nonsensical values for some features in the data.
For numeric features, such values are represented as nan (Not a Number). Any arithmetic operation that involves a nan produces nan as its result.

Let us take an example of a NumPy array with a nan value. We’ll compute the L2 norm on this array.

a = np.array([1,2,3,4,np.nan, 5,6])

print(f"a = {a}\n")

norm_a = np.linalg.norm(a)

print(f"L2 norm of a = {norm_a}")

Output:

norm of array with nan value

As we can see, if we involve nan values when performing a mathematical operation, we are going to get a result that doesn’t make any sense, i.e. we end up with another nan value!

We can fix this by filtering out the nan values from the array and computing the norm on the rest of the array.

nan_flags = np.isnan(a)

a_clean = a[~nan_flags]

print(f"clean a = {a_clean}\n")
    
norm_a_clean = np.linalg.norm(a_clean)

print(f"L2 norm of a = {norm_a_clean}")

Output:

norm of array with nan value

We first construct a boolean array using np.isnan(a), having values True at positions of nan values, and False elsewhere.
We then invert these flags and use them to index our original array, thus giving us values that are not nan.
Finally, we compute the norm on this indexed array.
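An alternative that avoids building the filtered array is np.nansum, which treats nan values as zero while summing. The sketch below applies it to the squared values and compares the result with the filtering approach; the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, np.nan, 5, 6])

# Square every element (nan stays nan), sum while ignoring nan, take the root
norm_nansum = np.sqrt(np.nansum(a ** 2))

# The filtering approach, for comparison
norm_filtered = np.linalg.norm(a[~np.isnan(a)])

print(norm_nansum, norm_filtered)
```

Both approaches simply ignore the missing entries, so the result describes only the values that are present.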

 

Euclidean distance using NumPy norm

You must have heard of the famous `Euclidean distance` formula to calculate the distance between two points A(x1, y1) and B(x2, y2):

$$d(A, B) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

Let us understand how this formula makes use of the L2 norm of a vector.

Let us consider two points A(2,3) and B(3,1). We need to find the distance between these two points.
Each of the two points can be represented as a vector from the origin to the point.

vector representation of points A and B

We need to find the distance between points A and B, i.e the length of vector AB.
By the triangle rule of vector addition, OA + AB = OB, so vector AB = OB - OA = B - A.
Now, all we have to do is find the length of this vector AB, which is nothing but the L2 norm of vector AB!
Let’s code this in Python.

A = np.array([2,3])

B = np.array([3,1])

print(f"A = {A}, B = {B}\n")
    
AB = B - A

print(f"vector AB = {AB}\n")
    
d_AB = np.linalg.norm(AB)

print(f"distance AB = {d_AB}")

Output:

Output of distance calculation using norm

We get the distance between A and B as 2.236, which we can verify using the Euclidean distance formula:

$$d(A, B) = \sqrt{(3 - 2)^2 + (1 - 3)^2} = \sqrt{5} \approx 2.236$$
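The same idea extends to several points at once. In the sketch below, the array points and its third row are made up for illustration. Subtracting A from every row and taking the norm along axis=1 gives the distance from A to each point.

```python
import numpy as np

A = np.array([2, 3])

# Each row is one point; the first two are A and B from the example above
points = np.array([[2, 3],
                   [3, 1],
                   [0, 0]])

# Broadcasting subtracts A from every row; axis=1 gives one norm per row
distances = np.linalg.norm(points - A, axis=1)

print(distances)
```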

 

Performance comparison: NumPy norm vs sqrt

We used NumPy’s norm method for computing the L2 norm of arrays.
Actually, we can do the same by writing the code for calculating norm in Python, instead of using the function np.linalg.norm.
We need to write code to compute squares of array values, calculate their sum and take the square root of the sum using np.sqrt.

Let’s compare the time performance of the two methods.

import time

a = np.random.randn(10000)

# Time NumPy's built-in norm function
t1 = time.perf_counter()

a_norm = np.linalg.norm(a)

t2 = time.perf_counter()

print(f"L2 norm of a = {a_norm}")

print(f"Time using norm: {t2-t1}\n")


# Time a Python-level sum of squares followed by np.sqrt
t1 = time.perf_counter()

a_norm = np.sqrt(sum(map(lambda x: x**2, a)))

t2 = time.perf_counter()

print(f"L2 norm of a = {a_norm}")

print(f"Time using sqrt: {t2-t1}\n")

Output:

comparison of norm and sqrt

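The norm function is much faster (about 50 times faster in this run) than the Python-level sum of squares followed by np.sqrt on an array of 10000 values. The slowdown comes from iterating over the array elements one at a time in Python, not from np.sqrt itself.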
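A single timing can be noisy, so a somewhat more robust sketch uses the standard timeit module to average over several calls. It also adds a vectorized manual version, np.sqrt(np.dot(a, a)), to separate the cost of the Python loop from the cost of the square root. The dictionary candidates, the fixed seed and the repeat count are choices made for this example, and the timings will vary between machines.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(10000)

# Three ways of computing the same L2 norm
candidates = {
    "np.linalg.norm": lambda: np.linalg.norm(a),
    "np.sqrt(np.dot(a, a))": lambda: np.sqrt(np.dot(a, a)),
    "pure-Python sum": lambda: np.sqrt(sum(x ** 2 for x in a)),
}

repeats = 20
for name, fn in candidates.items():
    # Average time per call over `repeats` calls
    seconds = timeit.timeit(fn, number=repeats) / repeats
    print(f"{name:>24}: {seconds:.6f} s per call")
```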

 

Conclusion

In this tutorial, we understood what norms of vectors and matrices are, and how to calculate them using NumPy’s norm function.

We also saw how we can compute norms of individual rows and columns of a matrix.

We understood the need for computing norms and their applications in vector algebra and machine learning.

For most of the tutorial, we focused on L2 norms. We also introduced another popular norm called the ‘L1 norm’ and computed the same using NumPy norm.

We then learned how to compute norms of arrays with nan values.

Next, we saw how norms are related to the Euclidean distance formula and calculated the distance between two points using NumPy norm.

Finally, we compared the performance of the norm function with a manual sum of squares followed by NumPy’s sqrt for computing the L2 norm of an array.

The post Normalization using NumPy norm (Simple Examples) appeared first on Like Geeks.

