Array Operations with NumPy

Broadcasting in NumPy

Broadcasting is NumPy's mechanism for performing element-wise operations on arrays of different shapes. When the shapes are compatible, NumPy treats the smaller array as if it were stretched to match the larger one, so the calculation works without manual reshaping. You can explore more advanced broadcasting techniques in the NumPy broadcasting guide when you're ready for complex scenarios.

Vectorization lets you apply an operation to an entire array without writing a Python loop. The loop still happens, but it runs inside NumPy's compiled code rather than in the Python interpreter, which is typically much faster.

vectorization_example.py

import numpy as np

# Create two arrays
a = np.array([0, 1, 2])
b = np.array([2, 2, 2])

# Vectorized (element-wise) operation
result = a + b
print(result)
# Output: [2 3 4]

In the example above, NumPy automatically adds the elements at the same position in both arrays. You do not need to write a loop to access each element one by one.
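To make the comparison with an explicit loop concrete, the sketch below computes the same sum both ways. The helper name `add_with_loop` is hypothetical and exists only for illustration.

```python
import numpy as np

def add_with_loop(x, y):
    """Element-wise sum written as an explicit Python loop (illustration only)."""
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = x[i] + y[i]
    return out

a = np.array([0, 1, 2])
b = np.array([2, 2, 2])

loop_result = add_with_loop(a, b)   # one Python-level iteration per element
vector_result = a + b               # a single call; the loop runs inside NumPy

print(loop_result)    # [2 3 4]
print(vector_result)  # [2 3 4]
```

Both versions give the same result; the vectorized form is shorter and avoids per-element interpreter overhead.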

Broadcasting Rules

Broadcasting is a set of rules that lets NumPy perform operations on arrays with different shapes. For example, when you want to add the same number to every element of an array, NumPy does it automatically.

There are three main rules in broadcasting:

Rule 1: If the arrays have different numbers of dimensions, pad the shape of the array with fewer dimensions with 1s on its left side
Rule 2: If the sizes in a dimension differ and one of them is 1, stretch the size-1 dimension to match the other array
Rule 3: If the sizes in a dimension differ and neither is 1, the shapes are incompatible and NumPy raises a ValueError
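The three rules can be sketched as a small function that works on shape tuples alone. The name `broadcast_shape` is a hypothetical helper written for this illustration, not part of NumPy; the last line cross-checks it against what NumPy actually produces.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the three broadcasting rules to two shape tuples (illustrative helper)."""
    # Rule 1: pad the shorter shape with 1s on the left.
    ndim = max(len(shape_a), len(shape_b))
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(padded_a, padded_b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)   # Rule 2: a size-1 dimension stretches to match
        elif dim_a == 1:
            result.append(dim_b)   # Rule 2, the other way around
        else:
            # Rule 3: sizes differ and neither is 1
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
    return tuple(result)

print(broadcast_shape((3, 3), (3,)))    # (3, 3)
print(broadcast_shape((2, 1), (1, 4)))  # (2, 4)

# Cross-check against NumPy itself
assert broadcast_shape((3, 3), (3,)) == (np.ones((3, 3)) + np.ones(3)).shape
```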

Array with Scalar

broadcasting_1d_scalar.py

import numpy as np

# 1D array with scalar
a = np.arange(3)  # [0, 1, 2]
b = 5
result = a + b

print(f"Array a: {a}")
print(f"Scalar b: {b}")
print(f"Result a + b: {result}")

# Shape explanation:
# a has shape (3,)
# b has shape () - scalar
# After broadcasting: b becomes [5, 5, 5]
# Output: [5 6 7]

2D Array with 1D

When you work with arrays that have different dimensions, NumPy will try to automatically adjust their shapes. This process is very useful when you want to apply the same operation to each row or column of a matrix.

broadcasting_2d_1d.py

import numpy as np

# 2D array with 1D array
a = np.ones((3, 3))  # 3x3 matrix filled with 1s
b = np.arange(3)     # [0, 1, 2]
result = a + b

print("Array a (3x3):")
print(a)
print(f"Array b (1D): {b}")
print("Result a + b:")
print(result)

# Broadcasting occurs:
# a: shape (3, 3)
# b: shape (3,) -> expanded to (1, 3) -> (3, 3)
# b is added to each row of a

Failed Broadcasting Case

broadcasting_error.py

import numpy as np

try:
    # Arrays with incompatible shapes
    a = np.arange(6).reshape(2, 3)  # shape (2, 3)
    b = np.arange(2)                # shape (2,)

    print(f"Array a shape: {a.shape}")
    print(f"Array b shape: {b.shape}")

    # This will produce an error
    result = a + b
except ValueError as e:
    print(f"Error: {e}")
    print("Array shapes are incompatible for broadcasting")
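One common way to make the failing case above work is to give `b` an explicit trailing axis, so that its shape becomes (2, 1) and Rule 2 can stretch it across the columns. This is a minimal sketch; whether adding `b` per row is what you want depends on your data.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # shape (2, 3)
b = np.arange(2)                # shape (2,)

# Add a trailing axis so b has shape (2, 1); it then stretches across columns.
b_column = b[:, np.newaxis]
result = a + b_column

print(b_column.shape)  # (2, 1)
print(result)
# [[0 1 2]
#  [4 5 6]]
```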

Array Arithmetic Operations

NumPy provides various arithmetic operations that can be applied to arrays. These operations work element-wise, similar to a calculator that can compute many numbers simultaneously.

When you perform arithmetic operations with scalars, NumPy will apply that operation to every element in the array. This is very efficient because you don't need to write manual loops.

arithmetic_scalar.py

import numpy as np

a = np.array([0, 1, 2, 3, 4])

# Addition with scalar
print("Addition:")
print(f"a + 1 = {a + 1}")
# Output: [1 2 3 4 5]

# Multiplication with scalar (in place, so a itself changes)
print("Multiplication:")
a *= 2
print(f"a *= 2: {a}")
# Output: [0 2 4 6 8]

# Power (uses the doubled a from above)
print("Power:")
print(f"2**a = {2**a}")
# Output: [  1   4  16  64 256]
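The `a *= 2` line above modifies `a` in place, which behaves differently from `a * 2` when the result needs a different dtype. The sketch below illustrates this with a float factor; the flag name `in_place_failed` is just for illustration, and the exact error message may vary between NumPy versions.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])   # integer dtype

scaled = a * 1.5                 # new array; the result is promoted to float
print(scaled.tolist())           # [0.0, 1.5, 3.0, 4.5, 6.0]
print(a.dtype.kind, scaled.dtype.kind)  # i f

try:
    a *= 1.5                     # in place: a float result cannot be stored in an int array
    in_place_failed = False
except TypeError as e:
    in_place_failed = True
    print(f"In-place update rejected: {e}")
```

If you need the float result, assign it to a new name or convert first with `a.astype(float)`.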

Operations Between Arrays

arithmetic_arrays.py

import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([4, 3, 2, 1, 0])

# Element-wise subtraction
print("Subtraction:")
print(f"a - b = {a - b}")
# Output: [-4 -2  0  2  4]

# Element-wise multiplication
print("Element-wise multiplication:")
print(f"a * b = {a * b}")
# Output: [0 3 4 3 0]

# Matrix multiplication (dot product)
print("Matrix multiplication:")
print(f"a @ b = {a @ b}")
# Output: 10 (dot product result)

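It's important to understand the difference between element-wise multiplication (*) and matrix multiplication (@ or np.dot()). Element-wise multiplication multiplies elements at the same position, while matrix multiplication follows linear algebra rules. For the 1D arrays above, @ computes the inner product (the sum of the element-wise products), which is why the result is a single number.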
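The difference is easier to see with 2D arrays, where the two operators return different matrices. The small matrices `A` and `B` below are made-up values for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

elementwise = A * B   # multiplies entries at matching positions
matrix = A @ B        # rows of A combined with columns of B

print(elementwise)
# [[ 5 12]
#  [21 32]]
print(matrix)
# [[19 22]
#  [43 50]]
```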

Comparison and Logic

NumPy also supports comparison operations that produce boolean arrays. These operations are very useful for data filtering or creating complex conditions.

comparison_logical.py

import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([0, 0, 2, 4, 4])

# Comparison operations
print("Greater than comparison:")
print(f"a > 2: {a > 2}")
# Output: [False False False  True  True]

print("Equal comparison:")
print(f"a == b: {a == b}")
# Output: [ True False  True False  True]

# Logical operations
print("Logical OR operation:")
print(f"(a > 2) | (a == b): {(a > 2) | (a == b)}")
# Output: [ True False  True  True  True]

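Element-wise logical operations in NumPy use operator symbols: ~ for NOT, & for AND, and | for OR. The regular Python keywords not, and, and or do not work element-wise; applied to an array with more than one element, they raise a ValueError. Wrap each comparison in parentheses, because & and | bind more tightly than comparison operators.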
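As a short sketch of the filtering use case, a boolean array can be used directly as an index, and the `and` keyword fails where `&` works. The variable names `mask` and `keyword_failed` are illustrative only.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])

# Parentheses are required: & binds more tightly than the comparisons.
mask = (a > 0) & (a < 4)
print(a[mask])    # [1 2 3]
print(a[~mask])   # [0 4]

# The Python keyword `and` asks for a single truth value, which an array cannot give.
try:
    (a > 0) and (a < 4)
    keyword_failed = False
except ValueError as e:
    keyword_failed = True
    print(f"ValueError: {e}")
```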

Statistical Functions and Reductions

Reduction functions allow you to calculate a single value from an entire array or along a specific axis. Imagine you have a table of exam scores and want to calculate the average for each subject or for each student.

NumPy provides various statistical functions that are very useful for data analysis. These functions can be applied to the entire array or only to specific axes.

basic_statistics.py

import numpy as np

# Create 2D array for example
data = np.array([[3, 0, -1, 1],
                 [2, -1, -2, 4],
                 [1, 7, 0, 4]])

print("Data array:")
print(data)

# Statistics on entire array
print(f"Total sum: {np.sum(data)}")
print(f"Mean: {np.mean(data):.2f}")
print(f"Minimum value: {np.min(data)}")
print(f"Maximum value: {np.max(data)}")
print(f"Standard deviation: {np.std(data):.2f}")

# Output:
# Total sum: 18
# Mean: 1.50
# Minimum value: -2
# Maximum value: 7
# Standard deviation: 2.50
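The standard deviation reported above is the population form, because `np.std` divides by n by default (`ddof=0`). The sketch below writes that formula out by hand and shows the `ddof=1` variant, which divides by n - 1; the variable names are illustrative.

```python
import numpy as np

data = np.array([[3, 0, -1, 1],
                 [2, -1, -2, 4],
                 [1, 7, 0, 4]])

# Population standard deviation written out by hand
manual_std = np.sqrt(np.mean((data - data.mean()) ** 2))

print(manual_std)     # 2.5
print(np.std(data))   # 2.5 (default ddof=0)

# Sample standard deviation: divides by n - 1 instead of n, so it is slightly larger
sample_std = np.std(data, ddof=1)
print(sample_std > manual_std)  # True
```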

Operations with Axes

The concept of axes is central to NumPy. For 2D arrays, axis=0 runs down the rows, so a reduction along axis=0 collapses the rows and produces one value per column. axis=1 runs across the columns, so a reduction along axis=1 produces one value per row.

Understanding axes helps you control how statistical functions work on multidimensional data. For example, if you have monthly sales data for various products, you can calculate total sales per product or per month.

axis_operations.py

import numpy as np

data = np.array([[3, 0, -1, 1],
                 [2, -1, -2, 4],
                 [1, 7, 0, 4]])

# Operations along axis=0 (for each column)
print("Maximum of each column (axis=0):")
print(f"max(axis=0): {np.max(data, axis=0)}")
# Output: [3 7 0 4]

print("Index of maximum in each column:")
print(f"argmax(axis=0): {np.argmax(data, axis=0)}")
# Output: [0 2 2 1]

# Operations along axis=1 (for each row)
print("Maximum of each row (axis=1):")
print(f"max(axis=1): {np.max(data, axis=1)}")
# Output: [3 4 7]

print("Index of maximum in each row:")
print(f"argmax(axis=1): {np.argmax(data, axis=1)}")
# Output: [0 3 1]
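The sales scenario mentioned earlier can be sketched in the same way. The figures below are invented for illustration, with rows as products and columns as months.

```python
import numpy as np

# Hypothetical sales figures: rows are products, columns are months
sales = np.array([[10, 20, 30],
                  [ 5, 15, 25]])

per_product = sales.sum(axis=1)   # collapse the month axis: one total per product
per_month = sales.sum(axis=0)     # collapse the product axis: one total per month

print(per_product)  # [60 45]
print(per_month)    # [15 35 55]

# keepdims=True keeps the reduced axis with size 1, so the totals broadcast
# back against the original array (each row then sums to 1, up to rounding).
share = sales / sales.sum(axis=1, keepdims=True)
print(share.shape)  # (2, 3)
```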

Array Shape Manipulation

Array shape manipulation allows you to change the dimensions and structure of data without changing its contents. Like rearranging books on a shelf, you can arrange them in different rows without adding or removing any books.

By default, NumPy stores a multidimensional array as one contiguous block of memory in row-major (C) order: the last index varies fastest, so the elements of each row sit next to each other. Understanding this is important for reshape and flatten operations.

internal_storage.py

import numpy as np

# Create 2D array
a = np.array([[0, 1], [2, 3]])
print("2D Array:")
print(a)
print(f"Shape: {a.shape}")

# See how it's stored in memory
print(f"Stored in memory as: {a.ravel()}")
# Output: [0 1 2 3] (row-major order)
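To contrast row-major order with the alternative, the sketch below reads the same array in column-major (Fortran) order and inspects its strides. It assumes the array was created with NumPy's default layout, as in the example above.

```python
import numpy as np

a = np.array([[0, 1], [2, 3]])

print(a.flags["C_CONTIGUOUS"])   # True: default row-major layout
print(a.ravel(order="C"))        # [0 1 2 3]  last index varies fastest
print(a.ravel(order="F"))        # [0 2 1 3]  first index varies fastest

# Strides give the byte step per axis; one row step equals two element steps.
row_step, col_step = a.strides
print(row_step == 2 * col_step)  # True
```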

Flatten and Ravel

Both the flatten() and ravel() methods convert multidimensional arrays to 1D arrays, but in different ways. flatten() always returns a new copy of the data, while ravel() returns a view of the original data whenever the memory layout allows it and falls back to a copy otherwise.

flatten_ravel.py

import numpy as np

# Create diagonal array
a = np.diag([1, 2, 3])
print("Diagonal array:")
print(a)

# Flatten - creates independent copy
b_flatten = a.flatten()
print(f"Flatten result: {b_flatten}")

# Changing flatten values doesn't affect original array
b_flatten[0] = 9
print(f"After changing flatten: {b_flatten}")
print("Original array remains the same:")
print(a)
print()

# Ravel - tries to create view (more efficient)
b_ravel = a.ravel()
print(f"Ravel result: {b_ravel}")

# Changing ravel values affects original array
b_ravel[0] = 9
print(f"After changing ravel: {b_ravel}")
print("Original array changed:")
print(a)
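Whether a result is a view or a copy can be checked with `np.shares_memory` instead of mutating the data. The sketch below also shows a case where ravel() has to copy, because a transposed array is no longer laid out in row-major order. The variable names are illustrative.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

flatten_shares = np.shares_memory(a, a.flatten())
ravel_shares = np.shares_memory(a, a.ravel())
transposed_ravel_shares = np.shares_memory(a, a.T.ravel())

print(flatten_shares)           # False: flatten always copies
print(ravel_shares)             # True: contiguous data, so ravel returns a view
print(transposed_ravel_shares)  # False: the transposed layout forces a copy
```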

Reshape and Resize

The reshape operation returns an array with a new shape over the same data, as long as the total number of elements stays the same. The resize method, by contrast, modifies the array itself (in place) and returns nothing.

reshape_resize.py

import numpy as np

# Create diagonal array and flatten
a = np.diag([1, 2, 3])
a_flat = a.flatten()
print(f"Flat array: {a_flat}")

# Reshape - change shape with same number of elements
b = a_flat.reshape(3, 3)
print("Reshape result to (3,3):")
print(b)

# Changing values in reshape
b[0, 0] = 9
print("After changing value:")
print(b)
print(f"Original flat array: {a_flat}")  # Changes because reshape creates view

# Resize - change shape in-place (no return value)
a_flat.resize(3, 3)
print("After resize:")
print(a_flat)  # Now a_flat is 2D
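Two details of reshape are worth a short sketch: passing -1 lets NumPy infer one dimension, and a shape with the wrong total number of elements is rejected. The flag `reshape_failed` is only there for illustration.

```python
import numpy as np

a = np.arange(6)

b = a.reshape(2, -1)     # -1 lets NumPy infer the missing dimension
print(b.shape)           # (2, 3)

try:
    a.reshape(4, 2)      # 8 slots for 6 elements
    reshape_failed = False
except ValueError as e:
    reshape_failed = True
    print(f"ValueError: {e}")
```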

Transpose

Transpose reverses the order of an array's axes; for a 2D array, it swaps rows and columns. It is very useful in linear algebra operations. NumPy provides two ways to perform a transpose: the transpose() method or the shorter .T attribute.

transpose.py

import numpy as np

# Create 2x4 array
a = np.linspace(1, 8, 8).reshape(2, 4)
print("Original array (2x4):")
print(a)

# Transpose using method
b = a.transpose()
print("Transpose result (4x2):")
print(b)

# Transpose using .T attribute (shorter)
c = a.T
print("Using .T:")
print(c)

# Verify that transpose is a view
print("Is transpose a view?", np.shares_memory(a, b))
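For arrays with more than two dimensions, transpose() also accepts an explicit axis order. The sketch below uses an arbitrary 3D array to show the difference between reversing all axes and swapping only two of them.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

print(x.T.shape)                    # (4, 3, 2): all axes reversed
print(x.transpose(1, 0, 2).shape)   # (3, 2, 4): only the first two axes swapped

# Element (i, j, k) of x appears at (j, i, k) after transpose(1, 0, 2)
y = x.transpose(1, 0, 2)
print(x[1, 2, 3] == y[2, 1, 3])     # True
```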

Data Standardization with Z-Transform

The Z-Transform used here, more commonly called z-score standardization, rescales data so that each feature has a mean of 0 and a standard deviation of 1. This technique is very useful in machine learning for putting all features on the same scale.

The Z-Transform formula is $Z = \frac{X - \mu}{\sigma}$, where:

$X$ is the feature matrix of size $n \times k$
$n$ is the number of observations (rows)
$k$ is the number of features (columns)
$\mu$ is the mean vector for each column
$\sigma$ is the standard deviation vector for each column

z_transform.py

import numpy as np

# Create sample data (5 observations, 3 features)
np.random.seed(42)
X = np.random.randn(5, 3) * 10 + 50  # Data with mean~50, std~10

print("Original data:")
print(X)
print(f"Data shape: {X.shape}")

# Calculate mean and standard deviation for each column
mu = np.mean(X, axis=0)  # Mean of each column
sigma = np.std(X, axis=0)  # Standard deviation of each column
print(f"Mean of each feature: {mu}")
print(f"Standard deviation of each feature: {sigma}")

# Perform Z-Transform
Z = (X - mu) / sigma
print("Data after Z-Transform:")
print(Z)

# Verify standardization results
print("Standardization verification:")
print(f"New mean: {np.mean(Z, axis=0)}")  # Should be close to 0
print(f"New standard deviation: {np.std(Z, axis=0)}")  # Should be close to 1

# New mean output: [ 1.24344979e-15  8.88178420e-17 -1.77635684e-16] (close to 0)
# New standard deviation output: [1. 1. 1.]

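Standardization puts every feature on a comparable scale, so that no feature dominates a machine learning algorithm simply because of its original units. For example, if you have height data in centimeters and weight data in kilograms, both features have a mean of 0 and a standard deviation of 1 after standardization. Note that the subtraction and division above rely on broadcasting: the vectors $\mu$ and $\sigma$, each with $k$ elements, are applied to every row of the $n \times k$ matrix.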
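The calculation can be wrapped in a small reusable function. The sketch below is one possible version, not a NumPy function: the name `standardize` is hypothetical, and the handling of constant columns (leaving them at 0 rather than dividing by zero) is an assumption made for this illustration. The height and weight values are invented.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score of a 2D array (illustrative helper, not a NumPy function)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    # Assumption: leave constant columns at 0 instead of dividing by zero.
    safe_sigma = np.where(sigma == 0, 1.0, sigma)
    return (X - mu) / safe_sigma

# Hypothetical data: height in cm, weight in kg, and a constant column
X = np.array([[170.0, 65.0, 1.0],
              [180.0, 80.0, 1.0],
              [160.0, 50.0, 1.0]])

Z = standardize(X)
print(np.allclose(Z.mean(axis=0), 0.0))      # True
print(np.allclose(Z.std(axis=0)[:2], 1.0))   # True for the non-constant columns
print(Z[:, 2])                               # [0. 0. 0.]
```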

For complete documentation and more information about NumPy array operations, you can visit the official NumPy documentation which provides comprehensive guides and practical examples.
