Operations

Data management operations

Pandas contains several helpful functions to manage and format numerical data (Table 1).

Table 1: Common data management functions for pandas columns.

| Operation | Example | Description |
|---|---|---|
| Round | df['VEI'].round(1) | Rounds values to the specified number of decimals |
| Floor | df['VEI'].apply(np.floor) | Rounds values down to the nearest integer |
| Ceil | df['VEI'].apply(np.ceil) | Rounds values up to the nearest integer |
| Absolute value | df['VEI'].abs() | Returns the absolute value of each element |
| Clip | df['VEI'].clip(lower=0, upper=5) | Limits values to a specified range |
| Fill missing | df['VEI'].fillna(0) | Replaces missing values with a specified value |
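A minimal sketch of the Table 1 functions side by side, assuming a small hypothetical DataFrame with a VEI column. The values are made up (including an impossible negative VEI and a missing value) only so that each function visibly changes something:

```python
import numpy as np
import pandas as pd

# Hypothetical VEI values: a negative one to show abs/clip, a large one to show clip,
# and a missing value (NaN) to show fillna
df = pd.DataFrame({'VEI': [2.43, -1.2, 6.7, np.nan]})

rounded = df['VEI'].round(1)                 # 2.4, -1.2, 6.7, NaN
floored = df['VEI'].apply(np.floor)          # 2.0, -2.0, 6.0, NaN
ceiled = df['VEI'].apply(np.ceil)            # 3.0, -1.0, 7.0, NaN
absolute = df['VEI'].abs()                   # 2.43, 1.2, 6.7, NaN
clipped = df['VEI'].clip(lower=0, upper=5)   # 2.43, 0.0, 5.0, NaN
filled = df['VEI'].fillna(0)                 # 2.43, -1.2, 6.7, 0.0

# Collect the results in one table to compare them
print(pd.DataFrame({'round': rounded, 'floor': floored, 'ceil': ceiled,
                    'abs': absolute, 'clip': clipped, 'fillna': filled}))
```

Note that all functions except fillna leave the missing value as NaN.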
Filling missing data

The .fillna example in Table 1 shows how to replace missing data, often referred to as NaN (Not a Number), with 0. However, pandas offers many other ways to fill missing values, such as propagating the previous or next valid value (.ffill(), .bfill()) or interpolating between valid values (.interpolate()). Again, get into the habit of checking the documentation of the functions you use frequently.
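A short sketch contrasting .fillna(0) with other gap-filling methods, assuming a made-up Series with two consecutive missing values. Forward and backward filling use .ffill() and .bfill(), and .interpolate() performs linear interpolation by default (treating the values as equally spaced):

```python
import numpy as np
import pandas as pd

# Hypothetical series with a gap of two missing values
s = pd.Series([1.0, np.nan, np.nan, 4.0])

zeros = s.fillna(0)        # 1.0, 0.0, 0.0, 4.0  (constant value)
forward = s.ffill()        # 1.0, 1.0, 1.0, 4.0  (carry the last valid value forward)
backward = s.bfill()       # 1.0, 4.0, 4.0, 4.0  (carry the next valid value backward)
linear = s.interpolate()   # 1.0, 2.0, 3.0, 4.0  (linear interpolation)

print(pd.DataFrame({'fillna(0)': zeros, 'ffill': forward,
                    'bfill': backward, 'interpolate': linear}))
```

Older code often writes s.fillna(method='ffill'); recent pandas versions deprecate the method argument in favour of .ffill() and .bfill().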

Numeric operations

Let’s now see how to manipulate and operate on the data contained in our DataFrame. Table 2 lists arithmetic operators that can be applied to columns (or other selections) of the DataFrame. These are native Python operators, which pandas applies element-wise; the NumPy package extends them with further mathematical functions (Table 3).

Listing 1 illustrates how to halve the VEI column and save the results to a new column.

Listing 1: Divide VEI by two and save the results to a new column.
df['VEI_halved'] = df['VEI'] / 2
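The operator in Listing 1 also has method equivalents. A minimal sketch, assuming a hypothetical DataFrame with integer VEI values: .div(2) gives the same result as / 2, and .assign() adds a column to a new DataFrame instead of modifying df in place:

```python
import pandas as pd

df = pd.DataFrame({'VEI': [2, 4, 5]})  # hypothetical values

df['VEI_halved'] = df['VEI'] / 2              # operator form (Listing 1); result is float
same = df['VEI'].div(2)                       # equivalent method form
df2 = df.assign(VEI_doubled=df['VEI'] * 2)    # returns a new DataFrame; df is unchanged

print(df)
print(df2)
```

Dividing an integer column with / always produces floats (here 1.0, 2.0, 2.5).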
Exercise

Longitudes are expressed as degrees E (i.e., from 0 to 180) and degrees W (i.e., from -180 to 0). Use operators to convert all longitudes to degrees E (i.e., from 0 to 360) and store the results in a column called Longitude_E. To do so:

  1. Define a mask where longitudes are negative using logical operators
  2. Where the mask is True (i.e., where the longitude is negative), add 360 to the longitude (equivalently, subtract its absolute value from 360)

Start by defining a mask that is True where the longitude is strictly negative (a longitude of 0 is already in degrees E):

mask = df['Longitude'] < 0

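Select the negative values using .loc and do the maths:

360 + df.loc[mask, 'Longitude']

Then copy the original longitudes to Longitude_E and overwrite only the masked rows. Without the copy, rows outside the mask (positive longitudes) would be left as NaN in the new column:

df['Longitude_E'] = df['Longitude'].copy()
df.loc[mask, 'Longitude_E'] = 360 + df.loc[mask, 'Longitude']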
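A self-contained version of the exercise solution, assuming a small hypothetical DataFrame with made-up longitudes. The np.where and modulo columns are alternative approaches not used in the exercise itself; they are shown only to check that all three give the same result:

```python
import numpy as np
import pandas as pd

# Hypothetical longitudes in the -180 to 180 convention
df = pd.DataFrame({'Longitude': [-155.3, 0.0, 14.4, -78.4, 180.0]})

# Approach from the exercise: copy, then convert only the negative values
mask = df['Longitude'] < 0
df['Longitude_E'] = df['Longitude'].copy()
df.loc[mask, 'Longitude_E'] = 360 + df.loc[mask, 'Longitude']

# Alternative 1: np.where picks lon + 360 where the mask is True, lon otherwise
df['Longitude_E_where'] = np.where(mask, df['Longitude'] + 360, df['Longitude'])

# Alternative 2: modulo wraps any longitude into the range [0, 360)
df['Longitude_E_mod'] = df['Longitude'] % 360

print(df)
```

The modulo version works because pandas (like NumPy) returns a remainder with the sign of the divisor, so -155.3 % 360 gives 204.7.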
Table 2: Common arithmetic operations on numerical pandas columns.

| Operation | Symbol | Example | Description |
|---|---|---|---|
| Addition | + | df['VEI'] + 1 | Adds a value to each element |
| Subtraction | - | df['VEI'] - 1 | Subtracts a value from each element |
| Multiplication | * | df['VEI'] * 2 | Multiplies each element by a value |
| Division | / | df['VEI'] / 2 | Divides each element by a value |
| Exponentiation | ** | df['VEI'] ** 2 | Raises each element to a power |
| Modulo | % | df['VEI'] % 2 | Remainder after division for each element |
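A minimal sketch running each operator of Table 2 on a hypothetical VEI column; every operator is applied element-wise and returns a new Series, leaving df unchanged:

```python
import pandas as pd

df = pd.DataFrame({'VEI': [1, 2, 3, 4]})  # hypothetical values

added = df['VEI'] + 1        # 2, 3, 4, 5
subtracted = df['VEI'] - 1   # 0, 1, 2, 3
multiplied = df['VEI'] * 2   # 2, 4, 6, 8
divided = df['VEI'] / 2      # 0.5, 1.0, 1.5, 2.0 (division always returns floats)
powered = df['VEI'] ** 2     # 1, 4, 9, 16
remainder = df['VEI'] % 2    # 1, 0, 1, 0 (odd values leave a remainder of 1)

print(pd.DataFrame({'+1': added, '-1': subtracted, '*2': multiplied,
                    '/2': divided, '**2': powered, '%2': remainder}))
```

A common use of % 2 is to flag odd or even values, e.g. df['VEI'] % 2 == 0 gives a boolean mask of even VEIs.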
Table 3: Common NumPy operations on pandas columns or arrays.

| Operation | Function | Example | Description |
|---|---|---|---|
| Exponentiation | np.power | np.power(df['VEI'], 2) | Element-wise exponentiation |
| Square root | np.sqrt | np.sqrt(df['VEI']) | Element-wise square root |
| Logarithm (base e) | np.log | np.log(df['VEI']) | Element-wise natural logarithm |
| Logarithm (base 10) | np.log10 | np.log10(df['VEI']) | Element-wise base-10 logarithm |
| Exponential | np.exp | np.exp(df['VEI']) | Element-wise exponential (e^x) |
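A minimal sketch of the Table 3 functions on a hypothetical VEI column. NumPy functions applied to a pandas Series return a Series, so the results can be stored directly as new columns. One caveat: a VEI of 0 would make np.log and np.log10 return -inf and emit a RuntimeWarning, which is why the made-up values below are all positive:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'VEI': [1, 4, 100]})  # hypothetical values, all positive

squared = np.power(df['VEI'], 2)   # 1, 16, 10000 (same as df['VEI'] ** 2)
roots = np.sqrt(df['VEI'])         # 1.0, 2.0, 10.0
ln = np.log(df['VEI'])             # 0.0, 1.386..., 4.605...
log10 = np.log10(df['VEI'])        # 0.0, 0.602..., 2.0
back = np.exp(ln)                  # 1.0, 4.0, 100.0 (exp undoes the natural log)

print(type(roots))  # a pandas Series, not a NumPy array
df['VEI_log10'] = log10
print(df)
```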

String operations

Similarly, Table 4 illustrates common string operations in pandas. Apart from concatenation with +, these are accessed through the .str accessor of a column.

Table 4: Common string operations on pandas columns.

| Operation | Example | Description |
|---|---|---|
| Concatenation | df['Country'] + ' volcano' | Appends a string to each element |
| String length | df['Country'].str.len() | Returns the length of each string |
| Uppercase | df['Country'].str.upper() | Converts each string to uppercase |
| Lowercase | df['Country'].str.lower() | Converts each string to lowercase |
| Replace | df['Country'].str.replace('USA', 'US') | Replaces substrings in each string |
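A minimal sketch of the Table 4 operations on a hypothetical Country column. Each .str method returns a new Series, so several methods can be chained:

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Iceland', 'USA', 'Japan']})  # hypothetical values

labels = df['Country'] + ' volcano'                # 'Iceland volcano', 'USA volcano', 'Japan volcano'
lengths = df['Country'].str.len()                  # 7, 3, 5
upper = df['Country'].str.upper()                  # 'ICELAND', 'USA', 'JAPAN'
lower = df['Country'].str.lower()                  # 'iceland', 'usa', 'japan'
renamed = df['Country'].str.replace('USA', 'US')   # 'Iceland', 'US', 'Japan'

# Chaining: uppercase first, then replace on the uppercased strings
chained = df['Country'].str.upper().str.replace('USA', 'US')  # 'ICELAND', 'US', 'JAPAN'

print(pd.DataFrame({'label': labels, 'len': lengths, 'chained': chained}))
```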