df['VEI_halved'] = df['VEI'] / 2
Operations
Data management operations
Pandas contains several helpful functions to manage and format numerical data (Table 1).
| Operation | Example | Description |
|---|---|---|
| Round | `df['VEI'].round(1)` | Rounds values to the specified number of decimals |
| Floor | `df['VEI'].apply(np.floor)` | Rounds values down to the nearest integer |
| Ceil | `df['VEI'].apply(np.ceil)` | Rounds values up to the nearest integer |
| Absolute value | `df['VEI'].abs()` | Returns the absolute value of each element |
| Clip | `df['VEI'].clip(lower=0, upper=5)` | Limits values to a specified range |
| Fill missing | `df['VEI'].fillna(0)` | Replaces missing values with a specified value |
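To see how the rounding and clipping operations in the table above behave, here is a minimal sketch on a small DataFrame. The values are hypothetical and deliberately non-integer (real VEI values are whole numbers), chosen only to make the differences between operations visible.

```python
import numpy as np
import pandas as pd

# Hypothetical values, for illustration only
df = pd.DataFrame({'VEI': [2.4, 5.6, -1.2, 3.7]})

rounded = df['VEI'].round(0)                 # nearest whole number
floored = df['VEI'].apply(np.floor)          # always rounds down
ceiled = df['VEI'].apply(np.ceil)            # always rounds up
absolute = df['VEI'].abs()                   # drops the sign
clipped = df['VEI'].clip(lower=0, upper=5)   # forces values into [0, 5]

print(pd.DataFrame({
    'original': df['VEI'],
    'round': rounded,
    'floor': floored,
    'ceil': ceiled,
    'abs': absolute,
    'clip': clipped,
}))
```

Note that none of these calls modify `df['VEI']` itself; each returns a new Series that can be assigned to a column if the result should be kept.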
The `.fillna` example in Table 1 shows how to replace missing data (often referred to as NaN, for Not a Number) with 0. However, Pandas offers several other strategies to fill missing values, such as forward or backward filling (`.ffill`, `.bfill`) and interpolation (`.interpolate`). Again, get into the habit of checking the documentation of the functions you frequently use.
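As a rough comparison of strategies for handling missing values, the sketch below fills the same gap in three ways. The data are hypothetical, and which strategy is appropriate depends on what the missing value means in your dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical values with one missing entry
df = pd.DataFrame({'VEI': [2.0, np.nan, 4.0, 5.0]})

filled_zero = df['VEI'].fillna(0)          # replace NaN with a constant
filled_ffill = df['VEI'].ffill()           # carry the previous value forward
filled_interp = df['VEI'].interpolate()    # linear interpolation between neighbours

print(pd.DataFrame({
    'original': df['VEI'],
    'fillna(0)': filled_zero,
    'ffill': filled_ffill,
    'interpolate': filled_interp,
}))
```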
Numeric operations
Let’s now see how we can manipulate and operate on data contained within our DataFrame. Table 2 illustrates arithmetic operators that can be applied to parts of the DataFrame. These are native Python arithmetic operators, which can be complemented with functions from the numpy package (Table 3).
Listing 1 illustrates how to halve the VEI column and save the results to a new column.
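For a version that runs on its own, the sketch below applies the same division to a small hypothetical DataFrame (the volcano names and VEI values are made up for illustration).

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    'Name': ['Volcano A', 'Volcano B', 'Volcano C'],
    'VEI': [4, 6, 2],
})

# Divide every value of the VEI column by 2 and store the result in a new column
df['VEI_halved'] = df['VEI'] / 2

print(df)
```

The original `VEI` column is left untouched, and the `/` operator returns floating-point values even when the input column holds integers.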
Longitudes are expressed as degrees E (i.e., from 0 to 180) and degrees W (i.e., from -180 to 0). Use operators to convert all longitudes to degrees E (i.e., from 0 to 360) and store the results in a column called `Longitude_E`. To do so:
- Define a mask where longitudes are negative using logical operators
- Where the mask is `True` (i.e., where the longitude is negative), add the longitude to 360 (or, equivalently, subtract its absolute value from 360)
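The two steps above can be sketched as follows. This is one possible approach rather than the only solution; the column name `Longitude` and the sample coordinates are assumptions, so adapt them to your own DataFrame.

```python
import pandas as pd

# Hypothetical longitudes; the column name 'Longitude' is an assumption
df = pd.DataFrame({'Longitude': [-155.5, 15.0, -78.25, 130.5]})

# Step 1: boolean mask that is True where the longitude is negative (degrees W)
mask = df['Longitude'] < 0

# Step 2: start from a copy of the original values, then add 360 where the mask is True
df['Longitude_E'] = df['Longitude']
df.loc[mask, 'Longitude_E'] = df.loc[mask, 'Longitude'] + 360

print(df)
```

Rows where the mask is `False` keep their original value, so longitudes that were already expressed in degrees E are unchanged.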
| Operation | Symbol | Example | Description |
|---|---|---|---|
| Addition | `+` | `df['VEI'] + 1` | Adds a value to each element |
| Subtraction | `-` | `df['VEI'] - 1` | Subtracts a value from each element |
| Multiplication | `*` | `df['VEI'] * 2` | Multiplies each element by a value |
| Division | `/` | `df['VEI'] / 2` | Divides each element by a value |
| Exponentiation | `**` | `df['VEI'] ** 2` | Raises each element to a power |
| Modulo | `%` | `df['VEI'] % 2` | Remainder after division for each element |

| Operation | Function | Example | Description |
|---|---|---|---|
| Exponentiation | `np.power` | `np.power(df['VEI'], 2)` | Element-wise exponentiation |
| Square root | `np.sqrt` | `np.sqrt(df['VEI'])` | Element-wise square root |
| Logarithm (base e) | `np.log` | `np.log(df['VEI'])` | Element-wise natural logarithm |
| Logarithm (base 10) | `np.log10` | `np.log10(df['VEI'])` | Element-wise base-10 logarithm |
| Exponential | `np.exp` | `np.exp(df['VEI'])` | Element-wise exponential (e^x) |
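To compare the native operators with their numpy counterparts, here is a minimal sketch on a hypothetical `VEI` column. It shows that `**` and `np.power` give the same result, and that numpy functions are applied element by element.

```python
import numpy as np
import pandas as pd

# Hypothetical values, for illustration only
df = pd.DataFrame({'VEI': [1, 2, 4]})

# Native Python operators act on every element of the column
squared_op = df['VEI'] ** 2
remainder = df['VEI'] % 2

# numpy functions do the same, and add operations that have no operator
squared_np = np.power(df['VEI'], 2)
root = np.sqrt(df['VEI'])
log10 = np.log10(df['VEI'])

print(pd.DataFrame({
    'VEI': df['VEI'],
    '** 2': squared_op,
    'np.power': squared_np,
    '% 2': remainder,
    'np.sqrt': root,
    'np.log10': log10,
}))
```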
String operations
Similarly, Table 4 illustrates Pandas’s string-based operators.
| Operation | Example | Description |
|---|---|---|
| Concatenation | `df['Country'] + ' volcano'` | Adds a string to each element |
| String length | `df['Country'].str.len()` | Returns the length of each string |
| Uppercase | `df['Country'].str.upper()` | Converts each string to uppercase |
| Lowercase | `df['Country'].str.lower()` | Converts each string to lowercase |
| Replace | `df['Country'].str.replace('USA', 'US')` | Replaces substrings in each string |
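The string operations in the table above can be tried on a small hypothetical `Country` column, as in the sketch below. Passing `regex=False` to `.str.replace` is an explicit choice made here so that the pattern is treated as a literal string.

```python
import pandas as pd

# Hypothetical values, for illustration only
df = pd.DataFrame({'Country': ['USA', 'Italy', 'Japan']})

labels = df['Country'] + ' volcano'       # concatenation with a fixed string
lengths = df['Country'].str.len()         # number of characters per entry
upper = df['Country'].str.upper()
lower = df['Country'].str.lower()
replaced = df['Country'].str.replace('USA', 'US', regex=False)

print(pd.DataFrame({
    'Country': df['Country'],
    'label': labels,
    'length': lengths,
    'upper': upper,
    'lower': lower,
    'replaced': replaced,
}))
```

As with the numeric operations, each call returns a new Series and leaves `df['Country']` unchanged.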