Quite often we need to aggregate data to extract more information. For example, suppose we have a bike-sharing dataset and we want to know the mean number of bike rentals by season. In Pandas the solution is easy:

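```python
import pandas as pd

bike_sharing = pd.read_csv('bike_data.csv')
# Select the numeric column first: recent pandas raises a TypeError when averaging text columns
rents_by_season = bike_sharing.groupby('Seasons')['Rented Bike Count'].mean().reset_index()
print(rents_by_season)
```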
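Since bike_data.csv is not included here, the sketch below uses a few hypothetical toy rows with the same column names (the **Holiday** column and all values are made up). It passes **numeric_only=True** because in pandas 2.x calling **mean()** on a groupby that still holds text columns raises a TypeError; with this option only the numeric columns are averaged.

```python
import pandas as pd

# Hypothetical toy rows standing in for bike_data.csv
bike_sharing = pd.DataFrame({
    "Seasons": ["Winter", "Winter", "Summer", "Summer", "Summer"],
    "Holiday": ["No Holiday", "Holiday", "No Holiday", "No Holiday", "Holiday"],
    "Rented Bike Count": [100, 200, 900, 1000, 1100],
})

# numeric_only=True skips text columns such as "Holiday"
rents_by_season = bike_sharing.groupby("Seasons").mean(numeric_only=True).reset_index()
print(rents_by_season[["Seasons", "Rented Bike Count"]])  # Summer: 1000.0, Winter: 150.0
```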

And… What if we have to aggregate data by some attribute and we want the “mean” of categorical data?

The statistical tool to use in this scenario is the **mode**, in plain words the **most common value**. At the time of writing, Pandas implements the **mode** as a method you can call on a DataFrame or a Series, but does not provide it as a built-in reducer for the **groupby** method.
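To see what the existing **mode()** methods return, here is a small sketch on hypothetical values. **Series.mode()** returns every most common value, so a tie yields more than one entry, while **DataFrame.mode()** works column by column over the whole frame rather than per group.

```python
import pandas as pd

genres = pd.Series(["Drama", "Comedy", "Drama", "Horror", "Comedy"])

# Series.mode() returns all most common values as a Series,
# so the tie between "Drama" and "Comedy" yields two entries
series_modes = genres.mode()
print(series_modes.tolist())

# DataFrame.mode() computes the mode of each column over the entire frame
films = pd.DataFrame({
    "Year": [2019, 2019, 2020],
    "Genre": ["Drama", "Drama", "Comedy"],
})
frame_modes = films.mode()
print(frame_modes)
```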

## Solution in Pandas

As said before, since Pandas does not implement the **mode** as an aggregation operator for the **groupby** method, we need a different strategy. Suppose we have a dataset of films that includes, among others, a **Year** and a **Genre** column.

Let’s say we want to know the **most frequent genre** for each year. Here is the solution.

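```python
import pandas as pd

data = pd.read_csv('film-data.csv')
# For each year and each column, keep the most frequent value (value_counts() sorts by frequency)
mode_data = data.groupby(['Year']).agg(lambda x: x.value_counts().index[0]).reset_index()
print(mode_data[['Year', 'Genre']])
```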
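Because film-data.csv is not available here, the sketch below runs the same **agg** call on hypothetical toy rows (the **Rating** column and all values are invented for illustration). It shows that the lambda is applied to every non-grouping column, so each year gets both its most frequent genre and its most frequent rating.

```python
import pandas as pd

# Hypothetical toy rows standing in for film-data.csv
data = pd.DataFrame({
    "Year": [2019, 2019, 2019, 2020, 2020, 2020],
    "Genre": ["Drama", "Comedy", "Drama", "Horror", "Horror", "Drama"],
    "Rating": ["PG", "R", "R", "R", "PG-13", "PG-13"],
})

# The lambda runs once per (year, column) pair and returns the top value
mode_data = (
    data.groupby("Year")
    .agg(lambda x: x.value_counts().index[0])
    .reset_index()
)
print(mode_data)  # 2019: Drama / R, 2020: Horror / PG-13
```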

First, we choose the aggregation attribute, in this case **Year**. Second, we define a custom function that implements our aggregation strategy and pass it to the **agg** method. In particular, we use a **lambda function** that, within each year group, computes the **value counts** (**x.value_counts()**) of every column and takes the first label of the result (**index[0]**). This works because **value_counts()** returns the frequency of each value in a column sorted in descending order, so the first label is the most frequent value.
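The sketch below, on hypothetical values, prints the intermediate **value_counts()** result to make the **index[0]** step concrete. Two caveats follow from this design: on a tie only one of the tied values is kept, and a group whose column is entirely missing produces an empty result, so **index[0]** raises an IndexError. The helper **first_mode** is a hypothetical name, not a pandas function; it returns None in that case.

```python
import pandas as pd

genres_2019 = pd.Series(["Drama", "Comedy", "Drama", "Horror"])

# value_counts() sorts the frequencies from highest to lowest
counts = genres_2019.value_counts()
print(counts)

# index[0] is therefore the label of the most frequent value
most_common = counts.index[0]
print(most_common)  # Drama


def first_mode(x: pd.Series):
    """Hypothetical helper (not part of pandas): most frequent non-null value, or None."""
    freq = x.value_counts()  # missing values are dropped by default (dropna=True)
    return freq.index[0] if not freq.empty else None


# Hypothetical rows: the only Genre in 2020 is missing
films = pd.DataFrame({
    "Year": [2019, 2019, 2020],
    "Genre": ["Drama", "Drama", None],
})
per_year = films.groupby("Year")["Genre"].agg(first_mode)
print(per_year)
```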

And… here we are! Now we can aggregate qualitative data by computing its **mode**.