Join Aggregate Transform
The Join Aggregate transform behaves almost exactly like the Aggregate transform, except that the resulting aggregates are joined back onto the original dataset. To make this clearer, consider the following dataset:
```python
import pandas as pd
import numpy as np

rand = np.random.RandomState(0)
df = pd.DataFrame({
    'label': rand.choice(['A', 'B', 'C'], 10),
    'value': rand.randn(10),
})
df
```

```
  label     value
0     A -0.173070
1     B -1.761652
2     A -0.087673
3     B  1.366879
4     B  1.125314
5     C -0.358996
6     A  1.220608
7     C -1.339496
8     A  0.428373
9     A -0.123463
```
Here is a pandas operation that is equivalent to Altair’s Aggregate transform, using the mean as an example:
```python
mean = df.groupby('label').mean().reset_index()
mean
```

```
  label     value
0     A  0.252955
1     B  0.243514
2     C -0.849246
```
And here is the pandas equivalent of Altair’s Join Aggregate transform:
```python
pd.merge(df, mean, on='label', suffixes=['', '_mean'])
```

```
  label     value  value_mean
0     A -0.173070    0.252955
1     A -0.087673    0.252955
2     A  1.220608    0.252955
3     A  0.428373    0.252955
4     A -0.123463    0.252955
5     B -1.761652    0.243514
6     B  1.366879    0.243514
7     B  1.125314    0.243514
8     C -0.358996   -0.849246
9     C -1.339496   -0.849246
```
Notice that the join aggregate attaches the aggregated value to every row of the original dataframe, so the aggregated values can be used alongside the original values in later calculations if desired.
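In pandas, a closer analogue of the join aggregate is `groupby(...).transform(...)`, which broadcasts each group's aggregate back onto the rows of that group. Because the result is aligned with the original index, no separate merge is needed. A minimal sketch, rebuilding the same random `df` as above; the `value_mean` column name is just an illustrative choice:

```python
import numpy as np
import pandas as pd

# Same example data as above
rand = np.random.RandomState(0)
df = pd.DataFrame({
    'label': rand.choice(['A', 'B', 'C'], 10),
    'value': rand.randn(10),
})

# Compute the mean per label and broadcast it to every row of that label.
# transform() returns a Series aligned with df's index, so it can be
# assigned directly as a new column.
joined = df.assign(value_mean=df.groupby('label')['value'].transform('mean'))
joined
```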
Here is an example of how the join aggregate might be used: we compare IMDB and Rotten Tomatoes movie ratings after normalizing each by its mean and standard deviation, which requires combining every rating with the aggregates joined onto its row:
```python
import altair as alt
from vega_datasets import data

alt.Chart(data.movies.url).transform_filter(
    'datum.IMDB_Rating != null && datum.Rotten_Tomatoes_Rating != null'
).transform_joinaggregate(
    IMDB_mean='mean(IMDB_Rating)',
    IMDB_std='stdev(IMDB_Rating)',
    RT_mean='mean(Rotten_Tomatoes_Rating)',
    RT_std='stdev(Rotten_Tomatoes_Rating)'
).transform_calculate(
    IMDB_Deviation="(datum.IMDB_Rating - datum.IMDB_mean) / datum.IMDB_std",
    Rotten_Tomatoes_Deviation="(datum.Rotten_Tomatoes_Rating - datum.RT_mean) / datum.RT_std"
).mark_point().encode(
    x='IMDB_Deviation:Q',
    y="Rotten_Tomatoes_Deviation:Q"
)
```
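To make the role of each transform in that chart concrete, here is a minimal pure-Python sketch of the same pipeline: drop records with a missing rating (the filter), attach the dataset-wide mean and standard deviation to every remaining record (the join aggregate, with no `groupby`, so all records form one group), then compute each record's deviation (the calculate step). The records below are made-up placeholder values, not rows from the movies dataset. Vega's `stdev` aggregate is the sample standard deviation, so the sketch uses `statistics.stdev`.

```python
from statistics import mean, stdev

# Hypothetical placeholder records; None marks a missing rating.
movies = [
    {"IMDB_Rating": 6.1, "Rotten_Tomatoes_Rating": 45},
    {"IMDB_Rating": 7.8, "Rotten_Tomatoes_Rating": 91},
    {"IMDB_Rating": 5.4, "Rotten_Tomatoes_Rating": None},
    {"IMDB_Rating": 8.3, "Rotten_Tomatoes_Rating": 96},
    {"IMDB_Rating": 4.9, "Rotten_Tomatoes_Rating": 20},
]

# Filter: keep only records where both ratings are present
rows = [r for r in movies
        if r["IMDB_Rating"] is not None and r["Rotten_Tomatoes_Rating"] is not None]

# Join aggregate (no groupby): aggregate over all rows...
imdb = [r["IMDB_Rating"] for r in rows]
rt = [r["Rotten_Tomatoes_Rating"] for r in rows]
aggregates = {
    "IMDB_mean": mean(imdb),
    "IMDB_std": stdev(imdb),  # sample standard deviation
    "RT_mean": mean(rt),
    "RT_std": stdev(rt),
}
# ...then join the results onto every row, keeping the original fields
rows = [{**r, **aggregates} for r in rows]

# Calculate: per-row deviations from the joined aggregates
for r in rows:
    r["IMDB_Deviation"] = (r["IMDB_Rating"] - r["IMDB_mean"]) / r["IMDB_std"]
    r["Rotten_Tomatoes_Deviation"] = (
        (r["Rotten_Tomatoes_Rating"] - r["RT_mean"]) / r["RT_std"]
    )
```

A plain Aggregate transform would collapse the rows into a single summary record, leaving nothing to compute per-movie deviations from; joining the aggregates back is what lets the calculate step see both values on the same row.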
Transform Options
The `transform_joinaggregate()` method is built on the `JoinAggregateTransform` class, which has the following options:
| Property | Type | Description |
|---|---|---|
| groupby | array(FieldName) | The data fields for partitioning the data objects into separate groups. If unspecified, all data points will be in a single group. |
| joinaggregate | array(JoinAggregateFieldDef) | The definition of the fields in the join aggregate, and what calculations to use. |
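The semantics of these two options can be sketched in plain Python. The helper `join_aggregate`, its argument names, and its small set of supported operations are hypothetical, chosen only for illustration; this is not part of Altair's API. Each `(as_name, op_name, field)` tuple in `ops` plays the role of one `joinaggregate` field definition, and `groupby` partitions the rows, with all rows falling into a single group when it is omitted.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical subset of aggregate operations, keyed by name
OPS = {
    "mean": mean,
    "sum": sum,
    "count": len,
    "min": min,
    "max": max,
}


def join_aggregate(rows, ops, groupby=None):
    """Return copies of `rows` with per-group aggregates attached.

    rows:    list of dicts (the data objects)
    ops:     list of (as_name, op_name, field) tuples, one per joinaggregate field
    groupby: list of field names to partition by; None puts all rows in one group
    """
    groupby = groupby or []
    # Partition row positions by the values of the groupby fields
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[f] for f in groupby)
        groups[key].append(i)

    # Copy the rows so the input is unchanged and the original order is kept
    out = [dict(row) for row in rows]
    for indices in groups.values():
        for as_name, op_name, field in ops:
            value = OPS[op_name]([rows[i][field] for i in indices])
            # Join: write the group's aggregate onto every row of the group
            for i in indices:
                out[i][as_name] = value
    return out


data = [
    {"label": "A", "value": 1.0},
    {"label": "B", "value": 4.0},
    {"label": "A", "value": 3.0},
]

# With groupby: each row gets the mean of its own label
by_label = join_aggregate(data, [("value_mean", "mean", "value")], groupby=["label"])
print([r["value_mean"] for r in by_label])  # [2.0, 4.0, 2.0]

# Without groupby: every row gets the mean over all rows (8/3)
overall = join_aggregate(data, [("value_mean", "mean", "value")])
```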