Regression Transform
The regression transform fits two-dimensional regression models to smooth and predict data. This transform can fit multiple models for input data (one per group) and generates new data objects that represent points for summary trend lines. Alternatively, this transform can be used to generate a set of objects containing regression model parameters, one per group.
This transform supports parametric models for the following functional forms:
- linear (linear): y = a + b * x
- logarithmic (log): y = a + b * log(x)
- exponential (exp): y = a + e^(b * x)
- power (pow): y = a * x^b
- quadratic (quad): y = a + b * x + c * x^2
- polynomial (poly): y = a + b * x + … + k * x^order
All models are fit using ordinary least squares. For non-parametric locally weighted regression, see the LOESS Transform.
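To make the fitting step concrete, here is a minimal pure-Python sketch of ordinary least squares for the linear form, reused for the logarithmic form by regressing y on log(x). The helper names `fit_linear` and `fit_log` are hypothetical and exist only for this illustration; this is not the code the transform itself runs.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fit_log(xs, ys):
    """Fit y = a + b * log(x) by regressing y on log(x); requires x > 0."""
    return fit_linear([math.log(x) for x in xs], ys)


# Noise-free data generated from y = 1 + 2 * x
xs = [1.0, 2.0, 3.0, 4.0]
a, b = fit_linear(xs, [1 + 2 * x for x in xs])

# Noise-free data generated from y = 3 + 0.5 * log(x)
a_log, b_log = fit_log(xs, [3 + 0.5 * math.log(x) for x in xs])
```

Because the sample data contain no noise, the recovered coefficients match the generating ones up to floating-point error. With noisy data the same functions return the least-squares estimates.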
Here is an example of a simple linear regression plotted on top of data:
import altair as alt
import pandas as pd
import numpy as np
np.random.seed(42)
x = np.linspace(0, 10)
y = x - 5 + np.random.randn(len(x))
df = pd.DataFrame({'x': x, 'y': y})
chart = alt.Chart(df).mark_point().encode(
    x='x',
    y='y'
)
chart + chart.transform_regression('x', 'y').mark_line()
Transform Options
The transform_regression() method is built on the RegressionTransform class, which has the following options:
Property | Type | Description
---|---|---
as | array(any) | The output field names for the smoothed points generated by the regression transform. Default value: The field names of the input x and y values.
extent | array(any) | A [min, max] domain over the independent (x) field for the starting and ending points of the generated trend line.
groupby | array( | The data fields to group by. If not specified, a single group containing all data objects will be used.
method | [‘linear’, ‘log’, ‘exp’, ‘pow’, ‘quad’, ‘poly’] | The functional form of the regression model. One of ‘linear’, ‘log’, ‘exp’, ‘pow’, ‘quad’, or ‘poly’. Default value: ‘linear’
on |  | The data field of the independent variable to use as a predictor.
order |  | The polynomial order (number of coefficients) for the ‘poly’ method. Default value:
params | boolean | A boolean flag indicating if the transform should return the regression model parameters (one object per group), rather than trend line points. The resulting objects include a … Default value: false
regression |  | The data field of the dependent variable to predict.