sklearn.kernel_approximation.SkewedChi2Sampler

class sklearn.kernel_approximation.SkewedChi2Sampler(skewedness=1.0, n_components=100, random_state=None)[source]

Approximates feature map of the “skewed chi-squared” kernel by Monte Carlo approximation of its Fourier transform.

Read more in the User Guide.

Parameters:
skewedness : float

“skewedness” parameter of the kernel. Needs to be cross-validated.

n_components : int

number of Monte Carlo samples per original feature. Equals the dimensionality of the computed feature space.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

See also

AdditiveChi2Sampler
A different approach for approximating an additive variant of the chi squared kernel.
sklearn.metrics.pairwise.chi2_kernel
The exact chi squared kernel.

References

See “Random Fourier Approximations for Skewed Multiplicative Histogram Kernels” by Fuxin Li, Catalin Ionescu and Cristian Sminchisescu.

Examples

>>> from sklearn.kernel_approximation import SkewedChi2Sampler
>>> from sklearn.linear_model import SGDClassifier
>>> X = [[0, 0], [1, 1], [1, 0], [0, 1]]
>>> y = [0, 0, 1, 1]
>>> chi2_feature = SkewedChi2Sampler(skewedness=.01,
...                                  n_components=10,
...                                  random_state=0)
>>> X_features = chi2_feature.fit_transform(X, y)
>>> clf = SGDClassifier(max_iter=10, tol=1e-3)
>>> clf.fit(X_features, y)
SGDClassifier(alpha=0.0001, average=False, class_weight=None,
       early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
       l1_ratio=0.15, learning_rate='optimal', loss='hinge', max_iter=10,
       n_iter=None, n_iter_no_change=5, n_jobs=None, penalty='l2',
       power_t=0.5, random_state=None, shuffle=True, tol=0.001,
       validation_fraction=0.1, verbose=0, warm_start=False)
>>> clf.score(X_features, y)
1.0

Methods

fit(X[, y]) Fit the model with X.
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(X) Apply the approximate feature map to X.
__init__(skewedness=1.0, n_components=100, random_state=None)[source]

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None)[source]

Fit the model with X.

Samples random projection according to n_features.

Parameters:
X : array-like, shape (n_samples, n_features)

Training data, where n_samples in the number of samples and n_features is the number of features.

Returns:
self : object

Returns the transformer.

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(X)[source]

Apply the approximate feature map to X.

Parameters:
X : array-like, shape (n_samples, n_features)

New data, where n_samples in the number of samples and n_features is the number of features. All values of X must be strictly greater than “-skewedness”.

Returns:
X_new : array-like, shape (n_samples, n_components)