Contributing

rules of thumb

>>> import sklearn
>>> sklearn.show_versions()  

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import sklearn; print("Scikit-Learn", sklearn.__version__)

$ git clone git@github.com:YourLogin/scikit-learn.git

$ git checkout -b my-feature

$ git add modified_files
$ git commit

$ git push -u origin my-feature

$ git remote add upstream https://github.com/scikit-learn/scikit-learn.git

$ git merge master

$ make

$ pip install pytest pytest-cov
$ pytest --cov sklearn path/to/tests_for_package

$ pip install flake8
$ flake8 path/to/module.py

pip install sphinx sphinx-gallery numpydoc matplotlib Pillow pandas scikit-image joblib

cd doc

pip install --editable ..

make html

make latexpdf

See also
--------
SelectKBest : Select features based on the k highest scores.
SelectFpr : Select features based on a false positive rate test.

from sklearn.utils import check_array, check_random_state

def choose_random_sample(X, random_state=0):
    """
    Choose a random point from X

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        array representing the data
    random_state : RandomState or an int seed (0 by default)
        A random number generator instance to define the state of the
        random permutations generator.

    Returns
    -------
    x : numpy array, shape (n_features,)
        A random point selected from X
    """
    X = check_array(X)
    random_state = check_random_state(random_state)
    i = random_state.randint(X.shape[0])
    return X[i]

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils import check_random_state

class GaussianNoise(BaseEstimator, TransformerMixin):
    """This estimator ignores its input and returns random Gaussian noise.

    It also does not adhere to all scikit-learn conventions,
    but showcases how to handle randomness.
    """

    def __init__(self, n_components=100, random_state=None):
        # store every constructor argument unchanged under its own name
        self.n_components = n_components
        self.random_state = random_state

    # the arguments are ignored anyway, so we make them optional
    def fit(self, X=None, y=None):
        # resolve the seed into a RandomState instance at fit time
        self.random_state_ = check_random_state(self.random_state)
        return self

    def transform(self, X):
        n_samples = X.shape[0]
        return self.random_state_.randn(n_samples, self.n_components)

from ..utils import deprecated

def zero_one_loss(y_true, y_pred, normalize=True):
    # actual implementation
    pass

@deprecated("Function 'zero_one' was renamed to 'zero_one_loss' "
            "in version 0.13 and will be removed in release 0.15. "
            "Default behavior is changed from 'normalize=False' to "
            "'normalize=True'")
def zero_one(y_true, y_pred, normalize=False):
    return zero_one_loss(y_true, y_pred, normalize)

@property
@deprecated("Attribute labels_ was deprecated in version 0.13 and "
            "will be removed in 0.15. Use 'classes_' instead")
def labels_(self):
    return self.classes_

import warnings

def example_function(n_clusters=8, k='not_used'):
    if k != 'not_used':
        warnings.warn("'k' was renamed to n_clusters in version 0.13 and "
                      "will be removed in 0.15.", DeprecationWarning)
        n_clusters = k

import warnings

from sklearn.base import BaseEstimator

class ExampleEstimator(BaseEstimator):
    def __init__(self, n_clusters=8, k='not_used'):
        self.n_clusters = n_clusters
        self.k = k

    def fit(self, X, y):
        # the deprecated parameter is read from the instance, not a local name
        if self.k != 'not_used':
            warnings.warn("'k' was renamed to n_clusters in version 0.13 and "
                          "will be removed in 0.15.", DeprecationWarning)
            self._n_clusters = self.k
        else:
            self._n_clusters = self.n_clusters
        return self

.. deprecated:: 0.13
   ``k`` was renamed to ``n_clusters`` in version 0.13 and will be removed
   in 0.15.
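
The renaming pattern above can be checked with the standard library alone. The following is a minimal sketch, not part of the original guide: it reuses `example_function` from above, adds a `return` purely so the resolved value can be inspected, and records warnings with `warnings.catch_warnings`.

```python
import warnings


def example_function(n_clusters=8, k='not_used'):
    # Same renaming pattern as above; the return value is added here
    # only for illustration, so the resolved value can be inspected.
    if k != 'not_used':
        warnings.warn("'k' was renamed to n_clusters in version 0.13 and "
                      "will be removed in 0.15.", DeprecationWarning)
        n_clusters = k
    return n_clusters


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure no warning is filtered out
    new_style = example_function(n_clusters=3)  # new name: no warning
    old_style = example_function(k=5)           # old name: one warning

# Keep only the deprecation warnings that were recorded.
deprecations = [w for w in caught
                if issubclass(w.category, DeprecationWarning)]
```

Only the call that uses the old name `k` should contribute an entry to `deprecations`, while both calls still return the requested number of clusters.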

import warnings

def example_function(n_clusters='warn'):
    if n_clusters == 'warn':
        warnings.warn("The default value of n_clusters will change from "
                      "5 to 10 in 0.22.", FutureWarning)
        n_clusters = 5

import warnings

class ExampleEstimator:
    def __init__(self, n_clusters='warn'):
        self.n_clusters = n_clusters

    def fit(self, X, y):
        if self.n_clusters == 'warn':
            warnings.warn("The default value of n_clusters will change from "
                          "5 to 10 in 0.22.", FutureWarning)
            self._n_clusters = 5
        else:
            # an explicitly passed value is used as-is, without a warning
            self._n_clusters = self.n_clusters
        return self
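
As a rough illustration of the default-change pattern above, the sketch below counts the `FutureWarning`s raised when the default is left untouched versus set explicitly. The helper `count_future_warnings` is hypothetical, and `y` is made optional here only to keep the calls short.

```python
import warnings


class ExampleEstimator:
    def __init__(self, n_clusters='warn'):
        self.n_clusters = n_clusters

    def fit(self, X, y=None):
        if self.n_clusters == 'warn':
            warnings.warn("The default value of n_clusters will change from "
                          "5 to 10 in 0.22.", FutureWarning)
            self._n_clusters = 5
        else:
            self._n_clusters = self.n_clusters
        return self


def count_future_warnings(estimator):
    # Hypothetical helper: fit on dummy data and count FutureWarnings.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        estimator.fit([[0.0]])
    return sum(issubclass(w.category, FutureWarning) for w in caught)


implicit = count_future_warnings(ExampleEstimator())               # default
explicit = count_future_warnings(ExampleEstimator(n_clusters=10))  # chosen
```

Users who pass a value explicitly should see no warning, which is the point of using a sentinel such as `'warn'` as the temporary default.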

estimator = estimator.fit(data, targets)

estimator = estimator.fit(data)

prediction = predictor.predict(data)

probability = predictor.predict_proba(data)

new_data = transformer.transform(data)

new_data = transformer.fit_transform(data)

score = model.score(data)

estimator.fit(X, y)

clf2 = SVC(C=2.3)
clf3 = SVC([[1, 2], [2, 3]], [-1, 1]) # WRONG!

def __init__(self, param1=1, param2=2):
    self.param1 = param1
    self.param2 = param2

def __init__(self, param1=1, param2=2, param3=3):
    # WRONG: parameters should not be modified
    if param1 > 1:
        param2 += 1
    self.param1 = param1
    # WRONG: the object's attributes should have exactly the name of
    # the argument in the constructor
    self.param3 = param2
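
One way to avoid the mistakes flagged above is to keep `__init__` free of logic and move any adjustment or validation into `fit`. The sketch below assumes a hypothetical `ThresholdEstimator` and a hypothetical derived attribute `effective_param2_`; it is an illustration, not code from scikit-learn.

```python
class ThresholdEstimator:
    """Hypothetical estimator used only to illustrate the constructor rules."""

    def __init__(self, param1=1, param2=2):
        # Store each argument unchanged, under exactly its own name.
        self.param1 = param1
        self.param2 = param2

    def fit(self, X, y=None):
        # Validation happens here, where the parameters are used.
        if self.param1 < 0:
            raise ValueError("param1 must be non-negative, got %r"
                             % self.param1)
        # Derived values go into a separate attribute with a trailing
        # underscore, leaving the constructor parameter untouched.
        if self.param1 > 1:
            self.effective_param2_ = self.param2 + 1
        else:
            self.effective_param2_ = self.param2
        return self


est = ThresholdEstimator(param1=5).fit([[0.0]])
```

After fitting, `est.param2` still holds the value passed to the constructor, and the adjusted value lives in `est.effective_param2_`.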

y_predicted = SVC(C=100).fit(X_train, y_train).predict(X_test)

>>> from sklearn.utils.estimator_checks import check_estimator
>>> from sklearn.svm import LinearSVC
>>> check_estimator(LinearSVC)  # passes

>>> import numpy as np
>>> from sklearn.base import BaseEstimator, ClassifierMixin
>>> from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
>>> from sklearn.utils.multiclass import unique_labels
>>> from sklearn.metrics import euclidean_distances
>>> class TemplateClassifier(BaseEstimator, ClassifierMixin):
...
...     def __init__(self, demo_param='demo'):
...         self.demo_param = demo_param
...
...     def fit(self, X, y):
...
...         # Check that X and y have correct shape
...         X, y = check_X_y(X, y)
...         # Store the classes seen during fit
...         self.classes_ = unique_labels(y)
...
...         self.X_ = X
...         self.y_ = y
...         # Return the classifier
...         return self
...
...     def predict(self, X):
...
...         # Check that fit has been called
...         check_is_fitted(self, ['X_', 'y_'])
...
...         # Input validation
...         X = check_array(X)
...
...         closest = np.argmin(euclidean_distances(X, self.X_), axis=1)
...         return self.y_[closest]

def get_params(self, deep=True):
    # suppose this estimator has parameters "alpha" and "recursive"
    return {"alpha": self.alpha, "recursive": self.recursive}

def set_params(self, **parameters):
    for parameter, value in parameters.items():
        setattr(self, parameter, value)
    return self
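
The two methods above are enough to rebuild an unfitted copy of an estimator from its parameters. The following is a simplified sketch using only the standard library; `MinimalEstimator` and `naive_clone` are hypothetical names, and the real `sklearn.base.clone` does more (for example, it handles nested estimators).

```python
import inspect


class MinimalEstimator:
    def __init__(self, alpha=1.0, recursive=False):
        self.alpha = alpha
        self.recursive = recursive

    def get_params(self, deep=True):
        # Read the parameter names from the constructor signature, so the
        # dictionary always matches the __init__ arguments.
        names = inspect.signature(type(self).__init__).parameters
        return {name: getattr(self, name)
                for name in names if name != "self"}

    def set_params(self, **parameters):
        for parameter, value in parameters.items():
            setattr(self, parameter, value)
        return self


def naive_clone(estimator):
    # Hypothetical helper: build a fresh instance from the parameters only.
    return type(estimator)(**estimator.get_params())


original = MinimalEstimator(alpha=0.5).set_params(recursive=True)
copy = naive_clone(original)
```

This round trip only works when every constructor argument is stored unchanged under its own name, which is why the constructor rules matter for cloning and for parameter search.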

self.classes_, y = np.unique(y, return_inverse=True)

def predict(self, X):
    D = self.decision_function(X)
    return self.classes_[np.argmax(D, axis=1)]
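
To see how the label encoding and the `argmax`-based `predict` fit together, consider the small sketch below. The labels and the decision values in `D` are made up for illustration; in a real classifier `D` would come from `decision_function`.

```python
import numpy as np

# Encode arbitrary labels as integers; classes_ holds the sorted labels.
y = np.array(["spam", "ham", "spam", "eggs"])
classes_, y_encoded = np.unique(y, return_inverse=True)

# Made-up decision values: one row per sample, one column per class,
# with columns ordered like classes_.
D = np.array([[0.1, 0.2, 0.7],
              [0.8, 0.1, 0.1]])

# Map the index of the best-scoring column back to the original label.
predicted = classes_[np.argmax(D, axis=1)]
```

Because `classes_` is sorted, column `j` of `D` always refers to `classes_[j]`, so predictions come back in the label type the user supplied.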

Commit Message Marker	Action Taken by CI
[scipy-dev]	Add a Travis build that uses the development builds of our dependencies (numpy, scipy, etc.)
[ci skip]	CI is skipped completely
[doc skip]	Docs are not built
[doc quick]	Docs built, but excludes example gallery plots
[doc build]	Docs built including example gallery plots

Bug / Crash:	Something is happening that clearly shouldn’t happen. Wrong results as well as unexpected errors from estimators go here.
Cleanup / Enhancement:	Improving performance, usability, consistency.
Documentation:	Missing, incorrect or sub-standard documentation and examples.
New Feature:	Feature requests and pull requests implementing a new feature.

good first issue:	This issue is ideal for a first contribution to scikit-learn. Ask for help if the formulation is unclear. If you have already contributed to scikit-learn, look at Easy issues instead.
Easy:	This issue can be tackled without much prior experience.
Moderate:	Might need some knowledge of machine learning or the package, but is still approachable for someone new to the project.
help wanted:	This tag marks an issue which currently lacks a contributor or a PR that needs another contributor to take over the work. These issues can range in difficulty, and may not be approachable for new contributors. Note that not all issues which need contributors will have this tag.

Estimator:	The base object, implements a `fit` method to learn from data, either: `estimator = estimator.fit(data, targets)` or: `estimator = estimator.fit(data)`
Predictor:	For supervised learning, or some unsupervised problems, implements: `prediction = predictor.predict(data)`. Classification algorithms usually also offer a way to quantify certainty of a prediction, either using `decision_function` or `predict_proba`: `probability = predictor.predict_proba(data)`
Transformer:	For filtering or modifying the data, in a supervised or unsupervised way, implements: `new_data = transformer.transform(data)`. When fitting and transforming can be performed much more efficiently together than separately, implements: `new_data = transformer.fit_transform(data)`
Model:	A model that can give a goodness of fit measure or a likelihood of unseen data, implements (higher is better): `score = model.score(data)`

Parameters
X	array-like, shape (n_samples, n_features)
y	array, shape (n_samples,)
kwargs	optional data-dependent parameters.

Contributing

Ways to contribute

Submitting a bug report or a feature request

How to make a good bug report

Contributing code

How to contribute

Contributing pull requests

Issues for New Contributors

Documentation

Building the documentation

Guidelines for writing documentation

Generated documentation on CircleCI

Testing and improving test coverage

Developers web site

Issue Tracker Tags

Coding guidelines

Input validation

Random Numbers

Deprecation

Change the default value of a parameter

Python versions supported

Code Review Guidelines

APIs of scikit-learn objects

Different objects

Estimators

Instantiation

Fitting

Estimated Attributes

Optional Arguments

Rolling your own estimator

get_params and set_params

Parameters and init

Cloning

Pipeline compatibility

Estimator types

Working notes

Specific models