scipy.stats.kstest¶
- scipy.stats.kstest(rvs, cdf, args=(), N=20, alternative='two-sided', mode='approx')[source]¶
- Perform the Kolmogorov-Smirnov test for goodness of fit. - This performs a test of the distribution G(x) of an observed random variable against a given distribution F(x). Under the null hypothesis the two distributions are identical, G(x)=F(x). The alternative hypothesis can be either ‘two-sided’ (default), ‘less’ or ‘greater’. The KS test is only valid for continuous distributions. - Parameters: - rvs : str, array or callable - If a string, it should be the name of a distribution in scipy.stats. If an array, it should be a 1-D array of observations of random variables. If a callable, it should be a function to generate random variables; it is required to have a keyword argument size. - cdf : str or callable - If a string, it should be the name of a distribution in scipy.stats. If rvs is a string then cdf can be False or the same as rvs. If a callable, that callable is used to calculate the cdf. - args : tuple, sequence, optional - Distribution parameters, used if rvs or cdf are strings. - N : int, optional - Sample size if rvs is string or callable. Default is 20. - alternative : {‘two-sided’, ‘less’,’greater’}, optional - Defines the alternative hypothesis (see explanation above). Default is ‘two-sided’. - mode : ‘approx’ (default) or ‘asymp’, optional - Defines the distribution used for calculating the p-value. - ‘approx’ : use approximation to exact distribution of test statistic
- ‘asymp’ : use asymptotic distribution of test statistic
 - Returns: - statistic : float - KS test statistic, either D, D+ or D-. - pvalue : float - One-tailed or two-tailed p-value. - Notes - In the one-sided test, the alternative is that the empirical cumulative distribution function of the random variable is “less” or “greater” than the cumulative distribution function F(x) of the hypothesis, G(x)<=F(x), resp. G(x)>=F(x). - Examples - >>> from scipy import stats - >>> x = np.linspace(-15, 15, 9) >>> stats.kstest(x, 'norm') (0.44435602715924361, 0.038850142705171065) - >>> np.random.seed(987654321) # set random seed to get the same result >>> stats.kstest('norm', False, N=100) (0.058352892479417884, 0.88531190944151261) - The above lines are equivalent to: - >>> np.random.seed(987654321) >>> stats.kstest(stats.norm.rvs(size=100), 'norm') (0.058352892479417884, 0.88531190944151261) - Test against one-sided alternative hypothesis - Shift distribution to larger values, so that cdf_dgp(x) < norm.cdf(x): - >>> np.random.seed(987654321) >>> x = stats.norm.rvs(loc=0.2, size=100) >>> stats.kstest(x,'norm', alternative = 'less') (0.12464329735846891, 0.040989164077641749) - Reject equal distribution against alternative hypothesis: less - >>> stats.kstest(x,'norm', alternative = 'greater') (0.0072115233216311081, 0.98531158590396395) - Don’t reject equal distribution against alternative hypothesis: greater - >>> stats.kstest(x,'norm', mode='asymp') (0.12464329735846891, 0.08944488871182088) - Testing t distributed random variables against normal distribution - With 100 degrees of freedom the t distribution looks close to the normal distribution, and the K-S test does not reject the hypothesis that the sample came from the normal distribution: - >>> np.random.seed(987654321) >>> stats.kstest(stats.t.rvs(100,size=100),'norm') (0.072018929165471257, 0.67630062862479168) - With 3 degrees of freedom the t distribution looks sufficiently different from the normal distribution, that we can reject the hypothesis that the sample came from the normal distribution at the 10% level: - >>> np.random.seed(987654321) >>> stats.kstest(stats.t.rvs(3,size=100),'norm') (0.131016895759829, 0.058826222555312224) 
