chainer.functions.negative_sampling
chainer.functions.negative_sampling(x, t, W, sampler, sample_size, reduce='sum', *, return_samples=False)
Negative sampling loss function.
In natural language processing, especially language modeling, the vocabulary can be very large, so calculating the gradient of the whole embedding matrix takes a lot of time.
With the negative sampling trick, you only need to calculate the gradient for a few sampled negative examples.
The loss is defined as follows.
\[f(x, p) = - \log \sigma(x^\top w_p) - k E_{i \sim P(i)}[\log \sigma(- x^\top w_i)]\]
where \(\sigma(\cdot)\) is the sigmoid function, \(w_i\) is the weight vector for the word \(i\), and \(p\) is a positive example. The expectation is approximated with a set \(N\) of \(k\) examples sampled from the distribution \(P(i)\).
\[f(x, p) \approx - \log \sigma(x^\top w_p) - \sum_{n \in N} \log \sigma(-x^\top w_n)\]
Each sample in \(N\) is drawn from the word distribution \(P(w) = \frac{1}{Z} c(w)^\alpha\), where \(c(w)\) is the unigram count of the word \(w\), \(\alpha\) is a hyper-parameter, and \(Z\) is the normalization constant.
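The approximation above can be sketched in plain NumPy. This is a minimal illustration of the formulas, not Chainer's implementation: the helper names (`unigram_distribution`, `log_sigmoid`, `negative_sampling_loss`), the toy counts, and the value `alpha=0.75` are all assumptions made for the example.

```python
import numpy as np


def unigram_distribution(counts, alpha=0.75):
    """P(w) = c(w)**alpha / Z for an array of unigram counts c(w).

    alpha=0.75 is only an illustrative default for this sketch.
    """
    weights = np.asarray(counts, dtype=np.float64) ** alpha
    return weights / weights.sum()  # Z is the sum of the powered counts


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z)) = -log(1 + exp(-z))."""
    return -np.logaddexp(0.0, -z)


def negative_sampling_loss(x, p, W, negatives):
    """Approximate loss for one input vector x and positive word id p.

    W holds one weight vector w_i per row; negatives holds the k
    sampled word ids that form the set N.
    """
    positive_term = -log_sigmoid(x @ W[p])
    negative_term = -log_sigmoid(-(W[negatives] @ x)).sum()
    return positive_term + negative_term


rng = np.random.default_rng(0)
counts = np.array([50, 20, 10, 5, 1])    # toy unigram counts c(w)
P = unigram_distribution(counts)
W = rng.normal(size=(len(counts), 3))    # toy weight matrix
x = rng.normal(size=3)                   # toy input vector
negatives = rng.choice(len(counts), size=4, p=P)  # k = 4 draws from P
loss = negative_sampling_loss(x, p=1, W=W, negatives=negatives)
```

Both terms are negated log-probabilities, so the loss is always positive. With an all-zero input vector every sigmoid equals 1/2, and the loss reduces to \((k + 1) \log 2\).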
- Parameters
  - x (Variable or N-dimensional array) – Batch of input vectors.
  - t (Variable or N-dimensional array) – Vector of ground truth labels.
  - W (Variable or N-dimensional array) – Weight matrix.
  - sampler (FunctionType) – Sampling function. It takes a shape and returns an integer array of the shape. Each element of this array is a sample from the word distribution. A WalkerAlias object built with the power distribution of word frequency is recommended.
  - sample_size (int) – Number of samples.
  - reduce (str) – Reduction option. Its value must be either 'sum' or 'no'. Otherwise, ValueError is raised.
  - return_samples (bool) – If True, the sample array is also returned. The sample array is a \((\text{batch_size}, \text{sample_size} + 1)\)-array of integers whose first column is fixed to the ground truth labels and the other columns are drawn from the sampler.
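To make the sampler contract and the sample array layout concrete, here is a small NumPy sketch. The `make_sampler` helper is hypothetical and only stands in for a WalkerAlias object; the toy counts and the exponent 0.75 are likewise assumptions, and the way the array is assembled illustrates the documented layout rather than Chainer's internal code.

```python
import numpy as np


def make_sampler(probabilities, seed=0):
    """Return a function that maps a shape to an integer array of word ids."""
    rng = np.random.default_rng(seed)
    vocab_size = len(probabilities)

    def sampler(shape):
        # Each element is an independent draw from the word distribution.
        return rng.choice(vocab_size, size=shape, p=probabilities)

    return sampler


counts = np.array([50.0, 20.0, 10.0, 5.0, 1.0])  # toy unigram counts
power = counts ** 0.75                           # illustrative exponent
sampler = make_sampler(power / power.sum())

t = np.array([3, 0, 4])   # ground truth labels for a batch of 3
sample_size = 2

# First column: the ground truth labels; other columns: sampled word ids.
samples = np.empty((len(t), sample_size + 1), dtype=np.int32)
samples[:, 0] = t
samples[:, 1:] = sampler((len(t), sample_size))
```

Any callable with the same shape-in, integer-array-out behavior could be passed as `sampler`.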
- Returns
  If return_samples is False (default), the output variable holding the loss value(s) calculated by the above equation is returned. Otherwise, a tuple of the output variable and the sample array is returned.
  If reduce is 'no', the output variable holds an array of per-example loss values, one for each input vector in the batch. If it is 'sum', the output variable holds a scalar value.
- Return type
  Variable or tuple
See: Distributed Representations of Words and Phrases and their Compositionality
See also