chainer.functions.negative_sampling
chainer.functions.negative_sampling(x, t, W, sampler, sample_size, reduce='sum', *, return_samples=False)
Negative sampling loss function.
In natural language processing, especially language modeling, the number of words in a vocabulary can be very large. Calculating the gradient of the embedding matrix over the whole vocabulary therefore takes a lot of time.
With the negative sampling trick, you only need to calculate the gradient for the positive example and a few sampled negative examples.
The loss is defined as follows.
\[f(x, p) = - \log \sigma(x^\top w_p) - k E_{i \sim P(i)}[\log \sigma(- x^\top w_i)]\]
where \(\sigma(\cdot)\) is the sigmoid function, \(w_i\) is the weight vector for the word \(i\), and \(p\) is a positive example. The expectation is approximated with a set \(N\) of \(k\) examples sampled from the probability distribution \(P(i)\):
\[f(x, p) \approx - \log \sigma(x^\top w_p) - \sum_{n \in N} \log \sigma(-x^\top w_n)\]
Each sample in \(N\) is drawn from the word distribution \(P(w) = \frac{1}{Z} c(w)^\alpha\), where \(c(w)\) is the unigram count of the word \(w\), \(\alpha\) is a hyper-parameter, and \(Z\) is the normalization constant.
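The approximated loss for a single input vector can be sketched in plain NumPy as follows. This is only an illustration of the formula, not Chainer's implementation. The helper names (`log_sigmoid`, `negative_sampling_loss`), the toy sizes, and the uniform draw of negatives are assumptions made for the example.

```python
import numpy as np


def log_sigmoid(z):
    # log(sigmoid(z)) = -log(1 + exp(-z)), evaluated stably via logaddexp.
    return -np.logaddexp(0.0, -z)


def negative_sampling_loss(x, W, p, negatives):
    """Approximate f(x, p) for one input vector (hypothetical helper).

    x: (d,) input vector
    W: (V, d) weight matrix whose rows are the word vectors w_i
    p: index of the positive word
    negatives: array of k sampled word indices (the set N)
    """
    positive_term = -log_sigmoid(x @ W[p])
    negative_term = -np.sum(log_sigmoid(-(W[negatives] @ x)))
    return positive_term + negative_term


rng = np.random.default_rng(0)
V, d, k = 1000, 8, 5                      # toy vocabulary size, dimension, sample count
W = rng.normal(scale=0.1, size=(V, d))
x = rng.normal(size=d)
negatives = rng.integers(0, V, size=k)    # uniform draw, for illustration only
loss = negative_sampling_loss(x, W, p=3, negatives=negatives)
```

Only `k + 1` rows of `W` enter the computation, which is why the gradient touches a few word vectors instead of the whole vocabulary.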
- Parameters
  x (Variable or N-dimensional array) – Batch of input vectors.
  t (Variable or N-dimensional array) – Vector of ground truth labels.
  W (Variable or N-dimensional array) – Weight matrix.
  sampler (FunctionType) – Sampling function. It takes a shape and returns an integer array of that shape. Each element of this array is a sample from the word distribution. A WalkerAlias object built with the power distribution of word frequency is recommended.
  sample_size (int) – Number of samples.
  reduce (str) – Reduction option. Its value must be either 'sum' or 'no'. Otherwise, ValueError is raised.
  return_samples (bool) – If True, the sample array is also returned. The sample array is a \((\text{batch\_size}, \text{sample\_size} + 1)\)-array of integers whose first column is fixed to the ground truth labels and the other columns are drawn from the sampler.
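A sampler with the interface described above (a shape goes in, an integer array of that shape comes out) might look like the following sketch. It uses `numpy.random.Generator.choice` as a stand-in for the recommended WalkerAlias object, so it is not the alias method. The factory name `make_unigram_sampler`, the toy counts, and `alpha=0.75` are assumptions chosen for illustration.

```python
import numpy as np


def make_unigram_sampler(counts, alpha=0.75, seed=0):
    """Build sampler(shape) with P(w) proportional to c(w)**alpha (hypothetical helper)."""
    p = np.asarray(counts, dtype=np.float64) ** alpha
    p /= p.sum()                          # divide by the normalization constant Z
    rng = np.random.default_rng(seed)

    def sampler(shape):
        # Returns an integer array of the requested shape; each entry is a word id.
        return rng.choice(len(p), size=shape, p=p).astype(np.int32)

    return sampler


counts = [100, 10, 1, 50]                 # toy unigram counts for a 4-word vocabulary
sampler = make_unigram_sampler(counts)
negatives = sampler((3, 5))               # batch_size=3, sample_size=5

# Layout of the sample array described for return_samples=True:
# the first column holds the ground truth labels, the rest are sampled ids.
t = np.array([2, 0, 3], dtype=np.int32)
samples = np.concatenate([t[:, None], negatives], axis=1)   # shape (3, 6)
```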
- Returns
  If return_samples is False (default), the output variable holding the loss value(s) calculated by the above equation is returned. Otherwise, a tuple of the output variable and the sample array is returned.
  If reduce is 'no', the output variable holds an array of per-example loss values, one for each input vector in the batch. If it is 'sum', the output variable holds a scalar value.
- Return type
  Variable or tuple
See: Distributed Representations of Words and Phrases and their Compositionality
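To make the roles of `reduce` and `return_samples` concrete, here is a forward-only NumPy sketch that mirrors the documented signature. It is not Chainer's code and computes no gradients. The function name `negative_sampling_numpy`, the uniform stand-in sampler, and the toy sizes are assumptions, as is the reading that `reduce='no'` yields one loss value per example.

```python
import numpy as np


def negative_sampling_numpy(x, t, W, sampler, sample_size,
                            reduce='sum', return_samples=False):
    """Forward pass of the approximated loss for a batch (hypothetical helper)."""
    if reduce not in ('sum', 'no'):
        raise ValueError("reduce must be either 'sum' or 'no'")

    negatives = sampler((len(t), sample_size))
    # First column: ground truth labels; remaining columns: sampled word ids.
    samples = np.concatenate([t[:, None], negatives], axis=1)    # (B, k + 1)

    # scores[b, j] = x_b^T w_{samples[b, j]}
    scores = np.einsum('bd,bkd->bk', x, W[samples])

    # -log sigma(z) = logaddexp(0, -z): sign -1 for the positive column,
    # +1 for the negative columns (which use -log sigma(-z)).
    signs = np.ones(samples.shape[1])
    signs[0] = -1.0
    per_example = np.logaddexp(0.0, signs * scores).sum(axis=1)  # (B,)

    loss = per_example.sum() if reduce == 'sum' else per_example
    return (loss, samples) if return_samples else loss


rng = np.random.default_rng(0)
B, D, V, k = 4, 3, 10, 2                  # toy batch, dimension, vocabulary, sample count
x = rng.normal(size=(B, D))
t = rng.integers(0, V, size=B)
W = rng.normal(size=(V, D))


def uniform_sampler(shape):
    # Stand-in sampler; a frequency-based sampler would be used in practice.
    return rng.integers(0, V, size=shape)


per_example, samples = negative_sampling_numpy(
    x, t, W, uniform_sampler, k, reduce='no', return_samples=True)
total = negative_sampling_numpy(x, t, W, uniform_sampler, k, reduce='sum')
```

The two calls draw different negatives, so `total` is not expected to equal `per_example.sum()`.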
See also