chainer.functions.negative_sampling
chainer.functions.negative_sampling(x, t, W, sampler, sample_size, reduce='sum', *, return_samples=False)
Negative sampling loss function.
In natural language processing, especially language modeling, the vocabulary can be very large, so computing the gradient of the full embedding matrix takes a long time.
With the negative sampling trick, you only need to compute the gradient for the positive example and a few sampled negative examples.
The loss is defined as follows.

$$f(x, p) = -\log \sigma(x^\top w_p) - k \, \mathbb{E}_{i \sim P(i)}\left[\log \sigma(-x^\top w_i)\right]$$

where $\sigma(\cdot)$ is a sigmoid function, $w_i$ is the weight vector for the word $i$, and $p$ is a positive example. It is approximated with $k$ examples $N$ sampled from probability $P(i)$.

$$f(x, p) \approx -\log \sigma(x^\top w_p) - \sum_{n \in N} \log \sigma(-x^\top w_n)$$

Each sample of $N$ is drawn from the word distribution $P(w) = \frac{1}{Z} c(w)^\alpha$, where $c(w)$ is the unigram count of the word $w$, $\alpha$ is a hyper-parameter, and $Z$ is the normalization constant.
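As a reading aid for the approximated loss above, here is a minimal NumPy sketch for a single input vector. It is not Chainer's implementation; the helper names `log_sigmoid` and `negative_sampling_loss` and the toy sizes are assumptions made for illustration.

```python
import numpy as np

def log_sigmoid(z):
    # log(sigmoid(z)) = -log(1 + exp(-z)), written with logaddexp for stability.
    return -np.logaddexp(0.0, -z)

def negative_sampling_loss(x, W, p, negatives):
    """Approximated loss f(x, p) for one input vector (hypothetical helper).

    x: (dim,) input vector, W: (vocab, dim) weight matrix,
    p: index of the positive word, negatives: (k,) sampled word indices N.
    """
    positive_term = -log_sigmoid(x @ W[p])                      # -log sigma(x^T w_p)
    negative_term = -np.sum(log_sigmoid(-(W[negatives] @ x)))   # -sum_n log sigma(-x^T w_n)
    return positive_term + negative_term

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 4))   # toy setup: 10 words, 4-dimensional vectors
x = rng.normal(size=4)
loss = negative_sampling_loss(x, W, p=3, negatives=np.array([1, 7, 7, 2]))
print(loss)
```

Only the rows `W[p]` and `W[negatives]` enter the computation, which is why the gradient touches k + 1 rows of the weight matrix rather than the whole vocabulary.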
- Parameters
  - x (Variable or N-dimensional array) – Batch of input vectors.
  - t (Variable or N-dimensional array) – Vector of ground truth labels.
  - W (Variable or N-dimensional array) – Weight matrix.
  - sampler (FunctionType) – Sampling function. It takes a shape and returns an integer array of the shape. Each element of this array is a sample from the word distribution. A WalkerAlias object built with the power distribution of word frequency is recommended.
  - sample_size (int) – Number of samples.
  - reduce (str) – Reduction option. Its value must be either 'sum' or 'no'. Otherwise, ValueError is raised.
  - return_samples (bool) – If True, the sample array is also returned. The sample array is a (batch_size, sample_size + 1)-array of integers whose first column is fixed to the ground truth labels and the other columns are drawn from the sampler.
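The sampler contract and the layout of the sample array can be sketched in plain NumPy. This is an illustration under assumptions, not the library's code: `make_unigram_sampler` is a hypothetical helper, the word counts are invented, and `alpha=0.75` is the exponent used in the paper cited on this page. NumPy's `Generator.choice` stands in for the recommended WalkerAlias object.

```python
import numpy as np

def make_unigram_sampler(counts, alpha=0.75, seed=0):
    """Build sampler(shape) drawing word ids with P(w) = c(w)**alpha / Z.

    Hypothetical helper: Generator.choice is a stand-in for the WalkerAlias
    object that the documentation recommends.
    """
    p = np.asarray(counts, dtype=np.float64) ** alpha
    p /= p.sum()  # divide by the normalization constant Z
    rng = np.random.default_rng(seed)

    def sampler(shape):
        # Takes a shape and returns an integer array of that shape.
        return rng.choice(len(p), size=shape, p=p).astype(np.int32)

    return sampler, p

counts = [50, 30, 15, 5]                  # invented unigram counts c(w)
sampler, p = make_unigram_sampler(counts)

t = np.array([2, 0, 3], dtype=np.int32)   # ground truth labels, batch_size = 3
sample_size = 5
negatives = sampler((len(t), sample_size))

# Same layout as the array described under return_samples:
# first column is the ground truth label, the rest are sampled ids.
samples = np.concatenate([t[:, None], negatives], axis=1)
print(samples.shape)
```

Raising the counts to a power below 1 flattens the distribution, so rare words are drawn as negatives more often than their raw frequency would suggest.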
- Returns
  If return_samples is False (default), the output variable holding the loss value(s) calculated by the above equation is returned. Otherwise, a tuple of the output variable and the sample array is returned.
  If reduce is 'no', the output variable holds an array whose shape is the same as one of (hence both of) the input variables. If it is 'sum', the output variable holds a scalar value.
- Return type
See: Distributed Representations of Words and Phrases and their Compositionality
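The two reduction options can be illustrated with a batched NumPy sketch that consumes a sample array in the (batch_size, sample_size + 1) layout described under return_samples. The function name `batch_negative_sampling_loss` is hypothetical, and the sketch assumes that 'no' yields one loss value per example; it is not the library's implementation.

```python
import numpy as np

def batch_negative_sampling_loss(x, W, samples, reduce='sum'):
    """Batched loss sketch (hypothetical helper, not Chainer's code).

    x: (batch, dim), W: (vocab, dim),
    samples: (batch, k + 1) word ids, column 0 holding the positive labels.
    """
    if reduce not in ('sum', 'no'):
        raise ValueError("reduce must be either 'sum' or 'no'")
    w = W[samples]                            # (batch, k + 1, dim)
    scores = np.einsum('bkd,bd->bk', w, x)    # x^T w for every sampled word
    signed = scores.copy()
    signed[:, 0] = -signed[:, 0]              # positive column: -log sigma(s) = log(1 + exp(-s))
    # Negative columns: -log sigma(-s) = log(1 + exp(s)).
    per_example = np.logaddexp(0.0, signed).sum(axis=1)
    return per_example.sum() if reduce == 'sum' else per_example

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))
x = rng.normal(size=(3, 4))
samples = np.array([[2, 0, 5, 1],
                    [0, 3, 3, 4],
                    [5, 1, 2, 0]])

loss_no = batch_negative_sampling_loss(x, W, samples, reduce='no')
loss_sum = batch_negative_sampling_loss(x, W, samples, reduce='sum')
print(loss_no.shape, loss_sum)
```

In this sketch, 'sum' is simply the total of the per-example values produced by 'no'.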