chainer.functions.crf1d¶
chainer.functions.crf1d(cost, xs, ys, reduce='mean')[source]¶
Calculates the negative log-likelihood of a linear-chain CRF.
It takes a transition cost matrix, a sequence of costs, and a sequence of labels. Let c_{st} be the transition cost from label s to label t, x_{i,t} be the cost of label t at position i, and y_i be the expected label at position i. The negative log-likelihood of a linear-chain CRF is defined as

L = -\left( \sum_{i=1}^{l} x_{i,y_i} + \sum_{i=1}^{l-1} c_{y_i, y_{i+1}} - \log(Z) \right),

where l is the length of the input sequence and Z is the normalizing constant called the partition function.
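For small sequences the definition above can be verified directly with a brute-force computation in plain NumPy, summing over all possible labellings to obtain the partition function. This is an illustrative sketch, not Chainer's actual implementation, and `crf1d_nll` is a hypothetical name:

```python
import numpy as np
from itertools import product

def crf1d_nll(cost, xs, ys):
    """Brute-force negative log-likelihood of a linear-chain CRF
    for a single sequence (illustrative only; exponential in l).

    cost: (K, K) transition costs c[s, t]
    xs:   (l, K) per-position label costs x[i, t]
    ys:   (l,)   expected label sequence
    """
    l, K = xs.shape

    def score(labels):
        # Sum of per-position costs plus transition costs, as in the formula.
        s = sum(xs[i, labels[i]] for i in range(l))
        s += sum(cost[labels[i], labels[i + 1]] for i in range(l - 1))
        return s

    # Partition function Z: sum of exp(score) over all K**l labellings.
    log_z = np.log(sum(np.exp(score(lab))
                       for lab in product(range(K), repeat=l)))
    return -(score(ys) - log_z)
```

Since Z sums over every labelling including the expected one, log(Z) is at least the score of ys, so the loss is always non-negative.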
Note
When you want to calculate the negative log-likelihood of sequences with different lengths, sort the sequences in descending order of length and transpose them. For example, suppose you have three input sequences:
>>> a1 = a2 = a3 = a4 = np.random.uniform(-1, 1, 3).astype(np.float32)
>>> b1 = b2 = b3 = np.random.uniform(-1, 1, 3).astype(np.float32)
>>> c1 = c2 = np.random.uniform(-1, 1, 3).astype(np.float32)

>>> a = [a1, a2, a3, a4]
>>> b = [b1, b2, b3]
>>> c = [c1, c2]
where a1 and all the other variables are arrays with shape (K,). Make a transpose of the sequences:

>>> x1 = np.stack([a1, b1, c1])
>>> x2 = np.stack([a2, b2, c2])
>>> x3 = np.stack([a3, b3])
>>> x4 = np.stack([a4])
and make a list of the arrays:
>>> xs = [x1, x2, x3, x4]
You need to make the label sequences in the same fashion. Then, call the function:
>>> cost = chainer.Variable(
...     np.random.uniform(-1, 1, (3, 3)).astype(np.float32))
>>> ys = [np.zeros(x.shape[0:1], dtype=np.int32) for x in xs]
>>> loss = F.crf1d(cost, xs, ys)
It calculates the mean of the negative log-likelihoods of the three sequences.
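The manual stacking shown above can be written generically: for each time step, stack the elements of every sequence that is still long enough. A minimal sketch, assuming the per-example sequences are already sorted in descending order of length (`transpose_sequences` is a hypothetical helper, not part of Chainer's API):

```python
import numpy as np

def transpose_sequences(seqs):
    """Re-batch per-example sequences into per-time-step batches.

    Given [[a1, a2, a3, a4], [b1, b2, b3], [c1, c2]] (sorted by
    descending length), returns [stack(a1, b1, c1), stack(a2, b2, c2),
    stack(a3, b3), stack(a4)].
    """
    max_len = len(seqs[0])
    return [
        # Only sequences that still have an element at step t contribute,
        # so the batch size shrinks as t grows.
        np.stack([s[t] for s in seqs if len(s) > t])
        for t in range(max_len)
    ]
```

Because the inputs are sorted by descending length, each batch is a prefix of the previous one, which is exactly the layout `crf1d` expects.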
The output is a variable whose value depends on the value of the option reduce. If it is 'no', it holds the elementwise loss values. If it is 'mean', it holds the mean of the loss values.
- Parameters
cost (Variable or N-dimensional array) – A K×K matrix which holds the transition cost between two labels, where K is the number of labels.

xs (list of Variable) – Input vector for each label. len(xs) denotes the length of the sequence, and each Variable holds a B×K matrix, where B is the mini-batch size and K is the number of labels. Note that the Bs in all the variables are not necessarily the same, i.e., it accepts input sequences with different lengths.

ys (list of Variable) – Expected output labels. It needs to have the same length as xs. Each Variable holds a B-element integer vector. When an x in xs has a different B, the corresponding y has the same B. In other words, ys must satisfy ys[i].shape == xs[i].shape[0:1] for all i.

reduce (str) – Reduction option. Its value must be either 'mean' or 'no'. Otherwise, ValueError is raised.
- Returns
A variable holding the average negative log-likelihood of the input sequences.
- Return type
Variable
Note
See details in the original paper: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data (Lafferty et al., 2001).
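The shape and option constraints listed under Parameters can be summarized as a small checker. This is an illustrative sketch assuming plain NumPy arrays; `check_crf1d_inputs` is a hypothetical name, not Chainer's internal validation:

```python
import numpy as np

def check_crf1d_inputs(cost, xs, ys, reduce='mean'):
    """Validate the documented constraints on crf1d's arguments."""
    K = cost.shape[0]
    # cost must be a square K x K transition matrix.
    assert cost.shape == (K, K), "cost must be a K x K matrix"
    # ys must have the same length as xs.
    assert len(xs) == len(ys), "ys must have the same length as xs"
    for x, y in zip(xs, ys):
        # Each x is a B x K matrix; B may differ between time steps.
        assert x.ndim == 2 and x.shape[1] == K, "each x must be B x K"
        # Each y is a B-element vector matching its x's batch size.
        assert y.shape == x.shape[0:1], \
            "ys[i].shape must equal xs[i].shape[0:1]"
    if reduce not in ('mean', 'no'):
        raise ValueError("reduce must be either 'mean' or 'no'")
```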