Links
In order to write neural networks, we have to combine functions with parameters and optimize those parameters. You can use the Link class to do this. A Link is an object that holds parameters (i.e., optimization targets).
The most fundamental links are those that behave like regular functions while replacing some arguments with their parameters. Higher-level links will be introduced later; for now, think of links simply as functions with parameters.
One of the most frequently used links is the Linear link (a.k.a. fully-connected layer or affine transformation). It represents a mathematical function \(f(x) = Wx + b\), where the matrix \(W\) and the vector \(b\) are parameters. This link corresponds to its pure counterpart linear(), which accepts \(x\), \(W\), and \(b\) as arguments.
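The snippets in this section assume the standard imports used throughout this guide (repeated here so the examples are self-contained):

>>> import numpy as np
>>> from chainer import Variable
>>> import chainer.links as L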
A linear link from three-dimensional space to two-dimensional space is defined by the following line:
>>> f = L.Linear(3, 2)
Note

Most functions and links only accept mini-batch input, where the first dimension of the input array is treated as the batch dimension. In the Linear link example above, the input must have shape \((N, 3)\), where \(N\) is the mini-batch size.
The parameters of a link are stored as attributes. Each parameter is an instance of Variable. In the case of the Linear link, two parameters, W and b, are stored. By default, the matrix W is initialized randomly, while the vector b is initialized with zeros. This is the preferred way to initialize these parameters.
>>> f.W.array
array([[ 1.0184761 ,  0.23103087,  0.5650746 ],
       [ 1.2937803 ,  1.0782351 , -0.56423163]], dtype=float32)
>>> f.b.array
array([0., 0.], dtype=float32)
An instance of the Linear link acts like a regular function:
>>> x = Variable(np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32))
>>> y = f(x)
>>> y.array
array([[3.1757617, 1.7575557],
       [8.619507 , 7.1809077]], dtype=float32)
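For reference, the same result can be computed with the pure function counterpart linear() mentioned above (assuming the conventional import of chainer.functions as F):

>>> import chainer.functions as F
>>> np.allclose(F.linear(x, f.W, f.b).array, y.array)
True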
Note

Sometimes it is cumbersome to compute the dimension of the input space. The Linear link and some of the (de)convolution links can omit the input dimension at instantiation and infer it from the first mini-batch.

For example, the following line creates a linear link whose output dimension is two:

>>> f = L.Linear(2)

If we feed a mini-batch of shape \((N, M)\), the input dimension is inferred as \(M\), which means f.W will be a \(2 \times M\) matrix. Note that the parameters are initialized in a lazy manner at the first mini-batch. Therefore, f does not have the W attribute if no data has been put to the link.
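Here is a minimal sketch of this shape inference (the lazily created \(W\) holds random values; only the shapes matter here):

>>> _ = f(np.ones((1, 5), dtype=np.float32))  # first mini-batch: N = 1, M = 5
>>> f.W.shape  # W was lazily initialized as a 2 x 5 matrix
(2, 5)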
Gradients of parameters are computed by the backward() method. Note that gradients are accumulated by the method rather than overwritten, so you must first clear the gradients to renew the computation. This can be done by calling the cleargrads() method.
>>> f.cleargrads()
Note

cleargrads() was introduced in v1.15 to replace zerograds() for efficiency. zerograds() is kept only for backward compatibility.
Now we can compute the gradients of the parameters by calling the backward() method and access them via the grad property. Setting y.grad to all ones below corresponds to differentiating the sum of all elements of y.
>>> y.grad = np.ones((2, 2), dtype=np.float32)
>>> y.backward()
>>> f.W.grad
array([[5., 7., 9.],
       [5., 7., 9.]], dtype=float32)
>>> f.b.grad
array([2., 2.], dtype=float32)
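These values follow from the chain rule: since y.grad is all ones, each row of W.grad is the sum of the input rows over the mini-batch, and each element of b.grad equals the mini-batch size. A quick check:

>>> x.array.sum(axis=0)
array([5., 7., 9.], dtype=float32)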