chainer.functions.spatial_pyramid_pooling_2d

chainer.functions.spatial_pyramid_pooling_2d(x, pyramid_height, pooling=None)

Spatial pyramid pooling function.

It outputs a fixed-length vector regardless of input feature map size.

It applies pooling to the 4D input array x with a different kernel size and padding size at each pyramid level, flattens every dimension of each pooling result except the first, and concatenates the flattened results along the second dimension.

At the $i$-th pyramid level, the kernel size $(k_h^{(i)}, k_w^{(i)})$ and padding size $(p_h^{(i)}, p_w^{(i)})$ of the pooling operation are calculated as follows:

$\begin{split}k_h^{(i)} &= \lceil b_h / 2^i \rceil, \\ k_w^{(i)} &= \lceil b_w / 2^i \rceil, \\ p_h^{(i)} &= (2^i k_h^{(i)} - b_h) / 2, \\ p_w^{(i)} &= (2^i k_w^{(i)} - b_w) / 2,\end{split}$

where $\lceil \cdot \rceil$ denotes the ceiling function, and $b_h, b_w$ are the height and width of the input variable x, respectively. Note that the pyramid level index $i$ is zero-based.
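The formulas above can be sketched in plain Python to see which kernel and padding sizes each level would use. This is an illustrative sketch, not the library's implementation: the helper name `spp_kernel_and_padding` is hypothetical, and it assumes that a padding value with an odd numerator is rounded down to an integer, which the formula itself does not state.

```python
def spp_kernel_and_padding(height, width, pyramid_height):
    """Return one (k_h, k_w, p_h, p_w) tuple per zero-based pyramid level."""
    params = []
    for level in range(pyramid_height):
        n_bins = 2 ** level  # number of bins per spatial axis at this level
        # Integer ceiling division, i.e. ceil(b / 2^i)
        k_h = -(-height // n_bins)
        k_w = -(-width // n_bins)
        # Assumption: an odd remainder is rounded down to an integer padding
        p_h = (n_bins * k_h - height) // 2
        p_w = (n_bins * k_w - width) // 2
        params.append((k_h, k_w, p_h, p_w))
    return params


# Kernel and padding sizes for a 13x13 feature map with three levels
for level, sizes in enumerate(spp_kernel_and_padding(13, 13, 3)):
    print(level, sizes)
```

For a 13x13 input, level 2 splits each axis into 4 bins, so the kernel is 4 and the padded extent 4 * 4 = 16 exceeds the input by 3. That is the case where the rounding assumption matters.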

For details, see the paper: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.

Parameters

x (Variable) – Input variable. The shape of x should be (batchsize, number of channels, height, width).
pyramid_height (int) – Number of pyramid levels.
pooling (str) – Currently, only max is supported, which performs a 2d max pooling operation.

Returns

Output variable. The shape of the output variable will be $(batchsize, c \sum_{h=0}^{H-1} 2^{2h}, 1, 1)$, where $c$ is the number of channels of the input variable x and $H$ is the number of pyramid levels.
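As a quick check of the shape formula, the length of the second output dimension can be computed directly from $c$ and $H$. The helper name `spp_output_length` below is hypothetical and used only for illustration; the function evaluates the sum given above.

```python
def spp_output_length(channels, pyramid_height):
    """Evaluate c * sum_{h=0}^{H-1} 2^(2h), the size of the second output axis."""
    # Level h contributes 2^h x 2^h = 2^(2h) bins per channel
    bins_per_channel = sum(2 ** (2 * h) for h in range(pyramid_height))
    return channels * bins_per_channel


# 256 channels and 3 levels: 256 * (1 + 4 + 16)
print(spp_output_length(256, 3))
```

The result depends only on the channel count and the number of levels, not on the input height or width, which is why the output length is fixed regardless of the feature map size.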

Return type

Variable