Notes on the write-up by the team that won first place in plankton image classification (2)

<a href="http://benanne.github.io/2015/03/17/plankton.html">Classifying plankton with deep neural networks</a> (benanne.github.io)

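I kept reading the write-up by the team that took first place in the plankton image classification competition. This time I cover the part on network architecture.

So this is the level of refinement it takes to win first place on Kaggle... Very instructive.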

Network architecture

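An architecture strongly inspired by OxfordNet
- The final network has 16 layers
  - Started from 6 layers and added one layer at a time while checking that accuracy improved
- Supplement: OxfordNet
  - Input: 224x224, RGB
    - As preprocessing, the mean RGB value is subtracted from each pixel
  - conv layers: 3x3 filters, stride of 1 pixel, output has the same spatial size as the input
  - pooling layers: five max pooling layers, 2x2 window, stride of 2 pixels
  - fully connected layers: two layers with 4096 channels and one layer with 1000 channels
  - softmax layer
  - All hidden layers use ReLU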
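
The mean-subtraction preprocessing step mentioned for OxfordNet can be sketched as follows. This is only an illustration with NumPy; the function name `subtract_mean_rgb` and the (N, 3, H, W) array layout are assumptions made for this example, not something taken from the original write-up.

```python
import numpy as np

def subtract_mean_rgb(images, mean_rgb=None):
    """Subtract the per-channel mean RGB value from every pixel.

    images: float array of shape (N, 3, H, W) (assumed layout).
    mean_rgb: optional array of 3 channel means, e.g. computed once on the
              training set; if omitted it is computed from `images`.
    """
    if mean_rgb is None:
        # One mean value per colour channel, averaged over images and pixels.
        mean_rgb = images.mean(axis=(0, 2, 3))
    centered = images - np.asarray(mean_rgb).reshape(1, 3, 1, 1)
    return centered, mean_rgb
```

In practice the mean would be computed once on the training images and then reused unchanged for validation and test images.
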
Cyclic pooling
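- Overview
  - The goal is to make the network invariant to rotation
  - Four rotated copies of each image are created (0, 90, 180, and 270 degrees)
  - The four copies are processed in parallel, and the resulting feature maps are pooled (4-way cyclic pooling)
- The implementation is efficient
  - These four rotations need no interpolation
- The batch size was reduced to one quarter, since the four copies quadruple the effective batch size
- Root-mean-square pooling gave the best results
  - compared with mean pooling and max pooling
- 8-way cyclic pooling was also tried
  - Input images are generated at 0 and 45 degrees, and the four rotations are applied to both
  - 8-way cyclic pooling plus dihedral pooling did not work well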
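
A minimal NumPy sketch of the slice and pool steps described above. The function names `cyclic_slice` and `cyclic_pool`, the batch layout (the four rotated copies stacked along the batch axis), and the use of flattened feature vectors are assumptions made for illustration; the original implementation was written for Theano and may differ in detail.

```python
import numpy as np

def cyclic_slice(x):
    """Stack the 0/90/180/270 degree rotations of a batch along the batch axis.

    x: array of shape (N, C, H, W) with H == W. Returns shape (4N, C, H, W).
    Rotations by multiples of 90 degrees only reorder pixels, so no
    interpolation is involved.
    """
    return np.concatenate([np.rot90(x, k, axes=(2, 3)) for k in range(4)], axis=0)

def cyclic_pool(features, mode="rms"):
    """Pool feature vectors over the four rotated copies.

    features: array of shape (4N, D), laid out as produced by cyclic_slice.
    Returns shape (N, D).
    """
    n = features.shape[0] // 4
    f = features.reshape(4, n, -1)
    if mode == "rms":
        return np.sqrt(np.mean(f ** 2, axis=0))
    if mode == "mean":
        return f.mean(axis=0)
    if mode == "max":
        return f.max(axis=0)
    raise ValueError("unknown pooling mode: %s" % mode)
```

Because rotating the input by 90 degrees only permutes the four copies, any pooling function that ignores their order gives the same pooled output for the rotated input.
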
Rolling feature maps
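- The feature maps obtained for the four orientations in 4-way cyclic pooling are concatenated and used as the input to the next layer
- This concatenation is called the roll operation
  - 0+90+180+270, 90+180+270+0, 180+270+0+90, 270+0+90+180
- As a result, the network effectively has four times as many filters as it actually learns
- Implemented by connecting CUDA and Theano through PyCUDA
- A network can use cyclic pooling without the roll operation, but it cannot use the roll operation without cyclic pooling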
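
The channel ordering listed above (0+90+180+270, 90+180+270+0, ...) can be sketched in NumPy as below. The function name `cyclic_roll` and the batch layout (four equally sized chunks, one per orientation, stacked along the batch axis) are assumptions for this example. The sketch only reproduces the cyclic permutation and concatenation; a faithful implementation may also need to rotate the maps so that they are spatially aligned, and that step is left out here.

```python
import numpy as np

def cyclic_roll(x):
    """Concatenate the feature maps of all four orientations along channels.

    x: array of shape (4N, C, H, W); chunk k (rows k*N .. (k+1)*N) is assumed
       to hold the feature maps for rotation k * 90 degrees.
    Returns shape (4N, 4C, H, W). Chunk k of the output stacks the input
    chunks k, k+1, k+2, k+3 (mod 4) along the channel axis.
    """
    n = x.shape[0] // 4
    chunks = x.reshape((4, n) + x.shape[1:])
    rolled = [
        np.concatenate([chunks[(k + j) % 4] for j in range(4)], axis=1)
        for k in range(4)
    ]
    return np.concatenate(rolled, axis=0)
```

The batch size is unchanged while the channel count is multiplied by four, which matches the "cyclic roll" rows in the example architecture table.
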
Nonlinearities
- Leaky ReLUs gave the best results
  - y = max(x, a*x)
  - a is a scale coefficient; setting it to 0 gives the ordinary ReLU
  - The final choice was a = 1/3
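
The formula above translates directly into code. A minimal scalar sketch in plain Python; the function name `leaky_relu` is chosen for this example only.

```python
def leaky_relu(x, a=1.0 / 3.0):
    """Leaky ReLU: y = max(x, a*x), valid for 0 <= a < 1.

    With a = 0 this reduces to the ordinary ReLU; a = 1/3 is the value the
    notes report as the final choice.
    """
    return max(x, a * x)
```

For positive inputs the function is the identity; for negative inputs the output is scaled by `a` instead of being clipped to zero.
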
Spatial pooling
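- The final networks use four pooling layers
- Started with 2x2 max-pooling, but ended up using 3x3 max-pooling with stride 2 (3x3s2)
  - This made it possible to use larger input images without a large increase in computational cost
    - With four 2x2 max-pooling layers, obtaining a 5x5 feature map requires an 80x80 input
    - With 3x3s2 max-pooling, a 95x95 input yields the same 5x5 feature map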
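
The input sizes quoted above can be checked with a short calculation. This sketch assumes unpadded pooling whose output size is floor((size - window) / stride) + 1, and assumes the convolutions between the pooling layers keep the spatial size unchanged; the helper names are made up for this example.

```python
def pooled_size(size, window, stride):
    """Output width of one unpadded pooling layer."""
    return (size - window) // stride + 1

def size_after_pools(size, window, stride, layers=4):
    """Output width after applying the same pooling layer `layers` times."""
    for _ in range(layers):
        size = pooled_size(size, window, stride)
    return size

# 2x2 pooling, stride 2: 80 -> 40 -> 20 -> 10 -> 5
# 3x3 pooling, stride 2: 95 -> 47 -> 23 -> 11 -> 5
```
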
Multiscale architectures
- The best combination was a network taking input images rescaled based on image size, together with a smaller network taking input images rescaled by a fixed factor
Additional image features
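- Networks were trained that take extracted image features as input
  - They are merged just before the softmax layer, hence the name 'late fusing'
- Examples of features that were tried (the ones ultimately used are image size, the moment-based size and shape estimates, and Haralick texture features)
  - Image size in pixels
  - Size and shape estimates based on image moments
  - Hu moments
  - Zernike moments
  - Parameter Free Threshold Adjacency Statistics
  - Local Binary Patterns
  - Haralick texture features
  - Features from the competition tutorial
  - Combinations of the above
- Those three features stood out in terms of accuracy improvement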
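
One way to picture merging just before the softmax layer is the NumPy sketch below. It assumes the two networks' hidden representations are concatenated and passed through a single dense softmax layer; the notes do not say exactly how the merge was done, so the concatenation, the function names, and the weight shapes are all assumptions for illustration.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def late_fusion(conv_hidden, feat_hidden, w, b):
    """Merge two hidden representations right before the softmax layer.

    conv_hidden: (N, D1) output of the convnet's last hidden layer.
    feat_hidden: (N, D2) output of the small network fed with image features.
    w, b: weights (D1 + D2, K) and bias (K,) of the shared softmax layer.
    Returns class probabilities of shape (N, K).
    """
    fused = np.concatenate([conv_hidden, feat_hidden], axis=1)
    return softmax(fused @ w + b)
```
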
Example convnet architecture
- convolution layers: 10
- fully connected layers: 3
- spatial pooling layers: 4
- Input: (32, 1, 95, 95) = (batch size, number of channels, height, width)
- Output: (32, 121) = probabilities for the 121 classes

Layer type	Size	Output shape
cyclic slice (rotations multiply the batch size by 4)		(128, 1, 95, 95)
convolution	32 3x3 filters	(128, 32, 95, 95)
convolution	16 3x3 filters	(128, 16, 95, 95)
max pooling	3x3, stride 2	(128, 16, 47, 47)
cyclic roll (rotations multiply the number of channels by 4)		(128, 64, 47, 47)
convolution	64 3x3 filters	(128, 64, 47, 47)
convolution	32 3x3 filters	(128, 32, 47, 47)
max pooling	3x3, stride 2	(128, 32, 23, 23)
cyclic roll		(128, 128, 23, 23)
convolution	128 3x3 filters	(128, 128, 23, 23)
convolution	128 3x3 filters	(128, 128, 23, 23)
convolution	64 3x3 filters	(128, 64, 23, 23)
max pooling	3x3, stride 2	(128, 64, 11, 11)
cyclic roll		(128, 256, 11, 11)
convolution	256 3x3 filters	(128, 256, 11, 11)
convolution	256 3x3 filters	(128, 256, 11, 11)
convolution	128 3x3 filters	(128, 128, 11, 11)
max pooling	3x3, stride 2	(128, 128, 5, 5)
cyclic roll		(128, 512, 5, 5)
fully connected	512 2-piece maxout units	(128, 512)
cyclic pooling (merges the feature maps of the 4 orientations)		(32, 512)
fully connected	512 2-piece maxout units	(32, 512)
fully connected	121-way softmax	(32, 121)
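
The output shapes in the table can be reproduced with a small bookkeeping script. This is only a shape trace, not a network: it assumes size-preserving 3x3 convolutions, unpadded 3x3 stride-2 pooling, and the layer order of the table; the function name `trace_shapes` is made up for this example.

```python
def trace_shapes(batch=32, channels=1, size=95, classes=121):
    """Return the output shape of every layer in the example architecture."""
    shapes = []
    n, c, s = batch * 4, channels, size        # cyclic slice: batch size x4
    shapes.append((n, c, s, s))
    # Number of filters of the convolutions in each block.
    blocks = [(32, 16), (64, 32), (128, 128, 64), (256, 256, 128)]
    for filters in blocks:
        for f in filters:
            c = f                              # 3x3 convolution, same size
            shapes.append((n, c, s, s))
        s = (s - 3) // 2 + 1                   # 3x3 max pooling, stride 2
        shapes.append((n, c, s, s))
        c *= 4                                 # cyclic roll: channels x4
        shapes.append((n, c, s, s))
    shapes.append((n, 512))                    # fully connected (maxout)
    n //= 4                                    # cyclic pooling: batch size /4
    shapes.append((n, 512))
    shapes.append((n, 512))                    # fully connected (maxout)
    shapes.append((n, classes))                # softmax
    return shapes

if __name__ == "__main__":
    for shape in trace_shapes():
        print(shape)
```
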
