https://htn20190109.hatenablog.com/entry/2025/12/07/022207

https://qiita.com/DeepTama/items/aab46729d2aa51a8954d
https://arxiv.org/pdf/1512.02325

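・SSD (Single Shot MultiBox Detector)
Uses multiple feature maps at different resolutions for prediction.
This makes it possible to detect small objects and to process low-resolution images.
The base network is VGG-16.
Uses hard negative mining during training.
Small objects are detected from the shallow layers, large objects from the deep layers.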
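
To illustrate why shallow layers handle small objects, here is a minimal sketch of the linear scale rule from the SSD paper, in which the default box scale grows from the shallowest to the deepest feature map. The function name `default_box_scales` is hypothetical, and the values `s_min=0.2` and `s_max=0.9` are the paper's defaults; real implementations may use different scales.

```python
def default_box_scales(num_maps=6, s_min=0.2, s_max=0.9):
    """Default box scale (relative to image size) per feature map, shallow to deep.

    Assumes the linear rule s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1).
    """
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


scales = default_box_scales()
for layer, scale in enumerate(scales, start=1):
    # Shallow maps (small layer index) get the smallest boxes.
    print(f"feature map {layer}: scale {scale:.2f}")
```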

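・SSD processing overview
1. Resize the image to 300x300
2. Prepare 8732 default boxes
3. Output offset information and confidence scores from feature maps of different sizes
4. Extract the default boxes with high confidence
5. Output the bounding boxes whose confidence exceeds a given threshold as the final result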
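
The figure of 8732 default boxes can be checked with a little arithmetic. The sketch below assumes the SSD300 configuration described in the paper: six feature maps with side lengths 38, 19, 10, 5, 3, 1 and 4 or 6 default boxes per location. The names `FEATURE_MAPS` and `count_default_boxes` are hypothetical.

```python
# (feature map side length, default boxes per location), assumed SSD300 layout
FEATURE_MAPS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_default_boxes(feature_maps):
    """Total default boxes = sum over maps of (side * side * boxes per location)."""
    return sum(side * side * boxes for side, boxes in feature_maps)


total = count_default_boxes(FEATURE_MAPS)
print(total)  # 5776 + 2166 + 600 + 150 + 36 + 4 = 8732
```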

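・Hard negative mining
Sort the negative default boxes (those with no matched object) by confidence loss in descending order and pick from the top.
Adjust the ratio of "bounding boxes with an object label" : "bounding boxes without an object label" to 1:3 (at most three negatives per positive).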
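
The selection described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `hard_negative_mining`, its arguments, and the toy loss values are all assumptions made for the example.

```python
def hard_negative_mining(conf_losses, is_positive, neg_pos_ratio=3):
    """Pick all positives plus the hardest negatives.

    conf_losses: per-box confidence loss (higher = harder example)
    is_positive: per-box flag, True if the box is matched to an object
    neg_pos_ratio: maximum number of negatives kept per positive
    """
    positives = [i for i, pos in enumerate(is_positive) if pos]
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    # Hardest negatives first: sort by confidence loss, descending.
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    num_neg = min(len(negatives), neg_pos_ratio * len(positives))
    return positives, negatives[:num_neg]


# Toy data: one positive box (index 1) and seven negatives.
losses = [0.1, 2.0, 0.5, 0.05, 1.2, 0.9, 0.3, 0.7]
is_pos = [False, True, False, False, False, False, False, False]
pos_idx, neg_idx = hard_negative_mining(losses, is_pos)
print(pos_idx, neg_idx)  # [1] [4, 5, 7]
```

Only the selected indices would then contribute to the confidence loss; the remaining easy negatives are ignored.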

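・SSD loss function
Localization loss + confidence loss, where $N$ is the number of matched (positive) default boxes and $\alpha$ is the weight of the localization term.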

$\displaystyle L(x, c, l, g) = \dfrac{1}{N} \left( L_{\text{conf}}(x, c) + \alpha \, L_{\text{loc}}(x, l, g) \right)$

Confidence loss (class loss for positive boxes + background discrimination loss for negative boxes)
$\displaystyle L_{\text{conf}}(x, c) = - \sum_{i \in \text{Pos}}^{N} x_{ij}^{p} \log(\hat{c}_i^{p}) - \sum_{i \in \text{Neg}} \log(\hat{c}_i^{0})$
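
A minimal sketch of this confidence loss in plain Python, assuming $\hat{c}_i^{p}$ is the softmax of the raw class scores and that class index 0 is the background. The names `softmax`, `confidence_loss`, and `neg_indices` are hypothetical; `neg_indices` stands for whichever negatives were selected for training.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one box's class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_loss(class_scores, labels, neg_indices):
    """Cross-entropy summed over positives and selected negatives.

    labels[i] is the matched class (>= 1) for a positive box, 0 for background.
    """
    loss = 0.0
    for i, scores in enumerate(class_scores):
        probs = softmax(scores)
        if labels[i] > 0:
            loss -= math.log(probs[labels[i]])  # positive: true class term
        elif i in neg_indices:
            loss -= math.log(probs[0])  # negative: background term
    return loss


# Two boxes with uniform scores over 3 classes: each term is log(3).
loss = confidence_loss([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [1, 0], {1})
print(loss)
```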

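Localization loss (Smooth L1 between the predicted offsets $l$ and the ground-truth box $\hat{g}$ encoded relative to the default box)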
$\displaystyle L_{\text{loc}}(x, l, g) = \sum_{i \in \text{Pos}}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k} \, \text{smooth}_{L1}(l_i^m - \hat{g}_j^m)$

Smooth L1 function
$\displaystyle \text{smooth}_{L1}(x) = \begin{cases} 0.5x^2 & (|x| < 1) \\ |x| - 0.5 & (\text{otherwise}) \end{cases}$
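
The formulas in this section can be put together in a short sketch. The function names are hypothetical, the offsets are assumed to be already encoded as (cx, cy, w, h) for matched pairs only, and `alpha=1.0` and the "loss is 0 when there are no positives" rule are taken from the SSD paper rather than from these notes.

```python
def smooth_l1(x):
    """0.5 * x^2 when |x| < 1, otherwise |x| - 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def localization_loss(pred_offsets, target_offsets):
    """Sum of Smooth L1 over (cx, cy, w, h) for each matched box pair."""
    return sum(
        smooth_l1(l - g)
        for pred, target in zip(pred_offsets, target_offsets)
        for l, g in zip(pred, target)
    )


def ssd_loss(l_conf, l_loc, num_pos, alpha=1.0):
    """Combined loss (L_conf + alpha * L_loc) / N, with N = number of positives."""
    if num_pos == 0:
        return 0.0  # assumption from the paper: no positives means zero loss
    return (l_conf + alpha * l_loc) / num_pos


# One matched box: errors of 0.5 (quadratic branch) and 2.0 (linear branch).
l_loc = localization_loss([[0.5, 0.0, 0.0, 2.0]], [[0.0, 0.0, 0.0, 0.0]])
total = ssd_loss(l_conf=2.0, l_loc=l_loc, num_pos=1)
print(l_loc, total)  # 1.625 3.625
```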