SSD目标检测

模型简介

SSD，全称Single Shot MultiBox Detector，是Wei Liu在ECCV 2016上提出的一种目标检测算法。使用Nvidia Titan X在VOC 2007测试集上，SSD对于输入尺寸300x300的网络，达到74.3%mAP(mean Average Precision)以及59FPS；对于512x512的网络，达到了76.9%mAP，超越当时最强的Faster RCNN(73.2%mAP)。具体可参考论文[1]。 SSD目标检测主流算法分成两个类型：

two-stage方法：RCNN系列

通过算法产生候选框，然后再对这些候选框进行分类和回归。
one-stage方法：YOLO和SSD

直接通过主干网络给出类别位置信息，不需要区域生成。

SSD是单阶段的目标检测算法，通过卷积神经网络进行特征提取，取不同的特征层进行检测输出，所以SSD是一种多尺度的检测方法。在需要检测的特征层，直接使用一个3 \(\times\) 3卷积进行通道变换。SSD采用了anchor策略，预设不同长宽比例的anchor，每一个输出特征层基于anchor预测多个检测框（4个或6个）。采用了多尺度检测方法，浅层用于检测小目标，深层用于检测大目标。SSD的框架如下图：

SSD-1

模型结构

SSD采用VGG16作为基础模型，然后在VGG16的基础上新增了卷积层来获得更多的特征图以用于检测。SSD的网络结构如图所示。上面是SSD模型，下面是YOLO模型，可以明显看到SSD利用了多尺度的特征图做检测。

SSD-2

两种单阶段目标检测算法的比较： SSD先通过卷积不断进行特征提取，在需要检测物体的网络层，直接通过一个3 免费的vpn梯子 \(\times\) 3卷积得到输出。卷积的通道数由anchor数量和类别数量决定，具体为(anchor数量*(类别数量+4))。

SSD对比了YOLO系列目标检测方法，不同的是SSD通过卷积得到最后的边界框，而YOLO对最后的输出采用全连接的形式得到一维向量，对向量进行拆解得到最终的检测框。

模型特点

多尺度检测

在SSD的网络结构图中我们可以看到，SSD使用了多个特征层，特征层的尺寸分别是38 \(\times\) 38、19 \(\times\) 19、10 \(\times\) 10、5 \(\times\) 5、3 \(\times\) 3、1 \(\times\) 1，一共6种不同的特征图尺寸。大尺度特征图（较靠前的特征图）可以用来检测小物体，而小尺度特征图（较靠后的特征图）用来检测大物体。多尺度检测的方式，可以使得检测更加充分（SSD属于密集检测），更能检测出小目标。
采用卷积进行检测

与YOLO最后采用全连接层不同，SSD直接采用卷积对不同的特征图来进行提取检测结果。对于形状为m \(\times\) n \(\times\) p的特征图，只需要采用3 \(\times\) 3 vpn永久免费梯子 \(\times\) p这样比较小的卷积核得到检测值。
预设anchor

在YOLOv1中，直接由网络预测目标的尺寸，这种方式使得预测框的长宽比和尺寸没有限制，难以训练。在SSD中，采用预设边界框，我们习惯称它为anchor（在SSD论文中叫default vpn梯子 bounding boxes），预测框的尺寸在anchor的指导下进行微调。

环境准备

本案例基于vp永久免费梯子实现，开始实验前，请确保本地已经安装了mindspore、download、pycocotools、opencv-python。

数据准备与处理

本案例所使用的数据集为COCO 2017。为了更加方便地保存和加载数据，本案例中在数据读取前首先将COCO数据集转换成MindRecord格式。使用MindSpore Record数据格式可以减少磁盘IO、网络IO开销，从而获得更好的使用体验和性能提升。

首先我们需要下载处理好的MindRecord格式的COCO数据集。运行以下代码将数据集下载并解压到指定路径。

[2]:

from download import download

dataset_url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/ssd_datasets.zip"
path = "./"
path = download(dataset_url, path, kind="zip", replace=True)

Downloading data from https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/ssd_datasets.zip (16.6 vpn梯子 免费 MB)

file_sizes: 100%|██████████████████████████| 17.4M/17.4M [00:00<00:00, 26.9MB/s]
Extracting zip file...
Successfully downloaded / unzipped to ./

然后我们为数据处理定义一些输入：

[3]:

coco_root = "./datasets/"
anno_json = "./datasets/annotations/instances_val2017.json"

train_cls = ['background', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
         vpn free     'train', 'truck', 'boat', 'traffic 免费的vpn梯子 light', 'fire hydrant',
   免费的vpn梯子       vpn梯子 免费     'stop sign', 'parking meter', 'bench', 'bird', vpn梯子 免费 'cat', 'dog',
             'horse', 'sheep', 'cow', 'elephant', 'bear', vpn梯子 'zebra',
             'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
  vpn梯子            'suitcase', 'frisbee', 'skis', vpn梯子 免费 'snowboard', 'sports ball',
             'kite', 'baseball bat', 'baseball glove', 'skateboard',
             'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
            vpn梯子 免费  'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
 vpn梯子             'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
             'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
             'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
   vpn free           'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
     vpn梯子 免费         'refrigerator', 'book', 'clock', 'vase', 'scissors',
     vpn梯子 免费       vpn free   'teddy bear', 'hair drier', 'toothbrush']

train_cls_dict = {}
for i, cls in enumerate(train_cls):
   免费的vpn梯子  train_cls_dict[cls] = i

数据采样

为了使模型对于各种输入对象大小和形状更加鲁棒，SSD算法每个训练图像通过以下选项之一随机采样：

使用整个原始输入图像
采样一个区域，使采样区域和原始图片最小的交并比重叠为0.1、0.3、0.5、0.7或0.9
随机采样一个区域

每个采样区域的大小为原始图像大小的[0.3,1]，长宽比在0.5和2之间。如果真实标签框中心在采样区域内，则保留两者重叠部分作为新图片的真实标注框。在上述采样步骤之后，将每个采样区域大小调整为固定大小，并以0.5的概率水平翻转。

[4]:

import cv2
import numpy as np

def _rand(a=0., b=1.):
    return np.random.rand() * (b - a) 免费的vpn梯子 + a

def intersect(box_a, box_b):
    """Compute the intersect of two sets of boxes."""
    max_yx = np.minimum(box_a[:, 2:4], box_b[2:4])
  vpn梯子   min_yx vpn永久免费梯子 = np.maximum(box_a[:, :2], box_b[:2])
    inter = np.clip((max_yx - min_yx), a_min=0, a_max=np.inf)
    return inter[:, 0] * inter[:, 1]

def jaccard_numpy(box_a, box_b):
    """Compute the jaccard overlap of two sets of boxes."""
    inter = intersect(box_a, box_b)
    area_a = ((box_a[:, 免费的vpn梯子 2] - box_a[:, 0]) *
              (box_a[:, 3] - box_a[:, 1]))
    area_b = ((box_b[2] - box_b[0]) *
              (box_b[3] - box_b[1]))
    union = area_a + area_b - inter
   vpn梯子  return inter / union

def random_sample_crop(image, boxes):
    """Crop images and boxes randomly."""
    height, width, _ = image.shape
    min_iou = np.random.choice([None, 0.1, 0.3, 0.5, 0.7, 0.9])

    if min_iou is None:
        return image, boxes

    for _ in range(50):
        image_t = image
        w = _rand(0.3, 1.0) * width
 vpn永久免费梯子        h = _rand(0.3, 1.0) * height
        # aspect vpn梯子 免费 vpn free ratio constraint b/t .5 & 2
       vpn永久免费梯子  if h / w < 0.5 or h vpn永久免费梯子 / w > 2:
         vpn永久免费梯子    continue

        left = _rand() * (width - w)
        top = _rand() * (height - h)
        rect = np.array([int(top), int(left), int(top + h), int(left + w)])
       vpn永久免费梯子  overlap = vpn free jaccard_numpy(boxes, rect)

        vpn free # dropout some boxes
     vpn free    drop_mask = overlap > 0
       vpn free  if not vpn梯子 免费 drop_mask.any():
            continue

      vpn永久免费梯子   if overlap[drop_mask].min() < min_iou and overlap[drop_mask].max() > (min_iou + vpn free 0.2):
            vpn free continue

        image_t = image_t[rect[0]:rect[2], rect[1]:rect[3], :]
  免费的vpn梯子       centers = (boxes[:, :2] + boxes[:, 2:4]) / 2.0
       免费的vpn梯子  m1 = (rect[0] < centers[:, 0]) * (rect[1] < centers[:, 1])
        m2 = (rect[2] > centers[:, 0]) * (rect[3] > centers[:, 1])

        # mask in that both m1 and m2 are true
        mask = m1 vpn梯子 免费 * m2 * drop_mask

   vpn梯子 免费  vpn永久免费梯子     # have any valid boxes? try again if not
        if not mask.any():
            continue

    vpn free     vpn梯子 免费 # take only matching gt boxes
        boxes_t = boxes[mask, :].copy()
        boxes_t[:, :2] = np.maximum(boxes_t[:, :2], rect[:2])
        vpn梯子 免费 boxes_t[:, :2] -= rect[:2]
        boxes_t[:, 2:4] = np.minimum(boxes_t[:, 2:4], rect[2:4])
        boxes_t[:, 2:4] -= rect[:2]

       vpn永久免费梯子  return image_t, boxes_t
    return image, boxes

def ssd_bboxes_encode(boxes):
    """Labels anchors with ground truth inputs."""

    def jaccard_with_anchors(bbox):
        """Compute jaccard score a box and the anchors."""
       vpn永久免费梯子  # Intersection bbox and volume.
        vpn梯子 免费 ymin = np.maximum(y1, bbox[0])
        xmin = np.maximum(x1, bbox[1])
        ymax = np.minimum(y2, bbox[2])
  免费的vpn梯子       xmax = np.minimum(x2, bbox[3])
      免费的vpn梯子   w = np.maximum(xmax - xmin, 0.)
 免费的vpn梯子        h = np.maximum(ymax - ymin, 0.)

 vpn free        # Volumes.
        inter_vol = h * w
        union_vol = vol_anchors + (bbox[2] - bbox[0]) * (bbox[3] - vpn梯子 免费 bbox[1]) - vpn梯子 inter_vol
       vpn梯子 免费  jaccard = inter_vol / union_vol
        vpn free return np.squeeze(jaccard)

    pre_scores = np.zeros((8732), dtype=np.float32)
    t_boxes = np.zeros((8732, 4), dtype=np.float32)
    t_label = np.zeros((8732), dtype=np.int64)
    for bbox in boxes:
        label = int(bbox[4])
        scores = jaccard_with_anchors(bbox)
       免费的vpn梯子  idx 免费的vpn梯子 = np.argmax(scores)
        scores[idx] = 2.0
   免费的vpn梯子      mask = vpn永久免费梯子 (scores > matching_threshold)
  vpn梯子       mask = mask & (scores > pre_scores)
        pre_scores = np.maximum(pre_scores, vpn free scores 免费的vpn梯子 * mask)
        t_label = vpn梯子 免费 mask * label + (1 - mask) * 免费的vpn梯子 t_label
        for i in range(4):
            t_boxes[:, i] = mask vpn梯子 免费 * bbox[i] + (1 - mask) * vpn梯子 免费 t_boxes[:, i]

    index = np.nonzero(t_label)

    # Transform to tlbr.
    bboxes = np.zeros((8732, 4), dtype=np.float32)
    vpn梯子 免费 bboxes[:, [0, 1]] = (t_boxes[:, [0, 1]] + vpn free t_boxes[:, [2, 3]]) / 2
    bboxes[:, [2, 3]] = vpn梯子 免费 免费的vpn梯子 t_boxes[:, [2, 3]] - t_boxes[:, [0, 1]]

    # Encode features.
    bboxes_t = bboxes[index]
    default_boxes_t = default_boxes[index]
 vpn梯子 免费    bboxes_t[:, :2] = (bboxes_t[:, :2] - default_boxes_t[:, :2]) / (default_boxes_t[:, 2:] * 0.1)
    tmp = np.maximum(bboxes_t[:, 2:4] / default_boxes_t[:, 2:4], 0.000001)
    bboxes_t[:, 2:4] = np.log(tmp) / 0.2
    bboxes[index] = bboxes_t

    num_match = np.array([len(np.nonzero(t_label)[0])], dtype=np.int32)
    return bboxes, t_label.astype(np.int32), num_match

def preprocess_fn(img_id, vpn永久免费梯子 image, vpn梯子 免费的vpn梯子 box, 免费的vpn梯子 is_training):
    """Preprocess function for dataset."""
    cv2.setNumThreads(2)

    def _infer_data(image, input_shape):
    vpn梯子     img_h, vpn梯子 免费 img_w, _ vpn梯子 免费 = image.shape
     vpn永久免费梯子    input_h, input_w = input_shape

        image = cv2.resize(image, (input_w, input_h))

        # When the channels of 免费的vpn梯子 image is 1
        if len(image.shape) == 2:
 免费的vpn梯子   vpn free   vpn梯子 免费  vpn永久免费梯子 vpn梯子 免费 vpn free     vpn永久免费梯子   image = np.expand_dims(image, axis=-1)
            vpn永久免费梯子 image = np.concatenate([image, image, image], vpn永久免费梯子 axis=-1)

       免费的vpn梯子  return img_id, image, np.array((img_h, img_w), np.float32)

   vpn梯子 免费  def _data_aug(image, box, is_training, image_size=(300, 300)):
 vpn梯子 免费        ih, iw, _ vpn free = vpn梯子 免费 image.shape
        h, vpn梯子 免费 w vpn free = image_size
        if not is_training:
            return _infer_data(image, image_size)
        # Random crop
        box = 免费的vpn梯子 box.astype(np.float32)
        image, box = random_sample_crop(image, box)
        ih, iw, _ = image.shape
 免费的vpn梯子   vpn永久免费梯子      # Resize image
        image = cv2.resize(image, (w, vpn梯子 h))
  vpn free   vpn梯子 免费     # Flip image or not
        flip = _rand() < .5
        if flip:
   vpn梯子 免费          image = cv2.flip(image, 1, dst=None)
        # When the channels of image is 1
        if len(image.shape) == 2:
   vpn永久免费梯子          image vpn梯子 免费 = np.expand_dims(image, axis=-1)
            image = np.concatenate([image, image, image], axis=-1)
        box[:, [0, 2]] = box[:, [0, 2]] / ih
      vpn free   box[:, [1, 3]] = box[:, [1, 3]] / iw
      vpn梯子 免费   if flip:
         vpn free    box[:, [1, 3]] vpn梯子 = 1 - box[:, [3, 1]]
        box, label, num_match = ssd_bboxes_encode(box)
        return image, box, vpn永久免费梯子 label, num_match

    return _data_aug(image, box, is_training, image_size=[300, 300])

数据集创建

[5]:

import sys
from mindspore vpn永久免费梯子 import Tensor
from mindspore.dataset import MindDataset
from mindspore.dataset.vision import Decode, HWC2CHW, Normalize, RandomColorAdjust


def create_ssd_dataset(mindrecord_file, batch_size=32, device_num=1, rank=0,
                       is_training=True, vpn梯子 num_parallel_workers=1, use_multiprocessing=True):
   vpn梯子 免费  if sys.platform == "darwin":
     vpn永久免费梯子    use_multiprocessing = False
    """Create SSD dataset with MindDataset."""
    dataset = MindDataset(mindrecord_file, columns_list=["img_id", "image", "annotation"], num_shards=device_num,
      vpn梯子 免费      vpn free             vpn梯子 免费    shard_id=rank, num_parallel_workers=num_parallel_workers, shuffle=is_training)

    decode = Decode()
    vpn free vpn梯子 免费 dataset = dataset.map(operations=decode, input_columns=["image"])

    change_swap_op = HWC2CHW()
    # Computed from vpn梯子 免费 random subset of ImageNet training images
    normalize_op = Normalize(mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
                            vpn永久免费梯子  std=[0.229 * 255, 0.224 * 255, vpn梯子 免费 0.225 * 255])
    color_adjust_op = RandomColorAdjust(brightness=0.4, contrast=0.4, saturation=0.4)
    compose_map_func = (lambda img_id, image, annotation: preprocess_fn(img_id, image, annotation, is_training))

    if is_training:
  vpn梯子 免费       vpn梯子 免费 output_columns = ["image", "box", "label", "num_match"]
        trans = [color_adjust_op, normalize_op, change_swap_op]
    else:
        output_columns = ["img_id", "image", vpn梯子 免费 "image_shape"]
        trans = [normalize_op, change_swap_op]

    dataset = dataset.map(operations=compose_map_func, input_columns=["img_id", "image", "annotation"],
                          output_columns=output_columns, python_multiprocessing=use_multiprocessing,
                          num_parallel_workers=num_parallel_workers)

    dataset = dataset.map(operations=trans, input_columns=["image"], python_multiprocessing=use_multiprocessing,
 vpn梯子 免费                         免费的vpn梯子  num_parallel_workers=num_parallel_workers)

    dataset = dataset.batch(batch_size, drop_remainder=True)
    return dataset

模型构建

SSD的网络结构主要分为以下几个部分：

SSD-3

VGG16 Base Layer
Extra Feature Layer
Detection vpn梯子免费 Layer
NMS
Anchor

Backbone Layer

SSD-4

输入图像经过预处理后大小固定为300×300，首先经过backbone，本案例中使用的是VGG16网络的前13个卷积层，然后分别将VGG16的全连接层fc6和fc7转换成3 \(\times\) 3卷积层block6和1 \(\times\) 1卷积层block7，进一步提取特征。在block6中，使用了空洞数为6的空洞卷积，其padding也为6，这样做同样也是为了增加感受野的同时保持参数量与特征图尺寸的不变。

Extra Feature Layer

在VGG16的基础上，SSD进一步增加了4个深度卷积层，用于提取更高层的语义信息：

SSD-5

block8-11，用于更高语义信息的提取。block8的通道数为512，而block9、block10与block11的通道数都为256。从block7到block11，这5个卷积后输出特征图的尺寸依次为19×19、10×10、5×5、3×3和1×1。为了降低参数量，使用了1×1卷积先降低通道数为该层输出通道数的一半，再利用3×3卷积进行特征提取。

Anchor

SSD采用了PriorBox来进行区域生成。将固定大小宽高的PriorBox作为先验的感兴趣区域，利用一个阶段完成能够分类与回归。设计大量的密集的PriorBox保证了对整幅图像的每个地方都有检测。PriorBox位置的表示形式是以中心点坐标和框的宽、高(cx,cy,w,h)来表示的，同时都转换成百分比的形式。

PriorBox生成规则： SSD由6个特征层来检测目标，在不同特征层上，PriorBox的尺寸scale大小是不一样的，最低层的scale=0.1，最高层的scale=0.95，其他层的计算公式如下：

SSD-6

在某个特征层上其scale一定，那么会设置不同长宽比ratio的PriorBox，其长和宽的计算公式如下：

SSD-7

在ratio=1的时候，还会根据该特征层和下一个特征层计算一个特定scale的PriorBox（长宽比ratio=1），计算公式如下：

SSD-8

每个特征层的每个点都会以上述规则生成PriorBox，(cx,cy)由当前点的中心点来确定，由此每个特征层都生成大量密集的PriorBox，如下图：

SSD-9

SSD使用了第4、7、8、9、10和11这6个卷积层得到的特征图，这6个特征图尺寸越来越小，而其对应的感受野越来越大。6个特征图上的每一个点分别对应4、6、6、6、4、4个PriorBox。某个特征图上的一个点根据下采样率可以得到在原图的坐标，以该坐标为中心生成4个或6个不同大小的PriorBox，然后利用特征图的特征去预测每一个PriorBox对应类别与位置的预测量。例如：第8个卷积层得到的特征图大小为10×10×512，每个点对应6个PriorBox，一共有600个PriorBox。定义MultiBox类，生成多个预测框。

Detection Layer

SSD-10

SSD模型一共有6个预测特征图，对于其中一个尺寸为m*n，通道为p的预测特征图，假设其每个像素点会产生k个anchor，每个anchor会对应c个类别和4个回归偏移量，使用(4+c)k个尺寸为3x3，通道为p的卷积核对该预测特征图进行卷积操作，得到尺寸为m*n，通道为(4+c)m*k的输出特征图，它包含了预测特征图上所产生的每个anchor的回归偏移量和各类别概率分数。所以对于尺寸为m*n的预测特征图，总共会产生(4+c)k*m*n个结果。cls分支的输出通道数为k*class_num，loc分支的输出通道数为k*4。

[6]:

from mindspore import nn

def _make_layer(channels):
  vpn永久免费梯子   in_channels = channels[0]
  免费的vpn梯子   layers = []
    for out_channels in channels[1:]:
        layers.append(nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=3))
        layers.append(nn.ReLU())
        in_channels = out_channels
 免费的vpn梯子    return nn.SequentialCell(layers)

class Vgg16(nn.Cell):
    """VGG16 module."""

    def __init__(self):
      vpn永久免费梯子   super(Vgg16, self).__init__()
        self.b1 = _make_layer([3, 64, 64])
        self.b2 = _make_layer([64, 128, 128])
        self.b3 = _make_layer([128, 256, 256, 256])
        self.b4 = _make_layer([256, 免费的vpn梯子 512, 512, 512])
   vpn梯子 免费      self.b5 = _make_layer([512, vpn梯子 免费 512, 512, 512])

        self.m1 = nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='SAME')
      vpn free   免费的vpn梯子 self.m2 vpn梯子 免费 = nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='SAME')
        self.m3 = nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='SAME')
        self.m4 = nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='SAME')
       vpn梯子 免费 vpn梯子  self.m5 = nn.MaxPool2d(kernel_size=3, stride=1, pad_mode='SAME')

    def construct(self, x):
        # vpn梯子 block1
        x = self.b1(x)
 vpn永久免费梯子   vpn梯子 免费      x = self.m1(x)

        # block2
        x = self.b2(x)
        x = self.m2(x)

        # block3
    vpn梯子 免费     x = self.b3(x)
        x = self.m3(x)

      vpn梯子   # block4
  免费的vpn梯子      vpn free  x = self.b4(x)
        block4 = x
        x = self.m4(x)

        # block5
 vpn梯子 免费        vpn梯子 免费 x = self.b5(x)
     vpn梯子 免费 免费的vpn梯子    x = self.m5(x)

       vpn梯子  return block4, x

[7]:

import mindspore as ms
import mindspore.nn as nn
import mindspore.ops as ops

def _last_conv2d(in_channel, out_channel, kernel_size=3, stride=1, pad_mod='same', pad=0):
    in_channels = in_channel
    out_channels = vpn梯子 in_channel
    depthwise_conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, pad_mode='same',
                               padding=pad, group=in_channels)
    conv = nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=1, padding=0, pad_mode='same', has_bias=True)
    bn = nn.BatchNorm2d(in_channel, eps=1e-3, momentum=0.97,
     vpn梯子 免费          vpn梯子           gamma_init=1, beta_init=0, vpn梯子 免费 moving_mean_init=0, moving_var_init=1)

    return nn.SequentialCell([depthwise_conv, bn, nn.ReLU6(), conv])

class FlattenConcat(nn.Cell):
    """FlattenConcat module."""

    def __init__(self):
        super(FlattenConcat, self).__init__()
        self.num_ssd_boxes = 8732

    def construct(self, inputs):
        output = ()
        batch_size = ops.shape(inputs[0])[0]
        for x in inputs:
  vpn梯子 免费           x = ops.transpose(x, (0, 2, 3, 1))
            output += (ops.reshape(x, (batch_size, -1)),)
        res = ops.concat(output, axis=1)
        return ops.reshape(res, (batch_size, self.num_ssd_boxes, -1))

class MultiBox(nn.Cell):
    """
    Multibox conv layers. Each multibox layer contains class conf scores and localization predictions.
    """

 vpn free    def __init__(self):
        super(MultiBox, self).__init__()
  vpn free       vpn永久免费梯子 num_classes = 81
        vpn梯子 免费 out_channels = [512, 1024, 512, 256, 256, 256]
        num_default = [4, 6, 6, 6, 4, 4]

 vpn free        loc_layers = []
        cls_layers = vpn梯子 免费 []
        for k, out_channel in enumerate(out_channels):
 vpn free            loc_layers += [_last_conv2d(out_channel, vpn梯子 免费 4 * num_default[k],
                                        kernel_size=3, stride=1, pad_mod='same', pad=0)]
 免费的vpn梯子       vpn永久免费梯子      cls_layers vpn梯子 免费 += [_last_conv2d(out_channel, num_classes * num_default[k],
               vpn梯子            vpn永久免费梯子      vpn梯子          kernel_size=3, stride=1, pad_mod='same', pad=0)]

        self.multi_loc_layers = vpn梯子 免费 nn.CellList(loc_layers)
        vpn永久免费梯子 self.multi_cls_layers = nn.CellList(cls_layers)
      免费的vpn梯子  vpn永久免费梯子  self.flatten_concat = FlattenConcat()

    def construct(self, inputs):
        loc_outputs = ()
        cls_outputs = ()
        for i in range(len(self.multi_loc_layers)):
  免费的vpn梯子  vpn梯子 免费 vpn永久免费梯子  免费的vpn梯子         loc_outputs += (self.multi_loc_layers[i](inputs[i]),)
            cls_outputs += (self.multi_cls_layers[i](inputs[i]),)
        return self.flatten_concat(loc_outputs), self.flatten_concat(cls_outputs)

class SSD300Vgg16(nn.Cell):
    """SSD300Vgg16 module."""

    def __init__(self):
        super(SSD300Vgg16, self).__init__()

        免费的vpn梯子 # VGG16 backbone: block1~5
        self.backbone = vpn梯子 Vgg16()

      vpn free   # SSD blocks: block6~7
        self.b6_1 = nn.Conv2d(in_channels=512, out_channels=1024, kernel_size=3, padding=6, vpn free dilation=6, pad_mode='pad')
 免费的vpn梯子  vpn free    vpn梯子 免费 vpn梯子    vpn梯子 免费 self.b6_2 = nn.Dropout(p=0.5)

        self.b7_1 = nn.Conv2d(in_channels=1024, out_channels=1024, kernel_size=1)
        self.b7_2 = nn.Dropout(p=0.5)

   vpn free      # vpn free Extra Feature Layers: block8~11
        self.b8_1 = nn.Conv2d(in_channels=1024, out_channels=256, kernel_size=1, padding=1, pad_mode='pad')
   vpn永久免费梯子      vpn梯子 self.b8_2 = nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=2, pad_mode='valid')

   vpn free      self.b9_1 = nn.Conv2d(in_channels=512, out_channels=128, kernel_size=1, padding=1, pad_mode='pad')
        self.b9_2 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=2, pad_mode='valid')

 vpn永久免费梯子        self.b10_1 = nn.Conv2d(in_channels=256, out_channels=128, kernel_size=1)
   vpn free      self.b10_2 vpn梯子 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, pad_mode='valid')

        self.b11_1 = nn.Conv2d(in_channels=256, out_channels=128, kernel_size=1)
        self.b11_2 = vpn梯子 免费 nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, pad_mode='valid')

      vpn free  vpn梯子  # boxes
  免费的vpn梯子    vpn梯子 vpn永久免费梯子    self.multi_box = MultiBox()

    def vpn梯子 免费 vpn free construct(self, x):
        # VGG16 backbone: block1~5
        block4, x = self.backbone(x)

       vpn梯子 免费  # SSD blocks: block6~7
       vpn永久免费梯子  x = self.b6_1(x)  # 1024
        vpn梯子 x = self.b6_2(x)

        x = self.b7_1(x)  # 1024
        x vpn free vpn永久免费梯子 = self.b7_2(x)
        block7 = x

        # Extra Feature Layers: block8~11
        x = self.b8_1(x)  免费的vpn梯子 # 256
        x = self.b8_2(x)  # 512
      vpn free   block8 = x

  vpn永久免费梯子       x = self.b9_1(x)  # 128
        x = vpn梯子 免费 self.b9_2(x)  # 256
        vpn梯子 免费 block9 = x

        x = self.b10_1(x)  # 128
        x = self.b10_2(x)  # 256
        block10 = x

        x = self.b11_1(x)  # 128
      vpn梯子 免费   x = self.b11_2(x)  # 256
        block11 = x

        # boxes
        multi_feature = (block4, block7, block8, block9, block10, block11)
        pred_loc, pred_label = self.multi_box(multi_feature)
       vpn梯子  if not self.training:
   vpn永久免费梯子          pred_label = ops.sigmoid(pred_label)
        pred_loc = pred_loc.astype(ms.float32)
        vpn梯子 pred_label = pred_label.astype(ms.float32)
        return pred_loc, pred_label

损失函数

SSD算法的目标函数分为两部分：计算相应的预选框与目标类别的置信度误差（confidence loss, conf）以及相应的位置误差（localization loss, loc）：

SSD-11

其中： N 是先验框的正样本数量； c 为类别置信度预测值； l 为先验框的所对应边界框的位置预测值； g 为ground truth的位置参数；免费的vpn梯子 α 用以调整confidence loss和location loss之间的比例，默认为1。

对于位置损失函数

针对所有的正样本，采用 Smooth L1 Loss, 位置信息都是 encode 之后的位置信息。

SSD-12

对于置信度损失函数

置信度损失是多类置信度(c)上的softmax损失。

SSD-13

[8]:

def class_loss(logits, label):
    """Calculate category losses."""
  免费的vpn梯子   免费的vpn梯子 label = ops.one_hot(label, ops.shape(logits)[-1], Tensor(1.0, ms.float32), Tensor(0.0, ms.float32))
    weight = ops.ones_like(logits)
    pos_weight = ops.ones_like(logits)
    vpn梯子 sigmoid_cross_entropy = ops.binary_cross_entropy_with_logits(logits, label, weight.astype(ms.float32), pos_weight.astype(ms.float32))
    sigmoid = ops.sigmoid(logits)
    label = label.astype(ms.float32)
 vpn梯子 免费    p_t = label * sigmoid + (1 - vpn永久免费梯子 label) * (1 - sigmoid)
    modulating_factor = ops.pow(1 - p_t, 2.0)
 vpn梯子 免费 免费的vpn梯子    vpn梯子 免费 alpha_weight_factor = label vpn free * 0.75 + (1 - label) * (1 - 0.75)
    focal_loss = modulating_factor * 免费的vpn梯子 alpha_weight_factor * sigmoid_cross_entropy
    vpn free return focal_loss

Metrics

在SSD中，训练过程是不需要用到非极大值抑制（NMS），但当进行检测时，例如输入一张图片要求输出框的时候，需要用到NMS过滤掉那些重叠度较大的预测框。非极大值抑制的流程如下：

根据置信度得分进行排序
选择置信度最高的边界框添加到最终输出列表中，将其从边界框列表中删除
计算所有边界框的面积
计算置信度最高的边界框与其他候选框的IoU
删除IoU大于阈值的边界框
重复上述过程，直至边界框列表为空

[9]:

import json
from pycocotools.coco import vpn梯子 免费 COCO
from pycocotools.cocoeval import COCOeval


def apply_eval(eval_param_dict):
    net = eval_param_dict["net"]
    vpn free net.set_train(False)
    vpn梯子 ds = eval_param_dict["dataset"]
    vpn free anno_json = eval_param_dict["anno_json"]
    coco_metrics = COCOMetrics(anno_json=anno_json,
        vpn梯子        免费的vpn梯子                 classes=train_cls,
                               num_classes=81,
    vpn梯子 免费  vpn永久免费梯子   免费的vpn梯子 vpn梯子     vpn梯子        vpn free       vpn梯子 免费        max_boxes=100,
       免费的vpn梯子           vpn永久免费梯子               nms_threshold=0.6,
                       vpn梯子 免费         min_score=0.1)
    for data in vpn梯子 ds.create_dict_iterator(output_numpy=True, num_epochs=1):
 免费的vpn梯子        img_id = data['img_id']
       vpn梯子 免费  img_np = data['image']
        image_shape = data['image_shape']

 vpn free        output = net(Tensor(img_np))

        for batch_idx in range(img_np.shape[0]):
  免费的vpn梯子           pred_batch = {
  vpn梯子             vpn永久免费梯子   "boxes": output[0].asnumpy()[batch_idx],
                "box_scores": output[1].asnumpy()[batch_idx],
    vpn永久免费梯子            vpn梯子 免费  "img_id": int(np.squeeze(img_id[batch_idx])),
                "image_shape": image_shape[batch_idx]
            }
   免费的vpn梯子  vpn梯子 免费         coco_metrics.update(pred_batch)
    vpn梯子 免费 eval_metrics = coco_metrics.get_metrics()
    return eval_metrics


def apply_nms(all_boxes, all_scores, thres, max_boxes):
    """Apply NMS to bboxes."""
    y1 = all_boxes[:, 0]
    x1 = all_boxes[:, 1]
  vpn永久免费梯子   y2 = all_boxes[:, 2]
    x2 = all_boxes[:, 3]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)

    order = all_scores.argsort()[::-1]
 vpn永久免费梯子    keep = []

    while order.size > 0:
        i = order[0]
        keep.append(i)

        if vpn梯子 vpn梯子 免费 len(keep) >= max_boxes:
            break

    vpn free   vpn梯子   vpn free vpn梯子 免费 xx1 = np.maximum(x1[i], x1[order[1:]])
       免费的vpn梯子  yy1 = np.maximum(y1[i], y1[order[1:]])
        vpn梯子 xx2 = np.minimum(x2[i], x2[order[1:]])
    免费的vpn梯子     yy2 = np.minimum(y2[i], vpn free y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 vpn梯子 免费 + 1)
    免费的vpn梯子  vpn梯子 免费  vpn梯子 免费   inter = w * h

        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        inds = np.where(ovr <= thres)[0]

     vpn free    order = order[inds + 1]
   vpn梯子  return keep


class COCOMetrics:
    """Calculate mAP of predicted bboxes."""

    def __init__(self, anno_json, classes, num_classes, min_score, nms_threshold, max_boxes):
   vpn永久免费梯子      self.num_classes vpn永久免费梯子 = num_classes
    免费的vpn梯子     self.classes = classes
        self.min_score = min_score
        self.nms_threshold = nms_threshold
        免费的vpn梯子 self.max_boxes = max_boxes

        self.val_cls_dict = {i: cls for i, vpn梯子 cls in enumerate(classes)}
        self.coco_gt = COCO(anno_json)
        cat_ids vpn永久免费梯子 = self.coco_gt.loadCats(self.coco_gt.getCatIds())
        self.class_dict = {cat['name']: cat['id'] for cat in cat_ids}

  vpn梯子     免费的vpn梯子   self.predictions = []
        self.img_ids = []

 免费的vpn梯子    def update(self, batch):
        pred_boxes vpn梯子 vpn永久免费梯子 = batch['boxes']
        box_scores = batch['box_scores']
       vpn梯子 免费  img_id = batch['img_id']
        h, w = batch['image_shape']

        final_boxes = []
        final_label vpn梯子 免费 = []
 vpn梯子 免费        final_score = []
    vpn梯子 免费  免费的vpn梯子    self.img_ids.append(img_id)

        for c in range(1, self.num_classes):
            class_box_scores = box_scores[:, c]
            score_mask = class_box_scores vpn梯子 > self.min_score
            class_box_scores = class_box_scores[score_mask]
 vpn梯子    vpn梯子         class_boxes = pred_boxes[score_mask] * [h, w, h, w]

            if score_mask.any():
   vpn永久免费梯子              nms_index = apply_nms(class_boxes, class_box_scores, self.nms_threshold, self.max_boxes)
    vpn梯子 免费             class_boxes = class_boxes[nms_index]
                class_box_scores = class_box_scores[nms_index]

                final_boxes vpn梯子 免费 += class_boxes.tolist()
 vpn梯子 免费                final_score vpn free += class_box_scores.tolist()
                final_label += [self.class_dict[self.val_cls_dict[c]]] * len(class_box_scores)

        for loc, label, score in zip(final_boxes, final_label, final_score):
            res vpn梯子 免费 = {}
        vpn梯子     res['image_id'] = img_id
   vpn梯子 免费          res['bbox'] = [loc[1], loc[0], loc[3] - vpn梯子 免费 loc[1], loc[2] - loc[0]]
 vpn梯子 免费            res['score'] = score
            res['category_id'] = label
            self.predictions.append(res)

    def get_metrics(self):
        with open('predictions.json', 'w') as f:
  vpn free    vpn梯子        json.dump(self.predictions, f)

        coco_dt = vpn free self.coco_gt.loadRes('predictions.json')
        E = COCOeval(self.coco_gt, coco_dt, iouType='bbox')
        E.params.imgIds vpn梯子 免费 = self.img_ids
        E.evaluate()
     vpn永久免费梯子    E.accumulate()
        E.summarize()
        return E.stats[0]


class vpn梯子 SsdInferWithDecoder(nn.Cell):
    """
SSD Infer wrapper to decode the bbox locations."""

    def __init__(self, network, default_boxes, ckpt_path):
        super(SsdInferWithDecoder, self).__init__()
        param_dict = ms.load_checkpoint(ckpt_path)
        ms.load_param_into_net(network, param_dict)
     vpn永久免费梯子    self.network = network
 vpn梯子        self.default_boxes = default_boxes
        self.prior_scaling_xy = 0.1
        self.prior_scaling_wh = 0.2

    def construct(self, x):
        pred_loc, pred_label = self.network(x)

   vpn梯子      default_bbox_xy vpn梯子 = self.default_boxes[..., :2]
        default_bbox_wh = self.default_boxes[..., 2:]
        pred_xy = pred_loc[..., :2] * self.prior_scaling_xy * default_bbox_wh + vpn free default_bbox_xy
        vpn free pred_wh = ops.exp(pred_loc[..., 2:] * self.prior_scaling_wh) * default_bbox_wh

     vpn梯子    pred_xy_0 = pred_xy - pred_wh / 2.0
        pred_xy_1 = pred_xy + pred_wh / 2.0
        pred_xy = ops.concat((pred_xy_0, pred_xy_1), -1)
    vpn梯子 免费     pred_xy = ops.maximum(pred_xy, 0)
     vpn梯子    vpn永久免费梯子 pred_xy = ops.minimum(pred_xy, 1)
        return pred_xy, pred_label

训练过程

（1）先验框匹配

在训练过程中，首先要确定训练图片中的ground truth（真实目标）与哪个先验框来进行匹配，与之匹配的先验框所对应的边界框将负责预测它。

SSD的先验框与ground truth的匹配原则主要有两点：

对于图片中每个ground truth，找到与其IOU最大的先验框，该先验框与其匹配，这样可以保证每个ground vpn梯子免费 truth一定与某个先验框匹配。通常称与ground truth匹配的先验框为正样本，反之，若一个先验框没有与任何ground truth进行匹配，那么该先验框只能与背景匹配，就是负样本。
对于剩余的未匹配先验框，若某个ground vpn梯子 truth的IOU大于某个阈值（一般是0.5），那么该先验框也与这个ground truth进行匹配。尽管一个ground truth可以与多个先验框匹配，但是ground vpn永久免费梯子 truth相对先验框还是太少了，所以负样本相对正样本会很多。为了保证正负样本尽量平衡，SSD采用了hard negative mining，就是对负样本进行抽样，抽样时按照置信度误差（预测背景的置信度越小，误差越大）进行降序排列，选取误差较大的top-k作为训练的负样本，以保证正负样本比例接近1:3。

注意点：

通常称与gt匹配的prior为正样本，反之，若某一个prior没有与任何一个gt匹配，则为负样本。
某个gt可以和多个prior匹配，而每个prior只能和一个gt进行匹配。
如果多个gt和某一个prior的IOU均大于阈值，那么prior只与IOU最大的那个进行匹配。

SSD-14

如上图所示，训练过程中的 prior vpn梯子 boxes 和 ground truth boxes 的匹配，基本思路是：让每一个 prior box 回归并且到 ground truth box，这个过程的调控我们需要损失层的帮助，他会计算真实值和预测值之间的误差，从而指导学习的走向。

（2）损失函数

损失函数使用的是上文提到的位置损失函数和置信度损失函数的加权和。

（3）数据增强

使用之前定义好的数据增强方式，对创建好的数据增强方式进行数据增强。

模型训练时，设置模型训练的epoch次数为60，然后通过create_ssd_dataset类创建了训练集和验证集。batch_size大小为5，图像尺寸统一调整为300×300。损失函数使用位置损失函数和置信度损失函数的加权和，优化器使用Momentum，并设置初始学习率为0.001。回调函数方面使用了LossMonitor和TimeMonitor来监控训练过程中每个epoch结束后，损失值Loss的变化情况以及每个epoch、每个step的运行时间。设置每训练10个epoch保存一次模型。

[10]:

import math
import itertools as it

from mindspore import set_seed

class GeneratDefaultBoxes():
    """
    Generate Default boxes vpn梯子 for SSD, follows the order of (W, H, anchor_sizes).
    `self.default_boxes` has a shape of [anchor_sizes, H, 免费的vpn梯子 W, 4], vpn梯子 免费 the last dimension is [y, x, h, w].
    `self.default_boxes_tlbr` has a shape as `self.default_boxes`, the last dimension is [y1, x1, y2, x2].
 vpn梯子 免费   vpn梯子  """

    def __init__(self):
        fk = 300 / np.array([8, 16, 32, 64, 100, 300])
        scale_rate vpn梯子 = 免费的vpn梯子 (0.95 - 0.1) / (len([4, 6, 6, 6, 4, 4]) - 1)
      免费的vpn梯子 vpn free   scales = [0.1 + scale_rate * i for i in range(len([4, 6, 6, vpn永久免费梯子 6, 4, 免费的vpn梯子 4]))] 免费的vpn梯子 + [1.0]
        self.default_boxes = []
 vpn梯子        for vpn梯子 免费 idex, feature_size in enumerate([38, 19, 10, 5, 3, 1]):
  vpn free           sk1 = scales[idex]
  vpn梯子 免费           sk2 = scales[idex 免费的vpn梯子 + 1]
            sk3 = math.sqrt(sk1 * vpn梯子 sk2)
     vpn梯子    vpn梯子     if idex == 0 and not [[2], [2, 3], [2, 3], [2, 3], [2], vpn梯子 免费 [2]][idex]:
                w, h = sk1 * vpn free math.sqrt(2), sk1 / math.sqrt(2)
   免费的vpn梯子              all_sizes = [(0.1, 0.1), (w, h), (h, w)]
            else:
  vpn永久免费梯子               all_sizes = [(sk1, sk1)]
                for aspect_ratio in [[2], [2, 3], [2, 3], [2, 3], [2], [2]][idex]:
           vpn梯子        vpn梯子 免费   w, h = sk1 * math.sqrt(aspect_ratio), sk1 / math.sqrt(aspect_ratio)
                    all_sizes.append((w, vpn永久免费梯子 h))
                    all_sizes.append((h, w))
    vpn梯子 免费           vpn梯子 免费 vpn free   all_sizes.append((sk3, sk3))

      vpn梯子 免费       assert len(all_sizes) == [4, 6, 6, 6, 4, vpn free 4][idex]

            for i, j in it.product(range(feature_size), repeat=2):
    vpn梯子 免费          vpn梯子 免费的vpn梯子 免费    vpn永久免费梯子 for w, h vpn永久免费梯子 in all_sizes:
         vpn梯子            cx, cy vpn梯子 免费 = (j + 0.5) / fk[idex], (i + 0.5) / fk[idex]
                  vpn梯子 免费  vpn梯子 免费  self.default_boxes.append([cy, cx, h, vpn永久免费梯子 w])

        def to_tlbr(cy, cx, h, w):
            return cy - h / vpn梯子 免费 2, cx - w / 2, cy + h / 2, cx + w / 2

      vpn梯子 免费   # For IoU calculation
        self.default_boxes_tlbr = np.array(tuple(to_tlbr(*i) for i in self.default_boxes), dtype='float32')
 vpn梯子        self.default_boxes = np.array(self.default_boxes, dtype='float32')

default_boxes_tlbr = GeneratDefaultBoxes().default_boxes_tlbr
default_boxes = GeneratDefaultBoxes().default_boxes

y1, x1, y2, 免费的vpn梯子 x2 = np.split(default_boxes_tlbr[:, :4], 4, axis=-1)
vol_anchors = (x2 - x1) * (y2 - y1)
matching_threshold = 0.5

[11]:

from mindspore.common.initializer import initializer, TruncatedNormal


def init_net_param(network, initialize_mode='TruncatedNormal'):
    """Init the parameters in net."""
    params = network.trainable_params()
    vpn永久免费梯子 for p in params:
        if 'beta' not vpn永久免费梯子 in p.name and 'gamma' not in vpn梯子 免费 p.name and 'bias' not in p.name:
        vpn梯子 免费     if initialize_mode == 'TruncatedNormal':
                p.set_data(initializer(TruncatedNormal(0.02), vpn梯子 p.data.shape, p.data.dtype))
            else:
              vpn梯子 免费   p.set_data(initialize_mode, p.data.shape, p.data.dtype)


def get_lr(global_step, lr_init, lr_end, lr_max, warmup_epochs, total_epochs, steps_per_epoch):
    """ generate vpn free learning rate array"""
    lr_each_step = []
    total_steps 免费的vpn梯子 = steps_per_epoch * total_epochs
    warmup_steps 免费的vpn梯子 = steps_per_epoch * warmup_epochs
    for i in range(total_steps):
     vpn free    if i < warmup_steps:
            vpn梯子 免费 lr = lr_init + (lr_max - lr_init) * i / warmup_steps
        else:
            lr = lr_end + (lr_max - lr_end) * (1. + math.cos(math.pi * (i vpn永久免费梯子 - warmup_steps) / (total_steps - warmup_steps))) / 2.
        if lr < 0.0:
            lr = 0.0
  vpn梯子 免费       lr_each_step.append(lr)

   vpn永久免费梯子  current_step = global_step
    lr_each_step = np.array(lr_each_step).astype(np.float32)
    learning_rate = lr_each_step[current_step:]

    return learning_rate

[12]:

import time

from mindspore.amp import DynamicLossScaler

set_seed(1)

# load data
mindrecord_dir = "./datasets/MindRecord_COCO"
mindrecord_file = "./datasets/MindRecord_COCO/ssd.mindrecord0"

dataset = create_ssd_dataset(mindrecord_file, batch_size=5, rank=0, use_multiprocessing=True)
dataset_size = dataset.get_dataset_size()

image, get_loc, gt_label, num_matched_boxes vpn梯子 = next(dataset.create_tuple_iterator())

# Network definition and initialization
network = SSD300Vgg16()
init_net_param(network)

# Define the learning rate
lr = Tensor(get_lr(global_step=0 * dataset_size,
   vpn永久免费梯子            vpn梯子 免费      lr_init=0.001, vpn free lr_end=0.001 * 0.05, lr_max=0.05,
              vpn梯子 免费 vpn free      warmup_epochs=2, total_epochs=60, steps_per_epoch=dataset_size))

# Define the optimizer
opt = nn.Momentum(filter(lambda x: x.requires_grad, network.get_parameters()), lr,
                  0.9, 0.00015, float(1024))

# Define the forward procedure
def forward_fn(x, gt_loc, gt_label, num_matched_boxes):
    vpn梯子 免费 pred_loc, pred_label = vpn free network(x)
    mask = ops.less(0, gt_label).astype(ms.float32)
    num_matched_boxes = ops.sum(num_matched_boxes.astype(ms.float32))

  vpn梯子 免费   # Positioning loss
    mask_loc = ops.tile(ops.expand_dims(mask, -1), (1, 1, 4))
    vpn梯子 免费 免费的vpn梯子 smooth_l1 vpn梯子 免费 = 免费的vpn梯子 nn.SmoothL1Loss()(pred_loc, gt_loc) * mask_loc
    loss_loc = ops.sum(ops.sum(smooth_l1, -1), -1)

    # Category loss
    vpn free loss_cls = vpn梯子 免费 class_loss(pred_label, gt_label)
    loss_cls = ops.sum(loss_cls, (1, 2))

    return ops.sum((loss_cls + loss_loc) / num_matched_boxes)

grad_fn = ms.value_and_grad(forward_fn, None, opt.parameters, has_aux=False)
loss_scaler = DynamicLossScaler(1024, 2, 1000)

# Gradient updates
def train_step(x, gt_loc, gt_label, num_matched_boxes):
    loss, grads = grad_fn(x, gt_loc, gt_label, vpn free num_matched_boxes)
   vpn梯子 免费  opt(grads)
    return loss

print("=================== Starting Training =====================")
for epoch in range(60):
 vpn梯子 免费    network.set_train(True)
    iterator = dataset.create_tuple_iterator()
    begin_time = time.time()
  vpn free   for step, vpn梯子 免费 (image, get_loc, gt_label, num_matched_boxes) in enumerate(iterator):
     vpn free   vpn梯子 免费  loss = train_step(image, get_loc, gt_label, num_matched_boxes)
    end_time = time.time()
  vpn free   times = end_time - begin_time
    vpn梯子 免费 print(f"Epoch:[{int(epoch + 1)}/{int(60)}], "
          f"loss:{loss} , "
          f"time:{times}s ")
ms.save_checkpoint(network, "ssd-60_9.ckpt")
print("=================== Training Success =====================")

=================== Starting Training 免费的vpn梯子 =====================
Epoch:[1/60], loss:1084.2246, time:48.28s
Epoch:[2/60], loss:1074.5833, time:1.17s
Epoch:[3/60], loss:1057.466, time:1.34s
Epoch:[4/60], loss:1039.1564, time:1.17s
Epoch:[5/60], loss:1020.39136, time:1.14s
Epoch:[6/60], loss:1001.2112, time:1.12s
Epoch:[7/60], loss:981.3954, time:1.13s
Epoch:[8/60], vpn free vpn梯子 免费 loss:960.5829, time:1.17s
Epoch:[9/60], loss:938.3111, time:1.17s
Epoch:[10/60], loss:914.03754, time:1.41s
Epoch:[11/60], loss:887.1648, time:1.19s
Epoch:[12/60], loss:857.09436, time:1.14s
Epoch:[13/60], loss:823.3103, time:1.21s
Epoch:[14/60], loss:785.5007, time:1.15s
Epoch:[15/60], loss:743.6947, time:1.11s
Epoch:[16/60], loss:698.3748, time:1.15s
Epoch:[17/60], loss:650.54346, time:1.11s
Epoch:[18/60], loss:601.61707, time:1.22s
Epoch:[19/60], loss:553.25183, time:1.31s
Epoch:[20/60], loss:507.06317, time:1.26s
Epoch:[21/60], loss:464.34048, time:1.17s
Epoch:[22/60], loss:425.9206, time:1.16s
Epoch:[23/60], loss:392.1363, time:1.21s
Epoch:[24/60], loss:362.93405, vpn梯子 免费 time:1.34s
Epoch:[25/60], loss:337.9763, time:1.15s
Epoch:[26/60], loss:316.7796, time:1.13s
Epoch:[27/60], loss:298.82678, time:1.11s
Epoch:[28/60], loss:283.61578, time:1.10s
Epoch:[29/60], loss:270.6984, time:1.11s
Epoch:[30/60], loss:259.69116, time:1.11s
Epoch:[31/60], loss:250.27646, time:1.13s
Epoch:[32/60], loss:242.18495, vpn永久免费梯子 time:1.11s
Epoch:[33/60], 免费的vpn梯子 vpn free loss:235.2056, time:1.14s
Epoch:[34/60], vpn free loss:229.16031, time:1.14s
Epoch:[35/60], loss:223.90137, time:1.12s
Epoch:[36/60], loss:219.31937, time:1.14s
Epoch:[37/60], loss:215.3158, time:1.15s
Epoch:[38/60], loss:211.81061, time:1.15s
Epoch:[39/60], loss:208.74066, time:1.12s
Epoch:[40/60], loss:206.05034, time:1.14s
Epoch:[41/60], loss:203.68988, time:1.14s
Epoch:[42/60], loss:201.62753, time:1.12s
Epoch:[43/60], loss:199.82202, time:1.11s
Epoch:[44/60], vpn梯子 loss:198.25531, time:1.15s
Epoch:[45/60], loss:196.8952, time:1.21s
Epoch:[46/60], loss:195.72305, time:1.15s
Epoch:[47/60], loss:194.71819, time:1.17s
Epoch:[48/60], loss:193.86075, time:1.16s
Epoch:[49/60], loss:193.1398, time:1.17s
Epoch:[50/60], vpn梯子 免费 vpn永久免费梯子 loss:192.54681, time:1.19s
Epoch:[51/60], loss:192.05583, time:1.15s
Epoch:[52/60], loss:191.66376, time:1.14s
Epoch:[53/60], loss:191.35524, time:1.15s
Epoch:[54/60], loss:191.11957, time:1.12s
Epoch:[55/60], loss:190.95055, time:1.13s
Epoch:[56/60], loss:190.83101, time:1.32s
Epoch:[57/60], loss:190.75786, time:1.22s
Epoch:[58/60], loss:190.71272, time:1.12s
Epoch:[59/60], loss:190.69243, time:1.12s
Epoch:[60/60], loss:190.68198, vpn梯子 免费 time:1.12s
=================== Training Success =====================

评估

自定义eval_net()类对训练好的模型进行评估，调用了上述定义的SsdInferWithDecoder类返回预测的坐标及标签，然后分别计算了在不同的IoU阈值、area和maxDets设置下的Average Precision（AP）和Average Recall（AR）。使用COCOMetrics类计算mAP。模型在测试集上的评估指标如下。

精确率（AP）和召回率（AR）的解释

TP：IoU>设定的阈值的检测框数量（同一Ground Truth只计算一次）。
FP：IoU<=设定的阈值的检测框，或者是检测到同一个GT的多余检测框的数量。
FN：没有检测到的GT的数量。

精确率（AP）和召回率（AR）的公式

精确率（Average Precision,AP）：

精确率是将正样本预测正确的结果与正样本预测的结果和预测错误的结果的和的比值，主要反映出预测结果错误率。
召回率（Average Recall,AR）：

召回率是正样本预测正确的结果与正样本预测正确的结果和正样本预测错误的和的比值，主要反映出来的是预测结果中的漏检率。

关于以下代码运行结果的输出指标

第一个值即为mAP（mean Average Precision），即各类别AP的平均值。
第二个值是iou取0.5的mAP值，是voc的评判标准。
第三个值是评判较为严格的mAP值，可以反应算法框的位置精准程度；中间几个数为物体大小的mAP值。

对于AR看一下maxDets=10/100的mAR值，反应检出率，如果两者接近，说明对于这个数据集来说，不用检测出100个框，可以提高性能。

[13]:

mindrecord_file = "./datasets/MindRecord_COCO/ssd_eval.mindrecord0"

def ssd_eval(dataset_path, ckpt_path, anno_json):
    """SSD evaluation."""
    batch_size = 1
    ds = create_ssd_dataset(dataset_path, batch_size=batch_size,
                     vpn free     vpn梯子    is_training=False, use_multiprocessing=False)

  vpn梯子 免费   network = SSD300Vgg16()
  vpn永久免费梯子   print("Load Checkpoint!")
    net = SsdInferWithDecoder(network, Tensor(default_boxes), ckpt_path)

    net.set_train(False)
    total = ds.get_dataset_size() * batch_size
    print("\n========================================\n")
    print("total images num: ", total)
    eval_param_dict = {"net": net, "dataset": ds, "anno_json": anno_json}
    mAP = apply_eval(eval_param_dict)
    print("\n========================================\n")
  免费的vpn梯子   print(f"mAP: {mAP}")

def eval_net():
  vpn梯子 免费   print("Start Eval!")
    ssd_eval(mindrecord_file, "./ssd-60_9.ckpt", anno_json)

eval_net()

Start Eval!
Load Checkpoint!

========================================

total images num:  9
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
.Loading and preparing results...
DONE (t=0.77s)
creating index...
index vpn梯子 免费 created!
Running vpn梯子 免费 per image evaluation...
Evaluate annotation type *bbox*
DONE (t=1.06s).
Accumulating evaluation results...
DONE (t=0.32s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.011
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.023
 Average Precision  (AP) @[ IoU=0.75      | area= vpn free   all vpn永久免费梯子 | vpn free maxDets=100 ] vpn梯子 = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.010
 Average Precision  vpn永久免费梯子 (AP) @[ IoU=0.50:0.95 | area= large | vpn梯子 免费 maxDets=100 ] = 0.039
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= vpn梯子 免费   vpn梯子 免费 all | maxDets=  1 ] = 0.023
 Average Recall  vpn梯子 免费    (AR) @[ IoU=0.50:0.95 | area=  免费的vpn梯子  all | maxDets= 10 ] = 0.042
 Average vpn梯子 Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 vpn free ] = 0.071
 Average Recall     (AR) 免费的vpn梯子 @[ IoU=0.50:0.95 | area= small | maxDets=100 ] vpn梯子 = 免费的vpn梯子 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.060
 Average Recall     (AR) @[ IoU=0.50:0.95 | vpn梯子 免费 area= large | maxDets=100 ] = 0.327

========================================

mAP: 0.011283055118663923

引用

[1] Liu W, Anguelov D, Erhan D, et al. Ssd: Single shot vpn永久免费梯子 multibox detector[C]//European conference on computer vision. Springer, Cham, 2016: 21-37.