CN109784349B - Image target detection model establishing method, device, storage medium and program product
- Publication number: CN109784349B
- Application number: CN201811592967.8A
- Authority: CN (China)
- Prior art keywords: network model, occlusion, image sample, image, training
- Prior art date: 2018-12-25
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Image Analysis (AREA)
Abstract
The invention provides a method and an apparatus for establishing an image target detection model. A feature occlusion countermeasure network model is trained with occlusion image samples, and the occlusion mask of an image sample is obtained through this model, so that when the detection network model is trained, an occlusion mask produced by the trained feature occlusion countermeasure network model is added to the feature map of each training image sample. Because the feature occlusion countermeasure network model is trained on occlusion image samples, it can learn to generate better masks; using these masks during detection network training means the detection network model is trained on image samples carrying well-placed occlusion masks, is thereby fully exposed to occlusion situations, and detects occluded objects more accurately.
Description
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a method and an apparatus for establishing an image target detection model, a storage medium, and a program product.
Background
At present, detection algorithms based on deep convolutional neural networks have become the mainstream approach to image target detection, for example the YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector) algorithms, which convert the target-box localization problem directly into a regression problem and therefore detect quickly. In image target detection applications, however, the object to be detected is often occluded in the image; current detection models rarely consider occlusion and cannot detect occluded objects accurately.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for establishing an image target detection model, a storage medium, and a program product, which improve the accuracy of detecting occluded objects.
To achieve this, the invention adopts the following technical solutions:
a model building method for image target detection comprises the following steps:
occluding a first image sample to obtain a first occlusion image sample;
training a feature occlusion countermeasure network model by using the first occlusion image sample, wherein the feature occlusion countermeasure network model is used for obtaining an occlusion mask of the image sample based on a countermeasure network;
and training a detection network model, wherein an occlusion mask obtained with the trained feature occlusion countermeasure network model is added to the feature map of a second image sample used for training, and the detection network model is used for image target detection based on deep learning.
Optionally, the method further comprises: using the feature map with the occlusion mask added as an image sample, and continuing to train the feature occlusion countermeasure network model.
Optionally, occluding the first image sample includes:
determining the candidate frame with the greatest positioning accuracy in the first image sample;
mapping a sliding window of preset size onto the image sample, and filling the image area covered by the sliding window with background pixels;
and detecting the filled image samples with the detection network model, and taking the sliding window at the position where the candidate frame suffers the largest detection network loss as the occlusion position of the first image sample, to obtain the first occlusion image sample.
Optionally, in the training of the detection network model, the prediction box is determined with a non-maximum suppression algorithm, wherein the candidate-box evaluation index in the non-maximum suppression algorithm is determined by the classification accuracy and the positioning accuracy of the different candidate boxes.
Optionally, the calculation formula of the candidate box evaluation index CL is:
CL = γ × score_class + (1 − γ) × score_location; where γ is a hyperparameter, score_class is the classification accuracy, and score_location is the positioning accuracy;
the non-maximum suppression algorithm adopts a formula with three threshold settings on the evaluation index CL of the different candidate frames.
Optionally, a prediction IoU network model is preset, the prediction IoU network model being used to obtain the positioning accuracy of different candidate boxes.
Optionally, the training method of the prediction IoU network model includes:
generating a set of candidate frames for a third image sample;
obtaining the positioning accuracy of each candidate frame in the candidate frame set;
removing candidate frames with the positioning accuracy smaller than a preset threshold value in the candidate frame set to determine a training set of a third occlusion image sample;
and utilizing the training set to train the prediction IoU network model.
Optionally, the detection network model includes a global pooling module and a pooling classification module, where the pooling classification module is configured to classify the globally pooled feature map according to the corresponding positions of a pooling mask, obtaining a classification vector.
A model building apparatus for image object detection, comprising:
the occlusion sample acquisition unit is used for occluding the first image sample to obtain a first occlusion image sample;
the countermeasure network training unit is used for training a feature occlusion countermeasure network model with the first occlusion image sample, the feature occlusion countermeasure network model being used for obtaining an occlusion mask of the image sample based on an adversarial network;
and the detection network training unit is used for training a detection network model, wherein an occlusion mask obtained with the trained feature occlusion countermeasure network model is added to the feature map of the second image sample used for training, and the detection network model is used for image target detection based on deep learning.
Optionally, the apparatus further comprises:
a countermeasure network retraining unit, used for taking the feature map with the occlusion mask added as an image sample and continuing to train the feature occlusion countermeasure network model.
Optionally, in the occlusion sample acquisition unit, occluding the first image sample includes:
determining the candidate frame with the greatest positioning accuracy in the first image sample;
mapping a sliding window of preset size onto the image sample, and filling the image area covered by the sliding window with background pixels;
and detecting the filled image samples with the detection network model, and taking the sliding window at the position where the candidate frame suffers the largest detection network loss as the occlusion position of the first image sample, to obtain the first occlusion image sample.
Optionally, in the training of the detection network model, the prediction box is determined with a non-maximum suppression algorithm, wherein the candidate-box evaluation index in the non-maximum suppression algorithm is determined by the classification accuracy and the positioning accuracy of the different candidate boxes.
Optionally, the calculation formula of the candidate box evaluation index CL is:
CL = γ × score_class + (1 − γ) × score_location; where γ is a hyperparameter, score_class is the classification accuracy, and score_location is the positioning accuracy;
the non-maximum suppression algorithm adopts a formula with three threshold settings on the evaluation index CL of the different candidate frames.
Optionally, the apparatus further comprises: a preset prediction IoU network model, used for obtaining the positioning accuracy of different candidate boxes.
Optionally, a prediction IoU network model training unit is further included for generating a set of candidate boxes for the third image sample; obtaining the positioning accuracy of each candidate frame in the candidate frame set; removing candidate frames with the positioning accuracy smaller than a preset threshold value in the candidate frame set to determine a training set of a third occlusion image sample; and utilizing the training set to train the prediction IoU network model.
Optionally, the detection network model includes a global pooling module and a pooling classification module, where the pooling classification module is configured to classify the globally pooled feature map according to the corresponding positions of a pooling mask, obtaining a classification vector.
A computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when executed on a terminal device, cause the terminal device to perform any one of the above-mentioned image object detection model building methods.
A computer program product, which when run on a terminal device, causes the terminal device to execute any of the above-described image object detection model building methods.
According to the method and the apparatus for establishing an image target detection model provided above, the feature occlusion countermeasure network model is trained with occlusion image samples, and the occlusion mask of an image sample is obtained through this model, so that when the detection network model is trained, an occlusion mask produced by the trained feature occlusion countermeasure network model is added to the feature map of each training image sample. Because the feature occlusion countermeasure network model is trained on occlusion image samples, it can learn to generate better masks; using these masks during detection network training means the detection network model is trained on image samples carrying well-placed occlusion masks, is thereby fully exposed to occlusion situations, and detects occluded objects more accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 shows a flow diagram of a method of building an image target detection model according to an embodiment of the invention;
FIG. 2 is a schematic diagram illustrating a process of obtaining an occlusion sample by the method for establishing an image target detection model according to the embodiment of the invention;
FIG. 3 shows a schematic diagram of a mask according to an embodiment of the invention;
FIG. 4 is a schematic structural diagram of an apparatus for building an image target detection model according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may, however, be practiced in ways other than those specifically described, as those of ordinary skill in the art will readily appreciate; the present invention is therefore not limited to the specific embodiments disclosed below.
As described in the background, detection algorithms based on deep convolutional neural networks have become the mainstream approach to image target detection. In image target detection applications, the object to be detected is often occluded, yet current detection models rarely consider occlusion and cannot detect occluded objects accurately. The present application therefore provides an image target detection model establishing method in which image samples carrying well-placed occlusion masks are used to train the detection network model, so that the detection network model is fully trained on occlusion situations and its accuracy on occluded objects is improved.
In order to better understand the technical solutions and technical effects of the present application, specific embodiments will be described in detail below with reference to flowcharts.
Referring to FIG. 1, in step S01, the first image sample is occluded to obtain a first occlusion image sample.
The first image sample is an original image sample; adding an occlusion to it yields the occluded first image sample, i.e., the first occlusion image sample.
In a specific application, any suitable occlusion method may be used; in this embodiment, occluding the first image sample specifically includes the following steps.
S011, determining a candidate frame with the maximum positioning accuracy in the first image sample.
S012, maps the sliding window with a preset size to the image sample, and fills the image area where the sliding window is located with the background pixels.
And S013, detecting the filled image samples by using the detection network model, and taking the sliding window at the position where the candidate frame has the largest detection network loss as the occlusion position of the first image sample to obtain the first occlusion image sample.
For the first image sample, the candidate frame with the maximum localization accuracy is determined on it, namely the candidate frame with the largest Intersection over Union (IoU) against the true mark on the image sample. A feature map of the image sample may be obtained, different candidate boxes may be placed on the feature map, and the IoU of each candidate box with the true mark may be computed; the candidate box with the largest IoU value is the candidate box of greatest positioning accuracy.
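As an illustration (not part of the original disclosure), a minimal Python sketch of this selection step, assuming candidate boxes and the true mark are given as (x1, y1, x2, y2) pixel coordinates:

```python
import numpy as np

def iou(box_a, box_b):
    # Intersection over Union of two boxes given as (x1, y1, x2, y2)
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def best_candidate(candidates, true_mark):
    # Candidate frame with the greatest positioning accuracy,
    # i.e. the largest IoU against the true mark
    ious = [iou(c, true_mark) for c in candidates]
    k = int(np.argmax(ious))
    return candidates[k], ious[k]
```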
Then a sliding window of preset size is set; the size can be chosen as needed, for example a rectangular frame of size (w/3, d/3), where w and d are the length and width of the image sample. The window size remains fixed while its position on the image sample changes. For each position, the sliding window is mapped onto the image sample and the image area covered by the window is filled with background pixels, for example by random filling, so that an occlusion is formed with the sliding window at different positions of the image sample.
Then the filled image sample is detected with a detection network model, i.e., a model for image target detection based on deep learning (a detection network based on a deep convolutional neural network), for example a model based on algorithms such as YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector); the detection network model may already have undergone some training. By selecting the sliding window that maximizes the loss of the detection network as the occlusion position of the image sample, an occlusion image sample with the best occlusion position is obtained.
When the occlusion image sample is obtained by the method of this embodiment, the candidate frame with the maximum positioning accuracy in the image sample is determined first; its position is closest to the true mark and best represents the detection target. Taking the sliding window at the position of maximum detection network loss as the occlusion position then yields, at the best detection-target position, the occlusion image sample with the best occlusion position, improving the quality of the occlusion samples.
To further facilitate understanding of how an occlusion image sample is obtained, the process is illustrated on an example image sample. Referring to FIG. 2: first, an image sample with several candidate frames is selected, as shown in (A); a feature map of the image sample is obtained, as shown in (B); the candidate frame with the maximum IoU against the true mark is determined, as shown in (C); and with that candidate frame as the detection target, occlusion filling is performed with the sliding window, and the sliding window at the position of maximum detection network loss for the candidate frame is taken as the occlusion position of the image sample, as shown in (D), yielding an occlusion image sample.
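A minimal sketch of this occlusion search, assuming the image is a NumPy array and `detection_loss` is a caller-supplied function returning the detection network's loss on an occluded image (both names are illustrative, as is the stride):

```python
import numpy as np

def best_occlusion(image, true_mark, detection_loss, stride=8):
    # Slide a fixed (h/3, w/3) window over the image, fill the covered
    # area with random background pixels, and keep the occluded image
    # whose window position maximizes the detection network loss.
    h, w = image.shape[:2]
    win_h, win_w = h // 3, w // 3
    best_img, best_loss = None, -np.inf
    for y in range(0, h - win_h + 1, stride):
        for x in range(0, w - win_w + 1, stride):
            occluded = image.copy()
            fill = np.random.randint(0, 256,
                                     size=(win_h, win_w) + image.shape[2:],
                                     dtype=image.dtype)
            occluded[y:y + win_h, x:x + win_w] = fill
            loss = detection_loss(occluded, true_mark)
            if loss > best_loss:
                best_img, best_loss = occluded, loss
    return best_img
```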
In step S02, the first occlusion image sample is used to train a feature occlusion countermeasure network model, which obtains an occlusion mask of the image sample based on an adversarial network.
The feature occlusion countermeasure network model is a model for obtaining the occlusion mask of an image sample; it is based on an adversarial network, i.e., a deep learning model built on adversarial training. In this embodiment, after the feature map of the image sample is obtained, the feature occlusion countermeasure network is implemented by adding a fully connected layer that learns an occlusion mask, the occlusion mask being the mask corresponding to the feature map and representing the occlusion applied to the sample.
For ease of understanding, refer to FIG. 3, which shows a mask M corresponding to an image sample: an array of mask values, one for each position of the image sample. For convenience of description, the mask corresponding to an occlusion image sample is referred to in this application as an occlusion mask.
The feature occlusion countermeasure network model is trained with occlusion image samples, in particular high-quality occlusion image samples, to obtain a model that generates better masks. In training the feature occlusion countermeasure network, a cross-entropy loss function is preferably adopted, with the specific expression:

L = −(1/(n·d²)) Σ_p Σ_(i,j) [ M^p_(i,j) log M̂^p_(i,j) + (1 − M^p_(i,j)) log(1 − M̂^p_(i,j)) ]

where M^p_(i,j) is the value at position (i, j) in the mask M of the p-th training image sample, M̂^p_(i,j) is the corresponding predicted value, n is the number of training image samples, and d is the size of the training image samples.
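As a sketch, this per-position cross-entropy can be computed with PyTorch's built-in binary cross-entropy (the tensor shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def mask_loss(pred_masks, true_masks):
    # pred_masks, true_masks: (n, d, d) tensors with values in [0, 1];
    # averages the per-position cross-entropy over all n samples
    # and all d x d mask positions, matching the expression above.
    return F.binary_cross_entropy(pred_masks, true_masks)
```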
In step S03, the detection network model, used for image target detection based on deep learning, is trained; an occlusion mask obtained with the trained feature occlusion countermeasure network model is added to the feature map of the second image sample used for training.
As mentioned above, the detection network model is a model for image target detection based on deep learning. In this step its training is continued with a second image sample, which may differ from the first image sample. For the second image sample, after the corresponding feature map is generated, an occlusion mask obtained with the trained feature occlusion countermeasure network model is added to the feature map; adding the occlusion mask occludes the corresponding positions of the feature map, yielding an occluded feature map of the second image sample. Training the detection network model with this occluded feature map exposes the model fully to occlusion situations and improves its accuracy on occluded objects.
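A minimal sketch of adding the occlusion mask to a feature map (the convention that a mask value of 1 marks a position to occlude is an assumption):

```python
import torch

def apply_occlusion_mask(feature_map, occlusion_mask):
    # feature_map: (C, H, W); occlusion_mask: (H, W) produced by the
    # trained feature occlusion countermeasure network model. Positions
    # selected by the mask are zeroed out in every channel.
    return feature_map * (1.0 - occlusion_mask).unsqueeze(0)
```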
In addition, while the detection network model is trained, the feature occlusion countermeasure network model can be cross-trained. On the one hand, the feature occlusion countermeasure network model supplies the occlusion masks added to image samples during detection network training, so that occlusion samples are available for training the detection network. On the other hand, the generated feature maps of the second image samples with occlusion masks added are used as image samples to continue training the feature occlusion countermeasure network model; that is, the occlusion samples generated during detection network training are fed back, and this retraining helps obtain a feature occlusion countermeasure network model that generates even better masks.
In the training of the detection network model, a prediction frame must be determined from the candidate frames, and the evaluation criterion of the prediction frame is decisive for the accuracy of the prediction frame's position. In a more preferred embodiment of the present application, the prediction frame is determined with a Non-Maximum Suppression (NMS) algorithm in which the evaluation index of the prediction frame is determined by the classification accuracy and the positioning accuracy of the different candidate frames.
Specifically, the non-maximum suppression algorithm removes redundant candidate frames according to the NMS evaluation index and determines the prediction frame among the remaining candidates through the associated calculations. In this embodiment, the evaluation index is determined by the classification accuracy and the positioning accuracy of the different candidate frames, so classification confidence and localization quality are considered together, which is more reasonable and improves prediction accuracy. In a specific application, the calculation formula of the candidate-box evaluation index CL may be:
CL = γ × score_class + (1 − γ) × score_location; (2)

where γ is a hyperparameter, score_class is the classification accuracy, and score_location is the positioning accuracy; score_location may be an IoU value, and score_class can be obtained from the detection network model.
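Formula (2) in code form (the default γ = 0.5 is illustrative, not a value given in the disclosure):

```python
def cl_score(score_class, score_location, gamma=0.5):
    # Candidate-box evaluation index of formula (2): a weighted blend of
    # classification accuracy and positioning accuracy (e.g. an IoU value).
    return gamma * score_class + (1 - gamma) * score_location
```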
In addition, when the candidate-frame evaluation index CL is used to determine the prediction frame through non-maximum suppression, different thresholds produce different detection results; setting the thresholds reasonably avoids false detections and missed detections. Preferably, the non-maximum suppression algorithm adopts a formula with three threshold settings, in which CL is the evaluation index of the different candidate frames, obtained by formula (2) above.
With three threshold settings, a candidate frame differing greatly from the retained frame is essentially a frame of low target accuracy, and lowering its threshold makes it easier to suppress; a candidate frame with little divergence is a frame of high target accuracy, and raising its threshold reduces suppression. False detections and missed detections are thus avoided, and detection accuracy improves.
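Since the three-threshold expression itself appears only as an image in the original, the sketch below shows a standard single-threshold non-maximum suppression ranked by CL (reusing the iou() helper from the earlier sketch); the three-threshold refinement would adapt iou_threshold per candidate:

```python
import numpy as np

def nms_by_cl(boxes, cl_scores, iou_threshold=0.5):
    # Standard NMS, except candidates are ranked by the CL index of
    # formula (2) rather than by classification score alone.
    order = list(np.argsort(cl_scores)[::-1])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_threshold]
    return keep
```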
Further, the positioning accuracy of candidate frames used above, for example when occluding the first image sample and determining the candidate frame with the maximum positioning accuracy, and in the prediction-frame evaluation index of the non-maximum suppression algorithm during detection network training, can be obtained with a prediction IoU network model. The prediction IoU network model obtains the positioning accuracy of different candidate boxes; it is a deep-learning-based model that may include a pooling layer and a fully connected layer.
In a specific application, the training method of the predictive IoU network model may include:
generating a set of candidate frames for a third image sample;
obtaining the positioning accuracy of each candidate frame in the candidate frame set;
removing candidate frames with the positioning accuracy smaller than a preset threshold value in the candidate frame set to determine a training set of a third occlusion image sample;
and utilizing the training set to train the prediction IoU network model.
The third image sample may be the same as or different from the first or second image sample. A candidate frame set is generated for the (possibly randomly selected) third image sample, and the positioning accuracy of each candidate frame in the set is obtained, the positioning accuracy being the IoU value between each candidate frame and the true mark. After deleting the candidate frames with low positioning accuracy, usually those with positioning accuracy less than 0.5, a training set C of the third occlusion image sample is obtained: C = {(C_i, iou*_i)}, where C_i denotes the i-th candidate box and iou*_i is the IoU value between the i-th candidate box and the true mark. The prediction IoU network model is then trained with the training set C; during training, a smooth L1 loss function can be adopted, with the specific expression:

L = (1/n) Σ_i smooth_L1(iou_i − iou*_i), where smooth_L1(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise;

where n is the number of candidate frames, iou_i is the IoU value predicted for the i-th candidate box, and iou*_i is the IoU value between the i-th candidate box and the true mark.
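A sketch of building the training set and of the smooth L1 objective, reusing the iou() helper from the earlier sketch (PyTorch tensors assumed for the loss):

```python
import torch
import torch.nn.functional as F

def build_iou_training_set(candidates, true_mark, threshold=0.5):
    # Pair each candidate box with its IoU against the true mark and
    # drop candidates whose positioning accuracy is below the threshold.
    pairs = [(c, iou(c, true_mark)) for c in candidates]
    return [(c, v) for c, v in pairs if v >= threshold]

def iou_regression_loss(predicted_iou, target_iou):
    # Smooth L1 loss between the IoU values predicted for the candidates
    # and their ground-truth IoU values, averaged over the n candidates.
    return F.smooth_l1_loss(predicted_iou, target_iou)
```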
In addition, in practical applications the candidate frame usually contains background image content, which also interferes with object detection and causes inaccuracy. In a more preferred embodiment of the present application, the detection network model includes a global pooling module and a pooling classification module, where the pooling classification module classifies the globally pooled feature map according to the corresponding positions of a pooling mask to obtain a classification vector.
The global pooling module performs global average pooling on the input samples, which in this embodiment are the feature maps with the occlusion mask added. The pooling classification module may include a fully connected layer, a pooling mask layer, and a vector classification module. The fully connected layer learns global average pooling weights, distinguishing the information at different positions; the pooling mask layer produces the pooling mask of each feature map; the feature map is multiplied by the corresponding positions in the pooling mask to obtain a classification vector; and the classification vector is classified by the vector classification module, which may be, for example, a softmax classifier. Because the classification vector takes the background information around the object into account, it influences object classification appropriately and further improves detection accuracy.
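A sketch of such a global pooling and pooling classification head (layer sizes and the sigmoid/softmax choices are assumptions, not specified in the disclosure):

```python
import torch
import torch.nn as nn

class PoolingClassificationHead(nn.Module):
    # Learns per-position pooling weights with a fully connected layer,
    # applies them as a pooling mask to the feature map, then performs
    # weighted global average pooling and softmax classification.
    def __init__(self, channels, height, width, num_classes):
        super().__init__()
        self.mask_fc = nn.Linear(height * width, height * width)
        self.classify = nn.Linear(channels, num_classes)

    def forward(self, feat):                      # feat: (N, C, H, W)
        n, c, h, w = feat.shape
        mask = torch.sigmoid(self.mask_fc(feat.mean(dim=1).flatten(1)))
        mask = mask.view(n, 1, h, w)              # pooling mask per position
        pooled = (feat * mask).mean(dim=(2, 3))   # weighted global pooling
        return torch.softmax(self.classify(pooled), dim=1)
```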
The model building method for image target detection according to the embodiments of the present application has been described in detail above. The present application further provides a model building apparatus for image target detection that implements the method described above. Referring to FIG. 4, the apparatus includes:
an occlusion sample obtaining unit 400, configured to occlude the first image sample to obtain a first occlusion image sample;
a countermeasure network training unit 410, configured to train a feature occlusion countermeasure network model using the first occlusion image sample, where the feature occlusion countermeasure network model is configured to obtain an occlusion mask of the image sample based on an adversarial network;
and a detection network training unit 420, configured to train a detection network model, where an occlusion mask obtained with the trained feature occlusion countermeasure network model is added to the feature map of the second image sample used for training, and the detection network model is used for image target detection based on deep learning.
Further, the apparatus includes:
a countermeasure network retraining unit, configured to take the feature map with the occlusion mask added as an image sample and continue training the feature occlusion countermeasure network model.
Further, in the occlusion sample acquisition unit 400, occluding the first image sample includes:
determining the candidate frame with the greatest positioning accuracy in the first image sample;
mapping a sliding window of preset size onto the image sample, and filling the image area covered by the sliding window with background pixels;
and detecting the filled image samples with the detection network model, and taking the sliding window at the position where the candidate frame suffers the largest detection network loss as the occlusion position of the first image sample, to obtain the first occlusion image sample.
Further, in the training of the detection network model, the prediction frame is determined with a non-maximum suppression algorithm, wherein the candidate-frame evaluation index in the non-maximum suppression algorithm is determined by the classification accuracy and the positioning accuracy of the different candidate frames.
Further, the calculation formula of the candidate box evaluation index CL is:
CL = γ × score_class + (1 − γ) × score_location; where γ is a hyperparameter, score_class is the classification accuracy, and score_location is the positioning accuracy;
the non-maximum suppression algorithm adopts a formula with three threshold settings on the evaluation index CL of the different candidate frames.
Further, the apparatus includes: a preset prediction IoU network model, used for obtaining the positioning accuracy of different candidate boxes.
Further, a prediction IoU network model training unit is included, configured to generate a candidate frame set for a third image sample; obtain the positioning accuracy of each candidate frame in the set; remove the candidate frames whose positioning accuracy is below a preset threshold to determine the training set of the third occlusion image sample; and train the prediction IoU network model with the training set.
Further, the detection network model includes a global pooling module and a pooling classification module, where the pooling classification module is configured to classify the globally pooled feature map according to the corresponding positions of a pooling mask, obtaining a classification vector.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to execute the above-mentioned model building method for detecting an image object.
The embodiment of the present application further provides a computer program product, and when the computer program product runs on a terminal device, the terminal device is enabled to execute the above model building method for detecting an image target.
It should be noted that, in the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the system or the device disclosed by the embodiment, the description is simple because the system or the device corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing is only a preferred embodiment of the present invention, and although the present invention has been disclosed in the preferred embodiments, it is not intended to limit the present invention. Those skilled in the art can make numerous possible variations and modifications to the present teachings, or modify equivalent embodiments to equivalent variations, without departing from the scope of the present teachings, using the methods and techniques disclosed above. Therefore, any simple modification, equivalent change and modification made to the above embodiments according to the technical essence of the present invention are still within the scope of the protection of the technical solution of the present invention, unless the contents of the technical solution of the present invention are departed.
Claims (8)
1. A model building method for image target detection is characterized by comprising the following steps:
determining the candidate frame with the greatest positioning accuracy in the first image sample; mapping a sliding window of preset size onto the image sample, and filling the image area covered by the sliding window with background pixels; detecting the filled image samples with a detection network model, and taking the sliding window at the position where the candidate frame suffers the largest detection network loss as the occlusion position of the first image sample, to obtain a first occlusion image sample;
training a feature occlusion countermeasure network model by using the first occlusion image sample, wherein the feature occlusion countermeasure network model is used for obtaining an occlusion mask of the image sample based on a countermeasure network;
and training a detection network model, wherein an occlusion mask obtained with the trained feature occlusion countermeasure network model is added to the feature map of a second image sample used for training, and the detection network model is used for image target detection based on deep learning.
2. The method of claim 1, further comprising: and utilizing the feature map added with the occlusion mask as an image sample, and continuing training the feature occlusion countermeasure network model.
3. The method of claim 1, wherein in the training of the detection network model, the prediction box is determined with a non-maximum suppression algorithm, wherein the candidate-box evaluation index in the non-maximum suppression algorithm is determined by the classification accuracy and the positioning accuracy of the different candidate boxes.
4. The method according to claim 3, wherein the candidate box evaluation index CL is calculated by the formula:
CL = γ × score_class + (1 − γ) × score_location; where γ is a hyperparameter, score_class is the classification accuracy, and score_location is the positioning accuracy;
the non-maximum suppression algorithm adopts a formula with three threshold settings on the candidate-box evaluation index CL.
5. The method of any one of claims 1 and 3-4, wherein a prediction IoU network model is preset, and the prediction IoU network model is used to obtain the positioning accuracy of different candidate boxes.
6. The method of claim 5, wherein the training method of the predictive IoU network model comprises:
generating a set of candidate frames for a third image sample;
obtaining the positioning accuracy of each candidate frame in the candidate frame set;
removing candidate frames with the positioning accuracy smaller than a preset threshold value in the candidate frame set to determine a training set of a third occlusion image sample;
and utilizing the training set to train the prediction IoU network model.
7. A model building apparatus for image object detection, comprising:
an occlusion sample acquisition unit, configured to determine the candidate frame with the greatest positioning accuracy in the first image sample; map a sliding window of preset size onto the image sample and fill the image area covered by the sliding window with background pixels; and detect the filled image samples with a detection network model, taking the sliding window at the position where the candidate frame suffers the largest detection network loss as the occlusion position of the first image sample, to obtain a first occlusion image sample;
a countermeasure network training unit, configured to train a feature occlusion countermeasure network model with the first occlusion image sample, the feature occlusion countermeasure network model being used for obtaining an occlusion mask of the image sample based on an adversarial network;
and a detection network training unit, configured to train a detection network model, wherein an occlusion mask obtained with the trained feature occlusion countermeasure network model is added to the feature map of the second image sample used for training, and the detection network model is used for image target detection based on deep learning.
8. A computer-readable storage medium having stored therein instructions that, when run on a terminal device, cause the terminal device to perform the method of modeling image object detection of any of claims 1-6.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201811592967.8A | 2018-12-25 | 2018-12-25 | Image target detection model establishing method, device, storage medium and program product
Publications (2)

Publication Number | Publication Date
---|---
CN109784349A (en) | 2019-05-21
CN109784349B (en) | 2021-02-19
Family
ID=66498222
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201811592967.8A | CN109784349B (en) | 2018-12-25 | 2018-12-25
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109784349B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175168B (en) * | 2019-05-28 | 2021-06-01 | 山东大学 | Time sequence data filling method and system based on generation of countermeasure network |
CN110163183B (en) * | 2019-05-30 | 2021-07-09 | 北京旷视科技有限公司 | Target detection algorithm evaluation method and device, computer equipment and storage medium |
CN110210482B (en) * | 2019-06-05 | 2022-09-06 | 中国科学技术大学 | Target detection method for improving class imbalance |
CN110728628B (en) * | 2019-08-30 | 2022-06-17 | 南京航空航天大学 | Face de-occlusion method for generating confrontation network based on condition |
CN110728330A (en) | 2019-10-23 | 2020-01-24 | 腾讯科技(深圳)有限公司 | Object identification method, device, equipment and storage medium based on artificial intelligence |
CN111126402B (en) * | 2019-11-04 | 2023-11-03 | 京东科技信息技术有限公司 | Image processing method and device, electronic equipment and storage medium |
CN111046956A (en) * | 2019-12-13 | 2020-04-21 | 苏州科达科技股份有限公司 | Occlusion image detection method and device, electronic equipment and storage medium |
CN113808003B (en) * | 2020-06-17 | 2024-02-09 | 北京达佳互联信息技术有限公司 | Training method of image processing model, image processing method and device |
CN111753783B (en) * | 2020-06-30 | 2024-05-28 | 北京小米松果电子有限公司 | Finger shielding image detection method, device and medium |
CN111709951B (en) * | 2020-08-20 | 2020-11-13 | 成都数之联科技有限公司 | Target detection network training method and system, network, device and medium |
CN112102340B (en) * | 2020-09-25 | 2024-06-11 | Oppo广东移动通信有限公司 | Image processing method, apparatus, electronic device, and computer-readable storage medium |
CN114596263B (en) * | 2022-01-27 | 2024-08-02 | 阿丘机器人科技(苏州)有限公司 | Deep learning mainboard appearance detection method, device, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106886795A (en) * | 2017-02-17 | 2017-06-23 | 北京维弦科技有限责任公司 | Object identification method based on the obvious object in image |
CN108182657A (en) * | 2018-01-26 | 2018-06-19 | 深圳市唯特视科技有限公司 | A kind of face-image conversion method that confrontation network is generated based on cycle |
CN108830827A (en) * | 2017-05-02 | 2018-11-16 | 通用电气公司 | Neural metwork training image generation system |
CN108960011A (en) * | 2017-05-23 | 2018-12-07 | 湖南生物机电职业技术学院 | The citrusfruit image-recognizing method of partial occlusion |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107145867A (en) * | 2017-05-09 | 2017-09-08 | 电子科技大学 | Face and face occluder detection method based on multitask deep learning |
CN107316058A (en) * | 2017-06-15 | 2017-11-03 | 国家新闻出版广电总局广播科学研究院 | Improve the method for target detection performance by improving target classification and positional accuracy |
CN107403160A (en) * | 2017-07-28 | 2017-11-28 | 中国地质大学(武汉) | Image detecting method, equipment and its storage device in a kind of intelligent driving scene |
CN108334847B (en) * | 2018-02-06 | 2019-10-22 | 哈尔滨工业大学 | A kind of face identification method based on deep learning under real scene |
CN108573222B (en) * | 2018-03-28 | 2020-07-14 | 中山大学 | Pedestrian image occlusion detection method based on cyclic confrontation generation network |
CN108765452A (en) * | 2018-05-11 | 2018-11-06 | 西安天和防务技术股份有限公司 | A kind of detection of mobile target in complex background and tracking |
CN108961174A (en) * | 2018-05-24 | 2018-12-07 | 北京飞搜科技有限公司 | A kind of image repair method, device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN109784349A (en) | 2019-05-21 |
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant