CN113743439A - Target detection method, device and storage medium - Google Patents


Info

Publication number
CN113743439A
Authority
CN
China
Prior art keywords
loss
sample
detection model
negative
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011271127.9A
Other languages
Chinese (zh)
Inventor
齐鹏飞
赖荣凤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202011271127.9A priority Critical patent/CN113743439A/en
Publication of CN113743439A publication Critical patent/CN113743439A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target detection method, a target detection device and a storage medium. A current image to be detected is acquired; target detection is performed on the current image to be detected by using a target detection model, and a current detection result is predicted; the target detection model is obtained by performing background class prediction training on data groups consisting of negative samples. By performing background class prediction training on all-negative data groups, the amount of training the target detection model receives is increased, and the probability of false detection when the target detection model is applied is reduced.

Description

Target detection method, device and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a target detection method, a target detection device and a storage medium.
Background
Currently, in order to ensure the accuracy of target detection, a detection result may be obtained through a target detection model. Image target detection realized with convolutional neural networks is the mainstream approach; because it relies on the data-driven character of such networks, detection metrics can be greatly improved when a large amount of target data is available. Realizing target detection with a convolutional neural network requires sample data, mainly pictures, to train the target detection model.
In today's open-set big-data scenarios, the target detection model must adapt to data from many different scenes and domains. Such data mostly follows different distributions and has complicated backgrounds, while the object to be detected often occupies only a very small portion of the picture. The schemes conventionally used during training cause the target detection model to largely neglect pictures that contain only background, so the model is insufficiently trained on them and produces a large number of false detections in application.
Disclosure of Invention
In view of this, embodiments of the present invention provide a target detection method, an apparatus, and a storage medium, and the target detection method provided in the embodiments of the present invention has a low false detection rate when detecting a current image to be detected.
The embodiment of the invention provides a target detection method, which comprises the following steps:
acquiring a current image to be detected;
performing target detection on the current image to be detected by using a target detection model, and predicting a current detection result;
wherein the target detection model is obtained by performing background class prediction training on data groups consisting of negative samples.
In the above scheme, the method further includes, before performing target detection on the current image to be detected by using a preset target detection model and predicting a current detection result:
acquiring a training image set; the training image set comprises a plurality of data sets, one data set comprising a plurality of samples for training;
if the one data set does not include positive samples, extracting a first predetermined number of first negative samples from the trained sample set;
inputting the first negative sample into an initial detection model to obtain a first prediction background class result;
in a back propagation stage, training the initial detection model based on the first prediction background class result and a real result corresponding to the first negative sample until the target detection model is obtained.
In the above scheme, after the acquiring the training image set, the method further includes:
inputting the training image set into an image recognition model to obtain a first image recognition result;
when the first image identification result corresponding to the data group has an image with a fitting degree larger than a fitting degree threshold value with a preset standard result, determining that the data group contains a positive sample;
and when the first image identification result corresponding to one data set does not have an image with the fitting degree with a preset standard result larger than a fitting degree threshold value, determining that the one data set does not contain a positive sample.
In the foregoing solution, if the one data set does not include the positive samples, extracting a first predetermined number of first negative samples from the trained sample set includes:
if the one data group does not comprise positive samples, extracting, from the second negative samples of the trained sample set, a first predetermined number of first negative samples in descending order of the negative sample loss corresponding to each second negative sample;
the first predetermined number is an integer multiple of the number of samples in the one data set.
In the above scheme, after the acquiring the training image set, the method further includes:
if the positive samples are included in the one data group, extracting second positive samples and second negative samples in the one data group according to a first preset proportion;
and inputting the second positive sample and the second negative sample into an initial detection model to obtain a target result corresponding to the second positive sample and a second prediction background result corresponding to the second negative sample.
In the foregoing solution, when the training image set contains both data groups that include positive samples and data groups that do not, training the initial detection model in the back propagation stage based on the first prediction background class result and the real result corresponding to the first negative sample until a target detection model is obtained includes:
in a back propagation stage, determining negative sample loss according to a first preset loss function, the first prediction background class result and a first real result corresponding to the first negative sample;
determining the loss of the positive sample according to a second preset loss function, the target result, the second prediction background class result, a target real result corresponding to the target result and a second real result corresponding to the second negative sample;
and training the initial detection model based on the positive sample loss and the negative sample loss until a target detection model is obtained.
In the foregoing solution, the training the initial detection model based on the negative sample loss and the positive sample loss until a target detection model is obtained includes:
determining the current loss according to the negative sample loss and the positive sample loss;
and if the current loss is greater than a preset loss threshold, obtaining a next training image set, and continuing training the initial detection model until the next loss is not greater than the preset loss threshold, so as to obtain the target detection model.
In the foregoing solution, the determining a current loss according to the negative sample loss and the positive sample loss includes:
weighting the positive sample loss and the negative sample loss to determine the current loss; wherein a first weight corresponding to the positive sample penalty is greater than a second weight corresponding to the negative sample penalty.
In the foregoing solution, the weighting the positive sample loss and the negative sample loss to determine the current loss includes:
multiplying the positive sample loss by the first weight, multiplying the negative sample loss by the second weight, and adding the two products to obtain the current loss.
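The weighted combination described in this step can be sketched as follows. The concrete weight values and the function name are illustrative assumptions; the description only requires that the first weight (for the positive sample loss) exceed the second weight (for the negative sample loss):

```python
def current_loss(pos_loss: float, neg_loss: float,
                 pos_weight: float = 0.7, neg_weight: float = 0.3) -> float:
    """Weighted sum of positive- and negative-sample losses.

    The weight values 0.7/0.3 are illustrative; the claim only requires
    that the positive-sample weight be greater than the negative-sample
    weight.
    """
    assert pos_weight > neg_weight  # condition stated in the claim
    return pos_loss * pos_weight + neg_loss * neg_weight
```

Weighting the positive-sample loss more heavily keeps the model from being dominated by the (typically far more numerous) background-only negatives.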
There is also provided an object detection apparatus comprising:
the image acquisition unit is used for acquiring a current image to be detected;
the detection unit is used for carrying out target detection on the current image to be detected by using a target detection model and predicting a current detection result;
wherein the target detection model is obtained by performing background class prediction training on data groups consisting of negative samples.
There is also provided an object detection apparatus comprising a memory and a processor, the memory storing a computer program operable on the processor, the processor implementing the steps of the method when executing the program.
A computer-readable storage medium is also provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
In the embodiment of the invention, a current image to be detected is acquired; target detection is performed on the current image to be detected by using a target detection model, and a current detection result is predicted; the target detection model is obtained by performing background class prediction training on data groups consisting of negative samples. By performing background class prediction training on all-negative data groups, the amount of training the target detection model receives is increased, and the probability of false detection when the target detection model is applied is reduced.
Drawings
FIG. 1 is a diagram illustrating an alternative effect of data sets provided by an embodiment of the present invention;
FIG. 2 is a diagram illustrating an alternative effect of data sets provided by an embodiment of the present invention;
FIG. 3 is a diagram illustrating an alternative effect of an image to be detected according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an alternative effect of an image to be detected according to an embodiment of the present invention;
fig. 5 is a first schematic flowchart of a target detection method according to an embodiment of the present invention;
fig. 6 is a schematic flowchart of a second target detection method according to an embodiment of the present invention;
fig. 7 is a third schematic flowchart of a target detection method according to an embodiment of the present invention;
fig. 8 is a fourth schematic flowchart of a target detection method according to an embodiment of the present invention;
fig. 9 is a fifth flowchart of a target detection method according to an embodiment of the present invention;
fig. 10 is a sixth schematic flowchart of a target detection method according to an embodiment of the present invention;
fig. 11 is a seventh flowchart of a target detection method according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present invention;
fig. 13 is a hardware entity diagram of a target detection apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described in further detail below with reference to the drawings and the embodiments. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Where descriptions such as "first/second" appear in this document, the terms "first", "second" and "third" are used merely to distinguish similar objects and do not imply a particular ordering of those objects. It should be understood that, where permitted, the specific order or sequence may be interchanged, so that the embodiments of the invention described herein can be practiced in orders other than those illustrated or described.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
An embodiment of the invention provides a target detection method and the training of the target detection model it uses. Fig. 5 is a schematic flowchart of a target detection method according to an embodiment of the present invention.
And S01, acquiring the current image to be detected.
In the embodiment of the present invention, the image to be detected may be a frame of a video or a photograph, and may or may not include a target. The target in the image to be detected can be a human face, a two-dimensional code, an automobile, a pedestrian, a mobile phone or the like; that is, any entity with a physical image that the target detection model is able to detect.
In the embodiment of the present invention, the image to be detected acquired by the target detection device may be acquired by an image acquisition unit of the target detection device. The image to be detected is stored in a database in advance according to a preset rule, wherein the image to be detected can be classified and stored in the database according to the size of the image and can also be classified and stored in the database according to the source of the image to be detected. The image acquisition unit and a database for storing the image to be detected establish communication connection, and when the target detection model receives a detection instruction, the database transmits the appointed image to be detected to the image acquisition unit for the target detection model to detect.
And S02, performing target detection on the current image to be detected by using the target detection model, and predicting the current detection result.
In the embodiment of the invention, the target detection model is a detection model based on a convolutional neural network, obtained by performing background class prediction training on data groups consisting of negative samples. After the background class prediction training, the model can effectively process large numbers of images to be detected from different domains, and its false detection rate when processing such images is significantly reduced. The network structure of the target detection model may be, for example, a You Only Look Once (YOLO) real-time detection model or a Single Shot MultiBox Detector (SSD), and may include an input layer, a number of intermediate layers and an output layer. The target detection model acquires the image to be detected through the image acquisition unit; the image is fed into the input layer, processed by the intermediate layers, and the output layer outputs the coordinate information and confidence of the target in the image. With reference to fig. 4, in the embodiment of the present invention, the image 3 to be detected is input into the trained target detection model, which predicts the target result 4 (a bounding box) for the image 3. The target result 4 is then compared against the target real result, and the coordinate information and confidence of the target result 4 of the image 3 are output.
In the embodiment of the invention, the current image to be detected is obtained; carrying out target detection on the current image to be detected by using a target detection model, and predicting a current detection result; the target detection model is obtained by carrying out background prediction training on a data set which is a negative sample. The target detection model can improve the training amount of the target detection model by carrying out background prediction training on the data group which is negative samples, and reduces the probability of false detection of the target detection model during application.
In some embodiments, referring to fig. 6, fig. 6 is an optional schematic flow chart of the method provided in the embodiments of the present invention, and S01 or S02 shown in fig. 5 may further include S03 to S06, which will be described with reference to the steps.
And S03, acquiring a training image set.
In the embodiment of the invention, before the target detection model detects the image to be detected or before the target detection model obtains the current image to be detected, a large amount of background prediction training needs to be carried out on the target detection model. And before the target detection model finishes the background class prediction training, the initial detection model is adopted. The initial detection model acquires a training image set, wherein the training image set comprises a plurality of data groups, and each data group comprises a plurality of samples used for training the initial detection model.
In the embodiment of the present invention, the samples are pre-stored in a sample database, wherein the sample database may include positive samples and negative samples. In the embodiment of the invention, the positive sample can be a picture including a human face, a picture including a two-dimensional code or a picture including a mobile phone. The negative examples may be pictures without faces or two-dimensional codes or mobile phones. The samples in the sample database are divided into a plurality of data groups before the initial detection model is trained. The sample database is communicatively coupled to the initial detection model, and the sample database transmits a training image set comprising a plurality of data sets to the initial detection model.
Referring to fig. 1, in an embodiment of the present invention, the data group in the training image set acquired by the initial detection model may include a data group 1. Wherein the data set 1 consists of positive and negative samples drawn in a ratio of 1:3 in the sample database. The data set 1 includes 8 samples, positive samples being pictures including faces, and negative samples being pictures not including faces. Where samples 1 and 2 are positive samples and samples 3, 4, 5, 6, 7 and 8 are negative samples. In the embodiment of the invention, the initial detection model can establish communication connection with the sample database, and when the initial detection model needs to be trained, the sample database transmits the data group 1 to the initial detection model.
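As a minimal illustration of how such a data group might be assembled, the following sketch draws samples at the 1:3 positive-to-negative ratio described above. The function name and the use of Python's random module are assumptions for illustration, not part of the patent:

```python
import random

def draw_data_group(positives: list, negatives: list,
                    group_size: int = 8, ratio: int = 3) -> list:
    """Draw one training data group with a 1:ratio positive-to-negative mix.

    With group_size=8 and ratio=3 this yields 2 positives and 6 negatives,
    matching data group 1 in the description above.
    """
    n_pos = group_size // (1 + ratio)   # 8 // 4 = 2 positive samples
    n_neg = group_size - n_pos          # 6 negative samples
    group = random.sample(positives, n_pos) + random.sample(negatives, n_neg)
    random.shuffle(group)               # mix positives and negatives
    return group
```

A group like data group 2, drawn from a region of the database with no targets, would simply contain no positives at all; the determination of which case applies is described in S101 to S103 below.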
With reference to fig. 2, in the embodiment of the present invention, the data set acquired by the initial detection model may include the data set 2. Where the data set 2 may consist of samples drawn in a sample database. The data group 2 includes 8 samples, the positive sample is a picture including a face, and the negative sample is a picture not including a face. Wherein, sample 1, sample 2, sample 3, sample 4, sample 5, sample 6, sample 7 and sample 8 are all negative samples. The manner in which the initial detection model obtains the data set 2 is the same as the manner in which the initial detection model obtains the data set 1, and details thereof are not repeated herein.
S04, if no positive samples are included in a data set, a first predetermined number of first negative samples are extracted from the trained sample set.
In the embodiment of the present invention, after the initial detection model acquires a plurality of data groups in the training image set, it is first required to determine whether the data groups include positive samples. The initial detection model compares the samples in the training image set with a preset standard result, so that the positive samples and the negative samples in the data set can be distinguished and marked. After the data set enters the initial detection model, the processing unit of the initial detection model identifies positive samples and negative samples in the data set, and when the samples in one data set are identified not to include the positive samples, the initial detection model extracts a first preset number of first negative samples from the trained sample set.
The initial detection model extracts a first predetermined number of first negative samples from the trained sample set in descending order of negative sample loss. The trained sample set may be the set formed by the samples of other, already trained data groups, and includes both the positive and the negative samples from those groups. After those data groups were trained, the initial detection model obtained the loss corresponding to each trained positive and negative sample, and it stores the trained samples to form the sample set. Referring to fig. 2, in the embodiment of the present invention, when the data group 2 obtained by the initial detection model does not include a positive sample, the initial detection model extracts 16 first negative samples from the sample set, exactly twice the number of samples in data group 2.
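The extraction just described amounts to hard-negative mining: taking the trained negatives with the largest recorded losses, in a count that is an integer multiple of the data group size. A minimal sketch, with assumed names and signature:

```python
def mine_hard_negatives(trained_negatives: list, losses: list,
                        group_size: int, multiple: int = 2) -> list:
    """Select the hardest negatives from the trained sample set.

    Samples are taken in descending order of their recorded loss, and the
    number taken is an integer multiple of the data group size
    (16 = 2 x 8 in the example above).
    """
    k = multiple * group_size
    # indices of trained negatives, sorted by loss, largest first
    order = sorted(range(len(trained_negatives)),
                   key=lambda i: losses[i], reverse=True)
    return [trained_negatives[i] for i in order[:k]]
```

Picking the highest-loss negatives focuses the background class prediction training on exactly the backgrounds the model currently confuses with targets.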
And S05, inputting the first negative sample into the initial detection model to obtain a first prediction background class result.
In the embodiment of the invention, after the initial detection model acquires the first negative sample, it can predict the first prediction background class result on the first negative sample by a sliding-window method or a selective-search method. With reference to fig. 3, when the initial detection model obtains the image 1 to be detected, the intermediate layer of the initial detection model may predict the first prediction background class result 2 on the image 1 by the sliding-window method.
And S06, in the back propagation stage, training the initial detection model based on the first prediction background class result and the real result corresponding to the first negative sample until the target detection model is obtained.
In the embodiment of the present invention, one training pass of the initial detection model consists of three stages: a forward propagation stage, a back propagation stage and a weight update stage. In the forward propagation stage, the first negative sample is transmitted from the input layer through to the output layer; in the back propagation stage, signals flow from the output layer back to the input layer. In the target detection method provided by the embodiment of the invention, the first negative sample is input into the network of the initial detection model in the forward propagation stage, and the network calculates the loss corresponding to the first negative sample through a loss function applied to the first prediction background class result of the first negative sample and the real result corresponding to the first negative sample. The real result corresponding to the first negative sample is a background class result calibrated in advance by a person or a machine.
In the embodiment of the invention, the network of the initial detection model calculates the loss corresponding to the first negative sample based on the loss function, and adds the loss of the first negative sample to the losses corresponding to the other data groups to obtain the current loss. If the current loss is greater than the loss threshold, the network back-propagates the current loss from the output layer through the intermediate layers to the input layer, layer by layer, correcting the weights of each layer by gradient descent. After the weights of all layers have been corrected, the network continues training on a newly obtained training image set. Training proceeds in this way until the current loss calculated by the initial detection model is not greater than the loss threshold, or the number of training passes reaches the preset count, at which point the target detection model is obtained.
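The stopping rule described above, namely keep training until the current loss is no longer greater than the loss threshold or the training budget is exhausted, can be sketched independently of any deep-learning framework. Here `train_step` is an assumed abstraction standing for one complete forward/backward/weight-update pass that returns the current loss:

```python
from typing import Callable

def train_until_converged(train_step: Callable[[], float],
                          loss_threshold: float,
                          max_iters: int = 1000) -> float:
    """Run training passes until the current loss is not greater than the
    threshold, or the iteration budget (the preset training count) is
    reached. Returns the final loss."""
    loss = float("inf")
    for _ in range(max_iters):
        loss = train_step()          # forward + backward + weight update
        if loss <= loss_threshold:   # first stopping condition
            break
    return loss                      # second condition: budget exhausted
```

The two exit conditions mirror the two described in the text: loss no greater than the threshold, or the preset number of training passes reached.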
In the embodiment of the invention, a first negative sample is extracted from the trained sample set; the first negative sample is input into the initial detection model to obtain a first prediction background class result; and in the back propagation stage, the initial detection model is trained based on the first prediction background class result and the real result corresponding to the first negative sample until the target detection model is obtained. A large amount of background class prediction training can thus be performed whenever the initial detection model encounters a target-free data group during training, which increases the amount of training the model receives and reduces the probability of false detection when the target detection model is applied.
In some embodiments, referring to fig. 7, fig. 7 is an optional flowchart of the method provided by the embodiment of the present invention, and S101-S103 are further included between S03 and S04 shown in fig. 6, which will be described with reference to each step.
S101, inputting the training image set into an image recognition model to obtain a first image recognition result.
In the embodiment of the present invention, after the initial detection model acquires each sample of the data groups in the training image set, the samples are input into an image recognition model, which may be a pre-established convolutional neural network model and is a branch model of the initial detection model. The image recognition model processes each sample in the training image set to obtain a first image recognition result corresponding to that sample.
In an embodiment of the present invention, the first image recognition result is a recognition block diagram of a corresponding sample. The image recognition model processes the positive sample to obtain a first image recognition result corresponding to the positive sample, wherein the first image recognition result corresponding to the positive sample comprises the target. The image recognition model processes the negative sample to obtain a first image recognition result corresponding to the negative sample, wherein the first image recognition result corresponding to the negative sample comprises a background area of the negative sample.
S102, when the first image identification result corresponding to one data group has an image with the fitting degree with a preset standard result larger than a fitting degree threshold value, determining that one data group contains a positive sample.
In the embodiment of the invention, the initial detection model compares the first image recognition results corresponding to the samples in one data set with the preset standard results to obtain the fitting degree of the first image recognition results corresponding to all the samples in one data set and the preset standard results. When the fitting degree of the first image identification result corresponding to the sample and the preset standard result in one data set is larger than the fitting degree threshold value, the data set comprises the positive sample.
In the embodiment of the present invention, the fitting degree threshold is a manually set constant used to judge the error between the first image recognition result of a sample in the data group and the preset standard result. If the fitting degree is greater than the fitting degree threshold, the error between the first image recognition result and the preset standard result is small, and the sample corresponding to that first image recognition result is a positive sample.
Exemplarily, with reference to fig. 1, after the initial detection model acquires data group 1, it recognizes that the fitting degree between a sample in data group 1 and the preset standard result is greater than the fitting degree threshold, which indicates that data group 1 contains a positive sample.
In the embodiment of the invention, the fitting degree includes the overlap rate between the first image recognition result and the preset standard result. The fitting degree threshold is a preset overlap rate threshold; when the fitting degree is greater than the fitting degree threshold, the overlap rate between the first image recognition result and the preset standard result is high, and the first image recognition result is similar to the preset standard result.
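As a minimal sketch, the overlap-rate check above can be written as follows, assuming the first image recognition result and the preset standard result are axis-aligned rectangles given as (x1, y1, x2, y2) and that the overlap rate is computed as intersection over union; the function names and the threshold value are illustrative, not part of the patent.

```python
def overlap_rate(box_a, box_b):
    """Overlap rate (intersection over union) of two rectangular
    recognition results given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def group_contains_positive(recognition_boxes, standard_box, fit_threshold):
    """A data group contains a positive sample if any first image
    recognition result fits the preset standard result above the
    fitting degree threshold (S102/S103)."""
    return any(overlap_rate(b, standard_box) > fit_threshold
               for b in recognition_boxes)
```

Other overlap measures (e.g. intersection over the standard-result area) would fit the description equally well; intersection over union is used here only as a common choice.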
S103, when, among the first image recognition results corresponding to one data group, there is no image whose fitting degree with the preset standard result is greater than the fitting degree threshold, determining that the data group does not contain a positive sample.
In the embodiment of the present invention, when no first image recognition result corresponding to a sample in one data group has a fitting degree with the preset standard result greater than the fitting degree threshold, the data group does not contain a positive sample.
In an embodiment of the present invention, the fitting degree threshold may be a manually set constant used to judge the error between the first image recognition result of a sample in the data group and the preset standard result. If no fitting degree is greater than the fitting degree threshold, the error between each first image recognition result and the preset standard result is large, and the samples corresponding to these first image recognition results are negative samples.
In the embodiment of the invention, the fitting degree includes the overlap rate between the first image recognition result and the preset standard result. The fitting degree threshold is a preset overlap rate threshold; when the fitting degree is not greater than the fitting degree threshold, the overlap rate between the first image recognition result and the preset standard result is low, and the first image recognition result differs greatly from the preset standard result.
Exemplarily, with reference to fig. 2, after the initial detection model acquires data group 2, it recognizes that no sample in data group 2 has a fitting degree with the preset standard result greater than the fitting degree threshold, which indicates that data group 2 does not contain a positive sample.
In some embodiments, referring to fig. 8, fig. 8 is an optional flowchart of the method provided in the embodiments of the present invention, and S21 to S22 are further included between S03 and S06 shown in fig. 6, which will be described with reference to the steps.
S21, if a data group contains positive samples, extracting second positive samples and second negative samples from the data group according to a first predetermined ratio.
In an embodiment of the present invention, if a positive sample is included in a data set, the initial detection model will extract a second positive sample and a second negative sample in the data set according to a first predetermined ratio. Wherein the positive sample is a sample including the target.
In an embodiment of the present invention, the first predetermined ratio may be a ratio such as 1:3, 1:2, or 1:4. The number of samples in a data group is typically 10 or 20. The initial detection model extracts a number of second positive samples and second negative samples from a data group according to the first predetermined ratio.
In the embodiment of the invention, after the initial detection model determines through S101-S103 that a data group contains positive samples, it extracts second positive samples and second negative samples from the data group according to the first predetermined ratio. With reference to fig. 1, in the embodiment of the present invention, the data group acquired by the initial detection model may be data group 1. Data group 1 includes 8 samples, where samples 1 and 2 are positive samples and samples 3, 4, 5, 6, 7, and 8 are negative samples. In an embodiment of the present invention, the second positive samples and second negative samples may be extracted from data group 1 at a ratio of 1:3 or 1:4. For example, 1 second positive sample and 3 second negative samples are extracted from data group 1, where the second positive sample may include sample 1, and the second negative samples may include sample 3, sample 4, and sample 5.
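The extraction step of S21 can be sketched as below; `extract_by_ratio` and the 0/1 labelling convention are assumptions made only for illustration.

```python
import random

def extract_by_ratio(samples, labels, ratio=(1, 3), seed=None):
    """Extract second positive and second negative samples from one data
    group at the first predetermined ratio, e.g. 1:3 (step S21).
    labels[i] is 1 for a positive sample and 0 for a negative sample."""
    rng = random.Random(seed)
    positives = [s for s, y in zip(samples, labels) if y == 1]
    negatives = [s for s, y in zip(samples, labels) if y == 0]
    n_pos, n_neg = ratio
    # Draw without replacement, capped by what the group actually holds.
    return (rng.sample(positives, min(n_pos, len(positives))),
            rng.sample(negatives, min(n_neg, len(negatives))))
```

With the data group 1 example above (2 positives, 6 negatives), a 1:3 ratio yields 1 second positive sample and 3 second negative samples.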
And S22, inputting the second positive sample and the second negative sample into the initial detection model to obtain a target result corresponding to the second positive sample and a second prediction background result corresponding to the second negative sample.
In the embodiment of the invention, the second positive sample and the second negative sample are input into the initial detection model, and through processing by the initial detection model, the target result corresponding to the second positive sample and the second prediction background class result corresponding to the second negative sample can be obtained.
In an embodiment of the present invention, the target result corresponding to the second positive sample may be a picture frame containing only the target, and the second prediction background class result corresponding to the second negative sample may be a picture frame containing the background area of the second negative sample.
In the embodiment of the invention, after the initial detection model acquires 1 second positive sample and 3 second negative samples, it may use a sliding window method or a selective search method to process sample 1 among the second positive samples to obtain the target result corresponding to the second positive sample, where the target result is the face region in sample 1. Meanwhile, the initial detection model may use a sliding window method or a selective search method to process sample 3, sample 4, and sample 5 among the second negative samples to obtain the second prediction background class results corresponding to sample 3, sample 4, and sample 5, respectively. The second prediction background class result corresponding to sample 3 is the bonsai image area in sample 3, that corresponding to sample 4 is the notebook computer image area in sample 4, and that corresponding to sample 5 is the pony image area in sample 5.
In some embodiments, referring to fig. 9, fig. 9 is an optional flowchart of the method provided in the embodiments of the present invention, and S06 shown in fig. 6 may be implemented through S23-S25, which will be described with reference to the steps.
And S23, in the back propagation stage, determining the negative sample loss according to the first preset loss function, the first prediction background result and the first real result corresponding to the first negative sample.
In the embodiment of the present invention, the initial detection model may determine the negative sample loss according to a first preset loss function, a first prediction background class result, and a first real result corresponding to the first negative sample. Wherein the first predetermined loss function may be a cross-entropy loss function or an L2 loss function.
In the embodiment of the present invention, the center point coordinates, the width, and the height of the first prediction background class result of the first negative sample may be obtained through the network structure of the initial detection model. Meanwhile, by comparing the center point coordinates, the width, and the height of the first prediction background class result with the first real result corresponding to the first negative sample, the network structure of the initial detection model may also calculate the true distribution probability of the sample on the first negative sample.
In the embodiment of the present invention, the network structure of the initial detection model may determine the negative sample loss by applying a cross entropy loss function or an L2 loss function to the first negative sample. In the embodiment of the present invention, the cross entropy loss function formula (1) is:
$$\mathrm{Loss} = -\sum_{i=1}^{n} p(x_i)\log q(x_i) \tag{1}$$
wherein n is the number of first negative samples, $p(x_i)$ is the true distribution probability of the ith first negative sample, and $q(x_i)$ is the predicted distribution probability of the ith first negative sample. The initial detection model may obtain the negative sample loss of the first negative samples by calculating the difference between the true distribution probability and the predicted distribution probability of each first negative sample.
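A minimal sketch of formula (1), assuming the probabilities are given as plain Python lists; the small `eps` term is an added numerical guard against log(0), not part of the formula.

```python
import math

def cross_entropy_loss(p_true, q_pred, eps=1e-12):
    """Negative sample loss of formula (1):
    Loss = -sum_i p(x_i) * log q(x_i) over the n first negative samples."""
    return -sum(p * math.log(q + eps) for p, q in zip(p_true, q_pred))
```

When the predicted distribution matches the true distribution exactly the loss is near zero, and it grows as the two distributions diverge.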
And S24, determining the loss of the positive sample according to the second preset loss function, the target result, the second prediction background class result, the target real result corresponding to the target result and the second real result corresponding to the second negative sample.
In the embodiment of the invention, the initial detection model determines the second positive sample loss through the second preset loss function based on the target result and the target real result corresponding to the target result, and determines the second negative sample loss through the second preset loss function based on the second prediction background class result and the second real result corresponding to the second negative sample. The second positive sample loss and the second negative sample loss are added to obtain the positive sample loss.
In the embodiment of the present invention, the center point coordinates, the width, and the height of the predicted target result on the second positive sample may be obtained through the network structure of the initial detection model. According to the center point coordinates, the width, and the height of the target result and the target real result corresponding to the second positive sample, the network structure of the initial detection model may also calculate the true distribution probability of the sample on the second positive sample, and meanwhile acquire the pixel value of each pixel of the target result. Likewise, the network structure of the initial detection model may obtain the center point coordinates, the width, and the height of the second prediction background class result of the second negative sample; by comparing these with the second real result corresponding to the second negative sample, it may also obtain the true distribution probability of the sample on the second negative sample, and acquire the pixel value of each pixel of the second prediction background class result.
In the embodiment of the invention, the network structure of the initial detection model calculates the positive sample loss through the cross entropy loss function and the L2 loss function on the second positive sample and the second negative sample. Wherein the positive sample loss comprises a second positive sample loss and a second negative sample loss. In the embodiment of the present invention, the initial detection model calculates the second negative sample loss through a cross entropy loss function, where the cross entropy loss function formula (1) is:
$$\mathrm{Loss} = -\sum_{i=1}^{n} p(x_i)\log q(x_i) \tag{1}$$
wherein n is the number of second negative samples, $p(x_i)$ is the true distribution probability of the ith second negative sample, and $q(x_i)$ is the predicted distribution probability of the ith second negative sample. The initial detection model may obtain the second negative sample loss by calculating the difference between the true distribution probability and the predicted distribution probability of each second negative sample.
In the embodiment of the present invention, the initial detection model calculates the second positive sample loss through an L2 loss function, where the L2 loss function formula (2) is:
$$L_2(\theta) = \frac{1}{2N}\sum_{i=1}^{N}\sum_{j=1}^{P} \xi_j\left(O_{ij}(\theta) - Y_{ij}\right)^2 \tag{2}$$
where N is the number of second positive samples, P is the total number of pixels in a single second positive sample, $O_{ij}(\theta)$ is the output value of the jth pixel in the ith second positive sample, $Y_{ij}$ is the value of the jth pixel in the ith sample of the real result corresponding to the second positive samples, and $\xi_j$ is the weight assigned to the target in the second positive sample. Here, P and $O_{ij}(\theta)$ may be obtained through the initial detection model. The initial detection model may obtain the second positive sample loss by calculating the loss between the second positive samples and their corresponding real results.
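Formula (2) can be sketched as follows, assuming the output and real results are given as N×P grids of pixel values and that $\xi$ is a per-pixel weight vector of length P; `l2_loss` and its argument layout are illustrative assumptions.

```python
def l2_loss(outputs, targets, xi):
    """Second positive sample loss of formula (2):
    L2 = (1 / (2N)) * sum_i sum_j xi_j * (O_ij - Y_ij)^2."""
    n = len(outputs)
    total = 0.0
    for o_row, y_row in zip(outputs, targets):
        for j, (o, y) in enumerate(zip(o_row, y_row)):
            total += xi[j] * (o - y) ** 2
    return total / (2 * n)
```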
In the embodiment of the present invention, the positive sample loss may be obtained by adding the second negative sample loss and the second positive sample loss.
And S25, training the initial detection model based on the positive sample loss and the negative sample loss until a target detection model is obtained.
In the embodiment of the invention, the initial detection model adds the positive sample loss and the negative sample loss to obtain the current loss of the initial detection model. If the current loss is greater than the loss threshold, the initial detection model acquires the next training image set for the next round of model training, until the next loss is not greater than the loss threshold, at which point the target detection model is obtained.
In the embodiment of the invention, the initial detection model adds the positive sample loss and the negative sample loss to obtain the current loss of the initial detection model. If the current loss is greater than the loss threshold, the initial detection model acquires the next training image set for the next round of model training, until the number of training rounds of the initial detection model reaches the preset number, at which point the target detection model is obtained.
In some embodiments, referring to fig. 10, fig. 10 is an optional flowchart of the method provided in the embodiment of the present invention, and S25 shown in fig. 9 may be implemented through S26-S27, which will be described in conjunction with the steps.
And S26, determining the current loss according to the negative sample loss and the positive sample loss.
In the embodiment of the present invention, the current loss can be obtained by adding the positive sample loss and the negative sample loss.
In the embodiment of the invention, the positive sample loss is the loss calculated by the loss function for data groups that contain positive samples, and the negative sample loss is the loss calculated by the loss function for data groups that do not contain positive samples. The current loss is obtained from one round of training of the initial detection model on the training image set.
And S27, if the current loss is larger than the preset loss threshold, acquiring the next training image set, and continuing training the initial detection model until the next loss is not larger than the preset loss threshold, so as to obtain the target detection model.
In the embodiment of the invention, if the current loss is greater than the preset loss threshold, the network structure of the initial detection model propagates the current loss back layer by layer from the output layer through the intermediate layers to the input layer, and corrects the weights of each layer of the initial detection model by gradient descent. After the weights of all layers of the network structure of the initial detection model have been corrected, the network structure of the initial detection model continues training on the newly acquired training image set. The training of the initial detection model continues until the current loss calculated by the initial detection model is not greater than the preset loss threshold, or the number of training rounds of the initial detection model reaches the preset number, at which point the target detection model is obtained.
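The stopping logic described above can be sketched as follows; `model.train_step` is a hypothetical method standing in for one round of forward and back propagation on a training image set, not an API defined by the patent.

```python
def train_until_converged(model, image_sets, loss_threshold, max_rounds):
    """Training control flow of S25/S27: after each round the current
    loss (positive sample loss + negative sample loss) is compared with
    the preset loss threshold; training also stops once the preset
    number of rounds is reached."""
    for round_idx, image_set in enumerate(image_sets, start=1):
        # One round: forward pass, loss computation, back propagation.
        pos_loss, neg_loss = model.train_step(image_set)
        current_loss = pos_loss + neg_loss
        if current_loss <= loss_threshold or round_idx >= max_rounds:
            break
    return model
```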
In an embodiment of the present invention, the next training image set obtained by the initial detection model may be obtained in the same way as the current training image set. In some embodiments, the initial detection model may treat the current training image set as the next training image set.
In some embodiments, referring to fig. 11, fig. 11 is an optional flowchart of the method provided in the embodiment of the present invention, and S26 shown in fig. 10 may be implemented by S28, which will be described with reference to steps.
S28, the positive sample loss and the negative sample loss are weighted to determine the current loss.
In the embodiment of the invention, the initial detection model is trained to obtain the positive sample loss and the negative sample loss corresponding to the data groups in the newly acquired training image set. The positive sample loss corresponding to a data group in the newly acquired training image set is multiplied by the first weight, the negative sample loss corresponding to a data group in the newly acquired training image set is multiplied by the second weight, and the initial detection model adds the weighted positive sample loss and negative sample loss to determine the current loss. The first weight is greater than the second weight.
In the embodiment of the invention, when the initial detection model computes the positive sample loss corresponding to a data group in the newly acquired training image set, the target result of the data group containing the positive sample is multiplied by the first weight before the positive sample loss is calculated. The initial detection model then adds the positive sample loss and the negative sample loss to determine the current loss.
In the embodiment of the invention, when the initial detection model computes the positive sample loss corresponding to a data group in the newly acquired training image set, the second positive sample loss within the positive sample loss is multiplied by the first weight and then added to the second negative sample loss to obtain the positive sample loss. The initial detection model then adds the positive sample loss and the negative sample loss to determine the current loss.
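A minimal sketch of the weighting in S28, assuming the weights are plain scalar multipliers; the default values 2.0 and 1.0 are illustrative, chosen only so that the first weight exceeds the second as the method requires.

```python
def weighted_current_loss(pos_loss, neg_loss, w_pos=2.0, w_neg=1.0):
    """Weighted current loss of S28: the first weight applied to the
    positive sample loss is larger than the second weight applied to
    the negative sample loss."""
    return w_pos * pos_loss + w_neg * neg_loss
```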
Fig. 12 is a schematic structural diagram of a target detection apparatus according to an embodiment of the present invention.
The object detection apparatus 800 in the embodiment of the present invention includes: an image acquisition unit 81 and a detection unit 82.
The image acquisition unit 81 is used for acquiring a current image to be detected;
and the detection unit 82 is used for performing target detection on the current image to be detected by using the target detection model and predicting the current detection result. In the embodiment of the invention, the target detection model includes a detection model based on a convolutional neural network model and is obtained by carrying out background class prediction training on data groups that consist entirely of negative samples. After the background class prediction training, the target detection model can effectively process large numbers of images to be detected from different domains, and its false detection rate when processing the images to be detected is significantly reduced.
In the embodiment of the present invention, the target detection apparatus 800 first acquires a training image set when performing the background class prediction training. Wherein the training image set comprises a plurality of data groups, and one data group comprises a plurality of samples for training; if no positive samples are included in a data set, the target detection apparatus 800 extracts a first predetermined number of first negative samples from the trained sample set; the target detection device 800 inputs the first negative sample into the initial detection model to obtain a first prediction background class result; in the back propagation stage, the target detection apparatus 800 trains the initial detection model based on the first prediction background result and the real result corresponding to the first negative sample until the target detection model is obtained.
In the embodiment of the present invention, after the target detection apparatus 800 acquires the training image set, it needs to determine whether each data group in the training image set contains a positive sample. The target detection apparatus 800 inputs the training image set into the image recognition model to obtain the first image recognition results; when, among the first image recognition results corresponding to one data group, there is an image whose fitting degree with the preset standard result is greater than the fitting degree threshold, it determines that the data group contains a positive sample; when there is no such image, it determines that the data group does not contain a positive sample.
In this embodiment of the present invention, if one data group does not include a positive sample, the target detection apparatus 800 extracts, from the second negative samples in the trained sample set, a first predetermined number of first negative samples in the descending order of the negative sample loss according to the negative sample loss corresponding to the second negative samples; the first predetermined number may be an integer multiple of the number of samples in a data group.
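The loss-ordered extraction described above can be sketched as follows; `extract_hard_negatives` is an illustrative name for selecting the first predetermined number of first negative samples in descending order of their negative sample loss.

```python
def extract_hard_negatives(negatives, losses, first_predetermined_number):
    """Select the first predetermined number of first negative samples
    in descending order of their negative sample loss, i.e. keep the
    highest-loss (hardest) negatives first."""
    ranked = sorted(zip(negatives, losses), key=lambda t: t[1], reverse=True)
    return [sample for sample, _ in ranked[:first_predetermined_number]]
```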
In the embodiment of the present invention, if a data group includes positive samples, the object detection apparatus 800 extracts a second positive sample and a second negative sample in a data group according to a first predetermined ratio; and inputting the second positive sample and the second negative sample into the initial detection model to obtain a target result corresponding to the second positive sample and a second prediction background result corresponding to the second negative sample. In the back propagation stage, the target detection apparatus 800 determines the negative sample loss according to the first preset loss function, the first prediction background result, and the first real result corresponding to the first negative sample. The target detection apparatus 800 determines the positive sample loss according to the second preset loss function, the target result, the second prediction background class result, the target real result corresponding to the target result, and the second real result corresponding to the second negative sample. Adding the negative sample loss and the positive sample loss to determine the current loss; if the current loss is greater than the preset loss threshold, the target detection device 800 acquires the next training image set, continues training the initial detection model until the next loss is not greater than the preset loss threshold, and acquires the target detection model.
In this embodiment of the present invention, the target detection apparatus 800 may further perform weighting processing on the positive sample loss and the negative sample loss to determine the current loss; wherein the first weight corresponding to the positive sample loss is greater than the second weight corresponding to the negative sample loss. The object detection apparatus 800 may multiply the positive sample loss by the first weight and then add the product of the negative sample loss and the second weight to obtain the current loss.
In the embodiment of the invention, the current image to be detected is obtained through the image acquisition unit, and the detection unit performs target detection on the current image to be detected by using the target detection model and predicts the current detection result; the target detection model is obtained by carrying out background class prediction training on data groups that consist entirely of negative samples. By carrying out background class prediction training on such data groups, the amount of training of the target detection model is increased, and the probability of false detection by the target detection model during application is reduced.
It should be noted that, in the embodiment of the present invention, if the target detection method is implemented in the form of a software functional module and sold or used as a standalone product, the target detection method may also be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing an object detection apparatus (which may be a personal computer, a network device, or the like) to execute all or part of the methods according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
Correspondingly, the embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the above-mentioned method.
Correspondingly, the embodiment of the present invention provides an object detection apparatus, which includes a memory 803 and a processor 801, wherein the memory 803 stores a computer program that can be executed on the processor 801, and the processor 801 executes the computer program to implement the steps of the method.
Here, it should be noted that: the above description of the storage medium and device embodiments is similar to the description of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and the apparatus according to the invention, reference is made to the description of the embodiments of the method according to the invention.
It should be noted that fig. 13 is a schematic diagram of a hardware entity of the object detection apparatus according to the embodiment of the present invention, as shown in fig. 13, the hardware entity of the object detection apparatus 800 includes: a processor 801, a communication interface 802, and a memory 803, wherein
The processor 801 generally controls the overall operation of the object detection apparatus 800.
The communication interface 802 may enable the object detection apparatus to communicate with a data set or sample database over a network.
The Memory 803 is configured to store instructions and applications executable by the processor 801, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by each module in the processor 801 and the target detection apparatus 800, and may be implemented by a FLASH Memory (FLASH) or a Random Access Memory (RAM).
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present invention, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention. The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be carried out by hardware under the control of program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disk.
Alternatively, if the integrated unit of the present invention is implemented in the form of a software functional module and sold or used as a separate product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a removable storage device, a ROM, a magnetic or optical disk, or any other medium that can store program code.
The above description covers only particular embodiments of the present invention, but the scope of the present invention is not limited thereto; any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed herein, and all such changes or substitutions fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (12)

1. A method of object detection, comprising:
acquiring a current image to be detected;
performing target detection on the current image to be detected by using a target detection model, and predicting a current detection result;
wherein the target detection model is obtained by performing background class prediction training on a data group consisting of negative samples.
2. The target detection method of claim 1, wherein before the target detection is performed on the current image to be detected by using a preset target detection model and a current detection result is predicted, the method further comprises:
acquiring a training image set; wherein the training image set comprises a plurality of data groups, and one data group comprises a plurality of samples for training;
if the one data group does not include a positive sample, extracting a first predetermined number of first negative samples from a trained sample set;
inputting the first negative sample into an initial detection model to obtain a first prediction background class result;
in a back propagation stage, training the initial detection model based on the first prediction background class result and a real result corresponding to the first negative sample until the target detection model is obtained.
3. The method of object detection according to claim 2, wherein after said acquiring a set of training images, the method further comprises:
inputting the training image set into an image recognition model to obtain a first image recognition result;
when the first image recognition result corresponding to the data group includes an image whose fitting degree with a preset standard result is greater than a fitting degree threshold, determining that the data group contains a positive sample;
and when the first image recognition result corresponding to the data group includes no image whose fitting degree with the preset standard result is greater than the fitting degree threshold, determining that the data group does not contain a positive sample.
4. The method of claim 2, wherein the extracting a first predetermined number of first negative samples from the trained sample set if the one data group does not include a positive sample comprises:
if the data group does not include a positive sample, extracting, from second negative samples of the trained sample set, a first predetermined number of first negative samples in descending order of the negative sample losses corresponding to the second negative samples;
wherein the first predetermined number is an integer multiple of the number of samples in the one data group.
5. The method of object detection according to any of claims 2-4, wherein after said acquiring a set of training images, the method further comprises:
if the one data group includes a positive sample, extracting a second positive sample and a second negative sample from the one data group according to a first preset proportion;
and inputting the second positive sample and the second negative sample into the initial detection model to obtain a target result corresponding to the second positive sample and a second prediction background class result corresponding to the second negative sample.
6. The object detection method of claim 5, wherein, when the training image set comprises both a data group that includes a positive sample and a data group that does not, the training, in the back propagation stage, of the initial detection model based on the first prediction background class result and the real result corresponding to the first negative sample until the target detection model is obtained comprises:
in a back propagation stage, determining a negative sample loss according to a first preset loss function, the first prediction background class result, and a first real result corresponding to the first negative sample;
determining a positive sample loss according to a second preset loss function, the target result, the second prediction background class result, a target real result corresponding to the target result, and a second real result corresponding to the second negative sample;
and training the initial detection model based on the positive sample loss and the negative sample loss until a target detection model is obtained.
7. The method of claim 6, wherein the training the initial detection model based on the positive sample loss and the negative sample loss until a target detection model is obtained comprises:
determining the current loss according to the negative sample loss and the positive sample loss;
and if the current loss is greater than a preset loss threshold, obtaining a next training image set, and continuing training the initial detection model until the next loss is not greater than the preset loss threshold, so as to obtain the target detection model.
8. The method of claim 7, wherein determining a current loss from the negative sample loss and the positive sample loss comprises:
weighting the positive sample loss and the negative sample loss to determine the current loss; wherein a first weight corresponding to the positive sample penalty is greater than a second weight corresponding to the negative sample penalty.
9. The method of claim 8, wherein weighting the positive sample loss and the negative sample loss to determine the current loss comprises:
and multiplying the positive sample loss by the first weight, and adding thereto the product of the negative sample loss and the second weight, to obtain the current loss.
10. An object detection device, comprising:
the image acquisition unit is used for acquiring a current image to be detected;
the detection unit is used for carrying out target detection on the current image to be detected by using a target detection model and predicting a current detection result;
wherein the target detection model is obtained by performing background class prediction training on a data group consisting of negative samples.
11. An object detection apparatus, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 9 when executing the program.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 9.
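Read as an algorithm, claims 4 and 6 to 9 amount to hard-negative mining followed by a weighted combination of the positive sample loss and the negative sample loss. The Python sketch below is illustrative only: every function name and the example weights (0.7 and 0.3) are assumptions for exposition, not values taken from the patent.

```python
def mine_hard_negatives(negative_losses, first_predetermined_number):
    """Claim 4 (sketch): from the trained sample set, take the first
    predetermined number of negatives in descending order of their
    previously recorded negative sample losses."""
    order = sorted(range(len(negative_losses)),
                   key=lambda i: negative_losses[i], reverse=True)
    return order[:first_predetermined_number]


def current_loss(positive_loss, negative_loss,
                 first_weight=0.7, second_weight=0.3):
    """Claims 8 and 9 (sketch): weighted sum of the two losses, where the
    weight on the positive sample loss exceeds the weight on the negative
    sample loss."""
    assert first_weight > second_weight
    return positive_loss * first_weight + negative_loss * second_weight
```

For example, `mine_hard_negatives([0.2, 0.9, 0.5], 2)` selects the indices of the two highest-loss negatives, and the loop of claim 7 would keep acquiring training image sets until `current_loss(...)` is no longer greater than the preset loss threshold.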
CN202011271127.9A 2020-11-13 2020-11-13 Target detection method, device and storage medium Pending CN113743439A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011271127.9A CN113743439A (en) 2020-11-13 2020-11-13 Target detection method, device and storage medium


Publications (1)

Publication Number Publication Date
CN113743439A true CN113743439A (en) 2021-12-03

Family

ID=78728172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011271127.9A Pending CN113743439A (en) 2020-11-13 2020-11-13 Target detection method, device and storage medium

Country Status (1)

Country Link
CN (1) CN113743439A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001046707A2 (en) * 1999-12-22 2001-06-28 Siemens Corporate Research, Inc. Method for learning-based object detection in cardiac magnetic resonance images
US20140112531A1 (en) * 2012-10-24 2014-04-24 Samsung Electronics Co., Ltd., Image processing apparatus and method for detecting transparent object in image
CN109165658A (en) * 2018-08-28 2019-01-08 哈尔滨工业大学(威海) A kind of strong negative sample underwater target detection method based on Faster-RCNN
CN109359541A (en) * 2018-09-17 2019-02-19 南京邮电大学 A kind of sketch face identification method based on depth migration study
CN109446961A (en) * 2018-10-19 2019-03-08 北京达佳互联信息技术有限公司 Pose detection method, device, equipment and storage medium
CN110826634A (en) * 2019-11-11 2020-02-21 北京百度网讯科技有限公司 Training method and device of target detection model, electronic equipment and storage medium
CN111028224A (en) * 2019-12-12 2020-04-17 广西医准智能科技有限公司 Data labeling method, model training device, image processing method, image processing device and storage medium
CN111738290A (en) * 2020-05-14 2020-10-02 北京沃东天骏信息技术有限公司 Image detection method, model construction and training method, device, equipment and medium
CN111753729A (en) * 2020-06-24 2020-10-09 西安深信科创信息技术有限公司 False face detection method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhou Fusong; Huo Hong; Wan Weibing; Fang Tao: "Object detection in remote sensing images based on a strongly supervised part model", Journal of Computer Applications, no. 06, 10 June 2016 (2016-06-10) *
Lu Peng; Chen Yisong; Chen Wenguang: "Cascade-based face detection algorithm with multi-feature fusion", Computer Engineering, no. 02, 20 January 2011 (2011-01-20) *

Similar Documents

Publication Publication Date Title
CN110210560B (en) Incremental training method, classification method and device, equipment and medium of classification network
CN110458165B (en) Natural scene text detection method introducing attention mechanism
CN112348117B (en) Scene recognition method, device, computer equipment and storage medium
CN110969087B (en) Gait recognition method and system
CN111126258A (en) Image recognition method and related device
CN108875487B (en) Training of pedestrian re-recognition network and pedestrian re-recognition based on training
CN111723822B (en) RGBD image significance detection method and system based on multi-level fusion
CN110929836B (en) Neural network training and image processing method and device, electronic equipment and medium
CN111814744A (en) Face detection method and device, electronic equipment and computer storage medium
KR102369413B1 (en) Image processing apparatus and method
KR20120066462A (en) Method and system for providing face recognition, feature vector extraction apparatus for face recognition
CN110147768B (en) Target tracking method and device
CN111914878A (en) Feature point tracking training and tracking method and device, electronic equipment and storage medium
CN115841683B (en) Lightweight pedestrian re-identification method combining multi-level features
JP2022540101A (en) POSITIONING METHOD AND APPARATUS, ELECTRONIC DEVICE, COMPUTER-READABLE STORAGE MEDIUM
CN112560710B (en) Method for constructing finger vein recognition system and finger vein recognition system
CN112560823B (en) Adaptive variance and weight face age estimation method based on distribution learning
CN118097341B (en) Target detection method, model training method and related device
CN115620054A (en) Defect classification method and device, electronic equipment and storage medium
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN112446428B (en) Image data processing method and device
CN116720214A (en) Model training method and device for privacy protection
CN116503670A (en) Image classification and model training method, device and equipment and storage medium
CN111160353A (en) License plate recognition method, device and equipment
CN113743439A (en) Target detection method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination