CN109784349B - Image target detection model establishing method, device, storage medium and program product
- Publication number: CN109784349B
- Application number: CN201811592967.8A
- Authority: CN (China)
- Prior art keywords: network model, occlusion, image sample, image, training
- Prior art date: 2018-12-25
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Image Analysis (AREA)
Abstract
The invention provides a method and an apparatus for establishing an image target detection model. A feature occlusion countermeasure network model is trained with occlusion image samples, and the occlusion mask of an image sample is obtained through this model, so that when the detection network model is trained, an occlusion mask produced by the trained feature occlusion countermeasure network model is added to the feature map of each training image sample. Because the feature occlusion countermeasure network model is trained on occlusion image samples, it can learn to generate better masks; using these masks during detection network training means the detection network model is trained on image samples carrying well-placed occlusion masks, is thereby fully exposed to occlusion situations, and detects occluded objects more accurately.
Description
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a method and an apparatus for establishing an image target detection model, a storage medium, and a program product.
Background
At present, detection algorithms based on deep convolutional neural networks have become the mainstream approach to image target detection, for example the YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector) algorithms, which convert the target-box localization problem directly into a regression problem and therefore detect quickly. In image target detection applications, however, the object to be detected is often occluded in the image; current detection models rarely consider occlusion and cannot detect occluded objects accurately.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for establishing an image target detection model, a storage medium, and a program product, which improve the accuracy of detecting occluded objects.
To achieve this, the invention adopts the following technical solutions:
a model building method for image target detection comprises the following steps:
occluding a first image sample to obtain a first occlusion image sample;
training a feature occlusion countermeasure network model by using the first occlusion image sample, wherein the feature occlusion countermeasure network model is used for obtaining an occlusion mask of the image sample based on a countermeasure network;
and training a detection network model, wherein an occlusion mask obtained with the trained feature occlusion countermeasure network model is added to the feature map of a second image sample used for training, and the detection network model is used for image target detection based on deep learning.
Optionally, the method further comprises: using the feature map with the occlusion mask added as an image sample, and continuing to train the feature occlusion countermeasure network model.
Optionally, occluding the first image sample includes:
determining the candidate frame with the greatest positioning accuracy in the first image sample;
mapping a sliding window of preset size onto the image sample, and filling the image area covered by the sliding window with background pixels;
and detecting the filled image samples with the detection network model, and taking the sliding window at the position where the candidate frame suffers the largest detection network loss as the occlusion position of the first image sample, to obtain the first occlusion image sample.
Optionally, in the training of the detection network model, the prediction box is determined with a non-maximum suppression algorithm, wherein the candidate-box evaluation index in the non-maximum suppression algorithm is determined by the classification accuracy and the positioning accuracy of the different candidate boxes.
Optionally, the calculation formula of the candidate box evaluation index CL is:
CL = γ × score_class + (1 − γ) × score_location; where γ is a hyperparameter, score_class is the classification accuracy, and score_location is the positioning accuracy;
the non-maximum suppression algorithm adopts a formula with three threshold settings on the evaluation index CL of the different candidate frames.
Optionally, a prediction IoU network model is preset, the prediction IoU network model being used to obtain the positioning accuracy of different candidate boxes.
Optionally, the training method of the prediction IoU network model includes:
generating a set of candidate frames for a third image sample;
obtaining the positioning accuracy of each candidate frame in the candidate frame set;
removing candidate frames with the positioning accuracy smaller than a preset threshold value in the candidate frame set to determine a training set of a third occlusion image sample;
and utilizing the training set to train the prediction IoU network model.
Optionally, the detection network model includes a global pooling module and a pooling classification module, where the pooling classification module is configured to classify the globally pooled feature map according to the corresponding positions of a pooling mask, obtaining a classification vector.
A model building apparatus for image object detection, comprising:
the occlusion sample acquisition unit is used for occluding the first image sample to obtain a first occlusion image sample;
the countermeasure network training unit is used for training a feature occlusion countermeasure network model with the first occlusion image sample, the feature occlusion countermeasure network model being used for obtaining an occlusion mask of the image sample based on an adversarial network;
and the detection network training unit is used for training a detection network model, wherein an occlusion mask obtained with the trained feature occlusion countermeasure network model is added to the feature map of the second image sample used for training, and the detection network model is used for image target detection based on deep learning.
Optionally, the apparatus further comprises:
a countermeasure network retraining unit, used for taking the feature map with the occlusion mask added as an image sample and continuing to train the feature occlusion countermeasure network model.
Optionally, in the occlusion sample acquisition unit, occluding the first image sample includes:
determining the candidate frame with the greatest positioning accuracy in the first image sample;
mapping a sliding window of preset size onto the image sample, and filling the image area covered by the sliding window with background pixels;
and detecting the filled image samples with the detection network model, and taking the sliding window at the position where the candidate frame suffers the largest detection network loss as the occlusion position of the first image sample, to obtain the first occlusion image sample.
Optionally, in the training of the detection network model, the prediction box is determined with a non-maximum suppression algorithm, wherein the candidate-box evaluation index in the non-maximum suppression algorithm is determined by the classification accuracy and the positioning accuracy of the different candidate boxes.
Optionally, the calculation formula of the candidate box evaluation index CL is:
CL = γ × score_class + (1 − γ) × score_location; where γ is a hyperparameter, score_class is the classification accuracy, and score_location is the positioning accuracy;
the non-maximum suppression algorithm adopts a formula with three threshold settings on the evaluation index CL of the different candidate frames.
Optionally, the apparatus further comprises: a preset prediction IoU network model, used for obtaining the positioning accuracy of different candidate boxes.
Optionally, a prediction IoU network model training unit is further included for generating a set of candidate boxes for the third image sample; obtaining the positioning accuracy of each candidate frame in the candidate frame set; removing candidate frames with the positioning accuracy smaller than a preset threshold value in the candidate frame set to determine a training set of a third occlusion image sample; and utilizing the training set to train the prediction IoU network model.
Optionally, the detection network model includes a global pooling module and a pooling classification module, where the pooling classification module is configured to classify the globally pooled feature map according to the corresponding positions of a pooling mask, obtaining a classification vector.
A computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when executed on a terminal device, cause the terminal device to perform any one of the above-mentioned image object detection model building methods.
A computer program product, which when run on a terminal device, causes the terminal device to execute any of the above-described image object detection model building methods.
According to the method and the apparatus for establishing an image target detection model provided above, the feature occlusion countermeasure network model is trained with occlusion image samples, and the occlusion mask of an image sample is obtained through this model, so that when the detection network model is trained, an occlusion mask produced by the trained feature occlusion countermeasure network model is added to the feature map of each training image sample. Because the feature occlusion countermeasure network model is trained on occlusion image samples, it can learn to generate better masks; using these masks during detection network training means the detection network model is trained on image samples carrying well-placed occlusion masks, is thereby fully exposed to occlusion situations, and detects occluded objects more accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 shows a flow diagram of a method of building an image target detection model according to an embodiment of the invention;
FIG. 2 is a schematic diagram illustrating a process of obtaining an occlusion sample by the method for establishing an image target detection model according to the embodiment of the invention;
FIG. 3 shows a schematic diagram of a mask according to an embodiment of the invention;
FIG. 4 is a schematic structural diagram of an apparatus for building an image target detection model according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may, however, be practiced in ways other than those specifically described, as those of ordinary skill in the art will readily appreciate; the present invention is therefore not limited to the specific embodiments disclosed below.
As described in the background, detection algorithms based on deep convolutional neural networks have become the mainstream approach to image target detection. In image target detection applications, the object to be detected is often occluded, yet current detection models rarely consider occlusion and cannot detect occluded objects accurately. The present application therefore provides an image target detection model establishing method in which image samples carrying well-placed occlusion masks are used to train the detection network model, so that the detection network model is fully trained on occlusion situations and its accuracy on occluded objects is improved.
In order to better understand the technical solutions and technical effects of the present application, specific embodiments will be described in detail below with reference to flowcharts.
Referring to FIG. 1, in step S01, the first image sample is occluded to obtain a first occlusion image sample.
The first image sample is an original image sample; adding an occlusion to it yields the occluded first image sample, i.e., the first occlusion image sample.
In a specific application, any suitable occlusion method may be used; in this embodiment, occluding the first image sample specifically includes the following steps.
S011, determining a candidate frame with the maximum positioning accuracy in the first image sample.
S012, maps the sliding window with a preset size to the image sample, and fills the image area where the sliding window is located with the background pixels.
And S013, detecting the filled image samples by using the detection network model, and taking the sliding window at the position where the candidate frame has the largest detection network loss as the occlusion position of the first image sample to obtain the first occlusion image sample.
For the first image sample, the candidate frame with the maximum localization accuracy is determined on it, namely the candidate frame with the largest Intersection over Union (IoU) against the true mark on the image sample. A feature map of the image sample may be obtained, different candidate boxes may be placed on the feature map, and the IoU of each candidate box with the true mark may be computed; the candidate box with the largest IoU value is the candidate box of greatest positioning accuracy.
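As an illustration (not part of the original disclosure), a minimal Python sketch of this selection step, assuming candidate boxes and the true mark are given as (x1, y1, x2, y2) pixel coordinates:

```python
import numpy as np

def iou(box_a, box_b):
    # Intersection over Union of two boxes given as (x1, y1, x2, y2)
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def best_candidate(candidates, true_mark):
    # Candidate frame with the greatest positioning accuracy,
    # i.e. the largest IoU against the true mark
    ious = [iou(c, true_mark) for c in candidates]
    k = int(np.argmax(ious))
    return candidates[k], ious[k]
```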
Then a sliding window of preset size is set; the size can be chosen as needed, for example a rectangular frame of size (w/3, d/3), where w and d are the length and width of the image sample. The window size remains fixed while its position on the image sample changes. For each position, the sliding window is mapped onto the image sample and the image area covered by the window is filled with background pixels, for example by random filling, so that an occlusion is formed with the sliding window at different positions of the image sample.
Then the filled image sample is detected with a detection network model, i.e., a model for image target detection based on deep learning (a detection network based on a deep convolutional neural network), for example a model based on algorithms such as YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector); the detection network model may already have undergone some training. By selecting the sliding window that maximizes the loss of the detection network as the occlusion position of the image sample, an occlusion image sample with the best occlusion position is obtained.
When the occlusion image sample is obtained by the method of this embodiment, the candidate frame with the maximum positioning accuracy in the image sample is determined first; its position is closest to the true mark and best represents the detection target. Taking the sliding window at the position of maximum detection network loss as the occlusion position then yields, at the best detection-target position, the occlusion image sample with the best occlusion position, improving the quality of the occlusion samples.
To further facilitate understanding of how an occlusion image sample is obtained, the process is illustrated on an example image sample. Referring to FIG. 2: first, an image sample with several candidate frames is selected, as shown in (A); a feature map of the image sample is obtained, as shown in (B); the candidate frame with the maximum IoU against the true mark is determined, as shown in (C); and with that candidate frame as the detection target, occlusion filling is performed with the sliding window, and the sliding window at the position of maximum detection network loss for the candidate frame is taken as the occlusion position of the image sample, as shown in (D), yielding an occlusion image sample.
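A minimal sketch of this occlusion search, assuming the image is a NumPy array and `detection_loss` is a caller-supplied function returning the detection network's loss on an occluded image (both names are illustrative, as is the stride):

```python
import numpy as np

def best_occlusion(image, true_mark, detection_loss, stride=8):
    # Slide a fixed (h/3, w/3) window over the image, fill the covered
    # area with random background pixels, and keep the occluded image
    # whose window position maximizes the detection network loss.
    h, w = image.shape[:2]
    win_h, win_w = h // 3, w // 3
    best_img, best_loss = None, -np.inf
    for y in range(0, h - win_h + 1, stride):
        for x in range(0, w - win_w + 1, stride):
            occluded = image.copy()
            fill = np.random.randint(0, 256,
                                     size=(win_h, win_w) + image.shape[2:],
                                     dtype=image.dtype)
            occluded[y:y + win_h, x:x + win_w] = fill
            loss = detection_loss(occluded, true_mark)
            if loss > best_loss:
                best_img, best_loss = occluded, loss
    return best_img
```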
In step S02, the first occlusion image sample is used to train a feature occlusion countermeasure network model, which obtains an occlusion mask of the image sample based on an adversarial network.
The feature occlusion countermeasure network model is a model for obtaining the occlusion mask of an image sample; it is based on an adversarial network, i.e., a deep learning model built on adversarial training. In this embodiment, after the feature map of the image sample is obtained, the feature occlusion countermeasure network is implemented by adding a fully connected layer that learns an occlusion mask, the occlusion mask being the mask corresponding to the feature map and representing the occlusion applied to the sample.
For ease of understanding, refer to FIG. 3, which shows a mask M corresponding to an image sample: an array of mask values, one for each position of the image sample. For convenience of description, the mask corresponding to an occlusion image sample is referred to in this application as an occlusion mask.
The feature occlusion countermeasure network model is trained with occlusion image samples, in particular high-quality occlusion image samples, to obtain a model that generates better masks. In training the feature occlusion countermeasure network, a cross-entropy loss function is preferably adopted, with the specific expression:

L = −(1/(n·d²)) Σ_p Σ_(i,j) [ M^p_(i,j) log M̂^p_(i,j) + (1 − M^p_(i,j)) log(1 − M̂^p_(i,j)) ]

where M^p_(i,j) is the value at position (i, j) in the mask M of the p-th training image sample, M̂^p_(i,j) is the corresponding predicted value, n is the number of training image samples, and d is the size of the training image samples.
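As a sketch, this per-position cross-entropy can be computed with PyTorch's built-in binary cross-entropy (the tensor shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def mask_loss(pred_masks, true_masks):
    # pred_masks, true_masks: (n, d, d) tensors with values in [0, 1];
    # averages the per-position cross-entropy over all n samples
    # and all d x d mask positions, matching the expression above.
    return F.binary_cross_entropy(pred_masks, true_masks)
```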
In step S03, the detection network model, used for image target detection based on deep learning, is trained; an occlusion mask obtained with the trained feature occlusion countermeasure network model is added to the feature map of the second image sample used for training.
As mentioned above, the detection network model is a model for image target detection based on deep learning. In this step its training is continued with a second image sample, which may differ from the first image sample. For the second image sample, after the corresponding feature map is generated, an occlusion mask obtained with the trained feature occlusion countermeasure network model is added to the feature map; adding the occlusion mask occludes the corresponding positions of the feature map, yielding an occluded feature map of the second image sample. Training the detection network model with this occluded feature map exposes the model fully to occlusion situations and improves its accuracy on occluded objects.
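A minimal sketch of adding the occlusion mask to a feature map (the convention that a mask value of 1 marks a position to occlude is an assumption):

```python
import torch

def apply_occlusion_mask(feature_map, occlusion_mask):
    # feature_map: (C, H, W); occlusion_mask: (H, W) produced by the
    # trained feature occlusion countermeasure network model. Positions
    # selected by the mask are zeroed out in every channel.
    return feature_map * (1.0 - occlusion_mask).unsqueeze(0)
```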
In addition, while the detection network model is trained, the feature occlusion countermeasure network model can be cross-trained. On the one hand, the feature occlusion countermeasure network model supplies the occlusion masks added to image samples during detection network training, so that occlusion samples are available for training the detection network. On the other hand, the generated feature maps of the second image samples with occlusion masks added are used as image samples to continue training the feature occlusion countermeasure network model; that is, the occlusion samples generated during detection network training are fed back, and this retraining helps obtain a feature occlusion countermeasure network model that generates even better masks.
In the training of the detection network model, a prediction frame must be determined from the candidate frames, and the evaluation criterion of the prediction frame is decisive for the accuracy of the prediction frame's position. In a more preferred embodiment of the present application, the prediction frame is determined with a Non-Maximum Suppression (NMS) algorithm in which the evaluation index of the prediction frame is determined by the classification accuracy and the positioning accuracy of the different candidate frames.
Specifically, the non-maximum suppression algorithm removes redundant candidate frames according to the NMS evaluation index and determines the prediction frame among the remaining candidates through the associated calculations. In this embodiment, the evaluation index is determined by the classification accuracy and the positioning accuracy of the different candidate frames, so classification confidence and localization quality are considered together, which is more reasonable and improves prediction accuracy. In a specific application, the calculation formula of the candidate-box evaluation index CL may be:
CL = γ × score_class + (1 − γ) × score_location; (2)

where γ is a hyperparameter, score_class is the classification accuracy, and score_location is the positioning accuracy; score_location may be an IoU value, and score_class can be obtained from the detection network model.
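Formula (2) in code form (the default γ = 0.5 is illustrative, not a value given in the disclosure):

```python
def cl_score(score_class, score_location, gamma=0.5):
    # Candidate-box evaluation index of formula (2): a weighted blend of
    # classification accuracy and positioning accuracy (e.g. an IoU value).
    return gamma * score_class + (1 - gamma) * score_location
```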
In addition, when the candidate-frame evaluation index CL is used to determine the prediction frame through non-maximum suppression, different thresholds produce different detection results; setting the thresholds reasonably avoids false detections and missed detections. Preferably, the non-maximum suppression algorithm adopts a formula with three threshold settings, in which CL is the evaluation index of the different candidate frames, obtained by formula (2) above.
With three threshold settings, a candidate frame differing greatly from the retained frame is essentially a frame of low target accuracy, and lowering its threshold makes it easier to suppress; a candidate frame with little divergence is a frame of high target accuracy, and raising its threshold reduces suppression. False detections and missed detections are thus avoided, and detection accuracy improves.
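Since the three-threshold expression itself appears only as an image in the original, the sketch below shows a standard single-threshold non-maximum suppression ranked by CL (reusing the iou() helper from the earlier sketch); the three-threshold refinement would adapt iou_threshold per candidate:

```python
import numpy as np

def nms_by_cl(boxes, cl_scores, iou_threshold=0.5):
    # Standard NMS, except candidates are ranked by the CL index of
    # formula (2) rather than by classification score alone.
    order = list(np.argsort(cl_scores)[::-1])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_threshold]
    return keep
```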
Further, the positioning accuracy of candidate frames used above, for example when occluding the first image sample and determining the candidate frame with the maximum positioning accuracy, and in the prediction-frame evaluation index of the non-maximum suppression algorithm during detection network training, can be obtained with a prediction IoU network model. The prediction IoU network model obtains the positioning accuracy of different candidate boxes; it is a deep-learning-based model that may include a pooling layer and a fully connected layer.
In a specific application, the training method of the predictive IoU network model may include:
generating a set of candidate frames for a third image sample;
obtaining the positioning accuracy of each candidate frame in the candidate frame set;
removing candidate frames with the positioning accuracy smaller than a preset threshold value in the candidate frame set to determine a training set of a third occlusion image sample;
and utilizing the training set to train the prediction IoU network model.
The third image sample may be the same as or different from the first or second image sample. A candidate frame set is generated for the (possibly randomly selected) third image sample, and the positioning accuracy of each candidate frame in the set is obtained, the positioning accuracy being the IoU value between each candidate frame and the true mark. After deleting the candidate frames with low positioning accuracy, usually those with positioning accuracy less than 0.5, a training set C of the third occlusion image sample is obtained: C = {(C_i, iou*_i)}, where C_i denotes the i-th candidate box and iou*_i is the IoU value between the i-th candidate box and the true mark. The prediction IoU network model is then trained with the training set C; during training, a smooth L1 loss function can be adopted, with the specific expression:

L = (1/n) Σ_i smooth_L1(iou_i − iou*_i), where smooth_L1(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise;

where n is the number of candidate frames, iou_i is the IoU value predicted for the i-th candidate box, and iou*_i is the IoU value between the i-th candidate box and the true mark.
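A sketch of building the training set and of the smooth L1 objective, reusing the iou() helper from the earlier sketch (PyTorch tensors assumed for the loss):

```python
import torch
import torch.nn.functional as F

def build_iou_training_set(candidates, true_mark, threshold=0.5):
    # Pair each candidate box with its IoU against the true mark and
    # drop candidates whose positioning accuracy is below the threshold.
    pairs = [(c, iou(c, true_mark)) for c in candidates]
    return [(c, v) for c, v in pairs if v >= threshold]

def iou_regression_loss(predicted_iou, target_iou):
    # Smooth L1 loss between the IoU values predicted for the candidates
    # and their ground-truth IoU values, averaged over the n candidates.
    return F.smooth_l1_loss(predicted_iou, target_iou)
```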
In addition, in practical applications the candidate frame usually contains background image content, which also interferes with object detection and causes inaccuracy. In a more preferred embodiment of the present application, the detection network model includes a global pooling module and a pooling classification module, where the pooling classification module classifies the globally pooled feature map according to the corresponding positions of a pooling mask to obtain a classification vector.
The global pooling module performs global average pooling on the input samples, which in this embodiment are the feature maps with the occlusion mask added. The pooling classification module may include a fully connected layer, a pooling mask layer, and a vector classification module. The fully connected layer learns global average pooling weights, distinguishing the information at different positions; the pooling mask layer produces the pooling mask of each feature map; the feature map is multiplied by the corresponding positions in the pooling mask to obtain a classification vector; and the classification vector is classified by the vector classification module, which may be, for example, a softmax classifier. Because the classification vector takes the background information around the object into account, it influences object classification appropriately and further improves detection accuracy.
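A sketch of such a global pooling and pooling classification head (layer sizes and the sigmoid/softmax choices are assumptions, not specified in the disclosure):

```python
import torch
import torch.nn as nn

class PoolingClassificationHead(nn.Module):
    # Learns per-position pooling weights with a fully connected layer,
    # applies them as a pooling mask to the feature map, then performs
    # weighted global average pooling and softmax classification.
    def __init__(self, channels, height, width, num_classes):
        super().__init__()
        self.mask_fc = nn.Linear(height * width, height * width)
        self.classify = nn.Linear(channels, num_classes)

    def forward(self, feat):                      # feat: (N, C, H, W)
        n, c, h, w = feat.shape
        mask = torch.sigmoid(self.mask_fc(feat.mean(dim=1).flatten(1)))
        mask = mask.view(n, 1, h, w)              # pooling mask per position
        pooled = (feat * mask).mean(dim=(2, 3))   # weighted global pooling
        return torch.softmax(self.classify(pooled), dim=1)
```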
The model building method for image target detection according to the embodiments of the present application has been described in detail above. The present application further provides a model building apparatus for image target detection that implements the method described above. Referring to FIG. 4, the apparatus includes:
an occlusion sample obtaining unit 400, configured to occlude the first image sample to obtain a first occlusion image sample;
a countermeasure network training unit 410, configured to train a feature occlusion countermeasure network model using the first occlusion image sample, where the feature occlusion countermeasure network model is configured to obtain an occlusion mask of the image sample based on an adversarial network;
and a detection network training unit 420, configured to train a detection network model, where an occlusion mask obtained with the trained feature occlusion countermeasure network model is added to the feature map of the second image sample used for training, and the detection network model is used for image target detection based on deep learning.
Further, the apparatus includes:
a countermeasure network retraining unit, configured to take the feature map with the occlusion mask added as an image sample and continue training the feature occlusion countermeasure network model.
Further, in the occlusion sample acquisition unit 400, occluding the first image sample includes:
determining the candidate frame with the greatest positioning accuracy in the first image sample;
mapping a sliding window of preset size onto the image sample, and filling the image area covered by the sliding window with background pixels;
and detecting the filled image samples with the detection network model, and taking the sliding window at the position where the candidate frame suffers the largest detection network loss as the occlusion position of the first image sample, to obtain the first occlusion image sample.
Further, in the training of the detection network model, the prediction frame is determined with a non-maximum suppression algorithm, wherein the candidate-frame evaluation index in the non-maximum suppression algorithm is determined by the classification accuracy and the positioning accuracy of the different candidate frames.
Further, the calculation formula of the candidate box evaluation index CL is:
CL = γ × score_class + (1 − γ) × score_location; where γ is a hyperparameter, score_class is the classification accuracy, and score_location is the positioning accuracy;
the non-maximum suppression algorithm adopts a formula with three threshold settings on the evaluation index CL of the different candidate frames.
Further, the apparatus includes: a preset prediction IoU network model, used for obtaining the positioning accuracy of different candidate boxes.
Further, a prediction IoU network model training unit is included, configured to generate a candidate frame set for a third image sample; obtain the positioning accuracy of each candidate frame in the set; remove the candidate frames whose positioning accuracy is below a preset threshold to determine the training set of the third occlusion image sample; and train the prediction IoU network model with the training set.
Further, the detection network model includes a global pooling module and a pooling classification module, where the pooling classification module is configured to classify the globally pooled feature map according to the corresponding positions of a pooling mask, obtaining a classification vector.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to execute the above-mentioned model building method for detecting an image object.
The embodiment of the present application further provides a computer program product, and when the computer program product runs on a terminal device, the terminal device is enabled to execute the above model building method for detecting an image target.
It should be noted that, in the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the system or the device disclosed by the embodiment, the description is simple because the system or the device corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing is only a preferred embodiment of the present invention, and although the present invention has been disclosed in the preferred embodiments, it is not intended to limit the present invention. Those skilled in the art can make numerous possible variations and modifications to the present teachings, or modify equivalent embodiments to equivalent variations, without departing from the scope of the present teachings, using the methods and techniques disclosed above. Therefore, any simple modification, equivalent change and modification made to the above embodiments according to the technical essence of the present invention are still within the scope of the protection of the technical solution of the present invention, unless the contents of the technical solution of the present invention are departed.
Claims (8)
1. A model building method for image target detection is characterized by comprising the following steps:
determining the candidate frame with the greatest positioning accuracy in the first image sample; mapping a sliding window of preset size onto the image sample, and filling the image area covered by the sliding window with background pixels; detecting the filled image samples with a detection network model, and taking the sliding window at the position where the candidate frame suffers the largest detection network loss as the occlusion position of the first image sample, to obtain a first occlusion image sample;
training a feature occlusion countermeasure network model by using the first occlusion image sample, wherein the feature occlusion countermeasure network model is used for obtaining an occlusion mask of the image sample based on a countermeasure network;
and training a detection network model, wherein an occlusion mask obtained with the trained feature occlusion countermeasure network model is added to the feature map of a second image sample used for training, and the detection network model is used for image target detection based on deep learning.
2. The method of claim 1, further comprising: and utilizing the feature map added with the occlusion mask as an image sample, and continuing training the feature occlusion countermeasure network model.
3. The method of claim 1, wherein in the training of the detection network model, the prediction box is determined with a non-maximum suppression algorithm, wherein the candidate-box evaluation index in the non-maximum suppression algorithm is determined by the classification accuracy and the positioning accuracy of the different candidate boxes.
4. The method according to claim 3, wherein the candidate box evaluation index CL is calculated by the formula:
CL = γ × score_class + (1 − γ) × score_location; where γ is a hyperparameter, score_class is the classification accuracy, and score_location is the positioning accuracy;
the non-maximum suppression algorithm adopts a formula with three threshold settings on the candidate-box evaluation index CL.
5. The method of any one of claims 1 and 3-4, wherein a prediction IoU network model is preset, and the prediction IoU network model is used to obtain the positioning accuracy of different candidate boxes.
6. The method of claim 5, wherein the training method of the predictive IoU network model comprises:
generating a set of candidate frames for a third image sample;
obtaining the positioning accuracy of each candidate frame in the candidate frame set;
removing candidate frames with the positioning accuracy smaller than a preset threshold value in the candidate frame set to determine a training set of a third occlusion image sample;
and utilizing the training set to train the prediction IoU network model.
7. A model building apparatus for image object detection, comprising:
an occlusion sample acquisition unit, configured to determine the candidate frame with the greatest positioning accuracy in the first image sample; map a sliding window of preset size onto the image sample and fill the image area covered by the sliding window with background pixels; and detect the filled image samples with a detection network model, taking the sliding window at the position where the candidate frame suffers the largest detection network loss as the occlusion position of the first image sample, to obtain a first occlusion image sample;
a countermeasure network training unit, configured to train a feature occlusion countermeasure network model with the first occlusion image sample, the feature occlusion countermeasure network model being used for obtaining an occlusion mask of the image sample based on an adversarial network;
and a detection network training unit, configured to train a detection network model, wherein an occlusion mask obtained with the trained feature occlusion countermeasure network model is added to the feature map of the second image sample used for training, and the detection network model is used for image target detection based on deep learning.
8. A computer-readable storage medium having stored therein instructions that, when run on a terminal device, cause the terminal device to perform the method of modeling image object detection of any of claims 1-6.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201811592967.8A | 2018-12-25 | 2018-12-25 | Image target detection model establishing method, device, storage medium and program product
Publications (2)

Publication Number | Publication Date
---|---
CN109784349A (en) | 2019-05-21
CN109784349B (en) | 2021-02-19
Family
ID=66498222
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201811592967.8A | CN109784349B (en) | 2018-12-25 | 2018-12-25
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109784349B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175168B (en) * | 2019-05-28 | 2021-06-01 | 山东大学 | Time sequence data filling method and system based on generation of countermeasure network |
CN110163183B (en) * | 2019-05-30 | 2021-07-09 | 北京旷视科技有限公司 | Target detection algorithm evaluation method and device, computer equipment and storage medium |
CN110210482B (en) * | 2019-06-05 | 2022-09-06 | 中国科学技术大学 | Target detection method for improving class imbalance |
CN110728628B (en) * | 2019-08-30 | 2022-06-17 | 南京航空航天大学 | Face de-occlusion method for generating confrontation network based on condition |
CN110728330A (en) | 2019-10-23 | 2020-01-24 | 腾讯科技(深圳)有限公司 | Object identification method, device, equipment and storage medium based on artificial intelligence |
CN111126402B (en) * | 2019-11-04 | 2023-11-03 | 京东科技信息技术有限公司 | Image processing method and device, electronic equipment and storage medium |
CN111046956A (en) * | 2019-12-13 | 2020-04-21 | 苏州科达科技股份有限公司 | Occlusion image detection method and device, electronic equipment and storage medium |
CN113808003B (en) * | 2020-06-17 | 2024-02-09 | 北京达佳互联信息技术有限公司 | Training method of image processing model, image processing method and device |
CN111753783B (en) * | 2020-06-30 | 2024-05-28 | 北京小米松果电子有限公司 | Finger shielding image detection method, device and medium |
CN111709951B (en) * | 2020-08-20 | 2020-11-13 | 成都数之联科技有限公司 | Target detection network training method and system, network, device and medium |
CN112102340B (en) * | 2020-09-25 | 2024-06-11 | Oppo广东移动通信有限公司 | Image processing method, apparatus, electronic device, and computer-readable storage medium |
CN114596263B (en) * | 2022-01-27 | 2024-08-02 | 阿丘机器人科技(苏州)有限公司 | Deep learning mainboard appearance detection method, device, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106886795A (en) * | 2017-02-17 | 2017-06-23 | 北京维弦科技有限责任公司 | Object identification method based on the obvious object in image |
CN108182657A (en) * | 2018-01-26 | 2018-06-19 | 深圳市唯特视科技有限公司 | A kind of face-image conversion method that confrontation network is generated based on cycle |
CN108830827A (en) * | 2017-05-02 | 2018-11-16 | 通用电气公司 | Neural metwork training image generation system |
CN108960011A (en) * | 2017-05-23 | 2018-12-07 | 湖南生物机电职业技术学院 | The citrusfruit image-recognizing method of partial occlusion |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107145867A (en) * | 2017-05-09 | 2017-09-08 | 电子科技大学 | Face and face occluder detection method based on multitask deep learning |
CN107316058A (en) * | 2017-06-15 | 2017-11-03 | 国家新闻出版广电总局广播科学研究院 | Improve the method for target detection performance by improving target classification and positional accuracy |
CN107403160A (en) * | 2017-07-28 | 2017-11-28 | 中国地质大学(武汉) | Image detecting method, equipment and its storage device in a kind of intelligent driving scene |
CN108334847B (en) * | 2018-02-06 | 2019-10-22 | 哈尔滨工业大学 | A kind of face identification method based on deep learning under real scene |
CN108573222B (en) * | 2018-03-28 | 2020-07-14 | 中山大学 | Pedestrian image occlusion detection method based on cyclic confrontation generation network |
CN108765452A (en) * | 2018-05-11 | 2018-11-06 | 西安天和防务技术股份有限公司 | A kind of detection of mobile target in complex background and tracking |
CN108961174A (en) * | 2018-05-24 | 2018-12-07 | 北京飞搜科技有限公司 | A kind of image repair method, device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN109784349A (en) | 2019-05-21 |
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant