CN113780277A

CN113780277A - Training method and device of target detection model, electronic equipment and storage medium

Info

Publication number: CN113780277A
Application number: CN202111048969.2A
Authority: CN
Inventors: 王威
Original assignee: Zhejiang Zhuoyun Intelligent Technology Co ltd
Current assignee: Zhejiang Zhuoyun Intelligent Technology Co ltd
Priority date: 2021-09-08
Filing date: 2021-09-08
Publication date: 2021-12-10
Anticipated expiration: 2041-09-08
Also published as: CN113780277B

Abstract

The embodiment of the invention discloses a training method and a device of a target detection model, electronic equipment and a storage medium, wherein the method comprises the following steps: classifying and regressing the candidate sample regions in the image through forward propagation of a target detection network to obtain the classification confidence and the regression position of the candidate samples in the image; determining loss weight information adopted by the candidate sample region in a model training process according to the classification confidence coefficient and the regression position of the candidate sample and the position of a reference sample region; wherein the reference sample region comprises a neglecting sample region corresponding to the blurred target in the image and a difficult sample region in the image; and performing back propagation according to the loss weight information adopted by the candidate sample region, and controlling the parameters of the target detection network to adjust in the directions of inhibiting the neglected samples and enhancing the difficultly-instantiated samples to obtain the trained target detection model. The learning of the enhancement model to the difficult samples and the attention of the suppression model to the neglected samples are realized.

Description

Training method and device of target detection model, electronic equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of computer vision, in particular to a training method and device of a target detection model, electronic equipment and a storage medium.

Background

Target detection is widely applied in many fields as the field of computer vision, and people, objects and other targets in an image can be detected through target detection.

Due to the problems of serious image distortion, extreme angles of targets, excessive target shielding, fuzzy target categories and the like, the type of the targets cannot be determined, certain adverse effects on model learning can be caused, and all features can be deleted by a conventional filling scheme, so that the learning effect of the model is reduced. In addition, the model is difficult to learn and is easy to make mistakes to cause targets of missed detection or false detection, the difficult samples are forcibly added into training by conventional online selection of the difficult samples, the quality of the difficult samples is not high, and the training of the model on the difficult samples cannot be emphasized.

Disclosure of Invention

The embodiment of the invention provides a training method and device of a target detection model, electronic equipment and a storage medium, and aims to enhance the learning of the model on difficult samples and inhibit the attention of the model on neglected samples.

In a first aspect, an embodiment of the present invention provides a method for training a target detection model, including:

classifying and regressing the candidate sample regions in the image through forward propagation of a target detection network to obtain the classification confidence and the regression position of the candidate samples in the image;

determining loss weight information adopted by the candidate sample region in a model training process according to the classification confidence coefficient and the regression position of the candidate sample and the position of a reference sample region; wherein the reference sample region comprises a neglecting sample region corresponding to the blurred target in the image and a difficult sample region in the image;

and performing back propagation according to the loss weight information adopted by the candidate sample region, and controlling the parameters of the target detection network to adjust in the directions of inhibiting the neglected samples and enhancing the difficultly-instantiated samples to obtain the trained target detection model.

In a second aspect, an embodiment of the present invention further provides a training apparatus for a target detection model, including:

the classification confidence coefficient and regression position acquisition module is used for classifying and regressing the candidate sample regions in the image through forward propagation of the target detection network to obtain the classification confidence coefficient and regression position of the candidate samples in the image;

the loss weight information acquisition module is used for determining loss weight information adopted by the candidate sample region in the model training process according to the classification confidence coefficient and the regression position of the candidate sample and the position of the reference sample region; wherein the reference sample region comprises a neglecting sample region corresponding to the blurred target in the image and a difficult sample region in the image;

and the target detection model acquisition module is used for performing back propagation according to the loss weight information adopted by the candidate sample region, controlling the parameters of the target detection network to be adjusted in the direction of inhibiting the neglected samples and enhancing the difficult samples, and obtaining the trained target detection model.

In a third aspect, an embodiment of the present invention further provides an electronic device, including:

one or more processing devices;

storage means for storing one or more programs;

when executed by the one or more processing devices, cause the one or more processing devices to implement a method of training the object detection model as provided in any embodiment of the invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processing device, implements the method for training the object detection model as provided in any of the embodiments of the present invention.

The embodiment of the invention provides a training method and a device of a target detection model, electronic equipment and a storage medium, wherein the method comprises the following steps: classifying and regressing the candidate sample regions in the image through forward propagation of a target detection network to obtain the classification confidence and the regression position of the candidate samples in the image; determining loss weight information adopted by the candidate sample region in a model training process according to the classification confidence coefficient and the regression position of the candidate sample and the position of a reference sample region; wherein the reference sample region comprises a neglecting sample region corresponding to the blurred target in the image and a difficult sample region in the image; and performing back propagation according to the loss weight information adopted by the candidate sample region, and controlling the parameters of the target detection network to adjust in the directions of inhibiting the neglected samples and enhancing the difficultly-instantiated samples to obtain the trained target detection model. The learning of the enhancement model to the difficult samples and the attention of the suppression model to the neglected samples are realized.

The above summary of the present invention is merely an overview of the technical solutions of the present invention, and the present invention can be implemented in accordance with the content of the description in order to make the technical means of the present invention more clearly understood, and the above and other objects, features, and advantages of the present invention will be more clearly understood.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

fig. 1 is a flowchart of a training method of a target detection model according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for training a target detection model according to a second embodiment of the present invention;

FIG. 2A is an exemplary illustration of an intersection region of a candidate sample region and a reference sample region as provided in an embodiment of the present invention;

FIG. 2B is an exemplary union region illustration of a candidate sample region and a reference sample region provided in an embodiment of the present invention;

fig. 3 is a block diagram of a training apparatus for a target detection model according to a third embodiment of the present invention;

fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations (or steps) can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.

The following embodiments and alternatives thereof are provided to describe the training method, apparatus, electronic device and storage medium of the object detection model provided in the present application in detail.

Example one

Fig. 1 is a flowchart of a training method of a target detection model according to an embodiment of the present invention. The embodiment of the invention can be suitable for the condition of training the target detection model. The method can be executed by a training device of the target detection model, which can be implemented in a software and/or hardware manner and integrated on any electronic equipment with network communication function. As shown in fig. 1, the training method of the target detection model provided in the embodiment of the present application may include the following steps:

s110, classifying and regressing the candidate sample regions in the image through forward propagation of the target detection network to obtain the classification confidence and the regression position of the candidate samples in the image.

The target detection may be to find out all interested objects in the image, perform feature extraction on the objects, and classify and locate the objects at the same time. For example, finding out all neglected samples and difficult samples in the image, and classifying, identifying and positioning the neglected samples and the difficult samples according to the characteristics of the neglected samples and the difficult samples.

Forward propagation may refer to a process of forward-backward propagation in a neural network from an input layer, layer-by-layer forward propagation, through a hidden layer to an output layer. For example, the model may be propagated from the input layer to the output layer in the target detection network, and then the hidden layer is passed to the output layer, so as to obtain the candidate frames.

The candidate sample region may refer to a region in the image where the candidate sample is located, such as, but not limited to, an anchor frame region in the image and a sample region output via forward propagation.

The classification confidence may refer to the probability that a certain sample appears in the sample region, for example, the probability that a candidate sample appears in the image, and usually, the set value of the confidence is 95%. And (4) directly classifying and regressing the anchor frame by forward propagation of the target detection network so as to obtain the classification confidence of each sample.

Optionally, in the single-stage target detection, the target detection network includes a single-stage convolutional neural network, and the candidate sample region includes an anchor frame region in the image; and, at the time of two-stage target detection, the target detection network comprises a convolutional neural network of a second stage of the two-stage target detection, and the candidate sample regions comprise sample regions that are forward-propagated output by a candidate region network of a first stage of the two-stage target detection.

S120, determining loss weight information adopted by the candidate sample region in the model training process according to the classification confidence degree and the regression position of the candidate sample and the position of the reference sample region.

The weight may be determined according to a difference between the candidate frame and the real labeling frame, for example, a category difference between the candidate frame and the real labeling frame, or an offset of a position of the candidate frame with respect to the real labeling frame.

Wherein the reference sample region includes, but is not limited to, a disregard sample region corresponding to a blurred target in the image and a difficult sample region in the image.

S130, performing back propagation according to the loss weight information adopted by the candidate sample region, and controlling the parameters of the target detection network to adjust in the direction of inhibiting the neglected samples and enhancing the difficult samples to obtain the trained target detection model.

The back propagation may refer to a process of continuously adjusting the gradient of the loss function with respect to each parameter, for example, the method may be to adjust each parameter by using a gradient descent method, and update the parameters of the model by using the loss weight, so as to reduce the error caused by the weight.

Ignoring a sample may mean that it is impossible to determine what kind of target is due to problems of severe image distortion, extreme angles of the target, excessive shielding of the target, fuzzy target category, and the like; in some special image fields including but not limited to x-ray security inspection machine images, medical images, underwater images, etc., due to the particularity of the acquisition technology and scenes, the frequency of the occurrence of ambiguous objects in the images is higher, and the annotators in the scenes cannot know the real category of the object in advance.

The difficult sample can be the target of missed detection or false detection caused by insufficient data volume of the articles or difficult characteristics of the articles, insufficient learning of the articles by the model and easy mistake making. For example, when detecting an object, the background where the object is located is often detected as the object, thereby causing false detection.

The embodiment of the invention provides a training method of a target detection model, which comprises the steps of classifying and regressing a candidate sample region in an image through forward propagation of a target detection network to obtain a classification confidence coefficient and a regression position of a candidate sample in the image; determining loss weight information adopted by the candidate sample region in a model training process according to the classification confidence coefficient and the regression position of the candidate sample and the position of a reference sample region; wherein the reference sample region comprises a neglecting sample region corresponding to the blurred target in the image and a difficult sample region in the image; and performing back propagation according to the loss weight information adopted by the candidate sample region, and controlling the parameters of the target detection network to adjust in the directions of inhibiting the neglected samples and enhancing the difficultly-instantiated samples to obtain the trained target detection model. Discarding partial features by softening for neglected samples by dynamically adjusting sample weights; for the samples which are difficult to sample, the learning of the model to the difficult samples is improved more gently, the model training is more stable, and the learning of the model to the samples which are difficult to sample is enhanced and the attention of the model to the neglected samples is restrained.

Example two

Fig. 2 is a flowchart of a training method for a target detection model according to a second embodiment of the present invention, where the second embodiment of the present invention further optimizes the first embodiment based on the above-mentioned embodiments, and the second embodiment of the present invention may be combined with various alternatives in one or more of the above-mentioned embodiments. As shown in fig. 2, the training method of the target detection model provided in the embodiment of the present application may include the following steps:

s210, classifying and regressing the candidate sample regions in the image through forward propagation of the target detection network to obtain the classification confidence and the regression position of the candidate samples in the image.

The candidate samples include, but are not limited to, an ignore sample and a difficult sample, and the ignore sample and the difficult sample labeling information need to be acquired before training. Manually labeling ignored samples into a certain class or not labeling is not beneficial to the learning of the model. Learning of the model positive samples, if labeled as a target, may introduce outlier noise; if not, the model is treated as a negative sample during learning, but the region itself contains features of a portion of the target, which results in an increase in false negatives. For difficult samples, the model often fails to detect or misdetects some similar articles in actual detection, and the learning of the model for the articles is insufficient due to insufficient data quantity of the articles or difficult characteristics of the articles.

Optionally, ignoring the sample comprises a ambiguous target that cannot be judged manually because the original characteristics are lost due to problems of angle, distortion, shielding and/or imaging noise, which are marked in the image in advance before training; the method comprises the steps that the difficult sample comprises a target which is labeled in an image in advance before training, when the model is actually used, the model missing detection probability or the false detection probability is larger than the preset probability value, wherein the target with the model missing detection probability larger than the preset missing detection probability value is labeled as the difficult positive sample, and the target with the model false detection probability larger than the preset false detection probability value is labeled as the difficult negative sample.

Optionally, the neglected samples are labeled manually, the original feature detection target is lost due to the problems of angle, distortion, shielding and/or imaging noise and the like in the selected images before model training, or the position coordinates of the target are labeled due to the fact that the ambiguity can cause the difficulty of manual judgment to be increased, and classification is not distinguished. And the phenomenon that the original distribution of the image features is influenced by deleting some useful features in the region by violent filling is avoided.

Optionally, the difficult sample is labeled manually, a target frequently causing model missing detection or false detection in the image is selected before the model is trained, the position coordinates of the target are labeled, the target frequently missing detection or false detection among categories is labeled as a positive difficult case, and the background target frequently false detection is labeled as a negative difficult case. Meanwhile, a mode of model inspection can be adopted to replace manual labeling, namely, a trained model is used for reasoning on training data, and false inspection and missed inspection targets are obtained through calculation according to a reasoning result and real labeling information. The quality of selecting the samples with difficult cases is improved, and training of the model to the difficult cases is improved in an emphasized manner.

Illustratively, the difficult sample comprises a target which is labeled in an image in advance before training, and when the model is actually used, the model is missed or the false detection probability is greater than the preset probability value, wherein the target with the model missed detection probability greater than the preset missed detection probability value is labeled as a difficult positive sample, and the target with the model false detection probability greater than the preset false detection probability value is labeled as a difficult negative sample. Screening out targets which are easy to cause omission detection by presetting omission detection probability values to serve as targets easy to omit detection; targets which are easy to cause false detection can be screened out as targets which are easy to cause false detection through the preset false detection probability value.

Optionally, the model is subjected to forward propagation calculation to obtain a candidate frame output by the area generation network, the candidate features in the candidate frame are fixed to the same scale, and finally, classification and regression are performed on the samples at the second stage to obtain the classification confidence coefficient and the coordinates after regression of each sample. In addition, if the network is a one-stage network such as SSD, YOLO, etc., the network is propagated forward to directly classify and regress the anchor frame, thereby obtaining the classification confidence and the coordinates after regression for each sample.

The ROI Pooling can be selected by fixing the candidate features in the candidate frame to the same scale, can be used for target detection, allows feature maps in a neural network to be repeatedly utilized, remarkably increases the speed of training and detection, and allows an end-to-end form to train a target detection system.

S220, calculating the area ratio of the intersection area of the candidate sample area and the reference sample area in the image relative to the candidate sample area according to the regression position of the candidate sample and the position of the reference sample area.

Wherein, the intersection region of the candidate sample region and the reference sample region in the image is as shown in fig. 2A, and may refer to a portion where the candidate sample region and the reference sample region coincide; the union region of the candidate sample region and the reference sample region in the image may refer to the entire region obtained by adding the candidate sample region and the reference sample region, as shown in fig. 2B.

Alternatively, for the definition of positive and negative samples in the candidate box, an intersection-to-union ratio (iou) is generally used, which may refer to a ratio of an intersection of the candidate sample region and the reference sample region and a union of the candidate sample region and the reference sample region. When the iou of the candidate sample region and the reference sample region is more than or equal to 0.5, the sample is regarded as a positive sample; a negative sample is considered when iou <0.5 for the candidate sample region and the reference sample region.

And S230, determining the influence degree value of the reference sample region under the candidate sample region according to the region proportion of the relative candidate sample region and the classification confidence of the candidate sample.

The influence degree value comprises a neglect degree score of the neglect sample in the image and a difficulty level score of the difficulty sample in the image.

Optionally, the set of the ignore mark boxes is defined as I, and the ignore mark boxes do not distinguish between positive and negative. If the candidate sample region is determined to belong to the positive sample region, adopting a formula

Calculating to obtain a neglected degree score of the neglected samples in the image under the candidate sample region;

if the candidate sample region is determined to belong to the negative sample region, adopting a formula

wherein, C_iRepresenting the classification confidence of the ith candidate sample in the image, B_iRepresenting the position coordinates of the ith candidate sample in the image, n being the total number of ignored frames, I_jFor the position coordinate of the jth disregard sample in the image, iof (B, I) represents the area ratio of the disregard sample region in the candidate sample region.

Optionally, the set of the difficult example labeling boxes is H, the difficult example distinguishes between a positive difficult example labeling box and a negative difficult example labeling box, the positive difficult example labeling box set is hp (hard positive), and the negative difficult example labeling box set is hn (hard negative). If the candidate sample region is determined to belong to the positive sample region, adopting a formula

Calculating to obtain a difficulty routine degree score of a difficulty routine sample in the image under the candidate sample region;

wherein, C_iRepresenting the classification confidence of the ith candidate sample in the image, B_iThe position coordinates of the ith candidate sample in the image are shown, I is the total number of samples difficult to be instantiated, HP_kIs the position coordinate of the kth refractory sample, HN_kIof (B, HN), which is the position coordinate of the kth difficult-to-exemplify negative sample, represents the area proportion of the difficult-to-exemplify sample region in the candidate sample region.

S240, determining loss weight information of the candidate sample region according to the influence degree value of the reference sample region under the candidate sample region.

Optionally, according to the neglect degree score of the neglected samples in the image under the candidate sample region and the difficulty degree score of the difficult samples in the image, the formula LW is 1-S_ignore+S_hardCalculating loss weight information of the candidate sample region;

wherein LW represents loss weight information of the candidate sample region, S_ignoreA score representing the degree of omission of the sample in the image under the candidate sample region, S_hardAnd representing the difficulty routine degree score of the difficulty routine sample in the image under the candidate sample region.

And S250, performing back propagation according to the loss weight information adopted by the candidate sample region, and controlling the parameters of the target detection network to adjust in the direction of inhibiting the neglected samples and enhancing the difficult samples to obtain the trained target detection model.

When the dynamic loss weight of each sample in the set L is calculated, the original loss weight is updated by using the weight, the default of the original loss weight is 1, the model parameters are updated by using the loss weight when the model is subjected to back propagation calculation, and the method is used for training or fine-tuning the model from the beginning, so that the neglected samples are quantitatively restrained, and the difficultly-sampling samples are quantitatively enhanced. The influence of neglected samples on the model is restrained, the learning of the model on difficult samples is enhanced, two types of problems are well processed through a universal paradigm, and the time consumption of the model is not influenced.

The embodiment of the invention provides a training method of a target detection model, which is characterized in that position coordinates of neglected samples are marked manually, so that possible useful features in a region are prevented from being deleted during direct violent filling, and original distribution of image features is prevented from being influenced by abnormal edges after filling; the position coordinates of the difficult sample are manually marked, the frequently missed target or the frequently mistakenly detected target among the categories is marked as a positive difficult sample, the frequently mistakenly detected background target is marked as a negative difficult sample, and the problem that the quality of the difficult sample selected on line by the model is not high is avoided; meanwhile, a mode of model inspection can be adopted to replace manual labeling, namely, a trained model is used for reasoning on training data, and false inspection and missed inspection targets are obtained through calculation according to a reasoning result and real labeling information. Influence of the samples on the model is neglected through quantitative suppression, learning of the model on the samples difficult to learn is enhanced quantitatively, two types of problems are well handled through a general paradigm, and time consumption of the model is not influenced. The method has the advantages that the method plays a role in softening influence by dynamically adjusting the weight of the sample, and partial characteristics are discarded by softening the sample; for the samples with difficult cases, the learning of the model to the difficult cases is improved more gently by the method, so that the model training is more stable, and the training of the model to the difficult cases is improved emphatically; the learning of the enhancement model to the difficult samples and the attention of the suppression model to the neglected samples are realized.

EXAMPLE III

Fig. 3 is a block diagram of a structure of a training apparatus for a target detection model according to a third embodiment of the present invention. The embodiment of the invention can be suitable for the training situation of the target detection model. The device can be implemented in software and/or hardware and integrated on any electronic equipment with network communication function. As shown in fig. 3, the training apparatus for the target detection model provided in the embodiment of the present application may include the following: a classification confidence and regression location acquisition module 310, a loss weight information acquisition module 320, and a target detection model acquisition module 330. Wherein:

a classification confidence and regression position obtaining module 310, configured to classify and regress the candidate sample regions in the image through forward propagation of the target detection network, so as to obtain a classification confidence and a regression position of the candidate samples in the image;

a loss weight information obtaining module 320, configured to determine, according to the classification confidence and the regression position of the candidate sample and the position of the reference sample region, loss weight information adopted by the candidate sample region in a model training process; wherein the reference sample region comprises a neglecting sample region corresponding to the blurred target in the image and a difficult sample region in the image;

and the target detection model acquisition module 330 is configured to perform back propagation according to the loss weight information adopted by the candidate sample region, and control parameters of the target detection network to adjust in directions of suppressing neglected samples and enhancing hard samples, so as to obtain a trained target detection model.

Based on the above embodiment, optionally, the classification confidence and regression position obtaining module 310 includes:

upon single stage target detection, the target detection network comprises a single stage convolutional neural network, the candidate sample region comprises an anchor frame region in the image; and, at the time of two-stage target detection, the target detection network comprises a convolutional neural network of a second stage of the two-stage target detection, and the candidate sample regions comprise sample regions that are forward-propagated output by a candidate region network of a first stage of the two-stage target detection.

On the basis of the above embodiment, optionally, the neglected sample includes a ambiguous target that is marked in advance in the image before training and loses original features due to problems of angles, distortion, occlusion and/or imaging noise, so that a human cannot make a judgment; the difficult sample comprises a target which is labeled in an image in advance before training and easily causes model missing detection or false detection when the model is actually used, wherein the target which is easily missed is labeled as a difficult positive sample, and the target which is easily false detected is labeled as a difficult negative sample. For example, the difficult sample comprises a target which is labeled in an image in advance before training, and when the model is actually used, the model is missed or the false detection probability is greater than the preset probability value, wherein the target with the model missed detection probability greater than the preset missed detection probability value is labeled as a difficult positive sample, and the target with the model false detection probability greater than the preset false detection probability value is labeled as a difficult negative sample. Screening out targets which are easy to cause omission detection by presetting omission detection probability values to serve as targets easy to omit detection; targets which are easy to cause false detection can be screened out as targets which are easy to cause false detection through the preset false detection probability value.

On the basis of the foregoing embodiment, optionally, the loss weight information obtaining module 320 includes:

calculating the area ratio of the intersection region of the candidate sample region and the reference sample region relative to the candidate sample region in the image according to the regression position of the candidate sample and the position of the reference sample region; determining the influence degree value of the reference sample region under the candidate sample region according to the region proportion of the relative candidate sample region and the classification confidence coefficient of the candidate sample; the influence degree value comprises an ignorance degree score of an ignore sample in the image and a difficult routine degree score of a difficult sample in the image;

and determining loss weight information of the candidate sample region according to the influence degree value of the reference sample region under the candidate sample region.

On the basis of the foregoing embodiment, optionally, when a reference sample region is a neglected sample region, determining a value of a degree of influence of the reference sample region under the candidate sample region according to a region proportion of the relative candidate sample region and a classification confidence of the candidate sample, includes:

if the candidate sample region is determined to belong to the positive sample region, adopting a formula

On the basis of the foregoing embodiment, optionally, when a reference sample region is a difficult-to-sample region, determining a value of an influence degree of the reference sample region in the candidate sample region according to a region proportion of the relative candidate sample region and a classification confidence of the candidate sample, includes:

On the basis of the foregoing embodiment, optionally, determining the loss weight information of the candidate sample region according to the influence degree value of the reference sample region under the candidate sample region includes:

according to the neglect degree score of the neglected samples in the image under the candidate sample region and the difficulty degree score of the difficult samples in the image, adopting the formula LW (1-S)_ignore+S_hardCalculating loss weight information of the candidate sample region;

wherein S is_ignoreA score representing the degree of omission of the sample in the image under the candidate sample region, S_hardAnd representing the difficulty routine degree score of the difficulty routine sample in the image under the candidate sample region.

The training device for the target detection model provided in the embodiment of the present invention may execute the training method for the target detection model provided in any embodiment of the present invention, and has the corresponding functions and beneficial effects of the training method for executing the target detection model, and the detailed process refers to the related operations of the training method for the target detection model in the foregoing embodiment.

Example four

Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. As shown in fig. 4, the electronic device provided in the embodiment of the present invention includes: one or more processors 410 and storage 420; the processor 410 in the electronic device may be one or more, and one processor 410 is taken as an example in fig. 4; storage 420 is used to store one or more programs; the one or more programs are executed by the one or more processors 410, such that the one or more processors 410 implement a method of training an object detection model as described in any of the embodiments of the invention.

The electronic device may further include: an input device 430 and an output device 440.

The processor 410, the storage device 420, the input device 430, and the output device 440 in the electronic apparatus may be connected by a bus or other means, and fig. 4 illustrates an example of connection by a bus 450.

The storage device 420 in the electronic device is used as a computer-readable storage medium for storing one or more programs, which may be software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the training method of the object detection model provided in the embodiment of the present invention. The processor 410 executes various functional applications and data processing of the electronic device by executing software programs, instructions and modules stored in the storage device 420, namely, implements the training method of the target detection model in the above method embodiments.

The storage device 420 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the storage 420 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the storage 420 may further include memory located remotely from the processor 410, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 430 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus. The output device 440 may include a display device such as a display screen.

And, when one or more programs included in the above-mentioned electronic device are executed by the one or more processors 410, the programs perform the following operations:

Of course, it will be understood by those skilled in the art that when one or more programs included in the electronic device are executed by the one or more processors 410, the programs may also perform related operations in the training method of the object detection model provided in any embodiment of the present invention.

EXAMPLE five

An embodiment of the present invention provides a computer-readable medium, on which a computer program is stored, which, when executed by a processor, is configured to perform a method for training an object detection model, the method including:

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a flash Memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take a variety of forms, including, but not limited to: an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + + or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A method for training an object detection model, the method comprising:

2. The method of claim 1, wherein, at single stage target detection, the target detection network comprises a single stage convolutional neural network, the candidate sample region comprises an anchor frame region in an image; and, at the time of two-stage target detection, the target detection network comprises a convolutional neural network of a second stage of the two-stage target detection, and the candidate sample regions comprise sample regions that are forward-propagated output by a candidate region network of a first stage of the two-stage target detection.

3. The method according to claim 1, wherein the ignored samples include ambiguous objects that are marked in advance in the image before training and that have lost original features due to angle, distortion, occlusion and/or imaging noise problems, resulting in artificial failure to make a judgment; the method comprises the steps that the difficult sample comprises a target which is labeled in an image in advance before training, when the model is actually used, the model missing detection probability or the false detection probability is larger than the preset probability value, wherein the target with the model missing detection probability larger than the preset missing detection probability value is labeled as the difficult positive sample, and the target with the model false detection probability larger than the preset false detection probability value is labeled as the difficult negative sample.

4. The method of claim 1, wherein determining loss weight information for the candidate sample regions used in model training based on the classification confidence and regression locations of the candidate samples and the locations of the reference sample regions comprises: calculating the area ratio of the intersection region of the candidate sample region and the reference sample region relative to the candidate sample region in the image according to the regression position of the candidate sample and the position of the reference sample region; determining the influence degree value of the reference sample region under the candidate sample region according to the region proportion of the relative candidate sample region and the classification confidence coefficient of the candidate sample; the influence degree value comprises an ignorance degree score of an ignore sample in the image and a difficult routine degree score of a difficult sample in the image;

5. The method of claim 4, wherein determining a value of the degree of influence of a reference sample region under a candidate sample region based on the region occupancy of the relative candidate sample region and the classification confidence of the candidate sample when the reference sample region is a ignore sample region comprises:

6. The method of claim 4, wherein determining a value of the degree of influence of a reference sample region under a candidate sample region based on the region occupancy of the relative candidate sample region and the classification confidence of the candidate sample when the reference sample region is a difficult sample region comprises:

wherein, C_iRepresenting the classification confidence of the ith candidate sample in the image, B_iThe position coordinates of the ith candidate sample in the image are shown, I is the total number of samples difficult to be instantiated, HP_kIof (B, HP) represents the area ratio of the hard case positive sample region in the candidate sample region, HN_kIof (B, HN), which is the position coordinate of the kth difficult-to-exemplify negative sample, represents the area proportion of the difficult-to-exemplify negative sample region in the candidate sample region.

7. The method of claim 4, wherein determining loss weight information for the candidate sample region based on a measure of the extent of influence of the reference sample region under the candidate sample region comprises:

according to the candidate sample regionThe neglect degree score of the neglect sample in the lower image and the difficulty degree score of the difficulty sample in the image adopt the formula LW to be 1-S_ignore+S_hardCalculating loss weight information of the candidate sample region;

8. An apparatus for training an object detection model, the apparatus comprising:

9. An electronic device, comprising:

one or more processing devices;

storage means for storing one or more programs;

when executed by the one or more processing devices, cause the one or more processing devices to implement the method of training an object detection model of any of claims 1-7.

10. A computer-readable medium, on which a computer program is stored, which program, when being executed by a processing means, is adapted to carry out a method of training an object detection model according to any one of claims 1 to 7.