CN113780277B - Training method and device of target detection model, electronic equipment and storage medium - Google Patents

Training method and device of target detection model, electronic equipment and storage medium

Info

Publication number
CN113780277B
CN113780277B (application CN202111048969.2A)
Authority
CN
China
Prior art keywords
sample
candidate
image
region
candidate sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111048969.2A
Other languages
Chinese (zh)
Other versions
CN113780277A (en)
Inventor
王威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Zhuoyun Intelligent Technology Co ltd
Original Assignee
Zhejiang Zhuoyun Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Zhuoyun Intelligent Technology Co., Ltd.
Priority to CN202111048969.2A
Publication of CN113780277A
Application granted; publication of CN113780277B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

Embodiments of the invention disclose a training method and device for a target detection model, an electronic device, and a storage medium. The method comprises: classifying and regressing candidate sample regions in an image through forward propagation of a target detection network to obtain the classification confidence and regressed position of each candidate sample in the image; determining, according to the classification confidence and regressed position of the candidate sample and the position of a reference sample region, the loss weight information applied to the candidate sample region during model training, where the reference sample regions include ignored sample regions corresponding to ambiguous targets in the image and hard sample regions in the image; and performing back propagation according to the loss weight information of the candidate sample region, steering the parameters of the target detection network toward suppressing ignored samples and enhancing hard samples, to obtain a trained target detection model. This strengthens the model's learning of hard samples and suppresses its attention to ignored samples.

Description

Training method and device of target detection model, electronic equipment and storage medium
Technical Field
Embodiments of the invention relate to the field of computer vision, and in particular to a training method and device for a target detection model, an electronic device, and a storage medium.
Background
As a branch of computer vision, object detection is widely used in many fields; it detects targets such as people and objects in images.
When an image is severely distorted, or a target appears at an extreme angle, is heavily occluded, or belongs to an ambiguous class, the target's category cannot be determined, which affects model learning negatively; the conventional in-filling scheme deletes all the features of such regions, reducing the model's learning effect. In addition, targets that are hard for the model to learn are prone to missed or false detections, and conventional online hard-example selection forcibly adds such samples to training, so the selected hard samples are of low quality and training cannot properly emphasize them.
Disclosure of Invention
Embodiments of the invention provide a training method and device for a target detection model, an electronic device, and a storage medium, which strengthen the model's learning of hard samples and suppress its attention to ignored samples.
In a first aspect, an embodiment of the present invention provides a training method for a target detection model, including:
classifying and regressing candidate sample regions in an image through forward propagation of a target detection network to obtain the classification confidence and regressed position of each candidate sample in the image;
determining, according to the classification confidence and regressed position of the candidate sample and the position of a reference sample region, the loss weight information applied to the candidate sample region during model training, where the reference sample regions include ignored sample regions corresponding to ambiguous targets in the image and hard sample regions in the image; and
performing back propagation according to the loss weight information of the candidate sample region, and steering the parameters of the target detection network toward suppressing ignored samples and enhancing hard samples, to obtain a trained target detection model.
In a second aspect, an embodiment of the present invention further provides a training device for a target detection model, including:
a classification confidence and regressed position acquisition module, configured to classify and regress candidate sample regions in an image through forward propagation of a target detection network to obtain the classification confidence and regressed position of each candidate sample in the image;
a loss weight information acquisition module, configured to determine, according to the classification confidence and regressed position of the candidate sample and the position of a reference sample region, the loss weight information applied to the candidate sample region during model training, where the reference sample regions include ignored sample regions corresponding to ambiguous targets in the image and hard sample regions in the image; and
a target detection model acquisition module, configured to perform back propagation according to the loss weight information of the candidate sample region and steer the parameters of the target detection network toward suppressing ignored samples and enhancing hard samples, to obtain a trained target detection model.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processing devices;
a storage means for storing one or more programs;
When the one or more programs are executed by the one or more processing devices, the one or more processing devices implement the training method of the target detection model provided in any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processing device, implements the training method of the target detection model provided in any embodiment of the present invention.
Embodiments of the invention provide a training method and device for a target detection model, an electronic device, and a storage medium. The method comprises: classifying and regressing candidate sample regions in an image through forward propagation of a target detection network to obtain the classification confidence and regressed position of each candidate sample in the image; determining, according to the classification confidence and regressed position of the candidate sample and the position of a reference sample region, the loss weight information applied to the candidate sample region during model training, where the reference sample regions include ignored sample regions corresponding to ambiguous targets in the image and hard sample regions in the image; and performing back propagation according to the loss weight information of the candidate sample region, steering the parameters of the target detection network toward suppressing ignored samples and enhancing hard samples, to obtain a trained target detection model. This strengthens the model's learning of hard samples and suppresses its attention to ignored samples.
The foregoing is merely an overview of the technical solutions of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the description, and that the above and other objects, features and advantages of the present invention may be more readily apparent, specific embodiments of the invention are set forth below.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Like reference numerals designate like parts throughout the figures. In the drawings:
FIG. 1 is a flowchart of a training method of a target detection model according to an embodiment of the present invention;
FIG. 2 is a flowchart of a training method of a target detection model according to a second embodiment of the present invention;
FIG. 2A is an illustration of the intersection region of a candidate sample region and a reference sample region provided in an embodiment of the present invention;
FIG. 2B is an illustration of the union region of a candidate sample region and a reference sample region provided in an embodiment of the present invention;
FIG. 3 is a block diagram of a training device for a target detection model according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Before discussing the exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts operations (or steps) as a sequential process, many of the operations (or steps) can be performed in parallel, concurrently, or at the same time. Furthermore, the order of the operations may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The following detailed description is made on the training method, device, electronic apparatus and storage medium of the object detection model provided in the present application through the following embodiments and optional solutions thereof.
Example 1
Fig. 1 is a flowchart of a training method of a target detection model according to a first embodiment of the present invention. This embodiment is applicable to training a target detection model. The method can be performed by a training device of the target detection model, which can be implemented in software and/or hardware and integrated on any electronic device with a network communication function. As shown in fig. 1, the training method of the target detection model provided in this embodiment may include the following steps:
s110, classifying and regressing candidate sample areas in the image through forward propagation of the target detection network, and obtaining the classification confidence and regression position of the candidate samples in the image.
Target detection may be understood as finding all objects of interest in an image, extracting their features, and classifying and localizing them simultaneously: for example, finding all the ignored samples and hard samples in the image and classifying, identifying, and localizing both kinds of sample according to their features.
Forward propagation may refer to propagating layer by layer through a neural network, from the input layer through the hidden layers to the output layer. For example, the target detection network propagates the input forward from its input layer through its hidden layers to its output layer to obtain the candidate boxes.
A candidate sample region may refer to the region of the image in which a candidate sample is located, including, but not limited to, anchor-box regions in the image and sample regions output via forward propagation.
Classification confidence may refer to the probability that a sample is present in a sample region, for example the probability that a candidate sample is present in the image; a confidence threshold of 95% is commonly used. In forward propagation, the target detection network classifies and regresses the anchor boxes directly, thereby obtaining the classification confidence of each sample.
Optionally, in single-stage target detection, the target detection network comprises a single-stage convolutional neural network and the candidate sample regions comprise anchor-box regions in the image; in two-stage target detection, the target detection network comprises the convolutional neural network of the second stage, and the candidate sample regions comprise the sample regions output by forward propagation through the region proposal network of the first stage.
S120, determining, according to the classification confidence and regressed position of the candidate sample and the position of the reference sample region, the loss weight information applied to the candidate sample region during model training.
The weight may be determined according to the difference between the candidate box and the ground-truth annotation box, for example the category difference between them, or the offset of the candidate box's position relative to the ground-truth box.
The reference sample regions include, but are not limited to, ignored sample regions corresponding to ambiguous targets in the image and hard sample regions in the image.
S130, performing back propagation according to the loss weight information of the candidate sample region, and steering the parameters of the target detection network toward suppressing ignored samples and enhancing hard samples, to obtain a trained target detection model.
Back propagation adjusts each parameter along the gradient of the loss function with respect to it, for example by gradient descent; here the per-sample loss weights are applied when updating the model parameters, modulating how strongly each sample's error influences the update.
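To make the role of per-sample loss weights in the backward pass concrete, here is a minimal sketch using a toy one-parameter least-squares model; the function name, model, and learning rate are illustrative assumptions, not elements of the patent:

```python
def weighted_sgd_step(w, xs, ys, loss_weights, lr=0.1):
    """One gradient-descent step on sum_i lw_i * (w*x_i - y_i)^2.

    A sample with loss weight 0 contributes nothing to the update
    (suppressed), while a weight above 1 amplifies that sample's
    gradient (enhanced), mirroring how LW acts during training.
    """
    grad = sum(lw * 2.0 * (w * x - y) * x
               for x, y, lw in zip(xs, ys, loss_weights))
    return w - lr * grad
```

With the default weight of 1 the step is ordinary SGD; zeroing a sample's weight removes its pull on the parameters entirely.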
An ignored sample may be a target whose class cannot be determined because of severe image distortion, an extreme target angle, heavy occlusion, or class ambiguity. The method also applies to special imaging domains, including but not limited to X-ray security-machine images, medical images, and underwater images, where the peculiarities of the acquisition techniques and scenes make ambiguous targets appear more often and annotators cannot know the true class of a target in advance.
A hard sample may be a target that is missed or falsely detected because the amount of data for that kind of article is insufficient or its features are hard to learn. For example, the background in which an object sits is often detected as the object itself, causing a false detection.
This embodiment provides a training method of a target detection model that classifies and regresses candidate sample regions in an image through forward propagation of a target detection network to obtain the classification confidence and regressed position of each candidate sample; determines, according to the classification confidence and regressed position of the candidate sample and the position of the reference sample region, the loss weight information applied to the candidate sample region during model training, where the reference sample regions include ignored sample regions corresponding to ambiguous targets in the image and hard sample regions in the image; and performs back propagation according to the loss weight information of the candidate sample region, steering the parameters of the target detection network toward suppressing ignored samples and enhancing hard samples, to obtain a trained target detection model. By dynamically adjusting sample weights, part of the features of ignored samples are discarded in a softened way, and the model's learning of hard samples is raised gently enough that training on them remains stable; the model's learning of hard samples is thus strengthened while its attention to ignored samples is suppressed.
Example two
Fig. 2 is a flowchart of a training method of a target detection model according to a second embodiment of the present invention. This embodiment is further optimized on the basis of the foregoing embodiment and may be combined with any of the alternatives in one or more of the embodiments above. As shown in fig. 2, the training method of the target detection model provided in this embodiment may include the following steps:
s210, classifying and regressing candidate sample areas in the image through forward propagation of the target detection network, and obtaining the classification confidence and regression position of the candidate samples in the image.
Candidate samples include, but are not limited to, ignored samples and hard samples, which must be identified before training. Manually labeling an ignored sample as some class, or leaving it unlabeled, is unfavorable for learning either way: if it is labeled as a target, it may introduce outlier noise into the model's learning of positive samples; if it is not labeled, the model treats it as a negative sample during learning, yet the region still contains some target features, which raises the missed-detection rate. As for hard samples, in actual detection the model often misses or falsely detects certain similar articles because the amount of data for those articles is insufficient or their features are hard to learn, so the model's learning of them is inadequate.
Optionally, ignored samples comprise ambiguous targets annotated in the image before training whose original characteristics are lost to angle, distortion, occlusion, and/or imaging-noise problems, so that a human annotator cannot judge their class. Hard samples comprise targets annotated in the image before training whose probability of being missed or falsely detected by the model in actual use exceeds a preset probability value, where targets whose missed-detection probability exceeds the preset missed-detection threshold are marked as hard positive samples and targets whose false-detection probability exceeds the preset false-detection threshold are marked as hard negative samples.
Optionally, ignored samples are annotated manually. Before model training, targets in the selected images that have lost their original characteristics to angle, distortion, occlusion, or imaging noise, or whose ambiguity makes human judgment unusually difficult, have only their position coordinates annotated, without distinguishing classes. This avoids brute-force in-filling, which deletes some useful features in the region and disturbs the original distribution of the image features.
Optionally, hard samples are annotated manually. Before model training, targets that frequently cause the model to miss or falsely detect are selected and their position coordinates annotated; targets frequently missed or confused between classes are marked as hard positive cases, and background targets frequently falsely detected are marked as hard negative cases. Manual annotation can also be replaced by model inspection: a trained model runs inference on the training data, and the missed and falsely detected targets are computed from the inference results and the ground-truth annotations. This improves the quality of hard-sample selection and lets training emphasize hard cases.
In other words, hard samples comprise targets annotated in advance whose missed-detection or false-detection probability in actual use exceeds a preset probability value. By presetting the missed-detection threshold, targets prone to being missed can be screened out; by presetting the false-detection threshold, targets prone to being falsely detected can be screened out.
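The threshold-based screening just described can be sketched as follows; the function name, dictionary keys, and default thresholds are illustrative assumptions, not names from the patent:

```python
def split_hard_samples(targets, miss_thresh=0.5, false_thresh=0.5):
    """Label targets whose measured missed-detection / false-detection
    probability exceeds the preset thresholds as hard positives /
    hard negatives, respectively."""
    hard_pos = [t for t in targets if t["miss_prob"] > miss_thresh]    # easily missed
    hard_neg = [t for t in targets if t["false_prob"] > false_thresh]  # easily false-detected
    return hard_pos, hard_neg
```

In a model-inspection workflow the `miss_prob`/`false_prob` fields would come from comparing inference results against ground-truth annotations.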
Optionally, forward propagation of the model yields the candidate boxes output by the region proposal network; the candidate features inside each box are fixed to the same scale, and the second stage then classifies and regresses the samples, giving the classification confidence and regressed coordinates of each sample. If the network is a one-stage network such as SSD or YOLO, its forward propagation classifies and regresses the anchor boxes directly, likewise yielding the classification confidence and regressed coordinates of each sample.
To fix the candidate features inside the candidate boxes to the same scale, ROI Pooling can be used. It lets target detection reuse the feature maps in the neural network, markedly speeding up training and detection, while allowing the detection system to be trained end to end.
S220, calculating, according to the regressed position of the candidate sample and the position of the reference sample region, the fraction of the candidate sample region's area covered by their intersection.
The intersection region of a candidate sample region and a reference sample region, shown in fig. 2A, is the portion where the two regions overlap; their union region, shown in fig. 2B, is the entire area jointly covered by the two regions.
Positive and negative samples among the candidate boxes are usually defined by intersection over union (IoU), the ratio of the area of the intersection of the candidate sample region and the reference sample region to the area of their union. When the IoU of the candidate sample region with the reference sample region is greater than or equal to 0.5, the candidate is considered a positive sample; when it is less than 0.5, a negative sample.
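The IoU rule can be written out directly; a plain-Python sketch (boxes as (x1, y1, x2, y2) corner tuples), not code from the patent:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_positive(candidate, reference, threshold=0.5):
    """A candidate box counts as a positive sample when IoU >= 0.5."""
    return iou(candidate, reference) >= threshold
```

The threshold of 0.5 matches the definition in the text; other detectors often tune it.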
S230, determining the influence degree values of the reference sample regions under the candidate sample region according to the area fraction relative to the candidate sample region and the classification confidence of the candidate sample.
The influence degree values comprise the ignore-degree score of the ignored samples in the image and the hardness score of the hard samples in the image.
Optionally, the set of ignore annotation boxes is denoted I; ignore boxes do not distinguish positive from negative. If the candidate sample region is determined to be a positive sample region, the formula
[formula image in original: ignore-degree score S_ignore for positive candidate regions]
is used to compute the ignore-degree score of the ignored samples in the image under the candidate sample region;
if the candidate sample region is determined to be a negative sample region, the formula
[formula image in original: ignore-degree score S_ignore for negative candidate regions]
is used to compute the ignore-degree score of the ignored samples in the image under the candidate sample region;
where C_i denotes the classification confidence of the i-th candidate sample in the image, B_i the position coordinates of the i-th candidate sample in the image, n the total number of ignore boxes, I_j the position coordinates of the j-th ignore box in the image, and iof(B, I) the fraction of the candidate sample region's area covered by the ignore region.
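The quantity iof(B, I) differs from IoU in that it normalizes by the candidate box's own area rather than by the union. A minimal sketch of iof itself (the score formulas that use it appear only as figures in the original, so they are not reproduced here):

```python
def iof(candidate, ignore_box):
    """Fraction of the candidate box's area covered by the ignore box
    ("intersection over foreground"); boxes are (x1, y1, x2, y2)."""
    iw = max(0.0, min(candidate[2], ignore_box[2]) - max(candidate[0], ignore_box[0]))
    ih = max(0.0, min(candidate[3], ignore_box[3]) - max(candidate[1], ignore_box[1]))
    cand_area = (candidate[2] - candidate[0]) * (candidate[3] - candidate[1])
    return (iw * ih) / cand_area if cand_area > 0 else 0.0
```

Unlike IoU, the iof of a small candidate box lying entirely inside a large ignore box is 1.0 even though their IoU is small, which is what makes it suitable for measuring how much of a candidate is covered.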
Optionally, the set of hard-sample annotation boxes is denoted H; hard samples distinguish positive from negative, with the set of hard-positive annotation boxes denoted HP (hard positive) and the set of hard-negative annotation boxes denoted HN (hard negative). If the candidate sample region is determined to be a positive sample region, the formula
[formula image in original: hardness score S_hard for positive candidate regions]
is used to compute the hardness score of the hard samples in the image under the candidate sample region;
if the candidate sample region is determined to be a negative sample region, the formula
[formula image in original: hardness score S_hard for negative candidate regions]
is used to compute the hardness score of the hard samples in the image under the candidate sample region;
where C_i denotes the classification confidence of the i-th candidate sample in the image, B_i the position coordinates of the i-th candidate sample in the image, l the total number of hard samples, HP_k the position coordinates of the k-th hard-positive box, HN_k the position coordinates of the k-th hard-negative box, and iof(B, HN) the fraction of the candidate sample region's area covered by the hard-negative region.
S240, determining the loss weight information of the candidate sample region according to the influence degree values of the reference sample regions under the candidate sample region.
Optionally, according to the ignore-degree score of the ignored samples and the hardness score of the hard samples in the image under the candidate sample region, the loss weight information of the candidate sample region is computed by the formula LW = 1 − S_ignore + S_hard;
where LW denotes the loss weight information of the candidate sample region, S_ignore the ignore-degree score of the ignored samples in the image under the candidate sample region, and S_hard the hardness score of the hard samples in the image under the candidate sample region.
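The combination formula can be sketched directly; the two score inputs are taken as given here because their defining formulas appear only as figures in the original:

```python
def loss_weight(s_ignore, s_hard):
    """LW = 1 - S_ignore + S_hard: a large ignore-degree score shrinks
    the sample's loss weight below the default of 1 (suppression), while
    a large hardness score raises it above 1 (enhancement)."""
    return 1.0 - s_ignore + s_hard
```

With both scores at 0 the weight is the default 1, so ordinary samples train exactly as before.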
S250, back propagation is carried out according to loss weight information adopted by the candidate sample area, parameters of the target detection network are controlled to be adjusted towards the directions of suppressing neglected samples and enhancing difficult samples, and a trained target detection model is obtained.
After the dynamic loss weight of each sample in the set L is calculated, it replaces the original loss weight, which is 1 by default. The model parameters are updated with these loss weights during back propagation, and the model is trained from scratch or fine-tuned in this way, so that ignored samples are quantitatively suppressed and hard samples are quantitatively enhanced. By suppressing the influence of ignored samples on the model and enhancing its learning of hard samples, both types of problems are handled well through a general paradigm without affecting the inference time of the model.
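Applied to a batch, the dynamic weights simply replace the default per-sample weight of 1 before the losses are summed for back propagation. A framework-agnostic sketch (function and parameter names are illustrative; in practice the same multiply would happen inside the detector's loss, e.g. before calling backward() in PyTorch):

```python
def weighted_total_loss(per_sample_losses, loss_weights):
    # Scale each candidate's loss by its dynamic weight LW (default 1)
    # and sum; gradients computed from this total are what suppress
    # ignored samples and enhance hard samples during training.
    assert len(per_sample_losses) == len(loss_weights)
    return sum(l * w for l, w in zip(per_sample_losses, loss_weights))
```

Because the weighting is a single elementwise multiply, it adds no measurable cost to training and none at all to inference.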
The embodiment of the invention provides a training method for a target detection model in which the position coordinates of ignored samples are marked manually, which avoids deleting possibly useful features of the region, as direct brute-force filling would, and prevents abnormal edges left after filling from disturbing the original distribution of image features. The position coordinates of hard samples are also marked manually: targets that are frequently missed or confused between categories are marked as hard positive samples, and background regions that are frequently falsely detected are marked as hard negative samples, avoiding the low quality of hard samples selected online by the model. Manual marking can also be replaced by model inspection, that is, a trained model is run on the training data, and the falsely detected and missed targets are computed from the inference results and the ground-truth annotations. By quantitatively suppressing the influence of ignored samples on the model and quantitatively enhancing its learning of hard samples, both types of problems are handled better through a general paradigm without affecting the inference time of the model. Dynamically adjusting sample weights softens the influence of each sample: for suppressed samples, part of the features are discarded softly rather than removed outright; for hard samples, the method moderately strengthens what the model learns from them, making training more stable while still emphasizing hard samples. In this way the model's learning of hard samples is enhanced and its attention to ignored samples is suppressed.
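The model-inspection alternative to manual marking described above (run a trained model on the training data, then compare against ground truth) can be sketched as follows; the IoU-based matching and its 0.5 threshold are assumed implementation details, not taken from the text:

```python
def iou(a, b):
    # Intersection over union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mine_hard_samples(detections, ground_truth, iou_thresh=0.5):
    # Ground-truth boxes the trained model failed to detect become hard
    # positive candidates (missed detections); detections matching no
    # ground-truth box become hard negative candidates (false detections).
    hard_pos = [g for g in ground_truth
                if all(iou(g, d) < iou_thresh for d in detections)]
    hard_neg = [d for d in detections
                if all(iou(d, g) < iou_thresh for g in ground_truth)]
    return hard_pos, hard_neg
```

In practice the mined boxes would then be written back into the annotation set as the HP and HN labeling boxes used by the scoring formulas.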
Example III
Fig. 3 is a block diagram of a training device for a target detection model according to a third embodiment of the present invention. The embodiment of the invention can be applied to the training of the target detection model. The apparatus may be implemented in software and/or hardware and integrated on any electronic device having network communication capabilities. As shown in fig. 3, the training device for the target detection model provided in the embodiment of the present application may include the following: a classification confidence and regression location acquisition module 310, a loss weight information acquisition module 320, and a target detection model acquisition module 330. Wherein:
the classification confidence coefficient and regression position obtaining module 310 is configured to classify and regress the candidate sample region in the image through forward propagation of the target detection network, so as to obtain a classification confidence coefficient and a regression position of the candidate sample in the image;
the loss weight information obtaining module 320 is configured to determine loss weight information adopted by the candidate sample region in the model training process according to the classification confidence and regression position of the candidate sample and the position of the reference sample region; the reference sample region comprises an ignored sample region corresponding to blurred targets in the image and a hard sample region in the image;
The target detection model obtaining module 330 is configured to perform back propagation according to the loss weight information adopted by the candidate sample region, and control the parameters of the target detection network to adjust in the direction of suppressing ignored samples and enhancing hard samples, so as to obtain a trained target detection model.
Based on the above embodiment, optionally, the classification confidence and regression location obtaining module 310 includes:
when single-stage target detection is carried out, the target detection network comprises a single-stage convolutional neural network, and the candidate sample region comprises an anchor frame region in an image; and, upon two-stage target detection, the target detection network comprises a convolutional neural network of a second stage of the two-stage target detection, and the candidate sample region comprises a sample region that is forward propagated output through the candidate region network of the first stage of the two-stage target detection.
On the basis of the above embodiment, optionally, the ignored samples include ambiguous targets marked in the image in advance before training that have lost their original characteristics due to angle, distortion, occlusion and/or imaging noise, so that a person cannot make a judgment; the hard samples include targets marked in the image in advance before training that the model tends to miss or falsely detect in actual use, wherein targets prone to missed detection are marked as hard positive samples and targets prone to false detection are marked as hard negative samples. For example, the hard samples include targets marked in the image in advance before training whose missed-detection probability or false-detection probability in actual use is larger than a preset probability value, wherein targets whose missed-detection probability exceeds a preset missed-detection probability value are marked as hard positive samples, and targets whose false-detection probability exceeds a preset false-detection probability value are marked as hard negative samples. Targets prone to missed detection can be screened out by presetting the missed-detection probability value; targets prone to false detection can be screened out by presetting the false-detection probability value.
Based on the above embodiment, optionally, the loss weight information obtaining module 320 includes:
calculating, according to the regression position of the candidate sample and the position of the reference sample region, the area ratio of the intersection region of the candidate sample region and the reference sample region relative to the candidate sample region; determining the influence degree value of the reference sample region under the candidate sample region according to the area ratio relative to the candidate sample region and the classification confidence of the candidate sample; the influence degree value includes the ignore-degree score of the ignored samples in the image and the difficulty degree score of the hard samples in the image;
and determining loss weight information of the candidate sample region according to the influence degree value of the reference sample region under the candidate sample region.
On the basis of the above embodiment, optionally, when the reference sample region is an ignored sample region, determining the influence degree of the reference sample region under the candidate sample region according to the area ratio relative to the candidate sample region and the classification confidence of the candidate sample includes:
if the candidate sample region is determined to belong to the positive sample region, a formula
[formula image BDA0003252139430000141 in the original publication]
is adopted to calculate the ignore-degree score of the ignored samples in the image under the candidate sample region;
if the candidate sample region is determined to belong to the negative sample region, a formula
[formula image BDA0003252139430000142 in the original publication]
is adopted to calculate the ignore-degree score of the ignored samples in the image under the candidate sample region;
wherein C_i denotes the classification confidence of the ith candidate sample in the image, B_i denotes the position coordinates of the ith candidate sample in the image, n is the total number of ignore boxes, I_j denotes the position coordinates of the jth ignored sample in the image, and iof(B, I) denotes the area ratio of the ignored sample region within the candidate sample region.
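The published score formulas themselves survive only as images, but the listed symbols (C_i, B_i, n, I_j, iof) suggest the general shape: weight the candidate's confidence by its largest overlap with any ignore box. The sketch below is therefore an assumed reconstruction, not the patent's exact formula:

```python
def ignore_score(confidence, iof_values):
    # Assumed form of S_ignore for one candidate: classification
    # confidence C_i scaled by the largest area ratio iof(B_i, I_j)
    # over the n ignore boxes. The exact published formula is only
    # available as an image, so treat this as illustrative.
    return confidence * max(iof_values, default=0.0)
```

Under this reading, a confident candidate sitting mostly inside an ignore box gets a high S_ignore and hence, via LW = 1 - S_ignore + S_hard, a strongly reduced loss weight.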
On the basis of the above embodiment, optionally, when the reference sample region is a hard sample region, determining the influence degree of the reference sample region under the candidate sample region according to the area ratio relative to the candidate sample region and the classification confidence of the candidate sample includes:
if the candidate sample region is determined to belong to the positive sample region, a formula
[formula image BDA0003252139430000151 in the original publication]
is adopted to calculate the difficulty degree score of the hard samples in the image under the candidate sample region;
if the candidate sample region is determined to belong to the negative sample region, a formula
[formula image BDA0003252139430000152 in the original publication]
is adopted to calculate the difficulty degree score of the hard samples in the image under the candidate sample region;
wherein C_i denotes the classification confidence of the ith candidate sample in the image, B_i denotes the position coordinates of the ith candidate sample in the image, I is the total number of hard samples, HP_k is the position coordinates of the kth hard positive sample, HN_k is the position coordinates of the kth hard negative sample, and iof(B, HN) denotes the area ratio of the hard negative sample region within the candidate sample region.
On the basis of the foregoing embodiment, optionally, determining loss weight information of the candidate sample area according to a value of an influence degree of the reference sample area under the candidate sample area includes:
according to the ignore-degree score of the ignored samples in the image under the candidate sample region and the difficulty degree score of the hard samples in the image, the loss weight information of the candidate sample region is calculated using the formula LW = 1 - S_ignore + S_hard;
wherein S_ignore denotes the ignore-degree score of the ignored samples in the image under the candidate sample region, and S_hard denotes the difficulty degree score of the hard samples in the image under the candidate sample region.
The training device for the target detection model provided in the embodiment of the invention can execute the training method for the target detection model provided in any embodiment of the invention, has the corresponding functions and beneficial effects of executing the training method for the target detection model, and the detailed process refers to the related operation of the training method for the target detection model in the embodiment.
Example IV
Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. As shown in fig. 4, an electronic device provided in an embodiment of the present invention includes: one or more processors 410 and a storage 420; the number of processors 410 in the electronic device may be one or more, one processor 410 being illustrated in fig. 4; the storage 420 is used for storing one or more programs; the one or more programs are executed by the one or more processors 410 to cause the one or more processors 410 to implement a method of training a target detection model according to any of the embodiments of the present invention.
The electronic device may further include: an input device 430 and an output device 440.
The processor 410, the storage device 420, the input device 430, and the output device 440 in the electronic device may be connected by a bus or other means; fig. 4 illustrates connection by a bus 450.
The storage 420 in the electronic device is used as a computer readable storage medium, and may be used to store one or more programs, which may be software programs, computer executable programs, and modules, such as program instructions/modules corresponding to the training method of the object detection model provided in the embodiments of the present invention. The processor 410 executes various functional applications of the electronic device and data processing by running software programs, instructions and modules stored in the storage 420, i.e. implements the training method of the object detection model in the above-described method embodiment.
The storage 420 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for a function; the storage data area may store data created according to the use of the electronic device, etc. In addition, the storage 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the storage 420 may further include memory remotely located with respect to the processor 410, which may be connected to the device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. The output 440 may include a display device such as a display screen.
And, when one or more programs included in the above-described electronic device are executed by the one or more processors 410, the programs perform the following operations:
Classifying and regressing candidate sample areas in the image through forward propagation of a target detection network to obtain classification confidence and regression positions of candidate samples in the image;
determining loss weight information adopted by the candidate sample region in the model training process according to the classification confidence coefficient and regression position of the candidate sample and the position of the reference sample region; the reference sample area comprises an neglected sample area corresponding to the fuzzy target in the image and a difficult sample area in the image;
and performing back propagation according to the loss weight information adopted by the candidate sample region, and controlling the parameters of the target detection network to adjust in the direction of suppressing ignored samples and enhancing hard samples, to obtain a trained target detection model.
Of course, those skilled in the art will appreciate that the program(s) may also perform the associated operations in the training method of the object detection model provided in any of the embodiments of the present invention when the program(s) included in the electronic device are executed by the processor(s) 410.
Example five
In an embodiment of the present invention, there is provided a computer-readable medium having stored thereon a computer program for executing a training method of an object detection model when executed by a processor, the method comprising:
Classifying and regressing candidate sample areas in the image through forward propagation of a target detection network to obtain classification confidence and regression positions of candidate samples in the image;
determining loss weight information adopted by the candidate sample region in the model training process according to the classification confidence coefficient and regression position of the candidate sample and the position of the reference sample region; the reference sample area comprises an neglected sample area corresponding to the fuzzy target in the image and a difficult sample area in the image;
and performing back propagation according to the loss weight information adopted by the candidate sample region, and controlling the parameters of the target detection network to adjust in the direction of suppressing ignored samples and enhancing hard samples, to obtain a trained target detection model.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access Memory (Random Access Memory, RAM), a Read-Only Memory (ROM), an erasable programmable Read-Only Memory (Erasable Programmable Read Only Memory, EPROM), a flash Memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to: electromagnetic signals, optical signals, or any suitable combination of the preceding. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, radio frequency (Radio Frequency, RF), and the like, or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (10)

1. A method of training a target detection model, the method comprising:
classifying and regressing candidate sample areas in the image through forward propagation of a target detection network to obtain classification confidence and regression positions of candidate samples in the image;
determining loss weight information adopted by the candidate sample region in the model training process according to the classification confidence and regression position of the candidate sample and the position of the reference sample region; the reference sample region comprises an ignored sample region corresponding to blurred targets in the image and a hard sample region in the image;
and performing back propagation according to the loss weight information adopted by the candidate sample region, and controlling the parameters of the target detection network to adjust in the direction of suppressing ignored samples and enhancing hard samples, to obtain a trained target detection model.
2. The method of claim 1, wherein upon single-stage target detection, the target detection network comprises a single-stage convolutional neural network, and the candidate sample region comprises an anchor frame region in the image; and, upon two-stage target detection, the target detection network comprises a convolutional neural network of a second stage of the two-stage target detection, and the candidate sample region comprises a sample region that is forward propagated output through the candidate region network of the first stage of the two-stage target detection.
3. The method according to claim 1, wherein the ignored samples comprise ambiguous targets marked in the image in advance before training that have lost their original characteristics due to angle, distortion, occlusion and/or imaging noise, so that a person cannot make a judgment; the hard samples comprise targets marked in the image in advance before training whose missed-detection probability or false-detection probability in actual use is larger than a preset probability value, wherein targets whose missed-detection probability is larger than a preset missed-detection probability value are marked as hard positive samples, and targets whose false-detection probability is larger than a preset false-detection probability value are marked as hard negative samples.
4. The method of claim 1, wherein determining loss weight information adopted by the candidate sample region during model training based on the classification confidence and regression position of the candidate sample and the position of the reference sample region comprises: calculating, according to the regression position of the candidate sample and the position of the reference sample region, the area ratio of the intersection region of the candidate sample region and the reference sample region relative to the candidate sample region; determining the influence degree value of the reference sample region under the candidate sample region according to the area ratio relative to the candidate sample region and the classification confidence of the candidate sample; the influence degree value comprises the ignore-degree score of the ignored samples in the image and the difficulty degree score of the hard samples in the image;
And determining loss weight information of the candidate sample region according to the influence degree value of the reference sample region under the candidate sample region.
5. The method of claim 4, wherein, when the reference sample region is an ignored sample region, determining the influence degree of the reference sample region under the candidate sample region based on the area ratio relative to the candidate sample region and the classification confidence of the candidate sample comprises:
if the candidate sample region is determined to belong to the positive sample region, a formula
[formula image FDA0003252139420000021 in the original publication]
is adopted to calculate the ignore-degree score of the ignored samples in the image under the candidate sample region;
if the candidate sample region is determined to belong to the negative sample region, a formula
[formula image FDA0003252139420000022 in the original publication]
is adopted to calculate the ignore-degree score of the ignored samples in the image under the candidate sample region;
wherein C_i denotes the classification confidence of the ith candidate sample in the image, B_i denotes the position coordinates of the ith candidate sample in the image, n is the total number of ignore boxes, I_j denotes the position coordinates of the jth ignored sample in the image, and iof(B, I) denotes the area ratio of the ignored sample region within the candidate sample region.
6. The method of claim 4, wherein, when the reference sample region is a hard sample region, determining the influence degree of the reference sample region under the candidate sample region based on the area ratio relative to the candidate sample region and the classification confidence of the candidate sample comprises:
if the candidate sample region is determined to belong to the positive sample region, a formula
[formula image FDA0003252139420000031 in the original publication]
is adopted to calculate the difficulty degree score of the hard samples in the image under the candidate sample region;
if the candidate sample region is determined to belong to the negative sample region, a formula
[formula image FDA0003252139420000032 in the original publication]
is adopted to calculate the difficulty degree score of the hard samples in the image under the candidate sample region;
wherein C_i denotes the classification confidence of the ith candidate sample in the image, B_i denotes the position coordinates of the ith candidate sample in the image, I is the total number of hard samples, HP_k is the position coordinates of the kth hard positive sample, iof(B, HP) denotes the area ratio of the hard positive sample region within the candidate sample region, HN_k is the position coordinates of the kth hard negative sample, and iof(B, HN) denotes the area ratio of the hard negative sample region within the candidate sample region.
7. The method of claim 4, wherein determining loss weight information for the candidate sample region based on the impact level value of the reference sample region under the candidate sample region comprises:
according to the ignore-degree score of the ignored samples in the image under the candidate sample region and the difficulty degree score of the hard samples in the image, the loss weight information of the candidate sample region is calculated using the formula LW = 1 - S_ignore + S_hard;
wherein S_ignore denotes the ignore-degree score of the ignored samples in the image under the candidate sample region, and S_hard denotes the difficulty degree score of the hard samples in the image under the candidate sample region.
8. A training apparatus for a target detection model, the apparatus comprising:
the classification confidence coefficient and regression position acquisition module is used for classifying and regressing candidate sample areas in the image through forward propagation of the target detection network to obtain classification confidence coefficient and regression positions of candidate samples in the image;
the loss weight information acquisition module is used for determining loss weight information adopted by the candidate sample region in the model training process according to the classification confidence and regression position of the candidate sample and the position of the reference sample region; the reference sample region comprises an ignored sample region corresponding to blurred targets in the image and a hard sample region in the image;
the target detection model acquisition module is used for carrying out back propagation according to the loss weight information adopted by the candidate sample area, controlling the parameters of the target detection network to adjust towards the directions of suppressing neglected samples and enhancing difficult samples, and obtaining the trained target detection model.
9. An electronic device, comprising:
one or more processing devices;
a storage means for storing one or more programs;
when the one or more programs are executed by the one or more processing devices, the one or more processing devices are caused to implement the method of training the object detection model of any of claims 1-7.
10. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processing means, implements a method for training a target detection model according to any one of claims 1-7.
CN202111048969.2A 2021-09-08 2021-09-08 Training method and device of target detection model, electronic equipment and storage medium Active CN113780277B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111048969.2A CN113780277B (en) 2021-09-08 2021-09-08 Training method and device of target detection model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111048969.2A CN113780277B (en) 2021-09-08 2021-09-08 Training method and device of target detection model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113780277A CN113780277A (en) 2021-12-10
CN113780277B true CN113780277B (en) 2023-06-30

Family

ID=78841701

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111048969.2A Active CN113780277B (en) 2021-09-08 2021-09-08 Training method and device of target detection model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113780277B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115359308B (en) * 2022-04-06 2024-02-13 北京百度网讯科技有限公司 Model training method, device, equipment, storage medium and program for identifying difficult cases
CN115294536B (en) * 2022-08-10 2023-07-28 北京百度网讯科技有限公司 Violation detection method, device, equipment and storage medium based on artificial intelligence
CN115131655B (en) * 2022-09-01 2022-11-22 浙江啄云智能科技有限公司 Training method and device of target detection model and target detection method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416287A (en) * 2018-03-04 2018-08-17 南京理工大学 A pedestrian detection method based on missed-detection negative sample mining
CN109409517A (en) * 2018-09-30 2019-03-01 北京字节跳动网络技术有限公司 Training method and device of an object detection network
CN109800778A (en) * 2018-12-03 2019-05-24 浙江工业大学 A Faster RCNN object detection method based on hard sample mining
CN110942446A (en) * 2019-10-17 2020-03-31 付冲 Pulmonary nodule automatic detection method based on CT image
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function
CN112507996A (en) * 2021-02-05 2021-03-16 成都东方天呈智能科技有限公司 Face detection method of main sample attention mechanism
CN113033719A (en) * 2021-05-27 2021-06-25 浙江啄云智能科技有限公司 Target detection processing method, device, medium and electronic equipment
CN113221947A (en) * 2021-04-04 2021-08-06 青岛日日顺乐信云科技有限公司 Industrial quality inspection method and system based on image recognition technology

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
UnitBox: An Advanced Object Detection Network; Jiahui Yu; Proceedings of the 24th ACM International Conference on Multimedia; 516-520 *
A context-sensitive multi-scale face detection method; Chen Long; Laser & Optoelectronics Progress; Vol. 56, No. 4; 041003-1-10 *
Research on fast region-based convolutional neural network object detection based on hard sample mining; Zhang Ye; Journal of Electronics & Information Technology; Vol. 41, No. 6; 1496-1502 *

Also Published As

Publication number Publication date
CN113780277A (en) 2021-12-10

Similar Documents

Publication Publication Date Title
CN113780277B (en) Training method and device of target detection model, electronic equipment and storage medium
US11878433B2 (en) Method for detecting grasping position of robot in grasping object
CN108229489B (en) Key point prediction method, network training method, image processing method, device and electronic equipment
CN108122234B (en) Convolutional neural network training and video processing method and device and electronic equipment
US20180196158A1 (en) Inspection devices and methods for detecting a firearm
CN108229675B (en) Neural network training method, object detection method, device and electronic equipment
CN110276293B (en) Lane line detection method, lane line detection device, electronic device, and storage medium
CN107622501B (en) Boundary detection method for medical image
CN111899244B (en) Image segmentation method, network model training method, device and electronic equipment
CN108320306B (en) Video target tracking method fusing TLD and KCF
CN111079638A (en) Target detection model training method, device and medium based on convolutional neural network
CN113963148B (en) Object detection method, object detection model training method and device
CN115631397A (en) Target detection method and device based on bimodal image
CN115343685A (en) Multi-dimensional ground penetrating radar detection method, device and equipment applied to disease identification
CN116802683A (en) Image processing method and system
CN117372928A (en) Video target detection method and device and related equipment
CN110443812B (en) Fundus image segmentation method, device, apparatus, and medium
CN113947771B (en) Image recognition method, apparatus, device, storage medium, and program product
CN114022509B (en) Target tracking method based on monitoring video of multiple animals and related equipment
CN116052288A (en) Living body detection model training method, living body detection device and electronic equipment
CN112530554B (en) Scanning positioning method and device, storage medium and electronic equipment
CN112116608B (en) Guide wire segmentation method and device, electronic equipment and storage medium
CN115147469A (en) Registration method, device, equipment and storage medium
CN114120423A (en) Face image detection method and device, electronic equipment and computer readable medium
CN113936158A (en) Label matching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant