CN109961107B - Training method and device for target detection model, electronic equipment and storage medium - Google Patents

Training method and device for target detection model, electronic equipment and storage medium

Info

Publication number
CN109961107B
CN109961107B (application number CN201910315195.1A)
Authority
CN
China
Prior art keywords
loss function
classification network
detection model
network
target detection
Prior art date
Legal status
Active
Application number
CN201910315195.1A
Other languages
Chinese (zh)
Other versions
CN109961107A (en)
Inventor
李永波
李伯勋
俞刚
Current Assignee
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN201910315195.1A priority Critical patent/CN109961107B/en
Publication of CN109961107A publication Critical patent/CN109961107A/en
Application granted granted Critical
Publication of CN109961107B publication Critical patent/CN109961107B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The embodiment of the application provides a training method and device for a target detection model, an electronic device and a storage medium, wherein the target detection model comprises a first classification network, and the method comprises the following steps: setting at least one second classification network, wherein the input of the second classification network is the same as the input of the first classification network during training; and training the target detection model based on a total loss function until the total loss function converges, wherein the total loss function comprises the loss function of the target detection model and the loss function of the second classification network. Compared with existing training methods for target detection models, the scheme of the embodiment of the application adds a second classification network and trains the target detection model based on both the loss function of the second classification network and the loss function of the model itself. This effectively strengthens the model's learning of false detections and false alarms, thereby improving the detection precision of the target detection model.

Description

Training method and device of target detection model, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for training a target detection model, an electronic device, and a storage medium.
Background
The task of object detection is to find objects of interest in an image; for example, when the object is a face, face detection aims to detect the face and its corresponding location in the scene. Object detection is one of the important problems in the field of computer vision, and it has long-term research value and wide application requirements in fields such as security inspection and human-computer interaction.
In recent years, with the development of deep neural networks and hardware, target detection technology has advanced rapidly. In practical applications, however, it is often accompanied by a large number of false alarms, that is, some non-target areas are identified as target areas, which seriously hinders the adoption of the technology. Therefore, how to suppress false alarms in a target detection network and improve target detection accuracy is an important problem in the field.
Disclosure of Invention
The present application aims to solve at least one of the above technical drawbacks, in particular, the technical drawback of high false alarm rate in the target detection process.
In a first aspect, an embodiment of the present application provides a method for training a target detection model, where the target detection model includes a first classification network, and the method includes:
setting at least one second classification network, wherein the input of the second classification network is the same as the input of the first classification network during training;
and training the target detection model based on the total loss function until the total loss function is converged, wherein the total loss function comprises the loss function of the target detection model and the loss function of the second classification network.
In an alternative embodiment of the present application, the target detection model comprises a single stage detection network architecture.
In an alternative embodiment of the present application, the single-stage detection network structure comprises a RetinaNet network structure.
In an alternative embodiment of the present application, the second classification network comprises cascaded convolutional layers and fully-connected layers, wherein an input of the convolutional layers is connected to an output of a Backbone network of the RetinaNet network structure.
In an alternative embodiment of the present application, the loss function of the second classification network comprises at least one of a first loss function determined based on an output of the second classification network and a second loss function determined based on an output of the first classification network and an output of the second classification network.
In the embodiment of the present application, the first loss function is:
L_C1 = -(1-α)*p_1^γ*log(1-p_1)*(1-y) - α*(1-p_1)^γ*log(p_1)*y
wherein L_C1 represents the first loss function, α is a weighting factor, p_1 is the output result of the second classification network, y represents the sample label, and γ is an adjusting factor.
In an alternative embodiment of the present application, the second loss function is:
L_C2 = -(y*(1-M) + (1-y)*M) * (α*p_1^γ*log(p_1)*M - (1-α)*(1-p_1)^γ*log(1-p_1)*(1-M))
wherein L_C2 represents the second loss function, α is a weighting factor, p_1 is the output result of the second classification network, y represents the sample label, γ is an adjusting factor, and M is a target region result label whose value is determined as follows:
M = 1, if p ≥ th; M = 0, if p < th
where p represents the output result of the first classification network and th represents a preset threshold.
In an alternative embodiment of the present application, the loss function of the target detection model includes a loss function of the first classification network and a loss function of the target box regression network.
In a second aspect, an embodiment of the present application further provides an image detection method, where the method includes:
acquiring an image to be detected;
detecting an image to be detected through a target detection model, wherein the target detection model is obtained through training by a training method of the target detection model in the first aspect of the embodiment of the application;
and obtaining a detection result of the image to be detected based on the output of the target detection model.
In a third aspect, an embodiment of the present application provides a training apparatus for a target detection model, where the target detection model includes a first classification network, and the apparatus includes:
the training supervision network setting module is used for setting at least one second classification network, wherein the input of the second classification network is the same as the input of the first classification network during training;
and the model training module is used for training the target detection model based on the total loss function until the total loss function is converged, wherein the total loss function comprises the loss function of the target detection model and the loss function of the second classification network.
In an alternative embodiment of the present application, the target detection model comprises a single stage detection network architecture.
In an alternative embodiment of the present application, the single-stage detection network structure comprises a RetinaNet network structure.
In an alternative embodiment of the present application, the second classification network comprises a convolutional layer and a fully-connected layer in cascade, wherein an input of the convolutional layer is connected to an output of the Backbone network of the RetinaNet network structure.
In an alternative embodiment of the present application, the loss function of the second classification network comprises at least one of a first loss function determined based on an output of the second classification network and a second loss function determined based on an output of the first classification network and an output of the second classification network.
In an alternative embodiment of the present application, the first loss function is:
L_C1 = -(1-α)*p_1^γ*log(1-p_1)*(1-y) - α*(1-p_1)^γ*log(p_1)*y
wherein L_C1 represents the first loss function, α is a weighting factor, p_1 is the output result of the second classification network, y represents the sample label, and γ is an adjusting factor.
In the embodiment of the present application, the second loss function is:
L_C2 = -(y*(1-M) + (1-y)*M) * (α*p_1^γ*log(p_1)*M - (1-α)*(1-p_1)^γ*log(1-p_1)*(1-M))
wherein L_C2 represents the second loss function, α is a weighting factor, p_1 is the output result of the second classification network, y represents the sample label, γ is an adjusting factor, and M is a target region result label whose value is determined as follows:
M = 1, if p ≥ th; M = 0, if p < th
where p represents the output result of the first classification network and th represents a preset threshold.
In an alternative embodiment of the present application, the loss function of the target detection model includes a loss function of the first classification network and a loss function of the target box regression network.
In a fourth aspect, an embodiment of the present application further provides an image detection apparatus, including:
the image acquisition module is used for acquiring an image to be detected;
the image detection module is configured to detect an image to be detected through a target detection model, and obtain a detection result of the image to be detected based on output of the target detection model, where the target detection model is obtained by training through a training method of the target detection model in the first aspect of the embodiment of the present application.
In a fifth aspect, the present application provides an electronic device, comprising: a processor and a memory;
a memory for storing operating instructions;
and the processor is used for executing the method shown in any one of the first aspect and the second aspect of the application by calling the operation instruction.
In a sixth aspect, the present application provides a computer readable storage medium storing at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the method as set forth in any of the first or second aspects of the present application.
The beneficial effect that technical scheme that this application provided brought is:
compared with the existing training mode of the target detection model, the scheme of the embodiment of the application can effectively enhance the learning of the model on false detection and false alarm by adding the second classification network and training the target detection model based on the loss function of the second classification network and the loss function of the model when the target detection model is trained, thereby improving the detection precision of the target detection model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a method for training a target detection model according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a second classification network according to an embodiment of the present application;
fig. 3 is a schematic diagram of another second classification network provided in an embodiment of the present application;
fig. 4 is a schematic diagram of another second classification network provided in an embodiment of the present application;
fig. 5 is a schematic diagram of another second classification network provided in an embodiment of the present application;
fig. 6 is a schematic flowchart of an image detection method according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a training apparatus for a target detection model according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and are only for the purpose of explaining the present application and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
In order to make the objects, technical solutions and advantages of the present application clearer, the scheme of the embodiments of the present application is described by taking the detection of a face region in an image with an object detection model as an example; the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
An embodiment of the present application provides a training method for a target detection model, where the target detection model includes a first classification network, and as shown in fig. 1, the method may include:
step S110, at least one second classification network is set, wherein the input of the second classification network is the same as the input of the first classification network during training.
The target detection model is used for detecting a target region in an image. For example, it may be a face detection model for detecting a face region in the image, a human body detection model for detecting a human body region, or another object detection model. The output result of the classification network characterizes the class of the target region; for example, for a face detection model, the output may be the probability that a detected region in the input image is a face region.
And step S120, training the target detection model based on the total loss function until the total loss function is converged, wherein the total loss function comprises the loss function of the target detection model and the loss function of the second classification network.
The loss function is used to estimate the degree of inconsistency between the predicted results and the real results of the model. It is a non-negative real-valued function, and the smaller the loss function, the better the robustness of the model; the loss function is the core part of the empirical risk function and an important component of the structural risk function. Convergence of the loss function is a limit concept: generally, if the function value tends to a certain finite value as training proceeds, the loss function is said to converge.
It is understood that the loss function of the object detection model refers to the loss function portion of the model itself, the specific form of which is related to the structure of the object detection model. For example, if the target detection model includes a classification network, the loss function of the target detection model includes a loss function corresponding to the classification network, and if the target detection model further includes a regression network, the loss function of the target detection model includes a loss function corresponding to the classification network and a loss function corresponding to the regression network. The form of the loss function of each sub-network (e.g., classification network, regression network) of the model is also related to the structure of each sub-network.
In an embodiment of the present application, the loss function of the target detection model includes a loss function of the first classification network and a loss function of the target box regression network.
That is, in the embodiment of the present application, the target detection model may include a first classification network and a target box regression network, and the loss function of the target detection model includes a loss function of the first classification network and a loss function of the target box regression network. Accordingly, the total loss function may include a loss function of the first classification network, a loss function of the target box regression network, and a loss function of the second classification network.
In practical applications, the manner of determining whether a loss function (such as the total loss function) converges may be configured according to actual requirements. For example, during training, if the value of the total loss function approaches a finite value, the function may be considered converged. Generally, the smaller the total loss function the better, and its value will keep decreasing and stabilize as the number of training iterations increases. The convergence condition may, for example, be that the difference between the total loss values of two adjacent training iterations is smaller than a set threshold; when the training result meets this condition, the total loss function may be considered converged. Of course, other convergence conditions may be configured according to actual needs, or other ways of determining whether the function converges may be adopted.
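By way of illustration only, the threshold-based convergence condition described above can be sketched as a training loop; the names model, train_loader and total_loss_fn, as well as the SGD optimizer, are assumptions for the sketch and are not prescribed by the present application.

import torch

# Minimal sketch: stop training when the average total loss of two adjacent
# epochs differs by less than a set threshold eps (the convergence condition
# described above). `model`, `train_loader` and `total_loss_fn` are
# illustrative placeholders.
def train_until_converged(model, train_loader, total_loss_fn,
                          lr=1e-3, eps=1e-4, max_epochs=100):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    prev_loss = float("inf")
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = total_loss_fn(model, images, labels)  # total loss L
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        epoch_loss /= len(train_loader)
        if abs(prev_loss - epoch_loss) < eps:  # convergence condition met
            break
        prev_loss = epoch_loss
    return model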
In the embodiment of the application, when the target detection model is trained, at least one second classification network is additionally arranged, and the target detection model is trained based on the loss function of the second classification network and the loss function of the target detection model. The total loss function comprises the loss function of the target detection model and the loss function of the second classification network, and the loss function of the second classification network can play a role in supervision when the target detection model is trained.
In addition, the second classification network is arranged outside the network structure of the target detection model, so that the network structure of the original target detection model is not influenced, and the detection speed is not influenced because the network structure of the target detection model is not changed when the target detection model is subsequently used for target area detection.
In an alternative embodiment of the present application, the target detection model comprises a single stage detection network architecture.
Of course, the object detection model may also include a multi-stage detection network structure. The single-stage detection network structure, i.e., the one-stage detection network structure, may include, but is not limited to, a YOLO (You Only Look Once) structure, an SSD (Single Shot MultiBox Detector) network structure, or a RetinaNet network structure, for example.
In an alternative embodiment of the present application, the single-stage detection network structure comprises a RetinaNet network structure.
The RetinaNet network structure is a network for detecting targets in an image. It is a single network consisting of a Backbone network and two task-specific sub-networks: the Backbone network computes convolutional features over the whole image, the first sub-network performs an image classification task on the output of the Backbone network (i.e., it is the first classification network), and the second sub-network performs bounding-box regression (i.e., it is the regression network). The first classification network may include cascaded convolutional layers and fully-connected layers. The loss function L_C corresponding to the RetinaNet network structure includes the loss function L_cls of the first classification network and the loss function L_bb of the regression network, i.e., L_C = L_bb + L_cls.
In an optional embodiment of the present application, the second classification network includes a convolutional layer and a fully-connected layer in cascade, where the input of the convolutional layer is connected to the output of the Backbone network of the RetinaNet network structure; the Backbone network is pre-trained in the form of a classifier and serves to extract features from the image.
In practical applications, the second classification network may be designed with reference to the first classification network. Its structural form may be the same as or different from that of the first classification network; when the structures are the same, the network parameters of the two networks may still differ. As an example, if the single-stage detection network structure is a RetinaNet network structure whose first classification network includes cascaded convolutional layers and fully-connected layers, the second classification network may also include cascaded convolutional layers and fully-connected layers, with the input of its convolutional layer connected to the output of the Backbone network of the RetinaNet network structure.
In an example, the RetinaNet network structure is described taking it as the target detection model. As shown in fig. 2, an embodiment of the present application provides a schematic diagram of a RetinaNet network structure together with a second classification network. The network branch labeled Branch-c1 in the figure represents the second classification network, which is composed of a cascaded convolutional layer Conv3 and fully-connected layer FC3; the part inside the dotted frame is the RetinaNet network structure, which comprises the two network branches Branch-c and Branch-b. The branch labeled Branch-c represents the first classification network, composed of the cascaded convolutional layer Conv1 and a fully-connected layer, and the branch labeled Branch-b is the regression network, composed in this example of the cascaded convolutional layer Conv2 and fully-connected layer FC2. In this example, the output of the Branch-b branch may be the coordinates of the target region, and the outputs of the Branch-c and Branch-c1 branches may be the probability that a detected region is a target region.
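For illustration, a minimal PyTorch sketch of the Branch-c1 branch described above follows; the channel sizes and the global pooling step before FC3 are assumptions not fixed by this example.

import torch
import torch.nn as nn

# Sketch of the auxiliary branch Branch-c1 (Conv3 + FC3) attached to the
# Backbone output, as in fig. 2. Channel sizes and the pooling step are
# illustrative assumptions.
class SecondClassificationBranch(nn.Module):
    def __init__(self, in_channels=256, hidden_channels=256):
        super().__init__()
        self.conv3 = nn.Conv2d(in_channels, hidden_channels,
                               kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)       # collapse spatial dimensions
        self.fc3 = nn.Linear(hidden_channels, 1)  # one logit: target vs. background

    def forward(self, backbone_features):
        x = torch.relu(self.conv3(backbone_features))
        x = self.pool(x).flatten(1)
        return torch.sigmoid(self.fc3(x)).squeeze(1)  # p_1 in [0, 1]

Because this branch exists only to supply a supervisory loss during training, it can simply be dropped at inference time, which is consistent with the point above that the original network structure and detection speed are unaffected.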
In an optional embodiment of the present application, the loss function of the second classification network comprises at least one of a first loss function determined based on an output of the second classification network and a second loss function determined based on an output of the first classification network and an output of the second classification network.
That is, in practical applications, the loss function of the second classification network may take different forms: it may include the first loss function determined based on the output of the second classification network, or the second loss function determined based on the output of the first classification network and the output of the second classification network, or both of these loss functions.
In an alternative embodiment of the present application, the first loss function is:
L_C1 = -(1-α)*p_1^γ*log(1-p_1)*(1-y) - α*(1-p_1)^γ*log(p_1)*y
wherein L_C1 represents the first loss function, α is a weighting factor, p_1 is the output result of the second classification network, y is the sample label, and γ is an adjusting factor.
The sample label indicates whether a target region exists in the sample image. For example, if the model is used to detect whether a face region exists in an image, y may be set to 1 when the sample image contains a face, and to 0 when it does not (that is, when the sample image is a background image).
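As a sketch, the first loss function above can be written directly in PyTorch; the default values of α and γ and the numerical clamp are assumptions added here, not values given in this application.

import torch

# Sketch of the first loss function L_C1 on the second classification
# network's output p_1. The alpha/gamma defaults and the eps clamp are
# illustrative additions for numerical stability.
def first_loss(p1, y, alpha=0.25, gamma=2.0, eps=1e-7):
    p1 = p1.clamp(eps, 1.0 - eps)
    neg_term = -(1 - alpha) * p1.pow(gamma) * torch.log(1 - p1) * (1 - y)
    pos_term = -alpha * (1 - p1).pow(gamma) * torch.log(p1) * y
    return (neg_term + pos_term).mean()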
In an alternative embodiment of the present application, the second loss function is:
L_C2 = -(y*(1-M) + (1-y)*M) * (α*p_1^γ*log(p_1)*M - (1-α)*(1-p_1)^γ*log(1-p_1)*(1-M))
wherein L_C2 represents the second loss function, α is a weighting factor, p_1 is the output result of the second classification network, y is the sample label, γ is an adjusting factor, and M is a target region result label whose value is determined as follows:
M = 1, if p ≥ th; M = 0, if p < th
where p represents the output result of the first classification network and th represents a preset threshold.
That is, if p ≥ th then M = 1 and the result is considered positive, i.e., a target region is identified; if p < th then M = 0, i.e., no target region is identified.
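Correspondingly, a sketch of the second loss function follows; as in the previous sketch, the th, α and γ defaults and the numerical clamp are illustrative assumptions.

import torch

# Sketch of the second loss function L_C2 with the mask M derived from the
# first classification network's output p. The weighting term
# y*(1-M) + (1-y)*M is nonzero exactly where label and first-network
# prediction disagree (missed targets and false alarms).
def second_loss(p, p1, y, th=0.5, alpha=0.25, gamma=2.0, eps=1e-7):
    p1 = p1.clamp(eps, 1.0 - eps)
    M = (p >= th).float()                    # M = 1 if p >= th, else 0
    weight = y * (1 - M) + (1 - y) * M
    inner = (alpha * p1.pow(gamma) * torch.log(p1) * M
             - (1 - alpha) * (1 - p1).pow(gamma) * torch.log(1 - p1) * (1 - M))
    return (-weight * inner).mean()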
In the following, the specific alternatives for the loss function of the second classification network are described in detail, taking the RetinaNet network structure as the target detection model and illustrating the total loss function with specific examples.
1. The loss function of the second classification network comprises the first loss function L_C1 determined based on the output of the second classification network. In this case, the total loss function L includes the loss function L_C corresponding to the RetinaNet network structure (L_C = L_bb + L_cls) and the first loss function L_C1, e.g., L = L_C1 + L_C.
As an example, as shown in fig. 3, when training the RetinaNet network structure, after a sample image is input to the Backbone network, the output result of the first classification network in the RetinaNet network structure is the probability p, the output result of the regression network is denoted Box, and the output result of the second classification network is the probability p_1. In this example, the total loss function can be expressed as:
L = L_C + L_C1 = L_bb + L_cls + L_C1
wherein:
L_cls = -(1-α)*p^γ*log(1-p)*(1-y) - α*(1-p)^γ*log(p)*y
L_C1 = -(1-α)*p_1^γ*log(1-p_1)*(1-y) - α*(1-p_1)^γ*log(p_1)*y
2. The loss function of the second classification network comprises the second loss function L_C2 determined based on the output of the first classification network and the output of the second classification network. In this case, the total loss function L includes the loss function L_C corresponding to the RetinaNet network structure (L_C = L_bb + L_cls) and the second loss function L_C2, e.g., L = L_C2 + L_C.
As an example, as shown in fig. 4, when training the RetinaNet network structure, after a sample image is input to the Backbone network, the output result of the first classification network in the RetinaNet network structure is the probability p, the output result of the regression network is denoted Box, and the output result of the second classification network is the probability p_1. In this example, the total loss function may be expressed as:
L = L_C2 + L_C = L_bb + L_cls + L_C2
L_C2 = -(y*(1-M) + (1-y)*M) * (α*p_1^γ*log(p_1)*M - (1-α)*(1-p_1)^γ*log(1-p_1)*(1-M))
where the value of M is determined as follows:
M = 1, if p ≥ th; M = 0, if p < th
The specific form of L_cls is the same as in the above example; for details, reference may be made to the above embodiment, which is not repeated here.
3. The loss function of the second classification network comprises both the first loss function L_C1 determined based on the output of the second classification network and the second loss function L_C2 determined based on the output of the first classification network and the output of the second classification network. In this case, the total loss function L includes the loss function L_C corresponding to the RetinaNet network structure (L_C = L_bb + L_cls), the first loss function L_C1, and the second loss function L_C2, e.g., L = L_C2 + L_C + L_C1.
As an example, as shown in fig. 5, when training the RetinaNet network structure, after a sample image is input to the Backbone network, the output result of the first classification network in the RetinaNet network structure is the probability p, the output result of the regression network is denoted Box, and the output result of the second classification network is the probability p_1. In this example, the total loss function can be expressed as:
L = L_C2 + L_C + L_C1 = L_bb + L_cls + L_C2 + L_C1
The specific forms of L_cls, L_C2 and L_C1 are the same as in the above examples; for details, reference may be made to the above embodiments, which are not repeated here.
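For illustration, the third variant can be assembled from the sketches above; the smooth-L1 box term is an assumed stand-in for L_bb (whose exact form is not specified here), and L_cls reuses the same focal-style form as L_C1 with p in place of p_1, as in the first example.

import torch.nn.functional as F

# Sketch of total-loss variant 3: L = L_bb + L_cls + L_C2 + L_C1, reusing the
# first_loss/second_loss sketches above. The smooth-L1 regression term is an
# illustrative assumption.
def total_loss(box_pred, box_target, p, p1, y,
               alpha=0.25, gamma=2.0, th=0.5):
    l_bb = F.smooth_l1_loss(box_pred, box_target)   # regression branch Branch-b
    l_cls = first_loss(p, y, alpha, gamma)          # L_cls: same form, with p
    l_c1 = first_loss(p1, y, alpha, gamma)          # L_C1 on Branch-c1
    l_c2 = second_loss(p, p1, y, th, alpha, gamma)  # L_C2 joint term
    return l_bb + l_cls + l_c1 + l_c2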
Based on the target detection model provided by the embodiments of the present application, as shown in fig. 6, an embodiment of the present application further provides an image detection method, including:
step S610, acquiring an image to be detected;
step S620, detecting the image to be detected through the target detection model;
the target detection model is the target detection model trained by the training method of the target detection model in the above embodiment, and the specific implementation manner of the training method may refer to the description of the training method of the target detection model in the above embodiment, which is not described herein again, for example, the target detection model may be a trained RetinaNet network structure.
And step S630, obtaining a detection result of the image to be detected based on the output of the target detection model.
The target region can be set according to actual needs and is not limited to a face region; it may also be, for example, a human body region.
That is to say, when image detection is performed, an image to be detected can be input into the trained target detection model, the target detection model can output a result, and a detection result of the image to be detected can be obtained based on the output result of the target detection model.
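A minimal inference sketch of steps S610 to S630 follows; the preprocessing and the (boxes, scores) output format of the trained model are assumptions for illustration, not requirements of the present application.

import torch
import torchvision.transforms.functional as TF
from PIL import Image

# Sketch: acquire an image (S610), run the trained detection model (S620),
# and keep detections above a score threshold as the result (S630).
def detect(model, image_path, score_threshold=0.5):
    image = Image.open(image_path).convert("RGB")   # step S610
    x = TF.to_tensor(image).unsqueeze(0)
    model.eval()
    with torch.no_grad():
        boxes, scores = model(x)                    # step S620
    keep = scores >= score_threshold                # step S630
    return boxes[keep], scores[keep]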
Based on the same principle as the method shown in fig. 1, an embodiment of the present application further provides a training apparatus 70 for a target detection model, where the target detection model includes a first classification network. As shown in fig. 7, the training apparatus 70 may include a training supervision network setting module 710 and a model training module 720, wherein:
a training supervision network setting module 710, configured to set at least one second classification network, where an input of the second classification network is the same as an input of the first classification network during training;
and a model training module 720, configured to train the target detection model based on the total loss function until the total loss function converges, where the total loss function includes a loss function of the target detection model and a loss function of the second classification network.
In an embodiment of the application, the target detection model includes a single-stage detection network structure.
In the embodiment of the application, the single-stage detection network structure comprises a RetinaNet network structure.
In the embodiment of the present application, the second classification network includes a convolutional layer and a fully connected layer, which are cascaded, where an input of the convolutional layer is connected to an output of the backhaul network of the RetinaNet network structure.
In an embodiment of the application, the loss function of the second classification network comprises at least one of a first loss function determined based on an output of the second classification network and a second loss function determined based on an output of the first classification network and an output of the second classification network.
In the embodiment of the present application, the first loss function is:
L_C1 = -(1-α)*p_1^γ*log(1-p_1)*(1-y) - α*(1-p_1)^γ*log(p_1)*y
wherein L_C1 represents the first loss function, α is a weighting factor, p_1 is the output result of the second classification network, y represents the sample label, and γ is an adjusting factor.
In the embodiment of the present application, the second loss function is:
L_C2 = -(y*(1-M) + (1-y)*M) * (α*p_1^γ*log(p_1)*M - (1-α)*(1-p_1)^γ*log(1-p_1)*(1-M))
wherein L_C2 represents the second loss function, α is a weighting factor, p_1 is the output result of the second classification network, y represents the sample label, γ is an adjusting factor, and M is a target region result label whose value is determined as follows:
M = 1, if p ≥ th; M = 0, if p < th
where p represents the output result of the first classification network and th represents a preset threshold.
In an embodiment of the present application, the loss function of the target detection model includes a loss function of the first classification network and a loss function of the target frame regression network.
The training apparatus of the target detection model in the embodiments of the present application may execute the training method of the target detection model provided in the embodiments of the present application, and its implementation principle is similar. The actions performed by each module of the training apparatus correspond to the steps of the training method in the embodiments of the present application; for a detailed functional description of each module, reference may be made to the description of the corresponding training method above, which is not repeated here.
Based on the same principle as the method shown in fig. 6, an embodiment of the present application further provides an image detection apparatus 80, and as shown in fig. 8, the image detection apparatus 80 may include: an image acquisition module 810 and an image detection module 820, wherein:
an image obtaining module 810, configured to obtain an image to be detected;
the image detection module 820 is used for detecting the image to be detected through the target detection model and obtaining a detection result of the image to be detected based on the output of the target detection model; the target detection model is obtained by training through the training method of the target detection model in the embodiment.
The image detection apparatus of the embodiments of the present application may execute the image detection method provided in the embodiments of the present application, and its implementation principle is similar. The actions performed by each module of the image detection apparatus correspond to the steps of the image detection method in the embodiments of the present application; for a detailed functional description of each module, reference may be made to the description of the corresponding image detection method above, which is not repeated here.
Embodiments of the present application also provide an electronic device, which may include but is not limited to: a processor and a memory; a memory for storing computer operating instructions; and the processor is used for executing the method shown in the embodiment by calling the computer operation instruction.
Yet another embodiment of the present application provides a computer-readable storage medium storing at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the respective contents of the aforementioned method embodiments.
In an alternative embodiment, an electronic device is provided, as shown in fig. 9, the electronic device 4000 shown in fig. 9 comprising: a processor 4001 and a memory 4003. Processor 4001 is coupled to memory 4003, such as via bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004. It should be noted that the transceiver 4004 is not limited to one in practical applications, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The Processor 4001 may be a CPU (Central Processing Unit), a general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein. The processor 4001 may also be a combination that performs a computing function, e.g., comprising one or more microprocessors, a combination of DSPs and microprocessors, etc.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 9, but this does not indicate only one bus or one type of bus.
The Memory 4003 may be a ROM (Read Only Memory) or other types of static storage devices that can store static information and instructions, a RAM (Random Access Memory) or other types of dynamic storage devices that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic Disc storage medium or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
The memory 4003 is used for storing application codes for executing the scheme of the present application, and the execution is controlled by the processor 4001. Processor 4001 is configured to execute application code stored in memory 4003 to implement what is shown in any of the foregoing method embodiments.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order; unless explicitly stated herein, there is no strict ordering restriction, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and not necessarily in sequence; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
The foregoing is only a partial embodiment of the present invention, and it should be noted that, for those skilled in the art, various improvements and refinements can be made without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (12)

1. A method for training an object detection model, wherein the object detection model is used for detecting an object region in an image, and the object detection model comprises a first classification network, and the method comprises:
setting at least one second classification network, wherein the input of the second classification network is the same as the input of the first classification network during training;
training the target detection model based on a total loss function until the total loss function converges, wherein the total loss function comprises a loss function of the target detection model and a loss function of the second classification network, and the loss function of the second classification network comprises a second loss function determined based on an output of the first classification network and an output of the second classification network;
wherein the second loss function is:
L_C2 = -(y*(1-M) + (1-y)*M) * (α*p_1^γ*log(p_1)*M - (1-α)*(1-p_1)^γ*log(1-p_1)*(1-M))
wherein L_C2 represents the second loss function, α is a weighting factor, p_1 is the output result of the second classification network, y represents a sample label, γ is an adjusting factor, and M is a target region result label whose value is determined as follows:
M = 1, if p ≥ th; M = 0, if p < th
where p represents the output result of the first classification network and th represents a preset threshold.
2. The method of claim 1, wherein the object detection model comprises a single stage detection network architecture.
3. The method of claim 2, wherein the single stage detection network structure comprises a RetinaNet network structure.
4. The method of claim 3, wherein the second classification network comprises cascaded convolutional layers and fully-connected layers, wherein inputs of the convolutional layers are connected to outputs of a Backbone Backbone network of the RetinaNet network structure.
5. The method of any of claims 1 to 4, wherein the loss function of the second classification network further comprises a first loss function determined based on an output of the second classification network.
6. The method of claim 5, wherein the first loss function is:
L_C1 = -(1-α)*p_1^γ*log(1-p_1)*(1-y) - α*(1-p_1)^γ*log(p_1)*y
wherein L_C1 represents the first loss function, α is a weighting factor, p_1 is the output result of the second classification network, y represents a sample label, and γ is an adjusting factor.
7. The method of claim 1, wherein the loss function of the target detection model comprises a loss function of the first classification network and a loss function of a target box regression network.
8. An image detection method, characterized in that the method comprises:
acquiring an image to be detected;
detecting the image to be detected through the target detection model, wherein the target detection model is obtained by training through the method of any one of claims 1 to 7;
and obtaining a detection result of the image to be detected based on the output of the target detection model.
9. An apparatus for training an object detection model, the object detection model being used for detecting an object region in an image, the object detection model comprising a first classification network, the apparatus comprising:
the training supervision network setting module is used for setting at least one second classification network, wherein the input of the second classification network is the same as the input of the first classification network during training;
a model training module, configured to train the target detection model based on a total loss function until the total loss function converges, where the total loss function includes a loss function of the target detection model and a loss function of the second classification network, and the loss function of the second classification network includes a second loss function determined based on an output of the first classification network and an output of the second classification network;
wherein the second loss function is:
L_C2 = -(y*(1-M) + (1-y)*M) * (α*p_1^γ*log(p_1)*M - (1-α)*(1-p_1)^γ*log(1-p_1)*(1-M))
wherein L_C2 represents the second loss function, α is a weighting factor, p_1 is the output result of the second classification network, y represents a sample label, γ is an adjusting factor, and M is a target region result label whose value is determined as follows:
M = 1, if p ≥ th; M = 0, if p < th
where p represents the output result of the first classification network and th represents a preset threshold.
10. An image detection apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring an image to be detected;
an image detection module, configured to detect the image to be detected through a target detection model, and obtain a detection result of the image to be detected based on an output of the target detection model, where the target detection model is obtained by training according to the method of any one of claims 1 to 7.
11. An electronic device, characterized in that the electronic device comprises: a processor and a memory;
the memory is used for storing operation instructions;
the processor is used for executing the method of any one of claims 1 to 8 by calling the operation instruction.
12. A computer readable storage medium, characterized in that it stores at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the method according to any one of claims 1 to 8.
CN201910315195.1A 2019-04-18 2019-04-18 Training method and device for target detection model, electronic equipment and storage medium Active CN109961107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910315195.1A CN109961107B (en) 2019-04-18 2019-04-18 Training method and device for target detection model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910315195.1A CN109961107B (en) 2019-04-18 2019-04-18 Training method and device for target detection model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109961107A CN109961107A (en) 2019-07-02
CN109961107B true CN109961107B (en) 2022-07-19

Family

ID=67026354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910315195.1A Active CN109961107B (en) 2019-04-18 2019-04-18 Training method and device for target detection model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109961107B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110533640B (en) * 2019-08-15 2022-03-01 北京交通大学 Improved YOLOv3 network model-based track line defect identification method
CN110991312A (en) * 2019-11-28 2020-04-10 重庆中星微人工智能芯片技术有限公司 Method, apparatus, electronic device, and medium for generating detection information
CN113139559B (en) * 2020-01-17 2022-06-24 魔门塔(苏州)科技有限公司 Training method of target detection model, and data labeling method and device
CN111768005B (en) * 2020-06-19 2024-02-20 北京康夫子健康技术有限公司 Training method and device for lightweight detection model, electronic equipment and storage medium
CN111950411B (en) * 2020-07-31 2021-12-28 上海商汤智能科技有限公司 Model determination method and related device
CN112085096A (en) * 2020-09-09 2020-12-15 华东师范大学 Method for detecting local abnormal heating of object based on transfer learning
CN112308150B (en) * 2020-11-02 2022-04-15 平安科技(深圳)有限公司 Target detection model training method and device, computer equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520229A (en) * 2018-04-04 2018-09-11 北京旷视科技有限公司 Image detecting method, device, electronic equipment and computer-readable medium
CN108694401A (en) * 2018-05-09 2018-10-23 北京旷视科技有限公司 Object detection method, apparatus and system
CN108875521A (en) * 2017-12-20 2018-11-23 北京旷视科技有限公司 Method for detecting human face, device, system and storage medium
CN109102024A (en) * 2018-08-14 2018-12-28 中山大学 A kind of Layer semantics incorporation model finely identified for object and its implementation
CN109360198A (en) * 2018-10-08 2019-02-19 北京羽医甘蓝信息技术有限公司 Bone marrow cell sorting method and sorter based on deep learning
CN109472214A (en) * 2018-10-17 2019-03-15 福州大学 One kind is taken photo by plane foreign matter image real-time detection method based on deep learning
CN109614985A (en) * 2018-11-06 2019-04-12 华南理工大学 A kind of object detection method based on intensive connection features pyramid network
CN109614968A (en) * 2018-10-10 2019-04-12 浙江大学 A kind of car plate detection scene picture generation method based on multiple dimensioned mixed image stylization

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6316298B2 (en) * 2012-09-10 2018-04-25 オレゴン ヘルス アンド サイエンス ユニバーシティ Quantification of local circulation by OCT angiography
US10115039B2 (en) * 2016-03-10 2018-10-30 Siemens Healthcare Gmbh Method and system for machine learning based classification of vascular branches
CN106803071B (en) * 2016-12-29 2020-02-14 浙江大华技术股份有限公司 Method and device for detecting object in image
CN107145857B (en) * 2017-04-29 2021-05-04 深圳市深网视界科技有限公司 Face attribute recognition method and device and model establishment method
CN108460341B (en) * 2018-02-05 2020-04-07 西安电子科技大学 Optical remote sensing image target detection method based on integrated depth convolution network

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875521A (en) * 2017-12-20 2018-11-23 北京旷视科技有限公司 Method for detecting human face, device, system and storage medium
CN108520229A (en) * 2018-04-04 2018-09-11 北京旷视科技有限公司 Image detecting method, device, electronic equipment and computer-readable medium
CN108694401A (en) * 2018-05-09 2018-10-23 北京旷视科技有限公司 Object detection method, apparatus and system
CN109102024A (en) * 2018-08-14 2018-12-28 中山大学 A kind of Layer semantics incorporation model finely identified for object and its implementation
CN109360198A (en) * 2018-10-08 2019-02-19 北京羽医甘蓝信息技术有限公司 Bone marrow cell sorting method and sorter based on deep learning
CN109614968A (en) * 2018-10-10 2019-04-12 浙江大学 A kind of car plate detection scene picture generation method based on multiple dimensioned mixed image stylization
CN109472214A (en) * 2018-10-17 2019-03-15 福州大学 One kind is taken photo by plane foreign matter image real-time detection method based on deep learning
CN109614985A (en) * 2018-11-06 2019-04-12 华南理工大学 A kind of object detection method based on intensive connection features pyramid network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Automatic Ship Detection Based on RetinaNet Using Multi-Resolution Gaofen-3 Imagery; Yuanyuan Wang et al.; Remote Sensing; 2019-03-05; pp. 1-14 *
Focal Loss for Dense Object Detection; Tsung-Yi Lin et al.; International Conference on Computer Vision (ICCV); 2017-10-29; pp. 2980-2988 *
Learning Better Features for Face Detection with Feature Fusion and Segmentation Supervision; Wanxin Tian et al.; https://arxiv.org/abs/1811.08557v1; 2018-11-20; pp. 1-9 *
How to better understand Kaiming He's "Focal Loss"?; Su Jianlin; https://zhuanlan.zhihu.com/p/32423092; 2017-10-28; pp. 1-5 *
Research on Multi-Object Detection and Classification in Traffic Scenes Based on Deep Learning; He Pengpeng; China Master's Theses Full-text Database, Information Science and Technology; 2019-01-15; pp. I138-2735 *

Also Published As

Publication number Publication date
CN109961107A (en) 2019-07-02

Similar Documents

Publication Publication Date Title
CN109961107B (en) Training method and device for target detection model, electronic equipment and storage medium
CN109740534B (en) Image processing method, device and processing equipment
CN107358157B (en) Face living body detection method and device and electronic equipment
US10452893B2 (en) Method, terminal, and storage medium for tracking facial critical area
CN107545263B (en) Object detection method and device
CN109492674B (en) Generation method and device of SSD (solid State disk) framework for target detection
Nguyen et al. Yolo based real-time human detection for smart video surveillance at the edge
CN108229658B (en) Method and device for realizing object detector based on limited samples
CN111985458A (en) Method for detecting multiple targets, electronic equipment and storage medium
CN113963333B (en) Traffic sign board detection method based on improved YOLOF model
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN117173182B (en) Defect detection method, system, equipment and medium based on coding and decoding network
CN113837257A (en) Target detection method and device
WO2024011859A1 (en) Neural network-based face detection method and device
CN111353577B (en) Multi-task-based cascade combination model optimization method and device and terminal equipment
CN111027716A (en) Load prediction method and device
WO2020052170A1 (en) Target object identification method and device, and storage medium
CN113255671B (en) Target detection method, system, device and medium for object with large length-width ratio
Kaur et al. Deep transfer learning based multiway feature pyramid network for object detection in images
CN114913588A (en) Face image restoration and recognition method applied to complex scene
Liu et al. Research on Small Target Pedestrian Detection Algorithm Based on Improved YOLOv3
CN116913259B (en) Voice recognition countermeasure method and device combined with gradient guidance
CN113569727B (en) Method, system, terminal and medium for identifying construction site in remote sensing image
EP4332892A1 (en) Estimating volumes of liquid
Cheng et al. Research on Target Recognition Algorithm Based on Improved Faster-RCNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Training methods, devices, electronic devices, and storage media for object detection models

Effective date of registration: 20230404

Granted publication date: 20220719

Pledgee: Shanghai Yunxin Venture Capital Co.,Ltd.

Pledgor: MEGVII (BEIJING) TECHNOLOGY Co.,Ltd.

Registration number: Y2023990000192