CN111881764B - Target detection method and device, electronic equipment and storage medium - Google Patents

Target detection method and device, electronic equipment and storage medium

Info

Publication number
CN111881764B
Authority
CN
China
Prior art keywords
target
task
loss
loss function
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010626321.8A
Other languages
Chinese (zh)
Other versions
CN111881764A (en)
Inventor
李一力
张�浩
邵新庆
刘强
徐�明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Original Assignee
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen ZNV Technology Co Ltd, Nanjing ZNV Software Co Ltd filed Critical Shenzhen ZNV Technology Co Ltd
Priority to CN202010626321.8A priority Critical patent/CN111881764B/en
Publication of CN111881764A publication Critical patent/CN111881764A/en
Application granted granted Critical
Publication of CN111881764B publication Critical patent/CN111881764B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Abstract

The application discloses a target detection method and device, electronic equipment and a storage medium. The target detection method comprises: obtaining a target sample to be detected; and inputting the target sample to be detected into a pre-trained deep learning model to obtain the target to be detected in the sample. The pre-trained deep learning model comprises a target classification task and a target position regression task, and the loss weight of the loss function of the target classification task and the loss weight of the loss function of the target position regression task are obtained through dynamic adjustment during training of the deep learning model. Because these loss weights are dynamically adjusted during training, more appropriate loss weights for the target classification task and the target position regression task can be set for different target detection tasks, or for different stages of the same target detection process.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer vision, and in particular, to a target detection method, apparatus, electronic device, and storage medium.
Background
The task of target detection is to judge whether a target to be detected exists in a picture and, if it exists, to output the position of the target. Accordingly, in the training stage of a target detection model, two kinds of loss functions measure how well the model judges whether a target exists in the picture and how well it regresses the target's position. These two tasks are called the target classification task and the target position regression task, and the corresponding loss functions are the classification loss function and the regression loss function, respectively.
In existing approaches, the weights of the classification and regression loss functions are fixed throughout model training and are often set empirically, so they may not be appropriate for a specific target detection task. Moreover, even for the same target detection task, the early stage and the late stage of model training place different emphasis on the classification task and the position regression task.
Disclosure of Invention
The application aims to provide a target detection method and device, electronic equipment and a storage medium that can set more appropriate loss weights for the target classification task and the target position regression task in different target detection tasks, or in different stages of the same target detection process.
According to a first aspect, in one embodiment, there is provided a target detection method, including:
obtaining a target sample to be detected;
inputting the target sample to be detected into a pre-trained deep learning model to obtain a target to be detected in the target sample to be detected; the pre-trained deep learning model comprises a target classification task and a target position regression task, wherein the loss weight of the loss function of the target classification task and the loss weight of the loss function of the target position regression task are obtained through dynamic adjustment in the training process of the deep learning model.
Further, the loss weight of the loss function of the target classification task is obtained by dynamic adjustment in the training process of the deep learning model, and the method comprises the following steps:
acquiring a training target sample set;
constructing a deep learning model, and inputting the training target sample set into the deep learning model;
acquiring the accuracy of the training target sample set in the current target classification task in the deep learning model training process;
and calculating the loss weight of the loss function of the current target classification task according to the accuracy of the training target sample set in the current target classification task.
Further, the loss weight of the loss function of the target position regression task is obtained by dynamic adjustment in the training process of the deep learning model, and the method comprises the following steps:
acquiring the intersection ratio of the training target sample set in the current target position regression task in the deep learning model training process; the intersection ratio is the ratio of the intersection of the target real position area and the target predicted position area to the union of the target real position area and the target predicted position area;
and calculating the loss weight of the loss function of the current target position regression task according to the intersection ratio of the training target sample set in the current target position regression task.
Further, the calculating the loss weight of the loss function of the current target classification task according to the accuracy rate of the training target sample set in the current target classification task comprises:
the loss weight of the loss function of the current target classification task is calculated according to the following formula:
FL1 = -(1 - A)^γ · log(A)
wherein FL1 is the loss weight of the loss function of the current target classification task, A is the accuracy of the training target sample set in the current target classification task, and γ is a constant.
Further, the calculating the loss weight of the loss function of the current target position regression task according to the intersection ratio of the training target sample set in the current target position regression task comprises:
the loss weight of the loss function of the current target position regression task is calculated according to the following formula:
FL2 = -(1 - IoU)^γ · log(IoU)
wherein FL2 is the loss weight of the loss function of the current target position regression task, IoU is the intersection ratio of the training target sample set in the current target position regression task, and γ is a constant.
Further, the loss function of the target position regression task is a loss function for the target position regression task in the SSD algorithm, and the loss function of the target classification task is a loss function for the target classification task in the SSD algorithm.
Further, the target sample is a face picture, and the target to be detected is a face.
According to a second aspect, there is provided in one embodiment an object detection apparatus comprising:
the sample acquisition module is used for acquiring a target sample to be detected and a training target sample set;
the target detection module is used for inputting the target sample to be detected into a pre-trained deep learning model to obtain a target to be detected in the target sample to be detected; the pre-trained deep learning model comprises a target classification task and a target position regression task, wherein the loss weight of the loss function of the target classification task and the loss weight of the loss function of the target position regression task are obtained through dynamic adjustment in the training process of the deep learning model.
According to a third aspect, an embodiment provides an electronic device, including:
a memory for storing a program;
and a processor, configured to implement the method according to the above embodiment by executing the program stored in the memory.
According to a fourth aspect, an embodiment provides a computer readable storage medium comprising a program executable by a processor to implement a method as described in the above embodiments.
According to the target detection method, the device, the electronic equipment and the storage medium, the loss weights of the loss functions of the target classification tasks and the loss weights of the loss functions of the target position regression tasks are dynamically adjusted in the training process of the deep learning model, so that more proper loss weights of the loss functions of the target classification tasks and the target position regression tasks can be set in different target detection or different stages of the same target detection process.
Drawings
FIG. 1 is a flow chart of a target detection method of the present application;
FIG. 2 is a flow chart of a method of object detection according to one embodiment;
FIG. 3 is a schematic diagram of the intersection and union of a real frame and a predicted frame of a target, wherein (a) is a schematic diagram of the intersection of a real frame and a predicted frame, and (b) is a schematic diagram of the union of a real frame and a predicted frame;
FIG. 4 is a block diagram of an object detection device according to an embodiment;
fig. 5 is a block diagram of an electronic device of an embodiment.
Detailed Description
The application will be described in further detail below with reference to the drawings by means of specific embodiments, wherein like elements in different embodiments are given like associated numbers. In the following embodiments, numerous specific details are set forth to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of these features may be omitted, or replaced by other elements, materials, or methods, in different situations. In some instances, operations related to the application are not shown or described in the specification in order to avoid obscuring the core of the application; a detailed description of such operations may also be unnecessary for persons skilled in the art, who can fully understand them from the description and their general knowledge.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
The numbering of the components itself, e.g. "first", "second", etc., is used herein merely to distinguish between the described objects and does not have any sequential or technical meaning. The term "coupled" as used herein includes both direct and indirect coupling (coupling), unless otherwise indicated.
In the embodiment of the application, in the process of training the deep learning model for detecting the target, the loss weight of the loss function of the target classification task and the loss weight of the loss function of the target position regression task are dynamically adjusted, so that the loss weights of the loss functions of the target classification task and the target position regression task can be set more appropriately, and the result of the trained deep learning model for detecting the target is more accurate.
Referring to fig. 1, fig. 1 is a flowchart of a target detection method, which includes steps S10 to S20.
Step S10, a target sample to be detected is obtained.
Step S20, inputting a target sample to be detected into a pre-trained deep learning model to obtain a target to be detected in the target sample to be detected; the pre-trained deep learning model comprises a target classification task and a target position regression task, wherein the loss weight of the loss function of the target classification task and the loss weight of the loss function of the target position regression task are obtained through dynamic adjustment in the training process of the deep learning model.
Referring to fig. 2, fig. 2 is a flowchart of an object detection method according to an embodiment, in which a face picture is taken as an example, the method is described, and the method includes steps S101 to S105.
Step S101, a training target sample set is acquired. In this embodiment, the training sample set consists of pictures with face labels, where a face label identifies a face and its position in the picture, and the position of the face in the picture is represented by a real frame containing the face. The training face set may be obtained by methods commonly used in the prior art, for example, by collecting face images from videos captured by surveillance cameras.
Step S102, a deep learning model is built, and the loss functions of the target classification task and the target position regression task in the deep learning model are set. The face pictures in the training target sample set are taken as input, the face features corresponding to the face pictures are taken as labels, and machine learning is performed to train a model function describing the correspondence between face pictures and face features; this model function is the deep learning model. During training, the deep learning model detects the target by continuously adjusting the position of the prediction frame in the picture; if the prediction frame coincides with the real frame in the picture, detection is complete. The deep learning model comprises a cascaded target classification task and target position regression task: the target classification task judges whether a target exists in the prediction frame, with two possible results (present or absent), and the target position regression task judges whether the prediction frame coincides with the real frame, that is, detects the accurate position of the target.
In this embodiment, the loss functions of the target classification task and the target position regression task in the deep learning model may adopt classification loss functions and regression loss functions in the SSD algorithm.
Wherein the classification loss function is the classification loss function L1 of the SSD algorithm, where L1 is the classification loss value and x is the classification probability value.
The regression loss function is the regression loss function L2 of the SSD algorithm, where L2 is the regression loss value, y is the position of the real box, and ŷ is the position of the prediction box.
Step S103, inputting a training target sample set into the constructed deep learning model for training, and dynamically adjusting the loss weight of the loss function of the target classification task and the loss weight of the loss function of the target position regression task in the deep learning model training process.
In step S102, the loss weights of the classification loss function and the regression loss function in the SSD algorithm are both 1. In the first half of training, the deep learning model has considerable difficulty judging whether a target exists in a sample, so the loss weight of the classification loss function should be increased; in the second half of training, judging whether a sample contains a target is no longer the main difficulty, and locating the target more accurately becomes more critical, so the loss weight of the regression loss function should be increased at that point. This embodiment therefore dynamically adjusts the loss weight of the loss function of the target classification task and the loss weight of the loss function of the target position regression task according to the training stage.
Wherein dynamically adjusting the loss weight of the loss function of the target classification task during deep learning model training comprises: acquiring the accuracy of the training target sample set in the current target classification task during deep learning model training; and calculating the loss weight of the loss function of the current target classification task according to that accuracy. Because all training target samples in the training target sample set carry face labels, the classification accuracy can be determined from the face labels and the classification results output by the current target classification task. This accuracy changes across training periods: for example, it is very low at the beginning of training and becomes higher in the later stage of training.
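As an illustration only (not part of the claimed embodiment), the accuracy A described above can be computed by comparing the classification results against the face labels. A minimal Python sketch, where the function name and the list-based inputs are assumptions:

```python
def classification_accuracy(predictions, labels):
    """Fraction of samples whose predicted class matches the label.

    predictions and labels are equal-length sequences of class ids
    (e.g. 1 = face present, 0 = no face).
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(1 for p, l in zip(predictions, labels) if p == l)
    return correct / len(labels)
```

In practice the predictions would come from the classification head of the detector at the current training step; any equivalent accuracy computation serves the same purpose.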
The present embodiment calculates the loss weight of the loss function of the current target classification task according to the following formula:
FL1 = -(1 - A)^γ · log(A)
wherein FL1 is the loss weight of the loss function of the current target classification task, A is the accuracy of the training target sample set in the current target classification task, and γ is a constant taking a value in the range 0 to 5.
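A minimal Python sketch of this weight formula for illustration; the function name, the clamping epsilon, the choice of the natural logarithm, and the default γ = 2 are assumptions not stated in the text:

```python
import math

def fl1_weight(accuracy, gamma=2.0, eps=1e-7):
    """Loss weight of the classification loss: -(1 - A)^gamma * log(A).

    The accuracy is clamped away from 0 and 1 so log stays finite.
    """
    a = min(max(accuracy, eps), 1.0 - eps)
    return -((1.0 - a) ** gamma) * math.log(a)

# Low accuracy early in training yields a large weight;
# high accuracy late in training yields a small one.
early_weight = fl1_weight(0.1)
late_weight = fl1_weight(0.9)
```

Note how the weight shrinks as A approaches 1, which matches the behavior the description relies on: the classification task is emphasized exactly while it is still inaccurate.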
Similarly, dynamically adjusting the loss weight of the loss function of the target position regression task during deep learning model training comprises: acquiring the intersection ratio of the training target sample set in the current target position regression task during deep learning model training, where the intersection ratio is the ratio of the intersection of the target real position area and the target predicted position area to the union of the target real position area and the target predicted position area; and calculating the loss weight of the loss function of the current target position regression task according to the intersection ratio of the training target sample set in the current target position regression task.
In this embodiment, the target real position area is the target real frame 301, the target predicted position area is the target predicted frame 302, as shown in fig. 3, fig. 3 (a) is a schematic diagram of the intersection of the real frame 301 and the predicted frame 302, and fig. 3 (b) is a schematic diagram of the union of the real frame 301 and the predicted frame 302. Wherein the intersection ratio is the ratio of the intersection area of the real frame 301 and the predicted frame 302 to the union area of the real frame 301 and the predicted frame 302.
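The intersection ratio of the real frame 301 and the predicted frame 302 can be sketched as follows; the corner-coordinate box format (x1, y1, x2, y2) and the function name are assumptions for illustration:

```python
def intersection_ratio(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                     # union area
    return inter / union if union > 0 else 0.0
```

Identical boxes give a ratio of 1, disjoint boxes give 0, and partial overlap falls in between, which is why the ratio serves as a progress signal for the position regression task.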
The present embodiment calculates the loss weight of the loss function of the current target position regression task according to the following formula:
FL2 = -(1 - IoU)^γ · log(IoU)
wherein FL2 is the loss weight of the loss function of the current target position regression task, IoU is the intersection ratio of the training target sample set in the current target position regression task, and γ is a constant taking a value in the range 0 to 5.
Based on the loss weights of the loss functions of the current target classification task and target position regression task calculated as above, the classification loss function and the regression loss function in the SSD algorithm of this embodiment become FL1·L1 and FL2·L2, respectively. During training of the deep learning model, FL1 and FL2 change as the accuracy A of the target classification task and the intersection ratio IoU of the target position regression task change, so the classification loss function and the regression loss function also change, and the deep learning model is trained more efficiently and accurately.
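Putting the pieces together, the weighted total loss of this embodiment can be sketched as below. The function names, the natural logarithm, and the default γ = 2 are assumptions; cls_loss and reg_loss stand in for the SSD classification and regression loss values L1 and L2 computed elsewhere in the training step:

```python
import math

def focal_style_weight(p, gamma=2.0, eps=1e-7):
    """-(1 - p)^gamma * log(p); used for both FL1 (p = A) and FL2 (p = IoU)."""
    p = min(max(p, eps), 1.0 - eps)
    return -((1.0 - p) ** gamma) * math.log(p)

def weighted_total_loss(cls_loss, reg_loss, accuracy, iou, gamma=2.0):
    """Total loss FL1 * L1 + FL2 * L2 with dynamically computed weights."""
    fl1 = focal_style_weight(accuracy, gamma)
    fl2 = focal_style_weight(iou, gamma)
    return fl1 * cls_loss + fl2 * reg_loss
```

In a real training loop, accuracy and iou would be recomputed each step (or epoch) from the current predictions, so the balance between the two terms shifts automatically as training progresses.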
Step S104, a target sample to be detected is obtained. The target samples in this embodiment are face pictures, and the target sample to be detected is a picture in which whether a face exists, and the position of any face, is to be detected.
Step S105, the target sample to be detected is input into the trained deep learning model to obtain the target to be detected in the target sample to be detected. After training of the deep learning model is completed, the target sample to be detected is input into the trained model, which then outputs the target in the sample, for example, the face in a face picture. Based on the detected target, this embodiment can extract the target from the sample and compare its similarity with a preset target; if the comparison shows that the detected target is similar to the preset target, the detected target is given the label of the preset target, which facilitates tracking the trajectory of the preset target.
In this embodiment, when the deep learning model has just started training, the target classification task is relatively inaccurate, so the accuracy A of the target classification task is low, the loss weight FL1 of the loss function of the current target classification task is large, and the deep learning model tends to optimize the target classification task first. When the deep learning model reaches a later stage of training, the target classification task is accurate while the target position regression task is still inaccurate, so the intersection ratio IoU of the target sample set in the current target position regression task is relatively low and the loss weight FL2 of the loss function of the current target position regression task is large, increasing the weight of the target position regression task so that the deep learning model tends to optimize it. In this way the loss weights of the target classification task and the target position regression task are dynamically adjusted.
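This dynamic behavior can be illustrated numerically. In the hypothetical trajectory below (the (accuracy, IoU) pairs, γ = 2, and the natural logarithm are all assumptions for illustration), FL1 shrinks steadily as classification accuracy improves, while FL2 stays comparatively large as long as localization lags behind classification:

```python
import math

def weight(p, gamma=2.0, eps=1e-7):
    """Focal-style weight -(1 - p)^gamma * log(p)."""
    p = min(max(p, eps), 1.0 - eps)
    return -((1.0 - p) ** gamma) * math.log(p)

# Hypothetical (accuracy, IoU) pairs for early, middle and late training.
stages = [(0.2, 0.1), (0.7, 0.4), (0.95, 0.6)]
fl1_values = [weight(a) for a, _ in stages]
fl2_values = [weight(iou) for _, iou in stages]
# fl1_values decreases monotonically as accuracy improves,
# and in the late stage fl2 exceeds fl1, shifting effort to localization.
```

The crossover between FL1 and FL2 is exactly the mechanism the embodiment relies on to move optimization pressure from classification to position regression.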
Referring to fig. 4, fig. 4 is a block diagram of an object detection device according to an embodiment, where the object detection device includes: a sample acquisition module 101 and a target detection module 102.
The sample acquisition module 101 is configured to acquire a target sample to be detected.
The target detection module 102 is configured to input a target sample to be detected into a pre-trained deep learning model, so as to obtain a target to be detected in the target sample to be detected; the pre-trained deep learning model comprises a target classification task and a target position regression task, wherein the loss weight of the loss function of the target classification task and the loss weight of the loss function of the target position regression task are obtained through dynamic adjustment in the training process of the deep learning model.
The loss weight of the loss function of the target classification task is obtained through dynamic adjustment in the training process of the deep learning model as follows: acquiring a training target sample set; constructing a deep learning model, and inputting the training target sample set into the deep learning model; acquiring the accuracy of the training target sample set in the current target classification task during deep learning model training; and calculating the loss weight of the loss function of the current target classification task according to that accuracy. The loss weight of the loss function of the target position regression task is obtained through dynamic adjustment in the training process of the deep learning model as follows: acquiring the intersection ratio of the training target sample set in the current target position regression task during deep learning model training, where the intersection ratio is the ratio of the intersection of the target real position area and the target predicted position area to their union; and calculating the loss weight of the loss function of the current target position regression task according to that intersection ratio.
The functions implemented by each module in the apparatus of this embodiment correspond to the steps in the method of the foregoing embodiment, and specific implementation and technical effects thereof refer to descriptions of the steps of the method of the foregoing embodiment, which are not repeated herein.
Referring to fig. 5, an embodiment of the present application provides an electronic device. The electronic device comprises, among other things, a memory 201, a processor 202 and an input/output interface 203. The memory 201 is used for storing a program, and the processor 202 is configured to invoke the program stored in the memory 201 to execute the target detection method according to the embodiment of the present application. The processor 202 is connected to the memory 201 and the input/output interface 203, for example via a bus system and/or another form of connection mechanism (not shown). The memory 201 may be used to store programs and data, including the target detection program involved in embodiments of the present application, and the processor 202 performs the various functional applications and data processing of the electronic device by running the programs stored in the memory 201.
Those skilled in the art will appreciate that all or part of the functions of the methods in the above embodiments may be implemented by hardware or by a computer program. When all or part of the functions are implemented by a computer program, the program may be stored in a computer-readable storage medium, which may include read-only memory, random access memory, a magnetic disk, an optical disk, a hard disk, and the like; the program is executed by a computer to realize the above functions. For example, the program may be stored in the memory of the device, and all or part of the functions described above are realized when the program in the memory is executed by the processor. In addition, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and implemented by downloading or copying it into the memory of a local device, or by updating the system version of the local device; the above functions are likewise realized when the program in that memory is executed by a processor.
The foregoing description of the application has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the application pertains, based on the idea of the application.

Claims (6)

1. A method of detecting an object, comprising:
obtaining a target sample to be detected;
inputting the target sample to be detected into a pre-trained deep learning model to obtain a target to be detected in the target sample to be detected; the pre-trained deep learning model comprises a target classification task and a target position regression task, wherein the loss weight of a loss function of the target classification task and the loss weight of a loss function of the target position regression task are obtained through dynamic adjustment in the training process of the deep learning model;
the loss weight of the loss function of the target classification task is obtained by dynamic adjustment in the training process of the deep learning model, and the method comprises the following steps:
acquiring a training target sample set;
constructing a deep learning model, and inputting the training target sample set into the deep learning model;
acquiring the accuracy of the training target sample set in the current target classification task in the deep learning model training process;
calculating the loss weight of the loss function of the current target classification task according to the accuracy of the training target sample set in the current target classification task;
the loss weight of the loss function of the target position regression task is obtained through dynamic adjustment in the training process of the deep learning model, and the method comprises the following steps:
acquiring the intersection ratio of the training target sample set in the current target position regression task in the deep learning model training process; the intersection ratio is the ratio of the intersection of the target real position area and the target predicted position area to the union of the target real position area and the target predicted position area;
calculating the loss weight of the loss function of the current target position regression task according to the intersection ratio of the training target sample set in the current target position regression task;
the calculating the loss weight of the loss function of the current target classification task according to the accuracy of the training target sample set in the current target classification task comprises the following steps:
the loss weight of the loss function of the current target classification task is calculated according to the following formula:
FL1 = -(1-A)^γ · log(A)
wherein FL1 is the loss weight of the loss function of the current target classification task, A is the accuracy of the training target sample set in the current target classification task, and γ is a constant with a value in the range of 0 to 5;
the calculating the loss weight of the loss function of the current target position regression task according to the intersection ratio of the training target sample set in the current target position regression task comprises the following steps:
the loss weight of the loss function of the current target position regression task is calculated according to the following formula:
FL2 = -(1-IoU)^γ · log(IoU)
wherein FL2 is the loss weight of the loss function of the current target position regression task, IoU is the intersection ratio of the training target sample set in the current target position regression task, and γ is a constant with a value in the range of 0 to 5.
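The two dynamic weights in claim 1 share a focal-loss-style form: a task that is already performing well (accuracy or intersection ratio near 1) receives a weight near 0, while a poorly learned task is weighted up. A minimal Python sketch of these formulas (the function names, the γ = 2 default, and the box-based IoU helper are illustrative assumptions, not part of the patent text):

```python
import math

def classification_loss_weight(accuracy: float, gamma: float = 2.0) -> float:
    """FL1 = -(1 - A)^γ · log(A) for the target classification task.

    As batch accuracy A rises toward 1, the weight decays toward 0, so a
    well-learned classification task contributes less to the total loss.
    """
    eps = 1e-7  # clamp to avoid log(0) at A = 0 and a zero weight blowing up
    a = min(max(accuracy, eps), 1.0 - eps)
    return -((1.0 - a) ** gamma) * math.log(a)

def regression_loss_weight(iou_value: float, gamma: float = 2.0) -> float:
    """FL2 = -(1 - IoU)^γ · log(IoU) for the target position regression task."""
    eps = 1e-7
    q = min(max(iou_value, eps), 1.0 - eps)
    return -((1.0 - q) ** gamma) * math.log(q)

def iou(box_a, box_b) -> float:
    """Intersection ratio of two (x1, y1, x2, y2) boxes: the area of their
    intersection divided by the area of their union."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

With γ = 2, a batch at 50% accuracy gets roughly 165× the weight of a batch at 99% accuracy, which is what steers training effort toward the weaker task.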
2. The target detection method of claim 1, wherein the loss function of the target position regression task is the loss function for the target position regression task in the SSD algorithm, and the loss function of the target classification task is the loss function for the target classification task in the SSD algorithm.
3. The target detection method of claim 1, wherein the target sample is a face picture and the target to be detected is a face.
4. A target detection apparatus, comprising:
the sample acquisition module is used for acquiring a target sample to be detected;
the target detection module is used for inputting the target sample to be detected into a pre-trained deep learning model to obtain the target to be detected in the target sample to be detected; the pre-trained deep learning model comprises a target classification task and a target position regression task, wherein the loss weight of the loss function of the target classification task and the loss weight of the loss function of the target position regression task are obtained through dynamic adjustment in the training process of the deep learning model;
the loss weight of the loss function of the target classification task is obtained by dynamic adjustment in the training process of the deep learning model, and the method comprises the following steps:
acquiring a training target sample set;
constructing a deep learning model, and inputting the training target sample set into the deep learning model;
acquiring the accuracy of the training target sample set in the current target classification task in the deep learning model training process;
calculating the loss weight of the loss function of the current target classification task according to the accuracy of the training target sample set in the current target classification task;
the loss weight of the loss function of the target position regression task is obtained through dynamic adjustment in the training process of the deep learning model, and the method comprises the following steps:
acquiring the intersection ratio of the training target sample set in the current target position regression task in the deep learning model training process; the intersection ratio (IoU) is the ratio of the intersection of the target real position area and the target predicted position area to the union of the target real position area and the target predicted position area;
calculating the loss weight of the loss function of the current target position regression task according to the intersection ratio of the training target sample set in the current target position regression task;
the calculating the loss weight of the loss function of the current target classification task according to the accuracy of the training target sample set in the current target classification task comprises the following steps:
the loss weight of the loss function of the current target classification task is calculated according to the following formula:
FL1 = -(1-A)^γ · log(A)
wherein FL1 is the loss weight of the loss function of the current target classification task, A is the accuracy of the training target sample set in the current target classification task, and γ is a constant with a value in the range of 0 to 5;
the calculating the loss weight of the loss function of the current target position regression task according to the intersection ratio of the training target sample set in the current target position regression task comprises the following steps:
the loss weight of the loss function of the current target position regression task is calculated according to the following formula:
FL2 = -(1-IoU)^γ · log(IoU)
wherein FL2 is the loss weight of the loss function of the current target position regression task, IoU is the intersection ratio of the training target sample set in the current target position regression task, and γ is a constant with a value in the range of 0 to 5.
5. An electronic device, comprising:
a memory for storing a program;
a processor for implementing the method of any one of claims 1-3 by executing a program stored in the memory.
6. A computer readable storage medium comprising a program executable by a processor to implement the method of any one of claims 1-3.
CN202010626321.8A 2020-07-01 2020-07-01 Target detection method and device, electronic equipment and storage medium Active CN111881764B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010626321.8A CN111881764B (en) 2020-07-01 2020-07-01 Target detection method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111881764A CN111881764A (en) 2020-11-03
CN111881764B true CN111881764B (en) 2023-11-03

Family

ID=73150098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010626321.8A Active CN111881764B (en) 2020-07-01 2020-07-01 Target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111881764B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112509008B (en) * 2020-12-15 2022-05-03 重庆邮电大学 Target tracking method based on cross-over ratio guided twin network
CN112613462B (en) * 2020-12-29 2022-09-23 安徽大学 Weighted intersection ratio method
CN113011597B (en) * 2021-03-12 2023-02-28 山东英信计算机技术有限公司 Deep learning method and device for regression task
CN113469025A (en) * 2021-06-29 2021-10-01 阿波罗智联(北京)科技有限公司 Target detection method and device applied to vehicle-road cooperation, road side equipment and vehicle

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106910013A (en) * 2017-02-16 2017-06-30 中国科学院自动化研究所 Unreal information detecting method and device based on Expression study
CN109272060A (en) * 2018-09-06 2019-01-25 湖北工业大学 A kind of method and system carrying out target detection based on improved darknet neural network
CN109858486A (en) * 2019-01-27 2019-06-07 中国人民解放军国防科技大学 Deep learning-based data center cloud target identification method
CN109886357A (en) * 2019-03-13 2019-06-14 哈尔滨工程大学 A kind of adaptive weighting deep learning objective classification method based on Fusion Features
CN110555390A (en) * 2019-08-09 2019-12-10 厦门市美亚柏科信息股份有限公司 pedestrian re-identification method, device and medium based on semi-supervised training mode
CN110827253A (en) * 2019-10-30 2020-02-21 北京达佳互联信息技术有限公司 Training method and device of target detection model and electronic equipment
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function
CN111310826A (en) * 2020-02-13 2020-06-19 南京旷云科技有限公司 Method and device for detecting labeling abnormity of sample set and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7769228B2 (en) * 2004-05-10 2010-08-03 Siemens Corporation Method for combining boosted classifiers for efficient multi-class object detection
US10515296B2 (en) * 2017-11-14 2019-12-24 Adobe Inc. Font recognition by dynamically weighting multiple deep learning neural networks

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106910013A (en) * 2017-02-16 2017-06-30 中国科学院自动化研究所 Unreal information detecting method and device based on Expression study
CN109272060A (en) * 2018-09-06 2019-01-25 湖北工业大学 A kind of method and system carrying out target detection based on improved darknet neural network
CN109858486A (en) * 2019-01-27 2019-06-07 中国人民解放军国防科技大学 Deep learning-based data center cloud target identification method
CN109886357A (en) * 2019-03-13 2019-06-14 哈尔滨工程大学 A kind of adaptive weighting deep learning objective classification method based on Fusion Features
CN110555390A (en) * 2019-08-09 2019-12-10 厦门市美亚柏科信息股份有限公司 pedestrian re-identification method, device and medium based on semi-supervised training mode
CN110827253A (en) * 2019-10-30 2020-02-21 北京达佳互联信息技术有限公司 Training method and device of target detection model and electronic equipment
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function
CN111310826A (en) * 2020-02-13 2020-06-19 南京旷云科技有限公司 Method and device for detecting labeling abnormity of sample set and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Dynamic Saliency-Aware Regularization for Correlation Filter-Based Object Tracking; Wei Feng et al.; IEEE Transactions on Image Processing; Vol. 28, No. 7; pp. 3232-3245 *
Target tracking algorithm based on a strongly coupled Siamese region proposal network with joint optimization; Shi Guoqiang et al.; Journal of Computer Applications; Vol. 40, No. 10; pp. 2822-2830 *
Analysis of Internet of Things security technology; Tan Zhe; China Security & Protection; No. 10; pp. 67-73 *

Also Published As

Publication number Publication date
CN111881764A (en) 2020-11-03

Similar Documents

Publication Publication Date Title
CN111881764B (en) Target detection method and device, electronic equipment and storage medium
KR102382693B1 (en) Learning method and learning device of pedestrian detector for robust surveillance based on image analysis by using gan and testing method and testing device using the same
US6226388B1 (en) Method and apparatus for object tracking for automatic controls in video devices
CN110569703B (en) Computer-implemented method and device for identifying damage from picture
US20020067857A1 (en) System and method for classification of images and videos
TW201832158A (en) Determining recommended object
US20200242333A1 (en) Generating object embeddings from images
US11386710B2 (en) Eye state detection method, electronic device, detecting apparatus and computer readable storage medium
CN110737785B (en) Picture labeling method and device
CN111079638A (en) Target detection model training method, device and medium based on convolutional neural network
EP3905122A2 (en) Video type detection method, apparatus, electronic device and storage medium
CN113129335B (en) Visual tracking algorithm and multi-template updating strategy based on twin network
US20220067454A1 (en) Automatic work order analysis
JP2019091121A (en) Information processing device, background update method and background update program
Minkina et al. Generalization of the Viola-Jones method as a decision tree of strong classifiers for real-time object recognition in video stream
CN114463603B (en) Training method and device for image detection model, electronic equipment and storage medium
CN113592706A (en) Method and device for adjusting homography matrix parameters
US8249354B2 (en) System and method for finding edge points of an object
CN110210314B (en) Face detection method, device, computer equipment and storage medium
Francis et al. Pre-processing techniques for detection of blurred images
CN115994899A (en) Bolt loosening detection method, device and detection equipment
CN113869163B (en) Target tracking method and device, electronic equipment and storage medium
CN112418217A (en) Method, apparatus, device and medium for recognizing characters
CN112561956A (en) Video target tracking method and device, electronic equipment and storage medium
CN112131418A (en) Target labeling method, target labeling device and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant