WO2020155518A1

WO2020155518A1 - Object detection method and device, computer device and storage medium

Info

Publication number: WO2020155518A1
Application number: PCT/CN2019/091100
Authority: WO
Inventors: 巢中迪; 庄伯金; 王少军
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-02-03
Filing date: 2019-06-13
Publication date: 2020-08-06
Also published as: CN110020592A; CN110020592B

Abstract

An object detection method and device, a computer device and a storage medium, which relate to the field of artificial intelligence. The object detection method comprises: acquiring an image to be detected (S1); and inputting the image to be detected into a target object detection model and carrying out object detection, so as to obtain an object detection result of the image to be detected, wherein model training steps used by the target object detection model comprise: acquiring a training sample; inputting the training sample into the object detection model and carrying out model training, the object detection model comprising a detection module, a classification module and a discrimination module; obtaining a detection loss generated by the detection module, a classification loss generated by the classification module and a discrimination loss generated by the discrimination module in the model training process; and updating the object detection model according to the detection loss, the classification loss and the discrimination loss, so as to obtain the target object detection model (S2). The accuracy of object detection may be effectively improved by using the described object detection method.

Description

Object detection method, device, computer equipment and storage medium

This application is based on the Chinese invention patent application filed on February 3, 2019 with the application number 201910108522.6, titled "Object Detection Model Training Method, Device, Computer Equipment and Storage Medium", and claims its priority.

【Technical Field】

This application relates to the field of artificial intelligence, and in particular to an object detection method, device, computer equipment and storage medium.

【Background technique】

Object detection is one of the classic problems in computer vision. Its task is to use a box to mark the position of the object in the image and give the object category. From the traditional artificially designed feature plus shallow classifier framework to the end-to-end detection framework based on deep learning, object detection is improving step by step. However, currently commonly used object detection methods such as YOLO (You Only Look Once) detection Methods, SSD (Single Shot Multi-Box Detection) and other detection methods still generally have the problem of low object detection accuracy.

[Content of the invention]

In view of this, the embodiments of the present application provide an object detection method, device, computer equipment, and storage medium to solve the problem that the object detection accuracy rate is still low.

In the first aspect, an embodiment of the present application provides an object detection method, including:

Obtain the image to be detected;

Input the to-be-detected image into a target object detection model for object detection to obtain an object detection result of the to-be-detected image, wherein the model training step adopted by the target object detection model includes:

Obtain training samples;

Inputting the training samples into an object detection model for model training, where the object detection model includes a detection module, a classification module, and a discrimination module;

Obtaining the detection loss generated by the detection module, the classification loss generated by the classification module, and the discrimination loss generated by the discrimination module in the model training process;

The object detection model is updated according to the detection loss, the classification loss, and the discrimination loss to obtain a target object detection model.

In the second aspect, an embodiment of the present application provides an object detection model training device, including:

The image acquisition module to be detected is used to acquire the image to be detected;

The object detection result acquisition module is used to input the to-be-detected image into a target object detection model for object detection to obtain the object detection result of the to-be-detected image, wherein the target object detection model adopts a training sample acquisition module, The model training module, the loss acquisition module and the target object detection model acquisition module obtain:

The training sample acquisition module is used to acquire training samples;

A model training module is used to input the training samples into an object detection model for model training, where the object detection model includes a detection module, a classification module, and a discrimination module;

A loss acquisition module, configured to acquire the detection loss generated by the detection module, the classification loss generated by the classification module, and the discrimination loss generated by the discrimination module during the model training process;

The target object detection model acquisition module is used to update the object detection model according to the detection loss, the classification loss and the discrimination loss to obtain a target object detection model.

In a third aspect, a computer device includes a memory, a processor, and computer-readable instructions that are stored in the memory and can run on the processor. When the processor executes the computer-readable instructions, the foregoing The steps of the object detection method.

In a fourth aspect, an embodiment of the present application provides a computer non-volatile readable storage medium, including: computer readable instructions, which implement the steps of the above object detection method when the computer readable instructions are executed by a processor.

In the object detection method, device, computer equipment and storage medium provided in this application, the image to be detected is first obtained; then the image to be detected is input into the target object detection model for object detection, and the object detection result of the image to be detected is obtained , Wherein the target object detection model combines the detection loss, classification loss and discrimination loss to update the object detection model, which has better detection and classification effects, and can obtain detection results with higher accuracy.

【Explanation of drawings】

In order to explain the technical solutions of the embodiments of the present application more clearly, the following will briefly introduce the drawings needed in the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, without creative labor, other drawings can be obtained from these drawings.

FIG. 1 is a flowchart of an object-based detection method in an embodiment of the present application;

Figure 2 is a schematic diagram of an object-based detection device in an embodiment of the present application;

Fig. 3 is a schematic diagram of a computer device in an embodiment of the present application.

【detailed description】

In order to better understand the technical solutions of the present application, the embodiments of the present application will be described in detail below with reference to the accompanying drawings.

It should be clear that the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

The terms used in the embodiments of the present application are only for the purpose of describing specific embodiments, and are not intended to limit the present application. The singular forms of "a", "said" and "the" used in the embodiments of the present application and the appended claims are also intended to include plural forms, unless the context clearly indicates other meanings.

It should be understood that the term "and/or" used in this article is only a description of the same field of the associated object, indicating that there can be three relationships, for example, A and/or B can mean that A exists alone and A exists at the same time. And B, there are three cases of B alone. In addition, the character "/" in this text generally indicates that the associated objects before and after are in an "or" relationship.

It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe the preset range, etc., these preset ranges should not be limited to these terms. These terms are only used to distinguish the preset ranges from each other. For example, without departing from the scope of the embodiments of the present application, the first preset range may also be referred to as the second preset range, and similarly, the second preset range may also be referred to as the first preset range.

Depending on the context, the word "if" as used herein can be interpreted as "when" or "when" or "in response to determination" or "in response to detection". Similarly, depending on the context, the phrase "if determined" or "if detected (statement or event)" can be interpreted as "when determined" or "in response to determination" or "when detected (statement or event) )" or "in response to detection (statement or event)".

Fig. 1 shows a flow chart of the object detection method in this embodiment. The object detection method can be applied to an object detection system, and the object detection system can be used to realize the detection and classification of objects, and the object detection system can be specifically applied to computer equipment. Among them, the computer device is a device that can perform human-computer interaction with the user, including but not limited to devices such as computers, smart phones, and tablets. As shown in Figure 1, the object detection method includes the following steps:

S1: Obtain the image to be detected.

S2: Input the image to be detected into the target object detection model for object detection, and obtain the object detection result of the image to be detected. In step S2, the model training steps adopted by the target object detection model specifically include:

S10: Obtain training samples.

In one embodiment, training samples required for model training are obtained. Specifically, images related to a certain type of scene can be selected as training samples according to the needs of object detection. For example, the images saved in the driving recorder can be used as training samples. The pictures saved in the driving recorder can reflect the road conditions ahead during the driving of the vehicle. The image can be used as a training sample to train the target object detection model, so that the trained target object detection model can be The object is detected, so that the vehicle makes a preset response according to the received detection result. Understandably, it is necessary to pre-label the objects appearing in the image saved in the driving recorder before the model training (label the objects that need to be detected, and the objects that are not required for detection may not be labeled). In addition, it is also necessary to use deep neural A network (such as a convolutional neural network) extracts deep features of images belonging to the same category as the object to be annotated to identify the category of the object when the object detection model (including the corresponding deep neural network for extracting image features) detects.

S20: Input training samples into the object detection model for model training, where the object detection model includes a detection module, a classification module, and a discrimination module.

Among them, model training refers to the training of the target object detection model. The detection module is used to detect objects in the image, and the classification module is used to identify and classify the detected objects. The judgment module includes a first judgment module and/or a second judgment module. The first judgment module is used to judge the output of the detection module. Whether the result is correct or not, the second judgment module is used to judge whether the output result of the classification module is correct. Among them, the first judgment module and the second judgment module can exist at the same time, or only the second judgment module exists, and the second judgment module is used as the judgment Module.

In an embodiment, the training samples are input into the object detection model for model training, where the object detection model includes not only a detection model and a classification model, but also a discriminant model. Understandably, model training with training samples is the process of inputting training samples into the object detection model for detection.

Further, before step S20, it further includes:

S211: Obtain a detection model of the object to be processed, which includes a detection module and a classification module.

In one embodiment, the detection model of the object to be processed is obtained. It is understandable that the detection model of the object to be processed may specifically be a detection model such as YOLO (You Only Look Once) detection model and SSD (Single Shot Multi-Box Detection). These models include detection modules and classification modules. This embodiment is an improvement based on these object detection models to be processed.

S212: Add a discrimination module to the detection model of the object to be processed, where the discrimination module is used to discriminate the results output by the detection module and/or the classification module.

In one embodiment, a discrimination module is added on the original basis of the object detection model to be processed, so as to determine the output result of the object detection model to be processed. Adding a discrimination module can help to know the accuracy of the detection model of the object to be processed, so as to update the model according to the detection error of the object detection model to improve the accuracy of detection.

S213: Perform model initialization operation on the object detection model to be processed after adding the discrimination module to obtain the object detection model.

Among them, the initialization operation of the model refers to the initialization of the network parameters in the model, and the initial values of the network parameters may be preset based on experience.

Understandably, if the model is not initialized, the network parameters in the detection module and the classification module in the object detection model to be processed have actually been updated through multiple trainings. At this time, the discrimination module is then based on the detection module and/or classification The result of the module output will be discriminated and updated, because the detection module and the classification module have been learning for a long time at the beginning of the training. It is more thorough if the discrimination module is used for updating in a short time; on the contrary, the initialization After the operation, the discrimination module will make a judgment every time the detection module and/or classification module outputs a result during the training phase, and can update the network parameters according to the output result in time with the training process, so as to achieve better detection accuracy.

In steps S211-S213, an implementation method for obtaining an object detection model is provided. Specifically, a discrimination module is added to the object detection model to be processed, and the initialization operation of the model is performed, which is beneficial to improve the subsequent training and update of the object detection model. The detection accuracy of the target object detection model.

Further, in step S20, inputting the training samples into the object detection model for model training includes:

S221: Input the training sample, and extract the feature vector of the training sample through the object detection model.

In an embodiment, the object detection model includes a deep neural network for extracting feature vectors of training samples, which may specifically be a convolutional neural network. When the training samples are input to the object detection model, the object detection model will use the deep neural network to extract the feature vectors of the training samples to provide a technical basis for model training.

S222: Perform normalization processing on the feature vector to obtain a normalized feature vector, where the expression of the normalization process is: y=(x-MinValue)/(MaxValue-MinValue), y is the normalized feature vector , X is the feature vector, MaxValue is the maximum value of the feature value in the feature vector, and MinValue is the minimum value of the feature value in the feature vector.

Among them, the feature value in the feature vector specifically refers to the pixel value.

In one embodiment, the feature vector is normalized, that is, the feature value in the feature vector is normalized to the interval of [0,1]. Be appreciated that the pixel values of the image are ^28, ²¹² and ²¹⁶ the pixel value level, etc., a large number of different images may be contained in the pixel value, which makes the computational efficiency is low, so the use of the normalized manner The eigenvalues in the eigenvectors are compressed in the same range, so that the calculation efficiency is improved and the model training time is shorter.

S223: Perform model training on the object detection model according to the normalized feature vector.

In steps S221-S223, an implementation method for inputting training samples into the object detection model for model training is provided, and the extracted training sample features are normalized, and the feature values in the feature vector are compressed in the same range Within the interval, training time can be significantly shortened and training efficiency improved.

S30: Obtain the detection loss generated by the detection module, the classification loss generated by the classification module, and the discrimination loss generated by the discrimination module in the model training process.

Understandably, the detection module, the classification module, and the discrimination module respectively represent a function. When each function is implemented, it is possible to make a mistake, that is, the loss generated. According to the detection loss generated by the detection module, the classification The classification loss generated by the module and the discrimination loss generated by the discrimination module can be used for reference to help adjust the object detection model, so that the target object detection model can be used as much as possible when the detection module, classification module and discrimination module are used again to achieve functions. Errors to improve the accuracy of the target object detection model detection.

Further, in step S30, the detection loss generated by the detection module, the classification loss generated by the classification module, and the discrimination loss generated by the discrimination module are obtained during the model training process, which specifically include:

S31: During the model training process, the first training feature vector output by the detection module is obtained, and the preset detection loss function is used to calculate the loss between the first training feature vector and the pre-stored first label vector to obtain the detection loss.

Among them, the first training feature vector is the result output by the detection module, and the first label vector is a feature vector used to verify whether the first training feature vector is correct, and represents the real result.

In an embodiment, a preset detection loss function is used to calculate the loss between the first training feature vector and the pre-stored first label vector to obtain the detection loss, so as to update the network parameters of the model according to the detection loss. Specifically, the detection loss function may include a loss function for the predicted center coordinates, which is expressed as:

Among them, λ represents the adjustment factor, which is a preset parameter value, i represents the grid unit divided during detection, I represents the total number of grid units, j represents the predicted value of the bounding box, and J represents the total number of predicted values of the bounding box.

Indicates that when there is a target in the i-th grid unit, the j-th bounding box predicted value is valid for the prediction,

Take 1,; if there is no target in the i-th grid unit,

Take 0, (x _i , y _i ) represents the position of the predicted bounding box,

Represents the true position of the bounding box output by the training sample.

Understandably, object detection models such as yolo need to perform image segmentation on the input training samples to obtain I grid units. When predicting the object positions of the training samples, J prediction bounding boxes are obtained.

The obj stands for object, which means to detect the object.

Further, the detection loss function may also include a loss function about the width and height of the predicted bounding box, expressed as:

among them,

Represents the square root of the predicted width and the square root of the predicted height,

Represents the true value of the square root of the width and the square root of the height output by the training sample (other repeated parameters will not be explained, so as not to repeat them).

Understandably, the above provides two aspects of the center coordinates predicted by the model and the width and height of the predicted bounding box to measure the loss during detection, where the first training feature vector output by the detection module specifically includes (x _i , y _i ) and

The first label vector specifically includes

with

Through the detection loss function, the network parameters of the object detection model can be updated more accurately.

S32: In the model training process, the second training feature vector output by the classification module is obtained, and the preset classification loss function is used to calculate the loss between the second training feature vector and the pre-stored second label vector to obtain the classification loss.

Among them, the second training feature vector is the result output by the classification module, and the second label vector is a feature vector used to verify whether the second training feature vector is correct, and represents the real result.

In an embodiment, a preset classification loss function is used to calculate the loss between the second training feature vector and the pre-stored second label vector to obtain the classification loss, so as to update the network parameters of the model according to the classification loss. Specifically, the classification loss function can be expressed as:

Among them, i represents the grid unit divided during detection, I represents the total number of grid units,

Indicates that when there is a target in the i-th grid cell,

Take 1, otherwise

Take 0, p _i represents the predicted classification,

Represents the true situation of the classification output by the training sample. Among them, the second training feature vector output by the classification module specifically includes p _i , and the second label vector specifically includes

Through the classification loss function, the network parameters of the object classification model can be updated more accurately.

S33: In the model training process, obtain the third training feature vector output by the discrimination module, and calculate the discrimination loss by using the preset discriminant loss function according to the third training feature vector.

Among them, the third training feature vector is the result output by the discrimination module.

In an embodiment, according to the third training feature vector, a preset discriminant loss function is used to calculate the discriminant loss, so as to update the network parameters of the model according to the discriminant loss. Specifically, taking the discrimination module that only includes the second discrimination module as an example: the discrimination loss function can be specifically expressed as:

Wherein, I represents the total number of grid cells, i denotes grid cells obtained divided detection, classification D (p _i) denotes the prediction result of the discrimination output of the module. The discriminant loss function can reflect the loss generated by the discriminant module during training, so as to more accurately update the network parameters of the object discriminant model.

Understandably, the formulas in steps S31-S33 take the loss of one training sample as an example when calculating the detection loss, classification loss, and discrimination loss. When there are actually a large number of samples for model training, the values of each training sample will be The detection loss, classification loss, and discrimination loss arithmetic are added to obtain the total detection loss, classification loss, and discrimination loss, and the model is updated according to the total detection loss, classification loss and discrimination loss.

Steps S31-S33 provide specific implementations for obtaining detection loss, classification loss, and discrimination loss. The obtained detection loss, classification loss, and discrimination loss can accurately describe the loss generated during the training process, so that the model can be updated more accurately. accurate.

S40: Update the object detection model according to the detection loss, classification loss and discrimination loss to obtain the target object detection model.

Further, step S40 specifically includes:

S41: According to the detection loss, classification loss and discrimination loss, the back propagation algorithm is used to update the network parameters in the object detection model.

Among them, the back-propagation algorithm is a learning algorithm suitable for multi-layer neural networks under the guidance of a tutor. It is based on the gradient descent method.

In one embodiment, updating the object detection model using a back propagation algorithm can speed up the update and improve the training efficiency of model training. When the total loss of detection loss, classification loss and discrimination loss is large, the use of backpropagation algorithm has better results.

S42: When the change values of the network parameters are all less than the iterative stop threshold, stop updating the network parameters to obtain the target object detection model.

In one embodiment, when the change value of the network parameter is less than the stop iteration threshold, that is, when the change value of the network parameter is within the acceptable error range, the update process can be stopped, the training ends, and the detection accuracy rate is higher. High target object detection model.

Steps S41-S42 provide an implementation manner for updating the object detection model, which can quickly complete the update process and obtain a target object detection model with higher detection accuracy.

In the embodiment of the present application, the image to be detected is first acquired, and then the image to be detected is input into the target object detection model for object detection, and the object detection result of the image to be detected is obtained, wherein the target object detection model combines The detection loss, classification loss and discrimination loss jointly update the object detection model, so that the target object detection model obtained by training has better detection and classification effects.

It should be understood that the size of the sequence number of each step in the foregoing embodiment does not mean the order of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present application.

Based on the object detection methods provided in the embodiments, the embodiments of the present application further provide device embodiments that implement the steps and methods in the foregoing method embodiments.

Fig. 2 shows a principle block diagram of an object detection device corresponding to the object detection method in the embodiment one to one. As shown in FIG. 2, the object detection device includes a to-be-detected image acquisition module 10, an object detection result acquisition module 20, and also includes a training sample acquisition module 30, a model training module 40, a loss acquisition module 50, and a target object detection model acquisition module 60 . Among them, the realization functions of the to-be-detected image acquisition module 10, the object detection result acquisition module 20, the training sample acquisition module 30, the model training module 40, the loss acquisition module 50, and the target object detection model acquisition module 60 correspond to the object detection method in the embodiment The steps of are one-to-one correspondence, in order to avoid redundant description, this embodiment will not describe them one by one.

The to-be-detected image acquisition module 10 is used to acquire the to-be-detected image.

The object detection result acquisition module 20 is used to input the image to be detected into the target object detection model for object detection, and obtain the object detection result of the image to be detected.

The training sample acquisition module 30 is used to acquire training samples.

The model training module 40 is used to input training samples into the object detection model for model training, where the object detection model includes a detection module, a classification module and a discrimination module.

The loss acquisition module 50 is used to obtain the detection loss generated by the detection module, the classification loss generated by the classification module, and the discrimination loss generated by the discrimination module during the model training process.

The target object detection model acquisition module 60 is used to update the object detection model according to the detection loss, classification loss and discrimination loss to obtain the target object detection model.

Optionally, the object detection device further includes a detection model acquisition unit of the object to be processed, a discrimination module adding unit and an initialization unit.

The to-be-processed object detection model acquisition unit is used to acquire the to-be-processed object detection model. The to-be-processed object detection model includes a detection module and a classification module.

The discrimination module adding unit is used to add a discrimination module to the object detection model to be processed, wherein the discrimination module is used to discriminate the results output by the detection module and/or the classification module.

The initialization unit is used to initialize the object detection model to be processed after adding the discrimination module to obtain the object detection model.

Optionally, the model training module 40 includes a feature vector extraction unit, a normalized feature vector acquisition unit, and a model training unit.

The feature vector extraction unit is used to input training samples, and extract feature vectors of the training samples through the object detection model.

The normalized feature vector acquiring unit is used to normalize the feature vector to obtain the normalized feature vector, where the expression of the normalized processing is: y=(x-MinValue)/(MaxValue-MinValue) , Y is the normalized feature vector, x is the feature vector, MaxValue is the maximum value of the feature value in the feature vector, and MinValue is the minimum value of the feature value in the feature vector.

The model training unit is used to perform model training on the object detection model according to the normalized feature vector.

Optionally, the loss acquisition module 50 includes a detection loss acquisition unit, a classification loss acquisition unit, and a discrimination loss acquisition unit.

The detection loss acquisition unit is used to obtain the first training feature vector output by the detection module during the model training process, and calculate the loss between the first training feature vector and the pre-stored first label vector using a preset detection loss function, Get detection loss.

The classification loss acquisition unit is used to obtain the second training feature vector output by the classification module during the model training process, and calculate the loss between the second training feature vector and the pre-stored second label vector by using a preset classification loss function, Get classification loss.

The discrimination loss acquisition unit is used to obtain the third training feature vector output by the discrimination module during the model training process, and calculate the discrimination loss by using the preset discriminant loss function according to the third training feature vector.

Optionally, the target object detection model acquisition module 60 includes a network parameter update unit and a target object detection model acquisition unit.

The network parameter update unit is used to update the network parameters in the object detection model by using the back propagation algorithm according to the detection loss, classification loss and discrimination loss.

The target object detection model acquisition unit is used to stop updating the network parameters when the change values of the network parameters are less than the iterative stop threshold to obtain the target object detection model.

This embodiment provides a computer non-volatile readable storage medium. The computer non-volatile readable storage medium stores computer readable instructions. When the computer readable instructions are executed by a processor, the object detection method in the embodiment is implemented. To avoid repetition, I won’t repeat them here. Alternatively, the computer-readable instructions realize the functions of the modules/units in the object detection device in the embodiment when they are executed by the processor. To avoid repetition, details are not repeated here.

Fig. 3 is a schematic diagram of a computer device provided by an embodiment of the present application. As shown in FIG. 3, the computer device 70 of this embodiment includes: a processor 71, a memory 72, and computer-readable instructions 73 stored in the memory 72 and running on the processor 71, and the computer-readable instructions 73 are processed The object detection method in the embodiment is implemented when the device 71 is executed. To avoid repetition, it will not be repeated here. Alternatively, when the computer-readable instruction 73 is executed by the processor 71, the function of each model/unit in the object detection device in the embodiment is realized. In order to avoid repetition, it will not be repeated here.

The computer device 70 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server. The computer device 70 may include, but is not limited to, a processor 71 and a memory 72. Those skilled in the art can understand that FIG. 3 is only an example of the computer device 70, and does not constitute a limitation on the computer device 70. It may include more or less components than those shown in the figure, or a combination of certain components, or different components. For example, computer equipment may also include input and output devices, network access devices, buses, and so on.

The so-called processor 71 may be a central processing unit (Central Processing Unit, CPU), other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.

The memory 72 may be an internal storage unit of the computer device 70, such as a hard disk or memory of the computer device 70. The memory 72 may also be an external storage device of the computer device 70, such as a plug-in hard disk equipped on the computer device 70, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card, and a flash memory card (Flash). Card) and so on. Further, the memory 72 may also include both an internal storage unit of the computer device 70 and an external storage device. The memory 72 is used to store computer readable instructions and other programs and data required by the computer equipment. The memory 72 can also be used to temporarily store data that has been output or will be output.

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, only the division of the above functional units and modules is used as an example. In actual applications, the above functions can be allocated to different functional units, Module completion, that is, the internal structure of the device is divided into different functional units or modules to complete all or part of the functions described above.

The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still compare the previous embodiments. The recorded technical solutions are modified, or some of the technical features are equivalently replaced; and these modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the application, and shall be included in the application Within the scope of protection.

Claims

An object detection method, characterized in that the method includes:

Obtain the image to be detected;

Input the to-be-detected image into a target object detection model for object detection to obtain an object detection result of the to-be-detected image, wherein the model training step adopted by the target object detection model includes:

Obtain training samples;

Inputting the training samples into an object detection model for model training, where the object detection model includes a detection module, a classification module, and a discrimination module;

Obtaining the detection loss generated by the detection module, the classification loss generated by the classification module, and the discrimination loss generated by the discrimination module during the model training process;

The object detection model is updated according to the detection loss, the classification loss, and the discrimination loss to obtain a target object detection model.
The method according to claim 1, wherein before said inputting said training samples into an object detection model for model training, said method further comprises:

Acquiring a detection model of the object to be processed, where the detection model of the object to be processed includes the detection module and the classification module;

Adding the discrimination module to the object detection model to be processed, wherein the discrimination module is used to discriminate the results output by the detection module and/or the classification module;

Perform a model initialization operation on the object detection model to be processed after adding the discrimination module to obtain the object detection model.
The method according to claim 1, wherein said inputting said training samples into an object detection model for model training comprises:

Input the training sample, and extract the feature vector of the training sample through the object detection model;

The eigenvector is normalized to obtain a normalized eigenvector, where the expression of the normalization is: y=(x-MinValue)/(MaxValue-MinValue), y is the normalization Feature vector, x is the feature vector, MaxValue is the maximum value of the feature value in the feature vector, and MinValue is the minimum value of the feature value in the feature vector;

Model training is performed on the object detection model according to the normalized feature vector.
The method of claim 1, wherein the detection loss generated by the detection module, the classification loss generated by the classification module, and the discrimination loss generated by the discrimination module are obtained in the model training process ,include:

In the process of model training, the first training feature vector output by the detection module is obtained, and the preset detection loss function is used to calculate the loss between the first training feature vector and the pre-stored first label vector to obtain the Detection loss;

In the model training process, the second training feature vector output by the classification module is obtained, and the preset classification loss function is used to calculate the loss between the second training feature vector and the pre-stored second label vector to obtain the Classification loss

In the model training process, the third training feature vector output by the discrimination module is obtained, and the discrimination loss is calculated by using a preset discriminant loss function according to the third training feature vector.
The method according to any one of claims 1 to 4, wherein said updating said object detection model according to said detection loss, said classification loss and said discrimination loss to obtain a target object detection model, comprising:

According to the detection loss, the classification loss and the discrimination loss, use a back propagation algorithm to update the network parameters in the object detection model;

When the change values of the network parameters are all less than the iterative stop threshold, stop updating the network parameters to obtain the target object detection model.
An object detection device, characterized in that the device includes:

The image acquisition module to be detected is used to acquire the image to be detected;

The object detection result acquisition module is used to input the to-be-detected image into a target object detection model for object detection to obtain the object detection result of the to-be-detected image, wherein the target object detection model adopts a training sample acquisition module, The model training module, the loss acquisition module and the target object detection model acquisition module obtain:

The training sample acquisition module is used to acquire training samples;

A model training module is used to input the training samples into an object detection model for model training, where the object detection model includes a detection module, a classification module, and a discrimination module;

A loss acquisition module, configured to acquire the detection loss generated by the detection module, the classification loss generated by the classification module, and the discrimination loss generated by the discrimination module during the model training process;

The target object detection model acquisition module is used to update the object detection model according to the detection loss, the classification loss and the discrimination loss to obtain a target object detection model.
The device according to claim 6, characterized in that the object detection model training device further comprises a to-be-processed object detection model acquisition unit, a discrimination module adding unit and an initialization unit:

The object detection model acquisition unit to be processed is used to acquire the object detection model to be processed, and the object detection model to be processed includes a detection module and a classification module;

The discrimination module adding unit is used to add a discrimination module to the object detection model to be processed, wherein the discrimination module is used to discriminate the results output by the detection module and/or the classification module;

The initialization unit is used to initialize the object detection model to be processed after adding the discrimination module to obtain the object detection model.
The device according to claim 6, wherein the model training module comprises a feature vector extraction unit, a normalized feature vector acquisition unit, and a model training unit:

The feature vector extraction unit is configured to input the training sample, and extract the feature vector of the training sample through the object detection model;

The normalized feature vector obtaining unit is used to perform normalization processing on the feature vector to obtain a normalized feature vector, where the expression of the normalization processing is: y=(x-MinValue)/(MaxValue- MinValue), y is the normalized feature vector, x is the feature vector, MaxValue is the maximum value of the feature value in the feature vector, and MinValue is the minimum value of the feature value in the feature vector;

The model training unit is configured to perform model training on the object detection model according to the normalized feature vector.
The device according to claim 6, wherein the loss acquisition module comprises a detection loss acquisition unit, a classification loss acquisition unit, and a discrimination loss acquisition unit:

The detection loss acquisition unit is used to obtain the first training feature vector output by the detection module during the model training process, and calculate the loss between the first training feature vector and the pre-stored first label vector using a preset detection loss function, Get detection loss;

The classification loss acquisition unit is used to obtain the second training feature vector output by the classification module during the model training process, and calculate the loss between the second training feature vector and the pre-stored second label vector by using a preset classification loss function, Get classification loss;

The discrimination loss acquisition unit is used to obtain the third training feature vector output by the discrimination module during the model training process, and calculate the discrimination loss by using the preset discriminant loss function according to the third training feature vector.
The device according to claims 6-9, wherein the target object detection model acquisition module comprises a network parameter update unit and a target object detection model acquisition unit:

A network parameter update unit, configured to use a back propagation algorithm to update the network parameters in the object detection model according to the detection loss, the classification loss, and the discrimination loss;

The target object detection model acquisition unit is configured to stop updating the network parameters when the change values of the network parameters are less than the iterative stop threshold to obtain the target object detection model.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and capable of running on the processor, wherein the processor executes the computer-readable instructions as follows step:

Obtain the image to be detected;

Input the to-be-detected image into a target object detection model for object detection, and obtain an object detection result of the to-be-detected image, wherein the model training step adopted by the target object detection model includes:

Obtain training samples;

Inputting the training samples into an object detection model for model training, where the object detection model includes a detection module, a classification module, and a discrimination module;

Obtaining the detection loss generated by the detection module, the classification loss generated by the classification module, and the discrimination loss generated by the discrimination module during the model training process;

The object detection model is updated according to the detection loss, the classification loss, and the discrimination loss to obtain a target object detection model.
The computer device according to claim 11, wherein the processor further implements the following steps when executing the computer-readable instruction:

Acquiring a detection model of the object to be processed, where the detection model of the object to be processed includes the detection module and the classification module;

Adding the discrimination module to the object detection model to be processed, wherein the discrimination module is used to discriminate the results output by the detection module and/or the classification module;

Perform a model initialization operation on the object detection model to be processed after adding the discrimination module to obtain the object detection model.
The computer device according to claim 11, wherein the processor further implements the following steps when executing the computer-readable instruction:

Input the training sample, and extract the feature vector of the training sample through the object detection model;

The eigenvector is normalized to obtain a normalized eigenvector, where the expression of the normalization is: y=(x-MinValue)/(MaxValue-MinValue), y is the normalization Feature vector, x is the feature vector, MaxValue is the maximum value of the feature value in the feature vector, and MinValue is the minimum value of the feature value in the feature vector;

Model training is performed on the object detection model according to the normalized feature vector.
The computer device according to claim 11, wherein the processor further implements the following steps when executing the computer-readable instruction:

In the process of model training, the first training feature vector output by the detection module is obtained, and the preset detection loss function is used to calculate the loss between the first training feature vector and the pre-stored first label vector to obtain the Detection loss;

In the model training process, the second training feature vector output by the classification module is obtained, and the preset classification loss function is used to calculate the loss between the second training feature vector and the pre-stored second label vector to obtain the Classification loss

In the model training process, the third training feature vector output by the discrimination module is obtained, and the discrimination loss is calculated by using a preset discriminant loss function according to the third training feature vector.
14. The computer device according to claims 11-14, wherein the processor further implements the following steps when executing the computer-readable instructions:

According to the detection loss, the classification loss and the discrimination loss, use a back propagation algorithm to update the network parameters in the object detection model;

When the change values of the network parameters are all less than the iterative stop threshold, stop updating the network parameters to obtain the target object detection model.
A computer non-volatile readable storage medium, the computer non-volatile readable storage medium storing computer readable instructions, characterized in that, when the computer readable instructions are executed by a processor, the following steps are implemented:

Obtain the image to be detected;

Input the to-be-detected image into a target object detection model for object detection to obtain an object detection result of the to-be-detected image, wherein the model training step adopted by the target object detection model includes:

Obtain training samples;

Inputting the training samples into an object detection model for model training, where the object detection model includes a detection module, a classification module, and a discrimination module;

Obtaining the detection loss generated by the detection module, the classification loss generated by the classification module, and the discrimination loss generated by the discrimination module during the model training process;

The object detection model is updated according to the detection loss, the classification loss, and the discrimination loss to obtain a target object detection model.
The computer non-volatile readable storage medium according to claim 16, wherein when the computer readable instructions are executed by one or more processors, the one or more processors further implement the following steps :

Acquiring a detection model of the object to be processed, where the detection model of the object to be processed includes the detection module and the classification module;

Adding the discrimination module to the object detection model to be processed, wherein the discrimination module is used to discriminate the results output by the detection module and/or the classification module;

Perform a model initialization operation on the object detection model to be processed after adding the discrimination module to obtain the object detection model.
The computer non-volatile readable storage medium according to claim 16, wherein when the computer readable instructions are executed by one or more processors, the one or more processors further implement the following steps :

Input the training sample, and extract the feature vector of the training sample through the object detection model;

The eigenvector is normalized to obtain a normalized eigenvector, where the expression of the normalization is: y=(x-MinValue)/(MaxValue-MinValue), y is the normalization Feature vector, x is the feature vector, MaxValue is the maximum value of the feature value in the feature vector, and MinValue is the minimum value of the feature value in the feature vector;

Model training is performed on the object detection model according to the normalized feature vector.
The computer non-volatile readable storage medium according to claim 16, wherein when the computer readable instructions are executed by one or more processors, the one or more processors further implement the following steps :

In the process of model training, the first training feature vector output by the detection module is obtained, and the preset detection loss function is used to calculate the loss between the first training feature vector and the pre-stored first label vector to obtain the Detection loss;

In the model training process, the second training feature vector output by the classification module is obtained, and the preset classification loss function is used to calculate the loss between the second training feature vector and the pre-stored second label vector to obtain the Classification loss

In the model training process, the third training feature vector output by the discrimination module is obtained, and the discrimination loss is calculated by using a preset discriminant loss function according to the third training feature vector.
The computer non-volatile readable storage medium according to claims 16-19, wherein when the computer readable instructions are executed by one or more processors, the one or more processors also implement The following steps:

According to the detection loss, the classification loss and the discrimination loss, use a back propagation algorithm to update the network parameters in the object detection model;

When the change values of the network parameters are all less than the iterative stop threshold, stop updating the network parameters to obtain the target object detection model.