CN110827253A - Training method and device of target detection model and electronic equipment

Training method and device of target detection model and electronic equipment

Info

Publication number
CN110827253A
CN110827253A
Authority
CN
China
Prior art keywords
target
loss
preset
network model
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911047960.2A
Other languages
Chinese (zh)
Inventor
冯文雅
宋丛礼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Reach Best Technology Co Ltd
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Reach Best Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Reach Best Technology Co Ltd filed Critical Reach Best Technology Co Ltd
Priority to CN201911047960.2A priority Critical patent/CN110827253A/en
Publication of CN110827253A publication Critical patent/CN110827253A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection

Abstract

The embodiments of the present application provide a training method and apparatus for a target detection model, and an electronic device. The method includes: selecting a sample image from a preset training image set and inputting it into a network model to be trained, to obtain predicted position information and a prediction probability for a predicted target in the sample image; calculating a first loss of the network model according to the predicted position information and a preset regression function; calculating a second loss of the network model according to the prediction probability and a preset focus loss function; calculating a target loss of the network model through a preset target loss function; and adjusting the network model according to the target loss to obtain the trained network model. In this technical solution, the preset focus loss function corrects the sample weights according to the proportion of positive samples, so the trained model can adapt to different positive-sample proportions, which improves the accuracy of detecting targets in an image to be detected.

Description

Training method and device of target detection model and electronic equipment
Technical Field
The present application relates to the field of information technologies, and in particular, to a method and an apparatus for training a target detection model, and an electronic device.
Background
Target detection is currently a key technology in computer vision with wide application. In particular, with faster mobile networks and the growing popularity of mobile devices in recent years, there is increasing demand for running target detection algorithms on devices such as mobile terminals. A target detection algorithm detects targets such as people or objects in an image. Generally, it either obtains multiple target candidate regions and judges whether each candidate region is a target, or detects targets using a mixed classification-and-regression loss function.
However, judging each of many candidate regions separately requires heavy computation and memory, which mobile terminals in particular often cannot supply. When a mixed classification-and-regression loss function is used, the proportions of targets and non-targets in natural images differ greatly, so the samples are often imbalanced; this causes the loss function to converge poorly and the converged model to have poor detection accuracy.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for training a target detection model, and an electronic device, so as to solve the problem of poor detection accuracy of the target detection model. The specific technical scheme is as follows:
in a first aspect, the present application provides a method for training a target detection model, including:
selecting a sample image from a preset training image set, inputting the sample image into a network model to be trained, and detecting the sample image through the network model to be trained to obtain predicted position information and a prediction probability of a predicted target in the sample image, wherein the sample image comprises a positive sample image and a negative sample image, the positive sample image comprises a preset detection target and a mark of the preset detection target, the negative sample image does not comprise the preset detection target, and the prediction probability is the probability that the predicted target is a correct target;
calculating a first loss of the network model to be trained according to the predicted position information and a preset regression function;
calculating a second loss of the network model to be trained according to the prediction probability and a preset focus loss function, wherein the preset focus loss function is expressed as a product of a weight and a classification loss function, and the weight is in negative correlation with the prediction probability;
calculating the target loss of the network model to be trained through a preset target loss function according to the first loss and the second loss;
and adjusting parameters of the network model to be trained according to the target loss until a preset ending condition is met, so as to obtain the trained network model.
Optionally, the preset regression function is:
L1 = MSE
where MSE is the mean square error between the predicted position information and the correct position information in the sample image.
Optionally, the preset focus loss function is:
L2 = -(1 - p)^α · log(p)
where L2 is the second loss, p is the prediction probability, and α is the weighting exponent.
Optionally, the preset target loss function is:
L3 = w * L1 + L2 + b
where L1 is the first loss, L2 is the second loss, L3 is the target loss of the network model, and w and b are preset learnable parameters.
Optionally, the preset ending condition includes:
the accuracy of target detection of the network model to be trained reaches a preset accuracy threshold;
and/or the number of training iterations of the network model to be trained reaches a preset iteration threshold.
In a second aspect, the present application provides a target detection method, including:
acquiring an image to be detected;
and inputting the image to be detected into a pre-trained network model to obtain a detection result, wherein the pre-trained network model is obtained by any one of the above training methods for a target detection model.
In a third aspect, the present application provides a training apparatus for a target detection model, including:
the model training module is used for selecting a sample image from a preset training image set, inputting the sample image into a network model to be trained, and detecting the sample image through the network model to be trained to obtain the predicted position information and prediction probability of a predicted target in the sample image, wherein the sample image comprises a positive sample image and a negative sample image, the positive sample image comprises a preset detection target and a mark of the preset detection target, the negative sample image does not comprise the preset detection target, and the prediction probability is the probability that the predicted target is a correct target;
the loss calculation module is used for calculating first loss of the network model to be trained according to the predicted position information and a preset regression function;
the second loss module is used for calculating second loss of the network model to be trained according to the prediction probability and a preset focus loss function, wherein the preset focus loss function is expressed as a product of the weight and the classification loss function, and the weight is in negative correlation with the prediction probability;
the target loss module is used for calculating the target loss of the network model to be trained through a preset target loss function according to the first loss and the second loss;
and the parameter adjusting module is used for adjusting the parameters of the network model to be trained according to the target loss until a preset ending condition is met, so that the trained network model is obtained.
Optionally, the preset regression function is:
L1 = MSE
where MSE is the mean square error between the predicted position information and the correct position information in the sample image.
Optionally, the preset focus loss function is:
L2 = -(1 - p)^α · log(p)
where L2 is the second loss, p is the prediction probability, and α is the weighting exponent.
Optionally, the preset target loss function is:
L3 = w * L1 + L2 + b
where L1 is the first loss, L2 is the second loss, L3 is the target loss of the network model, and w and b are preset learnable parameters.
Optionally, the preset ending condition includes:
the accuracy of target detection of the network model to be trained reaches a preset accuracy threshold;
and/or the number of training iterations of the network model to be trained reaches a preset iteration threshold.
In a fourth aspect, the present application provides an object detection apparatus, comprising:
the image acquisition module is used for acquiring an image to be detected;
and the sample probability module is used for inputting the image to be detected into a pre-trained network model to obtain a detection result, wherein the pre-trained network model is obtained by any one of the above training methods for a target detection model.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the training method of any one of the above object detection models.
According to a sixth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the instructions to implement any of the above object detection methods.
According to a seventh aspect of the embodiments of the present disclosure, there is provided a storage medium,
the instructions in the storage medium, when executed by a processor of the electronic device, enable the electronic device to perform any of the above-described methods of training a target detection model.
According to an eighth aspect of the embodiments of the present disclosure, there is provided a storage medium,
the instructions in the storage medium, when executed by a processor of the electronic device, enable the electronic device to perform any of the object detection methods described above.
According to a ninth aspect of embodiments of the present disclosure, there is provided a computer program product which, when executed by a computer, enables the computer to perform any of the above-described methods of training an object detection model.
According to a tenth aspect of embodiments of the present disclosure, there is provided a computer program product which, when executed by a computer, enables the computer to perform any of the object detection methods described above.
The embodiments of the present application provide a training method and apparatus for a target detection model, and an electronic device. The method includes: selecting a sample image from a preset training image set, inputting it into a network model to be trained, and detecting the sample image through the network model to obtain predicted position information and a prediction probability for a predicted target in the sample image; calculating a first loss of the network model according to the predicted position information and a preset regression function; calculating a second loss of the network model according to the prediction probability and a preset focus loss function; calculating a target loss of the network model through a preset target loss function according to the first loss and the second loss; and adjusting parameters of the network model according to the target loss until a preset end condition is met, to obtain the trained network model. In this technical solution, the network model to be trained is corrected through the preset regression function and the preset focus loss function, and the focus loss function corrects the sample weights according to the proportion of positive samples, so the trained model can adapt to different positive-sample proportions, which improves the accuracy of detecting targets in an image to be detected. Of course, not all of the advantages described above need to be achieved at the same time when practicing any one product or method of the present application.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow diagram illustrating a method of training a target detection model in accordance with an exemplary embodiment;
FIG. 2 is another flow diagram illustrating a method of training an object detection model in accordance with an exemplary embodiment;
FIG. 3 is a block diagram of a training apparatus for a target detection model, according to an exemplary embodiment;
FIG. 4 is another block diagram illustrating an apparatus for training an object detection model in accordance with an exemplary embodiment;
FIG. 5 is a schematic diagram of an electronic device shown in accordance with an exemplary embodiment;
FIG. 6 is a schematic diagram of another electronic device shown in accordance with an exemplary embodiment;
FIG. 7 is a schematic diagram of a storage medium shown in accordance with an exemplary embodiment;
FIG. 8 is a schematic diagram of another storage medium shown in accordance with an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The training method of the target detection model according to the embodiments of the present disclosure is directed to a model in an intelligent terminal device and can therefore be executed by an intelligent terminal device. Specifically, the intelligent terminal device may be a device dedicated to model training, or a device that both trains the model and uses the trained model for image detection; it may be a computer or a server.
Fig. 1 is a flowchart illustrating a method for training an object detection model according to an exemplary embodiment, where the method for training an object detection model, as shown in fig. 1, includes the following steps.
In step S11, a sample image is selected from the preset training image set and input into the network model to be trained, and the sample image is detected by the network model to be trained, so as to obtain the predicted position information and the predicted probability of the predicted target in the sample image.
The sample image comprises a positive sample image and a negative sample image; the positive sample image comprises a preset detection target and a mark of the preset detection target, while the negative sample image does not comprise the preset detection target. The predicted position information is the position of the predicted target in the sample image, the predicted target is the target judged by the network model to be trained, and the prediction probability is the probability that the predicted target is a correct target.
The preset training image set may include multiple sample images, which may be of various types; for example, a sample image may be an independently captured image or a frame of a captured video. The embodiment of the present application also does not limit the image format, which may be of various types, such as JPG (Joint Photographic Experts Group) or PNG (Portable Network Graphics).
For example, when the network model to be trained is used to detect human faces, the positive samples are pictures containing a face and the negative samples are pictures not containing a face. When detection is performed on an image set containing both human faces and the faces of several animals, a human face is a correct target and an animal face is an incorrect target. The obtained predicted position is the position of the face in the sample image, and the prediction probability is the probability, computed by the network model to be trained, that the detected target is the correct target, for example the probability that the detected target is the target face.
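As an illustration only, the preset training image set described above might be organized as in the following sketch; the Sample structure, the file paths, and the box format (normalized x, y, width, height) are assumptions for the sketch, not part of this application:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Sample:
    """One entry of the preset training image set."""
    image_path: str
    boxes: List[Tuple[float, float, float, float]]  # marks of the preset detection target
    is_positive: bool                                # negative sample images carry no target

# Illustrative entries (paths and coordinates are placeholders):
training_set = [
    Sample("images/face_001.jpg", [(0.31, 0.22, 0.18, 0.25)], True),  # positive: face plus its mark
    Sample("images/street_014.png", [], False),                       # negative: no preset detection target
]
```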
A sample image may also be selected from a preset verification image set and input into the network model to be trained; the network model detects the sample image to obtain the predicted position information and prediction probability of the predicted target, and the predicted position of the target in the image is then evaluated through the preset regression function.
Optionally, the preset regression function is:
L1 = MSE
where MSE is the mean square error between the predicted position information and the correct position information in the sample image.
In step S12, a first loss of the network model to be trained is calculated according to the predicted location information and a preset regression function.
The preset regression function may be a least-squares loss function:
L1 = MSE
where MSE is the mean square error. The least-squares loss is obtained by applying this function to the predicted position information of the sample image.
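A minimal sketch of this first loss in PyTorch (the application does not name a framework, so the framework choice and tensor layout are assumptions):

```python
import torch
import torch.nn.functional as F

def regression_loss(pred_boxes: torch.Tensor, true_boxes: torch.Tensor) -> torch.Tensor:
    """First loss L1 = MSE: mean square error between the predicted
    position information and the correct position information."""
    return F.mse_loss(pred_boxes, true_boxes)
```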
In step S13, a second loss of the network model to be trained is calculated according to the prediction probability and the preset focus loss function.
Wherein the predetermined focus loss function is expressed as a product of a weight and a classification loss function, the weight being inversely related to the prediction probability.
Optionally, the preset focus loss function is:
L2 = -(1 - p)^α · log(p)
where L2 is the second loss, p is the prediction probability, and α is the weighting exponent.
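A sketch of the second loss under the same assumptions; the clamping constant eps is added here purely for numerical stability, and the default α = 2 is illustrative, since the application leaves the weighting exponent as a tunable value:

```python
import torch

def focus_loss(pred_probs: torch.Tensor, alpha: float = 2.0, eps: float = 1e-7) -> torch.Tensor:
    """Second loss L2 = -(1 - p)^alpha * log(p): the weight (1 - p)^alpha
    is inversely related to the prediction probability p, so confidently
    classified samples contribute little and hard samples dominate."""
    p = pred_probs.clamp(eps, 1.0 - eps)  # avoid log(0)
    return (-(1.0 - p) ** alpha * torch.log(p)).mean()
```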
In step S14, a target loss of the network model to be trained is calculated according to the first loss and the second loss by a preset target loss function.
Optionally, the preset target loss function is:
L3 = w * L1 + L2 + b
where L1 is the first loss, L2 is the second loss, L3 is the target loss of the network model, and w and b are preset learnable parameters.
Through the preset target loss function, the network model to be trained is judged according to the detection result on the verification image. The larger the loss calculated by the preset target loss function, the worse the model's target detection quality; conversely, the smaller the loss, the better the quality, that is, the higher the accuracy. Optionally, during model training, the initial values of w and b may be set to 1 and 0, respectively.
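A sketch of this target loss with w and b registered as learnable parameters, using the initial values 1 and 0 mentioned above; wrapping the combination in a Module so that an optimizer can update w and b is an implementation choice, not something mandated by the application:

```python
import torch
import torch.nn as nn

class TargetLoss(nn.Module):
    """Target loss L3 = w * L1 + L2 + b with learnable w and b."""
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.tensor(1.0))  # initial value 1
        self.b = nn.Parameter(torch.tensor(0.0))  # initial value 0

    def forward(self, l1: torch.Tensor, l2: torch.Tensor) -> torch.Tensor:
        return self.w * l1 + l2 + self.b
```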
In step S15, parameters of the network model to be trained are adjusted according to the target loss until a preset end condition is satisfied, so as to obtain a trained network model.
Optionally, the weight includes a weighting exponent, and adjusting the parameters of the network model to be trained may include adjusting the weighting exponent of the focus loss function based on the principle of reducing the target loss.
In this application, because the proportions of positive and negative samples differ, their weights (1 - p)^α also differ. By adjusting the weighting exponent of the focus loss function, the weight of positive samples can be increased and the weight of negative samples decreased, so that model training focuses on the samples that cannot yet be correctly classified rather than on the large number of samples that are already correctly classified.
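A quick numeric illustration of this effect; the probabilities below are invented values chosen to contrast an easy sample (p = 0.9) with a hard one (p = 0.1):

```python
# Weight (1 - p)^alpha for an easy sample (p = 0.9) vs. a hard sample (p = 0.1).
for alpha in (0.5, 1.0, 2.0):
    easy = (1 - 0.9) ** alpha
    hard = (1 - 0.1) ** alpha
    print(f"alpha={alpha}: easy={easy:.4f}, hard={hard:.4f}, ratio={hard / easy:.0f}x")
# alpha=2.0 gives easy=0.0100, hard=0.8100, a ratio of 81x: larger exponents push
# training toward the samples that are not yet correctly classified.
```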
According to this training method, the network model to be trained is corrected through the preset regression function and the preset focus loss function, and the focus loss function corrects the sample weights according to the proportion of positive samples; the trained model can therefore adapt to different positive-sample proportions, which improves the accuracy of detecting targets in an image to be detected.
Fig. 2 is another flowchart illustrating a method for training an object detection model according to an exemplary embodiment, where the method for training an object detection model, as shown in fig. 2, includes the following steps.
In step S11, a sample image is selected from the preset training image set and input into the network model to be trained, and the sample image is detected by the network model to be trained, so as to obtain the predicted position information and the predicted probability of the predicted target in the sample image.
The sample image comprises a positive sample image and a negative sample image, the positive sample image comprises a preset detection target and a mark of the preset detection target, the negative sample image does not comprise the preset detection target, and the prediction probability is the probability that the prediction target is the correct target.
In step S12, a first loss of the network model to be trained is calculated according to the predicted location information and a preset regression function.
In step S13, a second loss of the network model to be trained is calculated according to the prediction probability and the preset focus loss function.
Wherein the preset focus loss function is expressed as a product of a weight and a classification loss function, the weight being inversely related to the prediction probability.
Optionally, the preset focus loss function is:
L2 = -(1 - p)^α · log(p)
where L2 is the second loss, p is the prediction probability, and α is the weighting exponent.
In step S14, a target loss of the network model to be trained is calculated according to the first loss and the second loss by a preset target loss function.
Optionally, the preset target loss function is:
L3 = w * L1 + L2 + b
where L1 is the first loss, L2 is the second loss, L3 is the target loss of the network model, and w and b are preset learnable parameters.
In step S21, the parameters of the network model to be trained are adjusted according to the target loss until the accuracy of target detection of the network model reaches a preset accuracy threshold and/or the number of training iterations reaches a preset iteration threshold, so as to obtain the trained network model.
That is, training may be judged finished when the accuracy of target detection of the network model to be trained reaches the preset accuracy threshold, and/or when the number of training iterations reaches the preset iteration threshold; for example, after 500 training iterations, training is determined to be finished and the trained network model is obtained.
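A sketch of the adjustment loop with both preset end conditions, reusing the loss sketches above; model, dataloader, val_loader, and evaluate_accuracy are assumed helpers, the 0.95 accuracy threshold is illustrative, and the 500-iteration cap echoes the example above:

```python
import torch

def train(model, dataloader, val_loader, max_steps: int = 500, acc_threshold: float = 0.95):
    criterion = TargetLoss()  # learnable w and b join the optimized parameters
    optimizer = torch.optim.SGD(
        list(model.parameters()) + list(criterion.parameters()), lr=1e-3)
    for step, (images, true_boxes, _labels) in enumerate(dataloader, start=1):
        pred_boxes, pred_probs = model(images)
        loss = criterion(regression_loss(pred_boxes, true_boxes),
                         focus_loss(pred_probs))  # target loss L3
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Preset end conditions: iteration count and/or accuracy threshold
        # (accuracy checked every step here only for brevity).
        if step >= max_steps or evaluate_accuracy(model, val_loader) >= acc_threshold:
            break
    return model
```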
According to this training method, the network model to be trained is corrected through the preset regression function and the preset focus loss function, and the focus loss function corrects the sample weights according to the proportion of positive samples; the trained model can therefore adapt to different positive-sample proportions, which improves the accuracy of detecting targets in an image to be detected.
The embodiment of the application also provides a target detection method, which comprises the following steps:
in step a, an image to be detected is acquired.
And in the step B, inputting the image to be detected into a pre-trained network model to obtain a detection result.
The pre-trained network model is obtained by any one of the above training methods for a target detection model.
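A minimal usage sketch of these two steps; load_image is a hypothetical loader returning a [1, C, H, W] tensor, and the 0.5 probability threshold is an assumption:

```python
import torch

def detect(model, image_path: str, prob_threshold: float = 0.5):
    """Runs the pre-trained network model on an image to be detected."""
    model.eval()
    image = load_image(image_path)  # hypothetical helper: file -> [1, C, H, W] tensor
    with torch.no_grad():
        pred_boxes, pred_probs = model(image)
    keep = pred_probs >= prob_threshold  # keep predicted targets likely to be correct
    return pred_boxes[keep], pred_probs[keep]
```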
According to this target detection method, the network model was corrected during training through the preset regression function and the preset focus loss function, with the focus loss function correcting the sample weights according to the proportion of positive samples; the resulting network model therefore adapts to different positive-sample proportions, which improves the accuracy of detecting targets in the image to be detected.
FIG. 3 is a block diagram illustrating an apparatus for training an object detection model in accordance with an exemplary embodiment. Referring to fig. 3, the apparatus includes a model training module 121, a loss calculating module 122, a second loss module 123, a target loss module 124, and a parameter adjusting module 125.
The model training module 121 is configured to select a sample image from a preset training image set, input the sample image into a network model to be trained, and detect the sample image through the network model to obtain the predicted position information and prediction probability of a predicted target in the sample image.
The sample image comprises a positive sample image and a negative sample image, the positive sample image comprises a preset detection target and a mark of the preset detection target, the negative sample image does not comprise the preset detection target, and the prediction probability is the probability that the prediction target is the correct target.
The loss calculating module 122 is configured to calculate a first loss of the network model to be trained according to the predicted location information and a preset regression function.
The second loss module 123 is configured to calculate a second loss of the network model to be trained according to the prediction probability and a preset focus loss function.
Wherein the predetermined focus loss function is expressed as a product of a weight and a classification loss function, the weight being inversely related to the prediction probability.
The target loss module 124 is configured to calculate a target loss of the network model to be trained according to the first loss and the second loss by using a preset target loss function.
The parameter adjusting module 125 is configured to adjust parameters of the network model to be trained according to the target loss until a preset end condition is met, so as to obtain a trained network model.
Optionally, the apparatus further comprises:
the system comprises a sample acquisition module, a training image acquisition module and a training image analysis module, wherein the sample acquisition module is used for acquiring a preset training image set and a network model to be trained;
and the model training module is used for selecting unselected sample images from a preset training image set, inputting the unselected sample images into the network model to be trained, and training the network model to be trained.
Optionally, the preset focus loss function is:
L2 = -(1 - p)^α · log(p)
where L2 is the second loss, p is the prediction probability, and α is the weighting exponent.
Optionally, the preset target loss function is:
L3 = w * L1 + L2 + b
where L1 is the first loss, L2 is the second loss, L3 is the target loss of the network model, and w and b are preset learnable parameters.
Optionally, the preset ending condition includes:
the accuracy of target detection of the network model to be trained reaches a preset accuracy threshold;
and/or the number of training iterations of the network model to be trained reaches a preset iteration threshold.
Through the training apparatus provided by the embodiment of the present application, the network model to be trained is corrected through the preset regression function and the preset focus loss function, and the focus loss function corrects the sample weights according to the proportion of positive samples; the trained model can therefore adapt to different positive-sample proportions, which improves the accuracy of detecting targets in an image to be detected.
FIG. 4 is another block diagram illustrating an apparatus for training an object detection model in accordance with an exemplary embodiment. Referring to fig. 4, the apparatus includes a model training module 121, a loss calculation module 122, a second loss module 123, a target loss module 124, and a parameter adjustment submodule 131.
The model training module 121 is configured to select a sample image from a preset training image set, input the sample image into a network model to be trained, and detect the sample image through the network model to obtain the predicted position information and prediction probability of a predicted target in the sample image.
The sample image comprises a positive sample image and a negative sample image, the positive sample image comprises a preset detection target and a mark of the preset detection target, the negative sample image does not comprise the preset detection target, and the prediction probability is the probability that the prediction target is the correct target.
The loss calculating module 122 is configured to calculate a first loss of the network model to be trained according to the predicted location information and a preset regression function.
The second loss module 123 is configured to calculate a second loss of the network model to be trained according to the prediction probability and a preset focus loss function.
Wherein the predetermined focus loss function is expressed as a product of a weight and a classification loss function, the weight being inversely related to the prediction probability.
The target loss module 124 is configured to calculate a target loss of the network model to be trained according to the first loss and the second loss by using a preset target loss function.
The parameter adjusting submodule 131 is configured to adjust the parameters of the network model to be trained according to the target loss until the accuracy of target detection of the network model reaches a preset accuracy threshold and/or the number of training iterations reaches a preset iteration threshold, so as to obtain the trained network model.
Through the training apparatus provided by the embodiment of the present application, the network model to be trained is corrected through the preset regression function and the preset focus loss function, and the focus loss function corrects the sample weights according to the proportion of positive samples; the trained model can therefore adapt to different positive-sample proportions, which improves the accuracy of detecting targets in an image to be detected.
The embodiment of the application also provides a target detection device, which comprises an image acquisition module 1 and a sample probability module 2.
The image acquisition module 1 is configured for acquiring an image to be detected.
The sample probability module 2 is configured to input the image to be detected into a pre-trained network model to obtain a detection result, where the pre-trained network model is obtained by any one of the above training apparatuses for a target detection model.
Through this target detection apparatus, the network model was corrected during training through the preset regression function and the preset focus loss function, with the focus loss function correcting the sample weights according to the proportion of positive samples; the resulting network model therefore adapts to different positive-sample proportions, which improves the accuracy of detecting targets in the image to be detected.
FIG. 5 is a schematic diagram of an electronic device shown in accordance with an exemplary embodiment. For example, the apparatus 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, the apparatus 500 may include one or more of the following components: processing component 502, memory 504, power component 506, multimedia component 508, audio component 510, input/output (I/O) interface 512, sensor component 514, and communication component 516.
The processing component 502 generally controls overall operation of the device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 502 may include one or more processors 520 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 502 can include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operations at the apparatus 500. Examples of such data include instructions for any application or method operating on device 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 506 provides power to the various components of the device 500. The power components 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 500.
The multimedia component 508 includes a screen that provides an output interface between the device 500 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 500 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, audio component 510 includes a Microphone (MIC) configured to receive external audio signals when apparatus 500 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 504 or transmitted via the communication component 516. In some embodiments, audio component 510 further includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 514 includes one or more sensors for providing various aspects of status assessment for the device 500. For example, the sensor assembly 514 may detect an open/closed state of the device 500, the relative positioning of the components, such as a display and keypad of the apparatus 500, the sensor assembly 514 may also detect a change in the position of the apparatus 500 or a component of the apparatus 500, the presence or absence of user contact with the apparatus 500, orientation or acceleration/deceleration of the apparatus 500, and a change in the temperature of the apparatus 500. The sensor assembly 514 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate communication between the apparatus 500 and other devices in a wired or wireless manner. The apparatus 500 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described training method of the target detection model.
In an exemplary embodiment, a storage medium comprising instructions, such as the memory 504 comprising instructions, executable by the processor 520 of the apparatus 500 to perform the above-described method is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
FIG. 6 is a schematic diagram of another electronic device shown in accordance with an example embodiment. For example, the apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 6, apparatus 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operations at the apparatus 600. Examples of such data include instructions for any application or method operating on device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power supply component 606 provides power to the various components of device 600. The power components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 600.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, audio component 610 includes a Microphone (MIC) configured to receive external audio signals when apparatus 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessment of various aspects of the apparatus 600. For example, the sensor component 614 may detect an open/closed state of the device 600, the relative positioning of components, such as a display and keypad of the apparatus 600, the sensor component 614 may also detect a change in position of the apparatus 600 or a component of the apparatus 600, the presence or absence of user contact with the apparatus 600, orientation or acceleration/deceleration of the apparatus 600, and a change in temperature of the apparatus 600. The sensor assembly 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate communications between the apparatus 600 and other devices in a wired or wireless manner. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described object detection methods.
In an exemplary embodiment, a storage medium comprising instructions, such as the memory 604 comprising instructions, executable by the processor 620 of the apparatus 600 to perform the method described above is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
FIG. 7 is a schematic diagram of a storage medium shown in accordance with an exemplary embodiment. For example, the apparatus 700 may be provided as a server. Referring to fig. 7, apparatus 700 includes a processing component 722 that further includes one or more processors and memory resources, represented by memory 732, for storing instructions, such as applications, that are executable by processing component 722. The application programs stored in memory 732 may include one or more modules that each correspond to a set of instructions. Further, the processing component 722 is configured to execute instructions to perform the above-described method of training the target detection model.
The apparatus 700 may also include a power component 726 configured to perform power management of the apparatus 700, a wired or wireless network interface 750 configured to connect the apparatus 700 to a network, and an input/output (I/O) interface 758. The apparatus 700 may operate based on an operating system stored in memory 732, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
FIG. 8 is a schematic diagram of another storage medium shown in accordance with an exemplary embodiment. For example, the apparatus 800 may be provided as a server. Referring to FIG. 8, the apparatus 800 includes a processing component 822, which further includes one or more processors, and memory resources, represented by memory 832, for storing instructions, such as applications, that are executable by the processing component 822. The application programs stored in memory 832 may include one or more modules that each correspond to a set of instructions. Further, the processing component 822 is configured to execute instructions to perform the above-described object detection method.
The device 800 may also include a power component 826 configured to perform power management of the device 800, a wired or wireless network interface 850 configured to connect the device 800 to a network, and an input/output (I/O) interface 858. The apparatus 800 may operate based on an operating system stored in the memory 832, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
There is also provided a computer program product according to an embodiment of the present disclosure, which, when executed by a computer, enables the computer to execute any one of the above-mentioned training methods for an object detection model.
There is also provided another computer program product according to an embodiment of the present disclosure, which, when executed by a computer, enables the computer to perform any one of the object detection methods described above.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for training a target detection model, comprising:
selecting a sample image from a preset training image set, inputting the sample image into a network model to be trained, and detecting the sample image through the network model to be trained to obtain predicted position information and predicted probability of a predicted target in the sample image, wherein the sample image comprises a positive sample image and a negative sample image, the positive sample image comprises a preset detection target and a mark of the preset detection target, the negative sample image does not comprise the preset detection target, and the predicted probability is the probability that the predicted target is a correct target;
calculating a first loss of the network model to be trained according to the predicted position information and a preset regression function;
calculating a second loss of the network model to be trained according to the prediction probability and a preset focus loss function, wherein the preset focus loss function is expressed as a product of a weight and a classification loss function, and the weight is inversely related to the prediction probability;
calculating the target loss of the network model to be trained through a preset target loss function according to the first loss and the second loss;
and adjusting parameters of the network model to be trained according to the target loss until a preset ending condition is met, so as to obtain the trained network model.
2. The method of claim 1, wherein the preset focus loss function is:
L2 = -(1 - p)^α · log(p)
wherein L2 is the second loss, p is the prediction probability, and α is the weighting exponent.
3. The method of claim 1, wherein the predetermined target loss function is
L3 = w * L1 + L2 + b
wherein L1 is the first loss, L2 is the second loss, L3 is the target loss of the network model, and w and b are preset learnable parameters.
4. A method of object detection, comprising:
acquiring an image to be detected;
inputting the image to be detected into a pre-trained network model to obtain a detection result, wherein the pre-trained network model is obtained by the training method of the target detection model according to any one of claims 1 to 3.
5. An apparatus for training a target detection model, comprising:
a model training module, configured to select a sample image from a preset training image set, input the sample image into a network model to be trained, and detect the sample image through the network model to be trained to obtain predicted position information and a predicted probability of a predicted target in the sample image, wherein the sample image comprises a positive sample image and a negative sample image, the positive sample image comprises a preset detection target and an annotation of the preset detection target, the negative sample image does not comprise the preset detection target, and the predicted probability is the probability that the predicted target is a correct target;
a loss calculation module, configured to calculate a first loss of the network model to be trained according to the predicted position information and a preset regression function;
a second loss module, configured to calculate a second loss of the network model to be trained according to the predicted probability and a preset focal loss function, wherein the preset focal loss function is expressed as the product of a weight and a classification loss function, and the weight is inversely related to the predicted probability;
a target loss module, configured to calculate the target loss of the network model to be trained through a preset target loss function according to the first loss and the second loss;
a parameter adjusting module, configured to adjust parameters of the network model to be trained according to the target loss until a preset ending condition is met, to obtain the trained network model.
6. A target detection device, comprising:
an image acquisition module, configured to acquire an image to be detected;
a sample probability module, configured to input the image to be detected into a pre-trained network model to obtain a detection result, wherein the pre-trained network model is obtained by the method for training a target detection model according to any one of claims 1 to 3.
7. An electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the method for training a target detection model according to any one of claims 1 to 3.
8. An electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the target detection method according to claim 4.
9. A storage medium having instructions stored therein which, when executed by an electronic device, enable the electronic device to perform the method for training a target detection model according to any one of claims 1 to 3.
10. A storage medium having instructions stored therein which, when executed by an electronic device, enable the electronic device to perform the target detection method according to claim 4.

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200221