CN110633717A - Training method and device for target detection model - Google Patents

Training method and device for target detection model

Info

Publication number
CN110633717A
CN110633717A
Authority
CN
China
Prior art keywords
cost
training
regression
detection model
alternative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810641844.2A
Other languages
Chinese (zh)
Inventor
张立成 (Zhang Licheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201810641844.2A priority Critical patent/CN110633717A/en
Publication of CN110633717A publication Critical patent/CN110633717A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The invention discloses a training method and device for a target detection model, and relates to the field of computer technology. One embodiment of the method comprises: inputting a preselected image into a target detection model to determine candidate samples, where each candidate sample has a first total cost function comprising a classification cost and a regression cost; weighting each regression cost with its corresponding weight value, which can be dynamically updated, and generating a second total cost function for each candidate sample from the classification cost and the weighted regression cost; selecting training samples from all the candidate samples according to a preset selection rule, based on the second total cost function of each candidate sample; and training on the selected training samples through a back propagation algorithm and a gradient descent algorithm. This embodiment allows samples that are difficult to learn to be learned well and improves the model's learning effect, so that the model achieves higher accuracy.

Description

Training method and device for target detection model
Technical Field
The invention relates to the technical field of computers, in particular to a training method and a training device for a target detection model.
Background
Currently, common models for target detection mainly include YOLO (a target detection model), SSD (Single Shot MultiBox Detector), and Faster RCNN (a faster region-based convolutional neural network). Among these, the Faster RCNN model shows the best performance.
When existing training methods are used to train models such as Faster RCNN, some samples are difficult to learn and learn poorly. One reason is that certain samples cannot all be learned well at the same time when used together as training samples; another is that other samples are rarely chosen during training sample selection, which increases their learning difficulty, so that samples that are hard to learn are never learned well.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
some samples in the existing target detection model training are difficult to learn and have poor learning effect.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for training a target detection model, so that samples which are difficult to learn can be learned well, improving the model's learning effect and yielding higher model accuracy.
To achieve the above object, according to one aspect of an embodiment of the present invention, a method for training a target detection model is provided.
A method of training a target detection model, comprising: inputting a preselected image into a target detection model to determine candidate samples for training the target detection model, wherein each candidate sample has a respective first total cost function comprising a classification cost and a regression cost; weighting each regression cost with its corresponding weight value, which can be dynamically updated, and generating a second total cost function for each candidate sample from the classification cost and the weighted regression cost; selecting a preset number of candidate samples from all the candidate samples as training samples according to a preset selection rule, based on the second total cost function of each candidate sample; and training on the selected training samples through a back propagation algorithm and a gradient descent algorithm to obtain a trained target detection model.
Optionally, the step of inputting a preselected image into a target detection model to determine candidate samples for training the target detection model comprises: inputting the preselected image into the target detection model to perform the following: extracting features from the preselected image to generate a first feature map; generating a plurality of detection frames from the first feature map; performing down-sampling and feature extraction on the first feature map corresponding to each detection frame to obtain a second feature map; and performing classification and regression on the second feature map to determine the candidate samples.
Optionally, the target detection model is a Faster RCNN target detection model or an improved Faster RCNN target detection model, where the improved Faster RCNN target detection model constructs its feature extraction network with a lightweight convolutional neural network.
Optionally, the lightweight convolutional neural network is a ThiNet network and/or a SqueezeNet network.
Optionally, the weight corresponding to each regression cost is dynamically updated as follows: when the number of training iterations reaches a preset weight-update count, obtaining the classification cost and the regression cost of each candidate sample for the current training iteration; and updating the weight corresponding to each regression cost according to the classification cost and the regression cost of each candidate sample for the current training iteration.
Optionally, the step of updating the weight corresponding to each regression cost according to the classification cost and the regression cost of each candidate sample for the current training iteration comprises: calculating, for each candidate sample, the ratio of its classification cost to its regression cost for the current training iteration; and updating the weight corresponding to each regression cost according to each ratio.
Optionally, the step of selecting a preset number of candidate samples from all the candidate samples as training samples according to a preset selection rule, based on the second total cost function of each candidate sample, comprises: sorting the second total cost functions of the candidate samples by function value; and taking the preset number of candidate samples with the largest second total cost function values as training samples.
According to another aspect of the embodiments of the present invention, a training apparatus for a target detection model is provided.
A training apparatus for a target detection model, comprising: a candidate sample determination module for inputting a preselected image into a target detection model to determine candidate samples for training the target detection model, wherein each candidate sample has a respective first total cost function comprising a classification cost and a regression cost; a cost function processing module for weighting each regression cost with its corresponding weight value, which can be dynamically updated, and generating a second total cost function for each candidate sample from the classification cost and the weighted regression cost; a training sample selection module for selecting a preset number of candidate samples from all the candidate samples as training samples according to a preset selection rule, based on the second total cost function of each candidate sample; and a training execution module for training on the selected training samples through a back propagation algorithm and a gradient descent algorithm to obtain a trained target detection model.
Optionally, the candidate sample determination module is further configured to input the preselected image into the target detection model to perform the following: extracting features from the preselected image to generate a first feature map; generating a plurality of detection frames from the first feature map; performing down-sampling and feature extraction on the first feature map corresponding to each detection frame to obtain a second feature map; and performing classification and regression on the second feature map to determine the candidate samples.
Optionally, the target detection model is a Faster RCNN target detection model or an improved Faster RCNN target detection model, where the improved Faster RCNN target detection model constructs its feature extraction network with a lightweight convolutional neural network.
Optionally, the lightweight convolutional neural network is a ThiNet network and/or a SqueezeNet network.
Optionally, the apparatus further includes a weight updating module configured to: when the number of training iterations reaches a preset weight-update count, obtain the classification cost and the regression cost of each candidate sample for the current training iteration; and update the weight corresponding to each regression cost according to the classification cost and the regression cost of each candidate sample for the current training iteration.
Optionally, the weight updating module includes an updating submodule configured to: calculate, for each candidate sample, the ratio of its classification cost to its regression cost for the current training iteration; and update the weight corresponding to each regression cost according to each ratio.
Optionally, the training sample selection module is further configured to: sort the second total cost functions of the candidate samples by function value; and take the preset number of candidate samples with the largest second total cost function values as training samples.
According to yet another aspect of an embodiment of the present invention, an electronic device is provided.
An electronic device, comprising: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for training a target detection model provided by the present invention.
According to yet another aspect of an embodiment of the present invention, a computer-readable medium is provided.
A computer-readable medium, on which a computer program is stored which, when executed by a processor, implements the method for training a target detection model according to the invention.
One embodiment of the above invention has the following advantages or benefits: a preselected image is input into a target detection model to determine candidate samples for training the target detection model; each candidate sample has a respective first total cost function comprising a classification cost and a regression cost; each regression cost is weighted with its corresponding weight value, which can be dynamically updated, and a second total cost function is generated for each candidate sample from the classification cost and the weighted regression cost; and a preset number of candidate samples are selected from all the candidate samples as training samples according to a preset selection rule, based on the second total cost function of each candidate sample. In this way, samples that are difficult to learn can be learned well, the model's learning effect improves, and the model achieves higher accuracy.
Further effects of the optional implementations described above are discussed below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of a training method for a target detection model according to an embodiment of the present invention;
FIG. 2 is a block diagram of an improved Faster RCNN target detection model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the main modules of a training apparatus for a target detection model according to an embodiment of the present invention;
FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 5 is a schematic block diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of main steps of a training method of an object detection model according to an embodiment of the present invention.
As shown in fig. 1, the training method of the target detection model according to the embodiment of the present invention mainly includes the following steps S101 to S104.
Step S101: the preselected images are input into a target detection model to determine candidate samples for training the target detection model.
The preselected images are a preselected number of input images that include detection targets.
Step S101 may specifically include: inputting the preselected image into the target detection model to perform the following: extracting features from the preselected image to generate a first feature map; generating a plurality of detection frames from the first feature map; performing down-sampling and feature extraction on the first feature map corresponding to each detection frame to obtain a second feature map; and performing classification and regression on the second feature map to determine the candidate samples. A sketch of this pipeline is given below.
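For illustration only, the sub-steps of step S101 can be sketched as the following pipeline. This is a minimal sketch: the helper names (extract_features, generate_detection_frames, downsample_and_extract, classify_and_regress) are hypothetical placeholders for the processing described above, not functions defined by this embodiment.

```python
# Minimal sketch of step S101, assuming hypothetical helper methods for
# each sub-step; only the data flow is meant to be illustrative.
def determine_candidate_samples(preselected_image, model):
    # First feature map: features extracted from the preselected image
    first_feature_map = model.extract_features(preselected_image)
    # A plurality of detection frames generated from the first feature map
    detection_frames = model.generate_detection_frames(first_feature_map)
    # Second feature map: per-frame down-sampling plus feature extraction
    second_feature_map = model.downsample_and_extract(first_feature_map,
                                                      detection_frames)
    # Classification and regression on the second feature map yield the
    # candidate samples, each carrying a classification/regression cost
    return model.classify_and_regress(second_feature_map)
```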
The candidate samples take the form of detection frames; each candidate sample has a respective first total cost function comprising a classification cost and a regression cost.
The improved Faster RCNN target detection model is a target detection model established by the embodiment of the present invention on the basis of the Faster RCNN framework, in which a lightweight convolutional neural network is used to construct the feature extraction network. The improved Faster RCNN target detection model is described in detail below.
The lightweight convolutional neural network may be a ThiNet network and/or a SqueezeNet network.
Step S102: weighting each regression cost with its corresponding weight value, and generating a second total cost function for each candidate sample from the classification cost and the weighted regression cost, where the weight corresponding to each regression cost can be dynamically updated.
In the embodiment of the present invention, weighting each regression cost balances the classification cost and the weighted regression cost in each candidate sample's total cost function so that neither term dominates. Because the weights are dynamically updated, the two terms remain balanced throughout training, which further improves the balancing effect.
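For concreteness, a minimal sketch of the two cost functions follows, assuming cross-entropy for the classification cost and smooth L1 for the regression cost (common choices for Faster RCNN; the exact loss forms are not prescribed above, so these are illustrative assumptions):

```python
import torch.nn.functional as F

def first_total_cost(cls_logits, labels, box_preds, box_targets):
    # Per-candidate classification cost and regression cost; the first
    # total cost function is their unweighted sum.
    cls_cost = F.cross_entropy(cls_logits, labels, reduction='none')
    reg_cost = F.smooth_l1_loss(box_preds, box_targets,
                                reduction='none').sum(dim=1)
    return cls_cost, reg_cost

def second_total_cost(cls_cost, reg_cost, w):
    # Second total cost: the regression cost is weighted by w so that
    # neither term dominates; w is dynamically updated during training.
    return cls_cost + w * reg_cost
```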
The weight corresponding to each regression cost can be dynamically updated according to the following method:
when the number of training iterations reaches the preset weight-update count, the classification cost and the regression cost of each candidate sample for the current iteration are obtained, and the weight corresponding to each regression cost is updated according to the classification cost and the regression cost of each candidate sample.
Specifically, the step of updating the weight corresponding to each regression cost according to the classification cost and the regression cost of each candidate sample for the current iteration may include: calculating, for each candidate sample, the ratio of its classification cost to its regression cost for the current iteration; and updating the weight corresponding to each regression cost according to each ratio.
The preset weight-update count can be set as needed. For example, the total number of training iterations of the model in the embodiment of the present invention can be set to 200,000, and the weight-update count to the integer multiples of 5000; that is, the weight corresponding to each regression cost is updated once every 5000 training iterations. In the first iteration, the ratio of each candidate sample's classification cost (denoted LA1) to its regression cost (denoted LB1), namely LA1/LB1, is used as the initial value of the weight corresponding to that regression cost. In subsequent training, whenever the iteration count equals an integer multiple of 5000 (for example, at the 5000th iteration), the ratio of each candidate sample's classification cost (denoted LA5000) to its regression cost (denoted LB5000), namely LA5000/LB5000, is taken as the new weight corresponding to that regression cost, replacing the weight set previously (here, the weight set in the first iteration).
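Under the schedule just described, the update can be sketched as follows, assuming per-candidate cost tensors; this is a sketch of the described rule, not a verbatim implementation, and the eps guard is an added safeguard against division by zero:

```python
def maybe_update_weight(w, iteration, cls_cost, reg_cost,
                        update_every=5000, eps=1e-8):
    # First iteration: initialize w as LA1/LB1; thereafter refresh it
    # as LAn/LBn every `update_every` iterations (5000th, 10000th, ...).
    if iteration == 1 or iteration % update_every == 0:
        w = (cls_cost / (reg_cost + eps)).detach()  # per-candidate ratio
    return w
```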
Throughout model training, the weight corresponding to each regression cost is thus updated automatically at regular intervals, so that the classification cost and the weighted regression cost in each candidate sample's total cost function stay balanced; that is, the two terms remain on the same order of magnitude, and the total cost is not biased toward either one. This improves both the classification effect and the localization effect of the detection frames, and avoids an unbalanced learning effect between the model's classification layer and regression layer during training (for example, one of the two layers learning well while the other learns poorly). As a result, samples that are difficult to learn can be learned well, the model's learning effect improves, and the detection accuracy of the trained model increases.
Step S103: selecting a preset number of candidate samples from all the candidate samples as training samples according to a preset selection rule, based on the second total cost function of each candidate sample.
Step S103 may specifically include: sorting the second total cost functions of the candidate samples by function value, and taking the preset number of candidate samples with the largest second total cost function values as training samples.
For example, if there are 2000 candidate samples, their second total cost functions can be sorted by function value, and the 128 candidate samples with the largest second total cost function values selected as training samples.
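A minimal sketch of this selection rule, assuming the per-candidate second total cost values are held in a tensor:

```python
import torch

def select_training_samples(second_costs, num_train=128):
    # Keep the indices of the `num_train` candidates with the largest
    # second total cost values, i.e. the hardest samples.
    _, hard_idx = torch.topk(second_costs, k=num_train)
    return hard_idx
```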
Selecting training samples by sorting the candidate samples' second total cost functions by function value addresses the problem that some hard samples are rarely selected and therefore never learned well.
Step S104: training on the selected training samples through a back propagation algorithm and a gradient descent algorithm to obtain a trained target detection model.
Training on the training samples with the back propagation algorithm and the gradient descent algorithm may include: inputting the training samples; forward propagation; computing the loss (covering both the classification layer and the regression layer); computing the derivatives of the loss with respect to the model parameters; and updating the model parameters.
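These sub-steps can be sketched as one training iteration, reusing the select_training_samples sketch above and assuming a PyTorch-style model with a hypothetical per_candidate_costs method that returns the per-candidate classification and regression costs; plain SGD stands in for the gradient descent algorithm:

```python
import torch

def train_step(model, optimizer, image, targets, w):
    # Forward propagation: per-candidate classification/regression costs
    cls_cost, reg_cost = model.per_candidate_costs(image, targets)
    second_costs = cls_cost + w * reg_cost
    hard_idx = select_training_samples(second_costs, num_train=128)
    loss = second_costs[hard_idx].mean()  # loss over selected samples only
    optimizer.zero_grad()
    loss.backward()    # back propagation: derivatives w.r.t. parameters
    optimizer.step()   # gradient descent: update the model parameters
    return loss.item()

# Example optimizer (learning rate and momentum are assumed values):
# optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
```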
FIG. 2 is a block diagram of an improved fast RCNN target detection model according to an embodiment of the present invention.
The invention also provides a new target detection model: an improved Faster RCNN target detection model based on Faster RCNN. A ThiNet network and/or a SqueezeNet network can be used to construct the model's feature extraction network. As shown in FIG. 2, the improved Faster RCNN target detection model of the embodiment of the present invention uses a ThiNet network as the feature extraction network, and the architecture may include: ThiNet network 201, region proposal network 202, ROI down-sampling layer 203, fully connected layer 204, fully connected layer 205, classification layer 206, and regression layer 207. The ThiNet network comprises multiple convolutional layers, down-sampling layers, and activation layers. The target detection model of the embodiment of the present invention uses only part of the ThiNet network's layers (referred to as the selected layers); thus, ThiNet network 201 denotes the selected layers, specifically all layers of the ThiNet network up to and including the activation layer relu5_3.
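For orientation, the architecture of FIG. 2 can be sketched as the following module skeleton. The backbone, RPN, and ROI down-sampling components are passed in as arguments because only their roles are described above; the 512-channel, 7x7 ROI output feeding the first fully connected layer is an assumed VGG-style value, not a figure given above.

```python
import torch.nn as nn

class ImprovedFasterRCNN(nn.Module):
    # Structural sketch of FIG. 2; numbers in the comments are the
    # figure's reference numerals.
    def __init__(self, backbone, rpn, roi_pool, num_classes):
        super().__init__()
        self.backbone = backbone   # 201: ThiNet layers up to relu5_3
        self.rpn = rpn             # 202: region proposal network
        self.roi_pool = roi_pool   # 203: ROI down-sampling layer
        self.fc6 = nn.Linear(512 * 7 * 7, 4096)            # 204
        self.fc7 = nn.Linear(4096, 4096)                   # 205
        self.cls_layer = nn.Linear(4096, num_classes)      # 206
        self.reg_layer = nn.Linear(4096, 4 * num_classes)  # 207
```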
The ThiNet network 201 performs feature extraction on the input image and outputs the resulting feature map of the input image from its activation layer relu5_3.
The region proposal network 202 is connected to the activation layer relu5_3 of the ThiNet network 201 and generates a plurality of detection frames from the feature map of the input image. The region proposal network 202 may specifically include a convolutional layer, a classification layer, a regression layer, and a proposal layer (this classification layer and regression layer are internal to the region proposal network 202 and are not shown in the figure; they are distinct from classification layer 206 and regression layer 207). The convolutional layer is connected to the classification layer and the regression layer of the region proposal network 202 and further extracts features from the feature map of the input image, yielding a feature map referred to as the third feature map. Each point of the third feature map corresponds to a plurality of rectangular boxes (also called anchors); each rectangular box corresponds to a region of the input image, and the size and aspect ratio of each box can be set according to the detection target. The classification layer of the region proposal network 202 classifies each rectangular box as foreground or background: the probability that each box belongs to the foreground is obtained from the classification layer, and when this probability exceeds a preset foreground-probability threshold, the box is assigned to the foreground. The regression layer of the region proposal network 202 performs regression on each rectangular box to determine its position, which can be represented by four coordinate values. The proposal layer selects, from the boxes assigned to the foreground, those whose confidence exceeds a preset value, thereby generating a plurality of detection frames.
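The anchor mechanism can be illustrated with a short sketch; the stride, scales, and aspect ratios below are assumed example values (the text above only says they can be set according to the detection target), and the ratio is taken here as width divided by height:

```python
import itertools
import torch

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    # One rectangular box (anchor) per (scale, ratio) pair at every
    # point of the third feature map, in input-image coordinates.
    anchors = []
    for y, x in itertools.product(range(feat_h), range(feat_w)):
        cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
        for s, r in itertools.product(scales, ratios):
            w, h = s * r ** 0.5, s / r ** 0.5  # area ~ s^2, w/h = r
            anchors.append([cx - w / 2, cy - h / 2,
                            cx + w / 2, cy + h / 2])
    return torch.tensor(anchors)  # shape: (feat_h * feat_w * 9, 4)
```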
The ROI down-sampling layer 203 is connected to the activation layer relu5_3 of the ThiNet network 201 and to the region proposal network 202; from the feature map of the input image and the plurality of detection frames, it produces a feature map of the same size for each detection frame. The ROI down-sampling layer 203 is connected to the cascaded fully connected layers 204 and 205, which further extract features to generate a fourth feature map corresponding to each detection frame.
The fully connected layer 205 is connected to the classification layer 206 and the regression layer 207. The classification layer 206 classifies each detection frame corresponding to the fourth feature map to determine its category label, and the regression layer 207 performs regression on each detection frame corresponding to the fourth feature map to determine its position information. Finally, the detection target in the input image and its position information are determined from the category labels output by the classification layer 206 and the position information output by the regression layer 207.
The improved Faster RCNN target detection model of the embodiment of the present invention uses the selected layers of a lightweight convolutional neural network as the feature extraction network, which reduces the computational cost of target detection and thus meets real-time application requirements.
The above describes the functions of each part of the target detection model of FIG. 2 once it has been trained by the training method of the embodiment of the present invention; the training process itself is described in detail below.
First, features are extracted from the preselected image using the selected layers of the ThiNet network 201, and the resulting feature map is output through the activation layer relu5_3; this output serves as the input to both the region proposal network 202 and the ROI down-sampling layer 203. Second, further features are extracted by the convolutional layer of the region proposal network 202; its classification layer judges whether the rectangular box corresponding to each point of the convolutional layer's output feature map belongs to the foreground or the background, and its regression layer estimates each box's position, i.e., its center coordinates, width, and height. Third, the proposal layer of the region proposal network 202 processes the rectangular boxes: it removes boxes extending beyond the image boundary, sorts the remaining boxes by confidence (which can be determined from each box's foreground probability, a higher probability giving higher confidence), selects the top A boxes (A is 12000 in this embodiment, but may take other values), suppresses boxes whose overlap exceeds a preset value using non-maximum suppression, and keeps B boxes after suppression (B is 2000 in this embodiment, but may take other values) as detection frames. Fourth, the ROI down-sampling layer 203 down-samples the B differently sized boxes to obtain feature maps of the same size; the fully connected layers 204 and 205 further extract features, and the classification layer 206 and regression layer 207 then produce each candidate frame's category and position information, respectively. Fifth, the B (B = 2000) detection frames are taken as candidate samples, each with its own first total cost function comprising a classification cost (obtained from the output of the classification layer 206) and a regression cost (obtained from the output of the regression layer 207). Each regression cost is weighted with its corresponding weight value, and a second total cost function is generated for each candidate sample from the classification cost and the weighted regression cost; the weight corresponding to each regression cost can be dynamically updated, as detailed above. The second total cost functions of the 2000 candidate samples are sorted by function value, the 128 candidate samples with the largest values are selected as training samples, and the selected training samples are trained with a back propagation algorithm and a gradient descent algorithm. Training proceeds in this way for 200,000 iterations (the number of iterations can be customized), finally yielding the trained target detection model.
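The third step's filtering (confidence sorting, top-A selection, non-maximum suppression, top-B retention) can be sketched as follows, using torchvision's NMS; the IoU threshold of 0.7 is an assumed example for the "preset value" mentioned above:

```python
import torch
from torchvision.ops import nms

def propose(boxes, fg_scores, top_a=12000, top_b=2000, iou_thresh=0.7):
    # Sort by foreground confidence, keep the top-A boxes, suppress
    # heavily overlapping boxes, and keep the top-B survivors as
    # detection frames.
    scores, order = fg_scores.sort(descending=True)
    boxes, scores = boxes[order][:top_a], scores[:top_a]
    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep][:top_b]
```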
By training the target detection model of FIG. 2 with the training method of the embodiment of the present invention, samples that are difficult to learn can be learned well, the model's learning effect improves, and the model's accuracy increases. It should be noted that the training method of the embodiment of the present invention also applies to target detection models whose feature extraction network is built from a SqueezeNet network or from a combination of ThiNet and SqueezeNet networks, as well as to the existing Faster RCNN target detection model, with the same technical effects. For example, when the method is used to train a Faster RCNN pedestrian and vehicle detection model, samples that are difficult to learn can be learned well, the learning effect improves, and the trained Faster RCNN target detection model detects pedestrians and vehicles more accurately.
FIG. 3 is a schematic diagram of the main modules of a training apparatus for a target detection model according to an embodiment of the present invention.
As shown in FIG. 3, the training apparatus 300 for the target detection model according to the embodiment of the present invention mainly includes: a candidate sample determination module 301, a cost function processing module 302, a training sample selection module 303, and a training execution module 304.
The candidate sample determination module 301 is configured to input a preselected image into the target detection model to determine candidate samples for training the target detection model, wherein each candidate sample has a respective first total cost function comprising a classification cost and a regression cost.
The candidate sample determination module 301 may specifically be configured to input the preselected image into the target detection model to perform the following: extracting features from the preselected image to generate a first feature map; generating a plurality of detection frames from the first feature map; performing down-sampling and feature extraction on the first feature map corresponding to each detection frame to obtain a second feature map; and performing classification and regression on the second feature map to determine the candidate samples.
The target detection model can be a Faster RCNN target detection model or an improved Faster RCNN target detection model, where the improved Faster RCNN target detection model constructs its feature extraction network with a lightweight convolutional neural network.
The lightweight convolutional neural network may be a ThiNet network and/or a SqueezeNet network.
The cost function processing module 302 is configured to weight each regression cost with its corresponding weight value and to generate a second total cost function for each candidate sample from the classification cost and the weighted regression cost, where the weight corresponding to each regression cost can be dynamically updated.
The training apparatus 300 for the target detection model may further include a weight updating module configured to: when the number of training iterations reaches the preset weight-update count, obtain the classification cost and the regression cost of each candidate sample for the current training iteration; and update the weight corresponding to each regression cost according to the classification cost and the regression cost of each candidate sample for the current training iteration.
The weight updating module may include an updating submodule configured to: calculate, for each candidate sample, the ratio of its classification cost to its regression cost for the current training iteration; and update the weight corresponding to each regression cost according to each ratio.
The training sample selection module 303 is configured to select a preset number of candidate samples from all the candidate samples as training samples according to a preset selection rule, based on the second total cost function of each candidate sample.
The training sample selection module 303 may specifically be configured to: sort the second total cost functions of the candidate samples by function value, and take the preset number of candidate samples with the largest second total cost function values as training samples.
The training execution module 304 is configured to train on the selected training samples through a back propagation algorithm and a gradient descent algorithm to obtain a trained target detection model.
In addition, the implementation details of the training apparatus for the target detection model have already been described in detail in the training method above, so that content is not repeated here.
FIG. 4 shows an exemplary system architecture 400 to which the training method or training apparatus for a target detection model of an embodiment of the present invention can be applied.
As shown in FIG. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405. The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use terminal devices 401, 402, 403 to interact with a server 405 over a network 404 to receive or send messages or the like. The terminal devices 401, 402, 403 may have various communication client applications installed thereon, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like.
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 405 may be a server that provides various services, such as a background management server that supports shopping websites browsed by users using the terminal devices 401, 402, and 403. The background management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (e.g., push information) to the terminal device.
It should be noted that the method for training the target detection model provided in the embodiment of the present invention may be executed by the server 405 or the terminal devices 401, 402, and 403, and accordingly, the device for training the target detection model may be disposed in the server 405 or the terminal devices 401, 402, and 403.
It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use in implementing a terminal device or server of an embodiment of the present application is shown. The terminal device or the server shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage portion 508 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The above-described functions defined in the system of the present application are executed when the computer program is executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprising a candidate sample determination module 301, a cost function processing module 302, a training sample selection module 303, and a training execution module 304. The names of these modules do not, in some cases, limit the modules themselves; for example, the candidate sample determination module 301 may also be described as a "module for inputting preselected images into a target detection model to determine candidate samples for training the target detection model".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: input a preselected image into a target detection model to determine candidate samples for training the target detection model, wherein each candidate sample has a respective first total cost function comprising a classification cost and a regression cost; weight each regression cost with its corresponding weight value, which can be dynamically updated, and generate a second total cost function for each candidate sample from the classification cost and the weighted regression cost; select a preset number of candidate samples from all the candidate samples as training samples according to a preset selection rule, based on the second total cost function of each candidate sample; and train on the selected training samples through a back propagation algorithm and a gradient descent algorithm to obtain a trained target detection model.
According to the technical solution of the embodiment of the present invention, a preselected image is input into a target detection model to determine candidate samples for training the target detection model; each candidate sample has a respective first total cost function comprising a classification cost and a regression cost; each regression cost is weighted with its corresponding weight value, which can be dynamically updated, and a second total cost function is generated for each candidate sample from the classification cost and the weighted regression cost; and a preset number of candidate samples are selected from all the candidate samples as training samples according to a preset selection rule, based on the second total cost function of each candidate sample. In this way, samples that are difficult to learn can be learned well, the model's learning effect improves, and the model achieves higher accuracy.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (16)

1. A method for training a target detection model, comprising:
inputting a preselected image into a target detection model to determine candidate samples for training the target detection model, wherein each candidate sample has a respective first total cost function comprising a classification cost and a regression cost;
weighting each regression cost with its corresponding weight value, and generating a second total cost function for each candidate sample from the classification cost and the weighted regression cost, wherein the weight corresponding to each regression cost can be dynamically updated;
selecting a preset number of candidate samples from all the candidate samples as training samples according to a preset selection rule, based on the second total cost function of each candidate sample; and
training on the selected training samples through a back propagation algorithm and a gradient descent algorithm to obtain a trained target detection model.
2. The method of claim 1, wherein the step of inputting a preselected image into a target detection model to determine candidate samples for training the target detection model comprises: inputting the preselected image into the target detection model to perform the following:
extracting features from the preselected image to generate a first feature map;
generating a plurality of detection frames according to the first feature map;
performing down-sampling and feature extraction processing on the first feature map corresponding to each detection frame to obtain a second feature map;
and performing classification and regression on the second feature map to determine the candidate samples.
3. The method of claim 1, wherein the target detection model is a Faster RCNN target detection model or an improved Faster RCNN target detection model that constructs its feature extraction network with a lightweight convolutional neural network.
4. The method of claim 3, wherein the lightweight convolutional neural network is a ThiNet network and/or a SqueezeNet network.
5. The method according to claim 1, wherein the weight corresponding to each regression cost is dynamically updated as follows:
when the number of training iterations reaches a preset weight-update count, obtaining the classification cost and the regression cost of each candidate sample for the current training iteration;
and updating the weight corresponding to each regression cost according to the classification cost and the regression cost of each candidate sample for the current training iteration.
6. The method according to claim 5, wherein the step of updating the weight corresponding to each regression cost according to the classification cost and the regression cost of each candidate sample for the current training iteration comprises:
calculating, for each candidate sample, the ratio of its classification cost to its regression cost for the current training iteration;
and updating the weight corresponding to each regression cost according to each ratio.
7. The method according to claim 1, wherein the step of selecting a preset number of candidate samples from all the candidate samples as training samples according to a preset selection rule, based on the second total cost function of each candidate sample, comprises:
sorting the second total cost functions of the candidate samples by function value;
and taking the preset number of candidate samples with the largest second total cost function values as training samples.
8. An apparatus for training a target detection model, comprising:
a candidate sample determination module for inputting a preselected image into a target detection model to determine candidate samples for training the target detection model, wherein each candidate sample has a respective first total cost function comprising a classification cost and a regression cost;
a cost function processing module for weighting each regression cost with its corresponding weight value, and generating a second total cost function for each candidate sample from the classification cost and the weighted regression cost, wherein the weight corresponding to each regression cost can be dynamically updated;
a training sample selection module for selecting a preset number of candidate samples from all the candidate samples as training samples according to a preset selection rule, based on the second total cost function of each candidate sample; and
a training execution module for training on the selected training samples through a back propagation algorithm and a gradient descent algorithm to obtain a trained target detection model.
9. The apparatus of claim 8, wherein the candidate sample determination module is further configured to input the preselected image into the target detection model to perform the following:
extracting features from the preselected image to generate a first feature map;
generating a plurality of detection frames according to the first feature map;
performing down-sampling and feature extraction processing on the first feature map corresponding to each detection frame to obtain a second feature map;
and performing classification and regression on the second feature map to determine the candidate samples.
10. The apparatus of claim 8, wherein the target detection model is a Faster RCNN target detection model or an improved Faster RCNN target detection model that constructs its feature extraction network with a lightweight convolutional neural network.
11. The apparatus of claim 10, wherein the lightweight convolutional neural network is a ThiNet network and/or a SqueezeNet network.
12. The apparatus of claim 8, further comprising a weight updating module configured to:
when the number of training iterations reaches a preset weight-update count, obtain the classification cost and the regression cost of each candidate sample for the current training iteration;
and update the weight corresponding to each regression cost according to the classification cost and the regression cost of each candidate sample for the current training iteration.
13. The apparatus of claim 12, wherein the weight updating module comprises an updating submodule configured to:
calculate, for each candidate sample, the ratio of its classification cost to its regression cost for the current training iteration;
and update the weight corresponding to each regression cost according to each ratio.
14. The apparatus of claim 8, wherein the training sample selection module is further configured to:
sort the second total cost functions of the candidate samples by function value;
and take the preset number of candidate samples with the largest second total cost function values as training samples.
15. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-7.
16. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN201810641844.2A 2018-06-21 2018-06-21 Training method and device for target detection model Pending CN110633717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810641844.2A CN110633717A (en) 2018-06-21 2018-06-21 Training method and device for target detection model


Publications (1)

Publication Number Publication Date
CN110633717A true CN110633717A (en) 2019-12-31

Family

ID=68966813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810641844.2A Pending CN110633717A (en) 2018-06-21 2018-06-21 Training method and device for target detection model

Country Status (1)

Country Link
CN (1) CN110633717A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739062A (en) * 2020-06-05 2020-10-02 北京航空航天大学 Target detection method and system based on feedback mechanism
WO2021143231A1 (en) * 2020-01-17 2021-07-22 初速度(苏州)科技有限公司 Target detection model training method, and data labeling method and apparatus
CN113269250A (en) * 2021-05-25 2021-08-17 国网浙江省电力有限公司综合服务分公司 Service plate optical disk condition evaluation method
CN114067222A (en) * 2022-01-17 2022-02-18 航天宏图信息技术股份有限公司 Urban water body remote sensing classification method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250812A (en) * 2016-07-15 2016-12-21 汤平 A kind of model recognizing method based on quick R CNN deep neural network
CN106326858A (en) * 2016-08-23 2017-01-11 北京航空航天大学 Road traffic sign automatic identification and management system based on deep learning
CN107220506A (en) * 2017-06-05 2017-09-29 东华大学 Breast cancer risk assessment analysis system based on depth convolutional neural networks
CN107274451A (en) * 2017-05-17 2017-10-20 北京工业大学 Isolator detecting method and device based on shared convolutional neural networks
CN107316054A (en) * 2017-05-26 2017-11-03 昆山遥矽微电子科技有限公司 Non-standard character recognition methods based on convolutional neural networks and SVMs
CN107909082A (en) * 2017-10-30 2018-04-13 东南大学 Sonar image target identification method based on depth learning technology


Similar Documents

Publication Publication Date Title
CN110632608B (en) Target detection method and device based on laser point cloud
CN109377508B (en) Image processing method and device
CN110633717A (en) Training method and device for target detection model
CN113627536B (en) Model training, video classification method, device, equipment and storage medium
CN114187459A (en) Training method and device of target detection model, electronic equipment and storage medium
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
CN115147680B (en) Pre-training method, device and equipment for target detection model
CN114882321A (en) Deep learning model training method, target object detection method and device
CN112749300A (en) Method, apparatus, device, storage medium and program product for video classification
CN110633716A (en) Target object detection method and device
CN115170815A (en) Method, device and medium for processing visual task and training model
CN110633597B (en) Drivable region detection method and device
CN113657411A (en) Neural network model training method, image feature extraction method and related device
CN113902899A (en) Training method, target detection method, device, electronic device and storage medium
CN113378855A (en) Method for processing multitask, related device and computer program product
CN112784102A (en) Video retrieval method and device and electronic equipment
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN110634155A (en) Target detection method and device based on deep learning
CN110633595B (en) Target detection method and device by utilizing bilinear interpolation
CN113642510A (en) Target detection method, device, equipment and computer readable medium
CN109657523B (en) Driving region detection method and device
CN115456167B (en) Lightweight model training method, image processing device and electronic equipment
CN113128601B (en) Training method of classification model and method for classifying images
CN113360770B (en) Content recommendation method, device, equipment and storage medium
CN113362304B (en) Training method of definition prediction model and method for determining definition level

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20191231