CN109961107A - Training method and apparatus for a target detection model, electronic device, and storage medium - Google Patents

Training method and apparatus for a target detection model, electronic device, and storage medium

Info

Publication number
CN109961107A
Authority
CN
China
Prior art keywords
target detection
network
detection model
loss function
classification network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910315195.1A
Other languages
Chinese (zh)
Other versions
CN109961107B (en
Inventor
李永波
李伯勋
俞刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Maigewei Technology Co Ltd
Original Assignee
Beijing Maigewei Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Maigewei Technology Co Ltd filed Critical Beijing Maigewei Technology Co Ltd
Priority to CN201910315195.1A priority Critical patent/CN109961107B/en
Publication of CN109961107A publication Critical patent/CN109961107A/en
Application granted granted Critical
Publication of CN109961107B publication Critical patent/CN109961107B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Abstract

An embodiment of the present application provides a training method and apparatus for a target detection model, an electronic device, and a storage medium. The target detection model includes a first classification network, and the method comprises: setting up at least one second classification network, where during training the input of the second classification network is identical to the input of the first classification network; and training the target detection model based on a total loss function until the total loss function converges, where the total loss function includes the loss function of the target detection model and the loss function of the second classification network. Compared with existing training methods for target detection models, the scheme of this embodiment adds a second classification network during training and trains the target detection model on the combination of the model's own loss function and the loss function of the second classification network. This effectively strengthens the model's learning from false detections and false alarms, thereby improving the detection accuracy of the target detection model.

Description

Training method and apparatus for a target detection model, electronic device, and storage medium
Technical field
This application relates to the technical field of image processing, and in particular to a training method and apparatus for a target detection model, an electronic device, and a storage medium.
Background technique
The task of target detection is to find targets of interest in an image; for example, when the target is a face, face detection aims to detect the faces in a scene and their corresponding positions. Target detection is one of the major problems in the field of computer vision, with long-term research value and wide application demand in fields such as security monitoring and human-computer interaction.
In recent years, target detection technology has developed rapidly along with deep neural networks and hardware. In practical applications, however, target detection is often accompanied by a large number of false alarms, i.e., certain non-target regions are identified as target regions, which seriously hampers the adoption of target detection technology. How to suppress false alarms in a target detection network and improve detection precision is therefore an extremely important problem in the field.
Summary of the invention
The purpose of the application is to solve at least one of the above technical deficiencies, in particular the technical deficiency of a high false-alarm rate during target detection.
In a first aspect, an embodiment of the present application provides a training method for a target detection model, the target detection model including a first classification network, the method comprising:
setting up at least one second classification network, where during training the input of the second classification network is identical to the input of the first classification network; and
training the target detection model based on a total loss function until the total loss function converges, where the total loss function includes the loss function of the target detection model and the loss function of the second classification network.
In an optional embodiment of the application, the target detection model includes a single-stage detection network structure.
In an optional embodiment of the application, the single-stage detection network structure includes a RetinaNet network structure.
In an optional embodiment of the application, the second classification network includes cascaded convolutional layers and a fully connected layer, where the input of the convolutional layers is connected to the output of the backbone network of the RetinaNet network structure.
In an optional embodiment of the application, the loss function of the second classification network includes at least one of a first loss function determined based on the output of the second classification network and a second loss function determined based on the outputs of the first classification network and the second classification network.
In an embodiment of the present application, the first loss function is:
L_C1 = -(1 - α) * p1^γ * log(1 - p1) * (1 - y) - α * (1 - p1)^γ * log(p1) * y
where L_C1 denotes the first loss function, α is a weight factor, p1 is the output of the second classification network, y denotes the sample label, and γ is a modulating factor.
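The first loss function has the form of a focal loss applied to the second classification network's output. A minimal scalar sketch (the default values α = 0.25 and γ = 2 are illustrative assumptions, not values specified in the application):

```python
import math

def first_loss(p1, y, alpha=0.25, gamma=2.0):
    """First loss L_C1: a focal-loss-style term on the second
    classification network's output p1 and the sample label y.
    The alpha and gamma defaults are assumptions for illustration."""
    # Negative-sample term: active when y = 0, down-weighted by p1^gamma.
    neg = (1 - alpha) * (p1 ** gamma) * math.log(1 - p1) * (1 - y)
    # Positive-sample term: active when y = 1, down-weighted by (1 - p1)^gamma.
    pos = alpha * ((1 - p1) ** gamma) * math.log(p1) * y
    return -(neg + pos)
```

As with the standard focal loss, confident correct predictions contribute little: for a positive sample (y = 1), the loss shrinks rapidly as p1 approaches 1.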
In an optional embodiment of the application, the second loss function is:
L_C2 = -(y * (1 - M) + (1 - y) * M) * (α * p1^γ * log(p1) * M - (1 - α) * (1 - p1)^γ * log(1 - p1) * (1 - M))
where L_C2 denotes the second loss function, α is a weight factor, p1 is the output of the second classification network, y denotes the sample label, γ is a modulating factor, and M is the target-region result label, whose value is determined as follows:
M = 1 if p ≥ th, and M = 0 otherwise, where p denotes the output of the first classification network and th denotes a preset threshold.
In an optional embodiment of the application, the loss function of the target detection model includes the loss function of the first classification network and the loss function of the target-box regression network.
In a second aspect, an embodiment of the present application further provides an image detection method, the method comprising:
obtaining an image to be detected;
detecting the image to be detected by a target detection model, the target detection model being obtained by training with the training method for a target detection model in the first aspect of the embodiments of the present application; and
obtaining a detection result of the image to be detected based on the output of the target detection model.
In a third aspect, an embodiment of the present application provides a training device for a target detection model, the target detection model including a first classification network, the device comprising:
a training supervision network setting module, configured to set up at least one second classification network, where during training the input of the second classification network is identical to the input of the first classification network; and
a model training module, configured to train the target detection model based on a total loss function until the total loss function converges, where the total loss function includes the loss function of the target detection model and the loss function of the second classification network.
In an optional embodiment of the application, the target detection model includes a single-stage detection network structure.
In an optional embodiment of the application, the single-stage detection network structure includes a RetinaNet network structure.
In an optional embodiment of the application, the second classification network includes cascaded convolutional layers and a fully connected layer, where the input of the convolutional layers is connected to the output of the backbone network of the RetinaNet network structure.
In an optional embodiment of the application, the loss function of the second classification network includes at least one of a first loss function determined based on the output of the second classification network and a second loss function determined based on the outputs of the first classification network and the second classification network.
In an optional embodiment of the application, the first loss function is:
L_C1 = -(1 - α) * p1^γ * log(1 - p1) * (1 - y) - α * (1 - p1)^γ * log(p1) * y
where L_C1 denotes the first loss function, α is a weight factor, p1 is the output of the second classification network, y denotes the sample label, and γ is a modulating factor.
In an embodiment of the present application, the second loss function is:
L_C2 = -(y * (1 - M) + (1 - y) * M) * (α * p1^γ * log(p1) * M - (1 - α) * (1 - p1)^γ * log(1 - p1) * (1 - M))
where L_C2 denotes the second loss function, α is a weight factor, p1 is the output of the second classification network, y denotes the sample label, γ is a modulating factor, and M is the target-region result label, whose value is determined as follows:
M = 1 if p ≥ th, and M = 0 otherwise, where p denotes the output of the first classification network and th denotes a preset threshold.
In an optional embodiment of the application, the loss function of the target detection model includes the loss function of the first classification network and the loss function of the target-box regression network.
In a fourth aspect, an embodiment of the present application further provides an image detection device, the device comprising:
an image acquisition module, configured to obtain an image to be detected; and
an image detection module, configured to detect the image to be detected by a target detection model and to obtain a detection result of the image to be detected based on the output of the target detection model, the target detection model being obtained by training with the training method for a target detection model in the first aspect of the embodiments of the present application.
In a fifth aspect, the application provides an electronic device, the electronic device including a processor and a memory:
the memory is configured to store operation instructions; and
the processor is configured to execute, by calling the operation instructions, the method shown in any embodiment of the first or second aspect of the application.
In a sixth aspect, the application provides a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the method shown in any embodiment of the first or second aspect of the application.
The technical solution provided by the present application has the following beneficial effects:
In the scheme of the embodiments of the present application, compared with existing training methods for target detection models, a second classification network is added when training the target detection model, and the model is trained on the combination of its own loss function and the loss function of the second classification network. This effectively strengthens the model's learning from false detections and false alarms, thereby improving the detection accuracy of the target detection model.
Detailed description of the invention
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below.
Fig. 1 is a schematic flowchart of a training method for a target detection model provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of a second classification network provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of another second classification network provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of yet another second classification network provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of still another second classification network provided by an embodiment of the present application;
Fig. 6 is a schematic flowchart of an image detection method provided by an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a training device for a target detection model provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an image detection device provided by an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Specific embodiments
Embodiments of the application are described in detail below; examples of the embodiments are shown in the accompanying drawings, where the same or similar reference numerals throughout denote the same or similar elements, or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are only used to explain the application, and are not to be construed as limiting the claims.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", and "the" used herein may also include the plural forms. It should be further understood that the word "include" used in the description of the application refers to the presence of features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intermediate elements may also be present. In addition, "connection" or "coupling" as used herein may include wireless connection or wireless coupling. The word "and/or" used herein includes all or any unit and all combinations of one or more of the associated listed items.
To make the purposes, technical schemes, and advantages of the application clearer, the embodiments of the present application are described in further detail below with reference to the drawings, taking the case where the target detection model detects face regions in an image as an example.
An embodiment of the application provides a training method for a target detection model, the target detection model including a first classification network. As shown in Fig. 1, the method may include:
Step S110: setting up at least one second classification network, where during training the input of the second classification network is identical to the input of the first classification network.
The role of the target detection model is to detect target regions in an image. For example, the target detection model may be a face detection model for detecting face regions in an image, a human detection model for detecting human regions in an image, or another object detection model. The classification result output by a classification network characterizes the target region; for example, for a face detection model, the output of the classification network may be the probability that a target region in the input image is a face region.
Step S120: training the target detection model based on a total loss function until the total loss function converges, where the total loss function includes the loss function of the target detection model and the loss function of the second classification network.
The loss function estimates the degree of inconsistency between the model's prediction and the ground truth; it is a non-negative real-valued function, and the smaller the loss function, the better the robustness of the model. The loss function is the core of the empirical risk function and an important component of the structural risk function. Convergence of the loss function is a limit concept: in general, if the function value tends to some finite value as the variable changes, the loss function is convergent.
It can be understood that the loss function of the target detection model refers to the loss-function part of the model itself, whose concrete form is related to the structure of the target detection model. For example, if the target detection model includes a classification network, the loss function of the target detection model includes the loss function corresponding to that classification network; if the target detection model further includes a regression network, its loss function includes both the loss function corresponding to the classification network and the loss function corresponding to the regression network. The form of the loss function of each sub-network of the model (e.g., classification network, regression network) is likewise related to the structure of that sub-network.
In the embodiment of the present application, the loss function of the target detection model includes the loss function of the first classification network and the loss function of the target-box regression network.
That is, in the embodiment of the present application, the target detection model may include a first classification network and a target-box regression network, in which case the loss function of the target detection model includes the loss function of the first classification network and the loss function of the target-box regression network. Correspondingly, the total loss function may include the loss function of the first classification network, the loss function of the target-box regression network, and the loss function of the second classification network.
It should be noted that, in practical applications, the way of judging whether a loss function (such as the above total loss function) has converged can be configured according to actual needs. For example, during training, if the value of the total loss function tends to some finite value, the function may be considered convergent. Generally, the smaller the total loss function the better; as the number of training iterations increases, the value of the total loss function keeps decreasing and tends to be stable. One convergence condition is that the difference between the values of the total loss function in two adjacent training iterations is less than a set threshold: when the training result satisfies this condition, the total loss function may be considered convergent. Of course, other convergence conditions may be configured according to actual needs, or other ways of judging whether the function has converged may be used.
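The convergence condition just described (the difference between the total-loss values of two adjacent training iterations falling below a set threshold) can be sketched as follows; the loop structure, the `train_step` interface, and the threshold value are illustrative assumptions:

```python
def train_until_convergence(train_step, max_iters=10000, tol=1e-4):
    """Run training steps until the total loss stabilizes.
    `train_step` is assumed to perform one parameter update and
    return the current value of the total loss function."""
    loss = None
    prev_loss = None
    for _ in range(max_iters):
        loss = train_step()
        # Converged when two adjacent loss values differ by less than tol.
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return loss
```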
In the embodiment of the present application, when training the target detection model, at least one second classification network is added, and the target detection model is trained based on the loss function of the second classification network together with the loss function of the target detection model. Since the total loss function includes both the loss function of the target detection model and the loss function of the second classification network, and the latter plays a supervisory role during training, the precision of the target detection model can be improved compared with existing training methods. Therefore, when the trained target detection model is used for target detection, false alarms in the target detection network can be effectively suppressed and the precision of target detection improved.
In addition, since the second classification network is set up outside the network structure of the target detection model, it does not affect the original network structure of the target detection model. When the target detection model is subsequently used for target-region detection, its network structure is unchanged, so the detection speed is not affected either.
In an optional embodiment of the application, the target detection model includes a single-stage detection network structure.
Of course, the target detection model may also include a multi-stage detection network structure. A single-stage (one-stage) detection network structure may include, but is not limited to, the YOLO (You Only Look Once) structure, the SSD (Single Shot MultiBox Detector) network structure, or the RetinaNet network structure.
In an optional embodiment of the application, the single-stage detection network structure includes a RetinaNet network structure.
The RetinaNet network structure is a network for detecting targets in images. It consists of a backbone network and two task-specific sub-networks: the backbone computes convolutional features over the whole image; the first sub-network performs image classification on the backbone's output (i.e., the first classification network); and the second sub-network performs convolutional bounding-box regression (i.e., the regression network). The first classification network may include cascaded convolutional layers and a fully connected layer. The loss function L_C corresponding to the RetinaNet network structure includes the loss function L_cls of the first classification network and the loss function L_bb corresponding to the regression network, i.e., L_C = L_bb + L_cls.
In an optional embodiment of the application, the second classification network includes cascaded convolutional layers and a fully connected layer, where the input of the convolutional layers is connected to the output of the backbone network of the RetinaNet network structure. The backbone network, pre-trained in the form of a classifier, extracts features from the image.
In practical applications, the second classification network can be designed with reference to the first classification network; its structure may or may not be identical to that of the first classification network, and even when the structures are identical, the network parameters of the second classification network may differ from those of the first. In one example, if the single-stage detection network structure is the RetinaNet network structure, whose first classification network includes cascaded convolutional layers and a fully connected layer, then the second classification network may also include cascaded convolutional layers and a fully connected layer, with the input of its convolutional layers connected to the output of the backbone network of the RetinaNet network structure.
In one example, taking a target detection model with the RetinaNet network structure, Fig. 2 shows a schematic diagram of the RetinaNet network structure and the second classification network provided by an embodiment of the application. The network branch labeled Branch-c1 denotes the second classification network, composed of cascaded convolutional layers Conv3 and a fully connected layer FC3. The part inside the dotted box is the RetinaNet network structure, which includes the two network branches Branch-c and Branch-b. Branch-c denotes the first classification network, composed of cascaded convolutional layers Conv1 and a fully connected layer; Branch-b is the regression network, composed in this example of cascaded convolutional layers Conv2 and a fully connected layer FC2. In this example, the output of Branch-b may be the coordinates of the target region, i.e., its output result is coordinates, while the outputs of Branch-c and Branch-c1 may be the probability that the detected region is a target region, i.e., their output results are probabilities.
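The branch wiring described above, with Branch-c, Branch-b, and Branch-c1 all consuming the same backbone output and Branch-c1 present only at training time, can be sketched with stand-in functions (all names and return values are dummies for illustration; the real branches are convolutional networks):

```python
def backbone(image):
    # Stand-in: a real backbone returns convolutional feature maps.
    return image

def branch_c(feature):
    # First classification network: probability p that the region is a target.
    return 0.8

def branch_b(feature):
    # Regression network: target-box coordinates.
    return (10, 20, 50, 60)

def branch_c1(feature):
    # Second classification network: probability p1, used only in training.
    return 0.6

def forward(image, training):
    feature = backbone(image)
    p, box = branch_c(feature), branch_b(feature)
    # Branch-c1 shares the backbone output with the other branches but is
    # detached at inference time, so detection speed is unaffected.
    p1 = branch_c1(feature) if training else None
    return p, box, p1
```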
In an optional embodiment of the application, the loss function of the second classification network includes at least one of a first loss function determined based on the output of the second classification network and a second loss function determined based on the outputs of the first classification network and the second classification network.
That is, in practical applications, there may be different combinations for the loss function of the second classification network: it may include the first loss function determined based on the output of the second classification network, or it may include the second loss function determined based on the outputs of the first and second classification networks, or it may include both.
In an optional embodiment of the application, the first loss function is:
L_C1 = -(1 - α) * p1^γ * log(1 - p1) * (1 - y) - α * (1 - p1)^γ * log(p1) * y
where L_C1 denotes the first loss function, α is a weight factor, p1 is the output of the second classification network, y is the sample label, and γ is a modulating factor.
The sample label marks whether a target region is present in the sample image. For example, if the sample image is used for detecting whether a face region is present in an image, then y may be set to 1 if the sample image contains a face, and to 0 if it does not (i.e., the sample image is a background image).
In an optional embodiment of the application, the second loss function is:
L_C2 = -(y * (1 - M) + (1 - y) * M) * (α * p1^γ * log(p1) * M - (1 - α) * (1 - p1)^γ * log(1 - p1) * (1 - M))
where L_C2 denotes the second loss function, α is a weight factor, p1 is the output of the second classification network, y is the sample label, γ is a modulating factor, and M is the target-region result label, whose value is determined as follows:
M = 1 if p ≥ th, and M = 0 otherwise, where p denotes the output of the first classification network and th denotes a preset threshold.
That is, when p ≥ th, M = 1 and the result is considered positive, i.e., a target region; when p < th, M = 0 and the result is considered negative, i.e., not a target region.
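Putting the threshold rule and L_C2 together in a scalar sketch (the th, α, and γ values are illustrative assumptions): the gating factor y * (1 - M) + (1 - y) * M is nonzero exactly when the first network's thresholded prediction M disagrees with the label y, which is how L_C2 focuses on false alarms and missed detections:

```python
import math

def target_mask(p, th=0.5):
    """M = 1 when the first classification network's output p reaches
    the preset threshold th (predicted positive), else M = 0."""
    return 1 if p >= th else 0

def second_loss(p, p1, y, th=0.5, alpha=0.25, gamma=2.0):
    """Second loss L_C2, computed from the outputs of both
    classification networks and the sample label."""
    m = target_mask(p, th)
    gate = y * (1 - m) + (1 - y) * m  # nonzero only when M != y
    inner = (alpha * (p1 ** gamma) * math.log(p1) * m
             - (1 - alpha) * ((1 - p1) ** gamma) * math.log(1 - p1) * (1 - m))
    return -gate * inner
```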
The total loss function is described in detail below for the specific optional forms of the loss function of the second classification network, taking a target detection model with the RetinaNet network structure as an example and combining concrete examples.
1. The loss function of the second classification network includes the first loss function L_C1 determined based on the output of the second classification network. In this case, the total loss function L includes the loss function L_C corresponding to the RetinaNet network structure (L_C = L_bb + L_cls) and the first loss function L_C1 determined based on the output of the second classification network, e.g., L = L_C1 + L_C.
As an example, as shown in Fig. 3, when training the RetinaNet network structure, after a sample image is input into the backbone network, the output of the first classification network in the RetinaNet network structure is the probability p, the output of the regression network is denoted by Box, and the output of the second classification network is the probability p1. In this example, the total loss function can be expressed as:
L = L_C + L_C1 = L_bb + L_cls + L_C1
where
L_cls = -(1 - α) * p^γ * log(1 - p) * (1 - y) - α * (1 - p)^γ * log(p) * y
L_C1 = -(1 - α) * p1^γ * log(1 - p1) * (1 - y) - α * (1 - p1)^γ * log(p1) * y
2. The loss function of the second classification network includes the second loss function L_C2 determined based on the outputs of the first classification network and the second classification network. In this case, the total loss function L includes the loss function L_C corresponding to the RetinaNet network structure (L_C = L_bb + L_cls) and the second loss function L_C2 determined based on the outputs of the first and second classification networks, e.g., L = L_C2 + L_C.
As an example, as shown in Fig. 4, when training the RetinaNet network structure, after a sample image is input into the backbone network, the output of the first classification network in the RetinaNet network structure is the probability p, the output of the regression network is denoted by Box, and the output of the second classification network is the probability p1. In this example, the total loss function can be expressed as:
L = L_C2 + L_C = L_bb + L_cls + L_C2
L_C2 = -(y * (1 - M) + (1 - y) * M) * (α * p1^γ * log(p1) * M - (1 - α) * (1 - p1)^γ * log(1 - p1) * (1 - M))
where the value of M is determined as before: M = 1 if p ≥ th, and M = 0 otherwise.
The concrete form of L_cls is the same as in the above embodiment; for details, refer to the above embodiment, which is not repeated here.
3, the loss function of the second sorter network includes the first-loss function determining based on the output of the second sorter network LC1, and the second loss function L determining based on the output of the first sorter network and the output of the second sorter networkC2, at this point, Total losses function L includes the corresponding loss function L of RetinaNet network structureC(LC=Lbb+Lcls), be based on the second sorter network The determining first-loss function L of outputC1With the output determination of output and the second sorter network based on the first sorter network Second loss function LC2, such as L=LC2+LC+LC1
As an example, as shown in figure 5, inputting sample image when being trained to RetinaNet network structure After Backbone network, in RetinaNet network structure the output result of the first sorter network be Probability p, Recurrent networks it is defeated For result by being indicated for Box, the output result of the second sorter network is Probability p out1, in the example, total losses function can To indicate are as follows:
L = L_C2 + L_C + L_C1 = L_bb + L_cls + L_C2 + L_C1
Here, the concrete forms of L_cls, L_C2 and L_C1 are the same as in the above embodiments; for details, refer to the above embodiments, and the description is not repeated here.
Based on the target detection model of the embodiments of the present invention, as shown in Fig. 6, an embodiment of the present application further provides an image detection method, the method comprising:
Step S610: obtain an image to be detected;
Step S620: detect the image to be detected by the target detection model;
Here, the target detection model is a model trained by the training method of the target detection model in the foregoing embodiments. For specific implementations of the training method, refer to the description of the training method of the target detection model in the above embodiments, which is not repeated here; for example, the target detection model may be the trained RetinaNet network structure.
Step S630: obtain the detection result of the image to be detected based on the output of the target detection model.
Here, the target region can be set according to actual needs and is not limited to a person's face region; for example, it may also be a human body region, etc.
That is, when image detection is performed, the image to be detected can be input into the trained target detection model, the target detection model outputs a result, and the detection result of the image to be detected can be obtained based on that output.
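The flow of steps S610 to S630 can be sketched as follows. The `dummy_model` callable and the score threshold are illustrative assumptions standing in for the trained RetinaNet model; only the overall input → model output → detection-result flow follows the text.

```python
import numpy as np

def detect(image, model, score_threshold=0.5):
    """Sketch of steps S610-S630: run a trained detection model on an image
    and keep outputs whose classification score passes a threshold.
    `model` is any callable returning (boxes, scores); in the patent this
    role is played by the trained RetinaNet structure."""
    boxes, scores = model(image)          # step S620: forward pass
    keep = scores >= score_threshold      # step S630: derive the detection result
    return boxes[keep], scores[keep]

# Illustrative stand-in for a trained model (not from the patent).
def dummy_model(image):
    boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20]], dtype=float)
    scores = np.array([0.9, 0.3])
    return boxes, scores
```

For example, `detect(np.zeros((32, 32, 3)), dummy_model)` keeps only the first box, whose score 0.9 exceeds the threshold.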
Based on the same principle as the method shown in Fig. 1, an embodiment of the present application further provides a training device 70 for a target detection model, where the target detection model includes a first classification network. As shown in Fig. 7, the training device 70 may include a training supervision network setting module 710 and a model training module 720, wherein:
the training supervision network setting module 710 is configured to set at least one second classification network, wherein during training the input of the second classification network is the same as the input of the first classification network; and
the model training module 720 is configured to train the target detection model based on a total loss function until the total loss function converges, wherein the total loss function includes the loss function of the target detection model and the loss function of the second classification network.
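The behavior of the model training module — iterating until the total loss function converges — can be sketched with a toy model. The 1-D logistic regressor, learning rate, and convergence tolerance below are illustrative assumptions standing in for the full detection network and its total loss; only the "train until the total loss converges" control flow follows the text.

```python
import numpy as np

def train_until_converged(x, y, lr=0.5, tol=1e-6, max_steps=10000):
    """Toy stand-in for the model training module: repeat parameter updates
    until the change in the (total) loss falls below a tolerance. A 1-D
    logistic model replaces the detection network for illustration."""
    w, b = 0.0, 0.0
    prev_loss = np.inf
    for _ in range(max_steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))   # model output (probabilities)
        loss = -np.mean(y * np.log(p + 1e-7)
                        + (1 - y) * np.log(1 - p + 1e-7))
        if abs(prev_loss - loss) < tol:          # "until the total loss converges"
            break
        prev_loss = loss
        grad = p - y                             # gradient of the loss w.r.t. the logit
        w -= lr * np.mean(grad * x)              # gradient-descent update
        b -= lr * np.mean(grad)
    return w, b, loss
```

On a separable toy set such as `x = [-2, -1, 1, 2]`, `y = [0, 0, 1, 1]`, the loop drives the loss down until successive values differ by less than `tol`.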
In the embodiments of the present application, the target detection model includes a single-stage detection network structure.
In the embodiments of the present application, the single-stage detection network structure includes a RetinaNet network structure.
In the embodiments of the present application, the second classification network includes cascaded convolutional layers and a fully connected layer, wherein the input of the convolutional layers is connected to the output of the Backbone network of the RetinaNet network structure.
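The wiring just described — convolution over the Backbone output followed by a fully connected layer producing the probability p1 — can be sketched with NumPy matrix operations. The layer widths, the use of a 1x1 convolution, the ReLU, and the global pooling step are all illustrative assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def second_classifier(feat, w_conv, w_fc):
    """Sketch of the second classification network: a (1x1) convolution over
    the Backbone feature map, global average pooling, then a fully connected
    layer producing the probability p1. All shapes are illustrative."""
    conv_out = np.maximum(feat @ w_conv, 0.0)   # 1x1 conv + ReLU -> (h, w, c_mid)
    pooled = conv_out.mean(axis=(0, 1))         # global average pool -> (c_mid,)
    logit = pooled @ w_fc                       # fully connected layer -> scalar
    return 1.0 / (1.0 + np.exp(-logit))         # probability p1

# Stand-in Backbone feature map and illustrative weights.
feat = rng.standard_normal((8, 8, 16))
w_conv = rng.standard_normal((16, 32)) * 0.1
w_fc = rng.standard_normal(32) * 0.1
p1 = second_classifier(feat, w_conv, w_fc)
```

Since the final activation is a sigmoid, `p1` always lies strictly between 0 and 1, as required of a probability.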
In the embodiments of the present application, the loss function of the second classification network includes at least one of: a first loss function determined based on the output of the second classification network, and a second loss function determined based on the output of the first classification network and the output of the second classification network.
In the embodiments of the present application, the first loss function is:
L_C1 = -(1-α)*p1^γ*log(1-p1)*(1-y) - α*(1-p1)^γ*log(p1)*y
where L_C1 denotes the first loss function, α is a weight factor, p1 is the output of the second classification network, y denotes the sample label, and γ is a regulatory factor.
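The first loss L_C1 can be written directly from the formula above. This NumPy sketch is an illustrative reading, with `eps` added only for numerical safety; `alpha` and `gamma` stand for α and γ.

```python
import numpy as np

def first_loss_lc1(p1, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Illustrative sketch of L_C1: a focal-style loss on the second
    classification network's output p1 against the sample label y."""
    p1 = np.clip(p1, eps, 1.0 - eps)  # guard against log(0)
    return (-(1 - alpha) * p1**gamma * np.log(1 - p1) * (1 - y)
            - alpha * (1 - p1)**gamma * np.log(p1) * y)
```

As in the focal loss, the modulating factor (1-p1)^γ shrinks the penalty for confident correct positives: a positive sample scored 0.99 incurs a far smaller loss than one scored 0.01.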
In the embodiments of the present application, the second loss function is:
L_C2 = -(y*(1-M) + (1-y)*M) * (α*p1^γ*log(p1)*M - (1-α)*(1-p1)^γ*log(1-p1)*(1-M))
where L_C2 denotes the second loss function, α is a weight factor, p1 is the output of the second classification network, y denotes the sample label, γ is a regulatory factor, and M is the target-area result label, the value of which is determined in the following manner:
where p denotes the output of the first classification network and th denotes a preset threshold.
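The patent gives the template for M only as a figure that is not reproduced in this text, so the rule below is a hypothetical reading offered purely for illustration: M set to 1 where the first classification network's output p reaches the preset threshold th, and 0 elsewhere.

```python
import numpy as np

def target_area_mask(p, th=0.5):
    """Hypothetical template for M (the patent's actual template is shown
    only in a figure): mark positions where the first classification
    network's output p reaches the preset threshold th."""
    return (np.asarray(p) >= th).astype(float)
```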
In the embodiments of the present application, the loss function of the target detection model includes the loss function of the first classification network and the loss function of the target box regression network.
The training device for a target detection model of the embodiments of the present application can perform the training method for a target detection model provided by the embodiments of the present application, and its implementation principle is similar. The actions performed by the modules of the training device in the embodiments of the present application correspond to the steps of the training method in the embodiments of the present application; for a detailed functional description of each module of the training device, refer to the description of the corresponding training method shown above, which is not repeated here.
Based on the same principle as the method shown in Fig. 6, an embodiment of the present application further provides an image detection device 80. As shown in Fig. 8, the image detection device 80 may include an image acquisition module 810 and an image detection module 820, wherein:
the image acquisition module 810 is configured to obtain an image to be detected; and
the image detection module 820 is configured to detect the image to be detected by the target detection model and obtain the detection result of the image to be detected based on the output of the target detection model, wherein the target detection model is obtained by training with the training method of the target detection model in the foregoing embodiments.
The image detection device of the embodiments of the present application can perform the image detection method provided by the embodiments of the present application, and its implementation principle is similar. The actions performed by the modules of the image detection device in the embodiments of the present application correspond to the steps of the image detection method in the embodiments of the present application; for a detailed functional description of each module of the image detection device, refer to the description of the corresponding image detection method shown above, which is not repeated here.
An embodiment of the present application further provides an electronic device, which may include, but is not limited to, a processor and a memory; the memory is configured to store computer operation instructions, and the processor is configured to perform the method shown in the embodiments by calling the computer operation instructions.
Another embodiment of the present application provides a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the corresponding content in the foregoing method embodiments.
An alternative embodiment provides an electronic device. As shown in Fig. 9, the electronic device 4000 includes a processor 4001 and a memory 4003, where the processor 4001 is connected to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004. It should be noted that in practical applications the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), another programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and can implement or execute the various illustrative logical blocks, modules, and circuits described in the present disclosure. The processor 4001 may also be a combination that realizes computing functions, for example a combination including one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 4002 may include a path for transferring information between the above components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in Fig. 9, but this does not mean that there is only one bus or one type of bus.
The memory 4003 may be a ROM (Read-Only Memory) or another type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
The memory 4003 is configured to store application program code for executing the solution of the present application, and execution is controlled by the processor 4001. The processor 4001 is configured to execute the application program code stored in the memory 4003 to implement the content shown in any of the foregoing method embodiments.
It should be understood that although the steps in the flowcharts of the drawings are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and their execution order is not necessarily sequential: they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The above are only some embodiments of the present invention. It should be noted that those skilled in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (13)

1. A training method for a target detection model, characterized in that the target detection model includes a first classification network, and the method comprises:
setting at least one second classification network, wherein during training the input of the second classification network is the same as the input of the first classification network; and
training the target detection model based on a total loss function until the total loss function converges, wherein the total loss function includes a loss function of the target detection model and a loss function of the second classification network.
2. The method according to claim 1, characterized in that the target detection model includes a single-stage detection network structure.
3. The method according to claim 2, characterized in that the single-stage detection network structure includes a RetinaNet network structure.
4. The method according to claim 3, characterized in that the second classification network includes cascaded convolutional layers and a fully connected layer, wherein the input of the convolutional layers is connected to the output of the backbone (Backbone) network of the RetinaNet network structure.
5. The method according to any one of claims 1 to 4, characterized in that the loss function of the second classification network includes at least one of: a first loss function determined based on the output of the second classification network, and a second loss function determined based on the output of the first classification network and the output of the second classification network.
6. The method according to claim 5, characterized in that the first loss function is:
L_C1 = -(1-α)*p1^γ*log(1-p1)*(1-y) - α*(1-p1)^γ*log(p1)*y
where L_C1 denotes the first loss function, α is a weight factor, p1 is the output of the second classification network, y denotes the sample label, and γ is a regulatory factor.
7. The method according to claim 5, characterized in that the second loss function is:
L_C2 = -(y*(1-M) + (1-y)*M) * (α*p1^γ*log(p1)*M - (1-α)*(1-p1)^γ*log(1-p1)*(1-M))
where L_C2 denotes the second loss function, α is a weight factor, p1 is the output of the second classification network, y denotes the sample label, γ is a regulatory factor, and M is the target-area result label, the value of which is determined in the following manner:
where p denotes the output of the first classification network and th denotes a preset threshold.
8. The method according to claim 1, characterized in that the loss function of the target detection model includes a loss function of the first classification network and a loss function of a target box regression network.
9. An image detection method, characterized in that the method comprises:
obtaining an image to be detected;
detecting the image to be detected by a target detection model, wherein the target detection model is obtained by training with the method according to any one of claims 1 to 8; and
obtaining a detection result of the image to be detected based on an output of the target detection model.
10. A training device for a target detection model, characterized in that the target detection model includes a first classification network, and the device comprises:
a training supervision network setting module, configured to set at least one second classification network, wherein during training the input of the second classification network is the same as the input of the first classification network; and
a model training module, configured to train the target detection model based on a total loss function until the total loss function converges, wherein the total loss function includes a loss function of the target detection model and a loss function of the second classification network.
11. An image detection device, characterized in that the device comprises:
an image acquisition module, configured to obtain an image to be detected; and
an image detection module, configured to detect the image to be detected by a target detection model and obtain a detection result of the image to be detected based on an output of the target detection model, wherein the target detection model is obtained by training with the method according to any one of claims 1 to 8.
12. An electronic device, characterized in that the electronic device comprises a processor and a memory;
the memory is configured to store operation instructions; and
the processor is configured to perform, by calling the operation instructions, the method according to any one of claims 1 to 9.
13. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the method according to any one of claims 1 to 9.
CN201910315195.1A 2019-04-18 2019-04-18 Training method and device for target detection model, electronic equipment and storage medium Active CN109961107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910315195.1A CN109961107B (en) 2019-04-18 2019-04-18 Training method and device for target detection model, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN109961107A true CN109961107A (en) 2019-07-02
CN109961107B CN109961107B (en) 2022-07-19

Family

ID=67026354



Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140073917A1 (en) * 2012-09-10 2014-03-13 Oregon Health & Science University Quantification of local circulation with oct angiography
CN106803071A (en) * 2016-12-29 2017-06-06 浙江大华技术股份有限公司 Object detecting method and device in a kind of image
CN107145857A (en) * 2017-04-29 2017-09-08 深圳市深网视界科技有限公司 Face character recognition methods, device and method for establishing model
US20170262733A1 (en) * 2016-03-10 2017-09-14 Siemens Healthcare Gmbh Method and System for Machine Learning Based Classification of Vascular Branches
CN108460341A (en) * 2018-02-05 2018-08-28 西安电子科技大学 Remote sensing image object detection method based on integrated depth convolutional network
CN108520229A (en) * 2018-04-04 2018-09-11 北京旷视科技有限公司 Image detecting method, device, electronic equipment and computer-readable medium
CN108694401A (en) * 2018-05-09 2018-10-23 北京旷视科技有限公司 Object detection method, apparatus and system
CN108875521A (en) * 2017-12-20 2018-11-23 北京旷视科技有限公司 Method for detecting human face, device, system and storage medium
CN109102024A (en) * 2018-08-14 2018-12-28 中山大学 A kind of Layer semantics incorporation model finely identified for object and its implementation
CN109360198A (en) * 2018-10-08 2019-02-19 北京羽医甘蓝信息技术有限公司 Bone marrwo cell sorting method and sorter based on deep learning
CN109472214A (en) * 2018-10-17 2019-03-15 福州大学 One kind is taken photo by plane foreign matter image real-time detection method based on deep learning
CN109614968A (en) * 2018-10-10 2019-04-12 浙江大学 A kind of car plate detection scene picture generation method based on multiple dimensioned mixed image stylization
CN109614985A (en) * 2018-11-06 2019-04-12 华南理工大学 A kind of object detection method based on intensive connection features pyramid network


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
TSUNG-YI LIN等: "Focal Loss for Dense Object Detection", 《INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 *
WANXIN TIAN等: "Learning Better Features for Face Detection with Feature Fusion and Segmentation Supervision", 《HTTPS://ARXIV.ORG/ABS/1811.08557V1》 *
YUANYUAN WANG等: "Automatic Ship Detection Based on RetinaNet Using Multi-Resolution Gaofen-3 Imagery", 《REMOTE SENSE》 *
何朋朋: "基于深度学习的交通场景多目标检测与分类研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
苏剑林: "何恺明大神的「Focal Loss」,如何更好地理解?", 《HTTPS://ZHUANLAN.ZHIHU.COM/P/32423092》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110533640A (en) * 2019-08-15 2019-12-03 北京交通大学 Based on the track circuit disease discrimination method for improving YOLOv3 network model
CN110533640B (en) * 2019-08-15 2022-03-01 北京交通大学 Improved YOLOv3 network model-based track line defect identification method
CN110991312A (en) * 2019-11-28 2020-04-10 重庆中星微人工智能芯片技术有限公司 Method, apparatus, electronic device, and medium for generating detection information
WO2021143231A1 (en) * 2020-01-17 2021-07-22 初速度(苏州)科技有限公司 Target detection model training method, and data labeling method and apparatus
CN111768005A (en) * 2020-06-19 2020-10-13 北京百度网讯科技有限公司 Training method and device for lightweight detection model, electronic equipment and storage medium
CN111768005B (en) * 2020-06-19 2024-02-20 北京康夫子健康技术有限公司 Training method and device for lightweight detection model, electronic equipment and storage medium
CN111950411A (en) * 2020-07-31 2020-11-17 上海商汤智能科技有限公司 Model determination method and related device
CN112085096A (en) * 2020-09-09 2020-12-15 华东师范大学 Method for detecting local abnormal heating of object based on transfer learning
CN112308150A (en) * 2020-11-02 2021-02-02 平安科技(深圳)有限公司 Target detection model training method and device, computer equipment and storage medium
CN112308150B (en) * 2020-11-02 2022-04-15 平安科技(深圳)有限公司 Target detection model training method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN109961107B (en) 2022-07-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Training methods, devices, electronic devices, and storage media for object detection models

Effective date of registration: 20230404

Granted publication date: 20220719

Pledgee: Shanghai Yunxin Venture Capital Co.,Ltd.

Pledgor: MEGVII (BEIJING) TECHNOLOGY Co.,Ltd.

Registration number: Y2023990000192