CN106529565B - Target recognition model training and target recognition method, apparatus, and computing device - Google Patents


Info

Publication number
CN106529565B
CN106529565B (application CN201610849633.9A)
Authority
CN
China
Prior art keywords
candidate region
local candidate
model
training
target recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610849633.9A
Other languages
Chinese (zh)
Other versions
CN106529565A (en)
Inventor
石建萍 (Jianping Shi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN201610849633.9A
Publication of CN106529565A
Application granted
Publication of CN106529565B
Legal status: Active


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 — Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2134 — Feature extraction based on separation criteria, e.g. independent component analysis
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/23 — Clustering techniques
    • G06F18/232 — Non-hierarchical techniques
    • G06F18/2321 — Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions

Abstract

The invention discloses a target recognition model training method, a target recognition method, corresponding apparatus, and a computing device, belonging to the field of computer vision. The training method comprises: inputting multiple local candidate regions selected from a training image into the target recognition model, and obtaining preliminary classification results for the multiple local candidate regions output by the model; performing local candidate region fusion according to weakly supervised information and the preliminary classification results; revising the parameters of the target recognition model according to the preliminary classification results and the result of the local candidate region fusion; and iterating the above training steps until the training result of the model satisfies a predetermined convergence condition. The present scheme provides direct pixel-level supervision, allows the semantic segmentation model to be optimized end to end, and improves the target recognition result through its judgement of local candidate regions.

Description

Target recognition model training and target recognition method, apparatus, and computing device
Technical field
The present invention relates to the field of computer vision, and in particular to a target recognition model training method, a target recognition method, corresponding apparatus, and a computing device.
Background technique
Target recognition is a classical problem in computer vision; its purpose is to predict the positions of objects in an input image. A good target recognition scheme depends on knowing the object category of each pixel, i.e. on an accurate, dense, pixel-level understanding of the image. Producing such annotation is usually very time-consuming, because the positions of all objects in the image must be marked. Practical experience with common methods shows that obtaining an accurate target recognition annotation for a 400*600-pixel image typically takes 5-8 minutes. Annotation speed and quality therefore become an important bottleneck that prevents the problem from obtaining big-data support and developing further.
Summary of the invention
The embodiments of the present invention provide a target recognition model training scheme and a target recognition scheme.
According to one aspect of the embodiments of the present invention, a target recognition model training method is provided. The method trains a target recognition model using multiple training images labeled in advance with weakly supervised information. For each training image, the training steps comprise:
inputting multiple local candidate regions selected from the training image into the target recognition model, and obtaining preliminary classification results for the multiple local candidate regions output by the model;
selecting, according to the weakly supervised information and the preliminary classification results, the local candidate regions belonging to the same object category from the multiple local candidate regions, and performing local candidate region fusion on the regions belonging to the same object category;
revising the parameters of the target recognition model according to the preliminary classification results and the result of the local candidate region fusion;
iterating the above training steps until the training result of the target recognition model satisfies a predetermined convergence condition.
According to another aspect of the embodiments of the present invention, a target recognition method is provided, comprising:
taking an image to be recognized as the input of a target recognition model, the model having been trained in advance using the method described above;
determining, according to the output of the target recognition model, the classification results of the multiple local candidate regions selected from the image.
According to another aspect of the embodiments of the present invention, a target recognition model training apparatus is provided. The apparatus trains a target recognition model using multiple training images labeled in advance with weakly supervised information, and comprises:
a target recognition unit, configured to input multiple local candidate regions selected from the training image into the target recognition model and obtain preliminary classification results for the multiple local candidate regions output by the model;
a fusion unit, configured to select, according to the weakly supervised information and the preliminary classification results, the local candidate regions belonging to the same object category from the multiple local candidate regions, and to perform local candidate region fusion on the regions belonging to the same object category;
a revision unit, configured to revise the parameters of the target recognition model according to the preliminary classification results and the result of the local candidate region fusion;
the training apparatus runs iteratively until the training result of the target recognition model satisfies a predetermined convergence condition.
According to another aspect of the embodiments of the present invention, a target recognition apparatus is provided. The target recognition apparatus takes an image to be recognized as the input of a target recognition model and determines, according to the model's output, the classification results of the multiple local candidate regions selected from the image; the target recognition model is trained in advance by the training apparatus described above.
According to another aspect of the embodiments of the present invention, a computing device is provided, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory stores at least one instruction; the instruction causes the processor to train a target recognition model using multiple training images labeled in advance with weakly supervised information and, for each training image, to perform the following operations:
inputting multiple local candidate regions selected from the training image into the target recognition model, and obtaining preliminary classification results for the multiple local candidate regions output by the model;
selecting, according to the weakly supervised information and the preliminary classification results, the local candidate regions belonging to the same object category from the multiple local candidate regions, and performing local candidate region fusion on the regions belonging to the same object category;
revising the parameters of the target recognition model according to the preliminary classification results and the result of the local candidate region fusion;
iterating the above training steps until the training result of the target recognition model satisfies a predetermined convergence condition.
According to another aspect of the embodiments of the present invention, a computer storage medium is provided for storing computer-readable instructions. The instructions comprise:
an instruction to train a target recognition model using multiple training images labeled in advance with weakly supervised information;
an instruction to, for each training image, input multiple local candidate regions selected from the training image into the target recognition model and obtain preliminary classification results for the multiple local candidate regions output by the model;
an instruction to select, according to the weakly supervised information and the preliminary classification results, the local candidate regions belonging to the same object category from the multiple local candidate regions, and to perform local candidate region fusion on the regions belonging to the same object category;
an instruction to revise the parameters of the target recognition model according to the preliminary classification results and the result of the local candidate region fusion;
an instruction to iterate the above training steps until the training result of the target recognition model satisfies a predetermined convergence condition.
In the technical solution provided by the embodiments of the present invention, the multiple local candidate regions selected from a training image are first input into the target recognition model to obtain preliminary classification results; local candidate region fusion is then performed according to the weakly supervised information and the preliminary classification results; the parameters of the model are revised according to the fusion result; and a trained target recognition model is thereby obtained. Guided by the weakly supervised information provided in advance, accurate classification of the local candidate regions is achieved and the target recognition task is completed.
The above is merely an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the contents of the specification, and in order that the above and other objects, features and advantages of the present invention may be more readily apparent, specific embodiments of the invention are set forth below.
Detailed description of the invention
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1a to Fig. 1f are schematic diagrams of pixel classes predicted by a prior-art multiple instance learning method;
Fig. 2 is a flow chart of embodiment one of the target recognition model training method provided by the present invention;
Fig. 3 is a flow chart of embodiment two of the target recognition model training method provided by the present invention;
Fig. 4 is a schematic diagram of the network model of embodiment two of the target recognition model training method provided by the present invention;
Fig. 5a to Fig. 5h are schematic diagrams of an example of local candidate region fusion in an embodiment of the present invention;
Fig. 6 is a flow chart of embodiment three of the target recognition model training method provided by the present invention;
Fig. 7 is a schematic diagram of the network model of embodiment three of the target recognition model training method provided by the present invention;
Fig. 8 is a functional block diagram of embodiment one of the target recognition model training apparatus provided by the present invention;
Fig. 9 is a functional block diagram of embodiment two of the target recognition model training apparatus provided by the present invention;
Fig. 10 is a functional block diagram of embodiment three of the target recognition model training apparatus provided by the present invention;
Fig. 11 is a block diagram of a computing device for executing the target recognition model training method according to an embodiment of the present invention;
Fig. 12 is a storage unit for keeping or carrying program code that implements the target recognition model training method according to an embodiment of the present invention.
Specific embodiment
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. On the contrary, these embodiments are provided so that the present disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
In the field of computer vision, in order to accurately recognize objects in an image, the image can often be dismantled into multiple local candidate regions for understanding and learning. The principle of the dismantling is to cover objects of different sizes in the image as much as possible; each local candidate region may cover only a part of an object and need not contain it completely, so the information captured by the local candidate regions as a whole is richer. In the present invention, the target recognition problem actually refers to the classification of local candidate regions, i.e. determining the object category to which each local candidate region belongs. The present invention uses a large number of training images to obtain a target recognition model through a training process, which can greatly improve the accuracy of local candidate region classification.
In the course of implementing the present invention, the inventor found through research of the prior art that there exist semi-supervised or weakly supervised methods for training target recognition. Traditional weakly supervised target recognition given only image-level class labels generally falls into two categories. The first category directly predicts pixel classes using multiple instance learning (MIL). Under this setting, each picture is regarded as a set of pixels or superpixels: if at least one element in the set is a positive sample, the overall output is positive; if all elements in the set are negative samples, the overall output is negative. Because such schemes have no direct supervision to guide the bottom-up information, they easily fail to localize objects accurately, as shown in Fig. 1a to Fig. 1f. Fig. 1a and Fig. 1d are original images; Fig. 1b and Fig. 1e are the corresponding ground-truth semantic segmentation images; Fig. 1c and Fig. 1f are the target recognition images predicted by the MIL method. It can be seen from the figures that the accuracy of the target recognition images predicted by the MIL method is low.
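The MIL setting described above can be sketched in a few lines. This is an illustrative reading of the bag assumption, not code from the patent: an image is a "bag" of per-superpixel scores for one class, the bag score is commonly taken as the maximum instance score, and the bag label is positive iff any instance is positive.

```python
# MIL bag aggregation sketch (illustrative assumption, not the patent's code).

def mil_bag_score(instance_scores):
    """Aggregate per-superpixel scores for one class into a bag-level score:
    the strongest-responding instance determines the bag response."""
    return max(instance_scores)

def mil_bag_label(instance_labels):
    """Ground-truth bag label: positive iff at least one instance is positive,
    negative iff all instances are negative."""
    return 1 if any(l == 1 for l in instance_labels) else 0
```

Note the weakness the text points out: the loss only constrains the bag-level output, so which instance carries the positive response is unsupervised, and localization can drift.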
The other direction of weakly supervised learning in the prior art uses the Expectation-Maximization idea: it cyclically learns a provisional supervisory classification and then learns the semantic segmentation model from it. Such methods benefit from having pixel-level supervision, but they become dependent on a very good initialization; if the initialization is poor, the result is difficult to guarantee.
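The initialization sensitivity mentioned above can be seen in a toy version of the alternation. This is an assumed illustration, not the patent's method: the E-step pseudo-labels pixels with the current model, the M-step refits the model on those pseudo-labels, and with a 1-D "model" that is just a threshold, a poor starting threshold gets stuck.

```python
# Toy EM-style alternation sketch (assumed illustration).

def e_step(values, threshold):
    """Pseudo-label each pixel value as foreground (1) if above threshold."""
    return [1 if v > threshold else 0 for v in values]

def m_step(values, labels):
    """Refit the threshold as the midpoint between the two class means."""
    fg = [v for v, l in zip(values, labels) if l == 1]
    bg = [v for v, l in zip(values, labels) if l == 0]
    if not fg or not bg:            # degenerate pseudo-labeling
        return None
    return (sum(fg) / len(fg) + sum(bg) / len(bg)) / 2

def em(values, threshold, iters=20):
    for _ in range(iters):
        new_t = m_step(values, e_step(values, threshold))
        if new_t is None:
            return threshold        # stuck: bad initialization collapsed a class
        threshold = new_t
    return threshold
```

Started near the true boundary the alternation stays there; started too low, every pixel is pseudo-labeled foreground and the model never recovers, which is the failure mode the text warns about.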
Based on the above findings, the embodiments of the present invention propose a weakly supervised target recognition scheme that has direct pixel-level supervision, can optimize the semantic segmentation model end to end, and also introduces an object positioning branch that improves the target recognition result. The scheme is described in detail below through several specific embodiments.
Fig. 2 shows the flow chart of embodiment one of the target recognition model training method provided by the present invention. The training method provided by this embodiment requires no pixel-level annotation; it trains the target recognition model based on weakly supervised information provided in advance. The method trains the model on multiple training images; this embodiment mainly explains how a single training image, with the assistance of semantic segmentation, is used to train the target recognition model. Those skilled in the art will appreciate that the training process requires a large number of training images: the more numerous the training images and the wider their coverage, the more accurate the trained model. The embodiments of the present invention place no restriction on the number of training images.
As shown in Fig. 2, the training method for each training image includes the following steps:
Step S101: select multiple local candidate regions from the training image, and obtain the weakly supervised information of the training image. Step S101 is the data preparation step.
The embodiment of the present invention dismantles a training image into multiple local candidate regions for understanding and learning. The principle of the dismantling is to cover objects of different sizes in the training image as much as possible; each local candidate region may cover only a part of an object and need not contain it completely, so the information captured by the local candidate regions is richer. Specifically, the dismantling of the training image consists of performing superpixel segmentation on the training image to obtain several image blocks, and then clustering and combining the image blocks into multiple local candidate regions. The embodiment of the present invention may use any local candidate region selection method provided in the prior art, without restriction.
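The superpixel-then-cluster preparation above can be sketched as follows. The data structures are assumed for illustration (the patent leaves the selection method open): image blocks carry a mean color, and adjacent blocks whose mean colors are similar are merged into one candidate region via union-find.

```python
# Sketch of clustering superpixels into local candidate regions
# (assumed representation; the patent permits any prior-art method).

def cluster_superpixels(mean_color, adjacency, tol):
    """mean_color: {block_id: mean color value}; adjacency: pairs of
    neighbouring block ids. Returns candidate regions as sorted id lists."""
    parent = {b: b for b in mean_color}

    def find(b):
        while parent[b] != b:
            parent[b] = parent[parent[b]]   # path compression
            b = parent[b]
        return b

    for a, b in adjacency:
        if abs(mean_color[a] - mean_color[b]) <= tol:
            parent[find(a)] = find(b)       # merge similar neighbours

    regions = {}
    for b in mean_color:
        regions.setdefault(find(b), []).append(b)
    return sorted(sorted(r) for r in regions.values())
```

With a loose tolerance the whole image collapses into one region; with a tight one the regions stay small, which is how region granularity would be controlled under this assumed scheme.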
In addition, the weakly supervised information of the training image, which is provided in advance, must also be obtained. Optionally, the weakly supervised information is object category information. Traditional pixel-level annotation must accurately mark the object category of every pixel in the training image, whereas in the present invention the weakly supervised information is simply the set of object categories contained in the training image. For example, if a training image contains a person and an aircraft, traditional pixel annotation must mark whether each pixel belongs to the person or the aircraft, while the present invention only needs to mark that the training image contains a person and an aircraft. That is, the training apparatus is told in advance which object categories the training image contains, but not the positions of the objects.
Step S102: input the multiple local candidate regions selected from the training image into the target recognition model, and obtain the preliminary classification results for the multiple local candidate regions output by the model.
After the data is prepared, the training process starts. First, the multiple local candidate regions selected from the training image are input into the target recognition model to obtain the preliminary classification results.
In the embodiment of the present invention, the cross-entropy loss function of the fully connected layer of a deep fully convolutional neural network is used as the target recognition model, and the preliminary classification results of the multiple local candidate regions are predicted using the cross-entropy loss function.
Step S103: perform local candidate region fusion according to the weakly supervised information and the preliminary classification results of the multiple local candidate regions.
Relying on the previously prepared weakly supervised information and the preliminary classification results obtained in step S102, fusion processing is performed on the multiple local candidate regions.
Step S104: revise the parameters of the target recognition model according to the preliminary classification results of the multiple local candidate regions and the result of the local candidate region fusion.
The result is shared through the fully convolutional neural network to correct the cross-entropy loss function of the fully connected layer, thereby revising the target recognition model.
Steps S102 to S104 above constitute the training step, which is executed iteratively to obtain a trained target recognition model. Specifically, the training step is iterated until the training result of the model satisfies a predetermined convergence condition. For example, the predetermined convergence condition may be that a predetermined number of iterations is reached, in which case the iteration terminates once the iteration count reaches that number; alternatively, the predetermined convergence condition may be that the difference between the preliminary result and the revised result has converged to a certain extent, in which case the iteration terminates once this condition is met.
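The iteration control just described can be sketched schematically. The helper name `step_fn` is assumed for illustration (the real step runs S102-S104 on the network): training stops either when the change between successive results falls below a tolerance or when a fixed iteration budget runs out.

```python
# Iteration-control sketch for the two convergence conditions above
# (schematic; `step_fn` stands in for one pass of S102-S104).

def train(step_fn, init, max_iters=100, tol=1e-6):
    """Iterate step_fn until the revised result stops changing (tol)
    or the predetermined iteration count (max_iters) is reached."""
    result = init
    for i in range(1, max_iters + 1):
        revised = step_fn(result)
        if abs(revised - result) < tol:   # difference has converged
            return revised, i
        result = revised
    return result, max_iters              # iteration budget reached
```

On a contractive toy update the loop stops well before the budget; a non-converging update would instead exhaust `max_iters`, so both predetermined conditions from the text are covered.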
In the target recognition model training method provided by this embodiment, the multiple local candidate regions selected from the training image are first input into the target recognition model to obtain preliminary classification results; local candidate region fusion is performed according to the weakly supervised information and the preliminary classification results; the model parameters are revised according to the fusion result; and a trained target recognition model is thereby obtained. Guided by the weakly supervised information provided in advance, accurate classification of the local candidate regions is achieved and the target recognition task is completed.
Fig. 3 shows the flow chart of embodiment two of the target recognition model training method provided by the present invention, and Fig. 4 shows the corresponding network model schematic. The concrete scheme of this embodiment is described in detail below with reference to these two figures. The method of this embodiment likewise explains how a single training image is used to train the target recognition model.
As shown in Fig. 3, the training method for each training image includes the following steps:
Step S201: choose multiple local candidate regions from the training image, and obtain the weakly supervised information of the training image. Step S201 is the data preparation step.
The embodiment of the present invention dismantles a training image into multiple local candidate regions for understanding and learning. The principle of the dismantling is to cover objects of different sizes in the training image as much as possible; each local candidate region may cover only a part of an object and need not contain it completely, so the information captured is richer. As shown in Fig. 4, branch b is the object positioning branch; it dismantles the original training image into several local candidate regions. The finer this dismantling, the higher the accuracy of object positioning. In practice, branch b produces around 2000 local candidate regions. The embodiment of the present invention may use any local candidate region selection method provided in the prior art, without restriction.
In addition, the weakly supervised information of the training image, which is provided in advance, must also be obtained. Optionally, the weakly supervised information is object category information. Traditional pixel-level annotation must accurately mark the object category of every pixel in the training image, whereas in the present invention the weakly supervised information is the set of object categories contained in the training image. As shown in the lower right corner of Fig. 4, the weakly supervised information provided in advance for this training image is simply "person" and "aircraft". Traditional pixel annotation would have to mark whether each pixel in the training image belongs to the person or the aircraft; the present invention only needs to mark that the training image contains a person and an aircraft. That is, the training apparatus is told in advance which object categories the training image contains, but not the positions of the objects.
Step S202: input the training image into the semantic segmentation model, and obtain the preliminary semantic segmentation result of the training image.
After the data is prepared, the training process starts. First, the training image is input into the initial semantic segmentation model to obtain the preliminary semantic segmentation result. This embodiment uses a deep fully convolutional neural network as the semantic segmentation model; specifically, the fully convolutional network predicts the preliminary semantic segmentation result for the training image. This step learns the parameters of the intermediate representation through multiple convolutional layers, nonlinear response layers and pooling layers. A specific example is as follows:
1. input layer
// first stage: shared convolutional layer results
2.≤1 convolutional layer 1_1 (3 × 3 × 64)
3.≤2 ReLU layers of nonlinear responses
4.≤3 convolutional layer 1_2 (3 × 3 × 64)
5.≤4 ReLU layers of nonlinear responses
6.≤5 pond layers (3 × 3/2)
7.≤6 convolutional layer 2_1 (3 × 3 × 128)
8.≤7 ReLU layers of nonlinear responses
9.≤8 convolutional layer 2_2 (3 × 3 × 128)
10.≤9 ReLU layers of nonlinear responses
11.≤10 pond layers (3 × 3/2)
12.≤11 convolutional layer 3_1 (3 × 3 × 256)
13.≤12 ReLU layers of nonlinear responses
14.≤13 convolutional layer 3_2 (3 × 3 × 256)
15.≤14 ReLU layers of nonlinear responses
16.≤15 convolutional layer 3_3 (3 × 3 × 256)
17.≤16 ReLU layers of nonlinear responses
18.≤17 pond layers (3 × 3/2)
19.≤18 convolutional layer 4_1 (3 × 3 × 512)
20.≤19 ReLU layers of nonlinear responses
21.≤20 convolutional layer 4_2 (3 × 3 × 512)
22.≤21 ReLU layers of nonlinear responses
23.≤22 convolutional layer 4_3 (3 × 3 × 512)
24.≤23 ReLU layers of nonlinear responses
25.≤24 pond layers (3 × 3/2)
26.≤25 convolutional layer 5_1 (3 × 3 × 512)
27.≤26 ReLU layers of nonlinear responses
28.≤27 convolutional layer 5_2 (3 × 3 × 512)
29.≤28 ReLU layers of nonlinear responses
30.≤29 convolutional layer 5_3 (3 × 3 × 512)
31.≤30 ReLU layers of nonlinear responses
32.≤31 linear interpolation layer
33.≤32 loss layer (computes the loss function)
Here the number before the symbol "≤" is the current layer number, and the number after it is the index of the input layer; for example, "2.≤1" indicates that the current layer is the second layer and its input is the first layer. The parenthesized values after a convolutional layer are its parameters; for example, 3 × 3 × 64 indicates a 3 × 3 convolution kernel with 64 channels. The parenthesized values after a pooling layer are its parameters; for example, 3 × 3/2 indicates a 3 × 3 pooling kernel with stride 2.
In the above neural network, each convolutional layer is followed by a nonlinear response unit, specifically a Rectified Linear Unit (hereinafter: ReLU). Adding a ReLU after each convolutional layer makes the mapping result of the convolutional layer as sparse as possible and closer to the human visual response, so that the image processing effect is better. In the above example, the convolution kernels are set to 3 × 3, which integrates local information well.
In this embodiment, the stride of the pooling layers is set so that higher-layer features obtain a larger field of view without increasing the amount of computation. At the same time, the pooling stride enhances spatial invariance: the same input appearing at different image positions yields the same output response.
The linear interpolation layer upsamples the preceding features to the original image size in order to obtain the prediction value of each pixel.
In summary, the convolutional layers of the fully convolutional network mainly perform information induction and fusion, and the pooling layers (optionally max pooling) mainly summarize higher-layer information. The fully convolutional network can be fine-tuned to adapt to different trade-offs between performance and efficiency.
The preliminary result of semantic segmentation of the training image obtained in this step is a pixel-level semantic segmentation annotation, i.e., a semantic segmentation label for each pixel. However, since the semantic segmentation model is still in the training process and is not the final model, the preliminary result is inaccurate.
Step S203: classify the multiple local candidate regions by object category using a cross-entropy loss function, and predict the probability that each local candidate region belongs to an object category, obtaining the object category probability prediction value of each local candidate region.
After a series of local candidate regions of objects are obtained in step S201 by the local-candidate-region generation method, this step classifies these local candidate regions. This embodiment of the present invention additionally designs a multi-task training subsystem that applies constraints using image-level annotations. The multi-task training subsystem includes the training of local candidate region classification (step S203) and the training of image category (step S204); this avoids the semantic drift caused by inaccurate supervisory signals of the training samples in the initial stage.
Specifically, in step S203, the multiple local candidate regions are classified by object category using a cross-entropy loss function. A specific example is as follows:
34.≤31 fully connected layer 6_1 (M × N) (M is the output dimension of the previous layer, N is the category dimension to be predicted)
35.≤34 cross-entropy loss function layer
By sharing the result of the aforementioned fully convolutional network, the fully connected layer can predict the category of each local candidate region.
This step also predicts the probability that each local candidate region belongs to an object category, obtaining the object category probability prediction value of each local candidate region. Specifically, this value is learned by the above fully convolutional network.
Step S204: train the function for predicting the image category of the training image according to the weakly supervised information.
In this step, image category training adopts a multiple-instance training scheme and uses generative-model Log-Sum-Exponential classification; the optimization formula is as follows:
where I_k is the k-th training image and c is a category; x_kj is the representation feature of the j-th local candidate region of the k-th training image, and M is the number of local candidate regions of the k-th training image; w_c is the classifier parameter of category c to be learned. The formula predicts the probability that the category of I_k is c, i.e., Pr(I_k ∈ c | w_c).
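The optimization formula itself is reproduced only as an image in the original patent and cannot be recovered from the text. As a hedged sketch consistent with the symbols defined above, a standard multiple-instance Log-Sum-Exponential pooling of the per-region scores w_c·x_kj can be written as follows; the pooling sharpness `r` and the exact functional form are assumptions, not the patent's verbatim formula:

```python
import math

def lse_image_score(region_features, w_c, r=1.0):
    """Log-Sum-Exponential pooling of per-region scores into one
    image-level score for category c.

    region_features: list of feature vectors x_kj, one per local
    candidate region of image I_k; w_c: classifier weights for
    category c. Computes m + (1/r) * log((1/M) * sum exp(r*(s-m)))
    with m = max score, i.e. a numerically stable form of
    (1/r) * log((1/M) * sum_j exp(r * w_c . x_kj)).
    """
    scores = [sum(w * x for w, x in zip(w_c, x_kj)) for x_kj in region_features]
    M = len(scores)
    m = max(scores)  # subtract the max for numerical stability
    return m + (1.0 / r) * math.log(sum(math.exp(r * (s - m)) for s in scores) / M)
```

The pooled score is a smooth maximum over the region scores (it lies between max − log(M)/r and max), so an image scores highly for category c as soon as some of its local candidate regions do; a softmax over categories would then give a probability of the form Pr(I_k ∈ c | w_c).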
In this step, with the weakly supervised information as input, the classifier parameters of all categories can be learned through the above optimization formula. Since the weakly supervised information is the standard annotation information of the image category, the classifier parameters of all categories are obtained from that annotation through learning the optimization formula, so that the network acquires the ability to predict when it later encounters similar input images.
Step S205: select the local candidate regions belonging to the same object category from the multiple local candidate regions.
After the classification processing of step S203, the object category of each local candidate region is known. This step takes the local candidate regions belonging to the same object category as one group and performs the subsequent operations on it. If the training image contains N object categories, the local candidate regions are divided into N groups, and the subsequent operations are performed for each group.
Step S206: perform fusion processing on the local candidate regions belonging to the same object category, and divide the fused image region into a near-object area, a near-background area and an ambiguous area using a clustering algorithm.
Since many local candidate regions are selected, performing fusion processing on all local candidate regions belonging to the same object category would require substantial computation. To reduce the amount of computation, a batch of local candidate regions is optionally picked out from those belonging to the same object category for fusion processing. The selection principle may be, but is not limited to, one of the following two:
One is to pick out a preset number of local candidate regions from those belonging to the same object category, in descending order of object category probability prediction value, and perform fusion processing on them.
The other is to pick out, from the local candidate regions belonging to the same object category, those whose object category probability prediction value is higher than a preset threshold, and perform fusion processing on them.
Both of the above exemplary principles select according to the object category probability prediction value of the local candidate region. The magnitude of this value reflects the probability that a local candidate region belongs to a certain object category, so the purpose of both principles is to pick out the local candidate regions most likely to belong to an object.
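The two selection principles can be sketched as one small helper. The `(region_id, probability)` pair representation is illustrative, not from the patent:

```python
def pick_regions(regions, top_k=None, prob_threshold=None):
    """Pick local candidate regions for fusion by either of the two
    principles above: a preset number taken in descending order of
    object category probability prediction value, or all regions whose
    prediction value exceeds a preset threshold.
    `regions` is a list of (region_id, probability) pairs.
    """
    ranked = sorted(regions, key=lambda rp: rp[1], reverse=True)
    if top_k is not None:
        return ranked[:top_k]  # principle 1: preset number, high to low
    return [rp for rp in ranked if rp[1] > prob_threshold]  # principle 2
```

For example, `pick_regions(regions, top_k=3)` reproduces the "top three by prediction value" selection used in the Fig. 5 example below.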
Fusion processing is performed on the picked-out local candidate regions. The detailed process may be: perform image segmentation on the picked-out local candidate regions to obtain their binary segmentation masks; fuse the binary segmentation masks of the picked-out local candidate regions; and divide the fused image region into a near-object area, a near-background area and an ambiguous area using a clustering algorithm.
Fig. 5a to Fig. 5h show a schematic example of local candidate region fusion in this embodiment of the present invention. Fig. 5a is the original image, Fig. 5b is the ground-truth semantic segmentation, and Fig. 5c, Fig. 5d and Fig. 5e are the binary segmentation masks of the local candidate regions ranked top three by object category probability prediction value. Fusing Fig. 5c, Fig. 5d and Fig. 5e yields Fig. 5f. Clustering the image region with an optional clustering algorithm such as k-means yields Fig. 5g, in which the image region is divided into a near-object area, a near-background area and an ambiguous area. The near-object area is the region with a high probability of being the object (the white area in Fig. 5g); the near-background area is the region with a high probability of being the background (the black area in Fig. 5g), where background generally refers to regions with a high probability of not belonging to the object; and the ambiguous area is the region for which it cannot be estimated whether it is the object (the grey area in Fig. 5g).
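A minimal sketch of the fusion and three-way division follows. It replaces the patent's clustering step (e.g. k-means) with two fixed vote thresholds, which is an assumption made for brevity; `masks` are equally sized 2-D 0/1 grids like the binary masks of Fig. 5c to Fig. 5e:

```python
def fuse_and_divide(masks, high=0.7, low=0.3):
    """Fuse binary segmentation masks of the picked-out regions and
    divide the fused region into near-object / near-background /
    ambiguous areas. Fusion is the per-pixel fraction of masks
    covering the pixel; the two thresholds stand in for clustering.
    """
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    labels = []
    for i in range(rows):
        row = []
        for j in range(cols):
            vote = sum(m[i][j] for m in masks) / n  # fused coverage
            if vote >= high:
                row.append("object")       # near-object area (white in Fig. 5g)
            elif vote <= low:
                row.append("background")   # near-background area (black)
            else:
                row.append("ambiguous")    # ambiguous area (grey)
        labels.append(row)
    return labels
```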
Step S207: with the near-object area and the near-background area as seeds, segment the ambiguous area using a segmentation algorithm to obtain the correction result of the semantic segmentation of the training image.
To further predict the segmentation result of the ambiguous area, this step takes the near-object area and the near-background area as seeds and segments the ambiguous area with an optional segmentation algorithm such as GrabCut, obtaining the correction result of the semantic segmentation for one object category. In the above example, Fig. 5h is the correction result of the semantic segmentation of Fig. 5a.
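The patent uses a real segmentation algorithm such as GrabCut here. To illustrate only the role of the seeds, the sketch below resolves the ambiguous area with a nearest-seed flood fill (multi-source BFS), which is a stand-in and not the patent's method:

```python
from collections import deque

def resolve_ambiguous(labels):
    """Assign each 'ambiguous' cell the label of its nearest seed
    ('object' or 'background') by multi-source BFS over the label
    grid produced by the fusion step. A toy substitute for GrabCut:
    seeds grow outward until the ambiguous area is fully labeled.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    queue = deque((i, j) for i in range(rows) for j in range(cols)
                  if out[i][j] != "ambiguous")
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols and out[ni][nj] == "ambiguous":
                out[ni][nj] = out[i][j]  # inherit the nearest seed's label
                queue.append((ni, nj))
    return out
```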
If the training image contains N object categories, the local candidate regions are divided into N groups and each group is processed by steps S206 and S207, obtaining the correction results of the semantic segmentation for all object categories and finally the correction result of the semantic segmentation of the entire training image.
Step S208: modify the model parameters of the semantic segmentation model according to the preliminary result and the correction result.
Instead of a ground-truth result, the correction result obtained in the above steps is regarded as the standard output. The difference between the standard output and the preliminary result is determined, the loss function of the semantic segmentation model is obtained according to that difference, and the loss function response value is back-propagated to update the model parameters of the semantic segmentation model.
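The patent does not name the exact loss form used in step S208. A minimal sketch, assuming a per-pixel cross-entropy with the correction result acting as pseudo ground truth:

```python
import math

def pseudo_label_loss(preliminary_probs, correction_labels):
    """Mean per-pixel negative log-likelihood of the correction
    result (standard output) under the model's preliminary per-pixel
    class probabilities. `preliminary_probs` is a 2-D grid of
    probability vectors; `correction_labels` a 2-D grid of class
    indices from the fusion/segmentation steps.
    """
    total, n = 0.0, 0
    for probs_row, label_row in zip(preliminary_probs, correction_labels):
        for probs, label in zip(probs_row, label_row):
            total += -math.log(max(probs[label], 1e-12))  # NLL of pseudo label
            n += 1
    return total / n
```

Back-propagating this quantity drives the preliminary result toward the correction result, which is exactly the parameter update described above.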
Step S209: share the output result of the corrected semantic segmentation model to modify the parameters of the cross-entropy loss function.
By sharing the result of the fully convolutional network, the cross-entropy loss function of the fully connected layer is corrected, realizing the modification of the target recognition model.
Steps S202 to S209 above are training steps, and they are executed iteratively to obtain the trained target recognition model. Specifically, the training steps are iterated until the training result of the semantic segmentation model meets a predetermined convergence condition. For example, the convergence condition may be reaching a predetermined number of iterations, in which case the iteration ends when that number is reached; or the convergence condition may be that the difference between the preliminary result and the correction result converges to a certain extent, in which case the iteration ends when that condition is met.
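The two convergence conditions can be sketched as a small driver loop. `step_fn` is a placeholder standing in for one pass of steps S202 to S209 and is assumed to return the difference between the preliminary result and the correction result; the names are illustrative:

```python
def train(step_fn, max_iters=100, tol=1e-3):
    """Iterate the training steps until a predetermined convergence
    condition holds: either the preliminary/correction difference
    returned by step_fn falls below `tol`, or a preset iteration
    budget is exhausted.
    """
    for it in range(1, max_iters + 1):
        diff = step_fn(it)
        if diff < tol:                 # condition 2: results agree closely enough
            return it, "converged"
    return max_iters, "max_iters"      # condition 1: predetermined iteration count
```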
In the target recognition model training method provided by this embodiment, local candidate region fusion is performed according to the weakly supervised information and the multiple local candidate regions to obtain the correction result of the semantic segmentation of the training image, which is used to modify the model parameters of the semantic segmentation model. The output result of the corrected semantic segmentation model is then shared to modify the parameters of the target recognition model, thereby obtaining the trained target recognition model. Guided by the weakly supervised information provided in advance, accurate target recognition is realized. Further, the method includes the training processes of two branches: one branch is the training process of the semantic segmentation model, and the other branch (the object localization branch) is the process of training local candidate region classification of objects and training image category. The two branches can share training results, which avoids the semantic drift caused by inaccurate supervisory signals of the training samples in the initial stage and further improves the accuracy of the semantic segmentation result. Because the semantic segmentation model yields a pixel-level prediction result that can serve as temporary supervision information, the scheme has direct pixel-level supervision and can optimize the semantic segmentation model end to end. It also introduces the object localization branch, so the target recognition result can be improved according to the judgment on the local candidate regions, finally realizing accurate recognition of local candidate region categories.
Fig. 6 shows a flowchart of Embodiment 3 of the target recognition model training method provided by the present invention. Fig. 7 shows a schematic diagram of the network model of Embodiment 3 of the target recognition model training method provided by the present invention. The main difference between this embodiment and Embodiment 2 above is that this embodiment screens the multiple local candidate regions according to the preliminary result of semantic segmentation of the training image and the object category probability prediction value of each local candidate region, and uses the screened local candidate regions for the subsequent fusion processing. The screening of local candidate regions is also part of the training process of the target recognition model. The specific scheme of this embodiment is described in detail below with reference to these two figures. The method described in this embodiment likewise explains how to train the target recognition model with one training image.
As shown in Fig. 6, the training method for each training image includes the following steps:
Step S301: select multiple local candidate regions from the training image, and obtain the weakly supervised information of the training image. Step S301 is the data preparation step.
Step S302: input the training image to the semantic segmentation model to obtain the preliminary result of semantic segmentation of the training image.
Step S303: classify the multiple local candidate regions by object category using a cross-entropy loss function, and predict the probability that each local candidate region belongs to an object category, obtaining the object category probability prediction value of each local candidate region.
Step S304: train the function for predicting the image category of the training image according to the weakly supervised information.
The specific implementation of steps S301 to S304 can be found in the description of steps S201 to S204 in Embodiment 2 of the present invention and is not repeated here.
Step S305: screen the multiple local candidate regions according to the preliminary result of semantic segmentation of the training image and the object category probability prediction value of each local candidate region.
The local candidate regions prepared in step S301 number in the thousands, and as samples they are unbalanced: the number of high-probability object regions (positive samples) and the number of high-probability background regions (negative samples) are unbalanced, so the subsequent training process would be affected and produce inaccurate results. Therefore, this embodiment screens the local candidate regions so that the samples are more balanced.
Specifically, the intersection-over-union (IoU) between the segmentation mask of each local candidate region and the preliminary result of semantic segmentation of the training image is computed first. A larger IoU indicates a higher probability that the local candidate region is an object, and a smaller IoU indicates a higher probability that it is background. Then the multiple local candidate regions are screened according to the comparison of each region's IoU with the IoU thresholds and the comparison of each region's object category prediction value with the prediction value thresholds.
Further, this embodiment presets two IoU thresholds, a first IoU threshold and a second IoU threshold, where the first IoU threshold is greater than the second; it also presets two object category prediction value thresholds, a first prediction value threshold and a second prediction value threshold, where the first prediction value threshold is greater than the second.
In response to the IoU of a local candidate region being greater than or equal to the first IoU threshold and its object category prediction value being greater than or equal to the first prediction value threshold, the local candidate region is taken as a positive-sample local candidate region obtained by the screening.
In response to the IoU of a local candidate region being less than or equal to the second IoU threshold and its object category prediction value being less than or equal to the second prediction value threshold, the local candidate region is taken as a negative-sample local candidate region obtained by the screening.
Through the above threshold comparisons, a certain number of positive samples and negative samples are filtered out, and it is ensured that equal numbers of positive and negative samples are selected.
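The dual-threshold screening and sample balancing of step S305 can be sketched as follows. Representing each region as a `(pixel_set, predicted_value)` pair is illustrative, and trimming both lists to the shorter length is one simple way (an assumption) to enforce equal numbers:

```python
def iou(mask_a, mask_b):
    """Intersection-over-union of two pixel sets."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0

def screen(regions, seg_mask, iou_hi, iou_lo, pred_hi, pred_lo):
    """Split local candidate regions into positive and negative
    samples with the dual IoU / prediction-value thresholds, then
    keep equal numbers of each so the samples are balanced.
    """
    pos, neg = [], []
    for pixels, pred in regions:
        overlap = iou(pixels, seg_mask)
        if overlap >= iou_hi and pred >= pred_hi:
            pos.append((pixels, pred))   # likely object: positive sample
        elif overlap <= iou_lo and pred <= pred_lo:
            neg.append((pixels, pred))   # likely background: negative sample
    k = min(len(pos), len(neg))          # balance positives and negatives
    return pos[:k], neg[:k]
```

Regions that fall between the thresholds (e.g. high IoU but low prediction value) are simply discarded, which matches the text: only clear positives and clear negatives survive the screening.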
Step S306: select the local candidate regions belonging to the same object category from the local candidate regions obtained by the screening.
This step takes the local candidate regions belonging to the same object category as one group and performs the subsequent operations on it. If the training image contains N object categories, the local candidate regions are divided into N groups, and the subsequent operations are performed for each group.
Step S307: perform fusion processing on the local candidate regions belonging to the same object category, and divide the fused image region into a near-object area, a near-background area and an ambiguous area using a clustering algorithm.
Step S308: with the near-object area and the near-background area as seeds, segment the ambiguous area using a segmentation algorithm to obtain the correction result of the semantic segmentation of the training image.
Step S309: modify the model parameters of the semantic segmentation model according to the preliminary result and the correction result.
Step S310: share the output result of the corrected semantic segmentation model to modify the parameters of the cross-entropy loss function.
The specific implementation of steps S307 to S310 can be found in the description of steps S206 to S209 in Embodiment 2 of the present invention and is not repeated here.
Steps S302 to S310 above are training steps, and they are executed iteratively to obtain the trained target recognition model. Specifically, the training steps are iterated until the training result of the semantic segmentation model meets a predetermined convergence condition. For example, the convergence condition may be reaching a predetermined number of iterations, in which case the iteration ends when that number is reached; or it may be that the difference between the preliminary result and the correction result converges to a certain extent, in which case the iteration ends when that condition is met.
In the target recognition model training method provided by this embodiment, local candidate region fusion is performed according to the weakly supervised information and the multiple local candidate regions to obtain the correction result of the semantic segmentation of the training image, which is used to modify the model parameters of the semantic segmentation model. The output result of the corrected semantic segmentation model is then shared to modify the parameters of the target recognition model, thereby obtaining the trained target recognition model. Guided by the weakly supervised information provided in advance, accurate target recognition is realized. Further, the method includes the training processes of two branches: one branch is the training process of the semantic segmentation model, and the other branch (the object localization branch) is the process of training local candidate region classification of objects and training image category. The two branches can share training results, which avoids the semantic drift caused by inaccurate supervisory signals of the training samples in the initial stage and further improves the accuracy of the semantic segmentation result. The scheme has direct pixel-level supervision and can optimize the semantic segmentation model end to end; it also introduces an object localization branch and can improve the target recognition result according to the judgment on the local candidate regions. In addition, this embodiment also screens the local candidate regions so that the samples are more balanced, which further optimizes the training effect and finally realizes accurate recognition of local candidate region categories.
The present invention also provides a target recognition method, which takes an image to be recognized as the input of the target recognition model and determines, according to the output result of the target recognition model, the classification results of the multiple local candidate regions selected for the image. The way the trained target recognition model is used for target recognition in the present invention is the same as in the prior art, except that the target recognition model used is obtained by the training method provided by the above embodiments of the present invention.
Fig. 8 shows a functional block diagram of Embodiment 1 of the target recognition model training device provided by the present invention. As shown in Fig. 8, the target recognition model training device of this embodiment trains the target recognition model using multiple training images annotated in advance with weakly supervised information, and includes: a training module 820.
The training module 820 further comprises: an object recognition unit 824, a fusion unit 822 and a modification unit 823.
The object recognition unit 824 is configured to input the multiple local candidate regions selected for the training image to the target recognition model, obtaining the preliminary result of classification of the multiple local candidate regions output by the target recognition model. In this embodiment of the present invention, the cross-entropy loss function of the fully connected layer of a deep-learning fully convolutional neural network serves as the target recognition model, and the preliminary result of classification of the multiple local candidate regions is obtained by prediction with the cross-entropy loss function.
The fusion unit 822 is configured to perform local candidate region fusion according to the weakly supervised information and the preliminary result of classification of the multiple local candidate regions. Relying on the weakly supervised information prepared beforehand, fusion processing is performed on the multiple local candidate regions.
The modification unit 823 is configured to modify the parameters of the target recognition model according to the preliminary result of classification of the multiple local candidate regions and the result of the local candidate region fusion.
The above training module 820 operates iteratively to obtain the trained target recognition model. Specifically, the training module 820 iterates until the training result of the target recognition model meets a predetermined convergence condition. For example, the convergence condition may be reaching a predetermined number of iterations, in which case the iteration ends when that number is reached; or it may be that the difference between the preliminary result and the correction result converges to a certain extent, in which case the iteration ends when that condition is met.
Further, the training device also includes: a data preparation module 810, configured to select multiple local candidate regions from the training image and to obtain the weakly supervised information of the training image.
This embodiment of the present invention splits a training image into multiple local candidate regions for understanding and learning. The splitting principle is to cover objects of different sizes in the training image as much as possible; each local candidate region may cover a part of an object and need not contain the object completely, so the information collected by the local candidate regions is richer. Further, the splitting of the training image is specifically: perform superpixel segmentation on the training image to obtain several image blocks; then cluster and combine these image blocks to obtain multiple local candidate regions. This embodiment of the present invention may use the local-candidate-region selection methods provided in the prior art, without restriction.
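The patent leaves the superpixel and clustering algorithms open. A toy sketch of the two-stage idea, with fixed square blocks standing in for superpixels and a greedy merge of similar adjacent blocks standing in for the cluster combination (both the block size and the merging rule are illustrative assumptions):

```python
def candidate_regions(image, block=2, tol=0.1):
    """Split `image` (a 2-D grid of intensities) into block-sized
    superpixel stand-ins, then merge horizontally adjacent blocks
    whose mean intensities differ by at most `tol`. Each returned
    tuple is (block_row, first_block_col, last_block_col).
    """
    rows, cols = len(image), len(image[0])
    means = []  # mean intensity of each block
    for bi in range(0, rows, block):
        row = []
        for bj in range(0, cols, block):
            vals = [image[i][j] for i in range(bi, min(bi + block, rows))
                                 for j in range(bj, min(bj + block, cols))]
            row.append(sum(vals) / len(vals))
        means.append(row)
    regions = []
    for r, row in enumerate(means):  # merge runs of similar blocks per row
        start = 0
        for c in range(1, len(row) + 1):
            if c == len(row) or abs(row[c] - row[c - 1]) > tol:
                regions.append((r, start, c - 1))
                start = c
    return regions
```

Real pipelines would use an actual superpixel method (e.g. SLIC) and a 2-D clustering combination; the point here is only that each resulting region covers part of an object rather than requiring the whole object.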
In addition, the data preparation module 810 is further configured to obtain the object category information of the training image. Traditional pixel annotation information must accurately mark the object category of each pixel in the training image, whereas the weakly supervised information in the present invention is the object category information contained in the training image. For example, if a training image contains a person and an aircraft, traditional pixel annotation must mark whether each pixel in the training image belongs to the person or the aircraft, while the present invention only needs to mark that the training image contains a person and an aircraft. That is, the training device is informed in advance of the object categories contained in the training image, but not of the positions of the objects.
In the target recognition model training device provided by this embodiment, the multiple local candidate regions selected for the training image are input to the target recognition model to obtain the preliminary result of classification of the multiple local candidate regions; local candidate region fusion is performed according to the weakly supervised information and the preliminary result of classification; and the parameters of the target recognition model are modified according to the fusion result, thereby obtaining the trained target recognition model. Guided by the weakly supervised information provided in advance, accurate target recognition is realized.
Fig. 9 shows a functional block diagram of Embodiment 2 of the target recognition model training device provided by the present invention. On the basis of device Embodiment 1 above, this embodiment additionally designs a multi-task training subsystem that applies constraints using image-level annotations, so as to avoid the semantic drift caused by inaccurate supervisory signals of the training samples in the initial stage.
The training module 820 further comprises: a semantic segmentation unit 821, configured to input the training image to the semantic segmentation model and obtain the preliminary result of semantic segmentation of the training image. Optionally, this embodiment of the present invention uses a deep-learning fully convolutional neural network as the semantic segmentation model. Prediction is performed with the fully convolutional network, learning the parameters of the intermediate representation through multiple convolutional layers, nonlinear response layers and pooling layers to obtain the preliminary result of semantic segmentation of the training image.
The object recognition unit 824 is configured to classify the multiple local candidate regions by object category using a cross-entropy loss function, and to predict the probability that each local candidate region belongs to an object category, obtaining the object category probability prediction value of each local candidate region. Specifically, this value is learned by the fully convolutional network.
By sharing the result of the fully convolutional network, the object recognition unit 824 enables the fully connected layer to predict the category of each local candidate region.
The fusion unit 822 is configured to perform local candidate region fusion according to the weakly supervised information and the preliminary result of classification of the multiple local candidate regions, obtaining the correction result of the semantic segmentation of the training image.
The modification unit 823 is further configured to: modify the model parameters of the semantic segmentation model according to the preliminary result of semantic segmentation and the correction result of semantic segmentation; and share the output result of the corrected semantic segmentation model to modify the parameters of the target recognition model. The difference between this embodiment of the present invention and the prior art is that this embodiment does not use a ground-truth result of semantic segmentation (such as pre-annotated pixel-level information) to correct the model parameters of the semantic segmentation model during training, but instead uses the result obtained by fusing the multiple local candidate regions of the training image. Instead of a ground-truth result, the correction result obtained by the fusion unit 822 is regarded as the standard output; the difference between the standard output and the preliminary result is determined, the loss function of the semantic segmentation model is obtained according to that difference, and the loss function response value is back-propagated to update the model parameters of the semantic segmentation model.
The training module 820 also includes: an image category prediction unit 825, configured to train the function for predicting the image category of the training image according to the weakly supervised information.
Specifically, the scheme of multiple body training is utilized in image category training, uses production model Log-Sum- Exponentail classification, optimization formula are as follows:
Here, I_k is the k-th training image and c is a category; x_kj is the expressive feature of the j-th local candidate region of the k-th training image, and M is the number of local candidate regions of the k-th training image; w_c is the classifier parameter to be learned for category c. The formula predicts the probability that the category of I_k is c, i.e., Pr(I_k ∈ c | w_c).
The image category prediction unit 825, taking the weakly supervised information as input, can learn the classifier parameters of every category through the above optimization formula. Since the weakly supervised information is the standard annotation of the image category, learning the classifier parameters of every category from this standard annotation through the above optimization formula equips the network with the ability to predict when it next encounters a similar input image.
Further, the fusion unit 822 includes: a categorizing subunit 822a, a fusion processing subunit 822b, and a segmentation subunit 822c.
The categorizing subunit 822a is configured to select, from the multiple local candidate regions, the local candidate regions belonging to the same object category. The categorizing subunit 822a groups the local candidate regions belonging to the same object category together and hands each group to the fusion processing subunit 822b and the segmentation subunit 822c for processing. If the training image contains N object categories, the local candidate regions are divided into N groups, and each group is handed to the fusion processing subunit 822b and the segmentation subunit 822c for processing.
The fusion processing subunit 822b is configured to perform fusion processing on the local candidate regions belonging to the same object category, and to divide the fused image region into a near-object region, a near-background region and an ambiguity region using a clustering algorithm.
Since many local candidate regions are selected, performing fusion processing on all the local candidate regions belonging to the same object category involves a large amount of computation. To reduce the computation, the fusion processing subunit 822b may optionally pick a batch of local candidate regions out of those belonging to the same object category for fusion processing. The selection principle may be, but is not limited to, either of the following two:
One is to pick, from the local candidate regions belonging to the same object category, a preset number of local candidate regions in descending order of their object category probability prediction values, and perform fusion processing on them.
The other is to pick, from the local candidate regions belonging to the same object category, the local candidate regions whose object category probability prediction values are higher than a preset threshold, and perform fusion processing on them.
Both of the above illustrative principles select based on the object category probability prediction value of a local candidate region. This value reflects how likely the local candidate region is to belong to a certain object category, so the purpose of both principles is to pick out the local candidate regions that are more likely to belong to an object.
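The two selection principles can be sketched as follows. The field names and the 0-1 probability scale are illustrative assumptions; the patent fixes only the descending-order top-K rule and the threshold rule.

```python
def select_top_k(regions, k):
    """Pick the k regions with the highest category probability (principle one)."""
    return sorted(regions, key=lambda r: r["prob"], reverse=True)[:k]

def select_above_threshold(regions, threshold):
    """Pick every region whose category probability exceeds the threshold (principle two)."""
    return [r for r in regions if r["prob"] > threshold]

regions = [{"id": 0, "prob": 0.9}, {"id": 1, "prob": 0.4}, {"id": 2, "prob": 0.7}]
top2 = select_top_k(regions, 2)                   # regions 0 and 2
confident = select_above_threshold(regions, 0.6)  # regions 0 and 2
```

Both rules discard the low-confidence region 1, which is the shared intent: only regions likely to cover an object enter the fusion step.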
The fusion processing subunit 822b performs fusion processing on the picked local candidate regions. The detailed process may be: performing image segmentation on the picked local candidate regions to obtain a binary segmentation mask for each local candidate region; fusing the binary segmentation masks of the picked local candidate regions; and dividing the fused image region into a near-object region, a near-background region and an ambiguity region using a clustering algorithm.
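A minimal sketch of the mask-fusion step: the binary masks of the picked regions are averaged per pixel, and the vote ratio is split into three zones. The fixed cut points (0.7 / 0.3) stand in for the clustering algorithm the patent mentions and are assumptions for illustration only.

```python
import numpy as np

def fuse_masks(masks, hi=0.7, lo=0.3):
    """Average binary masks and split pixels into three zones by vote ratio."""
    votes = np.mean(np.stack(masks).astype(float), axis=0)
    near_object = votes >= hi              # most masks agree: object
    near_background = votes <= lo          # most masks agree: background
    ambiguity = ~near_object & ~near_background  # disagreement: undecided
    return near_object, near_background, ambiguity

masks = [np.array([[1, 1, 0], [0, 0, 0]]),
         np.array([[1, 0, 0], [1, 0, 0]]),
         np.array([[1, 1, 0], [0, 0, 0]])]
obj, bg, amb = fuse_masks(masks)
```

Pixels where all three masks agree land in the near-object or near-background zone, while pixels the masks disagree on fall into the ambiguity zone and are left for the seeded segmentation step.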
The segmentation subunit 822c is configured to take the near-object region and the near-background region as seeds, segment the ambiguity region using a segmentation algorithm, and obtain the correction result of the semantic segmentation of the training image.
To further predict the segmentation result of the ambiguity region, the segmentation subunit 822c takes the near-object region and the near-background region as seeds and segments the ambiguity region with an optional segmentation algorithm such as GrabCut, obtaining the correction result of the semantic segmentation for one object category. If the training image contains N object categories, the local candidate regions are divided into N groups, each processed by the fusion processing subunit 822b and the segmentation subunit 822c, yielding the semantic segmentation correction results for all object categories and finally the correction result of the semantic segmentation of the entire training image.
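A toy stand-in for the seeded segmentation step: GrabCut builds colour models from the seed regions; here each ambiguity pixel simply joins whichever seed region (near-object or near-background) has the closer mean intensity. This illustrates the seeding idea only and is not GrabCut itself; all names and values are assumptions.

```python
import numpy as np

def resolve_ambiguity(image, near_object, near_background, ambiguity):
    """Assign each ambiguity pixel to the seed region with the closer mean intensity."""
    obj_mean = image[near_object].mean()
    bg_mean = image[near_background].mean()
    final = near_object.copy()           # start from the certain object pixels
    amb_pixels = image[ambiguity]
    final[ambiguity] = np.abs(amb_pixels - obj_mean) < np.abs(amb_pixels - bg_mean)
    return final                         # boolean mask: True = object

image = np.array([[200., 180., 20.], [190., 30., 10.]])
near_object = np.array([[True, False, False], [False, False, False]])
near_background = np.array([[False, False, True], [False, False, True]])
ambiguity = ~near_object & ~near_background
final = resolve_ambiguity(image, near_object, near_background, ambiguity)
```

The bright ambiguous pixels (180, 190) join the object seed and the dark one (30) joins the background seed, so the correction result covers the whole image even though the fusion step decided only part of it.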
The target recognition model training device provided by this embodiment performs local candidate region fusion according to the weakly supervised information and the multiple local candidate regions, obtains the correction result of the semantic segmentation of the training image, and thereby modifies the model parameters of the semantic segmentation model. Then, the output result of the corrected semantic segmentation model is shared to modify the parameters of the target recognition model, and a trained target recognition model is obtained. Under the guidance of the weakly supervised information provided in advance, accurate target recognition is achieved. Further, the device comprises two training branches: one branch is the training process of the semantic segmentation model, and the other branch (i.e., the object localization branch) is the process of training the classification of local candidate regions of objects and training the image category. The two branches can share training results, which avoids semantic drift caused by inaccurate supervision signals of the training samples in the initial stage and further improves the accuracy of the semantic segmentation result. The scheme requires no direct pixel-level supervision, can optimize the semantic segmentation model end to end, and introduces an object localization branch capable of improving the target recognition result according to the judgment of the local candidate regions, finally realizing accurate recognition of the local candidate region categories.
Figure 10 shows a functional block diagram of embodiment three of the target recognition model training device provided by the present invention. On the basis of the above device embodiment two, the training module 820 further includes: a selection unit 826, configured to screen the multiple local candidate regions according to the preliminary result of the semantic segmentation of the training image and the object category probability prediction value of each local candidate region.
The fusion unit 822 is further configured to: fuse the screened local candidate regions according to the weakly supervised information, obtaining the correction result of the semantic segmentation of the training image.
Since the data preparation module prepares thousands of local candidate regions, and these local candidate regions serve as samples, a sample imbalance problem exists: the number of high-probability object regions (positive samples) and the number of high-probability background regions (negative samples) are unbalanced, which affects the subsequent training process and yields inaccurate results. Therefore, the selection unit 826 of this embodiment screens the local candidate regions so that the samples are more balanced.
Specifically, the selection unit 826 first calculates the intersection-over-union (IoU) between the segmentation mask of a local candidate region and the preliminary result of the semantic segmentation of the training image. A larger IoU indicates a higher probability that the local candidate region is an object, and a smaller IoU indicates a higher probability that it is background. Then, according to the result of comparing the IoU of the local candidate region with IoU thresholds and the result of comparing the object category prediction value of the local candidate region with prediction value thresholds, the multiple local candidate regions are screened.
Further, this embodiment presets two IoU thresholds, namely a first IoU threshold and a second IoU threshold, where the first IoU threshold is greater than the second; this embodiment also presets two object category prediction value thresholds, namely a first prediction value threshold and a second prediction value threshold, where the first prediction value threshold is greater than the second.
The selection unit 826 is further configured to: in response to the IoU of a local candidate region being greater than or equal to the first IoU threshold and its object category prediction value being greater than or equal to the first prediction value threshold, take the local candidate region as a positive sample obtained through screening; and in response to the IoU of a local candidate region being less than or equal to the second IoU threshold and its object category prediction value being less than or equal to the second prediction value threshold, take the local candidate region as a negative sample obtained through screening.
Through the above threshold comparisons, the selection unit 826 filters out a certain number of positive samples and negative samples, and ensures that the numbers of positive and negative samples filtered out are equal.
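The dual-threshold screening and balancing can be sketched as follows. The concrete threshold values and region fields are illustrative assumptions; the patent fixes only the threshold ordering (first above second for both IoU and prediction value) and the equal positive/negative count.

```python
import random

def screen_regions(regions, iou_hi=0.7, iou_lo=0.3, pred_hi=0.8, pred_lo=0.2, seed=0):
    """Keep confident positives and negatives, then subsample to equal counts."""
    pos = [r for r in regions if r["iou"] >= iou_hi and r["pred"] >= pred_hi]
    neg = [r for r in regions if r["iou"] <= iou_lo and r["pred"] <= pred_lo]
    n = min(len(pos), len(neg))          # balance positives against negatives
    rng = random.Random(seed)
    return rng.sample(pos, n), rng.sample(neg, n)

regions = [
    {"id": 0, "iou": 0.9, "pred": 0.95},  # clear positive
    {"id": 1, "iou": 0.1, "pred": 0.05},  # clear negative
    {"id": 2, "iou": 0.5, "pred": 0.5},   # ambiguous: dropped entirely
    {"id": 3, "iou": 0.8, "pred": 0.9},   # clear positive
]
pos, neg = screen_regions(regions)
```

Ambiguous regions satisfying neither rule are simply discarded, and the subsampling step enforces the equal positive/negative count described above.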
The categorizing subunit 822a is configured to select, from the local candidate regions obtained through screening, the local candidate regions belonging to the same object category.
The fusion processing subunit 822b is configured to perform fusion processing on the local candidate regions belonging to the same object category, and to divide the fused image region into a near-object region, a near-background region and an ambiguity region using a clustering algorithm.
The segmentation subunit 822c is configured to take the near-object region and the near-background region as seeds, segment the ambiguity region using a segmentation algorithm, and obtain the correction result of the semantic segmentation of the training image.
The fusion processing subunit 822b is further configured to: pick, from the local candidate regions belonging to the same object category, a preset number of local candidate regions in descending order of their object category probability prediction values, and perform fusion processing on them; or pick, from the local candidate regions belonging to the same object category, the local candidate regions whose object category probability prediction values are higher than a preset threshold, and perform fusion processing on them.
The target recognition model training device provided by this embodiment performs local candidate region fusion according to the weakly supervised information and the multiple local candidate regions, obtains the correction result of the semantic segmentation of the training image, and thereby modifies the model parameters of the semantic segmentation model. Then, the output result of the corrected semantic segmentation model is shared to modify the parameters of the target recognition model, and a trained target recognition model is obtained. Under the guidance of the weakly supervised information provided in advance, accurate target recognition is achieved. Further, the device comprises two training branches: one branch is the training process of the semantic segmentation model, and the other branch (i.e., the object localization branch) is the process of training the classification of local candidate regions of objects and training the image category. The two branches can share training results, which avoids semantic drift caused by inaccurate supervision signals of the training samples in the initial stage and further improves the accuracy of the semantic segmentation result. The scheme requires no direct pixel-level supervision, can optimize the semantic segmentation model end to end, and introduces an object localization branch capable of improving the target recognition result according to the judgment of the local candidate regions. In addition, this embodiment also screens the local candidate regions so that the samples are more balanced, further optimizing the training effect and finally realizing accurate recognition of the local candidate region categories.
The present invention also provides a target recognition device. The device takes an image to be recognized as the input of the target recognition model, and determines, according to the output result of the target recognition model, the classification results of the multiple local candidate regions selected for the image. The device performing target recognition based on the trained target recognition model of the present invention is the same as devices in the prior art, except that the target recognition model it uses is obtained through the training device provided by the above embodiments of the present invention.
The methods and displays provided herein are not inherently related to any particular computer, virtual system or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is obvious. Furthermore, the present invention is not directed to any particular programming language. It should be understood that various programming languages may be used to implement the contents of the invention described herein, and the above description of specific languages is intended to disclose the best mode of carrying out the invention.
In the specification provided here, numerous specific details are set forth. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the disclosure and aid in understanding one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, the features of the invention are sometimes grouped together into a single embodiment, figure or description thereof. However, the disclosed method is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components in an embodiment may be combined into one module or unit or component, and they may furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device for acquiring application information according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, Figure 11 shows a computing device that can implement the target recognition model training method according to the present invention. The computing device may be a terminal or a server. The computing device conventionally comprises a processor 1110 and a computer program product or computer-readable medium in the form of a storage device 1120, and additionally comprises a communication interface and a communication bus. The storage device 1120 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk or ROM. The one or more processors, the communication interface and the memory communicate with one another via the communication bus. The processor may be a CPU (central processing unit) or a GPU (graphics processing unit). The storage device 1120 has a memory space 1130 for program code 1131 for performing any of the method steps in the methods described above, for storing at least one instruction that causes the processor to perform the various steps in the target recognition model training method of the embodiments of the present invention. For example, the memory space 1130 storing the program code may comprise individual program codes 1131 for implementing the various steps in the above method, respectively. These program codes may be read from or written into one or more computer program products. These computer program products comprise program code carriers such as a hard disk, a compact disc (CD), a memory card or a floppy disk. Such a computer program product is usually a portable or static storage unit such as that shown in Figure 12. The storage unit may have memory segments and memory spaces arranged similarly to the storage device 1120 in the computing device of Figure 11. The program code may, for example, be compressed in a suitable form. Generally, the storage unit comprises computer-readable code 1131' for performing the steps of the method according to the invention, i.e. code readable by a processor such as 1110, which, when run by the computing device, causes the computing device to perform the steps of the method described above.
It should be noted that the above embodiments illustrate rather than limit the invention, and those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any ordering; these words may be interpreted as names.

Claims (30)

1. A target recognition model training method, characterized in that the method trains a target recognition model using multiple training images annotated in advance with weakly supervised information, and for each training image the training step comprises:
inputting multiple local candidate regions selected for the training image into the target recognition model, and obtaining preliminary results of the classification of the multiple local candidate regions output by the target recognition model;
according to the weakly supervised information and the preliminary results of the classification of the multiple local candidate regions, selecting from the multiple local candidate regions the local candidate regions belonging to the same object category, and performing local candidate region fusion on the local candidate regions belonging to the same object category;
modifying the parameters of the target recognition model according to the preliminary results of the classification of the multiple local candidate regions and the result of the local candidate region fusion;
iteratively performing the above training step until the training result of the target recognition model satisfies a predetermined convergence condition.
2. The target recognition model training method according to claim 1, characterized in that the weakly supervised information comprises: object category information.
3. The target recognition model training method according to claim 1, characterized in that before inputting the multiple local candidate regions selected for the training image into the target recognition model, the method further comprises: performing super-pixel segmentation processing on the training image, and clustering the several image blocks obtained by the super-pixel segmentation processing to obtain the multiple local candidate regions.
4. The target recognition model training method according to any one of claims 1-3, characterized in that the method further comprises:
inputting the training image into a semantic segmentation model to obtain a preliminary result of the semantic segmentation of the training image;
the selecting, according to the weakly supervised information and the preliminary results of the classification of the multiple local candidate regions, the local candidate regions belonging to the same object category from the multiple local candidate regions, and performing local candidate region fusion on the local candidate regions belonging to the same object category is specifically: according to the weakly supervised information and the preliminary results of the classification of the multiple local candidate regions, selecting from the multiple local candidate regions the local candidate regions belonging to the same object category, and performing local candidate region fusion on the local candidate regions belonging to the same object category to obtain a correction result of the semantic segmentation of the training image;
the modifying the parameters of the target recognition model according to the preliminary results of the classification of the multiple local candidate regions and the result of the local candidate region fusion further comprises:
modifying the model parameters of the semantic segmentation model according to the preliminary result of the semantic segmentation and the correction result of the semantic segmentation;
sharing the output result of the corrected semantic segmentation model to modify the parameters of the target recognition model.
5. The target recognition model training method according to any one of claims 1-3, characterized in that the inputting the multiple local candidate regions selected for the training image into the target recognition model and obtaining the preliminary results of the classification of the multiple local candidate regions output by the target recognition model further comprises:
classifying the multiple local candidate regions according to object category using a cross-entropy loss function;
predicting the probability that each local candidate region belongs to an object category, and obtaining an object category probability prediction value for each local candidate region;
the modifying the parameters of the target recognition model further comprises: modifying the parameters of the cross-entropy loss function.
6. The target recognition model training method according to any one of claims 1-3, characterized in that the training step further comprises:
training, according to the weakly supervised information, the function for predicting the image category of the training image.
7. The target recognition model training method according to claim 4, characterized in that the selecting, according to the weakly supervised information and the preliminary results of the classification of the multiple local candidate regions, the local candidate regions belonging to the same object category from the multiple local candidate regions, performing local candidate region fusion on the local candidate regions belonging to the same object category, and obtaining the correction result of the semantic segmentation of the training image further comprises:
selecting the local candidate regions belonging to the same object category from the multiple local candidate regions;
performing fusion processing on the local candidate regions belonging to the same object category, and dividing the fused image region into a near-object region, a near-background region and an ambiguity region using a clustering algorithm;
taking the near-object region and the near-background region as seeds, segmenting the ambiguity region using a segmentation algorithm, and obtaining the correction result of the semantic segmentation of the training image.
8. The target recognition model training method according to claim 7, characterized in that the performing fusion processing on the local candidate regions belonging to the same object category further comprises:
picking, from the local candidate regions belonging to the same object category, a preset number of local candidate regions in descending order of their object category probability prediction values, and performing fusion processing on them;
or picking, from the local candidate regions belonging to the same object category, the local candidate regions whose object category probability prediction values are higher than a preset threshold, and performing fusion processing on them.
9. The target recognition model training method according to claim 4, characterized in that the inputting the multiple local candidate regions selected for the training image into the target recognition model and obtaining the preliminary results of the classification of the multiple local candidate regions output by the target recognition model further comprises: classifying the multiple local candidate regions according to object category using a cross-entropy loss function; predicting the probability that each local candidate region belongs to an object category, and obtaining an object category probability prediction value for each local candidate region;
the training step further comprises: screening the multiple local candidate regions according to the preliminary result of the semantic segmentation of the training image and the object category probability prediction value of each local candidate region;
the selecting, according to the weakly supervised information and the preliminary results of the classification of the multiple local candidate regions, the local candidate regions belonging to the same object category from the multiple local candidate regions, performing local candidate region fusion on the local candidate regions belonging to the same object category, and obtaining the correction result of the semantic segmentation of the training image further comprises: according to the weakly supervised information and the preliminary results of the classification of the multiple local candidate regions, selecting, from the multiple local candidate regions obtained through screening, the local candidate regions belonging to the same object category, performing local candidate region fusion on the local candidate regions belonging to the same object category, and obtaining the correction result of the semantic segmentation of the training image.
10. The target recognition model training method according to claim 9, characterized in that the screening the multiple local candidate regions according to the preliminary result of the semantic segmentation of the training image and the object category probability prediction value of each local candidate region comprises:
calculating the intersection-over-union between the segmentation mask of the local candidate region and the preliminary result of the semantic segmentation of the training image;
screening the multiple local candidate regions according to the result of comparing the intersection-over-union of the local candidate region with intersection-over-union thresholds and the result of comparing the object category prediction value of the local candidate region with prediction value thresholds.
11. The target recognition model training method according to claim 10, characterized in that the screening the multiple local candidate regions according to the result of comparing the intersection-over-union of the local candidate region with intersection-over-union thresholds and the result of comparing the object category prediction value of the local candidate region with prediction value thresholds further comprises:
in response to the intersection-over-union of the local candidate region being greater than or equal to a first intersection-over-union threshold and the object category prediction value of the local candidate region being greater than or equal to a first prediction value threshold, taking the local candidate region as a positive-sample local candidate region obtained through screening; and/or
in response to the intersection-over-union of the local candidate region being less than or equal to a second intersection-over-union threshold and the object category prediction value of the local candidate region being less than or equal to a second prediction value threshold, taking the local candidate region as a negative-sample local candidate region obtained through screening.
12. The target recognition model training method according to claim 9, wherein selecting the local candidate regions belonging to the same object category from the multiple local candidate regions obtained through screening, and fusing the local candidate regions belonging to the same object category to obtain the corrected semantic segmentation result of the training image further comprises:
selecting the local candidate regions belonging to the same object category from the local candidate regions obtained through screening;
performing fusion processing on the local candidate regions belonging to the same object category, and dividing the fused image region into a near-object region, a near-background region and an ambiguous region using a clustering algorithm;
segmenting the ambiguous region using a segmentation algorithm with the near-object region and the near-background region as seeds, to obtain the corrected semantic segmentation result of the training image.
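The fuse-then-partition step of claim 12 can be approximated by pixel-wise voting over the same-category masks. In the sketch below, a simple threshold on the vote stands in for the clustering algorithm, and a nearest-seed rule stands in for the seeded segmentation algorithm; the thresholds and function names are assumptions:

```python
import numpy as np

def fuse_and_partition(masks, hi=0.7, lo=0.3):
    """Fuse same-category candidate masks and partition the fused
    region into near-object, near-background and ambiguous parts."""
    vote = np.mean(np.stack(masks).astype(float), axis=0)  # pixel agreement
    near_object = vote >= hi
    near_background = vote <= lo
    ambiguous = ~near_object & ~near_background
    return near_object, near_background, ambiguous

def resolve_ambiguous(near_object, near_background, ambiguous, vote):
    """Assign ambiguous pixels by whether their vote is closer to the
    object-seed mean or the background-seed mean -- a toy stand-in for
    a seeded segmentation algorithm such as graph cut."""
    obj_mean = vote[near_object].mean() if near_object.any() else 1.0
    bg_mean = vote[near_background].mean() if near_background.any() else 0.0
    corrected = near_object.copy()
    corrected |= ambiguous & (np.abs(vote - obj_mean) < np.abs(vote - bg_mean))
    return corrected
```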
13. The target recognition model training method according to claim 12, wherein performing fusion processing on the local candidate regions belonging to the same object category further comprises:
picking out a preset number of local candidate regions from the local candidate regions belonging to the same object category, sorted by object category probability prediction value from high to low, and performing fusion processing on them; or
picking out, from the local candidate regions belonging to the same object category, the local candidate regions whose object category probability prediction values are higher than a preset threshold, and performing fusion processing on them.
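Both selection strategies of claim 13 (top-k by probability, or all above a threshold) fit in one small helper; the `score` field name is an assumption:

```python
def pick_for_fusion(regions, top_k=None, min_score=None):
    """Pick same-category candidate regions for fusion, either the
    top-k by object category probability prediction value, or all
    regions whose value exceeds a preset threshold.  Exactly one of
    top_k / min_score is expected to be given."""
    ranked = sorted(regions, key=lambda r: r["score"], reverse=True)
    if top_k is not None:
        return ranked[:top_k]
    return [r for r in ranked if r["score"] > min_score]
```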
14. A target identification method, comprising:
taking an image to be identified as an input of a target recognition model, the target recognition model having been trained in advance using the method according to any one of claims 1-13;
determining, according to an output result of the target recognition model, classification results of multiple local candidate regions selected for the image.
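Inference per claim 14 reduces to scoring each proposed local candidate region with the trained model; a sketch where `model` and `propose_regions` are illustrative callables, not the patent's concrete components:

```python
def identify(model, image, propose_regions):
    """Run a trained recognition model over the local candidate regions
    proposed for an image and return per-region classification results.
    `model` maps a region to a dict of class scores (assumption)."""
    results = []
    for region in propose_regions(image):
        scores = model(region)
        results.append(max(scores, key=scores.get))  # top-probability class
    return results
```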
15. A target recognition model training apparatus, wherein the apparatus trains a target recognition model using multiple training images annotated in advance with weakly-supervised information, the training apparatus comprising:
a target recognition unit, configured to input multiple local candidate regions selected for the training image into the target recognition model, and obtain preliminary classification results of the multiple local candidate regions output by the target recognition model;
a fusion unit, configured to select, according to the weakly-supervised information and the preliminary classification results of the multiple local candidate regions, local candidate regions belonging to the same object category from the multiple local candidate regions, and perform local candidate region fusion on the local candidate regions belonging to the same object category;
a correction unit, configured to correct parameters of the target recognition model according to the preliminary classification results of the multiple local candidate regions and the result of the local candidate region fusion;
wherein the training apparatus operates iteratively until a training result of the target recognition model satisfies a predetermined convergence condition.
16. The target recognition model training apparatus according to claim 15, wherein the apparatus further comprises: a data preparation module, configured to obtain object category information of the training image.
17. The target recognition model training apparatus according to claim 15, wherein the apparatus further comprises: a data preparation module, configured to perform super-pixel segmentation processing on the training image and cluster the resulting image blocks to obtain the multiple local candidate regions.
18. The target recognition model training apparatus according to any one of claims 15-17, wherein the apparatus further comprises:
a semantic segmentation unit, configured to input the training image into a semantic segmentation model and obtain a preliminary semantic segmentation result of the training image;
the fusion unit is further configured to: select, according to the weakly-supervised information and the preliminary classification results of the multiple local candidate regions, local candidate regions belonging to the same object category from the multiple local candidate regions, and perform local candidate region fusion on the local candidate regions belonging to the same object category, to obtain a corrected semantic segmentation result of the training image;
the correction unit is further configured to: correct model parameters of the semantic segmentation model according to the preliminary semantic segmentation result and the corrected semantic segmentation result, and correct the parameters of the target recognition model using the shared output result of the corrected semantic segmentation model.
19. The target recognition model training apparatus according to any one of claims 15-17, wherein the target recognition unit is further configured to: classify the multiple local candidate regions by object category using a cross-entropy loss function, and predict the probability that each local candidate region belongs to an object category, to obtain an object category probability prediction value of each local candidate region;
the correction unit is further configured to: correct parameters of the cross-entropy loss function.
20. The target recognition model training apparatus according to any one of claims 15-17, wherein the training apparatus further comprises:
an image category prediction unit, configured to train, according to the weakly-supervised information, a function for predicting the image category of the training image.
21. The target recognition model training apparatus according to claim 18, wherein the fusion unit further comprises:
a classification subunit, configured to select local candidate regions belonging to the same object category from the multiple local candidate regions;
a fusion processing subunit, configured to perform fusion processing on the local candidate regions belonging to the same object category, and divide the fused image region into a near-object region, a near-background region and an ambiguous region using a clustering algorithm;
a segmentation subunit, configured to segment the ambiguous region using a segmentation algorithm with the near-object region and the near-background region as seeds, to obtain the corrected semantic segmentation result of the training image.
22. The target recognition model training apparatus according to claim 21, wherein the fusion processing subunit is further configured to:
pick out a preset number of local candidate regions from the local candidate regions belonging to the same object category, sorted by object category probability prediction value from high to low, and perform fusion processing on them; or
pick out, from the local candidate regions belonging to the same object category, the local candidate regions whose object category probability prediction values are higher than a preset threshold, and perform fusion processing on them.
23. The target recognition model training apparatus according to claim 18, wherein the target recognition unit is further configured to: classify the multiple local candidate regions by object category using a cross-entropy loss function, and predict the probability that each local candidate region belongs to an object category, to obtain an object category probability prediction value of each local candidate region;
the training apparatus further comprises: a selection unit, configured to screen the multiple local candidate regions according to the preliminary semantic segmentation result of the training image and the object category probability prediction value of each local candidate region;
the fusion unit is further configured to: select, according to the weakly-supervised information and the preliminary classification results of the multiple local candidate regions, local candidate regions belonging to the same object category from the multiple local candidate regions obtained through screening, and fuse the local candidate regions belonging to the same object category, to obtain the corrected semantic segmentation result of the training image.
24. The target recognition model training apparatus according to claim 23, wherein the selection unit is further configured to:
calculate the intersection-over-union (IoU) between the segmentation mask of each local candidate region and the preliminary semantic segmentation result of the training image;
screen the multiple local candidate regions according to a comparison of the IoU of the local candidate region against an IoU threshold and a comparison of the object category prediction value of the local candidate region against a prediction value threshold.
25. The target recognition model training apparatus according to claim 24, wherein the selection unit is further configured to:
in response to the IoU of a local candidate region being greater than or equal to a first IoU threshold and the object category prediction value of the local candidate region being greater than or equal to a first prediction value threshold, take the local candidate region as a positive-sample local candidate region obtained through screening; and/or
in response to the IoU of a local candidate region being less than or equal to a second IoU threshold and the object category prediction value of the local candidate region being less than or equal to a second prediction value threshold, take the local candidate region as a negative-sample local candidate region obtained through screening.
26. The target recognition model training apparatus according to claim 23, wherein the fusion unit further comprises:
a classification subunit, configured to select local candidate regions belonging to the same object category from the local candidate regions obtained through screening;
a fusion processing subunit, configured to perform fusion processing on the local candidate regions belonging to the same object category, and divide the fused image region into a near-object region, a near-background region and an ambiguous region using a clustering algorithm;
a segmentation subunit, configured to segment the ambiguous region using a segmentation algorithm with the near-object region and the near-background region as seeds, to obtain the corrected semantic segmentation result of the training image.
27. The target recognition model training apparatus according to claim 26, wherein the fusion processing subunit is further configured to:
pick out a preset number of local candidate regions from the local candidate regions belonging to the same object category, sorted by object category probability prediction value from high to low, and perform fusion processing on them; or
pick out, from the local candidate regions belonging to the same object category, the local candidate regions whose object category probability prediction values are higher than a preset threshold, and perform fusion processing on them.
28. A target identification apparatus, wherein the target identification apparatus is configured to take an image to be identified as an input of a target recognition model, and determine, according to an output result of the target recognition model, classification results of multiple local candidate regions selected for the image; wherein the target recognition model has been trained in advance using the training apparatus according to any one of claims 15-27.
29. A computing device, comprising: a processor, a communication interface, a memory and a communication bus; wherein the processor, the communication interface and the memory communicate with one another via the communication bus;
the memory is configured to store at least one instruction; the instruction causes the processor to train a target recognition model using multiple training images annotated in advance with weakly-supervised information, and, for each training image, the instruction causes the processor to perform the following operations:
inputting multiple local candidate regions selected for the training image into the target recognition model, and obtaining preliminary classification results of the multiple local candidate regions output by the target recognition model;
selecting, according to the weakly-supervised information and the preliminary classification results of the multiple local candidate regions, local candidate regions belonging to the same object category from the multiple local candidate regions, and performing local candidate region fusion on the local candidate regions belonging to the same object category;
correcting parameters of the target recognition model according to the preliminary classification results of the multiple local candidate regions and the result of the local candidate region fusion;
iteratively performing the above training steps until a training result of the target recognition model satisfies a predetermined convergence condition.
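The iterative loop of claim 29 can be outlined as follows; every callable here (`fuse`, `update`, `converged`, `model.classify`) is an illustrative placeholder for the patent's concrete steps, not a definitive implementation:

```python
def train(model, images, fuse, update, converged, max_iter=100):
    """Weakly-supervised training loop: obtain preliminary
    classifications for each image's candidate regions, fuse the
    same-category regions, correct model parameters, and repeat
    until the convergence predicate holds (or max_iter is hit)."""
    for _ in range(max_iter):
        for image in images:
            preliminary = [model.classify(r) for r in image["regions"]]
            fused = fuse(image["labels"], image["regions"], preliminary)
            update(model, preliminary, fused)  # parameter correction step
        if converged(model):
            break
    return model
```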
30. A computer storage medium, the computer storage medium being configured to store computer-readable instructions, the instructions comprising:
an instruction to train a target recognition model using multiple training images annotated in advance with weakly-supervised information;
an instruction to, for each training image, input multiple local candidate regions selected for the training image into the target recognition model and obtain preliminary classification results of the multiple local candidate regions output by the target recognition model;
an instruction to select, according to the weakly-supervised information and the preliminary classification results of the multiple local candidate regions, local candidate regions belonging to the same object category from the multiple local candidate regions, and to perform local candidate region fusion on the local candidate regions belonging to the same object category;
an instruction to correct parameters of the target recognition model according to the preliminary classification results of the multiple local candidate regions and the result of the local candidate region fusion;
an instruction to iteratively perform the above training steps until a training result of the target recognition model satisfies a predetermined convergence condition.
CN201610849633.9A 2016-09-23 2016-09-23 Target recognition model training and target identification method and apparatus, and computing device Active CN106529565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610849633.9A CN106529565B (en) Target recognition model training and target identification method and apparatus, and computing device


Publications (2)

Publication Number Publication Date
CN106529565A CN106529565A (en) 2017-03-22
CN106529565B true CN106529565B (en) 2019-09-13

Family

ID=58344227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610849633.9A Active CN106529565B (en) Target recognition model training and target identification method and apparatus, and computing device

Country Status (1)

Country Link
CN (1) CN106529565B (en)

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230354B (en) * 2017-05-18 2022-05-10 深圳市商汤科技有限公司 Target tracking method, network training method, device, electronic equipment and storage medium
CN109584862B (en) * 2017-09-29 2024-01-12 上海寒武纪信息科技有限公司 Image processing apparatus and method
CN108876804B (en) * 2017-10-12 2021-07-09 北京旷视科技有限公司 Matting model training and image matting method, device and system and storage medium
US20190130220A1 (en) * 2017-10-27 2019-05-02 GM Global Technology Operations LLC Domain adaptation via class-balanced self-training with spatial priors
CN107992841A (en) * Method and device for identifying objects in images, electronic device, and readable storage medium
CN108182394B (en) * 2017-12-22 2021-02-02 浙江大华技术股份有限公司 Convolutional neural network training method, face recognition method and face recognition device
CN108537289B (en) * 2018-04-24 2023-04-07 百度在线网络技术(北京)有限公司 Training method and device of data recognition model and storage medium
CN108564134B (en) * 2018-04-27 2021-07-06 网易(杭州)网络有限公司 Data processing method, device, computing equipment and medium
CN108764306B (en) 2018-05-15 2022-04-22 深圳大学 Image classification method and device, computer equipment and storage medium
CN108764235B (en) * 2018-05-23 2021-06-29 中国民用航空总局第二研究所 Target detection method, apparatus and medium
CN108960232A (en) * 2018-06-08 2018-12-07 Oppo广东移动通信有限公司 Model training method, device, electronic equipment and computer readable storage medium
CN108921054B (en) * 2018-06-15 2021-08-03 华中科技大学 Pedestrian multi-attribute identification method based on semantic segmentation
CN108965982B (en) * 2018-08-28 2020-01-31 百度在线网络技术(北京)有限公司 Video recording method and device, electronic equipment and readable storage medium
CN109210684A (en) * Method and apparatus for controlling an air conditioner, and air-conditioning device
CN109460820A (en) * 2018-09-28 2019-03-12 深圳百诺国际生命科技有限公司 A kind of neural network training method, device, computer equipment and storage medium
CN113298845A (en) * 2018-10-15 2021-08-24 华为技术有限公司 Image processing method, device and equipment
CN110263124A (en) * 2018-11-27 2019-09-20 上海亿通国际股份有限公司 Data detection system
CN109543833B (en) * 2018-11-30 2021-08-03 上海寒武纪信息科技有限公司 Operation method, device and related product
CN109543836B (en) * 2018-11-30 2021-08-03 上海寒武纪信息科技有限公司 Operation method, device and related product
CN109543835B (en) * 2018-11-30 2021-06-25 上海寒武纪信息科技有限公司 Operation method, device and related product
CN109583580B (en) * 2018-11-30 2021-08-03 上海寒武纪信息科技有限公司 Operation method, device and related product
CN109670594A (en) * 2018-12-28 2019-04-23 北京旷视科技有限公司 Data training method, device and electronic equipment
CN111489357A (en) * 2019-01-29 2020-08-04 广州市百果园信息技术有限公司 Image segmentation method, device, equipment and storage medium
CN109829501B (en) * 2019-02-01 2021-02-19 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN111669492A (en) * Method for processing a captured digital image by a terminal, and terminal
CN111661059B (en) 2019-03-08 2022-07-08 虹软科技股份有限公司 Method and system for monitoring distracted driving and electronic equipment
CN110135456A (en) * 2019-04-08 2019-08-16 图麟信息科技(上海)有限公司 A kind of training method and device of target detection model
CN110008962B (en) * 2019-04-11 2022-08-12 福州大学 Weak supervision semantic segmentation method based on attention mechanism
CN110188650B (en) * 2019-05-24 2021-01-19 张碧辉 Two-stage view field target detection method, device and system based on PTZ camera
CN110490202B (en) 2019-06-18 2021-05-25 腾讯科技(深圳)有限公司 Detection model training method and device, computer equipment and storage medium
CN110349147B (en) * 2019-07-11 2024-02-02 腾讯医疗健康(深圳)有限公司 Model training method, fundus macular region lesion recognition method, device and equipment
CN110363138A (en) 2019-07-12 2019-10-22 腾讯科技(深圳)有限公司 Model training method, image processing method, device, terminal and storage medium
CN110738125B (en) * 2019-09-19 2023-08-01 平安科技(深圳)有限公司 Method, device and storage medium for selecting detection frame by Mask R-CNN
CN110781980B (en) * 2019-11-08 2022-04-12 北京金山云网络技术有限公司 Training method of target detection model, target detection method and device
CN111160301B (en) * Machine-vision-based intelligent identification and extraction method for tunnel defect targets
CN111401376B (en) * 2020-03-12 2023-06-30 腾讯科技(深圳)有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN111401253B (en) * 2020-03-17 2022-09-13 吉林建筑大学 Target detection method based on deep learning
CN111539291B (en) * 2020-04-16 2022-08-26 创新奇智(合肥)科技有限公司 Target detection method and device based on radar waves, electronic equipment and storage medium
CN111523596B (en) * 2020-04-23 2023-07-04 北京百度网讯科技有限公司 Target recognition model training method, device, equipment and storage medium
CN111428726B (en) * 2020-06-10 2020-09-11 中山大学 Panorama segmentation method, system, equipment and storage medium based on graph neural network
CN112070099A (en) * 2020-09-08 2020-12-11 江西财经大学 Image processing method based on machine learning
CN112446378B (en) * 2020-11-30 2022-09-16 展讯通信(上海)有限公司 Target detection method and device, storage medium and terminal
CN112580717A (en) * 2020-12-17 2021-03-30 百度在线网络技术(北京)有限公司 Model training method, positioning element searching method and device
CN113052217A (en) * 2021-03-15 2021-06-29 上海云从汇临人工智能科技有限公司 Prediction result identification and model training method and device thereof, and computer storage medium
CN117787511A (en) * 2024-02-28 2024-03-29 福州工小四物联科技有限公司 Industrial high-density aquaculture monitoring and early warning method and system thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894275A (en) * 2010-06-29 2010-11-24 武汉大学 Weakly supervised method for classifying SAR images
CN104992184A (en) * 2015-07-02 2015-10-21 东南大学 Multiclass image classification method based on semi-supervised extreme learning machine

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894275A (en) * 2010-06-29 2010-11-24 武汉大学 Weakly supervised method for classifying SAR images
CN104992184A (en) * 2015-07-02 2015-10-21 东南大学 Multiclass image classification method based on semi-supervised extreme learning machine

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation; Yunchao Wei et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2015-09-10; pp. 1-9 *
Salient region detection based on a super-pixel fusion algorithm; Wang Hailuo et al.; Transactions of Beijing Institute of Technology; 2015-08-15; vol. 35, no. 8; pp. 836-841 *

Also Published As

Publication number Publication date
CN106529565A (en) 2017-03-22

Similar Documents

Publication Publication Date Title
CN106529565B (en) Target recognition model training and target identification method and apparatus, and computing device
CN106530305B (en) Semantic segmentation model training and image segmentation method and apparatus, and computing device
US10719301B1 (en) Development environment for machine learning media models
CN110163234B (en) Model training method and device and storage medium
CN110210486B (en) Sketch annotation information-based generation countermeasure transfer learning method
US20230195845A1 (en) Fast annotation of samples for machine learning model development
CN111008640B (en) Image recognition model training and image recognition method, device, terminal and medium
US11537506B1 (en) System for visually diagnosing machine learning models
CN109284749A (en) Refine image recognition
CN108780519A (en) Structure learning in convolutional neural networks
CN109492666A (en) Image recognition model training method, device and storage medium
CN111095293A (en) Image aesthetic processing method and electronic equipment
CN109598307A (en) Data screening method, apparatus, server and storage medium
CN110084245A (en) Weakly-supervised image detection method and system based on visual-attention-mechanism reinforcement learning
CN111783514A (en) Face analysis method, face analysis device and computer-readable storage medium
CN112132014A (en) Target re-identification method and system based on unsupervised pyramid similarity learning
CN111178196B (en) Cell classification method, device and equipment
CN109934352B (en) Automatic evolution method of intelligent model
CN110276283B (en) Picture identification method, target identification model training method and device
CN116229066A (en) Portrait segmentation model training method and related device
CN115033700A (en) Cross-domain emotion analysis method, device and equipment based on mutual learning network
CN114120367A (en) Pedestrian re-identification method and system based on circle loss measurement under meta-learning framework
CN113706551A (en) Image segmentation method, device, equipment and storage medium
CN113392867A (en) Image identification method and device, computer equipment and storage medium
Goswami et al. Deep Dish: Deep learning for classifying food dishes

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant