CN107194318A - Object-detection-assisted scene recognition method - Google Patents

Object-detection-assisted scene recognition method

Info

Publication number
CN107194318A
CN107194318A (application CN201710270013.4A)
Authority
CN
China
Prior art keywords
picture
identified
scene
training
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710270013.4A
Other languages
Chinese (zh)
Other versions
CN107194318B (en)
Inventor
王蕴红
孙宇航
赵文婷
陈训逊
刘庆杰
王博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN201710270013.4A priority Critical patent/CN107194318B/en
Publication of CN107194318A publication Critical patent/CN107194318A/en
Application granted granted Critical
Publication of CN107194318B publication Critical patent/CN107194318B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting


Abstract

The present invention provides an object-detection-assisted scene recognition method, which includes: obtaining a picture to be identified, sampling it to obtain a preset number of samples of a preset size, and performing scene recognition on each sample with a convolutional neural network model, obtaining at least two scenes corresponding to the picture to be identified; obtaining region proposals for the picture to be identified and a first feature map corresponding to it, and obtaining, from the region proposals and the picture, a classification score for each object in the picture; and obtaining, from the at least two scenes and the object classification scores, the scene corresponding to the picture to be identified. By assisting scene recognition with an object detector that combines a Fast R-CNN network and a region proposal network, the present invention improves the accuracy of scene recognition.

Description

Object-detection-assisted scene recognition method
Technical field
The present invention relates to the technical field of computer vision, and in particular to an object-detection-assisted scene recognition method.
Background technology
Scene recognition is a major problem in computer vision: having a computer automatically decide which specific scene an image or photo belongs to. It plays an especially important role in video surveillance and in mining the behavior of social network users.
However, owing to the intrinsic complexity of scenes and to factors such as illumination, occlusion and scale variation, there remain scene categories that existing scene recognition methods cannot distinguish well. For example, when a picture contains a crowd, a computer has difficulty deciding whether to classify it as a market, a station or a rally/parade, which lowers the accuracy of scene recognition.
Summary of the invention
The present invention provides an object-detection-assisted scene recognition method, so as to reduce the misclassification of scenes by scene recognition methods.
The present invention provides an object-detection-assisted scene recognition method, including:
obtaining a picture to be identified, sampling the picture to obtain a preset number of samples of a preset size, and performing scene recognition on each sample according to a convolutional neural network model, obtaining at least two scenes corresponding to the picture to be identified;
obtaining region proposals for the picture to be identified and a first feature map corresponding to the picture to be identified, and obtaining, according to the region proposals and the picture to be identified, a classification score for each object in the picture to be identified; wherein the region proposals indicate at least some regions that contain at least one object, the region proposals are obtained by a region proposal network processing a second feature map, and the first and second feature maps are obtained by convolving the picture to be identified in a Fast R-CNN network;
obtaining, according to the at least two scenes corresponding to the picture to be identified and the classification score of each object, the scene corresponding to the picture to be identified.
Optionally, before performing scene recognition on each sample according to the convolutional neural network model to obtain the at least two scenes corresponding to the picture to be identified, the method further includes:
obtaining training pictures and the labels corresponding to the training pictures, a label indicating the scene corresponding to its training picture;
obtaining, according to the convolutional neural network model and the labels of the training pictures, the network parameters of the convolutional neural network model corresponding to each scene;
and performing scene recognition on each sample according to the convolutional neural network model to obtain the at least two scenes corresponding to the picture to be identified includes:
performing scene recognition on each sample according to the network parameters of the convolutional neural network model corresponding to each scene, obtaining the at least two scenes corresponding to the picture to be identified.
Optionally, obtaining the network parameters of the convolutional neural network model corresponding to each scene according to the convolutional neural network model and the labels of the training pictures includes:
crop-sampling the training pictures, obtaining an enlarged set of training pictures;
performing, according to first preset training parameters, preset processing on the enlarged training pictures, obtaining a preset number of third feature maps, the preset processing including convolution, pooling and normalization;
performing fully connected processing several times on the preset number of third feature maps, obtaining the scene probabilities corresponding to the training pictures;
adjusting the first preset training parameters according to the scene probabilities and the labels of the training pictures, obtaining the network parameters of the convolutional neural network model corresponding to the scene.
Optionally, obtaining the region proposals for the picture to be identified and the first feature map corresponding to the picture to be identified includes:
convolving the picture to be identified with the Fast R-CNN network, obtaining shared convolutional layers;
extracting the second feature map from the shared convolutional layers and, according to the network parameters of the region proposal network, performing region proposal processing on the second feature map with the region proposal network, obtaining a region score for each candidate object region, and obtaining the region proposals from the region scores;
stacking a preset number of further convolutional layers on the shared convolutional layers as detector-specific layers, obtaining the first feature map; the number of convolutional layers producing the first feature map is greater than the number of shared convolutional layers.
Optionally, obtaining the classification score of each object in the picture to be identified according to the region proposals and the picture to be identified includes:
performing region marking on the first feature map according to the region proposals, obtaining a region-marked first feature map;
pooling the region-marked first feature map with the Fast R-CNN network, obtaining a pooled first feature map;
performing fully connected processing on the pooled first feature map;
and obtaining, according to the network parameters of the Fast R-CNN network, the classification score of each object in the picture to be identified.
Optionally, before obtaining the region proposals for the picture to be identified and the first feature map corresponding to the picture to be identified, the method further includes:
obtaining training pictures and the object regions corresponding to the training pictures, an object region indicating the position of a complete object in its training picture;
obtaining, according to the Fast R-CNN network, the region proposal network and the object regions of the training pictures, the network parameters of the Fast R-CNN network for each object and the network parameters of the region proposal network.
Optionally, obtaining the network parameters of the Fast R-CNN network for each object and the network parameters of the region proposal network according to the Fast R-CNN network, the region proposal network and the object regions of the training pictures includes:
convolving the training pictures with the Fast R-CNN network, obtaining shared convolutional layers;
extracting the second feature map from the shared convolutional layers and, according to second preset training parameters, performing region proposal processing on the second feature map with the region proposal network, obtaining the region scores, and obtaining the region proposals from the region scores;
stacking a preset number of further convolutional layers on the shared convolutional layers as detector-specific layers, obtaining the first feature map, the number of convolutional layers producing the first feature map being greater than the number of shared convolutional layers;
performing region marking on the first feature map according to the region proposals, obtaining a region-marked first feature map;
pooling the region-marked first feature map with the Fast R-CNN network, obtaining a pooled first feature map;
performing fully connected processing on the pooled first feature map and obtaining, according to third preset training parameters, the classification score of each object in the training pictures;
and adjusting the second and third preset training parameters according to the region proposals, the classification scores of the objects in the training pictures and the object regions, obtaining the network parameters of the region proposal network and the network parameters of the Fast R-CNN network for each object.
In the object-detection-assisted scene recognition method provided by the present invention, scene recognition is first performed, with a convolutional neural network model, on the samples obtained by sampling each picture to be identified, yielding at least two candidate scenes per picture. Object detection is then completed by obtaining, from the region proposals and first feature map of the picture to be identified, the classification score of each object in it. Finally, the scene corresponding to the picture to be identified is obtained from its at least two candidate scenes and the object classification scores. By assisting scene recognition with an object detector that combines a Fast R-CNN network and a region proposal network, the present invention improves the accuracy of scene recognition. The region proposal network supplies the Fast R-CNN network with regions likely to contain objects, which greatly reduces the time the Fast R-CNN network spends on object detection and so increases its speed.
Brief description of the drawings
Fig. 1 is a first flowchart of the object-detection-assisted scene recognition method provided by the present invention;
Fig. 2 is a second flowchart of the object-detection-assisted scene recognition method provided by the present invention;
Fig. 3 is a third flowchart of the object-detection-assisted scene recognition method provided by the present invention;
Fig. 4 is a fourth flowchart of the object-detection-assisted scene recognition method provided by the present invention;
Fig. 5 is a schematic diagram of the training process of the Alexnet model in the object-detection-assisted scene recognition method provided by the present invention;
Fig. 6 is a fifth flowchart of the object-detection-assisted scene recognition method provided by the present invention;
Fig. 7 is a schematic diagram of the training process of the Fast R-CNN network and the region proposal network in the object-detection-assisted scene recognition method provided by the present invention.
Embodiment
Fig. 1, Fig. 2 and Fig. 3 are, respectively, the first, second and third flowcharts of the object-detection-assisted scene recognition method provided by the present invention. As shown in Fig. 1, the object-detection-assisted scene recognition method of this embodiment includes:
Step 101: obtain a picture to be identified, sample it to obtain a preset number of samples of a preset size, and perform scene recognition on each sample according to a convolutional neural network model, obtaining at least two scenes corresponding to the picture to be identified.
Specifically, in this embodiment the size of the picture to be identified can be chosen according to the particular convolutional neural network model, and the number of pictures to be identified can be adjusted according to the data quality encountered while training the model; this embodiment does not restrict either. The convolutional neural network model may be Alexnet, VGG16, VGG19, ResNet or a similar model, which this embodiment likewise does not restrict.
Further, in this embodiment each picture to be identified can be multiplied many times over by crop sampling, giving a preset number of samples of a preset size; this guards against overfitting and so improves the accuracy of scene recognition for each picture. The preset number and size are not restricted here; the quantity of samples after enlargement need only satisfy the model's training requirements.
For example, for ease of explanation, this embodiment is illustrated with the Alexnet model. From a picture to be identified and from its horizontally flipped copy, crops can be taken at the upper-left, upper-right, lower-left, lower-right and center positions, so one picture yields 10 samples. All 10 samples are fed into the Alexnet model for scene recognition, giving 10 recognition results. Each result can be represented as a matrix with a single row whose columns are the scene categories; every matrix value is a decimal between 0 and 1, and the values of each matrix sum to 1. The 10 matrices are added and averaged to give the scene probability matrix of the picture, and the scene with the largest probability value is taken as the picture's final recognition result. The remaining pictures to be identified are processed in the same way to obtain their recognition results.
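The ten-crop sampling and averaging scheme just described can be sketched as follows. This is a minimal numpy illustration, not the patent's implementation: the 227x227 crop size and the toy stand-in "model" are assumptions for the sake of a runnable example.

```python
import numpy as np

def ten_crops(img, size):
    """Five corner/center crops of an image and of its horizontal flip.

    img: (H, W, C) array; size: side length of each square crop.
    Returns 10 crops of shape (size, size, C).
    """
    h, w = img.shape[:2]
    s = size
    offsets = [(0, 0), (0, w - s), (h - s, 0), (h - s, w - s),
               ((h - s) // 2, (w - s) // 2)]  # UL, UR, LL, LR, center
    crops = []
    for flipped in (img, img[:, ::-1]):       # original + horizontal flip
        for top, left in offsets:
            crops.append(flipped[top:top + s, left:left + s])
    return crops

def average_scene_probs(crops, model):
    """Average the model's per-crop scene probability rows (each sums to 1)."""
    probs = np.stack([model(c) for c in crops])   # (10, num_scenes)
    return probs.mean(axis=0)

# Toy stand-in "model": probability over 2 scenes from mean brightness only.
def toy_model(crop):
    m = crop.mean()
    p = np.array([m, 1.0 - m])
    return p / p.sum()

img = np.random.default_rng(0).random((256, 256, 3))
crops = ten_crops(img, 227)
avg = average_scene_probs(crops, toy_model)
print(len(crops), avg.argmax())  # 10 crops; index of the winning scene
```

The averaged row still sums to 1, so taking its argmax matches the "largest probability value" rule in the text.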
Further, for hard-to-distinguish scenes such as a station versus a rally/parade, scene recognition of a picture by the convolutional neural network alone may fail: a station picture may be identified as a rally/parade, or a rally/parade picture as a station. A picture to be identified may therefore correspond to at least two scenes.
Step 102: obtain the region proposals for the picture to be identified and the first feature map corresponding to it, and obtain, according to the region proposals and the picture, the classification score of each object in the picture to be identified.
Here the region proposals indicate at least some regions that contain at least one object; they are obtained by a region proposal network processing a second feature map, and both the first and second feature maps are obtained by convolving the picture to be identified in a Fast Region-based Convolutional Network (Fast R-CNN).
Specifically, this embodiment performs object detection on the picture to be identified, so that the objects in it can be recognized.
On the one hand, as shown in Fig. 2, obtaining the region proposals for the picture to be identified and the first feature map corresponding to it includes in this embodiment:
Step 201: convolve the picture to be identified with the Fast R-CNN network, obtaining shared convolutional layers.
Step 202: extract the second feature map from the shared convolutional layers; according to the network parameters of the region proposal network, perform region proposal processing on the second feature map with the region proposal network, obtaining a region score for each candidate object region; and obtain the region proposals from the region scores.
Specifically, the picture to be identified is fed into the Fast R-CNN network and convolved, giving the shared convolutional layers; one layer's second feature map is extracted from them and fed into the region proposal network for region proposal processing, which produces the region scores. Region proposal processing slides a window over the feature map and applies several convolutions or fully connected operations; a region score expresses the probability that a region contains an object, and of which kind.
Step 203: stack a preset number of further convolutional layers on the shared convolutional layers as detector-specific layers, obtaining the first feature map; the number of convolutional layers producing the first feature map is greater than the number of shared convolutional layers.
Specifically, so that no other information is lost, the Fast R-CNN network can continue convolving on top of the shared convolutional layers, making the information more complete: the layers stacked on the shared ones are the detector-specific layers, and they yield the first feature map. The number of stacked layers is not restricted in this embodiment. For example, the second feature map of the fifth shared layer can be sent to the region proposal network, while the first feature map is taken from the ninth convolutional layer, with the layers beyond the shared ones being detector-specific.
On the other hand, as shown in Fig. 3, obtaining the classification score of each object in the picture to be identified according to the region proposals and the picture includes in this embodiment:
Step 301: perform region marking on the first feature map according to the region proposals, obtaining a region-marked first feature map.
Specifically, some of the regions with the highest region scores can be selected, and their positions in the picture to be identified handed to the Fast R-CNN network as region proposals. The Fast R-CNN network then marks those regions on the first feature map, so that it only has to compute classification scores for the marked regions and can skip regions without objects, saving object-detection time on the picture to be identified.
Step 302: pool the region-marked first feature map with the Fast R-CNN network, obtaining a pooled first feature map.
Further, because the regions of the region-marked first feature map differ in size, pooling is needed to normalize their sizes, giving the pooled first feature map.
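The size-normalizing pooling step can be sketched as a region-of-interest max pool: each region, whatever its size, is divided into a fixed grid and max-pooled per cell. The 7x7 output grid is the usual Fast R-CNN choice, an assumption here rather than something the text states.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=(7, 7)):
    """Max-pool one region of interest to a fixed grid, so regions of
    different sizes all end up the same shape.

    roi: (top, left, height, width) in feature-map coordinates.
    """
    top, left, h, w = roi
    region = feature_map[top:top + h, left:left + w]
    oh, ow = out_size
    out = np.empty(out_size)
    for i in range(oh):
        for j in range(ow):
            # split the region into an oh x ow grid of (near-)equal cells,
            # forcing each cell to cover at least one row/column
            r0, r1 = i * h // oh, max((i + 1) * h // oh, i * h // oh + 1)
            c0, c1 = j * w // ow, max((j + 1) * w // ow, j * w // ow + 1)
            out[i, j] = region[r0:r1, c0:c1].max()
    return out

fmap = np.random.default_rng(2).random((40, 60))
small = roi_max_pool(fmap, (5, 5, 9, 11))
large = roi_max_pool(fmap, (0, 0, 40, 60))
print(small.shape, large.shape)  # both (7, 7) despite different RoI sizes
```

After this step every marked region feeds the same fixed-size input to the fully connected layers, which is what makes the per-region classification head possible.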
Step 303: perform fully connected processing on the pooled first feature map.
Specifically, because the pooled first feature map consists of many maps, fully connected processing can reduce its dimensionality, making the classification-score computation easier.
Step 304: obtain, according to the network parameters of the Fast R-CNN network, the classification score of each object in the picture to be identified.
Specifically, the classification scores in this embodiment not only indicate whether the picture to be identified contains objects, but also give the probability that each kind of object is present. The network parameters of the Fast R-CNN network are the optimal parameters obtained by training it, so the objects in the picture can be detected accurately. The classification score of each object therefore determines whether a given object is present in the picture to be identified.
It should be noted here that step 101 may come before step 102, step 102 may come before step 101, or the two steps may run simultaneously; this embodiment does not restrict their order.
Step 103: obtain, according to the at least two scenes corresponding to the picture to be identified and the classification score of each object, the scene corresponding to the picture to be identified.
Specifically, step 101 narrows the choice of scenes by recognizing at least two candidates for the picture to be identified, for example a station and a rally/parade, two scenes with much in common. Step 102 detects the classification score of each object in the picture; for instance, a rally/parade scene nearly always contains banners, so a banner can serve as a target object. Combining the candidate scenes with the object classification scores then identifies the scene of the picture: if the picture to be identified contains a banner, its scene is a rally/parade; if it does not, its scene is a station.
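The station/rally decision rule just given can be sketched as a small fusion function. The `cue` mapping (which object confirms which scene) and the 0.5 threshold are hypothetical illustrations of the banner example, not a general mechanism from the text.

```python
def fuse_scene_and_objects(scene_probs, detections, cue=None):
    """Pick a scene among the top-2 candidates using detected objects.

    scene_probs: dict scene -> probability from the CNN scene classifier.
    detections: dict object class -> classification score from the detector.
    cue: maps a scene to the object that confirms it (assumed rule).
    """
    if cue is None:
        cue = {"rally_parade": "banner"}
    top2 = sorted(scene_probs, key=scene_probs.get, reverse=True)[:2]
    for scene in top2:
        obj = cue.get(scene)
        if obj is not None and detections.get(obj, 0.0) > 0.5:
            return scene                      # confirming object detected
    uncued = [s for s in top2 if cue.get(s) is None]
    return uncued[0] if uncued else top2[0]   # fall back to the uncued scene

probs = {"station": 0.41, "rally_parade": 0.39, "market": 0.20}
print(fuse_scene_and_objects(probs, {"banner": 0.9}))  # banner seen
print(fuse_scene_and_objects(probs, {"banner": 0.1}))  # no banner
```

With a banner detected the rally/parade candidate wins even though the classifier slightly preferred the station, which is exactly the kind of correction the detector is meant to supply.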
In the object-detection-assisted scene recognition method provided by this embodiment, scene recognition is performed with a convolutional neural network model on the samples obtained by sampling each picture to be identified, yielding at least two candidate scenes per picture. Object detection is then completed by obtaining, from the region proposals and first feature map of the picture, the classification score of each object in it. Finally, the scene corresponding to the picture is obtained from the candidate scenes and the object classification scores. Assisting scene recognition with the object detector formed by the Fast R-CNN network and the region proposal network improves the accuracy of scene recognition; the region proposal network supplies the Fast R-CNN network with regions likely to contain objects, greatly reducing the time the Fast R-CNN network spends on object detection and so increasing its speed.
Fig. 4 is the fourth flowchart of the object-detection-assisted scene recognition method provided by the present invention. As shown in Fig. 4, before scene recognition is performed on each sample according to the convolutional neural network model to obtain the at least two scenes corresponding to the picture to be identified, the method of this embodiment further includes:
Step 401: obtain training pictures and their labels, a label indicating the scene corresponding to its training picture.
Specifically, in this embodiment training pictures can be obtained from a database or collected manually; this embodiment does not restrict the source, provided the number of training pictures per scene reaches about a thousand. Manually collected training pictures can be expanded by horizontal flipping and crop sampling. If the convolutional neural network model requires a particular picture size, crop sampling can also bring the training pictures to a uniform size; whether this is needed depends on the particular model. Meanwhile, the labels of the training pictures, i.e. their scene categories, must also be obtained.
Step 402: obtain, according to the convolutional neural network model and the labels of the training pictures, the network parameters of the convolutional neural network model corresponding to each scene.
Specifically, the training pictures and their labels are fed into the convolutional neural network model, which can then be trained. Because the labels of the training pictures are known, the first preset training parameters can be adjusted dynamically so that the model learns every scene category and can tell the scenes apart. The specific procedure is as follows:
Step 4021: crop-sample the training pictures, obtaining the enlarged set of training pictures.
Specifically, crop sampling turns each training picture of a scene category into several same-scene training pictures of uniform size, multiplying the number of training pictures; it is simple to perform and gives the convolutional neural network model more references during training.
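The crop-sampling expansion can be sketched as follows: each training picture, plus its horizontal flip, yields several equal-size corner crops. The 224x224 crop size and the four-corner pattern (eight samples per picture) are assumptions; the text fixes neither the count nor the size.

```python
import numpy as np

def expand_training_set(images, crop=224):
    """Expand each training picture into equal-size crops of itself and of
    its horizontal flip (one plausible reading of the crop-sampling step)."""
    out = []
    for img in images:
        h, w = img.shape[:2]
        for src in (img, img[:, ::-1]):        # original + horizontal flip
            for t in (0, h - crop):            # top / bottom rows
                for l in (0, w - crop):        # left / right columns
                    out.append(src[t:t + crop, l:l + crop])
    return out

imgs = [np.random.default_rng(i).random((256, 256, 3)) for i in range(3)]
expanded = expand_training_set(imgs)
print(len(imgs), "->", len(expanded))  # 3 -> 24: 8 samples per picture
```

Every output crop shares the label of its source picture, so the labeled training set grows with no extra annotation work.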
Step 4022: according to the first preset training parameters, perform the preset processing on the enlarged training pictures, obtaining the preset number of third feature maps; the preset processing includes convolution, pooling and normalization.
Specifically, with the model's randomly initialized parameters serving as the first preset training parameters, convolving, pooling and normalizing the enlarged training pictures yields the preset number of third feature maps, which clearly characterize the scene of each training picture.
Step 4023: perform fully connected processing several times on the preset number of third feature maps, obtaining the scene probabilities corresponding to the training pictures.
Specifically, because the third feature maps obtained after the preset processing consist of many maps, several rounds of fully connected processing yield the scene probabilities of the training pictures. Repeated fully connected processing reduces the dimensionality of the third feature maps without losing much information, which effectively ensures good classification.
Step 4024, adjusting the first preset training parameters according to the scene probability and the label corresponding to the training picture, to obtain the network parameters of the convolutional neural network model corresponding to the scene.

Specifically, by checking whether the scene probability is consistent with the label of the training picture, the first preset training parameters are dynamically adjusted until the predicted scene agrees with the label; the first preset training parameters at that point are then taken as the network parameters of the convolutional neural network model corresponding to the scene.
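The dynamic adjustment described above — compare the predicted scene probability with the label and nudge the parameters until they agree — can be sketched with a minimal softmax classifier standing in for the network. The linear model, learning rate, and step count are assumptions for illustration; the patent's model is a full convolutional network.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(W, features, label, lr=0.1):
    """One dynamic adjustment of the (stand-in) first preset training
    parameters W: compute the scene probability, compare with the label,
    and take a cross-entropy gradient step reducing their disagreement."""
    probs = softmax(W @ features)      # scene probability for each category
    grad = np.outer(probs, features)   # gradient of cross-entropy w.r.t. W
    grad[label] -= features            # subtract the one-hot target row
    return W - lr * grad, probs

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8)) * 0.01     # 3 scene categories, 8-dim features
x, y = rng.normal(size=8), 1           # one labelled training sample
for _ in range(200):                   # adjust until prediction matches label
    W, probs = train_step(W, x, y)
```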
Step 403, performing scene recognition on each sample according to the network parameters of the convolutional neural network model corresponding to each scene, to obtain at least two scenes corresponding to the picture to be identified.

Specifically, a convolutional neural network model built with the network parameters corresponding to each scene can recognize scenes well. Performing scene recognition on the samples corresponding to the picture to be identified therefore yields at least two scenes corresponding to the picture to be identified.
Fig. 5 is a schematic diagram of the training process of the AlexNet model in the scene recognition method with target detection assistance provided by the present invention. In a specific embodiment, taking the AlexNet model as an example, the AlexNet model must first be trained before scene recognition is performed on the picture to be identified, as shown in Fig. 5.
First, three-channel training pictures of size 256*256 are input into the AlexNet model, which reads the label of each training picture; three-channel crops are obtained after split sampling and passed through convolution and activation. The activation uses an activation function such as the rectified linear unit (Rectified Linear Unit, ReLU), whose mathematical expression is f(x) = max(0, x), where x is the input signal and f(x) is the output signal: when the input signal is less than 0 the output signal is 0; when the input signal is equal to or greater than 0 the output signal equals the input signal. With ReLU, the convergence speed of stochastic gradient descent (Stochastic Gradient Descent, SGD) is much faster than with other functions (such as the sigmoid/tanh activation functions used in conventional methods), because ReLU is piecewise linear and only needs a threshold comparison to obtain the activation value, without costly computation; the convolution process can therefore be optimized. The network thus obtains 96 feature maps of size 55*55; pooling and normalization then give 96 feature maps of size 27*27; convolution and activation again give 256 feature maps of size 27*27; pooling and normalization give 256 feature maps of size 13*13; further convolution and activation give 384 feature maps of size 13*13, and then 256 feature maps of size 13*13; pooling again gives 256 feature maps of size 6*6. Finally, three fully connected layers with activation produce the result for the scene corresponding to the training picture, that is, the scene probability of the training picture. According to the label of the training picture, the whole process above is adjusted — that is, the first preset training parameters of the AlexNet model are changed — until the scene probability of the training picture is consistent with its label, the AlexNet training converges to a certain interval, and the loss value is controlled within an acceptable range, at which point training stops. The first preset training parameters at that point are taken as the network parameters of the AlexNet model corresponding to the scene. This completes the whole training process of the AlexNet model.
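The chain of feature-map sizes in the walkthrough above (55 → 27 → 27 → 13 → 13 → 6) follows from the standard convolution output-size formula, assuming the usual AlexNet hyperparameters (227*227 crops, an 11*11 stride-4 first convolution, 3*3 stride-2 pooling); those kernel/stride/padding values are assumptions, since the embodiment states only the resulting sizes.

```python
def out_size(w, kernel, stride=1, pad=0):
    """Spatial output size of a square convolution or pooling layer."""
    return (w + 2 * pad - kernel) // stride + 1

s = 227                          # crop fed to conv1 (assumed AlexNet input)
s = out_size(s, 11, stride=4)    # conv1 -> 55   (96 maps of 55x55)
s = out_size(s, 3, stride=2)     # pool1 -> 27
s = out_size(s, 5, pad=2)        # conv2 -> 27   (256 maps of 27x27)
s = out_size(s, 3, stride=2)     # pool2 -> 13
s = out_size(s, 3, pad=1)        # conv3 -> 13   (384 maps of 13x13)
s = out_size(s, 3, pad=1)        # conv4/conv5 -> 13 (256 maps of 13x13)
s = out_size(s, 3, stride=2)     # pool5 -> 6    (256 maps of 6x6)
```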
Then, when a picture to be identified is input into the AlexNet model for scene recognition, because the training process above has determined the network parameters of the AlexNet model corresponding to each scene, the scene corresponding to the picture to be identified can be recognized.
Fig. 6 is flow chart five of the scene recognition method with target detection assistance provided by the present invention. As shown in Fig. 6, before obtaining the region proposals of the picture to be identified and the first feature map corresponding to the picture to be identified, the method of this embodiment further includes:
Step 601, obtaining training pictures and the target region corresponding to each training picture, the target region being used to indicate the position of a complete target in the training picture.

Specifically, in this embodiment the training pictures can be obtained from a database or collected manually, which is not limited here. Because the size of the input pictures of the Fast R-CNN network is not limited, no split-sampling processing of the training pictures is needed. Meanwhile, in this embodiment the target region corresponding to each training picture must also be obtained; the target region indicates the actual position of a complete target in the training picture.
Step 602, obtaining, according to the Fast R-CNN network, the region proposal network, and the target region corresponding to the training picture, the network parameters of the Fast R-CNN network corresponding to each target and the network parameters corresponding to the region proposal network.

Specifically, the training pictures and their corresponding target regions are input into the Fast R-CNN network so that the Fast R-CNN network and the region proposal network can be trained. Because the target region corresponding to each training picture is known, the second preset training parameters can be dynamically adjusted so that the region proposal network learns whether various targets exist in the training pictures and produces region proposals where targets are present; sending these region proposals to the Fast R-CNN network reduces the time the Fast R-CNN network spends on target detection and improves efficiency. Meanwhile, the third preset training parameters can also be dynamically adjusted so that the Fast R-CNN network learns the various targets in the training pictures and can then distinguish them. The specific method is as follows:
Step 6021, performing convolution processing on the training pictures through the Fast R-CNN network to obtain shared convolutional layers.

Specifically, in this embodiment the training pictures are input into the Fast R-CNN network and convolution processing is performed, obtaining the shared convolutional layers.
Step 6022, extracting the second feature map from the shared convolutional layers; performing, according to the second preset training parameters, region proposal processing on the second feature map through the region proposal network to obtain the region scores; and obtaining the region proposals according to the region scores.

Specifically, the second feature map corresponding to one layer is extracted from the shared convolutional layers and input into the region proposal network for region proposal processing, yielding the region scores. The region proposal processing applies a sliding window to the feature map followed by several convolution or fully connected operations. A region score expresses whether a target exists in a region and the magnitude of that probability.
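The sliding-window step described above evaluates, at each window position, a small set of candidate regions of different sizes (the walkthrough of Fig. 7 forms 9 such regions per position). A common way to generate them, sketched below, is 3 scales times 3 aspect ratios per position; the specific scale and ratio values are assumptions, since the embodiment only states that 9 differently sized regions are formed.

```python
import numpy as np

def make_anchors(center, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """The 9 differently sized regions formed at one sliding-window
    position: one box per (scale, aspect-ratio) pair, centred on the
    window. Returned as (x0, y0, x1, y1) tuples."""
    cx, cy = center
    boxes = []
    for s in scales:
        for r in ratios:
            w, h = s * np.sqrt(r), s / np.sqrt(r)  # area stays s*s
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes
```

Each of these regions would then receive a region score, and the highest-scoring regions across all window positions become the region proposals.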
Step 6023, superimposing a preset number of convolutional layers on the shared convolutional layers as the specific convolutional layers, to obtain the first feature map, the number of convolutional layers included in the specific convolutional layers being greater than the number of convolutional layers included in the shared convolutional layers.

Specifically, in order not to lose other information, the Fast R-CNN network can continue convolution processing on top of the shared convolutional layers so that the information is more complete; that is, a preset number of convolutional layers is superimposed on the shared convolutional layers as the specific convolutional layers, yielding the first feature map. The preset number of layers is not specifically limited in this embodiment. For example, the second feature map corresponding to the fifth layer of the shared convolutional layers is sent to the region proposal network, while the first feature map corresponding to the ninth layer serves as the specific convolutional layer.
Step 6024, performing region marking processing on the first feature map according to the region proposals, to obtain the first feature map after the region marking processing.

Specifically, in this embodiment several regions with the highest region scores can be selected, and their specific locations in the training picture are passed to the Fast R-CNN network as region proposals. The Fast R-CNN network then performs region marking on the first feature map according to the region proposals, so that it only needs to carry out the classification scoring of each target on the marked regions; regions without targets require no classification scoring, which saves target detection time on the training pictures.
Step 6025, performing pooling processing through the Fast R-CNN network on the first feature map after the region marking processing, to obtain the first feature map after the pooling processing.

Specifically, because the sizes of the marked regions of the first feature map are inconsistent, pooling is needed to normalize their sizes, yielding the first feature map after the pooling processing.
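The size-normalising pooling step can be sketched as follows: each marked region, whatever its size, is divided into a fixed grid of bins and max-pooled per bin (a max-pool sketch of Fast R-CNN's ROI pooling; the 6x6 output grid is an assumption for illustration).

```python
import numpy as np

def roi_pool(feature_map, roi, out_size=(6, 6)):
    """Pool one marked region (y0, x0, y1, x1) of a CxHxW feature map
    to a fixed CxΟhxΟw size, so differently sized regions become
    size-normalised before the fully connected layers."""
    y0, x0, y1, x1 = roi
    region = feature_map[:, y0:y1, x0:x1]
    c, h, w = region.shape
    oh, ow = out_size
    out = np.zeros((c, oh, ow))
    ys = np.linspace(0, h, oh + 1).astype(int)   # bin edges along height
    xs = np.linspace(0, w, ow + 1).astype(int)   # bin edges along width
    for i in range(oh):
        for j in range(ow):
            out[:, i, j] = region[:, ys[i]:ys[i + 1],
                                  xs[j]:xs[j + 1]].max(axis=(1, 2))
    return out

# A 12x12 marked region of a 2-channel 16x16 feature map becomes 2x6x6.
fm = np.arange(512, dtype=float).reshape(2, 16, 16)
pooled = roi_pool(fm, (0, 0, 12, 12))
```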
Step 6026, performing fully connected processing on the first feature map after the pooling processing and, according to the third preset training parameters, obtaining the classification score of each target in the training picture.

Specifically, because the pooled first feature map consists of many comprehensive maps, the fully connected processing reduces its dimensionality and facilitates the classification scoring.
Step 6027, adjusting the second preset training parameters and the third preset training parameters according to the region proposals, the classification score of each target in the training picture, and the target region, to obtain the network parameters of the region proposal network corresponding to each target and the network parameters of the Fast R-CNN network.

Specifically, because the target region is known, the third preset training parameters can be dynamically adjusted according to the target region so that the target detection is accurate, and the third preset training parameters are then taken as the network parameters of the Fast R-CNN network corresponding to each target. Meanwhile, the second preset training parameters can also be dynamically adjusted according to the region proposals and the target region so that the region proposals corresponding to the region scores become more accurate, which reduces the time the Fast R-CNN network spends detecting targets in the training pictures; the second preset training parameters are then taken as the network parameters corresponding to the region proposal network.
Fig. 7 is a schematic diagram of the training process of the Fast R-CNN network and the RPN in the scene recognition method with target detection assistance provided by the present invention. In a specific embodiment, the Fast R-CNN network and the region proposal network (Region Proposal Network, RPN) must first be trained together on the training pictures before the scene recognition process is carried out on the picture to be identified, as shown in Fig. 7.
First, training pictures of arbitrary size are input into the Fast R-CNN network, and the target region corresponding to each training picture is obtained. The Fast R-CNN network performs convolution processing on the training pictures to obtain the shared convolutional layers. The RPN extracts the second feature map from the shared convolutional layers and applies a sliding window to it; after two further convolution or fully connected operations, the pixels in each sliding window form 9 regions of different sizes, which are compared with the target region. According to the second preset training parameters, the corresponding region scores are obtained, and the 300 highest-scoring regions are selected as region proposals and sent to the Fast R-CNN network. A number of additional convolutional layers are then superimposed on the shared convolutional layers as the specific convolutional layers, producing the first feature map. In this embodiment, region marking is performed on the first feature map according to the region proposals, producing the first feature map after the region marking processing; the Fast R-CNN network then pools the marked first feature map to obtain the first feature map after the pooling processing. Fully connected processing is applied to the pooled first feature map and, according to the third preset training parameters, the classification score of each target in the training picture is obtained. It is then judged whether the detections produced under the third preset training parameters are consistent with the target region. If not, the third preset training parameters are dynamically adjusted, and the second preset training parameters are also adjusted according to the region proposals and the target region, until the detections are consistent with the target region, the training of the Fast R-CNN network and the RPN converges to a certain interval, and the loss value is controlled within an acceptable range, at which point training stops. The second preset training parameters at that point are taken as the network parameters corresponding to the RPN, and the third preset training parameters at that point as the network parameters corresponding to the Fast R-CNN network. This completes the whole training process of the Fast R-CNN network and the RPN.
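The comparison of a candidate region with the target region, used when dynamically adjusting the training parameters, is sketched below with intersection-over-union. The patent says only "compared with the target region"; IoU is the customary yardstick for such a comparison and is assumed here.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes, used to
    measure how well a proposed region matches the known target region."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))   # overlap width
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))   # overlap height
    inter = iw * ih
    union = ((ax1 - ax0) * (ay1 - ay0)
             + (bx1 - bx0) * (by1 - by0) - inter)
    return inter / union if union else 0.0
```

A region with IoU close to 1 against the target region would support keeping the current parameters; a low IoU would trigger further adjustment.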
Then, when a picture to be identified is input into the Fast R-CNN network for target detection, because the training process above has determined the network parameters of the Fast R-CNN network corresponding to each target, and the network parameters of the RPN corresponding to each target have also been determined, the targets corresponding to the picture to be identified can be recognized; moreover, the region proposals of the RPN save detection time for the Fast R-CNN network.
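The final step of the method — combining the at least two candidate scenes with the classification scores of the detected targets to decide the scene — can be sketched with one simple fusion rule. The patent does not fix the exact formula, so the additive weighting, the scene-to-typical-target mapping, and all names below are illustrative assumptions.

```python
def fuse(scene_probs, target_scores, scene_targets):
    """Pick the final scene: each candidate scene's probability is
    boosted by the classification scores of detected targets that are
    typical for it (one assumed fusion rule, not the patent's formula)."""
    best, best_score = None, float("-inf")
    for scene, p in scene_probs.items():
        support = sum(s for t, s in target_scores.items()
                      if t in scene_targets.get(scene, ()))
        score = p + support
        if score > best_score:
            best, best_score = scene, score
    return best

# Illustrative inputs: two candidate scenes from the CNN, two detected
# targets from the Fast R-CNN network.
scene_probs = {"kitchen": 0.45, "dining room": 0.40}
target_scores = {"stove": 0.9, "refrigerator": 0.8}
scene_targets = {"kitchen": {"stove", "refrigerator"},
                 "dining room": {"dining table"}}
final_scene = fuse(scene_probs, target_scores, scene_targets)
```

Here the detected stove and refrigerator tip an otherwise close decision toward "kitchen", which is exactly the assistance the title describes.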
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The foregoing program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention rather than limiting them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made for some or all of the technical features therein; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A scene recognition method with target detection assistance, characterized by comprising:
obtaining a picture to be identified, sampling the picture to be identified to obtain a preset number of samples of a preset size, and performing scene recognition on each sample according to a convolutional neural network model, to obtain at least two scenes corresponding to the picture to be identified;
obtaining region proposals of the picture to be identified and a first feature map corresponding to the picture to be identified, and obtaining, according to the region proposals and the picture to be identified, a classification score of each target in the picture to be identified; wherein the region proposals are used to indicate at least part of the regions containing at least one target, the region proposals are obtained by a region proposal network processing a second feature map, and the first feature map and the second feature map are obtained by a Fast R-CNN network performing convolution processing on the picture to be identified;
obtaining, according to the at least two scenes corresponding to the picture to be identified and the classification score of each target, the scene corresponding to the picture to be identified.
2. The method according to claim 1, characterized in that before the performing scene recognition on each sample according to the convolutional neural network model to obtain the at least two scenes corresponding to the picture to be identified, the method further comprises:
obtaining training pictures and a label corresponding to each training picture, the label being used to indicate the scene corresponding to the training picture;
obtaining, according to the convolutional neural network model and the label corresponding to the training picture, network parameters of the convolutional neural network model corresponding to each scene;
wherein the performing scene recognition on each sample according to the convolutional neural network model to obtain the at least two scenes corresponding to the picture to be identified comprises:
performing scene recognition on each sample according to the network parameters of the convolutional neural network model corresponding to each scene, to obtain the at least two scenes corresponding to the picture to be identified.
3. The method according to claim 2, characterized in that the obtaining, according to the convolutional neural network model and the label corresponding to the training picture, the network parameters of the convolutional neural network model corresponding to each scene comprises:
performing split sampling on the training pictures to obtain expanded training pictures;
performing, according to first preset training parameters, preset processing on the expanded training pictures to obtain a preset number of third feature maps, the preset processing comprising convolution, pooling, and normalization;
performing fully connected processing multiple times on the preset number of third feature maps to obtain a scene probability corresponding to the training picture;
adjusting the first preset training parameters according to the scene probability and the label corresponding to the training picture, to obtain the network parameters of the convolutional neural network model corresponding to the scene.
4. The method according to claim 1, characterized in that the obtaining region proposals of the picture to be identified and the first feature map corresponding to the picture to be identified comprises:
performing convolution processing on the picture to be identified through the Fast R-CNN network to obtain shared convolutional layers;
extracting the second feature map from the shared convolutional layers, performing, according to network parameters corresponding to the region proposal network, region proposal processing on the second feature map through the region proposal network to obtain a region score of each target region, and obtaining the region proposals according to the region scores;
superimposing a preset number of convolutional layers on the shared convolutional layers as specific convolutional layers to obtain the first feature map, the number of convolutional layers included in the specific convolutional layers being greater than the number of convolutional layers included in the shared convolutional layers.
5. The method according to claim 4, characterized in that the obtaining, according to the region proposals and the picture to be identified, the classification score of each target in the picture to be identified comprises:
performing region marking processing on the first feature map according to the region proposals, to obtain the first feature map after the region marking processing;
performing pooling processing on the first feature map after the region marking processing through the Fast R-CNN network, to obtain the first feature map after the pooling processing;
performing fully connected processing on the first feature map after the pooling processing;
obtaining, according to network parameters corresponding to the Fast R-CNN network, the classification score of each target in the picture to be identified.
6. The method according to claim 5, characterized in that before the obtaining region proposals of the picture to be identified and the first feature map corresponding to the picture to be identified, the method further comprises:
obtaining training pictures and a target region corresponding to each training picture, the target region being used to indicate the position of a complete target in the training picture;
obtaining, according to the Fast R-CNN network, the region proposal network, and the target region corresponding to the training picture, network parameters of the Fast R-CNN network corresponding to each target and network parameters corresponding to the region proposal network.
7. The method according to claim 6, characterized in that the obtaining, according to the Fast R-CNN network, the region proposal network, and the target region corresponding to the training picture, the network parameters of the Fast R-CNN network corresponding to each target and the network parameters corresponding to the region proposal network comprises:
performing convolution processing on the training pictures through the Fast R-CNN network to obtain shared convolutional layers;
extracting the second feature map from the shared convolutional layers, performing, according to second preset training parameters, region proposal processing on the second feature map through the region proposal network to obtain the region scores, and obtaining the region proposals according to the region scores;
superimposing a preset number of convolutional layers on the shared convolutional layers as specific convolutional layers to obtain the first feature map, the number of convolutional layers included in the specific convolutional layers being greater than the number of convolutional layers included in the shared convolutional layers;
performing region marking processing on the first feature map according to the region proposals, to obtain the first feature map after the region marking processing;
performing pooling processing on the first feature map after the region marking processing through the Fast R-CNN network, to obtain the first feature map after the pooling processing;
performing fully connected processing on the first feature map after the pooling processing and, according to third preset training parameters, obtaining a classification score of each target in the training picture;
adjusting the second preset training parameters and the third preset training parameters according to the region proposals, the classification score of each target in the training picture, and the target region, to obtain the network parameters corresponding to the region proposal network and the network parameters of the Fast R-CNN network corresponding to each target.
CN201710270013.4A 2017-04-24 2017-04-24 Target detection assisted scene identification method Active CN107194318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710270013.4A CN107194318B (en) 2017-04-24 2017-04-24 Target detection assisted scene identification method


Publications (2)

Publication Number Publication Date
CN107194318A true CN107194318A (en) 2017-09-22
CN107194318B CN107194318B (en) 2020-06-12

Family

ID=59872812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710270013.4A Active CN107194318B (en) 2017-04-24 2017-04-24 Target detection assisted scene identification method

Country Status (1)

Country Link
CN (1) CN107194318B (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563357A (en) * 2017-09-29 2018-01-09 北京奇虎科技有限公司 Live dress ornament based on scene cut, which is dressed up, recommends method, apparatus and computing device
CN107610146A (en) * 2017-09-29 2018-01-19 北京奇虎科技有限公司 Image scene segmentation method, apparatus, computing device and computer-readable storage medium
CN107622498A (en) * 2017-09-29 2018-01-23 北京奇虎科技有限公司 Image penetration management method, apparatus and computing device based on scene cut
CN107730514A (en) * 2017-09-29 2018-02-23 北京奇虎科技有限公司 Scene cut network training method, device, computing device and storage medium
CN107808138A (en) * 2017-10-31 2018-03-16 电子科技大学 A kind of communication signal recognition method based on FasterR CNN
CN107832795A (en) * 2017-11-14 2018-03-23 深圳码隆科技有限公司 Item identification method, system and electronic equipment
CN107844977A (en) * 2017-10-09 2018-03-27 中国银联股份有限公司 A kind of method of payment and device
CN108681752A (en) * 2018-05-28 2018-10-19 电子科技大学 A kind of image scene mask method based on deep learning
CN108734162A (en) * 2018-04-12 2018-11-02 上海扩博智能技术有限公司 Target identification method, system, equipment and storage medium in commodity image
CN108765033A (en) * 2018-06-08 2018-11-06 Oppo广东移动通信有限公司 Transmitting advertisement information method and apparatus, storage medium, electronic equipment
CN108764235A (en) * 2018-05-23 2018-11-06 中国民用航空总局第二研究所 Neural network model, object detection method, equipment and medium
CN108960209A (en) * 2018-08-09 2018-12-07 腾讯科技(深圳)有限公司 Personal identification method, device and computer readable storage medium
CN109086742A (en) * 2018-08-27 2018-12-25 Oppo广东移动通信有限公司 scene recognition method, scene recognition device and mobile terminal
CN109325491A (en) * 2018-08-16 2019-02-12 腾讯科技(深圳)有限公司 Identification code recognition methods, device, computer equipment and storage medium
CN109614876A (en) * 2018-11-16 2019-04-12 北京市商汤科技开发有限公司 Critical point detection method and device, electronic equipment and storage medium
CN109688351A (en) * 2017-10-13 2019-04-26 华为技术有限公司 A kind of image-signal processing method, device and equipment
CN109727268A (en) * 2018-12-29 2019-05-07 西安天和防务技术股份有限公司 Method for tracking target, device, computer equipment and storage medium
CN109784131A (en) * 2017-11-15 2019-05-21 深圳光启合众科技有限公司 Method for checking object, device, storage medium and processor
CN109981695A (en) * 2017-12-27 2019-07-05 广东欧珀移动通信有限公司 Content delivery method, device and equipment
CN110012210A (en) * 2018-01-05 2019-07-12 广东欧珀移动通信有限公司 Photographic method, device, storage medium and electronic equipment
CN110390262A (en) * 2019-06-14 2019-10-29 平安科技(深圳)有限公司 Video analysis method, apparatus, server and storage medium
CN111062441A (en) * 2019-12-18 2020-04-24 武汉大学 Scene classification method and device based on self-supervision mechanism and regional suggestion network
CN111104942A (en) * 2019-12-09 2020-05-05 熵智科技(深圳)有限公司 Template matching network training method, template matching network recognition method and template matching network recognition device
CN111178367A (en) * 2018-11-09 2020-05-19 财团法人资讯工业策进会 Feature determination device and method for adapting to multiple object sizes
CN111383246A (en) * 2018-12-29 2020-07-07 杭州海康威视数字技术股份有限公司 Scroll detection method, device and equipment
CN113569734A (en) * 2021-07-28 2021-10-29 山东力聚机器人科技股份有限公司 Image identification and classification method and device based on feature recalibration
CN112633064B (en) * 2020-11-19 2023-12-15 深圳银星智能集团股份有限公司 Scene recognition method and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103778443A (en) * 2014-02-20 2014-05-07 公安部第三研究所 Method for achieving scene analysis description based on theme model method and field rule library
CN106504233A (en) * 2016-10-18 2017-03-15 国网山东省电力公司电力科学研究院 Image electric power widget recognition methodss and system are patrolled and examined based on the unmanned plane of Faster R CNN
US20170098162A1 (en) * 2015-10-06 2017-04-06 Evolv Technologies, Inc. Framework for Augmented Machine Decision Making


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHAOQING REN ET AL.: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", arXiv *

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107610146B (en) * 2017-09-29 2021-02-23 北京奇虎科技有限公司 Image scene segmentation method and device, electronic equipment and computer storage medium
CN107610146A (en) * 2017-09-29 2018-01-19 北京奇虎科技有限公司 Image scene segmentation method, apparatus, computing device and computer-readable storage medium
CN107622498A (en) * 2017-09-29 2018-01-23 北京奇虎科技有限公司 Image crossing processing method and apparatus based on scene segmentation, and computing device
CN107730514A (en) * 2017-09-29 2018-02-23 北京奇虎科技有限公司 Scene segmentation network training method and apparatus, computing device and storage medium
CN107563357A (en) * 2017-09-29 2018-01-09 北京奇虎科技有限公司 Live-broadcast clothing dressing recommendation method, apparatus and computing device based on scene segmentation
CN107730514B (en) * 2017-09-29 2021-02-12 北京奇虎科技有限公司 Scene segmentation network training method and device, computing equipment and storage medium
CN107563357B (en) * 2017-09-29 2021-06-04 北京奇虎科技有限公司 Live-broadcast clothing dressing recommendation method and device based on scene segmentation and computing equipment
CN107622498B (en) * 2017-09-29 2021-06-04 北京奇虎科技有限公司 Image crossing processing method and device based on scene segmentation and computing equipment
CN107844977A (en) * 2017-10-09 2018-03-27 中国银联股份有限公司 Payment method and apparatus
CN109688351A (en) * 2017-10-13 2019-04-26 华为技术有限公司 Image signal processing method, apparatus and device
US11430209B2 (en) 2017-10-13 2022-08-30 Huawei Technologies Co., Ltd. Image signal processing method, apparatus, and device
CN107808138B (en) * 2017-10-31 2021-03-30 电子科技大学 Communication signal identification method based on Faster R-CNN
CN107808138A (en) * 2017-10-31 2018-03-16 电子科技大学 Communication signal recognition method based on Faster R-CNN
CN107832795A (en) * 2017-11-14 2018-03-23 深圳码隆科技有限公司 Item identification method, system and electronic equipment
WO2019096180A1 (en) * 2017-11-14 2019-05-23 深圳码隆科技有限公司 Object recognition method and system, and electronic device
CN109784131B (en) * 2017-11-15 2023-08-22 深圳光启合众科技有限公司 Object detection method, device, storage medium and processor
CN109784131A (en) * 2017-11-15 2019-05-21 深圳光启合众科技有限公司 Object detection method, device, storage medium and processor
CN109981695A (en) * 2017-12-27 2019-07-05 广东欧珀移动通信有限公司 Content delivery method, device and equipment
CN110012210B (en) * 2018-01-05 2020-09-22 Oppo广东移动通信有限公司 Photographing method and device, storage medium and electronic equipment
US11503205B2 (en) 2018-01-05 2022-11-15 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Photographing method and device, and related electronic apparatus
CN110012210A (en) * 2018-01-05 2019-07-12 广东欧珀移动通信有限公司 Photographing method and device, storage medium and electronic equipment
CN108734162A (en) * 2018-04-12 2018-11-02 上海扩博智能技术有限公司 Target identification method, system, equipment and storage medium in commodity image
CN108734162B (en) * 2018-04-12 2021-02-09 上海扩博智能技术有限公司 Method, system, equipment and storage medium for identifying target in commodity image
CN108764235B (en) * 2018-05-23 2021-06-29 中国民用航空总局第二研究所 Target detection method, apparatus and medium
CN108764235A (en) * 2018-05-23 2018-11-06 中国民用航空总局第二研究所 Neural network model, object detection method, equipment and medium
CN108681752B (en) * 2018-05-28 2023-08-15 电子科技大学 Image scene labeling method based on deep learning
CN108681752A (en) * 2018-05-28 2018-10-19 电子科技大学 Image scene labeling method based on deep learning
CN108765033B (en) * 2018-06-08 2021-01-12 Oppo广东移动通信有限公司 Advertisement information pushing method and device, storage medium and electronic equipment
CN108765033A (en) * 2018-06-08 2018-11-06 Oppo广东移动通信有限公司 Advertisement information pushing method and apparatus, storage medium and electronic equipment
CN108960209B (en) * 2018-08-09 2023-07-21 腾讯科技(深圳)有限公司 Identity recognition method, identity recognition device and computer readable storage medium
CN108960209A (en) * 2018-08-09 2018-12-07 腾讯科技(深圳)有限公司 Identity recognition method and apparatus, and computer-readable storage medium
US11494577B2 (en) 2018-08-16 2022-11-08 Tencent Technology (Shenzhen) Company Limited Method, apparatus, and storage medium for identifying identification code
CN109325491A (en) * 2018-08-16 2019-02-12 腾讯科技(深圳)有限公司 Identification code recognition method, apparatus, computer device and storage medium
CN109086742A (en) * 2018-08-27 2018-12-25 Oppo广东移动通信有限公司 Scene recognition method, scene recognition device and mobile terminal
CN111178367A (en) * 2018-11-09 2020-05-19 财团法人资讯工业策进会 Feature determination device and method for adapting to multiple object sizes
CN111178367B (en) * 2018-11-09 2023-02-24 财团法人资讯工业策进会 Feature determination device and method for adapting to multiple object sizes
CN109614876B (en) * 2018-11-16 2021-07-27 北京市商汤科技开发有限公司 Key point detection method and device, electronic equipment and storage medium
CN109614876A (en) * 2018-11-16 2019-04-12 北京市商汤科技开发有限公司 Key point detection method and apparatus, electronic equipment and storage medium
CN111383246A (en) * 2018-12-29 2020-07-07 杭州海康威视数字技术股份有限公司 Scroll detection method, device and equipment
CN109727268A (en) * 2018-12-29 2019-05-07 西安天和防务技术股份有限公司 Target tracking method and apparatus, computer equipment and storage medium
CN111383246B (en) * 2018-12-29 2023-11-07 杭州海康威视数字技术股份有限公司 Scroll detection method, device and equipment
WO2020248386A1 (en) * 2019-06-14 2020-12-17 平安科技(深圳)有限公司 Video analysis method and apparatus, computer device and storage medium
CN110390262B (en) * 2019-06-14 2023-06-30 平安科技(深圳)有限公司 Video analysis method, device, server and storage medium
CN110390262A (en) * 2019-06-14 2019-10-29 平安科技(深圳)有限公司 Video analysis method, apparatus, server and storage medium
CN111104942A (en) * 2019-12-09 2020-05-05 熵智科技(深圳)有限公司 Template matching network training method, template matching network recognition method and template matching network recognition device
CN111104942B (en) * 2019-12-09 2023-11-03 熵智科技(深圳)有限公司 Template matching network training method, recognition method and device
CN111062441A (en) * 2019-12-18 2020-04-24 武汉大学 Scene classification method and device based on a self-supervision mechanism and a region proposal network
CN112633064B (en) * 2020-11-19 2023-12-15 深圳银星智能集团股份有限公司 Scene recognition method and electronic equipment
CN113569734A (en) * 2021-07-28 2021-10-29 山东力聚机器人科技股份有限公司 Image identification and classification method and device based on feature recalibration

Also Published As

Publication number Publication date
CN107194318B (en) 2020-06-12

Similar Documents

Publication Publication Date Title
CN107194318A (en) The scene recognition method of target detection auxiliary
CN109583322B (en) Face recognition deep network training method and system
CN104573688B (en) Deep learning-based intelligent recognition method and device for tobacco laser codes on mobile platforms
CN103971102B (en) Static gesture recognition method based on finger contours and decision trees
CN106203395B (en) Face attribute recognition method based on multitask deep learning
CN104217214B (en) RGB-D human activity recognition method based on configurable convolutional neural networks
CN108510194A (en) Risk control model training method, risk identification method, device, equipment and medium
CN106980858A (en) Language text detection and localization system, and language text detection and localization method using the system
CN105590099B (en) Multi-person activity recognition method based on improved convolutional neural networks
CN108171112A (en) Vehicle identification and tracking based on convolutional neural networks
CN107844784A (en) Face recognition method, device, computer equipment and readable storage medium
CN106446930A (en) Deep convolutional neural network-based robot working scene identification method
CN107016405A (en) Insect image classification method based on class-prediction convolutional neural networks
CN106897738A (en) Pedestrian detection method based on semi-supervised learning
CN107180226A (en) Dynamic gesture recognition method based on combined neural networks
CN108073888A (en) Teaching assistance method and teaching assistance system using the method
Blok et al. The effect of data augmentation and network simplification on the image‐based detection of broccoli heads with Mask R‐CNN
CN105144239A (en) Image processing device, program, and image processing method
CN109558902A (en) Fast target detection method
CN106204779A (en) Class attendance checking method based on a multi-face data collection strategy and deep learning
CN107480575A (en) Model training method, cross-age face recognition method and corresponding device
CN110135231A (en) Animal face recognition method, device, computer equipment and storage medium
CN106022273A (en) BP neural network handwriting recognition system based on a dynamic sample selection strategy
CN107633242A (en) Network model training method, device, equipment and storage medium
CN109410211A (en) Method and device for segmenting a target object in an image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant