CN110188720A

CN110188720A - Object detection method and system based on convolutional neural networks

Info

Publication number: CN110188720A
Application number: CN201910483258.4A
Authority: CN
Inventors: 王珏; 邵嘉葳; 孟令波
Original assignee: Shanghai Yunshen Intelligent Technology Co Ltd
Current assignee: Shanghai Yunshen Intelligent Technology Co Ltd
Priority date: 2019-06-05
Filing date: 2019-06-05
Publication date: 2019-08-30

Abstract

The invention discloses an object detection method and system based on convolutional neural networks. The method includes: constructing a data set of the target to be detected; dividing the image data in the data set into a training set, a test set and a validation set according to a preset ratio, and annotating the images; constructing the network structure of a convolutional neural network model that predicts objects at different feature scales; loading the training set into the model for training; during training, loading the validation set and optimizing the model parameters by multiple validation; testing the performance of the model on the test set to assess its generalization ability; and performing target recognition with a model whose generalization ability meets the requirements. A model trained according to the invention can quickly and accurately recognize small targets, densely packed targets and highly overlapping targets in a shopping mall.

Description

Object detection method and system based on convolutional neural networks

Technical field

The present invention relates to the field of target recognition, and more particularly to an object detection method and system based on convolutional neural networks.

Background art

In recent years, with the development of artificial intelligence and deep learning, and especially since the introduction of convolutional neural networks, object detection methods have been continuously updated and detection performance has improved qualitatively.

Given the rapid development of object detection technology, it has also been widely applied in large shopping malls. For example, a mall may need to recognize product categories for unmanned retail, or perform face recognition and matching at entrances. Object detection therefore has very good prospects in large shopping malls. With existing technology, given an input image, a single object or several large objects in the image can be detected quickly and their positions and classes marked, so that end-to-end object detection and recognition can be achieved quickly.

In existing object detection technology, detection speed largely meets demand, but accuracy and recognition quality still have significant shortcomings. Existing techniques detect a single object well, but when multiple densely packed or highly overlapping targets must be detected, or when the object to be detected occupies only a small proportion of the image, they cannot quickly and accurately achieve the desired detection performance. This shortcoming is especially serious in large shopping malls, where goods vary in size and are densely arranged and the flow of people is large; existing image detection techniques cannot meet the needs of this scenario. In addition, existing techniques generalize poorly and do not classify objects as well as desired: when an object of a known class appears with a new, uncommon aspect ratio or under other unusual conditions, they also fail to achieve the desired detection performance.

Summary of the invention

To solve the above technical problems, the present invention provides an object detection method and system based on convolutional neural networks. Specifically, the technical solution of the present invention is as follows:

In one aspect, the invention discloses an object detection method based on convolutional neural networks, comprising: constructing a data set of the target to be detected, the data set containing image data of the target to be detected; dividing the image data in the data set into a training set, a test set and a validation set of the target to be detected according to a preset ratio; annotating the target objects in all image data of the training set, test set and validation set; constructing the network structure of a convolutional neural network model, the model predicting objects at different feature scales; loading the training set into the convolutional neural network model for training; during training of the model, loading the validation set and optimizing the parameters of the model by multiple validation; testing the performance of the model on the test set to assess its generalization ability; and performing target recognition using a convolutional neural network model whose generalization ability meets the requirements.

Preferably, loading the training set into the convolutional neural network model for training includes: setting training parameters, the training parameters including an initial learning rate, a weight decay rate and a maximum number of iterations; inputting the image data in the training set into the convolutional neural network model for iterative training; and, after training ends, extracting the training result.

Preferably, the loss function used in training the convolutional neural network model is calculated as follows:

$$
\begin{aligned}
\text{Lossfunction} ={}& \text{bool}\cdot(2-\text{areaPred})\cdot\text{bce} \\
&+ \text{bool}\cdot(2-\text{areaPred})\cdot(\text{whtrue}-\text{whpred})^{2} \\
&+ \text{bool}\cdot\text{bce} + (1-\text{bool})\cdot\text{bce}\cdot\text{ignore} \\
&+ \text{bool}\cdot\text{bce}
\end{aligned}
$$

where bool is the confidence; bce is the binary cross-entropy; areaPred is the extent of the prediction box; whtrue is the true width and height of the detected target in the image; whpred is the predicted width and height of the detected target in the image; and ignore is the ignore term for objects whose intersection over union (IoU) is below a certain threshold.
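For illustration, the IoU that drives the ignore term can be computed for two axis-aligned boxes as in the following minimal Python sketch. The corner-format box representation, the function names `iou` and `ignore_flag`, and the 0.5 threshold are assumptions made for this example; the text above only states that some threshold is used.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ignore_flag(best_iou, threshold=0.5):
    """Return 1.0 when the IoU is below the threshold, else 0.0.

    The threshold value is an assumption; the source only says "a certain threshold".
    """
    return 1.0 if best_iou < threshold else 0.0
```

Two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, so `iou((0, 0, 2, 2), (1, 1, 3, 3))` evaluates to 1/7.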

Preferably, loading the validation set during training of the convolutional neural network model and optimizing the parameters of the model by multiple validation includes: loading the validation set during training of the model; recording the loss function value during training using multiple validation; adjusting the connection weights between the neurons of the neural network model according to the loss function value; judging whether the current training satisfies a preset training termination condition; when it does not, continuing to train the neural network model; and when it does, stopping the training of the convolutional neural network model and saving the parameters of the trained model.

Preferably, the preset training termination condition is that the loss function value reaches a preset loss function threshold or that the number of training iterations reaches a preset maximum number of iterations.

Preferably, layers 0-95 of the convolutional neural network are deep convolutional layers, composed of convolution blocks and single convolutional layers; each convolution block includes two convolutional layers and one residual layer; a single convolutional layer is inserted after every several convolution blocks. Layers 96-126 of the convolutional neural network are feature interaction layers covering three scales; within each scale, local feature interaction is realized by means of convolution kernels.

In another aspect, the invention also discloses an object detection system based on convolutional neural networks, comprising: a data set construction module, for constructing a data set of the target to be detected, the data set containing image data of the target to be detected; a data set division module, for dividing the image data in the data set into a training set, a test set and a validation set of the target to be detected according to a preset ratio; an annotation module, for annotating the target objects in all image data of the training set, test set and validation set; a model construction module, for constructing the network structure of a convolutional neural network model, the model predicting objects at different feature scales; a training module, for loading the training set into the convolutional neural network model for training; a validation optimization module, for loading the validation set during training of the model and optimizing the parameters of the model by multiple validation; a test module, for testing the performance of the model on the test set to assess its generalization ability; and a target recognition module, for performing target recognition using a convolutional neural network model whose generalization ability meets the requirements.

Preferably, the training module includes: a parameter setting submodule, for setting training parameters, the training parameters including an initial learning rate, a weight decay rate and a maximum number of iterations; a training submodule, for inputting the image data in the training set into the convolutional neural network model for iterative training; and a result extraction submodule, for extracting the training result after training ends.

Preferably, the validation optimization module includes: a loading submodule, for loading the validation set during training of the convolutional neural network model; a validation submodule, for recording the loss function value during training using multiple validation; a weight adjustment submodule, for adjusting the connection weights between the neurons of the neural network model according to the loss function value; a judging submodule, for judging whether the current training satisfies a preset training termination condition, the training module continuing to train the neural network model when it does not; and a parameter storage submodule, for saving the parameters of the trained convolutional neural network model when the current training satisfies the preset training termination condition and the training module stops training the model.

The present invention improves the overall convolutional neural network structure: it computes with multiple convolutional layers, increases the number of feature interaction layers in the network, and uses multi-scale convolution. This improves the precision of image feature extraction and of the computation, and greatly improves detection of small targets and of multiple densely packed or highly overlapping targets.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of an embodiment of the object detection method based on convolutional neural networks of the present invention;

Fig. 2 is a flowchart of another embodiment of the object detection method based on convolutional neural networks of the present invention;

Fig. 3 is a schematic diagram of the network architecture of the convolutional neural network model of the present invention;

Fig. 4 is a schematic diagram of the internal structure of a convolutional layer;

Fig. 5 is a schematic diagram of the internal structure of a residual layer;

Fig. 6 is a structural block diagram of an embodiment of the object detection system based on convolutional neural networks of the present invention;

Fig. 7 is a structural block diagram of another embodiment of the object detection system based on convolutional neural networks of the present invention.

Reference numerals:

A: deep convolutional layers; B: feature interaction layers; 100: data set construction module; 200: data set division module; 300: annotation module; 400: model construction module; 500: training module; 600: validation optimization module; 700: test module; 800: target recognition module; 510: parameter setting submodule; 520: training submodule; 530: result extraction submodule; 610: loading submodule; 620: validation submodule; 630: weight adjustment submodule; 640: judging submodule; 650: parameter storage submodule.

Detailed description of the embodiments

In the following description, for purposes of explanation rather than limitation, specific details such as particular system structures and techniques are set forth to provide a thorough understanding of the embodiments of the present application. However, it will be clear to those skilled in the art that the present application can also be practiced in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits and methods are omitted so that unnecessary detail does not obscure the description of the present application.

It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, wholes, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components and/or sets thereof.

To keep the drawings simple, each figure shows only the parts relevant to the invention schematically; they do not represent the actual structure of a product. In addition, to keep the drawings easy to understand, where several components in a figure have the same structure or function, only one of them is drawn or labeled. Herein, "one" means not only "only one" but can also mean "more than one".

It should be further understood that the term "and/or" used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.

To explain the embodiments of the invention or the technical solutions in the prior art more clearly, specific embodiments of the invention are described below with reference to the drawings. Evidently, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings, and other embodiments, from them without creative effort.

The invention discloses an object detection method based on convolutional neural networks. Embodiment 1, shown in Fig. 1, comprises:

S101: construct a data set of the target to be detected; the data set contains image data of the target to be detected;

Training a convolutional neural network model requires many training samples, so image data containing the target to be detected must first be collected. For example, if the model is to recognize pens, a pen data set containing a large number of pictures of various pens must be built. By learning from these pictures, the model can extract the features of pens and later recognize pens in various scenes.

Create the data set of the target to be detected. For example, the data set used in this embodiment consists of images selected from the ImageNet database according to the target categories to be detected, such as teacups, chairs and people. (ImageNet is a large visual database for visual object recognition software research and is the largest image recognition database in the world.)

S102: divide the image data in the data set into a training set, a test set and a validation set of the target to be detected according to a preset ratio;

Specifically, after the data set is built, the pictures in it are divided. The split ratio can be preset; typically the training set takes about 80% of the image data, and the test set and validation set about 10% each. Preferably, in this embodiment, 78% of the image data in the data set is assigned to the training set, 10% to the test set and 12% to the validation set.
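The 78% / 10% / 12% division can be sketched as follows. This is a minimal illustration, not the embodiment's actual tooling: the function name `split_dataset`, the shuffling step and the fixed random seed are assumptions introduced here.

```python
import random


def split_dataset(items, train=0.78, test=0.10, val=0.12, seed=0):
    """Shuffle items and split them into (train, test, validation) lists."""
    if abs(train + test + val - 1.0) > 1e-9:
        raise ValueError("split ratios must sum to 1")
    shuffled = list(items)
    # Shuffling with a fixed seed is an assumption; it keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    train_set = shuffled[:n_train]
    test_set = shuffled[n_train:n_train + n_test]
    val_set = shuffled[n_train + n_test:]  # the remainder goes to validation
    return train_set, test_set, val_set
```

With 1000 images this yields 780 training, 100 test and 120 validation images, and every image lands in exactly one subset.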

S103: annotate the target objects in all image data of the training set, test set and validation set;

Specifically, for the convolutional neural network to recognize target objects accurately, it must first be told which objects in a picture are targets. This matters especially for detecting target objects in shopping malls, where the collected pictures may contain all kinds of articles; the name of each target article to be detected and the region of the picture it occupies must both be annotated. That is, annotating a target object in this step means marking its name and its location region in the image data.

In general, annotation software is needed to annotate the targets to be detected in all image data. The main purpose is to collect the position coordinates (x, y, w, h) of each target in the image, where (x, y) are the coordinates of the target's center point in the image and (w, h) are the width and height of the target in the image.
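As a small illustration of the (x, y, w, h) annotation format, the sketch below converts a center-format box to corner coordinates and normalizes it by the image size. Both helper names and the normalization step are assumptions for this example; the text does not say whether the annotations are stored in pixels or as fractions of the image.

```python
def center_to_corners(x, y, w, h):
    """Convert a center-format box (x, y, w, h) to (x_min, y_min, x_max, y_max)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def normalize_box(x, y, w, h, img_w, img_h):
    """Scale a pixel-space center-format box to fractions of the image size."""
    return (x / img_w, y / img_h, w / img_w, h / img_h)
```

For a 40x20 box centered at (100, 50), `center_to_corners(100, 50, 40, 20)` gives `(80.0, 40.0, 120.0, 60.0)`.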

S104: construct the network structure of the convolutional neural network model; the model predicts objects at different feature scales;

Specifically, recognizing small objects, densely packed objects and even overlapping objects places higher demands on the network architecture of the convolutional neural network model.

This embodiment uses an end-to-end object detection algorithm based on deep learning. Specifically, target region prediction and target class prediction are integrated into a single neural network model, achieving fast, real-time object detection and recognition with high accuracy. The trained convolutional neural network model in this embodiment takes the whole image as network input and directly regresses the positions and classes of the bounding boxes at the output layer.

The convolutional neural network in this embodiment predicts objects at different feature scales and has more feature interaction layers. Starting from a conventional convolutional neural network, the overall structure is improved: computation uses multiple convolutional layers and multi-scale convolution, which improves the precision of image feature extraction and of the computation and greatly improves detection of small targets and of multiple densely packed or highly overlapping targets.

S105: load the training set into the convolutional neural network model for training;

Once the architecture of the convolutional neural network model has been built, the training set is loaded into the model for training.

S106: during training of the convolutional neural network model, load the validation set and optimize the parameters of the model by multiple validation;

Specifically, during training the validation set is also used to adjust the parameters of the model being trained. With multiple validation, the validation set is loaded for verification, the loss function value at the current stage is computed, and the parameters of the convolutional neural network are adjusted according to that value. Training then continues, followed by validation, adjustment and further training; repeating this cycle optimizes the parameters of the model step by step.

S107: test the performance of the convolutional neural network model on the test set to assess its generalization ability;

After training ends, the image data in the test set is used to test the performance of the trained model and assess its generalization ability. Generalization ability refers to a machine learning algorithm's ability to adapt to fresh samples. The purpose of learning is to capture the rule underlying the data, so that a trained network also gives suitable outputs for data outside the training set that follows the same rule; this ability is called generalization ability.

S108: perform target recognition using a convolutional neural network model whose generalization ability meets the requirements.

Specifically, if the generalization ability of the model is found to meet the requirements, the parameters of the model can be saved and the model used to recognize small targets, dense targets and the like.

In this embodiment, the classified and annotated data set (comprising the training set, validation set and test set) is prepared; the training set is loaded into the constructed convolutional neural network model for training, and the validation set is loaded for validation during training. After training is complete, the training result data is loaded, and the test set pictures are imported into the test program to obtain the test results. The invention recognizes small targets and compact targets well.

In the above embodiment, step S105, loading the training set into the convolutional neural network model for training, includes:

S1051: set training parameters; the training parameters include an initial learning rate, a weight decay rate and a maximum number of iterations;

Specifically, the model must be initialized before training starts. In general, an initial learning rate is set first; for example, the initial learning rate is set to 0.1 and reduced by a factor of 4 each time until it reaches 0.0005. A series of other training parameters, such as the weight decay rate and the maximum number of iterations, is also set.

S1052: input the image data in the training set into the convolutional neural network model for iterative training;

S1053: after training ends, extract the training result.

Specifically, after training ends, the parameters of the trained convolutional neural network model, such as its weights, are extracted to obtain the trained model.
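The learning rate example given for S1051 (start at 0.1, reduce by a factor of 4 each time, stop at 0.0005) can be enumerated with the short sketch below. It assumes that the rate is clamped to 0.0005 once a further division would fall below that value, and it does not model when each reduction is triggered, since the text does not specify this. The function name is hypothetical.

```python
def learning_rate_schedule(initial=0.1, factor=4.0, floor=0.0005):
    """List the successive learning rates from `initial` down to `floor`."""
    rates = []
    lr = initial
    while lr > floor:
        rates.append(lr)
        lr /= factor
    # Assumption: the final rate is clamped to the stated end value.
    rates.append(floor)
    return rates
```

Under these assumptions the schedule has five stages: 0.1, 0.025, 0.00625, 0.0015625 and finally 0.0005.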

Preferably, the loss function also plays a very important role in training the convolutional neural network model. The value of the loss function is the main basis for learning during training and, after training, an important criterion for judging the quality of the algorithm. A common loss function is the mean squared error (MSE), which is the most classic and simplest loss function; it is applicable almost everywhere but is not very accurate. The loss function used in the present invention consists of four losses: the center coordinate loss of the detected target, the regression loss of the target's width and height in the image, the foreground/background confidence loss, and the classification loss.

Specifically, take the position coordinates (x, y, w, h) of a detected target in an image, where (x, y) are the coordinates of the target's center point in the image and (w, h) are the width and height of the target in the image. The losses are computed as follows:

(1) Loss of xy (object center coordinates): xyloss = bool * (2 - areaPred) * bce, where bool is the confidence and bce is the binary cross-entropy loss of the xy values; the smaller this value, the smaller the overall loss.

(2) Loss of wh (anchor width and height regression values):

whloss = bool * (2 - areaPred) * (whtrue - whpred)². With the confidence (bool) ensured, areaPred needs to be relatively large, and the predicted wh needs to be as close as possible to the true wh.

(3) Confidence (foreground/background) loss: bool * bce + (1 - bool) * bce * ignore, where bool is the confidence, bce is the binary cross-entropy between the predicted and actual confidence, and ignore marks objects whose IoU is below a certain threshold but which do exist, which are ignored. With the confidence (bool) ensured, the predicted value needs to be as close as possible to the true value, while regions without objects need to be as close as possible to the background true value, multiplied by the corresponding ignore term.

(4) Classification loss: bool * bce, i.e. the confidence multiplied by a multi-class cross-entropy.

In summary, summing the four losses above gives the loss function used in the present invention:

$$
\begin{aligned}
\text{Lossfunction} ={}& \text{bool}\cdot(2-\text{areaPred})\cdot\text{bce} \\
&+ \text{bool}\cdot(2-\text{areaPred})\cdot(\text{whtrue}-\text{whpred})^{2} \\
&+ \text{bool}\cdot\text{bce} + (1-\text{bool})\cdot\text{bce}\cdot\text{ignore} \\
&+ \text{bool}\cdot\text{bce}
\end{aligned}
$$

where bool is the confidence; bce is the binary cross-entropy, taken over the xy values in the first term, the confidence in the third and fourth terms, and the classes in the last term; areaPred is the extent of the prediction box; whtrue is the true width and height of the detected target in the image; whpred is the predicted width and height of the detected target in the image; and ignore is the ignore term for objects whose intersection over union is below a certain threshold.
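The four-term sum can be written out for a single prediction as in the NumPy sketch below. It is an illustration under stated assumptions, not the patented implementation: `obj` stands for the quantity the text calls bool, `area_pred` is assumed to be a box area normalized to [0, 1], the binary cross-entropy is summed over the xy components and over the classes, and all function and argument names are hypothetical.

```python
import numpy as np


def bce(target, pred, eps=1e-7):
    """Element-wise binary cross-entropy; pred is clipped to avoid log(0)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))


def detection_loss(obj, area_pred, xy_true, xy_pred, wh_true, wh_pred,
                   conf_pred, cls_true, cls_pred, ignore):
    """Sum of the xy, wh, confidence and classification losses for one prediction."""
    scale = 2.0 - area_pred  # the (2 - areaPred) factor
    xy_loss = obj * scale * bce(xy_true, xy_pred).sum()
    wh_loss = obj * scale * np.square(wh_true - wh_pred).sum()
    conf_bce = bce(obj, conf_pred)
    conf_loss = obj * conf_bce + (1.0 - obj) * conf_bce * ignore
    cls_loss = obj * bce(cls_true, cls_pred).sum()
    return float(xy_loss + wh_loss + conf_loss + cls_loss)
```

For a background cell (`obj = 0`) only the confidence term can contribute, and only when `ignore` is 1; for an object cell the `(2 - area_pred)` factor is largest when `area_pred` is small, so small boxes carry more weight in the coordinate terms.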

Another embodiment of the method of the present invention, shown in Fig. 2, comprises:

S201: construct a data set of the target to be detected; the data set contains image data of the target to be detected;

S202: divide the image data in the data set into a training set, a test set and a validation set of the target to be detected according to a preset ratio;

S203: annotate the target objects in all image data of the training set, test set and validation set;

S204: construct the network structure of the convolutional neural network model; the model predicts objects at different feature scales;

S205: load the training set into the convolutional neural network model for training;

Specifically, before training starts, a series of training parameters such as the initial learning rate, the weight decay rate and the maximum number of iterations is set first. Training then begins.

S206: load the validation set during training of the convolutional neural network model;

Thus the validation set is also used to validate each training stage during training; when training ends, the training result is extracted (the parameters of the convolutional neural network model at the end of training, such as the neuron weights and the learning rate).

S207: record the loss function value during training of the convolutional neural network model, using multiple validation;

S208: adjust the connection weights between the neurons of the neural network model according to the loss function value;

S209: judge whether the current training satisfies a preset training termination condition; if so, go to step S210; otherwise, return to step S205 and continue training the neural network model;

Specifically, the preset training termination condition can be set as needed; for example, the loss function value reaches a preset loss function threshold, or the number of training iterations reaches the preset maximum number of iterations.

S210: stop the training of the convolutional neural network model and save the parameters of the trained model;

S211: test the performance of the convolutional neural network model on the test set to assess its generalization ability;

Testing with the test set: the training result (for example, the current weights of each neuron of the trained convolutional neural network model) is loaded into the trained model, and the test set data is loaded in the test code to test performance and assess the generalization ability of the model, yielding the test results.

S212: perform target recognition using a convolutional neural network model whose generalization ability meets the requirements.

In this embodiment, the current training can also be validated while the convolutional neural network model is being trained. Specifically, the loss function value of the current model is computed and saved, and the weights of the model's neurons are then adjusted according to that value, optimizing the parameters so that the model's outputs become increasingly correct and its error increasingly small. When the validation optimization module detects that the loss function value of the current model has reached the preset loss function threshold, training can be stopped and the current training result (the parameters) saved; the model is then tested by the test module to assess its generalization ability. A convolutional neural network model trained in this way recognizes small targets and compact targets well.
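The train, validate and check loop of steps S205 to S210 can be sketched as follows. The callables `train_step` and `validate` are hypothetical placeholders for whatever framework performs the weight update and computes the validation loss; only the two termination conditions (loss threshold reached, or maximum number of iterations reached) come from the text.

```python
def train_until_done(train_step, validate, loss_threshold, max_iterations):
    """Run training iterations until a termination condition is met.

    Returns (reason, iterations_run, recorded_losses).
    """
    history = []
    for iteration in range(1, max_iterations + 1):
        train_step()               # one training pass that adjusts the weights
        loss = validate()          # loss function value on the validation set
        history.append(loss)       # S207: record the loss function value
        if loss <= loss_threshold: # S209: preset loss threshold reached
            return "loss_threshold", iteration, history
    return "max_iterations", max_iterations, history
```

A caller would save the model parameters once the function returns, corresponding to S210, and then move on to testing.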

In any of the above embodiments, the network architecture of the convolutional neural network model is shown in Fig. 3. Layers 0-95 of the model are deep convolutional layers (part A in the figure). Specifically, the deep convolutional layers are composed of convolutional residual blocks (2*conv + 1*res) and single convolutional layers (conv); each convolutional residual block includes two convolutional layers and one residual layer, and a single convolutional layer is inserted after every several convolutional residual blocks. The internal structures of the convolutional layer and the residual layer are shown in Fig. 4 and Fig. 5 respectively. Preferably, in this embodiment the deep convolutional layers contain 68 convolutional layers in total; the rest are residual layers.
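The layer counts stated above can be cross-checked with a little arithmetic. The sketch below assumes that layers 0-95 means 96 layers inclusive, that every residual layer belongs to exactly one convolutional residual block, and that every convolutional layer outside a block is a single convolutional layer. The function name and these assumptions are illustrative and are not taken from the source.

```python
def deep_conv_layer_breakdown(total_layers=96, conv_layers=68):
    """Derive block and single-layer counts from the stated totals."""
    residual_layers = total_layers - conv_layers  # "the rest are residual layers"
    blocks = residual_layers                      # one residual layer per block (assumption)
    block_conv_layers = 2 * blocks                # two convolutional layers per block
    single_conv_layers = conv_layers - block_conv_layers
    return {
        "residual_layers": residual_layers,
        "blocks": blocks,
        "block_conv_layers": block_conv_layers,
        "single_conv_layers": single_conv_layers,
    }
```

Under these assumptions the 96 layers break down into 28 convolutional residual blocks (56 convolutional layers and 28 residual layers) plus 12 single convolutional layers.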

Each convolutional layer may be followed by a BN (BatchNormalization) layer, which performs batch normalization, and a ReLU activation function. Specifically, in this embodiment each convolutional layer has 32 convolution kernels (filters), each of size 3*3 with a stride of 1. For an input image of 416*416*3 pixels, the convolution operation outputs a 416*416*32 feature map. The residual layers do not affect the input and output results, i.e. the input and output remain generally consistent; their main function is to reduce loss and control the propagation of gradients.
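The shape arithmetic in the paragraph above can be checked with a short sketch. The text gives the kernel size (3*3), stride (1) and filter count (32) but not the padding; a padding of 1 pixel is assumed here because it is the value that keeps a 416*416 input at 416*416. The function names are hypothetical and are not taken from the patent.

```python
def conv_output_size(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Spatial output size of a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_output_shape(height: int, width: int, filters: int = 32,
                      kernel: int = 3, stride: int = 1, padding: int = 1):
    """Output (height, width, channels) of one convolutional layer.

    The channel count of the output equals the number of filters;
    padding=1 is an assumption, not a value stated in the text.
    """
    return (
        conv_output_size(height, kernel, stride, padding),
        conv_output_size(width, kernel, stride, padding),
        filters,
    )


# A 416*416*3 input passed through 32 filters of size 3*3, stride 1.
shape = conv_output_shape(416, 416)
print(shape)
```

Without padding, the same layer would shrink the map to 414*414, which is why the padding assumption matters for reproducing the 416*416*32 figure.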

Layers 78-109 are the feature interaction layers of the network (part B in Fig. 3). The preceding layers yield three output branches for prediction, with output feature maps at three scales: 13*13, 26*26 and 52*52. Within each scale, local feature interaction between feature maps is realized through convolution kernels, using 3*3 and 1*1 convolution operations. The 13*13 feature map uses the three anchors (116×90), (156×198) and (373×326); the 26*26 feature map uses the three anchors (30×61), (62×45) and (59×119); the 52*52 feature map uses the three anchors (10×13), (16×30) and (33×23).
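As an illustration of the three prediction scales, the sketch below tabulates the anchors listed above and derives, for a 416*416 input, the downsampling stride of each scale and the total number of anchor boxes. It assumes the anchor sizes are given in pixels of the input image, which the text does not state explicitly; all identifiers are hypothetical.

```python
INPUT_SIZE = 416  # input image side length, from the embodiment

# Anchors (width, height) per output grid size, as listed in the text.
# Assumption: the values are in pixels of the 416*416 input image.
ANCHORS = {
    13: [(116, 90), (156, 198), (373, 326)],
    26: [(30, 61), (62, 45), (59, 119)],
    52: [(10, 13), (16, 30), (33, 23)],
}


def stride_for(grid: int, input_size: int = INPUT_SIZE) -> int:
    """Number of input pixels covered by one grid cell along each axis."""
    return input_size // grid


def anchors_in_cells(grid: int):
    """Anchor sizes expressed in grid-cell units instead of pixels."""
    s = stride_for(grid)
    return [(w / s, h / s) for w, h in ANCHORS[grid]]


def total_anchor_boxes() -> int:
    """Every grid cell at every scale carries one box per anchor."""
    return sum(grid * grid * len(anchors) for grid, anchors in ANCHORS.items())


for grid in ANCHORS:
    print(grid, stride_for(grid), anchors_in_cells(grid))
print(total_anchor_boxes())
```

The coarse 13*13 grid is paired with the largest anchors and the fine 52*52 grid with the smallest, which is consistent with the stated goal of detecting small targets at the finest scale.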

In another aspect, the invention also discloses an object detection system based on convolutional neural networks. The object detection system can detect target objects using the object detection method based on convolutional neural networks of the invention. Specifically, the object detection system based on convolutional neural networks of this embodiment, as shown in Fig. 6, comprises:

A data set construction module 100, for constructing the data set of the detected targets, which contains image data of the detected targets; that is, for creating the required image data set. Specifically, the data set used in this embodiment consists of images selected from the ImageNet database according to the target categories to be detected, for example teacups, chairs and people. (ImageNet is a large-scale visual database used for research on visual object recognition software, and is the largest image recognition database in the world.)

A data set division module 200, for dividing the image data in the data set of the detected targets into a training set, a test set and a validation set of the detected targets according to a preset ratio. Specifically, the data set division module 200 sorts the image data downloaded from the ImageNet database into three groups: the training set, the test set and the validation set. The division ratio can be preset. The training set generally takes the largest share, about 80%, while the test set and the validation set take similar shares of about 10% each. Fine adjustments can be made on this basis; for example, in this embodiment the training set, test set and validation set account for 78%, 10% and 12% of the whole data set, respectively.

An annotation module 300, for annotating the target objects in all image data of the training set, test set and validation set. The annotation module 300 uses annotation software to label the targets to be detected in all image data. The main purpose is to collect the position coordinate information (x, y, w, h) of each detected target in the image, where (x, y) is the position of the center point of the detected target in the image, and (w, h) are the width and height of the detected target in the image.

A model construction module 400, for constructing the network structure of the convolutional neural network model, which predicts objects using different feature scales. Specifically, recognizing small target objects, dense objects and even partially overlapping target objects places higher demands on the network architecture of the convolutional neural network model. This embodiment improves on a conventional convolutional neural network so that the model can predict objects using different feature scales.

This embodiment improves the overall convolutional neural network structure: it computes with multiple convolutional layers, increases the number of feature interaction layers in the network, and uses multi-scale convolution operations. This improves image feature extraction precision and computational precision, and greatly improves the detection of multiple targets that are small, densely packed or highly overlapping.

A training module 500, for loading the training set into the convolutional neural network model for training;

A verification optimization module 600, for loading the validation set while the convolutional neural network model is being trained, and optimizing the parameters of the convolutional neural network model by multiple verification;

A test module 700, for running a performance test on the convolutional neural network model with the test set to check the generalization ability of the model. After training ends, the image data in the test set is used to check the performance of the trained convolutional neural network model and its generalization ability.

A target recognition module 800, for performing target recognition with a convolutional neural network model whose generalization ability meets the requirements. If the generalization ability of the convolutional neural network model is found to meet the requirements, each parameter of the model can be saved; the saved parameters can later be loaded to recognize small targets, dense targets and the like.
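The data preparation performed by the data set division module 200 and the annotation module 300 can be sketched as follows. This is a minimal illustration only: the shuffling, the fixed random seed, the rounding rule and the corner-based input format of the second function are assumptions, and both function names are hypothetical. Only the 78%/10%/12% ratio and the (x, y, w, h) center format come from the text.

```python
import random


def split_dataset(items, ratios=(0.78, 0.10, 0.12), seed=0):
    """Split items into (training set, test set, validation set).

    The default ratios follow the embodiment (78%, 10%, 12%). Shuffling
    with a fixed seed is an assumption added for reproducibility.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * ratios[0])
    n_test = round(len(items) * ratios[1])
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]  # remainder goes to the validation set
    return train, test, val


def corners_to_center(x_min, y_min, x_max, y_max):
    """Convert a corner-based box to the (x, y, w, h) annotation format,
    where (x, y) is the box center and (w, h) its width and height."""
    w = x_max - x_min
    h = y_max - y_min
    return (x_min + w / 2, y_min + h / 2, w, h)


train, test, val = split_dataset(range(100))
print(len(train), len(test), len(val))
print(corners_to_center(10, 20, 50, 80))
```

Taking the validation set as the remainder guarantees that every image lands in exactly one of the three sets, even when the rounded counts would not otherwise sum to the data set size.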

Preferably, on the basis of the above system embodiment, as shown in Fig. 7, the training module 500 includes:

A parameter setting submodule 510, for setting the training parameters, which include the initial learning rate, the weight decay rate and the maximum number of iterations. Specifically, the model must be given its initial settings before training starts. In general, the initial learning rate is set first, for example to 0.1; a series of other training parameters, such as the weight decay rate and the maximum number of iterations, are also set.

A training submodule 520, for inputting the image data in the training set into the convolutional neural network model for iterative training;

A result extraction module 530, for extracting the training result after training ends. Specifically, after training ends, parameters such as the weights of the trained convolutional neural network model are extracted (saved), yielding the trained convolutional neural network model.

(2) loss of wh (anchor length and width regressand value) is calculated:

Lossfunction = bool*(2-areaPred)*bce + bool*(2-areaPred)*(whtrue-whpred)² + bool*bce + (1-bool)*bce*ignore + bool*bce

The loss function of the invention is more conducive to training a model for recognizing small target objects.
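The formula above can be evaluated term by term, as in the following sketch. The symbol meanings follow claim 3 (bool is the confidence, bce the binary cross entropy, areaPred the predicted box area, ignore the mask for low-overlap objects). Several points are assumptions rather than statements from the text: that the four bce occurrences refer to separate cross-entropy terms for position, confidence (twice) and class; that the squared width-height difference is summed over w and h; and that areaPred is normalized to the range 0 to 1. The function and parameter names are hypothetical.

```python
import math


def bce(target: float, pred: float, eps: float = 1e-7) -> float:
    """Binary cross entropy for one target/prediction pair."""
    pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def loss_function(bool_, area_pred, bce_xy, wh_true, wh_pred,
                  bce_conf, ignore, bce_cls):
    """Literal evaluation of the five terms of the formula.

    Assumption: the bce factors are distinct cross-entropy values
    (position, confidence, class) computed elsewhere, e.g. with bce().
    """
    scale = 2.0 - area_pred  # larger for small boxes if area_pred is in [0, 1]
    xy_term = bool_ * scale * bce_xy
    wh_term = bool_ * scale * sum((t - p) ** 2 for t, p in zip(wh_true, wh_pred))
    conf_term = bool_ * bce_conf + (1.0 - bool_) * bce_conf * ignore
    cls_term = bool_ * bce_cls
    return xy_term + wh_term + conf_term + cls_term


value = loss_function(
    bool_=1.0, area_pred=0.25, bce_xy=0.2,
    wh_true=(0.5, 0.5), wh_pred=(0.4, 0.7),
    bce_conf=0.1, ignore=1.0, bce_cls=0.3,
)
print(value)
```

Under the normalization assumption, the factor (2-areaPred) approaches 2 for very small boxes and 1 for boxes that fill the image, so position and size errors on small targets are weighted more heavily.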

In the above embodiment, the verification optimization module 600 includes:

A loading submodule 610, for loading the validation set during training of the convolutional neural network model;

A verification submodule 620, for recording, by multiple verification, the loss function value during training of the convolutional neural network model;

A weight adjustment submodule 630, for adjusting the connection weights between the neurons in the neural network model according to the loss function value;

A judging submodule 640, for judging whether the current training meets a preset training termination condition; when it does not, the training module 500 continues to train the neural network model. For example, the preset training termination condition is that the loss function value reaches a preset loss function threshold or the number of training iterations reaches a preset maximum number of iterations. Training then ends as soon as the verification submodule 620 finds that the loss function value of the current convolutional neural network model has reached the preset loss function threshold, or the number of training iterations has reached the preset maximum number of iterations.

A parameter storage submodule 650, for saving each parameter of the trained convolutional neural network model when the current training meets the preset training termination condition and the training module 500 stops training the model. After training ends, the parameter storage submodule 650 saves each parameter of the current model. When the test module 700 later tests with the test set, the previously saved training result (each saved parameter of the training model) is loaded into the training model, and the test-set data is loaded in the test code to run a performance test and check the generalization ability of the model. If the generalization ability of the model meets the requirements, the trained convolutional neural network model can be used for target recognition, in particular for recognizing the various small targets, dense targets and the like found in market settings. If the generalization ability does not meet the requirements, the convolutional neural network model is returned for further training until its generalization ability meets the requirements. Sufficient training finally yields a convolutional neural network model that meets the requirements, and its final training result (each parameter of the trained convolutional neural network model) is saved.
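The termination condition checked by the judging submodule 640 can be sketched as below. The sketch interprets "the loss function value reaches the threshold" as the loss falling to or below the threshold, which is an assumption; `step_fn` is a hypothetical stand-in for one training iteration that returns the current loss, and the threshold and iteration values are illustrative only.

```python
def should_stop(loss_value, iteration, loss_threshold, max_iterations):
    """Preset termination condition: stop when the loss has reached the
    threshold or the iteration count has reached the maximum."""
    return loss_value <= loss_threshold or iteration >= max_iterations


def train_until_done(step_fn, loss_threshold, max_iterations):
    """Run training iterations until the termination condition holds.

    step_fn(iteration) is a hypothetical callback that performs one
    training iteration and returns the resulting loss function value.
    Returns the number of iterations run and the recorded loss values.
    """
    history = []
    iteration = 0
    while True:
        loss_value = step_fn(iteration)
        iteration += 1
        history.append(loss_value)  # the loss value is recorded at every step
        if should_stop(loss_value, iteration, loss_threshold, max_iterations):
            return iteration, history


def toy_step(iteration):
    """Toy loss curve that decays as 1 / (iteration + 1)."""
    return 1.0 / (iteration + 1)


print(train_until_done(toy_step, loss_threshold=0.21, max_iterations=100))
print(train_until_done(toy_step, loss_threshold=0.21, max_iterations=3))
```

In the first call the loss threshold ends training; in the second the iteration cap is reached first, so training ends even though the loss is still above the threshold.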

The network architecture of the convolutional neural network model used in the invention is as shown in Fig. 3. Layers 0-95 of the convolutional neural network are the deep convolutional layers (shown as A in the figure), which consist of convolution blocks and single convolutional layers; each convolution block contains two convolutional layers and one residual layer, and a single convolutional layer is inserted after every several convolution blocks. The deep convolutional layers in this embodiment include 68 convolutional layers, and the rest are residual layers.

Each convolutional layer may be followed by a BN (BatchNormalization) layer, which performs batch normalization, and a ReLU activation function. Each convolutional layer has 32 convolution kernels (filters), each of size 3*3 with a stride of 1. For an input image of 416*416*3 pixels, the convolution operation outputs a 416*416*32 feature map. The residual layers do not affect the input and output results, i.e. the input and output remain generally consistent; their main function is to reduce loss and control the propagation of gradients.

Layers 96-126 of the convolutional neural network are the feature interaction layers (shown as B in Fig. 3), which cover three scales; within each scale, local feature interaction is realized through convolution kernels. The preceding layers yield three output branches for prediction, with output feature maps at three scales: 13*13, 26*26 and 52*52. Local feature interaction between feature maps is realized using 3*3 and 1*1 convolution operations. The 13*13 feature map uses the three anchors (116×90), (156×198) and (373×326); the 26*26 feature map uses the three anchors (30×61), (62×45) and (59×119); the 52*52 feature map uses the three anchors (10×13), (16×30) and (33×23).

The object detection system based on convolutional neural networks of this embodiment improves the overall convolutional neural network structure: it computes with multiple convolutional layers, increases the number of feature interaction layers in the network, and uses multi-scale convolution operations. This improves image feature extraction precision and computational precision, and greatly improves the detection of multiple targets that are small, densely packed or highly overlapping.

The device embodiments of the invention correspond to the method embodiments of the invention. The technical details of the object detection method based on convolutional neural networks of the invention apply equally to the object detection device based on convolutional neural networks of the invention; to avoid repetition, they are not described again.

The invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.

Obviously, those skilled in the art can make various changes and variations to the invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is also intended to include them.

Claims

1. An object detection method based on convolutional neural networks, characterized by comprising:

constructing a data set of detected targets, the data set of the detected targets containing image data of the detected targets;

dividing the image data in the data set of the detected targets into a training set, a test set and a validation set of the detected targets according to a preset ratio;

annotating the target objects in all image data of the training set, the test set and the validation set;

constructing the network structure of a convolutional neural network model, the convolutional neural network model predicting objects using different feature scales;

loading the training set into the convolutional neural network model for training;

during training of the convolutional neural network model, loading the validation set and optimizing the parameters of the convolutional neural network model by multiple verification;

running a performance test on the convolutional neural network model with the test set to check the generalization ability of the convolutional neural network model;

performing target recognition using a convolutional neural network model whose generalization ability meets the requirements.

2. The object detection method based on convolutional neural networks according to claim 1, characterized in that loading the training set into the convolutional neural network model for training comprises:

setting training parameters, the training parameters including an initial learning rate, a weight decay rate and a maximum number of iterations;

inputting the image data in the training set into the convolutional neural network model for iterative training;

extracting the training result after training ends.

3. The object detection method based on convolutional neural networks according to claim 1, characterized in that, when training the convolutional neural network model, the loss function is calculated as follows:

Lossfunction = bool*(2-areaPred)*bce + bool*(2-areaPred)*(whtrue-whpred)² + bool*bce + (1-bool)*bce*ignore + bool*bce

where: bool is the confidence; bce is the binary cross entropy; areaPred is the area of the predicted box; whtrue is the true width and height of the detected target in the image; whpred is the predicted width and height of the detected target in the image; ignore denotes the points ignored for objects whose intersection-over-union is below a certain threshold.

4. The object detection method based on convolutional neural networks according to claim 1, characterized in that loading the validation set during training of the convolutional neural network model and optimizing the parameters of the convolutional neural network model by multiple verification comprises:

loading the validation set during training of the convolutional neural network model;

recording, by multiple verification, the loss function value during training of the convolutional neural network model;

adjusting the connection weights between the neurons in the neural network model according to the loss function value;

judging whether the current training meets a preset training termination condition;

when the current training does not meet the preset training termination condition, continuing to train the neural network model;

when the current training meets the preset training termination condition, stopping the training of the convolutional neural network model and saving each parameter of the trained convolutional neural network model.

5. The object detection method based on convolutional neural networks according to claim 4, characterized in that the preset training termination condition is:

the loss function value reaches a preset loss function threshold, or the number of training iterations reaches a preset maximum number of iterations.

6. The object detection method based on convolutional neural networks according to any one of claims 1-5, characterized in that,

layers 0-95 of the convolutional neural network are deep convolutional layers consisting of convolution blocks and single convolutional layers; each convolution block contains two convolutional layers and one residual layer; a single convolutional layer is inserted after every several convolution blocks;

layers 96-126 of the convolutional neural network are feature interaction layers covering three scales, and local feature interaction is realized within each scale through convolution kernels.

7. An object detection system based on convolutional neural networks, characterized by comprising:

a data set construction module, for constructing a data set of detected targets, the data set of the detected targets containing image data of the detected targets;

a data set division module, for dividing the image data in the data set of the detected targets into a training set, a test set and a validation set of the detected targets according to a preset ratio;

an annotation module, for annotating the target objects in all image data of the training set, the test set and the validation set;

a model construction module, for constructing the network structure of a convolutional neural network model, the convolutional neural network model predicting objects using different feature scales;

a training module, for loading the training set into the convolutional neural network model for training;

a verification optimization module, for loading the validation set during training of the convolutional neural network model and optimizing the parameters of the convolutional neural network model by multiple verification;

a test module, for running a performance test on the convolutional neural network model with the test set to check the generalization ability of the convolutional neural network model;

a target recognition module, for performing target recognition with a convolutional neural network model whose generalization ability meets the requirements.

8. The object detection system based on convolutional neural networks according to claim 7, characterized in that the training module comprises:

a parameter setting submodule, for setting training parameters, the training parameters including an initial learning rate, a weight decay rate and a maximum number of iterations;

a training submodule, for inputting the image data in the training set into the convolutional neural network model for iterative training;

a result extraction module, for extracting the training result after training ends.

9. The object detection system based on convolutional neural networks according to claim 7, characterized in that the verification optimization module comprises:

a loading submodule, for loading the validation set during training of the convolutional neural network model;

a verification submodule, for recording, by multiple verification, the loss function value during training of the convolutional neural network model;

a weight adjustment submodule, for adjusting the connection weights between the neurons in the neural network model according to the loss function value;

a judging submodule, for judging whether the current training meets a preset training termination condition; when the current training does not meet the preset training termination condition, the training module continues to train the neural network model;

a parameter storage submodule, for saving each parameter of the trained convolutional neural network model when the current training meets the preset training termination condition and the training module stops training the convolutional neural network model.

10. The object detection system based on convolutional neural networks according to any one of claims 7-9, characterized in that,