Summary of the invention
Embodiments of the present application provide an image recognition method, apparatus, computing device and storage medium, to solve the problems in the prior art that the recognition accuracy of face recognition is not high and the model training efficiency is low.
An embodiment of the present application provides an image recognition method, the method comprising:
obtaining an image to be detected;
inputting the image to be detected into a convolutional neural network model trained in advance, determining a feature image of the image through the convolutional neural network model, and determining a recognition result of the image according to the feature image;
wherein the convolutional neural network model includes a first convolution unit, a second convolution unit and a third convolution unit; the first convolution unit is the first N convolutional layers in the convolutional neural network model, and the second convolution unit is the last M convolutional layers in the convolutional neural network model; the third convolution unit is the R convolutional layers in the convolutional neural network model other than the first convolution unit and the second convolution unit; N, M and R are positive integers;
the convolution kernels in the N convolutional layers of the first convolution unit are kernels whose weight matrices are shared; the convolution kernels in the M convolutional layers of the second convolution unit are kernels whose weight matrices are shared; the convolution kernels in each convolutional layer of the third convolution unit include at least one kernel whose weight matrix is not shared and at least one kernel whose weight matrix is shared; for a given convolutional layer, the proportion of the number of weight-not-shared kernels to the total number of kernels is negatively correlated with the depth of that convolutional layer.
In the embodiments of the present application, image information related to position information is obtained by the convolutional layers whose kernels do not share weight matrices, and image information unrelated to position information is obtained by the convolutional layers whose kernels share weight matrices. This combines the generality of convolutional layers with weight-shared kernels and the positional specificity of convolutional layers with weight-not-shared kernels, making better use of local image information while saving network parameters. In addition, compared with existing face recognition convolutional neural network models, the depth of the model is significantly increased, improving the accuracy of face recognition.
In a possible implementation, each convolutional layer of the third convolution unit is divided into L groups on an equal basis; after convolution is performed separately through the L groups, the feature images output by the L groups of convolutional layers are linearly superimposed together to obtain the output result of the third convolution unit; L is a positive integer.
The purpose of grouping is to improve accuracy while reducing the number of parameters. Through grouping, higher accuracy is obtained at the same parameter scale, so that after the number of parameters of the weight-not-shared kernels is increased, the parameter scale of the model does not grow significantly, which better improves the accuracy of face recognition and makes better use of local image information while saving network parameters.
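The parameter saving from grouping can be sketched with a rough count. The channel counts and the helper `conv_params` below are assumptions for illustration only, not part of the application:

```python
# Illustrative sketch: parameter count of a standard convolution versus the
# same convolution split into L independent groups. Channel counts assumed.

def conv_params(in_ch, out_ch, k, groups=1):
    """Number of weights in a k x k convolution; with `groups` > 1 the
    input and output channels are divided across independent convolutions."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    return groups * (in_ch // groups) * (out_ch // groups) * k * k

full = conv_params(64, 64, 3)               # 36864 weights
grouped = conv_params(64, 64, 3, groups=4)  # 9216 weights: 1/4 of the above
```

At equal channel width, L groups cut the kernel parameters by a factor of L, which is why additional weight-not-shared kernels can be introduced without significantly growing the parameter scale of the model.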
In a possible implementation, the kernels of R groups of convolutional layers among the L groups of convolutional layers are kernels whose weight matrices are not shared; the kernels of the remaining L-R groups of convolutional layers are kernels whose weight matrices are shared.
In the embodiments of the present application, by distributing the kernels by group in the grouped convolutional layers, the convolutional layers with weight-shared kernels and the convolutional layers with weight-not-shared kernels form a parallel structure rather than a serial structure. Image information related to position information is obtained by the convolutional layers corresponding to the weight-not-shared kernels, and image information unrelated to position information is obtained by the convolutional layers corresponding to the weight-shared kernels. This conveniently combines the generality of the weight-shared convolutional layers with the positional specificity of the weight-not-shared convolutional layers, and in turn conveniently optimizes the features of the model.
In a possible implementation, the at least one weight-not-shared kernel includes a position feature of the image to be detected; before inputting the image to be detected into the convolutional neural network model trained in advance, the method further includes:
positioning at least one feature in the image to be detected, the at least one feature being determined according to the convolutional neural network model; and
adjusting the position of the at least one feature in the image to the position corresponding to that feature in the convolutional neural network model.
An embodiment of the present application provides an image recognition apparatus, the apparatus comprising:
an obtaining module, configured to obtain an image to be detected; and
a processing module, configured to input the image to be detected into a convolutional neural network model trained in advance, determine a feature image of the image through the convolutional neural network model, and determine a recognition result of the image according to the feature image; the convolutional neural network model includes a first convolution unit, a second convolution unit and a third convolution unit; the first convolution unit is the first N convolutional layers in the convolutional neural network model, and the second convolution unit is the last M convolutional layers in the convolutional neural network model; the third convolution unit is the R convolutional layers in the convolutional neural network model other than the first convolution unit and the second convolution unit; N, M and R are positive integers; the convolution kernels in the N convolutional layers of the first convolution unit are kernels whose weight matrices are shared; the convolution kernels in the M convolutional layers of the second convolution unit are kernels whose weight matrices are shared; the convolution kernels in each convolutional layer of the third convolution unit include at least one kernel whose weight matrix is not shared and at least one kernel whose weight matrix is shared; for a given convolutional layer, the proportion of the number of weight-not-shared kernels to the total number of kernels is negatively correlated with the depth of that convolutional layer.
In a possible implementation, each convolutional layer of the third convolution unit is divided into L groups on an equal basis; after convolution is performed separately through the L groups, the feature images output by the L groups of convolutional layers are linearly superimposed together to obtain the output result of the third convolution unit; L is a positive integer.
In a possible implementation, the kernels of R groups of convolutional layers among the L groups of convolutional layers are kernels whose weight matrices are not shared; the kernels of the remaining L-R groups of convolutional layers are kernels whose weight matrices are shared.
In a possible implementation, the processing module is further configured to: position at least one feature in the image to be detected, the at least one feature being determined according to the convolutional neural network model; and adjust the position of the at least one feature in the image to the position corresponding to that feature in the convolutional neural network model.
An embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium storing computer-executable instructions for causing a computer to execute the method described in any one of the above embodiments.
An embodiment of the present application provides a computing device for image recognition, comprising:
a memory, configured to store program instructions; and
a processor, configured to call the program instructions stored in the memory and, according to the obtained program, execute the method described in any one of the above embodiments.
Embodiments of the present application provide an image recognition method, apparatus, computing device and computer-readable storage medium, in which an image to be detected is recognized by a convolutional neural network model. The convolutional neural network model includes a first convolution unit, a second convolution unit and a third convolution unit; the first convolution unit is the first N convolutional layers in the convolutional neural network model, and the second convolution unit is the last M convolutional layers in the convolutional neural network model; the third convolution unit is the R convolutional layers in the convolutional neural network model other than the first convolution unit and the second convolution unit; N, M and R are positive integers. The convolution kernels in the N convolutional layers of the first convolution unit are kernels whose weight matrices are shared; the convolution kernels in the M convolutional layers of the second convolution unit are kernels whose weight matrices are shared; the convolution kernels in each convolutional layer of the third convolution unit include at least one kernel whose weight matrix is not shared and at least one kernel whose weight matrix is shared; for a given convolutional layer, the proportion of the number of weight-not-shared kernels to the total number of kernels is negatively correlated with the depth of that convolutional layer. Through the above convolutional neural network model, the convolution depth of a neural network model that uses weight-not-shared kernels can be effectively increased, and the accuracy of image recognition can be effectively improved without increasing the computation of the convolutional neural network model.
Specific embodiment
To make the purposes, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
Before turning to face recognition specifically, face detection, face alignment, face verification and face identification in face recognition are first explained as necessary.
Face detection detects the faces in an image and outlines the results with rectangular boxes. One usable method is to learn datum points from local binary pattern (Local Binary Patterns, LBP) features with a support vector regression algorithm, and to box out the face region.
Face alignment corrects the posture of a detected face; such correction improves the accuracy of face recognition. Correction methods include 2D correction and 3D correction; 3D correction allows profile faces to be recognized better. Current correction processing fully meets real-time requirements. Face alignment involves a step of detecting feature point positions, mainly positions such as the left side of the nose, the underside of the nostrils, the pupils, and the underside of the upper lip. Once the positions of these feature points are known, a position-driven deformation is applied and the face can be corrected.
Face verification is based on matching, so the answer it produces is "yes" or "no". In concrete operation, a test picture is given and then matched in turn; a successful match indicates that the test image and the matched face belong to the same person. For example, a face-swiping attendance system uses this method: the face photos of employees are entered offline one by one (generally more than one face per employee); when an employee swipes his or her face to clock in, the camera captures an image, the face detection described above is performed first, then face alignment, and then face verification; once the matching result is "yes", it shows that the person swiping belongs to this office, and face verification is completed at this step. When entering an employee's face offline, the face can be associated with the name, so that once face verification succeeds, the identity of the person is known. The advantage of such a system is low development cost, suitable for small office premises; the disadvantage is that occlusion is not allowed during capture, and the face posture is required to be fairly straight.
Face identification answers "who am I?". Compared with the matching process used by face verification, face identification relies more on classification at the recognition stage: it classifies the image (the face) produced after the two preceding steps of face detection and face alignment.
By feature classification, face recognition can be divided into two kinds: face recognition based on shallow features, and face recognition methods based on deep learning. Shallow methods first extract local features of the face image, such as SIFT, LBP and HOG features, and then assemble them into a global face descriptor through some pooling mechanism. Face recognition methods based on deep learning usually use a CNN structure; a typical representative is DeepFace, which uses a deep CNN network structure trained on a data set of 4 million images containing the faces of 4000 people in total.
The embodiments of the present application are mainly described based on the DeepFace face recognition model; other face recognition models can be implemented with reference to the methods in the embodiments of the present application, and details are not repeated here.
As shown in Figure 1, in the preprocessing stage DeepFace uses a 3D model to calibrate the face image to a canonical pose. The basic procedure of face recognition is: face detection → face alignment → face representation → face classification.
For the face alignment process, in a possible implementation, when the convolutional neural network model is a face recognition model requiring alignment, before inputting the image to be detected into the convolutional neural network model trained in advance, the method further includes:
Step 1: positioning at least one feature in the image to be detected;
Step 2: adjusting the position of the at least one feature in the image to the position corresponding to that feature in the convolutional neural network model.
In step 1, the at least one feature is determined according to the convolutional neural network model. For example, in Figure 1(a), 6 datum points can be used (2 eye centers, 1 nose point, and 3 points on the mouth), or datum points can be learned from local binary pattern (Local Binary Patterns, LBP) features with a support vector regression algorithm and the face region boxed out. Of course, the number and positions of the datum points may also be determined according to the features required by the convolutional neural network model trained in advance.
In the specific implementation, step 2 may include the following steps: Figure 1(b), two-dimensional cropping, cutting out the face part; Figure 1(c), selecting 67 datum points and triangulating, adding triangles at the contour to avoid discontinuities; Figure 1(d), converting the triangulated face into a 3D shape; Figure 1(e), turning the triangulated face into a 3D mesh with depth; Figure 1(f), deflecting the mesh so that the face faces directly forward; Figure 1(g), finally placing the frontal face; Figure 1(h), determining a face at a new angle. A 3D model is thus used to align the face; after alignment, the features of the face region can be fixed at certain pixels, at which point features can be learned with a convolutional neural network, enabling the convolutional neural network model to perform better.
After 3D alignment, the images formed are all in a uniform format and are input into the convolutional neural network model to obtain the face representation. The structure of the model includes: a first convolutional layer, a second convolutional layer and a max pooling layer; a first local convolutional layer, a second local convolutional layer, a third local convolutional layer, and a fully connected layer.
Specific parameters can be as follows. First convolutional layer: 32 convolution kernels of 11 × 11 × 3; here, each of the at least one feature can correspond to a convolution kernel, and according to the pre-trained convolutional neural network model, the features to be determined in the image to be detected can be determined. Max pooling layer: 3 × 3, stride = 2. Second convolutional layer: 16 convolution kernels of 9 × 9. The purpose of these first three layers is to extract low-level features, such as simple edges and textures. The max pooling layer makes the convolution output more robust to small translations. However, too many max pooling layers should not be used, because they would cause the network to lose image information.
First local convolutional layer: 16 kernels of 9 × 9; in a local convolutional layer, the weight matrices of the kernels are not shared. Second local convolutional layer: 16 kernels of 7 × 7, weight matrices not shared. Third local convolutional layer: 16 kernels of 5 × 5, weight matrices not shared. Fully connected layer: 4096 dimensions. Softmax layer: 4030 dimensions. In the above structure, the kernels whose weight matrices are not shared include the position features of the image to be detected. The next three layers all use kernels whose weight matrices are not shared, for the following reasons: in an aligned face picture, different regions have different statistical characteristics, so the local stationarity assumption of convolution does not hold, and using identical kernels would lead to a loss of information. Non-shared kernels do not increase the computation of feature extraction, but they do increase the computation of training, because the number of parameters to be trained increases.
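The parameter growth just described can be made concrete with a rough count. Only the 16 kernels of 9 × 9 follow the layer description above; the 16 input channels and the 25 × 25 output size are assumptions for illustration:

```python
# Hedged sketch: parameter count of a weight-shared convolutional layer
# versus a locally connected (weight-not-shared) layer of the same kernel
# shape. The 16 input channels and 25 x 25 output size are assumptions.

def shared_conv_params(n_kernels, k, in_ch):
    # one k x k x in_ch filter per output channel, reused at every position
    return n_kernels * k * k * in_ch

def local_conv_params(n_kernels, k, in_ch, out_h, out_w):
    # every output position owns its own full filter bank
    return out_h * out_w * shared_conv_params(n_kernels, k, in_ch)

shared = shared_conv_params(16, 9, 16)        # 20736 weights
local = local_conv_params(16, 9, 16, 25, 25)  # 12960000 weights
```

Under these assumed sizes, the locally connected layer needs 625 times the parameters of the shared layer, which is why the application limits non-shared kernels to a fraction of each layer.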
The fully connected layer connects each unit of the previous layer to all units of this layer, and is used to capture the correlations between features at different positions of the face image. The 7th layer (4096 dimensions) is used to represent the face. The output of the fully connected layer can be used as the input of the Softmax layer, and the Softmax layer is used for classification.
For the 4096-dimensional vector output by the convolutional neural network model, the face representation can be normalized: first each dimension is normalized, i.e., each component of the result vector is divided by the maximum value of that dimension over the entire training set; then each vector is L2-normalized.
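A minimal sketch of this two-step normalization; `dim_max` stands in for the per-dimension training-set maxima, which are assumed to be given:

```python
import math

# Minimal sketch of the normalization described above: divide each dimension
# by its maximum over the training set, then L2-normalize the vector.

def normalize_representation(vec, dim_max):
    scaled = [v / m for v, m in zip(vec, dim_max)]
    norm = math.sqrt(sum(s * s for s in scaled))
    return [s / norm for s in scaled]

unit = normalize_representation([3.0, 4.0], [1.0, 1.0])  # -> [0.6, 0.8]
```

The result always has unit L2 norm, so downstream similarity measures such as the inner product compare directions of representations rather than magnitudes.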
After obtaining the face representation, a variety of classification methods can be used, for example: directly computing the inner product; the weighted chi-square distance; or a Siamese network structure.
To improve the image recognition accuracy of the convolutional neural network model, the convolutional layers may be of two kinds. One is the convolutional layer whose weight matrices are shared: the feature information in this kind of layer does not depend on the position in the input image, i.e., the distribution of image features is the same at every position, so the weight matrices of the kernels can be shared, which reduces a large number of parameter settings.
The other is the convolutional layer whose weight matrices are not shared. In the field of image recognition, for example in face recognition, several key points of the face (such as the eyes, nose and mouth corners) are transformed to fixed positions in the image and then input into the training model for feature extraction (the most important face recognition module); the convolutional layer with non-shared weight matrices can effectively improve the accuracy of this kind of precise image recognition. This is because, in this approach, the image distribution at each position differs from that at other positions; for example, in a face image the image distribution of the eye region differs from that of the mouth corners. The non-shared convolutional layer therefore conflicts with the assumption underlying the shared convolutional layer: it must model each position separately with distinct weights, so a CNN based on locally connected convolutional layers adds a large number of parameters.
However, features that depend on positional image information become weaker and weaker after layer-by-layer convolution, and the effect of the non-shared convolutional layers weakens accordingly. When the convolutional neural network becomes deep, for example 50 layers, and locally connected convolutional layers are added in the last several layers, the features at that point are essentially position-independent and the locality information of the image cannot be used; the overall model then has too many parameters, training efficiency is low, and the training effect is poor.
Therefore, the convolution depth of current face recognition methods is shallow, and recognition accuracy is difficult to improve further.
Convolutional neural networks with shared weights mainly include models such as LeNet, AlexNet, VGG, GoogLeNet and ResNet. LeNet is mainly used to recognize the 10 handwritten digits; its accuracy on images is not high and its number of layers is shallow. Because a CNN can extract shallow, middle and deep features, the more layers the network has, the richer the features of different levels that can be extracted; moreover, the features extracted by deeper networks are more abstract and carry more semantic information. The convolutional neural network models of VGG and GoogLeNet are deeper. GoogLeNet adds the Inception structure, which stacks 1×1, 3×3 and 5×5 convolutional layers together with 3×3 pooling, on the one hand increasing the width of the network and on the other hand increasing the network's adaptability to scale. In the Inception structure, the 1 × 1 convolutions are mainly used for dimensionality reduction; with Inception, the width and depth of the whole network structure can be enlarged, bringing a 2-3× performance improvement.
During training, the degradation problem brought by increasing the number of layers must also be considered. With a residual network, the network is designed as H(x) = F(x) + x, as shown in Figure 2, which can be converted into learning a residual function F(x) = H(x) - x. As long as F(x) = 0, the identity mapping H(x) = x is formed; F is the network mapping before the summation, and H is the network mapping from the input to after the summation. The idea of the residual is to remove the identical main part so that small variations are highlighted, making residual regression easier. In ResNet, the residual network allows the depth to reach 152 layers, solving the problem that very deep networks cannot be trained.
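The residual formulation can be written out in a few lines; `f` below is a stand-in for an arbitrary learned sub-network, not a specific ResNet layer:

```python
# Sketch of the residual design H(x) = F(x) + x: the block learns the
# residual F rather than the full mapping H, so F(x) = 0 reduces the
# block to the identity mapping H(x) = x.

def residual_block(x, f):
    return f(x) + x

# Zero residual -> identity mapping:
out_identity = residual_block(5.0, lambda x: 0.0)   # 5.0
# Non-zero residual perturbs the identity:
out_shifted = residual_block(2.0, lambda x: 0.5 * x)  # 3.0
```

Because the skip connection passes x through unchanged, extra layers can default to the identity, which is what mitigates the degradation problem described above.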
In conclusion, in convolutional neural networks with shared weights, a deeper convolution depth can be achieved by various methods, so as to obtain and process more information and thereby improve the recognition accuracy of images.
Although the above convolutional neural network models are deep, they cannot be used directly for face recognition in a face recognition model. This is mainly because the number of parameters of the kernels whose weight matrices are not shared, which are based on local features, is too large; the weight-shared kernels cannot simply be replaced with weight-not-shared kernels to perform face recognition. In addition, features that depend on positional image information become weaker and weaker after layer-by-layer convolution, so the effect of convolving with weight-not-shared kernels also weakens. When the depth of the convolutional neural network model reaches dozens of layers and the convolutional layers with weight-not-shared kernels are added in the last several layers, the features at that point are essentially position-independent and the locality information of the image cannot be used; the overall model then has too many parameters, training efficiency is low, and the training effect is poor.
Therefore, as shown in Figure 3, an embodiment of the present application provides an image recognition method, the method comprising:
Step 301: obtaining an image to be detected;
Step 302: inputting the image to be detected into a convolutional neural network model trained in advance, determining a feature image of the image through the convolutional neural network model, and determining a recognition result of the image according to the feature image;
wherein the convolutional neural network model includes a first convolution unit, a second convolution unit and a third convolution unit; the first convolution unit is the first N convolutional layers in the convolutional neural network model, and the second convolution unit is the last M convolutional layers in the convolutional neural network model; the third convolution unit is the R convolutional layers in the convolutional neural network model other than the first convolution unit and the second convolution unit; N, M and R are positive integers;
the convolution kernels in the N convolutional layers of the first convolution unit are kernels whose weight matrices are shared; the convolution kernels in the M convolutional layers of the second convolution unit are kernels whose weight matrices are shared; the convolution kernels in each convolutional layer of the third convolution unit include at least one kernel whose weight matrix is not shared and at least one kernel whose weight matrix is shared; for a given convolutional layer, the proportion of the number of weight-not-shared kernels to the total number of kernels is negatively correlated with the depth of that convolutional layer.
A schematic diagram of a convolutional layer of the first convolution unit or of the second convolution unit is shown in Figure 4. The size of the input feature image is W1 × H1 × N1, where W1 is the width of the feature image, H1 is its height, and N1 is its number of channels. Since the number of kernels of the previous layer determines the number of output channels, and the number of output channels of the previous layer determines the depth of each kernel of the next convolutional layer, the size of the kernels of this convolutional layer is N1 × K × K, with N2 kernels in total, and the output of the layer is W2 × H2 × N2. W2 and H2 may equal W1 and H1 or may differ from them; they differ when a pooling down-sampling layer is passed before the output of the convolutional layer.
A convolutional layer of the third convolution unit can be as shown in Figure 5. The size of the input feature image is W1 × H1 × N1, where W1 is the width of the feature image, H1 is its height, and N1 is its number of channels. In this layer, the kernels whose weight matrices are shared have size N1 × K × K and number N21, and the kernels whose weight matrices are not shared have size N1 × K × K and number N22. The output of the layer is W2 × H2 × (N21 + N22): after a feature image is input, it passes separately through the ordinary convolution part and the locally connected convolution part, and the feature images output by the two are then concatenated together. W2 and H2 may equal W1 and H1 or may differ from them; they differ when a pooling down-sampling layer is passed before the output of the convolutional layer.
The principle for determining the sizes of the first convolution unit and the second convolution unit is as follows. The first several layers of a CNN network are not replaced, because they are essentially completing edge extraction and the assembly of basic shapes; it can be considered that these weight matrices are general across images. The last several layers of the CNN network are likewise not replaced, because in the last several layers the feature images have essentially lost location information, and the effect of kernels with non-shared weight matrices cannot be brought into play. The third convolution unit can be set as the convolutional layers of the middle section, for example the convolutional layers of the middle 1/3 stretch.
The ratio of weight-shared convolution kernels to weight-unshared convolution kernels can be negatively correlated with the depth of the convolutional layer. Specifically, in front layers (layers closer to the input of the second convolution unit), the proportion of weight-unshared convolution kernels should be relatively low: the feature images are still large at this stage, and too high a proportion of weight-unshared kernels would lead to an excessive number of parameters and an excessive data volume. Optional proportions are, for example, 1/4, 1/8 or 1/16. In back layers (layers farther from the input of the second convolution unit), the proportion of weight-unshared convolution kernels can be the same as in the front layers, or larger.
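One way to realize such a depth-dependent schedule is sketched below. The schedule itself (stepping through 1/16, 1/8, 1/4) is only an assumption consistent with the example proportions above, not a prescribed rule.

```python
# Sketch (illustrative ratios): the fraction of weight-unshared kernels grows
# with the depth of a layer inside the second convolution unit -- small near
# its input, where feature maps are large and parameters are costly, and
# larger near its output.

def unshared_fraction(layer_index, num_layers, ratios=(1/16, 1/8, 1/4)):
    """Pick a fraction from a monotonically increasing schedule."""
    # Map the layer's relative depth onto the list of candidate ratios.
    slot = layer_index * len(ratios) // num_layers
    return ratios[min(slot, len(ratios) - 1)]

# A 6-layer unit: early layers get 1/16, middle layers 1/8, late layers 1/4.
print([unshared_fraction(i, 6) for i in range(6)])
```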
In the embodiment of the present application, image information related to location information is obtained by the convolutional layers corresponding to weight-unshared convolution kernels, and image information unrelated to location information is obtained by the convolutional layers corresponding to weight-shared convolution kernels. This combines the generality of the convolutional layers with weight-shared kernels and the positional specificity of the convolutional layers with weight-unshared kernels, making better use of local image information while saving network parameters. In addition, compared with existing convolutional neural network models for face recognition, the depth of the model is significantly increased, which improves the precision of face recognition.
The proportions of weight-shared and weight-unshared convolution kernels can be adjusted freely as needed. This avoids pursuing the utilization of local position information alone: if the convolutional layers with weight-unshared kernels are placed too far forward, the parameter amount explodes and training over-fits. It likewise avoids pursuing parameter reduction alone: if those layers are placed too far back, the positional information of the image cannot be utilized effectively. The parameter amount and the image recognition precision are thus both controlled effectively, and when the parameter amount is not excessive, the convolutional layers of the second convolution unit can be moved forward to make efficient use of local image information.
As shown in Fig. 6, an example of a convolutional neural network model is structured as follows.
The figure shows the main frame of the network. It can be seen that a residual module (as shown in Fig. 2) consists of two convolutional layers plus an identity mapping. The sizes of the feature images within blocks of the same color are the same, so the input and output dimensions of a residual module are also the same and can be added directly (the solid curve in Fig. 2). When the network extends to a block of a different color, a 2× down-sampling is applied, either by pooling or by a convolution with stride 2; the size of the feature images is then halved, but the number of convolution kernels is doubled in order to retain the time complexity. When the input and output sizes of a residual module differ, a 1 × 1 convolution kernel can be used to map the input to the same dimension as the output (the dashed curve in Fig. 2). The general structure of ResNet also refers to the VGG network.
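The residual-connection rule described above can be sketched as follows; the function names and the (channels, height, width) example are illustrative assumptions.

```python
# Sketch of the rule above: when the input and output of a residual module
# have the same shape, the identity branch is added directly (solid curve in
# Fig. 2); when the network steps to the next stage, the spatial size halves
# and the kernel count doubles, so a 1x1 projection is needed (dashed curve).

def residual_add_rule(in_shape, out_shape):
    """Return how the identity branch joins the residual branch."""
    if in_shape == out_shape:
        return "direct add"
    return "1x1 conv projection"

def next_stage(shape):
    c, h, w = shape
    return (c * 2, h // 2, w // 2)  # double channels, halve spatial size

s = (16, 32, 32)
print(residual_add_rule(s, s))              # direct add
print(next_stage(s))                        # (32, 16, 16)
print(residual_add_rule(s, next_stage(s)))  # 1x1 conv projection
```

Doubling the channel count while halving each spatial dimension keeps the per-layer computation roughly constant, which is the "retain the time complexity" point above.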
The following table shows the structure of a 20-layer resnet. The structure is as follows:
The second class of convolutional layers includes the first convolution unit Conv2_x, the second convolution unit Conv3_x and the third convolution unit Conv4_x, each containing 3 residual modules; the convolution kernels of each module are of size 3 × 3 with stride 1. The output of Conv4_x is mapped by global average pooling to 64 feature images of size 1 × 1, and the classification result is finally output by a fully connected layer containing 10 neurons. Of course, the numbers of layers assigned to the first convolution unit, the second convolution unit and the third convolution unit within the second class of convolutional layers can also be determined according to the depth of the model and the complexity of the image to be detected.
Conv1 performs one convolution on the image, and each of Conv2_x to Conv4_x contains 3 residual modules; the feature-image sizes halve in turn {32, 16, 8} while the kernel counts double in turn {16, 32, 64}. Most convolution kernels in the network are 3 × 3. From Conv1 to Conv2_x the number of channels of the data is always 16 and the dimensions of the data are the same, so the input and output can be added directly. Between Conv2_x and Conv3_x, and between Conv3_x and Conv4_x, however, the numbers of channels differ and the sizes of the output feature images differ, so they cannot be added together; a 1 × 1 convolution kernel is used to map the input data to the same dimension as the output data. The output of the last layer is a feature image of size 64 × 8 × 8 (8 × 8 with 64 channels). A global average pooling then maps each 8 × 8 feature image to size 1 × 1, giving a final output of 64 × 1 × 1, and a fully connected layer with 10 outputs produces the probabilities of the 10 class labels.
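The shape flow just described can be traced in a few lines; the 32 × 32 input size is an assumption consistent with the {32, 16, 8} feature sizes above.

```python
# Shape trace for the 20-layer resnet described above (assumed 32x32 input):
# three stages of 3 residual modules each, feature sizes {32, 16, 8},
# kernel counts {16, 32, 64}, global average pooling, then a 10-way FC layer.

def resnet20_trace(input_size=32):
    trace = []
    channels, size = 16, input_size        # after Conv1 (16 channels)
    for stage_channels in (16, 32, 64):    # Conv2_x, Conv3_x, Conv4_x
        if stage_channels != channels:     # stage transition: 2x down-sample
            size //= 2
            channels = stage_channels
        trace.append((channels, size, size))
    return trace

print(resnet20_trace())  # [(16, 32, 32), (32, 16, 16), (64, 8, 8)]
# Layer count: 1 (Conv1) + 3 stages * 3 modules * 2 convs + 1 (FC) = 20.
print(1 + 3 * 3 * 2 + 1)  # 20
```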
In one possible implementation, 1/8 of the convolution kernels in the second convolution unit can be set to weight-unshared convolution kernels, and the remaining 7/8 to weight-shared convolution kernels. In the specific implementation, the way the weight-unshared and weight-shared convolution kernels are distributed within the second convolution unit may be random, or they may be arranged within each convolutional layer of each residual module; this is not limited here.
The present application also provides a possible implementation in which each convolutional layer of the third convolution unit is divided into L groups according to the same principle. After the L groups perform convolution separately, the feature images output by the L groups of convolutional layers are linearly superposed to obtain the output result of the third convolution unit; L is a positive integer.
In a specific implementation, the purpose of grouping is to reduce the parameter amount while improving precision. The grouped convolutional neural network model is illustrated below taking ResNeXt as an example; other grouped convolutional neural network models can likewise implement the method of the present application with reference to ResNeXt, and details are not repeated here.
As shown in Fig. 7, ResNeXt achieves higher precision under the same parameter scale: a ResNeXt of roughly 100 layers can reach the effect of a 200-layer ResNet, with less computation as well. Besides width and depth, ResNeXt proposes a third dimension of neural networks, namely the number of groups (the cardinality).
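The parameter saving from grouping can be checked with simple arithmetic; the 128-channel, 32-group example below is an assumption matching the ResNeXt module discussed later.

```python
# Rough parameter count showing why grouping saves parameters at the same
# width: a grouped 3x3 convolution only connects each output channel to the
# input channels of its own group.

def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a KxK convolution, optionally grouped (bias ignored)."""
    return (c_in // groups) * k * k * c_out

dense   = conv_params(128, 128, 3, groups=1)   # ordinary convolution
grouped = conv_params(128, 128, 3, groups=32)  # 32 groups of depth 4

print(dense, grouped, dense // grouped)  # 147456 4608 32
```

The grouped layer uses 32 times fewer weights at the same channel width, which is the budget that grouping frees up for extra weight-unshared kernels.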
The convolution kernels of Conv1 are 7 × 7, with 64 convolution kernels and stride 2, and the size of the output feature image is 112 × 112. The first convolution unit can be conv2, which first includes a 3 × 3 max pooling with stride 2; the size of its output feature image is 56 × 56. The second convolution unit can be conv3 and conv4, and the third convolution unit can be conv5; the numbers of residual modules contained in the convolution units are 3, 4, 6 and 3 respectively. Taking conv2 as an example, each residual module includes a convolutional layer of 128 1 × 1 kernels → a convolutional layer of 128 3 × 3 kernels → a convolutional layer of 512 1 × 1 kernels. The 3 × 3 convolutional layer in each residual module uses grouped convolution divided into 32 groups. The last part is a global pooling followed by a fully connected layer.
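The depth of the structure just described can be checked by counting layers; the "ResNeXt-50-style" reading (one stem convolution, three convolutional layers per residual module, one fully connected layer) is an assumption based on the module counts given above.

```python
# Layer-count check for the structure above: one stem convolution (Conv1),
# four units with 3/4/6/3 residual modules of three convolutional layers
# each (1x1 -> grouped 3x3 -> 1x1), and one final fully connected layer.

modules_per_unit = (3, 4, 6, 3)  # conv2 .. conv5
convs_per_module = 3
total = 1 + sum(modules_per_unit) * convs_per_module + 1
print(total)  # 50
```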
As shown in Fig. 8, the left figure is a residual module and the right figure is a ResNeXt module. For an input feature image with 256 channels, a substructure of ResNeXt is still a residual module; the difference is that the convolution path has been divided into many groups.
The number of convolution kernels a convolutional layer contains equals the size of the channel dimension of the feature image it outputs, which in turn is the depth of the convolution kernels of the next layer. The left figure is a residual module of resnet; in the right figure, the 3 × 3 convolution is split into a residual module with 32 groups of convolutions. The number of 1 × 1 kernels in each group determines the depth of the kernels of the second-step 3 × 3 convolution: in the second-step 3 × 3 convolution the depth of each kernel is 4, whereas in the ungrouped residual module on the left the kernel depth is 64. That is, there are 4 convolutions per group and 32 groups in total, with each group responsible for input data of 4 channels. Although the input data now has 128 channels, the kernels process data of different depths group by group, and the last-step 1 × 1 convolutions can be merged by direct addition, which greatly reduces the amount of computation. For example, the first group of convolutions handles channels 0–3, the second group handles channels 4–7, and finally the results of the groups are joined end to end.
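The channel split in that example can be sketched directly; the function name is an illustrative assumption.

```python
# Sketch of the channel split described above: with 128 channels and 32
# groups, each group's kernels see a depth-4 slice -- group 0 handles
# channels 0-3, group 1 handles channels 4-7, and so on.

def group_slices(channels, groups):
    depth = channels // groups
    return [range(g * depth, (g + 1) * depth) for g in range(groups)]

slices = group_slices(128, 32)
print(list(slices[0]), list(slices[1]))  # [0, 1, 2, 3] [4, 5, 6, 7]
print(len(slices), len(slices[0]))       # 32 4
```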
Through grouping, higher precision is obtained under the same parameter scale. This is advantageous because, after the parameter amount of the weight-unshared convolution kernels is increased, the parameter scale of the model does not grow significantly, which better improves the precision of face recognition; local image information is better utilized while network parameters are saved.
In the specific implementation, the distribution of weight-unshared and weight-shared convolution kernels within the second convolution unit may be random, or may follow a preset rule; this is not limited here.
In one possible mode, the convolution kernels of R groups of convolutional layers among the L groups are weight-unshared convolution kernels, and the convolution kernels of the remaining L−R groups of convolutional layers are weight-shared convolution kernels; R is a positive integer.
In a specific embodiment, in the Conv3 layers of the second convolution unit, the groups whose convolution kernels are weight-unshared account for 1/8 of the total number of groups; combined with the above embodiment, 4 groups can be selected at random to use weight-unshared convolution kernels. In Conv4, the groups whose convolution kernels are weight-unshared account for 1/4 of the total; combined with the above embodiment, 8 groups can be selected at random to use weight-unshared convolution kernels. Since a grouped convolutional layer is inherently a parallel structure of multiple ordinary convolutional layers, it is only necessary to replace a portion of those convolutional layers with convolutional layers containing weight-unshared convolution kernels.
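The random group selection in that embodiment can be sketched as follows; the function name and the fixed seed are illustrative assumptions (the patent only requires that the groups be chosen at random).

```python
# Sketch: assign weight-unshared kernels to a random subset of the groups,
# as in the example above -- 1/8 of the 32 groups in Conv3 (4 groups) and
# 1/4 of the 32 groups in Conv4 (8 groups).

import random

def pick_unshared_groups(num_groups, fraction, seed=0):
    r = int(num_groups * fraction)  # R groups get weight-unshared kernels
    rng = random.Random(seed)       # seeded here only for reproducibility
    return sorted(rng.sample(range(num_groups), r))

conv3_groups = pick_unshared_groups(32, 1/8)
conv4_groups = pick_unshared_groups(32, 1/4)
print(len(conv3_groups), len(conv4_groups))  # 4 8
```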
In the embodiment of the present application, by distributing the convolution kernels group by group, the grouped convolutional layers turn the convolutional layers with weight-shared kernels and the convolutional layers with weight-unshared kernels into a parallel structure rather than a series structure. Image information related to location information is obtained by the convolutional layers corresponding to weight-unshared convolution kernels, and image information unrelated to location information is obtained by the convolutional layers corresponding to weight-shared convolution kernels. The generality of the convolutional layers with weight-shared kernels and the positional specificity of the convolutional layers with weight-unshared kernels can thus be conveniently combined, and the features extracted by the model can be conveniently optimized.
As shown in Fig. 9, the embodiment of the present application also provides an image recognition device, which includes:
an obtaining module 901, configured to obtain an image to be detected; and
a processing module 902, configured to input the image to be detected into a convolutional neural network model whose training has been completed in advance, determine the feature image of the image through the convolutional neural network model, and determine the recognition result of the image according to the feature image. The convolutional neural network model includes a first convolution unit, a second convolution unit and a third convolution unit. The first convolution unit is the first N convolutional layers in the convolutional neural network model, and the second convolution unit is the last M convolutional layers in the convolutional neural network model; the third convolution unit is the R convolutional layers in the convolutional neural network model other than the first convolution unit and the second convolution unit; N, M and R are positive integers. The convolution kernels in the N convolutional layers of the first convolution unit are weight-shared convolution kernels; the convolution kernels in the M convolutional layers of the second convolution unit are weight-shared convolution kernels; the convolution kernels in each convolutional layer of the third convolution unit include at least one weight-unshared convolution kernel and at least one weight-shared convolution kernel. For a given convolutional layer, the proportion of weight-unshared convolution kernels in the total number of convolution kernels is negatively correlated with the depth of that convolutional layer.
In one possible implementation, each convolutional layer of the third convolution unit is divided into L groups according to the same principle; after the L groups perform convolution separately, the feature images output by the L groups of convolutional layers are linearly superposed to obtain the output result of the third convolution unit; L is a positive integer.
In one possible implementation, the convolution kernels of R groups of convolutional layers among the L groups are weight-unshared convolution kernels, and the convolution kernels of the remaining L−R groups of convolutional layers are weight-shared convolution kernels; R is a positive integer.
In one possible implementation, the processing module 902 is further configured to:
locate the position of at least one feature in the image to be detected, the at least one feature being determined according to the convolutional neural network model; and adjust the position of the at least one feature in the image to the position of the corresponding feature in the convolutional neural network model.
The embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to cause a computer to execute the method described in any one of the above embodiments.
The embodiment of the present application also provides a computing device for image recognition, including:
a memory, configured to store program instructions; and
a processor, configured to call the program instructions stored in the memory and execute, according to the obtained program, the method described in any one of the above embodiments.
In conclusion, in the embodiment of the present application, image information related to location information is obtained by the convolutional layers corresponding to weight-unshared convolution kernels, and image information unrelated to location information is obtained by the convolutional layers corresponding to weight-shared convolution kernels. This combines the generality of the convolutional layers with weight-shared kernels and the positional specificity of the convolutional layers with weight-unshared kernels, making better use of local image information while saving network parameters. In addition, compared with existing convolutional neural network models for face recognition, the depth of the model is significantly increased, improving the precision of face recognition. The proportions of weight-shared and weight-unshared convolution kernels can be adjusted freely as needed, so that the parameter amount is effectively controlled; when the parameter amount is not excessive, the convolutional layers of the second convolution unit can be moved forward to make efficient use of local image information. Through grouping, higher precision is obtained under the same parameter scale, so that increasing the parameter amount of the weight-unshared convolution kernels does not significantly increase the parameter scale of the model, which better improves the precision of face recognition. By distributing the convolution kernels group by group, the grouped convolutional layers turn the convolutional layers with weight-shared kernels and the convolutional layers with weight-unshared kernels into a parallel structure rather than a series structure; this conveniently combines the generality of the layers with weight-shared kernels and the positional specificity of the layers with weight-unshared kernels, conveniently optimizes the features extracted by the model, and effectively improves the precision of face recognition without needing to increase the parameter amount.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, the device (system) and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction device, the instruction device realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they know the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present application.
Obviously, those skilled in the art can make various modifications and variations to the present application without departing from the spirit and scope of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include them.