Summary of the invention
Embodiments of the present application provide an image recognition method, apparatus, computing device and storage medium, to solve the problems in the prior art that the recognition accuracy of face recognition is not high and the model training efficiency is low.
An embodiment of the present application provides an image recognition method, the method comprising:
obtaining an image to be detected;
inputting the image to be detected into a convolutional neural network model trained in advance, determining a feature image of the image through the convolutional neural network model, and determining a recognition result of the image according to the feature image;
wherein the convolutional neural network model includes a first convolution unit, a second convolution unit and a third convolution unit; the first convolution unit is the first N convolutional layers in the convolutional neural network model, and the second convolution unit is the last M convolutional layers in the convolutional neural network model; the third convolution unit is the R convolutional layers in the convolutional neural network model other than the first convolution unit and the second convolution unit; N, M and R are positive integers;
the convolution kernels in the N convolutional layers of the first convolution unit are kernels whose weight matrices are shared; the convolution kernels in the M convolutional layers of the second convolution unit are kernels whose weight matrices are shared; the convolution kernels in each convolutional layer of the third convolution unit include at least one kernel whose weight matrix is not shared and at least one kernel whose weight matrix is shared; for a given convolutional layer, the proportion of the number of weight-not-shared kernels to the total number of kernels is negatively correlated with the depth of that convolutional layer.
In the embodiments of the present application, image information related to position information is obtained by the convolutional layers whose kernels do not share weight matrices, and image information unrelated to position information is obtained by the convolutional layers whose kernels share weight matrices. This combines the generality of convolutional layers with weight-shared kernels and the positional specificity of convolutional layers with weight-not-shared kernels, making better use of local image information while saving network parameters. In addition, compared with existing face recognition convolutional neural network models, the depth of the model is significantly increased, improving the accuracy of face recognition.
In a possible implementation, each convolutional layer of the third convolution unit is divided into L groups on an equal basis; after convolution is performed separately through the L groups, the feature images output by the L groups of convolutional layers are linearly superimposed together to obtain the output result of the third convolution unit; L is a positive integer.
The purpose of grouping is to improve accuracy while reducing the number of parameters. Through grouping, higher accuracy is obtained at the same parameter scale, so that after the number of parameters of the weight-not-shared kernels is increased, the parameter scale of the model does not grow significantly, which better improves the accuracy of face recognition and makes better use of local image information while saving network parameters.
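The parameter saving from grouping can be sketched with a rough count. The channel counts and the helper `conv_params` below are assumptions for illustration only, not part of the application:

```python
# Illustrative sketch: parameter count of a standard convolution versus the
# same convolution split into L independent groups. Channel counts assumed.

def conv_params(in_ch, out_ch, k, groups=1):
    """Number of weights in a k x k convolution; with `groups` > 1 the
    input and output channels are divided across independent convolutions."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    return groups * (in_ch // groups) * (out_ch // groups) * k * k

full = conv_params(64, 64, 3)               # 36864 weights
grouped = conv_params(64, 64, 3, groups=4)  # 9216 weights: 1/4 of the above
```

At equal channel width, L groups cut the kernel parameters by a factor of L, which is why additional weight-not-shared kernels can be introduced without significantly growing the parameter scale of the model.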
In a possible implementation, the kernels of R groups of convolutional layers among the L groups of convolutional layers are kernels whose weight matrices are not shared; the kernels of the remaining L-R groups of convolutional layers are kernels whose weight matrices are shared.
In the embodiments of the present application, by distributing the kernels by group in the grouped convolutional layers, the convolutional layers with weight-shared kernels and the convolutional layers with weight-not-shared kernels form a parallel structure rather than a serial structure. Image information related to position information is obtained by the convolutional layers corresponding to the weight-not-shared kernels, and image information unrelated to position information is obtained by the convolutional layers corresponding to the weight-shared kernels. This conveniently combines the generality of the weight-shared convolutional layers with the positional specificity of the weight-not-shared convolutional layers, and in turn conveniently optimizes the features of the model.
In a possible implementation, the at least one weight-not-shared kernel includes a position feature of the image to be detected; before inputting the image to be detected into the convolutional neural network model trained in advance, the method further includes:
positioning at least one feature in the image to be detected, the at least one feature being determined according to the convolutional neural network model; and
adjusting the position of the at least one feature in the image to the position corresponding to that feature in the convolutional neural network model.
An embodiment of the present application provides an image recognition apparatus, the apparatus comprising:
an obtaining module, configured to obtain an image to be detected; and
a processing module, configured to input the image to be detected into a convolutional neural network model trained in advance, determine a feature image of the image through the convolutional neural network model, and determine a recognition result of the image according to the feature image; the convolutional neural network model includes a first convolution unit, a second convolution unit and a third convolution unit; the first convolution unit is the first N convolutional layers in the convolutional neural network model, and the second convolution unit is the last M convolutional layers in the convolutional neural network model; the third convolution unit is the R convolutional layers in the convolutional neural network model other than the first convolution unit and the second convolution unit; N, M and R are positive integers; the convolution kernels in the N convolutional layers of the first convolution unit are kernels whose weight matrices are shared; the convolution kernels in the M convolutional layers of the second convolution unit are kernels whose weight matrices are shared; the convolution kernels in each convolutional layer of the third convolution unit include at least one kernel whose weight matrix is not shared and at least one kernel whose weight matrix is shared; for a given convolutional layer, the proportion of the number of weight-not-shared kernels to the total number of kernels is negatively correlated with the depth of that convolutional layer.
In a possible implementation, each convolutional layer of the third convolution unit is divided into L groups on an equal basis; after convolution is performed separately through the L groups, the feature images output by the L groups of convolutional layers are linearly superimposed together to obtain the output result of the third convolution unit; L is a positive integer.
In a possible implementation, the kernels of R groups of convolutional layers among the L groups of convolutional layers are kernels whose weight matrices are not shared; the kernels of the remaining L-R groups of convolutional layers are kernels whose weight matrices are shared.
In a possible implementation, the processing module is further configured to: position at least one feature in the image to be detected, the at least one feature being determined according to the convolutional neural network model; and adjust the position of the at least one feature in the image to the position corresponding to that feature in the convolutional neural network model.
An embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium storing computer-executable instructions for causing a computer to execute the method described in any one of the above embodiments.
An embodiment of the present application provides a computing device for image recognition, comprising:
a memory, configured to store program instructions; and
a processor, configured to call the program instructions stored in the memory and, according to the obtained program, execute the method described in any one of the above embodiments.
Embodiments of the present application provide an image recognition method, apparatus, computing device and computer-readable storage medium, in which an image to be detected is recognized by a convolutional neural network model. The convolutional neural network model includes a first convolution unit, a second convolution unit and a third convolution unit; the first convolution unit is the first N convolutional layers in the convolutional neural network model, and the second convolution unit is the last M convolutional layers in the convolutional neural network model; the third convolution unit is the R convolutional layers in the convolutional neural network model other than the first convolution unit and the second convolution unit; N, M and R are positive integers. The convolution kernels in the N convolutional layers of the first convolution unit are kernels whose weight matrices are shared; the convolution kernels in the M convolutional layers of the second convolution unit are kernels whose weight matrices are shared; the convolution kernels in each convolutional layer of the third convolution unit include at least one kernel whose weight matrix is not shared and at least one kernel whose weight matrix is shared; for a given convolutional layer, the proportion of the number of weight-not-shared kernels to the total number of kernels is negatively correlated with the depth of that convolutional layer. Through the above convolutional neural network model, the convolution depth of a neural network model that uses weight-not-shared kernels can be effectively increased, and the accuracy of image recognition can be effectively improved without increasing the computation of the convolutional neural network model.
Specific embodiment
To make the purposes, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
Before turning to face recognition specifically, face detection, face alignment, face verification and face identification in face recognition are first explained as necessary.
Face detection detects the faces in an image and outlines the results with rectangular boxes. One usable method is to learn datum points from local binary pattern (Local Binary Patterns, LBP) features with a support vector regression algorithm, and to box out the face region.
Face alignment corrects the posture of a detected face; such correction improves the accuracy of face recognition. Correction methods include 2D correction and 3D correction; 3D correction allows profile faces to be recognized better. Current correction processing fully meets real-time requirements. Face alignment involves a step of detecting feature point positions, mainly positions such as the left side of the nose, the underside of the nostrils, the pupils, and the underside of the upper lip. Once the positions of these feature points are known, a position-driven deformation is applied and the face can be corrected.
Face verification is based on matching, so the answer it produces is "yes" or "no". In concrete operation, a test picture is given and then matched in turn; a successful match indicates that the test image and the matched face belong to the same person. For example, a face-swiping attendance system uses this method: the face photos of employees are entered offline one by one (generally more than one face per employee); when an employee swipes his or her face to clock in, the camera captures an image, the face detection described above is performed first, then face alignment, and then face verification; once the matching result is "yes", it shows that the person swiping belongs to this office, and face verification is completed at this step. When entering an employee's face offline, the face can be associated with the name, so that once face verification succeeds, the identity of the person is known. The advantage of such a system is low development cost, suitable for small office premises; the disadvantage is that occlusion is not allowed during capture, and the face posture is required to be fairly straight.
Face identification answers "who am I?". Compared with the matching process used by face verification, face identification relies more on classification at the recognition stage: it classifies the image (the face) produced after the two preceding steps of face detection and face alignment.
By feature classification, face recognition can be divided into two kinds: face recognition based on shallow features, and face recognition methods based on deep learning. Shallow methods first extract local features of the face image, such as SIFT, LBP and HOG features, and then assemble them into a global face descriptor through some pooling mechanism. Face recognition methods based on deep learning usually use a CNN structure; a typical representative is DeepFace, which uses a deep CNN network structure trained on a data set of 4 million images containing the faces of 4000 people in total.
The embodiments of the present application are mainly described based on the DeepFace face recognition model; other face recognition models can be implemented with reference to the methods in the embodiments of the present application, and details are not repeated here.
As shown in Figure 1, in the preprocessing stage DeepFace uses a 3D model to calibrate the face image to a canonical pose. The basic procedure of face recognition is: face detection → face alignment → face representation → face classification.
For the face alignment process, in a possible implementation, when the convolutional neural network model is a face recognition model requiring alignment, before inputting the image to be detected into the convolutional neural network model trained in advance, the method further includes:
Step 1: positioning at least one feature in the image to be detected;
Step 2: adjusting the position of the at least one feature in the image to the position corresponding to that feature in the convolutional neural network model.
In step 1, the at least one feature is determined according to the convolutional neural network model. For example, in Figure 1(a), 6 datum points can be used (2 eye centers, 1 nose point, and 3 points on the mouth), or datum points can be learned from local binary pattern (Local Binary Patterns, LBP) features with a support vector regression algorithm and the face region boxed out. Of course, the number and positions of the datum points may also be determined according to the features required by the convolutional neural network model trained in advance.
In the specific implementation, step 2 may include the following steps: Figure 1(b), two-dimensional cropping, cutting out the face part; Figure 1(c), selecting 67 datum points and triangulating, adding triangles at the contour to avoid discontinuities; Figure 1(d), converting the triangulated face into a 3D shape; Figure 1(e), turning the triangulated face into a 3D mesh with depth; Figure 1(f), deflecting the mesh so that the face faces directly forward; Figure 1(g), finally placing the frontal face; Figure 1(h), determining a face at a new angle. A 3D model is thus used to align the face; after alignment, the features of the face region can be fixed at certain pixels, at which point features can be learned with a convolutional neural network, enabling the convolutional neural network model to perform better.
After 3D alignment, the images formed are all in a uniform format and are input into the convolutional neural network model to obtain the face representation. The structure of the model includes: a first convolutional layer, a second convolutional layer and a max pooling layer; a first local convolutional layer, a second local convolutional layer, a third local convolutional layer, and a fully connected layer.
Specific parameters can be as follows. First convolutional layer: 32 convolution kernels of 11 × 11 × 3; here, each of the at least one feature can correspond to a convolution kernel, and according to the pre-trained convolutional neural network model, the features to be determined in the image to be detected can be determined. Max pooling layer: 3 × 3, stride = 2. Second convolutional layer: 16 convolution kernels of 9 × 9. The purpose of these first three layers is to extract low-level features, such as simple edges and textures. The max pooling layer makes the convolution output more robust to small translations. However, too many max pooling layers should not be used, because they would cause the network to lose image information.
First local convolutional layer: 16 kernels of 9 × 9; in a local convolutional layer, the weight matrices of the kernels are not shared. Second local convolutional layer: 16 kernels of 7 × 7, weight matrices not shared. Third local convolutional layer: 16 kernels of 5 × 5, weight matrices not shared. Fully connected layer: 4096 dimensions. Softmax layer: 4030 dimensions. In the above structure, the kernels whose weight matrices are not shared include the position features of the image to be detected. The next three layers all use kernels whose weight matrices are not shared, for the following reasons: in an aligned face picture, different regions have different statistical characteristics, so the local stationarity assumption of convolution does not hold, and using identical kernels would lead to a loss of information. Non-shared kernels do not increase the computation of feature extraction, but they do increase the computation of training, because the number of parameters to be trained increases.
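The parameter growth just described can be made concrete with a rough count. Only the 16 kernels of 9 × 9 follow the layer description above; the 16 input channels and the 25 × 25 output size are assumptions for illustration:

```python
# Hedged sketch: parameter count of a weight-shared convolutional layer
# versus a locally connected (weight-not-shared) layer of the same kernel
# shape. The 16 input channels and 25 x 25 output size are assumptions.

def shared_conv_params(n_kernels, k, in_ch):
    # one k x k x in_ch filter per output channel, reused at every position
    return n_kernels * k * k * in_ch

def local_conv_params(n_kernels, k, in_ch, out_h, out_w):
    # every output position owns its own full filter bank
    return out_h * out_w * shared_conv_params(n_kernels, k, in_ch)

shared = shared_conv_params(16, 9, 16)        # 20736 weights
local = local_conv_params(16, 9, 16, 25, 25)  # 12960000 weights
```

Under these assumed sizes, the locally connected layer needs 625 times the parameters of the shared layer, which is why the application limits non-shared kernels to a fraction of each layer.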
The fully connected layer connects each unit of the previous layer to all units of this layer, and is used to capture the correlations between features at different positions of the face image. The 7th layer (4096 dimensions) is used to represent the face. The output of the fully connected layer can be used as the input of the Softmax layer, and the Softmax layer is used for classification.
For the 4096-dimensional vector output by the convolutional neural network model, the face representation can be normalized: first each dimension is normalized, i.e., each component of the result vector is divided by the maximum value of that dimension over the entire training set; then each vector is L2-normalized.
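A minimal sketch of this two-step normalization; `dim_max` stands in for the per-dimension training-set maxima, which are assumed to be given:

```python
import math

# Minimal sketch of the normalization described above: divide each dimension
# by its maximum over the training set, then L2-normalize the vector.

def normalize_representation(vec, dim_max):
    scaled = [v / m for v, m in zip(vec, dim_max)]
    norm = math.sqrt(sum(s * s for s in scaled))
    return [s / norm for s in scaled]

unit = normalize_representation([3.0, 4.0], [1.0, 1.0])  # -> [0.6, 0.8]
```

The result always has unit L2 norm, so downstream similarity measures such as the inner product compare directions of representations rather than magnitudes.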
After obtaining the face representation, a variety of classification methods can be used, for example: directly computing the inner product; the weighted chi-square distance; or a Siamese network structure.
To improve the image recognition accuracy of the convolutional neural network model, the convolutional layers may be of two kinds. One is the convolutional layer whose weight matrices are shared: the feature information in this kind of layer does not depend on the position in the input image, i.e., the distribution of image features is the same at every position, so the weight matrices of the kernels can be shared, which reduces a large number of parameter settings.
The other is the convolutional layer whose weight matrices are not shared. In the field of image recognition, for example in face recognition, several key points of the face (such as the eyes, nose and mouth corners) are transformed to fixed positions in the image and then input into the training model for feature extraction (the most important face recognition module); the convolutional layer with non-shared weight matrices can effectively improve the accuracy of this kind of precise image recognition. This is because, in this approach, the image distribution at each position differs from that at other positions; for example, in a face image the image distribution of the eye region differs from that of the mouth corners. The non-shared convolutional layer therefore conflicts with the assumption underlying the shared convolutional layer: it must model each position separately with distinct weights, so a CNN based on locally connected convolutional layers adds a large number of parameters.
However, features that depend on positional image information become weaker and weaker after layer-by-layer convolution, and the effect of the non-shared convolutional layers weakens accordingly. When the convolutional neural network becomes deep, for example 50 layers, and locally connected convolutional layers are added in the last several layers, the features at that point are essentially position-independent and the locality information of the image cannot be used; the overall model then has too many parameters, training efficiency is low, and the training effect is poor.
Therefore, the convolution depth of current face recognition methods is shallow, and recognition accuracy is difficult to improve further.
Convolutional neural networks with shared weights mainly include models such as LeNet, AlexNet, VGG, GoogLeNet and ResNet. LeNet is mainly used to recognize the 10 handwritten digits; its accuracy on images is not high and its number of layers is shallow. Because a CNN can extract shallow, middle and deep features, the more layers the network has, the richer the features of different levels that can be extracted; moreover, the features extracted by deeper networks are more abstract and carry more semantic information. The convolutional neural network models of VGG and GoogLeNet are deeper. GoogLeNet adds the Inception structure, which stacks 1×1, 3×3 and 5×5 convolutional layers together with 3×3 pooling, on the one hand increasing the width of the network and on the other hand increasing the network's adaptability to scale. In the Inception structure, the 1 × 1 convolutions are mainly used for dimensionality reduction; with Inception, the width and depth of the whole network structure can be enlarged, bringing a 2-3× performance improvement.
During training, the degradation problem brought by increasing the number of layers must also be considered. With a residual network, the network is designed as H(x) = F(x) + x, as shown in Figure 2, which can be converted into learning a residual function F(x) = H(x) - x. As long as F(x) = 0, the identity mapping H(x) = x is formed; F is the network mapping before the summation, and H is the network mapping from the input to after the summation. The idea of the residual is to remove the identical main part so that small variations are highlighted, making residual regression easier. In ResNet, the residual network allows the depth to reach 152 layers, solving the problem that very deep networks cannot be trained.
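The residual formulation can be written out in a few lines; `f` below is a stand-in for an arbitrary learned sub-network, not a specific ResNet layer:

```python
# Sketch of the residual design H(x) = F(x) + x: the block learns the
# residual F rather than the full mapping H, so F(x) = 0 reduces the
# block to the identity mapping H(x) = x.

def residual_block(x, f):
    return f(x) + x

# Zero residual -> identity mapping:
out_identity = residual_block(5.0, lambda x: 0.0)   # 5.0
# Non-zero residual perturbs the identity:
out_shifted = residual_block(2.0, lambda x: 0.5 * x)  # 3.0
```

Because the skip connection passes x through unchanged, extra layers can default to the identity, which is what mitigates the degradation problem described above.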
In conclusion, in convolutional neural networks with shared weights, a deeper convolution depth can be achieved by various methods, so as to obtain and process more information and thereby improve the recognition accuracy of images.
Although the above convolutional neural network models are deep, they cannot be used directly for face recognition in a face recognition model. This is mainly because the number of parameters of the kernels whose weight matrices are not shared, which are based on local features, is too large; the weight-shared kernels cannot simply be replaced with weight-not-shared kernels to perform face recognition. In addition, features that depend on positional image information become weaker and weaker after layer-by-layer convolution, so the effect of convolving with weight-not-shared kernels also weakens. When the depth of the convolutional neural network model reaches dozens of layers and the convolutional layers with weight-not-shared kernels are added in the last several layers, the features at that point are essentially position-independent and the locality information of the image cannot be used; the overall model then has too many parameters, training efficiency is low, and the training effect is poor.
Therefore, as shown in Figure 3, an embodiment of the present application provides an image recognition method, the method comprising:
Step 301: obtaining an image to be detected;
Step 302: inputting the image to be detected into a convolutional neural network model trained in advance, determining a feature image of the image through the convolutional neural network model, and determining a recognition result of the image according to the feature image;
wherein the convolutional neural network model includes a first convolution unit, a second convolution unit and a third convolution unit; the first convolution unit is the first N convolutional layers in the convolutional neural network model, and the second convolution unit is the last M convolutional layers in the convolutional neural network model; the third convolution unit is the R convolutional layers in the convolutional neural network model other than the first convolution unit and the second convolution unit; N, M and R are positive integers;
the convolution kernels in the N convolutional layers of the first convolution unit are kernels whose weight matrices are shared; the convolution kernels in the M convolutional layers of the second convolution unit are kernels whose weight matrices are shared; the convolution kernels in each convolutional layer of the third convolution unit include at least one kernel whose weight matrix is not shared and at least one kernel whose weight matrix is shared; for a given convolutional layer, the proportion of the number of weight-not-shared kernels to the total number of kernels is negatively correlated with the depth of that convolutional layer.
A schematic diagram of a convolutional layer of the first convolution unit or of the second convolution unit is shown in Figure 4. The size of the input feature image is W1 × H1 × N1, where W1 is the width of the feature image, H1 is its height, and N1 is its number of channels. Since the number of kernels of the previous layer determines the number of output channels, and the number of output channels of the previous layer determines the depth of each kernel of the next convolutional layer, the size of the kernels of this convolutional layer is N1 × K × K, with N2 kernels in total, and the output of the layer is W2 × H2 × N2. W2 and H2 may equal W1 and H1 or may differ from them; they differ when a pooling down-sampling layer is passed before the output of the convolutional layer.
A convolutional layer of the third convolution unit can be as shown in Figure 5. The size of the input feature image is W1 × H1 × N1, where W1 is the width of the feature image, H1 is its height, and N1 is its number of channels. In this layer, the kernels whose weight matrices are shared have size N1 × K × K and number N21, and the kernels whose weight matrices are not shared have size N1 × K × K and number N22. The output of the layer is W2 × H2 × (N21 + N22): after a feature image is input, it passes separately through the ordinary convolution part and the locally connected convolution part, and the feature images output by the two are then concatenated together. W2 and H2 may equal W1 and H1 or may differ from them; they differ when a pooling down-sampling layer is passed before the output of the convolutional layer.
The principle for determining the sizes of the first convolution unit and the second convolution unit is as follows. The first several layers of a CNN network are not replaced, because they are essentially completing edge extraction and the assembly of basic shapes; it can be considered that these weight matrices are general across images. The last several layers of the CNN network are likewise not replaced, because in the last several layers the feature images have essentially lost location information, and the effect of kernels with non-shared weight matrices cannot be brought into play. The third convolution unit can be set as the convolutional layers of the middle section, for example the convolutional layers of the middle 1/3 stretch.
The ratio of weight-shared convolution kernels to weight-unshared convolution kernels can be negatively correlated with the depth of the convolutional layer. Specifically, in front layers (layers closer to the input of the second convolution unit), the proportion of weight-unshared convolution kernels should be relatively low: the feature images are still large at this stage, and too high a proportion of weight-unshared kernels would lead to an excessive number of parameters and an excessive data volume. Optional proportions are, for example, 1/4, 1/8 or 1/16. In back layers (layers farther from the input of the second convolution unit), the proportion of weight-unshared convolution kernels can be the same as in the front layers, or larger.
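One way to realize such a depth-dependent schedule is sketched below. The schedule itself (stepping through 1/16, 1/8, 1/4) is only an assumption consistent with the example proportions above, not a prescribed rule.

```python
# Sketch (illustrative ratios): the fraction of weight-unshared kernels grows
# with the depth of a layer inside the second convolution unit -- small near
# its input, where feature maps are large and parameters are costly, and
# larger near its output.

def unshared_fraction(layer_index, num_layers, ratios=(1/16, 1/8, 1/4)):
    """Pick a fraction from a monotonically increasing schedule."""
    # Map the layer's relative depth onto the list of candidate ratios.
    slot = layer_index * len(ratios) // num_layers
    return ratios[min(slot, len(ratios) - 1)]

# A 6-layer unit: early layers get 1/16, middle layers 1/8, late layers 1/4.
print([unshared_fraction(i, 6) for i in range(6)])
```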
In the embodiment of the present application, image information related to location information is obtained by the convolutional layers corresponding to weight-unshared convolution kernels, and image information unrelated to location information is obtained by the convolutional layers corresponding to weight-shared convolution kernels. This combines the generality of the convolutional layers with weight-shared kernels and the positional specificity of the convolutional layers with weight-unshared kernels, making better use of local image information while saving network parameters. In addition, compared with existing convolutional neural network models for face recognition, the depth of the model is significantly increased, which improves the precision of face recognition.
The proportions of weight-shared and weight-unshared convolution kernels can be adjusted freely as needed. This avoids pursuing the utilization of local position information alone: if the convolutional layers with weight-unshared kernels are placed too far forward, the parameter amount explodes and training over-fits. It likewise avoids pursuing parameter reduction alone: if those layers are placed too far back, the positional information of the image cannot be utilized effectively. The parameter amount and the image recognition precision are thus both controlled effectively, and when the parameter amount is not excessive, the convolutional layers of the second convolution unit can be moved forward to make efficient use of local image information.
As shown in Fig. 6, an example of a convolutional neural network model is structured as follows.
The figure shows the main frame of the network. It can be seen that a residual module (as shown in Fig. 2) consists of two convolutional layers plus an identity mapping. The sizes of the feature images within blocks of the same color are the same, so the input and output dimensions of a residual module are also the same and can be added directly (the solid curve in Fig. 2). When the network extends to a block of a different color, a 2× down-sampling is applied, either by pooling or by a convolution with stride 2; the size of the feature images is then halved, but the number of convolution kernels is doubled in order to retain the time complexity. When the input and output sizes of a residual module differ, a 1 × 1 convolution kernel can be used to map the input to the same dimension as the output (the dashed curve in Fig. 2). The general structure of ResNet also refers to the VGG network.
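The residual-connection rule described above can be sketched as follows; the function names and the (channels, height, width) example are illustrative assumptions.

```python
# Sketch of the rule above: when the input and output of a residual module
# have the same shape, the identity branch is added directly (solid curve in
# Fig. 2); when the network steps to the next stage, the spatial size halves
# and the kernel count doubles, so a 1x1 projection is needed (dashed curve).

def residual_add_rule(in_shape, out_shape):
    """Return how the identity branch joins the residual branch."""
    if in_shape == out_shape:
        return "direct add"
    return "1x1 conv projection"

def next_stage(shape):
    c, h, w = shape
    return (c * 2, h // 2, w // 2)  # double channels, halve spatial size

s = (16, 32, 32)
print(residual_add_rule(s, s))              # direct add
print(next_stage(s))                        # (32, 16, 16)
print(residual_add_rule(s, next_stage(s)))  # 1x1 conv projection
```

Doubling the channel count while halving each spatial dimension keeps the per-layer computation roughly constant, which is the "retain the time complexity" point above.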
The following table shows the structure of a 20-layer resnet. The structure is as follows:
The second class of convolutional layers includes the first convolution unit Conv2_x, the second convolution unit Conv3_x and the third convolution unit Conv4_x, each containing 3 residual modules; the convolution kernels of each module are of size 3 × 3 with stride 1. The output of Conv4_x is mapped by global average pooling to 64 feature images of size 1 × 1, and the classification result is finally output by a fully connected layer containing 10 neurons. Of course, the numbers of layers assigned to the first convolution unit, the second convolution unit and the third convolution unit within the second class of convolutional layers can also be determined according to the depth of the model and the complexity of the image to be detected.
Conv1 performs one convolution on the image, and each of Conv2_x to Conv4_x contains 3 residual modules; the feature-image sizes halve in turn {32, 16, 8} while the kernel counts double in turn {16, 32, 64}. Most convolution kernels in the network are 3 × 3. From Conv1 to Conv2_x the number of channels of the data is always 16 and the dimensions of the data are the same, so the input and output can be added directly. Between Conv2_x and Conv3_x, and between Conv3_x and Conv4_x, however, the numbers of channels differ and the sizes of the output feature images differ, so they cannot be added together; a 1 × 1 convolution kernel is used to map the input data to the same dimension as the output data. The output of the last layer is a feature image of size 64 × 8 × 8 (8 × 8 with 64 channels). A global average pooling then maps each 8 × 8 feature image to size 1 × 1, giving a final output of 64 × 1 × 1, and a fully connected layer with 10 outputs produces the probabilities of the 10 class labels.
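The shape flow just described can be traced in a few lines; the 32 × 32 input size is an assumption consistent with the {32, 16, 8} feature sizes above.

```python
# Shape trace for the 20-layer resnet described above (assumed 32x32 input):
# three stages of 3 residual modules each, feature sizes {32, 16, 8},
# kernel counts {16, 32, 64}, global average pooling, then a 10-way FC layer.

def resnet20_trace(input_size=32):
    trace = []
    channels, size = 16, input_size        # after Conv1 (16 channels)
    for stage_channels in (16, 32, 64):    # Conv2_x, Conv3_x, Conv4_x
        if stage_channels != channels:     # stage transition: 2x down-sample
            size //= 2
            channels = stage_channels
        trace.append((channels, size, size))
    return trace

print(resnet20_trace())  # [(16, 32, 32), (32, 16, 16), (64, 8, 8)]
# Layer count: 1 (Conv1) + 3 stages * 3 modules * 2 convs + 1 (FC) = 20.
print(1 + 3 * 3 * 2 + 1)  # 20
```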
In one possible implementation, 1/8 of the convolution kernels in the second convolution unit can be set to weight-unshared convolution kernels, and the remaining 7/8 to weight-shared convolution kernels. In the specific implementation, the way the weight-unshared and weight-shared convolution kernels are distributed within the second convolution unit may be random, or they may be arranged within each convolutional layer of each residual module; this is not limited here.
The present application also provides a possible implementation in which each convolutional layer of the third convolution unit is divided into L groups according to the same principle. After the L groups perform convolution separately, the feature images output by the L groups of convolutional layers are linearly superposed to obtain the output result of the third convolution unit; L is a positive integer.
In a specific implementation, the purpose of grouping is to reduce the parameter amount while improving precision. The grouped convolutional neural network model is illustrated below taking ResNeXt as an example; other grouped convolutional neural network models can likewise implement the method of the present application with reference to ResNeXt, and details are not repeated here.
As shown in Fig. 7, ResNeXt achieves higher precision under the same parameter scale: a ResNeXt of roughly 100 layers can reach the effect of a 200-layer ResNet, with less computation as well. Besides width and depth, ResNeXt proposes a third dimension of neural networks, namely the number of groups (the cardinality).
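The parameter saving from grouping can be checked with simple arithmetic; the 128-channel, 32-group example below is an assumption matching the ResNeXt module discussed later.

```python
# Rough parameter count showing why grouping saves parameters at the same
# width: a grouped 3x3 convolution only connects each output channel to the
# input channels of its own group.

def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a KxK convolution, optionally grouped (bias ignored)."""
    return (c_in // groups) * k * k * c_out

dense   = conv_params(128, 128, 3, groups=1)   # ordinary convolution
grouped = conv_params(128, 128, 3, groups=32)  # 32 groups of depth 4

print(dense, grouped, dense // grouped)  # 147456 4608 32
```

The grouped layer uses 32 times fewer weights at the same channel width, which is the budget that grouping frees up for extra weight-unshared kernels.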
The convolution kernels of Conv1 are 7 × 7, with 64 convolution kernels and stride 2, and the size of the output feature image is 112 × 112. The first convolution unit can be conv2, which first includes a 3 × 3 max pooling with stride 2; the size of its output feature image is 56 × 56. The second convolution unit can be conv3 and conv4, and the third convolution unit can be conv5; the numbers of residual modules contained in the convolution units are 3, 4, 6 and 3 respectively. Taking conv2 as an example, each residual module includes a convolutional layer of 128 1 × 1 kernels → a convolutional layer of 128 3 × 3 kernels → a convolutional layer of 512 1 × 1 kernels. The 3 × 3 convolutional layer in each residual module uses grouped convolution divided into 32 groups. The last part is a global pooling followed by a fully connected layer.
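The depth of the structure just described can be checked by counting layers; the "ResNeXt-50-style" reading (one stem convolution, three convolutional layers per residual module, one fully connected layer) is an assumption based on the module counts given above.

```python
# Layer-count check for the structure above: one stem convolution (Conv1),
# four units with 3/4/6/3 residual modules of three convolutional layers
# each (1x1 -> grouped 3x3 -> 1x1), and one final fully connected layer.

modules_per_unit = (3, 4, 6, 3)  # conv2 .. conv5
convs_per_module = 3
total = 1 + sum(modules_per_unit) * convs_per_module + 1
print(total)  # 50
```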
As shown in Fig. 8, the left figure is a residual module and the right figure is a ResNeXt module. For an input feature image with 256 channels, a substructure of ResNeXt is still a residual module; the difference is that the convolution path has been divided into many groups.
The number of convolution kernels a convolutional layer contains equals the size of the channel dimension of the feature image it outputs, which in turn is the depth of the convolution kernels of the next layer. The left figure is a residual module of resnet; in the right figure, the 3 × 3 convolution is split into a residual module with 32 groups of convolutions. The number of 1 × 1 kernels in each group determines the depth of the kernels of the second-step 3 × 3 convolution: in the second-step 3 × 3 convolution the depth of each kernel is 4, whereas in the ungrouped residual module on the left the kernel depth is 64. That is, there are 4 convolutions per group and 32 groups in total, with each group responsible for input data of 4 channels. Although the input data now has 128 channels, the kernels process data of different depths group by group, and the last-step 1 × 1 convolutions can be merged by direct addition, which greatly reduces the amount of computation. For example, the first group of convolutions handles channels 0–3, the second group handles channels 4–7, and finally the results of the groups are joined end to end.
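The channel split in that example can be sketched directly; the function name is an illustrative assumption.

```python
# Sketch of the channel split described above: with 128 channels and 32
# groups, each group's kernels see a depth-4 slice -- group 0 handles
# channels 0-3, group 1 handles channels 4-7, and so on.

def group_slices(channels, groups):
    depth = channels // groups
    return [range(g * depth, (g + 1) * depth) for g in range(groups)]

slices = group_slices(128, 32)
print(list(slices[0]), list(slices[1]))  # [0, 1, 2, 3] [4, 5, 6, 7]
print(len(slices), len(slices[0]))       # 32 4
```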
Through grouping, higher precision is obtained under the same parameter scale. This is advantageous because, after the parameter amount of the weight-unshared convolution kernels is increased, the parameter scale of the model does not grow significantly, which better improves the precision of face recognition; local image information is better utilized while network parameters are saved.
In the specific implementation, the distribution of weight-unshared and weight-shared convolution kernels within the second convolution unit may be random, or may follow a preset rule; this is not limited here.
In one possible mode, the convolution kernels of R groups of convolutional layers among the L groups are weight-unshared convolution kernels, and the convolution kernels of the remaining L−R groups of convolutional layers are weight-shared convolution kernels; R is a positive integer.
In a specific embodiment, in the Conv3 layers of the second convolution unit, the groups whose convolution kernels are weight-unshared account for 1/8 of the total number of groups; combined with the above embodiment, 4 groups can be selected at random to use weight-unshared convolution kernels. In Conv4, the groups whose convolution kernels are weight-unshared account for 1/4 of the total; combined with the above embodiment, 8 groups can be selected at random to use weight-unshared convolution kernels. Since a grouped convolutional layer is inherently a parallel structure of multiple ordinary convolutional layers, it is only necessary to replace a portion of those convolutional layers with convolutional layers containing weight-unshared convolution kernels.
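The random group selection in that embodiment can be sketched as follows; the function name and the fixed seed are illustrative assumptions (the patent only requires that the groups be chosen at random).

```python
# Sketch: assign weight-unshared kernels to a random subset of the groups,
# as in the example above -- 1/8 of the 32 groups in Conv3 (4 groups) and
# 1/4 of the 32 groups in Conv4 (8 groups).

import random

def pick_unshared_groups(num_groups, fraction, seed=0):
    r = int(num_groups * fraction)  # R groups get weight-unshared kernels
    rng = random.Random(seed)       # seeded here only for reproducibility
    return sorted(rng.sample(range(num_groups), r))

conv3_groups = pick_unshared_groups(32, 1/8)
conv4_groups = pick_unshared_groups(32, 1/4)
print(len(conv3_groups), len(conv4_groups))  # 4 8
```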
In the embodiment of the present application, by distributing the convolution kernels group by group, the grouped convolutional layers turn the convolutional layers with weight-shared kernels and the convolutional layers with weight-unshared kernels into a parallel structure rather than a series structure. Image information related to location information is obtained by the convolutional layers corresponding to weight-unshared convolution kernels, and image information unrelated to location information is obtained by the convolutional layers corresponding to weight-shared convolution kernels. The generality of the convolutional layers with weight-shared kernels and the positional specificity of the convolutional layers with weight-unshared kernels can thus be conveniently combined, and the features extracted by the model can be conveniently optimized.
As shown in Fig. 9, the embodiment of the present application also provides an image recognition device, which includes:
an obtaining module 901, configured to obtain an image to be detected; and
a processing module 902, configured to input the image to be detected into a convolutional neural network model whose training has been completed in advance, determine the feature image of the image through the convolutional neural network model, and determine the recognition result of the image according to the feature image. The convolutional neural network model includes a first convolution unit, a second convolution unit and a third convolution unit. The first convolution unit is the first N convolutional layers in the convolutional neural network model, and the second convolution unit is the last M convolutional layers in the convolutional neural network model; the third convolution unit is the R convolutional layers in the convolutional neural network model other than the first convolution unit and the second convolution unit; N, M and R are positive integers. The convolution kernels in the N convolutional layers of the first convolution unit are weight-shared convolution kernels; the convolution kernels in the M convolutional layers of the second convolution unit are weight-shared convolution kernels; the convolution kernels in each convolutional layer of the third convolution unit include at least one weight-unshared convolution kernel and at least one weight-shared convolution kernel. For a given convolutional layer, the proportion of weight-unshared convolution kernels in the total number of convolution kernels is negatively correlated with the depth of that convolutional layer.
In one possible implementation, each convolutional layer of the third convolution unit is divided into L groups according to the same principle; after the L groups perform convolution separately, the feature images output by the L groups of convolutional layers are linearly superposed to obtain the output result of the third convolution unit; L is a positive integer.
In one possible implementation, the convolution kernels of R groups of convolutional layers among the L groups are weight-unshared convolution kernels, and the convolution kernels of the remaining L−R groups of convolutional layers are weight-shared convolution kernels; R is a positive integer.
In one possible implementation, the processing module 902 is further configured to:
locate the position of at least one feature in the image to be detected, the at least one feature being determined according to the convolutional neural network model; and adjust the position of the at least one feature in the image to the position of the corresponding feature in the convolutional neural network model.
The embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to cause a computer to execute the method described in any one of the above embodiments.
The embodiment of the present application also provides a computing device for image recognition, including:
a memory, configured to store program instructions; and
a processor, configured to call the program instructions stored in the memory and execute, according to the obtained program, the method described in any one of the above embodiments.
In conclusion, in the embodiment of the present application, image information related to location information is obtained by the convolutional layers corresponding to weight-unshared convolution kernels, and image information unrelated to location information is obtained by the convolutional layers corresponding to weight-shared convolution kernels. This combines the generality of the convolutional layers with weight-shared kernels and the positional specificity of the convolutional layers with weight-unshared kernels, making better use of local image information while saving network parameters. In addition, compared with existing convolutional neural network models for face recognition, the depth of the model is significantly increased, improving the precision of face recognition. The proportions of weight-shared and weight-unshared convolution kernels can be adjusted freely as needed, so that the parameter amount is effectively controlled; when the parameter amount is not excessive, the convolutional layers of the second convolution unit can be moved forward to make efficient use of local image information. Through grouping, higher precision is obtained under the same parameter scale, so that increasing the parameter amount of the weight-unshared convolution kernels does not significantly increase the parameter scale of the model, which better improves the precision of face recognition. By distributing the convolution kernels group by group, the grouped convolutional layers turn the convolutional layers with weight-shared kernels and the convolutional layers with weight-unshared kernels into a parallel structure rather than a series structure; this conveniently combines the generality of the layers with weight-shared kernels and the positional specificity of the layers with weight-unshared kernels, conveniently optimizes the features extracted by the model, and effectively improves the precision of face recognition without needing to increase the parameter amount.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, the device (system) and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction device, the instruction device realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they know the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present application.
Obviously, those skilled in the art can make various modifications and variations to the present application without departing from the spirit and scope of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include them.