CN110032925A - Gesture image segmentation and recognition method based on an improved capsule network and algorithm - Google Patents

Gesture image segmentation and recognition method based on an improved capsule network and algorithm

Info

Publication number
CN110032925A
Authority
CN
China
Prior art keywords
capsule
images
gestures
loss
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910130815.4A
Other languages
Chinese (zh)
Other versions
CN110032925B (en)
Inventor
莫伟珑
罗晓曙
赵书林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wu Bin
Original Assignee
Guangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Normal University filed Critical Guangxi Normal University
Priority to CN201910130815.4A priority Critical patent/CN110032925B/en
Publication of CN110032925A publication Critical patent/CN110032925A/en
Application granted granted Critical
Publication of CN110032925B publication Critical patent/CN110032925B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a gesture image segmentation and recognition method based on an improved capsule network and algorithm, belonging to the fields of computer vision and artificial intelligence. First, the proposed U-shaped residual capsule network removes the background from images taken against complex backgrounds and segments out the gesture. Second, image-processing operations remove noise and locate the gesture position in the binarized segmentation image. Third, the located gesture region is used as a mask to strip the background from the original image, keeping only the gesture. Finally, the gesture image is fed into an improved matrix capsule network and recognized with the improved algorithm. Compared with U-Net, the improved algorithm greatly reduces the parameter count and improves segmentation performance, thereby raising the gesture recognition rate.

Description

Gesture image segmentation and recognition method based on an improved capsule network and algorithm
Technical field
The present invention relates to the fields of computer vision and artificial intelligence, and in particular to a gesture image segmentation and recognition method based on an improved capsule network and algorithm.
Background art
At present, human-computer interaction is an important research area in artificial intelligence. To meet the needs of practical applications, research on machine-vision-based human-computer gesture communication has significant application value. Gesture interaction for products such as handheld gimbals, drone gimbals, AR (Augmented Reality) and VR (Virtual Reality) devices, as well as sign-language translation for deaf-mute people, would greatly raise the intelligence level of the related products and make daily life more convenient. Common gesture recognition techniques include interaction methods based on data gloves, and image-processing pipelines that segment the gesture with a skin-color model and then recognize it with a convolutional neural network (CNN). Most of these techniques only work against a preset, ideal background, and they ignore the fact that a CNN models spatial relationships between object parts poorly during gesture changes, which keeps the recognition rate low.
A capsule network is better suited than a CNN to segmenting and recognizing hands seen from different viewpoints. For gesture segmentation, the dynamic routing algorithm of the existing capsule network struggles to extract deeper gesture features, so training either fails or gives unsatisfactory results. For gesture recognition, the existing matrix capsule network converges slowly, and its single-scale channel limits the recognition rate. Using a CNN directly for segmentation and recognition requires a very large number of parameters, which greatly increases hardware cost.
Summary of the invention
The purpose of the present invention is to provide a gesture image segmentation and recognition method based on an improved capsule network and algorithm, in order to solve the technical problems that the existing matrix capsule network converges slowly, that a single-scale channel keeps the recognition rate low, and that directly using a CNN for segmentation and recognition requires a very large number of parameters and greatly increases hardware cost. The proposed capsule network recognizes gestures from different viewpoints well and uses fewer parameters than a CNN, so it can effectively segment and locate gestures in complex scenes and classify the gesture images.
A gesture image segmentation and recognition method based on an improved capsule network and algorithm comprises the following steps:
Step 1: capture and collect gesture images against complex backgrounds, manually annotate the gesture contour of every image to generate label maps, then apply image enhancement to the original images and label maps;
Step 2: train the U-shaped residual capsule network with the enhanced images; a gesture image taken against a complex background is fed into the trained U-shaped residual capsule network, which segments out a binarized gesture image;
Step 3: locate a rectangular bounding box around the binarized gesture image segmented in step 2, and multiply the corresponding region of the original image by the segmentation map to obtain the final segmented gesture image;
Step 4: train the improved matrix capsule network on gesture images of different hand shapes and output the trained improved matrix capsule network model; the gesture image segmented in step 3 is fed into the improved matrix capsule network model, which classifies each different gesture, realizing gesture image recognition.
The classical U-Net algorithm suffers from an excessive parameter count and poor segmentation quality when used for image segmentation. The present invention proposes a U-shaped residual capsule network segmentation model: it combines the deep residual technique with capsule networks to build a residual capsule structure module that extracts richer, deeper gesture features and accelerates model convergence, and it replaces the ordinary convolutional layers inside the U-Net algorithm, yielding the U-shaped residual capsule segmentation model. The improved algorithm greatly reduces the parameter count compared with U-Net and improves gesture segmentation quality. To address the difficulty of training deep networks when the original capsule network is used for image segmentation, an improved compression function (Squash) is proposed; the improved compression function adaptively adjusts the order of magnitude of the activation values, so that the U-shaped residual capsule segmentation model can be trained normally as a deep network and can output correct gesture segmentation images.
The proposed U-shaped residual capsule network removes the background under complex conditions and segments out the gesture image; image-processing operations then remove noise and locate the gesture position in the binarized image; third, the located gesture region is used as a mask to strip the background from the original image, keeping only the gesture; finally, the gesture image is fed into the improved matrix capsule network and recognized with the improved algorithm. Compared with U-Net, the improved algorithm greatly reduces the parameter count and improves gesture segmentation performance, thereby raising the gesture recognition rate.
Further, in step 2 the U-shaped residual capsule network is built from capsule convolutional layers and capsule residual blocks. The left part of the network extracts deep features from the image with capsule convolutional layers and capsule residual blocks; the bottom of the network uses two capsule residual blocks as the middle layers; the right part uses capsule deconvolution layers for upsampling to enlarge the feature maps, and concatenates the features extracted on the left side with the right side before extracting features again; the final output restores a gesture segmentation map at the original image size;
The capsule convolutional layer follows:
u_i|j = w_ij · u_i (1)
where the input capsule u_i is multiplied by the pose-adjustment matrix w_ij to give the prediction vector u_i|j.
The dynamic routing formulas are:
c_ij = exp(b_ij) / Σ_k exp(b_ik) (2)
s_j = Σ_i c_ij · u_i|j (3)
v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||) (4)
b_ij = b_ij + u_i|j · v_j (5)
where c_ij is the dynamic routing coupling coefficient (i.e., a probability vector), b_ij is initialized to 0, and s_j is the weighted sum of all prediction vectors with their probability vectors;
Formula (1) is substituted into the dynamic routing formulas (2)-(5), which are cycled through 3 training iterations.
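As an illustration of formulas (1)-(5), the following minimal NumPy sketch runs the three routing iterations on prediction vectors that have already been multiplied by the pose matrices of formula (1); the tensor shapes, function names and vectorization are assumptions for illustration, not the patent's implementation.

    import numpy as np

    def squash(s):
        # Formula (4): shrinks short vectors toward 0 and long vectors toward unit length.
        norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
        return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + 1e-9)

    def dynamic_routing(u_hat, num_iters=3):
        # u_hat: prediction vectors u_i|j of shape (num_in, num_out, dim).
        num_in, num_out, _ = u_hat.shape
        b = np.zeros((num_in, num_out))                           # logits b_ij, initialized to 0
        for _ in range(num_iters):                                # 3 routing iterations
            c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coefficients, formula (2)
            s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum s_j, formula (3)
            v = squash(s)                                         # activation v_j, formula (4)
            b = b + (u_hat * v[None]).sum(axis=-1)                # agreement update, formula (5)
        return v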
The capsule residual block is composed of two capsule convolutional layers: the input is first batch-normalized, then passed through the two capsule convolutional layers; the output of the second capsule convolutional layer is batch-normalized again, and the two paths are summed to give the block output.
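The block structure can be sketched as follows; capsule_conv2d is a hypothetical stand-in (an ordinary convolution here) for the patent's capsule convolutional layer, and the input is assumed to already have the same channel count as the block output so the two paths can be summed.

    import tensorflow as tf

    def capsule_conv2d(x, filters):
        # Placeholder for a capsule convolution (formula (1) plus routing); an
        # ordinary convolution is used so that this structural sketch runs.
        return tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

    def capsule_residual_block(x, filters):
        shortcut = x                                     # identity path
        y = tf.keras.layers.BatchNormalization()(x)      # batch-normalize the input first
        y = capsule_conv2d(y, filters)                   # first capsule convolutional layer
        y = capsule_conv2d(y, filters)                   # second capsule convolutional layer
        y = tf.keras.layers.BatchNormalization()(y)      # normalize the second layer's output
        return tf.keras.layers.Add()([shortcut, y])      # sum the two paths as the block output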
Further, the detailed localization process in step 3 is: first denoise the image by blurring, smoothing it with a low-pass filter with a 9*9 kernel so that each pixel is replaced by the mean of its surrounding pixels, which removes the noise of the segmentation map; then apply image erosion to remove the larger white spots; compute the contour areas of the remaining targets after erosion and derive the maximum bounding box from the contour areas; crop the original image and the binary image according to the maximum bounding box; finally merge the two images to obtain the color gesture image.
Further, in step 4 the matrix capsule network consists of an ordinary convolutional base, a main capsule layer, capsule convolutional layers and a capsule classification layer. The matrix capsule network turns each neuron vector into a pose matrix of size n*n. The last two convolutional capsule layers of the network perform three steps, convolution, pose transformation and EM dynamic routing, with the clustering realized by the EM algorithm. The E step is:
r_ij = p(j|x_i) = a_j N(x_i; μ_j, σ_j^2) / Σ_k a_k N(x_i; μ_k, σ_k^2) (6)
where x_i is an input vote vector, a_j is the Gaussian mixture coefficient of class j, N(x_i; μ_j, σ_j^2) is the Gaussian distribution of the data x_i in class j, and the denominator is the sum of the k Gaussian mixture components, giving the posterior probability p(j|x_i);
The formulas of the M step are:
r_ij = r_ij · a_i (7)
μ_j = Σ_i r_ij x_i / Σ_i r_ij (8)
σ_j^2 = Σ_i r_ij (x_i - μ_j)^2 / Σ_i r_ij (9)
The mean of class j is estimated by the sample-weighted average of formulas (7)-(8), the variance is obtained from formula (9), and the entropy cost is obtained from
cost_j = (β_μ + log σ_j) Σ_i r_ij (10)
A sample most likely belongs to class j when the entropy is small; the sigmoid function is used as the activation to compress the value between 0 and 1, i.e., the Gaussian mixture coefficient:
a_j = sigmoid(λ(β_a - cost_j)) (11)
λ is added as an annealing strategy, its value acting as the inverse of a temperature: as the number of training steps grows, the temperature drops and λ slowly increases, so the activation also increases;
Each capsule layer is assigned the parameters β_a and β_μ of formulas (10)-(11), which are trained by backpropagation, and formulas (6)-(11) are iterated 3 times to realize the dynamic routing.
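A minimal sketch of this EM routing with scalar votes (the patent routes n*n pose matrices; 1-D votes are an assumption to keep the sketch short), with an illustrative annealing schedule for λ:

    import numpy as np

    def em_routing(votes, a_in, beta_a, beta_mu, num_iters=3, lam=0.01):
        # votes: (num_in, num_out) scalar votes x_i; a_in: (num_in,) input activations.
        num_in, num_out = votes.shape
        r = np.full((num_in, num_out), 1.0 / num_out)          # uniform responsibilities
        for _ in range(num_iters):
            # M step
            r_w = r * a_in[:, None]                            # weight by input activations, formula (7)
            denom = r_w.sum(axis=0) + 1e-9
            mu = (r_w * votes).sum(axis=0) / denom             # class means, formula (8)
            var = (r_w * (votes - mu) ** 2).sum(axis=0) / denom + 1e-9  # variances, formula (9)
            cost = (beta_mu + 0.5 * np.log(var)) * denom       # entropy cost, formula (10)
            a_out = 1.0 / (1.0 + np.exp(-lam * (beta_a - cost)))  # activations, formula (11)
            # E step: posterior p(j|x_i) of formula (6)
            p = a_out * np.exp(-(votes - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
            r = p / (p.sum(axis=1, keepdims=True) + 1e-9)
            lam *= 2.0                                         # illustrative annealing: λ grows over iterations
        return mu, a_out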
Further, during training in step 2, the predicted values output by the network and the ground truth are fed into the Loss function:
Loss = log(Dice_loss) + α * Focal_loss (12)
The Loss of formula (12) combines Dice loss and Focal loss; to combine the two losses efficiently they must be scaled to the same order of magnitude, so the log amplifies the gradient of the Dice term while the added scaling factor α shrinks the Focal term;
Dice_loss = 1 - dice_coef = 1 - 2|A∩B| / (|A| + |B|) (13)
In formula (13), A and B are the label map and the prediction map output by the network respectively; the formula measures the similarity of A and B, and the value of dice_coef approaches 1 as the two become identical;
Focal_loss = -β (1 - p_t)^γ log(p_t) (14)
The Focal loss of formula (14) concentrates on hard-to-classify samples: since the background occupies most of a training image and the gesture very little, the negative-sample loss would otherwise dominate; γ takes the value 2 and β takes 0.25 to balance positive and negative samples;
By iterating the training and updating the weights of the capsule segmentation network until the Loss function converges, the capsule segmentation network model is output, and gesture segmentation is performed with the capsule segmentation model.
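A sketch of this combined loss in TensorFlow; the smoothing constant, the clipping bounds and the value of α are illustrative assumptions:

    import tensorflow as tf

    def dice_loss(y_true, y_pred, smooth=1e-6):
        # Formula (13): 1 - 2|A∩B| / (|A| + |B|), with a small smoothing term.
        inter = tf.reduce_sum(y_true * y_pred)
        dice_coef = (2.0 * inter + smooth) / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)
        return 1.0 - dice_coef

    def focal_loss(y_true, y_pred, gamma=2.0, beta=0.25, eps=1e-7):
        # Formula (14): down-weights easy samples so the dominant background
        # does not swamp the small gesture region.
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
        b_t = y_true * beta + (1.0 - y_true) * (1.0 - beta)
        return -tf.reduce_mean(b_t * (1.0 - p_t) ** gamma * tf.math.log(p_t))

    def combined_loss(y_true, y_pred, alpha=0.1):
        # Formula (12): the log amplifies the Dice term, α scales the Focal term down.
        return tf.math.log(dice_loss(y_true, y_pred) + 1e-7) + alpha * focal_loss(y_true, y_pred)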
Further, the detailed process of training the improved matrix capsule network in step 4 is:
During training, the activation vector of the improved matrix capsule network is fed into the margin loss function, given by:
L_k = T_k · max(0, m+ - ||v_k||)^2 + λ · (1 - T_k) · max(0, ||v_k|| - m-)^2 (15)
In formula (15), k indexes the k-th class; the margin loss sums the losses of all classes and takes the average; λ is a proportionality coefficient that balances the two terms; m+ and m- take the values 0.9 and 0.1 respectively. For L_k to be 0 when the k-th class is a positive sample (T_k = 1), the length ||v_k|| must exceed 0.9 before the loss error vanishes; when the k-th class is a negative sample (T_k = 0), the length ||v_k|| must be below 0.1 before the loss error vanishes. The prediction and the ground truth are fed into the loss function, and the weights are then updated.
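A sketch of the margin loss of formula (15), assuming v_norm holds ||v_k|| per class and T is the one-hot label; lam = 0.5 follows the common capsule-network convention and is an assumption here:

    import tensorflow as tf

    def margin_loss(T, v_norm, m_pos=0.9, m_neg=0.1, lam=0.5):
        # T, v_norm: tensors of shape (batch, num_classes).
        pos = T * tf.square(tf.maximum(0.0, m_pos - v_norm))                 # positive-class term
        neg = lam * (1.0 - T) * tf.square(tf.maximum(0.0, v_norm - m_neg))   # negative-class term
        return tf.reduce_mean(tf.reduce_sum(pos + neg, axis=-1))             # sum classes, average batch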
By adopting the above technical solution, the present invention achieves the following technical effects:
The overall performance of the present invention is better than common mainstream algorithms, and it is better suited to running on gimbal and drone products with tight hardware resources, since the algorithm of the invention has few parameters and saves hardware cost. A multi-scale, identity-mapping matrix capsule structure is used to improve the gesture recognition rate. The results show that applying an identity mapping between the multi-scale ordinary convolutional layers and PrimaryCapsules raises the gesture recognition rate, lowers the loss value and accelerates training. The benefit of the improved matrix capsule network is that it raises the gesture recognition rate more effectively than the original matrix capsule network algorithm; when recognizing gesture images from different angles it outperforms the classical CNN method, and it accelerates the convergence of the training loss.
Description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 shows the U-shaped residual capsule segmentation network of the present invention.
Fig. 3 shows the structure of the residual capsule block of the present invention.
Fig. 4 is the gesture localization flow chart of the present invention.
Fig. 5 shows the matrix capsule network architecture of the present invention.
Fig. 6 shows the improvement of the first two convolutional layers of the present invention.
Fig. 7 shows the gesture localization results of the present invention.
Fig. 8 shows the final results of the present invention.
Fig. 9 compares the recognition rate before and after the matrix capsule network improvement of the present invention.
Specific embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in more detail below with reference to the drawings and preferred embodiments. It should be noted, however, that the many details listed in the specification are only meant to give the reader a thorough understanding of one or more aspects of the invention, and these aspects can be realized even without these specific details.
The present invention provides a gesture image segmentation and recognition method based on an improved capsule network and algorithm, consisting mainly of the algorithm and the corresponding software. The software completes the segmentation, localization and classification of images, mainly covering video frame slicing, image enhancement, gesture image segmentation, gesture localization and gesture classification. The experimental environment of the whole improved-capsule-network gesture segmentation and recognition pipeline includes a dual E5-2637 v4 CPU server, with a GTX 1080Ti graphics card and 32 GB of memory used to accelerate training. The operating system is Ubuntu 16.04, and the platform is the TensorFlow 1.5 GPU edition of the machine learning framework developed by Google. The U-shaped residual capsule segmentation network, the gesture localization algorithm, the matrix capsule recognition network, the improved part of the matrix capsule recognition network, and the overall segmentation and recognition flow are shown in Figs. 2-6 respectively.
In Fig. 2, the U-shaped deep residual capsule segmentation network model is built mainly from capsule convolutional layers and capsule residual blocks. The left part extracts deep features from a 128*128*3 image with capsule convolutional layers and capsule residual blocks; the feature map sizes change through 128*128, 64*64 and 32*32. The bottom of the network uses two residual capsule blocks as the middle layers. The right part upsamples (enlarges the feature maps) with capsule deconvolution layers and concatenates the features extracted on the left side of the U to the right side before extracting features again; the feature map sizes change through 32*32, 64*64 and 128*128, and the final output restores a gesture segmentation map at the original image size. The whole network incorporates the residual technique so that more layers can be stacked. The capsule convolutional layer works as follows:
u_i|j = w_ij · u_i (1)
In formula (1) the input capsule u_i is multiplied by the pose-adjustment matrix w_ij to give u_i|j, which is substituted into the dynamic routing formulas (2)-(5) and cycled 3 times; the routing itself is not trained by backpropagation.
c_ij = exp(b_ij) / Σ_k exp(b_ik) (2)
s_j = Σ_i c_ij · u_i|j (3)
v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||) (4)
b_ij = b_ij + u_i|j · v_j (5)
c_ij is the dynamic routing coupling coefficient (i.e., a probability vector), b_ij is initialized to 0, and s_j is the weighted sum of all prediction vectors with their probability vectors.
During training, ||s_j||^2 in formula (4) is usually tiny (its magnitude lies between 1e-20 and 1e-42), which makes the activation v_j tiny as well; after several iterations v_j often collapses to 0 and cannot be trained. The improved compression function is formula (6): the constant 1 in the left-hand term of the denominator of formula (4) is replaced by an adaptive term, after which the order of magnitude of v_j is adjusted adaptively and the network can train normally.
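The exact adaptive term of formula (6) is not reproduced in this text; the sketch below shows one plausible form, replacing the constant 1 in the denominator with the layer-wide maximum of ||s_j||^2 so that uniformly tiny activations are no longer crushed toward zero. The specific replacement term is an assumption.

    import numpy as np

    def improved_squash(s, eps=1e-9):
        norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)     # ||s_j||^2, possibly as small as 1e-20
        scale = norm_sq / (np.max(norm_sq) + norm_sq + eps)  # assumed adaptive denominator replacing 1
        return scale * s / np.sqrt(norm_sq + eps)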
The table below compares the performance indicators of the improved u-res-cap-net with other segmentation networks.
Table 1. Comparison of the proposed u-res-cap-net algorithm with other algorithms
As can be seen from the table, the index Auc_roc, the area under the ROC curve, reaches 0.958 for the invention, and the index Auc_P-R, the area under the P-R (Precision-Recall) curve, reaches 0.936, showing that this model segments gesture images very well. The index Specific reflects the ability to identify background in the segmented image; the lower the Specific value, the more spots in the segmented image are mistaken for gesture. The index Sensitivity, like the recall rate Recall, reflects how many of the pixels in the segmented image truly belong to the gesture; the higher the Sensitivity value, the more complete the segmented gesture. The index Jacard, one measure of segmentation precision, reflects the similarity between the segmented gesture image and the label image; the higher the Jacard value, the closer to the label image. The F1 value measures Precision and Recall simultaneously, and the F1 value of the invention's algorithm compares favorably with the other algorithms. The parameter count of the invention is smaller than that of the other two algorithms, making it better suited to embedded devices with tight hardware resources.
In Fig. 3, the capsule residual block is composed of two capsule convolutional layers: the block first batch-normalizes the input, then feeds it through the two capsule convolutional layers; the output of the second capsule convolutional layer is batch-normalized again, and the two paths are summed to give the block output.
In Fig. 4, the gesture localization algorithm consists of blur denoising, image erosion for despeckling, computing the bounding box of the largest contour, cropping the original image and binary image, and merging the cropped images. The blur denoising smooths the image with a low-pass filter with a 9*9 kernel, so that each pixel is replaced by the mean of its surrounding pixels, which removes the noise of the segmentation map. Image erosion removes the larger white spots, making the localization more accurate. The contour areas of the remaining targets after erosion are computed and the maximum bounding box (bbox) is derived from these regions; the original image and the binary image are cropped according to the bbox and finally merged to obtain the color gesture image.
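A sketch of this localization pipeline with OpenCV (assuming OpenCV 4's findContours signature); the erosion kernel size and iteration count are illustrative assumptions:

    import cv2
    import numpy as np

    def locate_gesture(binary_mask, original):
        blurred = cv2.blur(binary_mask, (9, 9))              # 9*9 mean filter removes segmentation noise
        kernel = np.ones((5, 5), np.uint8)
        eroded = cv2.erode(blurred, kernel, iterations=2)    # erosion removes the larger white spots
        contours, _ = cv2.findContours(eroded, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        largest = max(contours, key=cv2.contourArea)         # keep the largest remaining contour
        x, y, w, h = cv2.boundingRect(largest)               # maximum bounding box (bbox)
        mask_crop = binary_mask[y:y + h, x:x + w]
        orig_crop = original[y:y + h, x:x + w]
        return cv2.bitwise_and(orig_crop, orig_crop, mask=mask_crop)  # merged color gesture image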
In Fig. 5, gesture classification uses the matrix capsule network model, which consists of an ordinary convolutional base, a main capsule layer (PrimaryCaps), capsule convolutional layers and a capsule classification layer. Each neuron of a convolutional neural network outputs a scalar; a capsule network lets each neuron output a vector, which retains more image characteristics such as orientation, pose, thickness, position and size. The matrix capsule network goes further and turns each neuron vector into a pose matrix of size n*n; when performing pose transformations, matrix operations save much computation compared with vector-style capsule operations.
The last two convolutional capsule layers (ConvCaps) of the matrix capsule network successively perform convolution, pose transformation and EM (Expectation-Maximization) dynamic routing. The convolution extracts high-level features and gives the tensor the correct dimensional space. The pose transformation lets the network tolerate small changes of viewpoint: each capsule is multiplied by a transformation matrix W to obtain a vote matrix, so that an image rotated by some angle can still be recognized. EM dynamic routing is then applied to all vote matrices, clustering them into as many clusters as there are classes. The clustering in the GMM (Gaussian Mixture Model) is realized with the EM algorithm, whose E step is shown in formula (7); the process clusters the vectors into k Gaussian distributions:
r_ij = p(j|x_i) = a_j N(x_i; μ_j, σ_j^2) / Σ_k a_k N(x_i; μ_k, σ_k^2) (7)
In the formula, x_i is an input vote vector, a_j is the Gaussian mixture coefficient of class j, N(x_i; μ_j, σ_j^2) is the Gaussian distribution of the data x_i in class j, and the denominator is the sum of the k Gaussian mixture components, giving the posterior probability p(j|x_i).
The M step comprises formulas (8)-(12): the mean of class j is estimated by the sample-weighted average of formulas (8)-(9), and the variance is obtained from formula (10):
r_ij = r_ij · a_i (8)
μ_j = Σ_i r_ij x_i / Σ_i r_ij (9)
σ_j^2 = Σ_i r_ij (x_i - μ_j)^2 / Σ_i r_ij (10)
The entropy cost is obtained from formula (11); a sample most likely belongs to class j when the entropy is small, and formula (12) uses the sigmoid function as the activation to compress the value between 0 and 1, i.e., the Gaussian mixture coefficient. λ in formula (12) is added as an annealing strategy, its value acting as the inverse of a temperature: as the number of training steps grows, the temperature drops and λ slowly increases, so the activation also slowly increases.
cost_j = (β_μ + log σ_j) Σ_i r_ij (11)
a_j = sigmoid(λ(β_a - cost_j)) (12)
Each capsule layer is assigned the parameters β_a and β_μ of formulas (11)-(12), which are trained by backpropagation. Formulas (7)-(12) are iterated 3 times to realize the dynamic routing.
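The pose-transformation step that produces the vote matrices routed above can be sketched as a batched matrix product; the shapes below are assumptions for illustration:

    import numpy as np

    def compute_votes(poses, W):
        # poses: (num_in, n, n) input pose matrices; W: (num_in, num_out, n, n)
        # learned transformation matrices. Returns votes of shape (num_in, num_out, n, n).
        return np.einsum('iab,ijbc->ijac', poses, W)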
Fig. 6 shows the improved structure of the first two layers in front of the matrix capsule network. To let the last two convolutional capsule layers obtain rich high-level gesture features, the first two layers (the ordinary convolutional layer and PrimaryCapsules) are improved with multi-scale convolution and identity mapping. A single-scale channel leaves many features unextracted, which blunts the voting of the last two capsule convolutional layers. Improvement 1 turns the original 5*5 convolution kernel of the method into a multi-scale convolution: the first branch adds a 2*2 pooling layer and a 2*2 convolution kernel, the second branch adds two 3*3 convolution kernels, the third branch keeps the original large 5*5 kernel, and the fourth branch uses a 1*1 kernel; finally the different channels are concatenated to obtain different low-level features, as sketched below. Because dynamic routing makes the training loss converge slowly, improvement 2 fuses the input and output features of the PrimaryCaps layer, strengthening the information flow and accelerating convergence.
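A structural sketch of improvement 1 with Keras-style layers; the channel counts and activations are illustrative assumptions:

    import tensorflow as tf

    def multi_scale_block(x, ch=32):
        b1 = tf.keras.layers.MaxPool2D(2, strides=1, padding="same")(x)            # 2*2 pooling branch
        b1 = tf.keras.layers.Conv2D(ch, 2, padding="same", activation="relu")(b1)  # plus a 2*2 kernel
        b2 = tf.keras.layers.Conv2D(ch, 3, padding="same", activation="relu")(x)
        b2 = tf.keras.layers.Conv2D(ch, 3, padding="same", activation="relu")(b2)  # two stacked 3*3 kernels
        b3 = tf.keras.layers.Conv2D(ch, 5, padding="same", activation="relu")(x)   # original 5*5 kernel
        b4 = tf.keras.layers.Conv2D(ch, 1, padding="same", activation="relu")(x)   # 1*1 kernel branch
        return tf.keras.layers.Concatenate()([b1, b2, b3, b4])  # splice the multi-scale channels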
Fig. 1 is the flow chart of the capsule-network gesture segmentation and recognition method; the detailed process and treatment are as follows:
Step 1: capture a large number of images of different indoor and outdoor scenes with a camera, photographing 22 kinds of gestures; each kind of gesture is shot from different angles, 500 images per gesture.
Step 2: annotate the gesture contour of each image with labeling software, finally producing pairs of original images and binarized mask maps.
Step 3: resize the annotated images to 128 × 128, then convert the images into TFRecord structured data files so that large amounts of image data can be read efficiently during training, as in the sketch below.
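A minimal sketch of the TFRecord serialization, written against the modern tf.io API (TensorFlow 1.5 itself exposed this as tf.python_io); the feature keys are assumptions:

    import tensorflow as tf

    def write_tfrecord(pairs, path):
        # pairs: iterable of (image_bytes, mask_bytes), e.g. PNG-encoded 128x128 images.
        with tf.io.TFRecordWriter(path) as writer:
            for img, mask in pairs:
                ex = tf.train.Example(features=tf.train.Features(feature={
                    "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[img])),
                    "mask": tf.train.Feature(bytes_list=tf.train.BytesList(value=[mask])),
                }))
                writer.write(ex.SerializeToString())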
Step 4: apply image enhancement such as random brightness adjustment, random flipping, random scaling and random cropping to the training images, and feed them into the U-shaped segmentation capsule network for training.
Step 5: feed the predicted values output by the network and the ground truth into the Loss function:
Loss = log(Dice_loss) + α * Focal_loss (13)
The Loss of formula (13) combines Dice loss and Focal loss; combining several losses can effectively improve training. To combine the two losses efficiently they must be scaled to the same order of magnitude, so the log amplifies the gradient of the Dice term while the added scaling factor α shrinks the Focal term.
Dice_loss = 1 - dice_coef = 1 - 2|A∩B| / (|A| + |B|) (14)
In formula (14), A and B are the label map and the prediction map output by the network respectively; the formula measures the similarity of A and B, and the value of dice_coef approaches 1 as the two become identical.
Focal_loss = -β (1 - p_t)^γ log(p_t) (15)
The Focal loss of formula (15) concentrates on hard-to-classify samples: since the background occupies most of a training image and the gesture very little, the negative-sample loss would otherwise dominate. γ usually takes the value 2 and β takes 0.25 to balance positive and negative samples.
By iterating the training and updating the weights of the capsule segmentation network until the Loss function converges, the capsule segmentation network model is output, and gesture segmentation is performed with the capsule segmentation model.
Step 6: test the performance indicators of the capsule segmentation network on the test set; the average F1 score is 0.933. Although the indicators compare well, the capsule segmentation model cannot segment perfectly and some noise and spots remain. To ensure that the segmented gesture image can serve as input to the matrix capsule gesture recognition model, a localization algorithm is used: it blurs and erodes the binarized segmented gesture image to guarantee that spots and noise are removed and not mistaken for gesture. After blurring and eroding, the contour of the remaining target gesture region is found, and the contour area and bounding box position are computed. The original image and the binarized gesture map are cropped according to this position, and the two cropped images are finally merged to produce the background-free gesture image.
Step 7: the matrix capsule network model consists of an ordinary convolutional layer, a main capsule layer, capsule convolutional layers and a capsule classification layer. The enhanced image is resized to 28*28 and fed into the ordinary convolutional layer to obtain various low-level features, which are then passed to the main capsule layer and formed into matrix capsules. The capsule convolutional layers successively perform convolution, pose transformation and EM (Expectation-Maximization) dynamic routing. The convolution extracts high-level features and adjusts the tensor to the correct dimensional space. The pose transformation lets the network tolerate small changes of viewpoint: each capsule is multiplied by a transformation matrix W to obtain a vote matrix, so that an image rotated by some angle can still be recognized. EM dynamic routing is applied to all vote matrices, clustering them into as many clusters as there are classes. The final result of the capsule convolutional layers is fed into the capsule classification layer, whose final outputs are the pose matrices and the activation vector.
Step 8: feed the activation vector into the margin loss function, given by:
L_k = T_k · max(0, m+ - ||v_k||)^2 + λ · (1 - T_k) · max(0, ||v_k|| - m-)^2 (16)
In formula (16), k indexes the k-th class; the margin loss sums the losses of all classes and takes the average. λ is a proportionality coefficient that balances the two terms. m+ and m- take the values 0.9 and 0.1 respectively. For L_k to be 0 when the k-th class is a positive sample (i.e., T_k is 1), the length ||v_k|| must exceed 0.9 before the loss error vanishes; when the k-th class is a negative sample (i.e., T_k is 0), the length ||v_k|| must be below 0.1 before the loss error vanishes. The prediction and the ground truth are fed into the loss function, and the weights are then updated.
Step 9: feed the gesture image output in step 6 into the trained matrix capsule network model and classify the gesture, completing the algorithm flow of the whole capsule-network gesture image classification method.
Fig. 7 shows the gesture localization results: in order, the segmented original image, blurring, erosion, the maximum bounding box bbox, and the final segmentation result.
Fig. 8 shows the overall result: the open-palm gesture is finally predicted as the digit 5.
Fig. 9 shows tests of the matrix capsule network on gesture data shot from different viewpoints. The three curves compare the recognition rate before and after the improvement and against a traditional CNN with the same number of layers. The figure shows that the present invention performs better: the algorithm of the invention improves the gesture recognition rate by 3-4% over the original algorithm, and the traditional CNN's recognition rate on gesture images shot from different viewpoints falls short of the capsule network.
The above is only a preferred embodiment of the present invention. It should be noted that, for those of ordinary skill in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A gesture image segmentation and recognition method based on an improved capsule network and algorithm, characterized in that the method comprises the following steps:
Step 1: capture and collect gesture images against complex backgrounds, manually annotate the gesture contour of every image to generate label maps, then apply image enhancement to the original images and label maps;
Step 2: train the U-shaped residual capsule network with the enhanced images; a gesture image taken against a complex background is fed into the trained U-shaped residual capsule network, which segments out a binarized gesture image;
Step 3: locate a rectangular bounding box around the binarized gesture image segmented in step 2, and multiply the corresponding region of the original image by the segmentation map to obtain the final segmented gesture image;
Step 4: train the improved matrix capsule network on gesture images of different hand shapes and output the trained improved matrix capsule network model; the gesture image segmented in step 3 is fed into the improved matrix capsule network model, which classifies each different gesture, realizing gesture image recognition.
2. The gesture image segmentation and recognition method based on an improved capsule network and algorithm according to claim 1, characterized in that: in step 2 the U-shaped residual capsule network is built from capsule convolutional layers and capsule residual blocks; the left part of the network extracts deep features from the image with capsule convolutional layers and capsule residual blocks; the bottom of the network uses two capsule residual blocks as the middle layers; the right part uses capsule deconvolution layers for upsampling to enlarge the feature maps, and concatenates the features extracted on the left side with the right side before extracting features again; the final output restores a gesture segmentation map at the original image size;
The capsule convolutional layer follows:
u_i|j = w_ij · u_i (1)
where the input capsule u_i is multiplied by the pose-adjustment matrix w_ij to give u_i|j;
The dynamic routing formulas are:
c_ij = exp(b_ij) / Σ_k exp(b_ik) (2)
s_j = Σ_i c_ij · u_i|j (3)
v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||) (4)
b_ij = b_ij + u_i|j · v_j (5)
where c_ij is the dynamic routing coupling coefficient (i.e., a probability vector), b_ij is initialized to 0, and s_j is the weighted sum of all prediction vectors with their probability vectors;
Formula (1) is substituted into the dynamic routing formulas (2)-(5), which are cycled through 3 training iterations;
The capsule residual block is composed of two capsule convolutional layers: the input is first batch-normalized, then passed through the two capsule convolutional layers; the output of the second capsule convolutional layer is batch-normalized again, and the two paths are summed to give the block output.
3. The gesture image segmentation and recognition method based on an improved capsule network and algorithm according to claim 1, characterized in that: the detailed localization process in step 3 is to first denoise the image by blurring, smoothing it with a low-pass filter with a 9*9 kernel so that each pixel is replaced by the mean of its surrounding pixels, removing the noise of the segmentation map; then apply image erosion to remove the larger white spots; compute the contour areas of the remaining targets after erosion and derive the maximum bounding box from the contour areas; crop the original image and the binary image according to the maximum bounding box; finally merge the two images to obtain the color gesture image.
4. The gesture image segmentation and recognition method based on an improved capsule network and algorithm according to claim 1, characterized in that: in step 4 the matrix capsule network consists of an ordinary convolutional base, a main capsule layer, capsule convolutional layers and a capsule classification layer; the matrix capsule network turns each neuron vector into a pose matrix of size n*n; the last two convolutional capsule layers of the network perform three steps, convolution, pose transformation and EM dynamic routing, with the clustering realized by the EM algorithm; the E step is:
r_ij = p(j|x_i) = a_j N(x_i; μ_j, σ_j^2) / Σ_k a_k N(x_i; μ_k, σ_k^2) (6)
where x_i is an input vote vector, a_j is the Gaussian mixture coefficient of class j, N(x_i; μ_j, σ_j^2) is the Gaussian distribution of the data x_i in class j, and the denominator is the sum of the k Gaussian mixture components, giving the posterior probability p(j|x_i);
The formulas of the M step are:
r_ij = r_ij · a_i (7)
μ_j = Σ_i r_ij x_i / Σ_i r_ij (8)
σ_j^2 = Σ_i r_ij (x_i - μ_j)^2 / Σ_i r_ij (9)
The mean of class j is estimated by the sample-weighted average of formulas (7)-(8), the variance is obtained from formula (9), and the entropy cost is obtained from
cost_j = (β_μ + log σ_j) Σ_i r_ij (10)
A sample belongs to class j when the entropy is small; the sigmoid function is used as the activation to compress the value between 0 and 1, i.e., the Gaussian mixture coefficient:
a_j = sigmoid(λ(β_a - cost_j)) (11)
λ is added as an annealing strategy, its value acting as the inverse of a temperature: as the number of training steps grows, the temperature drops and λ slowly increases, so the activation also increases;
Each capsule layer is assigned the parameters β_a and β_μ of formulas (10)-(11), which are trained by backpropagation, and formulas (6)-(11) are iterated 3 times to realize the dynamic routing.
5. The gesture image segmentation and recognition method based on an improved capsule network and algorithm according to claim 1, characterized in that: during training in step 2, the predicted values output by the trained network and the ground truth are fed into the Loss function:
Loss = log(Dice_loss) + α * Focal_loss (12)
The Loss of formula (12) combines Dice loss and Focal loss; to combine the two losses efficiently they are scaled to the same order of magnitude, with the log amplifying the gradient of the Dice term while the added scaling factor α shrinks the Focal term;
Dice_loss = 1 - dice_coef = 1 - 2|A∩B| / (|A| + |B|) (13)
In formula (13), A and B are the label map and the prediction map output by the network respectively; the formula measures the similarity of A and B, and the value of dice_coef approaches 1 as the two become identical;
Focal_loss = -β (1 - p_t)^γ log(p_t) (14)
The Focal loss of formula (14) concentrates on hard-to-classify samples: since the background occupies most of a training image and the gesture very little, the negative-sample loss would otherwise dominate; γ takes the value 2 and β takes 0.25 to balance positive and negative samples;
By iterating the training and updating the weights of the capsule segmentation network until the Loss function converges, the capsule segmentation network model is output, and gesture segmentation is performed with the capsule segmentation model.
6. The gesture image segmentation and recognition method based on an improved capsule network and algorithm according to claim 1, characterized in that the detailed process of training the improved matrix capsule network in step 4 is:
during training, the activation vector of the improved matrix capsule network is fed into the margin loss function, given by:
L_k = T_k · max(0, m+ - ||v_k||)^2 + λ · (1 - T_k) · max(0, ||v_k|| - m-)^2 (15)
In formula (15), k indexes the k-th class; the margin loss sums the losses of all classes and takes the average; λ is a proportionality coefficient that balances the two terms; m+ and m- take the values 0.9 and 0.1 respectively. For L_k to be 0 when the k-th class is a positive sample (T_k = 1), the length ||v_k|| must exceed 0.9 before the loss error vanishes; when the k-th class is a negative sample (T_k = 0), the length ||v_k|| must be below 0.1 before the loss error vanishes. The prediction and the ground truth are fed into the loss function, and the weights are then updated.
CN201910130815.4A 2019-02-22 2019-02-22 Gesture image segmentation and recognition method based on improved capsule network and algorithm Active CN110032925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910130815.4A CN110032925B (en) 2019-02-22 2019-02-22 Gesture image segmentation and recognition method based on improved capsule network and algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910130815.4A CN110032925B (en) 2019-02-22 2019-02-22 Gesture image segmentation and recognition method based on improved capsule network and algorithm

Publications (2)

Publication Number Publication Date
CN110032925A true CN110032925A (en) 2019-07-19
CN110032925B CN110032925B (en) 2022-05-17

Family

ID=67234970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910130815.4A Active CN110032925B (en) 2019-02-22 2019-02-22 Gesture image segmentation and recognition method based on improved capsule network and algorithm

Country Status (1)

Country Link
CN (1) CN110032925B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414402A (en) * 2019-07-22 2019-11-05 北京达佳互联信息技术有限公司 A kind of gesture data mask method, device, electronic equipment and storage medium
CN110569781A (en) * 2019-09-05 2019-12-13 河海大学常州校区 time sequence classification method based on improved capsule network
CN110991563A (en) * 2019-12-23 2020-04-10 青岛大学 Capsule network random routing algorithm based on feature fusion
CN111709446A (en) * 2020-05-14 2020-09-25 天津大学 X-ray chest radiography classification device based on improved dense connection network
CN112232261A (en) * 2020-10-27 2021-01-15 上海眼控科技股份有限公司 Method and device for fusing image sequences
CN112487981A (en) * 2020-11-30 2021-03-12 哈尔滨工程大学 MA-YOLO dynamic gesture rapid recognition method based on two-way segmentation
CN113011243A (en) * 2021-01-13 2021-06-22 苏州元启创人工智能科技有限公司 Facial expression analysis method based on capsule network
CN113112484A (en) * 2021-04-19 2021-07-13 山东省人工智能研究院 Ventricular image segmentation method based on feature compression and noise suppression
CN114241245A (en) * 2021-12-23 2022-03-25 西南大学 Image classification system based on residual error capsule neural network
CN116304842A (en) * 2023-05-18 2023-06-23 南京信息工程大学 Capsule network text classification method based on CFC structure improvement

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2923991A1 (en) * 2013-10-11 2015-04-16 Mauna Kea Technologies Method for characterizing images acquired through a video medical device
US20170235987A1 (en) * 2016-01-14 2017-08-17 Aaron Hirschmann Systems and Methods for Labeling, Identifying, and Tracking Data Related to Consumable Product
CN108182438A (en) * 2018-01-17 2018-06-19 清华大学 Figure binary feature learning method and device based on deeply study
CN108629288A (en) * 2018-04-09 2018-10-09 华中科技大学 A kind of gesture identification model training method, gesture identification method and system
CN108830826A (en) * 2018-04-28 2018-11-16 四川大学 A kind of system and method detecting Lung neoplasm
CN108898577A (en) * 2018-05-24 2018-11-27 西南大学 Based on the good malign lung nodules identification device and method for improving capsule network
CN108985316A (en) * 2018-05-24 2018-12-11 西南大学 A kind of capsule network image classification recognition methods improving reconstructed network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2923991A1 (en) * 2013-10-11 2015-04-16 Mauna Kea Technologies Method for characterizing images acquired through a video medical device
US20170235987A1 (en) * 2016-01-14 2017-08-17 Aaron Hirschmann Systems and Methods for Labeling, Identifying, and Tracking Data Related to Consumable Product
CN108182438A (en) * 2018-01-17 2018-06-19 清华大学 Figure binary feature learning method and device based on deeply study
CN108629288A (en) * 2018-04-09 2018-10-09 华中科技大学 A kind of gesture identification model training method, gesture identification method and system
CN108830826A (en) * 2018-04-28 2018-11-16 四川大学 A kind of system and method detecting Lung neoplasm
CN108898577A (en) * 2018-05-24 2018-11-27 西南大学 Based on the good malign lung nodules identification device and method for improving capsule network
CN108985316A (en) * 2018-05-24 2018-12-11 西南大学 A kind of capsule network image classification recognition methods improving reconstructed network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LOUIS THIBON: "Resolution enhancement in laser scanning microscopy with deconvolution switching laser modes (D-SLAM)", OPTICS EXPRESS *
ZHANG Jin: "Research on pulmonary nodule recognition and detection based on deep learning", China Masters' Theses Full-text Database, Information Science and Technology Series *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414402B (en) * 2019-07-22 2022-03-25 北京达佳互联信息技术有限公司 Gesture data labeling method and device, electronic equipment and storage medium
CN110414402A (en) * 2019-07-22 2019-11-05 北京达佳互联信息技术有限公司 A kind of gesture data mask method, device, electronic equipment and storage medium
CN110569781A (en) * 2019-09-05 2019-12-13 河海大学常州校区 time sequence classification method based on improved capsule network
CN110569781B (en) * 2019-09-05 2022-09-09 河海大学常州校区 Time sequence classification method based on improved capsule network
CN110991563A (en) * 2019-12-23 2020-04-10 青岛大学 Capsule network random routing algorithm based on feature fusion
CN110991563B (en) * 2019-12-23 2023-04-18 青岛大学 Capsule network random routing method based on feature fusion
CN111709446A (en) * 2020-05-14 2020-09-25 天津大学 X-ray chest radiography classification device based on improved dense connection network
CN112232261A (en) * 2020-10-27 2021-01-15 上海眼控科技股份有限公司 Method and device for fusing image sequences
CN112487981A (en) * 2020-11-30 2021-03-12 哈尔滨工程大学 MA-YOLO dynamic gesture rapid recognition method based on two-way segmentation
CN113011243A (en) * 2021-01-13 2021-06-22 苏州元启创人工智能科技有限公司 Facial expression analysis method based on capsule network
CN113112484B (en) * 2021-04-19 2021-12-31 山东省人工智能研究院 Ventricular image segmentation method based on feature compression and noise suppression
CN113112484A (en) * 2021-04-19 2021-07-13 山东省人工智能研究院 Ventricular image segmentation method based on feature compression and noise suppression
CN114241245A (en) * 2021-12-23 2022-03-25 西南大学 Image classification system based on residual error capsule neural network
CN114241245B (en) * 2021-12-23 2024-05-31 西南大学 Image classification system based on residual capsule neural network
CN116304842A (en) * 2023-05-18 2023-06-23 南京信息工程大学 Capsule network text classification method based on CFC structure improvement

Also Published As

Publication number Publication date
CN110032925B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
CN110032925A (en) A kind of images of gestures segmentation and recognition methods based on improvement capsule network and algorithm
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN106599854B (en) Automatic facial expression recognition method based on multi-feature fusion
CN110458038B (en) Small data cross-domain action identification method based on double-chain deep double-current network
CN108615010A (en) Facial expression recognizing method based on the fusion of parallel convolutional neural networks characteristic pattern
CN108446729A (en) Egg embryo classification method based on convolutional neural networks
CN107463920A (en) A kind of face identification method for eliminating partial occlusion thing and influenceing
CN109543548A (en) A kind of face identification method, device and storage medium
CN109117897A (en) Image processing method, device and readable storage medium storing program for executing based on convolutional neural networks
CN109446922B (en) Real-time robust face detection method
CN110532946B (en) Method for identifying axle type of green-traffic vehicle based on convolutional neural network
CN110991349B (en) Lightweight vehicle attribute identification method based on metric learning
CN109903339B (en) Video group figure positioning detection method based on multi-dimensional fusion features
CN115393225A (en) Low-illumination image enhancement method based on multilevel feature extraction and fusion
CN112329784A (en) Correlation filtering tracking method based on space-time perception and multimodal response
CN111199245A (en) Rape pest identification method
CN113505810A (en) Pooling vision-based method for detecting weed growth cycle by using Transformer
CN109583289A (en) The gender identification method and device of crab
CN109508640A (en) A kind of crowd's sentiment analysis method, apparatus and storage medium
CN115719457A (en) Method for detecting small target in unmanned aerial vehicle scene based on deep learning
CN111209873A (en) High-precision face key point positioning method and system based on deep learning
CN114782979A (en) Training method and device for pedestrian re-recognition model, storage medium and terminal
CN114492634A (en) Fine-grained equipment image classification and identification method and system
CN111881803B (en) Face recognition method based on improved YOLOv3
CN114092799A (en) Forestry pest identification and detection method based on pooling vision Transformer

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220427

Address after: 271600 No. 574, Hekou village, Laocheng Town, Feicheng City, Tai'an City, Shandong Province

Applicant after: Wu Bin

Address before: 541004 Guangxi Normal University, 15, Yucai Road, Qixing District, Guilin, the Guangxi Zhuang Autonomous Region

Applicant before: Guangxi Normal University

GR01 Patent grant
GR01 Patent grant