CN110032925A - Gesture image segmentation and recognition method based on an improved capsule network and algorithm - Google Patents

Gesture image segmentation and recognition method based on an improved capsule network and algorithm

Info

Publication number
CN110032925A
Authority
CN
China
Prior art keywords
capsule
images
gestures
loss
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910130815.4A
Other languages
Chinese (zh)
Other versions
CN110032925B (en)
Inventor
莫伟珑
罗晓曙
赵书林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wu Bin
Original Assignee
Guangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Normal University filed Critical Guangxi Normal University
Priority to CN201910130815.4A priority Critical patent/CN110032925B/en
Publication of CN110032925A publication Critical patent/CN110032925A/en
Application granted granted Critical
Publication of CN110032925B publication Critical patent/CN110032925B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a gesture image segmentation and recognition method based on an improved capsule network and algorithm, belonging to the fields of computer vision and artificial intelligence. First, the proposed U-shaped residual capsule network removes the background from images taken against complex backgrounds and segments out the gesture. Second, image-processing operations remove noise and locate the gesture position in the binarized segmentation image. Third, the located gesture region is used as a mask to strip the background from the original image, keeping only the gesture. Finally, the gesture image is fed into an improved matrix capsule network and recognized with the improved algorithm. Compared with U-Net, the improved algorithm greatly reduces the parameter count and improves segmentation performance, thereby raising the gesture recognition rate.

Description

Gesture image segmentation and recognition method based on an improved capsule network and algorithm
Technical field
The present invention relates to the fields of computer vision and artificial intelligence, and in particular to a gesture image segmentation and recognition method based on an improved capsule network and algorithm.
Background art
At present, human-computer interaction is an important research area in artificial intelligence. To meet the needs of practical applications, research on machine-vision-based human-computer gesture communication has significant application value. Gesture interaction for products such as handheld gimbals, drone gimbals, AR (Augmented Reality) and VR (Virtual Reality) devices, as well as sign-language translation for deaf-mute people, would greatly raise the intelligence level of the related products and make daily life more convenient. Common gesture recognition techniques include interaction methods based on data gloves, and image-processing pipelines that segment the gesture with a skin-color model and then recognize it with a convolutional neural network (CNN). Most of these techniques only work against a preset, ideal background, and they ignore the fact that a CNN models spatial relationships between object parts poorly during gesture changes, which keeps the recognition rate low.
A capsule network is better suited than a CNN to segmenting and recognizing hands seen from different viewpoints. For gesture segmentation, the dynamic routing algorithm of the existing capsule network struggles to extract deeper gesture features, so training either fails or gives unsatisfactory results. For gesture recognition, the existing matrix capsule network converges slowly, and its single-scale channel limits the recognition rate. Using a CNN directly for segmentation and recognition requires a very large number of parameters, which greatly increases hardware cost.
Summary of the invention
The purpose of the present invention is to provide a gesture image segmentation and recognition method based on an improved capsule network and algorithm, in order to solve the technical problems that the existing matrix capsule network converges slowly, that a single-scale channel keeps the recognition rate low, and that directly using a CNN for segmentation and recognition requires a very large number of parameters and greatly increases hardware cost. The proposed capsule network recognizes gestures from different viewpoints well and uses fewer parameters than a CNN, so it can effectively segment and locate gestures in complex scenes and classify the gesture images.
A gesture image segmentation and recognition method based on an improved capsule network and algorithm comprises the following steps:
Step 1: capture and collect gesture images against complex backgrounds, manually annotate the gesture contour of every image to generate label maps, then apply image enhancement to the original images and label maps;
Step 2: train the U-shaped residual capsule network with the enhanced images; a gesture image taken against a complex background is fed into the trained U-shaped residual capsule network, which segments out a binarized gesture image;
Step 3: locate a rectangular bounding box around the binarized gesture image segmented in step 2, and multiply the corresponding region of the original image by the segmentation map to obtain the final segmented gesture image;
Step 4: train the improved matrix capsule network on gesture images of different hand shapes and output the trained improved matrix capsule network model; the gesture image segmented in step 3 is fed into the improved matrix capsule network model, which classifies each different gesture, realizing gesture image recognition.
The classical U-Net algorithm suffers from an excessive parameter count and poor segmentation quality when used for image segmentation. The present invention proposes a U-shaped residual capsule network segmentation model: it combines the deep residual technique with capsule networks to build a residual capsule structure module that extracts richer, deeper gesture features and accelerates model convergence, and it replaces the ordinary convolutional layers inside the U-Net algorithm, yielding the U-shaped residual capsule segmentation model. The improved algorithm greatly reduces the parameter count compared with U-Net and improves gesture segmentation quality. To address the difficulty of training deep networks when the original capsule network is used for image segmentation, an improved compression function (Squash) is proposed; the improved compression function adaptively adjusts the order of magnitude of the activation values, so that the U-shaped residual capsule segmentation model can be trained normally as a deep network and can output correct gesture segmentation images.
The proposed U-shaped residual capsule network removes the background under complex conditions and segments out the gesture image; image-processing operations then remove noise and locate the gesture position in the binarized image; third, the located gesture region is used as a mask to strip the background from the original image, keeping only the gesture; finally, the gesture image is fed into the improved matrix capsule network and recognized with the improved algorithm. Compared with U-Net, the improved algorithm greatly reduces the parameter count and improves gesture segmentation performance, thereby raising the gesture recognition rate.
Further, in step 2 the U-shaped residual capsule network is built from capsule convolutional layers and capsule residual blocks. The left part of the network extracts deep features from the image with capsule convolutional layers and capsule residual blocks; the bottom of the network uses two capsule residual blocks as the middle layers; the right part uses capsule deconvolution layers for upsampling to enlarge the feature maps, and concatenates the features extracted on the left side with the right side before extracting features again; the final output restores a gesture segmentation map at the original image size;
The capsule convolutional layer follows:
u_i|j = w_ij · u_i (1)
where the input capsule u_i is multiplied by the pose-adjustment matrix w_ij to give the prediction vector u_i|j.
The dynamic routing formulas are:
c_ij = exp(b_ij) / Σ_k exp(b_ik) (2)
s_j = Σ_i c_ij · u_i|j (3)
v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||) (4)
b_ij = b_ij + u_i|j · v_j (5)
where c_ij is the dynamic routing coupling coefficient (i.e., a probability vector), b_ij is initialized to 0, and s_j is the weighted sum of all prediction vectors with their probability vectors;
Formula (1) is substituted into the dynamic routing formulas (2)-(5), which are cycled through 3 training iterations.
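As an illustration of formulas (1)-(5), the following minimal NumPy sketch runs the three routing iterations on prediction vectors that have already been multiplied by the pose matrices of formula (1); the tensor shapes, function names and vectorization are assumptions for illustration, not the patent's implementation.

    import numpy as np

    def squash(s):
        # Formula (4): shrinks short vectors toward 0 and long vectors toward unit length.
        norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
        return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + 1e-9)

    def dynamic_routing(u_hat, num_iters=3):
        # u_hat: prediction vectors u_i|j of shape (num_in, num_out, dim).
        num_in, num_out, _ = u_hat.shape
        b = np.zeros((num_in, num_out))                           # logits b_ij, initialized to 0
        for _ in range(num_iters):                                # 3 routing iterations
            c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coefficients, formula (2)
            s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum s_j, formula (3)
            v = squash(s)                                         # activation v_j, formula (4)
            b = b + (u_hat * v[None]).sum(axis=-1)                # agreement update, formula (5)
        return v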
The capsule residual block is composed of two capsule convolutional layers: the input is first batch-normalized, then passed through the two capsule convolutional layers; the output of the second capsule convolutional layer is batch-normalized again, and the two paths are summed to give the block output.
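The block structure can be sketched as follows; capsule_conv2d is a hypothetical stand-in (an ordinary convolution here) for the patent's capsule convolutional layer, and the input is assumed to already have the same channel count as the block output so the two paths can be summed.

    import tensorflow as tf

    def capsule_conv2d(x, filters):
        # Placeholder for a capsule convolution (formula (1) plus routing); an
        # ordinary convolution is used so that this structural sketch runs.
        return tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

    def capsule_residual_block(x, filters):
        shortcut = x                                     # identity path
        y = tf.keras.layers.BatchNormalization()(x)      # batch-normalize the input first
        y = capsule_conv2d(y, filters)                   # first capsule convolutional layer
        y = capsule_conv2d(y, filters)                   # second capsule convolutional layer
        y = tf.keras.layers.BatchNormalization()(y)      # normalize the second layer's output
        return tf.keras.layers.Add()([shortcut, y])      # sum the two paths as the block output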
Further, the detailed localization process in step 3 is: first denoise the image by blurring, smoothing it with a low-pass filter with a 9*9 kernel so that each pixel is replaced by the mean of its surrounding pixels, which removes the noise of the segmentation map; then apply image erosion to remove the larger white spots; compute the contour areas of the remaining targets after erosion and derive the maximum bounding box from the contour areas; crop the original image and the binary image according to the maximum bounding box; finally merge the two images to obtain the color gesture image.
Further, in step 4 the matrix capsule network consists of an ordinary convolutional base, a main capsule layer, capsule convolutional layers and a capsule classification layer. The matrix capsule network turns each neuron vector into a pose matrix of size n*n. The last two convolutional capsule layers of the network perform three steps, convolution, pose transformation and EM dynamic routing, with the clustering realized by the EM algorithm. The E step is:
r_ij = p(j|x_i) = a_j N(x_i; μ_j, σ_j^2) / Σ_k a_k N(x_i; μ_k, σ_k^2) (6)
where x_i is an input vote vector, a_j is the Gaussian mixture coefficient of class j, N(x_i; μ_j, σ_j^2) is the Gaussian distribution of the data x_i in class j, and the denominator is the sum of the k Gaussian mixture components, giving the posterior probability p(j|x_i);
The formulas of the M step are:
r_ij = r_ij · a_i (7)
μ_j = Σ_i r_ij x_i / Σ_i r_ij (8)
σ_j^2 = Σ_i r_ij (x_i - μ_j)^2 / Σ_i r_ij (9)
The mean of class j is estimated by the sample-weighted average of formulas (7)-(8), the variance is obtained from formula (9), and the entropy cost is obtained from
cost_j = (β_μ + log σ_j) Σ_i r_ij (10)
A sample most likely belongs to class j when the entropy is small; the sigmoid function is used as the activation to compress the value between 0 and 1, i.e., the Gaussian mixture coefficient:
a_j = sigmoid(λ(β_a - cost_j)) (11)
λ is added as an annealing strategy, its value acting as the inverse of a temperature: as the number of training steps grows, the temperature drops and λ slowly increases, so the activation also increases;
Each capsule layer is assigned the parameters β_a and β_μ of formulas (10)-(11), which are trained by backpropagation, and formulas (6)-(11) are iterated 3 times to realize the dynamic routing.
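A minimal sketch of this EM routing with scalar votes (the patent routes n*n pose matrices; 1-D votes are an assumption to keep the sketch short), with an illustrative annealing schedule for λ:

    import numpy as np

    def em_routing(votes, a_in, beta_a, beta_mu, num_iters=3, lam=0.01):
        # votes: (num_in, num_out) scalar votes x_i; a_in: (num_in,) input activations.
        num_in, num_out = votes.shape
        r = np.full((num_in, num_out), 1.0 / num_out)          # uniform responsibilities
        for _ in range(num_iters):
            # M step
            r_w = r * a_in[:, None]                            # weight by input activations, formula (7)
            denom = r_w.sum(axis=0) + 1e-9
            mu = (r_w * votes).sum(axis=0) / denom             # class means, formula (8)
            var = (r_w * (votes - mu) ** 2).sum(axis=0) / denom + 1e-9  # variances, formula (9)
            cost = (beta_mu + 0.5 * np.log(var)) * denom       # entropy cost, formula (10)
            a_out = 1.0 / (1.0 + np.exp(-lam * (beta_a - cost)))  # activations, formula (11)
            # E step: posterior p(j|x_i) of formula (6)
            p = a_out * np.exp(-(votes - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
            r = p / (p.sum(axis=1, keepdims=True) + 1e-9)
            lam *= 2.0                                         # illustrative annealing: λ grows over iterations
        return mu, a_out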
Further, during training in step 2, the predicted values output by the network and the ground truth are fed into the Loss function:
Loss = log(Dice_loss) + α * Focal_loss (12)
The Loss of formula (12) combines Dice loss and Focal loss; to combine the two losses efficiently they must be scaled to the same order of magnitude, so the log amplifies the gradient of the Dice term while the added scaling factor α shrinks the Focal term;
Dice_loss = 1 - dice_coef = 1 - 2|A∩B| / (|A| + |B|) (13)
In formula (13), A and B are the label map and the prediction map output by the network respectively; the formula measures the similarity of A and B, and the value of dice_coef approaches 1 as the two become identical;
Focal_loss = -β (1 - p_t)^γ log(p_t) (14)
The Focal loss of formula (14) concentrates on hard-to-classify samples: since the background occupies most of a training image and the gesture very little, the negative-sample loss would otherwise dominate; γ takes the value 2 and β takes 0.25 to balance positive and negative samples;
By iterating the training and updating the weights of the capsule segmentation network until the Loss function converges, the capsule segmentation network model is output, and gesture segmentation is performed with the capsule segmentation model.
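A sketch of this combined loss in TensorFlow; the smoothing constant, the clipping bounds and the value of α are illustrative assumptions:

    import tensorflow as tf

    def dice_loss(y_true, y_pred, smooth=1e-6):
        # Formula (13): 1 - 2|A∩B| / (|A| + |B|), with a small smoothing term.
        inter = tf.reduce_sum(y_true * y_pred)
        dice_coef = (2.0 * inter + smooth) / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)
        return 1.0 - dice_coef

    def focal_loss(y_true, y_pred, gamma=2.0, beta=0.25, eps=1e-7):
        # Formula (14): down-weights easy samples so the dominant background
        # does not swamp the small gesture region.
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
        b_t = y_true * beta + (1.0 - y_true) * (1.0 - beta)
        return -tf.reduce_mean(b_t * (1.0 - p_t) ** gamma * tf.math.log(p_t))

    def combined_loss(y_true, y_pred, alpha=0.1):
        # Formula (12): the log amplifies the Dice term, α scales the Focal term down.
        return tf.math.log(dice_loss(y_true, y_pred) + 1e-7) + alpha * focal_loss(y_true, y_pred)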
Further, the detailed process of training the improved matrix capsule network in step 4 is:
During training, the activation vector of the improved matrix capsule network is fed into the margin loss function, given by:
L_k = T_k · max(0, m+ - ||v_k||)^2 + λ · (1 - T_k) · max(0, ||v_k|| - m-)^2 (15)
In formula (15), k indexes the k-th class; the margin loss sums the losses of all classes and takes the average; λ is a proportionality coefficient that balances the two terms; m+ and m- take the values 0.9 and 0.1 respectively. For L_k to be 0 when the k-th class is a positive sample (T_k = 1), the length ||v_k|| must exceed 0.9 before the loss error vanishes; when the k-th class is a negative sample (T_k = 0), the length ||v_k|| must be below 0.1 before the loss error vanishes. The prediction and the ground truth are fed into the loss function, and the weights are then updated.
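A sketch of the margin loss of formula (15), assuming v_norm holds ||v_k|| per class and T is the one-hot label; lam = 0.5 follows the common capsule-network convention and is an assumption here:

    import tensorflow as tf

    def margin_loss(T, v_norm, m_pos=0.9, m_neg=0.1, lam=0.5):
        # T, v_norm: tensors of shape (batch, num_classes).
        pos = T * tf.square(tf.maximum(0.0, m_pos - v_norm))                 # positive-class term
        neg = lam * (1.0 - T) * tf.square(tf.maximum(0.0, v_norm - m_neg))   # negative-class term
        return tf.reduce_mean(tf.reduce_sum(pos + neg, axis=-1))             # sum classes, average batch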
By adopting the above technical solution, the present invention achieves the following technical effects:
The overall performance of the present invention is better than common mainstream algorithms, and it is better suited to running on gimbal and drone products with tight hardware resources, since the algorithm of the invention has few parameters and saves hardware cost. A multi-scale, identity-mapping matrix capsule structure is used to improve the gesture recognition rate. The results show that applying an identity mapping between the multi-scale ordinary convolutional layers and PrimaryCapsules raises the gesture recognition rate, lowers the loss value and accelerates training. The benefit of the improved matrix capsule network is that it raises the gesture recognition rate more effectively than the original matrix capsule network algorithm; when recognizing gesture images from different angles it outperforms the classical CNN method, and it accelerates the convergence of the training loss.
Description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 shows the U-shaped residual capsule segmentation network of the present invention.
Fig. 3 shows the structure of the residual capsule block of the present invention.
Fig. 4 is the gesture localization flow chart of the present invention.
Fig. 5 shows the matrix capsule network architecture of the present invention.
Fig. 6 shows the improvement of the first two convolutional layers of the present invention.
Fig. 7 shows the gesture localization results of the present invention.
Fig. 8 shows the final results of the present invention.
Fig. 9 compares the recognition rate before and after the matrix capsule network improvement of the present invention.
Specific embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in more detail below with reference to the drawings and preferred embodiments. It should be noted, however, that the many details listed in the specification are only meant to give the reader a thorough understanding of one or more aspects of the invention, and these aspects can be realized even without these specific details.
The present invention provides a gesture image segmentation and recognition method based on an improved capsule network and algorithm, consisting mainly of the algorithm and the corresponding software. The software completes the segmentation, localization and classification of images, mainly covering video frame slicing, image enhancement, gesture image segmentation, gesture localization and gesture classification. The experimental environment of the whole improved-capsule-network gesture segmentation and recognition pipeline includes a dual E5-2637 v4 CPU server, with a GTX 1080Ti graphics card and 32 GB of memory used to accelerate training. The operating system is Ubuntu 16.04, and the platform is the TensorFlow 1.5 GPU edition of the machine learning framework developed by Google. The U-shaped residual capsule segmentation network, the gesture localization algorithm, the matrix capsule recognition network, the improved part of the matrix capsule recognition network, and the overall segmentation and recognition flow are shown in Figs. 2-6 respectively.
In Fig. 2, the U-shaped deep residual capsule segmentation network model is built mainly from capsule convolutional layers and capsule residual blocks. The left part extracts deep features from a 128*128*3 image with capsule convolutional layers and capsule residual blocks; the feature map sizes change through 128*128, 64*64 and 32*32. The bottom of the network uses two residual capsule blocks as the middle layers. The right part upsamples (enlarges the feature maps) with capsule deconvolution layers and concatenates the features extracted on the left side of the U to the right side before extracting features again; the feature map sizes change through 32*32, 64*64 and 128*128, and the final output restores a gesture segmentation map at the original image size. The whole network incorporates the residual technique so that more layers can be stacked. The capsule convolutional layer works as follows:
u_i|j = w_ij · u_i (1)
In formula (1) the input capsule u_i is multiplied by the pose-adjustment matrix w_ij to give u_i|j, which is substituted into the dynamic routing formulas (2)-(5) and cycled 3 times; the routing itself is not trained by backpropagation.
c_ij = exp(b_ij) / Σ_k exp(b_ik) (2)
s_j = Σ_i c_ij · u_i|j (3)
v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||) (4)
b_ij = b_ij + u_i|j · v_j (5)
c_ij is the dynamic routing coupling coefficient (i.e., a probability vector), b_ij is initialized to 0, and s_j is the weighted sum of all prediction vectors with their probability vectors.
During training, ||s_j||^2 in formula (4) is usually tiny (its magnitude lies between 1e-20 and 1e-42), which makes the activation v_j tiny as well; after several iterations v_j often collapses to 0 and cannot be trained. The improved compression function is formula (6): the constant 1 in the left-hand term of the denominator of formula (4) is replaced by an adaptive term, after which the order of magnitude of v_j is adjusted adaptively and the network can train normally.
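The exact adaptive term of formula (6) is not reproduced in this text; the sketch below shows one plausible form, replacing the constant 1 in the denominator with the layer-wide maximum of ||s_j||^2 so that uniformly tiny activations are no longer crushed toward zero. The specific replacement term is an assumption.

    import numpy as np

    def improved_squash(s, eps=1e-9):
        norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)     # ||s_j||^2, possibly as small as 1e-20
        scale = norm_sq / (np.max(norm_sq) + norm_sq + eps)  # assumed adaptive denominator replacing 1
        return scale * s / np.sqrt(norm_sq + eps)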
The table below compares the performance indicators of the improved u-res-cap-net with other segmentation networks.
Table 1. Comparison of the proposed u-res-cap-net algorithm with other algorithms
As can be seen from the table, the index Auc_roc, the area under the ROC curve, reaches 0.958 for the invention, and the index Auc_P-R, the area under the P-R (Precision-Recall) curve, reaches 0.936, showing that this model segments gesture images very well. The index Specific reflects the ability to identify background in the segmented image; the lower the Specific value, the more spots in the segmented image are mistaken for gesture. The index Sensitivity, like the recall rate Recall, reflects how many of the pixels in the segmented image truly belong to the gesture; the higher the Sensitivity value, the more complete the segmented gesture. The index Jacard, one measure of segmentation precision, reflects the similarity between the segmented gesture image and the label image; the higher the Jacard value, the closer to the label image. The F1 value measures Precision and Recall simultaneously, and the F1 value of the invention's algorithm compares favorably with the other algorithms. The parameter count of the invention is smaller than that of the other two algorithms, making it better suited to embedded devices with tight hardware resources.
In Fig. 3, the capsule residual block is composed of two capsule convolutional layers: the block first batch-normalizes the input, then feeds it through the two capsule convolutional layers; the output of the second capsule convolutional layer is batch-normalized again, and the two paths are summed to give the block output.
In Fig. 4, the gesture localization algorithm consists of blur denoising, image erosion for despeckling, computing the bounding box of the largest contour, cropping the original image and binary image, and merging the cropped images. The blur denoising smooths the image with a low-pass filter with a 9*9 kernel, so that each pixel is replaced by the mean of its surrounding pixels, which removes the noise of the segmentation map. Image erosion removes the larger white spots, making the localization more accurate. The contour areas of the remaining targets after erosion are computed and the maximum bounding box (bbox) is derived from these regions; the original image and the binary image are cropped according to the bbox and finally merged to obtain the color gesture image.
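A sketch of this localization pipeline with OpenCV (assuming OpenCV 4's findContours signature); the erosion kernel size and iteration count are illustrative assumptions:

    import cv2
    import numpy as np

    def locate_gesture(binary_mask, original):
        blurred = cv2.blur(binary_mask, (9, 9))              # 9*9 mean filter removes segmentation noise
        kernel = np.ones((5, 5), np.uint8)
        eroded = cv2.erode(blurred, kernel, iterations=2)    # erosion removes the larger white spots
        contours, _ = cv2.findContours(eroded, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        largest = max(contours, key=cv2.contourArea)         # keep the largest remaining contour
        x, y, w, h = cv2.boundingRect(largest)               # maximum bounding box (bbox)
        mask_crop = binary_mask[y:y + h, x:x + w]
        orig_crop = original[y:y + h, x:x + w]
        return cv2.bitwise_and(orig_crop, orig_crop, mask=mask_crop)  # merged color gesture image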
In Fig. 5, gesture classification uses the matrix capsule network model, which consists of an ordinary convolutional base, a main capsule layer (PrimaryCaps), capsule convolutional layers and a capsule classification layer. Each neuron of a convolutional neural network outputs a scalar; a capsule network lets each neuron output a vector, which retains more image characteristics such as orientation, pose, thickness, position and size. The matrix capsule network goes further and turns each neuron vector into a pose matrix of size n*n; when performing pose transformations, matrix operations save much computation compared with vector-style capsule operations.
The last two convolutional capsule layers (ConvCaps) of the matrix capsule network successively perform convolution, pose transformation and EM (Expectation-Maximization) dynamic routing. The convolution extracts high-level features and gives the tensor the correct dimensional space. The pose transformation lets the network tolerate small changes of viewpoint: each capsule is multiplied by a transformation matrix W to obtain a vote matrix, so that an image rotated by some angle can still be recognized. EM dynamic routing is then applied to all vote matrices, clustering them into as many clusters as there are classes. The clustering in the GMM (Gaussian Mixture Model) is realized with the EM algorithm, whose E step is shown in formula (7); the process clusters the vectors into k Gaussian distributions:
r_ij = p(j|x_i) = a_j N(x_i; μ_j, σ_j^2) / Σ_k a_k N(x_i; μ_k, σ_k^2) (7)
In the formula, x_i is an input vote vector, a_j is the Gaussian mixture coefficient of class j, N(x_i; μ_j, σ_j^2) is the Gaussian distribution of the data x_i in class j, and the denominator is the sum of the k Gaussian mixture components, giving the posterior probability p(j|x_i).
The M step comprises formulas (8)-(12): the mean of class j is estimated by the sample-weighted average of formulas (8)-(9), and the variance is obtained from formula (10):
r_ij = r_ij · a_i (8)
μ_j = Σ_i r_ij x_i / Σ_i r_ij (9)
σ_j^2 = Σ_i r_ij (x_i - μ_j)^2 / Σ_i r_ij (10)
The entropy cost is obtained from formula (11); a sample most likely belongs to class j when the entropy is small, and formula (12) uses the sigmoid function as the activation to compress the value between 0 and 1, i.e., the Gaussian mixture coefficient. λ in formula (12) is added as an annealing strategy, its value acting as the inverse of a temperature: as the number of training steps grows, the temperature drops and λ slowly increases, so the activation also slowly increases.
cost_j = (β_μ + log σ_j) Σ_i r_ij (11)
a_j = sigmoid(λ(β_a - cost_j)) (12)
Each capsule layer is assigned the parameters β_a and β_μ of formulas (11)-(12), which are trained by backpropagation. Formulas (7)-(12) are iterated 3 times to realize the dynamic routing.
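The pose-transformation step that produces the vote matrices routed above can be sketched as a batched matrix product; the shapes below are assumptions for illustration:

    import numpy as np

    def compute_votes(poses, W):
        # poses: (num_in, n, n) input pose matrices; W: (num_in, num_out, n, n)
        # learned transformation matrices. Returns votes of shape (num_in, num_out, n, n).
        return np.einsum('iab,ijbc->ijac', poses, W)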
Fig. 6 shows the improved structure of the first two layers in front of the matrix capsule network. To let the last two convolutional capsule layers obtain rich high-level gesture features, the first two layers (the ordinary convolutional layer and PrimaryCapsules) are improved with multi-scale convolution and identity mapping. A single-scale channel leaves many features unextracted, which blunts the voting of the last two capsule convolutional layers. Improvement 1 turns the original 5*5 convolution kernel of the method into a multi-scale convolution: the first branch adds a 2*2 pooling layer and a 2*2 convolution kernel, the second branch adds two 3*3 convolution kernels, the third branch keeps the original large 5*5 kernel, and the fourth branch uses a 1*1 kernel; finally the different channels are concatenated to obtain different low-level features, as sketched below. Because dynamic routing makes the training loss converge slowly, improvement 2 fuses the input and output features of the PrimaryCaps layer, strengthening the information flow and accelerating convergence.
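A structural sketch of improvement 1 with Keras-style layers; the channel counts and activations are illustrative assumptions:

    import tensorflow as tf

    def multi_scale_block(x, ch=32):
        b1 = tf.keras.layers.MaxPool2D(2, strides=1, padding="same")(x)            # 2*2 pooling branch
        b1 = tf.keras.layers.Conv2D(ch, 2, padding="same", activation="relu")(b1)  # plus a 2*2 kernel
        b2 = tf.keras.layers.Conv2D(ch, 3, padding="same", activation="relu")(x)
        b2 = tf.keras.layers.Conv2D(ch, 3, padding="same", activation="relu")(b2)  # two stacked 3*3 kernels
        b3 = tf.keras.layers.Conv2D(ch, 5, padding="same", activation="relu")(x)   # original 5*5 kernel
        b4 = tf.keras.layers.Conv2D(ch, 1, padding="same", activation="relu")(x)   # 1*1 kernel branch
        return tf.keras.layers.Concatenate()([b1, b2, b3, b4])  # splice the multi-scale channels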
Fig. 1 is the flow chart of the capsule-network gesture segmentation and recognition method; the detailed process and treatment are as follows:
Step 1: capture a large number of images of different indoor and outdoor scenes with a camera, photographing 22 kinds of gestures; each kind of gesture is shot from different angles, 500 images per gesture.
Step 2: annotate the gesture contour of each image with labeling software, finally producing pairs of original images and binarized mask maps.
Step 3: resize the annotated images to 128 × 128, then convert the images into TFRecord structured data files so that large amounts of image data can be read efficiently during training, as in the sketch below.
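A minimal sketch of the TFRecord serialization, written against the modern tf.io API (TensorFlow 1.5 itself exposed this as tf.python_io); the feature keys are assumptions:

    import tensorflow as tf

    def write_tfrecord(pairs, path):
        # pairs: iterable of (image_bytes, mask_bytes), e.g. PNG-encoded 128x128 images.
        with tf.io.TFRecordWriter(path) as writer:
            for img, mask in pairs:
                ex = tf.train.Example(features=tf.train.Features(feature={
                    "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[img])),
                    "mask": tf.train.Feature(bytes_list=tf.train.BytesList(value=[mask])),
                }))
                writer.write(ex.SerializeToString())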
Step 4: apply image enhancement such as random brightness adjustment, random flipping, random scaling and random cropping to the training images, and feed them into the U-shaped segmentation capsule network for training.
Step 5: feed the predicted values output by the network and the ground truth into the Loss function:
Loss = log(Dice_loss) + α * Focal_loss (13)
The Loss of formula (13) combines Dice loss and Focal loss; combining several losses can effectively improve training. To combine the two losses efficiently they must be scaled to the same order of magnitude, so the log amplifies the gradient of the Dice term while the added scaling factor α shrinks the Focal term.
Dice_loss = 1 - dice_coef = 1 - 2|A∩B| / (|A| + |B|) (14)
In formula (14), A and B are the label map and the prediction map output by the network respectively; the formula measures the similarity of A and B, and the value of dice_coef approaches 1 as the two become identical.
Focal_loss = -β (1 - p_t)^γ log(p_t) (15)
The Focal loss of formula (15) concentrates on hard-to-classify samples: since the background occupies most of a training image and the gesture very little, the negative-sample loss would otherwise dominate. γ usually takes the value 2 and β takes 0.25 to balance positive and negative samples.
By iterating the training and updating the weights of the capsule segmentation network until the Loss function converges, the capsule segmentation network model is output, and gesture segmentation is performed with the capsule segmentation model.
Step 6: test the performance indicators of the capsule segmentation network on the test set; the average F1 score is 0.933. Although the indicators compare well, the capsule segmentation model cannot segment perfectly and some noise and spots remain. To ensure that the segmented gesture image can serve as input to the matrix capsule gesture recognition model, a localization algorithm is used: it blurs and erodes the binarized segmented gesture image to guarantee that spots and noise are removed and not mistaken for gesture. After blurring and eroding, the contour of the remaining target gesture region is found, and the contour area and bounding box position are computed. The original image and the binarized gesture map are cropped according to this position, and the two cropped images are finally merged to produce the background-free gesture image.
Step 7: the matrix capsule network model consists of an ordinary convolutional layer, a main capsule layer, capsule convolutional layers and a capsule classification layer. The enhanced image is resized to 28*28 and fed into the ordinary convolutional layer to obtain various low-level features, which are then passed to the main capsule layer and formed into matrix capsules. The capsule convolutional layers successively perform convolution, pose transformation and EM (Expectation-Maximization) dynamic routing. The convolution extracts high-level features and adjusts the tensor to the correct dimensional space. The pose transformation lets the network tolerate small changes of viewpoint: each capsule is multiplied by a transformation matrix W to obtain a vote matrix, so that an image rotated by some angle can still be recognized. EM dynamic routing is applied to all vote matrices, clustering them into as many clusters as there are classes. The final result of the capsule convolutional layers is fed into the capsule classification layer, whose final outputs are the pose matrices and the activation vector.
Step 8: feed the activation vector into the margin loss function, given by:
L_k = T_k · max(0, m+ - ||v_k||)^2 + λ · (1 - T_k) · max(0, ||v_k|| - m-)^2 (16)
In formula (16), k indexes the k-th class; the margin loss sums the losses of all classes and takes the average. λ is a proportionality coefficient that balances the two terms. m+ and m- take the values 0.9 and 0.1 respectively. For L_k to be 0 when the k-th class is a positive sample (i.e., T_k is 1), the length ||v_k|| must exceed 0.9 before the loss error vanishes; when the k-th class is a negative sample (i.e., T_k is 0), the length ||v_k|| must be below 0.1 before the loss error vanishes. The prediction and the ground truth are fed into the loss function, and the weights are then updated.
Step 9: feed the gesture image output in step 6 into the trained matrix capsule network model and classify the gesture, completing the algorithm flow of the whole capsule-network gesture image classification method.
Fig. 7 shows the gesture localization results: in order, the segmented original image, blurring, erosion, the maximum bounding box bbox, and the final segmentation result.
Fig. 8 shows the overall result: the open-palm gesture is finally predicted as the digit 5.
Fig. 9 shows tests of the matrix capsule network on gesture data shot from different viewpoints. The three curves compare the recognition rate before and after the improvement and against a traditional CNN with the same number of layers. The figure shows that the present invention performs better: the algorithm of the invention improves the gesture recognition rate by 3-4% over the original algorithm, and the traditional CNN's recognition rate on gesture images shot from different viewpoints falls short of the capsule network.
The above is only a preferred embodiment of the present invention. It should be noted that, for those of ordinary skill in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A gesture image segmentation and recognition method based on an improved capsule network and algorithm, characterized in that the method comprises the following steps:
Step 1: capture and collect gesture images against complex backgrounds, manually annotate the gesture contour of every image to generate label maps, then apply image enhancement to the original images and label maps;
Step 2: train the U-shaped residual capsule network with the enhanced images; a gesture image taken against a complex background is fed into the trained U-shaped residual capsule network, which segments out a binarized gesture image;
Step 3: locate a rectangular bounding box around the binarized gesture image segmented in step 2, and multiply the corresponding region of the original image by the segmentation map to obtain the final segmented gesture image;
Step 4: train the improved matrix capsule network on gesture images of different hand shapes and output the trained improved matrix capsule network model; the gesture image segmented in step 3 is fed into the improved matrix capsule network model, which classifies each different gesture, realizing gesture image recognition.
2. The gesture image segmentation and recognition method based on an improved capsule network and algorithm according to claim 1, characterized in that: in step 2 the U-shaped residual capsule network is built from capsule convolutional layers and capsule residual blocks; the left part of the network extracts deep features from the image with capsule convolutional layers and capsule residual blocks; the bottom of the network uses two capsule residual blocks as the middle layers; the right part uses capsule deconvolution layers for upsampling to enlarge the feature maps, and concatenates the features extracted on the left side with the right side before extracting features again; the final output restores a gesture segmentation map at the original image size;
The capsule convolutional layer follows:
u_i|j = w_ij · u_i (1)
where the input capsule u_i is multiplied by the pose-adjustment matrix w_ij to give u_i|j;
The dynamic routing formulas are:
c_ij = exp(b_ij) / Σ_k exp(b_ik) (2)
s_j = Σ_i c_ij · u_i|j (3)
v_j = (||s_j||^2 / (1 + ||s_j||^2)) · (s_j / ||s_j||) (4)
b_ij = b_ij + u_i|j · v_j (5)
where c_ij is the dynamic routing coupling coefficient (i.e., a probability vector), b_ij is initialized to 0, and s_j is the weighted sum of all prediction vectors with their probability vectors;
Formula (1) is substituted into the dynamic routing formulas (2)-(5), which are cycled through 3 training iterations;
The capsule residual block is composed of two capsule convolutional layers: the input is first batch-normalized, then passed through the two capsule convolutional layers; the output of the second capsule convolutional layer is batch-normalized again, and the two paths are summed to give the block output.
3. The gesture image segmentation and recognition method based on an improved capsule network and algorithm according to claim 1, characterized in that: the detailed localization process in step 3 is to first denoise the image by blurring, smoothing it with a low-pass filter with a 9*9 kernel so that each pixel is replaced by the mean of its surrounding pixels, removing the noise of the segmentation map; then apply image erosion to remove the larger white spots; compute the contour areas of the remaining targets after erosion and derive the maximum bounding box from the contour areas; crop the original image and the binary image according to the maximum bounding box; finally merge the two images to obtain the color gesture image.
4. The gesture image segmentation and recognition method based on an improved capsule network and algorithm according to claim 1, characterized in that: in step 4 the matrix capsule network consists of an ordinary convolutional base, a main capsule layer, capsule convolutional layers and a capsule classification layer; the matrix capsule network turns each neuron vector into a pose matrix of size n*n; the last two convolutional capsule layers of the network perform three steps, convolution, pose transformation and EM dynamic routing, with the clustering realized by the EM algorithm; the E step is:
r_ij = p(j|x_i) = a_j N(x_i; μ_j, σ_j^2) / Σ_k a_k N(x_i; μ_k, σ_k^2) (6)
where x_i is an input vote vector, a_j is the Gaussian mixture coefficient of class j, N(x_i; μ_j, σ_j^2) is the Gaussian distribution of the data x_i in class j, and the denominator is the sum of the k Gaussian mixture components, giving the posterior probability p(j|x_i);
The formulas of the M step are:
r_ij = r_ij · a_i (7)
μ_j = Σ_i r_ij x_i / Σ_i r_ij (8)
σ_j^2 = Σ_i r_ij (x_i - μ_j)^2 / Σ_i r_ij (9)
The mean of class j is estimated by the sample-weighted average of formulas (7)-(8), the variance is obtained from formula (9), and the entropy cost is obtained from
cost_j = (β_μ + log σ_j) Σ_i r_ij (10)
A sample belongs to class j when the entropy is small; the sigmoid function is used as the activation to compress the value between 0 and 1, i.e., the Gaussian mixture coefficient:
a_j = sigmoid(λ(β_a - cost_j)) (11)
λ is added as an annealing strategy, its value acting as the inverse of a temperature: as the number of training steps grows, the temperature drops and λ slowly increases, so the activation also increases;
Each capsule layer is assigned the parameters β_a and β_μ of formulas (10)-(11), which are trained by backpropagation, and formulas (6)-(11) are iterated 3 times to realize the dynamic routing.
5. The gesture image segmentation and recognition method based on an improved capsule network and algorithm according to claim 1, characterized in that: during training in step 2, the predicted values output by the trained network and the ground truth are fed into the Loss function:
Loss = log(Dice_loss) + α * Focal_loss (12)
The Loss of formula (12) combines Dice loss and Focal loss; to combine the two losses efficiently they are scaled to the same order of magnitude, with the log amplifying the gradient of the Dice term while the added scaling factor α shrinks the Focal term;
Dice_loss = 1 - dice_coef = 1 - 2|A∩B| / (|A| + |B|) (13)
In formula (13), A and B are the label map and the prediction map output by the network respectively; the formula measures the similarity of A and B, and the value of dice_coef approaches 1 as the two become identical;
Focal_loss = -β (1 - p_t)^γ log(p_t) (14)
The Focal loss of formula (14) concentrates on hard-to-classify samples: since the background occupies most of a training image and the gesture very little, the negative-sample loss would otherwise dominate; γ takes the value 2 and β takes 0.25 to balance positive and negative samples;
By iterating the training and updating the weights of the capsule segmentation network until the Loss function converges, the capsule segmentation network model is output, and gesture segmentation is performed with the capsule segmentation model.
6. The gesture image segmentation and recognition method based on an improved capsule network and algorithm according to claim 1, characterized in that the detailed process of training the improved matrix capsule network in step 4 is:
during training, the activation vector of the improved matrix capsule network is fed into the margin loss function, given by:
L_k = T_k · max(0, m+ - ||v_k||)^2 + λ · (1 - T_k) · max(0, ||v_k|| - m-)^2 (15)
In formula (15), k indexes the k-th class; the margin loss sums the losses of all classes and takes the average; λ is a proportionality coefficient that balances the two terms; m+ and m- take the values 0.9 and 0.1 respectively. For L_k to be 0 when the k-th class is a positive sample (T_k = 1), the length ||v_k|| must exceed 0.9 before the loss error vanishes; when the k-th class is a negative sample (T_k = 0), the length ||v_k|| must be below 0.1 before the loss error vanishes. The prediction and the ground truth are fed into the loss function, and the weights are then updated.
CN201910130815.4A 2019-02-22 2019-02-22 Gesture image segmentation and recognition method based on improved capsule network and algorithm Active CN110032925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910130815.4A CN110032925B (en) 2019-02-22 2019-02-22 Gesture image segmentation and recognition method based on improved capsule network and algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910130815.4A CN110032925B (en) 2019-02-22 2019-02-22 Gesture image segmentation and recognition method based on improved capsule network and algorithm

Publications (2)

Publication Number Publication Date
CN110032925A true CN110032925A (en) 2019-07-19
CN110032925B CN110032925B (en) 2022-05-17

Family

ID=67234970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910130815.4A Active CN110032925B (en) 2019-02-22 2019-02-22 Gesture image segmentation and recognition method based on improved capsule network and algorithm

Country Status (1)

Country Link
CN (1) CN110032925B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414402A (en) * 2019-07-22 2019-11-05 北京达佳互联信息技术有限公司 A kind of gesture data mask method, device, electronic equipment and storage medium
CN110569781A (en) * 2019-09-05 2019-12-13 河海大学常州校区 time sequence classification method based on improved capsule network
CN110991563A (en) * 2019-12-23 2020-04-10 青岛大学 Capsule network random routing algorithm based on feature fusion
CN111709446A (en) * 2020-05-14 2020-09-25 天津大学 X-ray chest radiography classification device based on improved dense connection network
CN112232261A (en) * 2020-10-27 2021-01-15 上海眼控科技股份有限公司 Method and device for fusing image sequences
CN112487981A (en) * 2020-11-30 2021-03-12 哈尔滨工程大学 MA-YOLO dynamic gesture rapid recognition method based on two-way segmentation
CN113011243A (en) * 2021-01-13 2021-06-22 苏州元启创人工智能科技有限公司 Facial expression analysis method based on capsule network
CN113112484A (en) * 2021-04-19 2021-07-13 山东省人工智能研究院 Ventricular image segmentation method based on feature compression and noise suppression
CN114241245A (en) * 2021-12-23 2022-03-25 西南大学 Image classification system based on residual error capsule neural network
CN116304842A (en) * 2023-05-18 2023-06-23 南京信息工程大学 Capsule network text classification method based on CFC structure improvement

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2923991A1 (en) * 2013-10-11 2015-04-16 Mauna Kea Technologies Method for characterizing images acquired through a video medical device
US20170235987A1 (en) * 2016-01-14 2017-08-17 Aaron Hirschmann Systems and Methods for Labeling, Identifying, and Tracking Data Related to Consumable Product
CN108182438A (en) * 2018-01-17 2018-06-19 清华大学 Figure binary feature learning method and device based on deeply study
CN108629288A (en) * 2018-04-09 2018-10-09 华中科技大学 A kind of gesture identification model training method, gesture identification method and system
CN108830826A (en) * 2018-04-28 2018-11-16 四川大学 A kind of system and method detecting Lung neoplasm
CN108898577A (en) * 2018-05-24 2018-11-27 西南大学 Based on the good malign lung nodules identification device and method for improving capsule network
CN108985316A (en) * 2018-05-24 2018-12-11 西南大学 A kind of capsule network image classification recognition methods improving reconstructed network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2923991A1 (en) * 2013-10-11 2015-04-16 Mauna Kea Technologies Method for characterizing images acquired through a video medical device
US20170235987A1 (en) * 2016-01-14 2017-08-17 Aaron Hirschmann Systems and Methods for Labeling, Identifying, and Tracking Data Related to Consumable Product
CN108182438A (en) * 2018-01-17 2018-06-19 清华大学 Figure binary feature learning method and device based on deeply study
CN108629288A (en) * 2018-04-09 2018-10-09 华中科技大学 A kind of gesture identification model training method, gesture identification method and system
CN108830826A (en) * 2018-04-28 2018-11-16 四川大学 A kind of system and method detecting Lung neoplasm
CN108898577A (en) * 2018-05-24 2018-11-27 西南大学 Based on the good malign lung nodules identification device and method for improving capsule network
CN108985316A (en) * 2018-05-24 2018-12-11 西南大学 A kind of capsule network image classification recognition methods improving reconstructed network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LOUIS THIBON: "Resolution enhancement in laser scanning microscopy with deconvolution switching laser modes (D-SLAM)", OPTICS EXPRESS *
ZHANG Jin: "Research on pulmonary nodule recognition and detection based on deep learning", China Masters' Theses Full-text Database, Information Science and Technology Series *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414402B (en) * 2019-07-22 2022-03-25 北京达佳互联信息技术有限公司 Gesture data labeling method and device, electronic equipment and storage medium
CN110414402A (en) * 2019-07-22 2019-11-05 北京达佳互联信息技术有限公司 A kind of gesture data mask method, device, electronic equipment and storage medium
CN110569781A (en) * 2019-09-05 2019-12-13 河海大学常州校区 time sequence classification method based on improved capsule network
CN110569781B (en) * 2019-09-05 2022-09-09 河海大学常州校区 Time sequence classification method based on improved capsule network
CN110991563A (en) * 2019-12-23 2020-04-10 青岛大学 Capsule network random routing algorithm based on feature fusion
CN110991563B (en) * 2019-12-23 2023-04-18 青岛大学 Capsule network random routing method based on feature fusion
CN111709446A (en) * 2020-05-14 2020-09-25 天津大学 X-ray chest radiography classification device based on improved dense connection network
CN112232261A (en) * 2020-10-27 2021-01-15 上海眼控科技股份有限公司 Method and device for fusing image sequences
CN112487981A (en) * 2020-11-30 2021-03-12 哈尔滨工程大学 MA-YOLO dynamic gesture rapid recognition method based on two-way segmentation
CN113011243A (en) * 2021-01-13 2021-06-22 苏州元启创人工智能科技有限公司 Facial expression analysis method based on capsule network
CN113112484B (en) * 2021-04-19 2021-12-31 山东省人工智能研究院 Ventricular image segmentation method based on feature compression and noise suppression
CN113112484A (en) * 2021-04-19 2021-07-13 山东省人工智能研究院 Ventricular image segmentation method based on feature compression and noise suppression
CN114241245A (en) * 2021-12-23 2022-03-25 西南大学 Image classification system based on residual error capsule neural network
CN114241245B (en) * 2021-12-23 2024-05-31 西南大学 Image classification system based on residual capsule neural network
CN116304842A (en) * 2023-05-18 2023-06-23 南京信息工程大学 Capsule network text classification method based on CFC structure improvement

Also Published As

Publication number Publication date
CN110032925B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
CN110032925A (en) A kind of images of gestures segmentation and recognition methods based on improvement capsule network and algorithm
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN106599854B (en) Automatic facial expression recognition method based on multi-feature fusion
CN110458038B (en) Small data cross-domain action identification method based on double-chain deep double-current network
CN108615010A (en) Facial expression recognizing method based on the fusion of parallel convolutional neural networks characteristic pattern
CN108446729A (en) Egg embryo classification method based on convolutional neural networks
CN107463920A (en) A kind of face identification method for eliminating partial occlusion thing and influenceing
CN109543548A (en) A kind of face identification method, device and storage medium
CN109117897A (en) Image processing method, device and readable storage medium storing program for executing based on convolutional neural networks
CN109446922B (en) Real-time robust face detection method
CN110532946B (en) Method for identifying axle type of green-traffic vehicle based on convolutional neural network
CN110991349B (en) Lightweight vehicle attribute identification method based on metric learning
CN109903339B (en) Video group figure positioning detection method based on multi-dimensional fusion features
CN115393225A (en) Low-illumination image enhancement method based on multilevel feature extraction and fusion
CN112329784A (en) Correlation filtering tracking method based on space-time perception and multimodal response
CN111199245A (en) Rape pest identification method
CN113505810A (en) Pooling vision-based method for detecting weed growth cycle by using Transformer
CN109583289A (en) The gender identification method and device of crab
CN109508640A (en) A kind of crowd's sentiment analysis method, apparatus and storage medium
CN115719457A (en) Method for detecting small target in unmanned aerial vehicle scene based on deep learning
CN111209873A (en) High-precision face key point positioning method and system based on deep learning
CN114782979A (en) Training method and device for pedestrian re-recognition model, storage medium and terminal
CN114492634A (en) Fine-grained equipment image classification and identification method and system
CN111881803B (en) Face recognition method based on improved YOLOv3
CN114092799A (en) Forestry pest identification and detection method based on pooling vision Transformer

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220427

Address after: 271600 No. 574, Hekou village, Laocheng Town, Feicheng City, Tai'an City, Shandong Province

Applicant after: Wu Bin

Address before: 541004 Guangxi Normal University, 15, Yucai Road, Qixing District, Guilin, the Guangxi Zhuang Autonomous Region

Applicant before: Guangxi Normal University

GR01 Patent grant
GR01 Patent grant