CN104834922A - Hybrid neural network-based gesture recognition method - Google Patents

Hybrid neural network-based gesture recognition method Download PDF

Info

Publication number
CN104834922A
Authority
CN
China
Prior art keywords
gesture
neural network
pixel
point
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510280013.3A
Other languages
Chinese (zh)
Other versions
CN104834922B (en)
Inventor
纪禄平
尹力
周龙
王强
卢鑫
黄青君
杨洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201510280013.3A priority Critical patent/CN104834922B/en
Publication of CN104834922A publication Critical patent/CN104834922A/en
Application granted granted Critical
Publication of CN104834922B publication Critical patent/CN104834922B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a gesture recognition method based on a hybrid neural network. For both the gesture image to be recognized and the gesture image training samples, a pulse-coupled neural network first detects noise points, which are then processed by a composite denoising algorithm. A cellular neural network next extracts the edge points of the gesture image; connected regions are built from these edge points, and curvature-based fingertip detection is performed on each connected region to obtain candidate fingertip points. Interference from the face region is eliminated to isolate the gesture region, which is then segmented according to gesture shape features. Fourier descriptors that retain phase information are computed from the contour points of the segmented gesture region, and the first several descriptors are selected as the gesture features. A BP neural network is trained on the gesture features of the training samples, and the features of the image to be recognized are fed into the trained network for recognition. By combining several kinds of neural networks, the method improves the accuracy of gesture recognition.

Description

Gesture recognition method based on a hybrid neural network
Technical field
The invention belongs to the technical field of hand gesture recognition and, more specifically, relates to a gesture recognition method based on a hybrid neural network.
Background technology
With the rapid progress of computer technology, human-computer interaction has become increasingly common in everyday life. Human-Computer Interaction (HCI) technology refers to the process by which a user and a computer exchange information through some mode of operation. Its development has roughly passed through a purely manual stage, a command-language stage, and a graphical-user-interface stage; with the recent development of artificial intelligence and related technologies, HCI has again attracted growing attention.
As computer applications keep expanding, existing interaction modes can no longer satisfy people's higher-level demands, and a more concise and friendly mode of interaction is urgently needed. The ultimate goal of HCI is natural communication between humans and computers. In daily life most interpersonal information is conveyed through body language or facial expressions, and only a small part through natural language, which shows that body language has a clear advantage in expressing emotion and intent. Since the hand plays a particularly important role in body language, gesture-based interaction and gesture behavior recognition systems, i.e. gesture recognition systems, are receiving more and more attention.
In general, a gesture recognition system consists of the following components: gesture preprocessing, gesture segmentation, gesture modeling, gesture feature extraction, and gesture recognition. Gesture preprocessing is mainly the denoising of the gesture image; common denoising algorithms include mean filtering, median filtering, low-pass spatial filtering, frequency-domain low-pass filtering, and pulse-coupled neural networks, but none of them removes noise well when several kinds of noise are present at once, so designing a good denoising algorithm is essential for the later recognition stages. Common gesture segmentation methods are based on skin color, on motion, or on edge information; skin-color methods are easily disturbed by background information, and edge-based methods alone do not segment well, so designing an effective segmentation algorithm is equally important. For feature extraction, the most widely used approach is based on Fourier descriptors, but their rotation invariance means the features change little after a gesture is rotated, so designing a Fourier descriptor without rotation invariance is also vital. For recognition, common methods include template matching, support vector machines, neural networks, and hidden Markov models, so choosing a good recognition method is likewise crucial for a gesture recognition system.
A neural network method simulates neurons of the human brain with simple processing units and connects them into a network in some way to model the brain. Neural networks typically offer parallel computation, distributed storage, robustness, nonlinear processing, and good adaptivity and fault tolerance, so they can be applied in many scenarios, such as gesture recognition, image segmentation, and noise processing.
Neural networks are finding more and more applications in the field of gesture behavior recognition. However, their application has so far been limited mostly to the recognition stage itself, with little use in the other stages of the pipeline.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a gesture recognition method based on a hybrid neural network: a pulse-coupled neural network improves the denoising of the gesture image, a cellular neural network performs gesture segmentation, a Fourier descriptor with rotation variability serves as the gesture feature, and a BP neural network performs the recognition, thereby improving the accuracy of gesture recognition.
To achieve the above object, the gesture recognition method based on a hybrid neural network according to the present invention comprises the following steps:
S1: extract features from the gesture image to be recognized and from the gesture image training samples. The concrete steps are:
S1.1: build a pulse-coupled neural network model of the gesture grayscale image, taking the gray value of each pixel of the current gesture grayscale image as the input of the corresponding neuron, and use the firing behavior of the pulse-coupled neural network to examine the pixels of the gesture image: if a pixel's output state is the firing state, set the corresponding element of the detection result matrix to 1, otherwise to 0. Traverse the elements of the detection result matrix; for each element with value 1, center a noise-processing window on it (the window size is set according to the actual situation) and count the values of the other elements in the window; if the number of elements with value 0 exceeds a preset threshold, the center point is a noise point, otherwise it is not.
Compute two noise estimates H(i,j) and V(i,j) for each noise point as follows:
H(i,j) = |a(i,j) − b(i,j)|
where a(i,j) is the gray value at pixel (i,j) of the image and b(i,j) is the median output gray value after median-filtering the pixel;
V(i,j) = (|m1(i,j) − a(i,j)| + |m2(i,j) − a(i,j)|) / 2
where m1(i,j) and m2(i,j) are the gray values of the two points in the neighborhood of pixel (i,j) whose gray values are closest to a(i,j).
If H(i,j) ≥ T1 and V(i,j) ≥ T2, process the noise point with median filtering; otherwise process it with mean filtering;
S1.2: perform histogram equalization on the gesture grayscale image denoised in step S1.1;
S1.3: build a cellular neural network model of the gesture grayscale image, taking the gray value of each pixel (i,j) of the equalized gesture grayscale image as the input u_ij of the corresponding cell; iterate the state-transition equation until the whole network converges, obtaining the output y_ij(t) of each cell. Traverse the output values of the cells corresponding to each pixel: when a pixel's output value lies in [0, 1], the pixel is not an edge pixel if the sum of the other pixel values in its neighborhood exceeds a preset threshold, and is an edge pixel otherwise; when the output value lies in [−1, 0), the pixel is not an edge pixel;
S1.4: build connected regions from the edge pixels obtained in step S1.3, extract the contour of each connected region, and perform fingertip detection on each connected region as follows:
Traverse each contour pixel of the connected region, taking it as the reference point with coordinates p(p_x, p_y, 0). Preset a distance constant L; take the point p1(p1_x, p1_y, 0) that lies L points before p along the contour and the point p2(p2_x, p2_y, 0) that lies L points after p, and compute the cosine cos α of the angle between the vectors pp1 and pp2. If cos α exceeds a preset curvature threshold T, the point is a candidate fingertip, otherwise it is not.
Determine the sign the vector product must have at a fingertip from the traversal direction: when the overall contour of the gesture region is traversed clockwise the sign is negative, otherwise positive. For each candidate fingertip compute the vector product of pp1 and pp2; if its sign matches the fingertip sign, keep the candidate, otherwise discard it.
Among all candidate fingertips detected in the connected region, check whether the difference between the largest and the smallest y coordinate exceeds half the face height; if so, the connected region is not a gesture region, otherwise it is a candidate gesture region. Then check, for each candidate gesture region, whether the number of candidate fingertips exceeds a preset quantity threshold; if so, the connected region is a gesture region, otherwise it is not.
Determine the principal direction of the gesture region and segment the gesture region along the principal direction according to a gesture length-to-width ratio of 2, obtaining the segmented gesture region;
S1.5: for the gesture region obtained after the segmentation of step S1.4, represent the coordinates of each contour point in complex form, assemble all contour points into a discrete sequence of length n, and apply the Fourier transform to obtain the n Fourier coefficients z(k), k = 0, 1, …, n−1; from these compute the phase-preserving Fourier descriptors S[k'], k' = 1, 2, …, n−1, whose normalized form (equation (13) of the description; the typeset formula is not reproduced in this text) involves the angle between the principal direction of the gesture region and the x-axis.
Select the first Q Fourier descriptors to form the feature vector;
S2: input the feature vectors of the training gesture images into a BP neural network as training samples, with the corresponding gesture image class as the network output, and train the BP neural network;
S3: input the feature vector of the gesture image to be recognized into the BP neural network trained in step S2 and output the recognized gesture image class.
In the gesture recognition method based on a hybrid neural network according to the present invention, for the gesture image to be recognized and the gesture image training samples, a pulse-coupled neural network first distinguishes noise points from edge points, and a composite denoising algorithm then processes the noise points. A cellular neural network extracts the edge points of the gesture image, connected regions are built from these edge points, curvature-based fingertip detection on each connected region yields candidate fingertip points, interference from the face is eliminated to obtain the gesture region, and the region is segmented according to gesture shape features. Phase-preserving Fourier descriptors are computed from the contour points of the segmented gesture region, and the first several descriptors are taken as the gesture features. A BP neural network is trained on the features of the training samples, and the features of the image to be recognized are fed into it for recognition.
The present invention has the following beneficial effects:
(1) a pulse-coupled neural network distinguishes noise points from edge points, and a composite denoising algorithm denoises the gesture image, improving the denoising effect;
(2) gesture segmentation combines coarse segmentation by a cellular neural network with fine segmentation based on gesture shape features, improving the accuracy of gesture segmentation;
(3) the gesture features are Fourier descriptors that retain phase information, improving the recognition rate.
Brief description of the drawings
Fig. 1 is a flowchart of the gesture recognition method based on a hybrid neural network according to the present invention;
Fig. 2 is a flowchart of gesture image feature extraction in the present invention;
Fig. 3 is a flowchart of fine gesture segmentation combined with gesture shape features;
Fig. 4 is a schematic diagram of fingertip detection in the present invention;
Fig. 5 is an example of coarse gesture segmentation;
Fig. 6 is an example of fine gesture segmentation.
Detailed description
Specific embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that in the following description, detailed descriptions of known functions and designs are omitted where they would dilute the main content of the invention.
Embodiment
Fig. 1 is a flowchart of the gesture recognition method based on a hybrid neural network according to the present invention. As shown in Fig. 1, the method comprises the following steps:
S101: extract features from the sample to be recognized and from the training samples:
Features must first be extracted from the gesture image to be recognized and from the gesture image training samples. Fig. 2 is a flowchart of gesture image feature extraction in the present invention. As shown in Fig. 2, feature extraction comprises the following steps:
S201: gesture image denoising preprocessing:
The present invention denoises the gesture grayscale image with a combination of a pulse-coupled neural network (PCNN) and a composite denoising algorithm: the PCNN first distinguishes noise points from edge points, and the composite denoising algorithm then removes the noise according to its type, so that several kinds of noise are removed while edge information is preserved.
Each neuron of a PCNN consists of three parts: a receptive field, a modulation part, and a pulse generator. The PCNN is a common method for image denoising preprocessing, used mainly to remove salt-and-pepper noise. Applied to image denoising, it can be viewed as a two-dimensional single-layer locally connected network in which the neurons correspond one-to-one to the pixels of the grayscale image to be processed and adjacent neurons are interconnected. During denoising, the gray value of each pixel serves as the feed input of the corresponding neuron, each neuron's output serves only as input to its adjacent neurons, and each neuron has only two output states, firing and non-firing, denoted 1 and 0 respectively. Because a noise pixel differs considerably from its surrounding pixels, the firing behavior of the PCNN combined with the characteristics of the noise itself can be used to judge noise points, as follows:
Build the PCNN model of the gesture grayscale image, taking the gray value of each pixel as the input of the corresponding neuron, and use the firing behavior of the PCNN to examine the pixels of the whole image. If a pixel's output state is the firing state, set the corresponding element of the detection result matrix to 1, otherwise to 0; the detection result matrix thus has the same size as the image. Set the noise-processing window size, 3 × 3 in this embodiment. Traverse the elements of the detection result matrix; for each element with value 1 (firing state), center the noise-processing window on it and count the values of the other elements in the window, i.e. the detection results of the pixels other than the center. If the number of elements with value 0 (non-firing state) exceeds a preset threshold, the center point is a noise point; otherwise it is not. In this way noise points and edge points are distinguished. The quantity threshold is generally half the number of elements in the noise-processing window; a sketch of this test follows.
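The following Python sketch illustrates the firing-map test. It uses a simplified PCNN (feed input, linking from 3 × 3 neighbours, exponential threshold decay) with the parameter names of the embodiment (β, θ, V_θ, a_θ); the exact neuron model and the helper name `pcnn_noise_map` are assumptions, not the patented formulation.

```python
import numpy as np

def pcnn_noise_map(gray, beta=3.0, theta0=1.0, v_theta=20.0, a_theta=0.2,
                   iters=10, window=3, vote_thresh=None):
    """Simplified PCNN firing map plus window vote: a pixel whose firing
    pattern differs from most of its neighbours is flagged as noise."""
    f = gray.astype(np.float64) / 255.0          # feed input per neuron
    y = np.zeros_like(f)                         # firing output (0/1)
    theta = np.full_like(f, theta0)              # dynamic threshold
    for _ in range(iters):
        # linking input: sum of 3x3 neighbour outputs, excluding the centre
        l = sum(np.roll(np.roll(y, di, 0), dj, 1)
                for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if (di, dj) != (0, 0))
        u = f * (1.0 + beta * l)                 # modulation of feed by linking
        y = (u > theta).astype(np.float64)       # fire when activity exceeds threshold
        theta = theta * np.exp(-a_theta) + v_theta * y   # threshold decay / refresh
    # window vote: a fired pixel surrounded mostly by unfired pixels is noise
    r = window // 2
    if vote_thresh is None:
        vote_thresh = (window * window - 1) // 2  # half the window, as in the text
    noise = np.zeros(f.shape, dtype=bool)
    for i in range(r, f.shape[0] - r):
        for j in range(r, f.shape[1] - r):
            if y[i, j] == 1:
                patch = y[i - r:i + r + 1, j - r:j + r + 1]
                fired_neighbours = patch.sum() - y[i, j]
                zeros = patch.size - 1 - fired_neighbours
                noise[i, j] = zeros > vote_thresh
    return noise
```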
After the noise points have been identified, the composite denoising algorithm performs the corresponding denoising, as follows:
Let a(i,j) be the gray value at pixel (i,j) of the image, 1 ≤ i ≤ M, 1 ≤ j ≤ N, where M is the number of pixels per row of the gesture grayscale image (i.e. the number of columns) and N the number of pixels per column (i.e. the number of rows), and let b(i,j) be the median output gray value after median-filtering the pixel. To handle Gaussian noise, the difference between the noise pixel's value and the median output is used as a noise estimate, as in equation (1):
H(i,j) = |a(i,j) − b(i,j)|   (1)
Because noise types differ, this estimate alone cannot distinguish several kinds of noise, so a second noise estimate V(i,j) is introduced on top of it, based on the average distance between the pixel value a(i,j) and the two closest values m1(i,j) and m2(i,j), as in equation (2):
V(i,j) = (|m1(i,j) − a(i,j)| + |m2(i,j) − a(i,j)|) / 2   (2)
where m1(i,j) and m2(i,j) are the gray values of the two points in the neighborhood of pixel (i,j) whose gray values are closest to a(i,j).
Set thresholds T1 and T2; the relation between the two noise estimates and these thresholds then determines the processing of each kind of noise, as follows:
If H(i,j) ≥ T1 and V(i,j) ≥ T2, the noise point is judged to be salt-and-pepper or impulse noise, and it is processed with median filtering, i.e. its gray value is replaced by the median-filter output. If H(i,j) < T1, or H(i,j) ≥ T1 and V(i,j) < T2, the noise is judged to be Gaussian, and the point is processed with mean filtering, i.e. its gray value is replaced by the mean-filter output.
In this algorithm, the choice of the thresholds T1 and T2 is crucial to the quality of the composite denoising result. A commonly used threshold selection method is the mean absolute deviation (MAD) algorithm, according to which T1 = 3.5·δ_ij, where δ_ij is the mean absolute deviation of all pixels in the denoising window of pixel (i,j). The threshold T2 mainly addresses texture that may appear in the gesture image; following the MAD algorithm and experimental experience, T2 is usually chosen as an integer between 6 and 10. A sketch of the composite rule follows.
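A minimal sketch of the composite rule, using the scipy filters as stand-ins for the median and mean filters; the local mean-absolute-deviation estimate of T1 and the border handling are simplifications, and the function name is illustrative.

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter

def composite_denoise(gray, noise_mask, t2=8.0):
    """Composite denoising from S201: median filter for salt-and-pepper or
    impulse noise, mean filter for Gaussian noise, applied only at pixels the
    PCNN step flagged as noise. T1 = 3.5 * delta_ij and T2 in 6..10 follow
    the text."""
    g = gray.astype(np.float64)
    med = median_filter(g, size=3)
    mean = uniform_filter(g, size=3)
    t1 = 3.5 * uniform_filter(np.abs(g - mean), size=3)   # 3.5 * local mean abs. deviation
    out = g.copy()
    for i, j in np.argwhere(noise_mask):
        a = g[i, j]
        h = abs(a - med[i, j])                             # H(i,j) = |a(i,j) - b(i,j)|, eq. (1)
        nb = g[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2].ravel()
        d = np.sort(np.abs(nb - a))[1:3]                   # the two closest-valued neighbours
        v = d.mean()                                       # V(i,j), eq. (2)
        # salt-and-pepper / impulse -> median output; Gaussian -> mean output
        out[i, j] = med[i, j] if (h >= t1[i, j] and v >= t2) else mean[i, j]
    return out
```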
S202: histogram equalization:
Histogram equalization adjusts the contrast of an image through its histogram, spreading a gray-level histogram concentrated in a relatively narrow range into a uniform distribution over the full range. The present invention applies histogram equalization to the gesture grayscale image denoised in step S201 in order to widen the gray-value difference between the foreground and the background of the gesture image. Histogram equalization is a common image enhancement method, so its concrete steps are not repeated here; a sketch is given below.
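For completeness, a standard histogram-equalization sketch for 8-bit images (equivalent in effect to library routines such as OpenCV's equalizeHist):

```python
import numpy as np

def hist_equalize(gray):
    """Histogram equalization (step S202) for a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    # stretch the cumulative distribution to [0, 255] and use it as a lookup table
    cdf = (cdf - cdf.min()) * 255.0 / (cdf.max() - cdf.min())
    return cdf[gray].astype(np.uint8)
```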
S203: coarse gesture segmentation based on a cellular neural network:
As with the PCNN, the neurons of a cellular neural network (CNN) correspond one-to-one to the pixels of the gesture grayscale image. Denote the cell in row i, column j by C(i,j) (corresponding to pixel (i,j)); it consists of four parts: the input variable u_ij, the state variable x_ij, the output variable y_ij, and the threshold I. The cells are locally interconnected: C(i,j) is connected only to the cells in its neighborhood N_r(i,j) and has no direct connection to any other cell. The neighborhood N_r(i,j) of cell C(i,j) is defined as
N_r(i,j) = { C(k,l) : max(|k − i|, |l − j|) ≤ r }   (3)
where r is a positive integer, 1 ≤ i, k ≤ M and 1 ≤ j, l ≤ N, M being the number of pixels per row of the gesture grayscale image and N the number per column. The neighborhood of C(i,j) is thus the square of side 2r + 1 centered on C(i,j).
The main equations of the cellular neural network are:
State-transition equation:
C·dx_ij(t)/dt = −(1/R_x)·x_ij(t) + Σ_{C(k,l)∈N_r(i,j)} A(k,l)·y_kl(t) + Σ_{C(k,l)∈N_r(i,j)} B(k,l)·u_kl + I   (4)
Output equation:
y_ij(t) = (1/2)·(|x_ij(t) + 1| − |x_ij(t) − 1|) = { 1 if x_ij(t) ≥ 1; x_ij(t) if |x_ij(t)| < 1; −1 if x_ij(t) ≤ −1 }   (5)
where 1 ≤ i, k ≤ M, 1 ≤ j, l ≤ N; t is the iteration count; A(k,l) is the feedback weight of cell C(k,l) in the neighborhood N_r(i,j) of cell C(i,j); and B(k,l) is the control weight of cell C(k,l) in that neighborhood, i.e. the elements of template B other than the center element. The values of (k,l) are determined by the definition of N_r(i,j).
The feedback template A and the control template B are (2r+1) × (2r+1) matrices, and I is the threshold template of the CNN. Together, the values of A, B, and I determine the relation between the input u_ij, the output y_ij, and the state x_ij of the network, so correctly designing A, B, and I is essential for a CNN model.
The template design method adopted in the present invention combines an algebraic design method with prior template-design experience; the templates A, B, and I generally take the following form:
A(k,l) = a if k = i and l = j, 0 otherwise   (6)
B(k,l) = b if k = i and l = j, −c otherwise   (7)
I = −d   (8)
where a, b, c, d are positive constants.
Build the CNN model of the gesture grayscale image, taking the gray value of each pixel (i,j) of the equalized image as the input u_ij of the corresponding cell, and iterate the state-transition equation until the whole network converges, each cell producing an output y_ij(t). By the output equation, the output value y_ij(t) lies between 1 and −1: y_ij(t) = 1 represents pure black, y_ij(t) = −1 pure white.
The basic principle for judging whether a pixel is an edge point is: when a pixel value is pure black (+1), if the sum of the pixel values in its corresponding neighborhood exceeds the set threshold parameter, the pixel is not an edge pixel and its value tends toward pure white; otherwise, if the neighborhood sum is below the threshold, the pixel is an edge pixel and its value tends toward pure black. When the pixel value is pure white (−1), it tends toward pure white regardless of the values of the pixels in its neighborhood.
Accordingly, the present invention judges edge points as follows: traverse the output values of the cells corresponding to each pixel; when a pixel's output value lies in [0, 1], the pixel is not an edge pixel if the sum of the other pixel values in its neighborhood exceeds a preset threshold, and is an edge pixel otherwise; when the output value lies in [−1, 0), the pixel is not an edge pixel. The neighborhood pixel-value sum threshold is set according to the actual situation; a sketch of the whole step follows.
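Putting equations (3) to (8) and the edge rule together, the following sketch iterates the embodiment's 3 × 3 templates (listed in the experimental section: a = 2, b = 6, c = 1, d = 0.1) by Euler integration with R_x = C = 1; the step size, iteration count, gray-value mapping, and final threshold are assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def cnn_edge_map(gray, steps=50, dt=0.1, edge_sum_thresh=0.0):
    """Cellular-neural-network edge extraction (S203), a sketch of eqs. (4)-(8).
    Gray values are mapped to [-1, 1] with black = +1 and white = -1."""
    A = np.array([[0, 0, 0], [0, 2, 0], [0, 0, 0]], dtype=float)     # feedback template
    B = np.array([[-1, -1, -1], [-1, 6, -1], [-1, -1, -1]], dtype=float)  # control template
    I = -0.1                                                          # threshold
    u = 1.0 - 2.0 * gray.astype(np.float64) / 255.0   # input: black -> +1, white -> -1
    x = u.copy()
    Bu = convolve(u, B, mode="nearest")               # B*u is constant across iterations
    for _ in range(steps):
        y = np.clip(x, -1.0, 1.0)                     # output eq. (5): piecewise-linear saturation
        x = x + dt * (-x + convolve(y, A, mode="nearest") + Bu + I)   # Euler step of eq. (4)
    y = np.clip(x, -1.0, 1.0)
    # decision rule: output in [0, 1] is an edge point when the sum of the
    # neighbours' outputs does NOT exceed the (assumed) threshold
    nb_sum = convolve(y, np.ones((3, 3)), mode="nearest") - y
    return (y >= 0.0) & (nb_sum <= edge_sum_thresh)
```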
S204: fine gesture segmentation combined with gesture shape features:
Fig. 3 is a flowchart of fine gesture segmentation combined with gesture shape features. As shown in Fig. 3, the fine segmentation of the present invention comprises the following steps:
S301: extract connected regions and contours:
Connected regions are computed from the edge pixels obtained by the cellular neural network, removing interference from other background information and retaining only the hand and face areas of the person. In this embodiment the connected regions are found with the two-pass algorithm. The contours of the connected regions are then extracted; this embodiment uses a search-and-label method: the image is scanned systematically after the connected regions are extracted, and when a point of a connected region is encountered it is taken as the starting point, its edge is traced, and the pixels on the edge are marked. When the traced contour closes completely, scanning resumes from the previous position until new pixel information is found. Other methods for extracting connected regions and contours may be chosen as required; a sketch follows.
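A sketch of S301 under the assumption that a library labelling routine may stand in for the two-pass algorithm, with boundary pixels standing in for the traced contour; for the fingertip test below, the contour must additionally be put into traversal order by a boundary-following pass, which is omitted here.

```python
import numpy as np
from scipy import ndimage

def regions_and_contours(edge_map, min_size=50):
    """Connected regions and their boundary pixels (S301). ndimage.label
    produces the same labelling as a two-pass algorithm; min_size is an
    assumed filter for speckle regions."""
    labels, n = ndimage.label(edge_map)
    regions = []
    for k in range(1, n + 1):
        mask = labels == k
        if mask.sum() < min_size:
            continue
        interior = ndimage.binary_erosion(mask)
        contour = np.argwhere(mask & ~interior)    # boundary pixels of the region
        regions.append((mask, contour))
    return regions
```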
S302: perform fingertip detection on each connected region:
Fingertip detection is performed on each connected region to decide whether it is a gesture region. Generally the fingers are spread apart during gesture recognition, so fingertips can be detected by curvature estimation. Fig. 4 is a schematic diagram of fingertip detection in the present invention. As shown in Fig. 4, fingertip detection works as follows:
Traverse each contour pixel of the connected region, taking it as the reference point with coordinates p(p_x, p_y, 0), where (p_x, p_y) is the two-dimensional coordinate of the point in the gesture image. Preset a distance constant L; take the point p1(p1_x, p1_y, 0) that lies L points before p along the contour, so that p and p1 define one line, and the point p2(p2_x, p2_y, 0) that lies L points after p, so that p and p2 define another line; the two lines form an angle, denoted α. The cosine of the angle between the vectors pp1 and pp2 serves as the curvature measure, i.e. the curvature formula is
cos α = (pp1 · pp2) / (|pp1| · |pp2|)   (9)
If cos α exceeds a preset curvature threshold T, the point is judged a candidate fingertip. The threshold T is set according to the distance constant L: the larger L is, the larger T should be. L should be neither too small nor too large, and is generally set to between a quarter and a half of the average finger length.
Interference from the grooves between fingers is removed through the sign of the vector product of pp1 and pp2. As can be seen in Fig. 4, the sign of the vector product differs between a point p at a fingertip and a point p in a groove, so the sign of the product determines the position of p; this is exactly why the coordinates of p, p1, and p2 are written in three-dimensional form. The sign of the vector product at a fingertip depends on the traversal direction: by the right-hand rule, when the overall contour of the gesture region is traversed clockwise, the vector product at a fingertip points into the image and is negative; when it is traversed counter-clockwise (the traversal direction shown in Fig. 4), the vector product points out of the image and is positive. Each candidate fingertip is thus judged by the sign of its vector product: if the sign matches that of a fingertip, the candidate is kept, otherwise it is discarded; see the sketch below.
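A sketch of the curvature and cross-product tests of S302, assuming an ordered contour (e.g. from a boundary-following trace) given as an (n, 2) array of (x, y) points; L = 80 and T = 0.5 follow the embodiment, and the function name is illustrative.

```python
import numpy as np

def fingertip_candidates(contour, L=80, cos_thresh=0.5, clockwise=False):
    """Curvature-based fingertip test (S302). The z-component of pp1 x pp2
    separates tips from the grooves between fingers: positive at a tip on a
    counter-clockwise traversal, negative on a clockwise one."""
    n = len(contour)
    want_sign = -1.0 if clockwise else 1.0
    tips = []
    for i in range(n):
        p = contour[i]
        p1 = contour[(i - L) % n]                 # L points back along the contour
        p2 = contour[(i + L) % n]                 # L points ahead
        v1, v2 = p1 - p, p2 - p
        denom = np.linalg.norm(v1) * np.linalg.norm(v2)
        if denom == 0:
            continue
        cos_a = np.dot(v1, v2) / denom            # eq. (9)
        cross_z = v1[0] * v2[1] - v1[1] * v2[0]   # z-component of pp1 x pp2
        if cos_a > cos_thresh and np.sign(cross_z) == want_sign:
            tips.append(i)
    return tips
```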
S303: judge the gesture regions:
After fingertips are detected, the candidate points must still be checked in order to remove interference from parts of the face whose curvature exceeds the threshold because of the viewing angle, and so determine the true gesture region. The present invention uses a double decision:
First, check whether the difference between the y coordinate of the highest candidate fingertip detected in the connected region and that of the lowest exceeds half the face height; if so, the connected region is not a gesture region, otherwise it is a candidate gesture region. The choice of half the face height for this distance was determined experimentally: it removes interference from the face while fully retaining the correct fingertips.
Then check, for each candidate gesture region, whether the number of candidate fingertips exceeds a preset quantity threshold; if so, the connected region is a gesture region, otherwise it is not. The number of fingertips found in a real gesture region depends on the curvature threshold T, so in practice the fingertip-count threshold can be obtained by collecting statistics over the results for several gesture training samples.
S304: gesture region segmentation:
The operations above eliminate interference from the face and other connected regions and yield the gesture region. The region, however, may contain not only the palm but also parts such as the wrist. In general, the useful information of a gesture is concentrated in the palm, and the information of parts such as the wrist can essentially be ignored; so, to make later feature extraction and tracking efficient and effective, the gesture region is segmented so that only the fingers and palm are retained.
Based on the shape of the human hand, the present invention segments the gesture using a gesture length-to-width ratio of approximately 2. The principal direction of the gesture region must be known first; in this embodiment it is found by taking the centroid of the gesture region, computing the vector from the centroid to each fingertip, and averaging these vectors: the direction of the average vector is the principal direction of the gesture region. The gesture is then segmented along the principal direction. The segmentation method of this embodiment is: construct the bounding rectangle of the gesture region aligned with the principal direction, with the sides parallel to the principal direction as the length and the sides perpendicular to it as the width; select the short side nearest the fingertips and, starting from that side and moving along the long sides, cut off a rectangle whose length is twice the width. The gesture region contained in this rectangle is the segmented gesture region retaining only the fingers and palm. The principal-direction computation is sketched below.
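The centroid-to-fingertip averaging just described is small enough to state directly; the rectangle cut along the principal direction is omitted. The sketch assumes the contour array and fingertip indices produced by the earlier sketches.

```python
import numpy as np

def principal_direction(contour, tips):
    """Principal direction of a gesture region (S304): the average of the
    vectors from the region centroid to each detected fingertip; returns the
    angle phi to the x-axis, used later by the Fourier descriptor."""
    if not tips:
        raise ValueError("no fingertips detected")
    centroid = contour.mean(axis=0)
    vecs = np.array([contour[i] - centroid for i in tips])
    mean_vec = vecs.mean(axis=0)
    return np.arctan2(mean_vec[1], mean_vec[0])
```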
S205: extract gesture features with a Fourier descriptor that retains phase information:
For the gesture region obtained by the segmentation of step S204, the present invention designs a Fourier descriptor that retains phase information to extract the gesture feature information, removing the rotation invariance of the conventional Fourier descriptor so that rotated gestures can be distinguished.
The discrete Fourier coefficients z(k) can be expressed as
z(k) = (1/n) · Σ_{i=1}^{n} p(i) · e^(−j2πik/n),  k = 0, 1, …, n−1   (10)
where p(i) is the i-th element of the discrete sequence, n the number of elements in the sequence, e the natural constant, and j the imaginary unit. Since it is the gesture contour that is transformed here, the discrete sequence p(i) consists of the coordinates of the contour pixels of the gesture region obtained by the segmentation of step S204, written in complex form.
The inverse Fourier transform can be expressed as
p(i) = Σ_{k=0}^{n−1} z(k) · e^(j2πik/n),  i = 0, 1, …, n−1   (11)
By the basic Fourier-transform property z(k) = z*(n−k), where z* denotes the complex conjugate of z, the high-frequency part of the coefficient sequence from K+1 to n−K−1 can be removed, with K ranging over [0, n/2]. Applying the inverse transform to the coefficients with the high frequencies removed yields a curve approximating the original one, but smoother; this curve is the K-term approximation of the original curve. The retained low-frequency coefficient subset z(k) is exactly what is used to extract the Fourier descriptors of the gesture features.
The scale of a Fourier descriptor is related to the shape, orientation, and starting position of the curve. To ensure that the recognition algorithm is invariant to rotation, translation, and scale, the Fourier descriptors must therefore be normalized. It can be shown from the basic properties of the Fourier transform that when a contour is represented by Fourier coefficients, the coefficient magnitude ||z(k)|| is invariant to rotation, translation, and starting position, where 0 ≤ k ≤ n−1; since z(0) is not translation invariant, the range of k is set to [1, n−1]. To achieve scale invariance, the magnitude ||z(k)|| of every coefficient other than z(0) is divided by ||z(1)||. The normalized Fourier descriptor S[k'] can then be expressed as
S[k'] = ||z(k')|| / ||z(1)||   (12)
where 1 ≤ k' ≤ n−1 and || · || denotes the modulus.
For a detailed treatment of normalized Fourier descriptors, see: Song Ruihua. Gesture recognition algorithm based on Fourier descriptors [D]. Xidian University, 2008.
To remove the rotation invariance of the conventional Fourier descriptor, the present invention retains the phase information after rotation; the normalized form of the improved descriptor is given as equation (13) of the description (the typeset formula is not reproduced in this text), in which φ is the angle between the principal direction of the gesture region and the x-axis and j is the imaginary unit. The descriptor S[k'] so obtained retains the phase information of the gesture rotation and is therefore not rotation invariant, and the present invention adopts these coefficients as the features of the gesture region. The features are invariant to translation and scale and independent of the starting position of the gesture contour, yet variable under rotation, so the feature vector can distinguish rotated gestures. Because the number of contour points differs between gesture regions, in practice only the first Q descriptors are selected to form the feature vector, with Q determined by the actual situation; a sketch follows.
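A sketch of S205 under stated assumptions: equation (10) via the FFT and the normalization of equation (12); since the typeset form of equation (13) is lost, multiplying the normalized coefficients by e^(jφ) is one plausible reading of the phase-preserving step and is an assumption here, as is the function name.

```python
import numpy as np

def gesture_descriptor(contour, phi, Q=200):
    """Phase-preserving Fourier descriptor (S205) of an ordered contour."""
    seq = contour[:, 0] + 1j * contour[:, 1]      # contour points as complex numbers
    z = np.fft.fft(seq) / len(seq)                # eq. (10), including the 1/n factor
    s = z[1:Q + 1] / np.abs(z[1])                 # drop z(0) (translation term), scale-normalize
    return s * np.exp(1j * phi)                   # retain the rotation phase (assumed form)
```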
S102: train the BP neural network on the training samples:
The feature vectors of the training gesture images are input into a BP neural network as training samples, with the corresponding gesture image classes as the network outputs, and the BP network is trained. The BP network is a commonly used neural network, so its concrete structure, parameters, and training method are not repeated here; a minimal stand-in is sketched below.
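As a stand-in for the BP network, a minimal multilayer-perceptron sketch with the layer sizes quoted later in the embodiment (200 inputs, 10 hidden units, 4 outputs); scikit-learn's MLPClassifier, the placeholder random data, and the string labels are assumptions, with real feature vectors assumed to be real-valued (e.g. stacked real and imaginary parts of the descriptors).

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# 4 classes x 80 training samples, 200 features each, per the embodiment
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
X_train = np.random.rand(320, 200)                # placeholder feature vectors
y_train = np.repeat(["up", "down", "left", "right"], 80)
clf.fit(X_train, y_train)                         # S102: train on the training samples
print(clf.predict(np.random.rand(1, 200)))       # S103: classify one unseen gesture
```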
S103: recognize the sample to be identified:
The feature vector of the gesture image to be recognized is input into the BP neural network trained in step S102, and the recognized gesture image class is output.
To illustrate the technical effect of the present invention, it was verified experimentally. The selected gesture training samples were divided into four parts: gesture up, gesture down, gesture left, and gesture right, with 80 training samples per part; test samples were likewise selected from the four classes, 40 per class. For convenience of presentation, only the upward gesture samples are used here to explain the implementation. Every picture in the samples is 256 × 256 with 256 gray levels.
The upward samples must first be denoised. Since the sample pictures are 256 × 256 and the PCNN neurons correspond one-to-one to the pixels of the image when the PCNN is used for denoising, the number of neurons is set to 65536. The PCNN parameters adopted in this embodiment are: neuron iteration count τ = 10, neuron linking strength β = 3, dynamic threshold parameter θ_ij = 1, threshold output amplification coefficient V_θ = 20, and threshold-function attenuation coefficient a_θ = 0.2. The firing behavior of the PCNN is then used for detection, noise points are judged from the detection results, and the composite denoising algorithm is applied according to the noise type. The parameters of the composite denoising algorithm are T1 = 3.5·δ_ij, where the noise window S_k has the same size as the PCNN detection window, namely 3 × 3, and T2 = 8.
After histogram equalization of the denoised gesture image, the cellular neural network detects the gesture edges in the image, achieving the coarse segmentation. In this embodiment the neighborhood of each cell is 3 × 3, and the templates adopted are:
A = [ 0 0 0 ; 0 2 0 ; 0 0 0 ],  B = [ −1 −1 −1 ; −1 6 −1 ; −1 −1 −1 ],  I = −0.1
Fig. 5 shows an example of the coarse gesture segmentation.
Fine segmentation combined with gesture shape features is then applied to the gesture image, with distance constant L = 80 and curvature threshold T = 0.5. Fig. 6 shows an example of the fine gesture segmentation. It can be seen that after fine segmentation the influence of regions such as the face is eliminated and a fairly accurate gesture region is obtained.
A discrete sequence is then built from the contour coordinates of the segmented gesture region, the Fourier coefficients are obtained by Fourier transform and normalized according to equation (13), and the first 200 normalized Fourier descriptors are selected to form the gesture feature vector.
The BP neural network is trained with the feature vectors of the training samples. The number of input nodes is determined by the feature vector and the number of output nodes by the gesture sample types; the network adopted here has 200 input nodes, 10 hidden nodes, and 4 output nodes. The outputs are encoded in binary as 0001, 0010, 0100, and 1000, where 0001 represents gesture up, 0010 gesture down, 0100 gesture left, and 1000 gesture right; the gesture type is judged from the network output.
To verify the denoising quality of the new algorithm designed in the present invention, which combines the pulse-coupled neural network with the composite denoising algorithm, it was compared with the plain composite denoising algorithm and with median filtering, the main comparison index being the peak signal-to-noise ratio (PSNR). Table 1 compares the PSNR of the denoising algorithm of the present invention with the contrast algorithms.
Table 1 (contents appear only as an image in the source and are not reproduced here)
As can be seen from Table 1, at the same noise density the PSNR of the proposed denoising method is clearly higher than that of median filtering and of the plain composite denoising algorithm. The denoising algorithm combining the pulse-coupled neural network with the composite denoising algorithm thus achieves a good denoising effect.
In addition, the recognition effect of the traditional Fourier descriptor was compared, the comparison index being the recognition rate on the gesture samples. Table 2 gives the recognition statistics for the conventional Fourier descriptor, and Table 3 the statistics for the Fourier descriptor of the present invention.
Table 2 (contents not reproduced)
Table 3 (contents not reproduced)
Comparing Tables 2 and 3 shows that the traditional Fourier descriptor cannot recognize strongly rotated gestures well: its recognition rate is only about 71%, which is low, so it performs poorly in scenarios where rotated gestures carry different meanings. The improved Fourier descriptor of the present invention tolerates a certain rotation of the gesture; although an excessive rotation may be taken for a different image, experiments verify that the present invention still reaches a recognition rate of about 91%, achieving a good gesture recognition effect.
Although the illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be clear that the invention is not restricted to the scope of the embodiments. To those skilled in the art, as long as the changes fall within the spirit and scope of the invention as defined by the appended claims, these changes are obvious, and all inventions and creations that make use of the inventive concept fall within the scope of protection.

Claims (4)

1. A gesture recognition method based on a hybrid neural network, characterized in that it comprises the following steps:
S1: extract features from the gesture image to be recognized and from the gesture image training samples, the concrete steps comprising:
S1.1: build a pulse-coupled neural network model of the gesture grayscale image, taking the gray value of each pixel of the current gesture grayscale image as the input of the corresponding neuron in the pulse-coupled neural network, and use the firing behavior of the pulse-coupled neural network to examine the pixels of the gesture image: if a pixel's output state is the firing state, set the corresponding element of the detection result matrix to 1, otherwise to 0; traverse the elements of the detection result matrix, and for each element with value 1, center a noise-processing window on it, the window size being set according to the actual situation, and count the values of the other elements in the window; if the number of elements with value 0 exceeds a preset threshold, the center point is a noise point, otherwise it is not;
compute two noise estimates H(i,j) and V(i,j) for each noise point as follows:
H(i,j) = |a(i,j) − b(i,j)|
where a(i,j) is the gray value at pixel (i,j) of the image and b(i,j) is the median output gray value after median-filtering the pixel;
V(i,j) = (|m1(i,j) − a(i,j)| + |m2(i,j) − a(i,j)|) / 2
where m1(i,j) and m2(i,j) are the gray values of the two points in the neighborhood of pixel (i,j) whose gray values are closest to a(i,j);
if H(i,j) ≥ T1 and V(i,j) ≥ T2, process the noise point with median filtering, otherwise with mean filtering;
S1.2: perform histogram equalization on the gesture grayscale image denoised in step S1.1;
S1.3: build a cellular neural network model of the gesture grayscale image, taking the gray value of each pixel (i,j) of the equalized gesture grayscale image as the input u_ij of the corresponding cell in the cellular neural network; iterate the state-transition equation until the whole network converges, obtaining the output y_ij(t) of each cell; traverse the output values of the cells corresponding to each pixel: when a pixel's output value lies in [0, 1], the pixel is not an edge pixel if the sum of the other pixel values in its neighborhood exceeds a preset threshold, and is an edge pixel otherwise; when the output value lies in [−1, 0), the pixel is not an edge pixel;
S1.4: build connected regions from the edge pixels obtained in step S1.3, extract the contour of each connected region, and perform fingertip detection on each connected region as follows:
traverse each contour pixel of the connected region, taking it as the reference point with coordinates p(p_x, p_y, 0); preset a distance constant L, take the point p1(p1_x, p1_y, 0) lying L points before p along the contour and the point p2(p2_x, p2_y, 0) lying L points after p, and compute the cosine cos α of the angle between the vectors pp1 and pp2; if cos α exceeds a preset curvature threshold T, the point is a candidate fingertip, otherwise it is not;
determine the sign of the vector product at a fingertip from the traversal direction: when the overall contour of the gesture region is traversed clockwise the sign is negative, otherwise positive; compute the vector product of pp1 and pp2 for each candidate fingertip, and keep the candidate if the sign of the vector product matches the fingertip sign, otherwise discard it;
among all candidate fingertips detected in the connected region, check whether the difference between the largest and the smallest y coordinate exceeds half the face height; if so, the connected region is not a gesture region, otherwise it is a candidate gesture region; then check, for each candidate gesture region, whether the number of candidate fingertips exceeds a preset quantity threshold; if so, the connected region is a gesture region, otherwise it is not;
determine the principal direction of the gesture region and segment the gesture region along the principal direction according to a gesture length-to-width ratio of 2, obtaining the segmented gesture region;
S1.5: for the gesture region obtained after the segmentation of step S1.4, represent the coordinates of each contour point in complex form, assemble all contour points into a discrete sequence of length n, and apply the Fourier transform to obtain the n Fourier coefficients z(k), k = 0, 1, …, n−1; from these compute the phase-preserving Fourier descriptors S[k'], k' = 1, 2, …, n−1, whose normalized form involves the angle between the principal direction of the gesture region and the x-axis (the typeset formula is not reproduced in this text);
select the first Q Fourier descriptors to form the feature vector;
S2: input the feature vectors of the training gesture images into a BP neural network as training samples, with the corresponding gesture image class as the network output, and train the BP neural network;
S3: input the feature vector of the gesture image to be recognized into the BP neural network trained in step S2 and output the recognized gesture image class.
2. The gesture recognition method according to claim 1, characterized in that in step S1.1 the threshold T1 = 3.5·δ_ij, where δ_ij is the mean absolute deviation of all pixels in the denoising window of pixel (i,j), and the threshold T2 is an integer between 6 and 10.
3. The gesture recognition method according to claim 1, characterized in that in step S1.3 the feedback template A of the cellular neural network is
A(k,l) = a if k = i and l = j, 0 otherwise,
the control template B is
B(k,l) = b if k = i and l = j, −c otherwise,
and the threshold I = −d,
where (k,l) ranges over the neighborhood N_r(i,j) of side 2r+1 centered on cell C(i,j) of the cellular neural network, and a, b, c, d are positive constants.
4. The gesture recognition method according to claim 1, characterized in that in step S1.4 the principal direction of the gesture region is found by taking the centroid of the gesture region, computing the vector from the centroid to each fingertip, and averaging these vectors, the direction of the average vector being the principal direction of the gesture region.
CN201510280013.3A 2015-05-27 2015-05-27 Gesture identification method based on hybrid neural networks Expired - Fee Related CN104834922B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510280013.3A CN104834922B (en) 2015-05-27 2015-05-27 Gesture identification method based on hybrid neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510280013.3A CN104834922B (en) 2015-05-27 2015-05-27 Gesture identification method based on hybrid neural networks

Publications (2)

Publication Number Publication Date
CN104834922A true CN104834922A (en) 2015-08-12
CN104834922B CN104834922B (en) 2017-11-21

Family

ID=53812800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510280013.3A Expired - Fee Related CN104834922B (en) 2015-05-27 2015-05-27 Gesture identification method based on hybrid neural networks

Country Status (1)

Country Link
CN (1) CN104834922B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8620024B2 (en) * 2010-09-17 2013-12-31 Sony Corporation System and method for dynamic gesture recognition using geometric classification
CN103778407A (en) * 2012-10-23 2014-05-07 南开大学 Gesture recognition algorithm based on conditional random fields under transfer learning framework
CN104573621A (en) * 2014-09-30 2015-04-29 李文生 Dynamic gesture learning and identifying method based on Chebyshev neural network
CN104298354A (en) * 2014-10-11 2015-01-21 河海大学 Man-machine interaction gesture recognition method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jiang Li et al.: "Research on Gesture Recognition Technology Based on Neural Networks" (基于神经网络的手势识别技术研究), Journal of Beijing Jiaotong University (《北京交通大学学报》) *

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105487772A (en) * 2015-11-26 2016-04-13 上海斐讯数据通信技术有限公司 Information capturing method and apparatus
CN105373785A (en) * 2015-11-30 2016-03-02 北京地平线机器人技术研发有限公司 Method and device of hand gesture recognition and detection on the basis of deep neural network
CN105373785B (en) * 2015-11-30 2019-08-02 北京地平线机器人技术研发有限公司 Gesture identification detection method and device based on deep neural network
CN106022343A (en) * 2016-05-19 2016-10-12 东华大学 Fourier descriptor and BP neural network-based garment style identification method
CN106022297A (en) * 2016-06-01 2016-10-12 苏州大学 Gesture identification method and gesture identification device
CN108073979A (en) * 2016-11-14 2018-05-25 顾泽苍 An ultra-deep learning method incorporating artificial intelligence for image recognition
US11170210B2 (en) 2017-03-31 2021-11-09 Beijing Sensetime Technology Development Co., Ltd. Gesture identification, control, and neural network training methods and apparatuses, and electronic devices
WO2018177379A1 (en) * 2017-03-31 2018-10-04 北京市商汤科技开发有限公司 Gesture recognition, gesture control and neural network training methods and apparatuses, and electronic device
CN108229277A (en) * 2017-03-31 2018-06-29 北京市商汤科技开发有限公司 Gesture identification, control and neural network training method, device and electronic equipment
CN109101860B (en) * 2017-06-21 2022-05-13 富泰华工业(深圳)有限公司 Electronic equipment and gesture recognition method thereof
CN109101860A (en) * 2017-06-21 2018-12-28 富泰华工业(深圳)有限公司 Electronic equipment and its gesture identification method
CN107894834B (en) * 2017-11-09 2021-04-02 上海交通大学 Control gesture recognition method and system in augmented reality environment
CN107894834A (en) * 2017-11-09 2018-04-10 上海交通大学 Gesture identification method and system are controlled under augmented reality environment
CN108230257A (en) * 2017-11-15 2018-06-29 北京市商汤科技开发有限公司 Image processing method, device, electronic equipment and storage medium
CN108052884A (en) * 2017-12-01 2018-05-18 华南理工大学 A gesture recognition method based on an improved residual neural network
CN108108024A (en) * 2018-01-02 2018-06-01 京东方科技集团股份有限公司 Dynamic gesture acquisition methods and device, display device
CN108198567A (en) * 2018-02-22 2018-06-22 成都启英泰伦科技有限公司 A novel speech denoising system
CN109344689A (en) * 2018-08-07 2019-02-15 西安理工大学 A kind of sign language gesture identification method based on Kinect
CN109359538A (en) * 2018-09-14 2019-02-19 广州杰赛科技股份有限公司 Training method, gesture identification method, device and the equipment of convolutional neural networks
WO2020078017A1 (en) * 2018-10-19 2020-04-23 北京百度网讯科技有限公司 Method and apparatus for recognizing handwriting in air, and device and computer-readable storage medium
US11423700B2 (en) 2018-10-19 2022-08-23 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, apparatus, device and computer readable storage medium for recognizing aerial handwriting
CN109697407A (en) * 2018-11-13 2019-04-30 北京物灵智能科技有限公司 A kind of image processing method and device
CN109612326A (en) * 2018-12-19 2019-04-12 西安建筑科技大学 A kind of light weapon firing intelligence assisted teaching system based on Internet of Things
CN113033256A (en) * 2019-12-24 2021-06-25 武汉Tcl集团工业研究院有限公司 Training method and device for fingertip detection model
CN113033256B (en) * 2019-12-24 2024-06-11 武汉Tcl集团工业研究院有限公司 Training method and device for fingertip detection model
CN111259902A (en) * 2020-01-13 2020-06-09 上海眼控科技股份有限公司 Arc-shaped vehicle identification number detection method and device, computer equipment and medium
CN111216133B (en) * 2020-02-05 2022-11-22 广州中国科学院先进技术研究所 Robot demonstration programming method based on fingertip identification and hand motion tracking
CN111216133A (en) * 2020-02-05 2020-06-02 广州中国科学院先进技术研究所 Robot demonstration programming method based on fingertip identification and hand motion tracking
CN112487981A (en) * 2020-11-30 2021-03-12 哈尔滨工程大学 MA-YOLO dynamic gesture rapid recognition method based on two-way segmentation
WO2022127819A1 (en) * 2020-12-15 2022-06-23 Qualcomm Incorporated Sequence processing for a dataset with frame dropping
CN112800954A (en) * 2021-01-27 2021-05-14 北京市商汤科技开发有限公司 Text detection method and device, electronic equipment and storage medium
CN113191361A (en) * 2021-04-19 2021-07-30 苏州大学 Shape recognition method
CN113191361B (en) * 2021-04-19 2023-08-01 苏州大学 Shape recognition method
CN113449600A (en) * 2021-05-28 2021-09-28 宁波春建电子科技有限公司 Two-hand gesture segmentation algorithm based on 3D data
CN113792624A (en) * 2021-08-30 2021-12-14 河南林业职业学院 Early warning security monitoring method for bank ATM
CN114625333A (en) * 2022-03-08 2022-06-14 深圳康荣电子有限公司 Liquid crystal splicing LCD system and method capable of recording gesture instructions for control
CN114625333B (en) * 2022-03-08 2022-10-18 深圳康荣电子有限公司 Method and system capable of recording gesture instructions to control liquid crystal splicing LCD
CN117558068A (en) * 2024-01-11 2024-02-13 深圳市阿龙电子有限公司 Intelligent device gesture recognition method based on multi-source data fusion
CN117558068B (en) * 2024-01-11 2024-03-19 深圳市阿龙电子有限公司 Intelligent device gesture recognition method based on multi-source data fusion

Also Published As

Publication number Publication date
CN104834922B (en) 2017-11-21

Similar Documents

Publication Publication Date Title
CN104834922A (en) Hybrid neural network-based gesture recognition method
CN104574445B (en) A kind of method for tracking target
CN109815956B (en) License plate character recognition method based on self-adaptive position segmentation
CN112184752A (en) Video target tracking method based on pyramid convolution
CN103198493B Target tracking method based on multi-feature adaptive fusion and online learning
CN105574534A Salient object detection method based on sparse subspace clustering and low-rank representation
CN102270308B (en) Facial feature location method based on five sense organs related AAM (Active Appearance Model)
CN104680127A (en) Gesture identification method and gesture identification system
CN109002755B (en) Age estimation model construction method and estimation method based on face image
CN105373777A (en) Face recognition method and device
Bilal et al. A hybrid method using haar-like and skin-color algorithm for hand posture detection, recognition and tracking
CN102508547A (en) Computer-vision-based gesture input method construction method and system
CN103198330B (en) Real-time human face attitude estimation method based on deep video stream
CN103400109A (en) Free-hand sketch offline identification and reshaping method
CN113269089B (en) Real-time gesture recognition method and system based on deep learning
CN104038792B Video content analysis method and apparatus for IPTV supervision
CN105160303A (en) Fingerprint identification method based on mixed matching
CN103208097A (en) Principal component analysis collaborative filtering method for image multi-direction morphological structure grouping
CN105956570A (en) Lip characteristic and deep learning based smiling face recognition method
CN104102904A (en) Static gesture identification method
CN104077742A (en) GABOR characteristic based face sketch synthetic method and system
CN104778670A (en) Fractal-wavelet self-adaption image denoising method based on multivariate statistical model
CN103927555A (en) Static sign language letter recognition system and method based on Kinect sensor
CN103714340A (en) Self-adaptation feature extracting method based on image partitioning
CN114495170A (en) Pedestrian re-identification method and system based on local self-attention inhibition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant

Granted publication date: 20171121

CF01 Termination of patent right due to non-payment of annual fee

Termination date: 20200527