CN102508547A - Computer-vision-based gesture input method construction method and system


Info

Publication number
CN102508547A
Authority
CN (China)
Prior art keywords
gesture
frame
data
buffer model
input method
Prior art date
Legal status
Pending
Application number
CN2011103459148A
Other languages
Chinese (zh)
Inventor
王轩
王金磊
于成龙
赵海楠
张加佳
许欣欣
Current Assignee
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN2011103459148A
Publication of CN102508547A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a computer-vision-based gesture input method construction method comprising the following steps: gesture acquisition, video processing, gesture analysis, gesture recognition, and construction of a gesture input method. Gesture frames are clustered by an asynchronous buffering model to obtain key gesture frames, and only the key gesture frames undergo the subsequent gesture analysis and gesture recognition; compared with conventional systems that analyze and recognize every gesture frame, the system provided by the invention therefore responds in closer to real time. In addition, the combined application of gesture recognition and an input method is an innovation in human-computer interaction.

Description

Gesture input method construction method and system based on computer vision
Technical field
The present invention relates to a gesture input method construction method and system, and more particularly to a gesture input method construction method and system based on video clustering processing.
Background technology
With the development of technology, gesture input technology has made significant progress in some special fields. The premise of gesture input technology is the acquisition of hand signals, after which the acquired hand signals are analyzed and processed. Gesture target detection means detecting the gesture target from an image stream against a complicated background containing a person, after which gesture recognition is carried out. Gesture recognition interprets the high-level meaning of the human hand's posture and its change process. Because gestures have the following four characteristics, extracting features with geometric invariance is one of the chief technologies for realizing a gesture input method: the hand is an elastic object, so the variation within the same gesture is very large; the hand carries much redundant information, because the key to gesture recognition is identifying finger features, so palm features are redundant; the hand is positioned in three-dimensional space and is therefore difficult to localize, while the image obtained by the computer is a projection from three dimensions to two, so the projection direction is crucial; and the surface of the hand is non-smooth, so shadows are easily produced.
Existing gesture recognition technology performs gesture analysis and gesture recognition on every data frame captured by the camera, and those frames include the gesture preparation stage, the key stage, and the recovery stage; as a result, the real-time responsiveness of such gesture recognition systems is poor. Existing input methods take their input from the keyboard and mouse of a PC or the keyboard and screen of a mobile device under user control.
The content of the invention
The technical problem solved by the present invention is to build a gesture input method construction method and system based on computer vision, overcoming the slow response speed of prior-art gesture recognition systems and their reliance on keyboard, mouse, or touch screen for using an input method.
The technical scheme of the present invention provides a computer-vision-based gesture input method construction method with the following steps:
Gesture acquisition: capture the gesture video signal and obtain gesture data image frames;
Video processing: use an asynchronous clustering model based on interval gesture data frame buffering; the asynchronous clustering model includes a buffer model, the acquired gesture data image frames are stored in the buffer model in order, and the gesture data image frames in the buffer model are clustered to obtain the gesture key frame sequence;
Gesture analysis: after binarizing the gesture key frame images and denoising them by smoothing filtering, extract the gesture contours to obtain key gesture contour data, and extract gesture feature parameters from the gesture key frame binary map and the extracted gesture contour, the gesture feature parameters forming a gesture feature vector;
Gesture recognition: perform gesture recognition according to the gesture feature vector and obtain the letter virtual key or keyboard-control virtual key corresponding to the recognized gesture;
Building the gesture input method: invoke the operating system's existing input method according to the obtained letter virtual keys and keyboard-control virtual keys.
A further technical scheme of the present invention is: in the video processing step, each class obtained by clustering the gesture data image frames in the buffer model includes the complete data of a gesture's preparation stage, key frame stage, and recovery stage, and the cluster centre of each class is the gesture key frame of that gesture.
A further technical scheme of the present invention is: in the video processing step, a threshold parameter is set and the frame differences of interval data frames are compared; if the frame difference does not exceed the threshold parameter, the redundant gesture data frames are removed from the buffer model; if the frame difference exceeds the threshold parameter, a class boundary is drawn, and the gesture data frame at the median position of the class is chosen as the gesture key frame of that class.
A further technical scheme of the present invention is: in the video processing step, the acquired gesture data image frames are stored in the buffer model in order, and clustering of the gesture data frames starts when the buffer model is full.
A further technical scheme of the present invention is: in the video processing step, new gesture data frames are continually captured into the buffer model while the gesture data frames are being clustered; when the clustering speed of the gesture data frames in the buffer model is lower than the gesture data frame capture speed and the buffer model is full, capture waits for gesture data frames to be clustered and the user is prompted to increase the buffer size in the configuration file; when the clustering speed exceeds the capture speed and the buffer model becomes empty, clustering waits for captured gesture data frames to be put into the buffer model and the user is prompted to reduce the buffer size in the configuration file.
A further technical scheme of the present invention is: the gesture feature parameters include gesture area features, Hu invariant-moment features, and Fourier descriptors.
A further technical scheme of the present invention is: in the gesture analysis step, the gesture contours are extracted with the Laplacian boundary-extraction algorithm.
The technical scheme of the present invention also builds a computer-vision-based gesture input method constructing system, including a gesture acquisition unit that acquires the gesture video signal, a video processing unit that processes the acquired gesture video signal, a gesture analysis unit that performs gesture analysis and obtains the gesture feature vector, a gesture recognition unit that performs gesture recognition according to the gesture feature vector, and a construction unit that builds the gesture input method. The gesture acquisition unit captures the gesture video signal and obtains gesture data image frames; the video processing unit uses an asynchronous clustering model based on interval gesture data frame buffering, which includes a buffer model, stores the acquired gesture data image frames in the buffer model in order, and clusters the gesture data image frames in the buffer model to obtain the gesture key frame sequence; the gesture analysis unit binarizes the gesture key frame images, denoises them by smoothing filtering, and extracts the gesture contours to obtain key gesture contour data, then extracts gesture feature parameters from the gesture key frame binary map and the extracted gesture contour, the gesture feature parameters forming a gesture feature vector; the gesture recognition unit performs gesture recognition according to the gesture feature vector and obtains the letter virtual key or keyboard-control virtual key corresponding to the recognized gesture; the construction unit invokes the operating system's existing input method according to the obtained letter virtual keys and keyboard-control virtual keys.
A further technical scheme of the present invention is: the video processing unit includes a clustering module that clusters the gesture data image frames in the buffer model; the clustering module reads gesture frame data from the buffer model using a time-series-based clustering algorithm and obtains the gesture key frames from the frame differences of interval gesture frames.
A further technical scheme of the present invention is: the gesture recognition unit performs gesture recognition according to the gesture feature vector and obtains the 26 English-letter keyboard virtual keys and 4 keyboard-control virtual keys corresponding to the recognized gestures.
The beneficial effects of the invention are: a gesture input method construction method and system based on video clustering processing are built. Gesture frames are clustered by an asynchronous buffering model to obtain key gesture frames, and only the key gesture frames undergo the subsequent gesture analysis and gesture recognition; compared with conventional systems that analyze and recognize every gesture frame, the real-time responsiveness of the system is effectively improved. In addition, the combined application of gesture recognition and an input method is an innovation in human-computer interaction.
Brief description of the drawings
Fig. 1 is a flow chart of the invention.
Fig. 2 is a structural schematic diagram of the invention.
Fig. 3 is a schematic diagram of the video processing structure of the invention.
Fig. 4 is a schematic diagram of the gesture analysis structure of the invention.
Fig. 5 is a schematic diagram of the gesture recognition and input method construction structure of the invention.
Embodiment
The technical solution of the present invention is further described below with reference to specific embodiments.
As shown in Fig. 1, an embodiment of the present invention provides a computer-vision-based gesture input method construction method with the following steps:
Step 100: acquire gestures, i.e., capture the gesture video signal and obtain gesture data image frames.
The specific implementation process is as follows. A gesture word is a finite sequence of basic gestures, and each basic gesture includes the following three stages: the preparation stage, i.e., the motion stage in which the hand moves from some initial rest position to the key gesture position; the key gesture stage, i.e., the gesture action stage; and the recovery stage, i.e., the motion stage in which the hand returns to the initial position. In this embodiment of the invention, the gesture video signal is captured with a camera and the gesture data image frames are obtained by processing.
Step 200: video processing, i.e., use an asynchronous clustering model based on interval gesture data frame buffering; the asynchronous clustering model includes a buffer model, the acquired gesture data image frames are stored in the buffer model in order, and the gesture data image frames in the buffer model are clustered to obtain the gesture key frame sequence.
As shown in Fig. 2, the specific implementation process is as follows. During video processing, the useless data frames of the preparation and recovery stages are removed as far as possible, and training and recognition are performed on the data frames of the key gesture stage, which greatly improves training and recognition speed. The video stream processing of the present invention follows this principle: using the asynchronous clustering model based on interval gesture data frame buffering, the gesture image frames captured by the camera are cached, compared, and clustered, so that the data in each resulting class are exactly the complete data of one basic gesture's preparation stage, key frame stage, and recovery stage; each basic hand-shape action of a gesture word can therefore be represented by the cluster centre of its class.
Specifically, the video processing of the present invention includes the following processes.
First, the use of the asynchronous clustering model.
As shown in Fig. 3, the specific implementation process is as follows:
Buffer model: the buffer model is essentially a data acquisition buffer. According to the configuration file, the system allocates a memory space of the gesture data frame size as the buffer model; the image data frames captured in real time by the camera are stored in the buffer model in order, and clustering of the gesture data frames starts when the buffer model is full.
Asynchronous processing: while the gesture data frames are being clustered, new data frames are continually captured from the camera into the buffer model. When the clustering speed of the gesture data frames in the buffer model is lower than the data frame capture speed, the buffer model may become full; the capture process then blocks, waits for gesture data frames to be clustered, and the user is prompted to increase the buffer size in the configuration file. When the clustering speed exceeds the capture speed, the buffer model may become empty; the interval-frame clustering process then blocks, waits for captured gesture data frames to be put into the buffer, and the user is prompted to reduce the buffer size in the configuration file.
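As a rough illustration of this asynchronous buffering, the following Python sketch (not part of the patent; OpenCV is assumed for capture, and the clustering step is left as a placeholder) uses a bounded queue as the buffer model, so that the capture thread blocks when the buffer is full and the clustering thread blocks when it is empty:

```python
import queue
import threading

import cv2  # OpenCV, assumed available for camera capture

BUFFER_SIZE = 64  # stand-in for the BUFFER_SIZE field of the configuration file
frame_buffer = queue.Queue(maxsize=BUFFER_SIZE)  # the buffer model U


def capture_frames(cam_index: int = 0) -> None:
    """Producer: store camera frames into the buffer model in order.

    put() blocks while the buffer is full, mirroring the behaviour in which
    the capture process waits and the user is prompted to enlarge the buffer.
    """
    cap = cv2.VideoCapture(cam_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if frame_buffer.full():
                print("Buffer model full: consider increasing BUFFER_SIZE")
            frame_buffer.put(frame)  # blocks until a slot is free
    finally:
        cap.release()


def cluster_loop() -> None:
    """Consumer: get() blocks while the buffer is empty, mirroring the
    behaviour in which the clustering process waits for captured frames."""
    while True:
        frame = frame_buffer.get()
        # ... interval-frame clustering of `frame` would happen here ...
        frame_buffer.task_done()


threading.Thread(target=capture_frames, daemon=True).start()
cluster_loop()
```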
Gesture interval-frame clustering based on time series: the time-series-based clustering algorithm reads data from the buffer model and compares interval data frames against the threshold parameters in the configuration file. If the frame difference does not exceed the thresholds, the redundant similar data frames in the buffer model are removed (the frames belong to the same class); if the frame difference exceeds the thresholds, a class boundary is drawn, the data frame at the median position of the class is chosen as the key data frame of that class, and the whole class is removed from the buffer model.
Second, the frame-difference computation for interval gesture image frames.
According to the characteristics of the gesture image data frames, the system computes frame differences using color-histogram-based methods. The specific computations are as follows:
(1) Histogram correlation frame-difference method (corresponding to the CORREL_D parameter in the configuration file):
$$Fd(H_1, H_2) = \frac{\sum_i H_1'(i)\,H_2'(i)}{\sqrt{\sum_i H_1'(i)^2 \cdot \sum_i H_2'(i)^2}}, \qquad H_k'(i) = H_k(i) - \frac{1}{N}\sum_j H_k(j),$$
where N is the total number of pixels in a frame, $Fd(H_1, H_2)$ is the color-histogram difference of the two compared gesture images, $H_1$ is the color histogram of the first gesture image to be compared, $H_2$ is the color histogram of the second gesture image to be compared, and $i$, $j$, $k$ are intermediate variables.
(2) χ² test histogram-difference method (corresponding to the CHISQR_D parameter in the configuration file):
$$Fd(H_1, H_2) = \sum_i \frac{(H_1(i) - H_2(i))^2}{H_1(i) + H_2(i)},$$
where $Fd(H_1, H_2)$ is the color-histogram difference of the two compared gesture images, $H_1$ and $H_2$ are the color histograms of the first and second gesture images to be compared, and $i$ is an intermediate variable.
(3) Histogram minimum (intersection) frame-difference method (corresponding to the INTERSECT_D parameter in the configuration file):
$$Fd(H_1, H_2) = \sum_i \min(H_1(i), H_2(i)),$$
where $\min(H_1(i), H_2(i))$ denotes the minimum of $H_1(i)$ and $H_2(i)$, $Fd(H_1, H_2)$ is the color-histogram difference of the two compared gesture images, $H_1$ and $H_2$ are the color histograms of the first and second gesture images to be compared, and $i$ is an intermediate variable.
(4) Inter-frame histogram Bhattacharyya distance method (corresponding to the BHATTACHARYYA_D parameter in the configuration file; this method applies only to normalized histograms):
$$Fd(H_1, H_2) = \sqrt{1 - \sum_i \sqrt{H_1(i)\,H_2(i)}},$$
where $Fd(H_1, H_2)$ is the color-histogram difference of the two compared gesture images, $H_1$ and $H_2$ are the color histograms of the first and second gesture images to be compared, and $i$ is an intermediate variable.
The system takes the logical AND of the results of the four comparison methods above as the comparison criterion for interval gesture image frames, thereby realizing the clustering effect.
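A minimal sketch of the four ANDed comparisons, assuming OpenCV's cv2.compareHist implementations of the four histogram measures; the threshold values are illustrative stand-ins for the configuration file fields, and the correlation and intersection scores are converted to difference form because OpenCV reports them as similarities:

```python
import cv2
import numpy as np

# Illustrative threshold values standing in for the configuration file fields
CORREL_D, CHISQR_D, INTERSECT_D, BHATTACHARYYA_D = 0.2, 0.5, 0.4, 0.3


def histogram(frame_bgr: np.ndarray) -> np.ndarray:
    """L1-normalized color histogram of a BGR frame (32 bins per channel)."""
    h = cv2.calcHist([frame_bgr], [0, 1, 2], None, [32, 32, 32],
                     [0, 256, 0, 256, 0, 256])
    cv2.normalize(h, h, 1.0, 0, cv2.NORM_L1)  # normalized, as Bhattacharyya requires
    return h.flatten()


def is_class_boundary(frame_a: np.ndarray, frame_b: np.ndarray) -> bool:
    """Logical AND of the four histogram frame-difference comparisons."""
    h1, h2 = histogram(frame_a), histogram(frame_b)
    # Correlation and intersection grow with similarity, so they are
    # converted to differences before comparison with the thresholds.
    correl_diff = 1.0 - cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)
    chisqr_diff = cv2.compareHist(h1, h2, cv2.HISTCMP_CHISQR)
    intersect_diff = 1.0 - cv2.compareHist(h1, h2, cv2.HISTCMP_INTERSECT)
    bhatta_diff = cv2.compareHist(h1, h2, cv2.HISTCMP_BHATTACHARYYA)
    return (correl_diff >= CORREL_D and chisqr_diff >= CHISQR_D
            and intersect_diff >= INTERSECT_D and bhatta_diff >= BHATTACHARYYA_D)
```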
Third, the interval-frame clustering based on time series.
The clustering configuration parameters are as follows:
the BUFFER_SIZE field is the system buffer model size, an integer;
the CORREL_D field is the comparison threshold of the color-histogram correlation frame-difference method, accurate to two significant digits;
the CHISQR_D field is the comparison threshold of the χ² test color-histogram frame-difference method, accurate to two significant digits;
the INTERSECT_D field is the comparison threshold of the color-histogram minimum frame-difference method, accurate to two significant digits;
the BHATTACHARYYA_D field is the comparison threshold of the inter-frame color-histogram Bhattacharyya distance method, accurate to two significant digits;
the INTERVALFRAMES_VALUE field is the comparison interval (in frames) between buffer model data frames, an integer.
The algorithm is described as follows:
(1) Initialization. The system allocates a memory space of size BUFFER_SIZE × SIZEOF(gesture data frame) as the buffer model U; the image data frames captured in real time by the camera are stored in U, whose elements in chronological order are X₀, X₁, X₂, …, X_{BUFFER_SIZE−1};
(2) Frame-difference comparison. For each i (0 ≤ i < BUFFER_SIZE), compute the frame difference d_{i, i+INTERVALFRAMES_VALUE} between the i-th element Xᵢ of U and the (i+INTERVALFRAMES_VALUE)-th element X_{i+INTERVALFRAMES_VALUE}; a cluster boundary is declared if the condition in (3) is true, otherwise the two frames are assigned to the same class;
(3) (d_{i, i+INTERVALFRAMES_VALUE} ≥ CORREL_D) && (d_{i, i+INTERVALFRAMES_VALUE} ≥ CHISQR_D) && (d_{i, i+INTERVALFRAMES_VALUE} ≥ INTERSECT_D) && (d_{i, i+INTERVALFRAMES_VALUE} ≥ BHATTACHARYYA_D);
(4) Cluster centre. Clustering U up to some moment T yields a class set T(M) with SIZEOF(T(M)) = k. For each class (TM)ⱼ (0 ≤ j < k), the element X̄ⱼ at the median position of (TM)ⱼ is chosen as the class centre and as the key data frame of (TM)ⱼ; after X̄ⱼ has been processed by the gesture analysis and gesture recognition stages, all data frames of class (TM)ⱼ are deleted from U;
(5) Asynchronous processing. While data are read from U and clustered to obtain gesture key frames, gesture video stream data are simultaneously written into U;
(6) Deadlock handling. If U is empty, the clustering process blocks until U contains data; if U is full, the capture process blocks, and the user is reminded to adjust the BUFFER_SIZE parameter in the configuration file appropriately to enlarge U;
(7) Algorithm termination. When the system receives the command to stop capturing the gesture video stream, it stops writing data into U, finishes processing the elements remaining in U, and terminates.
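The following sketch, reusing the is_class_boundary comparison from the earlier histogram example, illustrates steps (1) to (4) over a buffered list of frames; it is one interpretation of the algorithm under those assumptions, not the patent's implementation:

```python
INTERVALFRAMES_VALUE = 5  # illustrative comparison interval from the configuration file


def cluster_key_frames(frames):
    """Split a buffered frame list into classes at the boundaries where all
    four interval frame differences exceed their thresholds, and return the
    median-position frame of each class as its key frame (steps (1)-(4))."""
    key_frames, class_start = [], 0
    for i in range(len(frames) - INTERVALFRAMES_VALUE):
        if is_class_boundary(frames[i], frames[i + INTERVALFRAMES_VALUE]):
            cls = frames[class_start:i + 1]
            key_frames.append(cls[len(cls) // 2])  # median-position element
            class_start = i + 1
    if class_start < len(frames):                  # trailing class, if any
        cls = frames[class_start:]
        key_frames.append(cls[len(cls) // 2])
    return key_frames
```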
Step 300: gesture analysis, i.e., after binarizing the gesture key frame images and denoising them by smoothing filtering, extract the gesture contours to obtain key gesture contour data, then extract gesture feature parameters from the gesture key frame binary map and the extracted gesture contour, the gesture feature parameters forming a gesture feature vector.
As shown in Fig. 4, the specific implementation process is as follows. The input data of the image preprocessing module are the gesture key frames and the output data are the gesture frame contour data; the data processing comprises three parts: gesture image binarization, smoothing-filter denoising of the gesture area, and gesture contour extraction. Gesture feature parameters are then extracted from the gesture key frame binary map and the extracted gesture contour, the gesture feature parameters forming a gesture feature vector.
First, gesture image binarization.
The system binarizes the gesture image according to human skin-color information to obtain the person's gesture area. The specific algorithm performs the following operations on each pixel of the image obtained after cluster segmentation:
(1) compute the Hue and Saturation values of the pixel from its RGB values;
(2) judge whether the Hue and Saturation values of the pixel fall within the person's skin-color interval;
(3) if so, set the pixel to black;
(4) otherwise, set the pixel to white;
where the Hue and Saturation data types are single precision, without units.
Binarization yields the gesture area binary map.
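A sketch of the skin-color binarization, assuming OpenCV; the Hue/Saturation interval below is illustrative (the patent leaves the per-person skin-color interval unspecified), and skin pixels are marked white rather than black so that later OpenCV steps can treat them as foreground:

```python
import cv2
import numpy as np


def binarize_skin(frame_bgr: np.ndarray,
                  hue_range=(0, 20), sat_range=(48, 255)) -> np.ndarray:
    """Binarize a frame by skin color in HSV space.

    The hue/saturation interval is an assumed example; the patent only says
    each pixel's Hue and Saturation are tested against the person's
    skin-color interval. Skin pixels come out white (255) here, the inverse
    of the patent's black-for-skin convention; the two are interchangeable.
    """
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([hue_range[0], sat_range[0], 0], dtype=np.uint8)
    upper = np.array([hue_range[1], sat_range[1], 255], dtype=np.uint8)
    return cv2.inRange(hsv, lower, upper)  # 255 inside the interval, 0 outside
```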
Second, smoothing-filter denoising of the gesture area.
After binarization, the binary gesture map mainly contains salt-and-pepper noise, which appears as interleaved black-and-white bright and dark spots in the image. The system denoises the gesture area binary map with median filtering and linear smoothing filtering to obtain a clear gesture area.
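A sketch of this denoising step, assuming OpenCV; the kernel sizes are illustrative choices, not values from the patent:

```python
import cv2
import numpy as np


def denoise(binary_img: np.ndarray) -> np.ndarray:
    """Remove salt-and-pepper noise with a median filter, then apply a
    linear (box) smoothing filter and re-threshold to keep the map binary."""
    img = cv2.medianBlur(binary_img, 5)   # 5x5 median filter
    img = cv2.blur(img, (3, 3))           # 3x3 linear smoothing filter
    _, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
    return img
```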
Third, gesture contour extraction.
The system extracts the gesture contours with the Laplacian boundary-extraction algorithm. The Laplacian operator is a second-derivative scalar operator on two-dimensional functions and is rather sensitive to noise in the image. During processing, the gesture edge produces a steep zero crossing; for a noise-free image with sharp edges, the gesture contour can be extracted by the Laplacian operator.
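A sketch of Laplacian boundary extraction on the denoised binary map, assuming OpenCV:

```python
import cv2
import numpy as np


def extract_contour(binary_img: np.ndarray) -> np.ndarray:
    """Laplacian boundary extraction: the second-derivative operator produces
    steep zero crossings at the gesture edge of a clean binary image."""
    lap = cv2.Laplacian(binary_img, cv2.CV_16S, ksize=3)
    edges = cv2.convertScaleAbs(lap)  # back to 8-bit magnitude
    _, edges = cv2.threshold(edges, 0, 255, cv2.THRESH_BINARY)
    return edges
```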
Fourth, gesture feature parameter extraction.
The system extracts gesture model feature parameters from the gesture binary map and gesture contour obtained after preprocessing. Three groups of gesture feature parameters are extracted: gesture area features, Hu invariant-moment features, and Fourier descriptors; the three groups of feature parameters together constitute the system feature vector.
The gesture area features include the ratio of the gesture area to the area of the gesture bounding rectangle, the aspect ratio of the gesture area, the ratio of the two sections into which the centre of gravity of the gesture binary map splits the gesture bounding rectangle, the hand region area expressed as a number of pixels, and the gesture contour perimeter.
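A sketch of these five area features, assuming OpenCV; the exact definition of the centre-of-gravity split ratio is an assumption (taken here as the vertical split of the bounding rectangle at the centroid):

```python
import cv2
import numpy as np


def area_features(binary_img: np.ndarray, contour: np.ndarray) -> list:
    """The five gesture area features listed above (illustrative sketch)."""
    area = cv2.countNonZero(binary_img)           # hand area in pixels
    x, y, w, h = cv2.boundingRect(contour)        # gesture bounding rectangle
    m = cv2.moments(binary_img, binaryImage=True)
    cy = m["m01"] / m["m00"]                      # centre-of-gravity row
    return [
        area / float(w * h),                      # area / bounding-rect area
        w / float(h),                             # aspect ratio
        (cy - y) / float(y + h - cy),             # split ratio at the centroid (assumed)
        float(area),                              # hand area (pixel count)
        cv2.arcLength(contour, True),             # contour perimeter
    ]
```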
Hu invariant-moment features.
Raw moments or central moments used directly as image features cannot be guaranteed to be simultaneously invariant to translation, rotation, and scale. In fact, if the image features are represented only by central moments, the features are only translation-invariant; if normalized central moments are used, the features are invariant to translation and scale but not to rotation. M.K. Hu introduced the concept of invariant moments, gave the definition of moments of continuous functions and their fundamental properties, proved properties such as the translation, rotation, and scale invariance of moments, and in particular gave the expressions of seven invariant moments possessing translation, rotation, and scale invariance. The seven invariant moments are composed of linear combinations of the second- and third-order central moments, with the following expressions:
$$\begin{aligned}
\phi_1 &= \mu_{20} + \mu_{02} \\
\phi_2 &= (\mu_{20} - \mu_{02})^2 + (2\mu_{11})^2 \\
\phi_3 &= (\mu_{30} - 3\mu_{12})^2 + (3\mu_{21} - \mu_{03})^2 \\
\phi_4 &= (\mu_{30} + \mu_{12})^2 + (\mu_{21} + \mu_{03})^2 \\
\phi_5 &= (\mu_{30} - 3\mu_{12})(\mu_{30} + \mu_{12})\left[(\mu_{30} + \mu_{12})^2 - 3(\mu_{21} + \mu_{03})^2\right] + (3\mu_{21} - \mu_{03})(\mu_{21} + \mu_{03})\left[3(\mu_{30} + \mu_{12})^2 - (\mu_{21} + \mu_{03})^2\right] \\
\phi_6 &= (\mu_{20} - \mu_{02})\left[(\mu_{30} + \mu_{12})^2 - (\mu_{21} + \mu_{03})^2\right] + 4\mu_{11}(\mu_{30} + \mu_{12})(\mu_{21} + \mu_{03}) \\
\phi_7 &= (3\mu_{21} - \mu_{03})(\mu_{30} + \mu_{12})\left[(\mu_{30} + \mu_{12})^2 - 3(\mu_{21} + \mu_{03})^2\right] - (\mu_{30} - 3\mu_{12})(\mu_{21} + \mu_{03})\left[3(\mu_{30} + \mu_{12})^2 - (\mu_{21} + \mu_{03})^2\right]
\end{aligned}$$
with the normalized central moments
$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\rho}}, \qquad \rho = \frac{p+q}{2} + 1, \qquad p + q = 2, 3, \ldots
$$
where φ₁, φ₂, φ₃, φ₄, φ₅, φ₆, φ₇ are the seven invariant moments.
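A sketch of extracting the seven Hu invariant moments with OpenCV's built-in cv2.HuMoments; the log-scaling is a common practice added here for numerical range, not something from the patent:

```python
import cv2
import numpy as np


def hu_features(binary_img: np.ndarray) -> np.ndarray:
    """Seven Hu invariant moments of the gesture binary map."""
    m = cv2.moments(binary_img, binaryImage=True)
    hu = cv2.HuMoments(m).flatten()  # the seven invariants phi_1..phi_7
    # Log-scale to compress the moments' wide dynamic range (an assumption,
    # commonly used in practice; the epsilon guards against log(0)).
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
```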
Fourier descriptor feature extraction.
The Fourier descriptor feature extraction flow is as follows:
1) a Fourier transform yields the Fourier coefficients of the gesture contour point sequence;
2) the Fourier coefficients are the frequency-domain representation of the gesture contour point sequence;
3) the noise of the gesture contour point sequence is removed in the frequency domain so as to smooth the curve;
4) an inverse Fourier transform of the denoised frequency-domain representation of the gesture contour point sequence yields an approximate gesture contour curve; 12 coefficients representing this curve are taken as the gesture feature.
The Fourier descriptor (FD) is commonly used to represent the shape feature of a single closed curve. Its basic idea is to model the target contour curve as a one-dimensional sequence and apply a one-dimensional Fourier transform to that sequence, obtaining a series of Fourier coefficients that describe the target contour.
The advantages of the Fourier descriptor method lie mainly in its simple computing principle, clear description, and coarse-to-fine characteristic. The simple computing principle makes the feature extraction stable: no large number of control parameters needs to be set during computation to obtain a result, and the consistency of the computation is good.
The basic idea of Fourier descriptors assumes that the object shape is a closed curve whose point sequence along the boundary curve is $\{x(l), y(l): l = 0, 1, \ldots, n-1\}$, expressed in complex form as
$$p(l) = x(l) + j\,y(l), \qquad l = 0, 1, \ldots, n-1, \qquad j = \sqrt{-1},$$
so that the boundary can be represented in a one-dimensional space.
The discrete Fourier coefficients of the one-dimensional sequence are represented by the set Z, defined as
$$Z = \left\{ z(k) = \frac{1}{n} \sum_{l=0}^{n-1} p(l) \exp\!\left(-\frac{j 2\pi l k}{n}\right) \right\}, \qquad k = 0, 1, \ldots, n-1.$$
Z is the Fourier transform of p and is the frequency-domain representation of the point sequence. Its inverse Fourier transform coefficients are represented by the set P, defined as
$$P = \left\{ p(l) = \sum_{k=0}^{n-1} z(k) \exp\!\left(\frac{j 2\pi l k}{n}\right) \right\}, \qquad l = 0, 1, \ldots, n-1.$$
Using the property of Fourier coefficients $z(n-k) = \bar{z}(k)$ (where $\bar{z}(k)$ is the complex conjugate of $z(k)$), the high-frequency components in the range $K+1$ to $n-K-1$ are eliminated from the coefficient set Z. Applying the inverse Fourier transform again then yields a curve approximating the original one, with the abrupt parts of the original curve smoothed; this approximating curve is called the K-approximation curve of the original curve. The Fourier coefficient subset $\{z(k): k \le K\}$ in this sense is called the Fourier descriptor. Because the energy of the Fourier coefficients concentrates at low frequencies, the purpose of distinguishing different shape boundaries can be achieved with few coefficients.
Fourier descriptors are related to the scale, orientation, and starting-point position of the shape curve. To recognize shapes with rotation, translation, and scale invariance, the Fourier descriptors must be normalized. According to the properties of the Fourier transform, after the starting point of the shape boundary is shifted by a, the object is scaled by a factor r, rotated by an angle θ, and translated by $(x_0, y_0)$, the Fourier coefficient set $Z'$ of the new shape is
$$z'(k) = r\,e^{j\theta}\exp\!\left(\frac{j 2\pi a k}{n}\right) z(k) + (x_0 + j y_0)\,\delta(k), \qquad k = 0, 1, \ldots, n-1,$$
where $\delta(k) = 1$ for $k = 0$ and $0$ otherwise, and $x'(l) + j\,y'(l) = x(l+a) + j\,y(l+a)$.
As can be seen from the above equation, when a shape is described by Fourier coefficients, the coefficient magnitudes ‖z(k)‖, k = 0, 1, …, n−1, possess rotation and translation invariance (z(0) does not have translation invariance) and are independent of the choice of the curve's starting point. When the object is translated, only its z(0) component changes, by the value (x₀ + j y₀). Dividing the magnitude ‖z(k)‖ of each coefficient (except z(0)) by ‖z(1)‖ makes the ratio ‖z(k)‖/‖z(1)‖ unchanged under scale changes, so that ‖z(k)‖/‖z(1)‖, k = 1, 2, …, n−1 (‖·‖ denotes the modulus), is simultaneously invariant to rotation, translation, and scale and independent of the starting-point position; it is therefore used as the Fourier descriptor, and the normalized Fourier descriptor d(k) is defined as
$$d(k) = \frac{\lVert z(k) \rVert}{\lVert z(1) \rVert}, \qquad k = 1, 2, \ldots, n-1.$$
The system takes the first 12 coefficients other than z(0) to obtain a 12-dimensional feature vector; this feature vector is invariant to rotation, translation, and scale and independent of the starting-point position of the curve.
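A sketch of the normalized Fourier descriptor d(k) = ‖z(k)‖/‖z(1)‖ over a contour as returned by OpenCV, consistent with the definitions above:

```python
import numpy as np


def fourier_descriptor(contour_points: np.ndarray, num_coeffs: int = 12) -> np.ndarray:
    """Normalized Fourier descriptor of a closed contour: the first
    `num_coeffs` coefficient magnitudes other than z(0), divided by ||z(1)||.

    `contour_points` is assumed to be an OpenCV contour of shape (n, 1, 2).
    """
    pts = contour_points.reshape(-1, 2).astype(np.float64)
    p = pts[:, 0] + 1j * pts[:, 1]          # p(l) = x(l) + j*y(l)
    z = np.fft.fft(p) / len(p)              # z(k) with the 1/n normalization
    mag = np.abs(z)                         # ||z(k)||: rotation/start-point invariant
    return mag[1:num_coeffs + 1] / mag[1]   # divide by ||z(1)|| for scale invariance
```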
Step 400: gesture recognition, i.e., perform gesture recognition according to the gesture feature vector and obtain the letter virtual key or keyboard-control virtual key corresponding to the recognized gesture.
As shown in Fig. 5, the specific implementation process is as follows. The recognition learning process uses a BP neural network as the classification model, with a Bayesian classification model used for comparison experiments. The gestures the system can recognize comprise the 26 English letters and 4 control gestures; the control gestures are Space (corresponding to the keyboard space bar), StartIME (Input Method Editor: input method switching, corresponding to the keyboard Shift key), End (corresponding to the keyboard End key), and Backspace (corresponding to the keyboard Backspace key).
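As a stand-in for the BP (back-propagation) network classifier, a sketch using scikit-learn's MLPClassifier, which is a multilayer perceptron trained by back-propagation; the hidden-layer size and training parameters are assumptions, since the patent does not specify the network topology:

```python
from sklearn.neural_network import MLPClassifier

# The 30 output classes: 26 letters plus the 4 control gestures.
GESTURE_LABELS = [chr(c) for c in range(ord('A'), ord('Z') + 1)] + \
                 ["Space", "StartIME", "End", "Backspace"]


def train_classifier(feature_vectors, labels) -> MLPClassifier:
    """Fit an MLP on gesture feature vectors (area features + Hu moments +
    Fourier descriptors); labels are drawn from GESTURE_LABELS. The
    hidden-layer size and iteration count are illustrative choices."""
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000)
    clf.fit(feature_vectors, labels)
    return clf
```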
Step 500: build the gesture input method, i.e., invoke the operating system's existing input method according to the obtained letter virtual keys and keyboard-control virtual keys.
As shown in Fig. 5, the specific implementation process is as follows. During gesture input, a marker gesture is set, and the video stream is segmented according to the marker gesture. According to the results obtained from the system's segmentation, clustering, and classification recognition, the corresponding virtual keys are sent to the operating system to control the input of the system's input method.
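One possible way to send the virtual keys to the operating system on a desktop machine, assuming the third-party pyautogui package; the patent only states that virtual key events are sent to the OS, so the mapping below is illustrative:

```python
import pyautogui  # assumed third-party package for synthesizing key events

# Control gestures mapped to pyautogui key names; letters map to themselves.
KEY_MAP = {"Space": "space", "End": "end", "Backspace": "backspace"}


def send_virtual_key(label: str) -> None:
    """Turn a recognized gesture label into an OS-level virtual key press."""
    if label == "StartIME":
        pyautogui.press("shift")  # input-method switch, per the patent's mapping
    else:
        pyautogui.press(KEY_MAP.get(label, label.lower()))
```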
As shown in Fig. 2, an embodiment of the present invention builds a computer-vision-based gesture input method constructing system, including a gesture acquisition unit 1 that acquires the gesture video signal, a video processing unit 2 that processes the acquired gesture video signal, a gesture analysis unit 3 that performs gesture analysis and obtains the gesture feature vector, a gesture recognition unit 4 that performs gesture recognition according to the gesture feature vector, and a construction unit 5 that builds the gesture input method. The gesture acquisition unit 1 captures the gesture video signal and obtains gesture data image frames; the video processing unit 2 uses the asynchronous clustering model based on interval gesture data frame buffering, which includes the buffer model, stores the acquired gesture data image frames in the buffer model in order, and clusters the gesture data image frames in the buffer model to obtain the gesture key frame sequence; the gesture analysis unit 3 binarizes the gesture key frame images, denoises them by smoothing filtering, and extracts the gesture contours to obtain key gesture contour data, then extracts gesture feature parameters from the gesture key frame binary map and the extracted gesture contour, the gesture feature parameters forming a gesture feature vector; the gesture recognition unit 4 performs gesture recognition according to the gesture feature vector and obtains the letter virtual key or keyboard-control virtual key corresponding to the recognized gesture; the construction unit 5 invokes the operating system's existing input method according to the obtained letter virtual keys and keyboard-control virtual keys.
In the specific implementation of the present invention, the gesture acquisition unit 1 captures the gesture video signal with a camera, and the gesture data image frames are obtained by processing.
The working process of the video processing unit 2 is identical to the video processing of step 200 described above: using the asynchronous clustering model based on interval gesture data frame buffering, the gesture image frames captured by the camera are cached, their interval frame differences are computed with the four color-histogram methods, and the time-series clustering algorithm of steps (1) to (7) extracts the gesture key frame of each class.
The working process of the gesture analysis unit 3 is identical to the gesture analysis of step 300 described above: the gesture key frames are binarized according to skin color, the gesture area binary map is denoised by median and linear smoothing filtering, the gesture contours are extracted with the Laplacian boundary-extraction algorithm, and the gesture area features, Hu invariant-moment features, and Fourier descriptors are extracted to form the gesture feature vector.
The working process of the gesture recognition unit 4 is identical to the gesture recognition of step 400 described above.
The working process of the construction unit 5 is identical to the building of the gesture input method in step 500 described above.
The above content is a further detailed description of the present invention in combination with specific preferred embodiments, and the specific implementation of the present invention cannot be considered limited to these descriptions. For those of ordinary skill in the technical field of the present invention, a number of simple deductions or substitutions may be made without departing from the inventive concept, and all of these should be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A computer-vision-based gesture input method construction method, with steps as follows:
gesture acquisition: capturing the gesture video signal and obtaining gesture data image frames;
video processing: using an asynchronous clustering model based on interval gesture data frame buffering, the asynchronous clustering model including a buffer model, storing the acquired gesture data image frames in the buffer model in order, and clustering the gesture data image frames in the buffer model to obtain the gesture key frame sequence;
gesture analysis: after binarizing the gesture key frame images and denoising them by smoothing filtering, extracting the gesture contours to obtain key gesture contour data, and extracting gesture feature parameters from the gesture key frame binary map and the extracted gesture contour, the gesture feature parameters forming a gesture feature vector;
gesture recognition: performing gesture recognition according to the gesture feature vector and obtaining the letter virtual key or keyboard-control virtual key corresponding to the recognized gesture;
building the gesture input method: invoking the operating system's existing input method according to the obtained letter virtual keys and keyboard-control virtual keys.
2. the gesture input method construction method based on computer vision according to claim 1, it is characterized in that, in video-processing steps, clustering processing is carried out to the gesture data picture frame in buffer model, obtained each class includes the total data of the preparatory stage, key frame stage and Restoration stage of gesture, and the cluster centre of each class is the gesture key frame of each gesture.
3. the gesture input method construction method based on computer vision according to claim 2, it is characterised in that in video-processing steps, threshold parameter is set, frame by comparison interval data frame is poor, if frame difference removes the gesture data frame not less than threshold parameter in buffer model;If frame difference exceedes threshold parameter, it is defined as drawing class border, and choose the gesture data frame of such Median Position as such gesture key frame.
4. the gesture input method construction method based on computer vision according to claim 1, it is characterized in that, in video-processing steps, the gesture data picture frame of collection is sequentially stored in buffer model, when buffer model completely starts to carry out clustering processing to gesture data frame.
5. the gesture input method construction method based on computer vision according to claim 4, it is characterized in that, in video-processing steps, while gesture data frame clustering processing, constantly enter buffer model from the new gesture data frame of seizure, when the gesture data frame clustering processing speed in buffer model is less than gesture data frame capture velocity, when buffer model data are full, wait gesture data frame to be clustered processing, and point out user the size for increasing configuration file;When buffer model gesture data frame clustering processing speed is more than gesture data frame capture velocity, is there is the buffer model data empty time, waiting gesture data frame to be captured that gesture data frame is put into buffer model, and point out user to reduce the size in configuration file.
6. The computer-vision-based gesture input method construction method according to claim 1, characterized in that the gesture feature parameters include a gesture area feature, Hu invariant moment features and Fourier descriptors (see the feature-extraction sketch following the claims).
7. The computer-vision-based gesture input method construction method according to claim 1, characterized in that, in the gesture analysis step, gesture contour extraction is performed using a Laplacian boundary extraction algorithm.
8. A computer-vision-based gesture input method construction system, characterized in that it comprises a gesture acquisition unit that captures a gesture video signal, a video processing unit that processes the captured gesture video signal, a gesture analysis unit that performs gesture analysis and obtains a gesture feature vector, a gesture recognition unit that performs gesture recognition according to the gesture feature vector, and a construction unit that builds the gesture input method; the gesture acquisition unit captures the gesture video signal and obtains gesture data image frames; the video processing unit uses an asynchronous clustering model based on the buffering of interval gesture data frames, the asynchronous clustering model comprising a buffer model; the acquired gesture data image frames are stored in the buffer model in sequence, and clustering is performed on the gesture data image frames in the buffer model to obtain a gesture key frame sequence; after binarization and smoothing-filter denoising of the gesture key frame images, the gesture analysis unit performs gesture contour extraction to obtain key gesture contour data, and extracts gesture feature parameters from the gesture key frame binary images and the extracted gesture contours, the gesture feature parameters forming the gesture feature vector; the gesture recognition unit performs gesture recognition according to the gesture feature vector to obtain the letter virtual key or keyboard-control virtual key corresponding to the recognized gesture; the construction unit invokes the existing input method of the operating system according to the obtained letter virtual keys and keyboard-control virtual keys.
9. The computer-vision-based gesture input method construction system according to claim 8, characterized in that the video processing unit includes a clustering module that performs clustering on the gesture data image frames in the buffer model; the clustering module reads gesture frame data from the buffer model using a time-series-based clustering algorithm, and obtains the gesture key frames from the frame differences of interval gesture frames.
10. The computer-vision-based gesture input method construction system according to claim 8, characterized in that the gesture recognition unit performs gesture recognition according to the gesture feature vector and obtains the 26 English-letter keyboard virtual keys and 4 keyboard-control virtual keys corresponding to the recognized gestures (see the recognition and key-mapping sketch following the claims).
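
The interval frame-difference clustering of claims 2, 3 and 9 can be illustrated with a short Python/OpenCV sketch. This is only an illustration, not the patented implementation: the interval step, the threshold value and the use of a mean absolute pixel difference as the frame-difference metric are assumptions.

```python
import cv2
import numpy as np

def cluster_key_frames(frames, interval=5, threshold=12.0):
    """Cluster buffered grayscale gesture frames by interval frame
    difference and return the median-position frame of each class as its
    key frame (claims 2 and 3). `interval` and `threshold` are assumed
    values, not taken from the patent."""
    key_frames = []
    class_start = 0
    for i in range(interval, len(frames), interval):
        # Mean absolute pixel difference between interval-separated frames.
        diff = float(np.mean(cv2.absdiff(frames[i], frames[i - interval])))
        if diff > threshold:
            # Difference exceeds the threshold: draw a class boundary and
            # keep the median-position frame as the class key frame.
            cls = frames[class_start:i]
            key_frames.append(cls[len(cls) // 2])
            class_start = i
        # Frames below the threshold stay in the current class; claim 3
        # discards them from the buffer, so they contribute no key frame.
    if class_start < len(frames):
        cls = frames[class_start:]
        key_frames.append(cls[len(cls) // 2])
    return key_frames
```

Each returned key frame stands in for a whole gesture (preparation, key frame and recovery stages), which is what lets the later analysis and recognition steps run on far fewer frames than the raw video contains.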
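Claims 4 and 5 describe a bounded buffer shared by a capture producer and a clustering consumer. The following producer-consumer sketch uses Python's standard `queue` module and OpenCV's `VideoCapture`; the buffer size constant and the prompt texts are placeholders, since the patent reads the size from a configuration file.

```python
import queue

import cv2

BUFFER_SIZE = 64  # placeholder; the patent takes this from a configuration file
buffer_model = queue.Queue(maxsize=BUFFER_SIZE)

def capture_loop(camera: cv2.VideoCapture):
    """Producer: push newly captured gesture frames into the buffer model."""
    while True:
        ok, frame = camera.read()
        if not ok:
            break
        if buffer_model.full():
            # Clustering lags behind capture: per claim 5, prompt the user
            # to increase the buffer size in the configuration file.
            print("buffer full - consider increasing the configured size")
        buffer_model.put(frame)  # blocks until clustering frees a slot

def clustering_loop():
    """Consumer: cluster gesture frames as they arrive."""
    while True:
        if buffer_model.empty():
            # Capture lags behind clustering: per claim 5, prompt the user
            # to reduce the buffer size in the configuration file.
            print("buffer empty - consider reducing the configured size")
        frame = buffer_model.get()  # blocks until a frame is available
        # ... pass `frame` to the interval clustering sketched above ...
```

Running the two loops on separate threads gives the asynchronous behavior of the claims: capture is never blocked by analysis except when the buffer is full, and the full/empty conditions double as sizing hints for the user.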
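The gesture analysis of claims 1, 6 and 7 (binarization, smoothing-filter denoising, Laplacian contour extraction, then an area feature, Hu invariant moments and Fourier descriptors) can be sketched as below. Otsu thresholding, the Gaussian kernel size, the contour tracing via `cv2.findContours` (used because Fourier descriptors need an ordered boundary; OpenCV 4 is assumed) and the descriptor count are all assumptions rather than the patent's exact choices.

```python
import cv2
import numpy as np

def gesture_feature_vector(key_frame, n_fourier=16):
    """Build a feature vector from one grayscale gesture key frame:
    [area feature, 7 Hu invariant moments, n_fourier Fourier descriptors]."""
    # Smoothing-filter denoising and binarization (claim 1); Otsu
    # thresholding is an assumed choice.
    blurred = cv2.GaussianBlur(key_frame, (5, 5), 0)
    _, binary = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Claim 7: Laplacian boundary extraction of the binary image. The image
    # is kept only to show this step; the descriptors below use an ordered
    # contour traced from the binary image instead.
    boundary_img = cv2.Laplacian(binary, cv2.CV_8U)
    # Area feature: proportion of hand pixels in the binary image.
    area = np.count_nonzero(binary) / binary.size
    # Hu invariant moments of the binary image.
    hu = cv2.HuMoments(cv2.moments(binary)).flatten()
    # Fourier descriptors of the largest contour; normalizing magnitudes by
    # the first coefficient makes the descriptors tolerant to scale.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    pts = max(contours, key=cv2.contourArea).squeeze().astype(np.float64)
    coeffs = np.fft.fft(pts[:, 0] + 1j * pts[:, 1])
    fourier = np.abs(coeffs[1:n_fourier + 1]) / (np.abs(coeffs[1]) + 1e-9)
    return np.concatenate(([area], hu, fourier))
```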
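The claims do not fix a particular classifier for the recognition step, nor do they name the 4 keyboard-control keys. A minimal recognition and key-mapping sketch for claims 1 and 10 might pair a nearest-neighbor match against per-class template vectors with a 30-entry virtual-key table; the template store, the control-key names and the hand-off stub below are all hypothetical.

```python
import string

import numpy as np

# 26 letter virtual keys plus 4 keyboard-control virtual keys (claim 10);
# the control-key names here are placeholders, not taken from the patent.
VIRTUAL_KEYS = list(string.ascii_lowercase) + [
    "space", "backspace", "enter", "switch",
]

def recognize(feature_vec, templates):
    """Hypothetical nearest-neighbor recognizer: `templates` maps a gesture
    class id (0-29) to a reference gesture feature vector."""
    return min(templates,
               key=lambda c: np.linalg.norm(feature_vec - templates[c]))

def send_to_input_method(class_id):
    """Claim 1's final step hands the virtual key to the operating system's
    existing input method; the actual call is platform-specific, so this
    stub only marks the hand-off point."""
    key = VIRTUAL_KEYS[class_id]
    print(f"virtual key -> OS input method: {key}")
```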
CN2011103459148A 2011-11-04 2011-11-04 Computer-vision-based gesture input method construction method and system Pending CN102508547A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103459148A CN102508547A (en) 2011-11-04 2011-11-04 Computer-vision-based gesture input method construction method and system

Publications (1)

Publication Number Publication Date
CN102508547A true CN102508547A (en) 2012-06-20

Family

ID=46220647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103459148A Pending CN102508547A (en) 2011-11-04 2011-11-04 Computer-vision-based gesture input method construction method and system

Country Status (1)

Country Link
CN (1) CN102508547A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101807198A (en) * 2010-01-08 2010-08-18 中国科学院软件研究所 Video abstraction generating method based on sketch

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LIU, YIN: "Gesture Recognition Based on Fourier Descriptors against Complex Backgrounds", Computer Simulation, vol. 22, no. 12, 31 December 2005 (2005-12-31), pages 158 - 161 *
LU, XIAOJUN: "Research on Virtual Human Motion Modeling and Behavior Control Technology in Maintenance Simulation", China Doctoral Dissertations Full-text Database, Information Science and Technology, 15 November 2007 (2007-11-15) *
WANG, MAOJI: "Vision-Based Static Gesture Recognition System", China Master's Theses Full-text Database, Information Science and Technology, 15 April 2007 (2007-04-15) *
SHEN, YONGJUN: "Interval Frame Segmentation Algorithm Based on Clustering", Microcomputer Information, vol. 26, no. 3, 31 March 2010 (2010-03-31) *

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968642A (en) * 2012-11-07 2013-03-13 百度在线网络技术(北京)有限公司 Trainable gesture recognition method and device based on gesture track eigenvalue
CN102968642B (en) * 2012-11-07 2018-06-08 百度在线网络技术(北京)有限公司 A kind of trainable gesture identification method and device based on gesture path characteristic value
WO2014121566A1 (en) * 2013-02-07 2014-08-14 Han Zheng Data collection method and device for action recognition, and action recognition system
CN103218601A (en) * 2013-04-03 2013-07-24 华为技术有限公司 Method and device for detecting gesture
CN103218601B (en) * 2013-04-03 2016-08-03 华为技术有限公司 The method and device of detection gesture
WO2015184760A1 (en) * 2014-06-03 2015-12-10 深圳Tcl新技术有限公司 Air gesture input method and apparatus
CN105320248A (en) * 2014-06-03 2016-02-10 深圳Tcl新技术有限公司 Mid-air gesture input method and device
CN105320248B (en) * 2014-06-03 2018-12-07 深圳Tcl新技术有限公司 Aerial gesture input method and device
CN105373215B (en) * 2014-08-25 2018-01-30 中国人民解放军理工大学 Dynamic radio gesture identification method with decoding is encoded based on gesture
CN105373215A (en) * 2014-08-25 2016-03-02 中国人民解放军理工大学 Gesture coding and decoding based dynamic wireless gesture identification method
WO2017000764A1 (en) * 2015-06-30 2017-01-05 芋头科技(杭州)有限公司 Gesture detection and recognition method and system
CN106325485A (en) * 2015-06-30 2017-01-11 芋头科技(杭州)有限公司 Gesture detection and identification method and system
CN106325485B (en) * 2015-06-30 2019-09-10 芋头科技(杭州)有限公司 A kind of gestures detection recognition methods and system
US10318800B2 (en) 2015-06-30 2019-06-11 Yutou Technology (Hangzhou) Co., Ltd. Gesture detection and recognition method and system
CN105809144B (en) * 2016-03-24 2019-03-08 重庆邮电大学 A kind of gesture recognition system and method using movement cutting
CN105809144A (en) * 2016-03-24 2016-07-27 重庆邮电大学 Gesture recognition system and method adopting action segmentation
WO2018018600A1 (en) * 2016-07-29 2018-02-01 深圳市赛亿科技开发有限公司 Gesture recognition device
CN106503650A (en) * 2016-10-21 2017-03-15 上海未来伙伴机器人有限公司 A kind of recognition methodss of images of gestures and system
CN106503650B (en) * 2016-10-21 2019-09-24 上海未来伙伴机器人有限公司 A kind of recognition methods and system of images of gestures
CN106886751A (en) * 2017-01-09 2017-06-23 深圳数字电视国家工程实验室股份有限公司 A kind of gesture identification method and system
WO2018145316A1 (en) * 2017-02-13 2018-08-16 深圳市华第时代科技有限公司 Mouse gesture recognition method and apparatus
CN109558000A (en) * 2017-09-26 2019-04-02 京东方科技集团股份有限公司 A kind of man-machine interaction method and electronic equipment
WO2019062682A1 (en) * 2017-09-26 2019-04-04 京东方科技集团股份有限公司 Gesture recognition method and electronic device
US10866649B2 (en) 2017-09-26 2020-12-15 Boe Technology Group Co., Ltd. Gesture identification method and electronic device
CN109460151B (en) * 2018-11-13 2020-04-10 江西师范大学 Hand shape input method
CN109460151A (en) * 2018-11-13 2019-03-12 江西师范大学 Hand shape input method
CN109635776A (en) * 2018-12-23 2019-04-16 广东腾晟信息科技有限公司 Pass through the method for procedure identification human action
CN109919057A (en) * 2019-02-26 2019-06-21 北京理工大学 A kind of multi-modal fusion gesture identification method based on efficient convolutional neural networks
CN110286777A (en) * 2019-06-26 2019-09-27 西南民族大学 A kind of full-automatic keyboard operation device of electromechanical based on machine vision
CN111291749A (en) * 2020-01-20 2020-06-16 深圳市优必选科技股份有限公司 Gesture recognition method and device and robot
CN111291749B (en) * 2020-01-20 2024-04-23 深圳市优必选科技股份有限公司 Gesture recognition method and device and robot
US11402923B2 (en) 2020-06-30 2022-08-02 Boe Technology Group Co., Ltd. Input method, apparatus based on visual recognition, and electronic device
CN115582637A (en) * 2022-11-22 2023-01-10 长春森酉科技有限公司 Automatic detection system for laser cutting missing process

Similar Documents

Publication Publication Date Title
CN102508547A (en) Computer-vision-based gesture input method construction method and system
Adouani et al. Comparison of Haar-like, HOG and LBP approaches for face detection in video sequences
Rekha et al. Shape, texture and local movement hand gesture features for indian sign language recognition
CN105975934B (en) Dynamic gesture recognition method and system for augmented reality auxiliary maintenance
Premaratne et al. Hand gesture tracking and recognition system using Lucas–Kanade algorithms for control of consumer electronics
Bilal et al. A hybrid method using haar-like and skin-color algorithm for hand posture detection, recognition and tracking
Mohandes et al. Prototype Arabic Sign language recognition using multi-sensor data fusion of two leap motion controllers
Jambhale et al. Gesture recognition using DTW & piecewise DTW
CN109558855B (en) A kind of space gesture recognition methods combined based on palm contour feature with stencil matching method
Wu et al. Depth-based hand gesture recognition
Thongtawee et al. A novel feature extraction for American sign language recognition using webcam
Dewangan Importance & applications of digital image processing
Vishwakarma et al. Simple and intelligent system to recognize the expression of speech-disabled person
Aly et al. Arabic sign language recognition using spatio-temporal local binary patterns and support vector machine
Sahu et al. Application of feature extraction technique: A review
Azad et al. Real-time human face detection in noisy images based on skin color fusion model and eye detection
Vafadar et al. Human hand gesture recognition using motion orientation histogram for interaction of handicapped persons with computer
Le et al. Smart Elevator Cotrol System Based on Human Hand Gesture Recognition
Patel et al. Vision Based Real-time Recognition of Hand Gestures for Indian Sign Language using Histogram of Oriented Gradients Features.
Malik Handwritten Marathi compound character segmentation using minutiae detection algorithm
Kalangi et al. Deployment of Haar Cascade algorithm to detect real-time faces
Shitole et al. Dynamic hand gesture recognition using PCA, Pruning and ANN
Liu et al. Oracle-bone-inscription image segmentation based on simple fully convolutional networks
Assaleh et al. Recognition of handwritten Arabic alphabet via hand motion tracking
Munir et al. Hand Gesture Recognition: A Review

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20120620