CN111914808A - Gesture recognition system realized based on FPGA and recognition method thereof - Google Patents

Gesture recognition system realized based on FPGA and recognition method thereof

Info

Publication number
CN111914808A
Authority
CN
China
Prior art keywords
gesture
module
data
recognition
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010834453.XA
Other languages
Chinese (zh)
Other versions
CN111914808B (en)
Inventor
王俊
易金
陈康
林瑞全
欧明敏
邢新华
武义
赵显煜
郑炜
李振嘉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202010834453.XA priority Critical patent/CN111914808B/en
Publication of CN111914808A publication Critical patent/CN111914808A/en
Application granted granted Critical
Publication of CN111914808B publication Critical patent/CN111914808B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/113 Recognition of static hand signs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a gesture recognition system implemented on an FPGA and a recognition method thereof, comprising a CMOS camera data acquisition module, an FPGA data processing module, a DDR3 storage module and a VGA display module. The CMOS camera is connected to the FPGA, and the camera driver is implemented inside the FPGA chip. After the video data collected by the camera enters the FPGA chip, it is buffered in the DDR3 storage module under the control of the data read-write control module and is read back out under the same control. According to the recognition results for static and dynamic gestures, the character driving and video overlapping module superimposes the recognized content onto the gesture image in text form in real time and sends the result to the VGA display module for display. Compared with the prior art, the method effectively solves the problems of unstable gesture recognition and poor real-time performance in complex environments such as insufficient illumination and skin-color-like interference.

Description

Gesture recognition system realized based on FPGA and recognition method thereof
Technical Field
The invention relates to the technical field of image processing and the field of man-machine interaction, in particular to a gesture recognition system and a gesture recognition method based on FPGA.
Background
Gesture recognition technology is one of the hot spots in the field of human-computer interaction. With the support of gesture recognition technology, interactive devices such as computers and robots can be controlled more naturally and effectively.
Gesture recognition can be divided into two types: static gesture recognition and dynamic gesture recognition. Most existing solution designs can only be used for static gestures or dynamic gestures. In gesture recognition, there are very few methods that can simultaneously handle static gestures and dynamic gestures. The algorithm provided by the invention can freely switch between static gesture recognition and dynamic gesture recognition while solving the problem of gesture recognition in a complex environment, thereby greatly improving the efficiency of a gesture recognition system.
Complex environments such as insufficient illumination and skin-color-like interference greatly reduce gesture recognition accuracy, and application systems based on computer software suffer from large volume, high power consumption, high cost and poor real-time performance. The FPGA, as a programmable logic device, features parallel processing and pipelined computation, contains rich configurable logic resources and callable IP cores, and is an ideal device for digital image processing.
Disclosure of Invention
In view of the above, the present invention provides a gesture recognition system implemented on an FPGA and a recognition method thereof, which realize both static gesture recognition and dynamic gesture recognition on an FPGA platform using HOG features and an SVM classifier, provide an effective means of man-machine interaction, and effectively solve the problems of unstable recognition and poor real-time performance in complex environments such as insufficient illumination and skin-color-like interference.
The invention is realized by adopting the following scheme: a gesture recognition system realized based on FPGA comprises a CMOS camera data acquisition module, an FPGA data processing module, a DDR3 storage module and a VGA display module; the FPGA data processing module comprises a camera driving module, a data read-write control module, a color space transformation module, a median filtering module, a histogram equalization module, a gesture segmentation module, a static and dynamic gesture judgment module, a feature extraction module, a classification identification module, a dynamic gesture track identification module, a character driving and video superposition module and a VGA driving module; the camera driving module drives the CMOS camera data acquisition module to acquire video data, and transmits the video data back to the data read-write control module, so that the data read-write control module controls the read and write of the acquired video data in the DDR3 storage module; the DDR3 storage module is used for storing video data acquired by the CMOS camera data acquisition module, transmitting the stored video data to the color space conversion module for data format conversion, and filtering through the median filtering module to reduce noise of the video data; the histogram equalization module is used for performing equalization processing on the video data filtered by the median filtering module so as to improve the contrast of the video data; the gesture segmentation module is used for segmenting a gesture area in the video data after equalization processing; the static and dynamic gesture judgment module comprises a dynamic gesture processing module and a static gesture processing module and is used for judging whether the gesture of the current frame is a static gesture or a dynamic gesture, when the specific gesture of the palm is detected, the gesture is considered to be a dynamic gesture, and the video data stream enters the dynamic gesture processing module; when the specific gesture of the palm is not detected, the gesture is considered to be a static gesture, and the video data stream enters a static gesture processing module; the feature extraction module is used for extracting HOG features of the current static gesture area; the classification recognition module uses an SVM classifier to classify and recognize the HOG characteristics of the current gesture area; the dynamic gesture track recognition module tracks the current dynamic gesture and recognizes the current dynamic gesture; the character driving and video overlapping module is used for driving the result of the static gesture recognition or the result of the dynamic gesture recognition into corresponding characters and overlapping the corresponding characters with the original gesture video; the VGA driving module is used for driving the VGA chip and displaying the superposed video in real time.
The invention provides a recognition method of a gesture recognition system based on FPGA, which comprises the following steps:
step S1: the CMOS camera data acquisition module acquires video data to the FPGA data processing module in real time;
step S2: the data read-write control module controls the read and write of the collected video data in the DDR3 storage module, and the DDR3 storage module stores the collected video data;
step S3: the color space transformation module converts the acquired RGB format data into YCbCr format data;
step S4: the median filtering module adopts a 3 × 3 template and uses a fast median filtering algorithm to filter the data converted by the color space conversion module;
step S5: the histogram equalization module instantiates two random-access memory (RAM) blocks to buffer the filtered data; histogram statistics are first performed on the Y component of the format-converted YCbCr data to obtain a histogram, the histogram is clipped and redistributed according to a clipping threshold, and an equalization operation is then performed on the histogram; the threshold value ranges from 0 to 100, and the equalization formula is as follows:
s(k) = ((L-1)/(MN)) × Σ(j=0..k) n_j
where MN is the total number of image pixels, n_j is the number of pixels with gray level j, j ranges from 0 to k, and L is the number of image gray levels; through the above formula, the gray value r(k) of each pixel in the median-filtered image is mapped to s(k), which is the gray value of that pixel after histogram equalization;
step S6: the gesture segmentation module performs skin-color threshold segmentation in the YCbCr color gamut space and eliminates the influence of skin-color-like regions by means of a four-connected-domain labeling algorithm; the position of the gesture centroid is determined through a centroid calculation formula and taken as the gesture center, and the minimum rectangular frame of the gesture area is determined from the position of the gesture center;
step S7: judging whether the gesture is a static gesture or a dynamic gesture; the static and dynamic gesture judgment module starts timing when it detects that the gesture of the current frame is a palm, and if the timed duration reaches a set time of K seconds, with 0 < K < 3, the gesture is judged to be a dynamic gesture, dynamic gesture trajectory recognition is performed, and step S8 is then executed; otherwise the gesture is judged to be a static gesture, HOG feature extraction and SVM classification recognition are performed on the gesture area, and step S8 is then executed;
step S8: the character driving and video overlapping module generates character image data corresponding to the name of each gesture; the SVM classification recognition result or the dynamic gesture trajectory recognition result is input to the character driving and video overlapping module and used to drive the character image data, the driven character image data is superimposed on the recognized gesture area image data, and both are finally displayed on the VGA display module at the same time.
Further, the conversion formula for converting the acquired RGB format data into the YCbCr format data in step S3 is:
Y=0.299R+0.587G+0.114B
Cb=-0.172R-0.339G+0.511B+128
Cr=0.511R-0.428G-0.083B+128
For hardware implementation, this formula is further converted into fixed-point form:
Y=(77R+150G+29B)>>8
Cb=(-43R-85G+128B+32768)>>8
Cr=(128R-107G-21B+32768)>>8
Further, the specific content of the filtering process in step S4 is: first, the three pixels in each row of the 3 × 3 window are sorted; then the minimum of the three maxima, the median of the three medians and the maximum of the three minima are extracted; finally, the median of these three values is taken again, which is the median of the 9 pixels of the window.
Further, the step S5 specifically includes the following steps:
step S51: and (3) performing threshold segmentation based on skin color in the YCbCr color gamut space, wherein the threshold segmentation formula of the skin color is as follows:
(skin-color threshold conditions on the Cb and Cr components of the YCbCr data; the specific threshold values appear as a formula image in the original publication)
step S52: eliminating the influence of similar skin color by adopting a four-connected domain marking algorithm;
step S53: marking the gesture area with a rectangular frame; the position of the gesture centroid is determined through a centroid calculation formula and taken as the gesture center, and the minimum rectangular frame of the gesture area is determined from the position of the gesture center; the centroid calculation formula is as follows:
M00 = ΣxΣy f(x,y)
M01 = ΣxΣy y·f(x,y)
M10 = ΣxΣy x·f(x,y)
Xc = M10/M00, Yc = M01/M00
where M00 is the centroid weight of the gesture area over the whole frame image, M01 is the weight of the gesture-area centroid in the vertical direction, M10 is the weight of the gesture-area centroid in the horizontal direction, Xc is the abscissa of the gesture-area centroid, Yc is the ordinate of the gesture-area centroid, f(x, y) is the pixel value at each point of the gesture area, y is the ordinate of a gesture-area pixel, and x is the abscissa of a gesture-area pixel.
Further, the step S52 specifically includes the following steps:
step S521: judge whether the left neighbor and the upper neighbor in the four-neighborhood of the current input pixel have been marked; if the left neighbor is marked and the upper neighbor is not, mark the current point as belonging to the same region as the left neighbor; if the left neighbor is not marked and the upper neighbor is marked, mark the current point as belonging to the same region as the upper neighbor; if both the left neighbor and the upper neighbor are marked, mark the current point with the smaller of the two marks, so that the two regions are treated as one; if neither the left neighbor nor the upper neighbor is marked, mark the current point as a new region;
step S522: and calculating the area of each marked connected region, namely summing the number of pixel points of each connected region, wherein the connected region with the largest area is the gesture region to be segmented.
Further, the specific content of the HOG feature extraction of the gesture area in step S7 is: a Shift_RAM shift register is used to perform two-line shift storage (line buffering) of the static gesture image, which together with the input data of the current line forms a 3 × 3 pixel array; the gradient amplitude and gradient direction of the input pixel (x, y) are calculated from this pixel array;
The horizontal gradient Gx(x, y) and the vertical gradient Gy(x, y) are computed as follows:
Gx(x,y)=H(x+1,y)-H(x-1,y)
Gy(x,y)=H(x,y+1)-H(x,y-1)
the gradient amplitude G (x, y) and the gradient direction theta (x, y) of the pixel point (x, y) are calculated according to the following formula:
G(x,y) = √(Gx(x,y)² + Gy(x,y)²)
θ(x,y) = arctan(Gy(x,y)/Gx(x,y))
Dividing cells and blocks: the direction values of the pixels range from 0° to 180° and are divided into 9 bin intervals, one interval per 20°; every 8 × 8 pixels form a cell, every 2 × 2 cells form a block, and 15 × 7 blocks cover one image; one cell yields a 9-dimensional feature vector, one block yields a 36-dimensional feature vector, and one image yields a 3780-dimensional feature vector, which is the HOG feature vector extracted from the gesture area.
Further, the specific content of the SVM classification and recognition in step S7 is as follows: the classification recognition module performs classification and recognition with a linear SVM classifier and realizes static gesture multi-classification with a one-versus-one method (OVO SVMs); five types of static gestures are defined, where gesture 1 is marked A, gesture 2 is marked B, gesture 3 is marked C, the fist gesture is marked D and the palm gesture is marked E; for the A, B, C, D, E five types of gestures, the feature vectors corresponding to (A, B), (A, C), (A, D), (A, E), (B, C), (B, D), (B, E), (C, D), (C, E), (D, E) are selected as training sets during training, giving ten groups of training results; during testing, the ten groups of training results are used in turn and the ten classification results are voted on to obtain the final classification result.
Further, the voting process is as follows:
Step 1: A = B = C = D = E = 0; // vote count initialization
Step 2: (A, B) classifier: if A wins, then A = A + 1; else B = B + 1;
Step 3: (A, C) classifier: if A wins, then A = A + 1; else C = C + 1;
Step 4: (A, D) classifier: if A wins, then A = A + 1; else D = D + 1;
Step 5: (A, E) classifier: if A wins, then A = A + 1; else E = E + 1;
Step 6: (B, C) classifier: if B wins, then B = B + 1; else C = C + 1;
Step 7: (B, D) classifier: if B wins, then B = B + 1; else D = D + 1;
Step 8: (B, E) classifier: if B wins, then B = B + 1; else E = E + 1;
Step 9: (C, D) classifier: if C wins, then C = C + 1; else D = D + 1;
Step 10: (C, E) classifier: if C wins, then C = C + 1; else E = E + 1;
Step 11: (D, E) classifier: if D wins, then D = D + 1; else E = E + 1;
Step 12: the result is max(A, B, C, D, E).
Further, the specific content of the dynamic gesture trajectory recognition in step S7 is: when the system enters the dynamic gesture recognition module, the palm gesture is tracked, and one movement of the palm from a start position to an end position constitutes one dynamic operation; the current dynamic operation is regarded as a valid operation only when the pause time of the palm at both the start position and the end position exceeds a system-set threshold time of N seconds, with 0 < N < 5, and the distance between the start position and the end position exceeds a system-set threshold distance M, with 20 < M < 100.
Compared with the prior art, the invention has the following beneficial effects:
the method provided by the invention can effectively detect and identify the gesture in complex environments with insufficient illumination, similar skin color interference and the like. The gesture recognition problem under the complex environment is solved, meanwhile, the static gesture recognition and the dynamic gesture recognition can be freely switched, and the efficiency of a gesture recognition system is greatly improved. The gesture recognition system based on the FPGA is realized by utilizing the characteristics of parallel processing and pipeline computing of an FPGA hardware platform, rich configurable logic resources, a callable IP core and the like, and experimental results show that the method has a good recognition effect.
Drawings
Fig. 1 is a system configuration diagram of an embodiment of the present invention.
Fig. 2 is a flowchart of an identification method according to an embodiment of the present invention.
Fig. 3 is a hardware architecture diagram of an HOG feature extraction module according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides a gesture recognition system implemented based on an FPGA, which includes a CMOS camera data acquisition module, an FPGA data processing module, a DDR3 storage module, and a VGA display module; the FPGA data processing module comprises a camera driving module, a data read-write control module, a color space transformation module, a median filtering module, a histogram equalization module, a gesture segmentation module, a static and dynamic gesture judgment module, a feature extraction module, a classification identification module, a dynamic gesture track identification module, a character driving and video superposition module and a VGA driving module; the camera driving module drives the CMOS camera data acquisition module to acquire video data, and transmits the video data back to the data read-write control module, so that the data read-write control module controls the read and write of the acquired video data in the DDR3 storage module; the DDR3 storage module is used for storing video data acquired by the CMOS camera data acquisition module, transmitting the stored video data to the color space conversion module for data format conversion, and filtering through the median filtering module to reduce noise of the video data; the histogram equalization module is used for performing equalization processing on the video data filtered by the median filtering module so as to improve the contrast of the video data; the gesture segmentation module is used for segmenting a gesture area in the video data after equalization processing; the static and dynamic gesture judgment module comprises a dynamic gesture processing module and a static gesture processing module and is used for judging whether the gesture of the current frame is a static gesture or a dynamic gesture, when the specific gesture of the palm is detected, the gesture is considered to be a dynamic gesture, and the video data stream enters the dynamic gesture processing module; when the specific gesture of the palm is not detected, the gesture is considered to be a static gesture, and the video data stream enters a static gesture processing module; the feature extraction module is used for extracting HOG features of the current static gesture area; the classification recognition module uses an SVM classifier to classify and recognize the HOG characteristics of the current gesture area; the dynamic gesture track recognition module tracks the current dynamic gesture and recognizes the current dynamic gesture; the character driving and video overlapping module is used for driving the result of the static gesture recognition or the result of the dynamic gesture recognition into corresponding characters and overlapping the corresponding characters with the original gesture video; the VGA driving module is used for driving the VGA chip and displaying the superposed video in real time.
Preferably, as shown in fig. 2, the embodiment provides a recognition method of a gesture recognition system implemented based on an FPGA, including the following steps:
step S1: the CMOS camera data acquisition module acquires video data to the FPGA data processing module in real time;
step S2: the data read-write control module controls the read and write of the collected video data in the DDR3 storage module, and the DDR3 storage module stores the collected video data;
step S3: the color space transformation module converts the acquired RGB format data into YCbCr format data;
step S4: the median filtering module adopts a 3 × 3 template and uses a fast median filtering algorithm to filter the data converted by the color space conversion module;
step S5: the histogram equalization module instantiates two random-access memory (RAM) blocks to buffer the filtered data; histogram statistics are first performed on the Y component of the format-converted YCbCr data to obtain a histogram, the histogram is clipped and redistributed according to a clipping threshold, and an equalization operation is then performed on the histogram; the threshold value ranges from 0 to 100, and the equalization formula is as follows:
s(k) = ((L-1)/(MN)) × Σ(j=0..k) n_j
where MN is the total number of image pixels, n_j is the number of pixels with gray level j, j ranges from 0 to k, and L is the number of image gray levels (e.g. 256 for an 8-bit image).
Through the above formula, the gray value r(k) of each pixel in the median-filtered image is mapped to s(k), which is the gray value of that pixel after histogram equalization;
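As a purely illustrative software sketch (it is not part of the original disclosure, and the clipping threshold and the number of gray levels are assumed example values), the clipped histogram equalization of step S5 can be modelled in Python as follows:

    import numpy as np

    def clipped_histogram_equalization(y_plane, clip_threshold=100, levels=256):
        # y_plane: 2-D uint8 array holding the Y component after median filtering.
        # clip_threshold: clipping threshold (the patent gives a range of 0-100).
        # levels: number of gray levels L (256 for 8-bit data).
        hist, _ = np.histogram(y_plane, bins=levels, range=(0, levels))

        # Clip the histogram and redistribute the clipped counts evenly.
        excess = np.maximum(hist - clip_threshold, 0).sum()
        hist = np.minimum(hist, clip_threshold) + excess // levels

        # s(k) = ((L - 1) / (M*N)) * sum_{j = 0..k} n_j
        mapping = np.round((levels - 1) * np.cumsum(hist) / hist.sum()).astype(np.uint8)
        return mapping[y_plane]        # map r(k) to s(k) for every pixel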
step S6: the gesture segmentation module performs skin-color threshold segmentation in the YCbCr color gamut space and eliminates the influence of skin-color-like regions by means of a four-connected-domain labeling algorithm; the position of the gesture centroid is determined through a centroid calculation formula and taken as the gesture center, and the minimum rectangular frame of the gesture area is determined from the position of the gesture center;
step S7: judging whether the gesture is a static gesture or a dynamic gesture; the static and dynamic gesture judgment module starts timing when it detects that the gesture of the current frame is a palm, and if the timed duration reaches a set time of K seconds, with 0 < K < 3, the gesture is judged to be a dynamic gesture, dynamic gesture trajectory recognition is performed, and step S8 is then executed; otherwise the gesture is judged to be a static gesture, HOG feature extraction and SVM classification recognition are performed on the gesture area, and step S8 is then executed;
the static and dynamic gesture judgment module can be divided into a static gesture processing module and a dynamic gesture processing module, the static and dynamic gesture judgment module judges the current gesture, and if the current gesture is a static gesture, the video data stream enters the static gesture processing module. If the gesture is a dynamic gesture, the video data stream enters a dynamic gesture processing module.
Step S8: the character driving and video overlapping module generates character image data corresponding to the name of each gesture; the SVM classification recognition result or the dynamic gesture trajectory recognition result is input to the character driving and video overlapping module and used to drive the character image data, the driven character image data is superimposed on the recognized gesture area image data, and both are finally displayed on the VGA display module at the same time.
In other words, the character driving and video overlapping module generates the corresponding character image data for the name of each gesture, drives that data according to the gesture recognition result, superimposes the driven character image data on the recognized gesture area image data, and finally displays the result through the VGA interface.
In this embodiment, the conversion formula for converting the collected RGB format data into the YCbCr format data in step S3 is:
Y=0.299R+0.587G+0.114B
Cb=-0.172R-0.339G+0.511B+128
Cr=0.511R-0.428G-0.083B+128
To facilitate hardware implementation, this formula is further converted into:
Y=(77R+150G+29B)>>8
Cb=(-43R-85G+128B+32768)>>8
Cr=(128R-107G-21B+32768)>>8
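A minimal software model of the fixed-point conversion above (illustrative only; the coefficients are exactly the shift-based ones given in the formula) is:

    def rgb_to_ycbcr_fixed_point(r, g, b):
        # Coefficients scaled by 256; the final division is a right shift by 8,
        # which is what makes the formula convenient for FPGA hardware.
        y  = (77 * r + 150 * g + 29 * b) >> 8
        cb = (-43 * r - 85 * g + 128 * b + 32768) >> 8
        cr = (128 * r - 107 * g - 21 * b + 32768) >> 8
        return y, cb, cr

    # A mid-gray pixel maps to Y = 128, Cb = 128, Cr = 128.
    print(rgb_to_ycbcr_fixed_point(128, 128, 128))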
In this embodiment, the specific content of the filtering process performed in step S4 is: first, the three pixels in each row of the 3 × 3 window are sorted; then the minimum of the three maxima, the median of the three medians and the maximum of the three minima are extracted; finally, the median of these three values is taken again, which is the median of the 9 pixels of the window.
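The hardware-friendly median scheme described above can be checked with the following small Python sketch (an illustrative software model only, not the RTL implementation):

    def fast_median_3x3(window):
        # window: three rows of three pixel values each.
        sorted_rows = [sorted(row) for row in window]           # sort each row
        max_of_mins = max(row[0] for row in sorted_rows)        # maximum of the minima
        med_of_meds = sorted(row[1] for row in sorted_rows)[1]  # median of the medians
        min_of_maxs = min(row[2] for row in sorted_rows)        # minimum of the maxima
        return sorted([max_of_mins, med_of_meds, min_of_maxs])[1]

    # Equals the true median of the nine pixels:
    print(fast_median_3x3([[10, 200, 30], [40, 50, 60], [70, 80, 90]]))   # 60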
In this embodiment, the step S5 specifically includes the following steps:
step S51: and (3) performing threshold segmentation based on skin color in the YCbCr color gamut space, wherein the threshold segmentation formula of the skin color is as follows:
(skin-color threshold conditions on the Cb and Cr components of the YCbCr data; the specific threshold values appear as a formula image in the original publication)
step S52: eliminating the influence of similar skin color by adopting a four-connected domain marking algorithm;
step S53: marking the gesture area with a rectangular frame; the position of the gesture centroid is determined through a centroid calculation formula and taken as the gesture center, and the minimum rectangular frame of the gesture area is determined from the position of the gesture center; the centroid calculation formula is as follows:
M00 = ΣxΣy f(x,y)
M01 = ΣxΣy y·f(x,y)
M10 = ΣxΣy x·f(x,y)
Xc = M10/M00, Yc = M01/M00
where M00 is the centroid weight of the gesture area over the whole frame image, M01 is the weight of the gesture-area centroid in the vertical direction, M10 is the weight of the gesture-area centroid in the horizontal direction, Xc is the abscissa of the gesture-area centroid, Yc is the ordinate of the gesture-area centroid, f(x, y) is the pixel value at each point of the gesture area, y is the ordinate of a gesture-area pixel, and x is the abscissa of a gesture-area pixel.
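A software sketch of the centroid and minimum-rectangle computation (illustrative only; f(x, y) is taken here as a binary gesture mask) could be:

    import numpy as np

    def gesture_centroid_and_box(mask):
        # mask: 2-D array that is non-zero inside the segmented gesture area.
        ys, xs = np.nonzero(mask)
        m00 = len(xs)                        # M00: sum of f(x, y) over the frame
        if m00 == 0:
            return None, None                # no gesture pixels in this frame
        m10 = xs.sum()                       # M10: sum of x * f(x, y)
        m01 = ys.sum()                       # M01: sum of y * f(x, y)
        xc, yc = m10 / m00, m01 / m00        # centroid (Xc, Yc)
        box = (xs.min(), ys.min(), xs.max(), ys.max())   # minimum rectangular frame
        return (xc, yc), box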
In this embodiment, the step S52 specifically includes the following steps:
step S521: judge whether the left neighbor and the upper neighbor in the four-neighborhood of the current input pixel have been marked; if the left neighbor is marked and the upper neighbor is not, mark the current point as belonging to the same region as the left neighbor; if the left neighbor is not marked and the upper neighbor is marked, mark the current point as belonging to the same region as the upper neighbor; if both the left neighbor and the upper neighbor are marked, mark the current point with the smaller of the two marks, so that the two regions are treated as one; if neither the left neighbor nor the upper neighbor is marked, mark the current point as a new region;
step S522: and calculating the area of each marked connected region, namely summing the number of pixel points of each connected region, wherein the connected region with the largest area is the gesture region to be segmented.
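Steps S521-S522 can be prototyped in software as follows (illustrative sketch; the label merging here uses a small union-find table, whereas the FPGA design resolves equivalent labels in its own way):

    import numpy as np

    def largest_four_connected_region(binary):
        # binary: 2-D array, non-zero where the skin-color threshold test passed.
        h, w = binary.shape
        labels = np.zeros((h, w), dtype=np.int32)
        parent = [0]                                   # union-find parent table

        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a

        next_label = 1
        for y in range(h):
            for x in range(w):
                if not binary[y, x]:
                    continue
                left = labels[y, x - 1] if x > 0 else 0
                up = labels[y - 1, x] if y > 0 else 0
                if left and up:                        # both neighbours marked
                    small, big = sorted((find(left), find(up)))
                    parent[big] = small                # merge the two regions
                    labels[y, x] = small
                elif left or up:                       # exactly one neighbour marked
                    labels[y, x] = left or up
                else:                                  # neither marked: new region
                    labels[y, x] = next_label
                    parent.append(next_label)
                    next_label += 1

        if next_label == 1:                            # no skin pixels at all
            return np.zeros((h, w), dtype=bool)
        roots = np.array([find(l) for l in range(next_label)])
        labels = roots[labels]                         # resolve merged labels
        counts = np.bincount(labels.ravel())
        counts[0] = 0                                  # ignore the background
        return labels == counts.argmax()               # largest region = gesture area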
In this embodiment, the specific content of the HOG feature extraction of the gesture area in step S7 is: a Shift_RAM shift register is used to perform two-line shift storage (line buffering) of the static gesture image, which together with the input data of the current line forms a 3 × 3 pixel array; the gradient amplitude and gradient direction of the input pixel (x, y) are calculated from this pixel array;
The horizontal gradient Gx(x, y) and the vertical gradient Gy(x, y) are computed as follows:
Gx(x,y)=H(x+1,y)-H(x-1,y)
Gy(x,y)=H(x,y+1)-H(x,y-1)
the gradient amplitude G (x, y) and the gradient direction theta (x, y) of the pixel point (x, y) are calculated according to the following formula:
G(x,y) = √(Gx(x,y)² + Gy(x,y)²)
θ(x,y) = arctan(Gy(x,y)/Gx(x,y))
Dividing cells and blocks: the direction values of the pixels range from 0° to 180° and are divided into 9 bin intervals, one interval per 20°; every 8 × 8 pixels form a cell, every 2 × 2 cells form a block, and 15 × 7 blocks cover one image; one cell yields a 9-dimensional feature vector, one block yields a 36-dimensional feature vector, and one image yields a 3780-dimensional feature vector, which is the HOG feature vector extracted from the gesture area.
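For reference, a software model of the HOG parameters above (illustrative only; a 128 × 64 gesture image is assumed so that 15 × 7 blocks of 2 × 2 cells give the stated 3780 features) might look like this:

    import numpy as np

    def hog_features(gray):
        # gray: 2-D array holding the gesture-area image.
        gray = gray.astype(np.float32)
        gx = np.zeros_like(gray)
        gy = np.zeros_like(gray)
        gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]       # Gx = H(x+1,y) - H(x-1,y)
        gy[1:-1, :] = gray[2:, :] - gray[:-2, :]       # Gy = H(x,y+1) - H(x,y-1)
        mag = np.hypot(gx, gy)                         # gradient amplitude
        ang = np.degrees(np.arctan2(gy, gx)) % 180.0   # unsigned gradient direction

        h, w = gray.shape
        cells_y, cells_x = h // 8, w // 8
        cell_hist = np.zeros((cells_y, cells_x, 9))
        for cy in range(cells_y):
            for cx in range(cells_x):
                m = mag[cy*8:(cy+1)*8, cx*8:(cx+1)*8]
                a = ang[cy*8:(cy+1)*8, cx*8:(cx+1)*8]
                bins = np.minimum((a // 20).astype(int), 8)     # 9 bins of 20 degrees
                cell_hist[cy, cx] = np.bincount(bins.ravel(), m.ravel(), minlength=9)

        blocks = []
        for by in range(cells_y - 1):                  # 2 x 2 cell blocks
            for bx in range(cells_x - 1):
                block = cell_hist[by:by+2, bx:bx+2].ravel()     # 36-dimensional
                blocks.append(block / (np.linalg.norm(block) + 1e-6))
        return np.concatenate(blocks)

    print(hog_features(np.random.randint(0, 255, (128, 64))).shape)   # (3780,)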
In this embodiment, the specific content of the SVM classification and recognition in step S7 is as follows: the classification recognition module performs classification and recognition with a linear SVM classifier and realizes static gesture multi-classification with a one-versus-one method (OVO SVMs); five types of static gestures are defined, where gesture 1 is marked A, gesture 2 is marked B, gesture 3 is marked C, the fist gesture is marked D and the palm gesture is marked E;
the gesture 1 is a gesture when the human hand stretches out one finger, the gesture 2 is a gesture when the human hand stretches out two fingers, and the gesture 3 is a gesture when the human hand stretches out three fingers. The fist is the gesture when the human hand stretches out the fist, and the palm is the gesture when the human hand stretches out the palm.
For the A, B, C, D, E five types of gestures, respectively selecting feature vectors corresponding to (A, B), (A, C), (A, D), (A, E), (B, C), (B, D), (B, E), (C, D), (C, E), (D, E) as a training set during training to obtain ten groups of training results; during testing, ten groups of training results are respectively used for testing, and the ten groups of classification results are voted to obtain the final classification result.
In this embodiment, the voting process is as follows:
Step 1: A = B = C = D = E = 0; // vote count initialization
Step 2: (A, B) classifier: if A wins, then A = A + 1; else B = B + 1;
Step 3: (A, C) classifier: if A wins, then A = A + 1; else C = C + 1;
Step 4: (A, D) classifier: if A wins, then A = A + 1; else D = D + 1;
Step 5: (A, E) classifier: if A wins, then A = A + 1; else E = E + 1;
Step 6: (B, C) classifier: if B wins, then B = B + 1; else C = C + 1;
Step 7: (B, D) classifier: if B wins, then B = B + 1; else D = D + 1;
Step 8: (B, E) classifier: if B wins, then B = B + 1; else E = E + 1;
Step 9: (C, D) classifier: if C wins, then C = C + 1; else D = D + 1;
Step 10: (C, E) classifier: if C wins, then C = C + 1; else E = E + 1;
Step 11: (D, E) classifier: if D wins, then D = D + 1; else E = E + 1;
Step 12: the result is max(A, B, C, D, E).
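The step list above is the usual one-versus-one voting rule; a compact software equivalent (illustrative only, with pairwise_winner standing in for the ten trained SVM decision functions) is:

    from itertools import combinations

    def ovo_vote(pairwise_winner, classes=("A", "B", "C", "D", "E")):
        # pairwise_winner(a, b) must return whichever of the two labels wins
        # the (a, b) classifier; it is a placeholder for the trained SVMs.
        votes = dict.fromkeys(classes, 0)              # vote count initialization
        for a, b in combinations(classes, 2):          # (A,B), (A,C), ..., (D,E)
            votes[pairwise_winner(a, b)] += 1
        return max(votes, key=votes.get)               # class with the most votes

    # Dummy classifier that always prefers the later label -> 'E' wins.
    print(ovo_vote(lambda a, b: b))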
In this embodiment, the specific content of the dynamic gesture trajectory recognition in step S7 is: when the system enters the dynamic gesture recognition module, the palm gesture is tracked, and one movement of the palm from a start position to an end position constitutes one dynamic operation; the current dynamic operation is regarded as a valid operation only when the pause time of the palm at both the start position and the end position exceeds a system-set threshold time of N seconds, with 0 < N < 5, and the distance between the start position and the end position exceeds a system-set threshold distance M, with 20 < M < 100.
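The validity rule for a dynamic operation can be expressed, purely for illustration, as the following check; the default values of N and M here are assumed examples within the stated ranges:

    import math

    def is_valid_dynamic_operation(start_pos, end_pos, start_pause_s, end_pause_s,
                                   n_seconds=1.0, m_distance=50.0):
        # The palm must pause at both the start and the end position for more
        # than N seconds (0 < N < 5) and the two positions must be more than
        # M apart (20 < M < 100) for the operation to count as valid.
        distance = math.dist(start_pos, end_pos)
        return (start_pause_s > n_seconds
                and end_pause_s > n_seconds
                and distance > m_distance)

    # Example: a 120-pixel swipe with 1.5 s pauses at both ends is accepted.
    print(is_valid_dynamic_operation((100, 240), (220, 240), 1.5, 1.5))   # True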
Preferably, in this embodiment, the CMOS camera is connected to the FPGA, and the driving of the camera is completed inside the FPGA chip. After video data collected by the camera enters the FPGA chip, the video data is cached in the DDR3 storage module under the action of the data read-write control module, and meanwhile, the video data is read out under the action of the data read-write control module. And the character driving and video overlapping module overlaps the recognized content with the gesture image in a character form in real time according to the recognition results of the static gesture and the dynamic gesture and then sends the overlapped recognized content to the VGA display module for display.
Specifically, in this embodiment, the CMOS camera data acquisition module acquires video data in real time to the inside of the FPGA chip; the FPGA data processing module carries out image preprocessing, gesture segmentation, static and dynamic gesture recognition, driving of a CMOS camera and a VGA and controlling read and write of DDR3 video data on the collected video data. The DDR3 memory module stores captured video data. And the VGA display module displays the gesture image marked by the gesture recognition result. The data read-write control module controls the read and write of the collected video data in the DDR 3; the DDR3 storage module stores video data acquired by the camera acquisition module; the color space transformation module converts the acquired RGB format data into YCbCr format data; the median filtering module is used for filtering the video data to reduce the noise of the video data; the histogram equalization module performs equalization processing on the video data to improve the contrast of the video data; the gesture segmentation module segments a gesture area in the video data; the static and dynamic gesture judgment module judges whether the current frame gesture is a static gesture or a dynamic gesture; the feature extraction module extracts HOG features of the current gesture area; the classification recognition module uses an SVM classifier to classify and recognize the HOG characteristics of the current gesture area; the dynamic gesture track recognition module tracks the current dynamic gesture and recognizes the current dynamic gesture; and the character driving and video overlapping module drives the result of the static gesture recognition or the result of the dynamic gesture recognition into corresponding characters and overlaps the original gesture video. The VGA driving module drives the VGA chip to display the superposed video in real time.
Preferably, in this embodiment, the video data collected is first converted from RGB video format to YCbCr format, then the video data is subjected to median filtering, and then the filtered data is subjected to histogram equalization. After the video image is enhanced, the gesture area is segmented based on the skin color, and the skin color interference such as the human face in the skin color threshold segmentation is removed through connected domain analysis. And then calculating the centroid of the gesture area, and generating a minimum bounding rectangle of the gesture area according to the centroid. And judging whether the current gesture is a static gesture or a dynamic gesture, if so, extracting HOG characteristics of the gesture area, and performing SVM classification and identification. And if the gesture is a dynamic gesture, performing dynamic gesture track recognition. And driving characters according to the static gesture recognition result or the dynamic gesture recognition result, overlapping the characters with the original gesture image, and displaying the characters in a VGA (video graphics array) mode.
Preferably, the hardware architecture of the HOG feature extraction module of this embodiment is shown in fig. 3. A Shift_RAM shift register performs two-line shift storage of the video data stream of the gesture area, which together with the input data of the current line forms a 3 × 3 pixel array; the horizontal and vertical gradients of the input pixels are calculated from this array. Multiplier, square-root and arctangent IP cores are then called to compute the gradient amplitude and gradient direction. At the same time, the gesture area data is buffered into an 8 × 8 array, yielding cells of 8 × 8 pixels. The HOG features of each cell are accumulated over the 9 direction intervals, the features of the 4 cells of a block area are concatenated to form the HOG features of the block, and the HOG features of all blocks are concatenated to form the HOG features of the image. Finally, all HOG features are normalized and output to the SVM for classification.
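The Shift_RAM line-buffer structure of fig. 3 can be mimicked in software with two one-line FIFOs and three 3-stage column registers; the following generator is an illustrative behavioural model, not the RTL design. Each yielded window, centred on pixel (x-1, y-1), can be fed to the gradient computation or to the median filter sketched earlier.

    from collections import deque

    def stream_3x3_windows(image):
        # image: list of rows of pixel values, streamed in row-major order.
        height, width = len(image), len(image[0])
        line1 = deque([0] * width)           # one-line delay (first Shift_RAM)
        line2 = deque([0] * width)           # two-line delay (second Shift_RAM)
        row0 = deque([0, 0, 0], maxlen=3)    # column registers, row y - 2
        row1 = deque([0, 0, 0], maxlen=3)    # column registers, row y - 1
        row2 = deque([0, 0, 0], maxlen=3)    # column registers, row y
        for y in range(height):
            for x in range(width):
                p = image[y][x]
                p1 = line1.popleft()         # pixel one line above the current one
                p2 = line2.popleft()         # pixel two lines above the current one
                line1.append(p)              # current pixel enters line buffer 1
                line2.append(p1)             # buffer-1 output cascades into buffer 2
                row0.append(p2); row1.append(p1); row2.append(p)
                if y >= 2 and x >= 2:        # a full 3 x 3 window is available
                    yield x - 1, y - 1, [list(row0), list(row1), list(row2)]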
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.

Claims (10)

1. A gesture recognition system implemented based on an FPGA, characterized in that: the system comprises a CMOS camera data acquisition module, an FPGA data processing module, a DDR3 storage module and a VGA display module; the FPGA data processing module comprises a camera driving module, a data read-write control module, a color space transformation module, a median filtering module, a histogram equalization module, a gesture segmentation module, a static and dynamic gesture judgment module, a feature extraction module, a classification identification module, a dynamic gesture track identification module, a character driving and video superposition module and a VGA driving module; the camera driving module drives the CMOS camera data acquisition module to acquire video data, and transmits the video data back to the data read-write control module, so that the data read-write control module controls the read and write of the acquired video data in the DDR3 storage module; the DDR3 storage module is used for storing video data acquired by the CMOS camera data acquisition module, transmitting the stored video data to the color space conversion module for data format conversion, and filtering through the median filtering module to reduce noise of the video data; the histogram equalization module is used for performing equalization processing on the video data filtered by the median filtering module so as to improve the contrast of the video data; the gesture segmentation module is used for segmenting a gesture area in the video data after equalization processing; the static and dynamic gesture judgment module comprises a dynamic gesture processing module and a static gesture processing module and is used for judging whether the gesture of the current frame is a static gesture or a dynamic gesture, when the specific gesture of the palm is detected, the gesture is considered to be a dynamic gesture, and the video data stream enters the dynamic gesture processing module; when the specific gesture of the palm is not detected, the gesture is considered to be a static gesture, and the video data stream enters a static gesture processing module; the feature extraction module is used for extracting HOG features of the current static gesture area; the classification recognition module uses an SVM classifier to classify and recognize the HOG characteristics of the current gesture area; the dynamic gesture track recognition module tracks the current dynamic gesture and recognizes the current dynamic gesture; the character driving and video overlapping module is used for driving the result of the static gesture recognition or the result of the dynamic gesture recognition into corresponding characters and overlapping the corresponding characters with the original gesture video; the VGA driving module is used for driving the VGA chip and displaying the superposed video in real time.
2. An identification method of the gesture recognition system based on the FPGA implementation of claim 1, characterized in that: the method comprises the following steps:
step S1: the CMOS camera data acquisition module acquires video data to the FPGA data processing module in real time;
step S2: the data read-write control module controls the read and write of the collected video data in the DDR3 storage module, and the DDR3 storage module stores the collected video data;
step S3: the color space transformation module converts the acquired RGB format data into YCbCr format data;
step S4: the median filtering module adopts a 3 × 3 template and uses a fast median filtering algorithm to filter the data converted by the color space transformation module;
step S5: the filtered data is instantiated into two Random Access Memories (RAMs) by a histogram equalization module, a Y component in YCbCr data after format data conversion is firstly subjected to histogram statistics to obtain a histogram, the histogram is cut and redistributed according to a cutting threshold value, then equalization operation is carried out on the histogram to improve the contrast of an image, the threshold value range is 0-100, and an equalization formula is as follows:
s(k) = ((L-1)/(MN)) × Σ(j=0..k) n_j
where MN is the total number of image pixels, n_j is the number of pixels with gray level j, j ranges from 0 to k, and L is the number of image gray levels; through the above formula, the gray value r(k) of each pixel in the median-filtered image is mapped to s(k), which is the gray value of that pixel after histogram equalization;
step S6: the gesture segmentation module performs skin-color threshold segmentation in the YCbCr color gamut space and eliminates the influence of skin-color-like regions by means of a four-connected-domain labeling algorithm; the position of the gesture centroid is determined through a centroid calculation formula and taken as the gesture center, and the minimum rectangular frame of the gesture area is determined from the position of the gesture center;
step S7: judging whether the gesture is a static gesture or a dynamic gesture; the static and dynamic gesture judgment module starts timing when it detects that the gesture of the current frame is a palm, and if the timed duration reaches a set time of K seconds, with 0 < K < 3, the gesture is judged to be a dynamic gesture and step S8 is then executed; otherwise the gesture is judged to be a static gesture, HOG feature extraction and SVM classification recognition are performed on the gesture area, and step S8 is then executed;
step S8: the character driving and video overlapping module generates character image data corresponding to the name of each gesture; the SVM classification recognition result or the dynamic gesture trajectory recognition result is input to the character driving and video overlapping module and used to drive the character image data, the driven character image data is superimposed on the recognized gesture area image data, and both are finally displayed on the VGA display module at the same time.
3. The recognition method of the gesture recognition system based on FPGA implementation according to claim 2, characterized in that: in step S3, the conversion formula for converting the acquired RGB format data into YCbCr format data is:
Y=0.299R+0.587G+0.114B
Cb=-0.172R-0.339G+0.511B+128
Cr=0.511R-0.428G-0.083B+128
To facilitate hardware implementation, this formula is further converted into:
Y=(77R+150G+29B)>>8
Cb=(-43R-85G+128B+32768)>>8
Cr=(128R-107G-21B+32768)>>8.
4. The recognition method of the gesture recognition system based on FPGA implementation according to claim 2, characterized in that: the specific content of the filtering process in step S4 is: first, the three pixels in each row of the 3 × 3 window are sorted; then the minimum of the three maxima, the median of the three medians and the maximum of the three minima are extracted; finally, the median of these three values is taken again, which is the median of the 9 pixels of the window.
5. The recognition method of the gesture recognition system based on FPGA implementation according to claim 2, characterized in that: the step S5 specifically includes the following steps:
step S51: and (3) performing threshold segmentation based on skin color in the YCbCr color gamut space, wherein the threshold segmentation formula of the skin color is as follows:
(skin-color threshold conditions on the Cb and Cr components of the YCbCr data; the specific threshold values appear as a formula image in the original publication)
step S52: eliminating the influence of similar skin color by adopting a four-connected domain marking algorithm;
step S53: marking the gesture area with a rectangular frame; the position of the gesture centroid is determined through a centroid calculation formula and taken as the gesture center, and the minimum rectangular frame of the gesture area is determined from the position of the gesture center; the centroid calculation formula is as follows:
M00 = ΣxΣy f(x,y)
M01 = ΣxΣy y·f(x,y)
M10 = ΣxΣy x·f(x,y)
Xc = M10/M00, Yc = M01/M00
where M00 is the centroid weight of the gesture area over the whole frame image, M01 is the weight of the gesture-area centroid in the vertical direction, M10 is the weight of the gesture-area centroid in the horizontal direction, Xc is the abscissa of the gesture-area centroid, Yc is the ordinate of the gesture-area centroid, f(x, y) is the pixel value at each point of the gesture area, y is the ordinate of a gesture-area pixel, and x is the abscissa of a gesture-area pixel.
6. The recognition method of the gesture recognition system based on FPGA implementation according to claim 5, characterized in that: the step S52 specifically includes the following steps:
step S521: judge whether the left neighbor and the upper neighbor in the four-neighborhood of the current input pixel have been marked; if the left neighbor is marked and the upper neighbor is not, mark the current point as belonging to the same region as the left neighbor; if the left neighbor is not marked and the upper neighbor is marked, mark the current point as belonging to the same region as the upper neighbor; if both the left neighbor and the upper neighbor are marked, mark the current point with the smaller of the two marks, so that the two regions are treated as one; if neither the left neighbor nor the upper neighbor is marked, mark the current point as a new region;
step S522: and calculating the area of each marked connected region, namely summing the number of pixel points of each connected region, wherein the connected region with the largest area is the gesture region to be segmented.
7. The recognition method of the gesture recognition system based on FPGA implementation according to claim 2, characterized in that: the specific content of the HOG feature extraction of the gesture area in step S7 is: a Shift_RAM shift register is used to perform two-line shift storage (line buffering) of the static gesture image, which together with the input data of the current line forms a 3 × 3 pixel array; the gradient amplitude and gradient direction of the input pixel (x, y) are calculated from this pixel array;
The horizontal gradient Gx(x, y) and the vertical gradient Gy(x, y) are computed as follows:
Gx(x,y)=H(x+1,y)-H(x-1,y)
Gy(x,y)=H(x,y+1)-H(x,y-1)
the gradient amplitude G (x, y) and the gradient direction theta (x, y) of the pixel point (x, y) are calculated according to the following formula:
G(x,y) = √(Gx(x,y)² + Gy(x,y)²)
θ(x,y) = arctan(Gy(x,y)/Gx(x,y))
Dividing cells and blocks: the direction values of the pixels range from 0° to 180° and are divided into 9 bin intervals, one interval per 20°; every 8 × 8 pixels form a cell, every 2 × 2 cells form a block, and 15 × 7 blocks cover one image; one cell yields a 9-dimensional feature vector, one block yields a 36-dimensional feature vector, and one image yields a 3780-dimensional feature vector, which is the HOG feature vector extracted from the gesture area.
8. The recognition method of the gesture recognition system based on FPGA implementation according to claim 2, characterized in that: the specific content of the SVM classification and recognition in step S7 is as follows: the classification recognition module performs classification and recognition with a linear SVM classifier and realizes static gesture multi-classification with a one-versus-one method; five types of static gestures are defined, where gesture 1 is marked A, gesture 2 is marked B, gesture 3 is marked C, the fist gesture is marked D and the palm gesture is marked E; for the A, B, C, D, E five types of gestures, the feature vectors corresponding to (A, B), (A, C), (A, D), (A, E), (B, C), (B, D), (B, E), (C, D), (C, E), (D, E) are selected as training sets during training, giving ten groups of training results; during testing, the ten groups of training results are used in turn and the ten classification results are voted on to obtain the final classification result.
9. The recognition method of the gesture recognition system based on FPGA implementation according to claim 8, characterized in that: the voting process is as follows:
Step 1: A = B = C = D = E = 0; // initialize the vote counts
Step 2: (A, B)-classifier: if A wins, then A = A + 1; else B = B + 1;
Step 3: (A, C)-classifier: if A wins, then A = A + 1; else C = C + 1;
Step 4: (A, D)-classifier: if A wins, then A = A + 1; else D = D + 1;
Step 5: (A, E)-classifier: if A wins, then A = A + 1; else E = E + 1;
Step 6: (B, C)-classifier: if B wins, then B = B + 1; else C = C + 1;
Step 7: (B, D)-classifier: if B wins, then B = B + 1; else D = D + 1;
Step 8: (B, E)-classifier: if B wins, then B = B + 1; else E = E + 1;
Step 9: (C, D)-classifier: if C wins, then C = C + 1; else D = D + 1;
Step 10: (C, E)-classifier: if C wins, then C = C + 1; else E = E + 1;
Step 11: (D, E)-classifier: if D wins, then D = D + 1; else E = E + 1;
Step 12: the result is max(A, B, C, D, E).
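The twelve voting steps above can be expressed compactly as follows (Python; models and CLASSES come from the pairwise-training sketch after claim 8, and x is one 3780-dimensional HOG vector):

def classify(models, x):
    """One-versus-one voting: each pairwise classifier casts one vote,
    and the gesture label with the most votes is the final result."""
    votes = {c: 0 for c in CLASSES}        # Step 1: initialise all vote counts to 0
    for clf in models.values():            # Steps 2-11: the ten pairwise decisions
        votes[clf.predict([x])[0]] += 1    # the winner of each pair gains one vote
    return max(votes, key=votes.get)       # Step 12: the most-voted label wins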
10. The recognition method of the gesture recognition system based on FPGA implementation according to claim 2, characterized in that: the specific content of performing dynamic gesture trajectory recognition in step S7 is: when the system enters the dynamic gesture recognition module, the palm gesture is tracked, and one dynamic operation is the movement of the palm from a start position to an end position; the current dynamic operation is regarded as a valid operation only when the dwell time of the palm at both the start position and the end position exceeds the system-set threshold time of N seconds and the distance between the start position and the end position exceeds the system-set threshold distance M, where 0 < N < 5 and 20 < M < 100.
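A sketch of the validity test in this claim (Python; the concrete threshold values and the tracked palm coordinates are assumptions standing in for the system's configured parameters and tracking output):

import math

N_SECONDS = 2    # assumed dwell-time threshold, within the claimed range 0 < N < 5
M_PIXELS = 50    # assumed displacement threshold, within the claimed range 20 < M < 100

def is_valid_operation(start_pos, end_pos, start_dwell_s, end_dwell_s):
    """A dynamic operation (palm moved from start_pos to end_pos) counts as valid
    only if the palm pauses long enough at both ends and moves far enough overall."""
    distance = math.hypot(end_pos[0] - start_pos[0], end_pos[1] - start_pos[1])
    return (start_dwell_s > N_SECONDS and
            end_dwell_s > N_SECONDS and
            distance > M_PIXELS)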
CN202010834453.XA 2020-08-19 2020-08-19 Gesture recognition system realized based on FPGA and recognition method thereof Active CN111914808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010834453.XA CN111914808B (en) 2020-08-19 2020-08-19 Gesture recognition system realized based on FPGA and recognition method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010834453.XA CN111914808B (en) 2020-08-19 2020-08-19 Gesture recognition system realized based on FPGA and recognition method thereof

Publications (2)

Publication Number Publication Date
CN111914808A true CN111914808A (en) 2020-11-10
CN111914808B CN111914808B (en) 2022-08-12

Family

ID=73279377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010834453.XA Active CN111914808B (en) 2020-08-19 2020-08-19 Gesture recognition system realized based on FPGA and recognition method thereof

Country Status (1)

Country Link
CN (1) CN111914808B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140294237A1 (en) * 2010-03-01 2014-10-02 Primesense Ltd. Combined color image and depth processing
CN101853071A (en) * 2010-05-13 2010-10-06 重庆大学 Gesture identification method and system based on visual sense
US20120068917A1 (en) * 2010-09-17 2012-03-22 Sony Corporation System and method for dynamic gesture recognition using geometric classification
CN107958218A (en) * 2017-11-22 2018-04-24 南京邮电大学 A kind of real-time gesture knows method for distinguishing
CN110309806A (en) * 2019-07-08 2019-10-08 哈尔滨理工大学 A kind of gesture recognition system and its method based on video image processing
CN110956099A (en) * 2019-11-14 2020-04-03 哈尔滨工程大学 Dynamic gesture instruction identification method
CN111160194A (en) * 2019-12-23 2020-05-15 浙江理工大学 Static gesture image recognition method based on multi-feature fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QIAN CHENGHUI et al.: "Sign language recognition method based on Kinect", Transducer and Microsystem Technologies, no. 06, 10 June 2019 (2019-06-10) *
WU BINFANG et al.: "Gesture recognition based on SVM and Inception-v3", Computer Systems & Applications, no. 05, 15 May 2020 (2020-05-15) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112922490A (en) * 2021-02-09 2021-06-08 哈尔滨理工大学 Intelligent window system based on FPGA and STM32 are united
CN115766976A (en) * 2022-11-09 2023-03-07 深圳市南电信息工程有限公司 Image display system and control method thereof
CN115766976B (en) * 2022-11-09 2023-10-13 深圳市南电信息工程有限公司 Image display system and control method thereof

Also Published As

Publication number Publication date
CN111914808B (en) 2022-08-12

Similar Documents

Publication Publication Date Title
CN102880865B (en) Dynamic gesture recognition method based on complexion and morphological characteristics
Munib et al. American sign language (ASL) recognition based on Hough transform and neural networks
CN102831404B (en) Gesture detecting method and system
US11169614B2 (en) Gesture detection method, gesture processing device, and computer readable storage medium
Rajam et al. Recognition of Tamil sign language alphabet using image processing to aid deaf-dumb people
CN111914808B (en) Gesture recognition system realized based on FPGA and recognition method thereof
CN102508547A (en) Computer-vision-based gesture input method construction method and system
EP3058513B1 (en) Multi-color channel detection for note recognition and management
CN107133562B (en) Gesture recognition method based on extreme learning machine
CN110032932B (en) Human body posture identification method based on video processing and decision tree set threshold
CN106503619B (en) Gesture recognition method based on BP neural network
Meng et al. An extended HOG model: SCHOG for human hand detection
CN109558855B (en) A kind of space gesture recognition methods combined based on palm contour feature with stencil matching method
CN114170672A (en) Classroom student behavior identification method based on computer vision
CN104850232A (en) Method for acquiring remote gesture tracks under camera conditions
CN112199015B (en) Intelligent interaction all-in-one machine and writing method and device thereof
CN110633666A (en) Gesture track recognition method based on finger color patches
Qiu et al. Computer Vision Technology Based on Deep Learning
CN107885324A (en) A kind of man-machine interaction method based on convolutional neural networks
CN113705640A (en) Method for quickly constructing airplane detection data set based on remote sensing image
CN102122345A (en) Finger gesture judging method based on hand movement variation
Kakkoth et al. Visual descriptors based real time hand gesture recognition
CN106326891A (en) Mobile terminal, target detection method and device of mobile terminal
Ding et al. An Asymmetric Parallel Residual Convolutional Neural Network for Pen-Holding Gesture Recognition
Zhu et al. Application of Attention Mechanism-Based Dual-Modality SSD in RGB-D Hand Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant