WO2020244071A1 - Neural network-based gesture recognition method and apparatus, storage medium, and device - Google Patents
- Publication number
- WO2020244071A1 (PCT/CN2019/103056)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- gesture
- gesture image
- image
- original
- neural network
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Definitions
- This application relates to the field of image recognition technology. Specifically, this application relates to a neural network-based gesture recognition method, device, storage medium, and equipment.
- Gesture recognition uses algorithms to enable a computer to recognize human gestures in pictures or camera footage, understand their meaning, and thereby support interaction between the user and the computer. With the development of machine learning and deep learning, gesture recognition is widely used in games, shopping, and other scenarios.
- In the prior art, gesture images are generally processed and recognized to obtain the gesture type. However, because photographing environments differ, problems such as insufficient lighting, occlusion, insufficient resolution, and incorrect posture often arise; with the existing technology this easily leads to a drop in gesture recognition accuracy, which poses a major challenge to the gesture recognition process.
- This application provides a neural network-based gesture recognition method, a neural network-based gesture recognition device, a computer-readable storage medium, and computer equipment to solve the problem of low accuracy of gesture recognition and improve the accuracy of gesture recognition.
- An embodiment of the application first provides a neural network-based gesture recognition method, including: acquiring an original gesture image and performing binarization processing on it to obtain a binarized gesture image; inputting the original gesture image and the binarized gesture image respectively into two channels of a neural network model for recognition, to obtain gesture feature information of the original gesture image; and calculating the Euclidean distance between the gesture feature information and each positive-sample gesture feature information in a database, and determining the gesture type in the original gesture image according to the Euclidean distance.
- An embodiment of the present application also provides a neural network-based gesture recognition device, including:
- the binarization processing module is configured to obtain an original gesture image, and perform binarization processing on the original gesture image to obtain a binarized gesture image;
- a recognition module configured to input the original gesture image and the binarized gesture image into two channels of a neural network model for recognition, respectively, to obtain gesture feature information of the original gesture image;
- the gesture type determining module is used to calculate the Euclidean distance between the gesture feature information and each positive sample gesture feature information in the database, and determine the gesture type in the original gesture image according to the Euclidean distance.
- The embodiments of the present application also provide a non-volatile computer-readable storage medium for storing computer instructions which, when run on a computer, cause the computer to execute a neural network-based gesture recognition method, wherein the steps of the neural network-based gesture recognition method include: acquiring an original gesture image and performing binarization processing on it to obtain a binarized gesture image; inputting the original gesture image and the binarized gesture image respectively into two channels of a neural network model for recognition, to obtain gesture feature information of the original gesture image; and calculating the Euclidean distance between the gesture feature information and each positive-sample gesture feature information in the database, and determining the gesture type in the original gesture image according to the Euclidean distance.
- Furthermore, an embodiment of the present application also provides a computer device, the computer device including:
- one or more processors;
- a storage device for storing one or more programs,
- wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the steps of the above neural network-based gesture recognition method, the steps including: acquiring an original gesture image and performing binarization processing on it to obtain a binarized gesture image; inputting the original gesture image and the binarized gesture image respectively into two channels of a neural network model for recognition, to obtain gesture feature information of the original gesture image; and calculating the Euclidean distance between the gesture feature information and each positive-sample gesture feature information in the database, and determining the gesture type in the original gesture image according to the Euclidean distance.
- The neural network-based gesture recognition method inputs the original gesture image and its corresponding binarized gesture image into the neural network model for recognition to obtain the feature information of the original gesture image, and then determines the gesture type in the original gesture image according to the Euclidean distance between this feature information and the feature information of the positive-sample gesture images stored in the database. Because the binarized gesture image reflects the texture features of the original gesture image, the multi-channel neural network model extracts both the gesture features and the texture feature information of the original gesture image, which improves the recognition accuracy of the original gesture image compared with traditional single-channel neural network gesture recognition.
- FIG. 1 is a diagram of an implementation environment of a neural network-based gesture recognition method provided by an embodiment of this application;
- FIG. 2 is a flowchart of a neural network-based gesture recognition method provided by an embodiment of this application;
- FIG. 3 is a flowchart of performing binarization processing on an original gesture image to obtain a binarized gesture image according to an embodiment of this application;
- FIG. 4 is a flowchart of establishing a neural network model provided by an embodiment of this application;
- FIG. 5 is a flowchart of establishing a dual-channel neural network model provided by another embodiment of this application;
- FIG. 6 is a flowchart of calculating the Euclidean distance between the gesture feature information and each positive-sample gesture feature information in the database and determining the gesture type in the original gesture image according to the Euclidean distance, provided by an embodiment of this application;
- FIG. 7 is a schematic structural diagram of a neural network-based gesture recognition device provided by an embodiment of this application;
- FIG. 8 is a structural block diagram of a computer device provided by an embodiment of this application.
- The terms "first", "second", etc. used in this application may be used herein to describe various elements, but these elements are not limited by these terms; the terms are only used to distinguish one element from another. For example, without departing from the scope of this application, a first live video image may be referred to as a second live video image, and similarly, a second live video image may be referred to as a first live video image. The first live video image and the second live video image are both live video images, but they are not the same live video image.
- FIG. 1 is a diagram of the implementation environment of the neural network-based gesture recognition method provided in an embodiment; the implementation environment includes a user terminal and a server side.
- The neural network-based gesture recognition method provided in this embodiment can be executed on the server side. The execution process is as follows: obtain an original gesture image and perform binarization processing on it to obtain a binarized gesture image; input the original gesture image and the binarized gesture image respectively into the two channels of the neural network model for recognition to obtain the gesture feature information of the original gesture image; calculate the Euclidean distance between the gesture feature information and each positive-sample gesture feature information in the database; and determine the gesture type in the original gesture image according to the Euclidean distance.
- It should be noted that the user terminal can be a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like, and the server side can be implemented by a computer device with processing functions, but is not limited to this. The server side and the user terminal can be connected over a network through Bluetooth, USB (Universal Serial Bus), or other communication connection methods, and this application is not limited in this respect.
- In one embodiment, FIG. 2 is a schematic flowchart of the neural network-based gesture recognition method provided by an embodiment of the application; the method can be applied to the server side described above and includes the following steps:
- Step S210: Obtain an original gesture image, and perform binarization processing on the original gesture image to obtain a binarized gesture image;
- Step S220: Input the original gesture image and the binarized gesture image respectively into two channels of the neural network model for recognition, and obtain the gesture feature information of the original gesture image;
- Step S230: Calculate the Euclidean distance between the gesture feature information and each positive-sample gesture feature information in the database, and determine the gesture type in the original gesture image according to the Euclidean distance.
- The gesture recognition solution provided by this application can be applied in the following scenarios: during identity verification, the user's verification gesture image is captured, and because real conditions are complex, the captured verification gesture image may be blurred and difficult to recognize; or, in files such as games and videos, the gesture image is only a small part of the whole frame, and limitations of storage or shooting technology make it impossible to clearly identify the type of gesture in the image.
- To solve these problems, this application provides a neural network-based gesture recognition method that binarizes the acquired original gesture image to obtain its binarized gesture image, uses a dual-channel neural network model to recognize gesture feature information, determines the Euclidean distance between the gesture feature information and each positive-sample gesture feature information, and determines the gesture type of the original gesture image according to the Euclidean distance; for example, the positive-sample gesture with the smallest Euclidean distance can be taken as the gesture type of the original gesture image.
- After the gesture type in the original gesture image is identified with the above solution, the recognition result can be used for subsequent operations, such as verification analysis during identity verification or returning the recognition result of the gesture image to the user.
- Binarizing the original gesture image yields a binarized gesture image that extracts the texture features of the original gesture image, in particular its local texture feature information. Because texture features can distinguish the user's gesture from the background image, recognizing the gesture category on the basis of the binarized gesture image helps improve the accuracy of gesture recognition.
- The solution provided in this application is suitable for static gesture recognition scenarios. To address the difficulty or failure of gesture recognition caused by insufficiently clear gesture images, it proposes a gesture recognition method based on a neural network model with two input channels. The dual-channel convolutional neural network accepts different features of the image as input at the same time: one is the gesture feature, such as gesture posture information, and the other is the texture feature. Each is processed by convolution separately, and the features are then combined so that more feature information of the original gesture is extracted for image recognition and classification, which helps improve the recognition accuracy of gesture images.
- In one embodiment, the step in S210 of performing binarization processing on the original gesture image to obtain a binarized gesture image may be carried out as follows; the schematic flowchart is shown in FIG. 3 and includes the following sub-steps:
- S211: Divide the original gesture image into several sub-regions;
- S212: For each pixel window of each sub-region, take the gray value of the window's center pixel as a threshold and compare the gray values of the adjacent pixels with it to obtain the LBP value of the pixel window;
- S213: Replace the original gray value of each pixel window with its LBP value to obtain the binarized gesture image corresponding to the original gesture image.
- LBP refers to Local Binary Patterns, an operator used to describe the local texture features of an image; the extracted features are the local texture features of the original gesture image. This solution converts the original gesture image into a binarized gesture image in order to extract the texture feature information of the original gesture image.
- Specifically, the original gesture image is divided into several sub-regions (one or more), each of which contains multiple pixels. A suitable window size is chosen, the gray value of the window's center pixel is taken as a threshold, and the gray values of the neighboring pixels are compared with it to produce a binary code representing the local texture feature. For example, with a 3*3 window, if a pixel surrounding the window center has a value greater than the center pixel value, its position is marked as 1, otherwise 0. In this way, the 8 pixels in the 3*3 neighborhood yield an 8-bit binary number, which is converted into a decimal number to obtain the LBP value of the window's center pixel; this value reflects the texture feature information of the pixel window. The LBP value of each pixel window in a sub-region is obtained in this way and replaces the window's original gray value; after all pixel windows have been replaced, the binarized sub-region gesture image is obtained, and stitching the binarized sub-region gesture images together yields the binarized gesture image of the original gesture image.
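- For illustration, a minimal Python/NumPy sketch of the per-window LBP computation described above is given below. The bit ordering (clockwise from the top-left neighbor) and the handling of border pixels are assumptions made for the sketch; the patent does not fix these details.

```python
import numpy as np

def lbp_image(gray):
    """Compute a basic 3*3 LBP code for every interior pixel of a grayscale image.

    Each of the 8 neighbors that is greater than the center pixel contributes a
    1-bit; the 8 bits are read as a binary number whose decimal value becomes
    the LBP value of that pixel window.
    """
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.uint8)
    # offsets of the 8 neighbors, clockwise starting at the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = gray[y, x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if gray[y + dy, x + dx] > center:
                    code |= 1 << (7 - bit)
            out[y, x] = code
    return out
```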
- In one embodiment, before the step in S220 of inputting the original gesture image and the binarized gesture image into the two channels of the neural network model for recognition, the method further includes establishing a neural network model from training gesture images. The process of establishing the neural network model may proceed as follows; please refer to the schematic flowchart shown in FIG. 4, which includes the following steps:
- S221: Acquire a training gesture image from a preset training image set, perform feature extraction on the training gesture image and its binarized gesture image to obtain the N-dimensional feature vectors corresponding to the training gesture image and the binarized gesture image respectively, and integrate the two N-dimensional feature vectors into a 2N-dimensional feature vector;
- S222: Compare feature vectors on the basis of the 2N-dimensional feature vector, and adjust the weights of the feature vector using the comparison results of positive-sample gesture images, to obtain the neural network model.
- Here N is any positive integer (N = 1, 2, ...). A training gesture image is extracted from the preset training image set and binarized to obtain its corresponding binarized gesture image. Feature extraction is performed on both images to obtain an N-dimensional feature vector for each, and the two N-dimensional feature vectors are integrated into a 2N-dimensional feature vector. The feature vectors of positive-sample gesture images are then compared to obtain the weights of the 2N-dimensional feature vector, and the neural network model is established.
- A positive-sample gesture image is an image whose gesture type is known, i.e., the positive-sample gesture image and its corresponding gesture type are stored in advance. The above 2N-dimensional feature vector of the positive-sample gesture image is extracted, and the positive-sample gesture images are used as training samples to obtain the weights of the 2N-dimensional feature vector.
- The established neural network model can be described as follows:
- P = A_1*X_1 + A_2*X_2 + ... + A_2N*X_2N,
- where X_1, X_2, ..., X_2N are the 2N features, A_1, A_2, ..., A_2N are the weights corresponding to the 2N features, and P is the corresponding gesture type. Through training on a large number of positive-sample gesture images in the positive-sample gesture image set, the weights corresponding to the 2N features are obtained.
- Obtaining the neural network model through this kind of big-data training makes it possible to call the model during subsequent gesture image recognition and quickly obtain an accurate gesture type.
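- As a minimal worked illustration of the model form above, the sketch below evaluates P as the weighted sum over a 2N-dimensional feature vector; the dimension N = 64 and the random values standing in for the learned weights and the extracted features are assumptions, not values taken from the patent.

```python
import numpy as np

N = 64                      # assumed per-channel feature dimension
X = np.random.rand(2 * N)   # stand-in for the integrated 2N-dimensional feature vector
A = np.random.rand(2 * N)   # stand-in for the weights learned from positive samples

P = np.dot(A, X)            # P = A_1*X_1 + A_2*X_2 + ... + A_2N*X_2N
print(P)
```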
- This embodiment further details how the neural network model is obtained. The neural network model described in the embodiment of this application is preferably a dual-channel neural network model, preferably obtained as shown in FIG. 5, which includes the following sub-steps:
- S2221: Use the basic network structure of Inception-Resnet-V2 to construct an initial dual-channel neural network model;
- S2222: Extract the above 2N-dimensional feature vector from a positive-sample gesture image, and input the 2N-dimensional feature vector and the gesture type corresponding to that positive-sample gesture image into the initial dual-channel neural network model to obtain initial weight values for the 2N-dimensional feature vector;
- S2223: Use all positive-sample gesture images in the positive-sample gesture image set and their corresponding gesture types to continuously adjust the initial weight values of each feature in the initial dual-channel neural network model; once the weight values are determined, the dual-channel neural network model is obtained.
- Specifically, the positive-sample gesture image set includes a large number of positive-sample gesture images of known gesture types. The 2N-dimensional feature vector of the first positive-sample gesture image is extracted, and this feature vector and the gesture type corresponding to the first positive-sample gesture image are input into the dual-channel neural network model to obtain first weight values for the 2N-dimensional feature vector; these first weight values are the initial weight values. The 2N-dimensional feature vector of the second positive-sample gesture image is then extracted, and this feature vector and the gesture type corresponding to the second positive-sample gesture image are input into the dual-channel neural network model whose weights are the first weight values, yielding second weight values, i.e., the first weight values after adjustment. Proceeding in this way, the positive-sample gesture images in the set are used in turn to adjust the weight values of the 2N-dimensional feature vector until the final weight values of each feature in the dual-channel neural network model are determined, at which point the dual-channel neural network model is established.
- In step S2221, the basic network structure of Inception-Resnet-V2 is used to construct the initial dual-channel neural network model. Inception-Resnet-V2 is a convolutional neural network that achieves the best image classification performance in current benchmark tests, and a convolutional neural network model built on this network structure can improve the accuracy of gesture type recognition.
- Specifically, one channel of the dual-channel neural network model takes a positive-sample gesture image as input, and the other channel takes the binarized gesture image corresponding to that positive-sample gesture image. Feature extraction is performed in each channel to obtain a 64-dimensional feature vector from each; after L2 normalization, the two vectors are integrated and concatenated into a 128-dimensional vector. The training process over the whole set of sample gesture images, based on the integrated 128-dimensional vector, is as follows: the first positive-sample image is used to obtain the corresponding first weight values; the neural network model with the first weight values is used to compute the output for the second positive-sample gesture image; the value of the loss function between this output and the preset gesture type is calculated, and the weight values of the neural network model are adjusted according to the loss value so as to reduce it. The loss function between the output of the neural network model and the preset gesture type of each positive sample is computed continuously in this way. The 128-dimensional feature vector reflects the feature points of the gesture to be verified, and the weight of each feature accurately reflects the influence of each feature point, which facilitates rapid and accurate gesture recognition.
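- A minimal PyTorch sketch of the dual-channel idea described above follows. Small stand-in convolutional branches are used instead of Inception-Resnet-V2 so the sketch stays self-contained; the input channel counts, layer sizes, and class name are assumptions, while the 64-dimensional branch outputs, L2 normalization, and 128-dimensional concatenation follow the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoChannelGestureNet(nn.Module):
    """Minimal two-channel feature extractor: one branch for the original
    gesture image, one for its binarized (LBP) counterpart; each branch yields
    a 64-dim vector, which is L2-normalized and concatenated into 128 dims."""

    def __init__(self):
        super().__init__()

        def branch(in_ch):
            # small stand-in for the Inception-Resnet-V2 branch used in the patent
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, 64),
            )

        self.orig_branch = branch(3)   # original gesture image (3 channels assumed)
        self.lbp_branch = branch(1)    # binarized / LBP gesture image (1 channel)

    def forward(self, orig_img, lbp_img):
        f1 = F.normalize(self.orig_branch(orig_img), p=2, dim=1)  # 64-dim, L2-normalized
        f2 = F.normalize(self.lbp_branch(lbp_img), p=2, dim=1)    # 64-dim, L2-normalized
        return torch.cat([f1, f2], dim=1)                         # 128-dim gesture feature
```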
- Step S220 is used to extract gesture feature information, and gesture recognition is then performed on the basis of the extracted feature information. Extracting 128-dimensional gesture features for gesture verification increases the robustness and accuracy of the gesture recognition algorithm: the dual-channel neural network model built on 2N-dimensional feature vectors increases the amount of extracted feature information, and using these feature vectors for comparison and recognition helps improve the accuracy of gesture recognition.
- The foregoing embodiment describes how to establish a neural network model based on the obtained 2N-dimensional feature vector; the following embodiment describes how to obtain the N-dimensional feature vector corresponding to the binarized gesture image. The LBP feature vector of the original gesture image can be obtained through the following operations: count the distribution of LBP values in each sub-region to obtain the LBP histogram of each sub-region, and connect the histograms of all sub-regions to obtain the LBP texture feature vector of the original gesture image. Specifically, the N-dimensional feature vector corresponding to the binarized gesture image can be obtained as follows: first, the scheme described in S212 is used to obtain the LBP value of each pixel window in each sub-region; then the distribution of LBP values in each sub-region is counted to obtain the LBP histogram of each sub-region; finally, the histograms are concatenated into the feature vector.
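- The following NumPy sketch illustrates the histogram step just described: the LBP image is split into sub-regions, each sub-region's LBP values are histogrammed, and the histograms are concatenated into one texture feature vector. The 4*4 grid, 256 bins, and histogram normalization are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def lbp_histogram_features(lbp_img, grid=(4, 4), bins=256):
    """Split the LBP image into sub-regions, histogram each region's LBP values,
    and concatenate the histograms into one texture feature vector."""
    h, w = lbp_img.shape
    gh, gw = grid
    feats = []
    for i in range(gh):
        for j in range(gw):
            region = lbp_img[i * h // gh:(i + 1) * h // gh,
                             j * w // gw:(j + 1) * w // gw]
            hist, _ = np.histogram(region, bins=bins, range=(0, bins))
            feats.append(hist / max(hist.sum(), 1))  # normalized per-region histogram
    return np.concatenate(feats)
```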
- In step S230, the Euclidean distance between the gesture feature information and each positive-sample gesture feature information in the database is calculated, and the gesture type in the original gesture image is determined according to the Euclidean distance. The determination can be performed as follows; the schematic diagram is shown in FIG. 6 and includes the following steps:
- S231: Obtain the feature vector of the original gesture image and the same-dimensional feature vector of each positive-sample gesture image in the database, and calculate the Euclidean distance between the feature vector of the original gesture image and the feature vector of each positive-sample gesture image;
- S232: Obtain the confidence between the original gesture image and each positive-sample gesture image according to the Euclidean distance, and output the positive-sample gesture type corresponding to the highest confidence as the gesture type in the original gesture image.
- In step S230, the Euclidean distance between the gesture feature information of the original gesture image and the gesture feature information of each positive sample in the database is used to determine the gesture type in the original gesture image. The gesture feature information can be expressed in various forms; in this embodiment, a feature vector is preferably used to represent the feature information of the original gesture image. Feature extraction is performed on the original gesture image and its binarized gesture image to obtain the N-dimensional feature vector corresponding to each, and the two N-dimensional feature vectors are integrated into a 2N-dimensional feature vector. The same scheme is used to obtain the 2N-dimensional feature vector of each positive-sample gesture image, the Euclidean distance between the feature vector of the original gesture image and that of each positive-sample gesture image is calculated, the confidence of each positive-sample gesture image is obtained from the magnitude of the Euclidean distance, and the gesture type of the positive-sample gesture image with the highest confidence is output as the gesture type of the original gesture image.
- The solution provided by this embodiment uses the Euclidean distance between the feature vector corresponding to the original gesture image and the feature vector corresponding to each positive-sample gesture image to determine the gesture type in the original gesture image; it can accurately measure the similarity between the gesture in the original gesture image and the gesture types of the positive-sample gesture images, so that the gesture type in the original gesture image is determined accurately and in a short time.
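- A minimal NumPy sketch of the matching in step S230 is given below. The mapping from Euclidean distance to confidence (a softmax over negative distances) and the example gesture-type labels are assumptions chosen for illustration; the patent only states that the confidence is derived from the Euclidean distance and that the highest-confidence (smallest-distance) positive sample supplies the output gesture type.

```python
import numpy as np

def match_gesture(query_feat, sample_feats, sample_labels):
    """Return the gesture type of the closest positive sample and a confidence."""
    dists = np.linalg.norm(sample_feats - query_feat, axis=1)  # Euclidean distances
    conf = np.exp(-dists) / np.exp(-dists).sum()               # assumed confidence mapping
    best = int(np.argmax(conf))                                # smallest distance wins
    return sample_labels[best], conf[best]

# Hypothetical database of positive-sample features and made-up gesture types.
db_feats = np.random.rand(5, 128)
db_labels = ["fist", "palm", "ok", "thumbs_up", "victory"]
label, confidence = match_gesture(np.random.rand(128), db_feats, db_labels)
```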
- In addition, the number of positive-sample gesture images in the database can be increased, and the original gesture image can also be subjected in advance to image enhancement and/or image filtering, for example by processing it with an edge-preserving noise-reduction algorithm. Processing the original gesture image with an edge-preserving noise-reduction algorithm helps highlight the gesture portion of the original gesture image, filter out noise in the original gesture image, and improve gesture recognition of the original gesture image.
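- As a sketch of this optional preprocessing, the snippet below applies an edge-preserving noise-reduction step followed by a simple enhancement step using OpenCV. The bilateral filter, its parameters, the histogram equalization, and the file names are assumptions chosen for illustration; the patent does not name a specific edge-preserving or enhancement algorithm.

```python
import cv2

img = cv2.imread("gesture.jpg")                     # hypothetical input path
denoised = cv2.bilateralFilter(img, 9, 75, 75)      # edge-preserving noise reduction
gray = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)
enhanced = cv2.equalizeHist(gray)                   # simple image enhancement
cv2.imwrite("gesture_preprocessed.jpg", enhanced)   # hypothetical output path
```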
- An embodiment of the present application also provides a gesture recognition device based on a neural network.
- the structure diagram is shown in FIG. 7, and includes: a binarization processing module 710, a recognition module 720, and a gesture type determination module 730, which are specifically as follows:
- the binarization processing module 710 is configured to obtain an original gesture image, and perform binarization processing on the original gesture image to obtain a binarized gesture image;
- the recognition module 720 is configured to input the original gesture image and the binarized gesture image into two channels of a neural network model for recognition respectively, and obtain gesture feature information of the original gesture image;
- the gesture type determining module 730 is configured to calculate the Euclidean distance between the gesture feature information and each positive sample gesture feature information in the database, and determine the gesture type in the original gesture image according to the Euclidean distance.
- An embodiment of the present application also provides a non-volatile computer-readable storage medium having computer instructions stored thereon; when the computer instructions are executed by a processor, the steps of any one of the neural network-based gesture recognition methods described above are implemented.
- The storage medium includes, but is not limited to, any type of disk (including floppy disk, hard disk, optical disk, CD-ROM, and magneto-optical disk), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic card, or optical card. That is, the storage medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer), and can be a read-only memory, magnetic disk, optical disk, or the like.
- an embodiment of the present application also provides a computer device, and the computer device includes:
- one or more processors;
- a storage device for storing one or more programs,
- wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the steps of the neural network-based gesture recognition method described in any one of the above.
- FIG. 8 is a structural block diagram of a computer device 800 according to an exemplary embodiment.
- the computer device 800 may be provided as a server.
- the computer device 800 includes a processing component 822, which further includes one or more processors, and a memory resource represented by a memory 832, for storing instructions executable by the processing component 822, such as application programs.
- the application program stored in the memory 832 may include one or more modules each corresponding to a set of instructions.
- The processing component 822 is configured to execute instructions so as to perform the steps of the above-mentioned neural network-based gesture recognition method based on the dual-channel neural network.
- The computer device 800 may also include a power component 826 configured to perform power management of the computer device 800, a wired or wireless network interface 850 configured to connect the computer device 800 to a network, and an input/output (I/O) interface 858.
- the computer device 800 can operate based on an operating system stored in the memory 832, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM or the like. It should be understood that, although the various steps in the flowchart of the drawings are shown in sequence as indicated by the arrows, these steps are not necessarily executed in sequence in the order indicated by the arrows. Unless explicitly stated in this article, the execution of these steps is not strictly limited in order, and they can be executed in other orders.
- The steps in the flowcharts of the drawings may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
A neural network-based gesture recognition method and apparatus, a storage medium, and a device. The neural network-based gesture recognition method comprises: obtaining an original gesture image, and performing binarization processing on the original gesture image to obtain a binarized gesture image (S210); inputting the original gesture image and the binarized gesture image respectively into two channels of a neural network model for recognition, and obtaining gesture feature information of the original gesture image (S220); and calculating the Euclidean distance between the gesture feature information and each piece of positive sample gesture feature information in a database, and determining a gesture type in the original gesture image according to the Euclidean distance (S230). Therefore, the problem of gesture recognition accuracy being low may be solved so as to improve the accuracy of gesture recognition.
Description
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 6, 2019, with application number 201910493340.5 and the invention title "Neural Network-based Gesture Recognition Method, Device, Storage Medium and Equipment", the entire contents of which are incorporated in this application by reference.
This application relates to the field of image recognition technology, and in particular to a neural network-based gesture recognition method, device, storage medium, and equipment.
Gesture recognition uses algorithms to enable a computer to recognize human gestures in pictures or camera footage, understand their meaning, and thereby support interaction between the user and the computer. With the development of machine learning and deep learning, gesture recognition is widely used in games, shopping, and other scenarios.
The inventor realizes that in the prior art, gesture images are generally processed and recognized to obtain the gesture type. However, because photographing environments differ, problems such as insufficient lighting, occlusion, insufficient resolution, and incorrect posture often arise; with the above existing technology this easily leads to a drop in gesture recognition accuracy, which poses a major challenge to the gesture recognition process.
Summary of the Invention
This application provides a neural network-based gesture recognition method, a neural network-based gesture recognition device, a computer-readable storage medium, and a computer device, to solve the problem of low gesture recognition accuracy and to improve the accuracy of gesture recognition.
An embodiment of the application first provides a neural network-based gesture recognition method, including:
acquiring an original gesture image, and performing binarization processing on the original gesture image to obtain a binarized gesture image;
inputting the original gesture image and the binarized gesture image respectively into two channels of a neural network model for recognition, to obtain gesture feature information of the original gesture image;
calculating the Euclidean distance between the gesture feature information and each positive-sample gesture feature information in a database, and determining the gesture type in the original gesture image according to the Euclidean distance.
To solve the above problem, an embodiment of the present application also provides a neural network-based gesture recognition device, including:
a binarization processing module, configured to obtain an original gesture image and perform binarization processing on the original gesture image to obtain a binarized gesture image;
a recognition module, configured to input the original gesture image and the binarized gesture image respectively into two channels of a neural network model for recognition, to obtain gesture feature information of the original gesture image;
a gesture type determining module, configured to calculate the Euclidean distance between the gesture feature information and each positive-sample gesture feature information in a database, and determine the gesture type in the original gesture image according to the Euclidean distance.
To solve the above problem, an embodiment of the present application also provides a non-volatile computer-readable storage medium for storing computer instructions which, when run on a computer, cause the computer to execute a neural network-based gesture recognition method, wherein the steps of the neural network-based gesture recognition method include:
acquiring an original gesture image, and performing binarization processing on the original gesture image to obtain a binarized gesture image;
inputting the original gesture image and the binarized gesture image respectively into two channels of a neural network model for recognition, to obtain gesture feature information of the original gesture image;
calculating the Euclidean distance between the gesture feature information and each positive-sample gesture feature information in the database, and determining the gesture type in the original gesture image according to the Euclidean distance.
Furthermore, an embodiment of the present application also provides a computer device, the computer device including:
one or more processors;
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the steps of the above neural network-based gesture recognition method, the steps of the neural network-based gesture recognition method including:
acquiring an original gesture image, and performing binarization processing on the original gesture image to obtain a binarized gesture image;
inputting the original gesture image and the binarized gesture image respectively into two channels of a neural network model for recognition, to obtain gesture feature information of the original gesture image;
calculating the Euclidean distance between the gesture feature information and each positive-sample gesture feature information in the database, and determining the gesture type in the original gesture image according to the Euclidean distance.
The neural network-based gesture recognition method provided by the embodiments of this application inputs the original gesture image and its corresponding binarized gesture image into the neural network model for recognition to obtain the feature information of the original gesture image, and then determines the gesture type in the original gesture image according to the Euclidean distance between this feature information and the feature information of the positive-sample gesture images stored in the database. Because the binarized gesture image reflects the texture features of the original gesture image, the multi-channel neural network model extracts both the gesture features and the texture feature information of the original gesture image, which improves the recognition accuracy of the original gesture image compared with traditional single-channel neural network gesture recognition.
Additional aspects and advantages of this application will be given in part in the following description; they will become apparent from the description below or be understood through practice of this application.
FIG. 1 is a diagram of the implementation environment of a neural network-based gesture recognition method provided by an embodiment of this application;
FIG. 2 is a flowchart of a neural network-based gesture recognition method provided by an embodiment of this application;
FIG. 3 is a flowchart of performing binarization processing on an original gesture image to obtain a binarized gesture image according to an embodiment of this application;
FIG. 4 is a flowchart of establishing a neural network model provided by an embodiment of this application;
FIG. 5 is a flowchart of establishing a dual-channel neural network model provided by another embodiment of this application;
FIG. 6 is a flowchart of calculating the Euclidean distance between the gesture feature information and each positive-sample gesture feature information in the database and determining the gesture type in the original gesture image according to the Euclidean distance, provided by an embodiment of this application;
FIG. 7 is a schematic structural diagram of a neural network-based gesture recognition device provided by an embodiment of this application;
FIG. 8 is a structural block diagram of a computer device provided by an embodiment of this application.
The embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are only used to explain the present application and cannot be construed as limiting it.
Those skilled in the art can understand that, unless specifically stated, the singular forms "a", "an", "said", and "the" used herein may also include plural forms. It should be further understood that the word "comprising" used in the specification of this application refers to the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof.
It can be understood that the terms "first", "second", etc. used in this application may be used herein to describe various elements, but these elements are not limited by these terms; the terms are only used to distinguish one element from another. For example, without departing from the scope of the present application, a first live video image may be referred to as a second live video image, and similarly, a second live video image may be referred to as a first live video image. The first live video image and the second live video image are both live video images, but they are not the same live video image.
FIG. 1 is a diagram of the implementation environment of the neural network-based gesture recognition method provided in an embodiment; the implementation environment includes a user terminal and a server side.
The neural network-based gesture recognition method provided in this embodiment can be executed on the server side. The execution process is as follows: obtain an original gesture image and perform binarization processing on it to obtain a binarized gesture image; input the original gesture image and the binarized gesture image respectively into the two channels of the neural network model for recognition to obtain the gesture feature information of the original gesture image; calculate the Euclidean distance between the gesture feature information and each positive-sample gesture feature information in the database; and determine the gesture type in the original gesture image according to the Euclidean distance.
It should be noted that the user terminal can be a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like, and the server side can be implemented by a computer device with processing functions, but is not limited to this. The server side and the user terminal can be connected over a network through Bluetooth, USB (Universal Serial Bus), or other communication connection methods, and this application is not limited in this respect.
In one embodiment, FIG. 2 is a schematic flowchart of the neural network-based gesture recognition method provided by an embodiment of the application. The method can be applied to the server side described above and includes the following steps:
Step S210: Obtain an original gesture image, and perform binarization processing on the original gesture image to obtain a binarized gesture image;
Step S220: Input the original gesture image and the binarized gesture image respectively into two channels of the neural network model for recognition, and obtain the gesture feature information of the original gesture image;
Step S230: Calculate the Euclidean distance between the gesture feature information and each positive-sample gesture feature information in the database, and determine the gesture type in the original gesture image according to the Euclidean distance.
The gesture recognition solution provided by this application can be applied in the following scenarios: during identity verification, the user's verification gesture image is captured, and because real conditions are complex, the captured verification gesture image may be blurred and difficult to recognize; or, in files such as games and videos, the gesture image is only a small part of the whole frame, and limitations of storage or shooting technology make it impossible to clearly identify the type of gesture in the image.
To solve these problems, this application provides a neural network-based gesture recognition method that binarizes the acquired original gesture image to obtain its binarized gesture image, uses a dual-channel neural network model to recognize gesture feature information, determines the Euclidean distance between the gesture feature information and each positive-sample gesture feature information, and determines the gesture type of the original gesture image according to the Euclidean distance; for example, the positive-sample gesture with the smallest Euclidean distance can be taken as the gesture type of the original gesture image.
After the gesture type in the original gesture image is identified with the above solution, the recognition result can be used for subsequent operations, such as verification analysis during identity verification or returning the recognition result of the gesture image to the user.
Binarizing the original gesture image yields a binarized gesture image that extracts the texture features of the original gesture image, in particular its local texture feature information. Because texture features can distinguish the user's gesture from the background image, recognizing the gesture category on the basis of the binarized gesture image helps improve the accuracy of gesture recognition.
The solution provided in this application is suitable for static gesture recognition scenarios. To address the difficulty or failure of gesture recognition caused by insufficiently clear gesture images, it proposes a gesture recognition method based on a neural network model with two input channels. The dual-channel convolutional neural network accepts different features of the image as input at the same time; in the solution provided by this application there are two kinds of features, one being the gesture feature, such as gesture posture information, and the other being the texture feature. Each is processed by convolution separately, and the features are then combined so that more feature information of the original gesture is extracted for image recognition and classification, which helps improve the recognition accuracy of gesture images.
To make the neural network-based gesture recognition solution provided by this application and its technical effects clearer, its specific schemes are described in detail in the following embodiments.
In one embodiment, the step in S210 of performing binarization processing on the original gesture image to obtain a binarized gesture image may be carried out as follows; the schematic flowchart is shown in FIG. 3 and includes the following sub-steps:
S211: Divide the original gesture image into several sub-regions;
S212: For each pixel window of each sub-region, take the gray value of the window's center pixel as a threshold and compare the gray values of the adjacent pixels with it to obtain the LBP value of the pixel window;
S213: Replace the original gray value of each pixel window with its LBP value to obtain the binarized gesture image corresponding to the original gesture image.
LBP refers to Local Binary Patterns, an operator used to describe the local texture features of an image; the extracted features are the local texture features of the original gesture image. This solution converts the original gesture image into a binarized gesture image in order to extract the texture feature information of the original gesture image.
Specifically, the original gesture image is divided into several sub-regions (one or more), each of which contains multiple pixels. A suitable window size is chosen, the gray value of the window's center pixel is taken as a threshold, and the gray values of the neighboring pixels are compared with it to produce a binary code representing the local texture feature. For example, with a 3*3 window, if a pixel surrounding the window center has a value greater than the center pixel value, its position is marked as 1, otherwise 0. In this way, the 8 pixels in the 3*3 neighborhood yield an 8-bit binary number, which is converted into a decimal number to obtain the LBP value of the window's center pixel; this value reflects the texture feature information of the pixel window. The LBP value of each pixel window in a sub-region is obtained in this way and replaces the window's original gray value; after all pixel windows have been replaced, the binarized sub-region gesture image is obtained. All binarized sub-region gesture images are obtained in the same way and stitched together to produce the binarized gesture image of the original gesture image.
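The following short Python sketch walks through the 3*3 example above for a single window; the pixel values are made up for illustration, and the clockwise neighbor ordering is an assumption, since the patent does not fix the bit order.

```python
import numpy as np

window = np.array([[ 90, 130,  70],
                   [120, 100, 140],
                   [ 60, 110,  80]])      # made-up gray values of one 3*3 window
center = window[1, 1]                     # 100
neighbors = [window[0, 0], window[0, 1], window[0, 2],   # clockwise from top-left
             window[1, 2], window[2, 2], window[2, 1],
             window[2, 0], window[1, 0]]
bits = "".join("1" if v > center else "0" for v in neighbors)
lbp_value = int(bits, 2)
print(bits, lbp_value)                    # "01010101" -> 85
```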
In one embodiment, before the step in S220 of inputting the original gesture image and the binarized gesture image into the two channels of the neural network model for recognition, the method further includes establishing a neural network model from training gesture images. The process of establishing the neural network model may proceed as follows; please refer to the schematic flowchart shown in FIG. 4, which includes the following steps:
S221: Acquire a training gesture image from a preset training image set, perform feature extraction on the training gesture image and its binarized gesture image to obtain the N-dimensional feature vectors corresponding to the training gesture image and the binarized gesture image respectively, and integrate the two N-dimensional feature vectors into a 2N-dimensional feature vector;
S222: Compare feature vectors on the basis of the 2N-dimensional feature vector, and adjust the weights of the feature vector using the comparison results of positive-sample gesture images, to obtain the neural network model.
Here N is any positive integer (N = 1, 2, ...). A training gesture image is extracted from the preset training image set and binarized to obtain its corresponding binarized gesture image. Feature extraction is performed on both images to obtain an N-dimensional feature vector for each, and the two N-dimensional feature vectors are integrated into a 2N-dimensional feature vector. The feature vectors of positive-sample gesture images are then compared to obtain the weights of the 2N-dimensional feature vector, and the neural network model is established.
A positive-sample gesture image is an image whose gesture type is known, i.e., the positive-sample gesture image and its corresponding gesture type are stored in advance. The above 2N-dimensional feature vector of the positive-sample gesture image is extracted, and the positive-sample gesture images are used as training samples to obtain the weights of the 2N-dimensional feature vector. The established neural network model can be described as follows:
P = A_1*X_1 + A_2*X_2 + ... + A_2N*X_2N,
where X_1, X_2, ..., X_2N are the 2N features, A_1, A_2, ..., A_2N are the weights corresponding to the 2N features, and P is the corresponding gesture type. Through training on a large number of positive-sample gesture images in the positive-sample gesture image set, the weights corresponding to the 2N features are obtained. Obtaining the neural network model through this kind of big-data training makes it possible to call the model during subsequent gesture image recognition and quickly obtain an accurate gesture type.
This embodiment further details how the neural network model is obtained. The neural network model described in the embodiments of this application is preferably a two-channel neural network model, and it is preferably obtained in the manner shown in Figure 5, which includes the following sub-steps:
S2221: Construct an initial two-channel neural network model using the basic network structure of Inception-Resnet-V2.
S2222: Extract the 2N-dimensional feature vector of a positive-sample gesture image, and input the 2N-dimensional feature vector together with the gesture type corresponding to that positive-sample gesture image into the initial two-channel neural network model to obtain initial weight values of the 2N-dimensional feature vector.
S2223: Use all positive-sample gesture images in the positive-sample gesture image set, together with their corresponding gesture types, to continuously adjust the initial weight values of the feature vectors in the initial two-channel neural network model; once the weight values of the feature vectors are determined, the two-channel neural network model is obtained.
Specifically, the positive-sample gesture image set contains a large number of positive-sample gesture images with known gesture types. The 2N-dimensional feature vector of the first positive-sample gesture image is extracted and, together with the gesture type of that image, input into the two-channel neural network model to obtain a first weight value of the 2N-dimensional feature vector; this first weight value is the initial weight value. The 2N-dimensional feature vector of the second positive-sample gesture image is then extracted and, together with its gesture type, input into the two-channel neural network model whose feature-vector weights are the first weight values, yielding a second weight value of the 2N-dimensional feature vector, which is the weight value obtained by adjusting the first. In this manner, the positive-sample gesture images in the positive-sample set are used one after another to adjust the weight values of the 2N-dimensional feature vector; after the final weight value of each feature vector in the two-channel neural network model is obtained and the weight value corresponding to each feature vector is determined, the two-channel neural network model is established.
In step S2221, the initial two-channel neural network model is constructed on the basic network structure of Inception-Resnet-V2, a convolutional neural network that currently achieves the best image-classification results on benchmark tests; a convolutional neural network model built on this structure can improve the accuracy of gesture-type recognition.
Taking N = 64 as an example, the 2N-dimensional feature vector is obtained as follows: the positive-sample gesture image is input into one channel of the two-channel neural network model and the binarized gesture image corresponding to the positive-sample gesture image into the other channel; feature extraction is performed in each of the two channels to obtain a 64-dimensional feature vector from each, the vectors are L2-normalized, and they are finally concatenated into a 128-dimensional vector. (A sketch of this two-channel structure is given below.)
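The following sketch is an illustrative assumption of how such a two-channel embedding network could be assembled (TensorFlow/Keras is assumed, a small convolutional stack stands in for the Inception-Resnet-V2 backbone named above, and the input sizes are arbitrary); it is not the application's reference implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def backbone(name):
    # Placeholder feature extractor; in the application this role is played by
    # the basic network structure of Inception-Resnet-V2.
    return tf.keras.Sequential([
        layers.Conv2D(32, 3, strides=2, activation="relu"),
        layers.Conv2D(64, 3, strides=2, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(64),                         # N = 64 features per channel
    ], name=name)

rgb_in = layers.Input(shape=(224, 224, 3), name="original_gesture")   # channel 1
bin_in = layers.Input(shape=(224, 224, 1), name="binarized_gesture")  # channel 2

f_rgb = backbone("original_branch")(rgb_in)
f_bin = backbone("binarized_branch")(bin_in)

# L2-normalize each 64-dimensional vector, then concatenate into a 128-dimensional vector.
f_rgb = layers.Lambda(lambda t: tf.math.l2_normalize(t, axis=1))(f_rgb)
f_bin = layers.Lambda(lambda t: tf.math.l2_normalize(t, axis=1))(f_bin)
embedding = layers.Concatenate(name="embedding_128d")([f_rgb, f_bin])

two_channel_model = Model([rgb_in, bin_in], embedding)
```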
The training on positive-sample gesture images based on the concatenated 128-dimensional vector proceeds as follows: the first positive-sample image is used to obtain the corresponding first weight values; the neural network model with these first weight values is then applied to the second positive-sample gesture image, the value of the loss function between the model output and the preset gesture type is computed, and the weight values of the neural network model are adjusted according to the value of the loss function so as to reduce it. The loss function between the model output and the preset gesture type of each positive sample is computed repeatedly; after training on a large number of samples in the sample set, the value of the loss function keeps decreasing and the accuracy of the gesture type output by the model keeps increasing. The finally extracted 128-dimensional feature vector reflects the feature points of the gesture to be verified, and the weight of each feature component accurately reflects the influence of the corresponding feature point, which supports fast and accurate gesture recognition. (A hedged sketch of one such training step is given below.)
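A hedged sketch of one such weight-adjustment step follows (TensorFlow is assumed; the number of gesture types and the use of a cross-entropy loss are illustrative assumptions standing in for the loss between the model output and the preset gesture type described above).

```python
import tensorflow as tf

num_gesture_types = 10                                    # assumed; not fixed by the application
head = tf.keras.layers.Dense(num_gesture_types)           # weights applied to the 128-d embedding
optimizer = tf.keras.optimizers.Adam(1e-4)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def train_step(embeddings, gesture_labels):
    # embeddings: (batch, 128) vectors produced by the two-channel network;
    # gesture_labels: (batch,) preset gesture types of the positive samples.
    with tf.GradientTape() as tape:
        logits = head(embeddings)
        loss = loss_fn(gesture_labels, logits)            # loss between output and preset gesture type
    grads = tape.gradient(loss, head.trainable_variables)
    optimizer.apply_gradients(zip(grads, head.trainable_variables))  # adjust weights to reduce the loss
    return loss
```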
Model training is carried out by the above method, gesture feature information is extracted as in step S220, and gesture recognition is then performed on the extracted feature information. Preferably, 128-dimensional gesture features are extracted for gesture verification, which increases the robustness and accuracy of the gesture recognition algorithm.
Compared with a single-channel neural network model, the two-channel neural network model built on the 2N-dimensional feature vector extracts more feature information; using the extracted feature vectors to compare and recognize feature information helps improve the accuracy of gesture recognition.
The foregoing embodiments describe how the neural network model is built from the obtained 2N-dimensional feature vector; the following embodiments describe how the feature vector contributed by the binarized gesture image is obtained.
Further, the LBP feature vector of the original gesture image can also be obtained as follows: count the distribution of LBP values in each sub-region to obtain the LBP histogram of each sub-region, and connect the histograms of the sub-regions to obtain the LBP texture feature vector of the original gesture image.
The N-dimensional feature vector corresponding to the binarized gesture image can be obtained through the following sub-steps:
A1: Divide the binarized gesture image into N sub-regions.
A2: Obtain the LBP histograms of the N sub-regions and normalize the LBP histograms.
A3: Connect the normalized histograms of the N sub-regions to obtain the N-dimensional feature vector corresponding to the binarized gesture image.
In the embodiments of this application, the LBP histograms of the N sub-regions are obtained as follows: the LBP value of each pixel window in each sub-region is first obtained using the scheme described in S212, and the distribution of LBP values in each sub-region is then counted to obtain the LBP histogram of that sub-region.
The LBP histograms of the N sub-regions are normalized, and the processed histograms of the sub-regions are arranged in sequence according to the spatial order of the sub-regions to form the LBP feature vector; in this way the N-dimensional feature vector corresponding to the binarized gesture image is obtained. (A sketch of this procedure is given below.)
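For illustration only, the following sketch outlines steps A1–A3 (Python with NumPy and scikit-image is assumed; the grid size, number of neighbors, radius, and the "uniform" LBP variant are assumptions, and the dimensionality of the concatenated vector depends on the number of histogram bins per sub-region).

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_feature_vector(binary_img, grid=(8, 8), n_points=8, radius=1):
    # binary_img: 2D uint8 array holding the binarized gesture image.
    lbp = local_binary_pattern(binary_img, n_points, radius, method="uniform")
    n_bins = n_points + 2                       # number of distinct "uniform" LBP codes
    h, w = lbp.shape
    gh, gw = h // grid[0], w // grid[1]
    feats = []
    for i in range(grid[0]):                    # A1: divide the image into N = grid[0]*grid[1] sub-regions
        for j in range(grid[1]):
            cell = lbp[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
            hist, _ = np.histogram(cell, bins=n_bins, range=(0, n_bins))  # A2: per-region LBP histogram
            hist = hist.astype(np.float32)
            hist /= hist.sum() + 1e-7           # A2: normalize the histogram
            feats.append(hist)
    return np.concatenate(feats)                # A3: connect the normalized histograms
```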
In step S230, the Euclidean distance between the gesture feature information and the gesture feature information of each positive sample in the database is calculated, and the gesture type in the original gesture image is determined according to the Euclidean distance. This can be done as shown in the flow diagram of Figure 6, which includes the following steps:
S231: Obtain the feature vector of the original gesture image and the same-dimensional feature vector of each positive-sample gesture image in the database, and calculate the Euclidean distance between the feature vector of the original gesture image and the feature vector of each positive-sample gesture image.
S232: Obtain the confidence between the original gesture image and each positive-sample gesture image according to the Euclidean distance, and output the positive-sample gesture type corresponding to the highest confidence as the gesture type in the original gesture image.
Step S230 determines the gesture type in the original gesture image by calculating the Euclidean distance between the gesture feature information of the original gesture image and the gesture feature information of each positive sample in the database. Gesture feature information can be represented in many forms; the embodiments of this application preferably use a feature vector to represent the feature information of the original gesture image. Following the scheme provided in step S221, feature extraction is performed on the original gesture image and its binarized gesture image to obtain the corresponding N-dimensional feature vectors, which are combined into a 2N-dimensional feature vector; the 2N-dimensional feature vector of each positive-sample gesture image is obtained in the same way. The Euclidean distance between the feature vector of the original gesture image and that of each positive-sample gesture image is calculated, the confidence of each positive-sample gesture image is obtained from the magnitude of the Euclidean distance, and the gesture type in the positive-sample gesture image with the highest confidence is output as the gesture type of the original gesture image. (A sketch of this comparison is given below.)
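The following sketch illustrates steps S231–S232 (NumPy is assumed); the mapping from distance to confidence shown here, a softmax over negative distances, is an illustrative assumption — the application only requires that confidence decrease as the Euclidean distance grows.

```python
import numpy as np

def recognize(query_vec, sample_vecs, sample_types):
    # query_vec: (128,) feature vector of the original gesture image;
    # sample_vecs: (num_samples, 128) feature vectors of the positive samples in the database;
    # sample_types: list of the corresponding known gesture types.
    dists = np.linalg.norm(sample_vecs - query_vec, axis=1)   # S231: Euclidean distances
    conf = np.exp(-dists)
    conf /= conf.sum()                                        # S232: confidence per positive sample
    best = int(np.argmax(conf))
    return sample_types[best], float(conf[best])              # gesture type with the highest confidence
```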
The solution provided by the embodiments of this application uses the Euclidean distance between the feature vector corresponding to the original gesture image and the feature vectors corresponding to the positive-sample gesture images to determine the gesture type in the original gesture image. It can accurately measure the similarity between the gesture in the original gesture image and the gesture types of the positive-sample gesture images, and uses this similarity to determine the gesture type in the original gesture image accurately and in a short time.
Optionally, to improve the accuracy of gesture-type recognition in the original gesture image, the number of positive-sample gesture images in the database can be increased, and the original gesture image can be pre-processed with image enhancement and/or image filtering. In particular, processing the original gesture image with an edge-preserving noise-reduction algorithm helps highlight the hand region in the original gesture image and filter out its noise, which improves gesture recognition on the original gesture image. (A sketch of one such filter is given below.)
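As an illustration of the optional preprocessing, the sketch below applies a bilateral filter, one common edge-preserving noise-reduction algorithm (OpenCV is assumed; the choice of filter and its parameters are assumptions, not prescribed by the application).

```python
import cv2
import numpy as np

# A noisy stand-in for the captured gesture frame; any BGR uint8 image works here.
original = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)

# Bilateral filtering smooths noise while keeping the hand contour sharp.
denoised = cv2.bilateralFilter(original, d=9, sigmaColor=75, sigmaSpace=75)
```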
The foregoing are embodiments of the neural network-based gesture recognition method provided by this application; embodiments of the corresponding neural network-based gesture recognition apparatus are described below.
An embodiment of this application further provides a neural network-based gesture recognition apparatus, whose structure is shown in Figure 7. It includes a binarization processing module 710, a recognition module 720, and a gesture type determining module 730, which are as follows:
The binarization processing module 710 is configured to obtain an original gesture image and binarize the original gesture image to obtain a binarized gesture image.
The recognition module 720 is configured to input the original gesture image and the binarized gesture image into the two channels of the neural network model for recognition, respectively, to obtain the gesture feature information of the original gesture image.
The gesture type determining module 730 is configured to calculate the Euclidean distance between the gesture feature information and the gesture feature information of each positive sample in the database, and to determine the gesture type in the original gesture image according to the Euclidean distance.
Regarding the neural network-based gesture recognition apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.
Further, an embodiment of this application also provides a non-volatile computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of any of the neural network-based gesture recognition methods described above. The storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, the storage medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer), such as a read-only memory, a magnetic disk, or an optical disk.
Furthermore, an embodiment of this application also provides a computer device, which includes:
one or more processors; and
a storage device configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the steps of any of the neural network-based gesture recognition methods described above.
Figure 8 is a structural block diagram of a computer device 800 according to an exemplary embodiment. For example, the computer device 800 may be provided as a server. Referring to Figure 8, the computer device 800 includes a processing component 822, which further includes one or more processors, and memory resources represented by a memory 832 for storing instructions executable by the processing component 822, such as application programs. The application programs stored in the memory 832 may include one or more modules, each corresponding to a set of instructions. The processing component 822 is configured to execute the instructions to perform the steps of the two-channel neural-network-based gesture recognition method described above.
The computer device 800 may also include a power component 826 configured to perform power management of the computer device 800, a wired or wireless network interface 850 configured to connect the computer device 800 to a network, and an input/output (I/O) interface 858. The computer device 800 may operate based on an operating system stored in the memory 832, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like. It should be understood that, although the steps in the flowcharts of the drawings are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
It should be understood that the functional units in the embodiments of this application may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
The above are only some of the embodiments of this application. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of this application, and these improvements and refinements shall also fall within the scope of protection of this application.
Claims (20)
- A neural network-based gesture recognition method, comprising: obtaining an original gesture image, and binarizing the original gesture image to obtain a binarized gesture image; inputting the original gesture image and the binarized gesture image into two channels of a neural network model for recognition, respectively, to obtain gesture feature information of the original gesture image; and calculating the Euclidean distance between the gesture feature information and the gesture feature information of each positive sample in a database, and determining the gesture type in the original gesture image according to the Euclidean distance.
- The neural network-based gesture recognition method according to claim 1, wherein the step of binarizing the original gesture image to obtain a binarized gesture image comprises: dividing the original gesture image into several sub-regions; performing the following operation on the pixel window in each sub-region: taking the gray value of the center pixel of the window as a threshold and comparing the gray values of the neighboring pixels with it to obtain the LBP value of the pixel window; and replacing the original gray value of the pixel window with its LBP value to obtain the binarized gesture image corresponding to the original gesture image.
- The neural network-based gesture recognition method according to claim 1, wherein before the step of inputting the original gesture image and the binarized gesture image into the two channels of the neural network model for recognition, the method further comprises: building a neural network model from training gesture images, wherein the step of building the neural network model comprises: obtaining training gesture images from a preset training image set, performing feature extraction on each training gesture image and its binarized gesture image to obtain an N-dimensional feature vector for each of the two images, and combining the N-dimensional feature vectors into a 2N-dimensional feature vector; and comparing feature vectors on the basis of the 2N-dimensional feature vector, and adjusting the weights of the feature vector using the comparison results on positive-sample gesture images to obtain the neural network model.
- The neural network-based gesture recognition method according to claim 3, wherein the step of adjusting the weights of the feature vector using the comparison results on positive-sample gesture images to obtain the neural network model comprises: constructing an initial two-channel neural network model using the basic network structure of Inception-Resnet-V2; extracting the 2N-dimensional feature vector of a positive-sample gesture image, and inputting the 2N-dimensional feature vector and the gesture type corresponding to the positive-sample gesture image into the initial two-channel neural network model to obtain initial weight values of the 2N-dimensional feature vector; and using all positive-sample gesture images in the positive-sample gesture image set and their corresponding gesture types to continuously adjust the initial weight values of the feature vectors in the initial two-channel neural network model, the two-channel neural network model being obtained once the weight values of the feature vectors are determined.
- The neural network-based gesture recognition method according to claim 3, wherein the step of obtaining the N-dimensional feature vector corresponding to the binarized gesture image comprises: dividing the binarized gesture image into N sub-regions; obtaining the LBP histograms of the N sub-regions and normalizing the LBP histograms; and connecting the normalized histograms of the N sub-regions to obtain the N-dimensional feature vector corresponding to the binarized gesture image.
- The neural network-based gesture recognition method according to claim 3, wherein the step of calculating the Euclidean distance between the gesture feature information and the gesture feature information of each positive sample in the database, and determining the gesture type in the original gesture image according to the Euclidean distance, comprises: obtaining the feature vector of the original gesture image and the same-dimensional feature vector of each positive-sample gesture image in the database, and calculating the Euclidean distance between the feature vector of the original gesture image and the feature vector of each positive-sample gesture image; and obtaining the confidence between the original gesture image and each positive-sample gesture image according to the Euclidean distance, and outputting the positive-sample gesture type corresponding to the highest confidence as the gesture type in the original gesture image.
- The neural network-based gesture recognition method according to claim 1, wherein before the step of binarizing the original gesture image to obtain a binarized gesture image, the method further comprises: performing noise reduction on the original gesture image using an edge-preserving noise-reduction algorithm.
- A neural network-based gesture recognition apparatus, comprising: a binarization processing module configured to obtain an original gesture image and binarize the original gesture image to obtain a binarized gesture image; a recognition module configured to input the original gesture image and the binarized gesture image into two channels of a neural network model for recognition, respectively, to obtain gesture feature information of the original gesture image; and a gesture type determining module configured to calculate the Euclidean distance between the gesture feature information and the gesture feature information of each positive sample in a database, and determine the gesture type in the original gesture image according to the Euclidean distance.
- A non-volatile computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to execute a neural network-based gesture recognition method, wherein the steps of the neural network-based gesture recognition method comprise: obtaining an original gesture image, and binarizing the original gesture image to obtain a binarized gesture image; inputting the original gesture image and the binarized gesture image into two channels of a neural network model for recognition, respectively, to obtain gesture feature information of the original gesture image; and calculating the Euclidean distance between the gesture feature information and the gesture feature information of each positive sample in a database, and determining the gesture type in the original gesture image according to the Euclidean distance.
- The non-volatile computer-readable storage medium according to claim 9, wherein the step of binarizing the original gesture image to obtain a binarized gesture image comprises: dividing the original gesture image into several sub-regions; performing the following operation on the pixel window in each sub-region: taking the gray value of the center pixel of the window as a threshold and comparing the gray values of the neighboring pixels with it to obtain the LBP value of the pixel window; and replacing the original gray value of the pixel window with its LBP value to obtain the binarized gesture image corresponding to the original gesture image.
- The non-volatile computer-readable storage medium according to claim 10, wherein before the step of inputting the original gesture image and the binarized gesture image into the two channels of the neural network model for recognition, the method further comprises: building a neural network model from training gesture images, wherein the step of building the neural network model comprises: obtaining training gesture images from a preset training image set, performing feature extraction on each training gesture image and its binarized gesture image to obtain an N-dimensional feature vector for each of the two images, and combining the N-dimensional feature vectors into a 2N-dimensional feature vector; and comparing feature vectors on the basis of the 2N-dimensional feature vector, and adjusting the weights of the feature vector using the comparison results on positive-sample gesture images to obtain the neural network model.
- The non-volatile computer-readable storage medium according to claim 11, wherein the step of adjusting the weights of the feature vector using the comparison results on positive-sample gesture images to obtain the neural network model comprises: constructing an initial two-channel neural network model using the basic network structure of Inception-Resnet-V2; extracting the 2N-dimensional feature vector of a positive-sample gesture image, and inputting the 2N-dimensional feature vector and the gesture type corresponding to the positive-sample gesture image into the initial two-channel neural network model to obtain initial weight values of the 2N-dimensional feature vector; and using all positive-sample gesture images in the positive-sample gesture image set and their corresponding gesture types to continuously adjust the initial weight values of the feature vectors in the initial two-channel neural network model, the two-channel neural network model being obtained once the weight values of the feature vectors are determined.
- The non-volatile computer-readable storage medium according to claim 12, wherein the step of obtaining the N-dimensional feature vector corresponding to the binarized gesture image comprises: dividing the binarized gesture image into N sub-regions; obtaining the LBP histograms of the N sub-regions and normalizing the LBP histograms; and connecting the normalized histograms of the N sub-regions to obtain the N-dimensional feature vector corresponding to the binarized gesture image.
- The non-volatile computer-readable storage medium according to claim 11, wherein the step of calculating the Euclidean distance between the gesture feature information and the gesture feature information of each positive sample in the database, and determining the gesture type in the original gesture image according to the Euclidean distance, comprises: obtaining the feature vector of the original gesture image and the same-dimensional feature vector of each positive-sample gesture image in the database, and calculating the Euclidean distance between the feature vector of the original gesture image and the feature vector of each positive-sample gesture image; and obtaining the confidence between the original gesture image and each positive-sample gesture image according to the Euclidean distance, and outputting the positive-sample gesture type corresponding to the highest confidence as the gesture type in the original gesture image.
- A computer device, comprising: one or more processors; and a storage device configured to store one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the steps of the neural network-based gesture recognition method described above, the steps of the neural network-based gesture recognition method comprising: obtaining an original gesture image, and binarizing the original gesture image to obtain a binarized gesture image; inputting the original gesture image and the binarized gesture image into two channels of a neural network model for recognition, respectively, to obtain gesture feature information of the original gesture image; and calculating the Euclidean distance between the gesture feature information and the gesture feature information of each positive sample in a database, and determining the gesture type in the original gesture image according to the Euclidean distance.
- The computer device according to claim 15, wherein the step of binarizing the original gesture image to obtain a binarized gesture image comprises: dividing the original gesture image into several sub-regions; performing the following operation on the pixel window in each sub-region: taking the gray value of the center pixel of the window as a threshold and comparing the gray values of the neighboring pixels with it to obtain the LBP value of the pixel window; and replacing the original gray value of the pixel window with its LBP value to obtain the binarized gesture image corresponding to the original gesture image.
- The computer device according to claim 15, wherein before the step of inputting the original gesture image and the binarized gesture image into the two channels of the neural network model for recognition, the method further comprises: building a neural network model from training gesture images, wherein the step of building the neural network model comprises: obtaining training gesture images from a preset training image set, performing feature extraction on each training gesture image and its binarized gesture image to obtain an N-dimensional feature vector for each of the two images, and combining the N-dimensional feature vectors into a 2N-dimensional feature vector; and comparing feature vectors on the basis of the 2N-dimensional feature vector, and adjusting the weights of the feature vector using the comparison results on positive-sample gesture images to obtain the neural network model.
- The computer device according to claim 17, wherein the step of adjusting the weights of the feature vector using the comparison results on positive-sample gesture images to obtain the neural network model comprises: constructing an initial two-channel neural network model using the basic network structure of Inception-Resnet-V2; extracting the 2N-dimensional feature vector of a positive-sample gesture image, and inputting the 2N-dimensional feature vector and the gesture type corresponding to the positive-sample gesture image into the initial two-channel neural network model to obtain initial weight values of the 2N-dimensional feature vector; and using all positive-sample gesture images in the positive-sample gesture image set and their corresponding gesture types to continuously adjust the initial weight values of the feature vectors in the initial two-channel neural network model, the two-channel neural network model being obtained once the weight values of the feature vectors are determined.
- The computer device according to claim 17, wherein the step of obtaining the N-dimensional feature vector corresponding to the binarized gesture image comprises: dividing the binarized gesture image into N sub-regions; obtaining the LBP histograms of the N sub-regions and normalizing the LBP histograms; and connecting the normalized histograms of the N sub-regions to obtain the N-dimensional feature vector corresponding to the binarized gesture image.
- The computer device according to claim 17, wherein the step of calculating the Euclidean distance between the gesture feature information and the gesture feature information of each positive sample in the database, and determining the gesture type in the original gesture image according to the Euclidean distance, comprises: obtaining the feature vector of the original gesture image and the same-dimensional feature vector of each positive-sample gesture image in the database, and calculating the Euclidean distance between the feature vector of the original gesture image and the feature vector of each positive-sample gesture image; and obtaining the confidence between the original gesture image and each positive-sample gesture image according to the Euclidean distance, and outputting the positive-sample gesture type corresponding to the highest confidence as the gesture type in the original gesture image.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910493340.5 | 2019-06-06 | ||
CN201910493340.5A CN110334605A (en) | 2019-06-06 | 2019-06-06 | Gesture identification method, device, storage medium and equipment neural network based |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020244071A1 (en) | 2020-12-10 |
Family
ID=68140809
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/103056 WO2020244071A1 (en) | 2019-06-06 | 2019-08-28 | Neural network-based gesture recognition method and apparatus, storage medium, and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110334605A (en) |
WO (1) | WO2020244071A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112926423A (en) * | 2021-02-07 | 2021-06-08 | 青岛小鸟看看科技有限公司 | Kneading gesture detection and recognition method, device and system |
CN113435340A (en) * | 2021-06-29 | 2021-09-24 | 福州大学 | Real-time gesture recognition method based on improved Resnet |
CN113837025A (en) * | 2021-09-03 | 2021-12-24 | 深圳创维-Rgb电子有限公司 | Gesture recognition method, system, terminal and storage medium |
CN114926455A (en) * | 2022-06-13 | 2022-08-19 | 凌云光技术股份有限公司 | Target center position detection method and device, computer equipment and storage medium |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110334605A (en) * | 2019-06-06 | 2019-10-15 | 平安科技(深圳)有限公司 | Gesture identification method, device, storage medium and equipment neural network based |
CN112598728B (en) * | 2020-12-23 | 2024-02-13 | 极米科技股份有限公司 | Projector attitude estimation, trapezoidal correction method and device, projector and medium |
CN113033290A (en) * | 2021-02-01 | 2021-06-25 | 广州朗国电子科技有限公司 | Image subregion identification method, device and storage medium |
CN113420609A (en) * | 2021-05-31 | 2021-09-21 | 湖南森鹰智造科技有限公司 | Laser radar human body gesture recognition method, electronic device and storage medium |
CN113570948A (en) * | 2021-08-06 | 2021-10-29 | 郑州捷安高科股份有限公司 | First-aid teaching method, first-aid teaching device, electronic equipment and storage medium |
CN117058755A (en) * | 2023-08-09 | 2023-11-14 | 重庆市永川职业教育中心 | Thermal imaging gesture recognition method based on binary neural network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106127170A (en) * | 2016-07-01 | 2016-11-16 | 重庆中科云丛科技有限公司 | A kind of merge the training method of key feature points, recognition methods and system |
CN108960412A (en) * | 2018-06-29 | 2018-12-07 | 北京京东尚科信息技术有限公司 | Image-recognizing method, device and computer readable storage medium |
CN109492624A (en) * | 2018-12-29 | 2019-03-19 | 北京灵汐科技有限公司 | The training method and its device of a kind of face identification method, Feature Selection Model |
CN109657533A (en) * | 2018-10-27 | 2019-04-19 | 深圳市华尊科技股份有限公司 | Pedestrian recognition methods and Related product again |
CN110334605A (en) * | 2019-06-06 | 2019-10-15 | 平安科技(深圳)有限公司 | Gesture identification method, device, storage medium and equipment neural network based |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104020848A (en) * | 2014-05-15 | 2014-09-03 | 中航华东光电(上海)有限公司 | Static gesture recognizing method |
CN108334814B (en) * | 2018-01-11 | 2020-10-30 | 浙江工业大学 | Gesture recognition method of AR system |
CN109190496A (en) * | 2018-08-09 | 2019-01-11 | 华南理工大学 | A kind of monocular static gesture identification method based on multi-feature fusion |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112926423A (en) * | 2021-02-07 | 2021-06-08 | 青岛小鸟看看科技有限公司 | Kneading gesture detection and recognition method, device and system |
CN112926423B (en) * | 2021-02-07 | 2023-08-25 | 青岛小鸟看看科技有限公司 | Pinch gesture detection and recognition method, device and system |
US11776322B2 (en) | 2021-02-07 | 2023-10-03 | Qingdao Pico Technology Co., Ltd. | Pinch gesture detection and recognition method, device and system |
CN113435340A (en) * | 2021-06-29 | 2021-09-24 | 福州大学 | Real-time gesture recognition method based on improved Resnet |
CN113435340B (en) * | 2021-06-29 | 2022-06-10 | 福州大学 | Real-time gesture recognition method based on improved Resnet |
CN113837025A (en) * | 2021-09-03 | 2021-12-24 | 深圳创维-Rgb电子有限公司 | Gesture recognition method, system, terminal and storage medium |
CN114926455A (en) * | 2022-06-13 | 2022-08-19 | 凌云光技术股份有限公司 | Target center position detection method and device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110334605A (en) | 2019-10-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020244071A1 (en) | Neural network-based gesture recognition method and apparatus, storage medium, and device | |
KR102641115B1 (en) | A method and apparatus of image processing for object detection | |
US10635946B2 (en) | Eyeglass positioning method, apparatus and storage medium | |
WO2019192121A1 (en) | Dual-channel neural network model training and human face comparison method, and terminal and medium | |
US10534957B2 (en) | Eyeball movement analysis method and device, and storage medium | |
EP3502968A2 (en) | Liveness test method and apparatus | |
US10558841B2 (en) | Method and apparatus for recognizing fingerprint ridge point | |
WO2019071664A1 (en) | Human face recognition method and apparatus combined with depth information, and storage medium | |
CN111626371B (en) | Image classification method, device, equipment and readable storage medium | |
WO2019033572A1 (en) | Method for detecting whether face is blocked, device and storage medium | |
CN111461165A (en) | Image recognition method, recognition model training method, related device and equipment | |
CN109376604B (en) | Age identification method and device based on human body posture | |
CN106570489A (en) | Living body determination method and apparatus, and identity authentication method and device | |
CN106056083B (en) | A kind of information processing method and terminal | |
CN111144366A (en) | Strange face clustering method based on joint face quality assessment | |
US10853631B2 (en) | Face verification method and apparatus, server and readable storage medium | |
US11380010B2 (en) | Image processing device, image processing method, and image processing program | |
CN110866466A (en) | Face recognition method, face recognition device, storage medium and server | |
WO2021012647A1 (en) | Face verification method and apparatus, server and readable storage medium | |
WO2019033570A1 (en) | Lip movement analysis method, apparatus and storage medium | |
CN107944395B (en) | Method and system for verifying and authenticating integration based on neural network | |
Elrefaei et al. | Developing iris recognition system for smartphone security | |
CN113221842A (en) | Model training method, image recognition method, device, equipment and medium | |
CN111160353A (en) | License plate recognition method, device and equipment | |
Amjed et al. | Noncircular iris segmentation based on weighted adaptive hough transform using smartphone database |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19931958; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19931958; Country of ref document: EP; Kind code of ref document: A1 |