CN115205974A - Gesture recognition method and related equipment - Google Patents

Gesture recognition method and related equipment Download PDF

Info

Publication number
CN115205974A
CN115205974A
Authority
CN
China
Prior art keywords
projection
data
gesture
training
gesture recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210832652.6A
Other languages
Chinese (zh)
Inventor
康来
张亚坤
刘星宇
张辉泽
魏迎梅
蒋杰
谢毓湘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202210832652.6A priority Critical patent/CN115205974A/en
Publication of CN115205974A publication Critical patent/CN115205974A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a gesture recognition method and related equipment. The method comprises the following steps: acquiring gesture data; decoding the gesture data to obtain decoded data; performing spatial position index projection on the decoded data to obtain accumulated information; performing multi-dimensional projection on the accumulated information to obtain a multi-dimensional projection result; and inputting the multi-dimensional projection result into a trained gesture recognition model, outputting a recognition label corresponding to the gesture data, and obtaining a gesture recognition result from that label. The method reduces computational redundancy in the projection process, improves data-processing efficiency, and increases the utilization of time information so that the available information is fully used, which can effectively improve gesture recognition accuracy.

Description

Gesture recognition method and related equipment
Technical Field
The present application relates to the field of gesture recognition technologies, and in particular, to a gesture recognition method and a related device.
Background
With the development of virtual reality technology, dynamic gesture recognition is regarded as a natural and efficient mode of human-machine interaction and can be widely applied in fields such as sign language recognition, night driving, and game development. The sensors used to collect dynamic visual data, namely event cameras, can monitor brightness changes asynchronously.
In the projection methods adopted for dynamic gesture recognition in the prior art, however, the projection matrix is flattened to one dimension for projection, and every accumulation operates on the entire one-dimensional matrix, so computation is also performed at positions where the brightness has not changed, introducing a large amount of computational redundancy.
Disclosure of Invention
In view of the foregoing, an object of the present invention is to provide a gesture recognition method and related apparatus, so as to solve or partially solve the above technical problems.
In view of the above, a first aspect of the present application provides a gesture recognition method, including:
acquiring gesture data;
performing data decoding on the gesture data to obtain decoded data;
performing spatial position index projection on the decoded data to obtain accumulated information;
carrying out multidimensional projection on the accumulated information to obtain a multidimensional projection result;
and inputting the multi-dimensional projection result into a trained gesture recognition model, outputting a recognition label corresponding to the gesture data, and obtaining a gesture recognition result according to the recognition label corresponding to the gesture data.
Optionally, the decoded data includes a plurality of events, the events including abscissa, ordinate and time;
the performing spatial position index projection on the decoded data to obtain accumulated information includes:
performing spatial position index projection on the decoded data to locate a target spatial position with changed brightness, and acquiring an event of the target spatial position from all events in the decoded data;
and integrating the abscissa, the ordinate and the time corresponding to the event of the target space position to obtain accumulated information.
Optionally, the performing multidimensional projection on the accumulated information to obtain a multidimensional projection result includes:
carrying out multidimensional projection on the abscissa and the ordinate corresponding to the event of the target space position to obtain a first projection;
carrying out multidimensional projection on the abscissa and the time corresponding to the event of the target space position to obtain a second projection, and carrying out equal-interval sampling on the second projection to obtain a first equal-interval sampling projection;
carrying out multidimensional projection on the ordinate and the time corresponding to the event of the target space position to obtain a third projection, and carrying out equal-interval sampling on the third projection to obtain a second equal-interval sampling projection;
transposing the first equal-interval sampling projection and the second equal-interval sampling projection respectively to obtain a first transposed projection and a second transposed projection;
and superposing and fusing the first projection, the first transposed projection, and the second transposed projection to obtain a multi-dimensional projection result.
Optionally, the gesture recognition model is obtained by training through the following processes:
acquiring a training data set and a real label which are preprocessed, and dividing the training data set into a training set and a test set;
inputting the training data in the training set into a pre-training model which is constructed in advance for training to obtain a trained pre-training model;
inputting the test data in the test set into the trained pre-training model to test the training of the training data about the pre-training model to obtain a test identification label corresponding to the test gesture data in the test set, and performing difference comparison on the test identification label corresponding to the test gesture data and the real gesture identification label to obtain a comparison result;
and in response to the comparison result being greater than or equal to a preset threshold, repeatedly executing the process of inputting the training data in the training set into a pre-training model for training so as to adjust the parameters of the trained pre-training model until the comparison result is smaller than the preset threshold, obtaining the trained pre-training model, and taking the trained pre-training model as the gesture recognition model.
Optionally, the gesture recognition model is a spiking neural network model.
Optionally, the obtaining a gesture recognition result according to the recognition tag corresponding to the gesture data includes:
acquiring action information corresponding to the identification tag of the historical gesture data;
and inquiring the identification tag of the historical gesture data according to the identification tag corresponding to the gesture data to obtain the identification tag of the historical gesture data which is the same as the identification tag corresponding to the gesture data, acquiring the action information corresponding to the identification tag of the historical gesture data which is the same as the identification tag corresponding to the gesture data, and taking the action information as a gesture identification result.
Optionally, the data decoding the gesture data to obtain decoded data includes:
and dividing the gesture data according to a preset timestamp to obtain the decoding data.
Based on the same inventive concept, a second aspect of the present application provides a gesture recognition apparatus, including:
a data acquisition module configured to acquire gesture data;
the data decoding module is configured to perform data decoding on the gesture data to obtain decoded data;
the index projection module is configured to perform spatial position index projection on the decoded data to obtain accumulated information;
the multidimensional projection module is configured to perform multidimensional projection on the accumulated information to obtain a multidimensional projection result;
and the gesture recognition module is configured to input the multi-dimensional projection result into a trained gesture recognition model, output a recognition label corresponding to the gesture data, and obtain a gesture recognition result according to the recognition label corresponding to the gesture data.
Based on the same inventive concept, a third aspect of the present application provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the method of the first aspect.
Based on the same inventive concept, a fourth aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect.
From the above, in the gesture recognition method and related equipment provided by the application, accumulated information is obtained by performing spatial position index projection, which accumulates only at precisely located positions of brightness change; this removes computational redundancy from the projection process and improves data-processing efficiency. Multi-dimensional projection then embeds time information into the data projection plane and fuses it with the original single-channel projection, increasing the utilization of time information and making full use of the available information to obtain the multi-dimensional projection result. Finally, a gesture recognition result is obtained by means of a gesture recognition model, which can effectively improve gesture recognition accuracy and reduce time overhead.
Drawings
In order to more clearly illustrate the technical solutions in the present application or related technologies, the drawings required for the embodiments or related technologies in the following description are briefly introduced, and it is obvious that the drawings in the following description are only the embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a gesture recognition method according to an embodiment of the present disclosure;
FIG. 2-a is an input image of conventional gesture recognition according to an embodiment of the present application;
FIG. 2-b is an input image of the gesture recognition model after multi-dimensional projection according to an embodiment of the present application;
FIG. 3-a is an image of a conventional gesture recognition projection according to an embodiment of the present application;
FIG. 3-b is a multi-dimensionally projected image according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a gesture recognition apparatus according to an embodiment of the present application;
fig. 5 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below with reference to the accompanying drawings in combination with specific embodiments.
It should be noted that technical terms or scientific terms used in the embodiments of the present application should have a general meaning as understood by those having ordinary skill in the art to which the present application belongs, unless otherwise defined. The use of "first," "second," and similar terms in the embodiments of the present application do not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used only to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
In the related art, a traditional projection method stretches the projection matrix to one dimension for projection, and every accumulation computes over the entire one-dimensional matrix, so computation is also carried out at positions where the brightness has not changed, leaving a large amount of computational redundancy; in addition, traditional gesture recognition representations suffer from insufficient information utilization.
The embodiments of the application provide a gesture recognition method and related equipment. By performing spatial position index projection and multi-dimensional projection, computational redundancy can be reduced to a great extent and the problem of insufficient information utilization is effectively resolved, improving gesture recognition accuracy; a gesture recognition model is then used, greatly reducing the time overhead of gesture recognition.
As shown in fig. 1, the method includes:
step 101, gesture data is obtained.
In specific implementation, the gesture data may be image data, data acquired by a dynamic vision sensor, infrared data, acoustic-wave data, and the like; data acquired by a dynamic vision sensor is preferred here.
A dynamic vision sensor can still observe events at night, enabling nighttime reconnaissance so that the movements of the party under observation can be grasped in a timely manner.
And 102, performing data decoding on the gesture data to obtain decoded data.
In specific implementation, the gesture data is decoded to obtain decoded data: the gesture data is reshaped into a predetermined form, such as a single pulse sequence or an integrated frame, so as to meet the form expected by subsequent data processing. At the same time, the utilization form of the original gesture data changes; for example, after decoding, the decoded data comprises coordinate information, time information, polarity information, and spatio-temporal sparsity information, and subsequent data processing can use these to achieve different recognition effects.
In some embodiments, step 102 comprises:
and dividing the gesture data according to a preset timestamp to obtain the decoding data.
In specific implementation, the decoding relies on the SpikingJelly deep learning framework: the gesture data is divided using the timestamps given in the dataset's predetermined file format, and the original gesture data is decoded in event form without any deletion to obtain the decoded data.
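As an illustration only, the following sketch shows how such event-form decoding can be performed with SpikingJelly's DVS128 Gesture loader; the dataset, root path, and parameter values are assumptions for the example, not part of the disclosure.

```python
# A minimal sketch, assuming the DVS128 Gesture dataset; the root path
# and parameter values are illustrative (the dataset must be obtained first).
from spikingjelly.datasets.dvs128_gesture import DVS128Gesture

# data_type='event' keeps every (x, y, t, p) event without deletion;
# data_type='frame' with split_by='time' divides the stream using the
# timestamps given in the dataset's predetermined file format.
train_events = DVS128Gesture(root='./DVS128Gesture', train=True,
                             data_type='event')
train_frames = DVS128Gesture(root='./DVS128Gesture', train=True,
                             data_type='frame', frames_number=16,
                             split_by='time')

events, label = train_events[0]
print(events['x'].shape, events['y'].shape, events['t'].shape, label)
```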
And 103, performing spatial position index projection on the decoded data to obtain accumulated information.
In specific implementation, the positions where the brightness changes are located directly through spatial position index projection and accumulated to obtain the accumulated information; this avoids operating at positions where the brightness does not change, reducing computational redundancy and greatly lowering the time cost.
In some embodiments, the decoded data comprises a plurality of events, the events comprising an abscissa, an ordinate, and a time;
step 103, comprising:
and step 1031, performing spatial position index projection on the decoded data to locate a target spatial position of brightness change, and acquiring events of the target spatial position from all events in the decoded data.
And 1032, integrating the abscissa, the ordinate and the time corresponding to the event of the target space position to obtain accumulated information.
In a specific implementation, the decoded data includes a plurality of events, each containing an abscissa, an ordinate, a time, and so on. Spatial position index projection is performed on the decoded data to locate the spatial positions where the brightness changed (i.e., the target spatial positions), and the events at those positions are obtained from the decoded data; each such event comprises an abscissa, an ordinate, a time, and so on. Taking the abscissa, ordinate, and time of each changed position as indexes, all positions of brightness change are integrated to obtain the accumulated information. Using spatial position index projection avoids operating at positions where the brightness does not change, which reduces computational redundancy and greatly lowers the time cost.
Here, an event means that an event-based camera tracks changes in the logarithmic intensity of the image and returns an event whenever the change exceeds a set threshold. Events accumulated over a fixed time interval form an event stream, which can be represented as a 3D point cloud in space and time. Each event is a point in the three-dimensional continuum (x, y, t) (i.e., abscissa, ordinate, time), and each gesture generates a unique cloud of events in the (x, y, t) coordinate system, referred to as a spatio-temporal event cloud. By interpreting the event stream as a spatio-temporal event cloud, spatial and temporal features are fused in a 3D spatio-temporal continuum. Recognizing a gesture thus becomes recognizing the geometric distribution of the event cloud it generates, making full use of spatio-temporal sparsity.
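A minimal NumPy sketch of the spatial position index projection described above, assuming events arrive as coordinate arrays (the function and variable names are illustrative, not from the patent):

```python
import numpy as np

def spatial_index_projection(x, y, height=128, width=128):
    """Accumulate only at positions where the brightness actually changed.

    x and y are 1-D arrays of event coordinates; because the events
    themselves index the changed pixels, nothing is computed for the
    unchanged remainder of the height x width matrix.
    """
    acc = np.zeros((height, width), dtype=np.int64)
    np.add.at(acc, (y, x), 1)   # scatter-add at event positions only
    return acc
```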
And 104, carrying out multi-dimensional projection on the accumulated information to obtain a multi-dimensional projection result.
In specific implementation, a multi-dimensional projection result is obtained by performing multi-dimensional projection: during this process, time information is embedded into the data projection plane and fused and superimposed with the original single-channel projection, increasing the utilization of time information and making full use of the available information.
For example, as shown in FIGS. 2-a and 2-b, FIG. 2-a is a conventional RGB image acquired with an RGB camera or a depth camera, and FIG. 2-b is the multi-dimensional projection result obtained by multi-dimensionally projecting data acquired by a dynamic vision sensor; the latter increases the utilization of time information and satisfies the data input format for subsequent processing. The high dynamic range, lower latency, and lower throughput of event data make it better suited than conventional images for tracking hands, which typically move quickly; compared with conventional gesture recognition using an RGB camera or a depth camera, a dynamic vision sensor can easily capture motion that would otherwise require a camera running at more than 1000 frames per second. Moreover, only local pixel-level changes are transmitted, as they occur, so the dynamic vision sensor, by virtue of its on-demand output characteristic, readily overcomes the limitations of conventional cameras.
In some embodiments, step 104 comprises:
step 1041, performing multidimensional projection on the abscissa and the ordinate corresponding to the event of the target spatial position to obtain a first projection.
And 1042, performing multidimensional projection on the abscissa and the time corresponding to the event of the target space position to obtain a second projection, and performing equal-interval sampling on the second projection to obtain a first equal-interval sampling projection.
Step 1043, performing multidimensional projection on the ordinate and the time corresponding to the event of the target spatial position to obtain a third projection, and performing equal-interval sampling on the third projection to obtain a second equal-interval sampling projection.
Step 1044, transposing the first equal-interval sampling projection and the second equal-interval sampling projection respectively to obtain a first transposed projection and a second transposed projection.
And 1045, superimposing and fusing the first projection, the first transposed projection, and the second transposed projection to obtain a multi-dimensional projection result.
In specific implementation, multi-dimensional projection of the abscissa and ordinate yields an x-y projection (i.e., the first projection). To encode the spatio-temporal information and add coding for the time information, multi-dimensional projection of the abscissa and time yields an x-t projection (i.e., the second projection), and multi-dimensional projection of the ordinate and time yields a y-t projection (i.e., the third projection). The x-t and y-t projections are each sampled at equal intervals to obtain two single-channel grayscale maps (i.e., the first and second equal-interval sampling projections), which are then each transposed to obtain the first and second transposed projections, so that their dimensions stay consistent with the x-y projection. Time information is thereby retained to a certain extent and can be superimposed with the x-y projection. The x-t, y-t, and x-y projections are then superimposed and fused into a new integrated frame (i.e., the multi-dimensional projection result). Compared with the single-channel grayscale map of the original x-y projection alone, the image carries clearly new information after the two projection channels are added, namely a blue channel and a green channel; the principal axes of the event points in these two channels are consistent with the original x-y grayscale map, which indirectly verifies the correctness of the new projection procedure (the new projection adds processing such as equal-interval sampling and transposition, and only with these operations can the two newly added channels, the x-t and y-t projections, be matched with the x-y projection of the original channel).
In which, as shown in fig. 3-a and 3-b, the projection information of two channels is added to the image of fig. 3-b, compared with that of fig. 3-a, and the image has obvious new information.
For example, consider applying the projection processing to a three-dimensional data cuboid (128 × 128 × 1000): the x-y projection yields a (128 × 128) grayscale map, while the x-t and y-t projections both yield (128 × 1000) grayscale maps. After one pass of equal-interval sampling, the latter two also become (128 × 128) and satisfy the condition for superposition with the x-y grayscale map. After equal-interval sampling, the x-t and y-t grayscale maps must additionally be transposed in one step so that they match and add to the x-y projection correctly, which yields higher accuracy; without the transposition, accuracy drops substantially.
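The following sketch, under the same assumptions as the 128 × 128 × 1000 example above, illustrates the time binning, transposition, and channel fusion (the helper name and binning rule are illustrative):

```python
import numpy as np

def multi_dimensional_projection(x, y, t, height=128, width=128, t_bins=128):
    # assumes t_bins == height == width, as in the 128-cube example above

    # x-y projection: the original single-channel grayscale map
    xy = np.zeros((height, width)); np.add.at(xy, (y, x), 1)

    # equal-interval sampling: quantize timestamps into t_bins intervals
    tb = ((t - t.min()) * (t_bins - 1) // max(int(t.max() - t.min()), 1)).astype(np.int64)

    # x-t and y-t projections embed the time information into the plane
    xt = np.zeros((width, t_bins)); np.add.at(xt, (x, tb), 1)
    yt = np.zeros((height, t_bins)); np.add.at(yt, (y, tb), 1)

    # transpose both time-axis maps so they align with the x-y projection,
    # then superimpose the three maps as one 3-channel integrated frame
    return np.stack([xy, xt.T, yt.T], axis=0)   # shape (3, height, width)
```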
And 105, inputting the multi-dimensional projection result into the trained gesture recognition model, outputting a recognition label corresponding to the gesture data, and obtaining a gesture recognition result according to the recognition label corresponding to the gesture data.
In a specific implementation, the gesture recognition model is, for example, a spiking neural network model; the spiking neural network is a general-purpose network model whose hierarchical architecture can be varied, so the gesture recognition model can be used for various kinds of data.
The multi-dimensional projection result is input into the trained gesture recognition model, and the recognition label corresponding to the gesture data is output; the gesture recognition result is then obtained from that label. The spiking neural network can efficiently process dynamic vision sensing data, greatly reducing the time overhead of dynamic gesture recognition.
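As a sketch only, assuming a recent SpikingJelly release with the activation_based module (the layer sizes, neuron parameters, and class count are illustrative; the disclosure does not fix an architecture), a spiking classifier over the 3-channel integrated frames might look like:

```python
import torch
import torch.nn as nn
from spikingjelly.activation_based import neuron, functional

class GestureSNN(nn.Module):
    """Illustrative spiking classifier over 3-channel integrated frames."""
    def __init__(self, num_classes=11):          # 11 gesture classes assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32),
            neuron.LIFNode(), nn.MaxPool2d(2),    # 128 -> 64
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64),
            neuron.LIFNode(), nn.MaxPool2d(2),    # 64 -> 32
            nn.Flatten(),
            nn.Dropout(0.5),                      # randomly drop neurons (see training notes below)
            nn.Linear(64 * 32 * 32, num_classes),
            neuron.LIFNode(),
        )

    def forward(self, frames):                    # frames: (T, B, 3, 128, 128)
        out = 0.0
        for x in frames:                          # step the network over T time steps
            out = out + self.net(x)               # accumulate output spikes
        functional.reset_net(self)                # clear membrane state between samples
        return out / frames.shape[0]              # spike-rate logits
```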
In some embodiments, in step 105, obtaining a gesture recognition result according to the identification tag corresponding to the gesture data includes:
step 1051, obtaining action information corresponding to the identification tag of the historical gesture data.
Step 1052, querying the identification tag of the historical gesture data according to the identification tag corresponding to the gesture data to obtain an identification tag of historical gesture data that is the same as the identification tag corresponding to the gesture data, and obtaining action information corresponding to the identification tag of the historical gesture data that is the same as the identification tag corresponding to the gesture data, and taking the action information as a gesture identification result.
In specific implementation, for example, the action information corresponding to the identification tag of the historical gesture data is shown in table 1 below:
TABLE 1
(Table 1, mapping identification labels to action information, appears only as images in the original publication.)
The identification labels of the historical data are queried with the identification label corresponding to the gesture data, the action information corresponding to the matching label is acquired, and that action information is taken as the gesture recognition result.
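Since Table 1 survives only as images, the mapping below is purely hypothetical; it merely illustrates the label-to-action lookup described above:

```python
# Hypothetical label table; the real Table 1 exists only as images in
# the original publication and is not reproduced here.
LABEL_ACTIONS = {0: "hand clap", 1: "right hand wave", 2: "left hand wave"}

def gesture_result(label: int) -> str:
    # query the historical identification labels for one equal to the
    # recognized label and return its action information as the result
    return LABEL_ACTIONS.get(label, "unknown gesture")
```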
In some embodiments, the gesture recognition model is obtained by training through the following processes:
acquiring a training data set and a real label which are preprocessed, and dividing the training data set into a training set and a test set;
inputting the training data in the training set into a pre-training model which is constructed in advance for training to obtain a trained pre-training model;
inputting the test data in the test set into the trained pre-training model to test the training of the training data about the pre-training model to obtain a test identification label corresponding to the test gesture data in the test set, and performing difference comparison on the test identification label corresponding to the test gesture data and the real gesture identification label to obtain a comparison result;
and in response to the comparison result being greater than or equal to a preset threshold, repeatedly executing the process of inputting the training data in the training set into a pre-training model for training so as to adjust the parameters of the trained pre-training model until the comparison result is smaller than the preset threshold, obtaining the trained pre-training model, and taking the trained pre-training model as the gesture recognition model.
In specific implementation, the real label is a known actual label of each data in the training data set, the training data set is divided into a training set and a test set, and a part of data is reserved for testing the training degree of the pre-training model.
The training data in the training set is input into the pre-training model for training, and the model's parameters are adjusted over repeated iterations to obtain a trained pre-training model. The data in the test set is then input into the trained pre-training model to test its degree of training, until the difference between the recognition results for the test data (i.e., the test identification labels corresponding to the test gesture data) and the real labels (i.e., the real gesture identification labels) falls within the preset threshold range, indicating that the model's accuracy meets the standard; at that point training is complete, and the trained pre-training model is taken as the gesture recognition model.
Each pass over the training data in the training set trains the pre-training model once, and each pass over the test data in the test set tests the pre-training model once.
When the pre-training model is trained, some neurons are randomly dropped, together with their corresponding connecting edges, to avoid overfitting; for example, a fixed probability p is set, and each neuron is randomly dropped with probability p (i.e., dropout).
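A condensed sketch of this train-test loop, assuming a PyTorch-style model and data loaders (the loss, optimizer, and threshold value are illustrative, not from the disclosure):

```python
import torch

def train_until_threshold(model, train_loader, test_loader,
                          threshold=0.10, max_rounds=100, lr=1e-3):
    """Repeat training rounds until the test error falls below the preset threshold."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(max_rounds):
        model.train()                             # dropout active while training
        for frames, labels in train_loader:       # frames shaped as the model expects
            opt.zero_grad()
            loss = loss_fn(model(frames), labels)
            loss.backward()
            opt.step()
        model.eval()                              # compare test predictions with real labels
        wrong, total = 0, 0
        with torch.no_grad():
            for frames, labels in test_loader:
                pred = model(frames).argmax(dim=1)
                wrong += (pred != labels).sum().item()
                total += labels.numel()
        if wrong / total < threshold:             # comparison result below threshold: done
            return model
    return model
```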
In some embodiments, the gesture recognition model is a spiking neural network model.
In specific implementation, the gesture recognition model is a spiking neural network model; the spiking neural network is a general-purpose network model whose hierarchical architecture can be varied, so the gesture recognition model can be used on various datasets, improving the gesture recognition rate and reducing the time cost of gesture recognition.
Through the scheme of this embodiment, spatial position index projection directly locates the positions where the brightness changes and accumulates only there to obtain the accumulated information; operation at positions without brightness change is avoided, reducing computational redundancy and greatly lowering the time cost. Multi-dimensional projection then embeds the time information into the data projection plane and fuses and superimposes it with the original single-channel projection, increasing the utilization of time information and ensuring the information is fully used to obtain the multi-dimensional projection result. Finally, a gesture recognition result is obtained in combination with the gesture recognition model, improving the gesture recognition rate and reducing the time cost of gesture recognition.
It should be noted that the method of the embodiment of the present application may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and is completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the multiple devices may only perform one or more steps of the method of the embodiment, and the multiple devices interact with each other to complete the method.
It should be noted that the above describes some embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, corresponding to the method of any embodiment, the application also provides a gesture recognition device.
Referring to fig. 4, the gesture recognition apparatus includes:
a data acquisition module 401 configured to acquire gesture data;
a data decoding module 402 configured to perform data decoding on the gesture data to obtain decoded data;
an index projection module 403, configured to perform spatial position index projection on the decoded data to obtain accumulated information;
a multidimensional projection module 404 configured to perform multidimensional projection on the accumulated information to obtain a multidimensional projection result;
the gesture recognition module 405 is configured to input the multi-dimensional projection result into a trained gesture recognition model, output a recognition tag corresponding to the gesture data, and obtain a gesture recognition result according to the recognition tag corresponding to the gesture data.
In some embodiments, the decoded data comprises a plurality of events, the events comprising an abscissa, an ordinate, and a time;
the index projection module 403 is specifically configured to:
performing spatial position index projection on the decoded data to locate a target spatial position with changed brightness, and acquiring an event of the target spatial position from all events in the decoded data;
and integrating the abscissa, the ordinate and the time corresponding to the event of the target space position to obtain accumulated information.
In some embodiments, the multidimensional projection module 404 is specifically configured to:
carrying out multidimensional projection on the abscissa and the ordinate corresponding to the event of the target space position to obtain a first projection;
carrying out multidimensional projection on the abscissa and time corresponding to the event of the target space position to obtain a second projection, and carrying out equal-interval sampling on the second projection to obtain a first equal-interval sampling projection;
carrying out multidimensional projection on the ordinate and the time corresponding to the event of the target space position to obtain a third projection, and carrying out equal-interval sampling on the third projection to obtain a second equal-interval sampling projection;
transposing the first equal-interval sampling projection and the second equal-interval sampling projection respectively to obtain a first transposed projection and a second transposed projection;
and superposing and fusing the first projection, the first transposed projection, and the second transposed projection to obtain a multi-dimensional projection result.
In some embodiments, the gesture recognition apparatus further comprises a model training module specifically configured to:
acquiring a training data set and a real label which are preprocessed, and dividing the training data set into a training set and a test set;
inputting the training data in the training set into a pre-training model which is constructed in advance for training to obtain a trained pre-training model;
inputting the test data in the test set into the trained pre-training model to test the training of the training data about the pre-training model to obtain a test identification label corresponding to the test gesture data in the test set, and performing difference comparison on the test identification label corresponding to the test gesture data and the real gesture identification label to obtain a comparison result;
and in response to the comparison result being greater than or equal to a preset threshold, repeatedly executing the process of inputting the training data in the training set into a pre-training model for training so as to adjust the parameters of the trained pre-training model until the comparison result is smaller than the preset threshold, obtaining the trained pre-training model, and taking the trained pre-training model as the gesture recognition model.
In some embodiments, the gesture recognition model is a spiking neural network model.
In some embodiments, the gesture recognition module 405 is specifically configured to:
acquiring action information corresponding to the identification tag of the historical gesture data;
and inquiring the identification tag of the historical gesture data according to the identification tag corresponding to the gesture data to obtain the identification tag of the historical gesture data which is the same as the identification tag corresponding to the gesture data, acquiring the action information corresponding to the identification tag of the historical gesture data which is the same as the identification tag corresponding to the gesture data, and taking the action information as a gesture identification result.
In some embodiments, the data decoding module 402 is specifically configured to:
and dividing the gesture data according to a preset timestamp to obtain the decoding data.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the various modules may be implemented in the same one or more software and/or hardware implementations as the present application.
The device of the above embodiment is used to implement the corresponding gesture recognition method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to the method of any embodiment described above, the present application further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the program, the gesture recognition method described in any embodiment above is implemented.
Fig. 5 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 501, a memory 502, an input/output interface 503, a communication interface 504, and a bus 505. Wherein the processor 501, the memory 502, the input/output interface 503 and the communication interface 504 are communicatively connected to each other within the device via a bus 505.
The processor 501 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present specification.
The Memory 502 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 502 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 502 and called to be executed by the processor 501.
The input/output interface 503 is used for connecting an input/output module to realize information input and output. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 504 is used to connect a communication module (not shown in the figure) to implement communication interaction between the present device and other devices. The communication module can realize communication in a wired mode (for example, USB, network cable, etc.), and can also realize communication in a wireless mode (for example, mobile network, WIFI, bluetooth, etc.).
Bus 505 comprises a path that transfers information between the various components of the device, such as processor 501, memory 502, input/output interface 503, and communication interface 504.
It should be noted that although the above-mentioned device only shows the processor 501, the memory 502, the input/output interface 503, the communication interface 504 and the bus 505, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device of the above embodiment is used to implement the corresponding gesture recognition method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above embodiments, the present application also provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to execute the gesture recognition method according to any of the above embodiments.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The computer instructions stored in the storage medium of the foregoing embodiment are used to enable the computer to execute the gesture recognition method according to any one of the foregoing embodiments, and have the beneficial effects of the corresponding method embodiments, which are not described herein again.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the context of the present application, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the application. Furthermore, devices may be shown in block diagram form in order to avoid obscuring embodiments of the application, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the application are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that the embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures, such as Dynamic RAM (DRAM), may use the discussed embodiments.
The present embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements, and the like that may be made without departing from the spirit or scope of the embodiments of the present application are intended to be included within the scope of the claims.

Claims (10)

1. A gesture recognition method, comprising:
acquiring gesture data;
performing data decoding on the gesture data to obtain decoded data;
performing spatial position index projection on the decoded data to obtain accumulated information;
carrying out multi-dimensional projection on the accumulated information to obtain a multi-dimensional projection result;
and inputting the multi-dimensional projection result into a trained gesture recognition model, outputting a recognition label corresponding to the gesture data, and obtaining a gesture recognition result according to the recognition label corresponding to the gesture data.
2. The method of claim 1, wherein the decoded data comprises a plurality of events, the events comprising an abscissa, an ordinate, and a time;
the performing spatial position index projection on the decoded data to obtain accumulated information includes:
performing spatial position index projection on the decoded data to locate a target spatial position with changed brightness, and acquiring an event of the target spatial position from all events in the decoded data;
and integrating the abscissa, the ordinate and the time corresponding to the event of the target space position to obtain accumulated information.
3. The method of claim 2, wherein the multi-dimensionally projecting the accumulated information to obtain a multi-dimensionally projected result comprises:
carrying out multi-dimensional projection on the abscissa and the ordinate corresponding to the event of the target space position to obtain a first projection;
carrying out multidimensional projection on the abscissa and the time corresponding to the event of the target space position to obtain a second projection, and carrying out equal-interval sampling on the second projection to obtain a first equal-interval sampling projection;
carrying out multidimensional projection on the ordinate and the time corresponding to the event of the target space position to obtain a third projection, and carrying out equal-interval sampling on the third projection to obtain a second equal-interval sampling projection;
transposing the first equal-interval sampling projection and the second equal-interval sampling projection respectively to obtain a first transposed projection and a second transposed projection;
and superposing and fusing the first projection, the first transposed projection, and the second transposed projection to obtain a multi-dimensional projection result.
4. The method of claim 1, wherein the gesture recognition model is obtained by training through the following process:
acquiring a training data set and a real label which are preprocessed, and dividing the training data set into a training set and a test set;
inputting the training data in the training set into a pre-training model which is constructed in advance for training to obtain a trained pre-training model;
inputting the test data in the test set into the trained pre-training model to test the training of the training data about the pre-training model to obtain a test identification label corresponding to the test gesture data in the test set, and performing difference comparison on the test identification label corresponding to the test gesture data and the real gesture identification label to obtain a comparison result;
and in response to the comparison result being greater than or equal to a preset threshold, repeatedly executing the process of inputting the training data in the training set into a pre-training model for training so as to adjust the parameters of the trained pre-training model until the comparison result is smaller than the preset threshold, obtaining the trained pre-training model, and taking the trained pre-training model as the gesture recognition model.
5. The method of claim 1, wherein the gesture recognition model is a spiking neural network model.
6. The method according to claim 1, wherein obtaining the gesture recognition result according to the identification tag corresponding to the gesture data comprises:
acquiring action information corresponding to an identification label of historical gesture data;
and inquiring the identification tag of the historical gesture data according to the identification tag corresponding to the gesture data to obtain the identification tag of the historical gesture data which is the same as the identification tag corresponding to the gesture data, acquiring the action information corresponding to the identification tag of the historical gesture data which is the same as the identification tag corresponding to the gesture data, and taking the action information as a gesture identification result.
7. The method of claim 1, wherein the data decoding the gesture data to obtain decoded data comprises:
and dividing the gesture data according to a preset time stamp to obtain the decoding data.
8. A gesture recognition apparatus, comprising:
a data acquisition module configured to acquire gesture data;
the data decoding module is configured to perform data decoding on the gesture data to obtain decoded data;
the index projection module is configured to perform spatial position index projection on the decoded data to obtain accumulated information;
the multidimensional projection module is configured to perform multidimensional projection on the accumulated information to obtain a multidimensional projection result;
and the gesture recognition module is configured to input the multi-dimensional projection result into a trained gesture recognition model, output a recognition label corresponding to the gesture data, and obtain a gesture recognition result according to the recognition label corresponding to the gesture data.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202210832652.6A 2022-07-15 2022-07-15 Gesture recognition method and related equipment Pending CN115205974A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210832652.6A CN115205974A (en) 2022-07-15 2022-07-15 Gesture recognition method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210832652.6A CN115205974A (en) 2022-07-15 2022-07-15 Gesture recognition method and related equipment

Publications (1)

Publication Number Publication Date
CN115205974A true CN115205974A (en) 2022-10-18

Family

ID=83582663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210832652.6A Pending CN115205974A (en) 2022-07-15 2022-07-15 Gesture recognition method and related equipment

Country Status (1)

Country Link
CN (1) CN115205974A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576784A (en) * 2024-01-15 2024-02-20 吉林大学 Method and system for recognizing diver gesture by fusing event and RGB data
CN117576784B (en) * 2024-01-15 2024-03-26 吉林大学 Method and system for recognizing diver gesture by fusing event and RGB data

Similar Documents

Publication Publication Date Title
CN109508681B (en) Method and device for generating human body key point detection model
CN111328396B (en) Pose estimation and model retrieval for objects in images
EP3506161A1 (en) Method and apparatus for recovering point cloud data
CN111161349B (en) Object posture estimation method, device and equipment
CN108701376A (en) The Object Segmentation based on identification of 3-D view
US11488320B2 (en) Pose estimation method, pose estimation apparatus, and training method for pose estimation
CN105144236A (en) Real time stereo matching
WO2020134818A1 (en) Image processing method and related product
CN109074497A (en) Use the activity in depth information identification sequence of video images
CN112017300A (en) Processing method, device and equipment for mixed reality image
CN116997941A (en) Keypoint-based sampling for pose estimation
CN111680678A (en) Target area identification method, device, equipment and readable storage medium
CN114219855A (en) Point cloud normal vector estimation method and device, computer equipment and storage medium
CN114519853A (en) Three-dimensional target detection method and system based on multi-mode fusion
CN114298982A (en) Image annotation method and device, computer equipment and storage medium
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN114387346A (en) Image recognition and prediction model processing method, three-dimensional modeling method and device
CN111126358A (en) Face detection method, face detection device, storage medium and equipment
CN115205974A (en) Gesture recognition method and related equipment
CN115188066A (en) Moving target detection system and method based on cooperative attention and multi-scale fusion
CN114299230A (en) Data generation method and device, electronic equipment and storage medium
CN116823884A (en) Multi-target tracking method, system, computer equipment and storage medium
CN111310595A (en) Method and apparatus for generating information
CN113239915B (en) Classroom behavior identification method, device, equipment and storage medium
CN112651351B (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination