CN116030535B - Gesture recognition method and device, chip and electronic equipment - Google Patents
- Publication number
- CN116030535B (application CN202310297117.XA)
- Authority
- CN
- China
- Prior art keywords
- neural network
- value
- trend value
- clockwise
- threshold
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a gesture recognition method and device, a chip, and electronic equipment. To solve the technical problem that existing gesture recognition schemes are difficult to commercialize at scale, the invention processes pulse events related to a dynamic gesture with a spiking neural network to obtain classification results of the hand shape and its region position; a post-processing module then post-processes the hand-shape and region-position classification results output by the spiking neural network to identify dynamic gesture states such as clockwise or counterclockwise rotation. The spiking neural network is small in scale; the post-processing logic is simple, easy to implement, and low in hardware difficulty and chip cost; the scheme is adaptive, with high robustness and accuracy, and can correctly judge variable gestures and user behaviors. The invention is applicable to the field of brain-inspired chips and event cameras.
Description
Technical Field
The invention relates to a gesture recognition method and device, a chip, and electronic equipment, and in particular to a dynamic gesture recognition method and device, chip, and electronic equipment that offer low power consumption, low cost, and high precision and can rapidly recognize clockwise or counterclockwise rotation.
Background
Gesture recognition is a very popular research and application direction. Gesture recognition in the broad sense can be categorized into two-dimensional hand-shape recognition (which can be considered a static gesture), two-dimensional gesture recognition, and three-dimensional gesture recognition. From a sensor perspective, it can also be divided into visual schemes (e.g., camera) and non-visual schemes (e.g., ToF ranging, infrared). From the viewpoint of recognition method, schemes can be classified into machine learning (SVM, decision trees, etc.), deep learning (CNN, etc.), template matching (DTW, etc.), and statistical learning (KNN, etc.).
Realizing a gesture recognition function is not difficult in itself: a large deep neural network (such as a three-dimensional convolutional network) can be run on a GPU to detect user gestures against a complex visual background at hundred-watt power consumption, and simple gestures can be recognized with a PIR sensor. However, gesture recognition products are difficult to popularize at scale. Beyond the power consumption, real-time performance, and cost of the products, the reasons include insufficient robustness and accuracy: such products struggle with the complexity of real scenes, and the resulting poor user experience limits their commercial value.
In vision-based schemes, successive frame images are typically acquired with an ordinary frame camera, i.e., at fixed time intervals and exposure times. Traditional artificial neural networks (ANNs) perform computer vision tasks such as classification and object detection well on single frame images, but when facing video with temporal structure and heavy data redundancy, the network scale becomes huge and the hardware must supply large compute to meet the demand (even when the user issues no gesture command), so power consumption is high. This makes the approach hard to apply to edge devices that are extremely sensitive to both power consumption and cost.
Constrained by power consumption/compute, cost, real-time performance, robustness, recognition distance, and gesture diversity and ambiguity, proposing an excellent gesture recognition scheme that can be commercialized at scale is not a simple task; it is in fact very tricky. Prior art 1 and 2 recognize gestures based, respectively, on frame images plus decision trees, and on infrared ranging sensors (a non-visual scheme) plus computing units such as microprocessors. However, these schemes rarely hold a power-consumption advantage, owing to the separation of storage and computation in the von Neumann architecture, or they require an infrared sensor array occupying a larger sensing area.
Prior art 1: CN107679512A;
prior art 2: CN106919261A.
A new sensor, the event camera, is currently emerging in the field of computer vision. It captures only dynamically changing pixel data; a static background produces no output in an event camera. This eliminates a large amount of irrelevant redundant data at the input, and the temporal resolution is extremely high, which greatly facilitates low-power gesture recognition. But event-camera data is not structured data such as frame images; it is streaming data, a pulse sequence consisting of many pulse events. Conventional image signal processing methods are difficult to apply directly to this new data pattern.
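The streaming data format can be pictured with a minimal sketch. The field names and the difference-frame conversion below are illustrative assumptions for exposition, not the patent's exact representation:

```python
from collections import namedtuple

# Hypothetical event layout: pixel coordinates, a timestamp, and a
# polarity bit (1 = brightness increased, 0 = decreased).
PulseEvent = namedtuple("PulseEvent", ["x", "y", "t_us", "polarity"])

def events_from_frame_diff(prev_frame, curr_frame, t_us, threshold=10):
    """Illustrative difference-frame conversion: emit one event wherever a
    pixel's brightness changed by more than `threshold` between frames."""
    events = []
    for y, (row_p, row_c) in enumerate(zip(prev_frame, curr_frame)):
        for x, (p, c) in enumerate(zip(row_p, row_c)):
            if abs(c - p) > threshold:
                events.append(PulseEvent(x, y, t_us, 1 if c > p else 0))
    return events

prev = [[0, 0], [0, 0]]
curr = [[0, 50], [0, 0]]   # a single pixel brightened
evts = events_from_frame_diff(prev, curr, t_us=1000)
```

A native event camera emits such events asynchronously per pixel; the frame-differencing path above is the fallback the text mentions for ordinary frame sensors.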
The spiking neural network (SNN) is known as the third generation of neural networks; it is event-driven and, by imitating principles of brain operation, achieves low power consumption. "SNN" may refer either to a spiking network simulated and trained on a conventional computing platform such as a GPU, or to one running on neuromorphic hardware (also called an SNN processor); the latter is the hardware platform that can truly exploit the SNN's advantages and is its ultimate hardware carrier. Note that neuromorphic hardware is a non-von Neumann architecture: it does not perform conventional program-based mathematical or procedural computation, and it offers high real-time performance, low power consumption, and low cost.
Recognizing that a scheme combining an SNN with an event camera (or with a pulse sequence obtained from frame images by frame differencing) promises gesture recognition with low power consumption (down to milliwatts) and high real-time performance (down to microseconds), the inventors selected it as the technical starting point for gesture recognition.
While a current SNN can effectively recognize the hand shape (palm, fist, etc.), see fig. 1 for a schematic of an SNN outputting a recognition result as a user waves a palm in front of the vision sensor: a small SNN cannot distinguish a palm sliding clockwise from one sliding counterclockwise; it only outputs a classification result representing the palm.
One way to solve this problem is to additionally detect or track the hand's position, but this requires models of higher complexity (optical flow, Kalman filtering, or deep learning trackers such as MDNet and FCNT, which usually require a GPU, a high-power hardware device), and the high compute requirement inevitably leads back to high power consumption and high cost.
Another way is for the SNN itself to judge clockwise and counterclockwise sliding from continuous motion, but processing continuously changing coordinate information or computing motion over a time window demands substantial SNN resources: a stronger decision capability and a complex and/or large network. Judging continuous actions slows processing, and a large network is harder to train and unfavorable for hardware implementation. Moreover, the rotation speed, whether the hand is occluded during rotation, and the distance between hand and sensor all strongly affect the result, making it difficult to adapt to different scenes and different users' behavioral habits.
In addition, most current dynamic gesture recognition schemes remain at the algorithm or computer-simulation stage; once implemented in hardware they perform poorly, and their real-time performance, power consumption, and precision hardly meet the requirements of edge devices.
How to obtain a rotation-gesture recognition scheme that combines low power consumption, low cost, high precision, high real-time performance, easy hardware implementation, and high robustness against complex backgrounds is the technical goal pursued in this field.
Disclosure of Invention
In order to solve or alleviate some or all of the above technical problems, the present invention is implemented by the following technical solutions:
a gesture recognition method utilizes a pulse neural network to process pulse events related to dynamic gestures to obtain classification results of hand types and locations thereof; post-processing the classification results of the hand type and the location thereof output by the pulse neural network through a post-processing module so as to identify the dynamic gesture state;
the pulse neural network obtains pulse events related to dynamic gestures based on signals acquired by the vision sensor; the field of view or screen range of the vision sensor is divided into a plurality of regions, the location being any of the plurality of regions.
In some embodiments, the post-processing module updates a trend value based on the hand-shape and region-position classification results output by the spiking neural network, and identifies the dynamic gesture based on the updated trend value; the trend value represents the potential or tendency of the dynamic gesture.
In some embodiments, when the updated trend value satisfies the clockwise condition, the current gesture is in a clockwise rotation state;
when the updated trend value satisfies the counterclockwise condition, the current gesture is in a counterclockwise rotation state;
wherein the clockwise and counterclockwise conditions include one of:
i) a trend value greater than or equal to the first threshold indicates clockwise rotation, and a trend value less than or equal to the second threshold indicates counterclockwise rotation;
ii) a trend value less than or equal to the first threshold indicates clockwise rotation, and a trend value greater than or equal to the second threshold indicates counterclockwise rotation.
In certain classes of embodiments, the first threshold or/and the second threshold is one of the following:
i) a single numerical value;
ii) a range comprising an upper limit or/and a lower limit.
In certain classes of embodiments, when the first threshold comprises an upper limit or/and a lower limit, the clockwise condition includes one of:
i) a trend value greater than or equal to the first threshold indicates clockwise rotation, and the clockwise rotation state is exited when the updated trend value is less than or equal to the first threshold's lower limit;
ii) a trend value less than or equal to the first threshold indicates clockwise rotation, and the clockwise rotation state is exited when the updated trend value is greater than or equal to the first threshold's upper limit.
When the second threshold comprises an upper limit or/and a lower limit, the counterclockwise condition includes one of:
i) a trend value less than or equal to the second threshold indicates counterclockwise rotation, and the counterclockwise rotation state is exited when the updated trend value is greater than or equal to the second threshold's upper limit;
ii) a trend value greater than or equal to the second threshold indicates counterclockwise rotation, and the counterclockwise rotation state is exited when the updated trend value is less than or equal to the second threshold's lower limit.
In some types of embodiments, if the updated trend value satisfies neither the clockwise nor the counterclockwise condition, the clockwise or counterclockwise state is exited.
In some classes of embodiments, the trend value is reset or initialized upon exiting the clockwise or counterclockwise state.
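To make the threshold logic concrete, here is a minimal sketch assuming condition variant i) with each threshold given as a range. The specific numbers, constant names, and the three-state encoding are illustrative assumptions, not values from the patent:

```python
# First threshold as a range: enter clockwise at the upper limit,
# exit when the trend falls back to the lower limit (hysteresis).
CW_UPPER, CW_LOWER = 3, 1
# Second threshold as a range, mirrored for counterclockwise.
CCW_UPPER, CCW_LOWER = -1, -3

def classify_state(trend, current_state):
    """Return "CW", "CCW", or "NONE" for the updated trend value."""
    if current_state == "CW":
        return "CW" if trend > CW_LOWER else "NONE"   # exit at lower limit
    if current_state == "CCW":
        return "CCW" if trend < CCW_UPPER else "NONE"  # exit at upper limit
    if trend >= CW_UPPER:
        return "CW"
    if trend <= CCW_LOWER:
        return "CCW"
    return "NONE"
```

The gap between the entry and exit bounds prevents the recognized state from flickering when the trend value hovers near a single threshold.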
In some embodiments, if the region position of the same hand shape changes at least twice consecutively in the clockwise direction, the current gesture is a clockwise rotation;
if the region position of the same hand shape changes at least twice consecutively in the counterclockwise direction, the current gesture is a counterclockwise rotation.
In certain embodiments, the trend value increases or decreases when the region position of the same hand shape changes between two adjacent readouts.
In some embodiments, if the post-processing module observes an intermediate state, the trend value is unchanged, or the trend value changes by a smaller amount than it would for a region-position change;
the intermediate state means that in two adjacent readouts the post-processing module obtains the same classification result from the spiking neural network, or fails to obtain an effective output from it;
failing to obtain an effective output means the spiking neural network produces no output, or the hand-shape classes of the two outputs differ.
In some embodiments, if a given intermediate state occurs two or more times consecutively, the trend value is reset;
the trend value is also reset when a rollback occurs, where a rollback means the trend value first increases by some amount and then decreases by the same amount, any intervening intermediate states being ignored.
In certain embodiments, the post-processing module counts the number of clockwise or counterclockwise rotations from the number of times the same hand shape's region position has changed and passed through the same region.
In some classes of embodiments, the trend value is updated by an addition or subtraction operation.
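The addition/subtraction update can be sketched as follows, assuming the four regions P1..P4 are labeled in clockwise order and each readout-to-readout move is between adjacent regions. Both assumptions, and the step size of 1, are illustrative:

```python
REGIONS = ["P1", "P2", "P3", "P4"]  # assumed clockwise labeling order

def update_trend(trend, prev_region, curr_region):
    """Add 1 for a clockwise step between adjacent regions, subtract 1
    otherwise; an intermediate state (no change) leaves the trend as is."""
    if prev_region is None or curr_region == prev_region:
        return trend                               # intermediate state
    i, j = REGIONS.index(prev_region), REGIONS.index(curr_region)
    step = (j - i) % 4
    return trend + 1 if step == 1 else trend - 1

# One full clockwise sweep of the same hand shape through the four regions.
trend, prev = 0, None
for region in ["P1", "P2", "P3", "P4"]:
    trend = update_trend(trend, prev, region)
    prev = region
```

After the sweep the trend value has accumulated three clockwise steps, enough to cross an entry threshold such as the one in the condition discussion above.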
In certain embodiments, the post-processing module reads the output of the spiking neural network in real time, at time intervals, or after every fixed number of input pulses to the network.
In some types of embodiments, the interval at which the post-processing module reads the network output, or the number of input pulses it uses as the reference for reading the output, is proportional to the speed of the dynamic gesture.
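A speed-adaptive readout can be sketched as below. The text ties the readout interval to the gesture speed; this sketch assumes an inverse relation (a faster rotation is sampled more often so region transitions are not missed), and the function name, base values, and scaling law are all illustrative assumptions:

```python
def readout_interval_ms(gesture_speed, base_interval_ms=50.0, base_speed=1.0):
    """Hypothetical adaptive readout: scale the interval at which the
    post-processing module reads the SNN output by the gesture speed."""
    speed = max(gesture_speed, 1e-6)   # guard against division by zero
    return base_interval_ms * base_speed / speed
```

The same scaling could instead be applied to the number of input pulses used as the readout reference, per the pulse-count variant described above.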
A gesture recognition device comprises a spiking neural network and a post-processing module;
the spiking neural network processes pulse events related to the dynamic gesture to obtain classification results of the hand shape and its region position;
the post-processing module is coupled with the spiking neural network and judges the state of the dynamic gesture based on the hand-shape and region-position classification results output by the network;
the spiking neural network obtains the pulse events related to the dynamic gesture from signals acquired by a vision sensor; the field of view or screen range of the vision sensor includes a plurality of regions, and the region position is any one of the plurality of regions.
In certain classes of embodiments, the post-processing module includes:
a logic operation module, which performs a logic operation on the hand-shape and region-position classification results output by the spiking neural network in two adjacent readouts to update the trend value;
a state identification module, which identifies the dynamic gesture state based on the updated trend value;
wherein the trend value represents the potential or tendency of the dynamic gesture.
In some embodiments, when the updated trend value satisfies the clockwise condition, the current gesture is in a clockwise rotation state; when the updated trend value satisfies the counterclockwise condition, the current gesture is in a counterclockwise rotation state;
wherein the clockwise and counterclockwise conditions include one of:
i) a trend value greater than or equal to the first threshold indicates clockwise rotation, and a trend value less than or equal to the second threshold indicates counterclockwise rotation;
ii) a trend value less than or equal to the first threshold indicates clockwise rotation, and a trend value greater than or equal to the second threshold indicates counterclockwise rotation.
In certain classes of embodiments, the first threshold or/and the second threshold is one of the following:
i) a single numerical value;
ii) a range comprising an upper limit or/and a lower limit.
In certain classes of embodiments, when the first threshold comprises an upper limit or/and a lower limit, the clockwise condition includes one of:
i) a trend value greater than or equal to the first threshold indicates clockwise rotation, and the clockwise rotation state is exited when the updated trend value is less than or equal to the first threshold's lower limit;
ii) a trend value less than or equal to the first threshold indicates clockwise rotation, and the clockwise rotation state is exited when the updated trend value is greater than or equal to the first threshold's upper limit.
When the second threshold comprises an upper limit or/and a lower limit, the counterclockwise condition includes one of:
i) a trend value less than or equal to the second threshold indicates counterclockwise rotation, and the counterclockwise rotation state is exited when the updated trend value is greater than or equal to the second threshold's upper limit;
ii) a trend value greater than or equal to the second threshold indicates counterclockwise rotation, and the counterclockwise rotation state is exited when the updated trend value is less than or equal to the second threshold's lower limit.
In some types of embodiments, if the updated trend value satisfies neither the clockwise nor the counterclockwise condition, the clockwise or counterclockwise state is exited.
In some embodiments, if the region position of the same hand shape changes at least twice consecutively in the clockwise direction, the current gesture is a clockwise rotation;
if the region position of the same hand shape changes at least twice consecutively in the counterclockwise direction, the current gesture is a counterclockwise rotation.
In certain embodiments, the trend value increases or decreases when the region position of the same hand shape changes between two adjacent readouts.
In some embodiments, if the post-processing module observes an intermediate state, the trend value is unchanged;
or the trend value changes by a smaller amount than it would for a region-position change;
the intermediate state means that in two adjacent readouts the post-processing module obtains the same classification result from the spiking neural network, or fails to obtain an effective output from it;
failing to obtain an effective output means the spiking neural network produces no output, or the hand-shape classes of the two outputs differ.
In some embodiments, if a given intermediate state occurs two or more times consecutively, the trend value is reset; the trend value is also reset when a rollback occurs, where a rollback means the trend value first increases by some amount and then decreases by the same amount, any intervening intermediate states being ignored.
In certain embodiments, the post-processing module counts the number of clockwise or counterclockwise rotations from the number of times the same hand shape's region position has changed and passed through the same region.
In certain embodiments, the post-processing module reads the output of the spiking neural network in real time, at time intervals, or after every fixed number of input pulses to the network.
In certain classes of embodiments, the post-processing module further comprises:
an adjustment module, coupled between the spiking neural network and the logic operation module, which adjusts, based on the speed of the dynamic gesture, the interval at which the network output is read, or the number of input pulses the post-processing module uses as the reference for reading the output.
In some types of embodiments, the interval for reading the network output, or the number of input pulses used as the reading reference, is proportional to the speed of the dynamic gesture.
In some classes of embodiments, the trend value is updated by an addition or subtraction operation.
In certain classes of embodiments, the vision sensor is a frame-based image sensor, or a sensor in which some or all of the pixels are capable of generating a pulse event.
In certain embodiments, when the region position of the same hand shape changes between two adjacent readouts, the absolute value of the trend-value change is proportional or inversely proportional to the vector angle of that region change.
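An angle-weighted update (the proportional variant) might be sketched as follows. The quadrant center angles, the gain, and the sign convention (positive sweep taken as clockwise) are all illustrative assumptions:

```python
# Assumed center angle, in degrees, of each of the 4 regions.
REGION_ANGLE = {"P1": 45.0, "P2": 135.0, "P3": 225.0, "P4": 315.0}

def trend_delta(prev_region, curr_region, gain=1.0 / 90.0):
    """Trend change whose magnitude is proportional to the angle swept
    between the centers of the old and new regions."""
    sweep = (REGION_ANGLE[curr_region] - REGION_ANGLE[prev_region]) % 360.0
    if sweep > 180.0:
        sweep -= 360.0     # signed sweep in (-180, 180]
    return gain * sweep    # positive = clockwise (assumed convention)
```

With a gain of 1/90, one adjacent-region step contributes a unit change, while a diagonal jump across two regions contributes twice as much.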
In a class of embodiments, the gesture recognition apparatus further comprises: a selection module, which selects whether to output the classification result of the spiking neural network directly, or to pass it to the post-processing module for processing.
In certain classes of embodiments, the spiking neural network has at most 6 layers, or its total number of neurons is less than five thousand or less than ten thousand.
A chip comprising a dynamic gesture recognition device as described above.
In certain classes of embodiments, the chip is a brain-like chip or a neuromorphic chip.
An electronic device comprising a dynamic gesture recognition apparatus as described above or comprising a chip as described above.
In certain classes of embodiments, the electronic device includes a vision sensor coupled with a spiking neural network processor;
the spiking neural network processor obtains pulse events related to dynamic gestures from the signals acquired by the vision sensor.
Some or all embodiments of the present invention have the following beneficial technical effects:
1) The method delivers stable rotation-gesture recognition against complex backgrounds, is adaptive, and remains highly accurate in complex environments and situations, such as operation at different speeds, actions at different distances from the sensor, and cases where the hand leaves the field of view or is partially occluded during rotation.
2) The invention is realized with a small SNN: the operations are simple and the resource demands low (a small network that is easy to train; little data and low computational cost; no high SNN decision capability required), the processing speed is high, and rotation-gesture recognition is achieved with low power consumption (milliwatt level), low cost, and high precision while timeliness is guaranteed.
3) The rotation-gesture recognition technique rests on simple summation (addition/subtraction) operations and is therefore easy to implement in hardware; in a neuromorphic chip integrating the technique, the difference between test results in a real scene (complex background) and computer-simulation results is negligibly small. The chip needs no network connection, and from visual perception (event camera) through spiking-network inference and post-processing to outputting the rotation result, total power consumption is below 10 mW and overall processing latency below 5 ms. The technique can therefore be deployed effectively on edge devices and has commercial application value.
Further advantageous effects will be further described in the preferred embodiments.
The technical solutions/features above summarize those described in the detailed description, so the stated ranges may not coincide exactly. Nevertheless, the new solutions disclosed in this section are also part of the numerous solutions disclosed in this document; the technical features disclosed here, those disclosed in the detailed description below, and content in the drawings not explicitly described in the specification disclose further solutions through reasonable combination with each other.
The technical solutions formed by combining any technical features disclosed anywhere in this document support the generalization of those solutions, the amendment of this patent document, and the disclosure of the technical solutions.
Drawings
FIG. 1 is a schematic diagram of gesture recognition by SNN;
FIG. 2 is a schematic diagram of a dynamic gesture recognition apparatus according to a preferred embodiment of the present invention;
FIG. 3 is a view of SNN classification results read by the post-processing module during clockwise palm rotation in some cases;
FIG. 4 is a graph illustrating trend value changes upon identifying a clockwise rotation in a preferred embodiment;
FIG. 5 is a graph illustrating trend value changes upon identifying counter-clockwise rotation in certain preferred embodiments;
FIG. 6 is a diagram showing a state change when a rotation gesture is recognized according to a preferred embodiment of the present invention;
FIG. 7 is a schematic diagram of setting upper and lower limits for a first threshold and a second threshold in an embodiment of the invention;
FIG. 8 is a block diagram of a post-processing module in accordance with an embodiment of the present invention;
FIG. 9 is a diagram showing the trend value change upon identifying clockwise rotation in another preferred embodiment.
Detailed Description
Since the alternatives cannot be exhaustively enumerated, the gist of the technical solutions in the embodiments of the invention will be described clearly and completely below with reference to the drawings. Other technical solutions and details not disclosed below generally belong to technical objects or features achievable by conventional means in the art and, for reasons of space, are not described in detail.
Except where it denotes division, "/" anywhere in this disclosure means logical "or". Ordinal numbers such as "first" and "second" merely distinguish labels in the description; they imply no absolute order in time or space, nor that a term preceded by one ordinal necessarily differs from the same term preceded by another.
The invention is described in terms of elements used in various combinations across method and product embodiments. In this document, even a gist described only while introducing a method/product scheme means that the corresponding product/method scheme explicitly includes that technical feature.
The description of a step, module, or feature anywhere in this disclosure does not imply that it is the only possible step or feature; based on the disclosed solutions, a person skilled in the art may implement other embodiments by other technical means. The embodiments of this invention are generally disclosed as preferred embodiments, but this does not exclude embodiments contrary to the preferred ones, as long as such embodiments solve at least one technical problem addressed by the invention. Based on the gist of the specific embodiments, a person skilled in the art may substitute, delete, add, combine, or reorder certain technical features to obtain a solution still following the inventive concept; such solutions, not departing from the technical idea of the invention, are also within its scope of protection.
Neuromorphic chip: event-driven, performing computation or processing only after an event occurs, and achieving ultra-high real-time performance and ultra-low power consumption in hardware circuits. By circuit type, neuromorphic chips divide into those based on analog, digital, or analog-digital hybrid circuits.
Spiking neural network (SNN): the third generation of artificial neural networks, event-driven, with rich spatio-temporal dynamics and diverse coding mechanisms, low computational cost, and low power consumption. Compared with the artificial neural network (ANN), the SNN is more biologically faithful and advanced, and brain-inspired computing or neuromorphic computing based on SNNs offers better performance and computational overhead than traditional AI chips. Note that the embodiments of the invention do not specifically limit the type of spiking neural network: any neural network driven by pulse signals or events can be applied to the gesture recognition method provided herein, and a spiking convolutional neural network (SCNN), a spiking recurrent neural network (SRNN), a spiking long short-term memory (LSTM) network, or the like can be built according to the actual application scenario.
In dynamic gesture recognition, owing to the variability and freedom of gestures and the complexity of the environmental background (illumination, occlusion, etc.), how to recognize or track complex actions quickly and robustly at low power consumption and low cost is a research hotspot of the field.
The invention discloses a dynamic gesture recognition method based on a spiking neural network, and in particular a dynamic gesture recognition method based on position classification results. FIG. 2 is a schematic diagram of a dynamic gesture recognition apparatus according to a preferred embodiment of the present invention, comprising a spiking neural network and a post-processing module.
The spiking neural network processes spiking events related to dynamic gestures to obtain classification results, where the spiking events are derived from signals acquired by a vision sensor.
The vision sensor may be a sensor that senses light or light changes and directly generates spike events. It may be a sensor in which some or all pixels can generate spike events, such as a neuromorphic sensor (e.g., an event camera or a dynamic vision sensor, DVS) or a fused sensor (e.g., a DAVIS sensor, or an RGB, infrared, or radar sensor fused with a spike-event sensor). The vision sensor may also be a frame-based image sensor/camera, whose frames are converted into a gesture-related spike sequence by frame differencing and randomization; the inventor's prior patents CN114466153A and CN114495178A are incorporated herein by reference in their entirety.
The visual classification produced by the spiking neural network includes not only the gesture, in particular the hand shape, such as fist (class A), 1 finger (class B), 2 fingers (class C) … 5 fingers (class E), but may also include the position of the gesture relative to the screen area. Note that this position differs from the per-event position information generated by the event camera mentioned in the background: there, the position of an event refers to the coordinates of a pixel unit, whereas here it refers to a division of the vision sensor's/camera's field of view into several areas, also called zones or regions, for example 4 equal areas (P1, P2, P3, and P4). The invention does not limit the specific division, which may be chosen according to actual requirements, e.g., 4 equal areas, 9 equal areas, and so on. In the following embodiments, the field of view is divided into 4 equal areas, but the invention is not limited thereto.
Corresponding to the division of the field of view into 4 regions, 5 position classes may further be defined for the gesture, for example: P0 indicates no valid gesture/hand shape in the field of view (e.g., out of the field of view/off screen, occluded, or an unqualified hand shape), P1 indicates region 1, P2 region 2, P3 region 3, and P4 region 4.
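As a non-limiting illustration of the position classes P0 to P4, the following sketch maps the centroid of a hand's event cluster to one of four equal quadrants. The 128x128 field-of-view resolution, the clockwise ordering of the quadrants, and the `classify_region` helper are assumptions made here for illustration, not details fixed by the embodiment.

```python
# Sketch: map an event-cluster centroid to one of the 4 regions P1..P4,
# numbered clockwise from the top-left quadrant (an assumed convention).

WIDTH, HEIGHT = 128, 128  # assumed DVS field-of-view resolution

def classify_region(centroid, valid=True):
    """Return 0 (P0) when no valid hand is present, else 1..4 for P1..P4."""
    if not valid or centroid is None:
        return 0  # P0: out of view, occluded, or hand shape rejected
    x, y = centroid
    left = x < WIDTH / 2
    top = y < HEIGHT / 2
    if top and left:
        return 1   # P1: top-left quadrant
    if top:
        return 2   # P2: top-right quadrant
    if left:
        return 4   # P4: bottom-left quadrant
    return 3       # P3: bottom-right quadrant
```

In an SNN implementation this mapping is learned by the output layer rather than computed geometrically; the sketch only fixes the labeling convention used in the examples below.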
By classifying gestures/hand shapes together with their positions, the spiking neural network, compared with conventional simple gesture classification (fist, extended fingers, static poses, and the like, which excludes operations such as rotation that require continuous motion recognition), only slightly increases the number of output-layer neurons yet can recognize complex dynamic gestures such as clockwise and counterclockwise rotation, with a simple network model and low resource requirements.
The post-processing module, also called the judging module, determines whether the gesture rotates or slides based on the gesture-position classification results output by the spiking neural network.
In a preferred embodiment, according to actual requirements, either the classification result of the spiking neural network is output directly to obtain a first output, or the classification result is post-processed to obtain a second output. The first output may be the direct output of the network, or be obtained by clustering that output, for example ignoring the position information to suit a specific application scenario; the first output is a simple gesture/hand shape. The second output is obtained by logic operations/judgments on the classification results and represents a complex gesture, such as clockwise rotation, counterclockwise rotation, or sliding.
Fig. 3 is a schematic diagram of clockwise-rotation recognition by the post-processing module according to an embodiment of the invention. The field of view/screen of the vision sensor is divided into 4 areas; P0 to P4 denote the positions of the gesture (hand shape) as inferred in real time by the spiking neural network processor; t denotes a time stamp, where t0 to t5 are the times at which the SNN produces output or at which the post-processing module reads the SNN classification result; E denotes that the network recognized the hand shape as a palm and A as a fist; and the SNN produces no output when there is no gesture, or no valid gesture (e.g., partial or complete occlusion, or an unqualified hand shape), in the field of view.
In certain embodiments, the post-processing module obtains the output of the SNN in real time. However, although the rotation speed of the hand is unconstrained, in practice it is concentrated within a certain range, and at very high speeds the SNN may generate many identical gesture-and-position classification results within a short time. Therefore, to further improve sparsity and reduce power consumption, the post-processing module captures/reads the SNN classification results at intervals (for example, 0.05 s to 0.2 s; the invention is not limited thereto).
Meanwhile, the SNN is event-driven and produces inference results in real time; for dynamic complex gestures at different speeds, such as rotation, the faster the motion, the faster the gesture-position classification result generated by the SNN changes, and vice versa. In a preferred embodiment, to better adapt to dynamic complex gestures of different speeds and improve accuracy, the interval Δt at which the post-processing module reads the SNN classification result is adjusted based on how fast the classification result changes: the faster the rotation, the smaller Δt, and vice versa. In a preferred embodiment, to guard against user play or unintended operation, an upper limit or/and a lower limit is set for the interval Δt.
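One minimal way to sketch this adaptive interval, assuming the change rate of the position classification is measured in changes per second and using the 0.05 s to 0.2 s clamps mentioned above (the one-read-per-change target is an illustrative assumption):

```python
# Sketch: adapt the read interval dt to the rate of change of the SNN's
# position classification result, clamped to assumed bounds.

def adapt_interval(changes_per_second, dt_min=0.05, dt_max=0.2):
    """Faster position changes -> smaller dt (read more often), and vice versa."""
    if changes_per_second <= 0:
        return dt_max  # no motion observed: read at the slowest rate
    dt = 1.0 / changes_per_second  # aim for roughly one read per change
    return max(dt_min, min(dt_max, dt))
```

The clamps play the role of the upper/lower limits on Δt: very slow drift never stretches the interval beyond dt_max, and jitter never shrinks it below dt_min.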
For example, with the interval set to 0.1 s (consistent with the habits of human motion), a fast action may reach about 2 m/s while a slow action is about 0.67 m/s, the fast speed being roughly 3 times the slow one. Gestures of different rates can thus be accommodated by setting different interval times, and a given interval already tolerates about a 3-fold range of speeds.
In addition, if the post-processing module reads the SNN output once per fixed number of SNN input spikes, the adaptability to gesture speed is even better: for example, reading the output once every 1000 input spikes means that, fast or slow, each read always covers 1000 spikes, so enough information is accumulated and the post-processing tolerance is better.
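The spike-count-triggered read can be sketched as follows; `read_fn` stands in for the actual interface to the SNN processor, and the quota of 1000 follows the example in the text:

```python
# Sketch: read the SNN classification result once every N input spikes
# instead of on a fixed timer, so each read covers the same amount of
# input activity regardless of gesture speed.

class SpikeCountedReader:
    def __init__(self, read_fn, spikes_per_read=1000):
        self.read_fn = read_fn
        self.spikes_per_read = spikes_per_read
        self._count = 0

    def on_input_spikes(self, n):
        """Accumulate n input spikes; return a classification once the quota is met."""
        self._count += n
        if self._count >= self.spikes_per_read:
            self._count -= self.spikes_per_read  # carry the remainder forward
            return self.read_fn()
        return None
```

Because the counter is driven by input spikes, a fast gesture simply fills the quota sooner; the post-processing logic downstream is unchanged.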
Fig. 3 lists, for some cases, the SNN classification results read by the post-processing module while the palm rotates clockwise. Based on the position classification results output by the SNN, Fig. 3(a) shows the post-processing module observing, from t0 to t5, the palm appearing in regions P1, P2, P3, P4, and then P3 in sequence. Fig. 3(b) shows the user's palm rotating clockwise at non-uniform speed (slow then fast), appearing in regions P1 through P4 from t0 to t4, with the hand changing to a fist at t5. Fig. 3(c) shows clockwise rotation at non-uniform speed (fast then slow) in which the palm never appears in region P3 (it does not pass through that region, or leaves the screen, or is occluded, etc.), so from t0 to t5 it appears in P1, P2, nothing, P4, and P1. Fig. 3(d) shows clockwise rotation in which the motion slides directly from region P1 to region P3, then leaves the field of view or is occluded, and then reaches region P4. These are merely 4 examples of clockwise rotation gestures, to which the present invention is not limited.
The post-processing module uses a value V to represent the movement potential/trend: the larger the trend value V, the more the motion matches a clockwise trend; the smaller (more negative) V, the more it matches a counterclockwise trend; and the intermediate value 0 represents no action, an invalid operation, or no clockwise/counterclockwise trend. When the palm position/zone obtained by the post-processing module changes, the trend value is increased or decreased; if the position is unchanged, the trend value is unchanged. Fig. 4 illustrates the clockwise-rotation determination in an embodiment: when the palm position changes from P1 to P2 the trend value increases by 3, and when it changes from P2 to P3 it increases by a further 3 or by another value (e.g., 5); the amount of change (increase/decrease) per position change is not specifically limited. With the initial trend value set to 0 (or another value; the invention is not limited thereto), when the palm position in Fig. 4 changes from P1 to P2 the trend value is updated to 3, and when it then changes from P2 to P3 it is updated to 6, and so on.
In some embodiments, the trend value changes by the same amount for every palm-position change. In a preferred embodiment, the change is set based on the angle of the position change: for example, when the palm moves from P1 to P2 the trend value increases by 3, and when it moves from P2 to P3, a 90-degree change, it increases by 5. In another embodiment, when the palm moves from P1 to P2 the trend value increases by 3, and when it moves from P2 to P4 (skipping region P3, a 45-degree change, or a 135-degree vector angle) it increases by 6; here the absolute change of the trend value is proportional to the vector angle of the position change. In other embodiments, the same 45-degree change (135-degree vector angle) increases the trend value by 4, the absolute change then being inversely proportional to the vector angle.
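A minimal sketch of the trend-value update, assuming regions P1..P4 are numbered clockwise, a +3/-3 step for adjacent regions as in the examples, and (as an illustrative assumption) treating a region-skipping jump as two clockwise steps worth +6:

```python
# Sketch: update the trend value V from successive region labels (1..4,
# arranged clockwise). Increments follow the examples in the text.

def update_trend(v, prev_region, new_region):
    """Return the updated trend value after one region reading."""
    if prev_region == new_region or 0 in (prev_region, new_region):
        return v  # same region or no valid output: an intermediate state
    step = (new_region - prev_region) % 4
    if step == 1:
        return v + 3      # one clockwise step, e.g. P1 -> P2
    if step == 3:
        return v - 3      # one counterclockwise step, e.g. P2 -> P1
    return v + 6          # skipped a region, e.g. P2 -> P4 (assumed clockwise)
```

Leaving V unchanged on intermediate states is the simplest of the variants described below; a small decay (e.g., -1 or -2) could be applied there instead.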
The post-processing module judges, based on the trend value, whether a preset condition is met; if so, the gesture is a dynamic rotation gesture (clockwise or counterclockwise rotation), and so on.
In some embodiments, the preset condition is a trend-value threshold, and the current dynamic gesture is determined based on it. The threshold includes a first threshold or/and a second threshold, the first being the clockwise threshold and the second the counterclockwise threshold. Specifically, the post-processing module performs a logic operation on the trend value based on the change of the palm region, such as summation (addition or subtraction), to calculate/update the trend value; if the updated trend value satisfies the first or second threshold, the current operation is clockwise or counterclockwise, respectively.
For example, if clockwise is the direction of increasing trend value and counterclockwise the direction of decreasing trend value, then when the updated trend value is greater than or equal to the first threshold the current gesture is judged to be clockwise rotation, and when it is less than or equal to the second threshold the gesture is judged to be counterclockwise rotation. In some embodiments, if the initial/reset trend value is 0, the first threshold is positive and the second negative, and in some preferred embodiments their absolute values are equal. If the initial/reset value is not 0, the two thresholds lie on either side of it, and in a preferred embodiment they are equidistant from the initial value.
Conversely, if clockwise is the direction of decreasing trend value and counterclockwise the direction of increasing trend value, the current gesture is judged to be clockwise rotation when the updated trend value is less than or equal to the first threshold, and counterclockwise rotation when it is greater than or equal to the second threshold. The following embodiments take the case where the trend value increases during clockwise rotation and decreases during counterclockwise rotation as an example, but the invention is not limited thereto.
Fig. 4 is a schematic diagram showing the trend value change when clockwise rotation is identified in a preferred embodiment, and fig. 5 is a schematic diagram showing the trend value change when counterclockwise rotation is identified in a preferred embodiment.
If the palm position changes from P1 to P2 to P3 and the trend value reaches 6, meeting the first threshold, the current gesture is considered clockwise rotation. If the palm position passes through P1, P2, P3, P4, P1, P1, P2, P3, P4, the gesture is considered to be rotating clockwise continuously from the first time the position change reaches P3.
In some embodiments, the number of clockwise turns may further be determined, and subsequent control performed based on it. For example, with starting position P1, the number of times the palm passes P1 again after the position changes is counted: passing once is regarded as 1 clockwise turn, passing twice as 2 turns, and so on. To accommodate the diversity of rotation gestures, with starting position P1 one may instead count the times the palm passes P3 or P4 after the position changes: one pass is regarded as 1 clockwise turn, two passes as 2 turns, and so on. In some embodiments, the rotation duration may further be determined and subsequent control performed accordingly, for example controlling volume up/down according to the rotation duration.
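The pass-counting rule can be sketched as follows; the region-label sequence is assumed to come from the post-processing loop, and P0 readings are ignored as intermediate states:

```python
# Sketch: count clockwise turns by counting returns to the starting
# region after the palm has left it, per the example (start at P1).

def count_turns(regions, start_region=1):
    """Count how many times the palm returns to start_region after leaving it."""
    turns = 0
    left_start = False
    for r in regions:
        if r == start_region and left_start:
            turns += 1          # one full pass back through the start region
            left_start = False
        elif r not in (start_region, 0):
            left_start = True   # palm has moved away from the start region
    return turns
```

Counting passes through P3 or P4 instead, as suggested above, only changes which label is treated as the checkpoint.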
During rotation, to account for differences in motion, an intermediate state is defined: the post-processing module obtains the same gesture in the same area/region from the SNN twice in a row (for example, two successive results placing the palm in region P1), or currently obtains no valid output from the SNN (occlusion, the gesture off screen, no gesture in the field of view, and so on). In an intermediate state during clockwise motion, the trend value is unchanged or is reduced by an amount smaller than the previous change. As in Fig. 4, if the palm region passes through P1, P2, P2, P3, the trend value increases by 3 at each region change and decreases by 1 or 2 at the intermediate transition from P2 to P2. If the palm position passes through P1, P2, none, P4, the trend value increases by 3 at the P1-to-P2 change, decreases by 1 or 2 at the intermediate state in which the SNN has no output, and increases by 3 when position P4 is then obtained. When the latest updated trend value reaches (is greater than or equal to) the first threshold, the current gesture is considered clockwise rotation.
In a preferred embodiment, if a given intermediate state occurs more than twice in succession, the gesture is considered interrupted, or not a valid action, and the trend value is reset. For example, if the palm-zone changes occur as P1, P2, P2, P2 or P1, P2, P3, P3, P3, the trend value is reset.
In another preferred embodiment, the trend value is reset if a rollback occurs. A rollback means that, ignoring intermediate states, the trend value increases and then decreases by the same amount (or decreases and then increases by the same amount) across two consecutive region changes, leaving the trend value unchanged; for example, the palm moves from a first position to a second position and back to the first, such as P1 to P2 and then P2 back to P1.
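Detecting the rollback from the recent region history can be sketched as below; resetting to 0 assumes 0 is the initial trend value, and the `history` list is assumed to have intermediate states (repeats and P0) already filtered out:

```python
# Sketch: reset the trend value when the region sequence backtracks,
# e.g. P1 -> P2 -> P1, with intermediate states already removed.

def apply_rollback(history, v):
    """Return 0 if the last change undid the previous one, else v unchanged."""
    if len(history) >= 3 and history[-1] == history[-3]:
        return 0  # e.g. [1, 2, 1]: back where it started, reset the trend
    return v
```

This treats a back-and-forth waggle as noise rather than as partial rotation, which is the intent of the reset.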
In other alternative embodiments, if no rollback reset is configured, then after the trend value reaches the first threshold, if the position changes in the opposite (counterclockwise) direction, the decrease applied to the trend value grows as the number of opposite-direction changes grows. For example, if the palm position changes occur as P1, P2, P3, P4, P3, P2, the changes from P1 to P4 indicate clockwise rotation and each increases the trend value by 3; the opposite change from P4 to P3 decreases the trend value by 3; and the next opposite change, to P2, decreases it by 6.
In some embodiments, if the palm rotates continuously, an upper limit is set on the trend value to keep it from growing without bound, and likewise a lower limit to keep it from decreasing without bound: once clockwise rotation has driven the trend value up to the upper limit, the value increases no further even if the palm keeps rotating clockwise, and once counterclockwise rotation has driven it down to the lower limit, the value decreases no further even if the palm keeps rotating counterclockwise.
In an embodiment, the preset condition is the number of position changes: if the position changes at least twice consecutively in the same direction, i.e., the trend value increases at least twice in a row or decreases at least twice in a row, the gesture is considered a rotation operation. Note that "consecutively" here ignores qualifying intermediate states, or treats the intermediate-state trend value as unchanged; for example, if the trend value evolves as 3, 6, 9 during palm rotation, it is considered to have increased twice in a row. In one embodiment, to avoid ambiguity, the preset condition is that the position changes at least three times consecutively in the same direction, i.e., the trend value increases, or decreases, at least three times in a row.
Taking increase on clockwise rotation as an example: if the current trend value is -12 and the palm position then undergoes the changes shown in Fig. 4, the trend value is updated to -9 on the P1-to-P2 change and to -6 on the P2-to-P3 change; the trend value has then changed twice consecutively in the positive direction, and the gesture is considered clockwise rotation. In a preferred embodiment, the gesture is considered a rotation only when the trend value changes at least three times consecutively in one direction.
Because the event-driven neuromorphic sensor and processor recognize change information, actions such as rotating or waving the palm within a single zone, e.g., P1, are treated as simple gestures; they do not belong to the complex or dynamic gestures emphasized by the invention, which specifically concern cross-zone dynamic operations such as rotation and sliding.
FIG. 6 shows the state changes when recognizing a rotation gesture according to a preferred embodiment of the present invention, involving a first state (clockwise state), a second state (counterclockwise state), and a third state, the third state covering all situations that satisfy neither the clockwise nor the counterclockwise state. In a preferred embodiment, the trend value is reset/cleared on a jump from the first or second state to the third state.
As the palm position/zone classification results output by the SNN are continually acquired, the post-processing module updates the trend value. When the updated trend value reaches/meets the first threshold (clockwise threshold), the current gesture is in the clockwise rotation state; if the updated trend value no longer meets the first threshold, the clockwise state is exited and a jump is made to the third state. If the updated trend value reaches the second threshold (counterclockwise threshold), the current gesture is in the counterclockwise rotation state.
In some embodiments, the first and second thresholds are single values. Reaching/meeting the first or second threshold includes the following cases: greater than or equal to the first threshold means clockwise rotation and less than or equal to the second threshold means counterclockwise rotation; or, less than or equal to the first threshold means clockwise rotation and greater than or equal to the second threshold means counterclockwise rotation.
In other embodiments, to improve accuracy, a first threshold upper limit or/and lower limit and a second threshold upper limit or/and lower limit are set. Take the case where a trend value greater than or equal to the first threshold means clockwise rotation (i.e., the trend value increases during clockwise rotation), with the first threshold set to 10 and its lower limit to 8: when the updated trend value is greater than or equal to the first threshold 10, the state jumps to/is judged clockwise. Once in the clockwise state, the current state is maintained as long as the updated trend value is not below the first threshold lower limit 8; if it falls below 8, the state jumps to the third state and the trend value is initialized, for example to 0. When the updated trend value is less than or equal to the second threshold (e.g., -10), the state jumps to counterclockwise. Once in the counterclockwise state, the current state is maintained as long as the updated trend value is not above the second threshold upper limit (-8); if it rises above -8, the state jumps to the third state, with for example a reset/clear.
In a preferred embodiment, a first threshold upper limit (e.g., 12) or/and a second threshold lower limit (e.g., -12) are further set to prevent the trend value from becoming excessively large and to simplify the determination. FIG. 7 is a schematic diagram of setting upper and lower limits for the first and second thresholds in an embodiment of the invention.
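The three-state machine with hysteresis can be sketched as follows, using the example values from above (enter clockwise at V >= 10, hold it while V >= 8; enter counterclockwise at V <= -10, hold it while V <= -8); the class and parameter names are illustrative:

```python
# Sketch: rotation state machine with hysteresis thresholds.
# Entering a rotation state uses the first/second threshold; leaving it
# uses the looser hold limit, and leaving resets the trend value.

CW, CCW, OTHER = "clockwise", "counterclockwise", "other"

class RotationStateMachine:
    def __init__(self, enter_cw=10, hold_cw=8, enter_ccw=-10, hold_ccw=-8):
        self.enter_cw, self.hold_cw = enter_cw, hold_cw
        self.enter_ccw, self.hold_ccw = enter_ccw, hold_ccw
        self.state = OTHER
        self.v = 0

    def update(self, delta):
        """Apply one trend-value change and return the resulting state."""
        self.v += delta
        if self.state == CW and self.v < self.hold_cw:
            self.state, self.v = OTHER, 0   # exit clockwise, reset trend
        elif self.state == CCW and self.v > self.hold_ccw:
            self.state, self.v = OTHER, 0   # exit counterclockwise, reset trend
        if self.state == OTHER:
            if self.v >= self.enter_cw:
                self.state = CW
            elif self.v <= self.enter_ccw:
                self.state = CCW
        return self.state
```

The gap between the entry threshold and the hold limit is what prevents chattering between states when the trend value hovers near the threshold.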
The invention fully exploits the event-driven characteristic of the SNN, judging through state changes which state the current gesture is in (clockwise, counterclockwise, or other), which effectively saves power. Using only simple logic operations and addition/subtraction, it consumes few resources and processes very quickly.
Taking palm rotation as an example, the method for recognizing clockwise or anticlockwise rotation gestures comprises the following steps:
Step S1: the post-processing module updates the trend value based on the palm position/zone classification results output by the SNN, in any of the ways described above. When the SNN has no valid output, the updated trend value either equals the previous trend value or changes by an appropriately small amount, i.e., by less than the previous change. The case in which the SNN has no valid output is, specifically, an intermediate state; in some cases, the intermediate state is one of the third states.
Step S2: a rotation determination is made based on the updated trend value. Specifically, the updated trend value is compared with the trend-value threshold; if the threshold is reached, the gesture is in the clockwise or counterclockwise rotation state.
In a preferred embodiment, the determination of whether to maintain or exit the current state is made based on the updated trend value and on whether the current gesture is already in a rotation state. If the palm is not currently in a rotation state, the updated trend value is compared with the trend-value threshold, and if it reaches the threshold, the palm is in the clockwise or counterclockwise rotation state. If the palm is currently in a rotation state, whether to maintain the current state or exit it is judged from the updated trend value, and on exit the trend value is reset/cleared; in particular, if the updated trend value no longer meets the first threshold, the current state is exited.
Reaching the trend-value threshold means: greater than or equal to the first threshold is clockwise rotation and less than or equal to the second threshold is counterclockwise rotation; or, less than or equal to the first threshold is clockwise rotation and greater than or equal to the second threshold is counterclockwise rotation.
In some embodiments, the first threshold or/and the second threshold is a single value.
In other embodiments, the first threshold or/and the second threshold is an interval, comprising a first threshold upper limit or/and lower limit and a second threshold upper limit or/and lower limit. Take the case where a trend value greater than or equal to the first threshold means clockwise rotation and one less than or equal to the second threshold means counterclockwise rotation: when the updated trend value is greater than or equal to the first threshold, the gesture is in the clockwise state; after entering that state, the current state is maintained as long as the updated trend value is not below the first threshold lower limit, and if it falls below that lower limit the clockwise state is exited, e.g., with a reset/clear. When the updated trend value is less than or equal to the second threshold, the state jumps to counterclockwise; after entering that state, the current state is maintained as long as the updated trend value is not above the second threshold upper limit, and if it rises above that upper limit the counterclockwise state is exited.
In other, non-preferred embodiments, the post-processing module judges, based on the change of the trend value within a period of time, whether the preset condition is met, classifying the gesture as a dynamic rotation gesture, a sliding/waving gesture, and so on. However, setting a time threshold increases complexity on the one hand and decreases accuracy on the other; for example, continuous actions may be truncated, causing recognition failures. When the processing time of the post-processing module reaches or exceeds the time threshold T_θ without a decision result/conclusion, i.e., no valid result is produced within the time-threshold window, the trend value is reset. In some embodiments, to recognize complex dynamic gestures stably, T_θ ≥ 3Δt; in other embodiments, 3Δt ≤ T_θ ≤ 10Δt. Furthermore, in a preferred embodiment, to better adapt to dynamic complex gestures of different speeds, the time threshold is adjusted according to how fast the gesture-position classification result changes: the faster the rotation, the smaller the time threshold, and the slower the rotation, the larger it is. Thus the time threshold T_θ may be proportional to the interval Δt at which the post-processing module reads the SNN classification result.
Further, based on the change of the trend value, the post-processing module can determine how many turns were made clockwise or counterclockwise. For example, ignoring intermediate states, if the trend value changes 2 to 4 times consecutively in one direction within a period of time, the palm is considered to have rotated one turn clockwise; 5 to 7 consecutive changes count as two turns; 9 to 11 consecutive changes as three turns; and so on. "One direction" here means either the direction of increasing trend value or the direction of decreasing trend value. Counterclockwise turns are judged in the same way.
In other embodiments, the number of clockwise or counterclockwise turns may be judged from the magnitude of the trend value within the time threshold T_θ. For example, if the trend value changes by +3 or -3 each time, then within T_θ a trend value V satisfying 15 ≤ V ≤ 21 is considered two clockwise turns, and so on. Counterclockwise turns are judged in the same way.
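Assuming +3 per region change and the change-count bands given above (2 to 4 changes for one turn, 5 to 7 for two, 9 to 11 for three), the magnitude-to-turns mapping can be sketched as below; the rounding formula is an illustrative fit to those bands, not part of the embodiment:

```python
# Sketch: estimate completed clockwise turns from the trend value V
# accumulated within the time threshold, assuming +3 per region change.

def turns_from_trend(v, per_change=3):
    """Map trend value V to completed clockwise turns (0 if below one turn)."""
    changes = v // per_change  # number of same-direction region changes
    if changes < 2:
        return 0
    # 2..4 changes -> 1 turn, 5..7 -> 2 turns, 9..11 -> 3 turns, ...
    return (changes + 3) // 4
```

For counterclockwise rotation the same mapping applies to the magnitude of a negative V.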
FIG. 8 is a block diagram of the post-processing module according to an embodiment of the present invention, comprising a logic operation module and a state recognition module. The logic operation includes calculating/judging the direction of change of the trend value, or updating the trend value with the result of a logic calculation such as summation (addition or subtraction).
The state recognition module judges the current state based on the updated trend value. In one embodiment, the updated trend value is compared with a trend value threshold, and if the threshold is reached, the gesture is judged to be rotating clockwise or counterclockwise. In another embodiment, whether to maintain or exit the current state is determined based on the updated trend value and whether the current gesture is already in a rotation state.
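A minimal sketch of such a state recognition module, combining both embodiments (enter on a threshold, maintain or exit based on the current state). The threshold values and the exit condition of the trend crossing back past zero are illustrative assumptions:

```python
def recognize_state(trend, state, cw_threshold=6, ccw_threshold=-6):
    """Compare the updated trend value with trend-value thresholds and
    decide whether to enter, keep, or exit a rotation state (None = no
    rotation recognized)."""
    if state is None:
        if trend >= cw_threshold:
            return "clockwise"
        if trend <= ccw_threshold:
            return "counterclockwise"
        return None
    # already rotating: exit when the trend falls back past the limit
    if state == "clockwise" and trend <= 0:        # illustrative lower limit
        return None
    if state == "counterclockwise" and trend >= 0:  # illustrative upper limit
        return None
    return state
```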
In a preferred embodiment, the post-processing module further includes an adjustment module for adjusting, based on the speed of change of the gesture position classification result output by the SNN, the interval Δt at which the post-processing module reads the SNN classification result, or/and, in a non-preferred embodiment, the time threshold Tθ used for logic operations in the post-processing module.
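A sketch of the adjustment module, assuming (per the description above) that a faster gesture gets a shorter read interval and a proportionally shorter time threshold Tθ = k·Δt; the inverse-proportional scaling rule and all parameter names are assumptions:

```python
def adjust_intervals(base_dt, speed, base_speed=1.0, k=3):
    """Scale the read interval delta_t and the time threshold T_theta
    inversely with gesture speed (region changes per unit time)."""
    speed = max(speed, 1e-6)             # guard against division by zero
    dt = base_dt * (base_speed / speed)  # faster gesture -> shorter interval
    return dt, k * dt                    # (delta_t, T_theta)
```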
In a preferred embodiment, the field of view of the vision sensor may be subdivided according to actual requirements. FIG. 9 is a schematic diagram of identifying trend value changes during clockwise rotation in another preferred embodiment, in which the field of view is divided into 9 areas; it may also be divided into 16 areas, etc., if a higher-resolution DVS and an SNN processor with higher processing capability can support finer subdivision. In addition, when the region position of the palm changes at an angle other than 0° or 180°, the change amount of the trend value may be set larger: in FIG. 7, the absolute value of the trend value change is 5, so the change from P3 to P4 contributes more than the change from P2 to P3. In other embodiments, the change amount may instead be set smaller for such angles, e.g. an absolute value of 2 in FIG. 7. In still other embodiments, for trend values varying in one direction (clockwise/counterclockwise), the change amount per region-position change may be set equal, or may be set proportional or inversely proportional to the number of trend value changes. For a dynamic gesture recognition task, the field of view should not be subdivided too finely: a coarse division suffices to recognize rotation gestures while saving resources and power consumption and improving processing speed.
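A sketch of how a hand centroid could be mapped onto such a subdivided field of view (the grid mapping itself is an assumption; the patent only specifies that the field of view is divided into a plurality of areas):

```python
def region_of(x, y, width, height, rows=3, cols=3):
    """Map a hand centroid (x, y) in a width x height field of view to one
    of rows*cols region indices, numbered row-major from 0. A finer grid
    (e.g. rows=cols=4 for 16 areas) only needs different rows/cols."""
    col = min(int(x * cols / width), cols - 1)   # clamp the right/bottom edge
    row = min(int(y * rows / height), rows - 1)
    return row * cols + col
```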
In a preferred embodiment, the field of view or screen range of the vision sensor is divided into 3 areas for rotation identification. The technical scheme of the invention is also suitable for sliding identification; in other embodiments, the field of view or screen range of the vision sensor is divided into a plurality of areas for rotation or sliding identification. The scheme likewise applies to hand types with 2 or more fingers.
Due to the high freedom and complexity of dynamic gestures, the behavior habits of different users differ, and even the rotation gestures of the same user vary. The invention can accurately identify and distinguish clockwise from counterclockwise rotation, and even identify how many turns are rotated in which direction, under conditions of different rotation speeds, different distances, occlusion, and the gesture temporarily leaving the screen range during rotation. The scheme is simple, low in power consumption, and good in real-time performance, and effectively solves difficult problems of complex dynamic gesture recognition such as rotation.
The invention also relates to a chip comprising an impulse neural network and a post-processing module. The impulse neural network is used for processing pulse events related to dynamic gestures to obtain classification results of the gesture/hand type and its location.
The post-processing module is coupled with the impulse neural network and used for judging whether the state of the dynamic gesture is clockwise rotation or counterclockwise rotation based on the gesture and location classification results output by the impulse neural network.
In a preferred embodiment, the chip further includes a selection module, configured to select either to output the SNN classification result directly (or after clustering), or to input the gesture and location classification results output by the SNN to the post-processing module and output the result processed by the post-processing module.
Existing neural networks that recognize dynamic gestures typically require large networks, usually with at least 7 and even more than ten convolutional layers, and a relatively large number of convolution kernels per layer. The SNN is better suited to small networks: the number of network layers or/and the total number of neurons is small, and the number of convolution kernels per layer is smaller still. The impulse neural network used in the test chip comprises a 4-layer network (3 convolutional layers and 1 fully-connected layer), with fewer than three thousand neurons in total. When the test chip performs dynamic gesture recognition, the static power consumption is about 0.1 mW; when it performs clockwise or counterclockwise rotation recognition, the dynamic power consumption is less than 10 mW (mostly 6-8 mW), and the accuracy of rotation gesture recognition is usually above 95%. In addition, since the real-time performance of the SNN is very high and the post-processing consists of simple logic operations and addition/subtraction, the processing time of clockwise or counterclockwise rotation recognition on the test chip (including the whole process of DVS sensing + SNN inference + post-processing) does not exceed 5 ms. Moreover, the test chip can accurately classify near or far gestures (reflected in imaging size): the optimal range is for palm imaging to occupy 1/16 to 1/4 of the field of view/screen, corresponding to a distance of about 0.5 m to 1.5 m, and different working distances can be obtained by changing lenses with different focal lengths.
The SNN-based gesture/hand-type and location classification combined with simple post-processing can effectively cope with the variability and complexity of dynamic gestures and backgrounds. The SNN is event-driven and its input consists of pulse events related to gesture change (the hand cannot remain perfectly still), so background information is effectively filtered out. Meanwhile, the post-processing module is simple in logic and easy to implement: on the one hand, it accurately judges changeable gestures and user behaviors without affecting system power consumption or real-time performance; on the other hand, it significantly reduces the network scale, algorithm difficulty, network training cost (the massive training data, computation cost, training time, etc. required by large-network training), hardware difficulty, and chip cost that would otherwise be needed for the SNN to track changeable dynamic gestures.
The trend value increase during clockwise rotation and the clockwise judgment process described above are examples; the counterclockwise judgment is the same, and the clockwise and counterclockwise judgment conditions are interchangeable. A person skilled in the art can change or adjust them according to actual requirements or user habits, and such technical schemes also belong to the protection scope of the invention.
Although the present invention has been described with reference to specific features and embodiments thereof, various modifications, combinations, and substitutions can be made thereto without departing from the invention. The scope of the present application is not limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification; the methods and modules may be practiced in one or more associated, interdependent, cooperating, or pre/post-stage products, methods, and systems.
The specification and drawings are, accordingly, to be regarded merely as an introduction to some embodiments of the technical solutions defined by the appended claims, and are to be construed under the doctrine of broadest reasonable interpretation, intended to cover as far as possible all modifications, changes, combinations or equivalents within the scope of the disclosure while avoiding unreasonable interpretation.
Those skilled in the art may further improve the technical solutions on the basis of the present invention in order to achieve better technical results or to meet the needs of certain applications. However, even if a partial improvement/design is creative or/and inventive, as long as it relies on the technical idea of the present invention and covers the technical features defined in the claims, the technical scheme shall also fall within the protection scope of the present invention.
The features recited in the appended claims may be presented in the form of alternative features, or recombined in the order of certain technical processes or the sequence of organization of materials. Those skilled in the art, having understood the present invention, may readily change such process sequences or organization and then employ substantially the same means to solve substantially the same technical problem and achieve substantially the same technical result; therefore, even if such modifications, changes, and substitutions are not literally recited in the appended claims, they are covered under the doctrine of equivalents.
In the foregoing description, the steps and components of the embodiments have been described generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. The various steps or modules described in connection with the embodiments disclosed herein may be implemented in hardware, software, or a combination of both; whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Those of ordinary skill in the art may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the claimed invention.
Claims (36)
1. A gesture recognition method, characterized in that:
processing pulse events related to the dynamic gestures by using a pulse neural network to obtain a classification result of the hands and the locations thereof;
post-processing the classification results of the hand type and the location thereof output by the pulse neural network through a post-processing module so as to identify dynamic gestures;
the pulse neural network obtains pulse events related to dynamic gestures based on signals acquired by the vision sensor; the visual field or screen range of the visual sensor is divided into a plurality of areas, and the location is any area in the plurality of areas;
The post-processing module updates the trend value based on the classification result of the hand shape and the location thereof output by the pulse neural network; the dynamic gesture is identified based on the updated trend value, which represents the dynamic gesture potential energy or trend.
2. The gesture recognition method of claim 1, wherein:
when the updated trend value meets the clockwise condition, the current gesture is in a clockwise rotation state;
when the updated trend value meets the anticlockwise condition, the current gesture is in an anticlockwise rotation state;
wherein the clockwise condition and the counterclockwise condition include one of:
i) Greater than or equal to the first threshold indicates clockwise rotation, and less than or equal to the second threshold indicates counterclockwise rotation;
ii) less than or equal to the first threshold indicates clockwise rotation, and greater than or equal to the second threshold indicates counterclockwise rotation.
3. The gesture recognition method of claim 2, wherein:
the first threshold value or/and the second threshold value comprises one of the following:
i) Is a numerical value;
ii) is a range comprising an upper limit or/and a lower limit.
4. A gesture recognition method according to claim 3, wherein:
the first threshold includes an upper limit or/and a lower limit, and the clockwise condition includes one of:
i) Greater than or equal to the first threshold indicates clockwise rotation, and when the updated trend value is less than or equal to the first threshold lower limit, the clockwise rotation state is exited;
ii) less than or equal to the first threshold indicates clockwise rotation, and when the updated trend value is greater than or equal to the first threshold upper limit, the clockwise rotation state is exited;
when the second threshold includes an upper limit or/and a lower limit, the counterclockwise condition includes one of:
i) Less than or equal to the second threshold indicates counterclockwise rotation, and when the updated trend value is greater than or equal to the second threshold upper limit, the counterclockwise rotation state is exited;
ii) greater than or equal to the second threshold indicates counterclockwise rotation, and when the updated trend value is less than or equal to the second threshold lower limit, the counterclockwise rotation state is exited.
5. The gesture recognition method of claim 4, wherein:
and if the updated trend value does not meet the clockwise condition or the anticlockwise condition, exiting the clockwise state or the anticlockwise state.
6. The gesture recognition method according to claim 4 or 5, characterized in that:
and when the clockwise state or the anticlockwise state is exited, resetting or initializing the trend value.
7. The gesture recognition method of claim 2, wherein:
If the region position of the same hand type is changed at least twice continuously in the clockwise direction, the current gesture is clockwise rotation;
if the position of the same hand type changes at least twice continuously in the anticlockwise direction, the current gesture is anticlockwise rotation.
8. The gesture recognition method according to any one of claims 1 to 5 and 7, wherein:
when the region position of the same hand type changes in two adjacent acquisitions, the trend value is increased or decreased.
9. The gesture recognition method of claim 8, wherein:
if the post-processing module acquires the intermediate state, the trend value is unchanged;
or the change amount of the trend value in the middle state is smaller than that of the trend value in the area position change;
the intermediate state means that the post-processing module acquires the same classification result from the impulse neural network in two adjacent acquisitions, or does not acquire an effective output from the impulse neural network;
no effective output means that the impulse neural network produces no output, or that the hand-type classifications of two successive outputs of the impulse neural network differ.
10. The gesture recognition method of claim 9, wherein:
if the number of times of continuous occurrence of a certain intermediate state is greater than or equal to two times, resetting the trend value;
If a rollback phenomenon occurs, the trend value is reset; rollback means that, with intermediate states ignored, the trend value first increases by a value and then decreases by the same value.
11. The gesture recognition method according to any one of claims 2 to 5 and 7, wherein:
the post-processing module identifies the number of clockwise or counterclockwise rotations based on the number of times the same hand type passes through the same location as its region position changes.
12. The gesture recognition method according to any one of claims 1 to 5 and 7, wherein:
the trend value is updated based on an addition or subtraction operation.
13. The gesture recognition method according to any one of claims 1 to 5 and 7, wherein:
the post-processing module acquires the output of the impulse neural network in real time, or acquires the output of the impulse neural network at intervals of fixed number of impulse neural network input impulses.
14. The gesture recognition method of claim 13, wherein:
the interval time of the post-processing module for reading the output of the impulse neural network is in direct proportion to the speed of the dynamic gesture, or the number of the input impulses of the impulse neural network, which is referred by the output of the post-processing module for reading the impulse neural network, is in direct proportion to the speed of the dynamic gesture.
15. A gesture recognition apparatus, characterized in that:
the system comprises a pulse neural network and a post-processing module;
the impulse neural network is used for processing impulse events related to the dynamic gestures to obtain a classification result of the hands and the locations thereof;
the pulse neural network obtains pulse events related to dynamic gestures based on signals acquired by the vision sensor; the visual field or screen range of the visual sensor comprises a plurality of areas, and the location is any area in the plurality of areas;
the post-processing module is coupled with the pulse neural network and used for judging the state of the dynamic gesture based on the classification result of the hand type and the location thereof output by the pulse neural network;
wherein, the post-processing module includes:
the logic operation module is used for carrying out logic operation on the classification results of the hand type and the location thereof output by the adjacent two-time pulse neural network so as to update a trend value, wherein the trend value represents dynamic gesture potential energy or trend;
and the state identification module is used for identifying the dynamic gesture state based on the updated trend value.
16. The gesture recognition apparatus of claim 15, wherein:
when the updated trend value meets the clockwise condition, the current gesture is in a clockwise rotation state;
When the updated trend value meets the anticlockwise condition, the current gesture is in an anticlockwise rotation state;
wherein the clockwise condition and the counterclockwise condition include one of:
i) Greater than or equal to the first threshold indicates clockwise rotation, and less than or equal to the second threshold indicates counterclockwise rotation;
ii) less than or equal to the first threshold indicates clockwise rotation, and greater than or equal to the second threshold indicates counterclockwise rotation.
17. The gesture recognition apparatus of claim 16, wherein:
the first threshold value or/and the second threshold value comprises one of the following:
i) Is a numerical value;
ii) is a range comprising an upper limit or/and a lower limit.
18. The gesture recognition apparatus of claim 17, wherein:
the first threshold includes an upper limit or/and a lower limit, and the clockwise condition includes one of:
i) Greater than or equal to the first threshold indicates clockwise rotation, and when the updated trend value is less than or equal to the first threshold lower limit, the clockwise rotation state is exited;
ii) less than or equal to the first threshold indicates clockwise rotation, and when the updated trend value is greater than or equal to the first threshold upper limit, the clockwise rotation state is exited;
when the second threshold includes an upper limit or/and a lower limit, the counterclockwise condition includes one of:
i) Less than or equal to the second threshold indicates counterclockwise rotation, and when the updated trend value is greater than or equal to the second threshold upper limit, the counterclockwise rotation state is exited;
ii) greater than or equal to the second threshold indicates counterclockwise rotation, and when the updated trend value is less than or equal to the second threshold lower limit, the counterclockwise rotation state is exited.
19. The gesture recognition apparatus of claim 18, wherein:
and if the updated trend value does not meet the clockwise condition or the anticlockwise condition, exiting the clockwise state or the anticlockwise state.
20. The gesture recognition apparatus of claim 16, wherein:
if the region position of the same hand type is changed at least twice continuously in the clockwise direction, the current gesture is clockwise rotation;
if the position of the same hand type changes at least twice continuously in the anticlockwise direction, the current gesture is anticlockwise rotation.
21. The gesture recognition apparatus according to any one of claims 15 to 20, wherein:
when the region position of the same hand type changes in two adjacent acquisitions, the trend value is increased or decreased.
22. The gesture recognition apparatus according to any one of claims 15 to 20, wherein:
If the post-processing module acquires the intermediate state, the trend value is unchanged;
or the change amount of the trend value in the middle state is smaller than that of the trend value in the area position change;
the intermediate state means that the post-processing module acquires the same classification result from the impulse neural network in two adjacent acquisitions, or does not acquire an effective output from the impulse neural network;
no effective output means that the impulse neural network produces no output, or that the hand-type classifications of two successive outputs of the impulse neural network differ.
23. The gesture recognition apparatus of claim 22, wherein:
if the number of times of continuous occurrence of a certain intermediate state is greater than or equal to two times, resetting the trend value;
if a rollback phenomenon occurs, the trend value is reset; rollback means that, with intermediate states ignored, the trend value first increases by a value and then decreases by the same value.
24. The gesture recognition apparatus according to any one of claims 16 to 20, wherein:
the post-processing module identifies the number of clockwise or counterclockwise rotations based on the number of times the same hand type passes through the same location as its region position changes.
25. The gesture recognition apparatus according to any one of claims 15 to 20, wherein:
The post-processing module acquires the output of the impulse neural network in real time, or acquires the output of the impulse neural network at intervals of fixed number of impulse neural network input impulses.
26. The gesture recognition device of claim 25, wherein the post-processing module further comprises:
the adjustment module is coupled between the impulse neural network and the logic operation module, and is used for adjusting the interval time for reading the impulse neural network output or adjusting the number of impulse neural network input pulses for reading the impulse neural network output reference by the post-processing module based on the speed of the dynamic gesture.
27. The gesture recognition apparatus of claim 26, wherein:
the interval time of the output of the read pulse neural network is in direct proportion to the speed of the dynamic gesture, or the number of the input pulses of the pulse neural network, which are referred by the output of the read pulse neural network, of the post-processing module is in direct proportion to the speed of the dynamic gesture.
28. The gesture recognition apparatus according to any one of claims 15 to 20, wherein:
the trend value is updated based on an addition or subtraction operation.
29. The gesture recognition apparatus according to any one of claims 15 to 20, wherein:
the vision sensor is a frame-based image sensor or a sensor in which some or all of the pixels are capable of generating a pulse event.
30. The gesture recognition apparatus according to any one of claims 15 to 20, wherein:
when the region position of the same hand type changes twice adjacently, the absolute value of the change quantity of the trend value is in direct proportion or inverse proportion to the vector included angle of the region position change of the same hand type.
31. The gesture recognition device of claim 27, further comprising:
the selection module is used for selecting and directly outputting the classification result of the impulse neural network or transmitting the classification result of the impulse neural network to the post-processing module for processing.
32. The gesture recognition apparatus according to any one of claims 15 to 20, wherein:
the number of layers of the impulse neural network is less than or equal to 6 layers, or the total number of neurons in the impulse neural network is less than five thousand.
33. A chip comprising a gesture recognition apparatus according to any one of claims 15 to 32.
34. The chip of claim 33, wherein:
The chip is a brain-like chip or a neuromorphic chip.
35. An electronic device comprising gesture recognition apparatus according to any one of claims 15 to 32, or comprising a chip according to any one of claims 33 to 34.
36. The electronic device of claim 35, wherein:
the electronic device includes a vision sensor coupled with a pulsed neural network processor;
the impulse neural network processor obtains impulse events related to dynamic gestures based on signals acquired by the vision sensors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310297117.XA CN116030535B (en) | 2023-03-24 | 2023-03-24 | Gesture recognition method and device, chip and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116030535A CN116030535A (en) | 2023-04-28 |
CN116030535B true CN116030535B (en) | 2023-06-20 |
Family
ID=86079772
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310297117.XA Active CN116030535B (en) | 2023-03-24 | 2023-03-24 | Gesture recognition method and device, chip and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116030535B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116343342B (en) * | 2023-05-30 | 2023-08-04 | 山东海量信息技术研究院 | Sign language recognition method, system, device, electronic equipment and readable storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111695011A (en) * | 2020-06-16 | 2020-09-22 | 清华大学 | Tensor expression-based dynamic hypergraph structure learning classification method and system |
CN113902106A (en) * | 2021-12-06 | 2022-01-07 | 成都时识科技有限公司 | Pulse event decision device, method, chip and electronic equipment |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111417963B (en) * | 2018-11-01 | 2021-06-22 | P·A·范德梅德 | Improved spiking neural network |
EP3926544B1 (en) * | 2020-06-18 | 2024-03-13 | Tata Consultancy Services Limited | System and method of gesture recognition using a reservoir based convolutional spiking neural network |
CN115115526A (en) * | 2021-03-19 | 2022-09-27 | 阿里巴巴新加坡控股有限公司 | Image processing method and apparatus, storage medium, and graphic calculation processor |
CN113326918A (en) * | 2021-04-29 | 2021-08-31 | 杭州微纳核芯电子科技有限公司 | Feature extraction circuit, neural network, system, integrated circuit, chip and device |
CN113205048B (en) * | 2021-05-06 | 2022-09-09 | 浙江大学 | Gesture recognition method and system |
CN115546248A (en) * | 2021-06-30 | 2022-12-30 | 华为技术有限公司 | Event data processing method, device and system |
CN113516676B (en) * | 2021-09-14 | 2021-12-28 | 成都时识科技有限公司 | Angular point detection method, impulse neural network processor, chip and electronic product |
CN114816057A (en) * | 2022-04-20 | 2022-07-29 | 广州瀚信通信科技股份有限公司 | Somatosensory intelligent terminal interaction method, device, equipment and storage medium |
CN114998996B (en) * | 2022-06-14 | 2024-04-05 | 中国电信股份有限公司 | Signal processing method, device and equipment with motion attribute information and storage |
Also Published As
Publication number | Publication date |
---|---|
CN116030535A (en) | 2023-04-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||