CN113127834A - Barrier-free man-machine identification verification method and system - Google Patents

Barrier-free man-machine identification verification method and system

Info

Publication number
CN113127834A
Authority
CN
China
Prior art keywords
gesture
user
voice indication
indicated
recognized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110523264.5A
Other languages
Chinese (zh)
Inventor
李哲
李若愚
陈弢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110523264.5A
Publication of CN113127834A
Legal status: Pending

Classifications

    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods (neural networks)
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Multimedia (AREA)
  • Computer Hardware Design (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure provides a barrier-free man-machine identification verification method, which includes: receiving a voice indication of a gesture that a user is to make; recognizing the gesture made by the user based on the voice indication; determining whether the recognized gesture matches the indicated gesture; and determining that the user is authenticated based on the recognized gesture matching the indicated gesture.

Description

Barrier-free man-machine identification verification method and system
Technical Field
The present disclosure relates generally to human-machine recognition, and more particularly to barrier-free human-machine recognition.
Background
Verification codes (CAPTCHAs) are widely used on the mobile internet to perform man-machine identification. The mainstream verification code products currently on the market take forms such as image recognition, character recognition, and pattern sliding. However, these mainstream verification modes are difficult for visually impaired people to complete, because they usually rely on the user's visual ability, so visually impaired users cannot finish verification independently. This degrades the user experience for visually impaired people.
On the other hand, when the user is exercising, riding, or driving, verification with the mainstream verification code products currently on the market is also unsuitable, because verification should be possible without shifting the line of sight to the phone or device screen.
Therefore, there is a need in the art for a scheme that enables visually impaired people, or other people in need, to complete verification successfully while ensuring security, so as to improve the user's sense of safety and experience.
Disclosure of Invention
To solve this technical problem, the present disclosure provides a barrier-free human-machine identification verification scheme that enables visually impaired people or other people in need to complete verification successfully, improving user safety and experience.
In an embodiment of the present disclosure, a barrier-free human-machine identification verification method is provided, including: receiving a voice indication of a gesture that a user is to make; recognizing the gesture made by the user based on the voice indication; determining whether the recognized gesture matches the indicated gesture; and determining that the user is authenticated based on the recognized gesture matching the indicated gesture.
In another embodiment of the present disclosure, the gesture made by the user based on the voice indication is a gesture made by the user in a contactless interaction scenario.
In yet another embodiment of the present disclosure, the gesture made by the user based on the voice indication is a gesture made by the user freehand.
In another embodiment of the present disclosure, the gesture made by the user based on the voice indication is a gesture made by the user in a contact interaction scenario.
In yet another embodiment of the present disclosure, the gesture made by the user based on the voice indication is a gesture made by the user holding a cell phone.
In another embodiment of the present disclosure, the gesture made by the user based on the voice indication is a gesture made by the user through a wearable device.
In yet another embodiment of the present disclosure, the gesture made by the user based on the voice indication is a static gesture.
In another embodiment of the present disclosure, the gesture made by the user based on the voice indication is a dynamic gesture.
In yet another embodiment of the present disclosure, determining whether the recognized gesture matches the indicated gesture is based on a template matching method.
In another embodiment of the present disclosure, determining whether the recognized gesture matches the indicated gesture is based on a gesture dynamic trajectory recognition method.
In yet another embodiment of the present disclosure, determining whether the recognized gesture matches the indicated gesture is based on a key frame template matching method.
In an embodiment of the present disclosure, a barrier-free human-machine identification verification system is provided, including: a receiving module that receives a voice indication of a gesture that a user is to make; a gesture recognition module that recognizes the gesture made by the user based on the voice indication; a match determination module that determines whether the recognized gesture matches the indicated gesture and determines that the user is authenticated based on the recognized gesture matching the indicated gesture; and an output module that outputs the verification determination result.
In another embodiment of the present disclosure, the gesture made by the user based on the voice indication is a gesture made by the user in a contactless interaction scenario.
In yet another embodiment of the present disclosure, the gesture made by the user based on the voice indication is a gesture made by the user in a contact interaction scenario.
In another embodiment of the present disclosure, the gesture made by the user based on the voice indication is a static gesture.
In yet another embodiment of the present disclosure, the gesture made by the user based on the voice indication is a dynamic gesture.
In an embodiment of the disclosure, a computer-readable storage medium is provided that stores instructions that, when executed, cause a machine to perform the method as previously described.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Drawings
The foregoing summary, as well as the following detailed description of the present disclosure, will be better understood when read in conjunction with the appended drawings. It is to be noted that the appended drawings are intended as examples of the claimed invention. In the drawings, like reference characters designate the same or similar elements.
FIG. 1 is a flow chart illustrating a barrier-free human-machine identification verification method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a barrier-free human-machine identification verification process with the user in contact human-machine interaction, according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating a barrier-free human-machine identification verification process with the user in contactless human-machine interaction, according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating a barrier-free human-machine recognition verification process in the case of recognizing a static gesture, according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating a barrier-free human-machine recognition verification process in the case of recognizing a dynamic gesture, according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram illustrating a barrier-free human-machine identification verification framework in accordance with an embodiment of the present disclosure;
FIG. 7 is a block diagram illustrating a barrier-free human-machine identification verification system according to an embodiment of the present disclosure.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, embodiments accompanying the present disclosure are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein, and thus the present disclosure is not limited to the specific embodiments disclosed below.
In the traditional man-machine identification process, verification code operations such as image recognition, character recognition, and graphic sliding are essentially inseparable from the user's hand-eye coordination.
However, in modern life, many scenes are not suitable for human-computer identification by using traditional verification codes. For example, visually impaired people have difficulty reading images or text on the human-computer interface due to lack of or defective vision. As another example, authentication may be required while driving a car, where the driver must move his or her eyes away from the road if interacting with the touch screen display, creating a potential hazard.
In order to improve user experience and guarantee user safety, a more friendly and safer human-computer identification verification scheme needs to be introduced.
The barrier-free man-machine identification verification method of the present disclosure is a verification scheme based on the user's posture, which enables visually impaired people or other people in need to complete verification successfully while ensuring security.
In the present disclosure, the user's gestures will be described as an example, but those skilled in the art can understand that the technical solution of the present disclosure is also applicable to other postures of the user, such as a dance posture, a gait posture, and the like, which are not described in detail herein.
FIG. 1 is a flow chart illustrating a barrier-free human-machine identification verification method 100 according to an embodiment of the present disclosure.
At 102, a voice indication of a gesture that the user is to make is received.
When the user is unable, or finds it inconvenient, to perform the verification operation through the human-machine interface, the APP or platform gives a voice indication of the gesture the user needs to make, for example, "please hold the mobile phone and draw a triangle in the air" or "please draw a figure 8 in the air with your free hand".
The device then receives and captures the voice indication and analyzes it to obtain the gesture it indicates. Those skilled in the art will appreciate that the acquisition and analysis of the voice indication during this interaction may be performed using speech recognition and natural language processing techniques. Moreover, the technical solution of the present disclosure may incorporate new information processing technologies, which are not described in detail herein.
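By way of illustration only (and not as part of the disclosed embodiments), the following Python sketch maps a voice instruction, already transcribed by a speech-recognition engine, to the gesture label it indicates; the keyword table and the function name parse_indicated_gesture are assumptions introduced for this example.

```python
# Illustrative only: map a transcribed voice indication to the indicated gesture label.
# GESTURE_KEYWORDS and parse_indicated_gesture are hypothetical names, not from the patent.
import re
from typing import Optional

GESTURE_KEYWORDS = {
    "triangle": "TRIANGLE",
    "figure 8": "FIGURE_8",
    "rabbit": "RABBIT",
    "bird": "BIRD",
}

def parse_indicated_gesture(instruction_text: str) -> Optional[str]:
    """Return the gesture label referenced by the voice indication, if any."""
    text = instruction_text.lower()
    for keyword, label in GESTURE_KEYWORDS.items():
        if re.search(re.escape(keyword), text):
            return label
    return None

# Example: parse_indicated_gesture("Please hold the phone and draw a triangle in the air")
# returns "TRIANGLE".
```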
At 104, a gesture made by the user based on the voice indication is recognized.
The user will make a gesture based on the voice indication, e.g. the gestures listed above. The user's gestures may be made by way of contact or contactless interaction.
Contact interaction is performed through wearable devices (e.g., data gloves), cell phones, touch screens, and the like. Contactless interaction is performed through body motion or freehand gestures. In one embodiment of the present disclosure, a user holding a mobile phone draws a triangle in the air as a contact interaction, and the gesture-related data can be captured by the phone's built-in sensors (e.g., inertial sensors). In another embodiment of the present disclosure, the user draws a figure 8 in the air freehand as a contactless interaction, and the gesture-related data can be collected by sensors, cameras, radar, and the like.
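For illustration, a sketch of buffering inertial-sensor samples while the user draws such a gesture might look as follows; read_imu_sample is a hypothetical stand-in for a platform sensor API, and the six channels and 100 Hz rate are assumptions rather than values from the disclosure.

```python
# Illustrative buffering of inertial-sensor samples while the user draws a gesture in the air.
# read_imu_sample is a hypothetical callable returning one (ax, ay, az, gx, gy, gz) tuple.
import time

def capture_gesture_window(read_imu_sample, duration_s=3.0, rate_hz=100):
    samples = []
    period = 1.0 / rate_hz
    t_end = time.time() + duration_s
    while time.time() < t_end:
        samples.append(read_imu_sample())  # one accelerometer + gyroscope reading
        time.sleep(period)
    return samples                          # time series handed to gesture recognition
```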
After the gesture-related data is captured, the appropriate kind of gesture recognition is performed, i.e., static gesture recognition or dynamic gesture recognition. These are described in detail below with reference to FIG. 4 and FIG. 5, respectively.
At 106, it is determined whether the recognized gesture matches the indicated gesture.
After the recognized gesture is obtained at 104, it is determined whether the recognized gesture matches the indicated gesture. Such a match determination may be a similarity determination, i.e., whether the degree of similarity between the recognized gesture and the indicated gesture is within an acceptable threshold; a classification determination, i.e., whether the recognized gesture belongs to the same category as the indicated gesture; or a trajectory determination, i.e., whether the trajectory of the recognized gesture fits that of the indicated gesture.
Those skilled in the art will appreciate that the match determination can be performed using different techniques as desired, and that new match determination techniques can be incorporated.
At 108, it is determined that the user is authenticated based on the recognized gesture matching the indicated gesture.
Based on the match determination at 106, if the recognized gesture matches the indicated gesture, the user is determined to be authenticated.
In an embodiment of the present disclosure, the user is determined to be authenticated when the similarity between the recognized gesture and the indicated gesture is within an acceptable threshold; in another embodiment, when the recognized gesture and the indicated gesture belong to the same category; and in yet another embodiment, when the trajectory of the recognized gesture fits that of the indicated gesture.
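A minimal sketch of these three decision strategies, with purely illustrative threshold values, could look like this:

```python
# Illustrative decision logic covering the three match-determination strategies above:
# similarity threshold, same-category check, and trajectory fit. Thresholds are assumptions.
def is_verified(recognized_label, indicated_label,
                similarity=None, sim_threshold=0.8,
                trajectory_distance=None, traj_threshold=0.5):
    if similarity is not None:                   # similarity determination
        return similarity >= sim_threshold
    if trajectory_distance is not None:          # trajectory determination (e.g., a DTW distance)
        return trajectory_distance <= traj_threshold
    return recognized_label == indicated_label   # classification determination
```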
Thus, the barrier-free man-machine identification verification method does not rely on the user's visual ability, which makes it convenient and safe for visually impaired people and other people in need. Meanwhile, recognizing the user's in-air gesture effectively prevents the risk of automated operation by machine scripts.
FIG. 2 is a schematic diagram illustrating a process of barrier-free human-machine identification verification with a user in contact human-machine interaction according to an embodiment of the present disclosure. FIG. 3 is a schematic diagram illustrating a process of barrier-free human-machine identification verification with a user in contactless human-machine interaction according to an embodiment of the present disclosure.
Gestures are the various shapes and actions produced by the hands or arms that express a certain meaning or idea. For example, sign language is a way for hearing- and speech-impaired people to communicate with others. Human-computer interaction may take place through a textual user interface, a Graphical User Interface (GUI), a Natural User Interface (NUI), and so on, and can be divided into contact interaction and contactless interaction.
The contact interaction described in the present disclosure does not refer to direct input by touching a keyboard, a mouse, or a screen, that is, it does not refer to interaction through a graphical user interface (GUI). Rather, the human-computer interaction described in this disclosure relates to a natural user interface (NUI), and it is divided into contact and contactless interaction according to how the interaction information is acquired.
As shown in fig. 2, contact interaction is typically based on wearable devices using multiple sensors (e.g., data gloves), built-in sensors of cell phones (e.g., inertial sensors, accelerometers), multi-touch screens, and the like.
As shown in FIG. 3, contactless interaction is typically based on optical cameras, radar detection, color-marker interaction, markerless interaction, and the like. For example, using a dedicated camera such as a structured-light or time-of-flight (TOF) camera, a short-range depth map of the scene can be generated and used to build an approximate 3D representation of what the camera sees. Because of the close-range capability of such cameras, the user's gestures or postures can be detected effectively. As another example, micro radar may be used to capture subtle gesture motions. Micro radar mainly images by the reflection of radio waves: the device emits radio waves from its antenna and receives the reflected waves, and by comparing the emitted and reflected waves it obtains the user's position and velocity, so that the user's movement can be captured in fine detail.
In the embodiment shown in FIG. 2, when the user is exercising with a wearable device, or is riding while holding a mobile phone, it is inconvenient for the user to interact with a graphical user interface (GUI).
Thus, a voice indication of the gesture the user is to make is first received. The user then uses the wearable device for contact interaction. Data detected by the wearable device's sensors allow the APP or platform to recognize the gesture made by the user based on the voice indication and to determine whether the recognized gesture matches the indicated gesture. The user is then determined to be authenticated based on the recognized gesture matching the indicated gesture.
In the embodiment shown in FIG. 3, the user is exercising with bare hands, in which case it is inconvenient to interact using a graphical user interface (GUI).
Thus, a voice indication of the gesture the user is to make is first received. The user then performs the contactless interaction freehand. For example, data detected by a TOF camera allow the APP or platform to recognize the gesture made by the user based on the voice indication and to determine whether the recognized gesture matches the indicated gesture. The user is then determined to be authenticated based on the recognized gesture matching the indicated gesture.
In this field, gestures are divided into static gestures and dynamic gestures. A static gesture is the posture of the hand at a single point in time, including the hand shape, orientation, the number of fingers shown, the bending of the joints, and so on; it corresponds to a single point in space. A dynamic gesture is a sequence of hand postures over a period of time, a continuous course of action, and corresponds to a continuous sequence in space.
The barrier-free human-machine recognition verification process for static gestures or dynamic gestures will be described below with reference to FIG. 4 and FIG. 5, respectively.
FIG. 4 is a schematic diagram illustrating a barrier-free human-machine recognition verification process in the case of recognizing a static gesture according to an embodiment of the present disclosure.
In the embodiment shown in FIG. 4, the voice indication is "please make a rabbit gesture with both hands". When the user makes the gesture, the APP or platform performs target capture.
To recognize the user's gesture, gesture segmentation is first performed, i.e., the gesture is separated from the background. The computer collects the gesture information together with the surrounding scene information, and the hand is separated from the background using edge detection.
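As one possible illustration of this segmentation step, the following Python sketch combines a rough skin-colour threshold with edge detection using OpenCV; the HSV bounds are illustrative assumptions, not values from the disclosure.

```python
# Illustrative hand segmentation: coarse skin-colour mask in HSV space plus Canny edges.
# The HSV bounds below are rough example values and would need tuning in practice.
import cv2
import numpy as np

def segment_hand(frame_bgr):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower, upper = np.array([0, 30, 60]), np.array([20, 150, 255])
    mask = cv2.inRange(hsv, lower, upper)          # coarse skin-colour mask
    mask = cv2.medianBlur(mask, 5)                 # suppress speckle noise
    edges = cv2.Canny(mask, 50, 150)               # hand contour via edge detection
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    hand = max(contours, key=cv2.contourArea) if contours else None
    return mask, edges, hand
```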
Gesture modeling is then performed. The choice of model depends on the specific application; if natural human-computer interaction is to be achieved, a fine and effective gesture model must be established so that the recognition system can react correctly to most gestures made by the user. There are various ways to model gestures, for example, appearance-based gesture modeling and 3D-model-based gesture modeling.
Appearance-based gesture modeling classifies gestures by the number of extended fingers and the angles between them, enabling fast recognition under rotation and scaling. 3D-model-based gesture modeling first synthesizes a 3D model of the human body, then adjusts the model parameters until the model projects the same visual image as the real body, and then analyzes the body posture. Appearance-based modeling must account for lighting and the colors of other body parts, and factors such as occlusion easily cause recognition errors; 3D-model-based modeling, on the other hand, uses relatively complex models. In an embodiment of the present disclosure, depth image information is combined with apparent gesture features, achieving both the recognition speed of appearance-based modeling and the recognition accuracy of 3D-model-based modeling.
Gesture analysis follows. Gesture analysis estimates the parameters of the selected gesture model and typically includes feature detection and parameter estimation. In the feature detection process, the position of the human hand must first be determined. Depending on the cues used, localization techniques can be classified as color-based, motion-based, multimodal, and so on. Those skilled in the art will appreciate that different localization techniques may be employed as needed.
Matching or recognition of the gesture is then performed. In the case of the two-handed rabbit gesture of the disclosed embodiment, a static gesture is recognized and the recognition result is output. Static gesture recognition is generally implemented with template matching: the similarity between actions is computed without considering their continuity in time, which can be treated as a classification problem (selecting one of the preset actions) or a regression problem (similarity to a certain standard action). For example, the captured gesture is compared in real time with the predefined gestures in a template library using a similarity criterion, and the gesture template with the maximum similarity is selected.
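Purely as an illustration of such template matching, the sketch below scores a feature vector extracted from the captured gesture against a small template library; the feature extraction, the cosine-similarity criterion and the 0.8 threshold are assumptions rather than parts of the disclosed method.

```python
# Illustrative static-gesture template matching: pick the library template whose feature
# vector is most similar to the captured gesture's feature vector (cosine similarity).
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def match_static_gesture(feature_vec, template_library, min_similarity=0.8):
    """template_library: dict mapping gesture label -> template feature vector."""
    best_label, best_sim = None, -1.0
    for label, template in template_library.items():
        sim = cosine_similarity(feature_vec, template)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return (best_label, best_sim) if best_sim >= min_similarity else (None, best_sim)
```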
FIG. 5 is a schematic diagram illustrating a barrier-free human-machine recognition verification process in the case of recognizing a dynamic gesture according to an embodiment of the present disclosure.
Unlike static gestures, dynamic gestures involve temporal as well as spatial context. Most dynamic gestures are modeled as a trajectory in the gesture model space. Differences in speed and proficiency among users making the same gesture can cause large deviations between the actual motion trajectory and the model trajectory.
To mitigate this problem, different dynamic gesture recognition techniques may be employed as needed, e.g., techniques based on Hidden Markov Models (HMM), on Dynamic Time Warping (DTW), or on compressed timelines. One embodiment of the present disclosure employs HMMs, which provide automatic segmentation and classification capabilities.
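As an illustration of the trajectory-comparison idea, a plain dynamic-time-warping sketch is shown below; it tolerates speed differences between users when comparing a captured trajectory with a model trajectory, and is not taken from the disclosure itself.

```python
# Illustrative dynamic time warping (DTW): align a captured gesture trajectory with a model
# trajectory so that speed differences between users do not dominate the comparison.
import numpy as np

def dtw_distance(traj_a, traj_b):
    """traj_a, traj_b: sequences of 2D or 3D points; returns a length-normalised cost."""
    n, m = len(traj_a), len(traj_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(np.asarray(traj_a[i - 1]) - np.asarray(traj_b[j - 1]))
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)
```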
In the embodiment shown in FIG. 5, the voice indication is "please make a bird gesture with both hands, continuously flapping the wings". When the user makes the gesture, the APP or platform performs target capture.
To recognize the user's dynamic gesture, gesture segmentation is likewise performed first.
Dynamic gesture modeling is then performed. Appearance-based and 3D-model-based gesture models may also be employed here. In an embodiment of the present disclosure, the motion-parameter appearance model adopted for dynamic gesture modeling takes the spatiotemporal relationships of the model into account, and the selectable motion parameters include translation, rotation, deformation, orientation, and the like.
Gesture analysis follows. Because dynamic gesture analysis requires more accurate localization, a multimodal localization technique can be adopted, i.e., fusing motion and color information to locate the human hand. Those skilled in the art will appreciate that different localization techniques may be employed as needed.
Matching or recognition of the gesture is then performed. In the case of the two-handed bird gesture of this embodiment, a dynamic gesture is recognized and the recognition result is output. Temporal consistency must be considered for a dynamic gesture, so either the dynamic trajectory of the gesture is recognized, or key frames are extracted from the continuous gesture sequence and then classified or regressed. Accordingly, classification or regression can be performed, for example, with a dynamic skeleton-based action recognition framework (ST-GCN), or with key-frame template matching, which is essentially an extension of static gesture matching.
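The key-frame route can be sketched as follows; the motion-based key-frame selection and the per-frame matcher are illustrative assumptions that reuse the static matching idea above rather than describing the patented method.

```python
# Illustrative key-frame template matching for a dynamic gesture: pick the frames with the
# largest inter-frame motion as key frames, then score each against a static template.
import numpy as np

def extract_key_frames(feature_sequence, num_key_frames=5):
    feats = np.asarray(feature_sequence)                       # (T, D) per-frame features
    motion = np.linalg.norm(np.diff(feats, axis=0), axis=1)    # frame-to-frame change
    idx = np.sort(np.argsort(motion)[-num_key_frames:]) + 1    # frames after the largest changes
    return feats[idx]

def match_dynamic_gesture(feature_sequence, key_frame_templates, matcher):
    key_frames = extract_key_frames(feature_sequence, len(key_frame_templates))
    scores = [matcher(kf, tpl) for kf, tpl in zip(key_frames, key_frame_templates)]
    return float(np.mean(scores))                              # overall match score
```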
FIG. 6 is a schematic diagram illustrating a barrier-free human-machine identification verification framework according to an embodiment of the present disclosure.
In natural human-computer interaction, data are collected by sensors (for example, a phone's built-in sensors or a camera). Many mobile applications transfer such data to the cloud for remote processing, but the transfer is highly latency-sensitive, and the high sampling frequency of sensors such as accelerometers and gyroscopes makes the data volume difficult to transmit.
Thus, in an embodiment of the present disclosure, DeepSense, a deep learning framework running on the end device, is adopted. It processes the sensor data locally and applies a deep learning model to the data, for example a convolutional neural network or a gated recurrent neural network, without uploading to the cloud. Such local processing also meets the need for privacy protection.
As shown in FIG. 6, an embodiment of the present disclosure employs a DeepSense-based barrier-free human-machine identification verification framework. In the description of FIG. 6, a single sensor is taken as an example; those skilled in the art will appreciate that the data input to the framework may come from multiple sensors.
Data acquisition is performed first. A single sensor, such as a motion sensor, may provide multi-dimensional measurement data, which is acquired at intervals (i.e., as a time series).
The acquired data then enter the data segmentation layer. For each dimension, the data are divided into time windows, and the data of each window are Fourier-transformed into f frequency components (each with magnitude and phase), so that the data of each window form a d × 2f matrix, where d is the number of measurement dimensions.
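A minimal numpy sketch of this segmentation step, with an illustrative window count and shapes, is given below.

```python
# Illustrative data segmentation: split each sensor dimension into time windows and Fourier-
# transform each window, keeping magnitude and phase, to obtain one d x 2f matrix per window.
import numpy as np

def segment_and_transform(sensor_data, num_windows=10):
    """sensor_data: array of shape (d, T) -- d measurement dimensions over T time steps."""
    d, T = sensor_data.shape
    win_len = T // num_windows
    windows = []
    for w in range(num_windows):
        chunk = sensor_data[:, w * win_len:(w + 1) * win_len]                   # (d, win_len)
        spectrum = np.fft.rfft(chunk, axis=1)                                   # f components per dim
        feat = np.concatenate([np.abs(spectrum), np.angle(spectrum)], axis=1)   # (d, 2f)
        windows.append(feat)
    return np.stack(windows)                                                    # (num_windows, d, 2f)
```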
The data matrices of the time windows then enter the single-sensor data convolution layer (i.e., the first convolutional neural network), which consists of several convolution filters. First, a two-dimensional convolution filter captures the interactions between dimensions within a window. One or more one-dimensional convolution filters then capture the interactions between windows. Finally, the filter outputs are flattened to produce a sensor feature vector. The data matrices of each of the K sensors are fed into the single-sensor data convolution layer, forming a K-row matrix.
The K-row matrix then enters the multi-sensor data integration layer (i.e., the second convolutional neural network), which may have the same structure as the single-sensor data convolution layer or a different one. The multi-sensor data integration layer yields T combined sensor feature vectors, one per time window, each capturing the within-window interactions across sensors.
Next, the T combined sensor feature vectors are input to a recurrent neural network (RNN) layer to learn the relationships across time windows. The RNN layer may employ, for example, Gated Recurrent Units (GRU) or long short-term memory networks (LSTM). In one embodiment of the present disclosure, the DeepSense-based barrier-free human-machine recognition verification framework uses a two-layer stacked GRU structure, which can run incrementally as new time windows arrive and thus process streaming data more quickly. A single-layer GRU structure may also be used.
Finally, the output of the recurrent neural network layer enters the classification output layer to generate the classification output, forming the recognition result.
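A much simplified PyTorch sketch of such a pipeline is shown below for illustration; it replaces the one-dimensional cross-window convolutions and the exact DeepSense fusion scheme with simpler layers, and every size (d, 2f, number of sensors, hidden width, 100 output categories) is an assumption rather than a parameter of the disclosed framework.

```python
# Simplified, illustrative DeepSense-style model: per-sensor conv block, sensor fusion,
# two-layer stacked GRU over time windows, and a classification head.
import torch
import torch.nn as nn

class DeepSenseSketch(nn.Module):
    def __init__(self, d=6, two_f=20, num_sensors=2, num_classes=100, hidden=64):
        # d and two_f must match the d x 2f window matrices produced by the segmentation step
        super().__init__()
        self.sensor_conv = nn.Sequential(            # single-sensor convolution per window
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),                             # -> per-window sensor feature vector
        )
        sensor_feat = 16 * d * two_f
        self.fusion = nn.Sequential(                  # multi-sensor integration per window
            nn.Linear(num_sensors * sensor_feat, hidden), nn.ReLU())
        self.gru = nn.GRU(hidden, hidden, num_layers=2, batch_first=True)  # stacked GRU
        self.head = nn.Linear(hidden, num_classes)    # score per gesture pattern category

    def forward(self, x):
        # x: (batch, T, K, d, 2f) -- T windows, K sensors, one d x 2f spectrum per window
        b, T, K, d, f2 = x.shape
        per = self.sensor_conv(x.reshape(b * T * K, 1, d, f2))      # (b*T*K, sensor_feat)
        fused = self.fusion(per.reshape(b, T, K * per.shape[-1]))   # (b, T, hidden)
        out, _ = self.gru(fused)                                    # relations across windows
        return self.head(out[:, -1])                                # (b, num_classes)
```

At inference time, the highest-scoring category would be taken as the recognized gesture pattern.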
The DeepSense-based barrier-free man-machine recognition verification framework shown in FIG. 6 needs to be trained. An example training process is as follows (an illustrative training sketch is given after the list):
1. Set up a candidate graphic set: design an optional set of graphics including digits, letters, simple geometric figures, simple Chinese characters, and the like; more than 100 candidate graphics are usually needed;
2. Select more than 20 data collectors, each of whom holds a mobile phone and draws the graphics in the candidate set in the air in turn. While a single graphic is drawn, the sensor data sequence (e.g., accelerometer and gyroscope) and the corresponding graphic category are collected; this process may be repeated more than 10 times;
3. Clean the training data: the sensor data sequence of each graphic from a single collection pass serves as one training sample, with the corresponding graphic category as its label, yielding at least 20 × 100 × 10 samples, i.e., 20,000 training samples;
4. Train the DeepSense-based barrier-free man-machine identification verification framework shown in FIG. 6 on this data set;
5. When training is complete, a model is obtained whose input is a sequence of sensor data and whose output is a score for each gesture pattern category.
FIG. 7 is a block diagram illustrating a barrier-free human-machine identification verification system according to an embodiment of the present disclosure.
The barrier-free human-machine identification verification system shown in FIG. 7 includes a receiving module 702, a gesture recognition module 704, a match determination module 706, and an output module 708.
The receiving module 702 receives a voice indication of a gesture that the user is to make.
When the user is unable, or finds it inconvenient, to perform the verification operation through the human-machine interface, the APP or platform gives a voice indication of the gesture the user needs to make. The receiving module 702 receives and captures the voice indication and analyzes it to obtain the indicated gesture. Those skilled in the art will appreciate that the acquisition and analysis of the voice indication during this interaction may be performed using speech recognition and natural language processing techniques.
Gesture recognition module 704 recognizes a gesture made by the user based on the voice indication.
The user will make a gesture based on the voice indication, and the gesture may be made by way of contact or contactless interaction. After the gesture-related data is captured, gesture recognition module 704 performs the appropriate kind of gesture recognition, i.e., static or dynamic gesture recognition.
The match determination module 706 determines whether the recognized gesture matches the indicated gesture.
After obtaining the gesture recognized by gesture recognition module 704, match determination module 706 determines whether the recognized gesture matches the indicated gesture. Such a match determination may be a similarity determination, i.e., whether the degree of similarity between the recognized gesture and the indicated gesture is within an acceptable threshold; a classification determination, i.e., whether the recognized gesture belongs to the same category as the indicated gesture; or a trajectory determination, i.e., whether the trajectory of the recognized gesture fits that of the indicated gesture.
Based on this match determination, the match determination module 706 determines that the user is authenticated if the recognized gesture matches the indicated gesture.
The output module 708 outputs the verification determination result of the match determination module 706.
Thus, the barrier-free man-machine identification verification system does not rely on the user's visual ability, making it convenient and safe for visually impaired people and other people in need. Meanwhile, recognizing the user's in-air gesture effectively prevents the risk of automated operation by machine scripts.
The various steps and modules of the barrier-free human recognition verification method and system described above may be implemented in hardware, software, or a combination thereof. If implemented in hardware, the various illustrative steps, modules, and circuits described in connection with the present invention may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other programmable logic component, hardware component, or any combination thereof. A general purpose processor may be a processor, microprocessor, controller, microcontroller, or state machine, among others. If implemented in software, the various illustrative steps, modules, etc. described in connection with the present invention may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Software modules implementing the various operations of the present invention may reside in storage media such as RAM, flash memory, ROM, EPROM, EEPROM, registers, hard disk, a removable disk, a CD-ROM, cloud storage, etc. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium, and execute the corresponding program modules to perform the steps of the present invention. Furthermore, software-based embodiments may be uploaded, downloaded, or accessed remotely through suitable communication means. Such suitable communication means include, for example, the internet, the world wide web, an intranet, software applications, cable (including fiber optic cable), magnetic communication, electromagnetic communication (including RF, microwave, and infrared communication), electronic communication, or other such communication means.
It is also noted that the embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged.
The disclosed methods, apparatus, and systems should not be limited in any way. Rather, the invention encompasses all novel and non-obvious features and aspects of the various disclosed embodiments, both individually and in various combinations and sub-combinations with each other. The disclosed methods, apparatus, and systems are not limited to any specific aspect or feature or combination thereof, nor do any of the disclosed embodiments require that any one or more specific advantages be present or that a particular or all technical problem be solved.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes may be made in the embodiments without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (17)

1. A barrier-free man-machine identification verification method, comprising:
receiving a voice indication of a gesture that a user is to make;
recognizing the gesture made by the user based on the voice indication;
determining whether the recognized gesture matches the indicated gesture; and
determining that the user is authenticated based on the recognized gesture matching the indicated gesture.
2. The method of claim 1, wherein the gesture made by the user based on the voice indication is a gesture made by the user in a contactless interaction scenario.
3. The method of claim 2, wherein the gesture made by the user based on the voice indication is a gesture made by the user freehand.
4. The method of claim 1, wherein the gesture made by the user based on the voice indication is a gesture made by the user in a contact interaction scenario.
5. The method of claim 4, wherein the gesture made by the user based on the voice indication is a gesture made by the user holding a cell phone.
6. The method of claim 4, wherein the gesture made by the user based on the voice indication is a gesture made by the user through a wearable device.
7. The method of claim 1, wherein the gesture made by the user based on the voice indication is a static gesture.
8. The method of claim 1, wherein the gesture made by the user based on the voice indication is a dynamic gesture.
9. The method of claim 7, wherein determining whether the recognized gesture matches the indicated gesture is based on a template matching method.
10. The method of claim 8, wherein determining whether the recognized gesture matches the indicated gesture is based on a gesture dynamic trajectory recognition method.
11. The method of claim 8, wherein determining whether the recognized gesture matches the indicated gesture is based on a key frame template matching method.
12. A barrier-free man-machine identification verification system, comprising:
a receiving module that receives a voice indication of a gesture that a user is to make;
a gesture recognition module that recognizes a gesture made by the user based on the voice indication;
a match determination module that determines whether the recognized gesture matches the indicated gesture and determines that the user is authenticated based on the recognized gesture matching the indicated gesture; and
an output module that outputs the verification determination result.
13. The system of claim 12, wherein the gesture made by the user based on the voice indication is a gesture made by the user in a contactless interaction scenario.
14. The system of claim 12, wherein the gesture made by the user based on the voice indication is a gesture made by the user in a contact interaction scenario.
15. The system of claim 12, wherein the gesture made by the user based on the voice indication is a static gesture.
16. The system of claim 12, wherein the gesture made by the user based on the voice indication is a dynamic gesture.
17. A computer-readable storage medium having stored thereon instructions that, when executed, cause a machine to perform the method of any of claims 1-11.
CN202110523264.5A 2021-05-13 2021-05-13 Barrier-free man-machine identification verification method and system Pending CN113127834A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110523264.5A CN113127834A (en) 2021-05-13 2021-05-13 Barrier-free man-machine identification verification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110523264.5A CN113127834A (en) 2021-05-13 2021-05-13 Barrier-free man-machine identification verification method and system

Publications (1)

Publication Number Publication Date
CN113127834A (en) 2021-07-16

Family

ID=76781972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110523264.5A Pending CN113127834A (en) 2021-05-13 2021-05-13 Barrier-free man-machine identification verification method and system

Country Status (1)

Country Link
CN (1) CN113127834A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103135746A (en) * 2011-11-25 2013-06-05 夏普株式会社 Non-touch control method and non-touch control system and non-touch control device based on static postures and dynamic postures
CN103365400A (en) * 2012-03-26 2013-10-23 联想(北京)有限公司 Method and device for feedback of gesture recognizing operations
CN104392160A (en) * 2014-09-22 2015-03-04 贵阳朗玛信息技术股份有限公司 Identity authentication method and identity authentication device
US20150302851A1 (en) * 2014-04-18 2015-10-22 General Motors Llc Gesture-based cues for an automatic speech recognition system
CN105306721A (en) * 2015-11-05 2016-02-03 上海卓易科技股份有限公司 Mobile phone, and method and system for triggering application program to perform response thereof
CN107038361A (en) * 2016-10-13 2017-08-11 阿里巴巴集团控股有限公司 Service implementation method and device based on virtual reality scenario
CN112487844A (en) * 2019-09-11 2021-03-12 华为技术有限公司 Gesture recognition method, electronic device, computer-readable storage medium, and chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40056795)
RJ01 Rejection of invention patent application after publication (Application publication date: 20210716)