CN109145802B - Kinect-based multi-person gesture man-machine interaction method and device - Google Patents

Kinect-based multi-person gesture man-machine interaction method and device

Info

Publication number
CN109145802B
CN109145802B (application CN201810921343.XA)
Authority
CN
China
Prior art keywords
coordinate system
camera
screen
camera coordinate
human
Prior art date
Legal status
Active
Application number
CN201810921343.XA
Other languages
Chinese (zh)
Other versions
CN109145802A
Inventor
陶彦博
阮松波
梁斌
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201810921343.XA
Publication of CN109145802A
Application granted
Publication of CN109145802B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a Kinect-based multi-person gesture human-computer interaction method and device, wherein the method comprises the following steps: acquiring a color image and a depth image of a scene through a Kinect camera, and acquiring the human body joint positions and facial feature parameters of each operator in the scene; obtaining the cursor positions corresponding to the operator's two hands from the human joint positions and the facial feature parameters according to the principle that light travels in straight lines; and segmenting the palm region image of the operator from the color image according to the human joint positions, and recognizing and classifying the operation instruction with a pre-trained gesture classification model. The method is natural and quick to operate, needs no additional external equipment, adopts a non-contact interaction mode, can cope with special scenes such as hospitals and scientific research, can be used by multiple persons at the same time, and meets more complex interaction requirements in the future.

Description

Kinect-based multi-person gesture man-machine interaction method and device
Technical Field
The invention relates to the technical field of human-computer interaction, in particular to a Kinect-based multi-person gesture human-computer interaction method and device.
Background
Human-computer interaction is the exchange of information between a person and a computer. Earlier human-computer interaction was mainly computer-centered, for example command-line and graphical-user-interface interaction, in which people could interact with the computer only through a keyboard, a mouse, a touch screen and the like. Nowadays, novel interaction forms that conform to natural human interaction are becoming more mature; these new technologies bring humans and computers closer together and improve interaction efficiency. Human-computer interaction has thus undergone a transition from command lines and graphical interfaces to natural user interfaces. A natural user interface means that the user interacts with the computer in a natural way, such as speech or gestures.
The gesture is a novel interaction mode in human-computer interaction; it is contact-free and better conforms to natural human interaction behavior. Currently, gesture recognition research is mainly based on color image streams, detecting and tracking the target according to the color, texture, gray-scale or motion features of the image. However, since a color image contains only two-dimensional coordinate information and is easily affected by background, illumination and environment, the detection and tracking of the target are poor, and targets in complex environments are difficult to detect and track. The Kinect camera is a special camera that can acquire a color image and a depth image of a scene at the same time, and thus obtain three-dimensional information of the scene. From the acquired three-dimensional information, the human joints and gestures in the scene can be accurately extracted.
Most existing gesture interaction systems are combined with a graphical interface, but they mostly rely on static gesture recognition and can only replace a remote controller or a keyboard to interact with the graphical interface through menus. To interact with more complex graphical interfaces, which are normally operated with the simplest and most common combination of a keyboard and a mouse, the function of cursor movement needs to be realized through gestures. The cursor position is the position the operator is most concerned with in the graphical interface, and it can be reflected by the operator's line of sight and the position pointed to by the finger.
Existing interactive systems are designed for a single user, while in some special scenarios, such as design and creation, multiple persons may need to interact with the computer at the same time. A gesture-based interaction mode allows interaction anywhere within the field of view of the camera without other interaction equipment, and therefore naturally supports several people interacting with the computer at the same time.
However, the human-computer interaction modes of the related art are realized through a mouse, a keyboard, a handle and other devices, all of which are connected in a wired or wireless manner. In the wired mode the control distance is short; in the wireless mode the operation is somewhat sluggish, and neither mode provides natural control. Another interaction method is touch-screen manipulation, which is currently most widely applied in small-sized mobile phones and tablet computers. For large touch screens of 60 or even 100 inches, some basic operations become awkward because the operator is too close to the screen to see all of its content.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, the invention aims to provide a Kinect-based multi-person gesture human-computer interaction method which is natural and quick to operate, needs no additional external equipment, adopts a non-contact interaction mode, can cope with special scenes such as hospitals and scientific research, can be used by multiple persons at the same time, and meets more complex interaction requirements in the future.
The invention also aims to provide a Kinect-based multi-person gesture human-computer interaction device.
In order to achieve the above object, an embodiment of the invention provides a human-computer interaction method based on Kinect for multi-person gestures, which includes the following steps: acquiring a color image and a depth image of a scene through a Kinect camera, and acquiring the human joint position and the face characteristic parameters of an operator in the scene; obtaining cursor positions corresponding to the two hands of the operator according to the human joint position and the facial characteristic parameters by a light linear propagation principle; and segmenting the palm area image of the operator in the color image according to the human body joint position, and identifying and classifying the operation instruction by a pre-trained gesture classification model.
According to the Kinect-based multi-person gesture human-computer interaction method of the embodiment of the invention, the joint and face information of the human body is obtained through the Kinect camera, the position of the operation cursor is determined from the positions of the operator's eyes and hands, and the operation instruction is determined from the acquired gesture image of the operator, thereby realizing interaction with the graphical interface of the equipment. The operation is natural and quick, no additional external equipment is needed, and a non-contact interaction mode is adopted, so the method can cope with special scenes such as hospitals and scientific research, can be used by multiple persons at the same time, and meets more complex interaction requirements in the future.
In addition, the Kinect-based multi-person gesture human-computer interaction method according to the above embodiment of the present invention may further have the following additional technical features:
further, in an embodiment of the present invention, the method further includes: and acquiring a camera coordinate system, a screen coordinate system and a human face coordinate system, and detecting the fingertip and the cursor point.
Further, in an embodiment of the present invention, the method further includes: detecting a position of a screen origin in the camera coordinate system and a rotation matrix of an orientation of the position in the camera coordinate system, and acquiring a transformation matrix from the camera coordinate system to the screen coordinate system to obtain a screen position; and acquiring a position of a human head joint under the camera coordinate system and a rotation matrix of the orientation of the human head joint under the camera coordinate system, and acquiring a transformation matrix from the camera coordinate system to the human face coordinate system to obtain the human face direction and the visual field.
Further, in an embodiment of the present invention, the obtaining of the cursor positions corresponding to the two hands of the operator by using a principle of light propagation along a straight line according to the human joint position and the facial feature parameter further includes: detecting the positions of the fingertips and the noseroots of the person under the camera coordinate system, and judging whether the fingertips are in the visual field range or not according to a transformation matrix from the camera coordinate system to a face coordinate system and the coordinates of the fingertips and the noseroots of the person under the face coordinate system; and obtaining a transformation matrix from the camera coordinate system to the screen coordinate system through three-dimensional calibration to obtain the position of the fingertip under the camera coordinate system and the position of the nasion under the camera coordinate system, obtaining the representation of a straight line passing through the fingertip and the nasion under the camera coordinate system, obtaining the intersection point of the straight line passing through the fingertip and the nasion and a screen plane, and obtaining the position of a cursor in the screen coordinate system through a conversion matrix from the camera to the screen coordinate system to judge whether a cursor point is in a screen range.
Further, in an embodiment of the present invention, the segmenting the palm region image of the operator from the color image according to the positions of the human joints, and performing recognition and classification on the operation instructions by a pre-trained gesture classification model, further includes: acquiring the pixel position of the palm of the operator in the color image, and segmenting by taking the palm of the operator as a midpoint to generate a gesture image with a preset size; and classifying and identifying the gesture image by the artificial neural network, and determining an instruction corresponding to the gesture.
In order to achieve the above object, an embodiment of the present invention provides a multi-person gesture human-computer interaction device based on Kinect, including: the system comprises an acquisition module, a display module and a control module, wherein the acquisition module is used for acquiring a color image and a depth image of a scene through a Kinect camera and acquiring the human body joint position and the face characteristic parameters of an operator in the scene; the processing module is used for obtaining cursor positions corresponding to the two hands of the operator according to the human joint position and the facial characteristic parameters through a light linear propagation principle; and the recognition and classification module is used for segmenting the palm area image of the operator in the color image according to the human body joint position and recognizing and classifying the operation instruction by a pre-trained gesture classification model.
According to the Kinect-based multi-person gesture human-computer interaction device of the embodiment of the invention, the joint and face information of the human body is obtained through the Kinect camera, the position of the operation cursor is determined from the positions of the operator's eyes and hands, and the operation instruction is determined from the acquired gesture image of the operator, thereby realizing interaction with the graphical interface of the equipment. The operation is natural and quick, no additional external equipment is needed, and a non-contact interaction mode is adopted, so the device can cope with special scenes such as hospitals and scientific research, can be used by multiple persons at the same time, and meets more complex interaction requirements in the future.
In addition, the Kinect-based multi-person gesture human-computer interaction device according to the above embodiment of the invention may further have the following additional technical features:
further, in an embodiment of the present invention, the method further includes: the first acquisition module is used for acquiring a camera coordinate system, a screen coordinate system and a human face coordinate system and detecting fingertips and cursor points.
Further, in an embodiment of the present invention, the method further includes: the detection module is used for detecting the position of the screen origin position in the camera coordinate system and the rotation matrix of the orientation of the screen origin position in the camera coordinate system, and acquiring a transformation matrix from the camera coordinate system to the screen coordinate system to obtain a screen position; and the second acquisition module is used for acquiring a rotation matrix of the position of the human head joint under the camera coordinate system and the direction of the human head joint in the camera coordinate system, and acquiring a transformation matrix from the camera coordinate system to the human face coordinate system so as to obtain the human face direction and the visual field.
Further, in one embodiment of the present invention, the processing module includes: the detection unit is used for detecting the positions of the fingertips and the noseroots under the camera coordinate system and judging whether the fingertips are in the visual field range or not according to a transformation matrix from the camera coordinate system to a human face coordinate system and the coordinates of the fingertips and the noseroots under the human face coordinate system; the acquisition unit is used for acquiring a transformation matrix from the camera coordinate system to the screen coordinate system through three-dimensional calibration so as to obtain the position of the fingertip under the camera coordinate system and the position of the nasion under the camera coordinate system, acquiring the representation of a straight line passing through the fingertip and the nasion under the camera coordinate system, acquiring the intersection point of the straight line passing through the fingertip and the nasion and a screen plane, and acquiring the position of a cursor in the screen coordinate system through the transformation matrix from the camera to the screen coordinate system so as to judge whether a cursor point is in a screen range.
Further, in an embodiment of the present invention, the recognition and classification module is further configured to acquire a pixel position of the palm of the operator in the color image, segment and generate a gesture image with a preset size by using the palm of the operator as a midpoint, perform classification and recognition on the gesture image by using the artificial neural network, and determine an instruction corresponding to the gesture.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a Kinect-based multi-person gesture human-machine interaction method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a human joint obtained by Kinect for Windows SDK V2 according to one embodiment of the present invention;
FIG. 3 is a diagram of facial feature effects extracted by Kinect for Windows SDK V2 according to one embodiment of the present invention;
FIG. 4 is a diagram of a cursor position calculation coordinate system according to one embodiment of the present invention;
FIG. 5 is a flowchart of a Kinect-based multi-person gesture human-machine interaction method according to an embodiment of the present invention;
FIG. 6 is a schematic view of a human field of view according to one embodiment of the invention;
FIG. 7 is a schematic structural diagram of a Kinect-based multi-person gesture human-computer interaction device according to an embodiment of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The multi-person gesture human-computer interaction method and device based on the Kinect provided by the embodiment of the invention are described below with reference to the attached drawings, and firstly, the multi-person gesture human-computer interaction method based on the Kinect provided by the embodiment of the invention is described with reference to the attached drawings.
FIG. 1 is a flow chart of a Kinect-based multi-person gesture human-computer interaction method according to an embodiment of the invention.
As shown in fig. 1, the Kinect-based multi-person gesture human-computer interaction method includes the following steps:
in step S101, a color image and a depth image of a scene are acquired by a Kinect camera, and a human joint position and facial feature parameters of an operator in the scene are acquired.
It can be understood that, in the embodiment of the invention, the human body joint and facial feature information of the operators in the scene is extracted from the Kinect camera data. Specifically, the embodiment of the present invention acquires a color image and a depth image of the scene through the Kinect camera; the human joint positions and facial feature parameters of the operators in the scene, such as the positions of the eyebrows, eyes, nose and mouth and the facial contour, can be obtained through the Microsoft Kinect for Windows SDK 2.0 development kit. The acquired human body joints are shown in fig. 2, and the human facial features are shown in fig. 3.
In step S102, cursor positions corresponding to both hands of the operator are obtained according to the human joint position and the facial feature parameter by the principle of light propagation along a straight line.
It can be understood that after the human joint positions and the facial feature parameters are acquired, the cursor position operated by each operator is determined. That is, from the operator joint and face model parameters obtained in the previous step, the embodiment of the present invention calculates the cursor positions corresponding to the two hands of each operator according to the principle that light travels in straight lines, i.e. the operator's eye, the fingertip of each hand and the corresponding cursor are collinear.
Further, in an embodiment of the present invention, the method of an embodiment of the present invention further includes: and acquiring a camera coordinate system, a screen coordinate system and a human face coordinate system, and detecting the fingertip and the cursor point.
Specifically, as shown in fig. 4, the embodiment of the present invention uses three coordinate systems in total, namely a camera coordinate system, a screen coordinate system and a face coordinate system, and two key points, namely the fingertip and the cursor point. The camera coordinate system is used as the world coordinate system; its origin $O_c$ corresponds to the imaging center of the Kinect camera, the $Z_c$ axis coincides with the optical axis, and the $Y_c$ and $X_c$ axes are parallel to the imaging plane. The origin of the screen coordinate system is $O_i$, corresponding to the upper left corner of the screen; its $Y_i$ and $X_i$ axes are parallel to the camera imaging plane and, in order to coincide with the usual image coordinates, its $Z_i$ axis is opposite to that of the camera coordinate system. The origin of the face coordinate system is $O_f$, corresponding to the position of the head joint, and the $Z_f$ axis points in the direction the face is facing. Conversion between the coordinate systems is realized by homogeneous transformation; the transformation matrix from the camera coordinate system to the screen coordinate system is ${}^{i}_{c}T$ and the transformation matrix from the camera coordinate system to the face coordinate system is ${}^{f}_{c}T$. Here ${}^{i}_{c}T$ is determined by the relative position of the camera and the screen and is a fixed parameter of the system, whereas ${}^{f}_{c}T$ is obtained from the head pose supplied by the functions provided by Kinect for Windows SDK V2 and its value changes continuously.
For an arbitrary point $p$, its descriptions in the two coordinate systems $\{A\}$ and $\{B\}$ are ${}^{A}p$ and ${}^{B}p$, and the following homogeneous transformation relation holds:

$${}^{A}p = {}^{A}_{B}T\,{}^{B}p \qquad (1)$$

The homogeneous transformation matrix ${}^{A}_{B}T$ is a 4×4 square matrix having the form:

$${}^{A}_{B}T = \begin{bmatrix} {}^{A}_{B}R & {}^{A}p_{B\,org} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \qquad (2)$$

which combines a translation transformation, given by ${}^{A}p_{B\,org}$, and a rotation transformation, given by ${}^{A}_{B}R$. Here ${}^{A}p_{B\,org}$ is the translation vector of the coordinate system $\{B\}$ relative to the coordinate system $\{A\}$, describing the position of its origin, and ${}^{A}_{B}R$ describes the orientation of the coordinate system $\{B\}$ relative to the coordinate system $\{A\}$. The SDK gives the quaternion representation $q = [q_0, q_1, q_2, q_3]^{T}$, which represents the rotation operation from the coordinate system $\{A\}$ to the coordinate system $\{B\}$; its relationship to the rotation matrix is:

$${}^{A}_{B}R = \begin{bmatrix} 1-2(q_2^2+q_3^2) & 2(q_1q_2-q_0q_3) & 2(q_1q_3+q_0q_2) \\ 2(q_1q_2+q_0q_3) & 1-2(q_1^2+q_3^2) & 2(q_2q_3-q_0q_1) \\ 2(q_1q_3-q_0q_2) & 2(q_2q_3+q_0q_1) & 1-2(q_1^2+q_2^2) \end{bmatrix} \qquad (3)$$

Knowing the description ${}^{A}_{B}T$ of the coordinate system $\{B\}$ relative to the coordinate system $\{A\}$, the description ${}^{B}_{A}T$ of $\{A\}$ relative to $\{B\}$ is obtained by homogeneous transformation inversion, specifically:

$${}^{B}_{A}T = \left({}^{A}_{B}T\right)^{-1} = \begin{bmatrix} {}^{A}_{B}R^{T} & -{}^{A}_{B}R^{T}\,{}^{A}p_{B\,org} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \qquad (4)$$
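For illustration, the following Python sketch (a minimal example assuming NumPy; the function names and the example pose are illustrative and not part of the patent) builds the homogeneous transformation of equation (2) from a quaternion and a translation, applies equation (1), and inverts it as in equation (4).

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a unit quaternion q = [q0, q1, q2, q3] (scalar first), as in equation (3)."""
    q0, q1, q2, q3 = q
    return np.array([
        [1 - 2*(q2**2 + q3**2), 2*(q1*q2 - q0*q3),     2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3),     1 - 2*(q1**2 + q3**2), 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2),     2*(q2*q3 + q0*q1),     1 - 2*(q1**2 + q2**2)],
    ])

def homogeneous(R, p):
    """Build the 4x4 matrix of equation (2) from a rotation R and a translation p."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = p
    return T

def invert(T):
    """Homogeneous transformation inversion, equation (4)."""
    R, p = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ p
    return Ti

# Example: pose of frame {B} in frame {A}, then map a point expressed in {B} into {A}.
T_A_B = homogeneous(quat_to_rot([1.0, 0.0, 0.0, 0.0]), np.array([0.1, 0.0, 1.5]))
p_B = np.array([0.2, 0.3, 0.5, 1.0])   # homogeneous point in {B}
p_A = T_A_B @ p_B                      # equation (1)
T_B_A = invert(T_A_B)                  # equation (4)
```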
In the three-dimensional homogeneous coordinate system, points are dual to planes, so a plane is represented in the same form as a point, namely by a four-dimensional vector $\pi = (\pi_1, \pi_2, \pi_3, \pi_4)^{T}$. A point $x = (x_1, x_2, x_3, x_4)^{T}$ lies on the plane if:

$$\pi_1 x_1 + \pi_2 x_2 + \pi_3 x_3 + \pi_4 x_4 = 0 \qquad (5)$$

A plane is defined by three non-collinear points $x_1$, $x_2$, $x_3$, namely:

$$\begin{bmatrix} x_1^{T} \\ x_2^{T} \\ x_3^{T} \end{bmatrix}\pi = 0 \qquad (6)$$

When the point $x$ is on the plane, the matrix $M = [x,\,x_1,\,x_2,\,x_3]$ has a determinant of 0, i.e.:

$$\det M = x_1 D_{234} - x_2 D_{134} + x_3 D_{124} - x_4 D_{123} = 0 \qquad (7)$$

Thus the plane can be expressed as:

$$\pi = (D_{234},\,-D_{134},\,D_{124},\,-D_{123}) \qquad (8)$$

where $D_{jkl}$ denotes the determinant of the matrix formed by rows $j$, $k$, $l$ of the 4×3 matrix $[x_1,\,x_2,\,x_3]$. In the three-dimensional homogeneous coordinate system a straight line is represented by a 4×4 matrix; the straight line $L$ determined by two points $x_1$ and $x_2$ is:

$$L = x_1 x_2^{T} - x_2 x_1^{T} \qquad (9)$$

and the intersection of the plane $\pi$ with the line $L$ is denoted $x$:

$$x = L\pi \qquad (10)$$
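As a concrete instance of equations (5) through (10), the following Python sketch (assuming NumPy; helper names are illustrative) builds a plane from three points via the cofactor formula (8), a line from two points via the Plücker matrix (9), and their intersection via (10).

```python
import numpy as np

def plane_from_points(x1, x2, x3):
    """Plane pi = (D234, -D134, D124, -D123), equation (8); inputs are homogeneous 4-vectors."""
    M = np.stack([x1, x2, x3], axis=1)           # 4x3 matrix [x1, x2, x3]
    def D(rows):
        return np.linalg.det(M[list(rows), :])   # determinant of the selected three rows
    return np.array([D((1, 2, 3)), -D((0, 2, 3)), D((0, 1, 3)), -D((0, 1, 2))])

def line_from_points(x1, x2):
    """Pluecker matrix L = x1 x2^T - x2 x1^T, equation (9)."""
    return np.outer(x1, x2) - np.outer(x2, x1)

# Intersection of a line and a plane, equation (10): x = L @ pi.
pi = plane_from_points(np.array([0, 0, 0, 1.0]),
                       np.array([1, 0, 0, 1.0]),
                       np.array([0, 1, 0, 1.0]))     # this is the plane z = 0
L = line_from_points(np.array([0, 0, 1, 1.0]),
                     np.array([0, 0, -1, 1.0]))      # vertical line through the origin
x = L @ pi                                           # homogeneous intersection point
x = x / x[3]                                         # normalize, giving (0, 0, 0, 1)
```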
Further, in an embodiment of the present invention, the method of the embodiment of the present invention further includes: detecting the position of the screen origin in the camera coordinate system and a rotation matrix describing its orientation in the camera coordinate system, and acquiring the transformation matrix from the camera coordinate system to the screen coordinate system to obtain the screen position; and acquiring the position of the human head joint in the camera coordinate system and a rotation matrix of its orientation, and acquiring the transformation matrix from the camera coordinate system to the face coordinate system to obtain the face direction and field of view.
Specifically, (1) screen position, as shown in fig. 5

The screen origin position $O_i$ in the camera coordinate system, ${}^{c}p_{O_i}$, and the rotation matrix ${}^{c}_{i}R$ representing its orientation in the camera coordinate system are obtained by measurement. According to equation (4), the transformation matrices ${}^{i}_{c}T$ and ${}^{c}_{i}T$ between the camera coordinate system and the screen coordinate system are obtained. The screen plane, i.e. the xy plane of the screen coordinate system ${}^{i}\pi_{s}$, is then expressed in the camera coordinate system as ${}^{c}\pi_{s}$.
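A minimal Python sketch of this step follows, assuming NumPy; the numeric screen pose and the variable names are placeholder assumptions for illustration, not values from the patent.

```python
import numpy as np

# Illustrative measured values: screen origin O_i and screen orientation expressed in the camera frame.
p_c_Oi = np.array([-0.40, 0.30, 1.20])   # screen origin in camera coordinates (meters), assumed
R_c_i = np.diag([1.0, -1.0, -1.0])       # screen axes in camera coordinates; Z_i opposite to camera Z

T_c_i = np.eye(4)                        # screen frame described in the camera frame, equation (2)
T_c_i[:3, :3], T_c_i[:3, 3] = R_c_i, p_c_Oi
T_i_c = np.linalg.inv(T_c_i)             # camera -> screen transform, equation (4)

# Screen plane: the xy plane of the screen frame, pi_i = (0, 0, 1, 0)^T,
# expressed in camera coordinates by the plane transformation rule pi_c = T_i_c^T @ pi_i.
pi_i = np.array([0.0, 0.0, 1.0, 0.0])
pi_c = T_i_c.T @ pi_i
```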
(2) Face direction and field of view estimation

The human head joint position $O_f$ in the camera coordinate system, ${}^{c}p_{O_f}$, and the rotation matrix ${}^{c}_{f}R$ representing its orientation in the camera coordinate system are obtained through Kinect for Windows SDK V2. According to equation (4), the transformation matrix ${}^{f}_{c}T$ from the camera coordinate system to the face coordinate system is obtained.
Further, in an embodiment of the present invention, the obtaining of the cursor positions corresponding to the two hands of the operator according to the human joint position and the facial feature parameter by the principle of light propagating along a straight line further includes: detecting the positions of the fingertips and the noseroots under a camera coordinate system, and judging whether the fingertips are in a visual field range or not according to a transformation matrix from the camera coordinate system to a human face coordinate system and the coordinates of the fingertips and the noseroots under the human face coordinate system; obtaining a transformation matrix from a camera coordinate system to a screen coordinate system through three-dimensional calibration to obtain the position of a fingertip under the camera coordinate system and the position of a nasion under the camera coordinate system, obtaining the representation of a straight line passing through the fingertip and the nasion under the camera coordinate system, obtaining the intersection point of the straight line passing through the fingertip and the nasion and a screen plane, and obtaining the position of a cursor in the screen coordinate system through the transformation matrix from the camera to the screen coordinate system to judge whether a cursor point is in a screen range.
Specifically, (1) fingertip position estimation, as shown in fig. 5

In the embodiment of the invention, the positions of the human fingertip M and the human nasion N in the camera coordinate system, ${}^{c}p_{M}$ and ${}^{c}p_{N}$, are obtained through Kinect for Windows SDK V2. Using the transformation matrix ${}^{f}_{c}T$ from the camera coordinate system to the face coordinate system obtained above, combined with equation (1), the coordinates of the fingertip and the nasion in the face coordinate system, ${}^{f}p_{M}$ and ${}^{f}p_{N}$, are obtained, from which it is judged whether the fingertip is within the field of view of the human eyes. Specifically, the angles between the x axis and the projections of the line connecting the fingertip M and the nasion N onto the xy plane and the xz plane are obtained; if these angles are smaller than the 62° horizontal visual angle and the 50° vertical visual angle of the human eye respectively, the fingertip is within the human field of view, otherwise it is not. The field of view of the human eye is shown in fig. 6.
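The sketch below illustrates one way such a field-of-view test could look in Python, assuming the fingertip and nasion have already been transformed into the face coordinate system and assuming z is the forward direction of the face; the axis convention, the numeric inputs and the exact angle definition are assumptions for illustration rather than the patent's precise formulation.

```python
import numpy as np

# Illustrative inputs: fingertip M and nasion N expressed in the face frame (meters).
# Assumed convention: x sideways, y up, z pointing forward out of the face.
f_p_M = np.array([0.15, -0.10, 0.45])
f_p_N = np.array([0.0, 0.0, 0.0])        # the nasion lies close to the face-frame origin

H_FOV_DEG = 62.0                         # horizontal visual angle used by the method
V_FOV_DEG = 50.0                         # vertical visual angle used by the method

v = f_p_M - f_p_N                        # direction from the nasion to the fingertip
horizontal = np.degrees(np.arctan2(abs(v[0]), v[2]))   # sideways deviation from straight ahead
vertical = np.degrees(np.arctan2(abs(v[1]), v[2]))     # vertical deviation from straight ahead

in_view = (v[2] > 0) and (horizontal < H_FOV_DEG) and (vertical < V_FOV_DEG)
```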
(2) Cursor position estimation

After the system is built, the transformation matrix ${}^{i}_{c}T$ from the camera coordinate system to the screen coordinate system is obtained through three-dimensional calibration. Combined with the position of the human fingertip M in the camera coordinate system, ${}^{c}p_{M}$, and the position of the nasion N in the camera coordinate system, ${}^{c}p_{N}$, obtained through Kinect for Windows SDK V2, the representation of the straight line through the fingertip and the nasion in the camera coordinate system, ${}^{c}L_{MN}$, is obtained from equation (9). Using equation (10), the intersection of this line with the screen plane ${}^{c}\pi_{s}$ gives the homogeneous coordinate representation of the cursor in the camera coordinate system, ${}^{c}p_{I}$. Finally, the transformation matrix ${}^{i}_{c}T$ from the camera to the screen coordinate system, combined with equation (1), gives the position of the cursor in the screen coordinate system, ${}^{i}p_{I}$. It is then judged whether the cursor point is within the screen range; if not, the cursor point is considered invalid.
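A short Python sketch chaining equations (9), (10) and (1) from the fingertip and nasion positions to a screen-range check follows; the calibration matrix, the point coordinates and the screen size are placeholder assumptions, not values from the patent.

```python
import numpy as np

def to_h(p):                                     # 3-vector -> homogeneous 4-vector
    return np.append(p, 1.0)

# Assumed calibration result and illustrative measurements.
T_i_c = np.eye(4)                                # camera -> screen transform from 3D calibration
c_p_M = to_h(np.array([0.25, -0.05, 1.10]))      # fingertip in camera coordinates
c_p_N = to_h(np.array([0.20, 0.10, 1.60]))       # nasion in camera coordinates
c_pi_s = T_i_c.T @ np.array([0.0, 0.0, 1.0, 0.0])    # screen plane in camera coordinates

L = np.outer(c_p_M, c_p_N) - np.outer(c_p_N, c_p_M)  # line through fingertip and nasion, equation (9)
c_p_I = L @ c_pi_s                                   # intersection with the screen plane, equation (10)
i_p_I = T_i_c @ c_p_I                                # cursor in screen coordinates, equation (1)
i_p_I = i_p_I / i_p_I[3]                             # normalize the homogeneous coordinates

SCREEN_W, SCREEN_H = 1.60, 0.90                      # assumed screen size in meters
valid = (0.0 <= i_p_I[0] <= SCREEN_W) and (0.0 <= i_p_I[1] <= SCREEN_H)
```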
In step S103, the palm region image of the operator is segmented in the color image according to the joint position of the human body, and the operation command is recognized and classified by the pre-trained gesture classification model.
It is understood that after the above steps are performed, the embodiment of the present invention determines the operation instruction according to the gesture type of the operator. That is, according to the human joint position extracted in step S101, the embodiment of the present invention segments the image of the palm region of the operator from the color image, and recognizes and classifies the operation command by the gesture classification model trained in advance.
Further, in an embodiment of the present invention, the method includes segmenting a palm region image of an operator from a color image according to positions of joints of a human body, and performing recognition and classification on an operation instruction by using a pre-trained gesture classification model, and further includes: acquiring pixel positions of the palm of the operator in the color image, and segmenting by taking the palm of the operator as a midpoint to generate a gesture image with a preset size; and classifying and identifying the gesture images by an artificial neural network, and determining an instruction corresponding to the gesture.
Specifically, (1) gesture image segmentation, as shown in FIG. 5
The embodiment of the invention obtains the pixel position of the operator's palm in the color image through Kinect for Windows SDK V2 and crops an image of size 128×128 centered on the palm as the gesture image.
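A minimal Python sketch of such a palm-centered crop follows, assuming NumPy; the helper name, the frame size and the palm position are illustrative assumptions.

```python
import numpy as np

def crop_gesture(color_image, palm_xy, size=128):
    """Crop a size x size patch centered on the palm pixel position; pads with zeros at the border."""
    h, w = color_image.shape[:2]
    cx, cy = int(palm_xy[0]), int(palm_xy[1])
    half = size // 2
    patch = np.zeros((size, size, color_image.shape[2]), dtype=color_image.dtype)
    x0, x1 = max(cx - half, 0), min(cx + half, w)
    y0, y1 = max(cy - half, 0), min(cy + half, h)
    patch[y0 - (cy - half):y1 - (cy - half), x0 - (cx - half):x1 - (cx - half)] = color_image[y0:y1, x0:x1]
    return patch

# Example with a synthetic 1080p color frame and an assumed palm position.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
gesture = crop_gesture(frame, palm_xy=(960, 540))   # 128 x 128 x 3 patch
```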
(2) Gesture recognition
The segmented gesture image is classified and recognized by a pre-trained artificial neural network to determine the instruction corresponding to the gesture. A convolutional neural network is used as the training model; images of eight different gestures, collected from volunteers by the Kinect under different conditions, form the data set, and the model is obtained by training after data augmentation. The final command is determined by a single gesture or by a combination of multiple gestures.
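The patent does not disclose the exact network structure; the following PyTorch sketch is only one plausible small convolutional classifier for 128×128 gesture crops with eight classes, with all layer sizes assumed for illustration.

```python
import torch
import torch.nn as nn

class GestureNet(nn.Module):
    """A small CNN for 128x128 RGB gesture crops with 8 gesture classes (architecture assumed)."""
    def __init__(self, num_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 128 -> 64
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 64 -> 32
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Inference on one 128x128 gesture crop (values in [0, 1], NCHW layout).
model = GestureNet().eval()
with torch.no_grad():
    logits = model(torch.rand(1, 3, 128, 128))
    gesture_id = int(logits.argmax(dim=1))       # index of the predicted gesture / command
```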
In summary, the method of the embodiment of the invention is suitable for equipment that has a large display and needs to be operated contact-free or by several persons, and comprises three steps: acquiring the human joints, calculating the cursor position, and recognizing the operation gesture. The embodiment of the invention aims to provide a novel human-computer interaction solution that controls the user interfaces of computers, projectors, game machines and the like in a non-contact manner and can handle multi-user control scenarios at the same time. The embodiment of the invention allows interaction with the machine through gestures without other equipment; gestures are natural, intuitive and easy to understand, and better match daily human communication habits. Moreover, control is possible as long as the operator is within the Kinect field of view, so the operator can control a device with a large physical display from 1-4 meters away. Finally, the system designed according to the embodiment of the invention supports multi-person operation: Kinect supports real-time tracking of the joint positions of up to 6 operators, so the method of the embodiment of the invention can be applied to more complex human-computer interaction scenarios in the future.
According to the Kinect-based multi-person gesture human-computer interaction method provided by the embodiment of the invention, the joint and face information of the human body is obtained through the Kinect camera, the position of the operation cursor is determined from the positions of the operator's eyes and hands, and the operation instruction is determined from the acquired gesture image of the operator, thereby realizing interaction with the graphical interface of the equipment. The operation is natural and quick, no additional external equipment is needed, and a non-contact interaction mode is adopted, so the method can cope with special scenes such as hospitals and scientific research, can be used by multiple persons at the same time, and meets more complex interaction requirements in the future.
Next, a multi-person gesture human-computer interaction device based on Kinect according to an embodiment of the present invention will be described with reference to the drawings.
FIG. 7 is a schematic structural diagram of a Kinect-based multi-person gesture human-computer interaction device according to an embodiment of the invention.
As shown in fig. 7, the Kinect-based multi-person gesture human-computer interaction device 10 includes: an acquisition module 100, a processing module 200, and an identification classification module 300.
The acquisition module 100 is configured to acquire a color image and a depth image of a scene through a Kinect camera, and to acquire the human joint positions and facial feature parameters of the operators in the scene. The processing module 200 is configured to obtain the cursor positions corresponding to the operator's two hands from the human joint positions and the facial feature parameters according to the principle that light travels in straight lines. The recognition and classification module 300 is configured to segment the palm region image of the operator from the color image according to the human joint positions, and to recognize and classify the operation instruction with a pre-trained gesture classification model. The device 10 of the embodiment of the invention is natural and quick to operate, needs no additional external equipment, adopts a non-contact interaction mode, can cope with special scenes such as hospitals and scientific research, can be used by multiple persons at the same time, and meets more complex interaction requirements in the future.
Further, in one embodiment of the present invention, the apparatus 10 of the embodiment of the present invention further comprises: a first obtaining module. The first acquisition module is used for acquiring a camera coordinate system, a screen coordinate system and a human face coordinate system and detecting fingertips and cursor points.
Further, in one embodiment of the present invention, the apparatus 10 of the embodiment of the present invention further comprises: the device comprises a detection module and a second acquisition module.
The detection module is used for detecting the position of the screen origin position in the camera coordinate system and the rotation matrix of the orientation of the screen origin position in the camera coordinate system, and acquiring a transformation matrix from the camera coordinate system to the screen coordinate system to obtain the screen position. The second acquisition module acquires the position of the human head joint under the camera coordinate system and the rotation matrix of the orientation of the human head joint in the camera coordinate system, and acquires the transformation matrix from the camera coordinate system to the human face coordinate system so as to obtain the human face direction and the visual field.
Further, in one embodiment of the present invention, the processing module comprises: a detection unit and an acquisition unit.
The detection unit is used for detecting the positions of the finger tips and the nose roots of the people in a camera coordinate system and judging whether the finger tips are in the visual field range or not according to a transformation matrix from the camera coordinate system to a human face coordinate system and the coordinates of the finger tips and the nose roots of the people in the human face coordinate system. The acquisition unit is used for acquiring a transformation matrix from a camera coordinate system to a screen coordinate system through three-dimensional calibration so as to acquire the position of a fingertip under the camera coordinate system and the position of a nasion under the camera coordinate system, acquiring the representation of a straight line passing through the fingertip and the nasion under the camera coordinate system, acquiring the intersection point of the straight line passing through the fingertip and the nasion and a screen plane, and acquiring the position of a cursor in the screen coordinate system through the transformation matrix from the camera to the screen coordinate system so as to judge whether a cursor point is in a screen range.
Further, in an embodiment of the present invention, the recognition and classification module 300 is further configured to obtain a pixel position of the palm of the operator in the color image, segment the palm of the operator as a midpoint to generate a gesture image with a preset size, perform classification and recognition on the gesture image by using an artificial neural network, and determine an instruction corresponding to the gesture.
It should be noted that the explanation of the embodiment of the multi-person gesture human-computer interaction method based on the Kinect is also applicable to the multi-person gesture human-computer interaction device based on the Kinect of the embodiment, and is not repeated herein.
According to the Kinect-based multi-person gesture human-computer interaction device provided by the embodiment of the invention, the information of joints and faces of a human body is obtained through the Kinect camera, the position of an operation cursor is determined according to the positions of eyes and hands of an operator, and an operation instruction is determined according to the acquired gesture image of the operator, so that the multi-person gesture human-computer interaction device interacts with a graphical interface of equipment.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (4)

1. A Kinect-based multi-person gesture man-machine interaction method is characterized by comprising the following steps:
acquiring a color image and a depth image of a scene through a Kinect camera, and acquiring the human joint position and the face characteristic parameters of an operator in the scene;
acquiring a camera coordinate system, a screen coordinate system and a human face coordinate system, and detecting fingertip and cursor points, specifically: the camera coordinate system is used as the world coordinate system, its origin $O_c$ corresponds to the imaging center of the Kinect camera, the $Z_c$ axis coincides with the optical axis, and the $Y_c$ and $X_c$ axes are parallel to the imaging plane; the origin of the screen coordinate system is $O_i$, corresponding to the upper left corner of the screen, its $Y_i$ and $X_i$ axes are parallel to the camera imaging plane, and its $Z_i$ axis is opposite to that of the camera coordinate system; the origin of the face coordinate system is $O_f$, corresponding to the position of the head joint, and the $Z_f$ axis points in the direction the face is facing; conversion between the coordinate systems is realized by homogeneous transformation, the transformation matrix from the camera coordinate system to the screen coordinate system being ${}^{i}_{c}T$ and the transformation matrix from the camera coordinate system to the face coordinate system being ${}^{f}_{c}T$, wherein ${}^{i}_{c}T$ is determined by the relative position of the camera and the screen as a fixed parameter of the system, and ${}^{f}_{c}T$ is obtained from the head pose provided by the functions of Kinect for Windows SDK V2, its value changing continuously; the descriptions of an arbitrary point $p$ in the two coordinate systems $\{A\}$ and $\{B\}$ are ${}^{A}p$ and ${}^{B}p$, with the homogeneous transformation relation

$${}^{A}p = {}^{A}_{B}T\,{}^{B}p \qquad (1)$$

the homogeneous transformation matrix ${}^{A}_{B}T$ is a 4×4 square matrix:

$${}^{A}_{B}T = \begin{bmatrix} {}^{A}_{B}R & {}^{A}p_{B\,org} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \qquad (2)$$

which combines a translation transformation, given by ${}^{A}p_{B\,org}$, and a rotation transformation, given by ${}^{A}_{B}R$, wherein ${}^{A}p_{B\,org}$ is the translation vector of the coordinate system $\{B\}$ relative to the coordinate system $\{A\}$ and ${}^{A}_{B}R$ describes the orientation of the coordinate system $\{B\}$ relative to the coordinate system $\{A\}$; the SDK gives the quaternion representation $q=[q_0,q_1,q_2,q_3]^{T}$, which represents the rotation operation from the coordinate system $\{A\}$ to the coordinate system $\{B\}$, its relationship to the rotation matrix being

$${}^{A}_{B}R = \begin{bmatrix} 1-2(q_2^2+q_3^2) & 2(q_1q_2-q_0q_3) & 2(q_1q_3+q_0q_2) \\ 2(q_1q_2+q_0q_3) & 1-2(q_1^2+q_3^2) & 2(q_2q_3-q_0q_1) \\ 2(q_1q_3-q_0q_2) & 2(q_2q_3+q_0q_1) & 1-2(q_1^2+q_2^2) \end{bmatrix} \qquad (3)$$

knowing the description ${}^{A}_{B}T$ of the coordinate system $\{B\}$ relative to the coordinate system $\{A\}$, the description ${}^{B}_{A}T$ of $\{A\}$ relative to $\{B\}$ is determined by homogeneous transformation inversion, specifically:

$${}^{B}_{A}T = \left({}^{A}_{B}T\right)^{-1} = \begin{bmatrix} {}^{A}_{B}R^{T} & -{}^{A}_{B}R^{T}\,{}^{A}p_{B\,org} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \qquad (4)$$

in the three-dimensional homogeneous coordinate system a point is dual to a plane, so the representation form of a plane is consistent with that of a point: the plane is represented by a four-dimensional vector $\pi=(\pi_1,\pi_2,\pi_3,\pi_4)^{T}$, and a point $x=(x_1,x_2,x_3,x_4)^{T}$ lies on the plane if $\pi_1x_1+\pi_2x_2+\pi_3x_3+\pi_4x_4=0$ (5); a plane is defined by three non-collinear points $x_1$, $x_2$, $x_3$:

$$\begin{bmatrix} x_1^{T} \\ x_2^{T} \\ x_3^{T} \end{bmatrix}\pi = 0 \qquad (6)$$

when the point $x$ is on the plane, the matrix $M=[x,\,x_1,\,x_2,\,x_3]$ has a determinant of 0: $\det M = x_1D_{234}-x_2D_{134}+x_3D_{124}-x_4D_{123}=0$ (7); the plane is expressed as $\pi=(D_{234},\,-D_{134},\,D_{124},\,-D_{123})$ (8), wherein $D_{jkl}$ denotes the determinant of the matrix formed by rows $j$, $k$, $l$ of the 4×3 matrix $[x_1,\,x_2,\,x_3]$; in the three-dimensional homogeneous coordinate system a straight line is represented by a 4×4 matrix, the straight line $L$ determined by two points $x_1$ and $x_2$ being

$$L = x_1x_2^{T} - x_2x_1^{T} \qquad (9)$$

and the intersection of the plane $\pi$ with the line $L$ is denoted $x$: $x = L\pi$ (10);
detecting the position of the screen origin in the camera coordinate system and a rotation matrix describing its orientation in the camera coordinate system, and acquiring the transformation matrix from the camera coordinate system to the screen coordinate system to obtain the screen position, specifically: the screen origin position $O_i$ in the camera coordinate system, ${}^{c}p_{O_i}$, and the rotation matrix ${}^{c}_{i}R$ representing its orientation in the camera coordinate system are obtained by measurement; according to equation (4), the transformation matrices ${}^{i}_{c}T$ and ${}^{c}_{i}T$ between the camera coordinate system and the screen coordinate system are obtained; the screen plane, i.e. the xy plane of the screen coordinate system ${}^{i}\pi_{s}$, is then expressed in the camera coordinate system as ${}^{c}\pi_{s}$;
Acquiring a position of a human head joint under the camera coordinate system and a rotation matrix of the orientation of the human head joint under the camera coordinate system, and acquiring a transformation matrix from the camera coordinate system to the human face coordinate system to obtain a human face direction and a visual field; specifically, the method comprises the following steps: human head joint position O is obtained through Kinect for Windows SDK V2fPosition in camera coordinate system
Figure FDA0002970086620000028
And rotation matrix representation of its orientation in the camera coordinate system
Figure FDA0002970086620000029
According to
Figure FDA00029700866200000210
Obtaining transformation matrix from camera coordinate system to human face coordinate system
Figure FDA00029700866200000211
Obtaining cursor positions corresponding to the two hands of the operator according to the human joint position and the facial characteristic parameters by a light linear propagation principle; the obtaining of the cursor positions corresponding to the two hands of the operator according to the human joint position and the facial feature parameters through a light linear propagation principle further comprises: detecting the positions of the fingertips and the noseroots of the person under the camera coordinate system, and judging whether the fingertips are in the visual field range or not according to a transformation matrix from the camera coordinate system to a face coordinate system and the coordinates of the fingertips and the noseroots of the person under the face coordinate system; obtaining a transformation matrix from the camera coordinate system to the screen coordinate system through three-dimensional calibration to obtain the position of the fingertip under the camera coordinate system and the position of the nasion under the camera coordinate system, obtaining the representation of a straight line passing through the fingertip and the nasion under the camera coordinate system, obtaining the intersection point of the straight line passing through the fingertip and the nasion and a screen plane, and obtaining the position of a cursor in the screen coordinate system through a conversion matrix from the camera to the screen coordinate system to judge whether a cursor point is in a screen range; and
and segmenting the palm area image of the operator in the color image according to the human body joint position, and identifying and classifying the operation instruction by a pre-trained gesture classification model.
2. The Kinect-based multi-person gesture human-computer interaction method as claimed in claim 1, wherein the step of segmenting the palm area image of the operator from the color image according to the human body joint position and performing recognition and classification on the operation instruction by a pre-trained gesture classification model further comprises:
acquiring the pixel position of the palm of the operator in the color image, and segmenting by taking the palm of the operator as a midpoint to generate a gesture image with a preset size;
and classifying and identifying the gesture image by an artificial neural network, and determining an instruction corresponding to the gesture.
3. The utility model provides a many people gesture human-computer interaction device based on Kinect which characterized in that includes:
the system comprises an acquisition module, a display module and a control module, wherein the acquisition module is used for acquiring a color image and a depth image of a scene through a Kinect camera and acquiring the human body joint position and the face characteristic parameters of an operator in the scene;
the first acquisition module is used for acquiring a camera coordinate system, a screen coordinate system and a human face coordinate system and detecting fingertips and cursor points, and specifically comprises the following steps: the camera coordinate system is used as a world coordinate system and the origin thereof is OcCorresponding to the imaging center of the Kinect camera, ZcThe axis coinciding with the optical axis, YcAxis and XcThe axis is parallel to the imaging plane; the origin of the screen coordinate system is OiCorresponding to the upper left corner of the screen, its YiAxis and XiThe axis being parallel to the camera imaging plane, ZiShaft andthe camera coordinate systems are opposite; the origin of the face coordinate system is OfCorresponding to the position of the head joint, ZfThe shaft faces the connecting surface part; the conversion between the coordinate systems is realized by homogeneous transformation, and the transformation matrix from the camera coordinate system to the screen coordinate system is
Figure FDA0002970086620000031
The transformation matrix from the camera coordinate system to the face coordinate system is
Figure FDA0002970086620000032
Wherein the content of the first and second substances,
Figure FDA0002970086620000033
the relative position of the camera and the screen is determined as a fixed parameter of the system;
Figure FDA0002970086620000034
then the head pose provided by the function provided by the Kinect for Windows SDK V2 is obtained, and the value of the head pose is changed continuously; the description in the two coordinate systems { A } and { B } for an arbitrary point p isAp andBp, and the following homogeneous transformation relations:
Figure FDA0002970086620000035
the homogeneous transformation matrix ${}^{A}_{B}T$ is a 4x4 square matrix

$${}^{A}_{B}T = \begin{bmatrix} {}^{A}_{B}R & {}^{A}p_{B,org} \\ \mathbf{0}^{T} & 1 \end{bmatrix},$$

which comprises a translation transformation ${}^{A}p_{B,org}$ and a rotation transformation ${}^{A}_{B}R$, wherein ${}^{A}p_{B,org}$ is the translation vector of the coordinate system {B} relative to the coordinate system {A} and ${}^{A}_{B}R$ describes the orientation of the coordinate system {B} relative to the coordinate system {A}; the SDK gives the quaternion representation $q = [q_0, q_1, q_2, q_3]^{T}$, which represents the rotation operation from the coordinate system {A} to the coordinate system {B} and is related to the rotation matrix by

$${}^{A}_{B}R = \begin{bmatrix} q_0^2+q_1^2-q_2^2-q_3^2 & 2(q_1 q_2 - q_0 q_3) & 2(q_1 q_3 + q_0 q_2) \\ 2(q_1 q_2 + q_0 q_3) & q_0^2-q_1^2+q_2^2-q_3^2 & 2(q_2 q_3 - q_0 q_1) \\ 2(q_1 q_3 - q_0 q_2) & 2(q_2 q_3 + q_0 q_1) & q_0^2-q_1^2-q_2^2+q_3^2 \end{bmatrix};$$

from the description ${}^{A}_{B}T$ of the coordinate system {B} relative to the coordinate system {A}, the description ${}^{B}_{A}T$ of {A} relative to {B} is determined, specifically

$${}^{B}_{A}T = \left({}^{A}_{B}T\right)^{-1} = \begin{bmatrix} {}^{A}_{B}R^{T} & -{}^{A}_{B}R^{T}\,{}^{A}p_{B,org} \\ \mathbf{0}^{T} & 1 \end{bmatrix};$$
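A self-contained numpy sketch of the transform algebra above: the quaternion-to-rotation conversion, the assembly of a 4x4 homogeneous matrix from R and p, and its closed-form inverse. The function names are illustrative, not taken from the patent or the Kinect SDK.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a unit quaternion q = [q0, q1, q2, q3] (scalar first)."""
    q0, q1, q2, q3 = q
    return np.array([
        [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3),             2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3),             q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2),             2*(q2*q3 + q0*q1),             q0*q0 - q1*q1 - q2*q2 + q3*q3],
    ])

def make_T(R, p):
    """Homogeneous transform with rotation block R (3x3) and translation p (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = p
    return T

def invert_T(T):
    """Closed-form inverse [R^T, -R^T p; 0 0 0 1] of a rigid transform."""
    R, p = T[:3, :3], T[:3, 3]
    return make_T(R.T, -R.T @ p)

# Example: given frame {B}'s pose (quaternion + position) expressed in frame {A},
# map an {A}-frame point p_A into {B} coordinates.
# T_A_B = make_T(quat_to_rot(q_B_in_A), p_B_in_A)     # {B} relative to {A}
# p_B = (invert_T(T_A_B) @ np.append(p_A, 1.0))[:3]
```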
in the three-dimensional homogeneous coordinate system a point and a plane are dual, and a plane is represented in the same form as a point, namely by a four-dimensional vector $\pi = (\pi_1, \pi_2, \pi_3, \pi_4)^{T}$; a point $x = (x_1, x_2, x_3, x_4)^{T}$ lies on the plane if $\pi_1 x_1 + \pi_2 x_2 + \pi_3 x_3 + \pi_4 x_4 = 0$; a plane is determined by three non-collinear points $x_1, x_2, x_3$:

$$\det\begin{bmatrix} x & x_1 & x_2 & x_3 \end{bmatrix} = 0;$$

when the point $x$ lies on the plane, the determinant of the matrix $M = [x, x_1, x_2, x_3]$ is 0: $\det M = x_1 D_{234} - x_2 D_{134} + x_3 D_{124} - x_4 D_{123} = 0$, so the plane is represented as $\pi = (D_{234}, -D_{134}, D_{124}, -D_{123})^{T}$, where $D_{jkl}$ denotes the determinant of the matrix formed by rows $j, k, l$ of the 4x3 matrix $[x_1, x_2, x_3]$; in the three-dimensional homogeneous coordinate system a straight line is represented by a 4x4 matrix, the straight line L determined by two points $x_1$ and $x_2$ being

$$L = x_1 x_2^{T} - x_2 x_1^{T};$$

the intersection point of the plane $\pi$ with the line L is denoted x: $x = L\pi$;
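A self-contained numpy sketch of the homogeneous point/plane/line relations above (plane through three points, Plücker matrix of a line through two points, and the line-plane intersection x = Lπ); the helper names are illustrative.

```python
import numpy as np

def plane_from_points(x1, x2, x3):
    """Plane through three non-collinear homogeneous points (4-vectors):
    pi = (D234, -D134, D124, -D123), with D_jkl the determinant of rows j, k, l
    of the 4x3 matrix [x1 x2 x3]."""
    A = np.column_stack([x1, x2, x3])                    # 4x3
    d = lambda rows: np.linalg.det(A[list(rows), :])
    return np.array([d((1, 2, 3)), -d((0, 2, 3)), d((0, 1, 3)), -d((0, 1, 2))])

def line_from_points(x1, x2):
    """Pluecker matrix of the line through two homogeneous points: L = x1 x2^T - x2 x1^T."""
    return np.outer(x1, x2) - np.outer(x2, x1)

def line_plane_intersection(L, pi):
    """Homogeneous intersection point x = L @ pi (divide by x[3] for Euclidean coords)."""
    return L @ pi

# Example: the pointing ray through fingertip and nasion meets the screen plane at
#   x = line_plane_intersection(line_from_points(fingertip_h, nasion_h), screen_plane_h)
```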
a detection module, configured to detect the position of the screen origin in the camera coordinate system and the rotation matrix of its orientation in the camera coordinate system, and to acquire the transformation matrix from the camera coordinate system to the screen coordinate system so as to obtain the screen position; specifically: the position ${}^{c}p_{i,org}$ of the screen origin O_i in the camera coordinate system and the rotation matrix representation ${}^{c}_{i}R$ of its orientation in the camera coordinate system are obtained by measurement; according to

$${}^{c}_{i}T = \begin{bmatrix} {}^{c}_{i}R & {}^{c}p_{i,org} \\ \mathbf{0}^{T} & 1 \end{bmatrix},$$

the transformation matrices between the camera coordinate system and the screen coordinate system, ${}^{c}_{i}T$ and ${}^{i}_{c}T = \left({}^{c}_{i}T\right)^{-1}$, are obtained; the representation ${}^{c}\pi_{i}$ of the screen plane, i.e. the x-y plane of the screen coordinate system, in the camera coordinate system is then determined;
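A self-contained sketch of this calibration step under the notation above, assuming the screen pose has been measured as a rotation matrix and a translation in the camera frame; the function name and the use of the z = 0 plane as the screen surface follow the coordinate definitions in the claim, while the plane transform via the inverse transpose is a standard identity.

```python
import numpy as np

def screen_calibration(R_cam_screen, p_cam_screen):
    """Build the camera<->screen homogeneous transforms from the measured pose of
    the screen origin in the camera frame, and express the screen plane (z = 0 in
    the screen frame) in camera coordinates."""
    T_cam_screen = np.eye(4)                       # screen frame -> camera frame
    T_cam_screen[:3, :3] = R_cam_screen
    T_cam_screen[:3, 3] = p_cam_screen
    T_screen_cam = np.linalg.inv(T_cam_screen)     # camera frame -> screen frame

    # Planes transform with the inverse transpose of the point transform:
    # pi_cam^T x_cam = pi_screen^T x_screen for all points on the plane.
    pi_screen = np.array([0.0, 0.0, 1.0, 0.0])
    pi_cam = T_screen_cam.T @ pi_screen
    return T_cam_screen, T_screen_cam, pi_cam
```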
a second acquisition module, configured to acquire the position of the human head joint in the camera coordinate system and the rotation matrix of its orientation in the camera coordinate system, and to acquire the transformation matrix from the camera coordinate system to the face coordinate system so as to obtain the face direction and field of view; specifically: the position ${}^{c}p_{f,org}$ of the head joint O_f in the camera coordinate system and the rotation matrix representation ${}^{c}_{f}R$ of its orientation in the camera coordinate system are obtained through the Kinect for Windows SDK V2; according to

$${}^{c}_{f}T = \begin{bmatrix} {}^{c}_{f}R & {}^{c}p_{f,org} \\ \mathbf{0}^{T} & 1 \end{bmatrix},$$

the transformation matrix from the camera coordinate system to the face coordinate system, ${}^{f}_{c}T = \left({}^{c}_{f}T\right)^{-1}$, is obtained;
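A self-contained sketch of the per-frame face transform and a simple field-of-view test, assuming the SDK head orientation has already been converted to a rotation matrix (for example with the quaternion conversion sketched earlier); the half-angle threshold and the exact form of the visibility check are assumptions, not values from the patent.

```python
import numpy as np

def face_transform(R_cam_face, p_cam_face):
    """Camera-frame -> face-frame homogeneous transform from the head pose."""
    T_cam_face = np.eye(4)                 # face frame -> camera frame
    T_cam_face[:3, :3] = R_cam_face
    T_cam_face[:3, 3] = p_cam_face
    return np.linalg.inv(T_cam_face)       # camera frame -> face frame

def fingertip_in_view(T_face_cam, fingertip_c, half_angle_deg=60.0):
    """Express the fingertip in the face frame and check the angle between it and
    the face z-axis (the facing direction) against an assumed half-angle."""
    f = (T_face_cam @ np.append(fingertip_c, 1.0))[:3]
    n = np.linalg.norm(f)
    return bool(n > 1e-9 and f[2] / n > np.cos(np.radians(half_angle_deg)))
```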
a processing module, configured to obtain the cursor positions corresponding to the two hands of the operator from the human joint positions and the facial feature parameters according to the principle of rectilinear light propagation; the processing module comprises: a detection unit, configured to detect the positions of the fingertips and the nasion in the camera coordinate system and to judge whether the fingertips are within the field of view according to the transformation matrix from the camera coordinate system to the face coordinate system and the coordinates of the fingertips and the nasion in the face coordinate system; an acquisition unit, configured to acquire the transformation matrix from the camera coordinate system to the screen coordinate system through three-dimensional calibration so as to obtain the positions of the fingertip and the nasion in the camera coordinate system, to derive the representation of the straight line through the fingertip and the nasion in the camera coordinate system, to compute the intersection point of this straight line with the screen plane, and to convert the intersection point by the camera-to-screen transformation matrix to obtain the cursor position in the screen coordinate system and judge whether the cursor point lies within the screen range; and
and a recognition and classification module, configured to segment the palm area image of the operator from the color image according to the human body joint positions and to recognize and classify the operation instruction with a pre-trained gesture classification model.
4. The Kinect-based multi-person gesture human-computer interaction device as claimed in claim 3, wherein the recognition and classification module is further configured to acquire the pixel position of the palm of the operator in the color image, segment with the palm of the operator as the center to generate a gesture image of a preset size, classify and recognize the gesture image through an artificial neural network, and determine the instruction corresponding to the gesture.
CN201810921343.XA 2018-08-14 2018-08-14 Kinect-based multi-person gesture man-machine interaction method and device Active CN109145802B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810921343.XA CN109145802B (en) 2018-08-14 2018-08-14 Kinect-based multi-person gesture man-machine interaction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810921343.XA CN109145802B (en) 2018-08-14 2018-08-14 Kinect-based multi-person gesture man-machine interaction method and device

Publications (2)

Publication Number Publication Date
CN109145802A CN109145802A (en) 2019-01-04
CN109145802B true CN109145802B (en) 2021-05-14

Family

ID=64793321

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810921343.XA Active CN109145802B (en) 2018-08-14 2018-08-14 Kinect-based multi-person gesture man-machine interaction method and device

Country Status (1)

Country Link
CN (1) CN109145802B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766822B (en) * 2019-01-07 2021-02-05 山东大学 Gesture recognition method and system based on neural network
CN110443154B (en) * 2019-07-15 2022-06-03 北京达佳互联信息技术有限公司 Three-dimensional coordinate positioning method and device of key point, electronic equipment and storage medium
CN110458494A (en) * 2019-07-19 2019-11-15 暨南大学 A kind of unmanned plane logistics delivery method and system
CN112706158B (en) * 2019-10-25 2022-05-06 中国科学院沈阳自动化研究所 Industrial man-machine interaction system and method based on vision and inertial navigation positioning
CN113448427B (en) 2020-03-24 2023-09-12 华为技术有限公司 Equipment control method, device and system
CN111913577A (en) * 2020-07-31 2020-11-10 武汉木子弓数字科技有限公司 Three-dimensional space interaction method based on Kinect
CN116974369B (en) * 2023-06-21 2024-05-17 广东工业大学 Method, system, equipment and storage medium for operating medical image in operation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102184021A (en) * 2011-05-27 2011-09-14 华南理工大学 Television man-machine interaction method based on handwriting input and fingertip mouse
CN103370678A (en) * 2011-02-18 2013-10-23 维塔驰有限公司 Virtual touch device without pointer
US20150000026A1 (en) * 2012-06-27 2015-01-01 sigmund lindsay clements Touch Free User Recognition Assembly For Activating A User's Smart Toilet's Devices
CN106909216A (en) * 2017-01-05 2017-06-30 华南理工大学 A kind of Apery manipulator control method based on Kinect sensor

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103370678A (en) * 2011-02-18 2013-10-23 维塔驰有限公司 Virtual touch device without pointer
CN102184021A (en) * 2011-05-27 2011-09-14 华南理工大学 Television man-machine interaction method based on handwriting input and fingertip mouse
US20150000026A1 (en) * 2012-06-27 2015-01-01 sigmund lindsay clements Touch Free User Recognition Assembly For Activating A User's Smart Toilet's Devices
CN106909216A (en) * 2017-01-05 2017-06-30 华南理工大学 A kind of Apery manipulator control method based on Kinect sensor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Real-Time Hand Posture Recognition System Using Deep Neural Networks; Ao Tang et al.; ACM Transactions on Intelligent Systems and Technology; 2015-03-31; Vol. 6, No. 2; pp. 1-23 *
Human-Computer Interaction Based on Gaze Tracking and Gesture Recognition; Xiao Zhiyong et al.; Computer Engineering; 2009-08-31; Vol. 35, No. 15; pp. 198-200 *

Also Published As

Publication number Publication date
CN109145802A (en) 2019-01-04

Similar Documents

Publication Publication Date Title
CN109145802B (en) Kinect-based multi-person gesture man-machine interaction method and device
US10394334B2 (en) Gesture-based control system
Wang et al. Real-time hand-tracking with a color glove
CN107665042B (en) Enhanced virtual touchpad and touchscreen
EP2904472B1 (en) Wearable sensor for tracking articulated body-parts
Murthy et al. A review of vision based hand gestures recognition
CN106598227B (en) Gesture identification method based on Leap Motion and Kinect
CN107632699B (en) Natural human-machine interaction system based on the fusion of more perception datas
CN108845668B (en) Man-machine interaction system and method
US20130335318A1 (en) Method and apparatus for doing hand and face gesture recognition using 3d sensors and hardware non-linear classifiers
CN107357427A (en) A kind of gesture identification control method for virtual reality device
CN103135753A (en) Gesture input method and system
Jaemin et al. A robust gesture recognition based on depth data
Dan et al. Survey on hand gesture recognition approaches
O'Hagan et al. Visual gesture interfaces for virtual environments
Roy et al. Real time hand gesture based user friendly human computer interaction system
KR20160141023A (en) The method of dynamic and static gesture recognition using depth camera and interface of immersive media contents
Abdallah et al. An overview of gesture recognition
Xu et al. Bare hand gesture recognition with a single color camera
Chaudhary Finger-stylus for non touch-enable systems
Thomas et al. A comprehensive review on vision based hand gesture recognition technology
Jain et al. Human computer interaction–Hand gesture recognition
Vančo et al. Gesture identification for system navigation in 3D scene
CN114296543A (en) Fingertip force detection and gesture recognition intelligent interaction system and intelligent ring
Varga et al. Survey and investigation of hand motion processing technologies for compliance with shape conceptualization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant