CN114201054A - Method for realizing non-contact human-computer interaction based on head posture - Google Patents

Method for realizing non-contact human-computer interaction based on head posture

Info

Publication number
CN114201054A
CN114201054A
Authority
CN
China
Prior art keywords
head
angle
computer interaction
coordinate system
contact human
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210150603.4A
Other languages
Chinese (zh)
Inventor
袁宏宇
刘国清
杨广
王启程
徐涵
全丹辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Minieye Innovation Technology Co Ltd
Original Assignee
Shenzhen Minieye Innovation Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Minieye Innovation Technology Co Ltd filed Critical Shenzhen Minieye Innovation Technology Co Ltd
Priority to CN202210150603.4A
Publication of CN114201054A
Legal status: Pending (Current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012Head tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method for realizing non-contact human-computer interaction based on head posture, which comprises the following steps: establishing a 3D face model and a camera coordinate system; collecting a video data stream of the user's face in real time; automatically positioning the facial feature points through a feature point coordinate determination unit and outputting the feature point coordinates of the current frame; according to this output, a head posture determination unit selects feature points corresponding to points in the 3D face model and calculates the pitch angle and yaw angle in the camera coordinate system; the yaw angle and pitch angle output for the first time are recorded as the initial position and made to correspond to the center point of a display, and subsequent angle differences are multiplied by a coefficient to obtain the corresponding screen pixels. The invention can complete non-contact, continuous human-computer interaction even under low image quality, requires no wearable equipment, and offers low cost and high recognition accuracy.

Description

Method for realizing non-contact human-computer interaction based on head posture
Technical Field
The invention relates to the technical field of computers, and in particular to a method for realizing non-contact human-computer interaction based on head posture.
Background
Existing human-computer interaction schemes are basically realized through direct contact between the human and the machine, for example touching the device by hand or clicking and sliding with a keyboard and mouse. With the rapid development of artificial intelligence technology, such direct-contact schemes can no longer satisfy every application scenario. Gaze-tracking schemes suffer from low precision and high cost when the distance is too great, and impose high requirements on image quality. Gesture-based schemes have difficulty issuing instructions continuously: the user must make the corresponding gesture many times to achieve the goal.
The Chinese invention patent with publication number CN104123002B discloses a wireless somatosensory mouse based on head movement, which comprises a motion acquisition module, a data processing module, a wireless transceiver module and a power module. The motion acquisition module acquires head movement information, records the head movement signals and transmits them to the data processing module; the data processing module receives the head movement data and processes it into the data required to control a computer cursor; the wireless transceiver module realizes wireless data transmission between the device and the computer; and the power module supplies working power to the motion acquisition, data processing and wireless transceiver modules. However, this wireless somatosensory mouse cannot satisfy every application scenario, since it must still be worn on the head and moved with it to move the cursor.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a method for realizing non-contact human-computer interaction based on head posture, which can realize human-computer interaction without wearing equipment.
The purpose of the invention is realized by the following technical scheme:
a method for realizing non-contact human-computer interaction based on head gestures comprises the following steps:
step one, establishing a 3D face model, ensuring that the number of feature points of the 3D face model matches the output of a feature point coordinate determination unit, and establishing a head coordinate system from the 3D face model;
step two, establishing a camera coordinate system, acquiring a video data stream of the user's face in real time, and establishing an image coordinate system for each single-frame image;
step three, automatically positioning the facial feature points by the feature point coordinate determination unit, and outputting the feature point coordinates of the current frame, which lie in the image coordinate system;
step four, according to the output of the feature point coordinate determination unit, the head posture determination unit selects feature points corresponding to points in the 3D face model and calculates the pitch angle and yaw angle in the camera coordinate system;
step five, according to the yaw angle and pitch angle output by the head posture determination unit, using an angle-coordinate conversion unit to record the yaw angle and pitch angle output for the first time as the initial position, making this angle correspond to the center point of the display, and multiplying subsequent angle differences by a coefficient to obtain the corresponding screen pixels;
and step six, moving the head to realize non-contact human-computer interaction.
Furthermore, the single-frame image in step two is cropped, rotated and scaled to meet the input requirements of the feature point coordinate determination unit.
Further, the feature points comprise the inner canthi of the left and right eyes, the nose bridge, the nose tip and the chin point.
Furthermore, the pitch angle and the yaw angle each range from −90° to 90°.
Further, when the absolute value of the pitch angle is greater than 25° or the absolute value of the yaw angle is greater than 40°, angle-coordinate conversion is not performed. The conversion formula is as follows:

$$s\begin{bmatrix}x\\ y\\ 1\end{bmatrix}=\begin{bmatrix}f_x & 0 & c_x\\ 0 & f_y & c_y\\ 0 & 0 & 1\end{bmatrix}\left[\,R\mid T\,\right]\begin{bmatrix}U\\ V\\ W\\ 1\end{bmatrix}$$

wherein x and y are the coordinates of a feature point in the image coordinate system; U, V and W are its coordinates in the head coordinate system; R is the rotation matrix; T is the translation vector; s is a projective scale factor; and f_x, f_y, c_x and c_y are the camera intrinsic parameters.
Further, the calculation formula for the screen pixel coordinates corresponding to the current head posture is as follows:

$$x = x_0 + k\,(\mathrm{yaw}_n - \mathrm{yaw}_s), \qquad y = y_0 + k\,(\mathrm{pitch}_n - \mathrm{pitch}_s)$$

wherein k is a coefficient, yaw_s and pitch_s are the yaw and pitch angles of the initial position, yaw_n and pitch_n are the yaw and pitch angles of the current position, and (x_0, y_0) is the center point of the display.
Further, the size of the k value is positively correlated with the size of the display.
Further, the method further comprises: calculating, from the change of the feature points in the user's facial image, whether the user's head is pitched excessively; if the pitch is determined to be excessive, locking the movement of the cursor.
Compared with the prior art, the invention has the following advantages and beneficial effects:
according to the invention, the non-contact and continuous human-computer interaction between the user and the equipment can be realized by capturing the head posture and the characteristic points, the quality requirement and the cost of the picture are lower than those of a sight tracking scheme, the human-computer interaction can be accurately performed when the distance is long, the user does not need to continuously perform the same or different gestures for multiple times to realize the human-computer interaction, the non-contact and continuous human-computer interaction can be also completed under the condition of low image quality, the cursor movement is controlled by moving the head, the cost is low, the recognition accuracy is high, the speed is high, and the method is suitable for market popularization.
Drawings
FIG. 1 is a flow chart of a method for implementing non-contact human-computer interaction based on head pose;
FIG. 2 is a schematic side view of the product.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is to be understood that the terms "comprises", "comprising" and "having" and any variations thereof in the description, claims and drawings of the invention are intended to cover non-exclusive inclusion. For example, a process, method, system, article or apparatus that comprises a list of steps or elements is not limited to the steps or elements listed, but may include other steps or elements not listed or inherent to such a process, method, system, article or apparatus.
As shown in FIG. 1, a method for realizing non-contact human-computer interaction based on head posture comprises the following steps:
step one, establishing a 3D face model, ensuring that the number of feature points of the 3D face model matches the output of the feature point coordinate determination unit, and establishing a head coordinate system from the 3D face model; the 3D face model is a point set describing the three-dimensional coordinates of the facial feature points; the head coordinate system takes the head as the origin in the real world, with the facing direction of the face as the positive Z axis, the direction of the top of the head as the positive Y axis, and the direction of the left ear as the positive X axis;
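As a concrete illustration of step one, the sketch below defines such a point set in Python. The coordinate values are generic placeholders chosen only to respect the axis conventions just described; the patent does not disclose its model data.

```python
import numpy as np

# Illustrative rigid 3D face model in the head coordinate system described
# above: X toward the left ear, Y toward the top of the head, Z out of the
# face, origin at the head center, units in millimetres. All values are
# placeholders, not the patent's model.
MODEL_POINTS = np.array([
    [ 30.0,  35.0,  90.0],   # inner canthus of the left eye
    [-30.0,  35.0,  90.0],   # inner canthus of the right eye
    [  0.0,  40.0, 110.0],   # nose bridge
    [  0.0,   0.0, 120.0],   # nose tip
    [  0.0, -65.0,  90.0],   # chin point
])
```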
step two, establishing a camera coordinate system, acquiring a video data stream of the user's face in real time, and establishing an image coordinate system for each single-frame image; the coordinate system established with the camera 1 as the origin in the real world is called the camera coordinate system, with the facing direction of the camera 1 as the positive Z axis, upward as the positive Y axis, and leftward as the positive X axis; the image coordinate system is established on the image presented on the display 2, with rightward as the positive X axis and upward as the positive Y axis;
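Mapping between these coordinate systems requires the camera intrinsics. A minimal sketch, assuming f_x, f_y, c_x and c_y have been obtained from a prior calibration; the numbers below are placeholders for a 640x480 camera:

```python
import numpy as np

# Placeholder intrinsics; in practice fx, fy, cx and cy come from
# calibrating camera 1, e.g. with a checkerboard pattern.
fx, fy = 600.0, 600.0        # focal lengths in pixels (assumed)
cx, cy = 320.0, 240.0        # principal point, roughly the image center
CAMERA_MATRIX = np.array([[fx, 0.0, cx],
                          [0.0, fy, cy],
                          [0.0, 0.0, 1.0]])
DIST_COEFFS = np.zeros(5)    # assume negligible lens distortion
```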
step three, automatically positioning the facial feature points by the feature point coordinate determination unit according to a trained 68-point facial key-point regression neural network, and outputting the feature point coordinates of the current frame, which lie in the image coordinate system;
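The patent does not name a specific landmark network; as a stand-in, the sketch below uses dlib's off-the-shelf 68-point predictor. The model file name is dlib's published one, and treating the first detected face as the user is an assumption.

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmarks_of_frame(frame_bgr):
    """Return the 68 (x, y) feature-point coordinates of the current frame,
    in the image coordinate system, or None if no face is detected."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 0)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return [(shape.part(i).x, shape.part(i).y) for i in range(68)]
```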
step four, according to the output of the feature point coordinate determination unit, the head posture determination unit selects feature points corresponding to points in the 3D face model, computes the affine transformation matrix from the feature points in the head coordinate system of the 3D face model to the feature points in the image coordinate system, and calculates the pitch angle and yaw angle in the camera coordinate system from the rotation information in that matrix; in geometry, an affine transformation is a linear transformation of a vector space followed by a translation into another vector space; the angle of rotation about the X axis is called the pitch angle, and the angle of rotation about the Y axis is called the yaw angle;
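Continuing the sketches above, the pose can be recovered with OpenCV's solvePnP from the model points and intrinsics already defined. The 68-point indices and the Euler-angle convention below are common choices, not the patent's specification:

```python
import cv2
import numpy as np

# Assumed 68-point indices for the five model points: 39/42 inner eye
# corners, 27 nose bridge, 30 nose tip, 8 chin (dlib numbering).
FEATURE_IDS = [39, 42, 27, 30, 8]

def head_pose(landmarks):
    """Estimate (pitch, yaw) in degrees in the camera coordinate system."""
    image_points = np.array([landmarks[i] for i in FEATURE_IDS],
                            dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(MODEL_POINTS, image_points,
                                  CAMERA_MATRIX, DIST_COEFFS,
                                  flags=cv2.SOLVEPNP_EPNP)  # EPnP accepts 5 points
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)   # rotation part of the transformation
    # Euler angles for R = Rz * Ry * Rx: rotation about X is the pitch
    # angle, rotation about Y is the yaw angle.
    pitch = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    yaw = np.degrees(np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2])))
    return pitch, yaw
```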
step five, according to the yaw angle and pitch angle output by the head posture determination unit, an angle-coordinate conversion unit records the yaw angle and pitch angle output for the first time as the initial position and makes this angle correspond to the center point of the display 2; the subsequently obtained pitch and yaw angles are subtracted from those of the initial position, and the angle differences are multiplied by a coefficient to obtain the corresponding screen pixels; the head posture comprises the pitch, yaw and roll angles of the head, and because the pitch angle and yaw angle vary continuously, non-contact continuous human-computer interaction can be realized;
step six, moving the head to realize non-contact human-computer interaction;
the cursor is controlled to move by moving the head, an interaction result is displayed on the display 2, the result is fed back to the camera 1, head portrait data of the user is continuously obtained, and the user can continuously send a cursor moving instruction and cannot be stuck.
In addition, the single-frame image in step two is cropped, rotated and scaled until it meets the input requirements of the feature point coordinate determination unit.
The feature points comprise the inner canthi of the left and right eyes, the nose bridge, the nose tip and the chin point. If a selected feature point is occluded, the selection can be changed according to the actual situation; for example, the outer canthi of the left and right eyes, the left and right mouth corners and so on can also be selected from the 68 facial key points.
The pitch angle and the yaw angle each range from −90° to 90°. When the absolute value of the pitch angle is greater than 25° or the absolute value of the yaw angle is greater than 40°, angle-coordinate conversion is not performed; these maximum limits on the pitch and yaw angles can be changed according to the actual situation. The conversion formula is as follows:
$$s\begin{bmatrix}x\\ y\\ 1\end{bmatrix}=\begin{bmatrix}f_x & 0 & c_x\\ 0 & f_y & c_y\\ 0 & 0 & 1\end{bmatrix}\left[\,R\mid T\,\right]\begin{bmatrix}U\\ V\\ W\\ 1\end{bmatrix} \qquad (1)$$

wherein x and y are the coordinates of a feature point in the image coordinate system; U, V and W are its coordinates in the head coordinate system; R is the rotation matrix; T is the translation vector; s is a projective scale factor; and f_x, f_y, c_x and c_y are the camera intrinsic parameters.
The calculation formula for the screen pixel coordinates corresponding to the current head posture is as follows:
$$x = x_0 + k\,(\mathrm{yaw}_n - \mathrm{yaw}_s), \qquad y = y_0 + k\,(\mathrm{pitch}_n - \mathrm{pitch}_s) \qquad (2)$$

wherein k is a coefficient, yaw_s and pitch_s are the yaw and pitch angles of the initial position, yaw_n and pitch_n are the yaw and pitch angles of the current position, and (x_0, y_0) is the center point of the display.
The size of the k value is positively correlated with the size of the display 2; the head posture is solved from formula (1), and the screen coordinates of the cursor are obtained from formula (2).
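One simple way to realize this positive correlation, under the assumption that the usable yaw range (here ±40°) should sweep the full screen width, is:

```python
def k_for_display(screen_width_px, usable_yaw_deg=80.0):
    # A wider display yields a proportionally larger k, so the same
    # head rotation always spans the whole screen.
    return screen_width_px / usable_yaw_deg

k = k_for_display(1920)   # 24 pixels per degree on a 1920-pixel-wide display
```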
The method for realizing non-contact human-computer interaction based on head posture further comprises: calculating, from the change of the feature points in the user's facial image, whether the user's head is pitched excessively; if the pitch is determined to be excessive, the movement of the cursor is locked.
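A sketch of the pitch-lock check; treating a pitch deviation beyond the 25° conversion limit as "excessive" is an assumption, since the patent leaves the threshold open:

```python
PITCH_LOCK_DEG = 25.0   # assumed threshold for "excessive" pitching

def cursor_locked(pitch_n, pitch_s):
    """Lock cursor movement when the head pitches too far from the
    initial position, e.g. while the user glances down."""
    return abs(pitch_n - pitch_s) > PITCH_LOCK_DEG
```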
As shown in fig. 2, the working principle of the present invention is as follows:
the method comprises the steps that a user moves the head, a camera 1 captures feature points set on a human face, a feature point coordinate determination unit automatically positions facial feature points, a head posture determination unit selects the feature points to correspond to points in a 3D human face model, a yaw angle and a pitch angle are calculated, an angle-coordinate conversion unit is used for recording the initial positions of the pitch angle and the yaw angle, the subsequently acquired pitch angle and yaw angle are subtracted from the pitch angle and the yaw angle of the initial positions to obtain a difference value, the difference value of the calculated angles is multiplied by a coefficient and corresponds to pixel points of a display 2, when the user rotates the head, the points in the display 2 can move along with the rotation of the user's head, the user can conveniently determine the position of the current head posture in the display 2, and non-contact human-computer interaction without carrying equipment is achieved.
It should be understood that the above-described embodiments are merely preferred embodiments of the present invention and the technical principles applied thereto, and that any changes, modifications, substitutions, combinations and simplifications made by those skilled in the art without departing from the spirit and principle of the present invention shall be regarded as equivalent substitutions and shall be covered by the protection scope of the present invention.

Claims (8)

1. A method for realizing non-contact human-computer interaction based on head posture, characterized by comprising the following steps:
step one, establishing a 3D face model, ensuring that the number of feature points of the 3D face model matches the output of a feature point coordinate determination unit, and establishing a head coordinate system from the 3D face model;
step two, establishing a camera coordinate system, acquiring a video data stream of the user's face in real time, and establishing an image coordinate system for each single-frame image;
step three, automatically positioning the facial feature points by the feature point coordinate determination unit, and outputting the feature point coordinates of the current frame, which lie in the image coordinate system;
step four, according to the output of the feature point coordinate determination unit, the head posture determination unit selects feature points corresponding to points in the 3D face model and calculates the pitch angle and yaw angle in the camera coordinate system;
step five, according to the yaw angle and pitch angle output by the head posture determination unit, using an angle-coordinate conversion unit to record the yaw angle and pitch angle output for the first time as the initial position, making this angle correspond to the center point of the display, and multiplying subsequent angle differences by a coefficient to obtain the corresponding screen pixels;
and step six, moving the head to realize non-contact human-computer interaction.
2. The method for realizing non-contact human-computer interaction based on head posture according to claim 1, characterized in that: the single-frame image in step two is cropped, rotated and scaled until it meets the input requirements of the feature point coordinate determination unit.
3. The method for realizing non-contact human-computer interaction based on head posture according to claim 2, characterized in that: the feature points comprise the inner canthi of the left and right eyes, the nose bridge, the nose tip and the chin point.
4. The method for realizing non-contact human-computer interaction based on head posture according to claim 3, characterized in that: the pitch angle and the yaw angle each range from −90° to 90°.
5. The method for realizing non-contact human-computer interaction based on head posture according to claim 4, characterized in that: when the absolute value of the pitch angle is greater than 25° or the absolute value of the yaw angle is greater than 40°, angle-coordinate conversion is not performed, the conversion formula being:

$$s\begin{bmatrix}x\\ y\\ 1\end{bmatrix}=\begin{bmatrix}f_x & 0 & c_x\\ 0 & f_y & c_y\\ 0 & 0 & 1\end{bmatrix}\left[\,R\mid T\,\right]\begin{bmatrix}U\\ V\\ W\\ 1\end{bmatrix}$$

wherein x and y are the coordinates of a feature point in the image coordinate system; U, V and W are its coordinates in the head coordinate system; R is the rotation matrix; T is the translation vector; s is a projective scale factor; and f_x, f_y, c_x and c_y are the camera intrinsic parameters.
6. The method for realizing non-contact human-computer interaction based on head posture according to claim 5, characterized in that: the calculation formula for the screen pixel coordinates corresponding to the current head posture is:

$$x = x_0 + k\,(\mathrm{yaw}_n - \mathrm{yaw}_s), \qquad y = y_0 + k\,(\mathrm{pitch}_n - \mathrm{pitch}_s)$$

wherein k is a coefficient, yaw_s and pitch_s are the yaw and pitch angles of the initial position, yaw_n and pitch_n are the yaw and pitch angles of the current position, and (x_0, y_0) is the center point of the display.
7. The method for realizing non-contact human-computer interaction based on head posture according to claim 6, characterized in that: the magnitude of the k value is positively correlated with the size of the display.
8. The method for realizing non-contact human-computer interaction based on head posture according to claim 7, characterized in that: the method further comprises: calculating, from the change of the feature points in the user's facial image, whether the user's head is pitched excessively; and, if the pitch is determined to be excessive, locking the movement of the cursor.
CN202210150603.4A 2022-02-18 2022-02-18 Method for realizing non-contact human-computer interaction based on head posture Pending CN114201054A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210150603.4A CN114201054A (en) 2022-02-18 2022-02-18 Method for realizing non-contact human-computer interaction based on head posture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210150603.4A CN114201054A (en) 2022-02-18 2022-02-18 Method for realizing non-contact human-computer interaction based on head posture

Publications (1)

Publication Number Publication Date
CN114201054A (en) 2022-03-18

Family

ID=80645534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210150603.4A Pending CN114201054A (en) 2022-02-18 2022-02-18 Method for realizing non-contact human-computer interaction based on head posture

Country Status (1)

Country Link
CN (1) CN114201054A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070290998A1 (en) * 2006-06-08 2007-12-20 Samsung Electronics Co., Ltd. Input device comprising geomagnetic sensor and acceleration sensor, display device for displaying cursor corresponding to motion of input device, and cursor display method thereof
CN102156537A (en) * 2010-02-11 2011-08-17 三星电子株式会社 Equipment and method for detecting head posture
CN110717467A (en) * 2019-10-15 2020-01-21 北京字节跳动网络技术有限公司 Head pose estimation method, device, equipment and storage medium
CN111178152A (en) * 2019-12-09 2020-05-19 上海理工大学 Attention detection reminding device based on three-dimensional head modeling
CN111813689A (en) * 2020-07-22 2020-10-23 腾讯科技(深圳)有限公司 Game testing method, apparatus and medium
CN112088348A (en) * 2018-05-21 2020-12-15 韦斯特尔电子工业和贸易有限责任公司 Method, system and computer program for remote control of a display device via head gestures
CN112162627A (en) * 2020-08-28 2021-01-01 深圳市修远文化创意有限公司 Eyeball tracking method combined with head movement detection and related device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070290998A1 (en) * 2006-06-08 2007-12-20 Samsung Electronics Co., Ltd. Input device comprising geomagnetic sensor and acceleration sensor, display device for displaying cursor corresponding to motion of input device, and cursor display method thereof
CN102156537A (en) * 2010-02-11 2011-08-17 三星电子株式会社 Equipment and method for detecting head posture
CN112088348A (en) * 2018-05-21 2020-12-15 韦斯特尔电子工业和贸易有限责任公司 Method, system and computer program for remote control of a display device via head gestures
CN110717467A (en) * 2019-10-15 2020-01-21 北京字节跳动网络技术有限公司 Head pose estimation method, device, equipment and storage medium
CN111178152A (en) * 2019-12-09 2020-05-19 上海理工大学 Attention detection reminding device based on three-dimensional head modeling
CN111813689A (en) * 2020-07-22 2020-10-23 腾讯科技(深圳)有限公司 Game testing method, apparatus and medium
CN112162627A (en) * 2020-08-28 2021-01-01 深圳市修远文化创意有限公司 Eyeball tracking method combined with head movement detection and related device

Similar Documents

Publication Publication Date Title
US11600013B2 (en) Facial features tracker with advanced training for natural rendering of human faces in real-time
US10394334B2 (en) Gesture-based control system
US6204852B1 (en) Video hand image three-dimensional computer interface
US6147678A (en) Video hand image-three-dimensional computer interface with multiple degrees of freedom
Reale et al. A multi-gesture interaction system using a 3-D iris disk model for gaze estimation and an active appearance model for 3-D hand pointing
CN110083202B (en) Multimode interaction with near-eye display
KR101171660B1 (en) Pointing device of augmented reality
CN109145802B (en) Kinect-based multi-person gesture man-machine interaction method and device
CN109993073B (en) Leap Motion-based complex dynamic gesture recognition method
CN110865704B (en) Gesture interaction device and method for 360-degree suspended light field three-dimensional display system
CN107632699A (en) Natural human-machine interaction system based on the fusion of more perception datas
CN112198962A (en) Method for interacting with virtual reality equipment and virtual reality equipment
CN111639531A (en) Medical model interaction visualization method and system based on gesture recognition
CN112667078A (en) Method and system for quickly controlling mouse in multi-screen scene based on sight estimation and computer readable medium
CN108052901B (en) Binocular-based gesture recognition intelligent unmanned aerial vehicle remote control method
Vasisht et al. Human computer interaction based eye controlled mouse
Yousefi et al. 3D gesture-based interaction for immersive experience in mobile VR
WO2024055957A1 (en) Photographing parameter adjustment method and apparatus, electronic device and readable storage medium
Mayol et al. Interaction between hand and wearable camera in 2D and 3D environments
Appenrodt et al. Multi stereo camera data fusion for fingertip detection in gesture recognition systems
Liu et al. A robust hand tracking for gesture-based interaction of wearable computers
CN114201054A (en) Method for realizing non-contact human-computer interaction based on head posture
Thomas et al. A comprehensive review on vision based hand gesture recognition technology
CN115951783A (en) Computer man-machine interaction method based on gesture recognition
Jain et al. Human computer interaction–Hand gesture recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20220318)