CN111158489A - Camera-based gesture interaction method and system - Google Patents

Camera-based gesture interaction method and system

Info

Publication number
CN111158489A
Authority
CN
China
Prior art keywords
depth
human body
hand
value
equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911417209.7A
Other languages
Chinese (zh)
Other versions
CN111158489B (en)
Inventor
林树宏
张三顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Youjiu Health Technology Co Ltd
Original Assignee
Shanghai Youjiu Health Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Youjiu Health Technology Co Ltd
Priority to CN201911417209.7A
Publication of CN111158489A
Application granted
Publication of CN111158489B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of cameras, and in particular to a camera-based gesture interaction method and system. The method comprises: step S1, collecting depth data through a depth camera; step S2, intercepting the depth data within a rectangular range and calculating their depth mean value; step S3, taking the depth mean value as the distance from the human body to the equipment; step S4, acquiring a pre-depth value range; step S5, obtaining a human body contour from the depth values within the pre-depth value range; step S6, determining a hand position point; step S7, determining a hand movable area; step S8, computing the ratio between the size of the hand movable area and the screen resolution of the equipment to determine the screen pointer point; and step S9, recognizing the gesture state and determining a gesture instruction to control the equipment. The beneficial effects of the invention are that users can control the equipment from a distance, the method places low demands on the resolution of the equipment, the response speed is high, and the gestures are simple and easy to perform.

Description

Camera-based gesture interaction method and system
Technical Field
The invention relates to the technical field of cameras, and in particular to a camera-based gesture interaction method and system.
Background
With the development of computer vision and the shift from two dimensions to three, depth cameras have become widely used in the market. Devices such as cameras, body measuring instruments, smart body-measuring mirrors and motion-sensing game devices can acquire three-dimensional features of the human body through a depth camera, and from the depth data the camera provides they can recognize various human postures and even construct a three-dimensional model of the body.
In the prior art, however, these devices require a certain distance between the human body and the device during use. When users need to operate the device, they must temporarily interrupt what they are doing and walk up to the device, which makes real-time control inconvenient.
Disclosure of Invention
Aiming at the problems in the prior art, a camera-based gesture interaction method and system are provided.
The specific technical scheme is as follows:
The invention provides a camera-based gesture interaction method, wherein the camera is arranged on equipment comprising a depth camera and a color camera. The gesture interaction method specifically comprises the following steps:
step S1, acquiring depth data within the shooting range of the equipment through the depth camera;
step S2, intercepting the depth data in a rectangular range by taking the central position of the equipment as a reference, and calculating a depth mean value of the depth data in the rectangular range according to the depth data;
step S3, judging, by means of the depth mean value, whether a human body exists within the shooting range of the equipment,
if so, taking the depth mean value as the distance from the human body to the equipment, and turning to step S4;
if not, returning to the step S2;
s4, acquiring a pre-depth value range of the human body standing position according to the depth mean value;
step S5, judging, according to the pre-depth value range, whether each depth value falls below or above that range,
if so, removing a first pixel point corresponding to the depth value;
if not, reserving a second pixel point corresponding to the depth value to obtain a human body contour;
step S6, determining a hand position point of the human body according to the position point closest to the equipment in the human body contour;
step S7, acquiring a preset hand movable area according to the hand position point;
step S8, computing the ratio between the size of the hand movable area and the screen resolution of the equipment, so as to map the hand position point to a screen pointer point on the equipment;
step S9, recognizing the gesture state at the screen pointer point according to the gesture actions pre-stored in the color camera, so as to determine the gesture instruction issued by the corresponding hand position point,
wherein, when the gesture state is recognized as changing from the fist-making state to the open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
and when the gesture state is recognized as moving in the fist-making state and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction.
Preferably, in step S6, a standard hand depth value for the standing human body is calculated and a current hand depth value of the moving hand is selected from the human body contour, to judge whether the difference between the current hand depth value and the standard hand depth value is greater than a threshold value,
if not, reselecting the current hand depth value in the human body contour;
if so, determining that the hand in the human body contour is in a forward extending state, and determining a position point closest to the equipment in the human body contour so as to determine the hand position point.
Preferably, the step S7 includes:
step S70, determining a lowest height value and a highest height value of the lifted hand position point and a maximum distance of the hand position point moving left and right by calculating the height value of the human body outline;
and step S71, determining the hand movable area according to the lowest height value, the highest height value and the maximum distance the hand position point can move left and right.
Preferably, in step S8, the human body contour is mirrored to ensure that left and right of the human body contour on the screen of the equipment correspond to left and right of the human body in reality.
Preferably, in step S9, the color camera includes an image processing library, and the pre-stored gesture actions are extracted from the image processing library.
The invention also provides a camera-based gesture interaction system, wherein the camera is arranged on equipment comprising a depth camera and a color camera and the above gesture interaction method is adopted. The gesture interaction system comprises:
the acquisition module acquires depth data within the shooting range of the equipment through the depth camera;
the intercepting module is connected with the acquisition module, intercepts the depth data in a rectangular range by taking the central position of the equipment as a reference, and calculates a depth mean value of the depth data in the rectangular range according to the depth data;
the first judging module is connected with the intercepting module, judges whether a human body exists in the shooting range of the equipment or not through the depth mean value, and judges that the depth mean value is the distance from the human body to the equipment if the human body exists;
the first acquisition module is connected with the first judgment module and acquires a pre-depth value range of the human body standing position according to the depth mean value;
the second judging module is connected with the first acquiring module and judges, according to the pre-depth value range, whether each depth value falls below or above that range; if so, it removes the first pixel point corresponding to the depth value; otherwise, it reserves a second pixel point corresponding to the depth value to obtain a human body contour;
the first determining module is connected with the second judging module and determines a hand position point of the human body according to the position point which is closest to the equipment in the human body contour;
the second acquisition module is connected with the first determination module and acquires a preset hand movable area according to the hand position point;
the second determining module is connected with the second acquisition module and computes the ratio between the size of the hand movable area and the screen resolution of the equipment, so as to map the hand position point to a screen pointer point on the equipment;
the recognition module is connected with the second determining module and recognizes the gesture state at the screen pointer point according to the gesture actions pre-stored in the color camera, so as to determine the gesture instruction issued by the corresponding hand position point,
wherein, when the gesture state is recognized as changing from the fist-making state to the open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
and when the gesture state is recognized as moving in the fist-making state and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction.
Preferably, the first determining module comprises:
a judging unit, which calculates a standard hand depth value for the standing human body and selects a current hand depth value of the moving hand from the human body contour, to judge whether the difference between the current hand depth value and the standard hand depth value is greater than a threshold value; if the difference is less than or equal to the threshold value, the current hand depth value is reselected from the human body contour; if the difference is greater than the threshold value, the hand in the human body contour is determined to be in a forward-extending state, the position point closest to the equipment in the human body contour is determined, and the hand position point is thereby determined.
Preferably, the second acquisition module comprises:
the first determining unit is used for determining a lowest height value and a highest height value of the lifted hand position point and a maximum distance of the hand position point moving left and right by calculating the height value of the human body outline;
and the second determining unit is connected with the first determining unit and determines the hand movable area according to the lowest height value, the highest height value and the maximum distance the hand position point can move left and right.
The technical scheme of the invention has the following beneficial effects: the distance from the human body to the equipment, the human body contour and the hand position point are captured by the depth camera; the hand position point is synchronized to the corresponding screen pointer point of the equipment, so that movement of the hand position point controls movement of the screen pointer point; the gesture state at the screen pointer point is recognized by the color camera; when the gesture state is recognized as changing from the fist-making state to the open state, the screen pointer point executes a click operation according to the corresponding gesture instruction; and when the gesture state is recognized as moving in the fist-making state and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction. Simple and practical gesture interaction is thus realized: users can control the equipment from a distance, the method places low demands on the resolution of the equipment, the response speed is high, and the gestures are simple and easy to learn.
Drawings
Embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings. The drawings are, however, to be regarded as illustrative and explanatory only and are not restrictive of the scope of the invention.
FIG. 1 is a diagram illustrating steps of a gesture interaction method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating the step S6 of the gesture interaction method according to the embodiment of the present invention;
FIG. 3 is a diagram illustrating the step S7 of the gesture interaction method according to the embodiment of the present invention;
FIG. 4 is a block diagram of a gesture interaction system of an embodiment of the present invention;
FIG. 5 is a block diagram of a first determination module of the gesture interaction system of an embodiment of the present invention;
FIG. 6 is a block diagram of a second obtaining module of the gesture interaction system according to the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The invention is further described with reference to the following drawings and specific examples, which are not intended to be limiting.
The invention provides a camera-based gesture interaction method, wherein the camera is arranged on equipment comprising a depth camera and a color camera, and the gesture interaction method is characterized by comprising the following steps:
s1, acquiring depth data within the shooting range of the equipment through a depth camera;
step S2, intercepting depth data in a rectangular range by taking the central position of the equipment as a reference, and calculating a depth mean value of the depth data in the rectangular range according to the depth data;
step S3, judging, by means of the depth mean value, whether a human body exists within the shooting range of the equipment,
if so, taking the depth mean value as the distance from the human body to the equipment, and turning to step S4;
if not, returning to the step S2;
s4, acquiring a pre-depth value range of the human body standing position according to the depth mean value;
step S5, judging, according to the pre-depth value range, whether each depth value falls below or above that range,
if so, removing a first pixel point corresponding to the depth value;
if not, a second pixel point corresponding to the depth value is reserved to obtain a human body contour;
step S6, determining a hand position point of the human body according to the position point closest to the equipment in the human body contour;
step S7, acquiring a preset hand movable area according to the hand position point;
step S8, computing the ratio between the size of the hand movable area and the screen resolution of the equipment, so as to map the hand position point to a screen pointer point on the equipment;
step S9, recognizing the gesture state at the screen pointer point according to the gesture actions pre-stored in the color camera, so as to determine the gesture instruction issued by the corresponding hand position point,
wherein, when the gesture state is recognized as changing from the fist-making state to the open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
and when the gesture state is recognized as moving in the fist-making state and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction.
In the gesture interaction method, as shown in FIG. 1, depth data within the shooting range of the equipment are first collected by the depth camera of the equipment. The equipment may be a camera, a body measuring instrument, a smart body-measuring mirror, a motion-sensing game device and the like. When no human body is within the shooting range of the equipment, the depth data represent the distance from the background to the equipment, where the background generally consists mainly of the ground and a wall surface. Since the distance from the human body or the wall surface to the equipment must be calculated from these data, the influence of the ground needs to be eliminated.
Therefore, the depth data within a rectangular range are intercepted with the central position of the equipment as reference, and the depth mean value of the depth data in that range is calculated; this mean is the distance from the human body or the wall surface to the equipment.
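The patent provides no reference code. As an illustration only, the following Python sketch shows how steps S1 to S3 could be realized; the depth-map layout (a NumPy array of millimetre values with 0 marking invalid pixels), the rectangle size and the 300 mm presence margin are assumptions, not values taken from the patent.

```python
import numpy as np

def central_depth_mean(depth: np.ndarray, rect_w: int = 80, rect_h: int = 120) -> float:
    """Steps S1-S2 sketch: mean depth inside a rectangle centred on the frame.

    `depth` is an H x W array of millimetre depths; 0 marks invalid pixels,
    as is common for consumer depth cameras. The rectangle size is an assumption.
    """
    h, w = depth.shape
    cy, cx = h // 2, w // 2
    roi = depth[cy - rect_h // 2:cy + rect_h // 2, cx - rect_w // 2:cx + rect_w // 2]
    valid = roi[roi > 0]                       # drop invalid (zero) readings
    return float(valid.mean()) if valid.size else 0.0

def human_present(mean_depth: float, background_depth: float, margin: float = 300.0) -> bool:
    """Step S3 sketch: a body is assumed present when the mean depth drops
    clearly below the background (wall) depth measured with an empty scene."""
    return 0.0 < mean_depth < background_depth - margin
```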
Further, in order to recognize the posture of the human body, the human body must be separated from the background, i.e. human-body contour matting is performed. From the depth mean value obtained above, an approximate pre-depth value range for the standing position of the human body can be obtained. According to this range, each depth value is judged as falling below or above the range: any first pixel point whose depth value lies outside the pre-depth value range is regarded as a background point and deleted, while the second pixel points corresponding to the remaining depth values are reserved, yielding the human body contour.
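Continuing the same sketch, the contour matting of steps S4 and S5 reduces to a boolean mask over the pre-depth value range; the ±400 mm tolerance used here is an assumed value.

```python
def body_mask(depth: np.ndarray, mean_depth: float, tolerance: float = 400.0) -> np.ndarray:
    """Steps S4-S5 sketch: keep pixels whose depth lies inside the pre-depth
    range [mean_depth - tolerance, mean_depth + tolerance]; pixels smaller or
    larger than the range are background points and are removed."""
    near, far = mean_depth - tolerance, mean_depth + tolerance
    return (depth >= near) & (depth <= far)    # True = body-contour pixel
```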
Further, for movement of the hand position point to control movement of the screen pointer point of the equipment, the hand must be raised and extended forward by a certain distance and then translated within a specific area. In this case, the point of the human body closest to the equipment (i.e., the point with the smallest depth value) is the hand position point.
To determine the hand position point, it must first be determined whether the hand is in a forward-extending state. A standard hand depth value for the human body standing naturally is calculated first; when the difference between the currently selected moving-hand depth value in the human body contour and the standard hand depth value is greater than a threshold value, the hand is considered to be in the forward-extending state, and the hand position point closest to the equipment can then be determined. The standard hand depth value is calculated as follows: the numbers of pixel points in the transverse and longitudinal directions of the human body are obtained from the human body contour, thereby determining the position of the body's centre of gravity. When the human body stands naturally, the hand and the centre of gravity lie essentially in the same plane, at essentially equal distances from the equipment, so the depth value at the centre of gravity is used as the standard hand depth value.
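A sketch of this test under the same assumptions as above; taking the silhouette centroid as the centre of gravity and the 250 mm forward-extension threshold are illustrative simplifications.

```python
def hand_position_point(depth: np.ndarray, mask: np.ndarray, threshold: float = 250.0):
    """Step S6 sketch: return (x, y) of the hand point when the hand is
    extended forward, else None."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    gy, gx = int(ys.mean()), int(xs.mean())     # centre of gravity of the silhouette
    standard_depth = float(depth[gy, gx])       # standard hand depth when standing
    body_depth = np.where(mask, depth, np.inf)  # background can never win the argmin
    iy, ix = np.unravel_index(np.argmin(body_depth), body_depth.shape)
    if standard_depth - body_depth[iy, ix] > threshold:
        return int(ix), int(iy)                 # closest body point = hand position point
    return None
```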
Further, for the hand movable area: first, the height value of the human body contour is calculated, and the lowest height value for a raised hand position point is determined as a proportion of that height. If the hand position point is above the lowest height value, the hand is considered raised; if it is above the vertex of the head, the hand is considered raised too high. This fixes the longitudinal extent of the hand movable area. The transverse extent is then determined from how far the hand position point of the human body contour can move left and right. Finally, the complete hand movable area is obtained from the longitudinal and transverse extents.
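The bounds can then be written down directly. The proportions below (the lowest raised-hand line at 60% of body height, lateral reach at 45%) are illustrative assumptions, since the patent only states that the bounds are derived from the contour's height value.

```python
def hand_movable_area(mask: np.ndarray, min_ratio: float = 0.6, reach_ratio: float = 0.45):
    """Step S7 sketch: bounding box (x_min, y_min, x_max, y_max) of the hand
    movable area in image coordinates (y grows downwards)."""
    ys, xs = np.nonzero(mask)
    top, bottom = int(ys.min()), int(ys.max())
    height = bottom - top                      # contour height in pixels
    y_min = top                                # above the head vertex = raised too high
    y_max = bottom - int(min_ratio * height)   # lowest height of a raised hand
    cx = int(xs.mean())
    reach = int(reach_ratio * height)          # maximum left/right travel of the hand
    return cx - reach, y_min, cx + reach, y_max
```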
Further, the human body contour is mirrored to ensure that left and right of the human body contour on the screen of the equipment correspond to left and right of the human body in reality. At this point a hand position point has been determined whose coordinate system is based on the hand movable area (W1 × H1), expressed in coordinate form as (x1, y1).
Further, to obtain the screen pointer point corresponding to the hand position point, the screen resolution (W2 × H2) of the equipment is needed; the coordinate point (x2, y2) of the corresponding screen pointer point is determined from the proportional relation between the screen resolution and the size of the hand movable area, so that the screen pointer point moves in real time with the hand position point. The mapping relationship is as follows:
x2 = (W2 / W1) * x1;  y2 = (H2 / H1) * y1.
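The mirroring and the mapping formula translate directly into code; this sketch reuses the bounding box produced by the hypothetical hand_movable_area above.

```python
def to_screen(x1: int, y1: int, area, screen_w: int, screen_h: int):
    """Step S8 sketch: map a hand point inside the movable area (W1 x H1) to a
    screen pointer point (x2, y2), mirroring horizontally so that left and
    right on screen match left and right in the real scene."""
    x_min, y_min, x_max, y_max = area
    w1, h1 = x_max - x_min, y_max - y_min
    mx = w1 - (x1 - x_min)                 # mirrored x inside the movable area
    my = y1 - y_min
    x2 = screen_w / w1 * mx                # x2 = (W2 / W1) * x1
    y2 = screen_h / h1 * my                # y2 = (H2 / H1) * y1
    return int(x2), int(y2)
```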
Furthermore, the gesture state at the screen pointer point is recognized according to the gesture actions pre-stored in the color camera, so as to determine the gesture instruction issued by the corresponding hand position point:
when the gesture state is recognized as changing from the fist-making state to the open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
and when the gesture state is recognized as moving in the fist-making state and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction. Simple and practical gesture interaction is thus realized: users can control the equipment from a distance, the method places low demands on the resolution of the equipment, the response speed is high, and the gestures are simple and easy to learn.
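The click and drag rules of step S9 amount to a small state machine over per-frame gesture labels. In the sketch below the per-frame 'fist' / 'open' classification is assumed to come from an external recognizer (the patent attributes it to the image processing library of the color camera), and the 15-pixel movement threshold separating a click from a drag is an assumption.

```python
class GestureController:
    """Step S9 sketch: turn per-frame hand states into click / drag events."""

    def __init__(self, move_threshold: int = 15):
        self.prev_state = "open"
        self.fist_start = None                 # pointer position where the fist closed
        self.move_threshold = move_threshold   # pixels of travel that count as "moved"

    def update(self, state: str, pos: tuple):
        """`state` is 'fist' or 'open'; `pos` is the current screen pointer point."""
        event = None
        if self.prev_state == "fist" and state == "open" and self.fist_start:
            moved = max(abs(pos[0] - self.fist_start[0]),
                        abs(pos[1] - self.fist_start[1])) > self.move_threshold
            # fist -> move -> open = drag; fist -> open in place = click
            event = ("drag", self.fist_start, pos) if moved else ("click", pos)
        if state == "fist" and self.prev_state == "open":
            self.fist_start = pos
        self.prev_state = state
        return event
```

In a capture loop, update() would be fed once per frame with the classifier's label and the pointer point produced by the mapping above; the returned event, if any, is then dispatched to the user interface.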
In a preferred embodiment, as shown in FIG. 2, in step S6 a standard hand depth value for the standing human body is calculated and a current hand depth value of the moving hand is selected from the human body contour, to judge whether the difference between the current hand depth value and the standard hand depth value is greater than a threshold value,
if not, reselecting the current hand depth value in the human body contour;
if so, determining that the hand in the human body contour is in a forward-extending state, and determining the position point closest to the equipment in the human body contour so as to determine the hand position point.
In a preferred embodiment, step S7 includes:
step S70, determining a lowest height value and a highest height value of the raised hand position point and a maximum distance of the hand position point moving left and right by calculating the height value of the human body outline;
and step S71, determining the hand movable area according to the lowest height value, the highest height value and the maximum distance the hand position point can move left and right.
Specifically, as shown in FIG. 3, the height value of the human body contour is calculated, and the lowest height value for a raised hand position point is determined as a proportion of that height. The hand position point is considered raised if it is above the lowest height value, and raised too high if it is above the vertex of the head; this determines the hand movable area in the longitudinal range. The hand movable area in the transverse range is determined from the left-right movement of the hand position point of the human body contour. Finally, the complete hand movable area is obtained from the longitudinal and transverse ranges.
In a preferred embodiment, in step S8, the human body contour is mirrored to ensure that left and right of the human body contour on the screen of the equipment correspond to left and right of the human body in reality.
In a preferred embodiment, in step S9, the color camera includes an image processing library, and the pre-stored gesture actions are extracted from the image processing library.
The invention also provides a camera-based gesture interaction system, wherein the camera is arranged on equipment comprising a depth camera and a color camera and the above gesture interaction method is adopted. The gesture interaction system comprises:
the acquisition module 1 acquires depth data within the shooting range of the equipment through the depth camera;
the intercepting module 2 is connected with the acquisition module 1, intercepts depth data in a rectangular range by taking the central position of the equipment as a reference, and calculates a depth mean value of the depth data in the rectangular range according to the depth data;
the first judging module 3 is connected with the intercepting module 2, judges whether a human body exists in the shooting range of the equipment through the depth mean value, and judges that the depth mean value is the distance from the human body to the equipment if the human body exists;
the first acquisition module 4 is connected with the first judgment module 3 and acquires a pre-depth value range of the human body standing position according to the depth mean value;
the second judging module 5 is connected with the first acquisition module 4 and judges, according to the pre-depth value range, whether each depth value falls below or above that range; if so, it removes the first pixel point corresponding to the depth value; otherwise, it reserves a second pixel point corresponding to the depth value to obtain a human body contour;
the first determining module 6 is connected with the second judging module 5 and determines a hand position point of the human body according to the position point which is closest to the equipment in the human body contour;
the second acquisition module 7 is connected with the first determination module 6 and acquires a preset hand movable area according to the hand position point;
the second determining module 8 is connected with the second acquisition module 7 and computes the ratio between the size of the hand movable area and the screen resolution of the equipment, so as to map the hand position point to a screen pointer point on the equipment;
a recognition module 9, connected with the second determining module 8, which recognizes the gesture state at the screen pointer point according to the gesture actions pre-stored in the color camera, so as to determine the gesture instruction issued by the corresponding hand position point,
wherein, when the gesture state is recognized as changing from the fist-making state to the open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
and when the gesture state is recognized as moving in the fist-making state and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction.
Specifically, as shown in FIG. 4, the acquisition module 1 first collects depth data within the shooting range of the equipment through the depth camera of the equipment. The equipment may be a camera, a body measuring instrument, a smart body-measuring mirror, a motion-sensing game device and the like. When no human body is within the shooting range of the equipment, the depth data represent the distance from the background to the equipment, where the background generally consists mainly of the ground and a wall surface; the influence of the ground must be eliminated in order to calculate the distance from the human body or the wall surface to the equipment.
Therefore, the intercepting module 2 intercepts the depth data within a rectangular range with the central position of the equipment as reference, and calculates the depth mean value of the depth data in that range; this mean is the distance from the human body or the wall surface to the equipment.
Further, in order to recognize the posture of the human body, the human body must be separated from the background, i.e. human-body contour matting is performed. From the depth mean value obtained above, the first acquisition module 4 obtains an approximate pre-depth value range for the standing position of the human body. The second judging module 5 then judges, according to this range, whether each depth value falls below or above it: any first pixel point whose depth value lies outside the pre-depth value range is regarded as a background point and deleted, while the second pixel points corresponding to the remaining depth values are reserved, yielding the human body contour.
Further, for movement of the hand position point to control movement of the screen pointer point of the equipment, the hand must be raised and extended forward by a certain distance and then translated within a specific area. In this case, the point of the human body closest to the equipment (i.e., the point with the smallest depth value) is the hand position point.
To determine the hand position point, it must first be determined whether the hand is in a forward-extending state. A standard hand depth value for the human body standing naturally is calculated first; when the difference between the currently selected moving-hand depth value in the human body contour and the standard hand depth value is greater than a threshold value, the hand is considered to be in the forward-extending state, and the first determining module 6 can then determine the hand position point closest to the equipment. The standard hand depth value is calculated as follows: the numbers of pixel points in the transverse and longitudinal directions of the human body are obtained from the human body contour, thereby determining the position of the body's centre of gravity. When the human body stands naturally, the hand and the centre of gravity lie essentially in the same plane, at essentially equal distances from the equipment, so the depth value at the centre of gravity is used as the standard hand depth value.
Further, for the hand movable area: the second acquisition module 7 first calculates the height value of the human body contour, and the lowest height value for a raised hand position point is determined as a proportion of that height. If the hand position point is above the lowest height value, the hand is considered raised; if it is above the vertex of the head, the hand is considered raised too high. This fixes the longitudinal extent of the hand movable area. The transverse extent is then determined from how far the hand position point of the human body contour can move left and right. Finally, the complete hand movable area is obtained from the longitudinal and transverse extents.
Further, in the second determining module 8, the human body contour is first mirrored to ensure that left and right of the human body contour on the screen of the equipment correspond to left and right of the human body in reality. At this point a hand position point has been determined whose coordinate system is based on the hand movable area (W1 × H1), expressed in coordinate form as (x1, y1).
Further, to obtain the screen pointer point corresponding to the hand position point, the screen resolution (W2 × H2) of the equipment is needed; the coordinate point (x2, y2) of the corresponding screen pointer point is determined from the proportional relation between the screen resolution and the size of the hand movable area, so that the screen pointer point moves in real time with the hand position point. The mapping relationship is as follows:
x2 = (W2 / W1) * x1;  y2 = (H2 / H1) * y1.
Furthermore, the recognition module 9 recognizes the gesture state at the screen pointer point according to the gesture actions pre-stored in the color camera, so as to determine the gesture instruction issued by the corresponding hand position point:
when the gesture state is recognized as changing from the fist-making state to the open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
and when the gesture state is recognized as moving in the fist-making state and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction. Simple and practical gesture interaction is thus realized: users can control the equipment from a distance, the method places low demands on the resolution of the equipment, the response speed is high, and the gestures are simple and easy to learn.
In a preferred embodiment, as shown in FIG. 5, the first determining module 6 comprises:
a judging unit 60, which calculates a standard hand depth value for the standing human body and selects a current hand depth value of the moving hand from the human body contour, to judge whether the difference between the current hand depth value and the standard hand depth value is greater than a threshold value; if the difference is less than or equal to the threshold value, the current hand depth value is reselected from the human body contour; if the difference is greater than the threshold value, the hand in the human body contour is determined to be in a forward-extending state, the position point closest to the equipment in the human body contour is determined, and the hand position point is thereby determined.
In a preferred embodiment, as shown in FIG. 6, the second acquisition module 7 comprises:
a first determining unit 70, which determines, by calculating the height value of the human body contour, the lowest height value and the highest height value of the raised hand position point and the maximum distance the hand position point can move left and right;
and a second determining unit 71, connected to the first determining unit 70, which determines the hand movable area according to the lowest height value, the highest height value and the maximum distance the hand position point can move left and right.
The technical scheme of the invention has the following beneficial effects: the distance from the human body to the equipment, the human body contour and the hand position point are captured by the depth camera; the hand position point is synchronized to the corresponding screen pointer point of the equipment, so that movement of the hand position point controls movement of the screen pointer point; the gesture state at the screen pointer point is recognized by the color camera; when the gesture state is recognized as changing from the fist-making state to the open state, the screen pointer point executes a click operation according to the corresponding gesture instruction; and when the gesture state is recognized as moving in the fist-making state and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction. Simple and practical gesture interaction is thus realized: users can control the equipment from a distance, the method places low demands on the resolution of the equipment, the response speed is high, and the gestures are simple and easy to learn.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (8)

1. A camera-based gesture interaction method, wherein the camera is arranged on equipment comprising a depth camera and a color camera, the gesture interaction method being characterized by comprising the following steps:
step S1, acquiring depth data within the shooting range of the equipment through the depth camera;
step S2, intercepting the depth data in a rectangular range by taking the central position of the equipment as a reference, and calculating a depth mean value of the depth data in the rectangular range according to the depth data;
step S3, judging, by means of the depth mean value, whether a human body exists within the shooting range of the equipment,
if so, taking the depth mean value as the distance from the human body to the equipment, and turning to step S4;
if not, returning to the step S2;
s4, acquiring a pre-depth value range of the human body standing position according to the depth mean value;
step S5, judging, according to the pre-depth value range, whether each depth value falls below or above that range,
if so, removing a first pixel point corresponding to the depth value;
if not, reserving a second pixel point corresponding to the depth value to obtain a human body contour;
step S6, determining a hand position point of the human body according to the position point closest to the equipment in the human body contour;
step S7, acquiring a preset hand movable area according to the hand position point;
step S8, computing the ratio between the size of the hand movable area and the screen resolution of the equipment, so as to map the hand position point to a screen pointer point on the equipment;
step S9, recognizing the gesture state at the screen pointer point according to the gesture actions pre-stored in the color camera, so as to determine the gesture instruction issued by the corresponding hand position point,
wherein, when the gesture state is recognized as changing from the fist-making state to the open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
and when the gesture state is recognized as moving in the fist-making state and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction.
2. The camera-based gesture interaction method according to claim 1, wherein, in step S6, a standard hand depth value for the standing human body is calculated and a current hand depth value of the moving hand is selected from the human body contour, to judge whether the difference between the current hand depth value and the standard hand depth value is greater than a threshold value,
if not, reselecting the current hand depth value in the human body contour;
if so, determining that the hand in the human body contour is in a forward extending state, and determining a position point closest to the equipment in the human body contour so as to determine the hand position point.
3. The camera-based gesture interaction method according to claim 1, wherein the step S7 includes:
step S70, determining a lowest height value and a highest height value of the lifted hand position point and a maximum distance of the hand position point moving left and right by calculating the height value of the human body outline;
and step S71, determining the hand movable area according to the lowest height value, the highest height value and the maximum distance the hand position point can move left and right.
4. The camera-based gesture interaction method according to claim 1, wherein, in step S8, the human body contour is mirrored to ensure that left and right of the human body contour on the screen of the equipment correspond to left and right of the human body in reality.
5. The camera-based gesture interaction method according to claim 1, wherein, in step S9, the color camera includes an image processing library, and the pre-stored gesture actions are extracted from the image processing library.
6. A camera-based gesture interaction system, wherein the camera is arranged on equipment comprising a depth camera and a color camera, adopting the gesture interaction method according to any one of claims 1 to 5, the gesture interaction system comprising:
the acquisition module acquires depth data within the shooting range of the equipment through the depth camera;
the intercepting module is connected with the acquisition module, intercepts the depth data in a rectangular range by taking the central position of the equipment as a reference, and calculates a depth mean value of the depth data in the rectangular range according to the depth data;
the first judging module is connected with the intercepting module, judges whether a human body exists in the shooting range of the equipment or not through the depth mean value, and judges that the depth mean value is the distance from the human body to the equipment if the human body exists;
the first acquisition module is connected with the first judgment module and acquires a pre-depth value range of the human body standing position according to the depth mean value;
the second judging module is connected with the first acquiring module and judges, according to the pre-depth value range, whether each depth value falls below or above that range; if so, it removes the first pixel point corresponding to the depth value; otherwise, it reserves a second pixel point corresponding to the depth value to obtain a human body contour;
the first determining module is connected with the second judging module and determines a hand position point of the human body according to the position point which is closest to the equipment in the human body contour;
the second acquisition module is connected with the first determination module and acquires a preset hand movable area according to the hand position point;
the second determining module is connected with the second acquisition module and computes the ratio between the size of the hand movable area and the screen resolution of the equipment, so as to map the hand position point to a screen pointer point on the equipment;
the recognition module is connected with the second determining module and recognizes the gesture state at the screen pointer point according to the gesture actions pre-stored in the color camera, so as to determine the gesture instruction issued by the corresponding hand position point,
wherein, when the gesture state is recognized as changing from the fist-making state to the open state, the screen pointer point executes a click operation according to the corresponding gesture instruction;
and when the gesture state is recognized as moving in the fist-making state and then changing to the open state, the screen pointer point executes a drag operation according to the corresponding gesture instruction.
7. The camera-based gesture interaction system according to claim 6, wherein the first determining module comprises:
a judging unit, which calculates a standard hand depth value for the standing human body and selects a current hand depth value of the moving hand from the human body contour, to judge whether the difference between the current hand depth value and the standard hand depth value is greater than a threshold value; if the difference is less than or equal to the threshold value, the current hand depth value is reselected from the human body contour; if the difference is greater than the threshold value, the hand in the human body contour is determined to be in a forward-extending state, the position point closest to the equipment in the human body contour is determined, and the hand position point is thereby determined.
8. The camera-based gesture interaction system according to claim 6, wherein the second acquisition module comprises:
a first determining unit, which determines, by calculating the height value of the human body contour, the lowest height value and the highest height value of the raised hand position point and the maximum distance the hand position point can move left and right;
and a second determining unit, connected with the first determining unit, which determines the hand movable area according to the lowest height value, the highest height value and the maximum distance the hand position point can move left and right.
CN201911417209.7A (filed 2019-12-31, priority 2019-12-31) Gesture interaction method and gesture interaction system based on camera. Active. Granted as CN111158489B.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911417209.7A 2019-12-31 2019-12-31 Gesture interaction method and gesture interaction system based on camera (granted as CN111158489B)

Publications (2)

Publication Number Publication Date
CN111158489A (this publication) 2020-05-15
CN111158489B 2023-08-08

Family

Family ID: 70560293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911417209.7A Gesture interaction method and gesture interaction system based on camera 2019-12-31 2019-12-31 (Active; granted as CN111158489B)

Country Status (1)

Country: CN
Publication: CN111158489B

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114138121A (en) * 2022-02-07 2022-03-04 北京深光科技有限公司 User gesture recognition method, device and system, storage medium and computing equipment

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110219340A1 (en) * 2010-03-03 2011-09-08 Pathangay Vinod System and method for point, select and transfer hand gesture based user interface
US20130278504A1 (en) * 2011-11-01 2013-10-24 Xiaofeng Tong Dynamic gesture based short-range human-machine interaction
US20140064602A1 (en) * 2012-09-05 2014-03-06 Industrial Technology Research Institute Method and apparatus for object positioning by using depth images
CN102982557A (en) * 2012-11-06 2013-03-20 桂林电子科技大学 Method for processing space hand signal gesture command based on depth camera
CN104463146A (en) * 2014-12-30 2015-03-25 华南师范大学 Posture identification method and device based on near-infrared TOF camera depth information
EP3059662A1 (en) * 2015-02-23 2016-08-24 Samsung Electronics Polska Sp. z o.o. A method for interacting with volumetric images by gestures and a system for interacting with volumetric images by gestures
US20190155313A1 (en) * 2016-08-05 2019-05-23 SZ DJI Technology Co., Ltd. Methods and associated systems for communicating with/controlling moveable devices by gestures
CN106250867A (en) * 2016-08-12 2016-12-21 南京华捷艾米软件科技有限公司 A kind of skeleton based on depth data follows the tracks of the implementation method of system
CN108256421A (en) * 2017-12-05 2018-07-06 盈盛资讯科技有限公司 A kind of dynamic gesture sequence real-time identification method, system and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lin Haibo et al., "Gesture segmentation and localization based on improved depth information" (基于改进深度信息的手势分割与定位), Journal of Computer Applications (《计算机应用》), 10 January 2017 *


Also Published As

Publication number Publication date
CN111158489B 2023-08-08

Similar Documents

Publication Publication Date Title
JP6723061B2 (en) Information processing apparatus, information processing apparatus control method, and program
JP6560480B2 (en) Image processing system, image processing method, and program
KR101650799B1 (en) Method for the real-time-capable, computer-assisted analysis of an image sequence containing a variable pose
US9135502B2 (en) Method for the real-time-capable, computer-assisted analysis of an image sequence containing a variable pose
CN104380338B (en) Information processor and information processing method
WO2023000119A1 (en) Gesture recognition method and apparatus, system, and vehicle
US8428306B2 (en) Information processor and information processing method for performing process adapted to user motion
JP5077956B2 (en) Information terminal equipment
US20140139429A1 (en) System and method for computer vision based hand gesture identification
JP7026825B2 (en) Image processing methods and devices, electronic devices and storage media
CN109359514B (en) DeskVR-oriented gesture tracking and recognition combined strategy method
CN109643372A (en) A kind of recognition methods, equipment and moveable platform
TW201514830A (en) Interactive operation method of electronic apparatus
JP5699697B2 (en) Robot device, position and orientation detection device, position and orientation detection program, and position and orientation detection method
KR100862349B1 (en) User interface system based on half-mirror using gesture recognition
EP3127586B1 (en) Interactive system, remote controller and operating method thereof
JP6141108B2 (en) Information processing apparatus and method
KR101256046B1 (en) Method and system for body tracking for spatial gesture recognition
CN111158489B (en) Gesture interaction method and gesture interaction system based on camera
JP5416489B2 (en) 3D fingertip position detection method, 3D fingertip position detection device, and program
CN115862124B (en) Line-of-sight estimation method and device, readable storage medium and electronic equipment
CN115862074B (en) Human body pointing determination and screen control method and device and related equipment
JP5464661B2 (en) Information terminal equipment
JP6452658B2 (en) Information processing apparatus, control method thereof, and program
US20200167005A1 (en) Recognition device and recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant