CN111949134A - Human-computer interaction method, device and computer-readable storage medium - Google Patents

Human-computer interaction method, device and computer-readable storage medium

Info

Publication number
CN111949134A
CN111949134A (application CN202010892054.9A)
Authority
CN
China
Prior art keywords
hand
target object
human
computer interaction
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010892054.9A
Other languages
Chinese (zh)
Inventor
党伟珍
董金明
周峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen TCL Digital Technology Co Ltd
Original Assignee
Shenzhen TCL Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen TCL Digital Technology Co Ltd filed Critical Shenzhen TCL Digital Technology Co Ltd
Priority to CN202010892054.9A priority Critical patent/CN111949134A/en
Publication of CN111949134A publication Critical patent/CN111949134A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a human-computer interaction method comprising the following steps: when a target object exists in a preset spatial region, judging whether the target object is a hand; if the target object is a hand, acquiring an operation gesture of the hand; and determining an operation instruction according to the operation gesture and executing the operation instruction. The invention also discloses a human-computer interaction device and a computer-readable storage medium. The method requires only an ordinary binocular camera device and places no special requirement on the choice of the preset spatial region beyond a flat table top, so no additional hardware cost is incurred. A corresponding operation instruction is executed simply by making a given operation gesture inside the preset spatial region, so the learning cost for the user is low and this mode of human-computer interaction is broadly applicable.

Description

Human-computer interaction method, device and computer-readable storage medium
Technical Field
The present invention relates to the field of human-computer interaction, and in particular, to a human-computer interaction method, device, and computer-readable storage medium.
Background
With the rapid development of artificial intelligence, 3D image recognition technology has matured considerably, providing more intelligent ways to implement human-computer interaction.
Existing human-computer interaction modes include mouse-and-keyboard input, gesture input, voice input, and so on. Mouse-and-keyboard input requires dedicated near-field desktop equipment and occupies considerable space, while existing gesture input generally relies on a projection emitted onto a special plane, on which the user performs gesture operations.
Disclosure of Invention
The main object of the invention is to provide a human-computer interaction method, device, and computer-readable storage medium that address the technical problems of high hardware cost and high learning cost in existing human-computer interaction methods.
To achieve the above object, the present invention provides a human-computer interaction method, including the following steps:
when a target object exists in a preset space area, judging whether the target object is a hand or not;
if the target object is the hand, acquiring an operation gesture of the hand;
and determining an operation instruction according to the operation gesture, and executing the operation instruction.
Optionally, when a target object exists in the preset space region, the step of determining whether the target object is a hand includes:
when a target object exists in a preset space area, acquiring an imaging picture of the target object;
acquiring a color numerical value of a target pixel point corresponding to the target object in the imaging picture;
and judging whether the target object is a hand or not according to the color numerical value.
Optionally, when a target object exists in the preset space region, the step of determining whether the target object is a hand includes:
when a target object exists in a preset space region, acquiring a minimum distance value between the target object and a camera shooting surface of the preset space region and a first distance value between a base surface of the preset space region and the camera shooting surface;
and judging whether the target object is a hand or not according to the minimum distance value and the first distance value.
Optionally, after the step of obtaining the color value of the target pixel point corresponding to the target object in the imaging picture, the method includes:
inputting the color value into a preset classification model, and acquiring a classification result confidence value set output by the preset classification model, wherein the classification result confidence value set comprises a plurality of classification result confidence values, and each classification result confidence value reflects the probability that the color value belongs to each classification result;
and judging whether the target object is a hand or not according to the classification result confidence value with the maximum value in the classification result confidence value set.
Optionally, after the step of determining whether the target object is a hand when the target object exists in the preset space region, the method includes:
judging whether the color numerical value belongs to a preset numerical value interval or not;
if the color numerical value does not belong to the preset interval, judging that the target object is not a hand;
if the color numerical value belongs to the preset interval, determining that the target object is a hand, and executing the step of acquiring the operation gesture of the hand if the target object is the hand.
Optionally, if the target object is the hand, the step of acquiring an operation gesture of the hand includes:
if the target object is the hand, acquiring depth information of the hand;
determining a fingertip operation point of the hand according to the depth information, the base surface and the image pickup surface;
creating a three-dimensional coordinate system based on the preset space region, and acquiring the three-dimensional coordinates of the fingertip operation point in the three-dimensional coordinate system;
and determining the operation gesture of the hand according to the three-dimensional coordinates.
Optionally, the step of determining an operation gesture of the hand according to the three-dimensional coordinates includes:
if the three-dimensional coordinates are not changed within a first preset time period, determining that the operation gesture of the hand is a static gesture;
and if the three-dimensional coordinates change within a second preset time period, determining that the operation gesture of the hand is a dynamic gesture.
Optionally, after the step of determining an operation instruction according to the operation gesture, the method includes:
when a gesture adjusting instruction is received, acquiring a target operation gesture and a target operation instruction corresponding to the gesture adjusting instruction;
and establishing a corresponding relation between the target operation gesture and the target operation instruction, so that when the acquired operation gesture of the hand is the target operation gesture, the operation instruction is determined to be the target operation instruction.
In addition, to achieve the above object, the present invention further provides a human-computer interaction device, including: a memory, a processor, and a human-computer interaction program stored on the memory and executable on the processor, wherein the human-computer interaction program, when executed by the processor, implements the steps of the human-computer interaction method described above.
In addition, to achieve the above object, the present invention further provides a computer readable storage medium, on which a human-computer interaction program is stored, and the human-computer interaction program, when executed by a processor, implements the steps of the human-computer interaction method as described above.
The embodiment of the invention provides a human-computer interaction method, device, and computer-readable storage medium. In the embodiment, whether a target object present in the preset spatial region is a hand is judged; if so, the operation gesture made by the hand in the preset spatial region is further acquired, the operation instruction corresponding to that gesture is determined, and the operation instruction is executed. Implementing this human-computer interaction method requires only an ordinary binocular camera device; there is no special requirement on the choice of the preset spatial region beyond a flat table top, no additional hardware cost is incurred, and the user only needs to make a given operation gesture inside the preset spatial region for the corresponding operation instruction to be executed.
Drawings
Fig. 1 is a schematic hardware structure diagram of an implementation manner of a human-computer interaction device according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a human-computer interaction method according to a first embodiment of the present invention;
FIG. 3 is a diagram of a preferred application scenario in a first embodiment of the human-computer interaction method of the present invention;
FIG. 4 is a flowchart illustrating a human-computer interaction method according to a second embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module", "component", and "unit" may be used interchangeably.
The human-computer interaction terminal (also called a terminal, device, or terminal device) in the embodiment of the invention may be a display device capable of information processing, such as a personal computer or smartphone; a camera device that has at least a depth camera and a color camera; or a data storage device such as a memory or a server.
As shown in fig. 1, the terminal may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 enables connection and communication among these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the terminal may further include a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like. The sensors may include light sensors, motion sensors, and others. Specifically, the light sensors may include an ambient light sensor, which adjusts the brightness of the display screen according to the ambient light, and a proximity sensor, which turns off the display screen and/or backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and detect the magnitude and direction of gravity when the mobile terminal is stationary; it can be used for applications that recognize the posture of the mobile terminal (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer and tapping). Of course, the mobile terminal may also be configured with other sensors such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, which are not described here again.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting: the terminal may include more or fewer components than those shown, some components may be combined, or the components may be arranged differently.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a human-computer interaction program.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to invoke a human-computer interaction program stored in the memory 1005, which when executed by the processor implements operations in the human-computer interaction method provided by the embodiments described below.
Based on the hardware structure of the human-computer interaction equipment, the embodiment of the human-computer interaction method is provided.
Referring to fig. 2, in a first embodiment of the human-computer interaction method of the present invention, the human-computer interaction method includes:
step S10, when a target object exists in the preset spatial region, determine whether the target object is a hand.
The human-computer interaction method in this embodiment is applied to a human-computer interaction terminal, which includes a display device capable of information processing (such as a personal computer or smartphone), a camera device that has at least a depth camera and a color camera, and a data storage device such as a memory or a server.
It can be seen that when no object exists in the preset spatial region, the depth information acquired by the depth camera contains only the distance between the depth camera and the base surface, which is a fixed value. When an object exists in the preset spatial region, the depth information contains not only the distance between the depth camera and the base surface but also the distance between the depth camera and the object; that is, the depth information contains at least two depth values. By examining the number of distinct depth values in the depth information, it can therefore be determined whether an object (the target object in this embodiment) exists in the preset spatial region.
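As a minimal sketch of this presence check (not the patent's implementation), assume the depth camera returns a per-pixel distance map in millimetres; the function name detect_target_object and the noise tolerance are illustrative assumptions.

```python
import numpy as np

def detect_target_object(depth_frame, base_distance_mm, tolerance_mm=5.0):
    """Return True if something other than the base surface is visible.

    depth_frame      -- 2D numpy array of per-pixel distances (mm)
    base_distance_mm -- fixed camera-to-base-surface distance (mm)
    tolerance_mm     -- assumed measurement noise around the base distance
    """
    # Pixels whose depth differs noticeably from the base surface
    # belong to an object placed inside the preset spatial region.
    deviating = np.abs(depth_frame - base_distance_mm) > tolerance_mm
    return bool(np.any(deviating))

# Usage sketch: an empty region yields only the base-surface distance,
# so no pixel deviates and the function returns False.
empty = np.full((480, 640), 800.0)             # base surface 800 mm away
with_hand = empty.copy()
with_hand[200:260, 300:380] = 770.0            # a ~30 mm thick object
print(detect_target_object(empty, 800.0))      # False
print(detect_target_object(with_hand, 800.0))  # True
```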
A preferred application scenario of this embodiment is shown in fig. 3. Part A in fig. 3 is a display device, which can also perform information processing and is the execution subject of the operation instructions in this embodiment. Part B is the preset camera device in this embodiment, which contains at least two kinds of cameras: a depth camera and a color camera. The depth camera is used to obtain a depth image of a certain space (the preset spatial region in this embodiment) and to determine the position of an object within it. The specific principle is as follows: the depth camera comprises an infrared emitter and an infrared CMOS (Complementary Metal Oxide Semiconductor) camera and uses a light-coding technique. The infrared emitter emits a light code carrying three-dimensional depth information; the light source of this coding is laser speckle (random diffraction spots produced when a laser strikes a rough object or passes through ground glass), which is highly random in three-dimensional space. After the light source has been calibrated, the speckle pattern in the space is recorded, so that when an object is placed in the preset spatial region, the speckle pattern on its surface can be read, its position can be determined, and a depth image of the object in the preset spatial region can be obtained.
Part C in fig. 3 is a flat table top on which the preset spatial region is based. A user can make an operation gesture with a hand inside the preset spatial region; after the depth camera obtains the position and operation gesture of the user's hand, the operation instruction corresponding to the gesture is determined, and the human-computer interaction program sends the operation instruction to part A, where it is executed. Part D in fig. 3 is the hand in this embodiment. When the depth camera detects that an object has entered the preset spatial region, the color camera acquires the color information of the object and the depth camera acquires the depth information of each point on the object, from which the spatial structure of the object is obtained. It is known that the skin colors of normal hands within the same ethnic group do not differ greatly, so the color value of hand skin should fall within a certain range.
Step S20, if the target object is the hand, acquiring an operation gesture of the hand.
In this embodiment, the operation gesture of the hand refers to a series of actions performed by the hand in the preset spatial region. The actions need to be standardized and follow certain motion rules; for example, the actions should not run together, and a certain time interval is required between two adjacent actions so that the human-computer interaction program can identify the specific operation gesture corresponding to them. It can be understood that these actions may be captured with existing skeletal tracking techniques, which are not described in detail in this embodiment.
And step S30, determining an operation instruction according to the operation gesture, and executing the operation instruction.
In this embodiment, the correspondence between operation gestures and operation instructions is preset, so that when the human-computer interaction program determines the operation gesture of the hand it simultaneously determines the operation instruction. Because different users have different operating habits, and to make the method convenient for different users, the correspondence in this embodiment may map several operation gestures to one operation instruction; mapping one operation gesture to several operation instructions, however, must be avoided. As an example of the correspondence: when the hand leaves the base surface (the surface in the preset spatial region on which the hand can rest) once within 0.1 second, the human-computer interaction program judges the operation gesture to be a click operation and the corresponding operation instruction is a click; when the hand leaves the base surface twice within 0.1 second, the program judges the operation gesture to be a double-click operation and the corresponding operation instruction is a double-click.
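A minimal sketch of such a gesture-to-instruction mapping is given below; the dictionary contents, the window length, and the helper classify_lift_gesture are illustrative assumptions consistent with the 0.1-second example above, not the patent's actual implementation.

```python
# Hypothetical mapping: several gestures may share one instruction,
# but one gesture never maps to several instructions.
GESTURE_TO_INSTRUCTION = {
    "single_click": "click",
    "double_click": "double_click",
    "move": "move_cursor",
}

def classify_lift_gesture(lift_timestamps, window_s=0.1):
    """Classify a gesture from the times the hand left the base surface.

    One departure inside the window -> single click; two -> double click.
    """
    if not lift_timestamps:
        return None
    lifts_in_window = [t for t in lift_timestamps
                       if t - lift_timestamps[0] <= window_s]
    return "double_click" if len(lifts_in_window) >= 2 else "single_click"

gesture = classify_lift_gesture([0.00, 0.06])   # two departures in 0.1 s
print(GESTURE_TO_INSTRUCTION[gesture])          # double_click
```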
Specifically, step S10 is refined into the following steps:
step a1, when a target object exists in a preset space area, acquiring an imaging picture of the target object.
Step a2, obtaining a color value of a target pixel point corresponding to the target object in the imaging picture.
Step a3, judging whether the target object is a hand according to the color value.
When the human-computer interaction program determines that a target object exists in the preset spatial region, it further acquires an imaging picture of the target object and obtains the pixel points corresponding to the target object in that picture, i.e., the target pixel points in this embodiment. The program then acquires the color information of the target pixel points: an RGB (color standard) image of the target pixel points is captured by the color camera and converted to HSV. For example, RGB values of 255, 255, and 255 (the brightest white) convert to HSV values of 0°, 0, and 1, corresponding to bright white. The HSV range corresponding to the skin color of Asian hands is roughly 0° ≤ H ≤ 20°, S ≥ 48, and V ≥ 50, which is taken as the preset value range in this embodiment. When the human-computer interaction program obtains the color value of a target pixel point (the HSV value after conversion from RGB), the color value is considered to lie in the preset value range only if each of its components lies in the corresponding range; in that case it can be preliminarily determined that the target object corresponding to the target pixel point is a hand.
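The RGB-to-HSV check can be sketched with OpenCV as follows. Note that OpenCV stores 8-bit hue as degrees divided by two (0-179) and S, V on a 0-255 scale, so the cited bound H ≤ 20° becomes H ≤ 10 on OpenCV's scale; interpreting the S and V thresholds directly on the 0-255 scale is an assumption.

```python
import cv2
import numpy as np

def is_skin_pixel(rgb_pixel):
    """Check whether an RGB pixel falls in the hand skin-tone HSV range.

    Uses the range cited above (0 deg <= H <= 20 deg, S >= 48, V >= 50),
    rewritten on OpenCV's 8-bit scales (H in 0-179, S and V in 0-255).
    """
    pixel = np.uint8([[rgb_pixel]])                     # 1x1 RGB image
    h, s, v = cv2.cvtColor(pixel, cv2.COLOR_RGB2HSV)[0, 0]
    return h <= 10 and s >= 48 and v >= 50

print(is_skin_pixel((200, 140, 120)))  # a reddish skin tone -> True
print(is_skin_pixel((255, 255, 255)))  # bright white (S = 0) -> False
```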
Alternatively, step S10 may also be refined into the following steps:
step b1, when a target object exists in a preset space region, acquiring a minimum distance value between the target object and a camera shooting surface of the preset space region, and a first distance value between a base surface of the preset space region and the camera shooting surface.
Step b2, determining whether the target object is a hand according to the minimum distance value and the first distance value.
It should be noted that the camera surface in this embodiment refers to the top surface of the preset spatial region, opposite the base surface described above. If the preset spatial region is a cube, the first distance value in this embodiment is the height of the cube, and the minimum distance value between the target object and the camera surface is the shortest distance between the target object inside the cube and the top surface of the cube. Subtracting this minimum distance value from the first distance value essentially gives the maximum thickness of the target object; if that maximum thickness falls within a value range, the human-computer interaction program judges the target object to be a hand. The value range spans the thickness of a hand in its normal posture up to the thickness of a rolled-up hand.
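A sketch of this thickness test follows; the acceptable thickness range is an assumed placeholder, since the embodiment only states that the range spans a flat hand through a rolled-up hand without giving numbers.

```python
def is_hand_by_thickness(first_distance_mm, min_distance_mm,
                         thickness_range_mm=(8.0, 120.0)):
    """Decide whether the object could be a hand from its maximum thickness.

    first_distance_mm -- camera surface to base surface (region height)
    min_distance_mm   -- shortest camera-to-object distance
    The difference is the object's maximum thickness; the allowed range
    (flat hand up to a rolled-up hand) is an assumed placeholder.
    """
    max_thickness = first_distance_mm - min_distance_mm
    low, high = thickness_range_mm
    return low <= max_thickness <= high

print(is_hand_by_thickness(800.0, 770.0))  #  30 mm thick -> True
print(is_hand_by_thickness(800.0, 500.0))  # 300 mm thick -> False
```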
Specifically, the steps following step a2 include:
step c1, inputting the color value into a preset classification model, and obtaining a classification result confidence value set output by the preset classification model, wherein the classification result confidence value set comprises a plurality of classification result confidence values, and each classification result confidence value reflects the probability that the color value belongs to each classification result.
And c2, judging whether the target object is a hand or not according to the classification result confidence value with the maximum value in the classification result confidence value set.
It can be seen that the skin color of a single hand may vary to some extent; for example, for people with scars on the hand or for vitiligo patients, randomly selecting the color value of a single point on the hand for the range check may yield an inaccurate judgment. If the randomly selected point happens to lie on a vitiligo patch, the obtained color value will certainly fall outside the preset value range and the object will be judged not to be a hand. To solve this problem, this embodiment introduces a preset classification model. The color camera randomly obtains the color values of multiple points on the target object and inputs them into the preset classification model to obtain the classification result confidence value set output by the model. Each classification result confidence value in the set represents the probability that the color value belongs to a particular color (classification result); after the color values corresponding to the target object are input into the preset classification model, the model outputs a set of confidence values, each corresponding to a particular color classification result. The color classification result corresponding to the largest confidence value in the set is taken as the color of the target object, and whether the target object is a hand is then determined from that color. For this to work, enough color values should be fed into the preset classification model; that is, the color camera should randomly sample color values at many points covering the whole surface of the target object, so that the classification result is more accurate.
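The following sketch illustrates the idea with a toy stand-in for the preset classification model (here a nearest-neighbour classifier from scikit-learn trained on a handful of made-up HSV samples); averaging the confidence values over the sampled points is one assumed way of combining multiple samples, which the text itself does not specify.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data standing in for the preset classification model:
# HSV colour values labelled with the colour class they belong to.
X_train = np.array([[8, 120, 180], [10, 100, 200],   # skin tones
                    [0, 0, 250],   [90, 40, 240]])   # white / pale green
y_train = np.array(["skin", "skin", "white", "green"])
model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

def is_hand_by_classifier(model, sampled_colors):
    """Classify colour samples taken across the whole object surface and
    use the classification result with the largest confidence value."""
    conf = model.predict_proba(np.asarray(sampled_colors))  # one row per sample
    mean_conf = conf.mean(axis=0)          # combine samples (assumed strategy)
    best_class = model.classes_[int(np.argmax(mean_conf))]
    return best_class == "skin"

# A hand with a pale (e.g. vitiligo) patch: most samples still vote "skin".
samples = [[8, 118, 182], [9, 110, 190], [0, 5, 248]]
print(is_hand_by_classifier(model, samples))  # True
```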
Specifically, the steps after the step a2 further include:
and d1, judging whether the color value belongs to a preset value interval.
And d2, if the color value does not belong to the preset interval, determining that the target object is not a hand.
And d3, if the color value belongs to the preset interval, determining that the target object is a hand.
As described in the embodiment above, the color value is judged to lie in the preset value range only if each of its components lies in the corresponding range. If the color value does not lie in the preset value range, the human-computer interaction program judges that the target object is not a hand; if it does, the program can preliminarily determine that the target object corresponding to the target pixel point is a hand and then executes the next step.
Specifically, the steps subsequent to step S30 further include:
step e1, when receiving the gesture adjustment instruction, obtaining a target operation gesture and a target operation instruction corresponding to the gesture adjustment instruction.
Step e2, establishing a corresponding relationship between the target operation gesture and the target operation command, so that when the acquired operation gesture of the hand is the target operation gesture, it is determined that the operation command is the target operation command.
It can be seen that the correspondence between hand operation gestures and operation instructions is preset and modifiable. Considering that each person's operating habits differ, the human-computer interaction method in this embodiment supports manually adjusting the correspondence between operation gestures and operation instructions.
In this embodiment, whether a target object present in the preset spatial region is a hand is judged; if so, the operation gesture made by the hand in the preset spatial region is further obtained, the operation instruction corresponding to the gesture is determined, and the operation instruction is executed. Implementing this human-computer interaction method requires only an ordinary binocular camera device; there is no special requirement on the choice of the preset spatial region beyond a flat table top, no additional hardware cost is incurred, and the user only needs to make a given operation gesture inside the preset spatial region for the corresponding operation instruction to be executed.
Further, referring to fig. 4, a second embodiment of the human-computer interaction method of the present invention is provided on the basis of the above-mentioned embodiment of the present invention.
This embodiment refines step S20 of the first embodiment; the difference between this embodiment and the embodiment described above is as follows:
step S21, if the target object is the hand, acquiring depth information of the hand.
In this embodiment, the depth information of the hand is the distance from each point on the hand to the depth camera, so that the three-dimensional structure of the hand can be acquired.
Step S22, determining a fingertip operation point of the hand according to the depth information, the base surface, and the camera surface.
It can be seen that after the depth camera acquires the depth information of the hand, it is first judged whether the hand is in contact with the base surface. On the premise that the hand touches the base surface, the distance from each point on the hand to the depth camera is further obtained, and within the contact region between the hand and the base surface the point closest to the depth camera is selected as the operable point, so that the spatial relationship between the operable point and the base surface is established from the initial contact; this also conforms to users' general operating habits.
Each point of the hand corresponds to a definite coordinate value in the three-dimensional coordinate system. The three-dimensional structure of the hand is obtained through the depth camera, and the distance from each point on the hand to the depth camera can be determined, so the point on the hand closest to the depth camera can be found; this point is the fingertip operation point in this embodiment, and as shown in fig. 3 and in keeping with users' common operating habits, the operation point is the tip of the index finger. After the operable point is determined, its spatial relationship with the base surface is tracked through the depth camera and the corresponding operation gesture can be determined. For example, when the operable point only moves horizontally on the base surface, the corresponding operation gesture is a move operation; when the operable point leaves the base surface once within a preset time period (for example, 0.1 second), the corresponding operation gesture is a single-click operation; and when the operable point leaves the base surface twice within the preset time period, the corresponding operation gesture is a double-click operation. After the operation gesture is determined, the corresponding operation instruction is executed by the display device in part A of fig. 3.
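A sketch of locating the operation point from a depth frame and a hand mask follows; the contact tolerance and the toy depth values in the usage example are assumptions, and find_fingertip_point simply picks the hand pixel closest to the camera once contact with the base surface is confirmed, as described above.

```python
import numpy as np

def find_fingertip_point(depth_frame, hand_mask, base_distance_mm,
                         contact_tol_mm=5.0):
    """Locate the operation point described above.

    Returns (row, col, depth) of the hand pixel closest to the depth
    camera, but only if some part of the hand touches the base surface.
    hand_mask is a boolean array marking pixels judged to be the hand.
    """
    hand_depths = np.where(hand_mask, depth_frame, np.inf)
    # Contact check: at least one hand pixel lies (almost) on the base.
    touching = np.any(np.abs(hand_depths - base_distance_mm) <= contact_tol_mm)
    if not touching:
        return None
    row, col = np.unravel_index(np.argmin(hand_depths), hand_depths.shape)
    return int(row), int(col), float(depth_frame[row, col])

# Usage sketch (toy values): the hand touches the base surface while one
# point sits closest to the camera and becomes the operation point.
depth = np.full((480, 640), 800.0)
depth[200:260, 300:380] = 795.0    # hand, 5 mm above the base surface
depth[255:260, 340:350] = 798.0    # part of the hand touching the base
depth[250:255, 340:350] = 790.0    # point closest to the depth camera
mask = depth < 799.0
print(find_fingertip_point(depth, mask, 800.0))  # -> (250, 340, 790.0)
```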
Step S23, creating a three-dimensional coordinate system based on the preset spatial region, and acquiring a three-dimensional coordinate of the fingertip operation point in the three-dimensional coordinate system.
As noted in the preferred application scenario of the first embodiment, there is no strict requirement on the choice of the preset spatial region: only a flat desktop needs to be selected, and that flat desktop is the base surface in this embodiment. The size of the base surface is not strictly limited; owing to technical limitations it should not be too large, but it can still accommodate the user's operating habits. A three-dimensional coordinate system is established on the base surface, i.e., the base surface is taken as the X-Y plane and the line perpendicular to it as the Z axis.
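A minimal sketch of mapping a depth-image pixel into this base-surface coordinate system is shown below; the mm_per_pixel calibration factor is an assumed stand-in for the camera calibration that a real system would need.

```python
def to_base_coordinates(row, col, depth_mm, base_distance_mm, mm_per_pixel):
    """Map a depth-image pixel to the base-surface coordinate system.

    The base surface is the X-Y plane and the Z axis points up towards
    the camera; mm_per_pixel is an assumed calibration factor converting
    pixel offsets to millimetres on the base surface.
    """
    x = col * mm_per_pixel
    y = row * mm_per_pixel
    z = base_distance_mm - depth_mm   # height above the base surface
    return (x, y, z)

# A fingertip 3 mm above the table, seen at pixel (240, 320):
print(to_base_coordinates(240, 320, 797.0, 800.0, mm_per_pixel=0.5))
# -> (160.0, 120.0, 3.0)
```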
And step S24, determining the operation gesture of the hand according to the three-dimensional coordinates.
Specifically, step S24 is refined into the following steps:
and e1, if the three-dimensional coordinates are not changed within a first preset time period, determining that the operation gesture of the hand is a static gesture.
And e2, if the three-dimensional coordinates change within a second preset time period, determining that the operation gesture of the hand is a dynamic gesture.
Consider the common habits of mouse users. When a user wants to perform a click, the finger is lifted slightly and then pressed down; the corresponding action of the hand on the base surface is to leave the base surface once within a first preset time period and then return to it. The first preset time period can be set in advance according to the data processing speed of the human-computer interaction device and may be 0.1 second. When a user wants to perform a double click, the finger is lifted slightly, pressed down, lifted slightly again, and pressed down again; the corresponding action on the base surface is to leave the base surface twice within a second preset time period and then return to it. The second preset time period is likewise set according to the data processing speed of the device, is slightly longer than the first preset time period, and may be 0.2 second. When a user wants to perform a move operation, the hand can move in any direction as long as it stays within the preset range; the corresponding action on the base surface is to move in any direction without exceeding the preset range. Different operation gestures correspond to different operation instructions, and the correspondence between operation gestures and operation instructions can be modified according to the user's habits. Therefore, when the three-dimensional coordinates of the hand in the three-dimensional coordinate system do not change within the first preset time period, the hand has not moved and the operation gesture of the hand is a static gesture; when the three-dimensional coordinates of the hand change within the second preset time period, the hand has moved and the operation gesture of the hand is a dynamic gesture.
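The static/dynamic decision can be sketched as follows; the 1 mm no-change tolerance and the sample format are assumptions, while the 0.1 s and 0.2 s windows follow the first and second preset time periods described above.

```python
def classify_gesture(coordinate_history, first_period_s=0.1,
                     second_period_s=0.2, tol_mm=1.0):
    """Classify the hand gesture from timestamped fingertip coordinates.

    coordinate_history -- list of (timestamp_s, (x, y, z)) samples.
    No movement within the first preset period -> static gesture.
    Movement within the second preset period   -> dynamic gesture.
    The 1 mm tolerance for "no change" is an assumed noise margin.
    """
    if not coordinate_history:
        return None
    t0, p0 = coordinate_history[0]

    def moved(window_s):
        return any(max(abs(a - b) for a, b in zip(p, p0)) > tol_mm
                   for t, p in coordinate_history if t - t0 <= window_s)

    if not moved(first_period_s):
        return "static"
    if moved(second_period_s):
        return "dynamic"
    return None

samples = [(0.00, (160.0, 120.0, 0.0)),
           (0.05, (160.0, 120.0, 3.0)),   # fingertip lifts off the base
           (0.15, (160.0, 120.0, 0.0))]
print(classify_gesture(samples))  # dynamic
```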
In this embodiment, the three-dimensional coordinate system is established on the base surface, the three-dimensional information of the hand is acquired, different operation gestures are determined, the operation instruction corresponding to each operation gesture is further determined, and the display device executes the operation instruction.
In addition, an embodiment of the present invention further provides a human-computer interaction device, where the human-computer interaction device includes:
a judging module, configured to judge whether the target object is a hand when a target object exists in the preset spatial region;
an operation gesture obtaining module, configured to obtain an operation gesture of the hand if the target object is the hand;
and the operation execution module is used for determining an operation instruction according to the operation gesture and executing the operation instruction.
Optionally, the determining module includes:
the imaging image acquisition unit is used for acquiring an imaging image of a target object when the target object exists in a preset space region;
the color numerical value acquisition unit is used for acquiring the color numerical value of a target pixel point corresponding to the target object in the imaging picture;
and the first judgment unit is used for judging whether the target object is a hand or not according to the color numerical value.
Optionally, the determining module further includes:
the distance value acquisition unit is used for acquiring a minimum distance value between a target object and a camera shooting surface of a preset space region and a first distance value between a base surface of the preset space region and the camera shooting surface when the target object exists in the preset space region;
and the second judging unit is used for judging whether the target object is a hand or not according to the minimum distance value and the first distance value.
Optionally, the color numerical value obtaining unit includes:
the model input unit is used for inputting the color numerical value into a preset classification model and acquiring a classification result confidence value set output by the preset classification model;
and the screening unit is used for judging whether the target object is a hand or not according to the classification result confidence value with the maximum value in the classification result confidence value set.
Optionally, the human-computer interaction device includes:
the judging module is used for judging whether the color numerical value belongs to a preset numerical value interval or not;
the first judgment unit is used for judging that the target object is not a hand if the color numerical value does not belong to the preset interval;
and the second judgment unit is used for judging that the target object is a hand if the color numerical value belongs to the preset interval.
Optionally, the operation gesture obtaining module includes:
a depth information acquisition unit configured to acquire depth information of the hand if the target object is the hand;
a fingertip operation point determination unit configured to determine a fingertip operation point of the hand based on the depth information, the base plane, and the imaging plane;
the three-dimensional coordinate acquisition unit is used for creating a three-dimensional coordinate system based on the preset space region and acquiring the three-dimensional coordinates of the fingertip operation point in the three-dimensional coordinate system;
and the operation gesture determining unit is used for determining the operation gesture of the hand according to the three-dimensional coordinates.
Optionally, the operation gesture determination unit includes:
the static gesture determining unit is used for determining that the operation gesture of the hand is a static gesture if the three-dimensional coordinate is not changed within a first preset time period;
and the dynamic gesture determining unit is used for determining that the operation gesture of the hand is a dynamic gesture if the three-dimensional coordinate changes within a second preset time period.
Optionally, the human-computer interaction device further includes:
and the operation instruction acquisition module is used for acquiring a target operation gesture and a target operation instruction corresponding to the gesture adjustment instruction when the gesture adjustment instruction is received.
And the corresponding relation establishing module is used for establishing the corresponding relation between the target operation gesture and the target operation instruction so as to determine that the operation instruction is the target operation instruction when the acquired operation gesture of the hand is the target operation gesture.
The method executed by each program module can refer to each embodiment of the method of the present invention, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes several instructions for enabling a terminal device (e.g., a mobile phone, a computer, a tablet computer, etc.) to execute the human-computer interaction method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A human-computer interaction method is characterized by comprising the following steps:
when a target object exists in a preset space area, judging whether the target object is a hand or not;
if the target object is the hand, acquiring an operation gesture of the hand;
and determining an operation instruction according to the operation gesture, and executing the operation instruction.
2. The human-computer interaction method of claim 1, wherein when the target object exists in the preset spatial region, the step of determining whether the target object is a hand comprises:
when a target object exists in a preset space area, acquiring an imaging picture of the target object;
acquiring a color numerical value of a target pixel point corresponding to the target object in the imaging picture;
and judging whether the target object is a hand or not according to the color numerical value.
3. The human-computer interaction method of claim 1, wherein when the target object exists in the preset spatial region, the step of determining whether the target object is a hand comprises:
when a target object exists in a preset space region, acquiring a minimum distance value between the target object and a camera shooting surface of the preset space region and a first distance value between a base surface of the preset space region and the camera shooting surface;
and judging whether the target object is a hand or not according to the minimum distance value and the first distance value.
4. The human-computer interaction method according to claim 2, wherein the step of obtaining the color value of the target pixel point corresponding to the target object in the imaged picture comprises:
inputting the color value into a preset classification model, and acquiring a classification result confidence value set output by the preset classification model, wherein the classification result confidence value set comprises a plurality of classification result confidence values, and each classification result confidence value reflects the probability that the color value belongs to each classification result;
and judging whether the target object is a hand or not according to the classification result confidence value with the maximum value in the classification result confidence value set.
5. The human-computer interaction method according to claim 2, wherein the step of obtaining the color value of the target pixel point corresponding to the target object in the imaged picture comprises:
judging whether the color numerical value belongs to a preset numerical value interval or not;
if the color numerical value does not belong to the preset interval, judging that the target object is not a hand;
and if the color numerical value belongs to the preset interval, judging that the target object is a hand.
6. A human-computer interaction method as claimed in any one of claims 1 to 3, wherein if the target object is the hand, the step of acquiring the operation gesture of the hand comprises:
if the target object is the hand, acquiring depth information of the hand;
determining a fingertip operation point of the hand according to the depth information, the base surface and the image pickup surface;
creating a three-dimensional coordinate system based on the preset space region, and acquiring the three-dimensional coordinates of the fingertip operation point in the three-dimensional coordinate system;
and determining the operation gesture of the hand according to the three-dimensional coordinates.
7. A human-computer interaction method as claimed in claim 6, characterized in that the step of determining an operating gesture of a hand from the three-dimensional coordinates comprises:
if the three-dimensional coordinates are not changed within a first preset time period, determining that the operation gesture of the hand is a static gesture;
and if the three-dimensional coordinates change within a second preset time period, determining that the operation gesture of the hand is a dynamic gesture.
8. The human-computer interaction method as claimed in claim 1, wherein the step of determining an operation instruction according to the operation gesture comprises, after:
when a gesture adjusting instruction is received, acquiring a target operation gesture and a target operation instruction corresponding to the gesture adjusting instruction;
and establishing a corresponding relation between the target operation gesture and the target operation instruction, so that when the acquired operation gesture of the hand is the target operation gesture, the operation instruction is determined to be the target operation instruction.
9. A human-computer interaction device, characterized in that the human-computer interaction device comprises: memory, a processor and a human-computer interaction program stored on the memory and executable on the processor, the human-computer interaction program when executed by the processor implementing the steps of the human-computer interaction method as claimed in any one of claims 1 to 8.
10. A computer-readable storage medium, having a human-computer interaction program stored thereon, which, when executed by a processor, implements the steps of the human-computer interaction method of any one of claims 1 to 8.
CN202010892054.9A 2020-08-28 2020-08-28 Human-computer interaction method, device and computer-readable storage medium Pending CN111949134A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010892054.9A CN111949134A (en) 2020-08-28 2020-08-28 Human-computer interaction method, device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010892054.9A CN111949134A (en) 2020-08-28 2020-08-28 Human-computer interaction method, device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN111949134A true CN111949134A (en) 2020-11-17

Family

ID=73368038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010892054.9A Pending CN111949134A (en) 2020-08-28 2020-08-28 Human-computer interaction method, device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111949134A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110057875A1 (en) * 2009-09-04 2011-03-10 Sony Corporation Display control apparatus, display control method, and display control program
WO2017075932A1 (en) * 2015-11-02 2017-05-11 深圳奥比中光科技有限公司 Gesture-based control method and system based on three-dimensional displaying
WO2017113794A1 (en) * 2015-12-31 2017-07-06 北京体基科技有限公司 Gesture recognition method, control method and apparatus, and wrist-type device
US20190258325A1 (en) * 2016-11-01 2019-08-22 The Hong Kong University Of Science And Technology Mid-air finger pointing detection for device interaction
CN107578023A (en) * 2017-09-13 2018-01-12 华中师范大学 Man-machine interaction gesture identification method, apparatus and system
CN110083243A (en) * 2019-04-29 2019-08-02 深圳前海微众银行股份有限公司 Exchange method, device, robot and readable storage medium storing program for executing based on camera
CN111552368A (en) * 2019-05-16 2020-08-18 毛文涛 Vehicle-mounted human-computer interaction method and vehicle-mounted equipment
CN110187771A (en) * 2019-05-31 2019-08-30 努比亚技术有限公司 Gesture interaction method, device, wearable device and computer storage medium high up in the air
CN110750160A (en) * 2019-10-24 2020-02-04 京东方科技集团股份有限公司 Drawing method and device for drawing screen based on gesture, drawing screen and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114138121A (en) * 2022-02-07 2022-03-04 北京深光科技有限公司 User gesture recognition method, device and system, storage medium and computing equipment
CN114138121B (en) * 2022-02-07 2022-04-22 北京深光科技有限公司 User gesture recognition method, device and system, storage medium and computing equipment

Similar Documents

Publication Publication Date Title
CN112509524B (en) Ink screen quick refreshing method, device, equipment and computer readable storage medium
US20230218983A1 (en) Information processing device, control method of information processing device, and program
US10698535B2 (en) Interface control system, interface control apparatus, interface control method, and program
US9160993B1 (en) Using projection for visual recognition
US20140123079A1 (en) Drawing control method, apparatus, and mobile terminal
US20130044912A1 (en) Use of association of an object detected in an image to obtain information to display to a user
EP2879095A1 (en) Method, apparatus and terminal device for image processing
KR20130106833A (en) Use camera to augment input for portable electronic device
WO2021097750A1 (en) Human body posture recognition method and apparatus, storage medium, and electronic device
US10621766B2 (en) Character input method and device using a background image portion as a control region
US20240077948A1 (en) Gesture-based display interface control method and apparatus, device and storage medium
JPWO2015159602A1 (en) Information provision device
EP2702464B1 (en) Laser diode modes
CN112540696A (en) Screen touch control management method, intelligent terminal, device and readable storage medium
US20200142582A1 (en) Disambiguating gesture input types using multiple heatmaps
CN112486394A (en) Information processing method and device, electronic equipment and readable storage medium
CN111949134A (en) Human-computer interaction method, device and computer-readable storage medium
CN111160308A (en) Gesture motion recognition method, device, equipment and readable storage medium
US20180032142A1 (en) Information processing apparatus, control method thereof, and storage medium
US20230236673A1 (en) Non-standard keyboard input system
CN109040427B (en) Split screen processing method and device, storage medium and electronic equipment
CN116301551A (en) Touch identification method, touch identification device, electronic equipment and medium
CN114756162B (en) Touch system and method, electronic device and computer readable storage medium
US20120206584A1 (en) Integrated input interface
CN113703577A (en) Drawing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20201117)