CN113039550A - Gesture recognition method, VR (virtual reality) viewing angle control method and VR system - Google Patents


Info

Publication number
CN113039550A
Authority
CN
China
Prior art keywords
hand
point
determining
gesture recognition
gesture
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201880098482.5A
Other languages
Chinese (zh)
Inventor
郑欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Autel Intelligent Aviation Technology Co Ltd
Original Assignee
Shenzhen Autel Intelligent Aviation Technology Co Ltd
Application filed by Shenzhen Autel Intelligent Aviation Technology Co Ltd filed Critical Shenzhen Autel Intelligent Aviation Technology Co Ltd
Publication of CN113039550A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the invention relate to a gesture recognition method, a VR viewing angle control method, and a VR system. The gesture recognition method comprises the following steps: acquiring depth information (310); acquiring spatial point cloud information according to the depth information (320); determining a target area in the spatial point cloud information, the target area being an area containing hand point cloud information (330); generating a planar image corresponding to the target area (340); extracting edge points of a hand in the planar image (350); determining the number of fingers of the hand from the edge points of the hand (360); and determining the gesture of the hand based on the number of fingers (370). The method detects and recognizes gestures through geometric analysis and needs no sample-learning algorithm such as machine learning, so it effectively reduces the computation required for gesture recognition while maintaining a high recognition rate, meeting the requirements of low-power, low-latency and low-compute platforms.

Description

Gesture recognition method, VR (virtual reality) viewing angle control method and VR system
[ technical field ]
The invention relates to the technical field of virtual reality, and in particular to a gesture recognition method, a VR viewing angle control method, and a VR system.
[ background of the invention ]
Virtual Reality (VR) is a technology that uses dedicated devices to generate a highly realistic simulated environment and immerses the user in that environment through three-dimensional interaction and simulation, providing an excellent user experience.
Virtual reality is typically delivered through VR glasses or similar devices worn on the user's head. Such VR glasses enclose the user's eyes with a covering shroud, so while wearing them it is inconvenient for the user to adjust the VR field of view with conventional controls such as a remote controller.
Some prior-art solutions use a position sensor to detect changes in the position of the user's head and adjust the VR field of view accordingly. However, because head-worn VR glasses are heavy and bulky, adjusting the field of view by turning the head easily fatigues the user and increases dizziness during use.
As technology advances, gesture operation may offer a better control mode that meets these usage requirements. However, gesture detection and recognition usually consume a very large amount of computation, which greatly limits the application of gesture operation; how to effectively reduce the complexity of gesture recognition algorithms is therefore an urgent problem to be solved.
[ summary of the invention ]
In order to solve the above technical problems, embodiments of the present invention provide a gesture recognition method, a VR viewing angle control method, and a VR system that can reduce the computation required for gesture detection.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions: a gesture recognition method. The gesture recognition method comprises the following steps: acquiring depth information; acquiring spatial point cloud information according to the depth information; determining a target area in the spatial point cloud information, wherein the target area refers to an area containing hand point cloud information; generating a planar image corresponding to the target area; extracting edge points of the hand in the plane image; determining the number of fingers in the hand according to the edge points of the hand; and determining the hand gesture according to the number of the fingers.
Optionally, the acquiring the depth information includes: the depth information is acquired by a depth sensor.
Optionally, the determining the target area in the spatial point cloud information includes extracting point cloud information within a preset distance as the target area.
Optionally, the method further comprises: noise in the target region is filtered.
Optionally, the filtering noise in the target region includes filtering noise in the target region by a maximum connected component algorithm.
Optionally, the generating the planar image corresponding to the target region includes: and mapping the point cloud information in the target area to a two-dimensional space to generate the plane image corresponding to the target area.
Optionally, the extracting the edge points of the hand in the planar image includes: extracting the edge points of the hand in the planar image by using the Moore Neighborhood method.
Optionally, the determining the number of fingers in the hand according to the edge points of the hand includes: finding a convex hull (Convex Hull) point according to the edge points of the hand; determining that the convex hull point is the fingertip of a finger; and determining the number of fingers in the hand according to the number of fingertips.
Optionally, the finding out the convex hull point according to the edge points of the hand includes: finding the convex hull points using the Graham scan method.
Optionally, the determining that the convex hull point is the fingertip of a finger includes: selecting, from the edge points of the hand, a first edge point and a second edge point that lie on either side of the convex hull point and adjacent to it, and calculating the included angle between the line connecting the convex hull point and the first edge point and the line connecting the convex hull point and the second edge point, wherein the first edge point, the second edge point and the convex hull point lie on the same finger, and a preset number of edge points are spaced between the first edge point and the convex hull point and between the second edge point and the convex hull point; judging whether the included angle is within a first preset range; and if so, determining that the convex hull point is the fingertip of a finger.
Optionally, the determining whether the included angle is within the first preset range includes: calculating the included angle:
θ = arccos( ((P_l - P_i) · (P_r - P_i)) / (|P_l - P_i| · |P_r - P_i|) )
where θ is the included angle, P_i is the convex hull point, P_l is the first edge point, and P_r is the second edge point; judging whether the included angle is smaller than a preset value; and if so, determining that the convex hull point is the fingertip of a finger.
Optionally, the preset value is between 20 ° and 60 °.
Optionally, the first edge point and the convex hull point, and the second edge point and the convex hull point are separated by 10-50 edge points.
Optionally, the method further comprises: and judging whether a hand exists in the plane image.
Optionally, the determining whether a hand exists in the planar image includes: calculating the distance between each candidate point and each edge point in the candidate points, wherein the candidate points are points in the enclosing range of the edge points; determining a candidate point corresponding to the maximum distance as the palm center of the palm of the hand; calculating an included angle formed by connecting lines of any two adjacent finger tips and the palm center; judging whether the included angle is within a second preset range or not; and if so, determining that the hand exists in the plane image.
Optionally, the determining whether a hand exists in the planar image includes: calculating the distance between each candidate point and each edge point in the candidate points, wherein the candidate points are points in the enclosing range of the edge points; determining a candidate point corresponding to the maximum distance as the palm center of the palm of the hand; calculating the sum of included angles formed by connecting lines of any two adjacent finger tips and the palm center; judging whether the sum of the included angles exceeds 180 degrees; and if so, determining that the hand exists in the plane image.
Optionally, the method further comprises: and judging whether a hand exists in the plane image.
Optionally, the determining whether a hand exists in the planar image includes: calculating the distance between each candidate point and each edge point in the candidate points, wherein the candidate points are points in the enclosing range of the edge points; determining that the maximum distance is the radius of the maximum inscribed circle of the palm of the hand; judging whether the radius is within a third preset range; and if so, determining that the hand exists in the plane image.
Optionally, the method further comprises: and judging whether a hand exists in the plane image.
Optionally, the determining whether the hand exists in the planar image includes: and judging whether the hand exists in the plane image or not according to the number of the fingers.
In order to solve the above technical problems, embodiments of the present invention further provide the following technical solution: a VR viewing angle control method. The VR viewing angle control method comprises the following steps:
determining the gesture of the user's hand by applying the gesture recognition method described above; and adjusting the VR viewing angle according to the fingers of the user's hand.
Optionally, the adjusting the VR viewing angle according to the fingers of the user's hand includes: recognizing the current gesture of the user's hand according to the number of fingers of the user's hand; adjusting the position and orientation of the shooting device according to the gesture of the user's hand; and changing the VR viewing angle to follow the change in the position and orientation of the shooting device.
Optionally, the method further comprises: applying the gesture recognition method described above to determine the palm position of the user's hand; and adjusting the VR viewing angle according to the palm position of the user's hand.
In order to solve the above technical problems, embodiments of the present invention further provide the following technical solutions: a VR system. Wherein the VR system includes: the system comprises a mobile carrier, shooting equipment, a depth sensor, a controller and VR display equipment; the shooting equipment and the depth sensor are both arranged on the mobile carrier;
the VR display equipment is in communication connection with the shooting equipment and is used for generating a corresponding VR scene according to video image information collected by the shooting equipment;
the controller is configured to recognize a gesture of a user according to the depth information acquired by the depth sensor using the gesture recognition method as described above, and adjust a VR viewing angle of the VR display device according to the gesture.
Optionally, the controller is specifically configured to: recognize the gesture of the user's hand according to the number of fingers, and control the posture of the shooting device and the movement of the mobile carrier according to the hand gesture.
Optionally, the controller is further configured to acquire a palm position of the user by using the gesture recognition method as described above, and adjust a VR viewing angle of the VR display device according to the palm position.
Optionally, the controller is specifically configured to determine a relative position between the palm position and the mobile carrier according to the palm position, so as to control the posture of the shooting device and the movement of the mobile carrier.
Optionally, the shooting device is mounted at the front part of the mobile carrier, and the depth sensor is mounted at the rear part of the mobile carrier.
Optionally, the mobile vehicle is a drone.
Optionally, the shooting device is mounted on the mobile carrier through a cradle head.
Compared with the prior art, the gesture recognition method provided by the embodiments of the invention detects and recognizes gestures by generating a spatial point cloud of the user's hand, generating a planar image, and extracting the edge points of the hand. It requires no sample-learning algorithm such as machine learning, has a simple algorithmic structure, effectively reduces the computation required for gesture recognition while maintaining a high recognition rate, and is therefore well suited to platforms that are low-power, low-cost, or latency-sensitive.
[ description of the drawings ]
One or more embodiments are illustrated by way of example in the accompanying drawings, in which like reference numerals refer to similar elements; the drawings are not to scale unless otherwise specified.
FIG. 1 is a schematic diagram of an application environment of a VR system according to an embodiment of the invention;
fig. 2 is a block diagram of a controller according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method of gesture recognition according to an embodiment of the present invention;
fig. 4 is a schematic diagram illustrating fingertip identification of a finger according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating the matching of the number of gestures and the gesture actions according to an embodiment of the present invention;
FIG. 6 is a flowchart of a method for gesture recognition according to another embodiment of the present invention;
FIG. 7 is a flowchart of a method for determining whether a hand is present in a plane image according to an embodiment of the present invention;
fig. 8 is a flowchart of a method for determining whether a hand exists in a plane image according to another embodiment of the present invention.
[ detailed description of the embodiments ]
In order to facilitate an understanding of the invention, the invention is described in more detail below with reference to the accompanying drawings and specific examples. It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may be present. As used in this specification, the terms "upper," "lower," "inner," "outer," "bottom," and the like are used in the orientation or positional relationship indicated in the drawings for convenience in describing the invention and simplicity in description, and do not indicate or imply that the referenced device or element must have a particular orientation, be constructed and operated in a particular orientation, and are not to be considered limiting of the invention. Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Furthermore, the technical features mentioned in the different embodiments of the invention described below can be combined with each other as long as they do not conflict with each other.
Fig. 1 is an application environment of a VR system according to an embodiment of the present invention. As shown in fig. 1, the application environment includes a mobile vehicle, a controller 20 disposed on the mobile vehicle, a VR display device 30, a user 40, and a wireless network 50.
The mobile vehicle may be any type of loading platform driven by power, including but not limited to a quadcopter, a fixed wing aircraft, a helicopter model, and the like. The mobile carrier can have corresponding volume or power according to the requirements of actual conditions, so that the loading capacity, the speed, the endurance mileage and the like which can meet the use requirements are provided. One or more functional modules can be added to the mobile carrier, so that the mobile carrier can realize corresponding functions.
In the present embodiment, the unmanned aerial vehicle 10 is taken as an example. The unmanned aerial vehicle 10 described in this embodiment may include a fuselage, arms connected to the fuselage, and power units mounted on the arms. An arm may be fixedly connected to the fuselage, formed integrally with it, or foldable relative to it. Each power unit comprises a motor and a propeller connected to the motor; rotation of the motor shaft drives the propeller to provide the thrust and lift required for the drone's flight.
For example, the drone 10 may also be provided with at least one camera device 11 for capturing image information. The photographing apparatus 11 may be a high-definition video camera, a motion camera, or other types of image capturing devices.
Specifically, the drone 10 may carry the shooting device 11 through a pan-tilt or similar shake elimination device, the pan-tilt allowing the shooting device 11 to rotate about at least one axis relative to the drone 10.
The drone 10 may also be provided with a depth sensor 12 for gathering depth information. The depth sensor 12 may specifically be a binocular camera, a TOF camera, a structured light camera, a lidar or the like.
In some embodiments, the camera 11 is mounted on the front of the drone 10 and the depth sensor 12 is mounted on the rear of the drone.
The controller 20 is the control core disposed in the mobile carrier 10 and is configured to execute one or more logic and decision steps to control the mobile carrier 10. The controller 20 may include a plurality of functional units, such as a flight control unit for controlling the flight attitude of the drone, a target recognition unit for recognizing targets, a tracking unit for tracking a specific target, a navigation unit (e.g., GPS (Global Positioning System) or BeiDou) for navigating the aircraft, and a data processing unit for processing environmental information acquired by the relevant onboard equipment (e.g., the shooting device 11).
Fig. 2 is a block diagram of a controller 20 according to an embodiment of the present invention. As shown in fig. 2, the controller 20 may include: a processor 21, a memory 22 and a communication module 25. The processor 21, the memory 22 and the communication module 25 establish a communication connection therebetween by means of a bus.
The processor 21 is any type of single-threaded or multi-threaded processor having one or more processing cores as a core for logic processing and operation, for acquiring data, performing logic operation functions, and issuing operation processing results.
The memory 22 is a non-volatile computer-readable storage medium, such as at least one magnetic disk storage device, flash memory device, distributed storage device remotely located from the processor 21, or other non-volatile solid state storage device.
The memory 22 has a program storage area for storing non-volatile software programs, non-volatile computer-executable programs, and modules for calling by the processor 21 to cause the processor 21 to perform one or more method steps. The memory 22 may also have a data storage area for storing the operation processing result issued by the processor 21.
The communication module 25 is a functional module, such as a WiFi module, a bluetooth module, or other radio frequency transmission module, used by the drone 10 to establish a communication connection and provide a physical channel.
The wireless network 50 may be a wireless communication network for establishing a data transmission channel between two nodes, such as a bluetooth network, a WiFi network, or a wireless cellular network located in different signal frequency bands. The drone 10 may join the wireless network 50 through the communication module 25, and may be communicatively connected to the VR display device 30 through the wireless network 50.
The VR display device 30 is a device on the user side that provides a virtual display environment for the user. The VR display device 30 may be of any type and may be a combination of one or more devices that implement VR technology, for example conventional enclosed VR glasses, head-mounted VR devices, and Augmented Reality (AR) devices incorporating VR technology.
The VR display device 30 establishes a communication connection with the shooting device 11 in the unmanned aerial vehicle 10, and can receive the video or image information captured by the shooting device 11, and accordingly generate a corresponding VR display image, which is provided for the user to realize an immersive virtual reality experience.
The user 40 is a user wearing the VR display device 30, who uses it to access services such as flight simulation of the drone 10. The user can control the heading angle of the drone 10 (or the rotation angle of the gimbal) and the position of the drone (e.g., command it to move forward or backward) to change the viewing angle or display interface of the VR display device.
However, because the user 40 is wearing the VR display device, it is difficult to control the drone with a remote controller, and controlling the drone by turning the head or moving the body easily causes fatigue and dizziness.
In some operation scenarios, to overcome the problems of the above viewing-angle adjustment methods, gesture-based control may be used instead. The user 40 wears the VR display device 30, and when the drone 10 hovers near the user 40, the user 40 can reach out a hand and issue control commands through different gestures. The drone 10 then acquires, via the depth sensor oriented toward the user, the corresponding depth images.
The controller 20 mounted on the drone 10 analyzes and detects the user's gesture from the depth images acquired by the depth sensor, derives the corresponding control instruction, and adjusts the operating state of the drone 10 (including moving the drone 10, changing the orientation or focal length of the shooting device 11, or changing the rotation angle of the gimbal) in response to the user's command.
Compared with conventional remote-controller control, changing the drone's state, and hence the VR viewing angle, through gestures is more intuitive and convenient. The user does not need to walk or turn around while operating, which makes the operation more comfortable and the user experience better.
In the application environment shown in fig. 1, the drone 10 is described by way of example only. Those skilled in the art can also replace the unmanned aerial vehicle 10 with any type of mobile vehicle, such as a remote control car, etc., to carry the above-mentioned function modules, provide a data source for the VR display device 30, and realize an immersive experience of virtual reality.
The gesture recognition method provided by the embodiments of the invention can be executed by the processor 21 of the controller 20, reducing the computational load on the controller 20, effectively lowering its hardware cost and power consumption, and meeting the usage constraints of the drone 10. Fig. 3 shows a gesture recognition method according to an embodiment of the present invention. As shown in fig. 3, the gesture recognition method includes:
310. Acquiring depth information.
In some embodiments, the depth sensor 12 shown in fig. 1 may be used to collect the relevant depth information as the basic data for gesture recognition. The depth information is information that reflects the three-dimensional shape of the subject.
Preferably, when the noise level is relatively high, the received depth information may be preprocessed to filter out the noise.
320. Acquiring spatial point cloud information according to the depth information.
Spatial point cloud information is another representation of the subject's three-dimensional information and can be converted from the depth information; for example, the depth information acquired by the depth sensor can be restored into a three-dimensional spatial point cloud.
330. Determining a target area in the spatial point cloud information, the target area being an area containing hand point cloud information.
The target area is an area within a certain depth range that contains the hand point cloud information. As shown in fig. 1, when performing a gesture operation the user 40 normally extends a palm to make the gesture, so the palm falls within a certain distance band, separated from the other parts of the body, without occlusion or other foreign objects. Based on experience and experiment, the point cloud within this specific distance range can therefore be selected as the target area containing the hand point cloud information.
Specifically, the point cloud within a preset distance may be extracted as the target area. As described above, the preset distance is determined by the distance band in which the hand falls, reflecting the fact that the hand is closer to the depth sensor than the other parts of the body.
The preset distance is an empirical parameter that can be set by a person skilled in the art depending on the detection accuracy of the sensor, for example, to about 10 cm.
Selecting the target area through the spatial point cloud in this way is effective and fast, and facilitates the subsequent operations; a sketch of this step is given below.
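As an illustration of this selection step, the following Python sketch keeps the points nearest to the sensor; the 10 cm band and the NumPy array layout (one row per point, depth in the third column) are assumptions for illustration rather than details from the description.

```python
import numpy as np

def extract_target_area(points, band=0.10):
    """Keep the points that fall within the preset distance band nearest to
    the depth sensor, i.e. the region assumed to contain the hand.

    A minimal sketch: 'band' (about 10 cm past the closest point) is an
    assumed reading of the empirical preset distance mentioned above.
    """
    points = np.asarray(points, dtype=float)
    nearest = points[:, 2].min()             # the hand is closest to the sensor
    mask = points[:, 2] <= nearest + band    # keep points within the band
    return points[mask]
```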
In other possible embodiments, noise in the target region may also be filtered by a maximum connected component algorithm to improve the accuracy of gesture recognition.
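Where the maximum-connected-component filter is used, one possible reading is to keep only the largest connected blob and discard isolated noise. The sketch below is hypothetical: it applies SciPy's labelling to a projected binary mask rather than to the raw point cloud, which is an assumption about how the filter would be realized.

```python
import numpy as np
from scipy import ndimage  # assumed available; any labelling routine would do

def keep_largest_component(mask):
    """Keep only the largest connected component of a binary hand mask and
    discard smaller noise blobs; one possible reading of the
    maximum-connected-component filter mentioned above."""
    labels, count = ndimage.label(mask)
    if count == 0:
        return mask
    sizes = ndimage.sum(mask, labels, index=range(1, count + 1))
    largest = int(np.argmax(sizes)) + 1      # label ids start at 1
    return (labels == largest).astype(np.uint8)
```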
340. Generating a planar image corresponding to the target area.
The planar image is the projection of the target area onto a plane and reflects the two-dimensional silhouette of the hand.
In some embodiments, the point cloud in the target area may be mapped into a two-dimensional space to generate the corresponding planar image. Converting the three-dimensional information into a planar image is a simple mapping that can be performed quickly.
After conversion into a planar image, a variety of well-established or conventional image processing algorithms may be applied to extract one or more features from the planar image to accomplish the task of gesture recognition.
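A minimal sketch of this projection step is given below; the grid cell size and the binary-mask representation are assumptions, not values from the description.

```python
import numpy as np

def project_to_plane(points, cell=0.005):
    """Project target-area points onto the x-y plane as a binary silhouette.

    A sketch of the 3-D to 2-D mapping step; 'cell' (metres per pixel) is an
    assumed grid resolution.
    """
    points = np.asarray(points, dtype=float)
    xy = points[:, :2]
    origin = xy.min(axis=0)
    idx = np.floor((xy - origin) / cell).astype(int)
    h, w = idx[:, 1].max() + 1, idx[:, 0].max() + 1
    image = np.zeros((h, w), dtype=np.uint8)
    image[idx[:, 1], idx[:, 0]] = 1          # occupied cells form the hand silhouette
    return image
```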
350. Extracting the edge points of the hand in the planar image.
An edge point of the hand in the planar image is a location where the region attributes change abruptly; edges are typically formed at the boundary between two regions. In other words, the identified edges divide the planar image into several regions with different attributes (e.g., hand and background). Step 350 may be accomplished using any known edge detection or extraction algorithm.
In some embodiments, the Moore Neighborhood tracing method may be employed to extract a series of continuous edge points of the hand in the image.
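The following sketch extracts an ordered chain of boundary points. Rather than hand-coding the Moore Neighborhood tracer, it delegates to OpenCV's contour follower (assumed available, OpenCV 4.x), which likewise returns a series of continuous edge points.

```python
import cv2
import numpy as np

def extract_hand_edge(mask):
    """Return an ordered, continuous sequence of edge points of the hand mask.

    Sketch only: uses OpenCV's contour follower instead of a hand-written
    Moore Neighborhood tracer (OpenCV 4.x returns (contours, hierarchy)).
    """
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return np.empty((0, 2), dtype=int)
    largest = max(contours, key=cv2.contourArea)
    return largest.reshape(-1, 2)            # ordered (x, y) edge points
```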
360. Determining the number of fingers in the hand according to the edge points of the hand.
The shape outlined by the edge of the hand can be used to determine the number of fingers; the count can be computed from the edge features of the planar image.
In some embodiments, when the edge is represented by an edge point set composed of a series of continuous edge points, the number of fingers can be determined by the following calculation method:
First, convex hull points are found among the edge points using any common convex hull detection algorithm, such as the incremental algorithm, the gift-wrapping method (Jarvis march), the monotone chain, divide and conquer, the Akl-Toussaint heuristic, or the Graham scan.
A convex hull is a concept from computational geometry and generally refers to the set of points forming the smallest convex polygon that contains all the target points. The convex hull points reflect the degree of curvature of the edge segments.
Then, it is determined whether a convex hull point is the fingertip of a finger.
As shown in fig. 4, when a hand is present in the planar image, the fingers form part of the outermost periphery of the hand and take a relatively long, sharp shape. The fingertips therefore appear as convex hull points (shown as white squares in fig. 4), and a convex hull point can be identified as a fingertip when the local curvature is high enough that the edge forms a sufficiently sharp tip.
Specifically, whether a convex hull point is the fingertip of a finger can be determined as follows:
For each convex hull point, the (i-n)-th edge point is taken as the first detection point and the (i+n)-th edge point as the second detection point, where the i-th edge point is the convex hull point and n is a positive integer indicating the number of edge points between the convex hull point and each of the two detection points.
The constant n may be chosen according to the size of the palm and the resolution of the depth sensor. In some embodiments, n may be set to 10-50, but the spacing should not exceed the length of the shortest finger.
In this embodiment, since the edge points in the edge point set are all continuous, the n preceding and succeeding edge points can be conveniently determined as the first detection point and the second detection point in the edge point set according to the positions of the convex hull points.
Finally, the included angle between the vector from the convex hull point to the first detection point and the vector from the convex hull point to the second detection point is calculated, where P_i is the convex hull point, P_r is the first detection point, and P_l is the second detection point. When this angle is small enough (within the first preset range), the edge at this location forms a sufficiently sharp tip, the convex hull point P_i can be regarded as the fingertip of a finger, and the corresponding region is counted as one finger.
The first preset range is an empirical value used to measure how long and sharp the image region is. Considering the accuracy of commonly used depth sensors, it may be set to 20°-60° for judging whether the region satisfies the shape requirement of a fingertip.
Specifically, the included angle θ can be calculated according to the following equation (3):
θ = arccos( ((P_l - P_i) · (P_r - P_i)) / (|P_l - P_i| · |P_r - P_i|) )    (3)
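A sketch of the fingertip test of equation (3) is shown below; the spacing n = 30 and the 45° threshold are assumed values chosen within the ranges stated in the text (10-50 edge points, 20°-60°).

```python
import numpy as np

def is_fingertip(edge_points, i, n=30, max_angle_deg=45.0):
    """Apply the angle test of equation (3) to convex-hull point i of a
    closed edge contour.

    Sketch only: n and the angle threshold are assumed values inside the
    ranges given in the text.
    """
    m = len(edge_points)
    p_i = np.asarray(edge_points[i], dtype=float)
    p_l = np.asarray(edge_points[(i - n) % m], dtype=float)  # first detection point
    p_r = np.asarray(edge_points[(i + n) % m], dtype=float)  # second detection point
    v_l, v_r = p_l - p_i, p_r - p_i
    cos_theta = np.dot(v_l, v_r) / (np.linalg.norm(v_l) * np.linalg.norm(v_r))
    theta = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
    return theta < max_angle_deg             # a sharp angle marks a fingertip
```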
Finally, the number of fingers in the hand is determined according to the number of fingertips: each fingertip corresponds to one finger, so the finger count equals the fingertip count.
370. Determining the gesture of the hand according to the number of fingers.
Different gestures correspond to different detected numbers of fingers, for example as shown in the correspondence diagram of fig. 5. Based on the number of fingers, different gestures can therefore be recognized quickly and simply to determine specific control instructions.
In addition, in the case where a plurality of image frames are continuously acquired, the relative movement of the palm may also be determined or identified based on the identified change in position of the palm between different image frames.
The control instruction corresponding to each gesture may be configured by a technician according to the actual situation and stored in the memory as part of the software. Of course, the mapping can also be personalized to the user's habits. A specific configuration example is given below to describe the gesture configuration process in detail.
First, the control instructions that the drone may execute according to gestures are defined as follows:
1. a movement instruction for moving the drone forward, backward, left or right (i.e., adjusting the pitch and roll angles of the drone);
2. a turning instruction for adjusting the heading angle of the drone (i.e., the yaw angle of the drone);
3. a gimbal swing instruction for adjusting the rotation angle of the drone's gimbal up, down, left and right (i.e., the pitch and yaw angles of the gimbal);
4. a gimbal rotation instruction for rotating the drone's gimbal about its optical axis (i.e., the roll angle of the gimbal);
5. a reset instruction for returning the drone and the gimbal to their positions before adjustment, or a pause instruction for pausing the adjustment.
Then, corresponding control instructions are configured for gestures with different numbers of fingers as follows (a code sketch of this mapping is given after the list):
1. when the number of fingers is 5, the gesture is interpreted as a movement instruction, and the pitch and roll angles of the drone are adjusted according to the displacement of the palm up, down, left and right;
2. when the number of fingers is 4, the gesture is interpreted as a turning instruction, and the yaw angle of the drone is adjusted according to the left-right swing of the palm;
3. when the number of recognized fingers is 3, the gesture is interpreted as a gimbal swing instruction, and the pitch and yaw angles of the gimbal are changed according to the displacement of the palm up, down, left and right;
4. when the number of recognized fingers is 2, the gesture is interpreted as a gimbal rotation instruction, and the rotation of the gimbal about its optical axis is controlled according to the left-right swing of the palm, adjusting the roll angle of the gimbal;
5. when the number of recognized fingers is 1 or 0, the gesture is interpreted as a reset or pause instruction, and the gimbal and the drone are returned to their initial positions (i.e., initialized) or their position adjustment is paused.
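The mapping above can be captured in a small lookup table, sketched below; the command names are illustrative placeholders, not identifiers defined in this disclosure.

```python
# Assumed finger-count-to-command mapping from the configuration example above.
GESTURE_COMMANDS = {
    5: "move",           # adjust pitch/roll following palm displacement
    4: "turn",           # adjust yaw following left-right palm swing
    3: "gimbal_swing",   # adjust gimbal pitch and yaw
    2: "gimbal_roll",    # rotate the gimbal about its optical axis
    1: "reset_or_pause",
    0: "reset_or_pause",
}

def command_for(finger_count):
    """Return the configured command, or None for counts treated as errors."""
    return GESTURE_COMMANDS.get(finger_count)
```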
With the above gesture recognition method, the target area containing the hand can be determined quickly and accurately from the depth information, edges and convex hull points are extracted from the planar image of the target area using mature and stable image processing algorithms, and the number of fingers is determined by geometric analysis. The finger determination involves simple steps, avoids data-training processes such as machine learning, consumes little computation, and thus helps meet the low-power requirements of the hardware.
In some embodiments, in order to further improve the accuracy of gesture detection, avoid false recognition and the like, the method may further include the step of determining whether a hand is present in the plane image.
Only when it is judged that a hand is present in the planar image is the gesture analysis result confirmed as correct and the corresponding control command issued to the drone.
Specifically, as shown in fig. 7, the determining whether a hand exists in the plane image may include:
710. and calculating the distance between each candidate point in the candidate points and each edge point.
The candidate point is a point in the plane image, which is located in the range enclosed by the edge points. The distance between the candidate point and the edge point is the shortest distance between the two points.
720. Determining the candidate point corresponding to the maximum distance as the palm center of the hand.
After the distances for all candidate points have been calculated, the candidate point with the largest distance is, by comparison, the center of the largest circle inscribed in the palm.
In the case where a hand is present in the planar image, the area occupied by the palm of the hand is usually the largest. Therefore, the position of the maximum inscribed circle is the position of the palm, and the center position of the maximum inscribed circle can be regarded as the center of the palm.
This contrasts with the commonly used centroid method, which takes the centroid of the region enclosed by the edges in the planar image as the palm center: the centroid method cannot distinguish the arm, its result is easily biased by the arm, and its error is large.
The maximum-inscribed-circle method, by contrast, distinguishes the palm from the arm well (the arm is narrow, so its inscribed circle is small) and is more robust when the arm also appears in the planar image.
Specifically, when the edge is represented by an edge point set composed of a series of continuous edge points, the maximum inscribed circle can be obtained as follows. The two steps above can be expressed by the following equations (1) and (2):
D_i = Min{ distance(p_i, c_j) | c_j ∈ C }    (1)
D_n = Max{ D_i | i ∈ H }    (2)
where C is the set of edge points, c_j is the j-th edge point in the set, p_i is the i-th candidate point, D_i is the minimum of the distances from candidate point p_i to all edge points, H is the set of all candidate points, and D_n is the radius of the maximum inscribed circle.
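A brute-force sketch of equations (1) and (2) follows; the NumPy array shapes (N x 2 candidates, M x 2 edge points) are assumptions for illustration.

```python
import numpy as np

def palm_center(candidates, edge_points):
    """Locate the palm centre as in equations (1) and (2): the candidate
    whose minimum distance to the edge is largest is the centre of the
    maximum inscribed circle, and that distance is its radius D_n.
    """
    candidates = np.asarray(candidates, dtype=float)
    edge_points = np.asarray(edge_points, dtype=float)
    diffs = candidates[:, None, :] - edge_points[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)    # pairwise candidate-to-edge distances
    d_i = dists.min(axis=1)                  # D_i for every candidate
    best = int(np.argmax(d_i))               # candidate maximising D_i
    return candidates[best], float(d_i[best])  # palm centre and radius D_n
```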
730. Calculating the included angle formed by the lines connecting any two adjacent fingertips to the palm center.
The angle between the lines from two adjacent fingertips to the palm center can be regarded as the angle between the two adjacent fingers.
740. Judging whether the included angle is within a second preset range. If yes, go to step 750; if not, go to step 760.
750. Determining that a hand is present in the planar image.
760. Determining that a hand is not present in the planar image.
The second predetermined range is also an empirical value and can be set by a technician based on actual sensor accuracy, etc.
In some embodiments, different criteria based on the angles formed by the fingertip-to-palm-center lines may be used to determine whether a hand is present.
For example, the sum of the included angles formed by the lines connecting each pair of adjacent fingertips to the palm center is calculated, and it is judged whether the sum exceeds 180°. If so, it is determined that a hand exists in the planar image; if not, it is determined that no hand exists in the planar image.
It should be noted, of course, that the limit of the sum of the angles is related to the number of fingers of the user. The upper limit of the sum of the included angles can be further adjusted and set to adapt to different conditions.
In other embodiments, whether a hand exists in the planar image can also be judged using the maximum inscribed circle. As shown in fig. 8, this determination may specifically include:
810. Calculating the distance between each candidate point and each edge point.
A candidate point is a point located in the region inside the edge in the planar image. The distance between a candidate point and an edge point is the straight-line distance between the two points.
820. Determining a maximum distance as a radius of a maximum inscribed circle of a palm of the hand.
After the distances corresponding to all the candidate points are calculated, the candidate point with the largest distance can be determined as the center of the largest inscribed circle through comparison.
In the case where a hand is present in the planar image, the area occupied by the palm of the hand is usually the largest. Therefore, the position of the maximum inscribed circle is the position of the palm.
830. Judging whether the radius is within a third preset range. If yes, go to step 840; otherwise go to step 850.
840. Determining that a hand is present in the planar image.
850. Determining that a hand is not present in the planar image.
Under normal conditions, the maximum inscribed circle of the palm detected for the user 40 fluctuates only within a certain range. For most users the palm has a relatively fixed size and does not vary significantly, so the radius obtained from a correct detection usually stays within a certain range, and a result outside that range is very likely a recognition error.
Therefore, the technician can set a corresponding third preset range according to the actual situation and/or the experimental result, so as to be used as a judgment standard for judging whether the hand exists in the plane image. The third preset range is an empirical value and can be adjusted and set according to actual conditions.
In yet another embodiment, whether a hand exists in the planar image can be determined according to the number of detected fingers. That is, it is judged whether the number of fingers calculated in the above steps satisfies the corresponding condition. If yes, it is determined that a hand exists in the planar image; if not, it is determined that no hand exists in the planar image.
For example, the number of fingers of the normal user 40 should not exceed 5 at the maximum in the case of one-handed control, or 10 in the case of two-handed control.
Therefore, when the detected number of fingers falls outside the normal range, it indicates that a detection or recognition error has occurred; it is determined that there is no hand in the planar image, and the current detection result should be discarded.
Fig. 6 is a flowchart of a gesture recognition method with misjudgment-correction capability. As shown in fig. 6, this gesture recognition method combines checks on the size of the palm, the included angles between fingers, and the number of fingers. It may comprise the following steps:
611. A depth map is received from an acquisition device. Specifically, the depth map is a binocular-camera disparity map, comprising a baseline length and disparity data that represent the depth information of the image.
612. Restoring the depth map into spatial point cloud information.
Converting between a depth map and spatial point cloud information is a very common operation. When the input depth map is a binocular-camera disparity map, the three-dimensional coordinates (x_i, y_i, z_i) of each point can be expressed as follows:
z_i = f_x · baseline / disparity
x_i = (p_x - c_x) · z_i / f_x
y_i = (p_y - c_y) · z_i / f_y
where baseline is the length of the stereo baseline, disparity is the disparity value at the pixel, (p_x, p_y) are the pixel coordinates of the point, and c_x, c_y, f_x and f_y are the camera calibration parameters (principal point and focal lengths).
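A sketch of this conversion under the usual pinhole-stereo assumptions is given below; the function name and NumPy layout are illustrative and not taken from the disclosure.

```python
import numpy as np

def disparity_to_point_cloud(disparity, baseline, fx, fy, cx, cy):
    """Convert a binocular disparity map (H x W array) into an N x 3 point
    cloud using the coordinate formulas above (pinhole-stereo assumptions).
    """
    h, w = disparity.shape
    px, py = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    valid = disparity > 0                              # skip pixels without disparity
    z = fx * baseline / disparity[valid]               # depth from disparity
    x = (px[valid] - cx) * z / fx
    y = (py[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```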
613. Extracting the spatial point cloud within a preset distance as the target area. The target area is the partial region that contains the palm.
614. Mapping the three-dimensional point cloud in the target area into a two-dimensional space to generate the planar image corresponding to the target area.
615. Extracting edge points in the planar image using a preset edge extraction algorithm. The preset edge extraction algorithm may be any existing edge extraction algorithm, for example the Moore Neighborhood tracing method; the extracted edge points form a series of continuous edge points.
616. Calculating the radius of the maximum inscribed circle of the palm according to the edge points.
617. Judging whether the radius of the maximum inscribed circle is within a third preset range. If yes, go to step 619; if not, go to step 620.
As shown in fig. 5, the position of the largest inscribed circle should be coincident with the palm, assuming the depth map is sampled correctly. Therefore, the area of the palm can be determined by corresponding calculation through the maximum inscribed circle.
The third preset range refers to a range of possible radius fluctuation of the palm detected by the user 40 in a normal state. Since the palm is of a particular size for most users, significant variations are not likely to occur. Therefore, the radius corresponding to the normal detection result usually fluctuates only within a certain range, and the result beyond the range has a great probability of being a recognition error. The technician may set this third predetermined range based on actual conditions and/or experimental results.
619. Determining the center of the maximum inscribed circle as the palm center of the palm.
When the palm area is within the normal range, the palm detection can be considered correct, and the center of the maximum inscribed circle can be used as the palm center.
620. Determining that a hand is not present in the target area.
Conversely, when the radius falls outside the third preset range, the detection result is considered not to be a palm, usually due to a detection error caused by occlusion or the like. In that case an error can be actively reported and a corresponding error-handling process carried out as required, for example prompting the user that the gesture is invalid and asking the user to repeat the gesture so that a new depth map can be captured.
621. Searching for convex hull points among the edge points. The convex hull points can be found using the Graham scan.
622. For each convex hull point, taking the (i-n)-th edge point as the first detection point and the (i+n)-th edge point as the second detection point, where the i-th edge point is the convex hull point and n is a positive integer.
623. Calculating, based on the first detection point, the second detection point and the convex hull point, the included angle between the vector from the convex hull point to the first detection point and the vector from the convex hull point to the second detection point.
624. Judging whether the included angle is within the first preset range. If so, the convex hull point is determined to be the fingertip of a finger; if not, it is determined not to be a fingertip.
Each time a fingertip is confirmed, the finger count is incremented by 1; otherwise it remains unchanged.
625. Judging whether the number of fingers of a single hand exceeds 5. If yes, go to step 620; otherwise go to step 626.
Clearly, the number of fingers on a single hand of a normal user 40 cannot exceed 5, so detecting a larger number essentially indicates that a detection or recognition error has occurred and the current detection result should be discarded.
626. Calculating, in sequence, the sum of the included angles formed by the lines connecting each pair of adjacent fingertips to the palm center. "In sequence" means that the angle between the lines from two adjacent fingertips to the palm center is computed following the order in which the fingers were detected.
For example, if three fingers are detected, angle 1 is the angle at the palm center between the lines to the first and second fingertips, and angle 2 is the angle at the palm center between the lines to the second and third fingertips.
627. Judging whether the sum of the included angles is larger than 180°. If yes, go to step 620; if not, it is determined that a hand exists in the target area (step 628).
The sum of the included angles is the sum of all calculated included angles. For example, angle 1 and angle 2 are added to obtain the sum of the two angles. It will be appreciated that the greater the number of fingers, the greater the number of angles added. In the maximum case, when 5 fingers are detected, four angles need to be added.
Obviously, even in the case where all 5 fingers are detected, the sum of these angles should not exceed the upper limit of 180 ° (the user 40 cannot make this gesture motion). Therefore, when the sum of the included angles is larger than 180 °, it can be also confirmed that a recognition error has occurred.
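A sketch of the angle-sum check in steps 626-627 follows, using the flow of fig. 6 (a sum above 180° is treated as a recognition error); the array-of-(x, y) inputs are an assumed representation.

```python
import numpy as np

def fingertip_angle_sum(fingertips, palm_centre):
    """Sum, in detection order, the angles at the palm centre between the
    lines to consecutive fingertips (step 626)."""
    pc = np.asarray(palm_centre, dtype=float)
    total = 0.0
    for a, b in zip(fingertips[:-1], fingertips[1:]):
        v1 = np.asarray(a, dtype=float) - pc
        v2 = np.asarray(b, dtype=float) - pc
        cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        total += np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
    return total

def hand_plausible(fingertips, palm_centre):
    """True when the angle sum does not exceed the 180-degree upper limit
    described in step 627 of fig. 6."""
    return fingertip_angle_sum(fingertips, palm_centre) <= 180.0
```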
In the present embodiment, the most extreme cases are used as the judgment criteria for the finger error-correction checks (the number of fingers and the sum of the included angles). Those skilled in the art will understand that these criteria can be adjusted, combined or split according to the needs of different scenarios to obtain the same technical effect.
For example, the upper limit value of the sum of the included angles may be dynamically changed according to the number of the fingers, or whether the number of the fingers can meet the judgment criterion may be calculated after judging the sum of the included angles.
The controller 20 shown in fig. 1 may use software, hardware, or a combination of software and hardware to execute the gesture recognition method disclosed in the above method embodiment according to the received depth information, so as to detect the gesture of the user 40, and analyze the corresponding control instruction to control the unmanned aerial vehicle 10 and the VR display device 30.
In some embodiments, the controller 20 may control the drone 10 to hover near the user 40 so that the user 40 is within the detection range of the depth sensor. The controller 20 detects the user's hand from the depth map using the gesture recognition method described above, and then adjusts the VR viewing angle of the VR display device according to the gesture recognized from the user's hand, thereby realizing gesture control of the VR viewing angle or the VR scene.
Controlling the VR viewing angle with gestures in this way avoids the misoperation caused by being unable to see a remote controller while wearing the VR device, and also reduces the fatigue and dizziness caused by the traditional approach of turning the head and body to adjust the VR viewing angle.
Specifically, the controller 20 can precisely control the movement of the drone 10 and the adjustment of the VR viewing angle according to the relative position between the palm and the drone.
For example, the drone can be moved to a target position and the rotation angle of the gimbal adjusted to change the position and orientation of the shooting device, ultimately changing and adjusting the VR viewing angle and providing the user 40 with a more convenient operation mode and a better immersive experience.
In summary, the gesture recognition method provided by the embodiments of the invention extracts the hand using the depth map, avoiding the relatively complicated process of extracting the hand from a plane image and greatly reducing the computation. In addition, geometric analysis replaces traditional machine-learning recognition, so the same hardware can run at a higher frame rate than with a machine-learning method, enabling use on devices with low power consumption and limited computing capability.
Furthermore, because the arm and the hand tend to lie in the same plane, extracting the hand region from depth information easily pulls in the arm as well. The maximum-inscribed-circle detection method is used to distinguish the arm from the hand, so the palm center can be determined accurately even when the arm is present. Finger identification is completed by checking the angle formed with the edge points adjacent to each convex hull point on either side, which is both efficient and robust.
The accurate spatial position of the hand provided by the gesture recognition method of the embodiments of the invention can be widely applied to robot palm tracking, palm landing of drones, gesture recognition, motion-sensing control and the like, offering more control options.
It will be further appreciated by those skilled in the art that the steps of the exemplary methods described in connection with the embodiments disclosed herein can be embodied in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the exemplary components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation.
Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. The computer software may be stored in a computer-readable storage medium; when the software is executed, the processes of the method embodiments described above may be performed. The storage medium may be a magnetic disk, an optical disk, a read-only memory or a random access memory.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; within the idea of the invention, also technical features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (30)

  1. A gesture recognition method, comprising:
    acquiring depth information;
    acquiring spatial point cloud information according to the depth information;
    determining a target area in the spatial point cloud information, wherein the target area refers to an area containing hand point cloud information;
    generating a planar image corresponding to the target area;
    extracting edge points of the hand in the plane image;
    determining the number of fingers in the hand according to the edge points of the hand;
    and determining the hand gesture according to the number of the fingers.
  2. The gesture recognition method according to claim 1, wherein the obtaining the depth information comprises:
    the depth information is acquired by a depth sensor.
  3. The gesture recognition method according to claim 1 or 2, wherein the determining the target area in the spatial point cloud information comprises:
    and extracting point cloud information within a preset distance to serve as the target area.
  4. The gesture recognition method according to any one of claims 1-3, further comprising:
    noise in the target region is filtered.
  5. The gesture recognition method of claim 4, wherein the filtering noise in the target region comprises:
    and filtering noise points in the target region through a maximum connected domain algorithm.
  6. The gesture recognition method according to any one of claims 1-5, wherein the generating the planar image corresponding to the target region includes:
    and mapping the point cloud information in the target area to a two-dimensional space to generate the plane image corresponding to the target area.
  7. The gesture recognition method according to any one of claims 1 to 6, wherein the extracting the edge point of the hand in the planar image includes:
    extracting the edge points of the hand in the planar image by using a Moore Neighborhood method.
  8. The gesture recognition method according to any one of claims 1-7, wherein the determining the number of fingers in the hand from the edge points of the hand comprises:
    finding a convex hull (Convex Hull) point according to the edge points of the hand;
    determining the convex hull point as the fingertip of the finger;
    and determining the number of the fingers in the hand according to the number of the fingertips.
  9. The gesture recognition method according to claim 8, wherein the finding the convex hull point according to the edge point of the hand comprises:
    the convex points were found using the Graham's scan method.
  10. The gesture recognition method according to claim 8 or 9, wherein the determining that the convex hull point is the fingertip of the finger comprises:
    selecting, from the edge points of the hand, a first edge point and a second edge point which are respectively located on two sides of the convex hull point and adjacent to the convex hull point, so as to calculate an included angle between a straight line connecting the convex hull point and the first edge point and a straight line connecting the convex hull point and the second edge point; wherein the first edge point, the second edge point and the convex hull point are located on the same finger, and a preset number of edge points are spaced between the first edge point and the convex hull point and between the second edge point and the convex hull point;
    judging whether the included angle is within a first preset range or not;
    and if so, determining that the convex hull point is the fingertip of the finger.
  11. The gesture recognition method according to claim 10, wherein the determining whether the included angle is within the first preset range includes:
    calculating the included angle:
    θ = arccos( ((P_l − P_i) · (P_r − P_i)) / (‖P_l − P_i‖ · ‖P_r − P_i‖) )
    wherein θ is the included angle, P_i is the convex hull point, P_l is the first edge point, and P_r is the second edge point;
    judging whether the included angle is smaller than a preset value or not;
    and if so, determining that the convex hull point is the fingertip of the finger.
  12. The gesture recognition method of claim 11, wherein the preset value is between 20° and 60°.
  13. The gesture recognition method according to any one of claims 10-12, wherein 10-50 edge points are spaced between the first edge point and the convex hull point and between the second edge point and the convex hull point.
  14. The gesture recognition method according to any one of claims 7-13, further comprising:
    and judging whether a hand exists in the plane image.
  15. The gesture recognition method according to claim 14, wherein the determining whether a hand is present in the planar image includes:
    calculating, for each candidate point, the distances between the candidate point and the edge points, wherein the candidate points are points within the range enclosed by the edge points;
    determining a candidate point corresponding to the maximum distance as the palm center of the palm of the hand;
    calculating an included angle formed by the lines connecting any two adjacent fingertips to the palm center;
    judging whether the included angle is within a second preset range or not;
    and if so, determining that the hand exists in the plane image.
  16. The gesture recognition method according to claim 14, wherein the determining whether a hand is present in the planar image includes:
    calculating, for each candidate point, the distances between the candidate point and the edge points, wherein the candidate points are points within the range enclosed by the edge points;
    determining a candidate point corresponding to the maximum distance as the palm center of the palm of the hand;
    calculating the sum of the included angles formed by the lines connecting any two adjacent fingertips to the palm center;
    judging whether the sum of the included angles exceeds 180 degrees;
    and if so, determining that the hand exists in the plane image.
  17. The gesture recognition method according to any one of claims 1-13, further comprising:
    and judging whether a hand exists in the plane image.
  18. The gesture recognition method according to claim 17, wherein the determining whether a hand is present in the planar image includes:
    calculating, for each candidate point, the distances between the candidate point and the edge points, wherein the candidate points are points within the range enclosed by the edge points;
    determining that the maximum distance is the radius of the maximum inscribed circle of the palm of the hand;
    judging whether the radius is within a third preset range;
    and if so, determining that the hand exists in the plane image.
  19. The gesture recognition method according to any one of claims 1-13, further comprising:
    and judging whether a hand exists in the plane image.
  20. The gesture recognition method according to claim 19, wherein the determining whether the hand is present in the planar image includes:
    and judging whether the hand exists in the plane image or not according to the number of the fingers.
  21. A VR perspective control method, comprising:
    applying the gesture recognition method of any one of claims 1-20, determining a gesture of a user's hand;
    and adjusting the VR visual angle according to the hand gesture of the user.
  22. The VR perspective control method of claim 21, wherein the adjusting the VR perspective based on the gesture of the user's hand includes:
    recognizing the current gesture of the user's hand according to the number of fingers of the user's hand;
    adjusting the position and orientation of the shooting device according to the gesture of the user's hand;
    and adjusting the VR perspective according to the change in the position and orientation of the shooting device.
  23. The VR perspective control method of claim 21 or 22, further comprising:
    applying the gesture recognition method of any one of claims 1-20, determining a palm position of the user's hand;
    and adjusting the VR visual angle according to the palm position of the hand of the user.
  24. A VR system, comprising:
    a mobile carrier;
    a shooting device carried on the mobile carrier;
    a depth sensor disposed on the mobile carrier;
    a controller arranged in the mobile carrier; and
    a VR display device communicatively connected to the shooting device and configured to generate a corresponding VR scene according to the video image information acquired by the shooting device;
    wherein the controller is configured to identify a gesture of a user from the depth information acquired by the depth sensor using the gesture recognition method of any one of claims 1-20, and to adjust a VR perspective of the VR display device based on the gesture.
  25. The VR system of claim 24, wherein the controller is specifically configured to: recognize the gesture of the user's hand according to the number of the fingers, and control the pose of the shooting device and the movement of the mobile carrier according to the hand gesture.
  26. The VR system of claim 24 or 25, wherein the controller is further configured to obtain a palm position of the user using the gesture recognition method of any one of claims 1-20, and adjust the VR perspective of the VR display device based on the palm position.
  27. The VR system of claim 26, wherein the controller is specifically configured to determine the relative position between the palm and the mobile carrier according to the palm position, so as to control the pose of the shooting device and the movement of the mobile carrier.
  28. The VR system of any one of claims 24-27, wherein the shooting device is mounted to a front portion of the mobile carrier and the depth sensor is mounted to a rear portion of the mobile carrier.
  29. The VR system of any one of claims 24-28, wherein the mobile carrier is a drone.
  30. The VR system of any one of claims 24-29, wherein the shooting device is mounted to the mobile carrier via a pan-tilt head.
CN201880098482.5A 2018-10-10 2018-10-10 Gesture recognition method, VR (virtual reality) visual angle control method and VR system Pending CN113039550A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/109698 WO2020073245A1 (en) 2018-10-10 2018-10-10 Gesture recognition method, vr angle of view control method and vr system

Publications (1)

Publication Number Publication Date
CN113039550A (en) 2021-06-25

Family

ID=70164991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880098482.5A Pending CN113039550A (en) 2018-10-10 2018-10-10 Gesture recognition method, VR (virtual reality) visual angle control method and VR system

Country Status (2)

Country Link
CN (1) CN113039550A (en)
WO (1) WO2020073245A1 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317391B (en) * 2014-09-24 2017-10-03 华中科技大学 A kind of three-dimensional palm gesture recognition exchange method and system based on stereoscopic vision
CN108230383B (en) * 2017-03-29 2021-03-23 北京市商汤科技开发有限公司 Hand three-dimensional data determination method and device and electronic equipment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982557A (en) * 2012-11-06 2013-03-20 桂林电子科技大学 Method for processing space hand signal gesture command based on depth camera
JP5360631B1 (en) * 2013-02-12 2013-12-04 株式会社アクセル Fingertip detection method and fingertip detection user interface module
CN104571482A (en) * 2013-10-22 2015-04-29 中国传媒大学 Digital device control method based on somatosensory recognition
KR20160124361A (en) * 2015-04-17 2016-10-27 가천대학교 산학협력단 Hand Feature Extraction Algorithm using Curvature Analysis For Recognition of Various Hand Feature
CN104914993A (en) * 2015-05-15 2015-09-16 北京航空航天大学 Experience type design method for controlling civil aircraft passenger cabin seat adjustment by gestures
CN106326860A (en) * 2016-08-23 2017-01-11 武汉闪图科技有限公司 Gesture recognition method based on vision
CN106346485A (en) * 2016-09-21 2017-01-25 大连理工大学 Non-contact control method of bionic manipulator based on learning of hand motion gestures
WO2018133593A1 (en) * 2017-01-17 2018-07-26 亿航智能设备(广州)有限公司 Control method and device for intelligent terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHENG Haibin; YANG Zhong; ZHANG Tianyi: "Research on finger number detection method based on HS-CrCb space", Electronic Measurement Technology, no. 08 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115042187A (en) * 2022-07-19 2022-09-13 重庆工商大学 MPU 6050-based virtual reality mechanical arm control method
CN116482680A (en) * 2023-06-19 2023-07-25 精华隆智慧感知科技(深圳)股份有限公司 Body interference identification method, device, system and storage medium
CN116482680B (en) * 2023-06-19 2023-08-25 精华隆智慧感知科技(深圳)股份有限公司 Body interference identification method, device, system and storage medium
CN116761040A (en) * 2023-08-22 2023-09-15 超级芯(江苏)智能科技有限公司 VR cloud platform interaction method and interaction system
CN116761040B (en) * 2023-08-22 2023-10-27 超级芯(江苏)智能科技有限公司 VR cloud platform interaction method and interaction system
CN117636701A (en) * 2023-10-19 2024-03-01 广州市信息技术职业学校 Capsule filling machine auxiliary training system based on virtual reality technology

Also Published As

Publication number Publication date
WO2020073245A1 (en) 2020-04-16

Similar Documents

Publication Publication Date Title
US11914370B2 (en) System and method for providing easy-to-use release and auto-positioning for drone applications
US11120261B2 (en) Imaging control method and device
US11340606B2 (en) System and method for controller-free user drone interaction
US11649052B2 (en) System and method for providing autonomous photography and videography
US11302026B2 (en) Attitude recognition method and device, and movable platform
CN113039550A (en) Gesture recognition method, VR (virtual reality) visual angle control method and VR system
WO2017045116A1 (en) System and method for supporting smooth target following
US10409292B2 (en) Movement control method, autonomous mobile robot, and recording medium storing program
US10169880B2 (en) Information processing apparatus, information processing method, and program
WO2018006019A1 (en) System and method for driver monitoring
CN109564432B (en) Method of communicating/controlling a mobile device through gestures and related system
WO2019155335A1 (en) Unmanned aerial vehicle including an omnidirectional depth sensing and obstacle avoidance aerial system and method of operating same
EP3413165B1 (en) Wearable system gesture control method and wearable system
JP2003267295A (en) Remote operation system
US20200133310A1 (en) Method for controlling palm landing of unmanned aerial vehicle, control device, and unmanned aerial vehicle
KR101256046B1 (en) Method and system for body tracking for spatial gesture recognition
CN113228103A (en) Target tracking method, device, unmanned aerial vehicle, system and readable storage medium
CN108700885B (en) Flight control method, remote control device and remote control system
US20210181769A1 (en) Movable platform control method, movable platform, terminal device, and system
WO2022082440A1 (en) Method, apparatus and system for determining target following strategy, and device and storage medium
JP2017191426A (en) Input device, input control method, computer program, and storage medium
CN113168532A (en) Target detection method and device, unmanned aerial vehicle and computer readable storage medium
JP7317684B2 (en) Mobile object, information processing device, and imaging system
WO2023178121A1 (en) Motion-based calibration of an unmanned aerial vehicle
JP2020095671A (en) Recognition device and recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination