CN117130469A - Space gesture recognition method, electronic equipment and chip system - Google Patents

Space gesture recognition method, electronic equipment and chip system

Info

Publication number
CN117130469A
Authority
CN
China
Prior art keywords
gesture
image
hand
initial
hand position
Prior art date
Legal status
Pending
Application number
CN202310231441.1A
Other languages
Chinese (zh)
Inventor
张涛
杜远超
张飞洋
朱世宇
王文博
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202310231441.1A priority Critical patent/CN117130469A/en
Publication of CN117130469A publication Critical patent/CN117130469A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The application provides a space gesture recognition method, an electronic device and a chip system, and relates to the technical field of gesture recognition. A dynamic gesture is determined from three aspects: its initial gesture, its end gesture and its movement direction, so that a dynamic space gesture can be recognized accurately from only two images.

Description

Space gesture recognition method, electronic equipment and chip system
Technical Field
The embodiments of the application relate to the technical field of gesture recognition, and in particular to a space gesture recognition method, an electronic device and a chip system.
Background
An electronic device can provide a space gesture recognition function: a user makes a specific gesture within the field of view of the camera of the electronic device, the electronic device captures images through the camera, and after the electronic device detects the specific gesture in the captured images, it makes the response corresponding to that gesture.
When the specific gesture is a dynamic gesture, the motion trajectory of the hand is usually determined from the hand position in each frame of a sequence of consecutive images, and the trajectory is then compared with the trajectory of each gesture to determine the dynamic gesture. This approach is only suitable for relatively simple dynamic gestures, for example a gesture in which the hand keeps the same shape while moving up, down, left or right, and its recognition accuracy is poor for more complex dynamic gestures.
Disclosure of Invention
The embodiment of the application provides a method for identifying a space gesture, electronic equipment and a chip system, which can accurately identify a dynamic space gesture.
In order to achieve the above purpose, the application adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides a method for identifying a space-apart gesture, including:
detecting a space gesture operation of a user;
in response to detecting a space gesture operation of a user, acquiring a first image and a second image, wherein the acquisition time of the second image is later than that of the first image;
when first feature information of the first image and second feature information of the second image meet a first preset condition, determining that the space gesture operation of the user is a first gesture, wherein the first feature information comprises a first hand gesture and a first hand position, the second feature information comprises a second hand gesture and a second hand position, and the first preset condition comprises: the first hand gesture is consistent with the initial gesture of the first gesture, the second hand gesture is consistent with the end gesture of the first gesture, the direction from the first hand position to the second hand position is consistent with the movement direction of the first gesture, the initial gesture and the end gesture of the first gesture are different, and the first gesture is a slide-up gesture or a slide-down gesture.
In the application, by comparing the hand gesture in the earlier-acquired image with the initial gesture, the hand gesture in the later-acquired image with the end gesture, and the direction between the two hand positions with the movement direction of the gesture, the slide-up gesture and the slide-down gesture can be recognized from only a small number of gesture images.
Specifically, detecting the space gesture operation may mean detecting that the user's gesture is the initial gesture of some gesture in the gesture library. After the space gesture operation is detected, two images captured by the camera can be acquired: the hand gesture in the earlier-acquired image is compared with the initial gesture of each gesture in the gesture library, the hand gesture in the later-acquired image is compared with the end gesture of each gesture in the gesture library, the movement direction of the hand is determined from the hand positions in the two images, and the determined direction is compared with the movement direction of the gesture in the gesture library, so as to decide whether the condition corresponding to a gesture in the gesture library is met.
Of course, in practical applications, the detection of an initial gesture already indicates a space gesture operation; only when the operation has a subsequent hand action can the first image and the second image satisfy the condition corresponding to a particular gesture. It can therefore be understood that when the feature information of the two images satisfies the condition corresponding to a certain gesture, the gesture corresponding to the space gesture operation (including the subsequent hand action) is determined to be the first gesture.
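As an illustration of the matching logic described above, the following Python sketch checks whether the features of an earlier and a later image satisfy the condition of a slide-up or slide-down gesture. The field names, the gesture-library entries and the pixel-coordinate convention (y grows downwards) are illustrative assumptions, not the patented implementation.

```python
from dataclasses import dataclass

@dataclass
class HandFeatures:
    """Feature information of one image (illustrative field names)."""
    hand_category: str       # e.g. "palm" or "back_of_hand"
    finger_orientation: str  # e.g. "up" or "down"
    box_center: tuple        # (x, y) center of the hand detection frame, in pixels

# Hypothetical gesture-library entries: initial gesture, end gesture, movement direction.
GESTURES = {
    "slide_down": {"start": ("palm", "up"),   "end": ("back_of_hand", "down"), "direction": "down"},
    "slide_up":   {"start": ("palm", "down"), "end": ("back_of_hand", "up"),   "direction": "up"},
}

def movement_direction(first: HandFeatures, second: HandFeatures) -> str:
    """Direction from the first frame center to the second (image y grows downwards)."""
    dy = second.box_center[1] - first.box_center[1]
    return "down" if dy > 0 else "up"

def matches_gesture(first: HandFeatures, second: HandFeatures, name: str) -> bool:
    """First preset condition: initial gesture, end gesture and movement direction all match."""
    g = GESTURES[name]
    return ((first.hand_category, first.finger_orientation) == g["start"]
            and (second.hand_category, second.finger_orientation) == g["end"]
            and movement_direction(first, second) == g["direction"])

# Example: palm with fingers up in the earlier image, back of hand with fingers down,
# lower in the frame, in the later image -> slide-down gesture.
first = HandFeatures("palm", "up", (320, 200))
second = HandFeatures("back_of_hand", "down", (325, 420))
print(matches_gesture(first, second, "slide_down"))  # True
```

The second slide-up variant described later (back of hand with fingers down changing to a palm with fingers up) would simply be another entry in the gesture table.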
In another implementation of the first aspect, the content currently displayed by the electronic device is slid in response to the first gesture.
In the application, when the first gesture is a slide-up gesture, the content currently displayed by the electronic device slides up or is turned to the next page; when the first gesture is a slide-down gesture, the content currently displayed by the electronic device slides down or is turned to the previous page.
In another implementation of the first aspect, the starting gesture of the first gesture is related to a hand category and a finger orientation; the ending gesture of the first gesture is related to a hand category and a finger orientation; the first hand position comprises the position of a hand detection frame in the first image on the first image; the second hand position includes a position of a hand detection frame in the second image on the second image.
In the application, for the slide-up gesture and the slide-down gesture, the initial gesture is compared in terms of hand category and finger orientation, and the end gesture is likewise compared in terms of hand category and finger orientation; using these two aspects, whether the hand gesture in an image is consistent with the initial gesture (or end gesture) of a gesture in the gesture library can be determined more accurately. In addition, an existing hand detection model outputs both the hand category and the hand detection frame, so using the detection frame output by the same model avoids detecting the hand position separately, which reduces processing steps and improves efficiency.
In another implementation of the first aspect, the first gesture is a slide-down gesture, the initial gesture of the first gesture is a palm with the fingers facing up, and the end gesture of the first gesture is the back of the hand with the fingers facing down;
the direction from the first hand position to the second hand position is the direction from the center point of the hand detection frame in the first image to the center point of the hand detection frame in the second image, and the movement direction of the first gesture is downward.
In the application, the initial gesture of the first gesture is a palm with the fingers up, and the end gesture is the back of the hand with the fingers down; both are open-hand gestures, and when the hand is open the center point of the hand detection frame expresses the hand position accurately, so the center point of the hand detection frame is used as one parameter of the hand position.
In another implementation of the first aspect, the first gesture is a slide-up gesture, the initial gesture of the first gesture is a palm with the fingers facing down, and the end gesture of the first gesture is the back of the hand with the fingers facing up;
the direction from the first hand position to the second hand position is the direction from the center point of the hand detection frame in the first image to the center point of the hand detection frame in the second image, and the movement direction of the first gesture is upward.
In another implementation of the first aspect, the first gesture is a slide-up gesture, the initial gesture of the first gesture is the back of the hand with the fingers facing down, and the end gesture of the first gesture is a palm with the fingers facing up;
the direction from the first hand position to the second hand position is the direction from the center point of the hand detection frame in the first image to the center point of the hand detection frame in the second image, and the movement direction of the first gesture is upward.
In the application, the reasons for selecting these parameters for the slide-up gesture are the same as those described for the slide-down gesture.
In another implementation manner of the first aspect, the first preset condition further includes:
The distance from the first hand position to the second hand position satisfies a movement distance condition of the first gesture.
In the application, on top of recognizing the slide-up gesture and the slide-down gesture from the three aspects of initial gesture, end gesture and movement direction, a movement distance condition can be added to recognize these gestures more accurately. Adding the movement distance to the judging conditions prevents hand movements that are not gestures, made by the user in front of the screen while using the electronic device, from being recognized as a slide-up or slide-down gesture, thereby improving the accuracy of gesture recognition.
In another implementation of the first aspect, the first hand position includes: a position of a fingertip in the first image on the first image, the second hand position comprising: the position of the fingertip in the second image on the second image.
In the application, because the fingertips move the farthest between the initial gesture and the end gesture of the slide-up and slide-down gestures, the fingertip position can be used as one parameter of the hand position, so that the movement distance of the hand can be measured accurately.
In another implementation of the first aspect, the distance from the first hand position to the second hand position satisfies a movement distance condition of the first gesture includes:
The ratio of the distance from the position of the fingertip in the first image to the position of the fingertip in the second image to the hand width in the second image is within a first interval.
In the application, the movement distance of the hand measured in the image depends on the distance between the hand and the camera. To obtain an accurate distance parameter, the hand width, which varies relatively little from person to person, can be used as a reference, and the ratio of the fingertip displacement in the image to the hand width in the image can be used as the parameter in the movement distance condition.
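A minimal sketch of this normalized distance check, assuming fingertip coordinates and hand width in pixels; the bounds of the first interval are placeholders, since no concrete values are stated here.

```python
import math

def movement_distance_condition(tip_first, tip_second, hand_width_second,
                                first_interval=(0.8, 3.0)):
    """Check whether the fingertip displacement, normalized by the hand width in the
    second image, falls inside the first interval (placeholder bounds)."""
    displacement = math.dist(tip_first, tip_second)   # fingertip movement in pixels
    ratio = displacement / hand_width_second          # roughly invariant to camera distance
    low, high = first_interval
    return low <= ratio <= high

# Example: the fingertip moved about 220 px while the hand is 150 px wide in the second image.
print(movement_distance_condition((300, 180), (310, 400), 150.0))  # True for these bounds
```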
In another implementation manner of the first aspect, after acquiring the first image and the second image, the method further includes:
when the first feature information of the first image and the second feature information of the second image meet a second preset condition, determining that the space gesture operation of the user is a second gesture, wherein the second gesture is a gesture in the gesture library other than the first gesture, and the second preset condition comprises: the first hand gesture is consistent with the initial gesture of the second gesture, the second hand gesture is consistent with the end gesture of the second gesture, the overlapping area between the first hand position and the second hand position meets the overlapping area condition between the initial hand position and the end hand position in the second gesture, and the second gesture is a grip gesture.
In the application, the gesture library can also include a grip gesture. When a user makes the grip gesture, the position of the end gesture is usually close to the region of the initial gesture excluding the fingers; therefore, in order to recognize the grip gesture accurately, in addition to judging whether the initial gesture and the end gesture are consistent, it can also be checked whether the overlapping area between the hand position in the earlier-acquired image and the hand position in the later-acquired image meets the overlapping area condition between the initial hand position and the end hand position in the grip gesture.
In another implementation of the first aspect, the initial gesture of the second gesture is a palm with the fingers facing up, and the end gesture of the second gesture is a fist;
the overlapping area between the first hand position and the second hand position meets the overlapping area condition between the starting hand position and the ending hand position in the second gesture, and the overlapping area condition comprises:
the intersection ratio of the hand detection frame in the first image and the hand detection frame in the second image is larger than a first threshold.
In the application, the overlapping area of the two hand positions in the images is related to the distance between the hand and the camera. To measure the overlap of the hand in the two images accurately, a further reference must be determined in the images, so the intersection ratio of the two hand positions, i.e. the proportion of the intersection area of the two hand positions to their union, can be calculated. Here, the hand detection frame can be taken as the parameter of the hand position, and the intersection ratio of the hand detection frames in the two images can be calculated.
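The following sketch computes the intersection ratio (intersection over union) of the two hand detection frames as described above; the box format and the value of the first threshold are assumptions for illustration.

```python
def intersection_ratio(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

FIRST_THRESHOLD = 0.5                 # placeholder value, not taken from the patent
box_first = (100, 100, 300, 380)      # open-palm detection frame in the earlier image
box_second = (120, 160, 290, 360)     # fist detection frame in the later image
print(intersection_ratio(box_first, box_second) > FIRST_THRESHOLD)  # overlap condition met
```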
In another implementation manner of the first aspect, the second preset condition further includes:
and the position relation between the fingertip and the finger root in the second image meets the position relation condition of the fingertip and the finger root in the ending gesture of the second gesture.
In the application, in order to determine accurately whether the end gesture is a fist, an additional judgement of the end gesture can be added. The end gesture of the grip gesture is a fist, and the positional relationship between the fingertip and the finger root in a fist is distinctive, so it can be determined whether the positional relationship between the fingertip and the finger root in the later-acquired second image satisfies that of a fist.
In another implementation manner of the first aspect, the meeting the condition of the positional relationship between the fingertip and the finger root in the second gesture includes:
the ratio of the distance from the tip of the finger to the wrist and the distance from the root of the finger to the wrist in the second image is in a second interval.
In the application, to prevent distance-related hand parameters from being affected by the distance between the hand and the camera, the wrist can be used as a reference: the ratio of the fingertip-to-wrist distance to the finger-root-to-wrist distance in the second image is calculated, and whether the positional relationship between the fingertip and the finger root in the second gesture is satisfied is determined on the basis of this ratio.
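A sketch of this fist check under the same assumptions (pixel keypoints, placeholder interval bounds); the keypoint names are hypothetical and would come from a hand keypoint model.

```python
import math

def fist_position_condition(fingertip, finger_root, wrist, second_interval=(0.6, 1.1)):
    """In a fist the fingertip curls back towards the palm, so the fingertip-to-wrist
    distance becomes close to (or smaller than) the finger-root-to-wrist distance.
    The interval bounds are placeholders, not values from the patent."""
    ratio = math.dist(fingertip, wrist) / math.dist(finger_root, wrist)
    low, high = second_interval
    return low <= ratio <= high

# Example keypoints (pixel coordinates) of a clenched hand in the second image.
print(fist_position_condition(fingertip=(210, 300), finger_root=(205, 280), wrist=(200, 400)))  # True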
In another implementation manner of the first aspect, the first image and the second image are two images of a continuous plurality of images acquired by a camera, and there are other images in between the first image and the second image;
before determining that the first characteristic information of the first image and the second characteristic information of the second image meet a first preset condition, the method further includes:
determining that the number of images with hands in an image set is greater than or equal to a first number, wherein the image set comprises the first image, the second image and the images acquired by the camera between the first image and the second image.
In the application, dynamic gesture recognition requires a certain minimum number of images, for example at least one image containing the initial gesture and at least one image containing the end gesture. In addition, a dynamic gesture made by the user takes a certain amount of time, and the camera captures images within its field of view at a certain frame rate, so the end gesture cannot be recognized in the earliest captured images. Therefore, to reduce power consumption, gesture judgement can start only after the first image and at least the minimum number of images with hands have been acquired.
In another implementation manner of the first aspect, the method further includes:
determining whether the number of images with hands in the image set is greater than or equal to a second number if the first feature information and the second feature information do not meet the condition corresponding to any gesture in the gesture library;
outputting other gestures if the number of images with hands in the image set is greater than or equal to the second number;
if the number of the images with hands in the image set is smaller than the second number, acquiring a third image acquired by the camera, and judging whether the first characteristic information and the third characteristic information of the third image meet the condition corresponding to any gesture in the gesture library or not, wherein the third image is an image after the second image acquired by the camera.
In the application, the gesture made by the user within the field of view of the camera may not belong to any gesture in the gesture library. To avoid judging the gesture indefinitely, "other gestures" can be output once the number of images with hands exceeds a certain number.
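The counting logic of the two paragraphs above can be summarized in the following sketch; the state labels and the concrete values of the first and second numbers are assumptions for illustration.

```python
def gesture_decision(hand_image_count, matched_gesture, first_number=4, second_number=10):
    """Gate the dynamic-gesture decision on the number of images with hands
    (first_number and second_number are placeholder values)."""
    if hand_image_count < first_number:
        return "keep_collecting"       # too few images with hands to judge the end gesture yet
    if matched_gesture is not None:
        return matched_gesture         # a gesture in the gesture library was matched
    if hand_image_count >= second_number:
        return "other_gesture"         # stop judging and report a gesture outside the library
    return "fetch_third_image"         # acquire a later image and judge again

print(gesture_decision(3, None))       # keep_collecting
print(gesture_decision(6, None))       # fetch_third_image
print(gesture_decision(12, None))      # other_gesture
print(gesture_decision(6, "grip"))     # grip
```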
In another implementation manner of the first aspect, before determining that the first feature information of the first image and the second feature information of the second image meet a first preset condition, the method further includes:
determining that the hand is not hovering.
The embodiment of the application can also provide a function in which, while the user's hand keeps the same initial gesture, the gesture-condition judgement is temporarily suspended, and the judgement of the conditions corresponding to the gestures is resumed only once the hand no longer keeps the same initial gesture. This avoids the higher power consumption that would result from performing gesture judgement while the user's hand is hovering.
In another implementation of the first aspect, the determining that the hand is not hovering includes:
acquiring fourth feature information of a fourth image, wherein the fourth feature information comprises: the hand category, the finger orientation and the position of the hand detection frame on the fourth image; the first image is the image in which a first initial gesture is recognized, or an image containing the first initial gesture acquired after the first initial gesture is recognized, the first initial gesture being the initial gesture of any gesture in the gesture library; and the fourth image is acquired later than the first image and earlier than the second image;
determining that the hand is not hovering when the first feature information and the fourth feature information meet any one of the hovering conditions, the hovering conditions comprising: the hand category in the first feature information is inconsistent with the hand category in the fourth feature information; the finger orientation in the first feature information is inconsistent with the finger orientation in the fourth feature information; the area ratio of the hand detection frame in the first feature information to the hand detection frame in the fourth feature information is outside a third interval; and the intersection ratio of the hand detection frame in the first feature information and the hand detection frame in the fourth feature information is smaller than a second threshold, the second threshold being greater than the first threshold.
In the application, whether the hand gesture and the hand position change between the two images can be determined from the hand categories, the finger orientations, the area ratio of the hand detection frames and their intersection ratio, so whether the hand is hovering can be determined accurately.
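A sketch of this hover check; the feature dictionaries, the interval bounds and the thresholds are illustrative assumptions.

```python
def _area(box):
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def _intersection_ratio(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = _area(a) + _area(b) - inter
    return inter / union if union else 0.0

def hand_not_hovering(first, fourth, third_interval=(0.8, 1.25), second_threshold=0.7):
    """The hand is considered to have moved (not hovered) if any hovering condition
    listed above holds; the interval and threshold values are placeholders."""
    low, high = third_interval
    area_ratio = _area(first["box"]) / _area(fourth["box"])
    return (first["hand_category"] != fourth["hand_category"]
            or first["finger_orientation"] != fourth["finger_orientation"]
            or not (low <= area_ratio <= high)
            or _intersection_ratio(first["box"], fourth["box"]) < second_threshold)

first = {"hand_category": "palm", "finger_orientation": "up", "box": (100, 100, 300, 380)}
fourth = {"hand_category": "palm", "finger_orientation": "up", "box": (102, 98, 302, 382)}
print(hand_not_hovering(first, fourth))  # False -> still hovering, postpone gesture judgement
```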
In a second aspect, an embodiment of the present application provides another space gesture recognition method, including:
detecting a space gesture operation of a user;
in response to detecting the space gesture operation of the user, acquiring a first image and a second image, wherein the acquisition time of the second image is later than that of the first image;
determining that the space gesture operation of the user is a second gesture when the first feature information of the first image and the second feature information of the second image meet a second preset condition, wherein the first feature information comprises a first hand gesture and a first hand position, the second feature information comprises a second hand gesture and a second hand position, and the second preset condition comprises: the first hand gesture is consistent with the initial gesture of the second gesture, the second hand gesture is consistent with the end gesture of the second gesture, the overlapping area between the first hand position and the second hand position meets the overlapping area condition between the initial hand position and the end hand position in the second gesture, and the second gesture is a grip gesture.
In one implementation manner of the second aspect, the initial gesture of the second gesture is a palm with the fingers facing up, and the end gesture of the second gesture is a fist;
the overlapping area between the first hand position and the second hand position meets the overlapping area condition between the starting hand position and the ending hand position in the second gesture, and the overlapping area condition comprises:
the intersection ratio of the hand detection frame in the first image and the hand detection frame in the second image is larger than a first threshold.
In an implementation manner of the second aspect, the second preset condition further includes:
and the position relation between the fingertip and the finger root in the second image meets the position relation condition of the fingertip and the finger root in the second gesture.
In an implementation manner of the second aspect, the meeting the condition of the positional relationship between the fingertip and the finger root in the second gesture includes:
the ratio of the distance from the tip of the finger to the wrist and the distance from the root of the finger to the wrist in the second image is in a second interval.
In a third aspect, there is provided an electronic device comprising a processor for executing a computer program stored in a memory, implementing the method of any one of the first aspects of the application.
In a fourth aspect, there is provided a system on a chip comprising a processor coupled to a memory, the processor executing a computer program stored in the memory to implement the method of any of the first aspects of the application.
In a fifth aspect, there is provided a computer readable storage medium storing a computer program which when executed by one or more processors performs the method of any of the first aspects of the application.
In a sixth aspect, embodiments of the application provide a computer program product for, when run on a device, causing the device to perform the method of any of the first aspects of the application.
It will be appreciated that the advantages of the second to sixth aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of an electronic device to which the method for identifying a space gesture according to an embodiment of the present application is applied;
fig. 2 is a schematic diagram of an application scenario of the space gesture recognition method according to an embodiment of the present application;
fig. 3 is an application scenario schematic diagram of a slide-up gesture according to an embodiment of the present application;
Fig. 4 is a schematic diagram of an application scenario of a slide-down gesture according to an embodiment of the present application;
fig. 5 is a schematic diagram of an application scenario of a gripping gesture according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of a space gesture recognition method according to an embodiment of the present application;
FIG. 7 is a schematic flow chart of a screen state detection and initiation gesture recognition stage according to an embodiment of the present application;
FIG. 8 is a graph comparing model output results of an initial gesture recognition stage and a dynamic gesture recognition stage according to an embodiment of the present application;
FIG. 9 is a schematic flow chart of a dynamic gesture recognition stage according to an embodiment of the present application;
FIG. 10 is a flow chart illustrating another dynamic gesture recognition stage according to an embodiment of the present application;
FIG. 11 is a flow chart illustrating another dynamic gesture recognition stage according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a hover determination process according to an embodiment of the present application;
FIG. 13 is a schematic diagram of a grasping gesture determination flow chart according to an embodiment of the present application;
fig. 14 is a schematic diagram of a hand in a rule for determining a grasping gesture according to an embodiment of the present application;
FIG. 15 is a schematic diagram of a slide gesture determination flow provided in an embodiment of the present application;
FIG. 16 is a schematic diagram of a slide down gesture determination flow according to an embodiment of the present application;
fig. 17 is a schematic diagram of a hand in a rule for determining a slide gesture according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that in embodiments of the present application, "one or more" means one, two, or more than two; "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, and B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Furthermore, in the description of the present specification and the appended claims, the terms "first," "second," "third," "fourth," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The method for identifying the space-apart gestures provided by the embodiment of the application can be applied to the following electronic equipment. The electronic device may be a tablet computer with a front-facing camera, a mobile phone, a wearable device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (personal digital assistant, PDA), or the like. The embodiment of the application does not limit the specific type of the electronic equipment.
Fig. 1 shows a schematic structural diagram of an electronic device. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include, among other things, a pressure sensor 180A, a touch sensor 180K, an ambient light sensor 180L, and the like.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a coprocessor (sensor coprocessor, SCP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), or the like. Wherein the different processing units may be separate devices or may be integrated in one or more processors. For example, the processor 110 is configured to perform the method for identifying a space-free gesture in the embodiment of the present application.
The controller may be a neural hub and a command center of the electronic device 100, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it may be called directly from memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, and may also be used to transfer data between the electronic device 100 and a peripheral device.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only illustrative, and is not meant to limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also employ different interfacing manners in the above embodiments, or a combination of multiple interfacing manners.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer-executable program code that includes instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store application programs (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system.
In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive a charging input of a wired charger through the USB interface 130.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like.
In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., as applied to the electronic device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, antenna 1 and mobile communication module 150 of electronic device 100 are coupled, and antenna 2 and wireless communication module 160 are coupled, such that electronic device 100 may communicate with a network and other devices through wireless communication techniques.
The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio signals to analog audio signal outputs and also to convert analog audio inputs to digital audio signals. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. The electronic device 100 may listen to music, or to hands-free conversations, through the speaker 170A.
A receiver 170B, also referred to as a "earpiece", is used to convert the audio electrical signal into a sound signal. When electronic device 100 is answering a telephone call or voice message, voice may be received by placing receiver 170B in close proximity to the human ear.
The microphone 170C, also referred to as a "mic" or "sound transmitter", is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can speak close to the microphone 170C to input a sound signal into it. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting voice information. In other embodiments, the electronic device 100 may also be provided with three, four, or more microphones 170C to implement sound signal collection, noise reduction, sound source identification, directional recording functions, and the like. The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal, and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A is of various types, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a capacitive pressure sensor comprising at least two parallel plates with conductive material. The capacitance between the electrodes changes when a force is applied to the pressure sensor 180A. The electronic device 100 determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic apparatus 100 detects the touch operation intensity according to the pressure sensor 180A. The electronic device 100 may also calculate the location of the touch based on the detection signal of the pressure sensor 180A.
The touch sensor 180K, also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is for detecting a touch operation acting thereon or thereabout. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a different location than the display 194.
The ambient light sensor 180L is used to sense ambient light level. The electronic device 100 may adaptively adjust the brightness of the display 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust white balance when taking a photograph. Ambient light sensor 180L may also cooperate with proximity light sensor 180G to detect whether electronic device 100 is in a pocket to prevent false touches.
The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys. Or may be a touch key. The electronic device 100 may receive key inputs, generating key signal inputs related to user settings and function controls of the electronic device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerting as well as for touch vibration feedback.
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195, or removed from the SIM card interface 195 to enable contact and separation with the electronic device 100. The electronic device 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, micro SIM cards, and the like. The same SIM card interface 195 may be used to insert multiple cards simultaneously. The types of the plurality of cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The electronic device 100 interacts with the network through the SIM card to realize functions such as communication and data communication. In some embodiments, the electronic device 100 employs esims, i.e.: an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
The embodiment of the present application does not particularly limit the specific structure of the entity that executes the space gesture recognition method, as long as it can run code in which the space gesture recognition method provided by the embodiment of the application is recorded, so as to perform the method. For example, the entity executing the space gesture recognition method provided by the embodiment of the application may be a functional module in the electronic device that can call and execute a program, or a communication apparatus, such as a chip, applied in the electronic device.
Currently, some electronic devices may provide a space gesture recognition function; see fig. 2 for an application scenario of a space gesture. A user makes a space gesture, e.g. the five-finger-open gesture in fig. 2, within the field of view of a front-facing camera of the electronic device; the camera of the electronic device captures images containing the user's gesture, and after the electronic device determines from the captured images that the space gesture has been detected, it makes the response corresponding to that gesture, such as unlocking, pausing, and the like.
The space gesture may be a static gesture or a dynamic gesture. Taking the five-finger-open static gesture in fig. 2 as an example, the user holds the static gesture within the field of view of the front camera for a certain period of time, the electronic device can easily capture clear images containing the static gesture, and the static gesture can therefore be recognized easily and accurately.
If the space gesture is a dynamic gesture, for example a grip gesture in which the hand changes from an open five-finger state to a fist, the gesture is a dynamic change process of the hand, and the images captured by the camera of the electronic device are easily blurred, so the electronic device's recognition of the dynamic gesture is less accurate. As a result, the electronic device is more likely to respond incorrectly or not respond at all, and the user experience is poor.
The embodiment of the application provides a space gesture recognition method that can accurately recognize the slide-up gesture, the slide-down gesture and the grip gesture.
Referring to fig. 3, an application scenario diagram of a slide-up gesture according to an embodiment of the present application is shown.
The slide-up gesture in the embodiment of the application is as follows: the back of the hand (five fingers facing down) faces the electronic device screen, and after a hand icon appears on the screen, the wrist is swung upwards so that the palm (five fingers facing up) faces the electronic device screen. The slide-up gesture may be applied in applications that support paging up (e.g., displaying the next page) or sliding content up.
Referring to (a) in fig. 3, the back of the user's hand (five fingers down) faces the screen of the electronic device and is within the view angle range of the camera of the electronic device.
Referring to (b) of fig. 3, after the electronic device detects the back of the user's hand and the five fingers are downward, a hand icon is displayed on the screen; the user may swipe the wrist upward after the hand-shaped icon is displayed on the screen of the electronic device.
Referring to fig. 3 (c), after the user swings the wrist, the palm (five fingers are facing upwards) of the user faces the screen of the electronic device, and after the electronic device recognizes the up-slide gesture of the user, the up-slide gesture is reported to the application currently running in the foreground, and the content displayed by the application slides upwards or pages upwards.
Another slide-up gesture provided by the embodiment of the present application may also be: the palm (five fingers facing down) faces the electronic device screen, and after a hand icon appears on the screen, the wrist is swung upwards so that the back of the hand (five fingers facing up) faces the electronic device screen. The slide-up gesture may be applied in applications that support paging up or sliding content up.
Referring to fig. 4, an application scenario diagram of a slide-down gesture according to an embodiment of the present application is shown.
The slide-down gesture in the embodiment of the application is as follows: the palm (five fingers facing up) faces the electronic device screen, and after a hand icon appears on the screen, the wrist is swung down so that the back of the hand (five fingers facing down) faces the electronic device screen. The slide-down gesture may be applied in applications that support paging down (e.g., displaying the next page) or sliding content down.
Referring to fig. 4 (a), the palm of the user's hand (five fingers up) faces the screen of the electronic device and is within the view angle range of the camera of the electronic device.
Referring to (b) of fig. 4, after the electronic device detects the palm of the user and five fingers are up, a hand-shaped icon is displayed on the screen; the user may swipe the wrist down after the hand-icon is displayed on the screen of the electronic device.
Referring to fig. 4 (c), after the user swings his wrist, the back of his hand (with five fingers facing downwards) faces the screen of the electronic device, and after the electronic device recognizes the slide-down gesture of the user, the electronic device reports the slide-down gesture to the application currently running in the foreground, and the content displayed by the application slides downwards or pages downwards.
Referring to fig. 5, an application scenario diagram of a grip gesture according to an embodiment of the present application is shown.
The grip gesture in the embodiment of the application is as follows: the palm (five fingers facing up) faces the electronic device screen, and after a hand icon appears on the screen, the hand is clenched into a fist. The grip gesture is used to implement a touch-free screen capture function.
Referring to (a) in fig. 5, the palm of the user's hand (five fingers up) faces the screen of the electronic device and is within the view angle range of the camera of the electronic device.
Referring to (b) of fig. 5, after the electronic device detects the palm of the user (five fingers up), a hand icon is displayed on the screen; the user may make a fist after displaying a hand icon on the screen of the electronic device.
Referring to fig. 5 (c), after the user clenches the fist, the hand takes a fist shape; after the electronic device recognizes the user's grip gesture, it reports the grip gesture to an upper-layer application (the application that implements screen capture), and the upper-layer application may generate a screenshot of the content currently displayed on the screen and display the screenshot on the screen for a period of time. In practical applications, whether the generated screenshot is displayed on the screen as a reduced thumbnail for a period of time depends on the functions of the electronic device itself; the embodiment of the application does not require that the screenshot be displayed in this way.
In order to accurately recognize space-apart dynamic gestures in the above scenes, the process of recognizing a space-apart dynamic gesture provided by the embodiment of the application comprises three stages: a start condition detection stage, an initial gesture recognition stage, and a dynamic gesture recognition stage.
Start condition detection stage:
in the embodiment of the application, the slide-up, slide-down and screen capture functions can be realized with space-apart gestures only when the screen of the electronic device is on; therefore, in order to reduce power consumption, the screen-on state of the electronic device may be taken as the on condition for the initial gesture recognition function. For example, when the screen-on state is detected, the initial gesture recognition function is turned on; when the screen-off state is detected, the initial gesture recognition function is turned off.
There are a variety of ways to determine whether the screen is in a bright or off state. For example, the status of a screen of an electronic device may be determined by the changing condition of ambient light collected by an ambient light sensor provided on the electronic device (screen on or off will affect the ambient light intensity around).
Of course, if the slide-up, slide-down and screen capture functions are applied in a specific application, the specific application running in the foreground may also be set as the on condition for the initial gesture recognition function. For example, when the specific application is detected running in the foreground, the initial gesture recognition function is turned on; after the specific application is detected being switched to the background or closed, the initial gesture recognition function is turned off.
It can be seen from the above examples that the on condition of the initial gesture recognition function may differ according to the application scenario, and the on condition of the initial gesture recognition function is not limited in the present application. The following description takes the screen-on state as the on condition of the initial gesture recognition function as an example.
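To illustrate how such an on condition might gate the initial gesture recognition function, the following Python sketch is provided. It is only an illustration of the idea described above; names such as GestureRecognitionStarter, recognizer.enable() and the start_condition callable are hypothetical and are not defined by the embodiment.

```python
class GestureRecognitionStarter:
    """Turns the initial gesture recognition function on or off based on a start condition."""

    def __init__(self, recognizer, start_condition):
        self.recognizer = recognizer
        # start_condition is a callable returning True when the on condition holds,
        # e.g. "the screen is lit" or "a specific application runs in the foreground".
        self.start_condition = start_condition

    def poll(self):
        if self.start_condition():
            self.recognizer.enable()   # start initial gesture recognition
        else:
            self.recognizer.disable()  # stop it to reduce power consumption
```

The same starter can be reused with a different start_condition callable for the foreground-application variant mentioned above.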
Initial gesture recognition stage:
in the embodiment of the application, after the initial gesture recognition function is turned on, the initial gesture recognition stage is entered. The initial hand state in the slide-up gesture is: the back of the user's hand faces the screen of the electronic device; the initial hand state in the slide-down gesture is: the user's palm faces the screen of the electronic device; the initial hand state in the grip gesture is: the user's palm faces the screen of the electronic device. Therefore, the embodiment of the application may set the initial gestures as: the palm and the back of the hand. Gestures other than the initial gestures may be recorded as non-initial gestures; for example, gestures such as a fist or a scissors hand may be recorded as non-initial gestures. After an initial gesture (palm or back of hand) is recognized, the dynamic gesture recognition function is turned on.
Dynamic gesture recognition stage:
in the embodiment of the application, after the dynamic gesture recognition function is turned on, the dynamic gesture recognition stage is entered. Different recognition rules can be set for the slide-up gesture, the slide-down gesture and the grip gesture respectively. If the recognition rule of any one of the slide-up, slide-down and grip gestures is satisfied, the dynamic gesture recognition succeeds, and the recognized dynamic gesture may be reported to an upper-layer application, which responds based on the recognized dynamic gesture.
The following describes the linking procedure between the above three stages in detail. Referring to fig. 6, a flow chart of a method for identifying a space gesture according to an embodiment of the present application is shown.
S101, detecting a screen state.
In the embodiment of the application, the screen brightness state of the electronic equipment can be used as the condition for starting the gesture recognition function, so that the state of the screen needs to be detected.
S102, determining whether the screen is in a bright screen state according to the screen state detection result.
In the case where the screen is not lit, the process returns to step S101 to continue the execution of the screen state detection.
If the screen is on, the gesture recognition function is turned on; after gesture recognition is started, step S103 will be performed.
S103, checking whether the dynamic gesture recognition function is started.
Because both the initial gesture recognition stage and the dynamic gesture recognition stage need to detect gestures in the images acquired by the camera, the dynamic gesture recognition function can be set to be off by default; before the dynamic gesture recognition function is turned on, all images acquired by the camera undergo initial gesture recognition; after the initial gesture is recognized, the dynamic gesture recognition function is turned on; after the dynamic gesture recognition function is turned on, the images acquired by the camera no longer undergo initial gesture recognition but undergo dynamic gesture recognition of the third stage.
Based on the above understanding, if the dynamic gesture recognition function is not yet turned on, step S105 is executed to perform initial gesture recognition; if the dynamic gesture recognition function is turned on, step S109 is executed to perform dynamic gesture recognition.
In addition, it may be arranged that, in the initial gesture recognition stage, the camera acquires images at a certain frame rate (for example, the lowest supported frame rate), and initial gesture recognition is performed on the images acquired by the camera; and that, in the dynamic gesture recognition stage, the camera acquires images at a certain frame rate (which may be greater than the frame rate used in the initial gesture recognition stage), and dynamic gesture recognition is performed on the images acquired by the camera.
S104, performing initial gesture recognition on the image.
In the embodiment of the application, a rule for recognizing the initial gesture may be set; for example, when a preset initial gesture is detected in the current image, the initial gesture is considered recognized. It may also be provided that the initial gesture must be detected in two consecutively acquired images before the initial gesture is considered recognized. The rule for recognizing the initial gesture is not limited in the embodiment of the present application, and the detailed process of initial gesture recognition may refer to the description corresponding to the flow chart of initial gesture recognition shown in fig. 7.
S105, judging whether the step S104 recognizes the initial gesture.
In the embodiment of the present application, if the initial gesture is recognized, step S106 may be executed to turn on the dynamic gesture recognition function. If the initial gesture is not recognized, step S107 may be performed to determine whether the initial gesture recognition process has timed out.
S106, outputting an instruction for starting the dynamic gesture recognition function.
In the embodiment of the present application, recognition of the initial gesture serves as the condition for turning on the dynamic gesture recognition function. After the dynamic gesture recognition function is turned on, step S103 continues to be executed, and after it is determined that dynamic gesture recognition is turned on, the dynamic gesture recognition stage is entered.
As another embodiment of the present application, the current image that has been identified as the initial gesture may also be used for dynamic gesture recognition, i.e., after step S106, step S108 is performed, and the current image is continued to be used for dynamic gesture recognition.
S107, checking whether the initial gesture stage is overtime.
In the embodiment of the application, the initial gesture recognition function needs to perform recognition on images and occupies a relatively large amount of memory; in addition, initial gesture recognition might still be executed after the screen of the electronic device is turned off, which would result in excessive power consumption. Therefore, an execution duration after the initial gesture recognition function is turned on may be set; if the execution duration is exceeded, the flow returns to step S101 to perform screen state detection and determine whether the screen of the electronic device is still on, so as to avoid initial gesture recognition still being executed after the screen is turned off.
Of course, in the case where the execution period is not exceeded, the determination of step S103 is no, and the execution of the initial gesture recognition based on the newly acquired image will be continued.
As described in step S105 to step S106, if the initial gesture is recognized, the dynamic gesture recognition function is turned on, and step S103 is executed after the dynamic gesture recognition function is turned on; when step S103 is executed, the determination result will be yes, and step S108 is executed.
S108, carrying out dynamic gesture recognition on the image.
In the embodiment of the present application, a dynamic gesture recognition process is also provided, and the dynamic gesture recognition process may be described in detail with reference to the embodiment shown in fig. 9.
S109, judging whether the dynamic gesture is recognized in the step S108.
If the dynamic gesture is recognized, step S110 is executed to report the dynamic gesture to the upper layer application.
If the dynamic gesture is not recognized, step S111 is executed to determine whether the dynamic gesture recognition phase is timed out.
S110, reporting the identified dynamic gesture to an upper layer application.
In the embodiment of the present application, if the identified dynamic gesture is already reported to the upper layer application, the dynamic gesture identification function needs to be turned off (to be restored to the default state), so that the initial gesture identification is executed first when the gesture is identified next time.
S111, judging whether the dynamic gesture recognition stage is overtime.
In the embodiment of the present application, the execution time of the initial gesture recognition stage and the execution time of the dynamic gesture recognition stage may be set to be the same time, or may be set to be different time, or of course, in practical application, other rules may be used as a determining factor of whether to timeout, for example, the number of images, etc., which is not limited in the embodiment of the present application.
If the dynamic gesture recognition phase has not timed out, the process returns to S103 to perform dynamic gesture recognition.
If the dynamic gesture recognition stage has timed out, after executing step S112 to turn off the dynamic gesture recognition function, returning to step S103 to perform the initial gesture recognition again.
One reason why no dynamic gesture is recognized in the dynamic gesture recognition stage may be that the user is not using the space-apart gesture function, but a hand passing through the camera's view angle range caused the initial gesture to be misrecognized. In this case, after the initial gesture recognition stage is entered again following step S112, no initial gesture can be recognized, the initial gesture recognition times out, and the flow returns to screen state detection.
Another reason why no dynamic gesture is recognized in the dynamic gesture recognition stage may be that the space-apart gesture made by the user is not standard, and the user makes the space-apart gesture again after determining that the electronic device has not responded to it. In this case, after the electronic device enters the initial gesture recognition stage again following step S112, the initial gesture is recognized, and the space-apart gesture function is not affected.
S112, the dynamic gesture recognition function is closed.
In the embodiment of the application, after the recognized dynamic gesture is reported to an upper layer application or the dynamic gesture recognition stage is overtime, the dynamic gesture recognition function is required to be closed so as to restore the dynamic gesture recognition function to an initial closing state, and the initial gesture recognition is firstly carried out on the acquired image when the gesture recognition is carried out next time.
After the dynamic gesture recognition function is turned off, the flow needs to return to step S103 to re-enter the initial gesture recognition stage and start a new round of space-apart gesture recognition.
In the embodiment of the application, whether the dynamic gesture has been recognized and reported to the upper-layer application or the dynamic gesture recognition stage has timed out, the dynamic gesture recognition function is turned off and the flow returns to the initial gesture recognition stage.
In practical applications, there may be a scenario where the user closes the screen of the electronic device when the user uses the electronic device, and in this scenario, the electronic device may recognize the initial gesture in the initial gesture recognition stage, but may not recognize the dynamic gesture in the dynamic gesture recognition stage, so it may also be set to return to the screen state detection stage after closing the dynamic gesture function, that is, step S101 is performed after step S112.
Of course, according to the flow of the embodiment shown in fig. 6, after the dynamic gesture recognition function is turned off, the flow returns to the initial gesture recognition stage; if no initial gesture can be recognized there, the flow returns to the screen state detection stage, so excessive power consumption is not caused.
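To make the stage switching of fig. 6 easier to follow, a minimal control-loop sketch in Python is given below. The helper callables (screen_is_on, get_image, recognize_initial_gesture, recognize_dynamic_gesture, report_to_app) and the timeout values are assumptions for illustration only; the sketch follows the step numbers of fig. 6 but is not the embodiment's implementation.

```python
import time

INITIAL_STAGE_TIMEOUT = 5.0   # seconds, illustrative values only
DYNAMIC_STAGE_TIMEOUT = 3.0

def gesture_recognition_loop(screen_is_on, get_image,
                             recognize_initial_gesture, recognize_dynamic_gesture,
                             report_to_app):
    """Links the three stages of fig. 6; all callables are supplied by the caller."""
    while True:
        # S101/S102: start condition detection
        if not screen_is_on():
            time.sleep(0.1)
            continue
        dynamic_on = False                                   # dynamic stage off by default
        deadline = time.monotonic() + INITIAL_STAGE_TIMEOUT
        while True:
            image = get_image()
            if not dynamic_on:                               # S103 -> S104: initial stage
                if recognize_initial_gesture(image):         # S105: initial gesture found
                    dynamic_on = True                        # S106: open dynamic stage
                    deadline = time.monotonic() + DYNAMIC_STAGE_TIMEOUT
                elif time.monotonic() > deadline:            # S107: initial stage timed out
                    break                                    # back to screen state detection
            else:                                            # S103 -> S108: dynamic stage
                gesture = recognize_dynamic_gesture(image)
                if gesture is not None:                      # S109: dynamic gesture found
                    report_to_app(gesture)                   # S110: report to upper layer
                if gesture is not None or time.monotonic() > deadline:
                    dynamic_on = False                       # S112: close dynamic stage
                    deadline = time.monotonic() + INITIAL_STAGE_TIMEOUT
```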
The embodiment shown in fig. 6 briefly describes the linkage between the start condition detection stage, the initial gesture recognition stage and the dynamic gesture recognition stage; the process of initial gesture recognition and the process of dynamic gesture recognition are described in detail below.
As can be understood from the scene diagrams shown in fig. 3 to 5, the initial gestures (denoted as positive samples) of the gestures in the embodiments of the present application include a palm and a back of the hand, and other gestures except the initial gestures are non-initial gestures (denoted as negative samples), such as fist making, scissors, and the like.
Referring to fig. 7, a flowchart of an initial gesture recognition stage according to an embodiment of the present application is provided.
S201, acquiring an image acquired by a camera.
In the embodiment of the application, after the initial gesture recognition function is started, the image acquired by the camera is required to be acquired so as to perform initial gesture recognition.
Since initial gesture recognition needs to be performed frequently while the screen is on, in order to reduce power consumption, the camera may be set to acquire images at a lower resolution during the initial gesture recognition stage; of course, after the initial gesture is recognized and the dynamic gesture recognition function is turned on, images may be acquired at a higher resolution for subsequent dynamic gesture recognition.
In addition, in the initial gesture recognition stage, the camera can be set to collect images at a lower frame rate, and in the dynamic gesture recognition stage, the camera can be set to collect images at a higher frame rate.
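The sketch below illustrates one possible way to express this two-level camera configuration. The resolution and frame-rate values and the camera.set_resolution / camera.set_frame_rate calls are hypothetical; the embodiment only states that the initial stage may use a lower resolution and frame rate than the dynamic stage.

```python
# Hypothetical configuration values; not specified by the embodiment.
INITIAL_STAGE_CAMERA = {"resolution": (320, 240), "fps": 10}
DYNAMIC_STAGE_CAMERA = {"resolution": (640, 480), "fps": 30}

def configure_camera(camera, dynamic_stage_active: bool):
    cfg = DYNAMIC_STAGE_CAMERA if dynamic_stage_active else INITIAL_STAGE_CAMERA
    camera.set_resolution(*cfg["resolution"])   # assumed camera API
    camera.set_frame_rate(cfg["fps"])           # assumed camera API
```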
S202, preprocessing the image.
In the embodiment of the application, the initial gesture recognition stage may use an initial gesture detection model to determine whether the current image contains a positive sample. Typically, a detection model has requirements on its input image, so to facilitate accurate detection of the initial gesture, the image may be preprocessed into a format matching the initial gesture detection model. As an example, the preprocessing may include, but is not limited to, grayscale processing, filtering and denoising, and the like.
And S203, performing classification detection on the preprocessed image to obtain a classification result.
According to the embodiment of the application, the classification detection can be performed through the preset initial gesture detection model, and the classification detection is used for determining whether the gesture in the input image is a positive sample or a negative sample. Thus, the classification result includes a positive or negative sample.
Of course, in practical application, a specific palm or back of hand in the positive sample may also be output.
S204, pressing the classification result into a result queue.
In the embodiment of the application, a result queue can be set for storing the classification results obtained by each detection, so as to judge whether the initial gesture is recognized or not according to at least two classification results.
S205, determining whether the last two classification results are both positive samples, or two of the last three classification results are positive samples.
In the embodiment of the application, when the screen of the electronic device is on, the initial gesture recognition function is turned on, and a hand that merely passes through the field of view of the camera might be misrecognized as an initial gesture. In order to avoid such misjudgment, the condition for turning on dynamic gesture recognition may be set as: positive samples are detected in two consecutively acquired images, or positive samples are detected twice among three consecutively acquired images.
As another embodiment of the present application, it may also be provided that the same positive sample is detected in two consecutively acquired images (for example, both palms or both backs of the hand), or that the same positive sample is detected twice among three consecutively acquired images, as the condition for turning on dynamic gesture recognition.
If the last two classification results are both positive samples, or two of the last three classification results are positive samples, step S206 is executed to turn on the dynamic gesture recognition switch;
If the last two classification results are not both positive samples and at most one of the last three classification results is a positive sample, step S207 is executed to determine whether the stage has timed out.
S206, turning on the dynamic gesture recognition switch.
S207, determining whether the initial gesture recognition stage has timed out.
If the initial gesture recognition stage is not timed out, step S201 is continued to acquire the next image for initial gesture recognition.
If it has timed out, the start condition detection phase is returned (step S101) to perform screen state detection.
In the embodiment of the present application, the initial gesture recognition stage ends either by turning on the dynamic gesture recognition function to perform dynamic gesture recognition, or by returning to the start condition detection stage to perform screen state detection; otherwise, the flow keeps cycling inside the initial gesture recognition stage. Of course, the above description is only an example, and other outcomes can be set in practical applications.
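The result-queue logic of steps S204/S205 can be sketched as follows. The classify callable and the label strings are assumptions (any classifier returning a hand-category label would do), and preprocessing (S202) is omitted for brevity; the sketch only shows the "two consecutive positives, or two positives among the last three" condition described above.

```python
from collections import deque

POSITIVE = {"palm", "back_of_hand"}   # positive samples, i.e. initial gestures

class InitialGestureStage:
    """Sketch of the result queue of fig. 7, assuming classify(image) -> label."""

    def __init__(self, classify):
        self.classify = classify
        self.results = deque(maxlen=3)          # result queue keeps the last 3 outcomes

    def push(self, image) -> bool:
        """Returns True when the condition for opening dynamic gesture recognition holds."""
        label = self.classify(image)            # S203; preprocessing (S202) omitted here
        self.results.append(label in POSITIVE)  # S204: push classification result
        last = list(self.results)
        # S205: positives in both of the last two results,
        # or in two of the last three results
        return (len(last) >= 2 and all(last[-2:])) or (len(last) == 3 and sum(last) >= 2)
```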
Referring to fig. 8, in the initial gesture recognition phase, hand category detection is mainly performed, and the output results include positive samples (back of hand and palm) and negative samples (other gestures).
The dynamic gesture recognition stage mainly includes three modules: a target detection module, a key point detection module and a gesture judgment module. Of course, in practical applications, a hover judgment module may be added; it may be set as an independent module between the key point detection module and the gesture judgment module, or its flow may be executed inside the gesture judgment module.
The target detection module is used for detecting the hand category in the image, and outputs a hand detection frame when a hand category is output.
The output of the target detection module includes the following categories: palm, back of hand, fist, other categories (gestures other than palm, back of hand and fist) and background (no hand). It will be appreciated that the hand categories output in the initial gesture recognition stage are different from the hand categories output in the dynamic gesture recognition stage.
The key point detection module is used for detecting the key points of the hand detected by the target detection module and obtaining the hand orientation based on the key point information: up, down, or unknown.
The gesture judgment module is used for determining, based on the information detected by the target detection module and the key point detection module, whether the gesture is a preset dynamic gesture. The output results include: grip, slide down, slide up, other gesture, and timeout failure. "Other gesture" refers to cases where a hand is recognized in the multi-frame images but the motion does not belong to any of the grip, slide-up and slide-down gestures; see the description in the subsequent dynamic gesture recognition process.
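For illustration, the outputs of the three modules can be represented by data structures such as the following. The field names and types are assumptions introduced here for readability; they are not defined by the embodiment.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class DetectionResult:            # target detection module output
    hand_class: str               # "palm", "back_of_hand", "fist", "other" or "background"
    box: Optional[Tuple[float, float, float, float]]  # (xmin, ymin, xmax, ymax) when a hand is found

@dataclass
class KeypointResult:             # key point detection module output
    keypoints: List[Tuple[float, float]]               # hand key point coordinates
    orientation: str              # "up", "down" or "unknown"

@dataclass
class GestureDecision:            # gesture judgment module output
    gesture: str                  # "grip", "slide_up", "slide_down", "other" or "timeout"
```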
The dynamic gesture recognition process will be described in detail below, and referring to fig. 9, a flow chart of the dynamic gesture recognition process provided in an embodiment of the present application is shown.
S301, acquiring an image acquired by a camera.
As described above, in the dynamic gesture recognition stage, the frame rate of the image collected by the camera may be increased, and the resolution of the image collected by the camera may also be increased.
S302, detecting the hand type of the image to obtain a detection result.
In the embodiment of the present application, the hand categories in the dynamic gesture recognition stage are different from those in the initial gesture recognition stage; the dynamic gesture recognition stage needs to recognize the back of the hand, palm, fist or other gestures (a hand is present but does not belong to the back of hand, palm or fist), and the output result includes: palm, back of hand, fist, other, or background. Of course, when a hand is recognized, the hand detection frame also needs to be output.
S303, determining whether the image has a hand or not according to the detection result.
In the embodiment of the application, the output result of the last step may have several cases: palm, back of hand, fist, other or background.
If the detection result is palm, back of hand, fist and others, determining that the image has hands; and if the detection result is background, determining that the image is hands-free.
When the image is determined to be hands-free, key point detection is not performed any more, overtime judgment can be performed, and when the dynamic gesture recognition stage is overtime, the starting gesture recognition stage or the screen state detection stage can be returned; if the dynamic gesture recognition stage is not timed out, the process may return to step S301 to continue to acquire the image acquired by the camera for the next cycle of the dynamic gesture recognition stage.
In the case where it is determined that the image is handy, step S304 may be performed to continue the keypoint detection.
S304, performing key point detection to obtain hand key point information and directions.
In the embodiment of the application, if the detection result is palm, back of hand, fist or other, the key point detection is continued, and the key point information and the orientation are output. In specific implementation, the key point detection can be performed by clipping the hand image in the hand detection frame output by the target detection module.
In practical application, the target detection module, the key point detection module and the gesture judgment module can be three independent modules. The three independent modules can be applied not only to the scenes shown in fig. 3 to 5, but also to other application scenes; therefore, as an independent keypoint detection module, it is necessary to first determine whether there is a hand before performing keypoint detection; similarly, as an independent gesture determination module, a step of determining whether the image has a hand is also required before performing gesture determination.
Of course, in practical applications, if the determination result in step S303 is no (no hand in the image), step S306 may be executed to determine whether there is no hand in all of the n consecutive images, and specifically, refer to the flowchart shown in fig. 10.
Alternatively, the logic of determining whether the image contains a hand may be executed only once in the whole dynamic gesture recognition stage, and the step of determining again whether the image contains a hand in step S305 may be omitted; correspondingly, after no hand is determined in step S303, step S306 is performed to determine whether a timeout has occurred (whether no hand has been detected n times). After the key point information is detected in step S304, step S308 is performed to determine whether the hand is hovering; see the description in the embodiment shown in fig. 11.
The following description will take the schematic flow chart shown in fig. 9 as an example.
After step S304, the process executed by the gesture determination module will be entered, and if there is a hover determination module, the process executed by the hover determination module will be entered first, and then the process executed by the gesture determination module will be entered.
S305, it is determined whether the current image has a hand.
If there is no hand in the image, it needs to be judged whether the whole dynamic gesture recognition stage has timed out. The timeout judgment logic in the embodiment of the application may be set as: no hand in n consecutive images, that is, whether none of the images obtained since the stage was entered this time contains a hand.
In the case where there is no hand in the image, step S306 is performed to determine whether there is no hand in the detection result of the image n times in succession.
In the case of a hand in the image, a subsequent step in dynamic gesture recognition is performed.
S306, judging whether no hand exists in the continuous n times of images.
If any one of the n continuous images has a hand, the dynamic gesture recognition stage has not timed out yet, and the process returns to step S301 to acquire the next image acquired by the camera, and the process of the dynamic gesture recognition stage is continuously executed.
If no hand is present in n consecutive images, it is determined that the time-out has been performed, and step S307 is performed to output the time-out, and of course, after the output time-out, the gesture recognition stage may be returned to, or the screen state detection stage may be returned to.
S307, outputting overtime.
As described above, in step S305, if it is determined that there is a hand in the currently detected image, the subsequent steps of dynamic gesture recognition may be performed. Step S309 may be performed to determine whether the number of images containing a hand reaches the number M required by the gesture judgment module. In the embodiment of the present application, M is set to ensure that enough images containing a hand are available for gesture recognition; the logic rule may therefore be set as: from the first image input to the target detection module in the dynamic gesture recognition stage (or the reference image in the subsequent embodiments) up to the current image, the number of images containing a hand reaches M. Of course, other logic rules may be set, and correspondingly the required number of images containing a hand may be set to other values.
As another embodiment of the present application, a hover judgment module may be added (it may be disposed inside the gesture judgment module); that is, in the dynamic gesture recognition stage, if the user's hand keeps the initial gesture, the dynamic gesture recognition function is kept awake. Therefore, step S308 needs to be added before step S309 to determine whether the hand is hovering.
S308, if the image is a hand, it is determined whether the hand hovers or not.
In the case of hovering the hand, the gesture judgment module does not need to recognize the gesture, but returns to step S301 to continue to acquire the next image acquired by the camera.
In the case where the hand is not hovering, step S309 may be performed to determine whether the number of hands reaches the number of images M required by the gesture determination module.
S309, it is determined whether the number of images with hands reaches M.
If it is determined that the number of images with hands has not reached M, the process returns to step S301 to acquire the next image acquired by the camera to continue to perform the loop step in the dynamic gesture recognition stage, and gesture determination is performed after the number of hands reaches M.
In the case where it is determined that the number of images containing a hand has reached M, it is necessary to determine one by one whether the gesture is one of the three gestures listed above (grip, slide up, slide down). In practical applications, the three gestures may be judged in any order, which can be adjusted according to the actual situation.
The embodiment of the application judges whether the gesture is a grasp gesture or not, if not, then judges whether the gesture is a slide-up gesture or not, and if not, judges whether the gesture is a slide-down gesture or not.
S310, judging whether the gesture is a grasp gesture.
The judgment rule of the grip gesture may refer to the description in the subsequent embodiments.
If the gesture is a grasp gesture, step S311 may be performed to output a grasp gesture to report to the upper layer application.
If the gesture is not the grip gesture, step S312 is performed to continuously determine whether the gesture is the up gesture.
S311, outputting a grasp gesture.
S312, judging whether the gesture is a slide-up gesture.
The determination rule of the slide-up gesture may refer to the description in the subsequent embodiments.
If the gesture is a slide-up gesture, step S313 may be executed to output the slide-up gesture to report to the upper layer application.
If the gesture is not the up gesture, step S314 is executed to continuously determine whether the gesture is the down gesture.
S313, outputting a slide-up gesture.
S314, judging whether the gesture is a slide-down gesture.
The determination rule of the slide-down gesture may refer to the description in the subsequent embodiments.
If the gesture is a swipe gesture, step S315 may be performed to output the swipe gesture to the upper-layer application.
If the gesture is not a slide-down gesture, it is determined whether the stage has timed out. The timeout logic may be that the number of images containing a hand reaches Q; refer to step S316. In this rule, the count runs from the first image input to the target detection module in the dynamic gesture recognition stage (or the reference image in the subsequent embodiments) up to the current image.
S315, outputting a slide-down gesture.
S316, judging whether the number of images with hands reaches Q (Q is larger than M).
If Q has not been reached, step S301 may be executed to continue to acquire the next image acquired by the camera to perform the loop step of the dynamic gesture recognition stage.
If Q has been reached, it indicates that the gesture does not belong to any of the grip, slide-up and slide-down gestures and may be some other gesture, so step S317 may be performed to output "other gesture".
In the embodiment of the application, the dynamic gesture recognition stage ends either by outputting the corresponding gesture to be reported to the upper-layer application or by outputting a timeout. In either case, the dynamic gesture recognition function needs to be turned off and the flow returns to the initial gesture recognition stage or the start condition detection stage; otherwise, the flow keeps cycling inside the dynamic gesture recognition stage. Of course, the above description is only an example, and other outcomes can be set in practical applications.
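The judgment part of fig. 9 (steps S309 to S317) can be condensed into the following sketch. Here is_grip, is_slide_up and is_slide_down are assumed to implement the rules of figs. 13, 15 and 16, and hand_frames is assumed to be the list of per-image hand information collected since the reference image; the function is an illustration of the dispatch order, not the embodiment's implementation.

```python
def judge_dynamic_gesture(hand_frames, M, Q, is_grip, is_slide_up, is_slide_down):
    if len(hand_frames) < M:
        return None                          # S309: not enough images with a hand yet
    ref, cur = hand_frames[0], hand_frames[-1]   # reference image and current image
    for name, rule in (("grip", is_grip),            # S310/S311
                       ("slide_up", is_slide_up),    # S312/S313
                       ("slide_down", is_slide_down)):  # S314/S315
        if rule(ref, cur):
            return name
    if len(hand_frames) >= Q:                # S316: Q > M, treated as timeout/other gesture
        return "other"                       # S317
    return None                              # keep acquiring images
```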
As described above, the hover determination module may be set inside the gesture determination module in the dynamic gesture recognition stage, that is, after the dynamic gesture recognition function is turned on, if the hand is the starting gesture and remains stationary, the dynamic gesture recognition function may be continuously awakened.
In practical application, the hand of the user cannot be kept motionless in absolute sense, so that some rule conditions can be set, when the electronic equipment detects that the hand of the user in the image meets the rule conditions, the hand of the user is determined to be motionless, and the dynamic gesture recognition function can be continuously awakened.
Referring to fig. 12, a schematic flow chart of a hover judging module for judging whether a gesture hovers according to an embodiment of the present application is provided.
S401, receiving output results of the target detection module and the key point detection module.
In this step, the output result may include a hand category, a detection frame position, key point information, and a hand orientation.
S402, checking whether the received information is the information of the first image in the dynamic gesture recognition stage.
In the embodiment of the application, the hand information in the first image in the dynamic gesture recognition stage can be set as the reference information, and if the difference between the subsequently acquired image and the hand information in the first image is smaller, the hand is considered to be hovered; if the difference between the subsequently acquired image and the hand information in the first image is large, the hand is considered not to hover.
The first image in the current dynamic gesture recognition stage may be an image when the initial gesture is recognized, for example, taking the flowchart shown in fig. 7 as an example, after the initial gesture is recognized, the current image is input into the dynamic gesture recognition stage, so that the image recognizing the initial gesture is the first image in the current dynamic gesture recognition stage.
The first image in the current dynamic gesture recognition stage may also be the next image of the image when the initial gesture is recognized, taking the flowchart shown in fig. 6 as an example, after the initial gesture is recognized, returning to step S103 to obtain the next image, and the next image may be used as the first image in the current dynamic gesture recognition stage.
Since the first image of the current dynamic gesture recognition stage is used as the reference information for gesture judgment, the image for which the target detection module and the key point detection module first output complete hand information in the dynamic gesture recognition stage (for example, the hand category is palm or back of hand, detection frame information is present, key point information is present, and the hand orientation is up or down) may be taken as the first image, i.e. the reference image, of the dynamic gesture recognition stage.
Of course, since the frame rate of image acquisition in the dynamic gesture recognition stage is relatively high, the image corresponding to the 2nd set of complete hand information output by the target detection module and the key point detection module (for example, the hand category is palm or back of hand, detection frame information is present, key point information is present, and the hand orientation is up or down) may also be taken as the reference image of the dynamic gesture recognition stage. Which image's information is used as the reference information may be related to the frame rate of the dynamic gesture recognition stage.
If the current received information is the information of the first image in the dynamic gesture recognition stage, step S403 is executed to record the information of the first image.
If the current received information is not the information of the first image in the dynamic gesture recognition stage, the current received information and the information of the first image need to be compared.
S403, recording information of the first image.
In the embodiment of the present application, in the information of the first image, the hand category may be recorded as handclass1, the hand detection frame coordinate information may be recorded as (xmin1, ymin1, xmax1, ymax1), and the hand orientation may be recorded as handdirect1.
In the embodiment of the application, under the condition that the difference of the hand information in the current image and the hand information in the first image is large, the hands are considered to be non-hovering, and the dynamic gestures are required to be identified by a subsequent gesture judgment module. Of course, after the image information is received later, the hover judging process is not executed any more, and the dynamic gesture is recognized directly through the gesture judging module. Therefore, in practical application, the hover judging module has two branches: and judging whether the gesture in the information of the currently received image is in a hovering state or directly sending the gesture to a gesture judging module to identify the dynamic gesture.
In order to realize these two branches, the hover condition can be initialized to true, that is, the hand is assumed to be hovering by default; after it is determined that the gesture in the current image has changed relative to the gesture in the first image, the hover condition is set to false, and the hover judgment process is no longer executed once the hover condition is false. In a specific implementation, an additional step S404 of determining whether the hover condition is true may follow step S402.
S404, if the judgment result of the step S402 is not the first image, checking whether the hovering condition is true.
In the case where the hover condition is true, it is determined whether or not the hand information in the current image and the hand information in the first image are greatly changed (i.e., step S405 to step S408).
If the hover condition is false, the execution of step S410 proceeds to the gesture determination module to perform gesture determination.
S405, judging whether the hand category of the currently received image and the first image is changed.
In the embodiment of the application, the hand category of the current image may be recorded as handclass2, the hand detection frame coordinate information may be recorded as (xmin2, ymin2, xmax2, ymax2), and the hand orientation may be recorded as handdirect2.
In this step, it can be checked whether handclass1 and handclass2 are the same: if they are the same, the hand category has not changed; if they are different, the hand category has changed.
If the hand category changes, step S409 is performed to set the hover condition to false.
If the hand type has not changed, step S406 is executed to determine whether the hand orientation has changed.
S406, judging whether the hand orientations of the currently received image and the first image are changed.
In this step, it can be checked whether handdirect1 and handdirect2 are the same: if they are the same, the hand orientation has not changed; if they are different, the hand orientation has changed.
If the hand orientation changes, step S409 is performed to set the hover condition to false.
If the hand orientation is not changed, step S407 is performed to determine whether the detection frame area ratio is within the range.
S407, judging whether the area ratio of the detection frame of the currently received image and the first image is out of range.
The area of the detection frame of the first image is calculated as area1 = (xmax1 - xmin1) × (ymax1 - ymin1).
The area of the detection frame of the current image is calculated as area2 = (xmax2 - xmin2) × (ymax2 - ymin2).
Then r1 = area1 / area2 (or r2 = area2 / area1) is calculated, and it is judged whether r1 (or r2) is outside the preset numerical range. If it is outside the preset numerical range, the hand area has changed greatly; if it is within the preset numerical range, the hand area has not changed greatly.
If the detection frame area ratio is out of the range, step S409 is performed to set the hover condition to false.
If the detection frame area ratio is within the range, step S408 is performed to determine whether the cross-over ratio is smaller than the threshold.
S408, judging whether the intersection ratio (Intersection over Union, IOU) of the detection frames in the first image and the current image is smaller than a threshold A.
The intersection of the detection frames of the two images is determined from the maximum of the minimum x coordinates, the minimum of the maximum x coordinates, the maximum of the minimum y coordinates and the minimum of the maximum y coordinates, and the proportion of the overlapping area is then obtained.
Maximum of the minimum x coordinates: x_min = max(xmin1, xmin2)
Maximum of the minimum y coordinates: y_min = max(ymin1, ymin2)
Minimum of the maximum x coordinates: x_max = min(xmax1, xmax2)
Minimum of the maximum y coordinates: y_max = min(ymax1, ymax2)
Area of the overlapping region of the two detection frames: area = max(0, x_max - x_min) × max(0, y_max - y_min)
Proportion of the overlapping area to the total area of the two detection frames: IOU = area / (area1 + area2 - area)
In general, the larger the overlap ratio of the two detection frames, the smaller the change of the hand range and the more likely the hand is in a hovering state; the smaller the overlap ratio, the larger the change of the hand range and the more likely the hand is in a non-hovering state. Therefore, if the IOU is smaller than the threshold A, the overlapping area of the hand in the two images is considered small, the user may be performing a hand action or moving the hand out of view, and the hovering state needs to be ended.
Of course, if the IOU is greater than the threshold a, it is considered that the overlapping area of the hands in the two images is greater and still in the hovering state, and then step S301 may be returned, that is, the next image acquired by the camera is acquired, and the dynamic gesture recognition process is performed in a circulating manner.
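The hover checks of steps S405 to S408 can be sketched in Python as follows, using the box notation above. The threshold value and the area-ratio range are illustrative placeholders rather than values taken from the embodiment, and ref/cur are assumed to carry the hand category, orientation and detection frame of the reference and current images.

```python
THRESHOLD_A = 0.9                 # illustrative hover IOU threshold
AREA_RATIO_RANGE = (0.7, 1.4)     # illustrative preset range for the area ratio

def box_area(box):
    xmin, ymin, xmax, ymax = box
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)

def iou(box1, box2):
    xi_min = max(box1[0], box2[0]); yi_min = max(box1[1], box2[1])
    xi_max = min(box1[2], box2[2]); yi_max = min(box1[3], box2[3])
    inter = max(0.0, xi_max - xi_min) * max(0.0, yi_max - yi_min)
    return inter / (box_area(box1) + box_area(box2) - inter)

def still_hovering(ref, cur):
    """ref/cur carry hand_class, orientation and box of the reference and current image."""
    if ref.hand_class != cur.hand_class:            # S405: hand category changed?
        return False
    if ref.orientation != cur.orientation:          # S406: hand orientation changed?
        return False
    r = box_area(ref.box) / box_area(cur.box)       # S407: detection frame area ratio
    if not (AREA_RATIO_RANGE[0] <= r <= AREA_RATIO_RANGE[1]):
        return False
    return iou(ref.box, cur.box) >= THRESHOLD_A     # S408: end hovering if IOU < threshold A
```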
S409, if any one of the determination results of step S405 to step S408 is yes, the hover condition is set to be false.
If the determination result of any one of steps S405 to S408 is yes, it means that the hovering state no longer holds and gesture judgment can be performed; moreover, the image acquired in the next cycle no longer needs hover judgment but goes directly to gesture judgment, so the hover condition needs to be set to false.
After the hover condition is set to be false, the hand information of the first image in the dynamic gesture recognition stage and the hand information corresponding to the currently acquired image can be used as the gesture judgment information to be input into a gesture judgment module for gesture judgment.
That is, after step S409, step S410 may be performed to perform gesture determination, referring to mode one in fig. 12.
In addition, since gesture judgment requires a certain number of images (reaching M), the images available at the first gesture judgment may not be sufficient to recognize the dynamic gesture; therefore, after step S409, it may also be set to return to step S301, that is, to acquire the next image from the camera and perform the dynamic gesture recognition process cyclically, referring to mode two in fig. 12.
When the gesture judgment module recognizes the dynamic gesture, it can judge in turn whether the rules corresponding to the three gestures are met, and output the gesture if the rule corresponding to any gesture is met; if none of the rules corresponding to the gestures is met, it is determined whether a timeout has occurred.
Referring to fig. 13, a flow chart corresponding to a process of determining whether a grasp gesture is satisfied according to an embodiment of the present application is shown.
S501, receiving information of a current image.
The information of the current image includes the hand category, key points, orientation, detection frame coordinates, and the like. Of course, in practical applications, only the part of the output results required by the judgment rule of each gesture may be selected and input into the corresponding gesture judgment model; correspondingly, the information of the current image received by different gesture judgment processes is also only part of the output results.
S502, judging whether the hand type of the first image is a palm and the hand type of the current image is a fist.
In the embodiment of the present application, the initial state of the grasp gesture may be that the palm faces the screen, so the hand type of the first image (the image that recognizes the initial gesture) is the palm, and the final state of the hand in the grasp gesture is the fist, so the hand type of the current image is the fist.
It should be noted that "the palm faces the screen" does not mean that the palm must be parallel to the screen; a certain degree of oblique facing is allowed.
S503, in the case where the determination result of step S502 is yes, determining whether the IOU of the detection frames of the first image and the current image is greater than the threshold B.
In the embodiment of the application, referring to fig. 14, the first image and the second image are two images acquired by a camera and are used for performing gesture judgment, the starting state of the hand in the grasp gesture is the palm, and the final state is the fist. Generally, the hand position is substantially unchanged, so the hand detection frame of the current image is theoretically within the range of the hand detection frame of the first image or has a certain overlapping range.
In practical applications, the threshold A used for judging hovering (the position and shape of the hand in the two images remain substantially unchanged) may be set relatively large (e.g., 90%), and the threshold B used for gesture judgment may be set relatively small (e.g., 30%). That is, an image that enters the gesture judgment module because its IOU is smaller than the threshold A (the hand has changed) will not necessarily fail gesture recognition at the comparison against the threshold B.
S504, if the determination result in step S503 is yes, it is determined whether or not the hand orientation of the first image is up.
In the embodiment of the present application, the finger in the hand starting state in the grip gesture is oriented upward (the upper direction is defined as the direction of the top of the electronic device relative to the bottom, and the lower direction is defined as the direction of the bottom of the electronic device relative to the top), and of course, the upper and lower directions are not straight upward and downward in absolute terms, but are determined according to the judgment rule.
S505, in the case where the determination result of step S504 is yes, determining whether the ratio of the distance from the fingertip to the wrist to the distance from the finger root to the wrist in the current image is within a preset interval.
In the embodiment of the present application, referring to fig. 14, after the fist is made, the distance from the fingertip to the wrist relative to the distance from the finger root to the wrist should fall within a certain range of values, so this parameter may be set as one of the rules for judging the gesture.
In practical application, an average value of coordinates of fingertips of 5 fingers can be selected as the coordinates of the fingertips to calculate the distance from the fingertips to the wrist; the average value of the coordinates of the fingertips of 4 fingers other than the thumb may also be selected as the fingertip coordinates to calculate the distance from the fingertips to the wrist; of course, the distance between the finger tip and the wrist may be calculated by selecting the coordinates of the highest or lowest finger tip as the finger tip coordinates. In practical application, different interval ranges can be set according to different determination modes of fingertip coordinates.
Similarly, the coordinate of the finger root may be determined by referring to the coordinate of the fingertip.
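The alternative ways of obtaining a single fingertip coordinate described above can be illustrated with the following sketch. The keypoints mapping and the mode names are assumptions introduced here; the embodiment only states that the average of five fingertips, the average of four non-thumb fingertips, or the highest/lowest fingertip may be used.

```python
def fingertip_coordinate(keypoints, mode="mean_all"):
    """keypoints is assumed to map finger names to (x, y) fingertip coordinates."""
    tips = [keypoints[f] for f in ("thumb", "index", "middle", "ring", "pinky")]
    if mode == "mean_all":                 # average of all 5 fingertips
        pts = tips
    elif mode == "mean_without_thumb":     # average of the 4 fingertips other than the thumb
        pts = tips[1:]
    else:                                  # single fingertip with the smallest y (highest in the image)
        pts = [min(tips, key=lambda p: p[1])]
    x = sum(p[0] for p in pts) / len(pts)
    y = sum(p[1] for p in pts) / len(pts)
    return (x, y)
```

The preset interval used in step S505 would then be chosen to match whichever mode is adopted, as noted above.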
If the determination in step S505 is yes, a grip gesture is output.
Of course, in the steps from step S502 to step S505, if the determination result of any step is no, it is determined that the grasp gesture determination fails.
It should be noted that the present application is not limited to the sequence of the above-mentioned several judging processes, and in practical application, the sequence of the above-mentioned 4 judging processes may be adjusted according to the actual situation.
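Putting the four checks of fig. 13 together, a hedged sketch of the grip-gesture rule could look as follows. THRESHOLD_B and the fist ratio interval are illustrative values, the fingertip/finger_root/wrist fields are assumed to come from the key point detection module, and iou() is the helper from the earlier hover sketch.

```python
import math

THRESHOLD_B = 0.3                  # illustrative gesture-judgment IOU threshold
FIST_RATIO_RANGE = (0.8, 1.3)      # illustrative interval for fingertip-wrist / finger-root-wrist

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def is_grip(ref, cur):
    """ref/cur carry hand_class, orientation, box and key-point-derived coordinates."""
    if not (ref.hand_class == "palm" and cur.hand_class == "fist"):       # S502
        return False
    if iou(ref.box, cur.box) <= THRESHOLD_B:                              # S503
        return False
    if ref.orientation != "up":                                           # S504
        return False
    ratio = dist(cur.fingertip, cur.wrist) / dist(cur.finger_root, cur.wrist)  # S505
    return FIST_RATIO_RANGE[0] <= ratio <= FIST_RATIO_RANGE[1]
```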
Referring to fig. 15, a flow chart corresponding to a process for determining whether a slide gesture is satisfied according to an embodiment of the present application is shown.
S601, receiving information of a current image.
In the embodiment of the application, the information of the current image comprises part or all of hand category, key point, orientation, detection frame coordinates and the like.
S602, judging whether the hand category of the first image is the back of the hand and faces downwards, and the hand category of the current image is the palm and faces upwards.
In practical applications, in order to accommodate the slide-up habits of different users, the palm may face the screen and the wrist is swung upward, or the back of the hand may face the screen and the wrist is swung upward. Therefore, a judgment rule for slide-up gesture 1 (corresponding to the slide-up gesture shown in fig. 3) may be set, and a judgment rule for slide-up gesture 2 may also be set. Slide-up gesture 1 and slide-up gesture 2 are treated as different gestures, and when the judgment rule of slide-up gesture 1 is not met, it is judged whether the judgment rule of slide-up gesture 2 is met.
Step S602 is the judgment rule of slide-up gesture 1; the judgment rule of slide-up gesture 2 is to determine whether the hand category of the first image is the palm facing down and the hand category of the current image is the back of the hand facing up.
S603, in the case where the determination result of step S602 is yes, it is determined whether the line direction from the detection frame center point of the first image to the detection frame center point of the current image is upward.
After waving the wrist in the up-slide gesture, the gesture will be transferred from the lower part to the upper part, so the connecting line direction from the center point of the detection frame of the first image to the center point of the detection frame of the current image should be upward.
S604, in the case where the determination result of step S603 is yes, determining whether the ratio of the distance from the fingertip in the first image to the fingertip in the current image to the hand width in the current image is within a specific interval.
After the wrist is swung in the slide-up gesture, the position of the wrist typically does not change much; the hand shifts from the lower part to the upper part roughly centered on the wrist (not in an absolute sense), and the moving distance of the fingertip becomes large, but not particularly large (e.g., not so large that the hand slides out of the view angle range). Therefore, a numerical range needs to be determined to judge whether the motion is a slide-up gesture. The numerical range can be obtained from tests in actual scenes.
If the determination in step S604 is yes, a slide-up gesture is output.
If the determination result of any one of steps S602 to S604 is no, the slide-up gesture determination fails.
It should be noted that the present application is not limited to the sequence of the above-mentioned several judging processes, and in practical application, the sequence of the above-mentioned 3 judging processes may be adjusted according to the actual situation.
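The three checks of fig. 15 (here for slide-up gesture 1) can be sketched as follows. The movement-ratio interval is illustrative, the box_center/fingertip/hand_width fields are assumed key-point-derived values, dist() is the helper from the grip sketch, and the "image y grows downwards" convention is an assumption stated explicitly in the comments.

```python
SLIDE_RATIO_RANGE = (0.5, 3.0)     # illustrative interval for fingertip displacement / hand width

def is_slide_up(ref, cur):
    # S602: back of hand facing down in the reference image, palm facing up in the current image
    if not (ref.hand_class == "back_of_hand" and ref.orientation == "down"
            and cur.hand_class == "palm" and cur.orientation == "up"):
        return False
    # S603: the line from the reference box centre to the current box centre points upward
    # (assuming image y coordinates grow downwards, "up" means a smaller y value)
    if not (cur.box_center[1] < ref.box_center[1]):
        return False
    # S604: fingertip displacement normalised by the current hand width lies in an interval
    ratio = dist(ref.fingertip, cur.fingertip) / cur.hand_width
    return SLIDE_RATIO_RANGE[0] <= ratio <= SLIDE_RATIO_RANGE[1]
```

A slide-down check would mirror this sketch with the start and end gestures swapped and the centre-point direction reversed, as described in the next embodiment.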
Referring to fig. 16, a flow chart corresponding to a process of determining whether a slide gesture is satisfied according to an embodiment of the present application is shown.
S701, information of a current image is received.
In the embodiment of the application, the information of the current image comprises all or part of hand category, key point, orientation, detection frame coordinates and the like.
S702, judging whether the hand category of the first image is palm and faces upwards, and the hand category of the current image is back and faces downwards.
In the embodiment of the application, whether the hand states in the first image and the current image are consistent with the initial state and the end state of the slide-down gesture may be used as a judgment rule.
S703, in the case where the determination result of step S702 is yes, determining whether the line direction from the center point of the detection frame of the first image to the center point of the detection frame of the current image is downward.
After the wrist is swung in the slide-down gesture, the gesture is transferred from the upper part to the lower part, so that the connecting line direction from the center point of the detection frame of the first image to the center point of the detection frame of the current image should be downward, which is equivalent to whether the determined hand movement direction is consistent with the hand movement direction (downward) in the slide-down gesture.
S704, in the case where the determination result of step S703 is yes, determining whether the ratio of the distance from the fingertip in the first image to the fingertip in the current image to the hand width in the current image is within a specific interval.
Referring to fig. 17, the first image and the second image are two images acquired by the camera and are used for gesture judgment. After the wrist is swung in the slide-down gesture, the position of the wrist typically does not change much; the hand shifts from the upper part to the lower part roughly centered on the wrist (not in an absolute sense), and the moving distance of the fingertip becomes large, but not particularly large (e.g., not so large that the hand is about to slide out of the view angle range). Therefore, a numerical range needs to be determined to judge whether the motion is a slide-down gesture. The numerical range can be obtained from tests in actual scenes, and may be different for different gestures.
In addition, converting a distance determined from coordinates in the images into an actual distance is complex; since a person's hand width basically does not change much, the hand width can be used as a reference. That is, if the coordinate distance between the fingertips in the two images divided by the palm width falls within a certain numerical range, the movement distance of the fingertip is considered to satisfy the movement distance condition of the gesture.
Of course, it should be noted that, in the embodiment of the present application, the position coordinates in the two images to be compared are all calculated with the same origin of coordinates, for example, the point in the upper left corner in the acquired image is taken as the origin coordinate, and the points in other positions may also be taken as the origin coordinates.
The coordinates of the fingertip may be determined in the manner described in the above embodiment, which is not limited by the embodiment of the present application.
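As an illustrative sketch, the ratio check of step S704 may be expressed as follows; the interval bounds used here are placeholders only, since the application obtains the actual interval from tests in real scenarios:

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, same origin for both images


def normalized_fingertip_displacement(tip_first: Point, tip_current: Point,
                                      hand_width_current: float) -> float:
    """Fingertip distance between the two images, divided by the hand width
    measured in the current image (used as a scale reference)."""
    dx = tip_current[0] - tip_first[0]
    dy = tip_current[1] - tip_first[1]
    return math.hypot(dx, dy) / hand_width_current


def satisfies_movement_distance(ratio: float, low: float = 0.8, high: float = 3.0) -> bool:
    """Step S704: the ratio must fall inside a specific interval; 0.8 and 3.0
    are placeholder bounds, not values taken from the application."""
    return low <= ratio <= high
```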
If the determination at step S704 is yes, a slide-down gesture is output.
If the determination result of any one of steps S702 to S704 is no, the slide-down gesture determination fails.
It should be noted that the present application does not limit the order of the above judging processes; in practical applications, the order of the above three judging processes may be adjusted according to the actual situation.
As another embodiment of the present application, a method for identifying a space gesture provided in the embodiment of the present application includes:
detecting the space-free gesture operation of the user;
in response to detecting a space gesture operation of a user, acquiring a first image and a second image, wherein the acquisition time of the second image is later than that of the first image;
when first characteristic information of the first image and second characteristic information of the second image meet a first preset condition, determining that the space gesture operation of the user is a first gesture, wherein the first characteristic information comprises a first hand gesture and a first hand position, the second characteristic information comprises a second hand gesture and a second hand position, and the first preset condition comprises: the first hand gesture is consistent with the initial gesture of the first gesture, the second hand gesture is consistent with the end gesture of the first gesture, the direction from the first hand position to the second hand position is consistent with the movement direction of the first gesture, the initial gesture and the end gesture of the first gesture are different, and the first gesture is a slide-up gesture or a slide-down gesture.
The detected gesture operation may be a starting gesture of a gesture in the gesture library, for example, a gesture in which the palm faces the screen and the five fingers face upward. The first image may be the reference image in the above embodiment, and the second image may be an image acquired by the camera after the reference image.
When the first image and the second image satisfy the rule corresponding to a certain gesture, the space gesture operation of the user (including the hand motion in the whole process from the start of the space gesture operation to the hand state in the second image) may be considered to be the gesture corresponding to that rule.
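As an illustrative sketch of the rule matching described above, the first preset condition may be checked roughly as follows; the data structures and field names are assumptions for illustration and do not represent the actual implementation:

```python
from dataclasses import dataclass


@dataclass
class HandFeature:
    """Feature information of one image (illustrative fields only)."""
    category: str            # e.g. "palm" or "back"
    finger_orientation: str  # e.g. "up" or "down"
    center_y: float          # y coordinate of the detection-frame center


@dataclass
class GestureRule:
    """Rule of one dynamic gesture in the gesture library."""
    name: str
    start: tuple     # (category, finger_orientation) of the initial gesture
    end: tuple       # (category, finger_orientation) of the end gesture
    direction: str   # "up" or "down"


def matches_first_preset_condition(first: HandFeature, second: HandFeature,
                                   rule: GestureRule) -> bool:
    """The first hand gesture must match the initial gesture, the second hand
    gesture must match the end gesture, and the direction from the first hand
    position to the second hand position must match the movement direction."""
    if (first.category, first.finger_orientation) != rule.start:
        return False
    if (second.category, second.finger_orientation) != rule.end:
        return False
    moved_down = second.center_y > first.center_y  # image y axis points down
    return rule.direction == ("down" if moved_down else "up")
```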
As another embodiment of the present application, the starting gesture of the first gesture is related to the hand category (refer to the hand category in the dynamic gesture shown in fig. 8) and the finger orientation; the ending gesture of the first gesture is related to a hand category and a finger orientation; the first hand position comprises the position of the hand detection frame in the first image on the first image; the second hand position includes a position of the hand detection frame in the second image on the second image.
As another embodiment of the present application, the first gesture is a swipe gesture, the initial gesture of the first gesture is a palm and the fingers are facing up, and the end gesture of the first gesture is a back of the hand and the fingers are facing down;
The direction from the first hand position to the second hand position is the direction from the center point of the hand detection frame in the first image to the center point of the hand detection frame in the second image, and the movement direction of the first gesture is downward.
As another embodiment of the present application, the first gesture is a slide-up gesture, the initial gesture of the first gesture is a palm and the fingers are facing downward, and the end gesture of the first gesture is a back of the hand and the fingers are facing upward;
the direction from the first hand position to the second hand position is the direction from the center point of the hand detection frame in the first image to the center point of the hand detection frame in the second image, and the movement direction of the first gesture is upward.
As another embodiment of the present application, the first gesture is a swipe gesture, the initial gesture of the first gesture is a back of hand and fingers are facing downward, and the end gesture of the first gesture is a palm and fingers are facing upward;
the direction from the first hand position to the second hand position is the direction from the center point of the hand detection frame in the first image to the center point of the hand detection frame in the second image, and the movement direction of the first gesture is upward.
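By way of illustration only, the three slide variants described above could be collected into a small rule table such as the following; the field names are assumptions and the table is not part of the original disclosure:

```python
# Illustrative rule table for the slide gestures described above.
SLIDE_RULES = [
    {"name": "slide_down", "start": ("palm", "up"),   "end": ("back", "down"), "direction": "down"},
    {"name": "slide_up",   "start": ("palm", "down"), "end": ("back", "up"),   "direction": "up"},
    {"name": "slide_up",   "start": ("back", "down"), "end": ("palm", "up"),   "direction": "up"},
]
```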
As another embodiment of the present application, the first preset condition further includes:
the distance from the first hand position to the second hand position satisfies a movement distance condition of the first gesture.
As another embodiment of the present application, the first hand position includes: the position of the fingertip in the first image on the first image, the second hand position comprising: the position of the fingertip in the second image on the second image.
As another embodiment of the present application, the distance from the first hand position to the second hand position satisfying the movement distance condition of the first gesture includes:
the ratio of the distance from the position of the fingertip in the first image to the position of the fingertip in the second image to the hand width in the second image is within the first interval.
As another embodiment of the present application, after acquiring the first image and the second image, the method further includes:
when the first characteristic information of the first image and the second characteristic information of the second image meet a second preset condition, determining that the space gesture operation of the user is a second gesture, wherein the second gesture is a gesture in a gesture library other than the first gesture, and the second preset condition comprises: the first hand gesture is consistent with the initial gesture of the second gesture, the second hand gesture is consistent with the end gesture of the second gesture, and the overlapping area between the first hand position and the second hand position meets the overlapping area condition between the initial hand position and the end hand position in the second gesture, wherein the second gesture is a grasp gesture.
As another embodiment of the present application, the initial gesture of the second gesture is a palm with the fingers facing upward, and the ending gesture of the second gesture is a fist;
the overlapping area between the first hand position and the second hand position meeting the overlapping area condition between the starting hand position and the ending hand position in the second gesture includes:
the intersection ratio (intersection over union) of the hand detection frame in the first image and the hand detection frame in the second image is greater than a first threshold (e.g., threshold B in the above embodiment).
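As an illustrative sketch, the intersection ratio (intersection over union) of the two hand detection frames may be computed as follows; the threshold value shown is a placeholder, not threshold B itself:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def intersection_over_union(a: Box, b: Box) -> float:
    """Intersection ratio of two hand detection frames."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grasp_overlap_condition(first_box: Box, second_box: Box,
                            first_threshold: float = 0.6) -> bool:
    """Overlap condition of the grasp gesture: the intersection ratio of the two
    frames exceeds the first threshold (0.6 is a placeholder value)."""
    return intersection_over_union(first_box, second_box) > first_threshold
```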
As another embodiment of the present application, the second preset condition further includes: the positional relationship between the fingertip and the finger root in the second image satisfies the positional relationship condition between the fingertip and the finger root in the ending gesture of the second gesture.
As another embodiment of the present application, the positional relationship between the fingertip and the finger root in the second image satisfying the positional relationship condition between the fingertip and the finger root in the ending gesture of the second gesture includes: the ratio of the distance from the fingertip to the wrist to the distance from the finger root to the wrist in the second image is within the second interval.
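As an illustrative sketch, the fingertip and finger-root check for the fist (the ending gesture of the grasp gesture) may be written as follows; the interval bounds are placeholders, since the second interval is determined by testing:

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def _distance(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])


def fist_keypoint_condition(fingertip: Point, finger_root: Point, wrist: Point,
                            low: float = 0.5, high: float = 1.2) -> bool:
    """When the hand is clenched into a fist, the fingertip folds back toward the
    palm, so the fingertip-to-wrist distance becomes comparable to (or smaller
    than) the finger-root-to-wrist distance; [low, high] is a placeholder interval."""
    ratio = _distance(fingertip, wrist) / _distance(finger_root, wrist)
    return low <= ratio <= high
```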
As another embodiment of the present application, the first image and the second image are two images of a plurality of continuous images acquired by the camera, and other images exist between the first image and the second image;
Before determining that the first characteristic information of the first image and the second characteristic information of the second image meet the first preset condition, the method further comprises:
determining that the number of images with a hand in an image set is greater than or equal to a first number (e.g., the number M in the above embodiment), the image set ranging from the first image acquired by the camera to the second image acquired by the camera.
As another embodiment of the present application, the method further comprises:
determining whether the number of images with hands in the image set is greater than or equal to a second number (for example, the number Q in the above embodiment) in the case where the first feature information and the second feature information do not satisfy the condition corresponding to any gesture in the gesture library, the image set ranging from the first image acquired by the camera to the second image acquired by the camera;
if the number of the images with hands in the image set is greater than or equal to the second number, outputting other gestures;
if the number of images with hands in the image set is smaller than the second number, acquiring a third image acquired by the camera, and judging whether the first characteristic information and the third characteristic information of the third image meet the condition corresponding to any gesture in the gesture library, wherein the third image is an image acquired by the camera after the second image.
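As an illustrative sketch, the fallback logic described above (applied when no gesture rule is satisfied) may be summarized as follows; the names and return values are illustrative only:

```python
def fallback_decision(hand_flags, second_number: int) -> str:
    """hand_flags holds one boolean per image in the image set (from the first
    image to the current image), True when a hand was detected in that image."""
    hand_count = sum(1 for flag in hand_flags if flag)
    if hand_count >= second_number:
        return "output_other_gesture"   # enough hand images, but no rule matched
    return "acquire_third_image"        # keep the first image and judge the next frame
```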
As another embodiment of the present application, before determining that the first feature information of the first image and the second feature information of the second image satisfy the first preset condition, the method further includes: determining that the hand has not hovered.
As another embodiment of the present application, the method further comprises:
acquiring fourth characteristic information of a fourth image, wherein the fourth characteristic information comprises: the hand category, the finger orientation, and the position of the hand detection frame on the fourth image, wherein the first image is an image in which a first initial gesture is recognized, or an image having the first initial gesture that is acquired after the first initial gesture is recognized; the first initial gesture is the initial gesture of any gesture in the gesture library, and the acquisition time of the fourth image is later than that of the first image and earlier than that of the second image;
when the first feature information and the fourth feature information meet any one of the hovering conditions, determining that the hand is not hovering, wherein the hovering conditions comprise: the hand category in the first feature information and the hand category in the fourth feature information are inconsistent, the finger orientation in the first feature information and the finger orientation in the fourth feature information are inconsistent, the area ratio of the hand detection frame in the first feature information to the hand detection frame in the second feature information is outside the third interval, and the intersection ratio of the hand detection frame in the first feature information and the hand detection frame in the second feature information is smaller than a second threshold (for example, the threshold A in the above embodiment), where the second threshold is larger than the first threshold.
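As an illustrative sketch, the hover check between the first image and the fourth image may be expressed as follows; note that mapping the area-ratio and intersection-ratio checks onto the fourth image is an assumption here, and the interval and threshold values are placeholders:

```python
Box = tuple  # (x_min, y_min, x_max, y_max)


def _iou(a: Box, b: Box) -> float:
    """Intersection ratio of two detection frames."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def hand_not_hovering(first: dict, fourth: dict,
                      area_ratio_interval=(0.8, 1.25),
                      second_threshold: float = 0.7) -> bool:
    """Return True when any hovering condition is met, i.e. the hand is judged
    not to be hovering; `first` and `fourth` use illustrative keys 'category',
    'finger_orientation' and 'box'."""
    if first["category"] != fourth["category"]:
        return True
    if first["finger_orientation"] != fourth["finger_orientation"]:
        return True

    def area(box: Box) -> float:
        return (box[2] - box[0]) * (box[3] - box[1])

    ratio = area(first["box"]) / area(fourth["box"])
    if not (area_ratio_interval[0] <= ratio <= area_ratio_interval[1]):
        return True
    return _iou(first["box"], fourth["box"]) < second_threshold
```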
In the above embodiment, the slide-up gesture (or the slide-down gesture) is determined first, and then the grasp gesture is determined; in practical applications, the grasp gesture may be determined first, and then the slide-up gesture (or the slide-down gesture). That is, the present application does not limit the order in which the gestures in the gesture library are judged.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
The embodiments of the present application also provide a computer readable storage medium storing a computer program, which when executed by a processor, implements the steps of the above-described method embodiments.
Embodiments of the present application also provide a computer program product which, when run on a first device, enables the first device to carry out the steps of the above method embodiments.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium, and when executed by a processor, the computer program implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to the first device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, in accordance with legislation and patent practice, the computer readable medium may not be an electrical carrier signal or a telecommunication signal.
The embodiment of the present application also provides a chip system, which includes a processor coupled to a memory; the processor executes a computer program stored in the memory to implement the steps of any of the method embodiments of the present application. The chip system may be a single chip or a chip module composed of a plurality of chips.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (23)

1. A method for identifying a space-free gesture, comprising:
detecting the space-free gesture operation of the user;
in response to detecting a space gesture operation of a user, acquiring a first image and a second image, wherein the acquisition time of the second image is later than that of the first image;
determining that the space gesture operation of the user is a first gesture when the first feature information of the first image and the second feature information of the second image meet a first preset condition, wherein the first feature information comprises a first hand gesture and a first hand position, the second feature information comprises a second hand gesture and a second hand position, and the first preset condition comprises: the first hand gesture is consistent with the initial gesture of the first gesture, the second hand gesture is consistent with the end gesture of the first gesture, the direction from the first hand position to the second hand position is consistent with the movement direction of the first gesture, the initial gesture and the end gesture of the first gesture are different, and the first gesture is a slide-up gesture or a slide-down gesture.
2. The method of claim 1, wherein the starting gesture of the first gesture is related to a hand category and a finger orientation; the ending gesture of the first gesture is related to a hand category and a finger orientation; the first hand position comprises the position of a hand detection frame in the first image on the first image; the second hand position includes a position of a hand detection frame in the second image on the second image.
3. The method of claim 2, wherein the first gesture is a swipe down gesture, the first gesture has a starting gesture of palm and fingers facing up, and the first gesture has an ending gesture of back of hand and fingers facing down;
the direction from the first hand position to the second hand position is the direction from the center point of the hand detection frame in the first image to the center point of the hand detection frame in the second image, and the movement direction of the first gesture is downward.
4. The method of claim 2, wherein the first gesture is a swipe-up gesture, the first gesture has a starting gesture of palm and fingers facing down, and the first gesture has an ending gesture of back of hand and fingers facing up;
the direction from the first hand position to the second hand position is the direction from the center point of the hand detection frame in the first image to the center point of the hand detection frame in the second image, and the movement direction of the first gesture is upward.
5. The method of claim 2, wherein the first gesture is a swipe up gesture, the first gesture has a starting gesture of back of hand and fingers facing down, and the first gesture has an ending gesture of palm and fingers facing up;
The direction from the first hand position to the second hand position is the direction from the center point of the hand detection frame in the first image to the center point of the hand detection frame in the second image, and the movement direction of the first gesture is upward.
6. The method of any one of claims 1 to 5, wherein the first preset condition further comprises:
the distance from the first hand position to the second hand position satisfies a movement distance condition of the first gesture.
7. The method of claim 6, wherein the first hand position comprises: a position of a fingertip in the first image on the first image, the second hand position comprising: the position of the fingertip in the second image on the second image.
8. The method of claim 7, wherein the distance from the first hand position to the second hand position meeting the distance-of-motion condition of the first gesture comprises:
the ratio of the distance from the position of the fingertip in the first image to the position of the fingertip in the second image to the hand width in the second image is within a first interval.
9. The method of any one of claims 1 to 8, wherein after acquiring the first image and the second image, the method further comprises:
When the first characteristic information of the first image and the second characteristic information of the second image meet a second preset condition, determining that the space gesture operation of the user is a second gesture, wherein the second gesture is a gesture in a gesture library other than the first gesture, and the second preset condition comprises: the first hand gesture is consistent with the initial gesture of the second gesture, the second hand gesture is consistent with the end gesture of the second gesture, the overlapping area between the first hand position and the second hand position meets the overlapping area condition between the initial hand position and the end hand position in the second gesture, and the second gesture is a grasp gesture.
10. The method of claim 9, wherein the starting gesture of the second gesture is palm and fingers are facing up, and the ending gesture of the second gesture is fist making;
the overlapping area between the first hand position and the second hand position meets the overlapping area condition between the starting hand position and the ending hand position in the second gesture, and the overlapping area condition comprises:
the intersection ratio of the hand detection frame in the first image and the hand detection frame in the second image is larger than a first threshold.
11. The method of claim 9 or 10, wherein the second preset condition further comprises:
and the position relation between the fingertip and the finger root in the second image meets the position relation condition of the fingertip and the finger root in the ending gesture of the second gesture.
12. The method of claim 11, wherein the positional relationship of the fingertip and the finger root in the second image satisfies the positional relationship condition of the fingertip and the finger root in the ending gesture of the second gesture comprises:
the ratio of the distance from the tip of the finger to the wrist to the distance from the root of the finger to the wrist in the second image is in a second interval.
13. The method of any one of claims 1 to 12, wherein the first image and the second image are two images of a continuous plurality of images acquired by the same camera, there being further images in between the first image and the second image;
before determining that the first characteristic information of the first image and the second characteristic information of the second image meet a first preset condition, the method further includes:
the method comprises the steps of determining that the number of images with hands in an image set is larger than or equal to a first number, wherein the image set comprises a first image, a second image and an image which is acquired by a camera and is positioned between the first image and the second image.
14. The method of claim 13, wherein the method further comprises:
determining whether the number of images with hands in the image set is greater than or equal to a second number if the first characteristic information and the second characteristic information do not meet the condition corresponding to any gesture in the gesture library;
outputting other gestures if the number of images with hands in the image set is greater than or equal to the second number;
if the number of images with hands in the image set is smaller than the second number, acquiring a third image acquired by the camera, and judging whether the first characteristic information and the third characteristic information of the third image meet the condition corresponding to any gesture in the gesture library, wherein the third image is an image acquired by the camera after the second image.
15. The method of any one of claims 1 to 14, wherein before determining that the first characteristic information of the first image and the second characteristic information of the second image satisfy a first preset condition, the method further comprises:
and determining that the hand has not hovered.
16. The method of claim 15, wherein the determining that the hand has not hovered comprises:
acquiring fourth characteristic information of a fourth image, wherein the fourth characteristic information comprises: the hand category, the finger orientation, and the position of the hand detection frame on the fourth image, wherein the first image is an image in which a first initial gesture is recognized, or an image having the first initial gesture that is acquired after the first initial gesture is recognized, the first initial gesture is the initial gesture of any gesture in the gesture library, and the fourth image is acquired later than the first image and earlier than the second image;
determining that the hand is not hovering when the first feature information and the fourth feature information meet any one of hovering conditions, the hovering conditions comprising: the hand category in the first characteristic information and the hand category in the fourth characteristic information are inconsistent, the finger orientation in the first characteristic information and the finger orientation in the fourth characteristic information are inconsistent, the area ratio of the hand detection frame in the first characteristic information to the hand detection frame in the second characteristic information is outside a third interval, and the intersection ratio of the hand detection frame in the first characteristic information and the hand detection frame in the second characteristic information is smaller than a second threshold.
17. The method of claim 16, wherein the detecting the space-free gesture operation of the user comprises:
detecting that the gesture of the user is the first initial gesture.
18. A method for identifying a space-free gesture, comprising:
detecting the space-free gesture operation of the user;
in response to detecting a space gesture operation of a user, acquiring a first image and a second image, wherein the acquisition time of the second image is later than that of the first image;
determining that the blank gesture operation of the user is a second gesture when the first feature information of the first image and the second feature information of the second image meet a second preset condition, wherein the first feature information comprises a first hand gesture and a first hand position, the second feature information comprises a second hand gesture and a second hand position, and the second preset condition comprises: the first hand gesture is consistent with the initial gesture of the second gesture, the second hand gesture is consistent with the end gesture of the second gesture, the superposition area between the first hand position and the second hand position meets the superposition area condition between the initial hand position and the end hand position in the second gesture, and the second gesture is a grasp gesture.
19. The method of claim 18, wherein the starting gesture of the second gesture is palm and fingers are facing up, and the ending gesture of the second gesture is fist making;
the overlapping area between the first hand position and the second hand position meets the overlapping area condition between the starting hand position and the ending hand position in the second gesture, and the overlapping area condition comprises:
the intersection ratio of the hand detection frame in the first image and the hand detection frame in the second image is larger than a first threshold.
20. The method of claim 18, wherein the second preset condition further comprises:
and the position relation between the fingertip and the finger root in the second image meets the position relation condition of the fingertip and the finger root in the ending gesture of the second gesture.
21. The method of claim 20, wherein the positional relationship of the fingertip and the finger root in the second image satisfies the positional relationship condition of the fingertip and the finger root in the ending gesture of the second gesture comprises:
the ratio of the distance from the tip of the finger to the wrist to the distance from the root of the finger to the wrist in the second image is in a second interval.
22. An electronic device comprising a processor for executing a computer program stored in a memory to cause the electronic device to implement the method of any one of claims 1 to 21.
23. A system on a chip, comprising a processor coupled to a memory, the processor executing a computer program stored in the memory to implement the method of any one of claims 1 to 21.
CN202310231441.1A 2023-02-27 2023-02-27 Space gesture recognition method, electronic equipment and chip system Pending CN117130469A (en)

Publications (1)

Publication Number Publication Date
CN117130469A (en) 2023-11-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination