WO2021115181A1 - Gesture recognition method, gesture control method, apparatuses, medium and terminal device - Google Patents


Info

Publication number
WO2021115181A1
Authority
WO
WIPO (PCT)
Prior art keywords
hand
frame
face image
gesture recognition
trajectory
Prior art date
Application number
PCT/CN2020/133410
Other languages
French (fr)
Chinese (zh)
Inventor
刘高强 (Liu Gaoqiang)
Original Assignee
RealMe重庆移动通信有限公司 (RealMe Chongqing Mobile Telecommunications Co., Ltd.)
Application filed by RealMe重庆移动通信有限公司 (RealMe Chongqing Mobile Telecommunications Co., Ltd.)
Publication of WO2021115181A1 publication Critical patent/WO2021115181A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 - Detection; Localisation; Normalisation

Definitions

  • the present disclosure relates to the field of computer vision technology, and in particular to a gesture recognition method, a gesture control method, a gesture recognition device, a gesture control device, a computer-readable storage medium, and a terminal device.
  • Gesture control refers to the use of computer vision, graphics, and related technologies to recognize human gestures without touching the terminal device and to convert them into control instructions for the device. It is a new interaction mode following the mouse, keyboard, and touch screen; it removes the dependence of traditional interaction methods on input devices and increases the diversity of interaction.
  • Gesture recognition is the prerequisite of gesture control: only by recognizing the user's gestures accurately and promptly can they be transformed into effective gesture control and achieve the interactive result the user wants.
  • the present disclosure provides a gesture recognition method, a gesture control method, a gesture recognition device, a gesture control device, a computer-readable storage medium, and a terminal device, thereby mitigating, at least to some extent, the problems of high data-processing volume and long processing time in gesture recognition.
  • a gesture recognition method is provided, applied to a terminal device equipped with a camera. The method includes: acquiring multiple frames of original images collected by the camera; extracting a face image from each of the multiple frames of original images to obtain multiple frames of face images; detecting hand key points in each frame of face image, and generating a hand trajectory according to the position changes of the hand key points across the multiple frames of face images; and recognizing the hand trajectory to obtain the gesture recognition result.
  • a gesture control method is provided, applied to a terminal device with a camera. The method includes: when the gesture control function is turned on, obtaining a gesture recognition result according to the gesture recognition method of the first aspect; and executing the control instruction corresponding to the gesture recognition result.
  • a gesture recognition device is provided, configured in a terminal device equipped with a camera. The device includes a processor, wherein the processor is used to execute the following program modules stored in the memory: an original image acquisition module, used to obtain multiple frames of original images collected by the camera; a face image extraction module, used to extract face images from the multiple frames of original images to obtain multiple frames of face images; a hand trajectory generation module, used to detect the hand key points in each frame of face image and generate a hand trajectory according to the position changes of the hand key points in the multiple frames of face images; and a hand trajectory recognition module, used to recognize the hand trajectory to obtain the gesture recognition result.
  • a gesture control device is provided, configured in a terminal device equipped with a camera. The device includes a processor, wherein the processor is configured to execute the following program modules stored in the memory: an original image acquisition module, used to obtain multiple frames of original images collected by the camera when the gesture control function is turned on; a face image extraction module, used to extract face images from the multiple frames of original images to obtain multiple frames of face images;
  • a hand trajectory generation module, used to detect the hand key points in each frame of face image and generate the hand trajectory according to the position changes of the hand key points in the multiple frames of face images;
  • a hand trajectory recognition module, used to recognize the hand trajectory to obtain the gesture recognition result;
  • a control instruction execution module, used to execute the control instruction corresponding to the gesture recognition result.
  • a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the gesture recognition method of the first aspect or the gesture control method of the second aspect.
  • a terminal device, including: a processor; a memory for storing executable instructions of the processor; and a camera; wherein the processor is configured to execute the executable instructions to perform the gesture recognition method of the first aspect or the gesture control method of the second aspect.
  • the camera collects multiple frames of original images, face images are extracted from them, hand key points are detected in each frame of face image, the hand trajectory is generated according to the position changes of the hand key points, and finally the hand trajectory is recognized to obtain the gesture recognition result.
  • Since the user's hand is generally located in front of or near the face during gesture operations, extracting the face image from the original image for hand key point detection is equivalent to cropping away the parts of the original image that are irrelevant to gesture recognition. This reduces the amount of image data to process: the system only needs to perform gesture recognition within the face image, which shortens processing time, improves the real-time performance of gesture recognition, and does not demand high hardware processing performance.
  • the control instruction corresponding to the gesture recognition result can be executed immediately, so as to achieve fast interactive response, reduce interaction latency, and improve user experience; this is highly practical for applications such as somatosensory games.
  • Fig. 1 shows a flowchart of a gesture recognition method in this exemplary embodiment.
  • Fig. 2 shows a sub-flowchart of a gesture recognition method in this exemplary embodiment.
  • Fig. 3 shows a schematic flowchart of extracting hand candidate regions in this exemplary embodiment.
  • Fig. 4 shows a schematic flowchart of gesture recognition in this exemplary embodiment.
  • Fig. 5 shows a flowchart of a gesture control method in this exemplary embodiment.
  • Fig. 6 shows a structural block diagram of a gesture recognition device in this exemplary embodiment.
  • Fig. 7 shows a structural block diagram of another gesture recognition device in this exemplary embodiment.
  • Fig. 8 shows a structural block diagram of a gesture control device in this exemplary embodiment.
  • Fig. 9 shows a structural block diagram of another gesture control device in this exemplary embodiment.
  • Fig. 10 shows a computer-readable storage medium for implementing the above-mentioned methods in this exemplary embodiment.
  • Fig. 11 shows a terminal device for implementing the above-mentioned methods in this exemplary embodiment.
  • Existing gesture recognition methods are mostly based on gesture localization and feature extraction in images captured by a camera. As the pixel count of cameras on terminal devices grows and image resolution rises, the amount of data processed during gesture recognition increases and processing takes longer, which hurts the real-time performance of gesture recognition, causes a certain delay in gesture control, and degrades user experience; moreover, such methods require high hardware processing performance, which is not conducive to deployment in scenarios such as mobile terminals.
  • exemplary embodiments of the present disclosure provide a gesture recognition method, which can be applied to terminal devices equipped with cameras, such as mobile phones, tablet computers, digital cameras, virtual reality devices, and the like.
  • Fig. 1 shows a flow of the gesture recognition method, which may include the following steps S110 to S140:
  • Step S110 Obtain multiple frames of original images collected by the camera.
  • A gesture is an action, and multiple frames are required to record the gesture completely.
  • The camera can collect a fixed number of original images, such as 10 frames or 50 frames; alternatively, a matching infrared sensor can detect whether an object is present in front of the camera: when an object is detected, the camera starts collecting original images, and when the object is detected to move away, the camera stops collecting, thereby obtaining multiple frames of original images.
  • In addition, appropriate frame dropping can be performed, for example keeping one frame out of every three, to reduce the amount of subsequent processing with little effect on gesture recognition. The specific frame-dropping rate depends on the number of frames of original images collected by the camera, which is not limited in the present disclosure.
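The frame-dropping step described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function and parameter names (`drop_frames`, `keep_every`) are my own:

```python
def drop_frames(frames, keep_every=3):
    """Keep one frame out of every `keep_every`, cutting the amount of
    downstream processing roughly proportionally.

    Slicing keeps frames 0, keep_every, 2 * keep_every, ...
    """
    return frames[::keep_every]
```

For a 30 fps capture, `keep_every=3` yields an effective 10 fps, which is usually still enough temporal resolution to reconstruct a hand trajectory.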
  • In step S120, face images are extracted from the above-mentioned multiple frames of original images to obtain multiple frames of face images.
  • The face area can be recognized by color and shape detection: for example, preset the color range and shape range of a human face and detect whether the original image contains a local area that satisfies both the color range and the shape range; if so, that local area is the face area.
  • Deep learning techniques can also be used for face region detection, for example YOLO (You Only Look Once, a real-time object detection framework with versions v1, v2, v3, etc., any of which may be used in this disclosure), SSD (Single Shot Multibox Detector), or R-CNN (Region-based Convolutional Neural Network, including improved versions such as Fast R-CNN and Faster R-CNN).
  • When the face area is detected, it can be marked with a rectangular frame and extracted as a face image. To facilitate subsequent processing, the face image can be extracted or sampled at a preset size (or resolution), so that each frame of face image has the same size (or resolution).
  • In some embodiments, a hardware face detection module (HWFD) can be provided on the terminal device. After the collected multiple frames of original images are input to the HWFD, it outputs the coordinates of the face area; mapping these coordinates onto the original image allows the face image to be extracted.
  • The resolution of the collected multiple frames of original images can be adjusted to a preset resolution, and in step S120 face image extraction can be performed on the resolution-adjusted original images.
  • The preset resolution may be determined by the algorithm adopted in step S120. For example, if YOLO is used for face detection and its input layer is set to 640×480, the preset resolution can be 640×480; if the terminal's camera has 16 megapixels and collects original images at 4608×3456, the system can down-sample each original image to 640×480 before inputting it to YOLO.
  • the preset resolution is lower than the resolution of the original image itself, which is equivalent to compressing the original image, reducing the data volume of the original image, and improving processing efficiency.
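The down-sampling described above (e.g. 4608×3456 down to 640×480) can be sketched with nearest-neighbour sampling over a row-major pixel grid. A real system would use the camera ISP or an optimized resize routine; this pure-Python version is only illustrative, and the function name is mine:

```python
def downsample(image, dst_w, dst_h):
    """Nearest-neighbour down-sampling of a row-major 2D pixel grid
    (a list of rows) to dst_w x dst_h.

    Each destination pixel (r, c) samples the source pixel whose
    coordinates scale linearly: src_row = r * src_h // dst_h, etc.
    """
    src_h, src_w = len(image), len(image[0])
    return [
        [image[r * src_h // dst_h][c * src_w // dst_w] for c in range(dst_w)]
        for r in range(dst_h)
    ]
```

Down-sampling before detection shrinks the per-frame data volume quadratically: going from 4608×3456 to 640×480 reduces the pixel count by a factor of about 52.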
  • Step S130 Detect the key points of the hand in each frame of the face image, and generate a hand trajectory according to the position changes of the key points of the hand in the multiple frames of the face image.
  • the key points of the hand can be selected according to the needs of the scene and the image quality.
  • For example, 21 bone points can be selected as the hand key points, including four joint feature points for each finger plus the palm feature point; alternatively, only a subset of the bone points may be selected as needed, for example only the joint feature points or the fingertip point of the index finger.
  • The detection of hand key points can be achieved through shape detection. For example: perform fingertip shape detection on the face image by finding areas containing arcs and matching their arcs against preset standard fingertip arcs; the apex of an arc with a sufficiently high matching degree is taken as a fingertip point (i.e., a hand key point). Or perform finger shape detection on the face image, designate areas sufficiently similar to a standard finger shape as finger areas, and take the rounded boundary points of each finger area as the hand key points. Or fit an ellipse to the figure in the face image and use the endpoints of the fitted ellipse's major axis as the hand key points.
  • the detection of key points of the hand may be specifically implemented through the following steps S210 and S220:
  • Step S210 Perform region feature detection on each frame of face image, so as to extract hand candidate regions from each frame of face image;
  • Step S220 Detect key points of the hand in the candidate hand area.
  • Region feature detection refers to segmenting many local areas from the face image, extracting and identifying the features of each local area, and, when a local area containing hand features is detected, taking that local area as a hand candidate area. Further detecting the hand key points within the hand candidate area can then improve the detection accuracy of the hand key points.
  • step S210 may be specifically implemented through the following steps:
  • For example, an RPN (Region Proposal Network), as used in networks such as Faster R-CNN (an improved version of R-CNN and Fast R-CNN), can be adopted as a whole.
  • After the face image is input, it first passes through convolution layers (usually including pooling layers) for convolution processing to extract image features.
  • The features then enter the RPN, which extracts candidate frames; NMS (Non-Maximum Suppression) is typically applied to remove redundant, heavily overlapping candidates.
  • The candidate frames extracted at this stage span various categories; for example, besides candidate frames for hands, there may be candidate frames for the nose, mouth, glasses, and other parts.
  • The classification layer can use a Softmax (normalized exponential) function to output a probability value for each target category that may exist in the face image; the category with the highest probability value is taken as the category of the candidate frame. Candidate frames of non-hand categories can then be deleted, keeping only the hand candidate frames. Finally, the hand candidate area is input into the regression layer.
  • The regression layer can fine-tune the position and size of the hand candidate area to obtain its coordinate array (x, y, w, h), where x and y are the position coordinates of the hand candidate area (usually its upper-left corner point), and w and h are the width and height of the hand candidate area.
  • The above R-CNN can be obtained by training on a large number of face image samples: the hand candidate area in each sample image is manually annotated to produce labels, the network is trained on the samples and labels, the network parameters are updated, and a usable R-CNN is obtained.
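The NMS step that prunes the RPN's overlapping candidate frames is named but not detailed in the source. Classic greedy NMS over (x, y, w, h) boxes looks roughly like this; the parameter names and the 0.5 threshold are my own conventional choices:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    remaining box and drop any box overlapping it by more than iou_thresh.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

In a Faster R-CNN-style pipeline, a vectorized NMS (e.g. a library implementation) would be used instead; the logic is the same.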
  • the method in FIG. 2 can be used for each frame of the face image, and the key points of the hand are detected in each frame.
  • In some frames the hand cannot be detected, i.e., the hand candidate area extracted from the current frame of the face image is null. In this case, the hand key points detected in the previous frame can be directly copied to the current frame and regarded as the hand key points of the current frame, which improves the robustness of the algorithm.
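The fallback just described (reusing the previous frame's key points whenever the current frame's hand candidate area is null) can be sketched as follows; the representation of "null" as `None` and the function name are my own:

```python
def fill_missing_keypoints(per_frame_keypoints):
    """per_frame_keypoints: one entry per frame, either the detected hand
    key points or None when the hand candidate area was null.
    Frames with a null detection inherit the previous frame's key points."""
    filled, last = [], None
    for kp in per_frame_keypoints:
        if kp is None:
            kp = last  # copy the previous frame's key points
        filled.append(kp)
        last = kp
    return filled
```

Note that leading frames with no detection at all remain `None`, since there is no earlier frame to copy from.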
  • Detection of hand key points within the hand candidate area can also be achieved with models such as R-CNN: the hand key points are treated as the targets to be detected, and through the extraction and processing of image features, the areas where the targets are located are output, thereby marking the hand key points.
  • The hand trajectory can be in the form of an array, a vector, or a picture; this disclosure does not limit this.
  • Step S140 Recognize the trajectory of the hand to obtain a gesture recognition result.
  • The hand trajectory reflects the user's gesture action; by recognizing the trajectory, the gesture made by the user can be identified and the gesture recognition result obtained.
  • the hand trajectory generated in step S130 may be matched with a preset standard trajectory.
  • The standard trajectories may include shaking the hand left and right, shaking a finger left and right, sliding a finger up and down, opening the hand, and the like. If a standard trajectory's matching rate with the hand trajectory reaches a certain threshold, the hand trajectory is judged to be that standard trajectory, and the gesture represented by it is output as the gesture recognition result of the hand trajectory.
  • step S140 may be specifically implemented through the following steps:
  • the hand trajectory bitmap is processed by Bayesian classifier, and the result of gesture recognition is obtained.
  • the size of the bitmap can be preset, or it can be the same as the size of the face image or the candidate area of the hand.
  • Since the hand trajectory is the position change of the hand key points across frames, the position in each frame can be mapped into the bitmap and the points connected in sequence, which is equivalent to representing the hand trajectory in the bitmap; this bitmap is called the hand trajectory bitmap.
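Mapping the per-frame key-point positions into a bitmap and connecting them in sequence can be sketched as below. The linear interpolation used to connect consecutive points is my choice; the source does not specify how the points are joined:

```python
def trajectory_bitmap(points, width, height):
    """Rasterize a sequence of (x, y) key-point positions into a
    width x height bitmap of 0/1 pixels, connecting consecutive
    points with straight line segments."""
    bitmap = [[0] * width for _ in range(height)]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Step along the longer axis so the segment has no gaps.
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            x = x0 + (x1 - x0) * i // steps
            y = y0 + (y1 - y0) * i // steps
            bitmap[y][x] = 1
    return bitmap
```

The resulting 0/1 grid is a compact, resolution-independent summary of the gesture, which is what makes a simple classifier (such as the Bayesian classifier described next in the source) sufficient for recognition.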
  • A Bayesian classifier selects the category that minimizes the classification risk, based on the known posterior probabilities and misjudgment losses. Refer to the following formulas:

    R(c_i | x) = Σ_{j=1..N} λ_ij · P(c_j | x)

    h(x) = argmin_{c_i} R(c_i | x)

  • where h denotes the Bayesian classifier, x is the sample, λ_ij is the loss incurred when a sample of class c_j is misclassified as c_i, P(c_j | x) is the posterior probability of class c_j given x, R(c_i | x) is the expected loss of classifying x as c_i, and N is the number of categories.
  • the hand trajectory bitmap is input into the Bayesian classifier, and the gesture recognition result can be output.
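The minimum-risk decision rule can be sketched directly. The posterior values and loss table below are hypothetical stand-ins for the quantities a trained classifier would supply; only the decision rule itself comes from the source:

```python
def bayes_decide(posteriors, loss):
    """Minimum-risk Bayes decision.

    posteriors: {class: P(class | x)} for the sample x.
    loss[ci][cj]: cost of predicting class ci when the true class is cj
                  (the lambda_ij of the formula).
    Returns the class ci minimizing the conditional risk
    R(ci | x) = sum_j loss[ci][cj] * P(cj | x).
    """
    risks = {
        ci: sum(loss[ci][cj] * p for cj, p in posteriors.items())
        for ci in loss
    }
    return min(risks, key=risks.get)
```

With a 0-1 loss (zero on the diagonal, one elsewhere), this rule reduces to simply picking the class with the highest posterior probability.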
  • Fig. 4 shows a schematic flow of a gesture recognition method.
  • First, the resolution of the original images is adjusted to the preset resolution to shrink the images; then the face image is extracted from the resolution-adjusted original image through the HWFD, so that subsequent processing is concentrated on the face image; the hand key points are detected and the hand trajectory is mapped into a hand trajectory bitmap; finally, the hand trajectory bitmap is input into the Bayesian classifier, which outputs the gesture recognition result.
  • The foregoing terminal device may include multiple cameras. After the gesture recognition result is obtained, the cameras can be switched according to the gesture recognition result. For example, when the gesture recognition result is shaking a finger left and right, the terminal device is triggered to switch to the main camera; when the result is swiping a finger up and down, the terminal device is triggered to switch to the telephoto camera, and so on. In this way, when the user is at some distance from the terminal device, they can operate it by gesturing toward the camera, which is more convenient.
  • With the gesture recognition method of this exemplary embodiment, multiple frames of original images are collected by the camera, face images are extracted from them, hand key points are detected in each frame of face image, the hand trajectory is generated according to the position changes of the hand key points, and finally the hand trajectory is recognized to obtain the gesture recognition result. Since the user's hand is generally located in front of or near the face during gesture operations, extracting the face image from the original image for hand key point detection is equivalent to cropping away the parts of the original image irrelevant to gesture recognition. This reduces the amount of image data to process: the system only needs to perform gesture recognition within the face image, which shortens processing time, improves the real-time performance of gesture recognition, lowers hardware performance requirements, and facilitates deployment in lightweight scenarios such as mobile terminals.
  • Exemplary embodiments of the present disclosure also provide a gesture control method, which can be applied to a terminal device equipped with a camera.
  • the gesture control method may include:
  • the gesture recognition result is obtained according to the gesture recognition method in this exemplary embodiment
  • Ways of enabling the gesture control function include, but are not limited to: when a game program with gesture control is started, the terminal automatically enables the gesture control function; in interfaces such as the camera or a web browser, the user chooses to enable the gesture control function.
  • The correspondence between gestures and control instructions can be preset in the program; for example, waving the palm corresponds to a screenshot instruction, and sliding a finger downward corresponds to a page-turn instruction. When the user's gesture is recognized, the corresponding control instruction can be quickly looked up from the gesture recognition result and executed.
  • For example, the user can be allowed to take pictures through specific gesture control: when the user makes a thumbs-up gesture, the terminal device is triggered to automatically press the shutter. Or, when the terminal device is equipped with multiple cameras, the user can be allowed to control camera switching through specific gestures: for example, when the user shakes a finger, the terminal device is triggered to switch among the main camera, the telephoto camera, and the wide-angle camera, thereby facilitating the user's photographing operation.
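The gesture-to-instruction lookup can be sketched as a simple dispatch table. The gesture names and instruction strings below are hypothetical examples for illustration, not a mapping defined by the source:

```python
# Hypothetical mapping from gesture recognition results to control
# instructions, mirroring the examples in the text above.
GESTURE_TO_INSTRUCTION = {
    "wave_palm": "screenshot",
    "swipe_finger_down": "page_down",
    "thumbs_up": "press_shutter",
    "shake_finger": "switch_camera",
}

def execute_gesture(gesture):
    """Look up the control instruction for a recognized gesture;
    unrecognized gestures yield no instruction (None)."""
    return GESTURE_TO_INSTRUCTION.get(gesture)
```

Keeping the mapping in a table (rather than branching code) makes it easy to let the user re-bind gestures to instructions at runtime.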
  • FIG. 5 shows a flow of a gesture control method, which may include the following steps S510 to S550:
  • Step S510 when the gesture control function is turned on, acquire multiple frames of original images collected by the camera;
  • Step S520 extracting face images from the foregoing multiple frames of original images respectively to obtain multiple frames of face images
  • Step S530 detecting the hand key points in each frame of the face image, and generating a hand trajectory according to the position changes of the hand key points in the multi-frame face image;
  • Step S540 Recognizing the trajectory of the hand to obtain a gesture recognition result
  • Step S550 execute the control instruction corresponding to the gesture recognition result.
  • The control instruction corresponding to the gesture recognition result can be executed immediately, so as to achieve fast interactive response, reduce interaction latency, and improve user experience; this has high practicability for somatosensory games.
  • Exemplary embodiments of the present disclosure also provide a gesture recognition device, which can be configured in a terminal device equipped with a camera.
  • the gesture recognition device 600 may include a processor 610 and a memory 620; the memory 620 stores the following program modules:
  • the original image acquisition module 621 is used to acquire multiple frames of original images collected by the camera;
  • the face image extraction module 622 is configured to extract face images from the above-mentioned multiple frames of original images to obtain multiple frames of face images;
  • the hand trajectory generation module 623 is used to detect the hand key points in each frame of face image, and generate the hand trajectory according to the position changes of the hand key points in the multi-frame face image;
  • the hand trajectory recognition module 624 is used for recognizing the hand trajectory to obtain a gesture recognition result
  • the processor 610 is configured to execute the foregoing program modules.
  • the original image acquisition module 621 is configured to:
  • the resolution of the multiple frames of original images is adjusted to a preset resolution.
  • the hand trajectory generating module 623 is configured to:
  • the hand trajectory generating module 623 is configured to:
  • the hand candidate region extracted from the face image of the current frame is a null value
  • the hand key points detected in the previous frame are used as the hand key points of the current frame.
  • the hand trajectory generating module 623 is configured to:
  • the hand trajectory recognition module 624 is configured to:
  • the hand trajectory bitmap is processed by Bayesian classifier, and the result of gesture recognition is obtained.
  • the foregoing terminal device includes multiple cameras; the hand track recognition module 624 is configured to:
  • Exemplary embodiments of the present disclosure also provide another gesture recognition device, which can be configured in a terminal device equipped with a camera.
  • the gesture recognition apparatus 700 may include:
  • the original image acquisition module 710 is configured to acquire multiple frames of original images collected by the camera;
  • the face image extraction module 720 is configured to extract face images from the foregoing multiple frames of original images to obtain multiple frames of face images;
  • the hand trajectory generating module 730 is used to detect the hand key points in each frame of face image, and generate the hand trajectory according to the position changes of the hand key points in the multi-frame face image;
  • the hand trajectory recognition module 740 is used for recognizing the hand trajectory to obtain a gesture recognition result.
  • the original image acquisition module 710 is configured to:
  • the resolution of the multiple frames of original images is adjusted to a preset resolution.
  • the hand trajectory generating module 730 is configured to:
  • the hand trajectory generating module 730 is configured to:
  • the hand candidate region extracted from the face image of the current frame is a null value
  • the hand key points detected in the previous frame are used as the hand key points of the current frame.
  • the hand trajectory generating module 730 is configured to:
  • the hand trajectory recognition module 740 is configured to:
  • the hand trajectory bitmap is processed by Bayesian classifier, and the result of gesture recognition is obtained.
  • the foregoing terminal device includes multiple cameras; the hand track recognition module 740 is configured to:
  • Exemplary embodiments of the present disclosure also provide a gesture control device, which can be configured in a terminal device equipped with a camera.
  • the gesture control device 800 may include a processor 810 and a memory 820; wherein the memory 820 stores the following program modules:
  • the original image acquisition module 821 is configured to acquire multiple frames of original images collected by the camera when the gesture control function is turned on;
  • the face image extraction module 822 is configured to extract face images from the above-mentioned multiple frames of original images to obtain multiple frames of face images;
  • the hand trajectory generation module 823 is used to detect the hand key points in each frame of face image, and generate the hand trajectory according to the position changes of the hand key points in the multi-frame face image;
  • the hand trajectory recognition module 824 is used for recognizing the hand trajectory to obtain a gesture recognition result
  • the control instruction execution module 825 is configured to execute the control instruction corresponding to the gesture recognition result
  • the processor 810 is used to execute the above-mentioned program modules.
  • the aforementioned control instruction includes a camera switching instruction.
  • the original image acquisition module 821 is configured to:
  • the resolution of the multiple frames of original images is adjusted to a preset resolution.
  • the hand trajectory generating module 823 is configured to:
  • the hand trajectory generating module 823 is configured to:
  • the hand candidate region extracted from the face image of the current frame is a null value
  • the hand key points detected in the previous frame are used as the hand key points of the current frame.
  • the hand trajectory generating module 823 is configured to:
  • the hand trajectory recognition module 824 is configured to:
  • the hand trajectory bitmap is processed by a Bayesian classifier to obtain the gesture recognition result.
  • Exemplary embodiments of the present disclosure also provide another gesture control device, which can be configured in a terminal device equipped with a camera.
  • the gesture control device 900 may include:
  • the original image acquisition module 910 is used to acquire multiple frames of original images collected by the camera when the gesture control function is turned on;
  • the face image extraction module 920 is configured to extract face images from the foregoing multiple frames of original images to obtain multiple frames of face images;
  • the hand trajectory generating module 930 is used to detect the hand key points in each frame of face image, and generate the hand trajectory according to the position changes of the hand key points in the multi-frame face image;
  • the hand trajectory recognition module 940 is used for recognizing the hand trajectory to obtain a gesture recognition result;
  • the control instruction execution module 950 is used to execute the control instruction corresponding to the gesture recognition result.
  • the aforementioned control instruction includes a camera switching instruction.
  • the original image acquisition module 910 is configured to:
  • the resolution of the multiple frames of original images is adjusted to a preset resolution.
  • the hand trajectory generating module 930 is configured to:
  • the hand trajectory generating module 930 is configured to:
  • when the hand candidate region extracted from the face image of the current frame is a null value, the hand key points detected in the previous frame are used as the hand key points of the current frame.
  • the hand trajectory generating module 930 is configured to:
  • the hand trajectory recognition module 940 is configured to:
  • the hand trajectory bitmap is processed by a Bayesian classifier to obtain the gesture recognition result.
  • Exemplary embodiments of the present disclosure also provide a computer-readable storage medium, which can be implemented in the form of a program product including program code.
  • when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to various exemplary embodiments of the present disclosure described in the above-mentioned "Exemplary Method" section of this specification.
  • a program product 1000 for implementing the above-mentioned method according to an exemplary embodiment of the present disclosure may take the form of a portable compact disc read-only memory (CD-ROM) including program code, and can run on a terminal device, such as a personal computer.
  • the program product of the present disclosure is not limited thereto.
  • the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, device, or device.
  • the program product can adopt any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
  • the program code for performing the operations of the present disclosure can be written in any combination of one or more programming languages.
  • the programming languages include object-oriented programming languages, such as Java and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
  • the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (for example, through the Internet using an Internet service provider).
  • Exemplary embodiments of the present disclosure also provide a terminal device capable of implementing the above method.
  • the terminal device may be a mobile phone, a tablet computer, a digital camera, or the like.
  • the terminal device 1100 according to this exemplary embodiment of the present disclosure will be described below with reference to FIG. 11.
  • the terminal device 1100 shown in FIG. 11 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • the terminal device 1100 may be represented in the form of a general-purpose computing device.
  • the components of the terminal device 1100 may include but are not limited to: at least one processing unit 1110, at least one storage unit 1120, a bus 1130 connecting different system components (including the storage unit 1120 and the processing unit 1110), a display unit 1140, and an image acquisition unit 1170,
  • the image acquisition unit 1170 includes at least one camera.
  • the storage unit 1120 stores program codes, and the program codes can be executed by the processing unit 1110, so that the processing unit 1110 executes the steps according to various exemplary embodiments of the present disclosure described in the above-mentioned "Exemplary Method" section of this specification.
  • the processing unit 1110 may execute the method steps shown in FIG. 1, FIG. 2 or FIG. 5.
  • the storage unit 1120 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 1121 and/or a cache storage unit 1122, and may further include a read-only storage unit (ROM) 1123.
  • the storage unit 1120 may also include a program/utility tool 1124 having a set of (at least one) program modules 1125.
  • the program modules 1125 include but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
  • the bus 1130 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
  • the terminal device 1100 may also communicate with one or more external devices 1200 (such as keyboards, pointing devices, Bluetooth devices, etc.), with one or more devices that enable the user to interact with the terminal device 1100, and/or with any device (such as a router, modem, etc.) that enables the terminal device 1100 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 1150.
  • the terminal device 1100 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 1160.
  • the network adapter 1160 communicates with other modules of the terminal device 1100 through the bus 1130. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the terminal device 1100, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
  • the example embodiments described here can be implemented by software, or by combining software with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions that cause a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the exemplary embodiments of the present disclosure.
  • although modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory.
  • the features and functions of two or more modules or units described above may be embodied in one module or unit.
  • the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.

Abstract

A gesture recognition method, a gesture control method, apparatuses, a storage medium and a terminal device. The gesture recognition method is applied to a terminal device provided with a camera, and comprises: acquiring multiple frames of original images collected by the camera; extracting face images from among the multiple frames of original images respectively to obtain multiple frames of face images; detecting hand key points in each frame of the face images, and generating a hand trajectory according to position changes of the hand key points in the multiple frames of face images; and recognizing the hand trajectory to obtain a gesture recognition result. This reduces the amount of image data processed during gesture recognition and shortens the time consumed by the recognition process.

Description

Gesture Recognition Method, Gesture Control Method, Apparatus, Medium and Terminal Device
This application claims priority to Chinese patent application No. 201911284143.9, filed on December 13, 2019 and entitled "Gesture recognition method, gesture control method, apparatus, medium and terminal device", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer vision technology, and in particular to a gesture recognition method, a gesture control method, a gesture recognition apparatus, a gesture control apparatus, a computer-readable storage medium, and a terminal device.
Background
Gesture control uses computer vision, graphics and related technologies to recognize a person's operating gestures without any contact with the terminal device, and converts them into control instructions for the device. As a new interaction mode following the mouse, the keyboard and the touch screen, it frees interaction from the traditional dependence on input devices and increases the diversity of interaction.
Gesture recognition is the prerequisite for gesture control. Only when a user's gestures are recognized accurately and promptly can they be converted into effective gesture control and produce the interaction result the user intends.
发明内容Summary of the invention
本公开提供了一种手势识别方法、手势控制方法、手势识别装置、手势控制装置、计算机可读存储介质与终端设备,进而至少在一定程度上改善手势识别数据处理量较高、耗时较长的问题。The present disclosure provides a gesture recognition method, a gesture control method, a gesture recognition device, a gesture control device, a computer-readable storage medium, and a terminal device, thereby improving at least to a certain extent the high processing volume and time-consuming of gesture recognition data The problem.
根据本公开的第一方面,提供一种手势识别方法,应用于具备摄像头的终端设备,所述方法包括:获取由所述摄像头采集的多帧原始图像;分别从所述多帧原始图像中提取人脸图像,得到多帧人脸图像;检测每帧人脸图像中的手部关键点,并根据所述手部关键点在所述多帧人脸图像中的位置变化,生成手部轨迹;对所述手部轨迹进行识别,得到手势识别结果。According to a first aspect of the present disclosure, a gesture recognition method is provided, which is applied to a terminal device equipped with a camera, and the method includes: acquiring multiple frames of original images collected by the camera; and extracting from the multiple frames of original images respectively A face image to obtain multiple frames of face images; detecting hand key points in each frame of face image, and generating a hand trajectory according to the position changes of the hand key points in the multiple frames of face images; The hand trajectory is recognized, and the gesture recognition result is obtained.
根据本公开的第二方面,提供一种手势控制方法,应用于具备摄像头的终端设备,所述方法包括:当开启手势控制功能时,根据上述第一方面的手势识别方法得到手势识别结果;执行所述手势识别结果对应的控制指令。According to a second aspect of the present disclosure, there is provided a gesture control method, which is applied to a terminal device with a camera, and the method includes: when the gesture control function is turned on, obtaining a gesture recognition result according to the gesture recognition method of the first aspect; executing; The control instruction corresponding to the gesture recognition result.
根据本公开的第三方面,提供一种手势识别装置,配置于具备摄像头的终端设备,所述装置包括处理器;其中,所述处理器用于执行存储器中存储的以下程序模块:原始图像获取模块,用于获取由所述摄像头采集的多帧原始图像;人脸图像提取模块,用于分别从所述多帧原始图像中提取人脸图像,得到多帧人脸图像;手部轨迹生成模块,用于检测每帧人脸图像中的手部关键点,并根据所述手部关键点在所述多帧人脸图像中的位置变化,生成手部轨迹;手部轨迹识别模块,用于对所述手部轨迹进行识别,得到手势识别结果。According to a third aspect of the present disclosure, there is provided a gesture recognition device, which is configured in a terminal device equipped with a camera, the device includes a processor; wherein the processor is used to execute the following program modules stored in the memory: original image acquisition module , Used to obtain multiple frames of original images collected by the camera; a face image extraction module, used to extract face images from the multiple frames of original images to obtain multiple frames of face images; hand trajectory generation module, It is used to detect the hand key points in each frame of face image, and generate hand trajectories according to the position changes of the hand key points in the multi-frame face image; the hand trajectory recognition module is used to compare The hand trajectory is recognized, and the gesture recognition result is obtained.
根据本公开的第四方面,提供一种手势控制装置,配置于具备摄像头的终端设备,所述装置包括处理器;其中,所述处理器用于执行存储器中存储的以下程序模块:原始图像获取模块,用于当开启手势控制功能时,获取由所述摄像头采集的多帧原始图像;人脸图像提取模块,用于分别从所述多帧原始图像中提取人脸图像,得到多帧人脸图像;手部轨迹生成模块,用于检测每帧人脸图像中的手部关键点,并根据所述手部关键点在所述多帧人脸图像中的位置变化,生成手部轨迹;手部轨迹识别模块,用于对所述手部轨迹进行识别,得到手势识别结果;控制指令执行模块,用于执行所述手势识别结果对应的控制指令。According to a fourth aspect of the present disclosure, there is provided a gesture control device, which is configured in a terminal device equipped with a camera, the device includes a processor; wherein the processor is configured to execute the following program modules stored in the memory: original image acquisition module , Used to obtain multiple frames of original images collected by the camera when the gesture control function is turned on; a face image extraction module, used to extract face images from the multiple frames of original images to obtain multiple frames of face images The hand trajectory generation module is used to detect the hand key points in each frame of face image, and generate the hand trajectory according to the position changes of the hand key points in the multi-frame face image; hand The trajectory recognition module is used to recognize the hand trajectory to obtain the gesture recognition result; the control instruction execution module is used to execute the control instruction corresponding to the gesture recognition result.
根据本公开的第五方面,提供一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现上述第一方面的手势识别方法或第二方面的手势控制方法。According to a fifth aspect of the present disclosure, there is provided a computer-readable storage medium having a computer program stored thereon, and the computer program implements the gesture recognition method of the first aspect or the gesture control method of the second aspect when the computer program is executed by a processor .
根据本公开的第六方面,提供一种终端设备,包括:处理器;存储器,用于存储所述处理器的可执行指令;以及摄像头;其中,所述处理器配置为经由执行所述可执行指令来执行上述第一方面的手势识别方法或第二方面的手势控制方法。According to a sixth aspect of the present disclosure, there is provided a terminal device, including: a processor; a memory for storing executable instructions of the processor; and a camera; wherein the processor is configured to execute the executable Instructions to execute the gesture recognition method of the first aspect or the gesture control method of the second aspect.
本公开的技术方案具有以下有益效果:The technical solution of the present disclosure has the following beneficial effects:
由摄像头采集多帧原始图像,分别提取人脸图像,并从每帧人脸图像中检测手部关键点,再根据手部关键点的位置变化生成手部轨迹,最后对手部轨迹进行识别,得到手势识别结果。由于用户在进行手势操作时,手部一般位于脸部的前方或附近,从原始图像中提取人脸图像以检测手部关键点,相当于对原始图像进行了裁剪,裁减掉了和手势识别无关的区域,从而降低了图像处理的数据量,使系统仅需在人脸图像中进行手势识别,减小了过程耗时,提高了手势识别的实时性,且对硬件的处理性能要求不高,有利于部署在移动终端等轻量化场景中。进一步的,基于实时性较强的手势识别,当用户做出手势操作后,可以立即执行手势识别结果对应的控制指令,从而实现快速的交互响应,改善交互延迟问题,提高用户体验,对于体感游戏等具有较高的实用性。The camera collects multiple frames of original images, extracts the face images separately, and detects the key points of the hand from each frame of the face image, and then generates the hand trajectory according to the position change of the hand key points, and finally recognizes the hand trajectory. Gesture recognition result. Since the user's hand is generally located in front of or near the face when performing gesture operations, extracting the face image from the original image to detect the key points of the hand is equivalent to cropping the original image, and the cropping has nothing to do with gesture recognition. This reduces the amount of image processing data, so that the system only needs to perform gesture recognition in the face image, which reduces the time-consuming process, improves the real-time performance of gesture recognition, and does not require high hardware processing performance. It is conducive to deployment in lightweight scenarios such as mobile terminals. Further, based on the real-time gesture recognition, when the user makes a gesture operation, the control instruction corresponding to the gesture recognition result can be executed immediately, so as to achieve a fast interactive response, improve the interaction delay problem, and improve the user experience. For somatosensory games And so on has a high practicability.
Brief Description of the Drawings
Fig. 1 shows a flowchart of a gesture recognition method in this exemplary embodiment;
Fig. 2 shows a sub-flowchart of a gesture recognition method in this exemplary embodiment;
Fig. 3 shows a schematic flowchart of extracting hand candidate regions in this exemplary embodiment;
Fig. 4 shows a schematic flowchart of gesture recognition in this exemplary embodiment;
Fig. 5 shows a flowchart of a gesture control method in this exemplary embodiment;
Fig. 6 shows a structural block diagram of a gesture recognition apparatus in this exemplary embodiment;
Fig. 7 shows a structural block diagram of another gesture recognition apparatus in this exemplary embodiment;
Fig. 8 shows a structural block diagram of a gesture control apparatus in this exemplary embodiment;
Fig. 9 shows a structural block diagram of another gesture control apparatus in this exemplary embodiment;
Fig. 10 shows a computer-readable storage medium for implementing the above-mentioned methods in this exemplary embodiment;
Fig. 11 shows a terminal device for implementing the above-mentioned methods in this exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art. The described features, structures or characteristics may be combined in one or more embodiments in any suitable manner. In the following description, many specific details are provided to give a full understanding of the embodiments of the present disclosure. However, those skilled in the art will realize that the technical solutions of the present disclosure can be practiced while omitting one or more of the specific details, or that other methods, components, devices, steps and the like can be used. In other cases, well-known technical solutions are not shown or described in detail so as not to obscure aspects of the present disclosure.
In addition, the drawings are only schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the figures denote the same or similar parts, and repeated descriptions of them are omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in the form of software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In related technologies, gesture recognition methods are mostly based on gesture localization and feature extraction in images captured by a camera. As the pixel count of cameras on terminal devices grows and image resolution increases, the amount of data processed during gesture recognition rises and the process takes longer, which degrades the real-time performance of gesture recognition and introduces a noticeable delay in gesture control, resulting in a poor user experience. Such methods also place high demands on hardware processing performance, which hinders their deployment in scenarios such as mobile terminals.
In view of the foregoing problems, exemplary embodiments of the present disclosure provide a gesture recognition method, which can be applied to terminal devices equipped with a camera, such as mobile phones, tablet computers, digital cameras and virtual reality devices. Fig. 1 shows a flow of the gesture recognition method, which may include the following steps S110 to S140:
Step S110: acquire multiple frames of original images collected by the camera.
A gesture is an action, and multiple frames are needed to record it completely. In this exemplary embodiment, when the gesture recognition function is turned on, the camera may collect a fixed number of frames of original images, for example 10 frames or 50 frames; alternatively, a matching infrared sensor or the like may sense whether an object is present in front of the camera. When an object (by default, a hand) is sensed, the camera starts collecting original images, and when the object is sensed to move away, the camera stops collecting, thereby obtaining multiple frames of original images. In an optional implementation, appropriate frame dropping may be performed after the original images are collected, for example keeping one frame out of every three, to reduce the subsequent processing load with little effect on gesture recognition; the specific frame-dropping rate depends on the number of frames collected by the camera and is not limited by the present disclosure.
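The optional frame dropping described above can be sketched in a few lines. This is an illustrative sketch only: the function name is hypothetical, and the keep-one-in-three ratio is just the example given in the text, not a requirement of the method.

```python
def drop_frames(frames, keep_every=3):
    """Keep one frame out of every `keep_every` captured frames to
    reduce the amount of data passed to later processing stages."""
    return frames[::keep_every]

# Example: 10 captured frames reduced to frames 0, 3, 6 and 9.
captured = list(range(10))
kept = drop_frames(captured)
print(kept)  # [0, 3, 6, 9]
```

The slice-based implementation keeps the first frame of each group, so the start and end of the gesture are still represented in the retained frames.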
Step S120: extract face images from the above-mentioned multiple frames of original images respectively to obtain multiple frames of face images.
The face region can be recognized through color and shape detection: for example, a color range and a shape range of the face are preset, and the original image is examined for a local region that satisfies both ranges; such a local region is the face region. Deep learning techniques may also be used, for example detecting the face region with neural networks such as YOLO (You Only Look Once, a real-time object detection framework with several versions including v1, v2 and v3, any of which may be used in the present disclosure), SSD (Single Shot Multibox Detector) or R-CNN (Region-based Convolutional Neural Network, or improved versions such as Fast R-CNN and Faster R-CNN). When a face region is detected, it can be marked with a rectangular box and extracted as a face image. To facilitate subsequent processing, face images may be extracted or sampled according to a preset size (or resolution) so that every frame of face image has the same size (or resolution).
In an optional implementation, a hardware face detection module (Hardware Face Detection, HWFD) can be provided on the terminal device. After the collected multiple frames of original images are input to the HWFD, it outputs the coordinates of the face region; mapping these coordinates onto the original image allows the face image to be extracted.
In an optional implementation, after step S110, the resolution of the collected multiple frames of original images may be adjusted to a preset resolution, and in step S120 the face image extraction may be performed on the resolution-adjusted original images. The preset resolution may be determined by the algorithm used in step S120. For example, if YOLO is used for face detection and its input layer is set to 640*480, the preset resolution may be 640*480; if the terminal's camera has 16 megapixels, the original images it collects have a resolution of 4608*3456, and the system can downsample them to 640*480 images for input to YOLO. The preset resolution is usually lower than the resolution of the original image itself, which is equivalent to compressing the original image, reduces its data volume and helps improve processing efficiency.
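The downsampling step can be illustrated with a toy nearest-neighbour resizer over nested lists. This is a minimal sketch, not the patent's implementation: a real system would use an image-processing library, and the 640*480 target is only the YOLO example mentioned above.

```python
def resize_nearest(image, new_w, new_h):
    """Nearest-neighbour resize of an image stored as a list of rows.

    `image` is a list of `old_h` rows, each a list of `old_w` pixel
    values; the result has `new_h` rows of `new_w` pixels.
    """
    old_h, old_w = len(image), len(image[0])
    return [
        [image[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]

# A 4x4 "image" downsampled to a 2x2 preset resolution.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
small = resize_nearest(img, 2, 2)
print(small)  # [[0, 2], [8, 10]]
```

In practice one would call, say, an optimized resize routine with the preset resolution (e.g. 640*480) before feeding frames to the face detector; the integer-division index mapping above is the same idea in its simplest form.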
Step S130: detect the hand key points in each frame of face image, and generate a hand trajectory according to the position changes of the hand key points across the multiple frames of face images.
The choice of hand key points may depend on the scenario requirements and the image quality. For example, 21 skeleton points may be selected as hand key points, including four joint feature points per finger plus a palm-center feature point; alternatively, only a subset of skeleton points may be selected as needed. For example, when recognizing index-finger gestures, only the joint feature points or the fingertip point of the index finger may be used as hand key points.
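The trajectory-generation idea, together with the fallback described in the claims (when the hand candidate region of the current frame is a null value, the previous frame's key points are reused) and the rasterization into a "hand trajectory bitmap" for later classification, can be sketched as follows. All function names and the 8x8 bitmap size are illustrative assumptions, not details taken from the patent.

```python
def build_trajectory(per_frame_points):
    """Collect one key-point position per frame; when detection fails
    for a frame (None), fall back to the previous frame's position."""
    trajectory, last = [], None
    for pt in per_frame_points:
        if pt is None:
            pt = last          # reuse the previous frame's key point
        if pt is not None:
            trajectory.append(pt)
            last = pt
    return trajectory

def rasterize(trajectory, size=8):
    """Draw normalised (x, y) points in [0, 1) onto a size x size bitmap,
    a toy stand-in for the hand trajectory bitmap fed to a classifier."""
    bitmap = [[0] * size for _ in range(size)]
    for x, y in trajectory:
        bitmap[int(y * size)][int(x * size)] = 1
    return bitmap

points = [(0.1, 0.1), None, (0.3, 0.1), (0.5, 0.1)]  # frame 2 lost the hand
traj = build_trajectory(points)
print(len(traj))  # 4 -- the lost frame reuses (0.1, 0.1)
```

The fallback keeps the trajectory continuous across frames where hand detection momentarily fails, which is exactly the purpose of the null-value rule in the claims.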
在一种可选的实施方式中,手部关键点的检测可以通过形状检测而实现。例如:对人脸图像进行指尖形状检测,检测人脸图像中具有弧形的区域,并将这些区域的弧形与预设的标准指尖弧形进行匹配,匹配度较高的区域的弧形顶部即为指尖点(即手部关键点)。或者对人脸图像进行手指形状检测,将与标准手指形状较为相似的区域确定为手指区域,可以指定手指区域的圆形边界点为手部关键点。或者对人脸图像中的图形进行椭圆拟合,并将所拟合的椭圆的长轴端点作为手部关键点。In an alternative embodiment, the detection of the key points of the hand can be achieved through shape detection. For example: Perform fingertip shape detection on face images, detect areas with arcs in the face image, and match the arcs of these areas with the preset standard fingertip arcs, and the arcs of the areas with higher matching degrees The top of the shape is the fingertip point (that is, the key point of the hand). Or perform finger shape detection on the face image, and determine the area that is more similar to the standard finger shape as the finger area, and the circular boundary points of the finger area can be designated as the key points of the hand. Or perform ellipse fitting on the figure in the face image, and use the long axis end point of the fitted ellipse as the key point of the hand.
In an optional implementation, referring to FIG. 2, hand key point detection may be implemented through the following steps S210 and S220:
Step S210: perform region feature detection on each frame of face image to extract a hand candidate region from each frame;
Step S220: detect hand key points in the hand candidate region.
Region feature detection refers to segmenting many local regions from the face image and extracting and identifying the features of each local region; when a local region containing hand features is detected, that region is taken as the hand candidate region. Then detecting hand key points within the hand candidate region improves the detection accuracy of the hand key points.
Further, step S210 may be implemented through the following steps:
extracting features from the face image through convolutional layers;
processing the extracted features through an RPN (Region Proposal Network) to obtain candidate boxes;
classifying the candidate boxes through a classification layer to obtain the hand candidate region;
optimizing the position and size of the hand candidate region through a regression layer.
The above process may refer to FIG. 3 and can be implemented overall with R-CNN (or Fast R-CNN, Faster R-CNN). After the face image is input, it first passes through convolutional layers (usually followed by pooling layers) to extract image features. The features enter the RPN, which extracts candidate boxes; since a large number of candidate boxes is generally proposed, an NMS (Non-Maximum Suppression) algorithm may also be applied in this process to prune them and obtain more accurate candidate boxes. The candidate boxes extracted at this point cover various categories: besides hand candidate boxes there may be candidate boxes for the nose, mouth, glasses, and other parts. These candidate boxes are input to the classification layer, which classifies each of them, thereby yielding the hand candidate box (i.e., the hand candidate region). The classification layer may use a Softmax (normalized exponential) function or the like to output a probability value for each target category that may exist in the face image; the category with the highest probability is taken as the category of the candidate box. Candidate boxes of non-hand categories may be deleted, keeping only the hand candidate box. Finally, the hand candidate region is input to the regression layer, which fine-tunes its position and size to obtain the coordinate array (x, y, w, h) of the hand candidate region, where x and y represent the position coordinates of the region (usually the coordinates of its top-left corner), and w and h represent its width and height.
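The greedy NMS pruning mentioned above can be sketched as follows, operating on (x, y, w, h) boxes matching the coordinate array described here. The function names and the IoU threshold value are illustrative assumptions, not taken from the disclosure.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    return inter / (aw * ah + bw * bh - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```

For example, two heavily overlapping proposals on the same hand collapse to the higher-scoring one, while a distant proposal survives, which is why NMS yields fewer and more accurate candidate boxes.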
The above R-CNN may be obtained by training on a large number of face image samples. The R-CNN is set to the structure shown in FIG. 3, comprising a base network, convolutional layers (and pooling layers), the RPN, a classification layer, and a regression layer. Labels are obtained by manually annotating hand candidate regions in the images; the network is trained on the image samples and labels, updating the network parameters to obtain a usable R-CNN.
It should be noted that the method of FIG. 2 may be applied to every frame of face image, detecting hand key points in each frame. However, considering that some frames may contain no hand, or that poor image quality may prevent the hand from being detected, in an optional implementation, if the hand candidate region extracted from the current frame of face image is null, the hand key points detected in the previous frame are used as the hand key points of the current frame. A null hand candidate region means that no hand could be detected; in that case the previous frame's hand key points may be copied directly into the current frame. This improves the robustness of the algorithm.
It should be added that if the number of frames with a null hand candidate region reaches a preset threshold, indicating that the hand has gone undetected for too many frames, the previously detected data may be cleared and detection restarted, or an unsuccessful gesture recognition result may be output and corresponding information displayed in the user interface, such as "Gesture recognition failed, please make the gesture again."
Detecting hand key points within the hand candidate region may likewise be implemented with models such as R-CNN: the hand key points are taken as the targets to be detected, and through the extraction and processing of image features, the regions where the targets are located are output, thereby marking the hand key points.
By determining the position of the hand key points in each frame of face image, the change of that position between frames forms the hand trajectory. The hand trajectory may take the form of an array, a vector, a picture, or the like; the present disclosure does not limit this.
Step S140: recognize the hand trajectory to obtain a gesture recognition result.
The hand trajectory reflects the user's gesture action; recognizing it therefore identifies the gesture the user made and yields a gesture recognition result.
In an optional implementation, the hand trajectory generated in step S130 may be matched against preset standard trajectories, which may include waving the hand left and right, shaking a finger left and right, sliding a finger up and down, opening the hand, and so on. If the matching rate between a standard trajectory and the hand trajectory reaches a certain threshold, the hand trajectory is judged to be that standard trajectory, and the gesture represented by that standard trajectory is output as the gesture recognition result.
In an optional implementation, step S140 may be implemented through the following steps:
mapping the hand trajectory into a bitmap to obtain a hand trajectory bitmap;
processing the hand trajectory bitmap with a Bayesian classifier to obtain the gesture recognition result.
The size of the bitmap may be preset, or it may be the same as that of the face image or of the hand candidate region. The hand trajectory is the position change of the hand key points; mapping each frame's position into the bitmap and connecting the positions in order represents the hand trajectory in the bitmap, which is therefore called the hand trajectory bitmap.
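A minimal sketch of mapping a trajectory into such a bitmap, assuming key-point positions have already been normalised to [0, 1]; the grid size and the line-drawing density are illustrative choices, not values from the disclosure.

```python
def trajectory_bitmap(points, size=32):
    """Rasterise a sequence of normalised (x, y) positions into a
    size x size 0/1 grid, connecting consecutive positions in order."""
    grid = [[0] * size for _ in range(size)]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        steps = 2 * size  # dense enough that the segment has no gaps
        for t in range(steps + 1):
            x = x0 + (x1 - x0) * t / steps
            y = y0 + (y1 - y0) * t / steps
            col = min(size - 1, int(x * size))
            row = min(size - 1, int(y * size))
            grid[row][col] = 1
    return grid
```

The resulting fixed-size grid decouples the classifier's input from the camera resolution and frame count, which is what makes a simple classifier practical here.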
A Bayesian classifier selects the optimal category based on known probabilities and misclassification losses, minimizing the classification risk. Refer to the following formulas (the original figure is not reproduced here, so they are reconstructed in standard notation):

R(c_i | x) = Σ_{j=1}^{N} λ_ij · P(c_j | x)

h(x) = argmin_{c_i} R(c_i | x)

where h denotes the Bayesian classifier, x is a sample, λ_ij is the loss incurred when a sample of category c_j is misclassified as c_i, P(c_j | x) is the posterior probability of category c_j given x, R(c_i | x) is the expected loss (conditional risk) of labeling x as c_i, and N is the number of candidate categories. Inputting the hand trajectory bitmap into the Bayesian classifier outputs the gesture recognition result.
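The minimum-risk decision can be illustrated with a small sketch; the loss-matrix values are made up for illustration.

```python
def bayes_decision(posteriors, loss):
    """Return the index i minimising the conditional risk
    R(c_i | x) = sum_j loss[i][j] * P(c_j | x)."""
    risks = [sum(l * p for l, p in zip(row, posteriors)) for row in loss]
    return min(range(len(risks)), key=risks.__getitem__)

# With 0-1 loss, minimum risk reduces to picking the largest posterior.
zero_one = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
chosen = bayes_decision([0.2, 0.5, 0.3], zero_one)
```

With an asymmetric loss matrix the decision can differ from a plain argmax, which is the point of risk-based classification: costly confusions (e.g. triggering the wrong camera command) can be penalised more heavily.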
FIG. 4 shows a schematic flow of the gesture recognition method. As shown, after the camera captures the original images, their resolution may be adjusted to a preset resolution to shrink them; a face image is then extracted from each resolution-adjusted original image via HWFD, so that subsequent processing concentrates on a local region of the original image; the hand candidate region is then detected and extracted from the face image to further narrow the image range; hand key points are detected in the hand candidate region, the hand trajectory is determined from the position changes of the hand key points between frames and mapped into a hand trajectory bitmap; the hand trajectory bitmap is input to the Bayesian classifier, whose processing outputs the gesture recognition result.
In an optional implementation, the above terminal device may include multiple cameras. After the gesture recognition result is obtained, the terminal may switch among the multiple cameras according to the result. For example, when the gesture recognition result is shaking a finger left and right, the terminal device is triggered to switch to the main camera; when it is sliding a finger up and down, the terminal device is triggered to switch to the telephoto camera, and so on. In this way, a user at some distance from the terminal device can conveniently operate it through gestures while facing the camera.
In the gesture recognition method of this exemplary embodiment, the camera collects multiple frames of original images, a face image is extracted from each, hand key points are detected in each frame of face image, a hand trajectory is generated from the position changes of the hand key points, and finally the hand trajectory is recognized to obtain a gesture recognition result. Since the user's hand is generally located in front of or near the face when performing a gesture, extracting the face image from the original image before detecting hand key points is equivalent to cropping the original image and discarding the regions irrelevant to gesture recognition. This reduces the amount of image data to process, so that the system only needs to perform gesture recognition within the face image, which shortens processing time, improves the real-time performance of gesture recognition, and places low demands on hardware processing performance, making the method well suited to deployment in lightweight scenarios such as mobile terminals.
Exemplary embodiments of the present disclosure also provide a gesture control method, which can be applied to a terminal device equipped with a camera. The gesture control method may include:
when the gesture control function is enabled, obtaining a gesture recognition result according to the gesture recognition method of this exemplary embodiment;
executing the control instruction corresponding to the gesture recognition result.
Enabling the gesture control function includes, but is not limited to: the terminal automatically enabling it when a game program with gesture control is launched, or the user choosing to enable it in an interface such as photographing or web browsing. A correspondence between gestures and control instructions may be preset in the program; for example, waving the palm may correspond to a screenshot instruction and sliding a finger downward to a page-turning instruction, so that when the user's gesture is recognized, the corresponding control instruction can be quickly located and executed according to the gesture recognition result. In particular, in the photographing interface the user may be allowed to control photographing through specific gestures: for example, when the user makes a thumbs-up gesture, the terminal device is triggered to automatically press the shutter button. Alternatively, when the terminal device is equipped with multiple cameras, the user may control camera switching through specific gestures: for example, when the user shakes a finger, the terminal device is triggered to switch among the main camera, the telephoto camera, and the wide-angle camera, providing convenience for the user's photographing operations.
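The preset correspondence between gestures and control instructions described above can be sketched as a simple lookup table; all gesture names, instruction names, and the camera cycling order here are illustrative assumptions, not defined by the disclosure.

```python
GESTURE_TO_INSTRUCTION = {
    "wave_palm": "screenshot",
    "slide_finger_down": "page_down",
    "thumbs_up": "press_shutter",
    "shake_finger": "switch_camera",
}

CAMERAS = ["main", "telephoto", "wide_angle"]

def handle_gesture(gesture, current_camera="main"):
    """Look up the instruction for a recognised gesture; for the camera
    switching instruction, cycle to the next camera in the list."""
    instruction = GESTURE_TO_INSTRUCTION.get(gesture)
    if instruction == "switch_camera":
        nxt = CAMERAS[(CAMERAS.index(current_camera) + 1) % len(CAMERAS)]
        return ("switch_camera", nxt)
    return (instruction, current_camera)
```

Keeping the mapping in a table makes the gesture-to-instruction binding easy to extend without touching the recognition pipeline.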
FIG. 5 shows a flow of the gesture control method, which may include the following steps S510 to S550:
Step S510: when the gesture control function is enabled, acquire multiple frames of original images collected by the camera;
Step S520: extract a face image from each of the multiple frames of original images to obtain multiple frames of face images;
Step S530: detect hand key points in each frame of face image, and generate a hand trajectory according to the position changes of the hand key points across the multiple frames of face images;
Step S540: recognize the hand trajectory to obtain a gesture recognition result;
Step S550: execute the control instruction corresponding to the gesture recognition result.
In the gesture control method of this exemplary embodiment, based on highly real-time gesture recognition, the control instruction corresponding to the gesture recognition result can be executed immediately after the user makes a gesture, achieving fast interactive response, alleviating interaction latency, and improving the user experience; this is highly practical for motion-sensing games and the like.
Exemplary embodiments of the present disclosure also provide a gesture recognition apparatus, which can be configured in a terminal device equipped with a camera. As shown in FIG. 6, the gesture recognition apparatus 600 may include a processor 610 and a memory 620, where the memory 620 stores the following program modules:
an original image acquisition module 621, configured to acquire multiple frames of original images collected by the camera;
a face image extraction module 622, configured to extract a face image from each of the multiple frames of original images to obtain multiple frames of face images;
a hand trajectory generation module 623, configured to detect hand key points in each frame of face image and generate a hand trajectory according to the position changes of the hand key points across the multiple frames of face images;
a hand trajectory recognition module 624, configured to recognize the hand trajectory to obtain a gesture recognition result;
the processor 610 is configured to execute the above program modules.
In an optional implementation, the original image acquisition module 621 is configured to:
after acquiring the multiple frames of original images collected by the camera, adjust the resolution of the multiple frames of original images to a preset resolution.
In an optional implementation, the hand trajectory generation module 623 is configured to:
perform region feature detection on each frame of face image to extract a hand candidate region from each frame;
detect hand key points in the hand candidate region.
In an optional implementation, the hand trajectory generation module 623 is configured to:
if the hand candidate region extracted from the current frame of face image is null, use the hand key points detected in the previous frame as the hand key points of the current frame.
In an optional implementation, the hand trajectory generation module 623 is configured to:
extract features from the face image through convolutional layers;
process the extracted features through a region proposal network to obtain candidate boxes;
classify the candidate boxes through a classification layer to obtain the hand candidate region;
optimize the position and size of the hand candidate region through a regression layer.
In an optional implementation, the hand trajectory recognition module 624 is configured to:
map the hand trajectory into a bitmap to obtain a hand trajectory bitmap;
process the hand trajectory bitmap with a Bayesian classifier to obtain the gesture recognition result.
In an optional implementation, the above terminal device includes multiple cameras, and the hand trajectory recognition module 624 is configured to:
after the gesture recognition result is obtained, switch among the multiple cameras according to the gesture recognition result.
Exemplary embodiments of the present disclosure also provide another gesture recognition apparatus, which can be configured in a terminal device equipped with a camera. As shown in FIG. 7, the gesture recognition apparatus 700 may include:
an original image acquisition module 710, configured to acquire multiple frames of original images collected by the camera;
a face image extraction module 720, configured to extract a face image from each of the multiple frames of original images to obtain multiple frames of face images;
a hand trajectory generation module 730, configured to detect hand key points in each frame of face image and generate a hand trajectory according to the position changes of the hand key points across the multiple frames of face images;
a hand trajectory recognition module 740, configured to recognize the hand trajectory to obtain a gesture recognition result.
In an optional implementation, the original image acquisition module 710 is configured to:
after acquiring the multiple frames of original images collected by the camera, adjust the resolution of the multiple frames of original images to a preset resolution.
In an optional implementation, the hand trajectory generation module 730 is configured to:
perform region feature detection on each frame of face image to extract a hand candidate region from each frame;
detect hand key points in the hand candidate region.
In an optional implementation, the hand trajectory generation module 730 is configured to:
if the hand candidate region extracted from the current frame of face image is null, use the hand key points detected in the previous frame as the hand key points of the current frame.
In an optional implementation, the hand trajectory generation module 730 is configured to:
extract features from the face image through convolutional layers;
process the extracted features through a region proposal network to obtain candidate boxes;
classify the candidate boxes through a classification layer to obtain the hand candidate region;
optimize the position and size of the hand candidate region through a regression layer.
In an optional implementation, the hand trajectory recognition module 740 is configured to:
map the hand trajectory into a bitmap to obtain a hand trajectory bitmap;
process the hand trajectory bitmap with a Bayesian classifier to obtain the gesture recognition result.
In an optional implementation, the above terminal device includes multiple cameras, and the hand trajectory recognition module 740 is configured to:
after the gesture recognition result is obtained, switch among the multiple cameras according to the gesture recognition result.
Exemplary embodiments of the present disclosure also provide a gesture control apparatus, which can be configured in a terminal device equipped with a camera. As shown in FIG. 8, the gesture control apparatus 800 may include a processor 810 and a memory 820, where the memory 820 stores the following program modules:
an original image acquisition module 821, configured to acquire multiple frames of original images collected by the camera when the gesture control function is enabled;
a face image extraction module 822, configured to extract a face image from each of the multiple frames of original images to obtain multiple frames of face images;
a hand trajectory generation module 823, configured to detect hand key points in each frame of face image and generate a hand trajectory according to the position changes of the hand key points across the multiple frames of face images;
a hand trajectory recognition module 824, configured to recognize the hand trajectory to obtain a gesture recognition result;
a control instruction execution module 825, configured to execute the control instruction corresponding to the gesture recognition result;
the processor 810 is configured to execute the above program modules.
In an optional implementation, the above control instruction includes a camera switching instruction.
In an optional implementation, the original image acquisition module 821 is configured to:
after acquiring the multiple frames of original images collected by the camera, adjust the resolution of the multiple frames of original images to a preset resolution.
In an optional implementation, the hand trajectory generation module 823 is configured to:
perform region feature detection on each frame of face image to extract a hand candidate region from each frame;
detect hand key points in the hand candidate region.
In an optional implementation, the hand trajectory generation module 823 is configured to:
if the hand candidate region extracted from the current frame of face image is null, use the hand key points detected in the previous frame as the hand key points of the current frame.
In an optional implementation, the hand trajectory generation module 823 is configured to:
extract features from the face image through convolutional layers;
process the extracted features through a region proposal network to obtain candidate boxes;
classify the candidate boxes through a classification layer to obtain the hand candidate region;
optimize the position and size of the hand candidate region through a regression layer.
In an optional implementation, the hand trajectory recognition module 824 is configured to:
map the hand trajectory into a bitmap to obtain a hand trajectory bitmap;
process the hand trajectory bitmap with a Bayesian classifier to obtain the gesture recognition result.
Exemplary embodiments of the present disclosure also provide another gesture control apparatus, which can be configured in a terminal device equipped with a camera. As shown in FIG. 9, the gesture control apparatus 900 may include:
an original image acquisition module 910, configured to acquire multiple frames of original images collected by the camera when the gesture control function is enabled;
a face image extraction module 920, configured to extract a face image from each of the multiple frames of original images to obtain multiple frames of face images;
a hand trajectory generation module 930, configured to detect hand key points in each frame of face image and generate a hand trajectory according to the position changes of the hand key points across the multiple frames of face images;
a hand trajectory recognition module 940, configured to recognize the hand trajectory to obtain a gesture recognition result;
a control instruction execution module 950, configured to execute the control instruction corresponding to the gesture recognition result.
In an optional implementation, the above control instruction includes a camera switching instruction.
In an optional implementation, the original image acquisition module 910 is configured to:
after acquiring the multiple frames of original images collected by the camera, adjust the resolution of the multiple frames of original images to a preset resolution.
In an optional implementation, the hand trajectory generation module 930 is configured to:
perform region feature detection on each frame of face image to extract a hand candidate region from each frame;
detect hand key points in the hand candidate region.
In an optional implementation, the hand trajectory generation module 930 is configured to:
if the hand candidate region extracted from the current frame of face image is null, use the hand key points detected in the previous frame as the hand key points of the current frame.
In an optional implementation, the hand trajectory generation module 930 is configured to:
extract features from the face image through convolutional layers;
process the extracted features through a region proposal network to obtain candidate boxes;
classify the candidate boxes through a classification layer to obtain the hand candidate region;
optimize the position and size of the hand candidate region through a regression layer.
In an optional implementation, the hand trajectory recognition module 940 is configured to:
map the hand trajectory into a bitmap to obtain a hand trajectory bitmap;
process the hand trajectory bitmap with a Bayesian classifier to obtain the gesture recognition result.
In the above gesture recognition apparatus and gesture control apparatus, the specific details of each module have already been described in detail in the implementations of the gesture recognition method and the gesture control method, respectively; for undisclosed details, refer to the corresponding method implementations, which are not repeated here.
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium, which may be implemented in the form of a program product including program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary embodiments of the present disclosure described in the "Exemplary Method" section of this specification.
Referring to FIG. 10, a program product 1000 for implementing the above method according to an exemplary embodiment of the present disclosure is described. It may take the form of a portable compact disc read-only memory (CD-ROM) including program code, and may run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus, or device.
程序产品可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以为但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了可读程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。可读信号介质还可以是可读存储介质以外的任何可读介质,该可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。The computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. The readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于无线、有线、光缆、RF等等,或者上述的任意合适的组合。The program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
可以以一种或多种程序设计语言的任意组合来编写用于执行本公开操作的程序代码,程序设计语言包括面向对象的程序设计语言—诸如Java、C++等,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、作为一个独立的软件包执行、部分在用户计算设备上部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。在涉及远程计算设备的情形中,远程计算设备可以通过任意种类的网络,包括局域网(LAN)或广域网(WAN),连接到用户计算设备,或者,可以连接到外部计算设备(例如利用因特网服务提供商来通过因特网连接)。The program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. In the case involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
本公开的示例性实施方式还提供了一种能够实现上述方法的终端设备,该终端设备可以是手机、平板电脑、数码相机等。下面参照图11来描述根据本公开的这种示例性实施方式的终端设备1100。图11显示的终端设备1100仅仅是一个示例,不应对本公开实施方式的功能和使用范围带来任何限制。Exemplary embodiments of the present disclosure also provide a terminal device capable of implementing the above method; the terminal device may be a mobile phone, a tablet computer, a digital camera, or the like. The terminal device 1100 according to this exemplary embodiment of the present disclosure is described below with reference to FIG. 11. The terminal device 1100 shown in FIG. 11 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
如图11所示,终端设备1100可以以通用计算设备的形式表现。终端设备1100的组件可以包括但不限于:至少一个处理单元1110、至少一个存储单元1120、连接不同系统组件(包括存储单元1120和处理单元1110)的总线1130、显示单元1140和图像采集单元1170,图像采集单元1170包括至少一个摄像头。As shown in FIG. 11, the terminal device 1100 may take the form of a general-purpose computing device. The components of the terminal device 1100 may include, but are not limited to: at least one processing unit 1110, at least one storage unit 1120, a bus 1130 connecting different system components (including the storage unit 1120 and the processing unit 1110), a display unit 1140, and an image acquisition unit 1170, where the image acquisition unit 1170 includes at least one camera.
存储单元1120存储有程序代码,程序代码可以被处理单元1110执行,使得处理单元1110执行本说明书上述“示例性方法”部分中描述的根据本公开各种示例性实施方式的步骤。例如,处理单元1110可以执行图1、图2或图5所示的方法步骤。The storage unit 1120 stores program codes, and the program codes can be executed by the processing unit 1110, so that the processing unit 1110 executes the steps according to various exemplary embodiments of the present disclosure described in the above-mentioned "Exemplary Method" section of this specification. For example, the processing unit 1110 may execute the method steps shown in FIG. 1, FIG. 2 or FIG. 5.
存储单元1120可以包括易失性存储单元形式的可读介质,例如随机存取存储单元(RAM)1121和/或高速缓存存储单元1122,还可以进一步包括只读存储单元(ROM)1123。The storage unit 1120 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 1121 and/or a cache storage unit 1122, and may further include a read-only storage unit (ROM) 1123.
存储单元1120还可以包括具有一组(至少一个)程序模块1125的程序/实用工具1124,这样的程序模块1125包括但不限于:操作系统、一个或者多个应用程序、其它程序模块以及程序数据,这些示例中的每一个或某种组合中可能包括网络环境的实现。The storage unit 1120 may also include a program/utility 1124 having a set of (at least one) program modules 1125. Such program modules 1125 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
总线1130可以为表示几类总线结构中的一种或多种,包括存储单元总线或者存储单元控制器、外围总线、图形加速端口、处理单元或者使用多种总线结构中的任意总线结构的局域总线。The bus 1130 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
终端设备1100也可以与一个或多个外部设备1200(例如键盘、指向设备、蓝牙设备等)通信,还可与一个或者多个使得用户能与该终端设备1100交互的设备通信,和/或与使得该终端设备1100能与一个或多个其它计算设备进行通信的任何设备(例如路由器、调制解调器等等)通信。这种通信可以通过输入/输出(I/O)接口1150进行。并且,终端设备1100还可以通过网络适配器1160与一个或者多个网络(例如局域网(LAN),广域网(WAN)和/或公共网络,例如因特网)通信。如图所示,网络适配器1160通过总线1130与终端设备1100的其它模块通信。应当明白,尽管图中未示出,可以结合终端设备1100使用其它硬件和/或软件模块,包括但不限于:微代码、设备驱动器、冗余处理单元、外部磁盘驱动阵列、RAID系统、磁带驱动器以及数据备份存储系统等。The terminal device 1100 may also communicate with one or more external devices 1200 (such as a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the terminal device 1100, and/or with any device (such as a router, a modem, etc.) that enables the terminal device 1100 to communicate with one or more other computing devices. Such communication may be performed through an input/output (I/O) interface 1150. In addition, the terminal device 1100 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 1160. As shown in the figure, the network adapter 1160 communicates with the other modules of the terminal device 1100 through the bus 1130. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the terminal device 1100, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
通过以上的实施方式的描述,本领域的技术人员易于理解,这里描述的示例实施方式可以通过软件实现,也可以通过软件结合必要的硬件的方式来实现。因此,根据本公开实施方式的技术方案可以以软件产品的形式体现出来,该软件产品可以存储在一个非易失性存储介质(可以是CD-ROM,U盘,移动硬盘等)中或网络上,包括若干指令以使得一台计算设备(可以是个人计算机、服务器、终端装置、或者网络设备等)执行根据本公开示例性实施方式的方法。Through the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal apparatus, a network device, etc.) to execute the method according to the exemplary embodiments of the present disclosure.
本技术领域的技术人员能够理解,本公开的各个方面可以实现为系统、方法或程序产品。因此,本公开的各个方面可以具体实现为以下形式,即:完全的硬件实施方式、完全的软件实施方式(包括固件、微代码等),或硬件和软件方面结合的实施方式,这里可以统称为“电路”、“模块”或“系统”。Those skilled in the art can understand that various aspects of the present disclosure can be implemented as a system, a method, or a program product. Therefore, various aspects of the present disclosure can be specifically implemented in the following forms, namely: complete hardware implementation, complete software implementation (including firmware, microcode, etc.), or a combination of hardware and software implementations, which may be collectively referred to herein as "Circuit", "Module" or "System".
此外,上述附图仅是根据本公开示例性实施方式的方法所包括的处理的示意性说明,而不是限制目的。易于理解,上述附图所示的处理并不表明或限制这些处理的时间顺序。另外,也易于理解,这些处理可以是例如在多个模块中同步或异步执行的。In addition, the above-mentioned drawings are merely schematic illustrations of the processing included in the method according to the exemplary embodiment of the present disclosure, and are not intended for limitation. It is easy to understand that the processing shown in the above drawings does not indicate or limit the time sequence of these processings. In addition, it is easy to understand that these processes can be executed synchronously or asynchronously in multiple modules, for example.
应当注意,尽管在上文详细描述中提及了用于动作执行的设备的若干模块或者单元,但是这种划分并非强制性的。实际上,根据本公开的示例性实施方式,上文描述的两个或更多模块或者单元的特征和功能可以在一个模块或者单元中具体化。反之,上文描述的一个模块或者单元的特征和功能可以进一步划分为由多个模块或者单元来具体化。It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to the exemplary embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.
本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本公开的其他实施方式。本申请旨在涵盖本公开的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施方式仅被视为示例性的,本公开的真正范围和精神由权利要求指出。Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the present disclosure and include common knowledge or customary technical means in the technical field not disclosed by the present disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the claims.
应当理解的是,本公开并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本公开的范围仅由所附的权利要求来限定。It should be understood that the present disclosure is not limited to the precise structure that has been described above and shown in the drawings, and various modifications and changes can be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (20)

  1. 一种手势识别方法,应用于具备摄像头的终端设备,其特征在于,所述方法包括:A gesture recognition method applied to a terminal device equipped with a camera, characterized in that the method includes:
    获取由所述摄像头采集的多帧原始图像;Acquiring multiple frames of original images collected by the camera;
    分别从所述多帧原始图像中提取人脸图像,得到多帧人脸图像;Extracting face images from the multiple frames of original images to obtain multiple frames of face images;
    检测每帧人脸图像中的手部关键点,并根据所述手部关键点在所述多帧人脸图像中的位置变化,生成手部轨迹;Detecting hand key points in each frame of face image, and generating a hand trajectory according to position changes of the hand key points in the multi-frame face image;
    对所述手部轨迹进行识别,得到手势识别结果。The hand trajectory is recognized, and the gesture recognition result is obtained.
  2. 根据权利要求1所述的方法,其特征在于,在获取由所述摄像头采集的多帧原始图像后,所述方法还包括:The method according to claim 1, wherein after acquiring multiple frames of original images collected by the camera, the method further comprises:
    将所述多帧原始图像的分辨率调整为预设分辨率。The resolution of the multiple frames of original images is adjusted to a preset resolution.
  3. 根据权利要求1所述的方法,其特征在于,所述检测每帧人脸图像中的手部关键点,包括:The method according to claim 1, wherein the detecting the key points of the hands in each frame of the face image comprises:
    对所述每帧人脸图像进行区域特征检测,以从所述每帧人脸图像中提取出手部候选区域;Performing region feature detection on each frame of the face image, so as to extract a hand candidate region from each frame of the face image;
    在所述手部候选区域中检测手部关键点。Detecting hand key points in the hand candidate region.
  4. 根据权利要求3所述的方法,其特征在于,所述检测每帧人脸图像中的手部关键点,还包括:The method according to claim 3, wherein the detecting the key points of the hands in each frame of the face image further comprises:
    如果从当前帧人脸图像中提取的手部候选区域为空值,则以上一帧检测的手部关键点作为当前帧的手部关键点。If the hand candidate region extracted from the face image of the current frame is a null value, the hand key points detected in the previous frame are used as the hand key points of the current frame.
  5. 根据权利要求3所述的方法,其特征在于,所述对所述每帧人脸图像进行区域特征检测,以从所述每帧人脸图像中提取出手部候选区域,包括:The method according to claim 3, wherein said performing area feature detection on each frame of face image to extract a hand candidate area from said frame of face image comprises:
    通过卷积层从所述人脸图像中提取特征;Extracting features from the face image through a convolutional layer;
    通过区域生成网络对所提取的特征进行处理,得到候选框;Processing the extracted features through a region proposal network to obtain candidate boxes;
    通过分类层对所述候选框进行分类,得到手部候选区域;Classifying the candidate boxes through a classification layer to obtain a hand candidate region;
    通过回归层优化所述手部候选区域的位置和尺寸。Optimizing the position and size of the hand candidate region through a regression layer.
  6. 根据权利要求1所述的方法,其特征在于,对所述手部轨迹进行识别,得到手势识别结果,包括:The method according to claim 1, wherein the recognizing the hand trajectory to obtain a gesture recognition result comprises:
    将所述手部轨迹映射到位图中,得到手部轨迹位图;Mapping the hand trajectory to a bitmap to obtain a hand trajectory bitmap;
    通过贝叶斯分类器对所述手部轨迹位图进行处理,得到手势识别结果。The hand trajectory bitmap is processed by the Bayesian classifier to obtain the gesture recognition result.
  7. 根据权利要求1所述的方法,其特征在于,所述终端设备包括多个摄像头;在得到手势识别结果后,所述方法还包括:The method according to claim 1, wherein the terminal device comprises multiple cameras; after obtaining a gesture recognition result, the method further comprises:
    根据所述手势识别结果在所述多个摄像头之间进行切换。Switching between the multiple cameras according to the gesture recognition result.
  8. 一种手势控制方法,应用于具备摄像头的终端设备,其特征在于,所述方法包括:A gesture control method applied to a terminal device equipped with a camera, characterized in that the method includes:
    当开启手势控制功能时,根据权利要求1至7任一项所述的方法得到手势识别结果;When the gesture control function is enabled, obtaining a gesture recognition result according to the method of any one of claims 1 to 7;
    执行所述手势识别结果对应的控制指令。Execute the control instruction corresponding to the gesture recognition result.
  9. 根据权利要求8所述的方法,其特征在于,所述控制指令包括摄像头切换指令。The method according to claim 8, wherein the control instruction includes a camera switching instruction.
  10. 一种手势识别装置,配置于具备摄像头的终端设备,其特征在于,所述装置包括处理器;其中,所述处理器用于执行存储器中存储的以下程序模块:A gesture recognition device configured in a terminal device equipped with a camera, wherein the device includes a processor; wherein the processor is used to execute the following program modules stored in the memory:
    原始图像获取模块,用于获取由所述摄像头采集的多帧原始图像;An original image acquisition module for acquiring multiple frames of original images collected by the camera;
    人脸图像提取模块,用于分别从所述多帧原始图像中提取人脸图像,得到多帧人脸图像;The face image extraction module is configured to extract face images from the multiple frames of original images to obtain multiple frames of face images;
    手部轨迹生成模块,用于检测每帧人脸图像中的手部关键点,并根据所述手部关键点在所述多帧人脸图像中的位置变化,生成手部轨迹;The hand trajectory generating module is used to detect the hand key points in each frame of face image, and generate the hand trajectory according to the position changes of the hand key points in the multi-frame face image;
    手部轨迹识别模块,用于对所述手部轨迹进行识别,得到手势识别结果。The hand trajectory recognition module is used to recognize the hand trajectory and obtain the gesture recognition result.
  11. 根据权利要求10所述的装置,其特征在于,所述原始图像获取模块,被配置为:The device according to claim 10, wherein the original image acquisition module is configured to:
    在获取由所述摄像头采集的多帧原始图像后,将所述多帧原始图像的分辨率调整为预设分辨率。After acquiring the multiple frames of original images collected by the camera, the resolution of the multiple frames of original images is adjusted to a preset resolution.
  12. 根据权利要求10所述的装置,其特征在于,所述手部轨迹生成模块,被配置为:The device according to claim 10, wherein the hand trajectory generating module is configured to:
    对所述每帧人脸图像进行区域特征检测,以从所述每帧人脸图像中提取出手部候选区域;Performing region feature detection on each frame of the face image, so as to extract a hand candidate region from each frame of the face image;
    在所述手部候选区域中检测手部关键点。Detecting hand key points in the hand candidate region.
  13. 根据权利要求12所述的装置,其特征在于,所述手部轨迹生成模块,被配置为:The device according to claim 12, wherein the hand trajectory generating module is configured to:
    如果从当前帧人脸图像中提取的手部候选区域为空值,则以上一帧检测的手部关键点作为当前帧的手部关键点。If the hand candidate region extracted from the face image of the current frame is a null value, the hand key points detected in the previous frame are used as the hand key points of the current frame.
  14. 根据权利要求12所述的装置,其特征在于,所述手部轨迹生成模块,被配置为:The device according to claim 12, wherein the hand trajectory generating module is configured to:
    通过卷积层从所述人脸图像中提取特征;Extracting features from the face image through a convolutional layer;
    通过区域生成网络对所提取的特征进行处理,得到候选框;Processing the extracted features through a region proposal network to obtain candidate boxes;
    通过分类层对所述候选框进行分类,得到手部候选区域;Classifying the candidate boxes through a classification layer to obtain a hand candidate region;
    通过回归层优化所述手部候选区域的位置和尺寸。Optimizing the position and size of the hand candidate region through a regression layer.
  15. 根据权利要求10所述的装置,其特征在于,所述手部轨迹识别模块,被配置为:The device according to claim 10, wherein the hand trajectory recognition module is configured to:
    将所述手部轨迹映射到位图中,得到手部轨迹位图;Mapping the hand trajectory to a bitmap to obtain a hand trajectory bitmap;
    通过贝叶斯分类器对所述手部轨迹位图进行处理,得到手势识别结果。The hand trajectory bitmap is processed by the Bayesian classifier to obtain the gesture recognition result.
  16. 根据权利要求10所述的装置,其特征在于,所述终端设备包括多个摄像头;所述手部轨迹识别模块,被配置为:The apparatus according to claim 10, wherein the terminal device comprises a plurality of cameras; and the hand track recognition module is configured to:
    在得到所述手势识别结果后,根据所述手势识别结果在所述多个摄像头之间进行切换。After obtaining the gesture recognition result, switch between the multiple cameras according to the gesture recognition result.
  17. 一种手势控制装置,配置于具备摄像头的终端设备,其特征在于,所述装置包括处理器;其中,所述处理器用于执行存储器中存储的以下程序模块:A gesture control device configured in a terminal device equipped with a camera, wherein the device includes a processor; wherein the processor is used to execute the following program modules stored in the memory:
    原始图像获取模块,用于当开启手势控制功能时,获取由所述摄像头采集的多帧原始图像;An original image acquisition module for acquiring multiple frames of original images collected by the camera when the gesture control function is turned on;
    人脸图像提取模块,用于分别从所述多帧原始图像中提取人脸图像,得到多帧人脸图像;The face image extraction module is configured to extract face images from the multiple frames of original images to obtain multiple frames of face images;
    手部轨迹生成模块,用于检测每帧人脸图像中的手部关键点,并根据所述手部关键点在所述多帧人脸图像中的位置变化,生成手部轨迹;The hand trajectory generating module is used to detect the hand key points in each frame of face image, and generate the hand trajectory according to the position changes of the hand key points in the multi-frame face image;
    手部轨迹识别模块,用于对所述手部轨迹进行识别,得到手势识别结果;The hand trajectory recognition module is used to recognize the hand trajectory to obtain a gesture recognition result;
    控制指令执行模块,用于执行所述手势识别结果对应的控制指令。The control instruction execution module is used to execute the control instruction corresponding to the gesture recognition result.
  18. 根据权利要求17所述的装置,其特征在于,所述控制指令包括摄像头切换指令。The device according to claim 17, wherein the control instruction comprises a camera switching instruction.
  19. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现权利要求1至9任一项所述的方法。A computer-readable storage medium having a computer program stored thereon, wherein the computer program implements the method according to any one of claims 1 to 9 when the computer program is executed by a processor.
  20. 一种终端设备,其特征在于,包括:A terminal device, characterized in that it comprises:
    处理器;processor;
    存储器,用于存储所述处理器的可执行指令;以及A memory for storing executable instructions of the processor; and
    摄像头;a camera;
    其中,所述处理器配置为经由执行所述可执行指令来执行权利要求1至9任一项所述的方法。Wherein, the processor is configured to execute the method according to any one of claims 1 to 9 by executing the executable instructions.
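As a control-flow illustration of the trajectory-building steps in claims 1 and 4 (not the disclosed CNN detector), the per-frame loop below accumulates hand key points into a trajectory and falls back to the previous frame's key points when the candidate region comes back null. The detector argument is a stub standing in for the claimed convolutional pipeline (feature extraction, region proposal network, classification and regression layers); the function name `build_trajectory` and the centroid-per-frame sampling are assumptions made for demonstration.

```python
def build_trajectory(frames, detect_hand_keypoints):
    """Return the hand trajectory across frames (claim 1 step sequence)."""
    trajectory = []
    prev_keypoints = None
    for frame in frames:
        keypoints = detect_hand_keypoints(frame)
        if keypoints is None:           # claim 4: null hand candidate region,
            keypoints = prev_keypoints  # reuse the previous frame's key points
        if keypoints is not None:
            # One trajectory sample per frame, here the centroid of the
            # detected hand key points (an assumed sampling choice).
            cx = sum(p[0] for p in keypoints) / len(keypoints)
            cy = sum(p[1] for p in keypoints) / len(keypoints)
            trajectory.append((cx, cy))
            prev_keypoints = keypoints
    return trajectory


# Stub detector: the frame itself holds the key points; frame 2 simulates a
# failed detection (null candidate region).
frames = [
    [(1.0, 1.0), (2.0, 2.0)],   # frame 0: two key points
    [(2.0, 1.0), (3.0, 2.0)],   # frame 1
    None,                        # frame 2: detection fails
    [(4.0, 1.0), (5.0, 2.0)],   # frame 3
]
traj = build_trajectory(frames, lambda f: f)
print(traj)  # → [(1.5, 1.5), (2.5, 1.5), (2.5, 1.5), (4.5, 1.5)]
```

Note how the failed frame repeats the previous centroid rather than dropping a sample, which keeps the trajectory the same length as the frame sequence and avoids gaps in the bitmap that is later classified.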
PCT/CN2020/133410 2019-12-13 2020-12-02 Gesture recognition method, gesture control method, apparatuses, medium and terminal device WO2021115181A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911284143.9A CN111062312B (en) 2019-12-13 2019-12-13 Gesture recognition method, gesture control device, medium and terminal equipment
CN201911284143.9 2019-12-13

Publications (1)

Publication Number Publication Date
WO2021115181A1 true WO2021115181A1 (en) 2021-06-17

Family

ID=70301548

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/133410 WO2021115181A1 (en) 2019-12-13 2020-12-02 Gesture recognition method, gesture control method, apparatuses, medium and terminal device

Country Status (2)

Country Link
CN (1) CN111062312B (en)
WO (1) WO2021115181A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469017A (en) * 2021-06-29 2021-10-01 北京市商汤科技开发有限公司 Image processing method and device and electronic equipment
CN113808007A (en) * 2021-09-16 2021-12-17 北京百度网讯科技有限公司 Method and device for adjusting virtual face model, electronic equipment and storage medium
CN115097936A (en) * 2022-06-16 2022-09-23 慧之安信息技术股份有限公司 Display screen control method based on gesture action deep learning
CN115565253A (en) * 2022-12-08 2023-01-03 季华实验室 Dynamic gesture real-time recognition method and device, electronic equipment and storage medium
CN115830642A (en) * 2023-02-13 2023-03-21 粤港澳大湾区数字经济研究院(福田) 2D whole body key point labeling method and 3D human body grid labeling method

Families Citing this family (19)

Publication number Priority date Publication date Assignee Title
CN111062312B (en) * 2019-12-13 2023-10-27 RealMe重庆移动通信有限公司 Gesture recognition method, gesture control device, medium and terminal equipment
CN111625102A (en) * 2020-06-03 2020-09-04 上海商汤智能科技有限公司 Building display method and device
CN111757065A (en) * 2020-07-02 2020-10-09 广州博冠智能科技有限公司 Method and device for automatically switching lens, storage medium and monitoring camera
CN114153308B (en) * 2020-09-08 2023-11-21 阿里巴巴集团控股有限公司 Gesture control method, gesture control device, electronic equipment and computer readable medium
CN112100075B (en) * 2020-09-24 2024-03-15 腾讯科技(深圳)有限公司 User interface playback method, device, equipment and storage medium
CN112203015B (en) * 2020-09-28 2022-03-25 北京小米松果电子有限公司 Camera control method, device and medium system
CN112328090B (en) * 2020-11-27 2023-01-31 北京市商汤科技开发有限公司 Gesture recognition method and device, electronic equipment and storage medium
CN112527113A (en) * 2020-12-09 2021-03-19 北京地平线信息技术有限公司 Method and apparatus for training gesture recognition and gesture recognition network, medium, and device
CN112488059B (en) * 2020-12-18 2022-10-04 哈尔滨拓博科技有限公司 Spatial gesture control method based on deep learning model cascade
CN112866064A (en) * 2021-01-04 2021-05-28 欧普照明电器(中山)有限公司 Control method, control system and electronic equipment
CN112965602A (en) * 2021-03-22 2021-06-15 苏州惠显智能科技有限公司 Gesture-based human-computer interaction method and device
CN112965604A (en) * 2021-03-29 2021-06-15 深圳市优必选科技股份有限公司 Gesture recognition method and device, terminal equipment and computer readable storage medium
CN113253837A (en) * 2021-04-01 2021-08-13 作业帮教育科技(北京)有限公司 Air writing method and device, online live broadcast system and computer equipment
CN113058260B (en) * 2021-04-22 2024-02-02 杭州当贝网络科技有限公司 Method, system and storage medium for identifying motion of body feeling based on player image
CN113936338A (en) * 2021-12-15 2022-01-14 北京亮亮视野科技有限公司 Gesture recognition method and device and electronic equipment
CN113934307B (en) * 2021-12-16 2022-03-18 佛山市霖云艾思科技有限公司 Method for starting electronic equipment according to gestures and scenes
CN114265499A (en) * 2021-12-17 2022-04-01 交控科技股份有限公司 Interaction method and system applied to customer service terminal
CN115297263B (en) * 2022-08-24 2023-04-07 广州方图科技有限公司 Automatic photographing control method and system suitable for cube shooting and cube shooting
CN115576417A (en) * 2022-09-27 2023-01-06 广州视琨电子科技有限公司 Interaction control method, device and equipment based on image recognition

Citations (7)

Publication number Priority date Publication date Assignee Title
US20150253864A1 (en) * 2014-03-06 2015-09-10 Avago Technologies General Ip (Singapore) Pte. Ltd. Image Processor Comprising Gesture Recognition System with Finger Detection and Tracking Functionality
CN105045399A (en) * 2015-09-07 2015-11-11 哈尔滨市一舍科技有限公司 Electronic device with 3D camera assembly
CN105824406A (en) * 2015-11-30 2016-08-03 维沃移动通信有限公司 Photographing method and terminal
CN106682585A (en) * 2016-12-02 2017-05-17 南京理工大学 Dynamic gesture identifying method based on kinect 2
CN107239731A (en) * 2017-04-17 2017-10-10 浙江工业大学 A kind of gestures detection and recognition methods based on Faster R CNN
CN107846555A (en) * 2017-11-06 2018-03-27 深圳慧源创新科技有限公司 Automatic shooting method, device, user terminal and computer-readable storage medium based on gesture identification
CN111062312A (en) * 2019-12-13 2020-04-24 RealMe重庆移动通信有限公司 Gesture recognition method, gesture control method, device, medium and terminal device

Family Cites Families (23)

Publication number Priority date Publication date Assignee Title
CN101324922B (en) * 2008-07-30 2012-04-18 北京中星微电子有限公司 Method and apparatus for acquiring fingertip track
IL204436A (en) * 2010-03-11 2016-03-31 Deutsche Telekom Ag System and method for hand gesture recognition for remote control of an internet protocol tv
CN102402680B (en) * 2010-09-13 2014-07-30 株式会社理光 Hand and indication point positioning method and gesture confirming method in man-machine interactive system
CN102467657A (en) * 2010-11-16 2012-05-23 三星电子株式会社 Gesture recognizing system and method
CN102200834B (en) * 2011-05-26 2012-10-31 华南理工大学 Television control-oriented finger-mouse interaction method
KR101302638B1 (en) * 2011-07-08 2013-09-05 더디엔에이 주식회사 Method, terminal, and computer readable recording medium for controlling content by detecting gesture of head and gesture of hand
CN102368290B (en) * 2011-09-02 2012-12-26 华南理工大学 Hand gesture identification method based on finger advanced characteristic
TWI454966B (en) * 2012-04-24 2014-10-01 Wistron Corp Gesture control method and gesture control device
CN102854982B (en) * 2012-08-01 2015-06-24 华平信息技术(南昌)有限公司 Method for recognizing customized gesture tracks
JP5665140B2 (en) * 2012-08-17 2015-02-04 Necソリューションイノベータ株式会社 Input device, input method, and program
CN104407694B (en) * 2014-10-29 2018-02-23 山东大学 The man-machine interaction method and device of a kind of combination face and gesture control
CN104809387B (en) * 2015-03-12 2017-08-29 山东大学 Contactless unlocking method and device based on video image gesture identification
CN104992192A (en) * 2015-05-12 2015-10-21 浙江工商大学 Visual motion tracking telekinetic handwriting system
CN105046199A (en) * 2015-06-17 2015-11-11 吉林纪元时空动漫游戏科技股份有限公司 Finger tip point extraction method based on pixel classifier and ellipse fitting
CN106971130A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of gesture identification method using face as reference
CN107679860A (en) * 2017-08-09 2018-02-09 百度在线网络技术(北京)有限公司 A kind of method, apparatus of user authentication, equipment and computer-readable storage medium
CN108229324B (en) * 2017-11-30 2021-01-26 北京市商汤科技开发有限公司 Gesture tracking method and device, electronic equipment and computer storage medium
CN109190461B (en) * 2018-07-23 2019-04-26 中南民族大学 A kind of dynamic gesture identification method and system based on gesture key point
CN110069126B (en) * 2018-11-16 2023-11-03 北京微播视界科技有限公司 Virtual object control method and device
CN109977791A (en) * 2019-03-04 2019-07-05 山东海博科技信息系统股份有限公司 A kind of hand physiologic information detection method
CN109977906B (en) * 2019-04-04 2021-06-01 睿魔智能科技(深圳)有限公司 Gesture recognition method and system, computer device and storage medium
CN110333785B (en) * 2019-07-11 2022-10-28 Oppo广东移动通信有限公司 Information processing method and device, storage medium and augmented reality equipment
CN110490165B (en) * 2019-08-26 2021-05-25 哈尔滨理工大学 Dynamic gesture tracking method based on convolutional neural network

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
US20150253864A1 (en) * 2014-03-06 2015-09-10 Avago Technologies General Ip (Singapore) Pte. Ltd. Image Processor Comprising Gesture Recognition System with Finger Detection and Tracking Functionality
CN105045399A (en) * 2015-09-07 2015-11-11 哈尔滨市一舍科技有限公司 Electronic device with 3D camera assembly
CN105824406A (en) * 2015-11-30 2016-08-03 维沃移动通信有限公司 Photographing method and terminal
CN106682585A (en) * 2016-12-02 2017-05-17 南京理工大学 Dynamic gesture identifying method based on kinect 2
CN107239731A (en) * 2017-04-17 2017-10-10 浙江工业大学 A kind of gestures detection and recognition methods based on Faster R CNN
CN107846555A (en) * 2017-11-06 2018-03-27 深圳慧源创新科技有限公司 Automatic shooting method, device, user terminal and computer-readable storage medium based on gesture identification
CN111062312A (en) * 2019-12-13 2020-04-24 RealMe重庆移动通信有限公司 Gesture recognition method, gesture control method, device, medium and terminal device

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN113469017A (en) * 2021-06-29 2021-10-01 北京市商汤科技开发有限公司 Image processing method and device and electronic equipment
WO2023273071A1 (en) * 2021-06-29 2023-01-05 北京市商汤科技开发有限公司 Image processing method and apparatus and electronic device
CN113808007A (en) * 2021-09-16 2021-12-17 北京百度网讯科技有限公司 Method and device for adjusting virtual face model, electronic equipment and storage medium
CN115097936A (en) * 2022-06-16 2022-09-23 慧之安信息技术股份有限公司 Display screen control method based on gesture action deep learning
CN115565253A (en) * 2022-12-08 2023-01-03 Ji Hua Laboratory Dynamic gesture real-time recognition method and device, electronic equipment and storage medium
CN115565253B (en) * 2022-12-08 2023-04-18 Ji Hua Laboratory Dynamic gesture real-time recognition method and device, electronic equipment and storage medium
CN115830642A (en) * 2023-02-13 2023-03-21 Guangdong-Hong Kong-Macao Greater Bay Area Digital Economy Research Institute (Futian) 2D whole-body human keypoint labeling method and 3D human mesh labeling method
CN115830642B (en) * 2023-02-13 2024-01-12 Guangdong-Hong Kong-Macao Greater Bay Area Digital Economy Research Institute (Futian) 2D whole-body human keypoint labeling method and 3D human mesh labeling method

Also Published As

Publication number Publication date
CN111062312A (en) 2020-04-24
CN111062312B (en) 2023-10-27

Similar Documents

Publication Publication Date Title
WO2021115181A1 (en) Gesture recognition method, gesture control method, apparatuses, medium and terminal device
US10168794B2 (en) Motion-assisted visual language for human computer interfaces
US10429944B2 (en) System and method for deep learning based hand gesture recognition in first person view
CN109240576B (en) Image processing method and device in game, electronic device and storage medium
JP7073522B2 (en) Methods, devices, devices and computer readable storage media for identifying aerial handwriting
CN110209273B (en) Gesture recognition method, interaction control method, device, medium and electronic equipment
US11430265B2 (en) Video-based human behavior recognition method, apparatus, device and storage medium
WO2019128508A1 (en) Method and apparatus for processing image, storage medium, and electronic device
WO2019120290A1 (en) Dynamic gesture recognition method and device, and gesture interaction control method and device
US9104242B2 (en) Palm gesture recognition method and device as well as human-machine interaction method and apparatus
CN108960163B (en) Gesture recognition method, device, equipment and storage medium
WO2020228643A1 (en) Interactive control method and apparatus, electronic device and storage medium
US10990226B2 (en) Inputting information using a virtual canvas
CN109919077B (en) Gesture recognition method, device, medium and computing equipment
EP4053735A1 (en) Method for structuring pedestrian information, device, apparatus and storage medium
CN114138121B (en) User gesture recognition method, device and system, storage medium and computing equipment
WO2021204037A1 (en) Detection method and apparatus for facial key point, and storage medium and electronic device
CN113014846B (en) Video acquisition control method, electronic equipment and computer readable storage medium
CN111399638A (en) Computer for blind users and assistive control method for a smartphone adapted thereto
JP2001016606A (en) Motion recognition system and recording medium storing motion recognition program
WO2021203368A1 (en) Image processing method and apparatus, electronic device and storage medium
CN108227923A (en) Virtual touch control system and method based on somatosensory technology
WO2023197648A1 (en) Screenshot processing method and apparatus, electronic device, and computer readable medium
CN109241942B (en) Image processing method and device, face recognition equipment and storage medium
WO2020224127A1 (en) Video stream capturing method and apparatus, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 20897997
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 20897997
Country of ref document: EP
Kind code of ref document: A1