WO2022174605A1 - Gesture recognition method, gesture recognition apparatus, and smart device - Google Patents

Gesture recognition method, gesture recognition apparatus, and smart device

Info

Publication number
WO2022174605A1
WO2022174605A1 PCT/CN2021/124613 CN2021124613W
Authority
WO
WIPO (PCT)
Prior art keywords
gesture
information
target video
key point
gesture recognition
Prior art date
Application number
PCT/CN2021/124613
Other languages
English (en)
Chinese (zh)
Inventor
汤志超
程骏
郭渺辰
钱程浩
邵池
庞建新
Original Assignee
深圳市优必选科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市优必选科技股份有限公司 filed Critical 深圳市优必选科技股份有限公司
Publication of WO2022174605A1 publication Critical patent/WO2022174605A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Definitions

  • the present application belongs to the technical field of gesture recognition, and in particular, relates to a gesture recognition method, a gesture recognition device, a smart device, and a computer-readable storage medium.
  • gesture recognition plays an important role in the field of human-computer interaction.
  • Gesture recognition technology can help people solve problems in corresponding scenarios, such as recognizing the sign language of deaf people and playing guessing games with robots.
  • However, current gesture recognition technology does not offer high recognition accuracy or high robustness.
  • the present application provides a gesture recognition method, a gesture recognition device, a smart device, and a computer-readable storage medium, which can improve the accuracy and robustness of gesture recognition.
  • The present application provides a gesture recognition method, including: acquiring a target video including a gesture; and inputting the target video into a trained gesture recognition model to obtain the category information, positioning frame information, and key point information of the gesture in the target video, where the gesture recognition model is obtained by training with sample gesture images carrying annotation information.
  • The annotation information includes the category information, positioning frame information, and key point information of the gesture in the sample gesture image.
  • the present application provides a gesture recognition device, including:
  • a recognition unit, used to input the target video into the trained gesture recognition model and obtain the category information, positioning frame information, and key point information of the gesture in the target video, where the gesture recognition model is obtained by training with sample gesture images carrying annotation information, and the annotation information includes the category information, positioning frame information, and key point information of the gesture in the sample gesture image.
  • The present application provides a smart device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method of the first aspect when executing the computer program.
  • the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the steps of the method in the first aspect.
  • the present application provides a computer program product, wherein the computer program product includes a computer program, and when the computer program is executed by one or more processors, the steps of the method of the first aspect are implemented.
  • The target video is input into the trained gesture recognition model, and the category information, positioning frame information, and key point information of the gesture in the target video are obtained, where the gesture recognition model is obtained by training with sample gesture images carrying annotation information, and the annotation information includes the category information, positioning frame information, and key point information of the gesture in the sample gesture images.
  • the solution of the present application uses the sample gesture images carrying the annotation information to train the gesture recognition model.
  • the gesture recognition model can implicitly combine the various gesture information for learning, so that the trained gesture recognition model has high accuracy and robustness. It can be understood that, for the beneficial effects of the second aspect to the fifth aspect, reference may be made to the relevant description in the first aspect, which is not repeated here.
  • FIG. 1 is a schematic flowchart of a gesture recognition method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an application environment of the gesture recognition method provided by an embodiment of the present application
  • FIG. 3 is a structural block diagram of a gesture recognition device provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a smart device provided by an embodiment of the present application.
  • a gesture recognition method provided by an embodiment of the present application is described below.
  • the gesture recognition method is applied to a smart device. Referring to Figure 1, the gesture recognition method includes:
  • Step 101 Acquire a target video including gestures.
  • the target video includes gestures, that is, the target video is a video obtained by photographing a human hand by a photographing device.
  • the target video can be a video input in real time through a camera connected to a smart device, or it can be a pre-recorded video, which is not limited here.
  • For example, the user can record in advance, with a mobile phone, the hand that is making the gesture, and then send the captured video to the smart device, which uses the captured video as the target video.
  • The target video includes several frames of images, at least one of which contains a gesture. That is, there are two cases: either every frame of the target video contains a gesture, or some frames of the target video contain a gesture and the remaining frames do not.
  • Step 102 Input the target video into the trained gesture recognition model, and obtain the category information, positioning frame information and key point information of the gesture in the target video.
  • the gesture recognition model is obtained by training sample gesture images.
  • the number of sample gesture images used for training the gesture recognition model should be as large as possible, for example, the number of sample gesture images may be 10,000. Due to the flexibility of the human hand, the number of categories of gestures that the human hand can make is very large, so the gesture recognition model cannot recognize all the categories of gestures that the human hand can make. Based on this, at least one gesture can be selected as the preset gesture based on the application scenario and user requirements, and then sample gesture images including the preset gesture are collected, wherein each sample gesture image includes a preset gesture.
  • For example, nine kinds of gestures can be selected as preset gestures: the palm gesture, stone gesture, scissors gesture, OK gesture, handsome gesture, call gesture, swear gesture, rock gesture, and one gesture.
  • For each sample gesture image, annotation can be performed so that the sample gesture image carries annotation information. The annotation information can include the category information, positioning frame information, and key point information of the gesture in the sample gesture image, where the category information indicates the category of the gesture, the positioning frame information indicates the positioning frame of the gesture (the circumscribing rectangle of the gesture), and the key point information indicates the key points of the gesture (i.e., the 21 skeleton points of a single hand).
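For illustration, a minimal sketch of what one such annotation record might look like is given below; the field names and the corner-coordinate box format are assumptions, not a format specified by this application.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GestureAnnotation:
    """Annotation for one sample gesture image (field names are illustrative)."""
    category: str                                 # e.g. "ok", one of the preset gesture categories
    box: Tuple[float, float, float, float]        # circumscribing rectangle: (x1, y1, x2, y2)
    keypoints: List[Tuple[float, float]]          # 21 skeleton points of a single hand, each (x, y)

# Example record for one sample gesture image (coordinates are placeholders)
annotation = GestureAnnotation(
    category="ok",
    box=(120.0, 80.0, 260.0, 240.0),
    keypoints=[(130.0 + i * 5, 90.0 + i * 6) for i in range(21)],
)
assert len(annotation.keypoints) == 21
```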
  • the gesture recognition model after training can be obtained by training the gesture recognition model through the sample gesture images.
  • That is, the gesture recognition model in the embodiment of the present application is a multi-task model that can complete multiple tasks, including outputting the category information of the gesture, outputting the positioning frame information of the gesture, and outputting the key point information of the gesture.
  • The multi-task model can improve the learning efficiency and quality of each task by learning the connections and differences between different tasks. Therefore, the gesture recognition accuracy of the trained gesture recognition model in the embodiment of the present application is higher than that of traditional gesture recognition models.
  • After the target video is input into the trained gesture recognition model, the gesture recognition model actually performs gesture recognition on each frame of the target video. For each frame of the target video, the gesture recognition model can detect whether the frame contains a gesture; if it does, the model outputs the category information, positioning frame information, and key point information of the gesture in that frame, and if it does not, no information is output.
  • The category information of the gesture in each frame of the target video indicates which of the at least one preset gesture the gesture in that frame belongs to; the positioning frame information of the gesture in each frame indicates the position of the gesture's positioning frame in that frame, for example, the coordinates of the upper-left and lower-right corners of the positioning frame; and the key point information of the gesture in each frame indicates the positions of the gesture's key points in that frame, for example, the coordinates of the key points.
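A rough per-frame inference loop consistent with this description might look like the sketch below; the model call signature and the confidence threshold used to decide whether a frame contains a gesture are assumptions for illustration.

```python
def recognize_video(model, frames, conf_threshold=0.5):
    """Run the gesture recognition model on every frame of the target video.

    Frames in which no gesture is detected produce no output entry.
    """
    results = []
    for idx, frame in enumerate(frames):
        detection = model(frame)  # assumed to return None when no gesture is found
        if detection is None or detection["score"] < conf_threshold:
            continue  # no gesture in this frame: output nothing
        results.append({
            "frame": idx,
            "category": detection["category"],    # which preset gesture it is
            "box": detection["box"],              # (x1, y1, x2, y2) positioning frame
            "keypoints": detection["keypoints"],  # 21 (x, y) hand key points
        })
    return results
```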
  • In some embodiments, before inputting the target video into the trained gesture recognition model, the method further includes: normalizing each frame of the target video to obtain a normalized video.
  • step 102 specifically includes:
  • the normalized video is input into the trained gesture recognition model, and the category information, positioning frame information and key point information of the gesture in the target video are obtained.
  • Specifically, the normalization process may perform mean and variance operations on the pixel values of the three RGB channels in each frame of the target video, so that the pixel values are converted from the range 0 to 255 to the range -1 to 1.
  • each frame image of the target video can meet the requirements of the gesture recognition model for the image format, which facilitates the subsequent use of the gesture recognition model for gesture recognition.
  • The normalized target video is recorded as a normalized video, and the normalized video is input into the trained gesture recognition model, so that the gesture recognition model outputs the category information, positioning frame information, and key point information of the gesture in the target video based on the normalized video.
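As a concrete illustration of this normalization, the sketch below shifts and scales each RGB channel with a per-channel mean and standard deviation of 127.5 so that values in 0 to 255 map to roughly -1 to 1; the exact statistics used in practice are an assumption.

```python
import numpy as np

def normalize_frame(frame: np.ndarray,
                    mean: float = 127.5,
                    std: float = 127.5) -> np.ndarray:
    """Normalize an H x W x 3 RGB frame from [0, 255] to roughly [-1, 1]."""
    frame = frame.astype(np.float32)
    return (frame - mean) / std

def normalize_video(frames):
    """Apply the same normalization to every frame of the target video."""
    return [normalize_frame(f) for f in frames]
```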
  • Since the gesture recognition model is a multi-task model that can complete multiple tasks, the gesture recognition model can be made to include a gesture classification branch, a gesture localization branch, and a key point detection branch, where each branch completes one corresponding task.
  • the gesture classification branch is used to output category information of gestures in the target video.
  • One implementation of the gesture classification branch is to one-hot encode the gesture categories and use a softmax layer to output the probability of each gesture category.
  • a target preset gesture with the highest matching probability with the gesture in the target video can be determined among at least one preset gesture, and the category information of the gesture in the target video can be determined based on the target preset gesture.
  • For example, assume the target video contains an unknown gesture X, and there are three preset gestures A, B, and C. If the matching probability between gesture X and preset gesture A is 14%, the matching probability between gesture X and preset gesture B is 85%, and the matching probability between gesture X and preset gesture C is 1%, then preset gesture B can be determined as the target preset gesture, and the category information indicates that the unknown gesture X is preset gesture B.
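The following sketch illustrates how a classification branch of this kind could turn raw scores for the preset gestures into matching probabilities with softmax and pick the target preset gesture; the English class names and the logit values are made-up illustrations, not identifiers from this application.

```python
import numpy as np

# Illustrative English names for the nine preset gestures listed above
PRESET_GESTURES = ["palm", "stone", "scissors", "ok", "handsome",
                   "call", "swear", "rock", "one"]

def classify(logits: np.ndarray):
    """Convert the branch's raw scores into probabilities and pick the best match."""
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    probs = exp / exp.sum()
    best = int(np.argmax(probs))
    return PRESET_GESTURES[best], probs

# Example: the gesture with the highest matching probability becomes the category
category, probs = classify(np.array([0.2, 0.1, 0.3, 2.5, 0.0, 0.1, 0.2, 0.4, 0.1]))
```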
  • the gesture positioning branch is used to output the positioning frame information of the gesture in the target video.
  • the position of the gesture in the target video can be positioned, and then the positioning frame information of the gesture in the target video can be determined based on the position.
  • the keypoint detection branch is used to output keypoint information of gestures in the target video.
  • One implementation of the key point detection branch is network regression. Through the key point detection branch, the positions of the key points of the gesture in the target video can be detected, and the key point information of the gesture in the target video can then be determined based on those positions.
  • In addition, the gesture recognition model further includes a feature extraction layer (i.e., a backbone network), which may be a deep residual network (ResNet) such as ResNet50, or a lightweight network such as ShuffleNet or MobileNet. Which network to choose as the feature extraction layer can be determined according to the performance of the smart device: for example, if the smart device is a desktop computer with strong performance, ResNet50 can be selected as the feature extraction layer; if the smart device is a mobile phone with weak performance, MobileNet can be selected as the feature extraction layer.
  • the feature extraction layer can perform feature extraction on the target video to obtain the feature information of the target video.
  • After the feature information of the target video is obtained through the feature extraction layer, the feature information is input into the gesture classification branch, the gesture localization branch, and the key point detection branch respectively.
  • The gesture classification branch can output the category information of the gesture in the target video based on the feature information, the gesture localization branch can output the positioning frame information of the gesture in the target video based on the feature information, and the key point detection branch can output the key point information of the gesture in the target video based on the feature information.
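To make the structure concrete, below is a minimal PyTorch-style sketch of a backbone plus the three branches described above; the torchvision backbones, the single fully connected layer per branch, and the output shapes are assumptions for illustration, not the architecture mandated by this application.

```python
import torch.nn as nn
import torchvision.models as tvm

class GestureRecognitionModel(nn.Module):
    """Feature extraction backbone plus three task-specific branches (illustrative)."""

    def __init__(self, num_classes: int = 9, backbone: str = "resnet50"):
        super().__init__()
        if backbone == "resnet50":                 # stronger devices (e.g. desktop computer)
            net = tvm.resnet50(weights=None)       # no pretrained weights for this sketch
            feat_dim = net.fc.in_features
            net.fc = nn.Identity()
        else:                                      # weaker devices (e.g. mobile phone)
            net = tvm.mobilenet_v2(weights=None)
            feat_dim = net.classifier[1].in_features
            net.classifier = nn.Identity()
        self.backbone = net
        self.cls_head = nn.Linear(feat_dim, num_classes)  # gesture classification branch
        self.box_head = nn.Linear(feat_dim, 4)            # gesture localization branch (x1, y1, x2, y2)
        self.kpt_head = nn.Linear(feat_dim, 21 * 2)       # key point detection branch (21 x-y pairs)

    def forward(self, x):
        feats = self.backbone(x)
        return {
            "cls": self.cls_head(feats),
            "box": self.box_head(feats),
            "kpt": self.kpt_head(feats).view(-1, 21, 2),
        }

# model = GestureRecognitionModel(backbone="mobilenet_v2")  # backbone chosen by device capability
```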
  • the gesture classification branch, the gesture localization branch and the keypoint detection branch can be obtained by training with different loss functions respectively.
  • For example, the cross-entropy loss function can be used to guide the training of the gesture classification branch, the GIoU loss function can be used to guide the training of the gesture localization branch, and the Wing loss function can be used to guide the training of the key point detection branch. Since different branches are trained with different loss functions, the accuracy of each trained branch can be higher.
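As an illustration of how these three losses could be combined during training, the sketch below uses cross-entropy for the classification branch, a GIoU loss for the localization branch, and Wing loss for the key point branch; the loss weights and the Wing loss parameters (w, epsilon) are assumed values.

```python
import math
import torch
import torch.nn.functional as F

def giou_loss(pred, target, eps=1e-7):
    """GIoU loss for (x1, y1, x2, y2) boxes: 1 - GIoU."""
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)
    # smallest enclosing box of prediction and target
    cx1 = torch.min(pred[:, 0], target[:, 0]); cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2]); cy2 = torch.max(pred[:, 3], target[:, 3])
    enclose = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (enclose - union) / (enclose + eps)
    return (1.0 - giou).mean()

def wing_loss(pred, target, w=10.0, eps=2.0):
    """Wing loss for key point regression: log region near zero, linear region elsewhere."""
    diff = (pred - target).abs()
    c = w - w * math.log(1.0 + w / eps)
    loss = torch.where(diff < w, w * torch.log(1.0 + diff / eps), diff - c)
    return loss.mean()

def multitask_loss(outputs, targets, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three branch losses (weights are illustrative)."""
    l_cls = F.cross_entropy(outputs["cls"], targets["label"])
    l_box = giou_loss(outputs["box"], targets["box"])
    l_kpt = wing_loss(outputs["kpt"], targets["kpt"])
    return weights[0] * l_cls + weights[1] * l_box + weights[2] * l_kpt
```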
  • In some embodiments, the sample gesture images can also be enhanced, and the enhanced sample gesture images are then used to train the gesture recognition model, so that the sample gesture images are more generalized, which helps improve the recognition accuracy of the gesture recognition model.
  • the enhancement processing may include flipping and rotation, etc.
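A minimal sketch of such enhancement (augmentation) is shown below; note that when an image is flipped or rotated, the positioning frame and key point annotations generally have to be transformed in the same way, which the sketch illustrates only for horizontal flipping. OpenCV is assumed to be available, and the helper names are illustrative.

```python
import cv2

def hflip_sample(image, box, keypoints):
    """Horizontally flip a sample gesture image together with its annotations."""
    h, w = image.shape[:2]
    flipped = cv2.flip(image, 1)                       # 1 = horizontal flip
    x1, y1, x2, y2 = box
    flipped_box = (w - x2, y1, w - x1, y2)             # mirror the positioning frame
    flipped_kpts = [(w - x, y) for x, y in keypoints]  # mirror the 21 key points
    return flipped, flipped_box, flipped_kpts

def rotate_image(image, angle_deg):
    """Rotate the image around its centre (annotation transform omitted for brevity)."""
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(image, m, (w, h))
```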
  • In some embodiments, after step 102, the method further includes: marking the category, positioning frame, and key points of the gesture in the target video based on the category information, positioning frame information, and key point information of the gesture in the target video; and outputting the target video marked with the category, positioning frame, and key points of the gesture.
  • After the gesture recognition model outputs the category information, positioning frame information, and key point information of the gesture in the target video, the category, positioning frame, and key points of the gesture can be marked in the target video based on this information, and a target video marked with the category, positioning frame, and key points of the gesture is then output and shown to the user.
  • In this way, users can see the marked gesture categories, positioning frames, and key points, giving users a more intuitive and visually striking experience.
  • Specifically, for each frame of gesture image in the target video, the category of the gesture may be marked in the gesture image based on the category information of the gesture in the gesture image, the positioning frame of the gesture may be marked in the gesture image based on the positioning frame information of the gesture in the gesture image, and the key points of the gesture may be marked in the gesture image based on the key point information of the gesture in the gesture image.
  • the gesture image refers to an image containing gestures. It can be understood that, for the non-gesture images in the target video, no marking operation will be performed, wherein the non-gesture images refer to images that do not contain gestures.
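As a hedged illustration of this marking step, the following sketch uses standard OpenCV drawing calls to draw the category label, positioning frame, and key points on one gesture image; the colours, font, and label placement are illustrative choices rather than details specified by this application.

```python
import cv2

def mark_gesture(image, category, box, keypoints):
    """Draw the gesture category, positioning frame, and key points on a gesture image."""
    x1, y1, x2, y2 = map(int, box)
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)       # positioning frame
    cv2.putText(image, category, (x1, max(y1 - 5, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)     # gesture category
    for x, y in keypoints:                                         # 21 hand key points
        cv2.circle(image, (int(x), int(y)), 3, (0, 0, 255), -1)
    return image
```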
  • The target video is input into the trained gesture recognition model, and the category information, positioning frame information, and key point information of the gesture in the target video are obtained, where the gesture recognition model is obtained by training with sample gesture images carrying annotation information, and the annotation information includes the category information, positioning frame information, and key point information of the gesture in the sample gesture images.
  • the solution of the present application uses the sample gesture images carrying the annotation information to train the gesture recognition model.
  • the gesture recognition model can implicitly combine the various gesture information for learning, so that the trained gesture recognition model has high accuracy and robustness.
  • an embodiment of the present application provides a gesture recognition device.
  • the gesture recognition device 300 in the embodiment of the present application includes:
  • The identification unit 302 is configured to input the target video into a trained gesture recognition model and obtain the category information, positioning frame information, and key point information of the gesture in the target video, where the gesture recognition model is obtained by training with sample gesture images carrying annotation information, and the annotation information includes the category information, positioning frame information, and key point information of the gesture in the sample gesture images.
  • the above gesture recognition apparatus 300 further includes:
  • a marking unit configured to mark the category, positioning frame and key points of the gesture in the above-mentioned target video based on the category information, positioning frame information and key point information of the gesture in the above-mentioned target video;
  • the output unit is used for outputting the above-mentioned target video marked with the category of the gesture, the positioning frame and the key points.
  • The above-mentioned marking unit is specifically configured to, for each frame of gesture image in the target video, mark the category of the gesture in the gesture image based on the category information of the gesture in the gesture image, mark the positioning frame of the gesture in the gesture image based on the positioning frame information of the gesture in the gesture image, and mark the key points of the gesture in the gesture image based on the key point information of the gesture in the gesture image, where a gesture image is an image containing a gesture.
  • the above gesture recognition model includes a gesture classification branch, a gesture positioning branch and a key point detection branch;
  • the above-mentioned gesture classification branch is used to output the category information of gestures in the above-mentioned target video
  • the above-mentioned gesture positioning branch is used to output the positioning frame information of the gesture in the above-mentioned target video;
  • the above-mentioned key point detection branch is used to output the key point information of the gesture in the above-mentioned target video.
  • the above-mentioned gesture recognition model further includes a feature extraction layer, which is used to perform feature extraction on the above-mentioned target video to obtain feature information;
  • the above-mentioned gesture classification branch is specifically configured to output the category information of gestures in the above-mentioned target video based on the above-mentioned feature information;
  • the above-mentioned gesture positioning branch is specifically configured to output the positioning frame information of the gesture in the above-mentioned target video based on the above-mentioned feature information;
  • the above-mentioned key point detection branch is specifically configured to output the key point information of the gesture in the above-mentioned target video based on the above-mentioned feature information.
  • the above-mentioned gesture classification branch, the above-mentioned gesture localization branch, and the above-mentioned key point detection branch are respectively obtained by training with different loss functions.
  • the above gesture recognition apparatus 300 further includes:
  • a normalization unit which is used to normalize each frame of the above-mentioned target video to obtain a normalized video
  • the above-mentioned recognition unit 302 is specifically configured to input the above-mentioned normalized video into the trained gesture recognition model, and obtain the category information, positioning frame information and key point information of the gesture in the above-mentioned target video.
  • The target video is input into the trained gesture recognition model, and the category information, positioning frame information, and key point information of the gesture in the target video are obtained, where the gesture recognition model is obtained by training with sample gesture images carrying annotation information, and the annotation information includes the category information, positioning frame information, and key point information of the gesture in the sample gesture images.
  • the solution of the present application uses the sample gesture images carrying the annotation information to train the gesture recognition model.
  • the gesture recognition model can implicitly combine the various gesture information for learning, so that the trained gesture recognition model has high accuracy and robustness.
  • the embodiment of the present application also provides a smart device, and the smart device may be a robot, a mobile phone, a desktop computer, or a tablet computer, which is not limited here.
  • The smart device 4 in this embodiment of the present application includes: a memory 401, one or more processors 402 (only one is shown in FIG. 4), a binocular camera 403, and a computer program stored in the memory 401 and executable on the processor.
  • The binocular camera 403 includes a first camera and a second camera. The memory 401 is used to store software programs and units, and the processor 402 executes various functional applications and performs data processing by running the software programs and units stored in the memory 401. Specifically, the processor 402 implements the following steps by running the above-mentioned computer program stored in the memory 401:
  • acquiring a target video including a gesture; and inputting the target video into a trained gesture recognition model to obtain the category information, positioning frame information, and key point information of the gesture in the target video, where the gesture recognition model is obtained by training with sample gesture images carrying annotation information, and the annotation information includes the category information, positioning frame information, and key point information of the gesture in the sample gesture image.
  • the processor 402 also implements the following steps by running the above computer program stored in the memory 401:
  • In some embodiments, marking the category, positioning frame, and key points of the gesture in the target video based on the category information, positioning frame information, and key point information of the gesture in the target video includes:
  • for each frame of gesture image in the target video, marking the category of the gesture in the gesture image based on the category information of the gesture in the gesture image, marking the positioning frame of the gesture in the gesture image based on the positioning frame information of the gesture in the gesture image, and marking the key points of the gesture in the gesture image based on the key point information of the gesture in the gesture image, where a gesture image is an image containing a gesture.
  • the above-mentioned gesture recognition model includes a gesture classification branch, a gesture localization branch, and a key point detection branch;
  • the above-mentioned gesture classification branch is used to output the category information of gestures in the above-mentioned target video
  • the above-mentioned gesture positioning branch is used to output the positioning frame information of the gesture in the above-mentioned target video;
  • the above-mentioned key point detection branch is used to output the key point information of the gesture in the above-mentioned target video.
  • the above-mentioned gesture recognition model further includes a feature extraction layer, which is used to perform feature extraction on the above-mentioned target video to obtain characteristic information;
  • the above-mentioned gesture classification branch is specifically configured to output the category information of gestures in the above-mentioned target video based on the above-mentioned feature information;
  • the above-mentioned gesture positioning branch is specifically configured to output the positioning frame information of the gesture in the above-mentioned target video based on the above-mentioned feature information;
  • the above-mentioned key point detection branch is specifically configured to output the key point information of the gesture in the above-mentioned target video based on the above-mentioned feature information.
  • the gesture classification branch, the gesture localization branch, and the key point detection branch are respectively obtained by training with different loss functions.
  • the processor 402 also implements the following steps by running the above-mentioned computer program stored in the memory 401:
  • the above-mentioned target video is input into the trained gesture recognition model, and the category information, positioning frame information and key point information of the gesture in the above-mentioned target video are obtained, including:
  • the above normalized video is input into the trained gesture recognition model, and the category information, positioning frame information and key point information of the gesture in the target video are obtained.
  • the processor 402 may be a central processing unit (Central Processing Unit, CPU), and the processor may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP) , Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • Memory 401 may include read-only memory and random access memory, and provides instructions and data to processor 402 . Part or all of memory 401 may also include non-volatile random access memory. For example, the memory 401 may also store information of device categories.
  • The target video is input into the trained gesture recognition model, and the category information, positioning frame information, and key point information of the gesture in the target video are obtained, where the gesture recognition model is obtained by training with sample gesture images carrying annotation information, and the annotation information includes the category information, positioning frame information, and key point information of the gesture in the sample gesture images.
  • the solution of the present application uses the sample gesture images carrying the annotation information to train the gesture recognition model.
  • the gesture recognition model can implicitly combine the various gesture information for learning, so that the trained gesture recognition model has high accuracy and robustness.
  • the disclosed apparatus and method may be implemented in other manners.
  • the system embodiments described above are only illustrative.
  • The division of the above-mentioned modules or units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described above as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • When the above-mentioned integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • The present application can implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the foregoing method embodiments can be implemented.
  • The above-mentioned computer program includes computer program code, and the computer program code may be in the form of source code, object code, an executable file, or some intermediate form.
  • The above-mentioned computer-readable storage medium may include: any entity or device capable of carrying the above-mentioned computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer-readable memory, a read-only memory (ROM), a random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Image Analysis (AREA)

Abstract

The present application is applicable to the technical field of gesture recognition, and provides a gesture recognition method, a gesture recognition apparatus, and a smart device. The method includes: acquiring a target video including a gesture; and inputting the target video into a trained gesture recognition model to obtain category information, positioning frame information, and key point information of the gesture in the target video, the gesture recognition model being obtained by training with a sample gesture image carrying annotation information, and the annotation information including category information, positioning frame information, and key point information of a gesture in the sample gesture image. By means of the solution of the present application, the accuracy and robustness of gesture recognition can be improved.
PCT/CN2021/124613 2021-02-21 2021-10-19 Gesture recognition method, gesture recognition apparatus, and smart device WO2022174605A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110194549.9A CN112949437A (zh) 2021-02-21 2021-02-21 Gesture recognition method, gesture recognition apparatus, and smart device
CN202110194549.9 2021-02-21

Publications (1)

Publication Number Publication Date
WO2022174605A1 true WO2022174605A1 (fr) 2022-08-25

Family

ID=76244979

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/124613 WO2022174605A1 (fr) 2021-02-21 2021-10-19 Gesture recognition method, gesture recognition apparatus, and smart device

Country Status (2)

Country Link
CN (1) CN112949437A (fr)
WO (1) WO2022174605A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117893413A (zh) * 2024-03-15 2024-04-16 博创联动科技股份有限公司 Vehicle-mounted terminal human-computer interaction method based on image enhancement

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949437A (zh) * 2021-02-21 2021-06-11 深圳市优必选科技股份有限公司 Gesture recognition method, gesture recognition apparatus, and smart device
CN113407083A (zh) * 2021-06-24 2021-09-17 上海商汤科技开发有限公司 Data annotation method and apparatus, electronic device, and storage medium
CN114155562A (zh) * 2022-02-09 2022-03-08 北京金山数字娱乐科技有限公司 Gesture recognition method and apparatus
CN117409473A (zh) * 2022-07-04 2024-01-16 北京字跳网络技术有限公司 Multi-task prediction method and apparatus, electronic device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2980728A1 (fr) * 2014-08-01 2016-02-03 Imersivo, S.L. Method for identifying a hand gesture
CN108229318A (zh) * 2017-11-28 2018-06-29 北京市商汤科技开发有限公司 Gesture recognition and gesture recognition network training methods and apparatuses, device, and medium
CN110796096A (zh) * 2019-10-30 2020-02-14 北京达佳互联信息技术有限公司 Training method, apparatus, device, and medium for a gesture recognition model
CN111104820A (zh) * 2018-10-25 2020-05-05 中车株洲电力机车研究所有限公司 Gesture recognition method based on deep learning
CN112949437A (zh) * 2021-02-21 2021-06-11 深圳市优必选科技股份有限公司 Gesture recognition method, gesture recognition apparatus, and smart device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229324B (zh) * 2017-11-30 2021-01-26 北京市商汤科技开发有限公司 Gesture tracking method and apparatus, electronic device, and computer storage medium
CN109063653A (zh) * 2018-08-07 2018-12-21 北京字节跳动网络技术有限公司 Image processing method and apparatus
CN109359538B (zh) * 2018-09-14 2020-07-28 广州杰赛科技股份有限公司 Convolutional neural network training method, gesture recognition method, apparatus, and device
CN109657537A (zh) * 2018-11-05 2019-04-19 北京达佳互联信息技术有限公司 Image recognition method, system, and electronic device based on object detection
CN111126339A (зh) * 2019-12-31 2020-05-08 北京奇艺世纪科技有限公司 Gesture recognition method and apparatus, computer device, and storage medium
CN111857356B (zh) * 2020-09-24 2021-01-22 深圳佑驾创新科技有限公司 Method, apparatus, device, and storage medium for recognizing interactive gestures

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2980728A1 (fr) * 2014-08-01 2016-02-03 Imersivo, S.L. Method for identifying a hand gesture
CN108229318A (zh) * 2017-11-28 2018-06-29 北京市商汤科技开发有限公司 Gesture recognition and gesture recognition network training methods and apparatuses, device, and medium
CN111104820A (zh) * 2018-10-25 2020-05-05 中车株洲电力机车研究所有限公司 Gesture recognition method based on deep learning
CN110796096A (zh) * 2019-10-30 2020-02-14 北京达佳互联信息技术有限公司 Training method, apparatus, device, and medium for a gesture recognition model
CN112949437A (zh) * 2021-02-21 2021-06-11 深圳市优必选科技股份有限公司 Gesture recognition method, gesture recognition apparatus, and smart device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117893413A (zh) * 2024-03-15 2024-04-16 博创联动科技股份有限公司 Vehicle-mounted terminal human-computer interaction method based on image enhancement
CN117893413B (zh) * 2024-03-15 2024-06-11 博创联动科技股份有限公司 Vehicle-mounted terminal human-computer interaction method based on image enhancement

Also Published As

Publication number Publication date
CN112949437A (zh) 2021-06-11

Similar Documents

Publication Publication Date Title
WO2022174605A1 (fr) Gesture recognition method, gesture recognition apparatus, and smart device
CN109359538B (zh) Convolutional neural network training method, gesture recognition method, apparatus, and device
WO2020207190A1 (fr) Three-dimensional information determination method, three-dimensional information determination device, and terminal apparatus
CN109902659B (zh) Method and apparatus for processing human body images
WO2020103700A1 (fr) Image recognition method based on micro facial expressions, and related apparatus and device
WO2018028546A1 (fr) Key point positioning method, terminal, and computer storage medium
WO2020024484A1 (fr) Data generation method and device
CN112052186B (zh) Object detection method, apparatus, device, and storage medium
WO2018170663A1 (fr) Image annotation method and device, and electronic apparatus
WO2023010758A1 (fr) Action detection method and apparatus, terminal device, and storage medium
WO2021082692A1 (fr) Pedestrian image labeling method and device, storage medium, and smart apparatus
WO2020029466A1 (fr) Image processing method and apparatus
Vazquez-Fernandez et al. Built-in face recognition for smart photo sharing in mobile devices
CN113128368B (zh) Detection method, apparatus, and system for person interaction relationships
CN113011403B (zh) Gesture recognition method, system, medium, and device
CN111832449A (zh) Engineering drawing display method and related apparatus
US11893773B2 (en) Finger vein comparison method, computer equipment, and storage medium
CN111080670A (zh) Image extraction method, apparatus, device, and storage medium
US20200211413A1 (en) Method, apparatus and terminal device for constructing parts together
WO2023197648A1 (fr) Screenshot processing method and apparatus, electronic device, and computer-readable medium
CN111290684A (zh) Image display method, image display apparatus, and terminal device
AU2021333957A1 (en) Information display method and device, and storage medium
CN109981989B (zh) Image rendering method and apparatus, electronic device, and computer-readable storage medium
KR20230051704A (ko) Text input method and apparatus based on a virtual keyboard
CN111722700A (зh) Human-computer interaction method and human-computer interaction device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21926319

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21926319

Country of ref document: EP

Kind code of ref document: A1