CN111368800A - Gesture recognition method and device - Google Patents
Gesture recognition method and device
- Publication number
- CN111368800A (application CN202010227340.3A)
- Authority
- CN
- China
- Prior art keywords
- gesture
- training
- information
- voice
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
Abstract
The invention discloses a gesture recognition method and device, wherein the gesture recognition method comprises the following steps: acquiring a color image, a depth image, an infrared image and human skeleton point information of a gesture made by a user; and inputting the color image, the depth image, the infrared image and the human skeleton point information into a trained gesture recognition model to obtain semantic information of the gesture made by the user. The invention achieves the technical effect of quickly and accurately recognizing the semantic information contained in the user's gesture.
Description
Technical Field
The invention relates to the field of artificial intelligence, in particular to a gesture recognition method and device.
Background
Gestures are one of the most convenient and common modes of communication between people and have long played an important role in human social and productive activity. With the development of artificial intelligence, human-computer interaction has gradually entered every aspect of daily life, and gestures have become increasingly important in the interaction process. Their naturalness and convenience greatly improve the efficiency of human-computer interaction and greatly expand its application scenarios. However, human gestures are inherently complex, and different recognition methods are subject to various environmental interferences, so quickly and accurately recognizing the complex semantic information contained in human gestures has become a key problem in gesture recognition research.
Disclosure of Invention
In order to solve at least one technical problem in the background art, the invention provides a gesture recognition method and device.
In order to achieve the above object, according to an aspect of the present invention, there is provided a gesture recognition method including:
acquiring a color image, a depth image, an infrared image and human skeleton point information of a gesture made by a user;
and inputting the color image, the depth image, the infrared image and the human body skeleton point information into a trained gesture recognition model to obtain semantic information of the gesture made by the user.
Optionally, the trained gesture recognition model is obtained by using a gesture sample labeled with semantic information as training data and training by using a preset machine learning algorithm, wherein the gesture sample includes a color image, a depth image, an infrared image and human skeleton point information of a gesture made by a user.
Optionally, the gesture recognition method further includes:
acquiring a training sample set, wherein the training sample set comprises a plurality of gesture samples marked with semantic information, and the gesture samples comprise color images, depth images, infrared images and human skeleton point information of gestures made by a user;
and performing model training by adopting a preset machine learning algorithm according to the training sample set to obtain a trained gesture recognition model.
Optionally, the machine learning algorithm includes: the CenterNet algorithm.
Optionally, the gesture recognition method further includes:
acquiring collected voice information of a user;
inputting the voice information into a trained voice recognition model to obtain a voice recognition result, wherein the trained voice recognition model is obtained by training on preset voice samples using a Transformer algorithm;
and outputting gesture information corresponding to the voice recognition result.
In order to achieve the above object, according to another aspect of the present invention, there is provided a gesture recognition apparatus including:
the gesture acquisition unit is used for acquiring a color image, a depth image, an infrared image and human skeleton point information of a gesture made by a user;
and the gesture recognition unit is used for inputting the color image, the depth image, the infrared image and the human skeleton point information into a trained gesture recognition model to obtain semantic information of the gesture made by the user.
Optionally, the trained gesture recognition model is obtained by using a gesture sample labeled with semantic information as training data and training by using a preset machine learning algorithm, wherein the gesture sample includes a color image, a depth image, an infrared image and human skeleton point information of a gesture made by a user.
Optionally, the gesture recognition apparatus further includes:
the training sample set acquisition unit is used for acquiring a training sample set, wherein the training sample set comprises a plurality of gesture samples marked with semantic information, and the gesture samples comprise color images, depth images, infrared images and human skeleton point information of gestures made by a user;
and the model training unit is used for performing model training by adopting a preset machine learning algorithm according to the training sample set to obtain a trained gesture recognition model.
Optionally, the machine learning algorithm includes: the CenterNet algorithm.
Optionally, the gesture recognition apparatus further includes:
the voice information acquisition unit is used for acquiring the acquired voice information of the user;
the voice recognition unit is used for inputting the voice information into a trained voice recognition model to obtain a voice recognition result, wherein the trained voice recognition model is obtained by training on preset voice samples using a Transformer algorithm;
and the gesture output unit is used for outputting gesture information corresponding to the voice recognition result.
In order to achieve the above object, according to another aspect of the present invention, there is also provided a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the gesture recognition method when executing the computer program.
In order to achieve the above object, according to another aspect of the present invention, there is also provided a computer-readable storage medium storing a computer program which, when executed in a computer processor, implements the steps in the gesture recognition method described above.
The invention has the following beneficial effects: the gesture recognition model is trained on the color image, depth image, infrared image and human skeleton point information captured while the user makes a static gesture, and the trained model is then used to recognize the semantic information corresponding to the user's gesture, achieving the technical effect of quickly and accurately recognizing the semantic information contained in the user's gesture.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts. In the drawings:
FIG. 1 is a flow chart of a gesture recognition method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a training process of a gesture recognition model according to an embodiment of the present invention;
FIG. 3 is a flow chart of speech translation to gestures in accordance with an embodiment of the present invention;
FIG. 4 is a first block diagram of a gesture recognition apparatus according to an embodiment of the present invention;
FIG. 5 is a second block diagram of a gesture recognition apparatus according to an embodiment of the present invention;
FIG. 6 is a third block diagram of a gesture recognition apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a computer apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present invention and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention provides a Kinect-based gesture recognition method that realizes translation from static sign language to voice and from voice to the corresponding sign language, effectively improving the accuracy of static sign language recognition. With a computer and a Kinect camera, the invention can realize both functions: converting sign language into voice and converting voice into sign language.
Fig. 1 is a flowchart of a gesture recognition method according to an embodiment of the present invention, and as shown in fig. 1, the gesture recognition method according to the embodiment includes steps S101 to S102.
And step S101, acquiring a color image, a depth image, an infrared image and human skeleton point information of a gesture made by a user.
In an optional embodiment of the invention, this step can be completed by a Kinect camera, which acquires gesture images of the user. The Kinect has a built-in color camera, depth camera, infrared camera and microphone array; the color, depth and infrared cameras respectively acquire the color, depth and infrared image information of the gesture action. In addition, the Kinect camera can be used to collect and generate the user's human skeleton point information.
During collection, the user stands in front of the Kinect camera and makes a gesture. After the Kinect camera captures the color image, depth image, infrared image and human skeleton point information of the gesture, the captured images are preprocessed with geometric transformation and image enhancement by an image preprocessing algorithm, reducing the number of low-quality images.
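The patent does not name the specific preprocessing operations, so the following is only a minimal Python sketch of this step under stated assumptions: the geometric transformation is taken to be a resize to a fixed input size, the image enhancement is taken to be CLAHE on the luminance channel, and low-quality (blurred) frames are screened with a variance-of-Laplacian test; capture of the frames from the Kinect itself is omitted.

```python
import cv2

def preprocess_frame(image, size=(512, 512)):
    """Geometric transformation + image enhancement (assumed choices)."""
    # geometric transformation: resize to a fixed model input size
    image = cv2.resize(image, size, interpolation=cv2.INTER_LINEAR)
    # image enhancement (assumption: CLAHE on the luminance channel)
    if image.ndim == 3:
        lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
        l, a, b = cv2.split(lab)
        l = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l)
        image = cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)
    return image

def is_low_quality(image, blur_threshold=100.0):
    """Flag blurred frames via the variance of the Laplacian (assumed test)."""
    gray = image if image.ndim == 2 else cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() < blur_threshold
```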
And S102, inputting the color image, the depth image, the infrared image and the human skeleton point information into a trained gesture recognition model to obtain semantic information of the gesture made by the user.
In an optional embodiment of the present invention, the trained gesture recognition model is obtained by using gesture samples labeled with semantic information as training data and training on them with a preset machine learning algorithm, wherein each gesture sample includes a color image, a depth image, an infrared image and human skeleton point information of a gesture made by a user.
In an alternative embodiment of the present invention, the semantic information of the gesture made by the user may be represented as semantic text or as a preset number. In an optional embodiment of the present invention, after the semantic information of the gesture is obtained, this step may further determine the voice information corresponding to the semantic information according to a preset corresponding relationship and play that voice information, thereby realizing the conversion from gesture to voice.
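A minimal sketch of the gesture-to-voice conversion just described, assuming such a preset correspondence table; the labels, file paths and the play_wav playback helper are hypothetical placeholders, since the patent only states that a preset corresponding relationship exists:

```python
# Hypothetical mapping from recognized semantic labels to prerecorded
# speech clips; labels and paths are illustrative, not from the patent.
SEMANTIC_TO_AUDIO = {
    "hello": "audio/hello.wav",
    "thank_you": "audio/thank_you.wav",
}

def gesture_to_voice(semantic_label, play_wav):
    """Look up and play the preset voice clip for a recognized gesture.

    play_wav is an injected playback callable (e.g. a thin wrapper around
    the platform's audio API); the patent does not specify the player.
    """
    path = SEMANTIC_TO_AUDIO.get(semantic_label)
    if path is None:
        raise KeyError(f"no preset voice for gesture {semantic_label!r}")
    play_wav(path)
```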
From the above description, it can be seen that the gesture recognition model is trained through the color image, the depth image, the infrared image and the human body skeleton point information when the user makes the static gesture, and then the semantic information corresponding to the user gesture is recognized according to the trained gesture recognition model, so that the technical effect of quickly and accurately recognizing the semantic information contained in the user gesture is realized.
Fig. 2 is a training flowchart of the gesture recognition model according to the embodiment of the present invention, and as shown in fig. 2, the specific training process of the gesture recognition model of step S102 includes steps S201 to S202.
Step S201, a training sample set is obtained, wherein the training sample set comprises a plurality of gesture samples marked with semantic information, and the gesture samples comprise color images, depth images, infrared images and human skeleton point information of gestures made by a user.
And S202, performing model training by adopting a preset machine learning algorithm according to the training sample set to obtain a trained gesture recognition model.
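The patent lists the four modalities of a gesture sample but does not specify how they are fused before entering the model; the sketch below shows one plausible scheme, assumed purely for illustration, in which the three images become aligned channels and the skeleton points are rasterized into one extra channel.

```python
import cv2
import numpy as np

def build_model_input(color, depth, infrared, skeleton_pts, size=(512, 512)):
    """Stack the four modalities into one training tensor (assumed fusion)."""
    color = cv2.resize(color, size).astype(np.float32) / 255.0   # (H, W, 3)
    depth = cv2.resize(depth.astype(np.float32), size)           # (H, W)
    ir = cv2.resize(infrared.astype(np.float32), size)           # (H, W)
    skel = np.zeros(size[::-1], dtype=np.float32)                # (H, W)
    for x, y in skeleton_pts:  # skeleton points assumed normalized to [0, 1]
        cv2.circle(skel, (int(x * size[0]), int(y * size[1])), 3, 1.0, -1)
    return np.dstack([color,
                      depth / (depth.max() + 1e-6),
                      ir / (ir.max() + 1e-6),
                      skel])                                     # (H, W, 6)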
In an alternative embodiment of the present invention, the machine learning algorithm may be any of various existing machine learning algorithms. Preferably, the CenterNet algorithm is adopted.
The CenterNet algorithm is a well-performing one-stage object detection algorithm that detects objects using keypoint triplets.
The CenterNet model derives center heatmaps and corner heatmaps through center pooling and cascade corner pooling, respectively, and uses them to predict the locations of keypoints.
Center pooling: the center of an object does not necessarily carry strong semantic information that is easily distinguished from other classes, so center pooling is used to enrich the center point features. Center pooling extracts the maximum responses in the horizontal and vertical directions through the center point and adds them, providing information beyond the center point's own location. This gives the center point a chance to capture semantic information that is more easily distinguished from other categories.
Cascade corner pooling: corner points generally lie outside the object, and their locations carry no semantic information about the associated object, which makes corner detection difficult. Cascade corner pooling first extracts the maximum along the object boundary, then continues extracting maxima inward starting from the location of that boundary maximum, and adds the two together, thereby providing the corner features with richer semantic information about the associated object.
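Both pooling operations reduce to running maxima over the feature map. The numpy sketch below follows the CenterNet paper cited in the citations section (Duan et al.) but is deliberately simplified: the learned convolutions that sit between the pooling stages in the real network are omitted.

```python
import numpy as np

def left_pool(f):
    # out[i, j] = max(f[i, j:]) -- running max scanned right-to-left
    return np.flip(np.maximum.accumulate(np.flip(f, axis=1), axis=1), axis=1)

def top_pool(f):
    # out[i, j] = max(f[i:, j]) -- running max scanned bottom-to-top
    return np.flip(np.maximum.accumulate(np.flip(f, axis=0), axis=0), axis=0)

def center_pool(f):
    # enrich each location with the max response along its row and column
    return f.max(axis=1, keepdims=True) + f.max(axis=0, keepdims=True)

def cascade_top_left_pool(f):
    # scan along the boundary first (left pool), then continue inward
    # (top pool), accumulating the boundary and interior maxima
    return top_pool(f + left_pool(f))
```

The bottom-right corner branch uses the mirrored right and bottom pools in the same cascade pattern.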
After the locations and classes of the corner points are obtained, the corner locations are mapped through offsets to the corresponding positions in the output image, and embeddings are used to decide which two corner points belong to the same object, forming a detection box. As mentioned above, the lack of information from inside the target region during this pairing results in a large number of false detections. To solve this problem, the CenterNet algorithm predicts not only the corner points but also the center points. A central region is defined for each predicted box, and the algorithm checks whether the central region of each target box contains a detected center point: if so, the box is retained and its confidence is taken as the average of the confidences of the center point, the top-left corner point and the bottom-right corner point; if not, the box is removed. In this way the network gains the ability to perceive information inside the target region and can effectively remove incorrect target boxes.
A central region that is too small leaves many small-scale false target boxes unremoved, while one that is too large leaves many large-scale false target boxes unremoved, so the CenterNet algorithm uses a scale-adjustable definition of the central region, which can be stated as follows:
the method may define a relatively small central region when the dimensions of the prediction box are large and predict a relatively large central region when the dimensions of the prediction box are small.
Therefore, the gesture recognition model trained by the CenterNet algorithm has high recognition accuracy and recognition efficiency.
The invention can also convert voice into sign language by means of the computer and the Kinect camera. Fig. 3 is a flowchart of converting voice into gestures according to an embodiment of the present invention; as shown in fig. 3, the flow includes steps S301 to S303.
Step S301, acquiring the collected voice information of the user.
In an optional embodiment of the invention, this step can collect the user's voice through the microphone array of the Kinect camera to obtain the user's voice information.
In an optional embodiment of the invention, after the user's voice information is collected, it is filtered and otherwise preprocessed to improve its quality.
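The patent does not specify the filtering, so the sketch below assumes two conventional speech front-end steps, DC removal with pre-emphasis and peak normalization, purely to illustrate this preprocessing stage:

```python
import numpy as np

def preprocess_audio(samples, pre_emphasis=0.97):
    """Illustrative cleanup of the captured waveform (assumed steps)."""
    x = samples.astype(np.float32)
    x = x - x.mean()                                    # remove DC offset
    x = np.append(x[0], x[1:] - pre_emphasis * x[:-1])  # pre-emphasis filter
    peak = np.abs(x).max()
    return x / peak if peak > 0 else x                  # peak normalization
```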
Step S302, inputting the voice information into a trained voice recognition model to obtain a voice recognition result, wherein the trained voice recognition model is obtained by training on preset voice samples using a Transformer algorithm.
The Transformer model overcomes the slow training for which RNNs are most criticized, using a self-attention mechanism to achieve fast parallel computation. A residual structure is also added to the Transformer, allowing the network to be made very deep, which fully exploits the capability of DNN models and improves the model's recognition accuracy.
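For reference, the self-attention mechanism mentioned above reduces to a few matrix products, which is what lets the Transformer process a whole sequence in parallel where an RNN must step through it; below is a minimal numpy sketch of single-head scaled dot-product attention, with the projection matrices assumed given:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_k) projections.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # all pairs at once
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # (seq_len, d_k)
```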
In an optional embodiment of the present invention, the voice recognition result may be semantic information corresponding to the voice information, and the semantic information may be represented in the form of semantic characters or preset numbers.
And step S303, outputting gesture information corresponding to the voice recognition result.
In an optional embodiment of the present invention, this step determines the sign language video and/or text information corresponding to the voice recognition result and plays or displays it.
The above embodiments show that the invention realizes a method for inter-translating static gestures and voice. Multiple kinds of image information captured while a person makes a static gesture can be collected with the Kinect camera and supplemented with the human skeleton information obtained by the Kinect, and the static sign language recognition function is realized with the CenterNet object detection algorithm. Compared with common computer-vision-based sign language recognition methods, this reduces the influence of a noisy background on the recognition effect, makes full use of multi-dimensional information, and improves recognition accuracy and stability. Meanwhile, the invention realizes the voice recognition function using the Kinect's built-in microphone and a voice recognition algorithm.
Therefore, compared with common sign language recognition technologies, the Kinect-based gesture recognition method has higher accuracy and better reduces the influence of complex backgrounds and varying illumination intensities; moreover, it realizes bidirectional translation between sign language and voice, offering stronger functionality.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
Based on the same inventive concept, an embodiment of the present invention further provides a gesture recognition apparatus, which can be used to implement the gesture recognition method described in the foregoing embodiment, as described in the following embodiments. Because the principle of the gesture recognition apparatus for solving the problem is similar to that of the gesture recognition method, the embodiment of the gesture recognition apparatus can be referred to the embodiment of the gesture recognition method, and repeated details are not repeated. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 4 is a first structural block diagram of a gesture recognition apparatus according to an embodiment of the present invention, and as shown in fig. 4, the gesture recognition apparatus according to the embodiment of the present invention includes: gesture collection unit 1 and gesture recognition unit 2.
The gesture collection unit 1 is used for obtaining a color image, a depth image, an infrared image and human body skeleton point information of a gesture made by a user.
And the gesture recognition unit 2 is used for inputting the color image, the depth image, the infrared image and the human skeleton point information into a trained gesture recognition model to obtain semantic information of the gesture made by the user.
In an optional embodiment of the present invention, the trained gesture recognition model is obtained by using gesture samples labeled with semantic information as training data and training on them with a preset machine learning algorithm, wherein each gesture sample includes a color image, a depth image, an infrared image and human skeleton point information of a gesture made by a user.
Fig. 5 is a second structural block diagram of the gesture recognition apparatus according to the embodiment of the present invention, and as shown in fig. 5, the gesture recognition apparatus according to the embodiment of the present invention further includes: a training sample set acquisition unit 3 and a model training unit 4.
The training sample set obtaining unit 3 is configured to obtain a training sample set, where the training sample set includes a plurality of gesture samples labeled with semantic information, and the gesture samples include color images, depth images, infrared images, and human skeleton point information of gestures made by a user.
And the model training unit 4 is used for performing model training by adopting a preset machine learning algorithm according to the training sample set to obtain a trained gesture recognition model.
In an alternative embodiment of the invention, the machine learning algorithm comprises: the CenterNet algorithm.
Fig. 6 is a block diagram of a third structure of the gesture recognition apparatus according to the embodiment of the present invention, and as shown in fig. 6, the gesture recognition apparatus according to the embodiment of the present invention further includes: a voice information acquisition unit 5, a voice recognition unit 6, and a gesture output unit 7.
And the voice information acquisition unit 5 is used for acquiring the acquired voice information of the user.
And the voice recognition unit 6 is configured to input the voice information into a trained voice recognition model to obtain a voice recognition result, where the trained voice recognition model is obtained by training on preset voice samples using a Transformer algorithm.
And the gesture output unit 7 is used for outputting gesture information corresponding to the voice recognition result.
To achieve the above object, according to another aspect of the present application, there is also provided a computer apparatus. As shown in fig. 7, the computer device comprises a memory, a processor, a communication interface and a communication bus, wherein a computer program that can be run on the processor is stored in the memory, and the steps of the method of the above embodiment are realized when the processor executes the computer program.
The processor may be a Central Processing Unit (CPU). The processor may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs and units, such as the program units corresponding to the method embodiments of the present invention described above. By running the non-transitory software programs, instructions and modules stored in the memory, the processor executes its various functional applications and processes the working data, thereby realizing the method in the above method embodiment.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor, and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be coupled to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more units are stored in the memory and when executed by the processor perform the method of the above embodiments.
The specific details of the computer device may be understood by referring to the corresponding related descriptions and effects in the above embodiments, and are not described herein again.
In order to achieve the above object, according to another aspect of the present application, there is also provided a computer-readable storage medium storing a computer program which, when executed in a computer processor, implements the steps in the gesture recognition method described above. It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the storage medium may also comprise a combination of the above kinds of memory.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (12)
1. A gesture recognition method, comprising:
acquiring a color image, a depth image, an infrared image and human skeleton point information of a gesture made by a user;
and inputting the color image, the depth image, the infrared image and the human body skeleton point information into a trained gesture recognition model to obtain semantic information of the gesture made by the user.
2. The gesture recognition method according to claim 1, wherein the trained gesture recognition model is obtained by using gesture samples labeled with semantic information as training data and training on the training data with a preset machine learning algorithm, wherein each gesture sample comprises a color image, a depth image, an infrared image and human skeleton point information of a gesture made by a user.
3. The gesture recognition method according to claim 1, further comprising:
acquiring a training sample set, wherein the training sample set comprises a plurality of gesture samples marked with semantic information, and the gesture samples comprise color images, depth images, infrared images and human skeleton point information of gestures made by a user;
and performing model training by adopting a preset machine learning algorithm according to the training sample set to obtain a trained gesture recognition model.
4. The gesture recognition method according to claim 2 or 3, wherein the machine learning algorithm comprises: the CenterNet algorithm.
5. The gesture recognition method according to claim 1, further comprising:
acquiring collected voice information of a user;
inputting the voice information into a trained voice recognition model to obtain a voice recognition result, wherein the trained voice recognition model is obtained by training on preset voice samples using a Transformer algorithm;
and outputting gesture information corresponding to the voice recognition result.
6. A gesture recognition apparatus, comprising:
the gesture acquisition unit is used for acquiring a color image, a depth image, an infrared image and human skeleton point information of a gesture made by a user;
and the gesture recognition unit is used for inputting the color image, the depth image, the infrared image and the human skeleton point information into a trained gesture recognition model to obtain semantic information of the gesture made by the user.
7. The gesture recognition device according to claim 6, wherein the trained gesture recognition model is obtained by using gesture samples labeled with semantic information as training data and training on the training data with a preset machine learning algorithm, wherein each gesture sample comprises a color image, a depth image, an infrared image and human skeleton point information of a gesture made by a user.
8. The gesture recognition device of claim 6, further comprising:
the training sample set acquisition unit is used for acquiring a training sample set, wherein the training sample set comprises a plurality of gesture samples marked with semantic information, and the gesture samples comprise color images, depth images, infrared images and human skeleton point information of gestures made by a user;
and the model training unit is used for performing model training by adopting a preset machine learning algorithm according to the training sample set to obtain a trained gesture recognition model.
9. The gesture recognition apparatus according to claim 7 or 8, wherein the machine learning algorithm comprises: the CenterNet algorithm.
10. The gesture recognition device of claim 6, further comprising:
the voice information acquisition unit is used for acquiring the acquired voice information of the user;
the voice recognition unit is used for inputting the voice information into a trained voice recognition model to obtain a voice recognition result, wherein the trained voice recognition model is obtained by training on preset voice samples using a Transformer algorithm;
and the gesture output unit is used for outputting gesture information corresponding to the voice recognition result.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 5 when executing the computer program.
12. A computer-readable storage medium, in which a computer program is stored which, when executed in a computer processor, implements the method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010227340.3A CN111368800B (en) | 2020-03-27 | 2020-03-27 | Gesture recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111368800A true CN111368800A (en) | 2020-07-03 |
CN111368800B CN111368800B (en) | 2023-11-28 |
Family
ID=71212100
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010227340.3A Active CN111368800B (en) | 2020-03-27 | 2020-03-27 | Gesture recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111368800B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130077820A1 (en) * | 2011-09-26 | 2013-03-28 | Microsoft Corporation | Machine learning gesture detection |
CN104598915A (en) * | 2014-01-24 | 2015-05-06 | 深圳奥比中光科技有限公司 | Gesture recognition method and gesture recognition device |
CN107679491A (en) * | 2017-09-29 | 2018-02-09 | 华中师范大学 | A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data |
CN110209273A (en) * | 2019-05-23 | 2019-09-06 | Oppo广东移动通信有限公司 | Gesture identification method, interaction control method, device, medium and electronic equipment |
CN110728191A (en) * | 2019-09-16 | 2020-01-24 | 北京华捷艾米科技有限公司 | Sign language translation method, and MR-based sign language-voice interaction method and system |
Non-Patent Citations (1)
Title |
---|
KAIWEN DUAN et al.: "CenterNet: Keypoint Triplets for Object Detection", pages 1, Retrieved from the Internet <URL:https://arxiv.org/pdf/1904.08189.pdf> *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112750437A (en) * | 2021-01-04 | 2021-05-04 | 欧普照明股份有限公司 | Control method, control device and electronic equipment |
CN113515191A (en) * | 2021-05-12 | 2021-10-19 | 中国工商银行股份有限公司 | Information interaction method and device based on sign language identification and synthesis |
CN115471917A (en) * | 2022-09-29 | 2022-12-13 | 中国电子科技集团公司信息科学研究院 | Gesture detection and recognition system and method |
CN115471917B (en) * | 2022-09-29 | 2024-02-27 | 中国电子科技集团公司信息科学研究院 | Gesture detection and recognition system and method |
Also Published As
Publication number | Publication date |
---|---|
CN111368800B (en) | 2023-11-28 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |