CN110992426B - Gesture recognition method and device, electronic equipment and storage medium

Gesture recognition method and device, electronic equipment and storage medium

Info

Publication number
CN110992426B
CN110992426B
Authority
CN
China
Prior art keywords
matching
gesture
image
target
frame
Prior art date
Legal status
Active
Application number
CN201911252882.XA
Other languages
Chinese (zh)
Other versions
CN110992426A (en)
Inventor
谭志鹏
谭北平
Current Assignee
Tsinghua University
Beijing Mininglamp Software System Co ltd
Original Assignee
Tsinghua University
Beijing Mininglamp Software System Co ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University, Beijing Mininglamp Software System Co ltd filed Critical Tsinghua University
Priority to CN201911252882.XA
Publication of CN110992426A
Application granted
Publication of CN110992426B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a gesture recognition method and device, an electronic device, and a storage medium, and relates to the technical field of image processing. First, a target video stream is acquired, the target video stream including at least one frame of target image. Second, first gesture matching and second gesture matching are respectively performed on the target video stream based on a preset video template, the video template including at least one frame of template image. Then, a recognition result of the gesture information in the target video stream is obtained based on a first matching result of the first gesture matching and a second matching result of the second gesture matching. In this way, the method alleviates the low accuracy of recognition results in existing gesture recognition techniques.

Description

Gesture recognition method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a gesture recognition method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of image processing technology, its range of application keeps expanding. Image recognition based on image processing techniques is used in many scenarios, such as gesture recognition. The inventors found through research that existing gesture recognition techniques suffer from low accuracy of the recognition result.
Disclosure of Invention
In view of the foregoing, an object of the present application is to provide a gesture recognition method and apparatus, an electronic device, and a storage medium, so as to alleviate the low accuracy of recognition results in existing gesture recognition techniques.
In order to achieve the above object, embodiments of the present application adopt the following technical solutions:
a gesture recognition method, comprising:
obtaining a target video stream, wherein the target video stream comprises at least one frame of target image;
respectively carrying out first gesture matching and second gesture matching on the target video stream based on a preset video template, wherein the video template comprises at least one frame of template image;
and obtaining a recognition result of the gesture information in the target video stream based on the first matching result of the first gesture matching and the second matching result of the second gesture matching.
In a preferred option of the embodiment of the present application, in the gesture recognition method, the step of performing first gesture matching and second gesture matching on the target video stream based on a preset video template includes:
for each frame of target image of the target video stream, respectively carrying out matching processing on the frame of target image and each frame of template image of a preset video template to obtain a first matching result;
and carrying out time normalization processing on the target video stream and the video template, and carrying out matching processing on each frame of target image of the target video stream and a corresponding frame of template image in the video template based on the result of the time normalization processing to obtain a second matching result.
In a preferred option of the embodiment of the present application, in the gesture recognition method, the video templates are plural, and the step of obtaining the recognition result of the gesture information in the target video stream based on the first matching result of the first gesture matching and the second matching result of the second gesture matching includes:
determining at least two video templates from a plurality of the video templates based on a first matching result of the first gesture matching and a second matching result of the second gesture matching;
and determining a target video template in the at least two video templates based on the motion trail information in the target video stream, and taking the gesture information of the target video template as the recognition result of the gesture information in the target video stream.
In a preferred option of the embodiment of the present application, in the gesture recognition method, the step of determining at least two video templates among a plurality of the video templates based on a first matching result of the first gesture matching and a second matching result of the second gesture matching includes:
calculating a weighted average of a first matching result and a second matching result included in each group of matching results, wherein the group of matching results are matching results obtained by respectively performing first gesture matching and second gesture matching on one video template and the target video stream;
at least two video templates are determined among the plurality of video templates based on the magnitude relation of each of the weighted averages.
In a preferred option of the embodiment of the present application, in the gesture recognition method, the step of determining a target video template from the at least two video templates based on the motion trail information in the target video stream includes:
acquiring centroid position information of each frame of target image in the target video stream, and determining target motion trail information of the target video stream based on the centroid position information;
and comparing the target motion trail information with the template motion trail information of the at least two video templates respectively, and determining a target video template in the at least two video templates based on the comparison result.
In a preferred option of the embodiment of the present application, in the gesture recognition method, the step of obtaining the target video stream includes:
acquiring at least one frame of gesture image obtained by shooting a target object;
and respectively carrying out foreground image extraction processing on each frame of gesture image to obtain a target image of each frame of gesture image, and forming a target video stream comprising at least one frame of target image.
In a preferred option of the embodiment of the present application, in the gesture recognition method, the step of performing foreground image extraction processing on each frame of gesture image includes:
for each frame of gesture image, carrying out grayscale processing on the frame of gesture image to obtain a grayscale image of the frame of gesture image;
determining a target threshold of each frame of grayscale image based on a maximum inter-class variance algorithm, and performing binarization processing on each frame of grayscale image based on the target threshold to obtain a binary image of the frame of grayscale image;
and carrying out foreground image extraction processing on each frame of binary image to obtain a target image of the frame of binary image.
The embodiment of the application also provides a gesture recognition device, which comprises:
the video stream acquisition module is used for acquiring a target video stream, wherein the target video stream comprises at least one frame of target image;
the matching processing module is used for respectively carrying out first gesture matching and second gesture matching on the target video stream based on a preset video template, wherein the video template comprises at least one frame of template image;
the recognition result obtaining module is used for obtaining a recognition result of gesture information in the target video stream based on the first matching result of the first gesture matching and the second matching result of the second gesture matching.
On the basis of the above, the embodiment of the application also provides an electronic device, which comprises:
a memory for storing a computer program;
and the processor is connected with the memory and is used for executing the computer program to realize the gesture recognition method.
On the basis of the above, the embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when executed implements the gesture recognition method described above.
According to the gesture recognition method and device, the electronic device, and the storage medium, first gesture matching and second gesture matching are respectively performed on the target video stream to obtain a first matching result and a second matching result, and the recognition result of the gesture information in the target video stream is then obtained based on the first matching result and the second matching result. Because the recognition result is derived from two kinds of gesture matching, it rests on a more sufficient basis than a result derived from a single matching result, which alleviates the low accuracy caused by single gesture matching in existing gesture recognition techniques.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
Fig. 1 is a block schematic diagram of an electronic device according to an embodiment of the present application.
Fig. 2 is a flowchart illustrating steps included in a gesture recognition method according to an embodiment of the present application.
Fig. 3 is a flow chart illustrating the sub-steps included in step S110 in fig. 2.
Fig. 4 is a flow chart illustrating the sub-steps included in step S120 in fig. 2.
Fig. 5 is a schematic diagram illustrating an effect of time normalization processing according to an embodiment of the present application.
Fig. 6 is a flow chart illustrating the sub-steps included in step S130 in fig. 2.
Fig. 7 is a schematic diagram of an effect of determining motion trail information based on centroid position information according to an embodiment of the present application.
Fig. 8 is a block schematic diagram of each functional module included in the gesture recognition apparatus provided in the embodiment of the present application.
Icon: 10-an electronic device; 12-memory; 14-a processor; 100-gesture recognition means; 110-a video stream acquisition module; 120-a matching processing module; 130-a recognition result obtaining module.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
As shown in fig. 1, an embodiment of the present application provides an electronic device 10 that may include a memory 12 and a processor 14, where a gesture recognition apparatus 100 may be disposed within the memory 12.
The memory 12 and the processor 14 are electrically connected to each other, directly or indirectly, to enable data transmission or interaction. For example, they may be electrically connected via one or more communication buses or signal lines. The gesture recognition apparatus 100 includes at least one software functional module that may be stored in the memory 12 in the form of software or firmware. The processor 14 is configured to execute the executable computer programs stored in the memory 12, such as the software functional modules and computer programs included in the gesture recognition apparatus 100, so as to implement the gesture recognition method provided in the embodiment of the present application.
Alternatively, the memory 12 may be, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), etc.
The processor 14 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), a System on Chip (SoC), etc.; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
It will be appreciated that the configuration shown in fig. 1 is merely illustrative, and that the electronic device 10 may also include more or fewer components than shown in fig. 1, or may have a different configuration than shown in fig. 1, for example, may also include a communication unit for information interaction with other devices.
The specific type of the electronic device 10 is not limited, and may be selected according to practical application requirements, so long as the electronic device has a certain data processing capability.
For example, in an alternative example, the electronic device 10 may include, but is not limited to, terminal devices such as a mobile phone, a tablet computer, and a personal computer, as well as a server.
Referring to fig. 2, an embodiment of the present application further provides a gesture recognition method applicable to the electronic device 10. Wherein the method steps defined by the flow of the gesture recognition method may be implemented by said electronic device 10. The specific flow shown in fig. 2 will be described in detail.
Step S110, a target video stream is acquired.
In this embodiment, the target video stream to be identified may be acquired first. Wherein the target video stream may include at least one frame of target image. That is, the target video stream may include one frame of target image or may include multiple frames of target images.
Step S120, performing a first gesture matching and a second gesture matching on the target video stream based on a preset video template.
In this embodiment, after the target video stream is acquired in step S110, first gesture matching and second gesture matching may be respectively performed on the target video stream based on a preset video template. For example, first gesture matching may be performed between the video template and the target video stream to obtain a first matching result, and second gesture matching may be performed between the video template and the target video stream to obtain a second matching result.
Wherein the video template may include at least one frame of template image. That is, the video template may include one frame of template image or a plurality of frames of template image.
Step S130, obtaining a recognition result of the gesture information in the target video stream based on the first matching result of the first gesture matching and the second matching result of the second gesture matching.
In this embodiment, after the first gesture matching and the second gesture matching are performed based on step S120, the recognition result of the gesture information in the target video stream may be obtained based on the obtained first matching result and second matching result.
Based on this, because the recognition result is derived from two kinds of gesture matching (the first gesture matching and the second gesture matching), it rests on a more sufficient basis of matching results (the first matching result and the second matching result). This alleviates the low accuracy caused by single gesture matching in existing gesture recognition techniques, and ensures that downstream processing based on the recognition result has higher reliability.
In the first aspect, it should be noted that, in step S110, a specific manner of acquiring the target video stream is not limited, and may be selected according to actual application requirements.
For example, in an alternative example, at least one frame of gesture image obtained by photographing a target object may be directly taken as the target video stream.
For another example, in another alternative example, in order to improve reliability of gesture recognition or matching, at least one frame of gesture image obtained by photographing a target object may be preprocessed, thereby obtaining the target video stream. Based on this, in connection with fig. 3, step S110 may include step S111 and step S113, the details of which are as follows.
Step S111, at least one frame of pose image obtained by photographing the target object is acquired.
In this embodiment, the target object may be photographed by an image capturing device (such as a webcam) communicatively connected to the electronic device 10 or an image capturing device carried by the electronic device 10 itself (such as a camera carried by a mobile phone), and then at least one frame of pose image obtained by photographing the target object is obtained from the image capturing device.
The gesture image contains the posture information to be recognized, such as hand gesture information.
Step S113, foreground image extraction processing is respectively carried out on each frame of gesture image, so as to obtain a target image of each frame of gesture image, and a target video stream comprising at least one frame of target image is formed.
In this embodiment, after the gesture image is acquired based on step S111, it is considered that the gesture image may have other information (such as background information of the target object) in addition to gesture information to be recognized, so that a foreground image extraction process is required for each frame of gesture image to obtain a foreground image in each frame of gesture image, so as to obtain a target image of each frame of gesture image, and further form a target video stream including at least one frame of target image.
Alternatively, the specific manner of acquiring the gesture image based on step S111 is not limited, and may be selected according to actual application requirements.
For example, in an alternative example, after receiving an analog signal carrying pose information of a target object transmitted by an image acquisition device, the analog signal may be converted into a digital signal, thereby obtaining the pose image.
For another example, after receiving an analog signal carrying pose information of a target object sent by an image acquisition device, the analog signal may be converted into a digital signal, and then the digital signal may be stored based on BMP (Bitmap) format, so as to obtain the pose image.
Alternatively, the specific manner of performing the foreground image extraction process to obtain the target image based on step S113 is not limited, and may be selected according to the actual application requirements.
For example, in an alternative example, the obtained gesture image may be subjected to gray-scale processing, then subjected to binarization processing based on a preset threshold value, and then subjected to foreground image processing, thereby obtaining the target image.
For another example, in another alternative example, in order to improve the accuracy of performing foreground image processing, step S113 may include the following sub-steps to obtain a target image:
first, for each frame posture image, the frame posture image may be subjected to gradation processing to obtain a gradation image of the frame posture image. And secondly, determining a target threshold value of each frame gray level image based on a maximum inter-class variance algorithm, and carrying out binarization processing on each frame gray level image based on the target threshold value to obtain a binary image of the frame gray level image. Then, for each frame of binary image, foreground image extraction processing is carried out on the frame of binary image, and a target image of the frame of binary image is obtained.
That is, in the present embodiment, the obtained posture image may be subjected to the gradation processing first, then subjected to the binarization processing based on the target threshold value determined by the maximum inter-class variance (Otsu) algorithm, and then subjected to the foreground image processing, thereby obtaining the target image. Based on the method, the target threshold value determined based on the maximum inter-class variance algorithm is adopted, so that on one hand, the efficiency of binarization processing can be improved, and on the other hand, the result of binarization processing can be ensured to have higher accuracy.
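For illustration only, the following is a minimal sketch of this preprocessing pipeline (grayscale conversion, Otsu binarization, foreground extraction), assuming OpenCV and NumPy are available. The function name and the largest-connected-component heuristic for foreground extraction are assumptions; the embodiment does not fix a particular extraction method.

```python
import cv2
import numpy as np

def extract_target_image(gesture_frame: np.ndarray) -> np.ndarray:
    """Sketch: extract the foreground (target image) of one gesture frame."""
    gray = cv2.cvtColor(gesture_frame, cv2.COLOR_BGR2GRAY)
    # THRESH_OTSU picks the target threshold by maximizing the
    # between-class variance, as described above.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Assumed heuristic: keep the largest connected component as foreground.
    num, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    if num <= 1:  # no foreground component found
        return binary
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return np.where(labels == largest, 255, 0).astype(np.uint8)
```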
In the second aspect, it should be noted that, in step S120, a specific manner of performing the first gesture matching and the second gesture matching is not limited, and may be selected according to actual application requirements.
For example, in an alternative example, in order to make the obtained first matching result and the second matching result have a higher reference effect, two gesture matching manners with a larger difference need to be set, and in conjunction with fig. 4, step S120 may include step S121 and step S123, which are described in detail below.
Step S121, for each frame of target image of the target video stream, performing matching processing on the frame of target image and each frame of template image of the preset video template, so as to obtain a first matching result.
In this embodiment, after the target video stream is obtained in step S110, for each frame of target image in the target video stream, the frame of target image and each frame of template image of a preset video template may be respectively subjected to matching processing, so as to obtain a first matching result.
For example, in one specific application example, the target video stream includes 4 frames of target images, namely image A, image B, image C, and image D, and the video template includes 7 frames of template images, namely image 1, image 2, image 3, image 4, image 5, image 6, and image 7. In this way, image A can be matched with image 1 through image 7, respectively, and image B, image C, and image D can each likewise be matched with image 1 through image 7. Thus, a first matching result of the target video stream and the video template can be obtained based on the resulting 28 pairwise matching results.
That is, in the above example, if the target video stream includes N frames of target images and the video template includes M frames of template images, image matching needs to be performed N×M times. These N×M image matchings may be performed in a certain sequence or in parallel, which may be chosen according to actual application requirements, for example according to the processing capability of the electronic device 10 on which the gesture recognition method runs, as illustrated by the sketch below.
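As a concrete but non-normative sketch, the first gesture matching could be realized as below, assuming all frames are binary target/template images at a common resolution. The intersection-over-union metric and the best-match aggregation are illustrative assumptions; the embodiment does not prescribe a similarity measure.

```python
import numpy as np

def frame_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Assumed metric: intersection-over-union of two binary images."""
    a = (a > 0).astype(np.float32)
    b = (b > 0).astype(np.float32)
    union = float(np.maximum(a, b).sum())
    return float((a * b).sum()) / union if union else 0.0

def first_matching_result(targets, templates) -> float:
    """Compare every target frame with every template frame (N*M pairs)
    and aggregate, here via each target frame's best template score."""
    scores = [max(frame_similarity(t, m) for m in templates) for t in targets]
    return float(np.mean(scores))
```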
Step S123, performing time normalization processing on the target video stream and the video template, and performing matching processing on each frame of target image of the target video stream and a corresponding frame of template image in the video template based on the result of the time normalization processing, so as to obtain a second matching result.
In this embodiment, after the target video stream is acquired based on step S110, considering that the time lengths of the target video stream and the video template are generally different, the target video stream and the video template may be first subjected to time normalization processing so that the time lengths of the target video stream and the video template are the same, and then each frame of target image and the corresponding one frame of template image are subjected to matching processing based on time information.
For example, in a specific application example, in conjunction with fig. 5, the target video stream includes 4 frames of target images, namely image A, image B, image C, and image D, and the video template includes 7 frames of template images, namely image 1 through image 7. Here, the time length of the video template is greater than that of the target video stream, so the time length of the video template can be compressed first such that the compressed video template and the target video stream have equal time lengths.
After the time length of the video template has been compressed, matching processing can be performed between each frame of target image and the corresponding frame of template image based on their correspondence in time. In this way, image A is matched with image 1, image B with image 3, image C with image 5, and image D with image 7. Thus, a second matching result of the target video stream and the video template can be obtained based on the resulting 4 matching results.
It will be appreciated that in the above example, the time length of the video template is compressed because it is greater than the time length of the target video stream. In another example, the time length of the target video stream may instead be stretched so that the stretched target video stream has the same time length as the video template. Alternatively, the video template may be compressed and the target video stream stretched at the same time, so that the two have the same time length. This is not specifically limited in this embodiment and may be selected according to actual application requirements. A sketch of the time-normalized matching follows.
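Under the same assumptions as the previous sketch (and reusing its frame_similarity), the second gesture matching could normalize the two timelines to a common length and compare each target frame only with the template frame nearest to it in normalized time. The index arithmetic below reproduces the image A→1, B→3, C→5, D→7 pairing of the example; uniform frame spacing is an assumption.

```python
import numpy as np

def second_matching_result(targets, templates) -> float:
    """Match each target frame with its time-normalized template frame."""
    n, m = len(targets), len(templates)
    scores = []
    for i, target in enumerate(targets):
        # Map normalized time i/(n-1) onto the template's frame index.
        j = round(i * (m - 1) / (n - 1)) if n > 1 else 0
        scores.append(frame_similarity(target, templates[j]))
    return float(np.mean(scores))  # one score per target frame (4 here)
```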
The specific sequence of the step S121 and the step S123 is not limited, and may be selected according to the actual application requirement.
For example, in an alternative example, step S121 may be performed first for the first gesture matching, and then step S123 may be performed for the second gesture matching. For another example, in another alternative example, step S123 may be performed first for the second gesture matching, and then step S121 may be performed for the first gesture matching. For another example, in another alternative example, step S121 and step S123 may also be performed simultaneously to perform the first gesture matching and the second gesture matching.
Also, since there may generally be multiple template videos, each carrying different gesture information, after the target video stream is obtained in step S110, step S120 may be performed multiple times so that first gesture matching and second gesture matching are performed between the target video stream and each template video.
For example, if the number of the template videos is 4, namely, video a, video B, video C and video D, the first gesture matching and the second gesture matching may be performed on the video a and the target video stream, so as to obtain a first set of matching results. And respectively performing first gesture matching and second gesture matching on the video B and the target video stream to obtain a second group of matching results. And respectively performing first gesture matching and second gesture matching on the video C and the target video stream to obtain a third group of matching results. And respectively carrying out first gesture matching and second gesture matching on the video D and the target video stream to obtain a fourth group of matching results.
In the third aspect, it should be noted that, in step S130, a specific manner of obtaining the identification result based on the first matching result and the second matching result is not limited, and may be selected according to actual application requirements.
For example, in an alternative example, if there is one video template, it may be determined whether the first matching result and the second matching result satisfy a certain condition. If the condition is satisfied, the identification information generated for the gesture information of the video template may be used as the gesture information of the target video stream (for example, for the hand gesture "OK" the identification information may be "confirm", and for the head motion "nod" the identification information may likewise be "confirm").
Conversely, if the first matching result and the second matching result do not satisfy the condition, the recognition result is that the target video stream does not contain the gesture information of the video template.
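The condition itself is not fixed by the embodiment; purely as an assumed illustration, it could be a threshold that both matching results must satisfy:

```python
def recognize_with_single_template(first: float, second: float,
                                   threshold: float = 0.7):
    """Assumed condition: both matching results must reach a threshold.
    Returns the template's identification info, or None if absent."""
    if first >= threshold and second >= threshold:
        return "confirm"  # e.g. identification info for an "OK" gesture
    return None  # the target video stream lacks this template's gesture
```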
In another alternative example, if there are multiple video templates, then in order to achieve dynamic gesture recognition, step S130 may include step S131 and step S133 in conjunction with fig. 6, which are described in detail below.
Step S131, determining at least two video templates from the plurality of video templates based on the first matching result of the first gesture matching and the second matching result of the second gesture matching.
In this embodiment, after the first matching result and the second matching result are obtained in step S120, at least two video templates may be determined from the plurality of video templates.
For example, for the 4 template videos in the foregoing example, suppose video A and video B carry hand gesture information while video C and video D carry head motion information, and both video A and video B show palm movement but with different motion directions, such as left-right movement versus up-down movement. In this case, if the gesture information in the target video stream is also hand gesture information, the two sets of matching results obtained in step S120 for video A and video B will be close to each other, so video A and video B are determined from among video A, video B, video C, and video D.
And step S133, determining a target video template in the at least two video templates based on the motion trail information in the target video stream, and taking the gesture information of the target video template as the recognition result of the gesture information in the target video stream.
In this embodiment, after at least two video templates (such as the aforementioned video A and video B) are determined in step S131, one target video template needs to be selected from them so that its gesture information can serve as the recognition result of the gesture information in the target video stream. Considering that the motion trails of the target objects in the at least two videos differ (such as the motion direction of the palm), the target video template may be determined among the at least two video templates based on the motion trail information in the target video stream.
For example, following the previous example, suppose the palm in the target video stream moves left and right, the palm in video A moves up and down, and the palm in video B also moves left and right. Accordingly, video B may be determined as the target video template based on the motion trail information.
Optionally, the specific manner of executing step S131 to determine at least two video templates based on the first matching result and the second matching result is not limited, and may be selected according to actual application requirements.
For example, in an alternative example, the first matching result and the second matching result may be directly added to obtain the target matching result. In this way, for a plurality of sets of matching results obtained for a plurality of video templates, a plurality of target matching results can be obtained, and then at least two video templates are determined among a plurality of the video templates based on the magnitude relation of the plurality of target matching results.
As another example, in another alternative example, step S131 may include the substeps of:
first, for each set of matching results, a weighted average of the first matching result and the second matching result included in the set of matching results may be calculated. Second, at least two video templates may be determined among the plurality of video templates based on the magnitude relation of each of the weighted averages.
For example, in combination with the foregoing example, the plurality of video templates includes video A, video B, video C, and video D, where the first set of matching results between video A and the target video stream includes result A1 and result A2, the second set between video B and the target video stream includes result B1 and result B2, the third set between video C and the target video stream includes result C1 and result C2, and the fourth set between video D and the target video stream includes result D1 and result D2.
In this way, based on the weight coefficients assigned in advance to the first matching result and to the second matching result, a weighted average of result A1 and result A2 can be calculated to obtain mean 1; a weighted average of result B1 and result B2 yields mean 2; a weighted average of result C1 and result C2 yields mean 3; and a weighted average of result D1 and result D2 yields mean 4. Then, at least two video templates may be determined among video A, video B, video C, and video D based on the magnitude relation of mean 1, mean 2, mean 3, and mean 4, as in the sketch below.
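A sketch of this candidate selection, assuming each template's pair of matching scores has already been computed. The weight values and the choice of keeping the top two are assumptions; the embodiment only requires pre-assigned weight coefficients and at least two templates.

```python
def select_candidates(results, w1=0.6, w2=0.4, k=2):
    """results: dict mapping template name -> (first, second) score pair."""
    means = {name: w1 * r1 + w2 * r2 for name, (r1, r2) in results.items()}
    # Keep the k templates with the largest weighted averages.
    return sorted(means, key=means.get, reverse=True)[:k]

# Example usage with illustrative scores:
# select_candidates({"A": (0.82, 0.78), "B": (0.85, 0.80),
#                    "C": (0.30, 0.35), "D": (0.25, 0.28)})
# -> ["B", "A"]
```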
Alternatively, the specific manner of performing step S133 to determine the target video template based on the motion trail information is not limited, and may be selected according to actual application requirements.
For example, in an alternative example, in order to improve the efficiency of determining the target video template, the motion direction may be directly used as the trajectory information, so that the target video template is determined based on the motion direction.
For another example, in another alternative example, to improve the accuracy of determining the target video template, step S133 may include the sub-steps of:
first, centroid position information of each frame of target image in the target video stream can be acquired, and target motion trail information of the target video stream can be determined based on the centroid position information. And secondly, comparing the target motion trail information with the template motion trail information of the at least two video templates respectively, and determining a target video template in the at least two video templates based on the comparison result.
For example, based on the foregoing example, the target video stream includes 4 frames of target images, image a, image B, image C, and image D, respectively. Referring to fig. 7, the centroid position is X in image a, Y in image B, Z in image C, and W in image D.
In this way, curve fitting can be performed on the above 4 positions (X, Y, Z, and W) to obtain the target motion trail information of the target video stream. The target motion trail information is then compared with the template motion trail information of each of the at least two video templates, so as to determine the target video template among them.
The curve fitting may use the least squares method, which makes the fitting more efficient and thereby ensures that the gesture recognition method provided in the embodiment of the present application has higher recognition efficiency. Moreover, the template motion trail information may be obtained in the same way as the target motion trail information, for example also based on the change of centroid position, which is not repeated here. A sketch of this trail comparison follows.
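The trail step could be sketched as follows, assuming binary target images as produced by the earlier preprocessing. The quadratic least-squares model and the sum-of-squared-coefficient-differences comparison are illustrative assumptions; the embodiment only specifies centroid positions, least-squares curve fitting, and a comparison of trails.

```python
import numpy as np

def centroid(binary: np.ndarray) -> tuple:
    """Centroid (x, y) of the foreground pixels of a binary image."""
    ys, xs = np.nonzero(binary)
    if xs.size == 0:  # no foreground: fall back to the origin
        return 0.0, 0.0
    return float(xs.mean()), float(ys.mean())

def fit_trail(frames) -> np.ndarray:
    """Least-squares quadratic fit of centroid x(t) and y(t)."""
    pts = np.array([centroid(f) for f in frames])  # shape (num_frames, 2)
    t = np.arange(len(frames))
    cx = np.polyfit(t, pts[:, 0], 2)  # coefficients of x(t)
    cy = np.polyfit(t, pts[:, 1], 2)  # coefficients of y(t)
    return np.concatenate([cx, cy])

def pick_target_template(target_frames, candidates):
    """candidates: dict mapping template name -> list of template frames."""
    ref = fit_trail(target_frames)
    return min(candidates, key=lambda name: float(
        np.sum((fit_trail(candidates[name]) - ref) ** 2)))
```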
Referring to fig. 8, an embodiment of the present application further provides a gesture recognition apparatus 100 applicable to the electronic device 10 described above. The gesture recognition apparatus 100 may include a video stream acquisition module 110, a matching processing module 120, and a recognition result obtaining module 130.
The video stream obtaining module 110 is configured to obtain a target video stream, where the target video stream includes at least one frame of target image. In this embodiment, the video stream obtaining module 110 may be configured to perform step S110 shown in fig. 2, and the description of step S110 may be referred to as to the relevant content of the video stream obtaining module 110.
The matching processing module 120 is configured to perform first pose matching and second pose matching on the target video stream based on a preset video template, where the video template includes at least one frame of template image. In this embodiment, the matching process module 120 may be used to perform step S120 shown in fig. 2, and the description of step S120 may be referred to above with respect to the relevant content of the matching process module 120.
The recognition result obtaining module 130 is configured to obtain a recognition result of gesture information in the target video stream based on the first matching result of the first gesture matching and the second matching result of the second gesture matching. In this embodiment, the recognition result obtaining module 130 may be used to perform step S130 shown in fig. 2, and the description of step S130 may be referred to above with respect to the relevant content of the recognition result obtaining module 130.
In an embodiment of the present application, corresponding to the above gesture recognition method, there is also provided a computer-readable storage medium having a computer program stored therein, which when executed performs the steps of the above gesture recognition method.
The steps executed when the computer program runs are not described in detail herein, and reference may be made to the explanation of the gesture recognition method.
In summary, according to the gesture recognition method and device, the electronic device, and the storage medium, first gesture matching and second gesture matching are respectively performed on the target video stream to obtain a first matching result and a second matching result, and the recognition result of the gesture information in the target video stream is then obtained based on both. Because the recognition result is derived from two kinds of gesture matching, it rests on a more sufficient basis, which alleviates the low accuracy caused by single gesture matching in existing gesture recognition techniques.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (8)

1. A gesture recognition method, comprising:
obtaining a target video stream, wherein the target video stream comprises at least one frame of target image;
respectively carrying out first gesture matching and second gesture matching on the target video stream based on a preset video template, wherein the video template comprises at least one frame of template image;
based on a first matching result of the first gesture matching and a second matching result of the second gesture matching, obtaining a recognition result of gesture information in the target video stream; the step of performing first gesture matching and second gesture matching on the target video stream based on a preset video template comprises the following steps:
for each frame of target image of the target video stream, respectively carrying out matching processing on the frame of target image and each frame of template image of a preset video template to obtain a first matching result;
performing time normalization processing on the target video stream and the video template, and performing matching processing on each frame of target image of the target video stream and a corresponding frame of template image in the video template based on a result of the time normalization processing to obtain a second matching result; the step of obtaining the recognition result of the gesture information in the target video stream based on the first matching result of the first gesture matching and the second matching result of the second gesture matching includes:
determining at least two video templates from a plurality of the video templates based on a first matching result of the first gesture matching and a second matching result of the second gesture matching;
and determining a target video template in the at least two video templates based on the motion trail information in the target video stream, and taking the gesture information of the target video template as the recognition result of the gesture information in the target video stream.
2. The gesture recognition method of claim 1, wherein the step of determining at least two video templates among the plurality of video templates based on a first matching result of the first gesture matching and a second matching result of the second gesture matching comprises:
calculating a weighted average of a first matching result and a second matching result included in each group of matching results, wherein the group of matching results are matching results obtained by respectively performing first gesture matching and second gesture matching on one video template and the target video stream;
at least two video templates are determined among the plurality of video templates based on the magnitude relation of each of the weighted averages.
3. The gesture recognition method of claim 1, wherein the step of determining a target video template among the at least two video templates based on the motion trail information in the target video stream comprises:
acquiring centroid position information of each frame of target image in the target video stream, and determining target motion trail information of the target video stream based on the centroid position information;
and comparing the target motion trail information with the template motion trail information of the at least two video templates respectively, and determining a target video template in the at least two video templates based on the comparison result.
4. The gesture recognition method of claim 1, wherein the step of acquiring the target video stream comprises:
acquiring at least one frame of gesture image obtained by shooting a target object;
and respectively carrying out foreground image extraction processing on each frame of gesture image to obtain a target image of each frame of gesture image, and forming a target video stream comprising at least one frame of target image.
5. The gesture recognition method of claim 4, wherein the step of performing the foreground image extraction process on each frame of the gesture image, respectively, comprises:
for each frame of gesture image, performing grayscale processing on the frame of gesture image to obtain a grayscale image of the frame of gesture image;
determining a target threshold of each frame of grayscale image based on a maximum inter-class variance algorithm, and performing binarization processing on each frame of grayscale image based on the target threshold to obtain a binary image of the frame of grayscale image;
and performing foreground image extraction processing on each frame of binary image to obtain a target image of the frame of binary image.
6. A gesture recognition apparatus, comprising:
the video stream acquisition module is used for acquiring a target video stream, wherein the target video stream comprises at least one frame of target image;
the matching processing module is used for respectively carrying out first gesture matching and second gesture matching on the target video stream based on a preset video template, wherein the video template comprises at least one frame of template image;
the recognition result obtaining module is used for obtaining a recognition result of gesture information in the target video stream based on the first matching result of the first gesture matching and the second matching result of the second gesture matching;
the matching processing module is specifically configured to:
for each frame of target image of the target video stream, respectively carrying out matching processing on the frame of target image and each frame of template image of a preset video template to obtain a first matching result;
performing time normalization processing on the target video stream and the video template, and performing matching processing on each frame of target image of the target video stream and a corresponding frame of template image in the video template based on a result of the time normalization processing to obtain a second matching result;
the identification result obtaining module is specifically configured to:
determining at least two video templates from a plurality of the video templates based on a first matching result of the first gesture matching and a second matching result of the second gesture matching;
and determining a target video template in the at least two video templates based on the motion trail information in the target video stream, and taking the gesture information of the target video template as the recognition result of the gesture information in the target video stream.
7. An electronic device, comprising:
a memory for storing a computer program;
a processor coupled to the memory for executing the computer program to implement the gesture recognition method of any one of claims 1-5.
8. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed, implements the gesture recognition method of any one of claims 1-5.
CN201911252882.XA 2019-12-09 2019-12-09 Gesture recognition method and device, electronic equipment and storage medium Active CN110992426B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911252882.XA CN110992426B (en) 2019-12-09 2019-12-09 Gesture recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911252882.XA CN110992426B (en) 2019-12-09 2019-12-09 Gesture recognition method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110992426A CN110992426A (en) 2020-04-10
CN110992426B true CN110992426B (en) 2024-03-22

Family

ID=70091608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911252882.XA Active CN110992426B (en) 2019-12-09 2019-12-09 Gesture recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110992426B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542855B (en) * 2021-07-21 2023-08-22 Oppo广东移动通信有限公司 Video processing method, device, electronic equipment and readable storage medium
CN115845350B (en) * 2023-03-02 2023-05-09 成都谷帝科技有限公司 Method and system for automatic ranging of standing long jump

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106139564A (en) * 2016-08-01 2016-11-23 纳恩博(北京)科技有限公司 Image processing method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460389B (en) * 2017-02-20 2021-12-03 阿里巴巴集团控股有限公司 Type prediction method and device for identifying object in image and electronic equipment
JP6765545B2 (en) * 2017-12-22 2020-10-07 ベイジン センスタイム テクノロジー デベロップメント カンパニー, リミテッド Dynamic gesture recognition method and device, gesture dialogue control method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106139564A (en) * 2016-08-01 2016-11-23 纳恩博(北京)科技有限公司 Image processing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hu Qiong et al. A survey on vision-based human action recognition. Chinese Journal of Computers, 2013, p. 2516. *

Also Published As

Publication number Publication date
CN110992426A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
CN107633209B (en) Electronic device, the method for dynamic video recognition of face and storage medium
CN109934065B (en) Method and device for gesture recognition
US10867166B2 (en) Image processing apparatus, image processing system, and image processing method
US8538079B2 (en) Apparatus capable of detecting location of object contained in image data and detection method thereof
CN114049681A (en) Monitoring method, identification method, related device and system
CN107633237B (en) Image background segmentation method, device, equipment and medium
CN111079613B (en) Gesture recognition method and device, electronic equipment and storage medium
US11393186B2 (en) Apparatus and method for detecting objects using key point sets
US10915735B2 (en) Feature point detection method and apparatus, image processing system, and monitoring system
CN108573471B (en) Image processing apparatus, image processing method, and recording medium
CN110992426B (en) Gesture recognition method and device, electronic equipment and storage medium
CN111667001A (en) Target re-identification method and device, computer equipment and storage medium
CN111047622B (en) Method and device for matching objects in video, storage medium and electronic device
KR20210157194A (en) Crop growth measurement device using image processing and method thereof
US10592775B2 (en) Image processing method, image processing device and image processing system
CN110688950B (en) Face living body detection method and device based on depth information
CN114616591A (en) Object tracking device and object tracking method
CN111353429A (en) Interest degree method and system based on eyeball turning
US10916016B2 (en) Image processing apparatus and method and monitoring system
CN111062362A (en) Face living body detection model, method, device, equipment and storage medium
CN111866468B (en) Object tracking distribution method, device, storage medium and electronic device
CN112819859B (en) Multi-target tracking method and device applied to intelligent security
CN114037741A (en) Adaptive target detection method and device based on event camera
CN109101646B (en) Data processing method, device, system and computer readable medium
CN113838110B (en) Verification method and device for target detection result, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant