CN116403288A - Motion gesture recognition method and device and electronic equipment - Google Patents

Motion gesture recognition method and device and electronic equipment

Info

Publication number
CN116403288A
CN116403288A
Authority
CN
China
Prior art keywords
person
images
matching
matching score
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310478121.6A
Other languages
Chinese (zh)
Inventor
彭情
黄伟红
黄佳
李靖
高武强
刘冠宇
吴瑞文
于永福
吴邑岑
刘硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202310478121.6A priority Critical patent/CN116403288A/en
Publication of CN116403288A publication Critical patent/CN116403288A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a motion gesture recognition method, a motion gesture recognition device, and an electronic device. The method comprises the following steps: acquiring images of a plurality of persons from a plurality of viewing angles at the same moment; identifying the bounding box of each person in each image; matching the images according to the bounding boxes; verifying the matching results; and, after the bounding boxes pass verification, determining the gesture information of each person. By acquiring multiple images of multiple persons under multiple viewing angles, identifying the bounding box of each person in the images, and determining the gesture information of each person from the bounding boxes, this process improves the accuracy of gesture recognition for multiple persons under multiple viewing angles.

Description

Motion gesture recognition method and device and electronic equipment
Technical Field
The invention relates to the technical field of image recognition, and in particular to a motion gesture recognition method and device and an electronic device.
Background
In the field of single-view motion recognition, existing algorithms commonly take pictures or videos from a single viewing angle as input and process the image or video-frame information with a neural network model to obtain key point positions for the motion recognition task. Such methods perform well when the whole human body is unoccluded, but the recognition effect deteriorates when occlusion causes human body information to be missing from the image.
In the field of multi-view motion recognition, existing work mainly concerns a single person under multiple viewing angles. Compared with the multi-person case, extracting and processing single-person image information is simple, since single-person multi-view information only requires feature extraction, fusion, and similar operations; the recognition effect for multiple persons, however, is poor.
Therefore, both single-view and multi-view motion recognition perform poorly in relatively special cases, such as occlusion of the motion or the presence of multiple persons.
Disclosure of Invention
Based on this, a first aspect of the present invention provides a motion gesture recognition method that improves the accuracy of gesture recognition. The method includes:
acquiring images of multiple visual angles of multiple persons at the same moment;
identifying a bounding box of each person in each image based on a preset target detection algorithm;
selecting images of any two viewing angles, and matching the bounding boxes of the same person in the two images to obtain a matching score;
constructing a matching score matrix of the same person under the plurality of viewing angles according to the matching scores, wherein the rows and columns of the matching score matrix correspond to bounding boxes in the images;
performing matching verification on the plurality of bounding boxes of each person under the plurality of viewing angles according to the matching score matrix;
and after the plurality of bounding boxes of each person pass the verification, determining the gesture information of each person according to each bounding box.
In the embodiment of the invention, selecting images of any two viewing angles and matching the bounding boxes of the same person in the two images to obtain a matching score comprises the following steps:
convolving the bounding box of each person in the two images with a first preset neural network to obtain the appearance characteristics of each person;
calculating appearance matching scores of each person according to appearance characteristics of each person;
identifying key points in each bounding box by using a second preset neural network;
calculating an action matching score according to the relative positions of the visual angles corresponding to any two images and the coordinates of each key point;
and calculating the product of the action matching score and the appearance matching score to obtain the matching score.
In an embodiment of the present invention, calculating an appearance matching score of each person according to appearance characteristics of each person includes:
calculating similarity scores of the two images according to the appearance characteristics;
and normalizing the similarity score to obtain an appearance matching score.
In the embodiment of the invention, calculating the action matching score according to the relative positions of the viewing angles corresponding to any two images and the coordinates of each key point comprises the following steps:
constructing a corresponding transformation matrix according to the relative positions of the view angles corresponding to any two images;
mapping the key point coordinates of the same person in a first image in any two images to a second image by utilizing a transformation matrix to obtain mapped coordinates;
calculating the distance between each key point coordinate and the mapping coordinate in the second image;
and normalizing the sum of the distances to obtain an action matching score.
In the embodiment of the invention, the matching verification of a plurality of bounding boxes of each person under a plurality of view angles according to the matching score matrix comprises the following steps:
executing a first loop step for the images of any two viewing angles until all values in the matching score matrix equal a first preset value, to obtain a matching result for the images of the two viewing angles, wherein the first loop step comprises the following steps:
searching for the maximum matching score in the matching score matrix, obtaining a matching result from the row and column in which the maximum is located, thereby determining a pair of matched bounding boxes in the images, and setting all values of that row and column to the first preset value;
selecting the image with the largest number of bounding boxes as a target image;
classifying the bounding boxes of the same person in all images except the target image into a bounding box set according to the matching result and the target image;
checking cycle consistency against the bounding box set of the same person.
In an embodiment of the present invention, checking cycle consistency against the bounding box set of the same person includes:
dividing the bounding boxes in the bounding box set into a plurality of subgroups of a second preset value according to the bounding box of the same person in the target image;
in the case that the bounding boxes in each subgroup match each other, determining that the bounding boxes of the same person satisfy cycle consistency.
In the embodiment of the invention, after a plurality of bounding boxes of each person pass verification, determining pose information of each person according to each bounding box comprises the following steps:
when the multiple bounding boxes of each person pass the verification, extracting the image characteristics of each image by using a third preset neural network model;
and inputting the bounding box of each person in the plurality of images into a fourth preset neural network model to obtain the gesture information of each person.
A second aspect of the present invention provides a motion gesture recognition apparatus, including:
the image acquisition module is used for acquiring images of multiple visual angles of multiple persons at the same moment;
the identification module is used for identifying the boundary box of each person in each image based on a preset target detection algorithm;
the matching score determining module is used for selecting images of any two viewing angles, and matching the bounding boxes of the same person in the two images to obtain a matching score;
the matching score matrix determining module is used for constructing a matching score matrix of the same person under multiple viewing angles according to the matching scores, wherein the rows and columns of the matching score matrix correspond to bounding boxes in the images;
the matching verification module is used for performing matching verification on the multiple bounding boxes of each person under the multiple viewing angles according to the matching score matrix;
and the gesture information determining module is used for determining the gesture information of each person according to each bounding box after the bounding boxes of each person pass verification.
A third aspect of the present invention provides an electronic device comprising a processor and a memory storing computer executable instructions executable by the processor to implement the method of identifying a motion gesture of any of the above.
A fourth aspect of the invention provides a machine-readable storage medium having instructions stored thereon which, when executed by a processor, implement a method of recognizing a motion gesture as described in any one of the above.
Through the technical scheme, images of multiple viewing angles of multiple persons at the same moment are acquired; a bounding box of each person in each image is identified based on a preset target detection algorithm; images of any two viewing angles are selected, and the bounding boxes of the same person in the two images are matched to obtain a matching score; a matching score matrix of the same person under the multiple viewing angles is constructed according to the matching scores, wherein the rows and columns of the matching score matrix correspond to bounding boxes in the images; matching verification is performed on the multiple bounding boxes of each person under the multiple viewing angles according to the matching score matrix; and after the multiple bounding boxes of each person pass the verification, the gesture information of each person is determined according to each bounding box. By acquiring multiple images of multiple persons under multiple viewing angles, identifying the bounding box of each person in the images, and determining the gesture information of each person through the bounding boxes, this process improves the accuracy of gesture recognition for multiple persons under multiple viewing angles.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain, without limitation, the embodiments of the invention. In the drawings:
fig. 1 is a schematic flow chart of a method for identifying a motion gesture according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a motion gesture recognition device according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following describes the detailed implementation of the embodiments of the present invention with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
Based on this, the present invention provides a method for identifying a motion gesture, and fig. 1 is a schematic flow chart of a method for identifying a motion gesture provided by an embodiment of the present invention, as shown in fig. 1, where the method includes:
step S101: images of multiple perspectives including multiple persons at the same time are acquired.
In practical application, cameras are arranged at eight positions around the group of persons: front, rear, left, right, front-left, front-right, rear-left, and rear-right.
In practical application, images of multiple viewing angles of the multiple persons at the same moment are captured by the cameras at the respective positions. If video of the persons is captured by the cameras, images are extracted from the video either frame by frame or at a fixed frame interval.
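As a minimal sketch of this frame-sampling step (the video paths, the frame interval, and the helper name are illustrative assumptions, not details fixed by the invention):

```python
import cv2

def sample_frames(video_path, frame_interval=5):
    """Extract every frame_interval-th frame from one camera's video."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of stream
            break
        if index % frame_interval == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames

# One synchronized frame list per camera position (file names are hypothetical).
positions = ["front", "rear", "left", "right",
             "front_left", "front_right", "rear_left", "rear_right"]
views = {pos: sample_frames(f"{pos}.mp4") for pos in positions}
```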
Step S102: the bounding box of each person in each image is identified based on a preset object detection algorithm.
In practical application, the preset target detection algorithm may be an SSD (Single Shot MultiBox Detector) target detection algorithm. The images of the multiple viewing angles are input to the SSD algorithm to obtain a bounding box for each person in each image. The bounding box here is a 2D bounding box: a rectangular region of the originally input image that contains all visible body parts of the person.
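A minimal sketch of this detection step, assuming the pretrained torchvision SSD model stands in for the preset target detection algorithm (the score threshold is an illustrative choice; label 1 is the COCO person class):

```python
import torch
from torchvision.models.detection import ssd300_vgg16, SSD300_VGG16_Weights
from torchvision.transforms.functional import to_tensor

detector = ssd300_vgg16(weights=SSD300_VGG16_Weights.DEFAULT).eval()

def detect_person_boxes(image, score_threshold=0.5):
    """Return 2D bounding boxes (x1, y1, x2, y2) of the persons in one image."""
    with torch.no_grad():
        output = detector([to_tensor(image)])[0]
    keep = (output["labels"] == 1) & (output["scores"] > score_threshold)
    return output["boxes"][keep]
```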
Step S103: and selecting images of any two visual angles, and matching the boundary boxes of the same person in the two images to obtain a matching score.
In practical application, images corresponding to two visual angles are selected at will, and boundary boxes belonging to the same person in the two images are matched to obtain matching scores of the boundary boxes of the same person.
Step S104: and constructing a matching score matrix of the same person under a plurality of view angles according to the matching scores, wherein the rows and columns of the matching score matrix are boundary boxes in the image.
In practical application, the matching score matrix is composed of matching scores: all bounding boxes in one of the two viewing angles are arranged along the rows of the matrix, all bounding boxes in the other viewing angle are arranged along the columns, and each matrix entry is the matching score between the pair of bounding boxes corresponding to its row and column.
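A minimal numpy sketch of this matrix construction, assuming a match_score function as described in the following steps (the function and variable names are assumptions):

```python
import numpy as np

def build_score_matrix(boxes_view_a, boxes_view_b, match_score):
    """Rows index the bounding boxes of one view, columns those of the other;
    each entry is the matching score of the corresponding pair of boxes."""
    matrix = np.zeros((len(boxes_view_a), len(boxes_view_b)))
    for i, box_a in enumerate(boxes_view_a):
        for j, box_b in enumerate(boxes_view_b):
            matrix[i, j] = match_score(box_a, box_b)
    return matrix
```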
Step S105: and carrying out matching verification on a plurality of bounding boxes of each person under a plurality of view angles according to the matching score matrix.
In practical application, the matching score matrix is utilized to perform matching verification on the matching result of the bounding box of each person under a plurality of view angles.
In practical applications, bounding boxes belonging to the same human body in two different viewing angles should match the same bounding box in any other viewing angle. Matching verification of the matching results therefore ensures their accuracy. If the matching verification fails, the remaining bounding boxes are re-matched in combination with the relative positions of the viewing angles until all bounding boxes of the same person pass the matching verification.
Step S106: and after the multiple bounding boxes of each person pass the verification, determining the gesture information of each person according to each bounding box.
In practical application, after the multiple bounding boxes of each person pass verification, determining the posture information of each person according to the bounding box of each person, wherein the posture information is 3D posture information.
Through the above embodiment, images of multiple view angles including multiple persons at the same time are acquired; identifying a bounding box of each person in each image based on a preset target detection algorithm; selecting images of any two visual angles, and matching the boundary frames of the same person in the two images to obtain a matching score; constructing a matching score matrix of the same person under a plurality of view angles according to the matching scores, wherein the rows and columns of the matching score matrix are boundary boxes in the image; performing matching verification on a plurality of boundary boxes of each person under a plurality of view angles according to the matching score matrix; and after the multiple bounding boxes of each person pass the verification, determining the gesture information of each person according to each bounding box. According to the process, the multiple images of the multiple persons at the multiple view angles are obtained, the boundary box of each person in the images is identified according to the multiple images, the gesture information of each person is determined through the boundary box, and the accuracy of gesture identification at the multiple view angles of the multiple persons is improved.
In one embodiment, step S103 includes:
convolving the bounding box of each person in the two images with a first preset neural network to obtain the appearance characteristics of each person;
calculating appearance matching scores of each person according to appearance characteristics of each person;
identifying key points in each bounding box by using a second preset neural network;
calculating an action matching score according to the relative positions of the visual angles corresponding to any two images and the coordinates of each key point;
and calculating the product of the action matching score and the appearance matching score to obtain the matching score.
In practical applications, the matching score is calculated from the appearance matching score and the action matching score. The first preset neural network may be a convolutional neural network (Convolutional Neural Networks, CNN), and convolve a bounding box of each person in the image corresponding to any two view angles to obtain appearance characteristics of each person. And calculating the appearance characteristics of each person to obtain the appearance matching score of each person.
In practical applications, the second preset neural network may also be a convolutional neural network, which identifies the key points of the person in each bounding box; the key points include, but are not limited to, the nose, left and right eyes, left and right ears, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles.
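The listed key points coincide with the 17 COCO body key points, so one possible stand-in for the second preset neural network is the pretrained torchvision keypoint detector (this model choice is an assumption, not specified by the invention):

```python
import torch
from torchvision.models.detection import (
    keypointrcnn_resnet50_fpn, KeypointRCNN_ResNet50_FPN_Weights)

kp_model = keypointrcnn_resnet50_fpn(
    weights=KeypointRCNN_ResNet50_FPN_Weights.DEFAULT).eval()

def detect_keypoints(image_tensor):
    """Return per-person (17, 3) arrays of COCO key points (x, y, visibility)."""
    with torch.no_grad():
        output = kp_model([image_tensor])[0]
    return output["keypoints"]  # shape: (num_persons, 17, 3)
```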
In practical application, according to the positions of a plurality of visual angles, the relative positions of images corresponding to any two visual angles are calculated. And acquiring the coordinates of key points of the same person in the two images. And calculating the action matching score of the corresponding person according to the relative position of the image and the coordinates of the key points. The match score is the product of the appearance match score and the action match score.
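A minimal sketch of the score combination, assuming cosine similarity on the CNN appearance features and an action score computed as in the later embodiment (the helper names are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def appearance_score(feat_a, feat_b):
    """Cosine similarity of two appearance feature vectors, squashed to [0, 1]."""
    cos = np.dot(feat_a, feat_b) / (np.linalg.norm(feat_a) * np.linalg.norm(feat_b))
    return sigmoid(cos)

def match_score(feat_a, feat_b, action_score):
    """Final matching score: product of the action and appearance scores."""
    return action_score * appearance_score(feat_a, feat_b)
```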
Through the above embodiment, the first preset neural network is used to convolve the bounding box of each person in the two images to obtain the appearance characteristics of each person; the appearance matching score of each person is calculated according to the appearance characteristics; key points in each bounding box are identified with the second preset neural network; the action matching score is calculated according to the relative positions of the viewing angles corresponding to the two images and the coordinates of each key point; and the product of the action matching score and the appearance matching score gives the matching score. By convolving the bounding box of the same person in the two images to obtain the appearance matching score, identifying key points in the bounding boxes to compute the action matching score, and determining the matching score of the same person's bounding boxes from both, the accuracy of bounding box matching is ensured.
In one embodiment, calculating the appearance matching score for each person based on the appearance characteristics of each person includes:
calculating similarity scores of the two images according to the appearance characteristics;
and normalizing the similarity score to obtain an appearance matching score.
In practical application, similarity scores of the two images belonging to the same person are calculated according to the appearance characteristics. And normalizing the obtained similarity score to obtain an appearance matching score.
In practical applications, a Sigmoid function may be used to map the similarity score to the interval [0,1], obtaining the final appearance matching score. If the two images are identical, the appearance matching score is 1; if the two images are completely different, the appearance matching score is 0.
With the above embodiment, the similarity score of the two images is calculated from the appearance characteristics; and normalizing the similarity score to obtain an appearance matching score. According to the process, the similarity score of the two images belonging to the same person is calculated according to the appearance characteristics, the similarity score is normalized to obtain appearance matching score, the appearance matching score is determined by using the similarity score, and the accuracy of calculating the appearance matching score is improved.
In an embodiment, calculating the action matching score according to the relative positions of the viewing angles corresponding to any two images and the coordinates of each key point includes:
constructing a corresponding transformation matrix according to the relative positions of the view angles corresponding to any two images;
mapping the key point coordinates of the same person in a first image in any two images to a second image by utilizing a transformation matrix to obtain mapped coordinates;
calculating the distance between each key point coordinate and the mapping coordinate in the second image;
and normalizing the sum of the distances to obtain an action matching score.
In practical application, the relative positions of the viewing angles corresponding to the two images are calculated respectively, and the transformation matrix of the two images is constructed according to the relative positions. And mapping the key point coordinates of the same person in the first image in the two images into the second image by utilizing a corresponding transformation matrix to obtain mapped coordinates.
In practical application, the distances between the key point coordinates of the same person in the second image and the corresponding mapped coordinates are calculated. The sum of the distances over all key points of the same person is then normalized to obtain the action matching score. Specifically, a Sigmoid-style mapping may take the distance sum into the interval [0,1], with smaller sums yielding higher action matching scores.
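A minimal sketch of this action score, assuming a 3x3 transform H relating the two image planes (e.g., a homography; the exact squashing function mapping the distance sum to [0,1] is a modeling assumption):

```python
import numpy as np

def action_match_score(kps_view1, kps_view2, H):
    """Map view-1 key points (K, 2) into view 2 with H, then score the
    summed key point distance so that smaller sums give higher scores."""
    ones = np.ones((kps_view1.shape[0], 1))
    homogeneous = np.hstack([kps_view1, ones])       # (K, 3)
    mapped = (H @ homogeneous.T).T
    mapped = mapped[:, :2] / mapped[:, 2:3]          # back to pixel coordinates
    distances = np.linalg.norm(kps_view2 - mapped, axis=1)
    # Sigmoid of the negative distance sum stays within (0, 1) and
    # decreases as the key points drift apart.
    return 1.0 / (1.0 + np.exp(distances.sum()))
```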
Through the above embodiment, a corresponding transformation matrix is constructed according to the relative positions of the viewing angles corresponding to any two images; the key point coordinates of the same person in the first of the two images are mapped into the second image with the transformation matrix to obtain mapped coordinates; the distance between each key point coordinate in the second image and its mapped coordinate is calculated; and the sum of the distances is normalized to obtain the action matching score. By determining the transformation matrix from the relative positions of the two views, mapping the same person's key point coordinates from one image into the other, and scoring the summed distances between the key point coordinates and the mapped coordinates, the accuracy of action matching is improved.
In one embodiment, step S105 includes:
executing a first loop step for the images of any two viewing angles until all values in the matching score matrix equal a first preset value, to obtain a matching result for the images of the two viewing angles, wherein the first loop step comprises the following steps:
searching for the maximum matching score in the matching score matrix, obtaining a matching result from the row and column in which the maximum is located, thereby determining a pair of matched bounding boxes in the images, and setting all values of that row and column to the first preset value;
selecting the image with the largest number of bounding boxes as a target image;
classifying the bounding boxes of the same person in all images except the target image into a bounding box set according to the matching result and the target image;
checking cycle consistency against the bounding box set of the same person.
In practical application, matching verification of the multiple bounding boxes of each person under the multiple viewing angles includes a uniqueness check and a cycle consistency check. The uniqueness check means that a bounding box in one viewing angle corresponds to at most one bounding box in another viewing angle; when this condition is met, a preliminary matching result can be obtained from the matching score matrix. The cycle consistency check means that bounding boxes matched across any three viewing angles should close into a ring, i.e., two mutually matched bounding boxes should match the same bounding box in the third viewing angle.
In practical application, the maximum matching score is searched for in the matching score matrix, the mutually matched bounding boxes in the images are determined from the row and column in which the maximum is located, and all values of that row and column are set to the first preset value, namely 0. The search for the maximum matching score is then repeated until all matrix values are 0, yielding the bounding box matching relations between all viewing angles, so that the bounding boxes satisfy the uniqueness check.
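A minimal sketch of this uniqueness-enforcing loop (zero plays the role of the first preset value):

```python
import numpy as np

def greedy_match(score_matrix):
    """Repeatedly take the largest score, record the box pair, and set the
    whole row and column to the first preset value (0) until none remain."""
    matrix = score_matrix.copy()
    matches = []
    while matrix.size and matrix.max() > 0:
        i, j = np.unravel_index(matrix.argmax(), matrix.shape)
        matches.append((i, j))  # box i in one view matches box j in the other
        matrix[i, :] = 0
        matrix[:, j] = 0
    return matches
```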
In practical application, the image containing the most bounding boxes among all images is selected as the target image. Taking the target image as the reference, each of its bounding boxes is grouped, according to the matching results, with the bounding boxes matched to it in the other images, yielding a bounding box set that represents the images of the same person at different viewing angles; cycle consistency is then checked against the bounding box set of each person.
Through the above embodiment, for the images of any two viewing angles, a first loop step is executed until all values in the matching score matrix equal a first preset value, yielding the matching result of the two images. The first loop step comprises: searching for the maximum matching score in the matching score matrix, obtaining a matching result from the row and column of the maximum, thereby determining a pair of matched bounding boxes, and setting all values of that row and column to the first preset value. The image with the largest number of bounding boxes is selected as the target image; the bounding boxes of the same person in all other images are grouped into a bounding box set according to the matching results and the target image; and cycle consistency is checked against the bounding box set of the same person. By checking the uniqueness and cycle consistency of the bounding boxes of each person under the multiple viewing angles, this process improves the accuracy of gesture recognition.
In one embodiment, checking cycle consistency against the bounding box set of the same person includes:
dividing the bounding boxes in the bounding box set into a plurality of subgroups of a second preset value according to the bounding box of the same person in the target image;
in the case that the bounding boxes in each subgroup match each other, determining that the bounding boxes of the same person satisfy cycle consistency.
In practical application, the bounding boxes in the bounding box set are checked for cycle consistency in combinations whose size is the second preset value, for example 3, and individual bounding boxes that fail the cycle consistency check are excluded.
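A minimal sketch of the triple-wise check, assuming the pairwise matches are stored as dictionaries keyed by view pair (this data layout is an assumption):

```python
from itertools import combinations

def cycle_consistent(pair_matches, views, person_boxes):
    """For every triple of views, the matches a->b, b->c and a->c of one
    person's boxes must close into a ring; otherwise the check fails."""
    for a, b, c in combinations(views, 3):
        box_a, box_b, box_c = (person_boxes.get(v) for v in (a, b, c))
        if None in (box_a, box_b, box_c):
            continue  # the person is not visible in one of the three views
        if (pair_matches[(a, b)].get(box_a) != box_b
                or pair_matches[(b, c)].get(box_b) != box_c
                or pair_matches[(a, c)].get(box_a) != box_c):
            return False
    return True
```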
Through the above embodiment, the bounding boxes in the bounding box set are divided into a plurality of subgroups of the second preset value according to the bounding box of the same person in the target image, and when the bounding boxes in each subgroup match each other, the bounding boxes of the same person are determined to satisfy cycle consistency. Checking the cycle consistency of the bounding boxes in this way confirms the accuracy of the bounding box matches.
In one embodiment, step S106 includes:
when the multiple bounding boxes of each person pass the verification, extracting the image characteristics of each image by using a third preset neural network model;
and inputting the bounding box of each person in the plurality of images into a fourth preset neural network model to obtain the gesture information of each person.
In practical applications, the pose information determination is performed for each person according to bounding boxes belonging to different perspectives of the same person.
In practical application, the third preset neural network model includes an HRNet (High-Resolution Network) neural network model; the bounding boxes belonging to the same person obtained from the multiple viewing angles are input into the third preset neural network model to extract the image features of each viewing angle.
In practical application, the fourth preset neural network model includes a Transformer neural network model: the extracted image features of each view are input into the Transformer to aggregate the image features of the different viewing angles, and a final fully connected layer outputs the predicted gesture information.
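A minimal PyTorch sketch of this aggregation head, assuming the per-view HRNet features have already been pooled to fixed-length vectors (the dimensions, layer counts, and 17-joint output are illustrative assumptions):

```python
import torch
import torch.nn as nn

class MultiViewPoseHead(nn.Module):
    """Aggregate per-view features with a Transformer encoder, then regress
    the 3D pose (num_joints x 3 coordinates) with a fully connected layer."""

    def __init__(self, feature_dim=256, num_joints=17, num_layers=2):
        super().__init__()
        self.num_joints = num_joints
        layer = nn.TransformerEncoderLayer(
            d_model=feature_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.fc = nn.Linear(feature_dim, num_joints * 3)

    def forward(self, view_features):
        # view_features: (batch, num_views, feature_dim), e.g. pooled HRNet features
        fused = self.encoder(view_features)    # cross-view aggregation
        pooled = fused.mean(dim=1)             # merge the views
        return self.fc(pooled).view(-1, self.num_joints, 3)
```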
Through the above embodiment, when the multiple bounding boxes of each person pass the verification, the image features of each image are extracted with the third preset neural network model, and the bounding boxes of each person in the multiple images are input into the fourth preset neural network model to obtain the gesture information of each person. This allows the motion gesture of an individual appearing in any viewing angle to be recognized, making gesture recognition more accurate.
Based on the above-mentioned method for recognizing a motion gesture, the embodiment of the present invention further provides a device 200 for recognizing a motion gesture, and fig. 2 is a schematic structural diagram of the device for recognizing a motion gesture provided in the embodiment of the present invention, where the device 200 includes:
an image acquisition module 201, configured to acquire images of multiple perspectives including multiple persons at the same time;
an identification module 202, configured to identify a bounding box of each person in each image based on a preset target detection algorithm;
the matching score determining module 203 is configured to select images of any two viewing angles, and match bounding boxes of the same person in the two images to obtain a matching score;
a matching score matrix determining module 204, configured to construct a matching score matrix of the same person under multiple viewing angles according to the matching score, where rows and columns of the matching score matrix are bounding boxes in the image;
the matching verification module 205 is configured to perform matching verification on a plurality of bounding boxes of each person under a plurality of view angles according to the matching score matrix;
the gesture information determining module 206 is configured to determine gesture information of each person according to each bounding box after the bounding boxes of each person pass verification.
The motion gesture recognition device provided by the embodiment of the invention can realize each process of the motion gesture recognition method in the method embodiment, and can achieve the same technical effect, and in order to avoid repetition, the description is omitted.
The embodiment of the present invention further provides an electronic device, as shown in fig. 3, where the electronic device includes a processor 130 and a memory 131, where the memory 131 stores machine executable instructions that can be executed by the processor 130, and the processor 130 executes the machine executable instructions to implement the above-mentioned motion gesture recognition method.
Further, the electronic device shown in fig. 3 further includes a bus 132 and a communication interface 133, and the processor 130, the communication interface 133, and the memory 131 are connected through the bus 132.
The memory 131 may include a high-speed random access memory (RAM), and may further include a non-volatile memory, such as at least one magnetic disk memory. The communication connection between the system network element and at least one other network element is implemented via at least one communication interface 133 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, etc. The bus 132 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one bi-directional arrow is shown in FIG. 3, but this does not mean that there is only one bus or one type of bus.
The processor 130 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware or by instructions in the form of software in the processor 130. The processor 130 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device, discrete gate or transistor logic device, or discrete hardware component, which may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or another storage medium well known in the art. The storage medium is located in the memory 131; the processor 130 reads the information in the memory 131 and, in combination with its hardware, performs the steps of the method of the foregoing embodiments.
The embodiment of the invention also provides a machine-readable storage medium, which stores machine-executable instructions that, when being called and executed by a processor, cause the processor to implement the above-mentioned motion gesture recognition method, and specific implementation can be referred to the method embodiment and will not be described herein.
The method, the device and the electronic equipment for identifying the motion gesture provided by the embodiment of the invention comprise a computer readable storage medium storing program codes, and instructions included in the program codes can be used for executing the method in the previous method embodiment, and specific implementation can be referred to the method embodiment and will not be repeated here.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
In addition, in the description of embodiments of the present invention, unless explicitly stated and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood by those skilled in the art in specific cases.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device, etc.) to perform all or part of the steps of the method of the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the above examples are only specific embodiments of the present invention for illustrating the technical solution of the present invention, but not for limiting the scope of the present invention, and although the present invention has been described in detail with reference to the foregoing examples, it will be understood by those skilled in the art that the present invention is not limited thereto: any person skilled in the art may modify or easily conceive of the technical solution described in the foregoing embodiments, or perform equivalent substitution of some of the technical features, while remaining within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. A method for recognizing a motion gesture, comprising:
acquiring images of multiple visual angles of multiple persons at the same moment;
identifying a bounding box of each person in each image based on a preset target detection algorithm;
selecting the images of any two visual angles, and matching the boundary boxes of the same person in the two images to obtain matching scores;
constructing a matching score matrix of the same person under the multiple view angles according to the matching scores, wherein the rows and columns of the matching score matrix are boundary boxes in the image;
performing matching verification on the multiple bounding boxes of each person under the multiple view angles according to the matching score matrix;
and after the multiple bounding boxes of each person pass the verification, determining the gesture information of each person according to each bounding box.
2. The method according to claim 1, wherein the selecting the images from any two perspectives, and matching the bounding boxes of the same person in the two images, to obtain a matching score, includes:
convolving the bounding box of each person in the two images with a first preset neural network to obtain the appearance characteristics of each person;
calculating appearance matching scores of each person according to the appearance characteristics of each person;
identifying key points in each boundary box by using a second preset neural network;
calculating an action matching score according to the relative positions of the visual angles corresponding to any two images and the coordinates of each key point;
and calculating the product of the action matching score and the appearance matching score to obtain the matching score.
3. The method of claim 2, wherein said calculating an appearance matching score for each person based on the appearance characteristics of each person comprises:
calculating similarity scores of the two images according to the appearance characteristics;
and normalizing the similarity score to obtain the appearance matching score.
4. A method according to claim 3, wherein said calculating an action matching score from the relative positions of the viewing angles corresponding to any two of said images and the coordinates of each of said keypoints comprises:
constructing a corresponding transformation matrix according to the relative positions of the view angles corresponding to any two images;
mapping the coordinates of key points of the same person in a first image in any two images to a second image by using the transformation matrix to obtain mapped coordinates;
calculating the distance between each key point coordinate in the second image and the mapping coordinate;
and normalizing the sum of the distances to obtain the action matching score.
5. The method of claim 1, wherein the performing a matching check on the plurality of bounding boxes of each person under the plurality of perspectives according to the matching score matrix comprises:
executing a first loop step for the images of any two viewing angles until all values in the matching score matrix are first preset values, so as to obtain a matching result of the images of the any two viewing angles, wherein the first loop step comprises the following steps:
searching for a matching score maximum value in the matching score matrix, obtaining a matching result according to the row and the column where the matching score maximum value is located, determining the matched bounding boxes in the images, and setting all the values of the row and the column where the matching score maximum value is located to the first preset value;
selecting the image with the largest number of bounding boxes as a target image;
classifying the bounding boxes of the same person in all the images except the target image into a bounding box set according to the matching result and the target image;
checking the cycle consistency against the bounding box set of the same person.
6. The method of claim 5, wherein the checking the cycle consistency against the bounding box set of the same person comprises:
dividing the bounding boxes in the bounding box set into a plurality of subgroups of a second preset value according to the bounding boxes of the same person in the target image;
in the case that the bounding boxes in each of the subgroups match each other, determining that the bounding boxes of the same person satisfy the cycle consistency.
7. The method according to claim 1, wherein said determining pose information of each person from each of said bounding boxes after said plurality of said bounding boxes of each person pass verification comprises:
when the multiple bounding boxes of each person pass the verification, extracting the image characteristics of each image by using a third preset neural network model;
and inputting the boundary box of each person in the plurality of images to a fourth preset neural network model to obtain the gesture information of each person.
8. A motion gesture recognition apparatus, comprising:
the image acquisition module is used for acquiring images of multiple visual angles of multiple persons at the same moment;
the identification module is used for identifying the boundary box of each person in each image based on a preset target detection algorithm;
the matching score determining module is used for selecting the images of any two viewing angles, and matching the bounding boxes of the same person in the two images to obtain a matching score;
the matching score matrix determining module is used for constructing a matching score matrix of the same person under the multiple viewing angles according to the matching score, wherein the rows and columns of the matching score matrix correspond to bounding boxes in the images;
the matching verification module is used for performing matching verification on the multiple bounding boxes of each person under the multiple viewing angles according to the matching score matrix;
and the gesture information determining module is used for determining gesture information of each person according to each bounding box after the bounding boxes of each person pass the verification.
9. An electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor, the processor executing the computer-executable instructions to implement the method of identifying a motion gesture of any one of claims 1 to 7.
10. A machine-readable storage medium having instructions stored thereon, which when executed by a processor implement the method of motion gesture recognition of any one of claims 1 to 7.
CN202310478121.6A 2023-04-28 2023-04-28 Motion gesture recognition method and device and electronic equipment Pending CN116403288A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310478121.6A CN116403288A (en) 2023-04-28 2023-04-28 Motion gesture recognition method and device and electronic equipment


Publications (1)

Publication Number Publication Date
CN116403288A true CN116403288A (en) 2023-07-07

Family

ID=87017937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310478121.6A Pending CN116403288A (en) 2023-04-28 2023-04-28 Motion gesture recognition method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116403288A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740659A (en) * 2018-12-28 2019-05-10 浙江商汤科技开发有限公司 A kind of image matching method and device, electronic equipment, storage medium
CN114036969A (en) * 2021-03-16 2022-02-11 上海大学 3D human body action recognition algorithm under multi-view condition
US20220148453A1 (en) * 2020-11-12 2022-05-12 Tencent America LLC Vision-based rehabilitation training system based on 3d human pose estimation using multi-view images
CN114694257A (en) * 2022-04-06 2022-07-01 中南大学 Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium
WO2022245281A1 (en) * 2021-05-18 2022-11-24 Garena Online Private Limited Neural network system for 3d pose estimation



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination