CN114782994A - Gesture recognition method and device, storage medium and electronic equipment - Google Patents

Gesture recognition method and device, storage medium and electronic equipment Download PDF

Info

Publication number
CN114782994A
Authority
CN
China
Prior art keywords
key point
central
central key
target
sample
Prior art date
Legal status
Pending
Application number
CN202210473734.6A
Other languages
Chinese (zh)
Inventor
王超
陈奕名
霍卫涛
沐天宇
张清宇
程章焱
阚海鹏
马丁
Current Assignee
Beijing Jinghong Software Technology Co ltd
Original Assignee
Beijing Jinghong Software Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jinghong Software Technology Co ltd filed Critical Beijing Jinghong Software Technology Co ltd


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The disclosure relates to a gesture recognition method and device, a storage medium, and electronic equipment, and aims to solve problems in the related art. The gesture recognition method comprises the following steps: acquiring an image to be recognized, wherein the image to be recognized comprises a plurality of objects to be recognized; inputting the image to be recognized into a trained key point recognition model to obtain a central key point and non-central key points of each object to be recognized, as well as a central tendency vector field, wherein the central tendency vector field represents the degree of association between each non-central key point and each central key point; for a target central key point, determining the target non-central key points that belong to the same target object to be recognized as the target central key point according to the central tendency vector field, wherein the target central key point is any one of the central key points; and determining the posture of the target object to be recognized according to the target non-central key points and the target central key point.

Description

Gesture recognition method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a gesture recognition method and apparatus, a storage medium, and an electronic device.
Background
At present, human body posture recognition is used as an important reference basis for behavior monitoring, and is widely applied to the fields of intelligent teaching, intelligent home, automatic driving, intelligent monitoring and the like. For example, in an intelligent teaching scene, posture correction, posture answering and the like can be performed through human posture recognition.
In the related art, multi-person gesture recognition can proceed in two ways. In the first, a human body detector detects the position of every person, after which the human body key points of each person are recognized individually. In the second, the key points are recognized directly and then matched to their corresponding human bodies. The posture of each human body is then determined from the recognized key points. However, the former approach relies on a human body detector and becomes time-consuming when many people are present. The latter approach is fast, but its processing logic is complex, and when there are too many people it is difficult to match the key points to the correct human bodies.
Disclosure of Invention
An object of the present disclosure is to provide a gesture recognition method, apparatus, storage medium, and electronic device, so as to solve the problems in the related art.
In order to achieve the above object, according to a first aspect of embodiments of the present disclosure, there is provided a gesture recognition method, including:
acquiring an image to be recognized, wherein the image to be recognized comprises a plurality of objects to be recognized;
inputting the images to be recognized into a trained key point recognition model to obtain a central key point and non-central key points of each object to be recognized and a central tendency vector field, wherein the central tendency vector field represents the association degree of each non-central key point and each central key point;
aiming at a target central key point, determining a target non-central key point which belongs to the same target object to be identified as the target central key point according to the central tendency vector field, wherein the target central key point is any one of the central key points;
and determining the posture of the target object to be identified according to the target non-central key point and the target central key point.
Optionally, the central tendency vector field includes a plurality of central tendency vector subfields, different ones of the central tendency vector subfields correspond to different sets of non-central key points, and different ones of the sets of non-central key points correspond to different categories of key points;
the step of determining target non-central key points which belong to the same target object to be identified as the target central key points according to the central tendency vector field for the target central key points comprises the following steps:
and aiming at the target central key point, determining a non-central key point with the maximum matching degree value with the target central key point from each non-central key point set to obtain a plurality of categories of target non-central key points.
Optionally, the method further comprises:
determining the matching degree of the non-central key point and the target central key point by the following formula:

M = Σ_{j=0}^{K} CTV_{x_j} · (x_c − x, y_c − y) / ‖(x_c − x, y_c − y)‖

wherein M is the value of the matching degree; CTV_{x_j} is the central tendency vector predicted at the preset key point (x_j, y_j) for matching any non-central key point (x, y) in the set of non-central key points to the target central key point (x_c, y_c); K is a preset value; and (x_j, y_j) are the coordinates of the preset key points on the connection between the target central key point and the non-central key point, obtained by dividing the connection into K sections.
Optionally, after the image to be recognized is input into the trained keypoint recognition model, the keypoint recognition model is used for:
performing key point identification on the image to be identified to obtain a central key point heatmap and non-central key point heatmaps;
the central key point is determined from the central key point heatmap, and the non-central key points are determined from the non-central key point heatmaps.
Optionally, the training process of the keypoint recognition model includes:
acquiring an image sample to be identified, wherein the image sample to be identified corresponds to a central key point label, a non-central key point label and a central trend vector field label, the image sample to be identified comprises a plurality of object samples to be identified, each object sample to be identified corresponds to a sample central key point marked with the central key point label and a plurality of sample non-central key points marked with the non-central key point labels;
and training according to the image sample to be recognized to obtain a trained key point recognition model.
Optionally, the sample central key point corresponding to the object sample to be identified is a mean value of a plurality of sample non-central key points corresponding to the object sample to be identified.
Optionally, the central tendency vector field label represents the degree of association between each sample non-central key point and each sample central key point, and when the sample non-central key point and the sample central key point are located on the same object sample to be identified, the degree of association is calculated by the following formula:

CTV′_x = (x_c′ − x′, y_c′ − y′) / ‖(x_c′ − x′, y_c′ − y′)‖

wherein CTV′_x is the value of the degree of association, (x_c′, y_c′) are the coordinates of the sample central key point, and (x′, y′) are the coordinates of the sample non-central key point;
and setting the value of the association degree to be 0 under the condition that the sample non-central key point and the sample central key point correspond to different object samples to be identified.
According to a second aspect of the embodiments of the present disclosure, there is provided a gesture recognition apparatus, the apparatus including:
the device comprises an acquisition module, a recognition module and a recognition module, wherein the acquisition module is used for acquiring an image to be recognized, and the image to be recognized comprises a plurality of objects to be recognized;
the input module is used for inputting the images to be recognized into the trained key point recognition model to obtain a central key point and non-central key points of each object to be recognized and a central tendency vector field, and the central tendency vector field represents the association degree of each non-central key point and each central key point;
a first determining module, configured to determine, for a target central key point, a target non-central key point that belongs to the same target object to be identified as the target central key point according to the central tendency vector field, where the target central key point is any one of the central key points;
and the second determining module is used for determining the posture of the target object to be recognized according to the target non-central key point and the target central key point.
According to a third aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of the first aspect above.
By the technical scheme, the target non-central key points belonging to the same target object to be recognized can be determined aiming at the target central key points by adopting the central tendency vector field representing the association degree of the non-central key points and the central key points, so that the posture of the target object to be recognized can be determined according to the target non-central key points and the target central key points. In the process, the central key point and the central trend vector field representing the association degree of the non-central key point and the central key point are introduced to match the key point with the object to be recognized, so that the matching accuracy of the key point can be improved under the condition of ensuring the speed advantage, and the accuracy of gesture recognition is improved.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flowchart illustrating a gesture recognition method according to an exemplary embodiment of the present disclosure.
FIG. 2 is a schematic diagram of a keypoint identification model shown in the present disclosure according to an exemplary embodiment.
FIG. 3 is a block diagram of a gesture recognition device shown in the present disclosure according to an exemplary embodiment.
Fig. 4 is a block diagram of an electronic device shown in accordance with an exemplary embodiment of the present disclosure.
Detailed Description
The following detailed description of the embodiments of the disclosure refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
In the related art, for multi-person gesture recognition, the key points of each human body can be determined by directly recognizing the key points and then matching the recognized key points to the corresponding human body, so that the gesture of each human body is determined according to the recognized key points.
For example, in the paper "Associative Embedding: End-to-End Learning for Joint Detection and Grouping", the authors propose to predict, along with each human body key point, a TAG value for that key point, such that the TAG values of key points belonging to the same human body are as consistent as possible. The human body corresponding to each key point can then be determined from its TAG value. However, in this approach supervision exists only at the key points themselves, and such sparse supervision is unfavorable to model learning. Moreover, key points at different positions on the same human body (e.g., head key points and foot key points) differ greatly, which makes it difficult for the model to predict consistent TAG values for them. In particular, when there are too many people, the TAG values corresponding to key points of different human bodies are easily confused, so the accuracy of matching key points to human bodies is low.
For example, in the proposal described in patent document CN111310625A, bones are defined on each human body (for example, the bone from the elbow key point to the wrist key point). The method predicts each human body key point together with a connection affinity for each bone, where the affinity is represented by a part affinity field (PAF): for any bone on a human body, the unit vector of its vector field equals the unit vector of the connection between the bone's two key points. The bone to which a pair of key points belongs can therefore be determined from the unit vector of their connection, and hence the human body corresponding to those key points. However, this approach is computationally complex, and it creates a dependency between key points: if a key point on a bone is missing (for example, an elbow key point connecting a shoulder key point and a wrist key point is not visible), there is no way to know that the other, visible key points connected to the invisible one (the shoulder key point and the wrist key point) belong to the same human body.
In view of this, the embodiment of the present disclosure provides a gesture recognition method, which performs matching between a key point and an object to be recognized by introducing a central key point and a central trend vector field representing a degree of association between a non-central key point and the central key point, so as to eliminate dependence between the non-central key points and improve accuracy of matching of the key points, thereby improving accuracy of gesture recognition.
Fig. 1 is a flowchart illustrating a gesture recognition method according to an exemplary embodiment of the present disclosure, as shown in fig. 1, the method including:
and S101, acquiring an image to be identified.
The image to be recognized comprises a plurality of objects to be recognized. It should be understood that the object to be recognized may be a human body or an animal.
And S102, inputting the images to be recognized into the trained key point recognition model to obtain the central key points, non-central key points and a central trend vector field of each object to be recognized.
The central tendency vector field represents the degree of association between each non-central key point and each central key point.
It can be understood that each object to be recognized carries the same preset number of key points to be identified. When the object to be recognized is a human body, the non-central key points may be key points corresponding to various parts of the human body, such as a nose key point, a right shoulder key point, a left knee key point, and the like. When the object to be recognized is an animal, the non-central key points may be key points corresponding to various parts of the animal, for example an eye key point, a throat key point, a claw key point, and the like. On this basis, the central key point of an object to be recognized may be the mean of the non-central key points of its respective parts, and the central tendency vector field may be the collection of central tendency vectors from each non-central key point to each central key point in every object to be recognized.
S103, aiming at the target central key point, determining a target non-central key point which belongs to the same target object to be identified as the target central key point according to the central tendency vector field.
Wherein, the target central key point is any central key point.
Since the central tendency vector field characterizes the degree of association between each non-central key point and each central key point, the degree of association between the target central key point and each non-central key point can be determined from the central tendency vector field, and the target non-central key points belonging to the same target object to be identified as the target central key point can then be determined based on that degree of association.
And S104, determining the posture of the target object to be recognized according to the target non-central key point and the target central key point.
It is understood that after the target non-central key point and the target central key point belonging to the same target object to be recognized are determined, the posture of the target object to be recognized corresponding to the target central key point can be determined according to the target non-central key point.
In addition, it should be noted that in the related art, predicting a plurality of bone connections and matching key points to bones is computationally complex, and when a key point is missing, the key points belonging to the same bone as the missing one cannot be assigned; that is, there is a dependency between key points. The method provided by the embodiment of the disclosure introduces a central key point and computes the central tendency vector field between all non-central key points and the central key points, so that the target non-central key points and the target central key point belonging to the same target object to be recognized can be determined from the central tendency vector field. In this process there is no dependency between non-central key points: even if one non-central key point is missing, the attribution of the other key points is unaffected.
By the technical scheme, the target non-central key points belonging to the same target object to be recognized can be determined aiming at the target central key points by adopting the central tendency vector field representing the association degree of the non-central key points and the central key points, so that the posture of the target object to be recognized can be determined according to the target non-central key points and the target central key points. In the process, the central key point and the central trend vector field representing the association degree of the non-central key point and the central key point are introduced to match the key point with the object to be recognized, so that the matching accuracy of the key point can be improved under the condition of ensuring the speed advantage, and the accuracy of gesture recognition is improved.
Optionally, the central tendency vector field includes a plurality of central tendency vector subfields, different central tendency vector subfields correspond to different sets of non-central keypoints, and different sets of non-central keypoints correspond to different keypoint categories. On this basis, the step S103 may include:
and aiming at the target central key point, determining a non-central key point with the maximum matching degree value with the target central key point from each non-central key point set to obtain a plurality of categories of target non-central key points.
It should be noted that different key point categories may correspond to different portions of the object to be identified, for example, the key point category of the right shoulder of the human body corresponds to the right shoulder of the human body, and the key point category of the cat paw corresponds to the paw of the cat. The non-central keypoint set corresponding to the keypoint category comprises all non-central keypoints belonging to the keypoint category in each object to be identified. For example, for an image to be recognized including 3 human bodies (each human body includes 1 right shoulder keypoint), the set of non-center keypoints corresponding to the right shoulder keypoint category includes 3 right shoulder keypoints.
On this basis, for the target central keypoint, a non-central keypoint with the maximum matching degree with the target central keypoint is determined from each non-central keypoint set according to a plurality of central tendency vector sub-fields, so as to obtain a plurality of categories of target non-central keypoints.
For example, for the left shoulder key point set, the matching degree between each left shoulder key point in the left shoulder key point set and the target center key point may be determined according to the center trend vector subfield corresponding to the left shoulder key point, and the left shoulder key point with the largest value of the matching degree may be used as the target left shoulder key point. On the basis, target non-central key points of a plurality of categories can be obtained by traversing each non-central key point set.
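The per-category argmax grouping described above can be sketched as follows. This is an illustrative sketch rather than the patent's implementation: `group_keypoints` and `match_fn` are hypothetical names, and the matching function is supplied by the caller.

```python
def group_keypoints(center_keypoint, keypoint_sets, match_fn):
    """For one target central key point, pick from each category's
    non-central key point set the candidate whose matching degree with
    the center is largest, yielding one key point per category."""
    result = {}
    for category, candidates in keypoint_sets.items():
        result[category] = max(candidates,
                               key=lambda kp: match_fn(kp, center_keypoint))
    return result
```

For a quick check, a toy `match_fn` of negative squared distance (standing in for the patent's matching degree) makes the candidate nearest the center win in each category.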
Optionally, the method provided by the embodiment of the present disclosure may determine the matching degree between a non-central key point and the target central key point by the following formula:

M = Σ_{j=0}^{K} CTV_{x_j} · (x_c − x, y_c − y) / ‖(x_c − x, y_c − y)‖

wherein M is the value of the matching degree; CTV_{x_j} is the central tendency vector predicted at the preset key point (x_j, y_j) for matching any non-central key point (x, y) in the set of non-central key points to the target central key point (x_c, y_c); K is a preset value; and (x_j, y_j) are the coordinates of the preset key points on the connection between the target central key point and the non-central key point, obtained by dividing the connection into K sections.
For example, if the target center keypoint coordinates are (2, 4) and the non-center keypoint coordinates are (6, 6), the preset keypoint coordinates on the connection of the target center keypoint and the non-center keypoint may include (2, 4), (4, 5), and (6, 6) in the case where K is 2.
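The sampling step can be sketched as follows, under the assumption that the matching degree sums, over the K + 1 sampled points, the dot product between the predicted central tendency vector at each point and the unit vector pointing from the non-central key point toward the candidate central key point. The function names and the nested-list layout of the vector field are illustrative, not from the patent.

```python
import math

def sample_line_points(start, end, k):
    """Divide the connection from start to end into k equal sections,
    returning the k + 1 evenly spaced sample points (endpoints included)."""
    (x0, y0), (x1, y1) = start, end
    return [(x0 + (x1 - x0) * j / k, y0 + (y1 - y0) * j / k)
            for j in range(k + 1)]

def matching_degree(ctv_field, non_center, center, k):
    """Sum, over the sampled points, of the dot product between the
    predicted central tendency vector at that point and the unit vector
    from the non-central key point toward the candidate center."""
    dx, dy = center[0] - non_center[0], center[1] - non_center[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return 0.0
    ux, uy = dx / norm, dy / norm
    m = 0.0
    for x, y in sample_line_points(non_center, center, k):
        vx, vy = ctv_field[int(round(y))][int(round(x))]  # nearest-pixel lookup
        m += vx * ux + vy * uy
    return m
```

With K = 2, `sample_line_points((2, 4), (6, 6), 2)` reproduces the three preset key points (2, 4), (4, 5), (6, 6) of the worked example above.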
Optionally, after the image to be recognized is input into the trained keypoint recognition model, the keypoint recognition model may be used to:
performing key point identification on the image to be identified to obtain a central key point heatmap and non-central key point heatmaps;
determining the central key point from the central key point heatmap, and determining the non-central key points from the non-central key point heatmaps.
A heatmap is a statistical chart that displays data through the values of colored blocks (i.e., pixels). In the central key point heatmap, a region is predicted for each central key point of each object to be recognized in the image to be recognized. For each color block within the region corresponding to any central key point, the higher the probability that the block is the central key point, the larger its value. On this basis, the pixel with the maximum value can be determined within the region corresponding to any central key point and taken as that central key point. Non-central key points can be determined from the non-central key point heatmaps in the same way.
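As a rough illustration of reading key points off one heatmap channel, the sketch below finds local maxima above a threshold. The patent only specifies taking the maximum-value pixel within each predicted region, so the 8-neighbour test and the threshold value here are assumptions.

```python
def heatmap_peaks(heatmap, threshold=0.5):
    """Return (x, y) coordinates of local maxima in a single-channel
    heatmap: pixels above threshold that are >= all of their
    in-bounds 8-neighbours, scanned row by row."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < threshold:
                continue
            neighbours = [heatmap[ny][nx]
                          for ny in range(max(0, y - 1), min(h, y + 2))
                          for nx in range(max(0, x - 1), min(w, x + 2))
                          if (nx, ny) != (x, y)]
            if all(v >= n for n in neighbours):
                peaks.append((x, y))
    return peaks
```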
Referring to fig. 2, fig. 2 is a schematic diagram of a key point recognition model shown in the present disclosure according to an exemplary embodiment. As shown in fig. 2, after the image to be recognized (which may be an RGB three-channel image) is input into the key point recognition model 400, features of the image to be recognized are extracted by a feature extraction network, yielding key point heatmaps (including the central key point heatmap and the non-central key point heatmaps) and the central tendency vector field. The number of heatmap channels is the same as the number of key points to be identified on the object to be recognized, which can be preset. Since a central tendency vector can be represented by a two-channel feature map (the two channels being its x and y components respectively), the number of channels of the central tendency vector field is twice the number of non-central key points to be identified on the object to be recognized.
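The channel bookkeeping above can be made concrete with a small helper. Whether the central key point receives its own heatmap channel alongside the non-central ones is an interpretation of the description, and the function name is illustrative.

```python
def head_channel_layout(num_non_center_keypoints):
    """Output-channel counts for the recognition head described above:
    one heatmap channel per key point (assuming the central key point
    gets its own channel alongside the non-central ones), plus a
    two-channel (x, y) central tendency vector map per non-central
    key point category."""
    heatmap_channels = num_non_center_keypoints + 1
    ctv_channels = 2 * num_non_center_keypoints
    return heatmap_channels, ctv_channels
```

With 17 body key points (the common COCO convention, used here only as an assumed example), this gives 18 heatmap channels and 34 central tendency vector field channels.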
On the basis, the matching degree of each non-central key point and the central key point of each object to be recognized can be determined through the matching module, so that the non-central key points (namely key point detection results) belonging to the same object to be recognized with each central key point are determined. Then, the posture of the object to be recognized can be determined according to the key point detection result.
Optionally, the training process of the keypoint recognition model may include:
and acquiring an image sample to be recognized, and training according to the image sample to be recognized to obtain a trained key point recognition model.
The image sample to be recognized corresponds to a central key point label, a non-central key point label and a central trend vector field label, and the image sample to be recognized comprises a plurality of object samples to be recognized, each object sample to be recognized corresponds to a sample central key point marked with the central key point label and a plurality of sample non-central key points marked with the non-central key point labels.
It should be noted that the keypoint identification model may be obtained by training an image sample to be identified labeled with a real label (i.e., a central keypoint label, a non-central keypoint label, and a central trend vector field label). In the training process, the to-be-trained key point identification model predicts the to-be-identified image sample to obtain a prediction center key point, a prediction non-center key point and a prediction center trend vector field, on the basis, a loss value can be calculated according to a prediction result and a real label, and training parameters of the to-be-trained key point identification model are adjusted through the loss value, so that the trained key point identification model is obtained.
Optionally, the sample central key point corresponding to the object sample to be identified is a mean value of a plurality of sample non-central key points corresponding to the object sample to be identified. In a possible implementation, the mean value may be calculated by the following formula:
(x_c′, y_c′) = (1/N) Σ_{i=1}^{N} (x_i, y_i)

wherein (x_c′, y_c′) are the coordinates of the sample central key point, N is the number of non-central key points on the same object to be identified, and (x_i, y_i) are the coordinates of the i-th sample non-central key point on that object.
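The mean computation can be transcribed directly, assuming key points are given as (x, y) tuples (the helper name is illustrative):

```python
def sample_center_keypoint(non_center_keypoints):
    """Label-generation step: the sample central key point is the mean
    of the object's N sample non-central key points."""
    n = len(non_center_keypoints)
    xc = sum(x for x, _ in non_center_keypoints) / n
    yc = sum(y for _, y in non_center_keypoints) / n
    return xc, yc
```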
Optionally, the central tendency vector field label represents the degree of association between each sample non-central key point and each sample central key point. When the sample non-central key point and the sample central key point are located on the same object sample to be identified, the degree of association is calculated by the following formula:

CTV′_x = (x_c′ − x′, y_c′ − y′) / ‖(x_c′ − x′, y_c′ − y′)‖

wherein CTV′_x is the sample central tendency vector from the sample non-central key point to the sample central key point (i.e., the value of the degree of association), used to characterize the degree of association between the two; (x_c′, y_c′) are the coordinates of the sample central key point; and (x′, y′) are the coordinates of the sample non-central key point;
and under the condition that the sample non-central key point corresponds to a different object sample to be identified from the sample central key point, setting the value of the association degree to be 0.
By the technical scheme, the target non-central key points belonging to the same target object to be recognized can be determined aiming at the target central key points by adopting the central tendency vector field representing the association degree of the non-central key points and the central key points, so that the posture of the target object to be recognized can be determined according to the target non-central key points and the target central key points. In the process, the central key point and the central trend vector field representing the association degree of the non-central key point and the central key point are introduced to match the key point with the object to be recognized, so that the matching accuracy of the key point can be improved under the condition of ensuring the speed advantage, and the accuracy of gesture recognition is improved.
Based on the same inventive concept, the present disclosure also provides a gesture recognition apparatus, referring to fig. 3, fig. 3 is a block diagram of a gesture recognition apparatus shown according to an exemplary embodiment of the present disclosure, as shown in fig. 3, the gesture recognition apparatus 100 includes:
the device comprises an acquisition module 101, a recognition module and a recognition module, wherein the acquisition module is used for acquiring an image to be recognized, and the image to be recognized comprises a plurality of objects to be recognized;
an input module 102, configured to input the image to be recognized into a trained key point recognition model, so as to obtain a central key point and a non-central key point of each object to be recognized, and a central tendency vector field, where the central tendency vector field represents a degree of association between each non-central key point and each central key point;
a first determining module 103, configured to determine, according to the central tendency vector field, a target non-central key point that belongs to the same target object to be identified as the target central key point, where the target central key point is any one of the central key points;
a second determining module 104, configured to determine the posture of the target object to be recognized according to the target non-central key point and the target central key point.
With this apparatus, the central tendency vector field, which represents the degree of association between non-central key points and central key points, can be used to determine, for a target central key point, the target non-central key points that belong to the same target object to be recognized, so that the posture of the target object to be recognized can be determined from the target non-central key points and the target central key point. In this process, the central key point and the central tendency vector field are introduced to match key points with objects to be recognized, which improves the accuracy of key point matching while retaining the speed advantage, and thereby improves the accuracy of gesture recognition.
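As a rough illustration, the grouping performed by the first and second determining modules can be sketched as follows. The `score` function stands in for any matching degree computed from the central tendency vector field; the names and data layout here are hypothetical, not the patent's implementation:

```python
def group_keypoints(center_points, non_center_sets, score):
    """For each target center keypoint, pick from every non-center keypoint
    set (one set per keypoint category) the candidate with the highest
    matching-degree value; the selected points plus the center keypoint
    together describe one object's pose."""
    poses = []
    for c in center_points:
        pose = [max(candidates, key=lambda nc: score(nc, c))
                for candidates in non_center_sets if candidates]
        poses.append((c, pose))
    return poses
```

For example, with a toy score that is just negative squared distance, each center keypoint collects the nearest candidate of each category.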
Optionally, the central tendency vector field includes a plurality of central tendency vector subfields, different ones of the central tendency vector subfields correspond to different sets of non-central key points, and different ones of the sets of non-central key points correspond to different categories of key points;
the first determining module 103 is further configured to:
for the target central key point, determining, from each non-central key point set, the non-central key point with the maximum matching degree value with respect to the target central key point, so as to obtain target non-central key points of a plurality of categories.
Optionally, the apparatus 100 further includes a third determining module, configured to determine a matching degree between the non-central key point and the target central key point by using the following formula:
$$M = \sum_{j=0}^{K} \mathrm{CTV}(x_j, y_j) \cdot \frac{(x_c - x,\ y_c - y)}{\left\|(x_c - x,\ y_c - y)\right\|}$$
wherein M is the value of the matching degree, CTV_x is the central tendency vector from any non-central key point (x, y) in the non-central key point set to the target central key point (x_c, y_c), K is a preset value, and (x_j, y_j) are the coordinates of the preset key points on the line connecting the target central key point and the non-central key point, the preset key points dividing the line into K sections.
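This amounts to sampling the predicted field along the line and accumulating its projection onto the direction toward the center keypoint. A minimal Python sketch, assuming a dense per-pixel field indexed as `ctv_field[y, x]` of shape (H, W, 2); the coordinate rounding and the averaging over the K+1 sample points are assumptions, not the patent's exact formulation:

```python
import numpy as np

def matching_degree(ctv_field, non_center, center, K=10):
    """Matching degree M between a candidate non-center keypoint (x, y)
    and a target center keypoint (xc, yc): split the connecting line into
    K sections and, at each sample point (xj, yj), project the predicted
    central tendency vector onto the unit direction toward the center."""
    p = np.asarray(non_center, dtype=float)
    c = np.asarray(center, dtype=float)
    d = c - p
    n = np.linalg.norm(d)
    if n == 0.0:
        return 0.0
    u = d / n  # unit direction from the non-center keypoint to the center
    m = 0.0
    for j in range(K + 1):
        xj, yj = np.round(p + d * (j / K)).astype(int)
        m += float(np.dot(ctv_field[yj, xj], u))
    return m / (K + 1)
```

A field whose vectors all point toward the center keypoint yields the maximum score of 1.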
Optionally, after the image to be recognized is input into the trained keypoint recognition model, the keypoint recognition model is used to:
performing key point identification on the image to be identified to obtain a central key point thermodynamic diagram and a non-central key point thermodynamic diagram;
the central keypoint is determined from the central keypoint thermodynamic diagram, and the non-central keypoints are determined from the non-central keypoint thermodynamic diagram.
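A minimal sketch of extracting keypoint coordinates from such a thermodynamic diagram (heatmap) as thresholded local maxima; the 3x3 neighbourhood and the threshold value are illustrative assumptions:

```python
import numpy as np

def keypoints_from_heatmap(heatmap, threshold=0.5):
    """Return (x, y) coordinates of cells that exceed the confidence
    threshold and are maximal within their 3x3 neighbourhood."""
    h, w = heatmap.shape
    pts = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y, x]
            if v < threshold:
                continue
            patch = heatmap[max(0, y - 1):y + 2, max(0, x - 1):x + 2]
            if v >= patch.max():
                pts.append((x, y))
    return pts
```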
Optionally, the apparatus 100 further includes a training module, configured to train the key point recognition model, where the training process of the key point recognition model includes:
acquiring an image sample to be identified, where the image sample to be identified corresponds to a central key point label, a non-central key point label, and a central tendency vector field label; the image sample to be identified includes a plurality of object samples to be identified, and each object sample to be identified corresponds to a sample central key point marked with the central key point label and a plurality of sample non-central key points marked with the non-central key point labels;
and training according to the image sample to be recognized to obtain a trained key point recognition model.
Optionally, the sample central key point corresponding to an object sample to be identified is the mean of the plurality of sample non-central key points corresponding to that object sample.
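In code, this labeling rule is simply a coordinate mean over the object's sample non-central key points (the function name is hypothetical):

```python
import numpy as np

def center_keypoint(non_center_points):
    """Sample central key point for one object sample: the mean of its
    sample non-central key point coordinates."""
    return np.asarray(non_center_points, dtype=float).mean(axis=0)
```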
Optionally, the central tendency vector field label characterizes a degree of association between each of the sample non-central key points and each of the sample central key points, and the apparatus 100 further includes a calculation module configured to:
when the sample non-central key point and the sample central key point are positioned on the same object sample to be identified, calculating the association degree through the following formula:
$$\mathrm{CTV}_{x}' = \frac{(x_c' - x',\ y_c' - y')}{\left\|(x_c' - x',\ y_c' - y')\right\|}$$
wherein CTV_x' is the value of the degree of association, (x_c', y_c') are the coordinates of the sample central key point, and (x', y') are the coordinates of the sample non-central key point;
and when the sample non-central key point and the sample central key point correspond to different object samples to be identified, setting the value of the association degree to 0.
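The two label cases above can be sketched as follows, assuming the label is stored as a 2-vector, with the zero vector encoding an association degree of 0:

```python
import numpy as np

def ctv_label(non_center, center, same_object):
    """Central-tendency-vector-field label for one sample non-central
    key point: the unit vector from (x', y') toward the sample central
    key point (xc', yc') when both lie on the same object sample to be
    identified, and zero otherwise."""
    if not same_object:
        return np.zeros(2)
    d = np.asarray(center, dtype=float) - np.asarray(non_center, dtype=float)
    n = np.linalg.norm(d)
    return d / n if n > 0 else np.zeros(2)
```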
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
Based on the same inventive concept, an embodiment of the present disclosure further provides an electronic device, including:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the gesture recognition method described above.
Fig. 4 is a block diagram of an electronic device 200 according to an exemplary embodiment. As shown in fig. 4, the electronic device 200 may include: a processor 201 and a memory 202. The electronic device 200 may also include one or more of a multimedia component 203, an input/output (I/O) interface 204, and a communication component 205.
The processor 201 is configured to control the overall operation of the electronic device 200 to complete all or part of the steps of the gesture recognition method described above. The memory 202 is configured to store various types of data to support operation of the electronic device 200, such as instructions for any application or method operating on the electronic device 200 and application-related data, such as contacts, messages, pictures, audio, and video. The memory 202 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 203 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals; a received audio signal may further be stored in the memory 202 or transmitted through the communication component 205. The audio component further comprises at least one speaker for outputting audio signals. The I/O interface 204 provides an interface between the processor 201 and other interface modules, such as a keyboard, a mouse, or buttons, which may be virtual buttons or physical buttons. The communication component 205 is used for wired or wireless communication between the electronic device 200 and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G or 5G, NB-IoT (Narrow Band Internet of Things), or a combination of one or more of them; accordingly, the communication component 205 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic Device 200 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above gesture recognition methods.
Based on the same inventive concept, embodiments of the present disclosure also provide a non-transitory computer-readable storage medium, on which a computer program is stored, where the program, when executed by a processor, implements the steps of the gesture recognition method. For example, the non-transitory computer readable storage medium may be the memory 202 described above including program instructions executable by the processor 201 of the electronic device 200 to perform the gesture recognition method described above.
Specifically, the computer-readable storage medium may be a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, a public cloud server, and the like.
With respect to the non-transitory computer-readable storage medium in the above embodiments, the steps of implementing the gesture recognition method when executed by a computer program stored thereon will be described in detail in relation to the embodiments of the method, and will not be elaborated upon here.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned gesture recognition method when executed by the programmable apparatus.
The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the present disclosure.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (10)

1. A method of gesture recognition, the method comprising:
acquiring an image to be recognized, wherein the image to be recognized comprises a plurality of objects to be recognized;
inputting the image to be recognized into a trained key point recognition model to obtain a central key point, non-central key points, and a central tendency vector field of each object to be recognized, wherein the central tendency vector field represents the degree of association between each non-central key point and each central key point;
aiming at a target central key point, determining a target non-central key point which belongs to the same target object to be identified as the target central key point according to the central tendency vector field, wherein the target central key point is any one of the central key points;
and determining the posture of the target object to be recognized according to the target non-central key point and the target central key point.
2. The method of claim 1, wherein the central tendency vector field comprises a plurality of central tendency vector subfields, different ones of the central tendency vector subfields corresponding to different non-central key point sets, and different ones of the non-central key point sets corresponding to different categories of key points;
the step of determining target non-central key points which belong to the same target object to be identified as the target central key points according to the central tendency vector field for the target central key points comprises the following steps:
for the target central key point, determining, from each non-central key point set, the non-central key point with the maximum matching degree value with respect to the target central key point, so as to obtain target non-central key points of a plurality of categories.
3. The method of claim 2, further comprising:
determining the matching degree of the non-central key point and the target central key point by the following formula:
$$M = \sum_{j=0}^{K} \mathrm{CTV}(x_j, y_j) \cdot \frac{(x_c - x,\ y_c - y)}{\left\|(x_c - x,\ y_c - y)\right\|}$$
wherein M is the value of the matching degree, CTV_x is the central tendency vector from any non-central key point (x, y) in the non-central key point set to the target central key point (x_c, y_c), K is a preset value, and (x_j, y_j) are the coordinates of the preset key points on the line connecting the target central key point and the non-central key point, the preset key points dividing the line into K sections.
4. The method according to claim 1, wherein after inputting the image to be recognized into the trained keypoint recognition model, the keypoint recognition model is used to:
performing key point identification on the image to be identified to obtain a central key point thermodynamic diagram and a non-central key point thermodynamic diagram;
the center keypoint is determined from the center keypoint thermodynamic diagram, and the non-center keypoints are determined from the non-center keypoint thermodynamic diagram.
5. The method according to any one of claims 1-4, wherein the training process of the keypoint recognition model comprises:
acquiring an image sample to be identified, wherein the image sample to be identified corresponds to a central key point label, a non-central key point label and a central trend vector field label, the image sample to be identified comprises a plurality of object samples to be identified, each object sample to be identified corresponds to a sample central key point marked with the central key point label and a plurality of sample non-central key points marked with the non-central key point label;
and training according to the image sample to be recognized to obtain a trained key point recognition model.
6. The method according to claim 5, wherein the sample center key point corresponding to the object sample to be identified is a mean value of a plurality of sample non-center key points corresponding to the object sample to be identified.
7. The method of claim 5, wherein the central tendency vector field label characterizes the degree of association between each of the sample non-central key points and each of the sample central key points, and when a sample non-central key point and a sample central key point are located on the same object sample to be identified, the degree of association is calculated by the following formula:
$$\mathrm{CTV}_{x}' = \frac{(x_c' - x',\ y_c' - y')}{\left\|(x_c' - x',\ y_c' - y')\right\|}$$
wherein CTV_x' is the value of the degree of association, (x_c', y_c') are the coordinates of the sample central key point, and (x', y') are the coordinates of the sample non-central key point;
and when the sample non-central key point and the sample central key point correspond to different object samples to be identified, setting the value of the degree of association to 0.
8. A gesture recognition apparatus, characterized in that the apparatus comprises:
the device comprises an acquisition module, a recognition module and a recognition module, wherein the acquisition module is used for acquiring an image to be recognized, and the image to be recognized comprises a plurality of objects to be recognized;
the input module is used for inputting the image to be recognized into the trained key point recognition model to obtain a central key point and non-central key points of each object to be recognized, and a central tendency vector field, where the central tendency vector field represents the degree of association between each non-central key point and each central key point;
a first determining module, configured to determine, according to the central tendency vector field, a target non-central key point that belongs to a same target object to be identified as the target central key point, for the target central key point, where the target central key point is any one of the central key points;
and the second determining module is used for determining the posture of the target object to be recognized according to the target non-central key point and the target central key point.
9. A non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1-7.
CN202210473734.6A 2022-04-29 2022-04-29 Gesture recognition method and device, storage medium and electronic equipment Pending CN114782994A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210473734.6A CN114782994A (en) 2022-04-29 2022-04-29 Gesture recognition method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210473734.6A CN114782994A (en) 2022-04-29 2022-04-29 Gesture recognition method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114782994A true CN114782994A (en) 2022-07-22

Family

ID=82435585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210473734.6A Pending CN114782994A (en) 2022-04-29 2022-04-29 Gesture recognition method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114782994A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116453222A (en) * 2023-04-19 2023-07-18 北京百度网讯科技有限公司 Target object posture determining method, training device and storage medium


Similar Documents

Publication Publication Date Title
CN109902659B (en) Method and apparatus for processing human body image
Ahmed et al. Vision based hand gesture recognition using dynamic time warping for Indian sign language
CN108205684B (en) Image disambiguation method, device, storage medium and electronic equipment
CN110163096B (en) Person identification method, person identification device, electronic equipment and computer readable medium
CN110363220B (en) Behavior class detection method and device, electronic equipment and computer readable medium
JP2021532434A (en) Face feature extraction model Training method, face feature extraction method, device, equipment and storage medium
CN108229375B (en) Method and device for detecting face image
CN108229494B (en) Network training method, processing method, device, storage medium and electronic equipment
CN112446302A (en) Human body posture detection method and system, electronic equipment and storage medium
JP2022542199A (en) KEYPOINT DETECTION METHOD, APPARATUS, ELECTRONICS AND STORAGE MEDIA
CN112633221A (en) Face direction detection method and related device
CN114902299A (en) Method, device, equipment and storage medium for detecting associated object in image
CN113597614A (en) Image processing method and device, electronic device and storage medium
CN113557546B (en) Method, device, equipment and storage medium for detecting associated objects in image
CN114782994A (en) Gesture recognition method and device, storage medium and electronic equipment
CN111881740A (en) Face recognition method, face recognition device, electronic equipment and medium
CN113392741A (en) Video clip extraction method and device, electronic equipment and storage medium
US20240104769A1 (en) Information processing apparatus, control method, and non-transitory storage medium
CN113632097A (en) Method, device, equipment and storage medium for predicting relevance between objects
CN111027434B (en) Training method and device of pedestrian recognition model and electronic equipment
US11527090B2 (en) Information processing apparatus, control method, and non-transitory storage medium
CN114743026A (en) Target object orientation detection method, device, equipment and computer readable medium
CN113963202A (en) Skeleton point action recognition method and device, electronic equipment and storage medium
CN111582404B (en) Content classification method, device and readable storage medium
CN114612979A (en) Living body detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination