CN107219925B

CN107219925B - Posture detection method and device and server

Info

Publication number: CN107219925B
Application number: CN201710392473.4A
Authority: CN
Inventors: 陈志超; 李轩; 杨贤立; 周剑
Original assignee: Chengdu Topplusvision Science & Technology Co ltd
Current assignee: Chengdu Topplusvision Science & Technology Co ltd
Priority date: 2017-05-27
Filing date: 2017-05-27
Publication date: 2021-02-26
Anticipated expiration: 2037-05-27
Also published as: CN107219925A

Abstract

The invention provides a posture detection method, a posture detection device and a server, and relates to the technical field of computer vision. The method comprises the following steps: detecting the acquired image data; segmenting the person image data from the image data; acquiring characteristic information of each joint point in the figure image data; establishing a confidence map corresponding to each type of joint points; sequentially acquiring the position information of each joint point to be positioned according to the joint points and the confidence maps corresponding to the joint points; constructing a feature map of the intimacy region of each type of limb; according to the intimacy region feature map and the position information, calculating intimacy between any two adjacent joint points which respectively belong to different categories; and generating a posture skeleton line of the person in the person image data according to the intimacy. The complexity of subsequent processing is reduced, the requirement on hardware equipment is further reduced, the processing speed is high, and the detection precision is high.

Description

Posture detection method and device and server

Technical Field

The invention relates to the technical field of computer vision, in particular to a posture detection method, a posture detection device and a server.

Background

Human posture detection is an important component of the field of computer vision. The human body posture detection has high social application value. For example, human body posture estimation is combined with fitness software to score and correct fitness actions of people, so that people can enjoy the treatment of fitness coaches at home; the skeleton line of the patient can be extracted according to the human posture estimation, and the auxiliary effect on the subsequent medical diagnosis and treatment is achieved. The gesture skeleton line can be used for recognizing human gestures. The result obtained by detecting the human body posture by utilizing the posture skeleton line is more accurate and is not easy to expose privacy.

However, even after these years of development, human posture detection remains a challenge in computer vision. Human body detection requires extremely high hardware conditions: in terms of GPUs, most employ multiple graphics cards,the model of each display card is not inferior to NVIDIA Tesla K80; in terms of CPU, most of them adopt

Core^TMi5, the general application of life under such conditions is difficult, and the popularization and the application are influenced.

Disclosure of Invention

In order to solve the above problems, the embodiments of the present invention adopt the following technical solutions:

the embodiment of the invention provides a gesture detection method, which comprises the following steps: detecting the acquired image data; when the image data is detected to have the figure image data, segmenting the figure image data from the image data; acquiring characteristic information of each joint point in the figure image data; establishing a confidence map corresponding to each type of joint points according to the characteristic information of the joint points; sequentially acquiring the position information of each joint point to be positioned according to the joint points and the confidence maps corresponding to the joint points; constructing a close region characteristic map of each type of limb according to limb information of each limb section detected from the human image data, wherein each limb section comprises two adjacent joint points belonging to different classes; according to the intimacy region feature map and the position information, calculating intimacy between any two adjacent joint points which respectively belong to different categories; and generating a posture skeleton line of the person in the image data of the person according to the intimacy so as to realize the detection of the posture of the person in the image data.

An embodiment of the present invention further provides a gesture detection apparatus, where the apparatus includes: the system comprises a detection module, a joint point detection module, a first establishment module, a positioning module, a second establishment module, a calculation module and a generation module. The detection module is used for detecting the acquired image data; and further for segmenting the person image data from the image data when the presence of the person image data in the image data is detected; the joint point detection module is used for acquiring the characteristic information of each joint point in the person image data; the first establishing module is used for establishing a confidence map corresponding to each type of joint point according to the characteristic information of the joint point; the positioning module is used for sequentially acquiring the position information of each joint point to be positioned according to the joint points and the confidence maps corresponding to the joint points; the second establishing module is used for establishing a close region characteristic map of each type of limb according to the limb information of each limb section detected from the person image data, wherein each limb section comprises two adjacent joint points belonging to different types; the calculation module is used for calculating the intimacy between any two adjacent joint points which respectively belong to different categories according to the intimacy region feature map and the position information; and the generating module is used for generating a posture skeleton line of the person in the image data of the person according to the intimacy so as to realize the detection of the posture of the person in the image data.

An embodiment of the present invention further provides a server, where the server includes: a memory; a processor; and a gesture detection device installed in the first memory and including one or more software functional modules executed by the first processor, the gesture detection device comprising: the system comprises a detection module, a joint point detection module, a first establishment module, a positioning module, a second establishment module, a calculation module and a generation module. The detection module is used for detecting the acquired image data; and further for segmenting the person image data from the image data when the presence of the person image data in the image data is detected; the joint point detection module is used for acquiring the characteristic information of each joint point in the person image data; the first establishing module is used for establishing a confidence map corresponding to each type of joint point according to the characteristic information of the joint point; the positioning module is used for sequentially acquiring the position information of each joint point to be positioned according to the joint points and the confidence maps corresponding to the joint points; the second establishing module is used for establishing a close region characteristic map of each type of limb according to the limb information of each limb section detected from the person image data, wherein each limb section comprises two adjacent joint points belonging to different types; the calculation module is used for calculating the intimacy between any two adjacent joint points which respectively belong to different categories according to the intimacy region feature map and the position information; and the generating module is used for generating a posture skeleton line of the person in the image data of the person according to the intimacy so as to realize the detection of the posture of the person in the image data.

Compared with the prior art, the gesture detection method, the gesture detection device and the server are provided by the invention. Wherein the method includes segmenting the personal image data from the image data; acquiring characteristic information of each joint point in the figure image data; establishing a confidence map corresponding to each type of joint points according to the characteristic information of the joint points; sequentially acquiring the position information of each joint point to be positioned according to the joint points and the confidence maps corresponding to the joint points; constructing a close region characteristic diagram of each type of limb according to the limb information of each limb section detected from the person image data; according to the intimacy region feature map and the position information, calculating intimacy between any two adjacent joint points which respectively belong to different categories; and generating a posture skeleton line of the person in the image data of the person according to the intimacy so as to realize the detection of the posture of the person in the image data. The complexity of subsequent calculation is reduced, the requirement on hardware equipment is also reduced, the speed is high, and the precision is high.

In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

Fig. 1 is a block diagram of a server according to a preferred embodiment of the present invention.

FIG. 2 is a flow chart of a gesture detection method provided by an embodiment of the invention.

Fig. 3 is a flowchart illustrating sub-steps of step S106 in fig. 2.

Fig. 4 is a flowchart illustrating sub-steps of step S108 in fig. 2.

Fig. 5 is an exemplary view of a limb area.

Fig. 6 shows a schematic diagram of a gesture detection apparatus provided in an embodiment of the present invention.

Icon: 100-a server; 111-a memory; 112-a processor; 113-a communication unit; 200-a gesture detection device; 201-an acquisition module; 202-a processing module; 203-a detection module; 204-a joint detection module; 205-a first setup module; 206-a positioning module; 207-a second setup module; 208-a calculation module; 209-generation module.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.

Fig. 1 is a block diagram of a server 100. The server 100 includes a gesture detection apparatus 200, a memory 111, a processor 112, and a communication unit 113.

The memory 111, the processor 112 and the communication unit 113 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The gesture detection apparatus 200 includes at least one software function module which may be stored in the memory 111 in the form of software or Firmware (Firmware) or solidified in an Operating System (OS) of the server 100. The processor 112 is used for executing executable modules stored in the memory 111, such as software functional modules and computer programs included in the gesture detection apparatus 200.

The Memory 111 may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Read-Only Memory (EPROM), an electrically Erasable Read-Only Memory (EEPROM), and the like. The memory 111 is used to store programs or data. The communication unit 113 is configured to establish a communication connection between the server 100 and another communication terminal via the network, and to transceive data via the network.

It should be understood that the configuration shown in fig. 1 is merely a schematic diagram of the configuration of the server 100, and that the server 100 may include more or less components than those shown in fig. 1, or have a different configuration than that shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.

First embodiment

Referring to fig. 2, fig. 2 is a flowchart of a gesture detection method according to a preferred embodiment of the invention. The gesture detection method comprises the following steps:

step S101, an RGB initial image is acquired.

In the present embodiment, the RGB initial image may receive the RGB initial image transmitted from the terminal communicatively connected to the server 100 through the communication unit 113. The terminal can be an image acquisition device (e.g., a camera, a mobile phone) and also can be an electronic device capable of storing images. In this embodiment, the terminal is a camera, and the server 100 may obtain one frame of RGB initial images from a video captured by the camera.

Step S102, carrying out normalization processing on the RGB initial image to generate image data of a specific type.

In this embodiment, the standard deviation of the pixels of the RGB initial image can be obtained by calculating the average value of all the pixels on the RGB initial image. And (3) carrying out an operation of subtracting the mean value and dividing the mean value by the standard deviation on the pixels on the RGB initial image, and normalizing the pixel values to the range of [ -1,1 ]. To form a particular type of image data. The specific type of image data refers to image data of a type that facilitates extraction of feature points, and the specific type of image data is used as model training set data. The specific type of image data may be LMDB-formatted image data, but is not limited thereto.

Step S103, the acquired image data is detected.

In this embodiment, the acquired image data is detected by using a convolutional neural network, and whether a person is present in the acquired image data is detected. When the personal image data appears in the image data, the flow advances to step S104; when the image data does not include the human image data, the process returns to step S101, and a frame of RGB initial image is obtained from the video again.

Step S104 is to divide the personal image data from the image data.

In the present embodiment, when there is a person in the image data, the image data includes person image data and background image data. The image data of the person is segmented from the image data through a convolution neural network, namely the image data of the person is separated from the background image data.

Step S105, feature information of each joint point in the person image data is acquired.

In the present embodiment, the joint points are predefined locations to be extracted from the image data of the person, and for example, the categories of the joint points may be defined as nose, left ear, right ear, left eye, right eye, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left crotch, right crotch, left knee, right knee, left ankle, and right ankle. And inputting the segmented person image data into a convolutional neural network to obtain the characteristic information of each joint point in the person image data. The feature information of the joint point may be extracted position information of the feature point belonging to the corresponding joint point. For example, the coordinate values of the feature points in the personal image data. It should be noted that the number of joint points belonging to the same class obtained from the person image data is at most the number of the appearing persons, depending on how many persons appear in the person image data. For example, if 5 persons appear in the personal image data, 5 noses, 5 left ears, and the like can be obtained at the maximum from the personal image data.

And S106, establishing a confidence map corresponding to each type of joint point according to the characteristic information of the joint point.

In this embodiment, the confidence map may be a joint point confidence map. Each type of joint point corresponds to a confidence map which is a Gaussian distribution map with each type of joint point as a center. The closer the part value to the joint point is, the larger. For example, the left shoulder corresponds to the left shoulder confidence map. As shown in fig. 3, step S106 includes the following sub-steps:

substep S1061, according to the feature point position information in the feature information of the joint point, using a formula:

and generating a Gaussian distribution map corresponding to each joint point. Wherein S is_iAnd the Gaussian distribution map corresponding to the joint points corresponding to the extracted characteristic information. p is a point within a circle with the preset distance as the radius, with the feature point corresponding to the joint point as the center. x is the number of_iAnd feature point position information corresponding to the joint point.

And a substep S1062 of establishing a confidence map corresponding to each type of the joint points according to the Gaussian distribution map corresponding to the joint points.

In this embodiment, the following formula is used according to the gaussian distribution maps corresponding to the joint points belonging to the same class:

and establishing a confidence map corresponding to each type of the joint points. Wherein S is^*(p) is a confidence map of a class of joint pairs. S_iA gaussian distribution map representing the ith joint point in the class of joint points, and N represents the total number of joint points belonging to the class detected from the image data of the person.

The method is represented by solving the maximum value of Gaussian distribution graphs of N similar joint points. For example, if five human noses are detected from the human image data, the confidence maps of the joint points such as the noses are obtained by sequentially obtaining the maximum values of the gaussian distribution maps corresponding to the five obtained human noses. Each type of joint point corresponds to a confidence map.

And S107, sequentially acquiring accurate position information of each joint point to be positioned according to the joint points and the confidence maps corresponding to the joint points.

In this embodiment, according to the confidence map corresponding to the joint point to be positioned, the position information of the joint point to be positioned is sequentially obtained by comparing the feature information of the joint points. The position information of the feature point is only the approximate position of the corresponding joint point. And the joint point which does not acquire accurate position information is the joint point to be positioned. Therefore, each joint point in the human image data needs to be located in turn, and accuracy is ensured. Specifically, the joint point to be positioned is positioned through the indirect relationship between the contrast joint points which belong to different classes with the joint point to be positioned, so as to acquire accurate position information.

It should be noted that, the joint point to be positioned is selected as the joint point which is adjacent to the joint point to be positioned and belongs to the same limb as the joint point to be positioned. Two ends of the limb are two joint points belonging to different classes. For example, when the limb is a left arm, the limb is formed by connecting a left elbow and a left wrist, and when the joint point to be positioned is the left elbow, the left wrist can be selected by comparing the joint points. According to the characteristic information of the joint point to be positioned, the confidence map corresponding to the type of the joint point to be positioned and the characteristic information of the comparison joint point, utilizing a formula:

v_k,j(P)＝S^*(p)||u_j-x_i||，

obtaining a position vector, v, of the joint to be positioned relative to the limb formed by the comparison joint_k,j(P) represents the position vector of the joint to be positioned relative to the limb k formed by the comparison joint j. S^*(p) a confidence map corresponding to the type of joint to which the joint to be positioned belongs, u_jFor comparing the feature point position information, x, in the feature information of the joint point j_iThe position information of the characteristic point in the characteristic information of the joint point i to be positioned. And then obtaining the maximum value of the position vector, namely the accurate position information of the joint point to be positioned.

Step S108, according to the limb information of each limb section detected from the human image data, constructing a close region characteristic map of each class of limb, wherein each limb section comprises two adjacent joint points belonging to different classes.

In this embodiment, the limb is formed by two adjacent joints belonging to different classes. The limb information includes position information of line segments of the limb. As shown in fig. 4, step S108 includes the following sub-steps:

and a substep S1081, obtaining a normal vector of each limb segment.

In this embodiment, the normal vector direction of the limb is found according to the limb line segment of each limb segment, so as to preset the width normal vector mode.

And a substep S1082 of dividing the limb area according to a preset width along the corresponding normal vector direction by taking the line segment of each limb section as a central line.

In this embodiment, the line segment of each limb segment is taken as a central line, and the limb regions corresponding to the normal vector in the positive direction and the negative direction along the central line are divided, that is, the central line is taken as a center, and the range in the modulus of twice the normal vector is taken, as shown in fig. 5, the normal vector is a, the vector in the normal vector in the negative direction is-a, and the shaded region is the limb region.

And a substep S1083 of setting a non-zero vector in said limb area.

In this embodiment, each point in the limb area is set to a non-zero vector. Wherein, the direction of the non-zero vector is the direction between two joint points included in the limb. Specifically, one joint point in the limb is predefined to correspond to the limb. The direction of the non-zero vector corresponding to the limb is the direction from the joint point corresponding to the limb towards the other joint point of the limb. For example, the left arm includes two joints, namely a left elbow and a left wrist, and it is predefined that the left elbow corresponds to the left arm, and the non-zero vector direction of the limb area of the left arm is set to be the direction from the left elbow to the left wrist. And the area outside the limb area is set as a zero vector. It should be noted that a limb is formed by connecting two adjacent joint points belonging to different classes, and how many joint points and how many limbs are. Thus, one joint point in a limb may be predefined to correspond to a limb.

And in the sub-step S1084, generating a close region characteristic diagram of each type of limb according to the non-zero vectors of the limb regions corresponding to the same type of limb.

In this embodiment, the close region feature maps of each type of limb are obtained by averaging the non-zero vectors corresponding to the limb regions of each limb in each type of limb in the human image data. Therefore, the relationship that a certain joint point is related to two limbs can be achieved, and the joint point detection precision is high. If the overlapped part of the two limbs exists, the calculation of the average value is equivalent to the calculation of the sum vector of the normal vectors of the two limbs, then the sum vector is divided by 2, and the vector information is added to obtain the maximum value, so that the relation between the point of the overlapped part and both the two limbs is obtained. Further, an affinity domain feature map for such a limb is obtained. Each joint point and a joint point structure adjacent to the joint point form a limb, in this embodiment, 19 types of joint points are defined, so that there are 19 types of limbs in total, and the limb can be regarded as a vector and is represented by [ x, y ]. Each limb is not only a straight line but also a section of area, and the values of points in the section of area are non-zero vectors, the vector values are unit vectors of the limb direction, and the point values outside the section of area are zero vectors. Then a bipartite feature graph is formed, and the bipartite feature graph is finally output by the network. Since there are 19 joint points, each predicting a limb individually, each represented by a two-dimensional vector, the output is a set of 19 × 2 channels of signatures.

Step S109, according to the intimacy region feature map and the position information, calculating intimacy between any two adjacent joint points which respectively belong to different categories.

In this embodiment, a non-zero vector corresponding to a limb is obtained according to the calculated feature map of the intimacy region of the class to which a segment of the limb composed of two joint points belongs. The position information includes position information of the two joint points being calculated. And calculating the intimacy between the two joint points by utilizing an interpolation integral algorithm according to the acquired non-zero vector and the position information of the corresponding limb. Specifically, the inner product is calculated at each insertion location between two joint points and the non-zero vector of the corresponding limb, and then integrated. In the specific operation, the integral operation is replaced by the point summation at equal intervals, and finally an integral value is obtained, wherein the integral value is the intimacy between two joint points to be calculated.

And step S110, generating a posture skeleton line of the person in the image data of the person according to the intimacy so as to realize detection of the posture of the person in the image data.

In this embodiment, when the intimacy degree between two joint points is greater than a preset intimacy degree threshold value, which indicates that the limb formed by the two joint points is predicted to be actually present, the two joint points are connected to generate a gesture skeleton line of the person in the person image data. If the intimacy between two of the joint points is less than a predetermined intimacy threshold, a false prediction, so-called false detection, is determined. This prevents the left elbow of the first person and the left wrist of the second person from being erroneously determined as a left arm when a plurality of persons are present in the person image data.

In this embodiment, step S110 is followed by a step of drawing the joint points and the gesture skeleton lines on the RGB initial image through development software, and displaying the joint points and the gesture skeleton lines.

Second embodiment

Referring to fig. 6, fig. 6 is a functional module schematic diagram of a gesture detection apparatus 200 according to an embodiment of the present invention. The gesture detection apparatus 200 includes: the system comprises an acquisition module 201, a processing module 202, a detection module 203, a joint detection module 204, a first establishment module 205, a positioning module 206, a second establishment module 207, a calculation module 208 and a generation module 209.

An obtaining module 201, configured to obtain an RGB initial image.

In this embodiment of the present invention, the step S101 may be executed by the obtaining module 201.

And the processing module 202 is configured to perform normalization processing on the RGB initial image to generate image data of a specific type.

In the embodiment of the present invention, the step S102 may be executed by the processing module 202.

A detection module 203, configured to detect acquired image data; and the image processing device is also used for segmenting the person image data from the image data when the person image data is detected to appear in the image data.

In the embodiment of the present invention, the steps S103 and S104 may be executed by the detection module 203.

And the joint point detection module 204 is configured to obtain feature information of each joint point in the person image data.

In the embodiment of the present invention, the step S105 may be performed by the joint point detecting module 204.

The first establishing module 205 is configured to establish a confidence map corresponding to each type of joint point according to the feature information of the joint point.

In this embodiment of the present invention, the step S106 may be performed by the first establishing module 205. Specifically, the manner in which the first establishing module 205 executes step S106 is as follows: according to the position information of the characteristic points in the characteristic information of the joint points, utilizing a formula:

and generating a Gaussian distribution map corresponding to each joint point. Wherein S is_iAnd the Gaussian distribution map corresponding to the joint points corresponding to the extracted characteristic information. p is a point within a circle with the preset distance as the radius, with the feature point corresponding to the joint point as the center. x is the number of_iAnd feature point position information corresponding to the joint point. And establishing a confidence map corresponding to each type of joint point according to the Gaussian distribution map corresponding to the joint point. Respectively according to the Gaussian distribution diagram corresponding to the joint points belonging to the same class, using a formula:

And the positioning module 206 is configured to sequentially acquire accurate position information of each joint point to be positioned according to the joint point and the confidence map corresponding to the joint point.

In the embodiment of the present invention, the step S107 may be performed by the positioning module 206. The positioning module 206 performs step S107 by sequentially obtaining the position information of the joint point to be positioned according to the confidence map corresponding to the joint point to be positioned and the feature information of the comparison joint point, where the joint point to be positioned and the comparison joint point are the same limb and belong to different types of joint points.

The second establishing module 207 is configured to construct a close-proximity region feature map of each type of limb according to the limb information of each limb segment detected from the human image data, where each limb segment includes two adjacent joint points belonging to different classes.

In this embodiment of the present invention, the step S108 may be performed by the second establishing module 207. Specifically, the manner in which the second establishing module 207 executes step S108 is as follows: and acquiring a normal vector of each limb section. And finding out the normal vector direction of the limb according to the limb line segment of each limb section so as to preset the width normal vector mode. And dividing the limb area into limb areas according to a preset width along the corresponding normal vector direction by taking the line segment of each limb section as a central line. The line segment of each limb section is taken as a central line, and the limb areas corresponding to the positive direction and the negative direction of the normal vector are divided along the central line, namely the range in a mode of taking the central line as the center and doubling the normal vector is taken as the center. And setting a non-zero vector in the limb area. Each point in the limb area is set to a non-zero vector. Wherein, the direction of the non-zero vector is the direction between two joint points included in the limb. Specifically, one joint point in the limb is predefined to correspond to the limb. The direction of the non-zero vector corresponding to the limb is the direction from the joint point corresponding to the limb towards the other joint point of the limb. For example, the left arm includes two joints, namely a left elbow and a left wrist, and it is predefined that the left elbow corresponds to the left arm, and the non-zero vector direction of the limb area of the left arm is set to be the direction from the left elbow to the left wrist. And the area outside the limb area is set as a zero vector. It should be noted that a limb is formed by connecting two adjacent joint points belonging to different classes, and how many joint points and how many limbs are. Thus, one joint point in a limb may be predefined to correspond to a limb. And generating a close region characteristic diagram of each type of limb according to the non-zero vector of the limb region corresponding to the same type of limb. And respectively obtaining an average value according to the non-zero vectors corresponding to the limb areas of each limb in each class of limbs in the character image data to obtain the close area characteristic diagram of the class of limbs. If the overlapped part of the two limbs exists, the calculation of the average value is equivalent to the calculation of the sum vector of the normal vectors of the two limbs, then the sum vector is divided by 2, and the vector information is added to obtain the maximum value, so that the relation between the point of the overlapped part and both the two limbs is obtained. Further, an affinity domain feature map for such a limb is obtained. Each joint point and a joint point structure adjacent to the joint point form a limb, in this embodiment, 19 types of joint points are defined, so that there are 19 types of limbs in total, and the limb can be regarded as a vector and is represented by [ x, y ]. Each limb is not only a straight line but also a section of area, and the values of points in the section of area are non-zero vectors, the vector values are unit vectors of the limb direction, and the point values outside the section of area are zero vectors. Then a bipartite feature graph is formed, and the bipartite feature graph is finally output by the network. Since there are 19 joint points, each predicting a limb individually, each represented by a two-dimensional vector, the output is a set of 19 × 2 channels of signatures.

A calculating module 208, configured to calculate, according to the close region feature map and the location information, the closeness between any two adjacent joint points that belong to different categories respectively.

In the embodiment of the present invention, the step S109 may be executed by the calculating module 208. The calculation module 208 executes step S109 by obtaining a non-zero vector of the limb corresponding to the joint point according to the intimate area feature map of the class to which the limb corresponding to the joint point belongs; and calculating the intimacy between the two joint points by utilizing an interpolation integral algorithm according to the non-zero vector of the limb corresponding to the joint point and the position information respectively corresponding to the two joint points.

And the generating module 209 is configured to generate a gesture skeleton line of the person in the image data of the person according to the intimacy degree, so as to implement detection of the gesture of the person in the image data.

In this embodiment of the present invention, the step S110 may be executed by the generating module 209. Specifically, the generating module 209 performs step S110 by: and when the intimacy between the two joint points is greater than a preset intimacy threshold value, connecting the two joint points to generate a gesture skeleton line of the person in the person image data.

In summary, the present invention provides a gesture detection method, a gesture detection device and a server. The method comprises the steps of obtaining characteristic information of each joint point in the person image data; establishing a confidence map corresponding to each type of joint points according to the characteristic information of the joint points; acquiring accurate position information of each joint point; constructing a close region characteristic diagram of each type of limb according to the limb information of each limb section detected from the person image data; according to the intimacy region feature map and the position information, calculating intimacy between any two adjacent joint points which respectively belong to different categories; and generating a posture skeleton line of the person in the image data of the person according to the intimacy so as to realize the detection of the posture of the person in the image data. The complexity of subsequent processing is reduced, the requirement on hardware is also reduced, the processing speed is high, and the detection precision is high.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A gesture detection method, the method comprising:

detecting the acquired image data;

when the image data is detected to have the figure image data, segmenting the figure image data from the image data;

acquiring characteristic information of each joint point in the figure image data;

establishing a confidence map corresponding to each type of joint points according to the characteristic information of the joint points;

sequentially acquiring the position information of each joint point to be positioned according to the joint points and the confidence maps corresponding to the joint points;

constructing a close region characteristic map of each type of limb according to limb information of each limb section detected from the human image data, wherein each limb section comprises two adjacent joint points belonging to different classes;

according to the intimacy region feature map and the position information, calculating intimacy between any two adjacent joint points which respectively belong to different categories;

generating a posture skeleton line of a person in the person image data according to the intimacy so as to realize detection of the person posture in the image data;

the step of acquiring the position information of each joint point comprises:

according to a confidence map corresponding to a joint point to be positioned, combining with characteristic information of a comparison joint point, and sequentially acquiring position information of the joint point to be positioned, wherein the joint point to be positioned and the comparison joint point are the same limb and belong to different types of joint points;

the step of establishing a confidence map corresponding to each type of joint points comprises the following steps:

generating a Gaussian distribution map corresponding to each joint point according to the position information of the feature points in the feature information of the joint points;

establishing a confidence map corresponding to each type of joint point according to the Gaussian distribution map corresponding to the joint point;

the step of sequentially acquiring the position information of the joint point to be positioned according to the confidence map corresponding to the joint point to be positioned and by comparing the characteristic information of the joint point comprises the following steps:

and acquiring a position vector of the joint point to be positioned relative to the limb formed by the comparison joint point by using the following calculation formula:

v_k,j(P)＝S^*(p)||u_j-x_i||，

wherein v is_k,j(P) represents the position vector of the joint to be positioned relative to the limb constituted by the contrasting joints; s^*(p) is a confidence map corresponding to the joint point category to which the joint point to be positioned belongs; u. of_jComparing the position information of the characteristic points in the characteristic information of the joint points; x is the number of_iThe position information of the characteristic point in the characteristic information of the joint point to be positioned;

determining the maximum value of the position vector as the position information of the joint point to be positioned;

the step of generating a gesture skeleton line of the person in the person image data according to the intimacy degree comprises the following steps:

and when the intimacy between the two joint points is greater than a preset intimacy threshold value, connecting the two joint points to generate a gesture skeleton line of the person in the person image data.

2. The gesture detection method of claim 1, wherein the step of constructing a close zone feature map for each class of limb comprises:

acquiring a normal vector of each limb section;

dividing the limb area into limb areas according to a preset width along the corresponding normal vector direction by taking the line segment of each limb section as a central line;

setting a non-zero vector in the limb area;

and generating a close region characteristic diagram of each type of limb according to the non-zero vector of the limb region corresponding to the same type of limb.

3. The gesture detection method according to claim 2, wherein the step of calculating the intimacy between any two adjacent joint points respectively belonging to different classes comprises:

acquiring a non-zero vector of the limb corresponding to the joint point according to the close region characteristic diagram of the class of the limb corresponding to the joint point;

and calculating the intimacy between the two joint points by utilizing an interpolation integral algorithm according to the non-zero vector of the limb corresponding to the joint point and the position information respectively corresponding to the two joint points.

4. The gesture detection method of claim 1, wherein the method further comprises:

acquiring an RGB initial image;

and carrying out normalization processing on the RGB initial image to generate image data of a specific type.

5. A gesture detection apparatus, characterized in that the apparatus comprises:

the detection module is used for detecting the acquired image data; and further for segmenting the person image data from the image data when the presence of the person image data in the image data is detected;

the joint point detection module is used for acquiring the characteristic information of each joint point in the person image data;

the first establishing module is used for establishing a confidence map corresponding to each type of joint point according to the characteristic information of the joint point;

the positioning module is used for sequentially acquiring the position information of each joint point to be positioned according to the joint points and the confidence maps corresponding to the joint points;

the second establishing module is used for establishing a close region characteristic map of each type of limb according to the limb information of each limb section detected from the person image data, wherein each limb section comprises two adjacent joint points belonging to different types;

the calculation module is used for calculating the intimacy between any two adjacent joint points which respectively belong to different categories according to the intimacy region feature map and the position information;

the generating module is used for generating a posture skeleton line of a person in the person image data according to the intimacy so as to realize detection of the person posture in the image data;

the positioning module is specifically configured to: according to a confidence map corresponding to a joint point to be positioned, combining with characteristic information of a comparison joint point, and sequentially acquiring position information of the joint point to be positioned, wherein the joint point to be positioned and the comparison joint point are the same limb and belong to different types of joint points;

the first establishing module is specifically configured to: generating a Gaussian distribution map corresponding to each joint point according to the position information of the feature points in the feature information of the joint points;

the positioning module is specifically configured to obtain a position vector of a joint point to be positioned relative to a limb formed by the joint point to be positioned and the comparison joint point by using the following calculation formula:

v_k,j(P)＝S^*(p)||u_j-x_i||，

6. The gesture detection apparatus of claim 5, wherein the apparatus further comprises:

the acquisition module is used for acquiring an RGB initial image;

and the processing module is used for carrying out normalization processing on the RGB initial image to generate image data of a specific type.

7. A server, characterized in that the server comprises:

a memory;

a processor; and

a gesture detection apparatus installed in the memory and including one or more software functional modules executed by the processor, the gesture detection apparatus comprising:

v_k,j(P)＝S^*(p)||u_j-x_i||，