CN114708647A - Multi-person pose estimation detection and segmentation optimization method and device based on instance segmentation

Info

Publication number: CN114708647A
Application number: CN202210186525.3A
Authority: CN
Inventors: 薛素金; 杨焜
Original assignee: Xiamen Nongxin Digital Technology Co ltd
Current assignee: Xiamen Nongxin Digital Technology Co ltd
Priority date: 2022-02-28
Filing date: 2022-02-28
Publication date: 2022-07-05

Abstract

The invention discloses a multi-person pose estimation detection and segmentation optimization method and device based on instance segmentation. The method comprises the following steps. Step 1: acquiring a frame of image at regular intervals, and preprocessing the image using instance segmentation. Step 2: marking different human individuals with masks of different colors based on instance segmentation, marking the key points of the human poses, and estimating and detecting the human poses. The different human individuals then continue to be marked with masks of the corresponding colors, and the key points of the human poses continue to be marked, until pose estimation and detection can be carried out. Finally, difference judgment is performed on the pose estimation and detection results to obtain the final result. The human poses are thereby detected dynamically, and detection precision is improved.

Description

Multi-person pose estimation detection and segmentation optimization method and device based on instance segmentation

Technical Field

The invention belongs to the technical field of multi-person pose estimation and detection, and particularly relates to a multi-person pose estimation detection and segmentation optimization method and device based on instance segmentation.

Background

Object detection, or localization, moves the analysis of digital images from coarse to fine: it provides not only the class of each object in the image but also its location, given as a bounding box or a center point. Semantic segmentation gives a finer inference by predicting a label for every pixel of the input image, each pixel being labeled with the class of the object it belongs to. Going a step further, instance segmentation assigns different labels to individual instances of objects belonging to the same class. Instance segmentation can therefore be defined as a technique that solves the object detection problem and the semantic segmentation problem at the same time. It has become one of the more important, complex and challenging areas of machine vision research: to predict object class labels and pixel-level instance masks, it localizes instances of different object classes that appear in various images. Instance segmentation is mainly intended to support robotics, autonomous driving, surveillance and similar applications.

Human pose estimation is an important task in computer vision and an essential step for a computer to understand human actions and behaviors. In multi-person pose estimation, the human bodies are usually detected first and single-person pose estimation is then carried out on each of them. This approach is easily defeated by occlusion, which lowers detection precision and produces large deviations in the estimated poses. A multi-person pose estimation detection and segmentation optimization method and device based on instance segmentation are therefore proposed.

Disclosure of Invention

The technical problem to be solved by the invention is to overcome the existing defects and to provide a multi-person pose estimation detection and segmentation optimization method and device based on instance segmentation, so as to solve the problems raised in the background art: existing approaches are easily defeated by occlusion, detection precision is low, and the estimated human poses deviate considerably.

To achieve this purpose, the invention provides the following technical solution: a multi-person pose estimation detection and segmentation optimization method based on instance segmentation, comprising the following steps:

Step 1: acquiring a frame of image at regular intervals, and preprocessing the image using instance segmentation;

Step 2: marking different human individuals with masks of different colors based on instance segmentation, marking the key points of the human poses, and estimating and detecting the human poses;

Step 3: continuing to mark the different human individuals with masks of the corresponding colors based on instance segmentation, and continuing to mark the key points of the human poses until pose estimation and detection can be carried out;

Step 4: performing difference judgment on the pose estimation and detection results to obtain the final human pose estimation and detection result.

Preferably, step 1 comprises the following steps:

Step 1.1: performing framing processing on the acquired image, and storing the image data in a memory;

Step 1.2: identifying the corresponding feature points on the framed image, and retaining the feature points;

Step 1.3: inputting the framed image into a CNN (convolutional neural network) and performing feature extraction;

Step 1.4: fine-tuning the extracted features;

Step 1.5: generating proposal windows using the FPN, with N proposal windows generated for each picture, and mapping the proposal windows onto the last convolutional feature map of the CNN;

Step 1.6: passing each RoI through a RoI Align layer to generate a fixed-size feature map, with RoIAlign performing pixel-level correction on each RoI; then, using the designed FCN framework, predicting the instance to which each RoI belongs and classifying each RoI, finally obtaining the image instance segmentation result.
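The fixed-size pooling of step 1.6 can be illustrated with a minimal RoIAlign sketch. It is not the implementation of the invention: the feature map is a single-channel nested list, RoI coordinates are assumed to be given directly in feature-map units, and the output size and number of samples per bin are illustrative choices. The function names `bilinear` and `roi_align` are hypothetical.

```python
import math

def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map (list of rows) at a fractional (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp the sampling point to the valid index range of the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, roi, out_size=2, samples=2):
    """Pool an RoI (y_start, x_start, y_end, x_end) into an out_size x out_size grid.

    Each output bin averages samples x samples bilinear samples, so the RoI
    boundaries are never rounded to integer pixel positions.
    """
    ys, xs, ye, xe = roi
    bin_h = (ye - ys) / out_size
    bin_w = (xe - xs) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for a in range(samples):
                for b in range(samples):
                    y = ys + (i + (a + 0.5) / samples) * bin_h
                    x = xs + (j + (b + 0.5) / samples) * bin_w
                    total += bilinear(feature, y, x)
            row.append(total / (samples * samples))
        out.append(row)
    return out

# Toy feature map whose value equals the column index.
feature_map = [[0.0, 1.0, 2.0, 3.0] for _ in range(4)]
pooled = roi_align(feature_map, (0.0, 0.0, 3.0, 3.0))
```

Because the toy map is linear in the column index, each pooled value equals the mean sampled column position of its bin, which makes the effect of sub-pixel sampling easy to check by hand.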

Preferably, step 2 comprises the following steps:

Step 2.1: when human body features are detected in the image instance segmentation result, one-hot encoding the position of each key point on the human body, and generating a mask corresponding to each key point of the human body;

Step 2.2: performing joint point splicing according to the generated masks;

Step 2.3: because the number of people in the picture is uncertain and is accompanied by problems such as occlusion and deformation, computing the similarity of joint pairs alone can only guarantee a local optimum. The point set of a given joint across different human bodies therefore needs to be uniquely matched with the other point sets. For example, given one point set representing elbows and another representing wrists, the points in the two sets must be matched uniquely. Since the correlation PAF (part affinity field) between joint points is known, the key points are taken as the vertices of a graph and the correlation PAF between key points as the edge weights of the graph, so that the multi-person detection problem is converted into a bipartite graph matching problem, and the Hungarian algorithm is used to obtain the optimal matching of the connected key points.

Preferably, step 3 comprises the following steps:

Step 3.1: inputting the feature points retained in step 1.2 into the CNN for feature comparison;

Step 3.2: extracting the feature points with higher similarity in subsequent images;

Step 3.3: matching masks of the corresponding colors based on the extracted feature points, and generating new masks for newly identified feature points;

Step 3.4: splicing and matching the joints again as in step 2;

Step 3.5: performing pose estimation on the human bodies with newly generated masks.
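Steps 3.1 to 3.3 can be pictured with the following sketch. The text does not specify the similarity measure or how colors are reassigned, so everything here is an assumption made for illustration: cosine similarity between feature vectors, a fixed threshold of 0.8, greedy one-to-one matching, and the hypothetical names `cosine`, `match_mask_colors` and the `new_k` color labels.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def match_mask_colors(retained, current, threshold=0.8):
    """Assign a mask color to each feature vector of the current frame.

    retained: dict mapping an existing mask color to its retained feature vector.
    current:  list of feature vectors extracted from the subsequent image.
    A current feature reuses the color of its most similar retained feature if
    the similarity reaches the threshold; otherwise it receives a new color.
    """
    colors = []
    used = set()
    next_new = 0
    for feat in current:
        best_color, best_sim = None, threshold
        for color, ref in retained.items():
            if color in used:
                continue  # each existing color is reused at most once
            sim = cosine(feat, ref)
            if sim >= best_sim:
                best_color, best_sim = color, sim
        if best_color is None:
            best_color = f"new_{next_new}"  # newly identified individual
            next_new += 1
        used.add(best_color)
        colors.append(best_color)
    return colors

retained_features = {"red": [1.0, 0.0], "blue": [0.0, 1.0]}
current_features = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
assigned = match_mask_colors(retained_features, current_features)
```

In this toy run the first two individuals keep their existing colors and the third, which resembles neither retained feature, is given a new mask.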

Preferably, step 4 comprises the following steps:

Step 4.1: obtaining a plurality of first-estimated human pose results $\{x_1, x_2, x_3, \ldots, x_n\}$ and a plurality of second-estimated human pose results $\{x'_1, x'_2, x'_3, \ldots, x'_n\}$;

Step 4.2: comparing the two sets of estimation results one by one, and estimating the pose according to how the results change;

Step 4.3: by analogy, comparing the plurality of m-th estimated human pose results with the (m-1)-th estimated human pose results.
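The comparison of successive estimates in step 4 might be sketched as follows. The text does not define the difference criterion, so this example assumes one: each pose is a list of (x, y) key points, the difference is the per-key-point Euclidean displacement, and a round is considered stable when the largest displacement stays within a hypothetical tolerance `tol`.

```python
import math

def pose_difference(prev_pose, curr_pose):
    """Per-key-point Euclidean displacement between two estimates of the same pose."""
    return [math.dist(p, c) for p, c in zip(prev_pose, curr_pose)]

def judge_sequence(estimates, tol=5.0):
    """Compare the m-th estimate with the (m-1)-th for every m.

    estimates: list of poses, one per estimation round; each pose is a list of
    (x, y) key points. Returns one boolean per consecutive pair, True when the
    largest key-point displacement is within tol (an assumed criterion).
    """
    results = []
    for prev, curr in zip(estimates, estimates[1:]):
        results.append(max(pose_difference(prev, curr)) <= tol)
    return results

rounds = [
    [(0.0, 0.0), (10.0, 10.0)],   # first estimate
    [(3.0, 4.0), (10.0, 10.0)],   # second estimate: first key point moved by 5
    [(3.0, 4.0), (20.0, 10.0)],   # third estimate: second key point moved by 10
]
stable = judge_sequence(rounds)
```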

Preferably, step 1.4 comprises the following steps:

Step 1.4.1: for the N vertices $\{x_i \mid i = 1, \ldots, N\}$, a feature vector is first constructed for each vertex. The feature $f_i$ of vertex $x_i$ is the concatenation of the corresponding network feature and the vertex coordinates, $f_i = [F(x_i);\, x'_i]$, where $F$ is the feature map output by the backbone network and $F(x_i)$ is its bilinearly interpolated value at vertex $x_i$. The additional $x'_i$ describes the positional relationship between vertices and is translation invariant: it is the relative coordinate obtained by subtracting the minimum x and y over all vertices of the contour from each vertex coordinate;

Step 1.4.2: after the vertex features are obtained, the contour features need to be learned further. The vertex features can be regarded as a 1-D discrete signal $f: \mathbb{Z} \to \mathbb{R}^D$, and the vertices could then be processed one by one using standard convolution, but this destroys the topology of the contour;

Step 1.4.3: the vertex features are therefore defined as the periodic signal of formula 1, and feature learning is then performed using the circular convolution of formula 2, where $k: [-r, r] \to \mathbb{R}^D$ is a learnable convolution kernel and $*$ is the standard convolution operation. The formulas are as follows:
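Formulas 1 and 2 are not reproduced in this text. As a hedged illustration only, the sketch below assumes the usual definitions: the periodic extension $(f_N)_i = f_{i \bmod N}$ and the circular convolution $(f_N * k)_i = \sum_{j=-r}^{r} (f_N)_{i+j}\, k_j$. For brevity the kernel holds one scalar weight per offset rather than learned weight matrices, and the vertex features are just the relative coordinates of step 1.4.1; `relative_coords` and `circular_conv` are hypothetical names.

```python
def relative_coords(vertices):
    """Translation-invariant coordinates: subtract the contour's minimum x and y."""
    min_x = min(x for x, _ in vertices)
    min_y = min(y for _, y in vertices)
    return [(x - min_x, y - min_y) for x, y in vertices]

def circular_conv(features, kernel):
    """Circular convolution over a closed contour.

    features: list of N feature vectors, one per vertex, in contour order.
    kernel:   list of 2r+1 scalar weights for offsets -r..r (a simplification).
    Indices wrap around modulo N, so the first and last vertices are neighbors
    and the closed topology of the contour is preserved.
    """
    n = len(features)
    r = len(kernel) // 2
    d = len(features[0])
    out = []
    for i in range(n):
        acc = [0.0] * d
        for j in range(-r, r + 1):
            src = features[(i + j) % n]  # periodic indexing
            w = kernel[j + r]
            for c in range(d):
                acc[c] += w * src[c]
        out.append(acc)
    return out

contour = [(5, 7), (9, 7), (9, 12), (5, 12)]
shifted = [(x + 100, y + 200) for x, y in contour]
rel = relative_coords(contour)
feats = [[float(x), float(y)] for x, y in rel]
smoothed = circular_conv(feats, [1.0, 1.0, 1.0])
```

The output at vertex 0 combines vertices 3, 0 and 1, which a standard (non-circular) convolution with zero padding would not do.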

Preferably, step 2.2 comprises the following steps:

Step 2.2.1: for any two joint positions $d_{j1}$ and $d_{j2}$, the correlation of the bone point pair, that is, the confidence of the bone point pair, is characterized by calculating the line integral of the PAFs, and the formula is as follows:

Step 2.2.2: for fast computation of the integral, the similarity between the two joint points is generally approximated by uniform sampling, and the formula is as follows: $p(u) = (1-u)\,d_{j1} + u\,d_{j2}$.
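The uniform-sampling approximation of step 2.2.2 can be sketched as below. The integral of step 2.2.1 is not reproduced in this text; the sketch assumes the common form in which the PAF vector at each sampled point $p(u)$ is projected onto the unit vector from $d_{j1}$ to $d_{j2}$ and the projections are averaged. The PAF is modeled as a callable returning a 2-D vector, which is an illustrative stand-in for a two-channel map, and `paf_score` is a hypothetical name.

```python
import math

def paf_score(paf, d1, d2, num_samples=10):
    """Approximate the PAF line integral between joints d1 and d2.

    paf: callable (x, y) -> (vx, vy), a stand-in for a two-channel PAF map.
    Points p(u) = (1 - u) * d1 + u * d2 are sampled uniformly for u in [0, 1],
    and the PAF vector at each point is projected onto the limb direction.
    """
    dx, dy = d2[0] - d1[0], d2[1] - d1[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return 0.0  # coincident joints carry no direction
    ux, uy = dx / norm, dy / norm
    total = 0.0
    for s in range(num_samples):
        u = s / (num_samples - 1) if num_samples > 1 else 0.5
        px = (1 - u) * d1[0] + u * d2[0]
        py = (1 - u) * d1[1] + u * d2[1]
        vx, vy = paf(px, py)
        total += vx * ux + vy * uy
    return total / num_samples

def constant_field(x, y):
    """Toy PAF pointing along +x everywhere."""
    return (1.0, 0.0)

aligned = paf_score(constant_field, (0.0, 0.0), (10.0, 0.0))
perpendicular = paf_score(constant_field, (0.0, 0.0), (0.0, 10.0))
opposed = paf_score(constant_field, (0.0, 0.0), (-10.0, 0.0))
```

A candidate limb aligned with the field scores high, a perpendicular one scores near zero, and one pointing against the field scores negative, which is what lets the score serve as an edge weight in the matching of step 2.3.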

Preferably, the Hungarian algorithm in step 2.3 mainly comprises the following steps:

Step 2.3.1: performing a row transformation and then a column transformation on the cost matrix. Row transformation: subtracting the minimum element of each row from every element of that row. Column transformation: subtracting the minimum element of each column from every element of that column; columns that already contain a 0 do not need the column transformation;

Step 2.3.2: searching the row- and column-transformed cost matrix for n 0 elements located in different rows and different columns. If they are found, their positions correspond to the optimal assignment; if not, the next step is carried out. In this step, the marking method is generally used to find the n 0 elements in different rows and columns: each row of the new cost matrix is checked in turn to find a row with only one unmarked 0 element, that 0 element is marked, and all other 0 elements in the same column are crossed out; each column of the new cost matrix is then checked in turn to find a column with only one unmarked 0 element, that 0 element is marked, and all other 0 elements in the same row are crossed out;

Step 2.3.3: adjusting the new cost matrix. A horizontal or vertical line is drawn through each marked 0 element so that the horizontal and vertical lines cover all 0 elements; the minimum element among the elements not covered by any line is found; this minimum is subtracted from every element of each row without a horizontal line and added to every element of each column with a vertical line; the search for n 0 elements in different rows and columns of the cost matrix is then repeated to find the optimal assignment.
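The following sketch shows the row and column reduction of step 2.3.1 and then solves the same assignment problem. Note the substitution: instead of the marking and line-covering procedure of steps 2.3.2 and 2.3.3, `hungarian` uses the equivalent potential-based (Kuhn-Munkres) formulation, which is easier to code compactly. The elbow-to-wrist scores are invented for illustration, and turning scores into costs by subtracting from 100 is an assumption.

```python
from itertools import permutations

def reduce_rows_then_cols(cost):
    """Step 2.3.1: subtract each row's minimum, then each column's minimum."""
    rows = [[c - min(row) for c in row] for row in cost]
    col_mins = [min(col) for col in zip(*rows)]
    return [[c - m for c, m in zip(row, col_mins)] for row in rows]

def hungarian(cost):
    """Minimum-cost assignment for a square matrix; returns the column of each row."""
    n = len(cost)
    inf = float("inf")
    u = [0] * (n + 1)     # row potentials (1-based)
    v = [0] * (n + 1)     # column potentials (1-based)
    p = [0] * (n + 1)     # p[j]: row currently matched to column j (0 = none)
    way = [0] * (n + 1)   # predecessor columns on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:    # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment

def brute_force_min(cost):
    """Reference answer by enumerating all permutations (small n only)."""
    n = len(cost)
    return min(sum(cost[i][perm[i]] for i in range(n)) for perm in permutations(range(n)))

# Hypothetical elbow-to-wrist PAF scores in percent (rows: elbows, columns: wrists).
scores = [[20, 90, 10], [80, 30, 40], [50, 20, 70]]
cost = [[100 - s for s in row] for row in scores]
reduced = reduce_rows_then_cols(cost)
assignment = hungarian(cost)
total_cost = sum(cost[i][j] for i, j in enumerate(assignment))
```

In this toy case the reduced matrix already contains three 0 elements in different rows and columns, at positions (0, 1), (1, 0) and (2, 2), so the search of step 2.3.2 would succeed immediately and give the same assignment.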

A multi-person pose estimation detection and segmentation optimization device based on instance segmentation comprises a device shell. Heat dissipation holes are provided on the outside of the device shell, an image acquisition module is provided on one side of the device shell, and an image processing module, an image storage module, a data storage module and a data processing module are provided inside the device shell, wherein:

the image acquisition module is used for acquiring a human body image;

the image processing module is used for carrying out processing such as framing and denoising on the acquired image;

the image storage module is used for storing the processed image data;

the data storage module is used for storing a required operation program;

the data processing module is used for operating a required operation program.

Preferably, the image processing module, the image storage module and the data storage module are all electrically connected with the data processing module, and the image acquisition module is electrically connected with the image processing module.

Compared with the prior art, the invention provides a multi-person pose estimation detection and segmentation optimization method and device based on instance segmentation, with the following beneficial effects:

1. Different human individuals are marked with masks of different colors based on instance segmentation, the key points of the human poses are marked, and the human poses are estimated and detected. The different individuals continue to be marked with masks of the corresponding colors, and the key points continue to be marked, until pose estimation and detection can be carried out. Finally, difference judgment is performed on the pose estimation and detection results to obtain the final result, so that the human poses are detected dynamically and detection precision is improved;

2. The feature points are retained and input into the CNN for comparison, the feature points with higher similarity in subsequent images are extracted, masks of the corresponding colors are matched based on the extracted feature points, and new masks are generated for newly identified feature points, which reduces the segmentation burden and the data analysis load.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention without limiting the invention. In the drawings:

FIG. 1 is a block diagram of the multi-person pose estimation detection and segmentation optimization method and device based on instance segmentation according to the present invention;

FIG. 2 is a block diagram of the multi-person pose estimation detection and segmentation optimization method and device based on instance segmentation according to the present invention;

FIG. 3 is a block diagram of the multi-person pose estimation detection and segmentation optimization method and device according to an embodiment of the present invention;

In the figures: 1, image acquisition module; 2, device shell; 3, heat dissipation hole; 4, image processing module; 5, image storage module; 6, data storage module; 7, data processing module.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to FIGS. 1-3, the present invention provides a technical solution: a multi-person pose estimation detection and segmentation optimization method based on instance segmentation, comprising the following steps:

Step 2: marking different human individuals with masks of different colors based on instance segmentation, marking the key points of the human poses, and estimating and detecting the human poses;

Step 4: performing difference judgment on the pose estimation and detection results to obtain the final human pose estimation and detection result.

In the present invention, preferably, step 1 comprises the steps of:

step 1.4: fine-tuning the extracted features;

In the present invention, preferably, step 2 comprises the steps of:

step 2.2: performing joint point splicing according to the generated mask;

In the present invention, preferably, step 3 comprises the steps of:

step 3.4: splicing and matching the joints again according to the step 2;

In the present invention, preferably, step 4 includes the steps of:

Step 4.3: by analogy, comparing the plurality of m-th estimated human pose results with the (m-1)-th estimated human pose results.

In the present invention, preferably, step 1.4 comprises the following steps:

Step 1.4.1: for the N vertices $\{x_i \mid i = 1, \ldots, N\}$, a feature vector is first constructed for each vertex. The feature $f_i$ of vertex $x_i$ is the concatenation of the corresponding network feature and the vertex coordinates, $f_i = [F(x_i);\, x'_i]$, where $F$ is the feature map output by the backbone network and $F(x_i)$ is its bilinearly interpolated value at vertex $x_i$. The additional $x'_i$ describes the positional relationship between vertices and is translation invariant: it is the relative coordinate obtained by subtracting the minimum x and y over all vertices of the contour from each vertex coordinate;

In the present invention, preferably, step 2.2 comprises the following steps:

Step 2.2.1: for any two joint positions $d_{j1}$ and $d_{j2}$, the correlation of the bone point pair, that is, the confidence of the bone point pair, is characterized by calculating the line integral of the PAFs, and the formula is as follows:

In the present invention, preferably, the Hungarian algorithm in step 2.3 mainly comprises the following steps:

Step 2.3.1: performing a row transformation and then a column transformation on the cost matrix. Row transformation: subtracting the minimum element of each row from every element of that row. Column transformation: subtracting the minimum element of each column from every element of that column; columns that already contain a 0 do not need the column transformation;

Step 2.3.2: searching the row- and column-transformed cost matrix for n 0 elements located in different rows and different columns. If they are found, their positions correspond to the optimal assignment; if not, the next step is carried out. In this step, the marking method is generally used to find the n 0 elements in different rows and columns: each row of the new cost matrix is checked in turn to find a row with only one unmarked 0 element, that 0 element is marked, and all other 0 elements in the same column are crossed out; each column of the new cost matrix is then checked in turn to find a column with only one unmarked 0 element, that 0 element is marked, and all other 0 elements in the same row are crossed out;

Step 2.3.3: adjusting the new cost matrix. A horizontal or vertical line is drawn through each marked 0 element so that the horizontal and vertical lines cover all 0 elements; the minimum element among the elements not covered by any line is found; this minimum is subtracted from every element of each row without a horizontal line and added to every element of each column with a vertical line; the search for n 0 elements in different rows and columns of the cost matrix is then repeated to find the optimal assignment.

A multi-person pose estimation detection and segmentation optimization device based on instance segmentation comprises a device shell 2. Heat dissipation holes 3 are provided on the outside of the device shell 2, an image acquisition module 1 is provided on one side of the device shell 2, and an image processing module 4, an image storage module 5, a data storage module 6 and a data processing module 7 are provided inside the device shell 2, wherein:

the image acquisition module 1 is used for acquiring a human body image;

the image processing module 4 is used for performing processing such as framing and denoising on the acquired image;

the image storage module 5 is used for storing the processed image data;

the data storage module 6 is used for storing required operation programs;

the data processing module 7 is used for running a required operation program.

In the present invention, preferably, the image processing module 4, the image storage module 5 and the data storage module 6 are electrically connected to the data processing module 7, and the image acquisition module 1 is electrically connected to the image processing module 4.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A multi-person pose estimation detection and segmentation optimization method based on instance segmentation, characterized by comprising the following steps:

2. The method of claim 1, wherein step 1 comprises the following steps:

step 1.2: identifying corresponding feature points on the framed image, and reserving the feature points;

step 1.4: fine-tuning the extracted features;

3. The method of claim 1, wherein step 2 comprises the following steps:

Step 2.2: performing joint point splicing according to the generated masks;

Step 2.3: because the number of people in the picture is uncertain and is accompanied by problems such as occlusion and deformation, computing the similarity of joint pairs alone can only guarantee a local optimum. The point set of a given joint across different human bodies therefore needs to be uniquely matched with the other point sets. For example, given one point set representing elbows and another representing wrists, the points in the two sets must be matched uniquely. Since the correlation PAF (part affinity field) between joint points is known, the key points are taken as the vertices of a graph and the correlation PAF between key points as the edge weights of the graph, so that the multi-person detection problem is converted into a bipartite graph matching problem, and the Hungarian algorithm is used to obtain the optimal matching of the connected key points.

4. The method of claim 1, wherein step 3 comprises the following steps:

Step 3.4: splicing and matching the joints again as in step 2;

Step 3.5: performing pose estimation on the human bodies with newly generated masks.

5. The method of claim 1, wherein step 4 comprises the following steps:

6. The method of claim 2, wherein step 1.4 comprises the following steps:

Step 1.4.1: for the N vertices $\{x_i \mid i = 1, \ldots, N\}$, a feature vector is first constructed for each vertex. The feature $f_i$ of vertex $x_i$ is the concatenation of the corresponding network feature and the vertex coordinates, $f_i = [F(x_i);\, x'_i]$, where $F$ is the feature map output by the backbone network and $F(x_i)$ is its bilinearly interpolated value at vertex $x_i$. The additional $x'_i$ describes the positional relationship between vertices and is translation invariant: it is the relative coordinate obtained by subtracting the minimum x and y over all vertices of the contour from each vertex coordinate;

Step 1.4.2: after the vertex features are obtained, the contour features need to be learned further. The vertex features can be regarded as a 1-D discrete signal $f: \mathbb{Z} \to \mathbb{R}^D$, and the vertices are then processed one by one using standard convolution;

7. The method of claim 3, wherein step 2.2 comprises the following steps:

Step 2.2.2: for fast computation of the integral, the similarity between the two joint points is generally approximated by uniform sampling, and the formula is as follows:

$p(u) = (1-u)\,d_{j1} + u\,d_{j2}$.

8. The method of claim 3, wherein the main steps of the Hungarian algorithm in step 2.3 are as follows:

Step 2.3.2: searching the row- and column-transformed cost matrix for n 0 elements located in different rows and different columns. If they are found, their positions correspond to the optimal assignment; if not, the next step is carried out. In this step, the marking method is generally used to find the n 0 elements in different rows and columns: each row of the new cost matrix is checked in turn to find a row with only one unmarked 0 element, that 0 element is marked, and all other 0 elements in the same column are crossed out; each column of the new cost matrix is then checked in turn to find a column with only one unmarked 0 element, that 0 element is marked, and all other 0 elements in the same row are crossed out;

Step 2.3.3: adjusting the new cost matrix. A horizontal or vertical line is drawn through each marked 0 element so that the horizontal and vertical lines cover all 0 elements; the minimum element among the elements not covered by any line is found; this minimum is subtracted from every element of each row without a horizontal line and added to every element of each column with a vertical line; the search for n 0 elements in different rows and columns of the cost matrix is then repeated to find the optimal assignment.

9. A multi-person pose estimation detection and segmentation optimization device based on instance segmentation as claimed in claim 1, characterized by comprising a device shell (2), wherein heat dissipation holes (3) are provided on the outside of the device shell (2), an image acquisition module (1) is provided on one side of the device shell (2), and an image processing module (4), an image storage module (5), a data storage module (6) and a data processing module (7) are provided inside the device shell (2), wherein:

the image acquisition module (1) is used for acquiring a human body image;

the image processing module (4) is used for performing processing such as framing and denoising on the acquired image;

the image storage module (5) is used for storing the processed image data;

the data storage module (6) is used for storing an operation program required by any one of claims 1 to 7;

the data processing module (7) is used for running the operation program required by any one of claims 1 to 7.

10. The multi-person pose estimation detection and segmentation optimization device based on instance segmentation as claimed in claim 9, wherein the image processing module (4), the image storage module (5) and the data storage module (6) are electrically connected with the data processing module (7), and the image acquisition module (1) is electrically connected with the image processing module (4).