Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the various embodiments to provide a better understanding of the present application; however, the technical solution claimed in the present application can be implemented without these technical details, and with various changes and modifications based on the following embodiments.
The following embodiments are divided for convenience of description and do not limit the specific implementation of the present invention; the embodiments may be combined and cross-referenced where no contradiction arises.
The embodiment of the invention relates to a key point detection network training method. The specific flow is shown in Fig. 1.
Step 101, obtaining a preset key point template, a training sample image, and the sample key points corresponding to the training sample image, where the key point template is a pre-constructed shape standard for the output of the initial key point detection network.
In the present embodiment, the training sample image is generally an image including a human face.
Optionally, the sample image is randomly occluded and/or randomly transformed to obtain a training sample image.
The random occlusion is performed as follows. Occluder data is collected in advance; when the object to be detected is a human face, the occluders may be masks, sunglasses, glasses, hands, and the like. The data for each occluder comprises pictures of the occluder from multiple viewing angles, with the relevant key points annotated; taking a mask as the occluder, for example, its left view, front view, and right view are collected. An occluder to add to the sample image is selected at random. After the occluder is determined, affine transformations are computed between the sample image and the occluder pictures at the different viewing angles, the affine-transformation error is calculated for each, and the picture with the smallest affine-transformation error among the different viewing angles is selected as the occluder picture to add to the sample image. The selected occluder picture is then Poisson-fused with the sample image to realize random occlusion of the sample image, and the randomly occluded sample image is finally taken as a training sample image. Because the affine-transformation errors between the occluder views and the sample image are computed, the occluder picture with the smallest error can be selected, so that the viewing angle of the selected occluder is closest to that of the sample image, the fusion between the sample image and the occluder picture is more natural, and the generated randomly occluded training sample image is closer to a real occlusion situation.
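The view-selection step above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each occluder view comes with annotated 2D key points, fits a least-squares affine map from each view's key points to the face key points, and picks the view with the smallest fitting residual (the subsequent Poisson fusion, for example via an image-editing library, is not shown):

```python
import numpy as np

def affine_fit_error(src_pts, dst_pts):
    """Fit an affine map src -> dst by least squares; return the residual norm.

    src_pts, dst_pts: (N, 2) arrays of corresponding key points, with N >= 4
    so that the 6-parameter affine map is over-determined.
    """
    n = src_pts.shape[0]
    a = np.hstack([src_pts, np.ones((n, 1))])             # (N, 3) homogeneous design matrix
    m, _, _, _ = np.linalg.lstsq(a, dst_pts, rcond=None)  # (3, 2) affine parameters
    return float(np.linalg.norm(a @ m - dst_pts))         # fitting residual

def select_occluder_view(view_keypoints, face_keypoints):
    """Return the index of the occluder view whose affine fit to the face is best."""
    errors = [affine_fit_error(v, face_keypoints) for v in view_keypoints]
    return int(np.argmin(errors))
```

A view whose key points are exactly affine-related to the face key points fits with near-zero error and is therefore preferred over a geometrically inconsistent view.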
It should be noted that the number of selected occluders need not be 1; that is, several occluders may be selected to occlude the sample image at the same time.
The random transformation is as follows: a random image transformation is applied to the sample image, which may include mirroring, cropping, stretching, erasing, rotation, and the like. Random mirroring flips the sample image left-right and determines the key points after flipping. Random cropping cuts out a part of the sample image, resizes the cropped image to a preset size, and determines the transformed key points. Random stretching stretches the sample image horizontally or vertically, fills in the blank regions of the stretched image, resizes it to the preset size, and determines the transformed key points. Random erasing generates a randomly colored region in the sample image as the erased region and determines the key points after erasing. Random rotation randomly selects one group of key points from several preset groups of key points with different rotation (roll) angles, performs an affine transformation between the selected key points and the sample image, resizes the result to the preset size, and determines the transformed key points. The random transformation methods are not limited to the above five; other image transformation methods may also be applied.
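As an illustration of how a transform must update the annotated key points in step, here is a hedged sketch of the mirroring case only (the function name is invented for illustration; in a real pipeline the flip would be applied with some probability):

```python
import numpy as np

def mirror_with_keypoints(image, keypoints):
    """Flip the image left-right and mirror key point x-coordinates to match.

    image: (H, W) or (H, W, C) array; keypoints: (N, 2) array of (x, y) pixels.
    """
    flipped = image[:, ::-1].copy()             # mirror the pixel columns
    w = image.shape[1]
    mirrored = keypoints.copy().astype(float)
    mirrored[:, 0] = (w - 1) - mirrored[:, 0]   # x' = (W - 1) - x, y unchanged
    return flipped, mirrored
```

The other transforms (cropping, stretching, erasing, rotation) would update the key points analogously with their own coordinate maps.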
It should be noted that the randomly occluded and randomly erased areas must overlap the region to be detected. For example, in an application scene where key points of a human face are detected, the region to be detected is the face region: the occluder must occlude part of the face region, and the erased region must include at least part of the face region.
Training sample images are obtained through random occlusion and/or random transformation, so that the trained key point detection network is suitable for processing occluded and/or specially transformed images.
Step 102, obtaining training detection key points through the initial key point detection network by combining the training sample image and the key point template.
Optionally, a first training detection key point is determined by the initial key point detection network from the training sample image and the key point template; a second training detection key point is determined by preset network nodes in the initial key point detection network from the training sample image; and the training detection key points are determined from the first training detection key points and the second training detection key points.
The initial key point detection network comprises a plurality of network nodes. N of these network nodes are preset; N groups of key points are output through the N network nodes, and the N groups of key points are summed to serve as the second training detection key points.
Optionally, inputting the training sample image into an initial key point detection network to obtain the key point classification probability corresponding to the training sample image; and determining a first training detection key point according to the key point classification probability and the key point template.
Specifically, a certain network node in the initial key point detection network is set as the output node for the first training detection key point, and the key points generated by that node are normalized through a fully connected layer to produce the key point classification probability; alternatively, the key points generated by all nodes of the initial key point detection network are normalized through the fully connected layer to produce the key point classification probability. Since there are many key points in an image, the classification probability is computed for the key points that best represent the image features; for example, at least the five key points of the left eye, right eye, nose, left mouth corner, and right mouth corner are set as the typical key points of a face image, and the classification probability is computed for these typical key points.
The key point classification probability is a probability vector, and the first training detection key point is obtained by multiplying it with the preset key point template. The key point template is obtained as follows: a certain number of sample images annotated with key points are obtained; the key points in the annotated sample images are clustered to obtain key point clusters; and the key point clusters are aligned and unified to obtain the key point template. Since the key point classification probability is the distribution probability over typical key points, the key point template is likewise a point cluster of the typical key points.
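The template-construction procedure (annotate, align, unify) can be sketched as follows. This is a simplified stand-in, not the patented procedure: full Procrustes alignment and clustering are replaced here by merely removing translation and scale before averaging the annotated shapes:

```python
import numpy as np

def build_keypoint_template(shapes):
    """Build a key point template from annotated sample shapes.

    shapes: (S, K, 2) array of S annotated images, each with K key points.
    Each shape is aligned by removing translation and scale, then the
    aligned shapes are averaged into a single (K, 2) template.
    """
    aligned = []
    for s in shapes:
        centered = s - s.mean(axis=0)       # remove translation
        norm = np.linalg.norm(centered)
        aligned.append(centered / norm)     # remove scale
    return np.mean(aligned, axis=0)         # average the aligned shapes
```

Shapes that differ only by translation and scale collapse to the same aligned shape, so the averaged template captures the common key point layout.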
Taking three preset network nodes as an example, the product of the key point classification probability and the key point template is taken as the first training detection key point; the three network nodes output three groups of second training detection key points; and the first training detection key point and the three groups of second training detection key points are added to obtain the training detection key points. The training detection key points finally obtained by the initial key point detection network are shown in formula (1).
Land = W_cl · Centers + L_1 + L_2 + L_3    (1)

where W_cl is the key point classification probability, Centers is the key point template, and L_1, L_2, and L_3 are the second detection key points output by the first, second, and third network nodes, respectively.
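A minimal numeric sketch of this combination follows. The array shapes are assumptions for illustration only: `w_cl` is a probability vector over template clusters, `centers` holds one K-point shape per cluster, and there is one offset group per preset node:

```python
import numpy as np

def training_detection_keypoints(w_cl, centers, node_outputs):
    """Combine the classification probability and template with node outputs.

    w_cl: (M,) key point classification probability over M template clusters.
    centers: (M, K, 2) key point template, one K-point shape per cluster.
    node_outputs: list of (K, 2) second-detection key point groups, one per
    preset network node (three nodes in the example of the text).
    """
    first = np.tensordot(w_cl, centers, axes=1)   # probability-weighted template
    second = np.sum(node_outputs, axis=0)         # sum of the N node outputs
    return first + second                         # W_cl * Centers + sum of L_i
```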
In addition, in order to prevent the initial key point detection network from developing a preference for a certain type of picture because the training sample images are of a single type, attributes are set for the training sample images, and adaptive weights are set according to the attributes during calculation.
In one example, three attributes are set for the training sample image: a heading angle (yaw angle) attribute, a pitch angle (pitch angle) attribute, and an occlusion attribute. The heading angle attribute is the offset of the training sample image in the heading angle, the pitch angle attribute is its offset in the pitch angle, and the occlusion attribute indicates whether the training sample image is occluded. The distribution probability of each attribute among the training sample images is then calculated. The offsets of the pitch angle and the heading angle range from -90° to 90°, with every 10° forming one interval, i.e., 18 intervals numbered -9 to 9. The distribution probabilities of the heading angle attribute, the pitch angle attribute, and the occlusion attribute are calculated over a set of training sample images (a batch).
The heading angle distribution probability is calculated as shown in formula (2).
frequency(yaw_i) = #{yaw = yaw_i} / batch    (2)

where frequency(yaw_i) is the distribution probability of the heading angle, batch is the number of training sample images in a set of training sample images, yaw_i is an interval value of the heading angle distribution, and #{yaw = yaw_i} is the number of training sample images in the set whose heading angle falls in interval yaw_i.
Specifically, the heading angle intervals appearing in a set of training sample images are counted, and the number of occurrences of each heading angle interval in the set is calculated.
For example, if the number of training sample images in a set is 50, of which 10 have a heading angle offset in the -1 interval, then frequency(yaw_{-1}) = 10/50 = 20%. That is, for a training sample image whose heading angle offset lies in the -1 interval, its distribution probability on the heading angle is 20%, and the attribute weight of the training sample image is calculated from this heading angle distribution probability.
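The interval counting in formulas (2) to (4) can be illustrated for the heading angle. The 10-degree binning below, giving interval indices -9 to 8 via floor division, is one plausible reading of the text:

```python
import numpy as np

def yaw_interval(yaw_deg):
    """Map a heading angle offset in [-90, 90) degrees to an interval index.

    Floor division by 10 yields indices -9..8, an 18-interval reading of the text.
    """
    return int(np.floor(yaw_deg / 10.0))

def yaw_distribution(yaw_offsets):
    """frequency(yaw_i) = #{yaw = yaw_i} / batch, as in formula (2)."""
    intervals = [yaw_interval(y) for y in yaw_offsets]
    batch = len(yaw_offsets)
    return {i: intervals.count(i) / batch for i in set(intervals)}
```

With 50 images of which 10 fall in the -1 interval, this reproduces the 20% example from the text.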
The pitch angle distribution probability is calculated as shown in equation (3).
frequency(pitch_i) = #{pitch = pitch_i} / batch    (3)

where frequency(pitch_i) is the distribution probability of the pitch angle, pitch_i is an interval value of the pitch angle distribution, and #{pitch = pitch_i} is the number of training sample images in the set whose pitch angle falls in interval pitch_i.
Specifically, the pitch angle intervals appearing in a set of training sample images are counted, and the number of occurrences of each pitch angle interval in the set is calculated.
For the occlusion attribute, the value is 1 if the object to be detected in the training sample image is occluded and 0 otherwise. The occlusion attribute distribution probability is calculated as shown in formula (4).
frequency(occ_i) = #{occ = occ_i} / batch    (4)

where frequency(occ_i) is the distribution probability of the occlusion attribute, occ_i is the value of the occlusion attribute, and #{occ = occ_i} is the number of training sample images in the set whose occlusion attribute equals occ_i.
The attribute weight of each training sample image is calculated from the distribution probabilities of the heading angle attribute, the pitch angle attribute, and the occlusion attribute. The calculation of the attribute weight W_i is shown in formula (5).
By setting a preset-attribute weight for each training sample image, the trained key point detection network is prevented from developing an attribute preference caused by over-training on many training sample images with the same attribute, which improves the detection accuracy of the key point detection network.
Step 103, performing iterative training on the initial key point detection network according to the training detection key points and the sample key points, and taking the initial key point detection network that meets a preset iteration condition as the trained key point detection network.
Optionally, the losses of the first training detection key point and the second training detection key points are calculated separately against the sample key points through a key point loss function to obtain key point loss values, where the key point loss values include a loss value corresponding to the first training detection key point and loss values corresponding to the second training detection key points; the preset-attribute weight corresponding to the training sample image is obtained; the loss over the preset-attribute weights is calculated through an attribute loss function to obtain an attribute loss value; and the initial key point detection network is iteratively trained according to the key point loss values and the attribute loss value until convergence, to obtain the trained key point detection network.
Regression is computed for the initial key point detection network according to the training detection key points and the sample key points. Since the training detection key points are formed by adding the first and second training detection key points, regression is computed for the first training detection key point and each second training detection key point separately, and the resulting loss values are added together as the key point loss value. In this embodiment, the three groups of second training detection key points output by three network nodes are again taken as the example, as shown in formulas (6) to (10).
Loss = Loss_cl + Loss_1 + Loss_2 + Loss_3    (10)

where Loss_cl is the loss value of the first training detection key point, Centers is the key point template, Land_gt is the sample key points, Loss_1, Loss_2, and Loss_3 are the loss values corresponding to the second training detection key points output by the first, second, and third network nodes, respectively, and Loss is the key point loss value.
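One plausible reading of formulas (6) to (10), in which each branch is regressed against the sample key points with an L2 distance and the resulting loss values are summed, can be sketched as follows (the specific distance is an assumption, since the equation bodies are not reproduced in the text):

```python
import numpy as np

def l2_loss(pred, gt):
    """Mean L2 distance between predicted and ground-truth key points."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

def keypoint_loss(first_pred, node_preds, gt):
    """Sum the first-detection loss and the per-node second-detection losses.

    Each branch is regressed against the sample key points separately, then
    the loss values are added, mirroring the structure of formula (10).
    """
    loss_cl = l2_loss(first_pred, gt)                # first training detection key point
    node_losses = [l2_loss(p, gt) for p in node_preds]
    return loss_cl + sum(node_losses)                # Loss = Loss_cl + sum of Loss_i
```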
Optionally, obtaining a distribution probability of preset attributes of the training sample image, where the preset attributes at least include an occlusion attribute and a rotation attribute; and obtaining the weight of the preset attribute of the training sample image according to the distribution probability.
The rotation attribute comprises the heading angle (yaw angle) attribute and the pitch angle (pitch angle) attribute. The preset-attribute weights were described in detail in step 102 and are not repeated here. When the initial key point detection network is trained, a loss value is also calculated using the preset-attribute weights; the attribute loss value is calculated from the key point loss values and the preset-attribute weights, as shown in formula (11).
Loss_land = Σ_i W_i · Loss_i    (11)

where Loss_land is the loss value weighted by the preset-attribute weights, Loss_i is the key point loss value, and W_i is the weight of the preset attribute.
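The attribute loss is a simple weighted sum over a batch. Since the exact form of the attribute weight W_i (formula (5)) is not reproduced in the text, the weights are taken as given inputs in this sketch:

```python
def attribute_weighted_loss(keypoint_losses, attribute_weights):
    """Loss_land = sum over i of W_i * Loss_i for the images in a batch.

    keypoint_losses: per-image key point loss values Loss_i.
    attribute_weights: per-image preset-attribute weights W_i.
    """
    assert len(keypoint_losses) == len(attribute_weights)
    return sum(w * l for w, l in zip(attribute_weights, keypoint_losses))
```

Images with rare attributes would receive larger weights, so their losses count for more during training.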
The initial key point detection network is iteratively trained with the key point loss values and the attribute loss value until convergence, and the converged initial key point detection network is taken as the trained key point detection network. It should be noted that the preset iteration condition may be the degree of convergence of the training.
The steps of the above method are divided for clarity of description; in implementation they may be combined into one step, or a step may be split into several, as long as the same logical relationship is preserved, and all such variants are within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm or flow, is likewise within the protection scope of the patent.
According to the key point detection network training method provided by this embodiment of the invention, the preset key point template serves as a prior on the key points, which avoids abnormal shapes in the detected key points and improves the stability of the detected key point shape. Because of the key point template, the subsequent network training in effect regresses the residual between the key point template and the sample key points after the detected key point shape has been corrected by the template, which improves the accuracy of the detection result and reduces the difficulty of network training. In addition, the key points are computed twice by the network, making the detected key points more accurate; key points with a normal shape are obtained from the distribution probability and the key point template; and the regression loss is calculated separately for each training detection key point, making the training process more stable and thorough. Setting attribute weights for the images avoids the trained key point network developing a preference for certain attributes of the training samples and improves the accuracy of the key point detection network's predictions. Finally, occlusion is artificially added to the training sample images, so that the trained detection network is applicable to more application scenes.
The embodiment of the invention relates to a key point detection network training method. This embodiment is substantially the same as the other embodiments; the main difference is that, before the training sample image is input to the initial key point detection network, it is input to a correction network, so that the sample image is corrected by the correction network. In this embodiment, three groups of second training detection key points are again taken as the example.
The object to be detected in the sample image may have a rotation angle, a position deviation, or a non-uniform size, so the sample image can be corrected through a correction network; the correction at least includes pose correction and position correction. The pose correction may include rotation angle correction and cropping correction. The rotation angle correction, cropping correction, and position correction are exemplified in formula (12).
where scale is the cropping correction, R_roll is the rotation angle correction, trans is the position correction, and L_1, L_2, and L_3 are the second training detection key points output by the first, second, and third network nodes, respectively.
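One plausible reading of the three corrections, applied here directly to a set of key points as a similarity transform, can be sketched as follows (this is an illustration under assumed conventions, not necessarily the exact form of formula (12)):

```python
import numpy as np

def apply_correction(keypoints, scale, roll_deg, trans):
    """Apply cropping (scale), rotation angle (R_roll), and position (trans)
    corrections to a set of key points.

    keypoints: (K, 2); scale: scalar; roll_deg: in-plane rotation in degrees;
    trans: (2,) translation.
    """
    theta = np.deg2rad(roll_deg)
    r = np.array([[np.cos(theta), -np.sin(theta)],    # 2x2 rotation matrix R_roll
                  [np.sin(theta),  np.cos(theta)]])
    return scale * keypoints @ r.T + np.asarray(trans)
```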
It should be noted that the correction network may be a network branch of the initial key point detection network, and the correction network and the initial key point detection network may be trained jointly or separately, which is not limited here.
Taking joint training as an example, the correction network and the initial key point detection network are combined as the first key point detection network; the training of the first key point detection network is therefore shown in formulas (13) to (17).
where Loss_cl′ is the loss value corresponding to the first training detection key point of the first key point detection network, Loss′ is the key point loss value of this embodiment, Land_gt is the sample key points, and Loss_1′, Loss_2′, and Loss_3′ are the loss values corresponding to the second training detection key points output by the first, second, and third network nodes, respectively.
Since the key point detection network training method in this embodiment is substantially the same as that described in other embodiments, the details of the related art mentioned in other embodiments are still valid in this embodiment, and the description related to this embodiment is also applicable to other embodiments, and is not repeated here in order to reduce the repetition.
According to the key point detection network training method provided by this embodiment of the invention, the preset key point template serves as a prior on the key points, which avoids abnormal shapes in the detected key points and improves the stability of the detected key point shape. Because of the key point template, the subsequent network training in effect regresses the residual between the key point template and the sample key points after the detected key point shape has been corrected by the template, which improves the accuracy of the detection result and reduces the difficulty of network training. In addition, the key points are computed twice by the network, making the detected key points more accurate; key points with a normal shape are obtained from the distribution probability and the key point template; and the regression loss is calculated separately for each training detection key point, making the training process more stable and thorough. Setting attribute weights for the images avoids the trained key point network developing a preference for certain attributes of the training samples and improves the accuracy of the initial key point detection network's predictions. Occlusion is artificially added to the sample images, so that the trained detection network is applicable to more application scenes. Finally, performing pose correction and position correction on the image in advance makes the key points detected by the initial key point detection network more accurate.
The embodiment of the invention relates to a key point detection method using the key point detection network trained according to the key point detection network training method embodiments above. As shown in Fig. 2, the method comprises the following steps:
step 201, inputting a first detection image into a first key point detection network to obtain a first detection key point.
The first detection image is the image to be detected, framed out of the original detection image.
Optionally, the first key point detection network further includes a correction network, and inputting the first detection image into the first key point detection network to obtain the first detection key point includes: inputting the first detection image into the correction network to obtain a corrected first detection image, where the correction network is used at least to correct pose errors and position errors; and obtaining the first detection key point through the key point detection network by combining the corrected first detection image and the key point template.
In this embodiment, the correction network, the key point detection network, and the key point template are combined to form the first key point detection network, and the generation of the first detection key points is shown in formula (18).
where scale is the cropping correction, R_roll is the rotation angle correction, trans is the position correction, Centers is the key point template, and L_1, L_2, …, L_N are the second detection key points output by the first, second, …, Nth network nodes, respectively.
The purpose of correcting the network is to make the position of the framed image to be detected (first detection image) more accurate.
It should be noted that the key point template may be used as network-layer data within the key point detection network, or used for further calculation with the output of the key point detection network. Specifically, if the key point template is network-layer data, the output of the key point detection network is directly the detection key points; if the key point template is used for further calculation with the network output, the network output is the key point classification probability, and the detection key points are calculated from the key point classification probability and the key point template.
Step 202, acquiring a second detection image through the first detection key point.
A second detection image is determined on the first detection image according to the positions of the first detection key points; the image to be detected framed by the second detection image is more accurate than that framed by the first detection image.
Step 203, inputting the second detection image into a second key point detection network to obtain a second detection key point as a key point detection result, wherein the first key point detection network and the second key point detection network at least comprise a key point detection network and a key point template.
Since pose correction has already been performed on the first detection image, there is no need to perform image correction again on the second detection image, and the second key point detection network comprises a key point detection network and a key point template.
Therefore, the embodiment may be configured to input the first detection image into the first keypoint detection network to obtain the first detection keypoint; acquiring a second detection image according to the first detection key point; and inputting the second detection image into a second key point detection network to obtain a second detection key point.
It should be noted that, in order to obtain higher key point detection accuracy, the newly generated detection image may be repeatedly input into the second key point detection network, i.e., steps 202 to 203 may be repeated.
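The overall two-stage flow of steps 201 to 203, including the optional repetition, can be sketched with stand-in callables for the trained networks (all names here are illustrative placeholders, not APIs from the text):

```python
def two_stage_detect(image, first_net, second_net, crop, refine_rounds=1):
    """Two-stage key point detection as in steps 201 to 203.

    first_net(image) -> first detection key points (network includes correction).
    crop(image, keypoints) -> a tighter second detection image.
    second_net(image) -> refined key points; may be applied repeatedly.
    """
    kps = first_net(image)               # step 201: first detection key points
    for _ in range(refine_rounds):       # steps 202-203, repeatable for accuracy
        image = crop(image, kps)         # step 202: frame the second detection image
        kps = second_net(image)          # step 203: refined detection key points
    return kps
```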
Since the key point detection method in this embodiment is substantially the same as that described in other embodiments, the details of the related art mentioned in other embodiments are still valid in this embodiment, and the description related to this embodiment is also applicable to other embodiments, and is not repeated here in order to reduce redundancy.
The embodiment of the invention relates to a key point detection method that applies the key point detection network described in the above embodiments, taking a face image as the image to be detected as an example.
A face image is framed in the original detection image as the first detection image through an ordinary face recognition network.
Since the face image framed by the first detection image may have a position deviation, an inappropriate size, or a rotation (roll) angle, the first detection key points are acquired through the first key point detection network that includes the correction network.
And framing the second detection image through the first detection key point, so that the face image framed by the second detection image is more accurate.
And inputting the second detection image into a second key point detection network to obtain a second detection key point.
Through two times of key point detection, the detected second detection key point has higher precision, and therefore the second detection key point is used as a key point detection result.
Since the key point detection method in this embodiment is substantially the same as that described in other embodiments, the details of the related art mentioned in other embodiments are still valid in this embodiment, and the description related to this embodiment is also applicable to other embodiments, and is not repeated here in order to reduce redundancy.
The embodiment of the invention relates to a key point detection network training device, as shown in fig. 3, comprising:
the data obtaining module 301 is configured to obtain a preset key point template, a training sample image, and a sample key point corresponding to the training sample image, where the key point template is a shape standard of a pre-constructed initial key point detection network output result.
And a keypoint generation module 302, configured to obtain training detection keypoints by combining the training sample images and the keypoint templates through an initial keypoint detection network.
And the network training module 303 is configured to perform iterative training on the initial key point detection network according to the training detection key points and the sample key points, and use the initial key point detection network meeting the preset iterative condition as the trained key point detection network.
This embodiment is the apparatus embodiment corresponding to the key point detection network training method embodiment, and the two can be implemented in cooperation. The related technical details mentioned in the method embodiment remain valid in this embodiment and are not repeated here to reduce repetition; correspondingly, the related technical details mentioned in this embodiment also apply to the method embodiment.
It should be noted that each module referred to in this embodiment is a logical module; in practical applications, one logical unit may be one physical unit, part of one physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, units less closely related to solving the technical problem proposed by the present invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
An embodiment of the present invention relates to a keypoint detection device, as shown in fig. 4, including:
a first network detection module 401, configured to input a first detection image into a first keypoint detection network to obtain first detection keypoints;
a second image obtaining module 402, configured to obtain a second detection image through the first detection keypoints; and
a second network detection module 403, configured to input the second detection image into a second keypoint detection network to obtain second detection keypoints as the keypoint detection result, where each of the first keypoint detection network and the second keypoint detection network includes at least a keypoint detection network and a keypoint template.
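The two-stage flow of the three modules above can be sketched as follows. This is an illustrative sketch only, with hypothetical stand-in networks: the first network predicts coarse keypoints on the full first detection image, the second detection image is obtained by cropping a region around those keypoints, and the second network refines the keypoints inside the crop before mapping them back to full-image coordinates.

```python
import numpy as np

def first_network(image):
    """First network detection module: coarse keypoints on the full
    image (stand-in: fixed fractional positions, not a real model)."""
    h, w = image.shape[:2]
    return np.array([[w * 0.4, h * 0.4], [w * 0.6, h * 0.4], [w * 0.5, h * 0.6]])

def crop_second_image(image, keypoints, margin=10):
    """Second image obtaining module: crop a region around the first
    detection keypoints to obtain the second detection image."""
    x0 = max(int(keypoints[:, 0].min()) - margin, 0)
    y0 = max(int(keypoints[:, 1].min()) - margin, 0)
    x1 = min(int(keypoints[:, 0].max()) + margin, image.shape[1])
    y1 = min(int(keypoints[:, 1].max()) + margin, image.shape[0])
    return image[y0:y1, x0:x1], (x0, y0)

def second_network(crop, keypoints_in_crop):
    """Second network detection module: refine keypoints inside the
    crop (stand-in: identity refinement)."""
    return keypoints_in_crop

def detect_keypoints(image):
    """Full two-stage flow: coarse detection, cropping, refinement,
    then mapping the refined keypoints back to full-image coords."""
    coarse = first_network(image)
    crop, (x0, y0) = crop_second_image(image, coarse)
    refined = second_network(crop, coarse - np.array([x0, y0]))
    return refined + np.array([x0, y0])
```

The cascade structure is the point of the sketch: because the second network only sees a crop centered on the coarse keypoints, it can operate at a finer effective resolution than the first network.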
It should be understood that this embodiment is an apparatus embodiment corresponding to the embodiment of the keypoint detection method, and this embodiment can be implemented in cooperation with that method embodiment. The relevant technical details mentioned in the embodiment of the keypoint detection method remain valid in this embodiment and, to reduce repetition, are not described again here. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied to other embodiments.
It should be noted that each module referred to in this embodiment is a logical module; in practical applications, one logical unit may be one physical unit, a part of one physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, units that are not closely related to solving the technical problem proposed by the present invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
An embodiment of the present invention relates to an electronic device, as shown in fig. 5, including:
at least one processor 501; and a memory 502 communicatively coupled to the at least one processor 501; wherein the memory 502 stores instructions executable by the at least one processor 501, the instructions being executed by the at least one processor 501 to enable the at least one processor 501 to perform the keypoint detection network training method of any of the above embodiments or the keypoint detection method of any of the above embodiments.
The memory 502 and the processor 501 are connected by a bus, which may include any number of interconnected buses and bridges that link together one or more of the various circuits of the processor 501 and the memory 502. The bus may also link various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore will not be described further herein. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor 501 is transmitted over a wireless medium via an antenna; the antenna further receives data and passes it to the processor 501.
The processor 501 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions, while the memory 502 may be used to store data used by the processor 501 in performing operations.
This product can execute the methods provided by the embodiments of the present application and has the corresponding functional modules and beneficial effects of executing those methods. For technical details not described in detail in this embodiment, reference may be made to the methods provided by the embodiments of the present application.
An embodiment of the present invention relates to a computer-readable storage medium storing a computer program that, when executed by a processor, implements any of the above-described method embodiments.
That is, those skilled in the art can understand that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the above embodiments are specific embodiments for implementing the present invention, and that in practical applications various changes in form and detail may be made thereto without departing from the spirit and scope of the present invention.