CN116469127A

CN116469127A - Method and device for detecting key points of cow face, electronic equipment and storage medium

Info

Publication number: CN116469127A
Application number: CN202310334078.6A
Authority: CN
Inventors: 胡魁; 李佼; 盛建达; 戴磊; 陈远旭
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2023-03-23
Filing date: 2023-03-23
Publication date: 2023-07-21

Abstract

The invention relates to the technical field of artificial intelligence, and provides a method, a device, electronic equipment and a storage medium for detecting cow face key points. According to the invention, the key points are classified into rigid key points and non-rigid key points, and different loss functions are used for training each class, which improves the detection accuracy of the model; adding the offset determined from the cow ear key points improves the overall structure perception capability of the model.

Description

Method and device for detecting key points of cow face, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a method and a device for detecting key points of a cow face, electronic equipment and a storage medium.

Background

Cattle face recognition is a technology for recognizing cattle by distinguishing cattle face features in images. The whole process of cattle face recognition comprises three main steps: cow face detection, cow face key point detection and cow face recognition. Cow face detection is a special case of target detection, and frames all cow faces in the image. Cow face key point detection locates the position points of the key areas of the cow face; using these key points, the cow face is uniformly 'righted' by affine transformation, so that errors caused by different postures in cow face recognition are eliminated as far as possible. Therefore, cow face detection and cow face key point detection are the basis of cow face recognition.

There are currently two main types of face key point detection algorithm: key point detection based on heat maps, and key point detection based on coordinate point regression.

In the process of realizing the invention, the inventor found that key point detection based on heat maps has strong spatial generalization capability but also obvious defects: it cannot be made end-to-end differentiable during training, that is, the step from the heat map to the actual coordinate values of the key points is not differentiable; in addition, the output of heat map prediction involves an up-sampling process, so training and inference are relatively slow and detection efficiency is relatively low. Key point detection based on coordinate point regression trains and infers quickly, but because a fully connected layer is used at the end, overall structure information is insufficient; in particular, when the amount of data is small, coordinate point regression easily leads to overfitting, which reduces the generalization capability of the model.

Disclosure of Invention

In view of the foregoing, it is necessary to provide a method, an apparatus, an electronic device, and a storage medium for detecting a cow face key point, which can improve the detection efficiency and generalization ability of a cow face key point detection model.

The first aspect of the invention provides a method for detecting a cow face key point, comprising the following steps:

acquiring a plurality of original cow face images, processing the original cow face images, and creating a cow face image set according to the processed original cow face images;

marking the key points of the cow face and the key points of the cow ears in each cow face image in the cow face image set;

acquiring coordinate values of the cow face key points and coordinate values of the cow ear key points, and calculating coordinate offset values according to the coordinate values of the cow ear key points;

initializing a face key point detection network architecture and initializing a loss function according to coordinate values of the face key points and the coordinate offset values;

taking the cow face image set as input of the cow face key point detection network, taking the minimized loss function as a training target, and performing iterative training on the cow face key point detection network to obtain a cow face key point detection model;

and, in response to a detection instruction for a target cow face image, detecting cow face key points of the target cow face image by using the cow face key point detection model.

According to an optional embodiment of the invention, the face keypoint detection network architecture comprises three branches:

the first branch is used for the face key point regression;

the second branch is used for offset regression of the key points of the cattle ears;

the third branch is used for classification confidence of the keypoints.

According to an alternative embodiment of the invention, the loss function is:

$$L_{total} = L_{landmark} + \alpha L_{offset} + \beta L_{score},$$

where $\alpha$ and $\beta$ are hyperparameters, $L_{landmark}$ is the cow face key point regression loss function, $L_{offset}$ is the offset regression loss function of the cow ear key points, and $L_{score}$ is the confidence constraint function of the key points.

According to an optional embodiment of the invention, the calculating the coordinate offset value according to the coordinate value of the key point of the cow ear comprises:

acquiring a plurality of designated key points and a plurality of non-designated key points among the cow ear key points;

calculating a coordinate mean value according to the coordinate values of the designated key points;

and calculating a coordinate offset value according to the coordinate value of each non-designated key point and the coordinate mean value.

According to an optional embodiment of the invention, the processing the plurality of original face images, and creating the face image set according to the processed original face images includes:

calculating to obtain a mean image according to a plurality of original cow face images;

calculating an index value based on each original cow face image and the mean image by adopting a preset index calculation model;

intercepting the original cow face image according to the index value to obtain an intercepted cow face image;

normalizing the original face images and the intercepted face images to obtain normalized face images;

and creating a cattle face image set according to the normalized cattle face images.

According to an optional embodiment of the present invention, the intercepting the original cow face image according to the index value includes:

acquiring an index threshold corresponding to the preset index calculation model;

comparing the index value with the index threshold value to obtain a comparison result;

determining a cut-out frame according to the comparison result;

carrying out cow face detection on the original cow face image to obtain a cow face detection frame;

and taking the center of the cow face detection frame as the center of the interception frame, and intercepting the original cow face image according to the interception frame to obtain the intercepted cow face image.

According to an optional embodiment of the invention, the detecting the face keypoints of the target face image using the face keypoint detection model comprises:

calculating a target index value based on the target cow face image and the mean image by adopting the preset index calculation model;

comparing the target index value with the index threshold value to obtain a target comparison result;

determining a target interception frame according to the target comparison result;

carrying out cow face detection on the target cow face image to obtain a target cow face detection frame;

taking the center of the target cow face detection frame as the center of the target interception frame, and intercepting the target cow face image according to the target interception frame to obtain a target intercepted cow face image;

normalizing the target intercepted cow face image to obtain a target normalized cow face image;

and detecting the target normalized cow face image by using the cow face key point detection model to obtain the cow face key points.
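By way of illustration only, the detection flow listed above may be sketched as follows. This is a minimal sketch rather than the implementation of the embodiment: the function name `detect_keypoints`, all parameter names, and the idea of passing each stage in as a callable are assumptions made for the example, and the choice between a larger and a smaller interception frame follows the rule given elsewhere in this description.

```python
def detect_keypoints(image, mean_img, index_fn, threshold, higher_is_better,
                     first_size, second_size, detect_face, crop, normalize, model):
    """Run the detection flow on one target image.

    All callables are hypothetical stand-ins supplied by the caller:
      index_fn(image, mean_img) -> index value (e.g. PSNR or MSE)
      detect_face(image)        -> detection frame
      crop(image, frame, size)  -> image intercepted around the frame center
      normalize(image)          -> image normalized to the model input size
      model(image)              -> predicted key points
    """
    # Calculate the target index value against the mean image.
    index_value = index_fn(image, mean_img)

    # Compare with the index threshold; the direction depends on the index
    # (a larger PSNR is better, a smaller MSE is better).
    if higher_is_better:
        quality_good = index_value > threshold
    else:
        quality_good = index_value < threshold

    # Good quality selects the larger frame, poor quality the smaller one.
    crop_size = first_size if quality_good else second_size

    # Detect the face, intercept around its center, normalize, and predict.
    frame = detect_face(image)
    cropped = crop(image, frame, crop_size)
    return model(normalize(cropped))
```

Passing the stages in as callables keeps the sketch independent of any particular detector or model, so each stage can be replaced without changing the flow.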

A second aspect of the present invention provides a face key point detection apparatus, the apparatus comprising:

the processing module is used for acquiring a plurality of original cow face images, processing the original cow face images and creating a cow face image set according to the processed original cow face images;

the marking module is used for marking the cow face key points and the cow ear key points in each cow face image in the cow face image set;

the calculating module is used for obtaining the coordinate values of the cow face key points and the cow ear key points and calculating coordinate offset values according to the coordinate values of the cow ear key points;

the initialization module is used for initializing a bovine face key point detection network architecture and initializing a loss function according to the coordinate value of the bovine face key point and the coordinate offset value;

the training module is used for taking the cow face image set as the input of the cow face key point detection network, taking the minimized loss function as a training target, and carrying out iterative training on the cow face key point detection network to obtain a cow face key point detection model;

and the detection module is used for responding to a detection instruction of the target cow face image and detecting cow face key points of the target cow face image by using the cow face key point detection model.

A third aspect of the present invention provides an electronic device, the electronic device including a processor and a memory, the processor being configured to implement the method for detecting a key point of a cow face when executing a computer program stored in the memory.

A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for detecting a key point of a cow face.

The method, the device, the electronic equipment and the storage medium for detecting cow face key points provided by the embodiments of the invention avoid using a heat map as the input of the cow face key point detection model, thereby solving the problems of low speed and difficult deployment in actual use scenarios (mobile devices and edge devices). Meanwhile, the key points are classified into rigid key points (cow face key points) and non-rigid key points (cow ear key points), and different loss functions are used for training the cow face key point detection model, which improves the adaptability of the model and its detection accuracy. In addition, the offset determined from the cow ear key points is added to the training process of the cow face key point detection model, which improves the overall structure perception capability of the model and the stability of the model output.

Furthermore, after the plurality of original cow face images are acquired, processing them simulates cow face images captured from different shooting distances and expands the image data set. This solves the technical problem of insufficient training samples, and the additional training samples improve the robustness of the cow face key point detection model.

Drawings

Fig. 1 is a flowchart of a method for detecting a key point of a cow face according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of key points marked in a face image according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of key points of a cow ear according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of a network architecture for detecting key points of a cow face according to an embodiment of the present invention.

Fig. 5 is a block diagram of a cow face key point detection device according to a second embodiment of the present invention.

Fig. 6 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.

Detailed Description

In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Embodiments of the invention and features of the embodiments may be combined with each other without conflict.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.

The method for detecting the key points of the cow face provided by the embodiment of the invention is executed by the electronic equipment, and correspondingly, the device for detecting the key points of the cow face is operated in the electronic equipment.

The embodiment of the invention can detect cow face key points based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

Example 1

Fig. 1 is a flowchart of a method for detecting cow face key points according to an embodiment of the present invention. The method specifically comprises the following steps. The sequence of the steps in the flowchart can be changed according to different requirements, and some steps can be omitted.

S11, acquiring a plurality of original cow face images, processing the original cow face images, and creating a cow face image set according to the processed original cow face images.

The electronic device may obtain a plurality of original face images in advance, thereby creating a face image dataset from the plurality of original face images.

In an alternative embodiment, the electronic device may acquire a plurality of original face images in one or more of the following combinations:

(1) Crawling images of cattle from a plurality of domestic networks by using a web crawler, and selecting original images containing a cow face from the collected images;

(2) Selecting videos of cattle raising from domestic agricultural programs, capturing an image every preset number of frames, and selecting original images containing a cow face from the captured images;

(3) Acquiring videos of cattle in the field by using a camera, capturing an image every preset number of frames, and selecting original images containing a cow face from the captured images.

In this embodiment, neither the breed of the cattle nor the shooting environment of the pictures is limited. The breeds may include, but are not limited to: Chinese black and white cattle, Holstein cows, silk cattle, Simmental crossbred cattle, red cattle, black cattle, yellow cattle, and cattle.

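In an optional embodiment, the processing the plurality of original cow face images, and creating the cow face image set according to the processed original cow face images includes:

calculating a mean image according to the plurality of original cow face images;

calculating an index value based on each original cow face image and the mean image by adopting a preset index calculation model;

intercepting the original cow face image according to the index value to obtain an intercepted cow face image;

normalizing the original cow face images and the intercepted cow face images to obtain normalized cow face images;

and creating the cow face image set according to the normalized cow face images.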

In this embodiment, because the plurality of original cow face images acquired by the electronic device come from different sources, their sizes and qualities differ. Before the mean image is calculated, the original cow face images may therefore be normalized so that they all have the same size, which allows the mean image to be calculated from original cow face images of the same size.

The process for calculating the mean image from original cow face images of the same size comprises: adding the pixel values at the same position in each original cow face image to obtain a pixel sum value, calculating the pixel mean value at that position from the pixel sum value, and obtaining the mean image from the pixel mean values. For example, assuming there are 10 normalized original cow face images of size w×h, the pixel values in the first row and first column of each of the 10 images are extracted and added to obtain a pixel sum value, and the pixel mean value obtained from this sum is the pixel value in the first row and first column of the mean image; the pixel values in the first row and second column of each of the 10 images are extracted and added to obtain a pixel sum value, and the pixel mean value obtained from this sum is the pixel value in the first row and second column of the mean image; and so on, until the mean image is obtained. The mean image reflects the average pixel information of the acquired original cow face images.
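By way of illustration only, the position-by-position averaging described above may be sketched as follows. The sketch assumes single-channel (grayscale) images stored as lists of rows so that it needs no external library; the function name `mean_image` and the sample pixel values are hypothetical.

```python
def mean_image(images):
    """Return the pixel-wise mean of equally sized grayscale images."""
    if not images:
        raise ValueError("at least one image is required")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must have the same size")
    count = len(images)
    # For every position, add the pixel values of all images (pixel sum
    # value) and divide by the number of images (pixel mean value).
    return [
        [sum(img[r][c] for img in images) / count for c in range(width)]
        for r in range(height)
    ]


# Two hypothetical 2x2 images standing in for normalized cow face images.
img_a = [[0, 100], [200, 50]]
img_b = [[100, 100], [0, 150]]
print(mean_image([img_a, img_b]))  # [[50.0, 100.0], [100.0, 100.0]]
```

For colour images the same averaging would be applied to each channel separately.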

The preset index calculation model is used to evaluate the quality of each original cow face image relative to the mean image, and may be a peak signal-to-noise ratio (Peak Signal-to-Noise Ratio, PSNR) calculation function or a mean square error (Mean Square Error, MSE) calculation function. When the preset index calculation model is a peak signal-to-noise ratio calculation function, the index value calculated based on each original cow face image and the mean image is the peak signal-to-noise ratio. When the preset index calculation model is a mean square error calculation function, the index value calculated based on each original cow face image and the mean image is the mean square error. The mean image is taken as a reference image, and the image quality of each original cow face image is measured by calculating the peak signal-to-noise ratio or the mean square error between that image and the reference image. The larger the peak signal-to-noise ratio, the smaller the distortion between the original cow face image and the reference image, and the better the image quality of the original cow face image; the smaller the peak signal-to-noise ratio, the greater the distortion and the poorer the image quality. Conversely, the smaller the mean square error, the better the image quality of the original cow face image; the larger the mean square error, the poorer the image quality.
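By way of illustration only, the two index calculation functions may be sketched as follows, using the standard definitions of mean square error and peak signal-to-noise ratio. The sketch assumes 8-bit grayscale images (maximum pixel value 255) stored as lists of rows; the function names and sample values are hypothetical.

```python
import math


def mse(image, reference):
    """Mean square error between an image and a reference of the same size."""
    total = 0.0
    count = 0
    for image_row, reference_row in zip(image, reference):
        for p, q in zip(image_row, reference_row):
            total += (p - q) ** 2
            count += 1
    return total / count


def psnr(image, reference, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    error = mse(image, reference)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


# Hypothetical original image and mean (reference) image.
original = [[0, 100], [200, 50]]
reference = [[50, 100], [100, 100]]
print(mse(original, reference))   # 3750.0
print(psnr(original, reference))  # roughly 12.39 dB
```

The two indices move in opposite directions: an image closer to the reference has a smaller mean square error and a larger peak signal-to-noise ratio.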

Because the index value of each original cow face image is different, the interception scale differs when each original cow face image is intercepted according to its index value, and the sizes of the resulting intercepted cow face images therefore also differ. The original cow face images and the intercepted cow face images are normalized to obtain normalized cow face images of the same size, and the normalized cow face images are collected to form the cow face image set. Because the cow face images in the set have the same size, the convergence speed, and hence the training speed, of the cow face key point detection model is improved when the model is subsequently trained.
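By way of illustration only, normalizing images of different sizes to one common size and collecting them into an image set may be sketched as follows. The description does not specify the resizing method, so nearest-neighbour resampling is an assumption made for the example; the function names `resize_nearest` and `build_image_set` and the sample data are hypothetical.

```python
def resize_nearest(image, out_height, out_width):
    """Resize a grayscale image (list of rows) by nearest-neighbour sampling."""
    in_height, in_width = len(image), len(image[0])
    return [
        [
            # Map each output position back to a source position.
            image[r * in_height // out_height][c * in_width // out_width]
            for c in range(out_width)
        ]
        for r in range(out_height)
    ]


def build_image_set(original_images, intercepted_images, size):
    """Normalize originals and intercepted images to one size and collect them."""
    out_height, out_width = size
    return [
        resize_nearest(img, out_height, out_width)
        for img in original_images + intercepted_images
    ]


# A hypothetical 4x4 original image and a 2x2 intercepted image.
original = [[r * 4 + c for c in range(4)] for r in range(4)]
intercepted = [[5, 6], [9, 10]]
image_set = build_image_set([original], [intercepted], size=(2, 2))
print(image_set)  # [[[0, 2], [8, 10]], [[5, 6], [9, 10]]]
```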

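In an optional implementation manner, the intercepting the original cow face image according to the index value to obtain an intercepted cow face image includes:

acquiring an index threshold corresponding to the preset index calculation model;

comparing the index value with the index threshold to obtain a comparison result;

determining an interception frame according to the comparison result;

carrying out cow face detection on the original cow face image to obtain a cow face detection frame;

and taking the center of the cow face detection frame as the center of the interception frame, and intercepting the original cow face image according to the interception frame to obtain the intercepted cow face image.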

The corresponding relation between the preset index calculation model and the index threshold value can be stored in a local database of the electronic equipment. According to the corresponding relation, the electronic equipment can acquire an index threshold value corresponding to the preset index calculation model. When the preset index calculation model is a peak signal-to-noise ratio calculation function, the index threshold is a preset first value, and when the preset index calculation model is a mean square error calculation function, the index threshold is a preset second value. The preset first value and the preset second value may be the same or different.

When the preset index calculation model is a peak signal-to-noise ratio calculation function and the index value is larger than the index threshold, the comparison result is that the image quality of the original cow face image is good; when the preset index calculation model is a peak signal-to-noise ratio calculation function and the index value is smaller than the index threshold, the comparison result is that the image quality of the original cow face image is poor. When the preset index calculation model is a mean square error calculation function and the index value is larger than the index threshold, the comparison result is that the image quality of the original cow face image is poor; when the preset index calculation model is a mean square error calculation function and the index value is smaller than the index threshold, the comparison result is that the image quality of the original cow face image is good.

When the obtained comparison result is that the image quality of the original cow face image is good, determining the interception frame as a preset first size, and when the obtained comparison result is that the image quality of the original cow face image is poor, determining the interception frame as a preset second size, wherein the preset first size is larger than the preset second size.
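By way of illustration only, the comparison rule and the choice of interception frame size may be sketched as follows. The function names, the string labels `"psnr"` and `"mse"`, and the sample threshold and sizes are hypothetical. The description does not say how an index value exactly equal to the threshold is handled; treating it as poor quality is an assumption of the sketch.

```python
def quality_is_good(metric, index_value, threshold):
    """Apply the comparison rule for the chosen index calculation model."""
    if metric == "psnr":
        # A larger peak signal-to-noise ratio means better quality.
        return index_value > threshold
    if metric == "mse":
        # A smaller mean square error means better quality.
        return index_value < threshold
    raise ValueError(f"unknown metric: {metric}")


def choose_crop_size(metric, index_value, threshold, first_size, second_size):
    """Return the preset first (larger) size for good quality, else the second."""
    if first_size <= second_size:
        raise ValueError("the preset first size must exceed the preset second size")
    if quality_is_good(metric, index_value, threshold):
        return first_size
    return second_size


# Hypothetical threshold and frame sizes.
print(choose_crop_size("psnr", 30.0, 25.0, first_size=512, second_size=384))  # 512
print(choose_crop_size("mse", 30.0, 25.0, first_size=512, second_size=384))   # 384
```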

The electronic device may perform cow face detection on each original cow face image by using a cow face detection model: the original cow face image is taken as the input of the cow face detection model, and the model outputs the original cow face image with a cow face detection frame, where the cow face detection frame is used to frame the cow face region in the original cow face image.

In this optional implementation manner, the index value is compared with the index threshold corresponding to the preset index calculation model, and the interception frame is determined dynamically according to the comparison result. Intercepting the original cow face image with this frame retains more detail information in the intercepted cow face image and ensures better image quality. When the comparison result is that the image quality of the original cow face image is good, a larger interception frame is determined; the intercepted cow face image is then larger while its quality remains good, which avoids the loss of cow face detail information that would result from using a smaller interception frame. When the comparison result is that the image quality of the original cow face image is poor, a smaller interception frame is determined; the intercepted cow face image is then smaller, which avoids the poorer image quality that would result from using a larger interception frame.
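By way of illustration only, intercepting an image with a frame centered on the cow face detection frame may be sketched as follows. The sketch assumes a square interception frame, a detection frame given as `(left, top, right, bottom)` pixel coordinates, and clipping of the frame at the image border; none of these details is stated in the description, and the function name `crop_centered` is hypothetical.

```python
def crop_centered(image, detection_frame, crop_size):
    """Intercept a square region centered on the detection frame center."""
    height, width = len(image), len(image[0])
    left, top, right, bottom = detection_frame
    # The center of the detection frame becomes the center of the crop.
    center_x = (left + right) // 2
    center_y = (top + bottom) // 2
    half = crop_size // 2
    # Clip the interception frame so it stays inside the image.
    x0 = max(0, center_x - half)
    y0 = max(0, center_y - half)
    x1 = min(width, center_x - half + crop_size)
    y1 = min(height, center_y - half + crop_size)
    return [row[x0:x1] for row in image[y0:y1]]


# A hypothetical 6x6 image whose pixel value encodes its position.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
print(crop_centered(image, detection_frame=(2, 2, 4, 4), crop_size=2))
# [[14, 15], [20, 21]]
```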

In addition, in the above optional embodiment, cow face images of different sizes (containing the cow face and environmental information) are intercepted from the original cow face images, and the intercepted cow face images are then normalized to unify their size. This simulates cow face images captured from different shooting distances, expands the image data set and enriches the training samples, and the additional training samples improve the robustness of the cow face key point detection model.

S12, marking the key points of the cow face and the key points of the cow ears in each cow face image in the cow face image set.

The electronic device may use the VGG Image Annotator software to mark a plurality of key points on the cow face and a plurality of key points on the cow ears, as shown in fig. 2, where each black dot represents a key point. Key points may be marked according to the different parts of the cow face: for example, more key points may be marked on parts such as the nose, mouth, eyes and ears, while few or no key points may be marked on other parts of the cow face.

S13, acquiring coordinate values of the cow face key points and coordinate values of the cow ear key points, and calculating coordinate offset values according to the coordinate values of the cow ear key points.

When the electronic device acquires the coordinate values of the cow face key points and of the cow ear key points, a coordinate system can be established according to a preset rule, so that the coordinate values of the cow face key points and of the cow ear key points in that coordinate system can be determined. The preset rule may be, for example, to take the upper left corner of the cow face image as the origin, the horizontal line along the width of the cow face image as the X-axis, and the vertical line along the height of the cow face image as the Y-axis.

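In an alternative embodiment, the calculating the coordinate offset value according to the coordinate values of the cow ear key points includes:

acquiring a plurality of designated key points and a plurality of non-designated key points among the cow ear key points;

calculating a coordinate mean value according to the coordinate values of the designated key points;

and calculating a coordinate offset value according to the coordinate value of each non-designated key point and the coordinate mean value.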

The electronic device may designate a plurality of keypoints among the labeled plurality of cow ear keypoints, the designated keypoints being referred to as designated keypoints, and keypoints other than the designated keypoints among the plurality of cow ear keypoints being referred to as non-designated keypoints.

In an alternative embodiment, the electronic device may specify two keypoints among the marked plurality of bovine ear keypoints.

Fig. 3 is a schematic diagram of the cow ear key points, in which point location 1 and point location 5 are at the root of the cow ear, point location 2 and point location 4 are on the cow ear, and point location 3 is the tip of the cow ear. The electronic device determines point location 1 and point location 5 as the designated key points, and point location 2, point location 3 and point location 4 as the non-designated key points. The electronic device first calculates the coordinate mean value from the coordinate values of point location 1 and point location 5, that is, the coordinate value of the midpoint between point location 1 and point location 5. It then calculates a coordinate offset value from the coordinate value of point location 2 and the coordinate mean value, that is, the coordinate offset of point location 2 relative to the midpoint; the coordinate offsets of point location 3 and point location 4 relative to the midpoint are calculated in the same way.

Illustratively, assume that the coordinate value of point location 1 is $(x_1, y_1)$, that of point location 2 is $(x_2, y_2)$, that of point location 3 is $(x_3, y_3)$, that of point location 4 is $(x_4, y_4)$, and that of point location 5 is $(x_5, y_5)$. The coordinate mean value of the designated key points (point location 1 and point location 5) is $(x_m, y_m)$, where $x_m = (x_1 + x_5)/2$ and $y_m = (y_1 + y_5)/2$. The coordinate offset value of each non-designated key point (point location 2, point location 3 and point location 4) relative to the coordinate mean value is then $(\delta x_i, \delta y_i)$, where $\delta x_i = x_i - x_m$ and $\delta y_i = y_i - y_m$ for $i \in \{2, 3, 4\}$.
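By way of illustration only, the offset calculation above may be sketched as follows. The function name `ear_offsets`, the dictionary layout keyed by point location number, and the sample coordinates are hypothetical; the arithmetic follows the coordinate mean and offset definitions given above.

```python
def ear_offsets(points, designated=(1, 5)):
    """Return the coordinate mean of the designated key points and the
    offsets of the non-designated key points relative to that mean.

    points maps a point location number to its (x, y) coordinate value.
    """
    x_m = sum(points[i][0] for i in designated) / len(designated)
    y_m = sum(points[i][1] for i in designated) / len(designated)
    offsets = {
        i: (x - x_m, y - y_m)
        for i, (x, y) in points.items()
        if i not in designated
    }
    return (x_m, y_m), offsets


# Hypothetical coordinates for the five cow ear key points of fig. 3.
ear_points = {1: (10, 40), 2: (14, 30), 3: (20, 20), 4: (26, 30), 5: (30, 40)}
mean_point, offsets = ear_offsets(ear_points)
print(mean_point)  # (20.0, 40.0)
print(offsets)     # {2: (-6.0, -10.0), 3: (0.0, -20.0), 4: (6.0, -10.0)}
```

Because the image origin is at the upper left corner, a negative vertical offset means the point lies above the midpoint of the ear root.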

S14, initializing a face key point detection network architecture and initializing a loss function according to the coordinate values of the face key points and the coordinate offset values.

The electronic device may initialize a deep convolutional neural network as the cow face key point detection network architecture, which comprises three branches: the first branch is used for cow face key point regression, the second branch for offset regression of the cow ear key points, and the third branch for the classification confidence of the key points. As shown in fig. 4, the cow face key point detection network architecture includes an input layer (Input), a plurality of convolution layers (Conv), and three outputs (Landmark, Offset, Score). The input layer receives the cow face images used to train the detection network, namely cow face images in which the cow face key points and the cow ear key points have been marked; the size of these images may be 1024 x 1024. The convolution layers extract features at different scales, with the convolution kernel size gradually increasing from 64 to 512. Max pooling is used for the pooling operation, and downsampling is performed 5 times. Based on the extracted features and a downsampling frequency of 2, a preset region of interest is set for each point; candidate regions are screened from the plurality of regions of interest and mapped back to the original scale, and a RoIAlign layer extracts a fixed-size feature map for each candidate region of interest. The Landmark layer predicts and outputs the cow face key points from the feature map extracted by the last convolution layer, the Offset layer predicts and outputs the offsets of the cow ear key points from that feature map, and the Score layer predicts and outputs whether each key point is visible from that feature map.
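By way of illustration only, the effect of the repeated downsampling on feature map size may be worked through as follows. The sketch assumes that each of the 5 downsampling steps halves the width and height exactly, which is the usual behaviour of 2 x 2 max pooling with stride 2 but is not stated in the description; the function name `feature_map_sizes` is hypothetical.

```python
def feature_map_sizes(input_size=1024, num_downsamples=5, factor=2):
    """List the side length of the feature map after each downsampling step."""
    sizes = [input_size]
    for _ in range(num_downsamples):
        # Each pooling step divides the side length by the factor.
        sizes.append(sizes[-1] // factor)
    return sizes


print(feature_map_sizes())  # [1024, 512, 256, 128, 64, 32]
```

Under this assumption a 1024 x 1024 input yields a 32 x 32 feature map at the last convolution layer, that is, a total reduction by a factor of 32 in each dimension.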

In an alternative embodiment, the loss function may comprise three parts: the first part is a cow face key point regression loss function, the second part is an offset regression loss function of cow ear key points, and the third part is a confidence constraint function of the key points.

Specifically, the loss function can be expressed as:

$$L_{total} = L_{landmark} + \alpha L_{offset} + \beta L_{score},$$

where α and β are hyper-parameters that weight the loss terms so that the different loss functions have comparable orders of magnitude, $L_{landmark}$ is the cow face key point regression loss function, $L_{offset}$ is the offset regression loss function of the cow ear key points, and $L_{score}$ is the confidence constraint function of the key points.
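As a minimal sketch of how the weighted sum behaves, the function below combines three already-computed loss values. The numeric values and the choices of α and β are hypothetical and only illustrate how the weights bring terms of different magnitude onto a comparable scale.

```python
def total_loss(l_landmark, l_offset, l_score, alpha=1.0, beta=1.0):
    """Weighted sum L_total = L_landmark + alpha * L_offset + beta * L_score."""
    return l_landmark + alpha * l_offset + beta * l_score


# Hypothetical loss values with different orders of magnitude.
# alpha and beta scale the smaller terms up to a comparable range.
print(total_loss(12.0, 0.25, 0.0625, alpha=8.0, beta=64.0))  # 18.0
```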

In an alternative embodiment, the cow face key point regression loss function $L_{landmark}$ is expressed as follows:

where N represents the number of cow face key points; the distance term is the Euclidean distance between the predicted coordinate position and the marked coordinate position of a cow face key point; the predicted coordinate position of the i-th cow face key point is compared against $(x_i, y_i)$, the marked coordinate position of the i-th cow face key point; and $c_i$ indicates whether the key point is visible, where 0 indicates that the key point is not visible and 1 indicates that it is visible.
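One way to read these definitions is as a mean of visibility-masked Euclidean distances. The sketch below assumes exactly that form, (1/N) Σ c_i · ‖predicted_i − marked_i‖₂; the normalization by N and the way c_i enters are assumptions made for illustration, and the function name is hypothetical.

```python
import math


def landmark_loss(pred, target, visible):
    """Assumed form: mean over N key points of c_i times the Euclidean distance."""
    n = len(target)
    total = 0.0
    for (px, py), (tx, ty), c in zip(pred, target, visible):
        # c is 1 for a visible key point and 0 for an invisible one,
        # so invisible points contribute nothing to the sum.
        total += c * math.hypot(px - tx, py - ty)
    return total / n


pred = [(3.0, 4.0), (10.0, 10.0), (0.0, 0.0)]
target = [(0.0, 0.0), (10.0, 10.0), (6.0, 8.0)]
visible = [1, 1, 0]
print(landmark_loss(pred, target, visible))  # (5 + 0 + 0) / 3
```

In this example the third key point is marked invisible, so its distance of 10 is excluded from the loss.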

In an alternative embodiment, the offset regression loss function $L_{offset}$ of the cow ear key points is expressed as follows:

where $(\delta x_i, \delta y_i)$ represents the marked coordinate offset value of a cow ear key point, and the corresponding predicted term represents the predicted coordinate offset value of that cow ear key point.

In an alternative embodiment, the confidence constraint function $L_{score}$ of the key points is expressed as follows:

where $s_i$ is the confidence of the key point predicted by the cow face key point detection.

S15, taking the cow face image set as the input of the cow face key point detection network and minimizing the loss function as the training target, iteratively training the cow face key point detection network to obtain a cow face key point detection model.

The electronic device inputs the cow face images in the cow face image set, together with the cow face key points and cow ear key points corresponding to each cow face image, into the cow face key point detection network. It takes as training targets the error between the predicted output and the real output of the cow face key points, the error between the predicted output and the real output of the cow ear key points, and the confidence constraint on the key points, and trains iteratively. When the value of the loss function reaches its minimum, the training process ends and the cow face key point detection model is obtained.

S16, responding to a detection instruction of a target cow face image, and detecting cow face key points of the target cow face image by using the cow face key point detection model.

The target cow face image is an image which needs to be subjected to cow face key point detection.

When the electronic device receives a target cow face image uploaded by a user, an instruction to detect cow face key points in the target cow face image may be triggered. The trained cow face key point detection model is then called to perform detection on the target cow face image, and the cow face key points of the target cow face image are output.

In an alternative embodiment, the detecting the face keypoints of the target face image using the face keypoint detection model includes:

It should be appreciated that, to meet the input requirements of the cow face key point detection model, the size of the target cow face image must be the same as the size of the cow face images in the cow face image set. If the size of the target cow face image uploaded by the user differs from the size of the cow face images in the cow face image set, the target cow face image needs to be size-normalized so that the two sizes are consistent.
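The size normalization can be illustrated with a small pure-Python sketch. It assumes nearest-neighbour sampling on a 2-D grid of pixel values; the text does not say which interpolation is used, and the function name is hypothetical. In practice an image library would normally do this step, with the output size set to that of the training images (for example 1024 x 1024).

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D grid of pixel values with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        # Map each output pixel back to the source pixel it falls on.
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


small = [[1, 2],
         [3, 4]]
print(resize_nearest(small, 4, 4))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```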

When the preset index calculation model is a peak signal-to-noise ratio calculation function, the target index value that the electronic device calculates from the target cow face image and the mean image is the target peak signal-to-noise ratio. The larger the target peak signal-to-noise ratio, the better the image quality. The target peak signal-to-noise ratio is compared with the index threshold corresponding to the peak signal-to-noise ratio calculation function. If the comparison result is that the image quality of the target cow face image is good, the target cow face image is cropped with a cropping frame of a preset first size to obtain the target cropped cow face image. If the comparison result is that the image quality of the target cow face image is poor, the target cow face image is cropped with a cropping frame of a preset second size to obtain the target cropped cow face image.

When the preset index calculation model is a mean square error calculation function, the target index value that the electronic device calculates from the target cow face image and the mean image is the target mean square error. The smaller the target mean square error, the better the image quality. The target mean square error is compared with the index threshold corresponding to the mean square error calculation function. If the comparison result is that the image quality of the target cow face image is good, the target cow face image is cropped with a cropping frame of a preset first size to obtain the target cropped cow face image. If the comparison result is that the image quality of the target cow face image is poor, the target cow face image is cropped with a cropping frame of a preset second size to obtain the target cropped cow face image.
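The two index calculations and the choice of frame size can be sketched as follows. This is a minimal illustration on greyscale images stored as nested lists; the threshold values, the frame sizes, the 8-bit peak value of 255, and the use of inclusive comparisons are all assumptions, and the function names are hypothetical.

```python
import math


def mse(image, mean_image):
    """Mean square error between two equally sized greyscale images."""
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(image, mean_image)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def psnr(image, mean_image, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(image, mean_image)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


def choose_crop_size(value, threshold, metric, first_size, second_size):
    """Pick the first size for good quality and the second size otherwise."""
    if metric == "psnr":
        good = value >= threshold  # a higher PSNR means better quality
    elif metric == "mse":
        good = value <= threshold  # a lower MSE means better quality
    else:
        raise ValueError(f"unknown metric: {metric}")
    return first_size if good else second_size


image = [[100, 110], [120, 130]]
mean_image = [[100, 100], [120, 120]]
print(mse(image, mean_image))                      # 50.0
print(choose_crop_size(50.0, 100.0, "mse", 512, 768))  # 512
```

The comparison direction is the only difference between the two metrics: a large PSNR and a small mean square error both indicate good quality.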

Because the image quality of target cow face images uploaded by users is uneven, directly applying the trained cow face key point detection model to them yields target key points with high confidence for images of good quality but with low confidence for images of poor quality, and some target key points may not be detected at all. In the above optional embodiment, the preset index calculation model calculates the target index value from the target cow face image and the mean image, and the target index value is compared with the index threshold to obtain a target comparison result. A target cropping frame is determined from the target comparison result, and the target cow face image is cropped with it. This ensures the image quality of the resulting target cropped cow face image, effectively preserves the detail of the cow face region in the target cow face image, and avoids interference from non-cow-face regions. Therefore, when the cow face key point detection model is applied to the target normalized cow face image obtained by normalizing the target cropped cow face image, the detected cow face key points have high confidence, that is, highly accurate target key points are obtained.

The cow face key point detection method provided by the embodiment of the invention avoids using a heat map as the input of the cow face key point detection model, thereby solving the problems of low speed and difficult deployment in actual use scenarios (mobile devices and edge devices). Meanwhile, the key points are classified into rigid key points (cow face key points) and non-rigid key points (cow ear key points), and different loss functions are used to train the cow face key point detection model, which can improve the adaptability of the model and its detection accuracy. In addition, the offset determined from the cow ear key points is added to the training process of the cow face key point detection model, which can improve the model's perception of overall structure and the stability of its output.

Example II

In some embodiments, the cow face key point detecting device 50 may include a plurality of functional modules composed of computer program segments. The computer program segments of the device 50 may be stored in a memory of an electronic device and executed by at least one processor to perform the cow face key point detection functions (see Fig. 1 for details).

In this embodiment, the cow face key point detecting device 50 may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a processing module 501, a marking module 502, a calculating module 503, an initializing module 504, a training module 505, and a detection module 506. A module, as referred to in the present invention, is a series of computer program segments that are stored in a memory, can be executed by at least one processor, and perform a fixed function. The functions of the respective modules are described in detail below.

The processing module 501 is configured to acquire a plurality of original face images, process the plurality of original face images, and create a face image set according to the processed original face images.

determining a cut-out frame according to the comparison result;

The marking module 502 is configured to mark the face key points and the ear key points in each face image in the face image set.

The calculating module 503 is configured to obtain the coordinate values of the cow face key point and the cow ear key point, and calculate the coordinate offset value according to the coordinate values of the cow ear key point.

Illustratively, assume that the coordinate value of point location 1 is $(x_1, y_1)$, that of point location 2 is $(x_2, y_2)$, that of point location 3 is $(x_3, y_3)$, that of point location 4 is $(x_4, y_4)$, and that of point location 5 is $(x_5, y_5)$. The mean of the coordinates of the designated points (point location 1 and point location 5) is $(x_m, y_m)$, where $x_m = (x_1 + x_5)/2$ and $y_m = (y_1 + y_5)/2$. The coordinate offset value of each non-designated key point (point locations 2, 3 and 4) relative to the coordinate mean is $(\delta x_i, \delta y_i)$, where $\delta x_i = x_i - x_m$, $\delta y_i = y_i - y_m$, and $i \in \{2, 3, 4\}$.
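The offset calculation in this illustration can be sketched directly. The coordinates below are made-up sample values, and the function name and dictionary layout are hypothetical; only the arithmetic follows the definitions of $(x_m, y_m)$ and $(\delta x_i, \delta y_i)$ given above.

```python
def ear_offsets(points, designated=(1, 5)):
    """Offsets of the non-designated points relative to the designated mean.

    points maps a point index (1..5) to its (x, y) coordinate.
    """
    # Mean of the designated points: x_m = (x_1 + x_5) / 2, y_m = (y_1 + y_5) / 2.
    xm = sum(points[i][0] for i in designated) / len(designated)
    ym = sum(points[i][1] for i in designated) / len(designated)
    # Offsets for the remaining points: dx_i = x_i - x_m, dy_i = y_i - y_m.
    return {
        i: (x - xm, y - ym)
        for i, (x, y) in points.items()
        if i not in designated
    }


points = {
    1: (10.0, 20.0),
    2: (12.0, 26.0),
    3: (15.0, 30.0),
    4: (18.0, 26.0),
    5: (20.0, 20.0),
}
print(ear_offsets(points))
# {2: (-3.0, 6.0), 3: (0.0, 10.0), 4: (3.0, 6.0)}
```

Here the mean of points 1 and 5 is (15, 20), so each remaining point is expressed relative to that anchor rather than in absolute image coordinates.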

The initializing module 504 is configured to initialize a face key point detection network architecture and initialize a loss function according to coordinate values of the face key points and the coordinate offset values.

The electronic device may initialize a deep convolutional neural network as the cow face key point detection network architecture, which comprises three branches: the first branch regresses the cow face key points, the second branch regresses the offsets of the cow ear key points, and the third branch outputs the classification confidence of the key points. As shown in Fig. 4, the cow face key point detection network architecture includes an input layer (Input), a plurality of convolution layers (Conv), and three output layers (Landmark, Offset, Score). The input layer receives the cow face images used to train the detection network, that is, cow face images in which the cow face key points and the cow ear key points have been marked; the image size may be 1024 x 1024. The convolution layers extract features at different scales, with the number of convolution kernels increasing gradually from 64 to 512. Pooling uses max pooling, and downsampling is performed 5 times with a downsampling factor of 2. A preset region of interest is set for each point of the extracted features, candidate regions are screened from the regions of interest, the screened candidate regions are mapped back to the original scale, and a RoIAlign layer extracts a fixed-size feature map for each candidate region of interest.
The Landmark layer predicts and outputs the cow face key points from the feature map extracted by the last convolution layer, the Offset layer predicts and outputs the cow ear key points from the same feature map, and the Score layer predicts and outputs whether each key point is visible from the same feature map.

Specifically, the loss function can be expressed as:

$$L_{total} = L_{landmark} + \alpha L_{offset} + \beta L_{score},$$

In the cow face key point regression loss function $L_{landmark}$, N represents the number of cow face key points; the distance term is the Euclidean distance between the predicted coordinate position and the marked coordinate position of a cow face key point; the predicted coordinate position of the i-th cow face key point is compared against $(x_i, y_i)$, the marked coordinate position of the i-th cow face key point; and $c_i$ indicates whether the key point is visible, where 0 indicates that the key point is not visible and 1 indicates that it is visible.

In an alternative embodiment, the offset regression loss function $L_{offset}$ of the cow ear key points is expressed as follows:

where $s_i$ is the confidence of the key point predicted by the cow face key point detection.

The training module 505 is configured to iteratively train the cow face key point detection network, with the cow face image set as the input of the network and the minimized loss function as the training target, to obtain a cow face key point detection model.

The detection module 506 is configured to detect a face key point of the target face image using the face key point detection model in response to a detection instruction for the target face image.

The cow face key point detection device provided by the embodiment of the invention avoids using a heat map as the input of the cow face key point detection model, thereby solving the problems of low speed and difficult deployment in actual use scenarios (mobile devices and edge devices). Meanwhile, the key points are classified into rigid key points (cow face key points) and non-rigid key points (cow ear key points), and different loss functions are used to train the cow face key point detection model, which can improve the adaptability of the model and its detection accuracy. In addition, the offset determined from the cow ear key points is added to the training process of the cow face key point detection model, which can improve the model's perception of overall structure and the stability of its output.

Example III

The present embodiment provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps in the above-described embodiment of the method for detecting a key point of a cow face, for example, S11 to S16 shown in fig. 1:

S11, acquiring a plurality of original cow face images, processing the original cow face images, and creating a cow face image set according to the processed original cow face images;

S12, marking the key points of the cow face and the key points of the cow ears in each cow face image in the cow face image set;

S13, acquiring coordinate values of the cow face key points and coordinate values of the cow ear key points, and calculating coordinate offset values according to the coordinate values of the cow ear key points;

S14, initializing a cow face key point detection network architecture and initializing a loss function according to the coordinate values of the cow face key points and the coordinate offset values;

S15, taking the cow face image set as input of the cow face key point detection network, taking the minimized loss function as a training target, and performing iterative training on the cow face key point detection network to obtain a cow face key point detection model;

Alternatively, the computer program, when executed by a processor, performs the functions of the modules/units in the above apparatus embodiments, e.g., modules 501-506 in fig. 5:

The processing module 501 is configured to obtain a plurality of original face images, process the plurality of original face images, and create a face image set according to the processed original face images;

the marking module 502 is configured to mark a face key point and an ear key point in each face image in the face image set;

the calculating module 503 is configured to obtain the coordinate values of the cow face key points and the coordinate values of the cow ear key points, and calculate coordinate offset values according to the coordinate values of the cow ear key points;

the initializing module 504 is configured to initialize a face key point detection network architecture and initialize a loss function according to coordinate values of the face key points and the coordinate offset values;

the training module 505 is configured to iteratively train the face keypoint detection network with the face image set as an input of the face keypoint detection network and with a minimized loss function as a training target, to obtain a face keypoint detection model;

Example IV

Fig. 6 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. In a preferred embodiment of the invention, the electronic device 6 comprises a memory 61, at least one processor 62, at least one communication bus 63 and a transceiver 64.

It will be appreciated by those skilled in the art that the configuration of the electronic device shown in fig. 6 is not limiting of the embodiments of the present invention, and that either a bus-type configuration or a star-type configuration may be used, and that the electronic device 6 may include more or less other hardware or software than that shown, or a different arrangement of components.

In some embodiments, the electronic device 6 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The electronic device 6 may also include a client device, including but not limited to any electronic product that can interact with a client by way of a keyboard, mouse, remote control, touch pad, or voice control device, such as a personal computer, tablet, smart phone, digital camera, etc.

The electronic device 6 is only an example. Other existing or future electronic products that can be adapted to the present invention are also included within the scope of protection of the present invention and are incorporated herein by reference.

In some embodiments, the memory 61 stores a computer program that, when executed by the at least one processor 62, performs all or part of the steps of the cow face key point detection method described above. The memory 61 includes read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc memory, magnetic tape memory, or any other computer-readable medium that can be used to carry or store data.

Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.

The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm and the like. The Blockchain (Blockchain), which is essentially a decentralised database, is a string of data blocks that are generated by cryptographic means in association, each data block containing a batch of information of network transactions for verifying the validity of the information (anti-counterfeiting) and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.

In some embodiments, the at least one processor 62 is the control unit of the electronic device 6. It connects the various components of the electronic device 6 using various interfaces and lines, and performs the functions of the electronic device 6 and processes data by running or executing the programs or modules stored in the memory 61 and invoking the data stored in the memory 61. For example, when executing the computer program stored in the memory, the at least one processor 62 may implement all or part of the steps of the cow face key point detection method in the embodiments of the present invention, or all or part of the functions of the cow face key point detection device. The at least one processor 62 may be composed of integrated circuits, for example a single packaged integrated circuit or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like.

In some embodiments, the at least one communication bus 63 is arranged to enable connected communication between the memory 61 and the at least one processor 62 or the like.

Although not shown, the electronic device 6 may further include a power source (e.g., a battery) for powering the various components, and preferably the power source may be logically coupled to the at least one processor 62 via a power management device to perform functions such as managing charging, discharging, and power consumption via the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 6 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described in detail herein.

The integrated units implemented in the form of software functional modules described above may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device, etc.) or a processor (processor) to perform portions of the methods described in the various embodiments of the invention.

In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in the form of hardware, or in the form of hardware plus software functional modules.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it will be obvious that the term "comprising" does not exclude other elements or that the singular does not exclude a plurality. Several of the elements or devices recited in the specification may be embodied by one and the same item of software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.

Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims

1. A method for detecting key points of a cow face, characterized by comprising the following steps:

2. The method of claim 1, wherein the face keypoint detection network architecture comprises three branches:

the first branch is used for the face key point regression;

the third branch is used for classification confidence of the keypoints.

3. The method for detecting a key point of a cow face as claimed in claim 2, wherein the loss function is:

$$L_{total} = L_{landmark} + \alpha L_{offset} + \beta L_{score},$$

4. A face key point detection method according to any one of claims 1 to 3, wherein calculating a coordinate offset value from coordinate values of the ear key point comprises:

5. A method of face keypoint detection as claimed in any one of claims 1 to 3 wherein said processing a plurality of said raw face images and creating a set of face images from the processed raw face images comprises:

6. The method of claim 5, wherein the capturing the original face image according to the index value to obtain a captured face image comprises:

Determining a cut-out frame according to the comparison result;

7. The method of claim 6, wherein the detecting the face keypoints of the target face image using the face keypoint detection model comprises:

8. A cow face key point detection device, characterized in that the device comprises:

9. An electronic device comprising a processor and a memory, wherein the processor is configured to implement the method for detecting a key point of a cow face according to any one of claims 1 to 7 when executing a computer program stored in the memory.

10. A computer readable storage medium having a computer program stored thereon, wherein the computer program when executed by a processor implements the method of detecting a face keypoint as defined in any one of claims 1 to 7.