CN114724190A - Mood recognition method based on pet posture - Google Patents

Mood recognition method based on pet posture

Info

Publication number
CN114724190A
Authority
CN
China
Prior art keywords
image
value
pet
processing
pixel point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210653859.7A
Other languages
Chinese (zh)
Inventor
吴琎
何振东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kitten and Puppy Technology Co Ltd
Original Assignee
Beijing Kitten and Puppy Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kitten and Puppy Technology Co Ltd filed Critical Beijing Kitten and Puppy Technology Co Ltd
Priority to CN202210653859.7A
Publication of CN114724190A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The invention relates to a mood recognition method based on pet posture, which comprises: carrying out graying processing on an original image to obtain a grayed image; carrying out edge processing on the grayed image based on a Sobel operator to obtain an edge contour image, processing the edge contour image to obtain a maximum circumscribed rectangle, and then cropping the foreground target out of the original image to obtain a preprocessed image; processing the preprocessed image and the original image with different feature extraction networks respectively, fusing the results, and predicting the pet category; filling the contour of the preprocessed image to obtain a mask image, sequentially eroding the 8-pixel neighborhoods of the contour points of the mask image to obtain skeleton features, and calculating from them the included angle between the tail node and the body node; and determining the mood of the pet based on the included angle and the category. Because the skeleton is obtained by examining and processing every pixel point of the mask image, there is no concern about the tail not being separated from the body, the included angle between the tail and the body is easy to calculate, and pet category recognition is added to improve recognition accuracy.

Description

Mood recognition method based on pet posture
Technical Field
The present invention relates to the field of computer vision, and in particular to a mood recognition method based on pet posture.
Background
Because companion pets can bring warmth, pleasure and positive physical and mental changes to people, the number of people keeping pets keeps increasing, and people treat their pets as family members. However, because pets cannot express themselves in language, communication with them is obstructed, and people can only judge a pet's mood from its behavior.
In the prior art, animal mood recognition is performed based on deep learning and an SVM: thirteen body key points are annotated on a large number of animal pictures, and a deep-learning animal posture estimation model is constructed that can estimate the position of the animal's center point and the positions of the thirteen body key points; several SVM classifiers are then constructed based on the position change of the animal's center point and on the positional relations and relative position changes of the thirteen body key points, including the tail key point, and the SVM classifiers are used to judge the animal's behavior and mood. In this prior art, the thirteen body key points come from model training, so their accuracy may be insufficient; annotating the thirteen key points in the pictures consumes a large amount of labor, and training a model that can accurately recognize the thirteen key points takes a long time. Therefore, in order to make the sources of the key points more accurate, reduce labor cost and computation time, and improve the working efficiency of the computer, the invention provides a mood recognition method based on pet posture.
With the development of computer vision and the improvement of image acquisition equipment, the demand for image clarity keeps increasing, which poses greater challenges, in both accuracy and speed, for image classification and detection tasks in computer vision. The image classification task assigns a category to a complete image; typical methods can be divided into digital image processing techniques and deep learning techniques.
In digital image processing, classification robustness is low because classification relies only on the pixel features of the image, and digital image processing methods are highly sensitive to pixel values. In deep learning, feature extraction networks with various structures can extract features automatically, which avoids manual involvement, improves classification accuracy and reduces labor cost. Most images faced by current image classification tasks have complex scenes and cluttered backgrounds, and how to extract representative features from such images is a difficult problem that both kinds of methods must face.
Disclosure of Invention
Based on the above requirements of the prior art, the invention aims to provide a mood recognition method based on pet posture, so that the sources of the key points are more accurate, labor cost and computation time are reduced, the working efficiency of the computer is improved, and animal mood recognition becomes more accurate.
In order to solve the problems, the invention adopts the following technical scheme:
a mood recognition method based on pet postures is characterized by comprising the following steps:
carrying out graying processing on the original image to obtain a grayed image; carrying out edge processing on the grayed image based on a Sobel operator to obtain an edge contour image of the pet foreground target, processing the edge contour image to obtain a maximum circumscribed rectangle, and cropping the foreground target out of the original image based on the maximum circumscribed rectangle to obtain a preprocessed image;
processing the preprocessed image and the original image with different feature extraction networks respectively, fusing the results, and predicting the pet category;
filling the contour of the preprocessed image to obtain a mask image, and sequentially eroding the 8-pixel neighborhoods of the contour points of the mask image, specifically: judging whether a pixel's neighborhood contains 3 connected pixels, and if so, deleting the point from the contour points; then judging in turn whether the neighborhood contains 3 or 4 connected pixels, and then 3, 4, 5, 6, or 7 connected pixels, deleting the point from the contour points if so and retaining it otherwise; obtaining a pseudo skeleton of the foreground target after all contour points have been judged; finally detecting whether the pixel points in the pseudo skeleton contain 2, 3, 4, 5, 6, or 7 connected pixels, deleting the point if so and retaining it otherwise; and obtaining the skeleton feature of the foreground target after all pixel points in the pseudo skeleton have been judged;
and calculating an included angle between the tail node and the body node by using the skeleton features, and determining the mood of the pet based on the included angle and the pet category.
Optionally, calculating an included angle between the tail skeleton and the body skeleton by using the skeleton features includes:
based on a feature extraction network, processing and identifying the preprocessed image to obtain a pet body area and a pet tail area, determining the position of each pixel point contained in the skeleton feature relative to the preprocessed image, taking the pixel point located in the tail area as a tail node, and taking the pixel point located in the body area as a body node;
calculating an included angle between the tail node and the body node according to an included angle formula, wherein the included angle formula is as follows:
Tan α = |(k2 - k1) / (1 + k1·k2)|
where α is the included angle between the two skeletons, k1 is the slope of the tail skeleton, and k2 is the slope of the body skeleton.
Optionally, when there are two pixel points located in the tail region, the pixel point adjacent to the body node is taken as the tail head node and the other pixel point as the tail node; the included angle between the tail head node and the body node and the included angle between the tail head node and the tail node are calculated according to the included-angle formula, and the two angles are combined with the pet category to determine the mood of the pet.
Optionally, processing the preprocessed image and the original image with different feature extraction networks respectively, fusing the results, and predicting the pet category includes:
performing first feature extraction network processing on the preprocessed image to obtain a local feature image, performing second feature extraction network processing on the original image to obtain a global feature image, performing full-connection layer processing on the global feature image and the local feature image respectively to obtain corresponding one-dimensional vectors, and performing classifier calculation on the one-dimensional vectors to obtain class probabilities corresponding to preset class labels respectively;
carrying out weighted summation on the category probability of the global feature images and the category probability of the local feature images corresponding to the same category labels to obtain the fused category probability; and selecting the category corresponding to the maximum probability value of the fused category as the pet identification result.
Optionally, performing a graying process on the original image to obtain a grayed image, including:
performing feature dimensionality reduction on an original image based on RGB three channels, endowing different weight coefficients to pixel values of each channel, and realizing graying processing of the image to obtain a grayed image, wherein a graying formula is as follows:
Yi = 0.3·Ri + 0.59·Gi + 0.11·Bi
where i indexes the current image pixel, Yi is the gray value, Ri the R-channel pixel value, Gi the G-channel pixel value, and Bi the B-channel pixel value.
Optionally, performing edge processing on the grayed image based on a Sobel operator to obtain an edge contour image of the pet foreground target, including:
respectively processing the grayed image with a transverse filter and a longitudinal filter to obtain a transverse processing value and a longitudinal processing value corresponding to each pixel point contained in each target pixel block, and taking the square root of the sum of squares of the transverse and longitudinal processing values of each pixel point to obtain the convolution value corresponding to that pixel point;
judging whether the convolution value corresponding to each pixel point exceeds a preset threshold, and if it does, taking the pixel point as an edge pixel point of the foreground target; and combining all edge pixel points of the grayed image to form the edge contour image of the foreground target.
Optionally, the processing the edge contour image to obtain a maximum bounding rectangle includes:
obtaining the maximum value and the minimum value of a horizontal coordinate and the maximum value and the minimum value of a vertical coordinate according to the position coordinates of each pixel point of the edge contour image of the foreground target relative to the original image; determining the range of the maximum circumscribed rectangle according to the maximum value and the minimum value of the abscissa and the maximum value and the minimum value of the ordinate, wherein the calculation formula is as follows:
Xmin = Min(Xi)
Xmax = Max(Xi)
Ymin = Min(Yi)
Ymax = Max(Yi)
where Xi denotes the abscissa of an edge pixel point, Yi denotes its ordinate, Min(.) denotes the minimum function, Max(.) denotes the maximum function, Xmin and Xmax denote respectively the minimum and maximum abscissa values of the edge pixel points on the horizontal axis, and Ymin and Ymax denote respectively the minimum and maximum ordinate values of the edge pixel points on the vertical axis.
Optionally, the foreground object is extracted from the original image to obtain a preprocessed image, and a processing formula of the preprocessed image is as follows:
I=A[Xmin:Xmax:Ymin:Ymax]
where A denotes the original image, Xmin and Xmax denote respectively the minimum and maximum abscissa values of the edge pixel points on the horizontal axis, Ymin and Ymax denote respectively the minimum and maximum ordinate values of the edge pixel points on the vertical axis, and I denotes the preprocessed image.
Optionally, the classifier is a Softmax classifier, and the calculation formula is
Ps = e^(ys) / Σ(i=1..c) e^(yi)
where yi denotes the fully connected layer output of the first or second feature extraction network for the i-th class, Ps denotes the output probability of the Softmax classifier for class s, c is the number of classes, and e is the natural constant.
Optionally, the grayed image is respectively processed by a transverse filter and a longitudinal filter, so as to respectively obtain a transverse processing value and a longitudinal processing value corresponding to each pixel point included in each target pixel block, and a specific calculation formula is as follows:
Gx = [ -1 0 +1 ; -2 0 +2 ; -1 0 +1 ] * A
Gy = [ +1 +2 +1 ; 0 0 0 ; -1 -2 -1 ] * A
where A denotes the grayed image, * denotes convolution, the 3 × 3 matrices (written row by row) are the standard Sobel convolution factors, and Gx and Gy respectively denote the convolution results of the Sobel convolution factors in the transverse and longitudinal directions.
Compared with the prior art, with the mood recognition method based on pet posture provided by the invention there is no concern about the tail not being separated from the body, and the included angle between the tail and the body is easy to calculate. The tail is not always straight and may also be bent; in that case there are two nodes in the tail region, and the mood of the pet is judged by jointly considering the calculated included angle between the tail head node and the tail node, the included angle between the body and the tail head node, and the recognized pet category, so the judgment result is more accurate. To determine the body region and the tail region of the pet, only the tail and body regions of the collected pet pictures need to be annotated during model training; compared with annotating thirteen body key points, this reduces labor cost and computation time and improves the working efficiency of the computer.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some of the embodiments described in this specification, and other drawings can be obtained by those skilled in the art from these drawings.
FIG. 1 is a flow chart of a method for recognizing a mood based on pet postures according to an embodiment of the present invention;
FIG. 2 is a flow chart of a pet classification recognition method based on pet postures according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a model framework of a method for recognizing a mood based on pet postures according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For the convenience of understanding of the embodiments of the present invention, the following description will be further explained with reference to specific embodiments, which are not to be construed as limiting the embodiments of the present invention.
The embodiment provides a mood recognition method based on pet postures, the flow of which is shown in fig. 1 and fig. 2, and the method comprises the following steps:
s1: carrying out graying processing on the original image to obtain a grayed image; and carrying out edge processing on the grayed image based on a Sobel operator to obtain an edge contour image of the pet foreground target, processing the edge contour image to obtain a maximum external rectangle, and scratching out the foreground target from the original image based on the maximum external rectangle to obtain a preprocessed image.
The method comprises the steps of obtaining an original image, wherein the original image only has one target, and the target area of the original image accounts for at least 50% of the original image, and in the embodiment of the invention, the target area is a relevant area related to a pet in a pet image.
In this step, performing a graying process on the original image to obtain a grayed image, including:
performing feature dimensionality reduction on an original image based on RGB three channels, endowing different weight coefficients to pixel values of each channel, and realizing graying processing of the image to obtain a grayed image, wherein a graying formula is as follows:
Yi = 0.3·Ri + 0.59·Gi + 0.11·Bi
where i indexes the current image pixel, Yi is the gray value, Ri the R-channel pixel value, Gi the G-channel pixel value, and Bi the B-channel pixel value.
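For illustration, the weighted graying step can be sketched in Python as follows; this is a minimal sketch, and the function name and the assumed R, G, B channel order are illustrative rather than taken from the patent.

    import numpy as np

    def to_gray(original):
        """Weighted graying of an H x W x 3 image: Yi = 0.3*Ri + 0.59*Gi + 0.11*Bi."""
        r = original[..., 0].astype(np.float32)  # assumes channel order R, G, B
        g = original[..., 1].astype(np.float32)
        b = original[..., 2].astype(np.float32)
        gray = 0.3 * r + 0.59 * g + 0.11 * b
        return gray.astype(np.uint8)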
Carrying out edge processing on the gray image based on a Sobel operator to obtain an edge contour image of the pet foreground target, wherein the edge contour image comprises the following steps:
s10: respectively processing the grayed image by a transverse filter and a longitudinal filter to respectively obtain a transverse processing value and a longitudinal processing value corresponding to each pixel point contained in each target pixel block, and performing evolution operation on the square sum of the transverse processing value and the longitudinal processing value corresponding to each pixel point to obtain a convolution value corresponding to each pixel point;
first, a horizontal filter and a vertical filter are sequentially applied to each pixel in the grayscale image.
Next, the values in the transversal filter and the longitudinal filter are multiplied by the corresponding gray values in the grayed image, respectively.
The transversal filter result is then added to the longitudinal filter result, the resulting sum being the convolution result of the target pixel.
And finally, performing evolution operation on the convolution result to obtain a convolution value corresponding to each pixel point.
In the embodiment of the present invention, the calculation formula for multiplying the values in the transversal filter and the longitudinal filter with the corresponding gray-scale values in the grayed image respectively is as follows:
Gx = [ -1 0 +1 ; -2 0 +2 ; -1 0 +1 ] * A
Gy = [ +1 +2 +1 ; 0 0 0 ; -1 -2 -1 ] * A
where A denotes the grayed image, * denotes convolution, the 3 × 3 matrices (written row by row) are the standard Sobel convolution factors, and Gx and Gy respectively denote the convolution results of the Sobel convolution factors in the transverse and longitudinal directions.
The convolution results are then substituted into the square-root formula:
G = sqrt(Gx^2 + Gy^2)
where G denotes the convolution value of the pixel after the square-root operation, and Gx and Gy respectively denote the convolution results of the Sobel convolution factors in the two directions.
S11: Judging whether the convolution value corresponding to each pixel point exceeds a preset threshold, and if it does, taking the pixel point as an edge pixel point of the foreground target; combining all edge pixel points of the grayed image to form the edge contour image of the foreground target.
After the same pixel block has been processed by the transverse filter and the longitudinal filter, whether each pixel point contained in the pixel block exceeds the preset threshold is judged; if it does, the pixel point is taken as an edge pixel point of the foreground target, otherwise no operation is performed. The transverse and longitudinal filters then process the adjacent pixel block in the grayed image according to the step size to obtain the convolution values of the pixel points contained in that block, and those pixel points are judged against the threshold in the same way. When all pixel points in the grayed image have been judged, all edge pixel points have been obtained.
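Steps S10 and S11 can be illustrated with the following minimal Python sketch; the kernel values are the standard Sobel convolution factors, while the threshold value and the function names are illustrative assumptions.

    import numpy as np
    from scipy.ndimage import convolve

    # Standard Sobel convolution factors for the transverse and longitudinal directions.
    SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    SOBEL_Y = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=np.float32)

    def sobel_edges(gray, threshold=100.0):
        """Return a binary edge map: True where the convolution value exceeds the threshold."""
        gx = convolve(gray.astype(np.float32), SOBEL_X)  # transverse processing values
        gy = convolve(gray.astype(np.float32), SOBEL_Y)  # longitudinal processing values
        magnitude = np.sqrt(gx ** 2 + gy ** 2)           # G = sqrt(Gx^2 + Gy^2)
        return magnitude > threshold                     # edge pixel points of the foreground target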
After the edge contour image is obtained, determining the maximum circumscribed rectangle of the edge contour image, so that the foreground object is extracted from the original image to obtain a preprocessed image.
After the edge contour image is obtained, the maximum circumscribed rectangle is determined from the position coordinates of each pixel point of the edge contour image relative to the original image. The salient target is then extracted based on the maximum circumscribed rectangle to obtain the preprocessed image; the preprocessed image is processed by the first feature extraction network to obtain the classification result of the local feature image, the original image is processed by the second feature extraction network to obtain the classification result of the global feature image, and the two classification results are fused to obtain the final image classification result, as shown in the model framework schematic diagram of fig. 3.
Determining a maximum circumscribed rectangle according to the position coordinates of each pixel point of the edge contour image of the foreground target relative to the original image, wherein the determining comprises the following steps:
obtaining the maximum value and the minimum value of a horizontal coordinate and the maximum value and the minimum value of a vertical coordinate according to the position coordinates of each pixel point of the edge contour image of the foreground target relative to the original image; determining the range of the maximum circumscribed rectangle according to the maximum value and the minimum value of the abscissa and the maximum value and the minimum value of the ordinate, wherein the calculation formula is as follows:
Xmin = Min(Xi)
Xmax = Max(Xi)
Ymin = Min(Yi)
Ymax = Max(Yi)
where Xi denotes the abscissa of an edge pixel point, Yi denotes its ordinate, Min(.) denotes the minimum function, Max(.) denotes the maximum function, Xmin and Xmax denote respectively the minimum and maximum abscissa values of the edge pixel points on the horizontal axis, and Ymin and Ymax denote respectively the minimum and maximum ordinate values of the edge pixel points on the vertical axis.
In the embodiment of the invention, the two-dimensional coordinates (x, y) of all foreground-target edge pixel points relative to the original image are obtained, and the maximum and minimum coordinate values along the horizontal axis and along the vertical axis of the image are calculated from these coordinates according to the maximum/minimum formulas above.
The range of the maximum circumscribed rectangle is determined from the maximum and minimum values of the abscissa and of the ordinate; according to the maximum circumscribed rectangle of the foreground target, the foreground image is cropped out of the original image to obtain the final preprocessed image, and the processing formula is as follows:
I=A[Xmin:Xmax:Ymin:Ymax]
where A denotes the original image, Xmin and Xmax denote respectively the minimum and maximum abscissa values of the edge pixel points on the horizontal axis, Ymin and Ymax denote respectively the minimum and maximum ordinate values of the edge pixel points on the vertical axis, and I denotes the preprocessed image.
Performing edge extraction on the original image with the Sobel operator and extracting the salient foreground target provides local features for the first feature extraction network, filters out the interference of background information, and improves classification accuracy.
After the foreground object is extracted from the original image and a preprocessed image is obtained, the method further comprises the following steps:
and uniformly adjusting the size of the preprocessed image by using a bilinear interpolation method, so that the extracted significant region of the foreground target is the size required to be input by the first feature extraction network.
S2: and respectively using different feature extraction networks to process and fuse the preprocessed image and the original image, and predicting to obtain the pet category.
In this step, the method comprises the following steps:
s20: performing first feature extraction network processing on the preprocessed image to obtain a local feature image, performing second feature extraction network processing on the original image to obtain a global feature image, performing full-connection layer processing on the global feature image and the local feature image respectively to obtain corresponding one-dimensional vectors, and performing classifier calculation on the one-dimensional vectors to obtain class probabilities corresponding to preset class labels respectively.
S21: carrying out weighted summation on the category probability of the global feature images and the category probability of the local feature images corresponding to the same category labels to obtain the fused category probability; and selecting the category corresponding to the maximum probability value of the fused category as the pet identification result.
S20 includes the following steps:
Step one: performing first feature extraction network processing on the preprocessed image to obtain a local feature image, and performing convolution calculation on the local feature image with a convolution kernel of the same size as the local feature image to obtain a one-dimensional vector of the local feature image.
In the embodiment of the invention, a ResNet18 feature extraction network is used as the first feature extraction network, and ResNet18 performs feature extraction on the preprocessed image to obtain the local feature image. The ResNet18 structure is built around residual connections, which avoid, to the greatest extent, the gradient vanishing or gradient explosion caused by an overly deep network.
ResNet18 has 18 layers in total. The network input image size is 224 × 224 pixels with 3 channels. After the first convolution layer the image size is reduced to 112 × 112 and the number of channels increases to 64; after the maximum pooling layer the size is further reduced to 56 × 56 without increasing the number of channels. After these first two operations the image enters the residual part: in each residual stage the feature map size is halved and the number of channels doubled, the down-sampling being realized by a convolution layer with stride 2. After 4 residual stages the final feature map size is 7 × 7 with 512 channels, followed by the average pooling layer and the fully connected layer.
In the ResNet18 network structure, convolution layers, maximum pooling layers and average pooling layers are mainly involved. Convolution layers use kernels of different sizes and strides so that their receptive fields differ, extracting image features over different ranges; the maximum pooling layer extracts the most distinctive features in the image, which is equivalent to a sharpening operation, and the average pooling layer extracts the common features, which is equivalent to a smoothing operation. The most representative structure in ResNet18 is the residual structure, in which the output of the previous layer is added to and fused with the output of the current layer; this relieves drastic gradient changes during training and avoids gradient vanishing or explosion, which is especially effective for deep network structures.
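For reference, a ResNet18-based first feature extraction network can be set up as sketched below; the use of torchvision and its pretrained weights (a recent torchvision version is assumed) and the number of pet categories are illustrative assumptions, not details given in the patent.

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_CLASSES = 10  # assumed number of pet categories

    # ResNet18 backbone with a new fully connected head for the local (preprocessed) branch.
    resnet18 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    resnet18.fc = nn.Linear(resnet18.fc.in_features, NUM_CLASSES)

    local_input = torch.randn(1, 3, 224, 224)  # preprocessed (cropped and resized) image
    local_logits = resnet18(local_input)       # one-dimensional vector fed to the Softmax classifier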
Step two: and performing second feature extraction network processing on the original image to obtain a global feature image, and performing convolution calculation on the global feature image by using a convolution kernel with the same size as the global feature image to obtain a one-dimensional vector of the global feature image.
In the embodiment of the invention, a VGG19 feature extraction network is used as the second feature extraction network, and VGG19 performs feature extraction on the original image to obtain the global feature image. The original image is processed by the VGG19 network with 5 rounds of feature down-sampling, the resolution being reduced in turn from the 224 × 224 pixels of the input image to 112 × 112, 56 × 56, 28 × 28, 14 × 14 and 7 × 7; the down-sampling is realized by maximum pooling layers, and the number of feature channels increases in turn from 3 to 64, 128, 256 and 512.
The VGG19 network structure makes extensive use of 3 × 3 convolution kernels, and improves the accuracy of the classification result by increasing the number of small convolution kernels and the depth of the network.
VGG19 has 19 layers in total, consisting of 16 convolution layers and 3 fully connected layers; the network input image size is 224 × 224 pixels with 3 channels. The VGG19 structure is characterized by achieving the same receptive field as a large convolution kernel by stacking small convolutions, and none of its convolution layers performs down-sampling, which is instead realized by the maximum pooling layers. The structure is simple, involving only convolution layers, maximum pooling layers and fully connected layers. Convolution layers use kernels of different sizes and strides so that their receptive fields differ, extracting image features over different ranges, and the maximum pooling layer extracts the most distinctive features in the image, which is equivalent to sharpening.
Step three: and performing classifier calculation on the one-dimensional vectors to respectively obtain class probabilities corresponding to preset class labels.
Wherein the classifier is a Softmax classifier, and the calculation formula is
Ps = e^(ys) / Σ(i=1..c) e^(yi)
where yi denotes the fully connected layer output of the first or second feature extraction network for the i-th class, Ps denotes the output probability of the Softmax classifier for class s, c is the number of classes, and e is the natural constant.
In S21, the method includes:
carrying out weighted summation on the category probability of the global feature image and the category probability of the local feature image corresponding to the same category label, wherein the processing formula of the weighted summation is as follows:
P = w1·P1 + w2·P2
where P denotes the fused class probability, P1 denotes the class probability of the local feature image, P2 denotes the class probability of the global feature image, and w1 and w2 are weight coefficients that are adjusted during model training based on the predicted image category results.
Considering that some features of the local feature image may be neglected to influence the classification result, the original image is also processed by the feature extraction network to obtain the global feature image, and the classification results of the local feature image and the global feature image are comprehensively considered to improve the accuracy of the classification result.
For the accuracy of classification, the fused class probability and the corresponding class label form a training data set, and the training data set is used for carrying out supervised training on an image classification model to obtain the trained image classification model; and classifying the target in the image by using the trained image classification model.
For the feature extraction results at the two image granularities, the corresponding class probabilities are obtained and mapped into the same probability space, and supervised training is performed based on the probability space and the image labels to obtain the image classification model. The feature vector output by the first feature extraction network and the feature vector output by the second feature extraction network are used as the inputs of two fully connected layers, the outputs of the two fully connected layers are classified by a Softmax classifier to obtain the corresponding class probabilities, and the two class probabilities are weighted and summed accordingly to obtain the final image classification result.
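The Softmax calculation and the weighted fusion of the two branches can be summarized in the short Python sketch below, assuming the two one-dimensional vectors output by the fully connected layers are available; the weight values and function names are illustrative.

    import numpy as np

    def softmax(y):
        """Softmax class probabilities Ps = e^(ys) / sum_i e^(yi), computed in a numerically stable way."""
        e = np.exp(y - np.max(y))
        return e / e.sum()

    def fuse_predictions(local_logits, global_logits, w1=0.5, w2=0.5):
        """Weighted summation of local and global class probabilities; returns the predicted class."""
        p1 = softmax(np.asarray(local_logits, dtype=np.float64))   # local feature image branch
        p2 = softmax(np.asarray(global_logits, dtype=np.float64))  # global feature image branch
        fused = w1 * p1 + w2 * p2                                  # P = w1*P1 + w2*P2
        return int(np.argmax(fused)), fused                        # class with the maximum fused probability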
Considering that different pets may express different emotions with the same angle, in order to make the mood recognition result more accurate, the mood of the pet is comprehensively evaluated according to the pet category and the angle between the tail and the body of the pet.
S3: Filling the contour of the preprocessed image to obtain a mask image, and sequentially eroding the 8-pixel neighborhoods of the contour points of the mask image to obtain the skeleton feature of the foreground target.
Specifically, whether a pixel's neighborhood contains 3 connected pixels is judged first, and if so, the point is deleted from the contour points; it is then judged in turn whether the neighborhood contains 3 or 4 connected pixels, and then 3, 4, 5, 6, or 7 connected pixels, the point being deleted from the contour points if so and retained otherwise. After all contour points have been judged, the pseudo skeleton of the foreground target is obtained. Finally, whether the pixel points in the pseudo skeleton contain 2, 3, 4, 5, 6, or 7 connected pixels is detected, the point being deleted if so and retained otherwise; after all pixel points in the pseudo skeleton have been judged, the skeleton feature of the foreground target is obtained.
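In practice, the contour filling and the iterative erosion of S3 can be approximated with standard morphological tools; the sketch below uses OpenCV 4 contour filling and scikit-image thinning as a stand-in for the neighborhood-counting erosion rule described above, so it is an illustrative approximation rather than the exact procedure.

    import cv2
    import numpy as np
    from skimage.morphology import skeletonize

    def mask_and_skeleton(edge_map):
        """Fill the edge contour to obtain a mask image, then thin it to a skeleton."""
        edges = (edge_map > 0).astype(np.uint8) * 255
        contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        mask = np.zeros_like(edges)
        cv2.drawContours(mask, contours, -1, 255, thickness=cv2.FILLED)  # filled mask image
        skeleton = skeletonize(mask > 0)  # iterative thinning to a one-pixel-wide skeleton
        return mask, skeleton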
S4: and calculating an included angle between the tail skeleton and the body skeleton by using the skeleton characteristics, and determining the mood of the pet based on the included angle and the pet category.
In this step, calculating an included angle between the tail skeleton and the body skeleton by using the skeleton characteristics, including:
and processing and identifying the body area and the tail area of the pet obtained by the preprocessing image based on a feature extraction network, determining the position of each pixel point contained in the skeleton feature relative to the preprocessing image, taking the pixel point positioned in the tail area as a tail node, and taking the pixel point positioned in the body area as a body node.
Calculating an included angle between the tail node and the body node according to an included angle formula, wherein the included angle formula is as follows:
Tan α = |(k2 - k1) / (1 + k1·k2)|
where α is the included angle between the two skeletons, k1 is the slope of the tail skeleton, and k2 is the slope of the body skeleton.
When there are two pixel points located in the tail region, the pixel point adjacent to the body node is taken as the tail head node and the other pixel point as the tail node; the included angle between the tail head node and the body node and the included angle between the tail head node and the tail node are calculated according to the included-angle formula, and the two angles are combined with the pet category to determine the mood of the pet.
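The included-angle calculation can be sketched as follows, assuming the skeleton pixel points have already been split into tail nodes and body nodes by the recognized regions; the least-squares straight-line fit used for the slopes is an illustrative choice.

    import numpy as np

    def segment_slope(points):
        """Least-squares slope of a set of skeleton points given as (x, y) rows."""
        x, y = points[:, 0], points[:, 1]
        k, _ = np.polyfit(x, y, deg=1)  # first-degree fit y = k*x + b (assumes a non-vertical segment)
        return k

    def included_angle(tail_points, body_points):
        """Included angle in degrees: Tan(alpha) = |(k2 - k1) / (1 + k1*k2)|."""
        k1 = segment_slope(np.asarray(tail_points, dtype=np.float64))  # slope of the tail skeleton
        k2 = segment_slope(np.asarray(body_points, dtype=np.float64))  # slope of the body skeleton
        denom = 1.0 + k1 * k2
        if abs(denom) < 1e-9:  # skeletons are perpendicular
            return 90.0
        return float(np.degrees(np.arctan(abs((k2 - k1) / denom))))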
The recognized body region and tail region of the pet are obtained from a trained model. The model is obtained by collecting a large number of pet pictures and applying graying, the Sobel operator and maximum-circumscribed-rectangle extraction to obtain preprocessed images; the preprocessed images are annotated with target detection boxes and with labels for the tail and body regions, features are extracted by the first feature extraction network, the range of each annotation box in the pet picture is then predicted by a target detection algorithm, and the model learns continuously from this data.
According to the method, the body area and the tail area of the pet are determined only by marking the tail and the body area of the acquired pet picture during model training, and compared with the method for marking thirteen key points of the body in the prior art, the method can reduce the labor time and the calculation time and improve the working efficiency of a computer.
The contour of the edge contour image of the pet foreground target is filled to obtain the contour of a binary mask image, and the contour is eroded repeatedly to obtain the skeleton feature; the mood of the pet at that moment is then judged from the included angle between the tail skeleton and the body skeleton and from the pet category. Considering that when a pet presses its tail against its body the tail and the body are not easy to distinguish by processing the image with a neural network, whether the 8-pixel neighborhood of each point in the contour meets the requirement is judged and the neighborhoods are eroded repeatedly to obtain the skeleton feature of the pet target. This makes the source of the key points more accurate, removes the concern about the tail not being separated from the body, and makes the included angle between the tail and the body easy to calculate; the tail is not always straight and may also be bent.
In summary, the embodiment of the invention discloses a mood recognition method based on pet posture, which comprises: performing edge processing on the grayed image obtained from the original image to obtain the edge contour image of the pet foreground target; filling the contour image to obtain the mask image of the foreground target, eroding the mask image repeatedly to extract the skeleton feature of the foreground target, calculating the included angle between the tail skeleton and the body skeleton from the skeleton feature, and determining the mood of the pet based on the included angle and the pet category. With this method there is no concern about the tail not being separated from the body, the included angle between the tail and the body is easy to calculate, and pet category recognition is added to improve recognition accuracy. To determine the body region and the tail region of the pet, only the tail and body regions of the collected pet pictures need to be annotated during model training; compared with annotating thirteen body key points, this reduces labor time and computation time and improves the working efficiency of the computer.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to each embodiment or some parts of the embodiments.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only examples of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A mood recognition method based on pet postures is characterized by comprising the following steps:
carrying out graying processing on the original image to obtain a grayed image; carrying out edge processing on the grayed image based on a Sobel operator to obtain an edge contour image of the pet foreground target, processing the edge contour image to obtain a maximum circumscribed rectangle, and cropping the foreground target out of the original image based on the maximum circumscribed rectangle to obtain a preprocessed image;
processing the preprocessed image and the original image with different feature extraction networks respectively, fusing the results, and predicting the pet category;
filling the contour of the preprocessed image to obtain a mask image, and sequentially eroding the 8-pixel neighborhoods of the contour points of the mask image, specifically: judging whether a pixel's neighborhood contains 3 connected pixels, and if so, deleting the point from the contour points; then judging in turn whether the neighborhood contains 3 or 4 connected pixels, and then 3, 4, 5, 6, or 7 connected pixels, deleting the point from the contour points if so and retaining it otherwise; obtaining a pseudo skeleton of the foreground target after all contour points have been judged; finally detecting whether the pixel points in the pseudo skeleton contain 2, 3, 4, 5, 6, or 7 connected pixels, deleting the point if so and retaining it otherwise; and obtaining the skeleton feature of the foreground target after all pixel points in the pseudo skeleton have been judged;
and calculating an included angle between the tail node and the body node by using the skeleton features, and determining the mood of the pet based on the included angle and the pet category.
2. The pet posture-based mood recognition method as recited in claim 1, wherein calculating an angle between a tail skeleton and a body skeleton using the skeleton features comprises:
based on a feature extraction network, processing and identifying the preprocessed image to obtain a pet body area and a pet tail area, determining the position of each pixel point contained in the skeleton feature relative to the preprocessed image, taking the pixel point located in the tail area as a tail node, and taking the pixel point located in the body area as a body node;
calculating an included angle between the tail node and the body node according to an included angle formula, wherein the included angle formula is as follows:
Tan α = |(k2 - k1) / (1 + k1·k2)|
where α is the included angle between the two skeletons, k1 is the slope of the tail skeleton, and k2 is the slope of the body skeleton.
3. The method as claimed in claim 2, wherein when there are two pixel points located in the tail region, the pixel point adjacent to the body node is taken as the tail head node and the other pixel point as the tail node; the included angle between the tail head node and the body node and the included angle between the tail head node and the tail node are calculated according to the included-angle formula, and the two angles are combined with the pet category to determine the mood of the pet.
4. The method as claimed in claim 1, wherein the step of predicting the pet category by processing and fusing the preprocessed image and the original image respectively through different feature extraction networks comprises:
performing first feature extraction network processing on the preprocessed image to obtain a local feature image, performing second feature extraction network processing on the original image to obtain a global feature image, performing full-connection layer processing on the global feature image and the local feature image respectively to obtain corresponding one-dimensional vectors, and performing classifier calculation on the one-dimensional vectors to obtain class probabilities corresponding to preset class labels respectively;
carrying out weighted summation on the category probability of the global feature images and the category probability of the local feature images corresponding to the same category labels to obtain the fused category probability; and selecting the category corresponding to the maximum probability value of the merged category as a pet recognition result.
5. The method as claimed in claim 1, wherein the graying processing is performed on the original image to obtain a grayed image, and the method comprises:
performing feature dimensionality reduction on an original image based on RGB three channels, endowing different weight coefficients to pixel values of each channel, and realizing graying processing of the image to obtain a grayed image, wherein a graying formula is as follows:
Yi = 0.3·Ri + 0.59·Gi + 0.11·Bi
where i indexes the current image pixel, Yi is the gray value, Ri the R-channel pixel value, Gi the G-channel pixel value, and Bi the B-channel pixel value.
6. The method for recognizing the mood based on the pet posture as claimed in claim 1, wherein the edge processing is performed on the grayed image based on a Sobel operator to obtain an edge contour image of a pet foreground object, comprising:
respectively processing the grayed image with a transverse filter and a longitudinal filter to obtain a transverse processing value and a longitudinal processing value corresponding to each pixel point contained in each target pixel block, and taking the square root of the sum of squares of the transverse and longitudinal processing values of each pixel point to obtain the convolution value corresponding to that pixel point;
judging whether the convolution value corresponding to each pixel point exceeds a preset threshold, and if it does, taking the pixel point as an edge pixel point of the foreground target; and combining all edge pixel points of the grayed image to form the edge contour image of the foreground target.
7. The pet-pose based mood recognition method as recited in claim 1, wherein processing the edge contour image to obtain a maximum bounding rectangle comprises:
obtaining the maximum value and the minimum value of a horizontal coordinate and the maximum value and the minimum value of a vertical coordinate according to the position coordinates of each pixel point of the edge contour image of the foreground target relative to the original image; determining the range of the maximum circumscribed rectangle according to the maximum value and the minimum value of the abscissa and the maximum value and the minimum value of the ordinate, wherein the calculation formula is as follows:
Xmin = Min(Xi)
Xmax = Max(Xi)
Ymin = Min(Yi)
Ymax = Max(Yi)
where Xi denotes the abscissa of an edge pixel point, Yi denotes its ordinate, Min(.) denotes the minimum function, Max(.) denotes the maximum function, Xmin and Xmax denote respectively the minimum and maximum abscissa values of the edge pixel points on the horizontal axis, and Ymin and Ymax denote respectively the minimum and maximum ordinate values of the edge pixel points on the vertical axis.
8. The pet posture-based mood recognition method as recited in claim 1, wherein the foreground object is extracted from the original image to obtain a preprocessed image, and a processing formula of the preprocessed image is as follows:
I=A[Xmin:Xmax:Ymin:Ymax]
where A denotes the original image, Xmin and Xmax denote respectively the minimum and maximum abscissa values of the edge pixel points on the horizontal axis, Ymin and Ymax denote respectively the minimum and maximum ordinate values of the edge pixel points on the vertical axis, and I denotes the preprocessed image.
9. The pet posture-based mood recognition method as recited in claim 4, wherein the classifier is a Softmax classifier, and the calculation formula is
Ps = e^(ys) / Σ(i=1..c) e^(yi)
where yi denotes the fully connected layer output of the first or second feature extraction network for the i-th class, Ps denotes the output probability of the Softmax classifier for class s, c is the number of classes, and e is the natural constant.
10. The pet-posture-based mood recognition method as recited in claim 6, wherein the grayed image is respectively processed by a horizontal filter and a vertical filter to obtain a horizontal processing value and a vertical processing value corresponding to each pixel point included in each target pixel block, and the specific calculation formula is as follows:
Gx = [ -1 0 +1 ; -2 0 +2 ; -1 0 +1 ] * A
Gy = [ +1 +2 +1 ; 0 0 0 ; -1 -2 -1 ] * A
where A denotes the grayed image, * denotes convolution, the 3 × 3 matrices (written row by row) are the standard Sobel convolution factors, and Gx and Gy respectively denote the convolution results of the Sobel convolution factors in the transverse and longitudinal directions.
CN202210653859.7A 2022-06-10 2022-06-10 Mood recognition method based on pet posture Pending CN114724190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210653859.7A CN114724190A (en) 2022-06-10 2022-06-10 Mood recognition method based on pet posture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210653859.7A CN114724190A (en) 2022-06-10 2022-06-10 Mood recognition method based on pet posture

Publications (1)

Publication Number Publication Date
CN114724190A true CN114724190A (en) 2022-07-08

Family

ID=82232426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210653859.7A Pending CN114724190A (en) 2022-06-10 2022-06-10 Mood recognition method based on pet posture

Country Status (1)

Country Link
CN (1) CN114724190A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110018717A1 (en) * 2009-07-23 2011-01-27 Casio Computer Co., Ltd. Animal emotion display system and method
CN106308822A (en) * 2016-08-18 2017-01-11 深圳市沃特沃德股份有限公司 Method and system for judging mood of animal
CN108663026A (en) * 2018-05-21 2018-10-16 湖南科技大学 A kind of vibration measurement method
CN110457999A (en) * 2019-06-27 2019-11-15 广东工业大学 A kind of animal posture behavior estimation based on deep learning and SVM and mood recognition methods
CN112395977A (en) * 2020-11-17 2021-02-23 南京林业大学 Mammal posture recognition method based on body contour and leg joint skeleton

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李文娜等 (Li Wenna et al.): "Algorithm for obtaining contour maps of medical grayscale images" (获取医学灰度图像轮廓图的算法), Chinese Journal of Tissue Engineering Research and Clinical Rehabilitation (中国组织工程研究与临床康复) *

Similar Documents

Publication Publication Date Title
CN110738101B (en) Behavior recognition method, behavior recognition device and computer-readable storage medium
JP6664163B2 (en) Image identification method, image identification device, and program
US8175384B1 (en) Method and apparatus for discriminative alpha matting
US20210264144A1 (en) Human pose analysis system and method
CN108280397B (en) Human body image hair detection method based on deep convolutional neural network
KR101198322B1 (en) Method and system for recognizing facial expressions
JP4905931B2 (en) Human body region extraction method, apparatus, and program
CN112633144A (en) Face occlusion detection method, system, device and storage medium
CN110909618B (en) Method and device for identifying identity of pet
CN111784747B (en) Multi-target vehicle tracking system and method based on key point detection and correction
JP4699298B2 (en) Human body region extraction method, apparatus, and program
US20210133980A1 (en) Image processing apparatus, training apparatus, image processing method, training method, and storage medium
CN110879982A (en) Crowd counting system and method
JP2021503139A (en) Image processing equipment, image processing method and image processing program
CN112784712B (en) Missing child early warning implementation method and device based on real-time monitoring
CN111882555B (en) Deep learning-based netting detection method, device, equipment and storage medium
JP7300027B2 (en) Image processing device, image processing method, learning device, learning method, and program
CN111274964A (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN111553217A (en) Driver call monitoring method and system
CN111723688A (en) Human body action recognition result evaluation method and device and electronic equipment
CN110751163B (en) Target positioning method and device, computer readable storage medium and electronic equipment
CN111723614A (en) Traffic signal lamp identification method and device
CN113283306B (en) Rodent identification analysis method based on deep learning and migration learning
CN114724190A (en) Mood recognition method based on pet posture
CN114581709A (en) Model training, method, apparatus, and medium for recognizing target in medical image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination