CN114724190A - Mood recognition method based on pet posture - Google Patents

Mood recognition method based on pet posture

Info

Publication number
CN114724190A
Authority
CN
China
Prior art keywords
image
value
pet
processing
pixel point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210653859.7A
Other languages
Chinese (zh)
Inventor
吴琎
何振东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kitten and Puppy Technology Co Ltd
Original Assignee
Beijing Kitten and Puppy Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kitten and Puppy Technology Co Ltd filed Critical Beijing Kitten and Puppy Technology Co Ltd
Priority to CN202210653859.7A
Publication of CN114724190A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The invention relates to a mood recognition method based on pet posture, which comprises: carrying out graying processing on an original image to obtain a grayed image; carrying out edge processing on the grayed image based on a Sobel operator to obtain an edge contour image, processing the edge contour image to obtain a maximum circumscribed rectangle, and then cropping the foreground target out of the original image to obtain a preprocessed image; processing the preprocessed image and the original image with different feature extraction networks respectively, fusing the results, and predicting the pet category; filling the contour of the preprocessed image to obtain a mask image, sequentially eroding the 8-pixel neighborhoods of the contour points of the mask image to obtain skeleton features, and calculating from them the included angle between the tail node and the body node; and determining the mood of the pet based on the included angle and the category. Because the skeleton is obtained by examining and processing every pixel point of the mask image, there is no concern about the tail not being separated from the body, the included angle between the tail and the body is easy to calculate, and pet category recognition is added to improve recognition accuracy.

Description

Mood recognition method based on pet posture
Technical Field
The present invention relates to the field of computer vision, and in particular to a mood recognition method based on pet posture.
Background
Because companion pets can bring warmth, pleasure and positive physical and mental changes to people, the number of people keeping pets keeps increasing, and people treat their pets as family members. However, because pets cannot express themselves in language, communication with them is obstructed, and people can only judge a pet's mood from its behavior.
In the prior art, animal mood recognition is performed based on deep learning and an SVM: thirteen body key points are annotated on a large number of animal pictures, and a deep-learning animal posture estimation model is constructed that can estimate the position of the animal's center point and the positions of the thirteen body key points; several SVM classifiers are then constructed based on the position change of the animal's center point and on the positional relations and relative position changes of the thirteen body key points, including the tail key point, and the SVM classifiers are used to judge the animal's behavior and mood. In this prior art, the thirteen body key points come from model training, so their accuracy may be insufficient; annotating the thirteen key points in the pictures consumes a large amount of labor, and training a model that can accurately recognize the thirteen key points takes a long time. Therefore, in order to make the sources of the key points more accurate, reduce labor cost and computation time, and improve the working efficiency of the computer, the invention provides a mood recognition method based on pet posture.
With the development of computer vision and the improvement of image acquisition equipment, the demand for image clarity keeps increasing, which poses greater challenges, in both accuracy and speed, for image classification and detection tasks in computer vision. The image classification task assigns a category to a complete image; typical methods can be divided into digital image processing techniques and deep learning techniques.
In digital image processing, classification robustness is low because classification relies only on the pixel features of the image, and digital image processing methods are highly sensitive to pixel values. In deep learning, feature extraction networks with various structures can extract features automatically, which avoids manual involvement, improves classification accuracy and reduces labor cost. Most images faced by current image classification tasks have complex scenes and cluttered backgrounds, and how to extract representative features from such images is a difficult problem that both kinds of methods must face.
Disclosure of Invention
Based on the above requirements of the prior art, the invention aims to provide a mood recognition method based on pet posture, so that the sources of the key points are more accurate, labor cost and computation time are reduced, the working efficiency of the computer is improved, and animal mood recognition becomes more accurate.
In order to solve the problems, the invention adopts the following technical scheme:
a mood recognition method based on pet postures is characterized by comprising the following steps:
carrying out graying processing on the original image to obtain a grayed image; carrying out edge processing on the grayed image based on a Sobel operator to obtain an edge contour image of the pet foreground target, processing the edge contour image to obtain a maximum circumscribed rectangle, and cropping the foreground target out of the original image based on the maximum circumscribed rectangle to obtain a preprocessed image;
processing the preprocessed image and the original image with different feature extraction networks respectively, fusing the results, and predicting the pet category;
filling the contour of the preprocessed image to obtain a mask image, and sequentially eroding the 8-pixel neighborhoods of the contour points of the mask image, specifically: judging whether a pixel's neighborhood contains 3 connected pixels, and if so, deleting the point from the contour points; then judging in turn whether the neighborhood contains 3 or 4 connected pixels, and then 3, 4, 5, 6, or 7 connected pixels, deleting the point from the contour points if so and retaining it otherwise; obtaining a pseudo skeleton of the foreground target after all contour points have been judged; finally detecting whether the pixel points in the pseudo skeleton contain 2, 3, 4, 5, 6, or 7 connected pixels, deleting the point if so and retaining it otherwise; and obtaining the skeleton feature of the foreground target after all pixel points in the pseudo skeleton have been judged;
and calculating an included angle between the tail node and the body node by using the skeleton features, and determining the mood of the pet based on the included angle and the pet category.
Optionally, calculating an included angle between the tail skeleton and the body skeleton by using the skeleton features includes:
based on a feature extraction network, processing and identifying the preprocessed image to obtain a pet body area and a pet tail area, determining the position of each pixel point contained in the skeleton feature relative to the preprocessed image, taking the pixel point located in the tail area as a tail node, and taking the pixel point located in the body area as a body node;
calculating an included angle between the tail node and the body node according to an included angle formula, wherein the included angle formula is as follows:
Tan α = |(k2 - k1) / (1 + k1·k2)|
where α is the included angle between the two skeletons, k1 is the slope of the tail skeleton, and k2 is the slope of the body skeleton.
Optionally, when there are two pixel points located in the tail region, the pixel point adjacent to the body node is taken as the tail head node and the other pixel point as the tail node; the included angle between the tail head node and the body node and the included angle between the tail head node and the tail node are calculated according to the included-angle formula, and the two angles are combined with the pet category to determine the mood of the pet.
Optionally, processing the preprocessed image and the original image with different feature extraction networks respectively, fusing the results, and predicting the pet category includes:
performing first feature extraction network processing on the preprocessed image to obtain a local feature image, performing second feature extraction network processing on the original image to obtain a global feature image, performing full-connection layer processing on the global feature image and the local feature image respectively to obtain corresponding one-dimensional vectors, and performing classifier calculation on the one-dimensional vectors to obtain class probabilities corresponding to preset class labels respectively;
carrying out weighted summation on the category probability of the global feature images and the category probability of the local feature images corresponding to the same category labels to obtain the fused category probability; and selecting the category corresponding to the maximum probability value of the fused category as the pet identification result.
Optionally, performing a graying process on the original image to obtain a grayed image, including:
performing feature dimensionality reduction on an original image based on RGB three channels, endowing different weight coefficients to pixel values of each channel, and realizing graying processing of the image to obtain a grayed image, wherein a graying formula is as follows:
Yi = 0.3·Ri + 0.59·Gi + 0.11·Bi
where i indexes the current image pixel, Yi is the gray value, Ri the R-channel pixel value, Gi the G-channel pixel value, and Bi the B-channel pixel value.
Optionally, performing edge processing on the grayed image based on a Sobel operator to obtain an edge contour image of the pet foreground target, including:
respectively processing the grayed image with a transverse filter and a longitudinal filter to obtain a transverse processing value and a longitudinal processing value corresponding to each pixel point contained in each target pixel block, and taking the square root of the sum of squares of the transverse and longitudinal processing values of each pixel point to obtain the convolution value corresponding to that pixel point;
judging whether the convolution value corresponding to each pixel point exceeds a preset threshold, and if it does, taking the pixel point as an edge pixel point of the foreground target; and combining all edge pixel points of the grayed image to form the edge contour image of the foreground target.
Optionally, the processing the edge contour image to obtain a maximum bounding rectangle includes:
obtaining the maximum value and the minimum value of a horizontal coordinate and the maximum value and the minimum value of a vertical coordinate according to the position coordinates of each pixel point of the edge contour image of the foreground target relative to the original image; determining the range of the maximum circumscribed rectangle according to the maximum value and the minimum value of the abscissa and the maximum value and the minimum value of the ordinate, wherein the calculation formula is as follows:
Xmin = Min(Xi)
Xmax = Max(Xi)
Ymin = Min(Yi)
Ymax = Max(Yi)
where Xi denotes the abscissa of an edge pixel point, Yi denotes its ordinate, Min(.) denotes the minimum function, Max(.) denotes the maximum function, Xmin and Xmax denote respectively the minimum and maximum abscissa values of the edge pixel points on the horizontal axis, and Ymin and Ymax denote respectively the minimum and maximum ordinate values of the edge pixel points on the vertical axis.
Optionally, the foreground object is extracted from the original image to obtain a preprocessed image, and a processing formula of the preprocessed image is as follows:
I=A[Xmin:Xmax:Ymin:Ymax]
where A denotes the original image, Xmin and Xmax denote respectively the minimum and maximum abscissa values of the edge pixel points on the horizontal axis, Ymin and Ymax denote respectively the minimum and maximum ordinate values of the edge pixel points on the vertical axis, and I denotes the preprocessed image.
Optionally, the classifier is a Softmax classifier, and the calculation formula is
Ps = e^(ys) / Σ(i=1..c) e^(yi)
where yi denotes the fully connected layer output of the first or second feature extraction network for the i-th class, Ps denotes the output probability of the Softmax classifier for class s, c is the number of classes, and e is the natural constant.
Optionally, the grayed image is respectively processed by a transverse filter and a longitudinal filter, so as to respectively obtain a transverse processing value and a longitudinal processing value corresponding to each pixel point included in each target pixel block, and a specific calculation formula is as follows:
Gx = [ -1 0 +1 ; -2 0 +2 ; -1 0 +1 ] * A
Gy = [ +1 +2 +1 ; 0 0 0 ; -1 -2 -1 ] * A
where A denotes the grayed image, * denotes convolution, the 3 × 3 matrices (written row by row) are the standard Sobel convolution factors, and Gx and Gy respectively denote the convolution results of the Sobel convolution factors in the transverse and longitudinal directions.
Compared with the prior art, with the mood recognition method based on pet posture provided by the invention there is no concern about the tail not being separated from the body, and the included angle between the tail and the body is easy to calculate. The tail is not always straight and may also be bent; in that case there are two nodes in the tail region, and the mood of the pet is judged by jointly considering the calculated included angle between the tail head node and the tail node, the included angle between the body and the tail head node, and the recognized pet category, so the judgment result is more accurate. To determine the body region and the tail region of the pet, only the tail and body regions of the collected pet pictures need to be annotated during model training; compared with annotating thirteen body key points, this reduces labor cost and computation time and improves the working efficiency of the computer.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some of the embodiments described in this specification, and other drawings can be obtained by those skilled in the art from these drawings.
FIG. 1 is a flow chart of a method for recognizing a mood based on pet postures according to an embodiment of the present invention;
FIG. 2 is a flow chart of a pet classification recognition method based on pet postures according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a model framework of a method for recognizing a mood based on pet postures according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For the convenience of understanding of the embodiments of the present invention, the following description will be further explained with reference to specific embodiments, which are not to be construed as limiting the embodiments of the present invention.
The embodiment provides a mood recognition method based on pet postures, the flow of which is shown in fig. 1 and fig. 2, and the method comprises the following steps:
s1: carrying out graying processing on the original image to obtain a grayed image; and carrying out edge processing on the grayed image based on a Sobel operator to obtain an edge contour image of the pet foreground target, processing the edge contour image to obtain a maximum external rectangle, and scratching out the foreground target from the original image based on the maximum external rectangle to obtain a preprocessed image.
The method comprises the steps of obtaining an original image, wherein the original image only has one target, and the target area of the original image accounts for at least 50% of the original image, and in the embodiment of the invention, the target area is a relevant area related to a pet in a pet image.
In this step, performing a graying process on the original image to obtain a grayed image, including:
performing feature dimensionality reduction on an original image based on RGB three channels, endowing different weight coefficients to pixel values of each channel, and realizing graying processing of the image to obtain a grayed image, wherein a graying formula is as follows:
Yi = 0.3·Ri + 0.59·Gi + 0.11·Bi
where i indexes the current image pixel, Yi is the gray value, Ri the R-channel pixel value, Gi the G-channel pixel value, and Bi the B-channel pixel value.
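For illustration, the weighted graying step can be sketched in Python as follows; this is a minimal sketch, and the function name and the assumed R, G, B channel order are illustrative rather than taken from the patent.

    import numpy as np

    def to_gray(original):
        """Weighted graying of an H x W x 3 image: Yi = 0.3*Ri + 0.59*Gi + 0.11*Bi."""
        r = original[..., 0].astype(np.float32)  # assumes channel order R, G, B
        g = original[..., 1].astype(np.float32)
        b = original[..., 2].astype(np.float32)
        gray = 0.3 * r + 0.59 * g + 0.11 * b
        return gray.astype(np.uint8)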
Carrying out edge processing on the gray image based on a Sobel operator to obtain an edge contour image of the pet foreground target, wherein the edge contour image comprises the following steps:
s10: respectively processing the grayed image by a transverse filter and a longitudinal filter to respectively obtain a transverse processing value and a longitudinal processing value corresponding to each pixel point contained in each target pixel block, and performing evolution operation on the square sum of the transverse processing value and the longitudinal processing value corresponding to each pixel point to obtain a convolution value corresponding to each pixel point;
first, a horizontal filter and a vertical filter are sequentially applied to each pixel in the grayscale image.
Next, the values in the transversal filter and the longitudinal filter are multiplied by the corresponding gray values in the grayed image, respectively.
The transversal filter result is then added to the longitudinal filter result, the resulting sum being the convolution result of the target pixel.
And finally, performing evolution operation on the convolution result to obtain a convolution value corresponding to each pixel point.
In the embodiment of the present invention, the calculation formula for multiplying the values in the transversal filter and the longitudinal filter with the corresponding gray-scale values in the grayed image respectively is as follows:
Gx = [ -1 0 +1 ; -2 0 +2 ; -1 0 +1 ] * A
Gy = [ +1 +2 +1 ; 0 0 0 ; -1 -2 -1 ] * A
where A denotes the grayed image, * denotes convolution, the 3 × 3 matrices (written row by row) are the standard Sobel convolution factors, and Gx and Gy respectively denote the convolution results of the Sobel convolution factors in the transverse and longitudinal directions.
The convolution results are then substituted into the square-root formula:
G = sqrt(Gx^2 + Gy^2)
where G denotes the convolution value of the pixel after the square-root operation, and Gx and Gy respectively denote the convolution results of the Sobel convolution factors in the two directions.
S11: Judging whether the convolution value corresponding to each pixel point exceeds a preset threshold, and if it does, taking the pixel point as an edge pixel point of the foreground target; combining all edge pixel points of the grayed image to form the edge contour image of the foreground target.
After the same pixel block has been processed by the transverse filter and the longitudinal filter, whether each pixel point contained in the pixel block exceeds the preset threshold is judged; if it does, the pixel point is taken as an edge pixel point of the foreground target, otherwise no operation is performed. The transverse and longitudinal filters then process the adjacent pixel block in the grayed image according to the step size to obtain the convolution values of the pixel points contained in that block, and those pixel points are judged against the threshold in the same way. When all pixel points in the grayed image have been judged, all edge pixel points have been obtained.
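Steps S10 and S11 can be illustrated with the following minimal Python sketch; the kernel values are the standard Sobel convolution factors, while the threshold value and the function names are illustrative assumptions.

    import numpy as np
    from scipy.ndimage import convolve

    # Standard Sobel convolution factors for the transverse and longitudinal directions.
    SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    SOBEL_Y = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=np.float32)

    def sobel_edges(gray, threshold=100.0):
        """Return a binary edge map: True where the convolution value exceeds the threshold."""
        gx = convolve(gray.astype(np.float32), SOBEL_X)  # transverse processing values
        gy = convolve(gray.astype(np.float32), SOBEL_Y)  # longitudinal processing values
        magnitude = np.sqrt(gx ** 2 + gy ** 2)           # G = sqrt(Gx^2 + Gy^2)
        return magnitude > threshold                     # edge pixel points of the foreground target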
After the edge contour image is obtained, determining the maximum circumscribed rectangle of the edge contour image, so that the foreground object is extracted from the original image to obtain a preprocessed image.
After the edge contour image is obtained, the maximum circumscribed rectangle is determined from the position coordinates of each pixel point of the edge contour image relative to the original image. The salient target is then extracted based on the maximum circumscribed rectangle to obtain the preprocessed image; the preprocessed image is processed by the first feature extraction network to obtain the classification result of the local feature image, the original image is processed by the second feature extraction network to obtain the classification result of the global feature image, and the two classification results are fused to obtain the final image classification result, as shown in the model framework schematic diagram of fig. 3.
Determining a maximum circumscribed rectangle according to the position coordinates of each pixel point of the edge contour image of the foreground target relative to the original image, wherein the determining comprises the following steps:
obtaining the maximum value and the minimum value of a horizontal coordinate and the maximum value and the minimum value of a vertical coordinate according to the position coordinates of each pixel point of the edge contour image of the foreground target relative to the original image; determining the range of the maximum circumscribed rectangle according to the maximum value and the minimum value of the abscissa and the maximum value and the minimum value of the ordinate, wherein the calculation formula is as follows:
Xmin = Min(Xi)
Xmax = Max(Xi)
Ymin = Min(Yi)
Ymax = Max(Yi)
where Xi denotes the abscissa of an edge pixel point, Yi denotes its ordinate, Min(.) denotes the minimum function, Max(.) denotes the maximum function, Xmin and Xmax denote respectively the minimum and maximum abscissa values of the edge pixel points on the horizontal axis, and Ymin and Ymax denote respectively the minimum and maximum ordinate values of the edge pixel points on the vertical axis.
In the embodiment of the invention, the two-dimensional coordinates (x, y) of all foreground-target edge pixel points relative to the original image are obtained, and the maximum and minimum coordinate values along the horizontal axis and along the vertical axis of the image are calculated from these coordinates according to the maximum/minimum formulas above.
The range of the maximum circumscribed rectangle is determined from the maximum and minimum values of the abscissa and of the ordinate; according to the maximum circumscribed rectangle of the foreground target, the foreground image is cropped out of the original image to obtain the final preprocessed image, and the processing formula is as follows:
I=A[Xmin:Xmax:Ymin:Ymax]
where A denotes the original image, Xmin and Xmax denote respectively the minimum and maximum abscissa values of the edge pixel points on the horizontal axis, Ymin and Ymax denote respectively the minimum and maximum ordinate values of the edge pixel points on the vertical axis, and I denotes the preprocessed image.
Performing edge extraction on the original image with the Sobel operator and extracting the salient foreground target provides local features for the first feature extraction network, filters out the interference of background information, and improves classification accuracy.
After the foreground object is extracted from the original image and a preprocessed image is obtained, the method further comprises the following steps:
and uniformly adjusting the size of the preprocessed image by using a bilinear interpolation method, so that the extracted significant region of the foreground target is the size required to be input by the first feature extraction network.
S2: and respectively using different feature extraction networks to process and fuse the preprocessed image and the original image, and predicting to obtain the pet category.
In this step, the method comprises the following steps:
s20: performing first feature extraction network processing on the preprocessed image to obtain a local feature image, performing second feature extraction network processing on the original image to obtain a global feature image, performing full-connection layer processing on the global feature image and the local feature image respectively to obtain corresponding one-dimensional vectors, and performing classifier calculation on the one-dimensional vectors to obtain class probabilities corresponding to preset class labels respectively.
S21: carrying out weighted summation on the category probability of the global feature images and the category probability of the local feature images corresponding to the same category labels to obtain the fused category probability; and selecting the category corresponding to the maximum probability value of the fused category as the pet identification result.
S20 includes the following steps:
Step one: performing first feature extraction network processing on the preprocessed image to obtain a local feature image, and performing convolution calculation on the local feature image with a convolution kernel of the same size as the local feature image to obtain a one-dimensional vector of the local feature image.
In the embodiment of the invention, a ResNet18 feature extraction network is used as the first feature extraction network, and ResNet18 performs feature extraction on the preprocessed image to obtain the local feature image. The ResNet18 structure is built around residual connections, which avoid, to the greatest extent, the gradient vanishing or gradient explosion caused by an overly deep network.
ResNet18 has 18 layers in total. The network input image size is 224 × 224 pixels with 3 channels. After the first convolution layer the image size is reduced to 112 × 112 and the number of channels increases to 64; after the maximum pooling layer the size is further reduced to 56 × 56 without increasing the number of channels. After these first two operations the image enters the residual part: in each residual stage the feature map size is halved and the number of channels doubled, the down-sampling being realized by a convolution layer with stride 2. After 4 residual stages the final feature map size is 7 × 7 with 512 channels, followed by the average pooling layer and the fully connected layer.
In the ResNet18 network structure, convolution layers, maximum pooling layers and average pooling layers are mainly involved. Convolution layers use kernels of different sizes and strides so that their receptive fields differ, extracting image features over different ranges; the maximum pooling layer extracts the most distinctive features in the image, which is equivalent to a sharpening operation, and the average pooling layer extracts the common features, which is equivalent to a smoothing operation. The most representative structure in ResNet18 is the residual structure, in which the output of the previous layer is added to and fused with the output of the current layer; this relieves drastic gradient changes during training and avoids gradient vanishing or explosion, which is especially effective for deep network structures.
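For reference, a ResNet18-based first feature extraction network can be set up as sketched below; the use of torchvision and its pretrained weights (a recent torchvision version is assumed) and the number of pet categories are illustrative assumptions, not details given in the patent.

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_CLASSES = 10  # assumed number of pet categories

    # ResNet18 backbone with a new fully connected head for the local (preprocessed) branch.
    resnet18 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    resnet18.fc = nn.Linear(resnet18.fc.in_features, NUM_CLASSES)

    local_input = torch.randn(1, 3, 224, 224)  # preprocessed (cropped and resized) image
    local_logits = resnet18(local_input)       # one-dimensional vector fed to the Softmax classifier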
Step two: and performing second feature extraction network processing on the original image to obtain a global feature image, and performing convolution calculation on the global feature image by using a convolution kernel with the same size as the global feature image to obtain a one-dimensional vector of the global feature image.
In the embodiment of the invention, a VGG19 feature extraction network is used as the second feature extraction network, and VGG19 performs feature extraction on the original image to obtain the global feature image. The original image is processed by the VGG19 network with 5 rounds of feature down-sampling, the resolution being reduced in turn from the 224 × 224 pixels of the input image to 112 × 112, 56 × 56, 28 × 28, 14 × 14 and 7 × 7; the down-sampling is realized by maximum pooling layers, and the number of feature channels increases in turn from 3 to 64, 128, 256 and 512.
The VGG19 network structure makes extensive use of 3 × 3 convolution kernels, and improves the accuracy of the classification result by increasing the number of small convolution kernels and the depth of the network.
VGG19 has 19 layers in total, consisting of 16 convolution layers and 3 fully connected layers; the network input image size is 224 × 224 pixels with 3 channels. The VGG19 structure is characterized by achieving the same receptive field as a large convolution kernel by stacking small convolutions, and none of its convolution layers performs down-sampling, which is instead realized by the maximum pooling layers. The structure is simple, involving only convolution layers, maximum pooling layers and fully connected layers. Convolution layers use kernels of different sizes and strides so that their receptive fields differ, extracting image features over different ranges, and the maximum pooling layer extracts the most distinctive features in the image, which is equivalent to sharpening.
Step three: and performing classifier calculation on the one-dimensional vectors to respectively obtain class probabilities corresponding to preset class labels.
Wherein the classifier is a Softmax classifier, and the calculation formula is
Ps = e^(ys) / Σ(i=1..c) e^(yi)
where yi denotes the fully connected layer output of the first or second feature extraction network for the i-th class, Ps denotes the output probability of the Softmax classifier for class s, c is the number of classes, and e is the natural constant.
In S21, the method includes:
carrying out weighted summation on the category probability of the global feature image and the category probability of the local feature image corresponding to the same category label, wherein the processing formula of the weighted summation is as follows:
P = w1·P1 + w2·P2
where P denotes the fused class probability, P1 denotes the class probability of the local feature image, P2 denotes the class probability of the global feature image, and w1 and w2 are weight coefficients that are adjusted during model training based on the predicted image category results.
Considering that some features of the local feature image may be neglected to influence the classification result, the original image is also processed by the feature extraction network to obtain the global feature image, and the classification results of the local feature image and the global feature image are comprehensively considered to improve the accuracy of the classification result.
For the accuracy of classification, the fused class probability and the corresponding class label form a training data set, and the training data set is used for carrying out supervised training on an image classification model to obtain the trained image classification model; and classifying the target in the image by using the trained image classification model.
For the feature extraction results at the two image granularities, the corresponding class probabilities are obtained and mapped into the same probability space, and supervised training is performed based on the probability space and the image labels to obtain the image classification model. The feature vector output by the first feature extraction network and the feature vector output by the second feature extraction network are used as the inputs of two fully connected layers, the outputs of the two fully connected layers are classified by a Softmax classifier to obtain the corresponding class probabilities, and the two class probabilities are weighted and summed accordingly to obtain the final image classification result.
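The Softmax calculation and the weighted fusion of the two branches can be summarized in the short Python sketch below, assuming the two one-dimensional vectors output by the fully connected layers are available; the weight values and function names are illustrative.

    import numpy as np

    def softmax(y):
        """Softmax class probabilities Ps = e^(ys) / sum_i e^(yi), computed in a numerically stable way."""
        e = np.exp(y - np.max(y))
        return e / e.sum()

    def fuse_predictions(local_logits, global_logits, w1=0.5, w2=0.5):
        """Weighted summation of local and global class probabilities; returns the predicted class."""
        p1 = softmax(np.asarray(local_logits, dtype=np.float64))   # local feature image branch
        p2 = softmax(np.asarray(global_logits, dtype=np.float64))  # global feature image branch
        fused = w1 * p1 + w2 * p2                                  # P = w1*P1 + w2*P2
        return int(np.argmax(fused)), fused                        # class with the maximum fused probability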
Considering that different pets may express different emotions with the same angle, in order to make the mood recognition result more accurate, the mood of the pet is comprehensively evaluated according to the pet category and the angle between the tail and the body of the pet.
S3: Filling the contour of the preprocessed image to obtain a mask image, and sequentially eroding the 8-pixel neighborhoods of the contour points of the mask image to obtain the skeleton feature of the foreground target.
Specifically, whether a pixel's neighborhood contains 3 connected pixels is judged first, and if so, the point is deleted from the contour points; it is then judged in turn whether the neighborhood contains 3 or 4 connected pixels, and then 3, 4, 5, 6, or 7 connected pixels, the point being deleted from the contour points if so and retained otherwise. After all contour points have been judged, the pseudo skeleton of the foreground target is obtained. Finally, whether the pixel points in the pseudo skeleton contain 2, 3, 4, 5, 6, or 7 connected pixels is detected, the point being deleted if so and retained otherwise; after all pixel points in the pseudo skeleton have been judged, the skeleton feature of the foreground target is obtained.
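In practice, the contour filling and the iterative erosion of S3 can be approximated with standard morphological tools; the sketch below uses OpenCV 4 contour filling and scikit-image thinning as a stand-in for the neighborhood-counting erosion rule described above, so it is an illustrative approximation rather than the exact procedure.

    import cv2
    import numpy as np
    from skimage.morphology import skeletonize

    def mask_and_skeleton(edge_map):
        """Fill the edge contour to obtain a mask image, then thin it to a skeleton."""
        edges = (edge_map > 0).astype(np.uint8) * 255
        contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        mask = np.zeros_like(edges)
        cv2.drawContours(mask, contours, -1, 255, thickness=cv2.FILLED)  # filled mask image
        skeleton = skeletonize(mask > 0)  # iterative thinning to a one-pixel-wide skeleton
        return mask, skeleton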
S4: and calculating an included angle between the tail skeleton and the body skeleton by using the skeleton characteristics, and determining the mood of the pet based on the included angle and the pet category.
In this step, calculating an included angle between the tail skeleton and the body skeleton by using the skeleton characteristics, including:
and processing and identifying the body area and the tail area of the pet obtained by the preprocessing image based on a feature extraction network, determining the position of each pixel point contained in the skeleton feature relative to the preprocessing image, taking the pixel point positioned in the tail area as a tail node, and taking the pixel point positioned in the body area as a body node.
Calculating an included angle between the tail node and the body node according to an included angle formula, wherein the included angle formula is as follows:
Tan α = |(k2 - k1) / (1 + k1·k2)|
where α is the included angle between the two skeletons, k1 is the slope of the tail skeleton, and k2 is the slope of the body skeleton.
When there are two pixel points located in the tail region, the pixel point adjacent to the body node is taken as the tail head node and the other pixel point as the tail node; the included angle between the tail head node and the body node and the included angle between the tail head node and the tail node are calculated according to the included-angle formula, and the two angles are combined with the pet category to determine the mood of the pet.
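The included-angle calculation can be sketched as follows, assuming the skeleton pixel points have already been split into tail nodes and body nodes by the recognized regions; the least-squares straight-line fit used for the slopes is an illustrative choice.

    import numpy as np

    def segment_slope(points):
        """Least-squares slope of a set of skeleton points given as (x, y) rows."""
        x, y = points[:, 0], points[:, 1]
        k, _ = np.polyfit(x, y, deg=1)  # first-degree fit y = k*x + b (assumes a non-vertical segment)
        return k

    def included_angle(tail_points, body_points):
        """Included angle in degrees: Tan(alpha) = |(k2 - k1) / (1 + k1*k2)|."""
        k1 = segment_slope(np.asarray(tail_points, dtype=np.float64))  # slope of the tail skeleton
        k2 = segment_slope(np.asarray(body_points, dtype=np.float64))  # slope of the body skeleton
        denom = 1.0 + k1 * k2
        if abs(denom) < 1e-9:  # skeletons are perpendicular
            return 90.0
        return float(np.degrees(np.arctan(abs((k2 - k1) / denom))))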
The recognized body region and tail region of the pet are obtained from a trained model. The model is obtained by collecting a large number of pet pictures and applying graying, the Sobel operator and maximum-circumscribed-rectangle extraction to obtain preprocessed images; the preprocessed images are annotated with target detection boxes and with labels for the tail and body regions, features are extracted by the first feature extraction network, the range of each annotation box in the pet picture is then predicted by a target detection algorithm, and the model learns continuously from this data.
According to the method, the body area and the tail area of the pet are determined only by marking the tail and the body area of the acquired pet picture during model training, and compared with the method for marking thirteen key points of the body in the prior art, the method can reduce the labor time and the calculation time and improve the working efficiency of a computer.
The contour of the edge contour image of the pet foreground target is filled to obtain the contour of a binary mask image, and the contour is eroded repeatedly to obtain the skeleton feature; the mood of the pet at that moment is then judged from the included angle between the tail skeleton and the body skeleton and from the pet category. Considering that when a pet presses its tail against its body the tail and the body are not easy to distinguish by processing the image with a neural network, whether the 8-pixel neighborhood of each point in the contour meets the requirement is judged and the neighborhoods are eroded repeatedly to obtain the skeleton feature of the pet target. This makes the source of the key points more accurate, removes the concern about the tail not being separated from the body, and makes the included angle between the tail and the body easy to calculate; the tail is not always straight and may also be bent.
In summary, the embodiment of the invention discloses a mood recognition method based on pet posture, which comprises: performing edge processing on the grayed image obtained from the original image to obtain the edge contour image of the pet foreground target; filling the contour image to obtain the mask image of the foreground target, eroding the mask image repeatedly to extract the skeleton feature of the foreground target, calculating the included angle between the tail skeleton and the body skeleton from the skeleton feature, and determining the mood of the pet based on the included angle and the pet category. With this method there is no concern about the tail not being separated from the body, the included angle between the tail and the body is easy to calculate, and pet category recognition is added to improve recognition accuracy. To determine the body region and the tail region of the pet, only the tail and body regions of the collected pet pictures need to be annotated during model training; compared with annotating thirteen body key points, this reduces labor time and computation time and improves the working efficiency of the computer.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to each embodiment or some parts of the embodiments.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only examples of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A mood recognition method based on pet postures is characterized by comprising the following steps:
carrying out graying processing on the original image to obtain a grayed image; carrying out edge processing on the grayed image based on a Sobel operator to obtain an edge contour image of the pet foreground target, processing the edge contour image to obtain a maximum circumscribed rectangle, and cropping the foreground target out of the original image based on the maximum circumscribed rectangle to obtain a preprocessed image;
processing the preprocessed image and the original image with different feature extraction networks respectively, fusing the results, and predicting the pet category;
filling the contour of the preprocessed image to obtain a mask image, and sequentially eroding the 8-pixel neighborhoods of the contour points of the mask image, specifically: judging whether a pixel's neighborhood contains 3 connected pixels, and if so, deleting the point from the contour points; then judging in turn whether the neighborhood contains 3 or 4 connected pixels, and then 3, 4, 5, 6, or 7 connected pixels, deleting the point from the contour points if so and retaining it otherwise; obtaining a pseudo skeleton of the foreground target after all contour points have been judged; finally detecting whether the pixel points in the pseudo skeleton contain 2, 3, 4, 5, 6, or 7 connected pixels, deleting the point if so and retaining it otherwise; and obtaining the skeleton feature of the foreground target after all pixel points in the pseudo skeleton have been judged;
and calculating an included angle between the tail node and the body node by using the skeleton features, and determining the mood of the pet based on the included angle and the pet category.
2. The pet posture-based mood recognition method as recited in claim 1, wherein calculating an angle between a tail skeleton and a body skeleton using the skeleton features comprises:
based on a feature extraction network, processing and identifying the preprocessed image to obtain a pet body area and a pet tail area, determining the position of each pixel point contained in the skeleton feature relative to the preprocessed image, taking the pixel point located in the tail area as a tail node, and taking the pixel point located in the body area as a body node;
calculating an included angle between the tail node and the body node according to an included angle formula, wherein the included angle formula is as follows:
Tan α = |(k2 - k1) / (1 + k1·k2)|
where α is the included angle between the two skeletons, k1 is the slope of the tail skeleton, and k2 is the slope of the body skeleton.
3. The method as claimed in claim 2, wherein when there are two pixel points located in the tail region, the pixel point adjacent to the body node is taken as the tail head node and the other pixel point as the tail node; the included angle between the tail head node and the body node and the included angle between the tail head node and the tail node are calculated according to the included-angle formula, and the two angles are combined with the pet category to determine the mood of the pet.
4. The method as claimed in claim 1, wherein the step of predicting the pet category by processing and fusing the preprocessed image and the original image respectively through different feature extraction networks comprises:
performing first feature extraction network processing on the preprocessed image to obtain a local feature image, performing second feature extraction network processing on the original image to obtain a global feature image, performing full-connection layer processing on the global feature image and the local feature image respectively to obtain corresponding one-dimensional vectors, and performing classifier calculation on the one-dimensional vectors to obtain class probabilities corresponding to preset class labels respectively;
carrying out weighted summation on the category probability of the global feature images and the category probability of the local feature images corresponding to the same category labels to obtain the fused category probability; and selecting the category corresponding to the maximum probability value of the merged category as a pet recognition result.
5. The method as claimed in claim 1, wherein the graying processing is performed on the original image to obtain a grayed image, and the method comprises:
performing feature dimensionality reduction on an original image based on RGB three channels, endowing different weight coefficients to pixel values of each channel, and realizing graying processing of the image to obtain a grayed image, wherein a graying formula is as follows:
Yi = 0.3·Ri + 0.59·Gi + 0.11·Bi
where i indexes the current image pixel, Yi is the gray value, Ri the R-channel pixel value, Gi the G-channel pixel value, and Bi the B-channel pixel value.
6. The method for recognizing the mood based on the pet posture as claimed in claim 1, wherein the edge processing is performed on the grayed image based on a Sobel operator to obtain an edge contour image of a pet foreground object, comprising:
respectively processing the grayed image with a transverse filter and a longitudinal filter to obtain a transverse processing value and a longitudinal processing value corresponding to each pixel point contained in each target pixel block, and taking the square root of the sum of squares of the transverse and longitudinal processing values of each pixel point to obtain the convolution value corresponding to that pixel point;
judging whether the convolution value corresponding to each pixel point exceeds a preset threshold, and if it does, taking the pixel point as an edge pixel point of the foreground target; and combining all edge pixel points of the grayed image to form the edge contour image of the foreground target.
7. The pet-pose based mood recognition method as recited in claim 1, wherein processing the edge contour image to obtain a maximum bounding rectangle comprises:
obtaining the maximum value and the minimum value of a horizontal coordinate and the maximum value and the minimum value of a vertical coordinate according to the position coordinates of each pixel point of the edge contour image of the foreground target relative to the original image; determining the range of the maximum circumscribed rectangle according to the maximum value and the minimum value of the abscissa and the maximum value and the minimum value of the ordinate, wherein the calculation formula is as follows:
Xmin = Min(Xi)
Xmax = Max(Xi)
Ymin = Min(Yi)
Ymax = Max(Yi)
where Xi denotes the abscissa of an edge pixel point, Yi denotes its ordinate, Min(.) denotes the minimum function, Max(.) denotes the maximum function, Xmin and Xmax denote respectively the minimum and maximum abscissa values of the edge pixel points on the horizontal axis, and Ymin and Ymax denote respectively the minimum and maximum ordinate values of the edge pixel points on the vertical axis.
8. The pet posture-based mood recognition method as recited in claim 1, wherein the foreground object is extracted from the original image to obtain a preprocessed image, and a processing formula of the preprocessed image is as follows:
I=A[Xmin:Xmax:Ymin:Ymax]
where A denotes the original image, Xmin and Xmax denote respectively the minimum and maximum abscissa values of the edge pixel points on the horizontal axis, Ymin and Ymax denote respectively the minimum and maximum ordinate values of the edge pixel points on the vertical axis, and I denotes the preprocessed image.
9. The pet posture-based mood recognition method as recited in claim 4, wherein the classifier is a Softmax classifier, and the calculation formula is
Ps = e^(ys) / Σ(i=1..c) e^(yi)
where yi denotes the fully connected layer output of the first or second feature extraction network for the i-th class, Ps denotes the output probability of the Softmax classifier for class s, c is the number of classes, and e is the natural constant.
10. The pet-posture-based mood recognition method as recited in claim 6, wherein the grayed image is respectively processed by a horizontal filter and a vertical filter to obtain a horizontal processing value and a vertical processing value corresponding to each pixel point included in each target pixel block, and the specific calculation formula is as follows:
Gx = [ -1 0 +1 ; -2 0 +2 ; -1 0 +1 ] * A
Gy = [ +1 +2 +1 ; 0 0 0 ; -1 -2 -1 ] * A
where A denotes the grayed image, * denotes convolution, the 3 × 3 matrices (written row by row) are the standard Sobel convolution factors, and Gx and Gy respectively denote the convolution results of the Sobel convolution factors in the transverse and longitudinal directions.
CN202210653859.7A 2022-06-10 2022-06-10 Mood recognition method based on pet posture Pending CN114724190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210653859.7A CN114724190A (en) 2022-06-10 2022-06-10 Mood recognition method based on pet posture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210653859.7A CN114724190A (en) 2022-06-10 2022-06-10 Mood recognition method based on pet posture

Publications (1)

Publication Number Publication Date
CN114724190A true CN114724190A (en) 2022-07-08

Family

ID=82232426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210653859.7A Pending CN114724190A (en) 2022-06-10 2022-06-10 Mood recognition method based on pet posture

Country Status (1)

Country Link
CN (1) CN114724190A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110018717A1 (en) * 2009-07-23 2011-01-27 Casio Computer Co., Ltd. Animal emotion display system and method
CN106308822A (en) * 2016-08-18 2017-01-11 深圳市沃特沃德股份有限公司 Method and system for judging mood of animal
CN108663026A (en) * 2018-05-21 2018-10-16 湖南科技大学 A kind of vibration measurement method
CN110457999A (en) * 2019-06-27 2019-11-15 广东工业大学 A kind of animal posture behavior estimation based on deep learning and SVM and mood recognition methods
CN112395977A (en) * 2020-11-17 2021-02-23 南京林业大学 Mammal posture recognition method based on body contour and leg joint skeleton

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李文娜等 (Li Wenna et al.): "Algorithm for obtaining contour maps of medical grayscale images" (获取医学灰度图像轮廓图的算法), Chinese Journal of Tissue Engineering Research and Clinical Rehabilitation (中国组织工程研究与临床康复) *

Similar Documents

Publication Publication Date Title
CN110738101B (en) Behavior recognition method, behavior recognition device and computer-readable storage medium
JP6664163B2 (en) Image identification method, image identification device, and program
US8175384B1 (en) Method and apparatus for discriminative alpha matting
US20210264144A1 (en) Human pose analysis system and method
CN108280397B (en) Human body image hair detection method based on deep convolutional neural network
KR101198322B1 (en) Method and system for recognizing facial expressions
JP4905931B2 (en) Human body region extraction method, apparatus, and program
CN112633144A (en) Face occlusion detection method, system, device and storage medium
CN110909618B (en) Method and device for identifying identity of pet
CN111784747B (en) Multi-target vehicle tracking system and method based on key point detection and correction
JP4699298B2 (en) Human body region extraction method, apparatus, and program
US20210133980A1 (en) Image processing apparatus, training apparatus, image processing method, training method, and storage medium
CN110879982A (en) Crowd counting system and method
JP2021503139A (en) Image processing equipment, image processing method and image processing program
CN112784712B (en) Missing child early warning implementation method and device based on real-time monitoring
CN111882555B (en) Deep learning-based netting detection method, device, equipment and storage medium
JP7300027B2 (en) Image processing device, image processing method, learning device, learning method, and program
CN111274964A (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN111553217A (en) Driver call monitoring method and system
CN111723688A (en) Human body action recognition result evaluation method and device and electronic equipment
CN110751163B (en) Target positioning method and device, computer readable storage medium and electronic equipment
CN111723614A (en) Traffic signal lamp identification method and device
CN113283306B (en) Rodent identification analysis method based on deep learning and migration learning
CN114724190A (en) Mood recognition method based on pet posture
CN114581709A (en) Model training, method, apparatus, and medium for recognizing target in medical image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination