CN113989836A - Dairy cow face re-identification method, system, equipment and medium based on deep learning - Google Patents

Dairy cow face re-identification method, system, equipment and medium based on deep learning

Info

Publication number
CN113989836A
CN113989836A
Authority
CN
China
Prior art keywords
cow
network
cow face
face
branch
Prior art date
Legal status
Granted
Application number
CN202111219484.5A
Other languages
Chinese (zh)
Other versions
CN113989836B (en)
Inventor
高月芳
肖冬冬
陈晓朗
刘财兴
麦凯湛
Current Assignee
South China Agricultural University
Original Assignee
South China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Agricultural University
Priority to CN202111219484.5A
Publication of CN113989836A
Application granted
Publication of CN113989836B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep learning-based cow face re-identification method, system, equipment and medium, wherein the method comprises the following steps: acquiring a cow face data set; constructing a first cow face re-identification network combining global features and local features; improving the local feature extraction of the first cow face re-identification network to obtain a second cow face re-identification network; training the second cow face re-identification network with the cow face data set to obtain a third cow face re-identification network; acquiring a cow face image of the cow to be identified; and inputting the cow face image of the cow to be identified into the third cow face re-identification network to re-identify it. The method improves the recognition accuracy and retrieval-ranking capability for cow faces, and the improved network solves the problems of alignment and information integrity of local regions.

Description

Dairy cow face re-identification method, system, equipment and medium based on deep learning
Technical Field
The invention belongs to the technical field of target re-identification in computer vision, and particularly relates to a deep learning-based cow face re-identification method, system, equipment and medium.
Background
Large-scale, standardized, systematized, intelligent and precise breeding and management is the most practical and forward-looking development trend in the current animal husbandry industry. In recent years the dairy farming industry has become highly scaled and standardized, but it remains at a preliminary stage in intelligent and precise breeding and management, because the basic task of precisely locating and identifying individual dairy cows is still at the research stage. The traditional identification method is to place identity marks on individual cows, which not only can injure the cow's body but is also suitable only for manual identification. Electronic methods such as Radio Frequency Identification (RFID) support automatic identification, but the tags are easily damaged and maintenance costs are high. Biometric methods, such as identification using physiological traits of the cow like nasal prints or irises, require manually selecting feature regions and designing feature representations, need frontal images of the cow, and generalize poorly. None of these approaches is well suited to the needs of this basic task.
Individual identification is a core technology and an important basis for realizing intelligent, information-driven breeding and management; only once it is realized can subsequent intelligent management technologies, such as yield-level analysis, disease prevention and early warning, and welfare management, be developed and deployed. At present, methods such as electronic ear tags and radio frequency identification are used for the automatic identification of cattle, but they still have limitations: the identification props are easily lost, the equipment has a cost, and the operating range is limited.
Disclosure of Invention
In view of the above, the present invention provides a deep learning-based cow face re-identification method, system, device and medium. A cow face re-identification network is constructed by combining high-dimensional global features and local features, and middle-dimensional features are also extracted to strengthen the discrimination of the final combined feature expression, improving the recognition accuracy and retrieval-ranking capability for cow faces; the local feature extraction of the network is then improved, solving the problems of alignment and information integrity of local regions.
The first object of the invention is to provide a deep learning-based cow face re-identification method.
The second object of the invention is to provide a deep learning-based cow face re-identification system.
It is a third object of the invention to provide a computer apparatus.
It is a fourth object of the present invention to provide a storage medium.
The first purpose of the invention can be achieved by adopting the following technical scheme:
a deep learning-based cow face weight recognition method comprises the following steps:
acquiring a cow face data set;
constructing a first cow face weight recognition network combining global features and local features;
improving the local feature extraction of the first cow face recognition network to obtain a second cow face recognition network;
training the second cow face weight recognition network by using the cow face data set to obtain a third cow face weight recognition network;
acquiring a cow face image of a cow to be identified;
and inputting the cow face image of the cow to be identified into a third cow face re-identification network to realize re-identification of the cow face image of the cow to be identified.
Further, the first cow face re-identification network uses ResNet50 as the backbone network and comprises three branches: a Global branch, a Middle branch and a Part branch;
the Global branch is used for extracting the high-dimensional global features produced by the whole backbone network;
the Middle branch is used for extracting the middle-dimensional global features of the backbone network;
the Part branch is used for extracting local features from uniformly sized blocks obtained by uniform partitioning.
Further, the local feature extraction of the first cow face re-identification network is improved to obtain the second cow face re-identification network, specifically:
for the Part branch, attention-mechanism STN modules are adopted in place of uniform partitioning, and four STN modules are arranged to extract different local feature information of the cow face, yielding the second cow face re-identification network;
each STN module comprises a localization network, a grid generator and a sampler, wherein the localization network regresses the transformation parameters θ, and the transformation parameter matrix generated by the localization network is limited; the grid generator is a sampling grid constructed from the transformation parameters θ, and the sampler takes the sampling grid and the input feature map as inputs at the same time to generate the output feature map.
Further, training the second cow face re-identification network with the cow face data set specifically includes:
inputting the cow face images of the cow face data set into the second cow face re-identification network to generate feature maps;
performing global average pooling and global max pooling on a feature map to generate a first feature vector and a second feature vector of the same dimensionality;
adding the first feature vector and the second feature vector to obtain a third feature vector;
sending the third feature vector into a classifier, where a fourth feature vector is generated by the first linear layer and the BN layer;
passing the fourth feature vector through the last linear layer to obtain a classification prediction score;
and during training, optimizing the network parameters with the combination of the losses generated by the Global branch, the Middle branch and the Part branch.
Further, the network parameters are optimized with the combination of the losses generated by the Global branch, the Middle branch and the Part branch, calculated as:
L = L_T/6 + L_ID/8
where the divisor 6 reflects that the three branches contribute 6 triplet losses in total, i.e. the average triplet loss is used, and the divisor 8 reflects that 8 classification losses are collected, i.e. the average classification loss is used; L_T is the total loss obtained by adding the 2 triplet losses of each branch, and L_ID is the total loss obtained by adding the classification losses, namely 1 from the Global branch, 1 from the Middle branch and 6 from the Part branch.
Furthermore, the grid generator obtains each output point by sampling a transformed point of the input image; the correspondence between a point (x_i^s, y_i^s) of the input feature map and a point (x_i^t, y_i^t) of the output feature map is as follows:
(x_i^s, y_i^s)^T = A_θ · (x_i^t, y_i^t, 1)^T
where A_θ is the 2 × 3 transformation matrix of the 2D affine transformation generated by the localization network.
Further, limiting the transformation parameter matrix generated by the localization network includes:
constraining the form of the transformation parameter matrix to the following 2 × 3 2D affine transformation matrix:
A_θ = [ sx 0 tx ; 0 sy ty ]
and constraining each parameter in the transformation parameter matrix as follows:
the range of the scaling parameters sx and sy is limited by:
L_SS = (max(|sx| - α, 0))^2 + (max(|sy| - α, 0))^2
the positive-value constraint on the scaling parameters sx and sy is:
L_SP = max(0, β - sx) + max(0, β - sy)
the range of the translation parameters tx and ty is limited by:
L_TS = (max(|tx| - γ, 0))^2 + (max(|ty| - γ, 0))^2
the mutual-exclusion constraint on the translation parameters tx and ty penalizes pairs of STN branches whose region centers are too close:
L_TE = Σ_{1≤i<j≤N} max(0, δ - ((tx_i - tx_j)^2 + (ty_i - ty_j)^2))
where α, β, γ, δ are limiting parameters and N is the number of STN module branches used.
The second purpose of the invention can be achieved by adopting the following technical scheme:
a deep learning based cow face weight recognition system, the system comprising:
the device comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring a cow face data set of a cow;
the construction unit is used for constructing a first cow face weight recognition network combining the global features and the local features;
the improvement unit is used for improving the local feature extraction of the first cow face recognition network to obtain a second cow face recognition network;
the training unit is used for training the second cow face weight recognition network by utilizing the cow face data set to obtain a third cow face weight recognition network;
the second acquisition unit is used for acquiring a cow face image of the cow to be identified;
and the re-identification unit is used for inputting the cow face image of the cow to be identified into a third cow face re-identification network to realize the re-identification of the cow face image of the cow to be identified.
Compared with the prior art, the invention has the following beneficial effects:
the cow face weight recognition network not only combines high-dimensional global features and local features, but also extracts intermediate-dimensional features to enhance the discrimination degree of final combined feature expression; in addition, global average pooling and global maximum pooling are simultaneously used for the feature maps generated by network extraction, and the two pooled feature maps are superposed to enhance the expression of the network; meanwhile, classification loss and triple loss are used for optimizing the network, the operation of label smoothing is used for the classification loss, the network identification capacity can be further improved, the triple loss is used for optimizing intra-class distribution and inter-class differences, the identification accuracy and the retrieval and sequencing capacity of the cow face are improved, the local feature extraction branches of the network are improved while the characteristics and the advantages are inherited, an STN module is used for extracting local features to replace an original uniform segmentation and blocking method, and the problems of alignment and information integrity of the local regions are solved while the local regions with higher discrimination are extracted.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a deep learning-based cow face re-identification method for cows in embodiment 1 of the invention.
Fig. 2 is a structural diagram of GPN of embodiment 1 of the present invention.
FIG. 3 is a block diagram of GPN-ST according to embodiment 1 of the present invention.
Fig. 4 is a structural diagram of an STN module according to embodiment 1 of the present invention.
Fig. 5 is a local network diagram of a GPN-ST using STN according to embodiment 1 of the present invention.
FIG. 6 is a diagram showing the visualization effect of GPN-ST according to embodiment 1 of the present invention.
Fig. 7 is a block diagram of a structure of a deep learning-based cow face re-identification system in embodiment 2 of the present invention.
Fig. 8 is a block diagram of a computer device according to embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments are described below with reference to the drawings. Obviously, the described embodiments are some, but not all, embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art without creative effort based on these embodiments fall within the protection scope of the present invention.
Example 1:
monitoring cameras are mostly installed in dairy farms, and meanwhile, vision technologies such as detection and identification based on image data are widely applied, so that the deep learning technology and the computer vision technology are combined in the embodiment, a deep learning model is constructed by taking the face of a dairy cow as an object, and automatic positioning and identification of the face of the dairy cow in the image data are realized. In order to facilitate the application to each dairy farm, improve the universality and reduce the use cost, the embodiment adopts a Re-Identification (ReID) technology to identify the face of the dairy cow. Meanwhile, because the image technology based on computer vision is adopted in the embodiment, the original cow image data can be applied to positioning and identification, and can also be applied to the follow-up intelligent management technologies such as cow state analysis and disease analysis based on images, so that the embodiment is an important basic task for realizing intelligent and information management.
As shown in fig. 1, the present embodiment provides a deep learning-based cow face re-identification method, which includes the following steps:
s101, acquiring a cow face data set.
The cow face data set is produced through a series of processing steps, including image collection, cleaning and screening, cow face extraction, cow face image classification, and data partitioning, as sketched below.
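A minimal sketch of what such a preparation pipeline can look like is given below; the directory layout, file-naming scheme and split ratio are illustrative assumptions, not details taken from the patent.

```python
# Hypothetical sketch of the data-partitioning step described above; assumes
# cow-face crops named "<cowID>_<n>.jpg" in one directory, which is not
# specified in the patent text.
import random
from pathlib import Path
from collections import defaultdict

def split_cow_face_dataset(image_dir, train_ratio=0.7, seed=0):
    """Group cow-face crops by identity, then split each identity's images
    into a training set and a test set (query + gallery)."""
    by_identity = defaultdict(list)
    for path in Path(image_dir).glob("*.jpg"):
        identity = path.stem.split("_")[0]  # assumed naming convention
        by_identity[identity].append(path)

    rng = random.Random(seed)
    train, query, gallery = [], [], []
    for identity, paths in by_identity.items():
        rng.shuffle(paths)
        cut = max(1, int(len(paths) * train_ratio))
        train += [(p, identity) for p in paths[:cut]]
        rest = paths[cut:]
        if rest:  # first held-out image as query, the rest as gallery
            query.append((rest[0], identity))
            gallery += [(p, identity) for p in rest[1:]]
    return train, query, gallery
```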
S102, constructing a first cow face re-identification network combining the global features and the local features.
The first cow face re-identification network GPN (Global and Part Network) takes ResNet50 as its backbone network and has three branches, a Global branch, a Middle branch and a Part branch, described as follows:
1) ResNet50 with its final pooling layer and fully connected layer removed is used as the backbone network, and the convolution stride of the downsampling in Layer4 of the ResNet50 backbone is set to 1;
2) each classifier consists of a first linear layer that reduces the input multi-dimensional vector to 512 dimensions, a BatchNorm layer, and a final linear layer for classification;
3) there are three branches: a Middle branch for extracting the middle-dimensional global features of the backbone network, a Global branch for extracting the high-dimensional global features produced by the whole backbone network, and a Part branch for extracting local features from uniformly sized blocks obtained by uniform partitioning.
3.1) The Global branch, the middle part of fig. 2, extracts the network's high-dimensional global features. Global Average Pooling (GAP) and Global Max Pooling (GMP) are applied to the feature map generated by Layer4 of the ResNet50 backbone, i.e. by the whole backbone network, producing two 2048-dimensional feature vectors G_Avg and G_Max, which are added to obtain a new 2048-dimensional feature vector G. The vector G is then sent into the classifier Classifier_Global, where the first linear layer and the BN layer produce a 512-dimensional vector G_Down, and the last linear layer then yields the classification prediction score. In this branch, G_Avg and G_Max enter the triplet-loss calculation, while the classification prediction score enters the classification loss; the experiments use Softmax cross-entropy loss with label smoothing.
3.2) The Middle branch, the upper part of fig. 2, extracts the network's middle-dimensional global features. Analogously to the Global branch, the feature map generated by Layer3 of the ResNet50 backbone is extracted, global average pooling GAP and global max pooling GMP produce two 1024-dimensional feature vectors M_Avg and M_Max, and the two are added to obtain a new 1024-dimensional feature vector M. The vector M is then sent into the classifier Classifier_Middle, where the first linear layer and the BN layer produce a 512-dimensional vector M_Down, and the last linear layer then yields the classification prediction score. Likewise, M_Avg and M_Max enter the triplet-loss calculation, and the classification prediction score is computed with label-smoothed Softmax cross-entropy loss.
3.3) The Part branch, the lower part of fig. 2, extracts local features by uniformly partitioning the feature map into blocks of uniform size, as in PCB. From the feature map generated by the whole ResNet50 backbone, global average pooling GAP and global max pooling GMP each produce a 2048 × 6 × 1 feature map; these are flattened into 12288-dimensional feature vectors P_Avg and P_Max and superposed to obtain a new feature vector P of size 2048 × 6 × 1. Six 2048-dimensional local feature vectors are extracted from P by uniform partitioning into horizontal stripes of uniform size, and six classifiers Classifier_Part are set up, one per local feature vector. In each classifier, the first linear layer and the BN layer produce a corresponding 512-dimensional vector P_i_Down, and the last linear layer then yields 6 classification prediction scores, one per block. Likewise, P_Avg and P_Max enter the triplet-loss calculation, and the 6 classification prediction scores are computed with label-smoothed Softmax cross-entropy loss.
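The following PyTorch sketch illustrates the structural points above: the ResNet50 backbone with the Layer4 downsampling stride set to 1, and the classifier pattern (GAP and GMP added together, then a linear layer, a BN layer and a final linear layer) that each branch reuses. Class and variable names are illustrative, not from the patent.

```python
# A minimal sketch of the backbone and per-branch head described above;
# module names (BranchHead, GPNBackbone) are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class BranchHead(nn.Module):
    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.down = nn.Linear(in_dim, 512)       # first linear layer
        self.bn = nn.BatchNorm1d(512)            # BN layer
        self.cls = nn.Linear(512, num_classes)   # last linear layer

    def forward(self, fmap):                     # fmap: (B, C, H, W)
        avg = fmap.mean(dim=(2, 3))              # global average pooling
        mx = fmap.amax(dim=(2, 3))               # global max pooling
        feat = avg + mx                          # e.g. G = G_Avg + G_Max
        down = self.bn(self.down(feat))          # 512-dim embedding, e.g. G_Down
        return avg, mx, down, self.cls(down)     # pooled vectors + ID score

class GPNBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        net = resnet50()
        # last-stride-1 trick: keep the Layer4 spatial resolution
        net.layer4[0].conv2.stride = (1, 1)
        net.layer4[0].downsample[0].stride = (1, 1)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool,
                                  net.layer1, net.layer2)
        self.layer3 = net.layer3   # feeds the Middle branch (1024 channels)
        self.layer4 = net.layer4   # feeds the Global and Part branches (2048)

    def forward(self, x):
        mid = self.layer3(self.stem(x))
        return mid, self.layer4(mid)
```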
During training, the network parameters are optimized with the combination of the losses generated by the three branches. The two triplet losses of each branch are added to give the total loss L_T, and the classification losses, 1 from the Global branch, 1 from the Middle branch and 6 from the Part branch, are added to give the total loss L_ID. The final overall loss of the network is computed as in formula (1):
L = L_T/6 + L_ID/8 (1)
where the divisor 6 reflects that the three branches contribute 6 triplet losses in total, i.e. the average triplet loss is used, and likewise the divisor 8 reflects that 8 classification losses are collected, i.e. the average classification loss is used. At test time, G_Down from the Global branch, M_Down from the Middle branch and the six 512-dimensional P_i_Down from the Part branch, 8 vectors in total, are concatenated into a 4096-dimensional vector used as the feature expression for similarity comparison, as sketched below.
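A minimal sketch of formula (1) and of the test-time feature concatenation, assuming the per-branch losses and 512-dimensional embeddings have been collected as described above; function names are illustrative.

```python
import torch

def gpn_total_loss(triplet_losses, id_losses):
    """triplet_losses: 6 values (2 per branch); id_losses: 8 values
    (1 Global + 1 Middle + 6 Part)."""
    L_T = sum(triplet_losses)    # total triplet loss
    L_ID = sum(id_losses)        # total classification loss
    return L_T / 6 + L_ID / 8    # formula (1)

def gpn_test_feature(g_down, m_down, part_downs):
    """Concatenate the 8 x 512-dim embeddings into the 4096-dim vector
    used for similarity comparison at test time."""
    return torch.cat([g_down, m_down, *part_downs], dim=1)
```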
In summary, the first cow face re-identification network GPN is characterized as follows: it not only combines high-dimensional global features with local features but also extracts middle-dimensional features to strengthen the discrimination of the final combined feature expression; global average pooling and global max pooling are applied simultaneously to the feature maps extracted by the network, and the two pooled results are superposed to strengthen the network's expression; and the network is optimized with both a classification loss and a triplet loss, where label smoothing on the classification loss improves the network's discriminative ability and the triplet loss optimizes the intra-class distribution and inter-class gaps, the two losses together effectively improving performance.
The performance experiments on the first cow face re-identification network GPN of this embodiment are as follows:
1) Performance test with a single picture per identity.
First, a performance test was run with Gallery1, a gallery set containing only 1 picture per category. ResNet50, the backbone, serves as the Baseline; MiddleNet, the inspiration for extracting the network's middle-dimensional features, and PCB, which uses the same mechanism of uniform partitioning into uniformly sized blocks to obtain local features, are compared against the GPN proposed in this embodiment.
The results are shown in Table 1: GPN reaches 87.2% Rank-1 accuracy and 89.1% mAP. On these two indexes, GPN is 0.3% higher in Rank-1 accuracy and 1.6% lower in mAP than the ResNet50 Baseline, 4.4% and 3.6% higher than MiddleNet, and 1.8% higher in mAP than PCB. Compared with the Baseline, MiddleNet and PCB, GPN not only integrates the characteristics and advantages of these networks but combines them organically to obtain a proper recognition effect.
Compared with MGN, the table shows that GPN and MGN are nearly indistinguishable in Rank-5 and Rank-10 accuracy, while MGN exceeds GPN on the retrieval-ranking index mAP; this is attributable to MGN's use of the Re-ranking algorithm during testing. The Re-ranking algorithm adds the query pictures into the original ranking and re-ranks the pictures at the front of it, which effectively improves accuracy and mAP.
GPN achieves the same performance without this algorithm, and is even a bit better in Rank-1 accuracy, so GPN can be considered the better choice of network compared with MGN.
TABLE 1 comparison of test results for GPN on Gallery1 with other networks
2) Performance test with multiple pictures per identity.
Table 2 shows the Rank-1 accuracy and mAP of further GPN tests on Gallery2 to Gallery20, together with the Gallery1 performance for comparison. The table shows that the recognition performance of GPN gradually increases with the number of pictures in the gallery set, while the retrieval accuracy, i.e. the mAP, stays within a variation range of ±1%; a balance between picture count and recognition performance is reached when the gallery set grows to 4 and 5 pictures, consistent with the analysis in the previous experiment.
Comparing Table 2 with the performance of MiddleNet, PCB and MGN shows that even as the number of pictures in the gallery set increases, GPN still leads in Rank-1 accuracy on the same gallery set; in mAP, GPN leads all networks except MGN, and GPN trails MGN slightly in mAP because, as noted above, MGN optimizes its ranking with the Re-ranking algorithm.
TABLE 2 Rank-1 accuracy and mAP of GPN on Gallery1 to Gallery20
3) Model simplification (ablation) tests were also performed on the Gallery1 gallery set, i.e. some of the modules in the model were removed to see whether the model's performance changed.
According to Occam's razor, if the same effect can be achieved by a simple method and a complex one, the simple method should be chosen. Thus, to test the effect of using two different global poolings, two variants were tested, one keeping only global average pooling and one keeping only global max pooling, with the other pooling removed from the network.
To test how extracting the network's middle-dimensional features affects the result, the Middle branch was removed, i.e. a network with only the Global branch and the Part branch was tested. And since SE-ResNet50 performs well, it was tried as a replacement for the ResNet50 backbone in GPN, with the performance test likewise done on Gallery1.
The results of these modified tests are shown in Table 3 and compared with the original GPN. On the two indexes of Rank-1 accuracy and mAP, the network using only global average pooling drops by 2.8% and 2.5% respectively, the network using only global max pooling drops by 3.4% and 3.1%, and the network with only the Global branch and the Part branch drops by 1.1% and 1.0%; the GPN whose backbone is replaced with SE-ResNet50 reaches only 85.8% and 87.8%, a gap of 1.4% and 1.3% from the original GPN.
The comparison of these experimental effects shows that the mechanisms and modules in GPN all have their function and are all indispensable, and that they can be integrated to improve the performance of the whole network. As for replacing the backbone: although SE-ResNet50 on its own re-identifies better than the basic ResNet50 thanks to the SE modules inserted in its structure, it performs worse when used as the backbone here, because the SE modules improve the discrimination of the global feature expression but not of the local feature expression, so the final combined feature does not improve and its effect is in fact worse.
TABLE 3 Model simplification test of GPN on Gallery1 and backbone replacement
In sum, the performance exhibited by GPN is excellent and desirable. Compared with networks of similar structure and strategy such as MiddleNet, PCB and MGN, GPN not only leads on gallery sets with a sufficient number of pictures but also achieves the best effect on gallery sets with fewer. The performance comparison in the model simplification test likewise illustrates the necessity of each module in GPN and shows that GPN combines these modules and structures organically, thereby better improving the recognition effect and the retrieval-ranking capability. All of the above are the strengths of GPN.
S104, improving the local feature extraction of the first cow face re-identification network to obtain a second cow face re-identification network.
And S105, training the second cow face re-identification network with the cow face data set to obtain a third cow face re-identification network.
In this embodiment, on the basis of the GPN proposed as the first cow face re-identification network, the Part branch is improved: the original strategy of obtaining local regions by uniform partitioning is replaced with attention-mechanism STN modules that extract the local regions. The improved network is called Global and Part Network with Spatial Transform, abbreviated GPN-ST, and serves as the second cow face re-identification network.
Regarding the second cow face re-identification network GPN-ST: although the uniform partitioning into uniformly sized blocks used in the Part branch of GPN is simple, effective and performs well in the experimental tests, it has the drawbacks that the blocks of different pictures may not align well and that a region carrying complete local information may be split apart. The improved network GPN-ST is therefore proposed, with the aim of extracting the informative and discriminative parts of the cow face; its structure and flow are shown in fig. 3. GPN-ST keeps the same backbone, Global branch and Middle branch as GPN but improves the Part branch that extracts local regions: the 6 horizontal stripes obtained by the original uniform partitioning are replaced by 4 STN modules that extract regions from the picture, with the intent of extracting highly discriminative parts of the cow face. On the improved Part branch, the feature map generated by the whole ResNet50 backbone first passes through the 4 STNs, which each extract a salient information region to form 4 new feature maps; each new feature map then produces two 2048-dimensional feature vectors through global average pooling and global max pooling, which are superposed into a new 2048-dimensional vector. Four classifiers Classifier_Part with the same three-layer structure are set up, and the 2048-dimensional vector generated by each STN branch is sent into its own classifier: the first linear layer and BN reduce it to a 512-dimensional feature vector, and the last linear layer yields the classification prediction score, as in the wiring sketched below. During training, the classification prediction scores of the 4 STN modules of the Part branch are computed with label-smoothed Softmax cross-entropy loss; during testing, as in the original GPN, the 6 512-dimensional feature vectors of the Global, Middle and Part branches are concatenated into a 3072-dimensional long vector that finally serves as the feature expression for comparison.
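The wiring of the improved Part branch can be sketched as follows. STNModule stands for the attention module detailed below (a version of it is sketched after the grid-generator description), BranchHead for the classifier pattern sketched earlier; both names are assumptions rather than patent terminology.

```python
import torch.nn as nn

class STPartBranch(nn.Module):
    """Improved Part branch: four STN modules, each with its own classifier."""
    def __init__(self, num_classes, num_stn=4, channels=2048):
        super().__init__()
        self.stns = nn.ModuleList(STNModule(channels) for _ in range(num_stn))
        self.heads = nn.ModuleList(BranchHead(channels, num_classes)
                                   for _ in range(num_stn))

    def forward(self, fmap):                  # fmap from the ResNet50 Layer4
        outputs = []
        for stn, head in zip(self.stns, self.heads):
            region, theta = stn(fmap)         # attended local region + STN params
            avg, mx, down, score = head(region)
            outputs.append((down, score, theta))  # 512-dim feature + ID score
        return outputs                        # 4 entries, one per STN branch
```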
As shown in fig. 4, the STN module of this embodiment can be divided into three parts: a localization network (Localisation Network), a grid generator (Grid Generator) and a sampler (Sampler).
The localization network regresses the transformation parameters θ: given the input feature map, it outputs the spatial transformation parameters after a series of fully connected or convolutional layers followed by a regression layer. The localization network designed here is shown in fig. 5: first two successive convolutional layers and a global average pooling layer, then regression through two fully connected layers; in particular, every convolutional and fully connected layer in this localization network except the last linear layer is followed by a BN operation, in order to keep the data normalized throughout.
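A sketch of such a localization network is given below, assuming it regresses the four constrained parameters (sx, sy, tx, ty) introduced further on; the channel widths and the near-identity initialization are assumptions.

```python
import torch
import torch.nn as nn

class LocalizationNet(nn.Module):
    """Two conv layers + GAP + two FC layers, BN after every layer except
    the final linear one, as described in the text; widths are assumed."""
    def __init__(self, in_channels=2048, hidden=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),             # global average pooling
        )
        self.fc1 = nn.Sequential(nn.Linear(128, hidden),
                                 nn.BatchNorm1d(hidden), nn.ReLU(inplace=True))
        self.fc2 = nn.Linear(hidden, 4)          # regresses (sx, sy, tx, ty)
        # assumed init: half-scale, centered, consistent with the alpha = 0.5
        # range constraint described later
        nn.init.zeros_(self.fc2.weight)
        self.fc2.bias.data = torch.tensor([0.5, 0.5, 0.0, 0.0])

    def forward(self, fmap):
        h = self.features(fmap).flatten(1)
        return self.fc2(self.fc1(h))             # transformation parameters θ
```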
The grid generator is the sampling grid constructed from the transformation parameters θ; it obtains each output point by sampling a transformed point of the input image, i.e. the correspondence between a point (x_i^s, y_i^s) of the input feature map and a point (x_i^t, y_i^t) of the output feature map is as in formula (2):
(x_i^s, y_i^s)^T = A_θ · (x_i^t, y_i^t, 1)^T (2)
where A_θ is the 2 × 3 transformation matrix of the 2D affine transformation generated by the localization network;
the sampler takes the sampling grid and the input feature map as inputs at the same time and generates the output feature map; the grid generator and the sampler can be implemented directly with the grid-generation and sampling functions available in Python deep learning libraries, as sketched below.
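A sketch of the grid generator and sampler, built on PyTorch's affine_grid and grid_sample and combined with the LocalizationNet sketched above into a single STN module; the output region size is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STNModule(nn.Module):
    """Localization network + grid generator + sampler (cf. fig. 4)."""
    def __init__(self, channels=2048, out_size=(8, 8)):
        super().__init__()
        self.loc = LocalizationNet(channels)   # sketched above
        self.out_size = out_size               # assumed region resolution

    def forward(self, fmap):                   # fmap: (B, C, H, W)
        params = self.loc(fmap)                # (B, 4): sx, sy, tx, ty
        sx, sy, tx, ty = params.unbind(dim=1)
        zeros = torch.zeros_like(sx)
        # constrained 2x3 affine matrix [[sx, 0, tx], [0, sy, ty]] of formula (3)
        theta = torch.stack([torch.stack([sx, zeros, tx], dim=1),
                             torch.stack([zeros, sy, ty], dim=1)], dim=1)
        grid = F.affine_grid(theta, [fmap.size(0), fmap.size(1), *self.out_size],
                             align_corners=False)       # grid generator
        region = F.grid_sample(fmap, grid, align_corners=False)  # sampler
        return region, params
```

Under this convention, for example, sx = sy = 0.5 with (tx, ty) = (0.25, -0.3) samples a half-width, half-height region centered at (0.25, -0.3) of the input, matching the coordinate description below.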
In the STN module used in GPN-ST, several restrictions are imposed on the transformation parameter matrix generated by the localization network. First, its form is restricted to the 2 × 3 2D affine transformation matrix of formula (3):
A_θ = [ sx 0 tx ; 0 sy ty ] (3)
That is, the original transformation matrix parameters θ_12 and θ_21 are fixed to 0, so the matrix performs only scaling and translation, ensuring that the spatial information of the extracted region stays consistent with the original salient content. The parameters sx and sy in the matrix are scaling parameters whose values give the ratio of the length and width of the transformed region on the original image to the length and width of the original image. The parameters tx and ty are translation parameters whose values give the coordinates of the center point of the transformed region on the original image. This is because in the 2D affine transformation the original image is treated as a coordinate system with the center point as origin (0, 0), rightward and downward as the positive directions, and the image boundary at ±1. When the matrix is applied, the length and width of the original image are therefore first scaled to sx and sy times their size, with the region's center still at (0, 0); the region's center is then moved to (tx, ty) in the coordinate system of the original image; and finally a new output image is extracted, with the transformed region's center as its center.
Constraints are then applied to the parameters in the matrix, here in the form of losses that correct and limit them during learning. The first is the range limit on the scaling parameters sx and sy, shown in formula (4):
L_SS = (max(|sx| - α, 0))^2 + (max(|sy| - α, 0))^2 (4)
where α is a limiting parameter, set to 0.5 in this experiment; this formula confines the values of sx and sy to the interval [-0.5, 0.5]. The second is the positive-value constraint on the scaling parameters sx and sy, shown in formula (5):
L_SP = max(0, β - sx) + max(0, β - sy) (5)
where β is a limiting parameter, set to 0.1 in the experiment; i.e. the values of sx and sy must not be smaller than 0.1 and can only be positive, preventing the picture from being mirrored by the scaling transformation. The third is the range limit on the translation parameters tx and ty, shown in formula (6):
L_TS = (max(|tx| - γ, 0))^2 + (max(|ty| - γ, 0))^2 (6)
where γ is a limiting parameter set in this experiment. This constraint exists because the transformed region may otherwise fall partially outside the original picture, which would affect subsequent recognition; limiting the center of the translated region to a certain range in the coordinate system of the original picture reduces the occurrence of this situation. The fourth is the mutual-exclusion constraint on the translation parameters tx and ty, shown in formula (7):
L_TE = Σ_{1≤i<j≤N} max(0, δ - ((tx_i - tx_j)^2 + (ty_i - ty_j)^2)) (7)
where δ is a limiting parameter, set to 0.4 in this experiment, and N is the number of STN module branches used; the formula requires the squared distance between the center points of the regions extracted by the STNs to be no less than δ, so that the regions mutually suppress one another and each STN module extracts a different part of the image, as sketched below.
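A sketch of the four constraint losses (4)-(7) follows, with α, β and δ set to the values quoted in the text; γ is left as an argument because its value is not reproduced here, and the pairwise form of L_TE is a reconstruction from the description above.

```python
import torch

def stn_constraint_losses(params, alpha=0.5, beta=0.1, gamma=0.5, delta=0.4):
    """params: (N, 4) rows of (sx, sy, tx, ty), one per STN branch.
    gamma's default is a placeholder; its value is not given in the text."""
    sx, sy, tx, ty = params.unbind(dim=1)
    # (4) range limit on the scaling parameters, summed over the N branches
    L_SS = (torch.clamp(sx.abs() - alpha, min=0).pow(2)
            + torch.clamp(sy.abs() - alpha, min=0).pow(2)).sum()
    # (5) positive-value constraint on the scaling parameters
    L_SP = (torch.clamp(beta - sx, min=0) + torch.clamp(beta - sy, min=0)).sum()
    # (6) range limit on the translation parameters
    L_TS = (torch.clamp(tx.abs() - gamma, min=0).pow(2)
            + torch.clamp(ty.abs() - gamma, min=0).pow(2)).sum()
    # (7) mutual exclusion: squared distance between any two region centers
    # should be no less than delta (reconstructed pairwise hinge form)
    centers = torch.stack([tx, ty], dim=1)              # (N, 2)
    d2 = torch.cdist(centers, centers).pow(2)           # pairwise squared distances
    n = params.size(0)
    pair_mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    L_TE = torch.clamp(delta - d2[pair_mask], min=0).sum()
    return L_SS, L_SP, L_TS, L_TE
```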
Finally, these loss formulas are combined into the overall constraint loss on the STN-generated parameter matrix, shown in formula (8):
L_STN = L_SS + λ_1·L_SP + L_TS + λ_2·L_TE (8)
where λ_1 and λ_2 are both set to 0.1 in this embodiment.
Therefore, based on the final overall loss of GPN, the final overall loss of the improved GPN-ST is as shown in formula (9):
L = L_T/4 + L_ID/6 + L_STN (9)
where L_T and L_ID are the total triplet loss and the total classification loss respectively; since the improved Part branch uses no triplet loss, the divisor of the average triplet loss is 4, and likewise, since only 4 STN modules are used in the Part branch, the divisor of the average classification loss is 6.
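A sketch of the combined constraint loss of formula (8) and the overall GPN-ST loss of formula (9), reusing stn_constraint_losses from the sketch above, with λ1 = λ2 = 0.1 as stated:

```python
def gpn_st_total_loss(triplet_losses, id_losses, stn_params,
                      lambda1=0.1, lambda2=0.1):
    """triplet_losses: 4 values (2 each from Global and Middle);
    id_losses: 6 values (1 Global + 1 Middle + 4 STN/Part)."""
    L_SS, L_SP, L_TS, L_TE = stn_constraint_losses(stn_params)
    L_STN = L_SS + lambda1 * L_SP + L_TS + lambda2 * L_TE    # formula (8)
    return sum(triplet_losses) / 4 + sum(id_losses) / 6 + L_STN  # formula (9)
```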
During training on the cow face data set, the classification prediction scores of the 4 STN modules of the Part branch are computed with label-smoothed Softmax cross-entropy loss. During testing, as in the original GPN, the 6 512-dimensional feature vectors of the Global, Middle and Part branches are concatenated into a 3072-dimensional long vector that finally serves as the feature expression for comparison, yielding a trained improved network capable of re-identifying cow faces.
Table 4 shows the Rank-1 accuracy and mAP of GPN-ST tested on gallery sets of 8 different picture counts: Gallery1, 2, 3, 4, 5, 10, 15 and 20. The table shows that the recognition performance of GPN-ST gradually increases with the number of pictures in the gallery set, the retrieval accuracy, i.e. the mAP, stays within a variation range of ±1%, and a balance between picture count and recognition performance is reached when the gallery set grows to 4 and 5 pictures, consistent with the analysis in the previous experiment.
Comparing against the GPN results on the same 8 gallery sets: on the Gallery1 set, GPN-ST reaches 90.0% Rank-1 accuracy and 91.3% mAP, 2.8% and 2.2% higher than GPN respectively; and on the gallery sets with progressively more pictures, GPN-ST always outperforms GPN on the same set. It can therefore be concluded that the improved GPN-ST improves on the original GPN.
TABLE 4 Rank-1 accuracy and mAP of GPN-ST on Gallery1 to Gallery20
The reason for the performance improvement can be seen in the visualization of the regions extracted by GPN-ST, shown in fig. 6. For an input cow face picture, the branches of the 4 STN modules extract, in order, local regions of the cow's left face, right face, left ear and forehead, and right ear and forehead, all of which are information-rich, discriminative parts for the subsequent recognition task. Moreover, across different cow face pictures the same STN module extracts the same cow face part, i.e. the local-region alignment problem is solved. Compared with uniform partitioning into uniformly sized blocks, this solves local-region alignment and preserves the integrity of the regions' local information; and compared with the candidate-region selection mechanism of NTS-Net, it also avoids excessive overlap between extracted regions. These are the successes of the Part branch of GPN-ST.
GPN identifies well because it integrates the advantages of each network, combining local features with middle- and high-dimensional global features and optimizing the network with global average and max pooling together with classification loss and triplet loss. This can be seen in the comparison of the model simplification results in Table 3: on the two indexes of Rank-1 accuracy and mAP, the network using only global average pooling drops by 2.8% and 2.5% respectively, the network using only global max pooling drops by 3.4% and 3.1%, and the network with only the Global branch and the Part branch drops by 1.1% and 1.0%.
The uniform partitioning used in the Part branch of GPN to obtain local parts as uniformly sized blocks is simple and effective and performs well in the experimental tests, but it has the drawbacks that the blocks of different pictures may not align well and that a region carrying complete local information may be split apart. GPN-ST inherits the advantages of GPN, and its improvement of the Part branch produces the performance gain seen in the comparison in the previous section. The visualization of the regions extracted by GPN-ST in fig. 6 shows that: for an input cow face picture, the branches of the 4 STN modules of GPN-ST extract, in order, local regions of the cow's left face, right face, left ear and forehead, and right ear and forehead, all information-rich, discriminative parts for the subsequent recognition task; across different cow face pictures, the same STN module extracts the same cow face part, i.e. the local-region alignment problem is solved; and compared with uniform partitioning into uniformly sized blocks, this solves local-region alignment and preserves the integrity of the regions' local information, while requiring no additional supervision information for training. These are the successes of GPN-ST's improvement of the Part branch and the reasons for its further performance gain.
In summary, the basic GPN achieves good performance, and the improved GPN-ST achieves a further performance gain, its improved Part branch extracting better local-region information, which is the main reason the improvement succeeds. The network obtained after training GPN-ST is used as the third cow face re-identification network.
And S106, acquiring a cow face image of the cow to be recognized.
And S107, inputting the cow face image of the cow to be identified into a third cow face re-identification network to realize re-identification of the cow face image of the cow to be identified.
Those skilled in the art will appreciate that all or part of the steps in the method for implementing the above embodiments may be implemented by a program to instruct associated hardware, and the corresponding program may be stored in a computer-readable storage medium.
It should be noted that although the method operations of the above-described embodiments are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the depicted steps may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Example 2:
As shown in fig. 7, the present embodiment provides a deep learning-based cow face re-identification system comprising a first obtaining unit 701, a building unit 702, an improving unit 703, a training unit 704, a second obtaining unit 705 and a re-identification unit 706, with the following specific functions:
a first acquiring unit 701, configured to acquire a cow face data set;
a constructing unit 702, configured to construct a first cow face re-identification network combining global features and local features;
an improving unit 703, configured to improve the local feature extraction of the first cow face re-identification network to obtain a second cow face re-identification network;
a training unit 704, configured to train the second cow face re-identification network with the cow face data set to obtain a third cow face re-identification network;
a second obtaining unit 705, configured to obtain a cow face image of a cow to be identified;
and the re-identification unit 706 is used for inputting the cow face image of the cow to be identified into a third cow face re-identification network to realize re-identification of the cow face image of the cow to be identified.
For the specific implementation of each unit in this embodiment, refer to embodiment 1, which is not repeated here. It should be noted that the system provided in this embodiment is illustrated only by the division of the above functional units; in practical applications, the above functions may be distributed among different functional units as needed, i.e. the internal structure may be divided into different functional units to complete all or part of the functions described above.
Example 3:
the present embodiment provides a computer apparatus, which is a computer, as shown in fig. 8, and includes a processor 802, a memory, an input device 803, a display 804 and a network interface 805 connected by a system bus 801, the processor is used for providing calculation and control capability, the memory includes a nonvolatile storage medium 806 and an internal memory 807, the nonvolatile storage medium 806 stores an operating system, computer programs and a database, the internal memory 807 provides an environment for the operation of the operating system and the computer programs in the nonvolatile storage medium, and when the processor 802 executes the computer programs stored in the memory, the cow face re-identification method of the above embodiment 1 is implemented, as follows:
acquiring a cow face data set;
constructing a first cow face re-identification network combining global features and local features;
improving the local feature extraction of the first cow face re-identification network to obtain a second cow face re-identification network;
training the second cow face re-identification network with the cow face data set to obtain a third cow face re-identification network;
acquiring a cow face image of a cow to be identified;
and inputting the cow face image of the cow to be identified into a third cow face re-identification network to realize re-identification of the cow face image of the cow to be identified.
Example 4:
the present embodiment provides a storage medium, which is a computer-readable storage medium, and stores a computer program, and when the computer program is executed by a processor, the method for identifying the cow face weight of the cow in embodiment 1 is implemented as follows:
acquiring a cow face data set;
constructing a first cow face re-identification network combining global features and local features;
improving the local feature extraction of the first cow face re-identification network to obtain a second cow face re-identification network;
training the second cow face re-identification network with the cow face data set to obtain a third cow face re-identification network;
acquiring a cow face image of a cow to be identified;
and inputting the cow face image of the cow to be identified into a third cow face re-identification network to realize re-identification of the cow face image of the cow to be identified.
It should be noted that the computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In conclusion, the invention provides a cow face re-identification network that not only combines high-dimensional global features and local features but also extracts middle-dimensional features to strengthen the discrimination of the final combined feature expression. In addition, global average pooling and global max pooling are applied simultaneously to the feature maps extracted by the network, and the two pooled results are superposed to strengthen the network's expression. The network is optimized with both a classification loss and a triplet loss: label smoothing on the classification loss further improves the network's discriminative ability, while the triplet loss optimizes the intra-class distribution and inter-class gaps, improving the recognition accuracy and retrieval-ranking capability for cow faces. While inheriting the characteristics and advantages of GPN, the network's local-feature-extraction branch is improved: extracting local features with STN modules replaces the original uniform partitioning into blocks, extracting local regions of higher discrimination while solving the problems of alignment and information integrity of the local regions.
The above description is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any equivalent implementation or modification that does not depart from the present invention belongs to the protection scope of the present invention.

Claims (10)

1. A deep learning-based cow face re-identification method, characterized by comprising the following steps:
acquiring a cow face data set;
constructing a first cow face re-identification network combining global features and local features;
improving the local feature extraction of the first cow face re-identification network to obtain a second cow face re-identification network;
training the second cow face re-identification network with the cow face data set to obtain a third cow face re-identification network;
acquiring a cow face image of a cow to be identified;
and inputting the cow face image of the cow to be identified into a third cow face re-identification network to realize re-identification of the cow face image of the cow to be identified.
2. The cow face re-identification method according to claim 1, wherein the first cow face re-identification network uses ResNet50 as the backbone network and comprises three branches, namely a Global branch, a Middle branch and a Part branch;
the Global branch is used for extracting the high-dimensional global features produced by the whole backbone network;
the Middle branch is used for extracting the middle-dimensional global features of the backbone network;
the Part branch is used for extracting local features from uniformly sized blocks obtained by uniform partitioning.
3. The cow face re-identification method according to claim 2, wherein the local feature extraction of the first cow face re-identification network is improved to obtain the second cow face re-identification network, specifically comprising:
for the Part branch, adopting attention-mechanism STN modules in place of uniform partitioning, and arranging four STN modules to extract different local feature information of the cow face, obtaining the second cow face re-identification network;
wherein each STN module comprises a localization network, a grid generator and a sampler, the localization network regresses the transformation parameters θ and the transformation parameter matrix generated by the localization network is limited; the grid generator is a sampling grid constructed from the transformation parameters θ, and the sampler takes the sampling grid and the input feature map as inputs at the same time to generate the output feature map.
4. The cow face re-identification method according to claim 3, wherein the training of the second cow face re-identification network by using the cow face data set specifically comprises:
inputting a cow face image from the cow face data set into the second cow face re-identification network to generate a feature map;
performing global average pooling and global maximum pooling on the feature map to generate a first feature vector and a second feature vector with the same dimensionality;
adding the first feature vector and the second feature vector to obtain a third feature vector;
sending the third feature vector into a classifier, and generating a fourth feature vector through a first linear layer and a BN layer;
passing the fourth feature vector through a last linear layer to obtain the classification prediction scores;
in the training process, a combination of the losses generated by the Global branch, the Middle branch and the Part branch is used for optimizing the network parameters.
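A per-branch head implementing these steps might look like the following PyTorch sketch (the embedding dimension and whether the reduction layer is shared across branches are assumptions):

```python
import torch.nn as nn

class BranchHead(nn.Module):
    """Claim-4 head: GAP + GMP, sum, linear -> BN (embedding), linear (ID scores)."""
    def __init__(self, in_channels: int, embed_dim: int, num_ids: int):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.gmp = nn.AdaptiveMaxPool2d(1)
        self.reduce = nn.Linear(in_channels, embed_dim)  # first linear layer
        self.bn = nn.BatchNorm1d(embed_dim)              # BN layer
        self.classifier = nn.Linear(embed_dim, num_ids)  # last linear layer

    def forward(self, feat_map):
        # First and second feature vectors (same dimensionality), then their sum.
        v = self.gap(feat_map).flatten(1) + self.gmp(feat_map).flatten(1)
        emb = self.bn(self.reduce(v))        # fourth feature vector
        return emb, self.classifier(emb)     # embedding + classification scores
```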
5. The cow face re-identification method according to claim 4, wherein the combination of the losses generated by the Global branch, the Middle branch and the Part branch is used for optimizing the network parameters, and is calculated as follows:

$$L = \frac{L_T}{6} + \frac{L_{ID}}{8}$$

where the divisor 6 indicates that the three branches contribute 6 triplet losses in total, i.e. the average triplet loss is used; the divisor 8 indicates that 8 classification losses are collected in total, i.e. the average classification loss is used; $L_T$ denotes the total obtained by summing the 2 triplet losses in each branch; and $L_{ID}$ denotes the total obtained by summing the classification losses, namely 1 each from the Global branch and the Middle branch and 6 from the Part branch.
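As a sketch of how these terms might be assembled in code (the list layout is an assumption; only the divisors come from the claim):

```python
def combined_loss(triplet_losses, id_losses):
    """L = L_T/6 + L_ID/8, with 6 triplet terms (2 per branch) and
    8 classification terms (1 Global + 1 Middle + 6 Part)."""
    assert len(triplet_losses) == 6 and len(id_losses) == 8
    return sum(triplet_losses) / 6 + sum(id_losses) / 8
```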
6. The cow face re-identification method according to claim 3, wherein the grid generator maps each point $(x_i^t, y_i^t)$ of the output feature map to the corresponding sampling point $(x_i^s, y_i^s)$ of the input feature map, with the correspondence given by:

$$\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix} = \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix} \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}$$

wherein $A_\theta$, the rightmost $2 \times 3$ matrix, is the 2D affine transformation parameter generated by the localization network.
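A quick numeric check of this mapping (the matrix values are arbitrary illustrative choices, not values from the patent):

```python
import torch

# With scale 0.5 and translation (0.2, -0.1), the output point (1, 1)
# samples from the input point (0.5 + 0.2, 0.5 - 0.1) = (0.7, 0.4).
a_theta = torch.tensor([[0.5, 0.0, 0.2],
                        [0.0, 0.5, -0.1]])
target = torch.tensor([1.0, 1.0, 1.0])  # homogeneous (x_t, y_t, 1)
print(a_theta @ target)                 # tensor([0.7000, 0.4000])
```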
7. The cow face re-identification method according to claim 3, wherein the transformation parameter matrix generated by the localization network is subjected to several restrictions, specifically comprising:
the transformation parameter matrix is constrained in form to a $2 \times 3$ 2D affine transformation matrix containing only scaling and translation terms:

$$A_\theta = \begin{bmatrix} s_x & 0 & t_x \\ 0 & s_y & t_y \end{bmatrix}$$
constraining each parameter in the transformation parameter matrix as follows:
the range of the scaling parameters $s_x$ and $s_y$ is limited as follows:

$$L_{SS} = \left(\max(|s_x| - \alpha,\, 0)\right)^2 + \left(\max(|s_y| - \alpha,\, 0)\right)^2$$
the positive-value constraint on the scaling parameters $s_x$ and $s_y$ is as follows:

$$L_{SP} = \max(0,\, \beta - s_x) + \max(0,\, \beta - s_y)$$
the range limitation on the translation parameters $t_x$ and $t_y$ is as follows:

$$L_{TS} = \left(\max(|t_x| - \gamma,\, 0)\right)^2 + \left(\max(|t_y| - \gamma,\, 0)\right)^2$$
the mutually exclusive constraint on the translation parameters $t_x$ and $t_y$ is as follows:

[formula image not recoverable from the source text: a pairwise mutual-exclusion penalty, with margin parameter $\delta$, applied over the translation parameters $(t_x, t_y)$ of the $N$ STN branches]
where α, β, γ, δ are limiting parameters, and N is the number of STN module branches used.
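The recoverable constraints translate directly into code; this PyTorch sketch assumes batched parameter tensors and illustrative default limits, and omits the mutual-exclusion term because its exact form is not recoverable from the text above:

```python
import torch

def stn_constraints(sx, sy, tx, ty, alpha=1.0, beta=0.1, gamma=1.0):
    """Claim-7 penalties on STN affine parameters (alpha/beta/gamma are
    assumed defaults, not values fixed by the patent)."""
    l_ss = (torch.clamp(sx.abs() - alpha, min=0) ** 2 +
            torch.clamp(sy.abs() - alpha, min=0) ** 2)   # scale range limit
    l_sp = (torch.clamp(beta - sx, min=0) +
            torch.clamp(beta - sy, min=0))               # keep scales positive
    l_ts = (torch.clamp(tx.abs() - gamma, min=0) ** 2 +
            torch.clamp(ty.abs() - gamma, min=0) ** 2)   # translation range limit
    return (l_ss + l_sp + l_ts).mean()
```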
8. A cow face re-identification system for dairy cows based on deep learning, the system comprising:
a first acquisition unit, configured to acquire a cow face data set;
a construction unit, configured to construct a first cow face re-identification network combining global features and local features;
an improvement unit, configured to improve the local feature extraction of the first cow face re-identification network to obtain a second cow face re-identification network;
a training unit, configured to train the second cow face re-identification network by using the cow face data set to obtain a third cow face re-identification network;
a second acquisition unit, configured to acquire a cow face image of a cow to be identified;
and a re-identification unit, configured to input the cow face image of the cow to be identified into the third cow face re-identification network to realize re-identification of the cow face image of the cow to be identified.
9. A computer device comprising a processor and a memory for storing a processor-executable program, wherein the processor, when executing the program stored in the memory, implements the cow face re-identification method according to any one of claims 1 to 7.
10. A storage medium storing a program, wherein the program, when executed by a processor, implements the cow face re-identification method according to any one of claims 1 to 7.
CN202111219484.5A 2021-10-20 2021-10-20 Dairy cow face weight identification method, system, equipment and medium based on deep learning Active CN113989836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111219484.5A CN113989836B (en) 2021-10-20 2021-10-20 Dairy cow face weight identification method, system, equipment and medium based on deep learning

Publications (2)

Publication Number Publication Date
CN113989836A (en) 2022-01-28
CN113989836B (en) 2022-11-29

Family

ID=79739529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111219484.5A Active CN113989836B (en) 2021-10-20 2021-10-20 Dairy cow face weight identification method, system, equipment and medium based on deep learning

Country Status (1)

Country Link
CN (1) CN113989836B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292298A (en) * 2017-08-09 2017-10-24 北方民族大学 Ox face recognition method based on convolutional neural networks and sorter model
US20200226421A1 (en) * 2019-01-15 2020-07-16 Naver Corporation Training and using a convolutional neural network for person re-identification
US20210150679A1 (en) * 2019-11-18 2021-05-20 Immervision, Inc. Using imager with on-purpose controlled distortion for inference or training of an artificial intelligence neural network
US20210232813A1 (en) * 2020-01-23 2021-07-29 Tongji University Person re-identification method combining reverse attention and multi-scale deep supervision
CN113221625A (en) * 2021-03-02 2021-08-06 西安建筑科技大学 Method for re-identifying pedestrians by utilizing local features of deep learning
CN113420742A (en) * 2021-08-25 2021-09-21 山东交通学院 Global attention network model for vehicle weight recognition

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
GUANSHUO WANG et al.: "Learning Discriminative Features with Multiple Granularities for Person Re-Identification", arXiv *
XIAOLANG CHEN et al.: "Holstein Cattle Face Re-Identification Unifying Global and Part Feature Deep Network with Attention Mechanism", MDPI *
XU YANG: "Research on Person Re-Identification Algorithms Based on Convolutional Neural Networks", China Master's Theses Full-text Database, Information Science and Technology Series *
LIN WENGEN: "Research on Video Person Re-Identification Based on Two-Stream Multi-Level Attention-Aware Optimization", China Master's Theses Full-text Database, Information Science and Technology Series *
LUO HAO: "Research on Deep-Learning-Based Person Re-Identification Algorithms: From Non-Occlusion to Occlusion", China Doctoral Dissertations Full-text Database, Information Science and Technology Series *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821658A (en) * 2022-05-11 2022-07-29 平安科技(深圳)有限公司 Face recognition method, operation control device, electronic device, and storage medium
CN114821658B (en) * 2022-05-11 2024-05-14 平安科技(深圳)有限公司 Face recognition method, operation control device, electronic equipment and storage medium
CN115862060A (en) * 2022-11-25 2023-03-28 天津大学四川创新研究院 Pig face identification and pig weight identification based pig unique identification method and system
CN115862060B (en) * 2022-11-25 2023-09-26 天津大学四川创新研究院 Pig unique identification method and system based on pig face identification and pig re-identification


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant