CN116311357A - Double-sided identification method for unbalanced bovine body data based on MBN-Transformer model - Google Patents

Double-sided identification method for unbalanced bovine body data based on MBN-Transformer model

Info

Publication number
CN116311357A
Authority
CN
China
Prior art keywords
sampler
image
balance
cow
bovine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310117831.6A
Other languages
Chinese (zh)
Inventor
沈雷
徐运涛
刘浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yicai Tiancheng Zhengzhou Information Technology Co ltd
Original Assignee
Yicai Tiancheng Zhengzhou Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yicai Tiancheng Zhengzhou Information Technology Co ltd filed Critical Yicai Tiancheng Zhengzhou Information Technology Co ltd
Priority to CN202310117831.6A priority Critical patent/CN116311357A/en
Publication of CN116311357A publication Critical patent/CN116311357A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, using rules for classification or partitioning the feature space
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/776 Validation; Performance evaluation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion of extracted features at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/70 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in livestock or poultry

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a double-sided identification method for unbalanced cattle body data based on an MBN-Transformer model. The method raises the sampling rate of under-represented cattle classes by adding a balance sampler, and reduces the overfitting that the balance sampler may cause through a dynamic-fusion mixed-enhancement module. A Vision Transformer is selected as the shared backbone network structure, strengthening the correlation of global information in cattle body images. A balance branch and a conventional branch module are designed with Transformer encoders to process, respectively, the mixed-enhancement image data of the cowshed random sampler and of the milking-passage balance sampler; the output features of the two branches are fused through a dynamic feature balance factor, and the multi-head self-attention mechanism of the Transformer encoder mines the relevance and global information of the cattle body data produced by the dynamic-fusion mixed-enhancement module. The invention solves the poor recognition rate for single-pose cattle images in the milking passage and has a better focusing capability for key features of the cattle.

Description

Double-sided identification method for unbalanced bovine body data based on MBN-Transformer model
Technical Field
The invention relates to the field of biometric recognition and provides a double-sided recognition method for unbalanced bovine body data based on an MBN-Transformer model. In particular, it concerns deep-learning-based biometric recognition in application scenarios where cattle body data are unbalanced.
Background
With the improvement of production levels in our country, the individual-workshop farming mode is gradually disappearing and is being replaced by large-scale, fine-grained farming, so rapid and efficient individual cattle identification has become a key factor in scaling up cultivation. Traditional management adopts ear tags, labeling, radio frequency identification (RFID) and similar means; however, ear tags and labels are easy to forge and may injure the cattle. With the application and development of deep learning in biometric recognition, non-contact identification is unlikely to cause physical injury to cattle and can be deployed rapidly in real scenarios. Deep learning requires no hand-crafted features: it learns feature information directly from the data and extracts features that are more robust to changes in pose, background and illumination. Most existing methods perform cattle body identification with convolutional neural networks, which attend more to local features and capture global features poorly, so their identification performance is limited. Moreover, such models are trained and evaluated on balanced cattle data and their effect is not ideal under sample imbalance, yet the cattle data collected in real acquisition scenarios are usually unbalanced.
The data set is collected in a real scenario, mainly through cameras in the cowshed and in the milking passage. Cattle in the cowshed can move freely, while cattle in the passage are restricted in movement, so the collected cowshed images cover diverse poses and are numerous, whereas the milking-passage images show a single pose and are few. Training therefore tends to be biased toward the cowshed cattle, and accuracy on the milking-passage cattle is poor.
Disclosure of Invention
The invention addresses the unsatisfactory training and identification results caused by the imbalance of cattle body data collected in real application scenarios, and provides a double-sided identification algorithm for unbalanced bovine body data based on MBN-Transformer. The method raises the sampling rate of under-represented cattle classes by adding a balance sampler, reduces the overfitting that the balance sampler may cause through a dynamic-fusion mixed-enhancement module, and adopts a Vision Transformer (ViT) as the backbone network to strengthen the association of global information in cattle body images. A balance branch and a conventional branch module are designed with Transformer encoders to process, respectively, the mixed-enhancement data of the cowshed random sampler and of the milking-passage balance sampler; the output features of the two branches are fused through a dynamic feature balance factor, the fused feature changing with the number of training rounds, which solves the poor recognition rate for single-pose cattle images in the milking passage.
The technical scheme adopted by the invention is as follows:
S1, constructing a data set and dividing it into a training set and a test set.
The data set adopted by the invention consists of cattle body images obtained through a cattle-herd target detection and segmentation network. The divided training set is expanded with translation, rotation and scaling operations to cover different cattle body poses, so as to obtain a model with stronger generalization.
S1-1, acquiring image data of a plurality of cattle classes through the cattle-herd target detection and segmentation network.
S1-2, expanding the training set with translation, rotation and scaling operations, and normalizing the image data of the expanded training set to the same resolution.
S2, designing the MBN-Transformer model.
S2-1, Vision Transformer is selected as the shared backbone network structure to improve the extraction of global information from the cattle body. A conventional branch and a balance branch are designed with Transformer encoders to process, respectively, the mixed-enhancement image data of the cowshed random sampler and of the milking-passage balance sampler. The input features of the two branches are added through a dynamic balance factor, so that the final fused feature tends toward the milking-passage cattle data during training, which solves the poor recognition rate for single-pose cattle images in the milking passage. The random sampler and the balance sampler are designed to control the different types of input images; the dynamic-fusion mixed-enhancement module relieves the overfitting caused by oversampling the single-pose data in the milking passage; and the multi-head self-attention mechanism of the Transformer encoder mines the relevance and global information of the cattle body data produced by the dynamic-fusion mixed-enhancement module.
The MBN-Transformer model comprises a sampler module, a dynamic-fusion mixed-image-enhancement module, a shared backbone network and two branch sub-networks. The sampler module contains a random sampler and a balance sampler. The random sampler samples all training-set samples at random; because the training set contains more cowshed cattle body images, the model tends toward cowshed images, and its output image is X_r. The balance sampler oversamples the milking-passage cattle data and therefore tends toward milking-passage images; its output image is X_p. The two branch sub-networks are a conventional branch and a balance branch. The dynamic-fusion mixed-image-enhancement module dynamically fuses the output images of the random sampler and the balance sampler over the course of training, to prevent oversampling of the single-pose cattle body images in the milking passage; its outputs are X′_r and X′_p. The shared backbone adopts a Vision Transformer, i.e. the ViT model, which consists of patch embedding and several layers of Transformer encoders. Patch embedding splits the image into several identical patches; position codes carrying the cattle body's position information and a learnable global-feature token are added after the split to obtain the input sequence. The input sequence is fed into the Transformer encoder, which outputs the features F_r and F_p.
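As a rough sketch of the ViT-style input pipeline described above (patch splitting, one learnable global-feature token, and additive position codes), the following pure-Python fragment computes the shape of the input sequence. The patch size and embedding width used in the example are illustrative assumptions, not values stated in the patent.

```python
def vit_input_sequence(img_h, img_w, patch, dim):
    """Return (sequence length, embedding width) for a ViT-style backbone.

    The image is split into (img_h // patch) * (img_w // patch) identical
    patches; one learnable global-feature (class) token is prepended, and
    a position code of the same width is added to every token.
    """
    assert img_h % patch == 0 and img_w % patch == 0, "image must tile evenly"
    num_patches = (img_h // patch) * (img_w // patch)
    seq_len = num_patches + 1  # +1 for the global-feature token
    # Position codes: one vector per token position (zeros here stand in
    # for learned parameters).
    pos_codes = [[0.0] * dim for _ in range(seq_len)]
    return seq_len, len(pos_codes[0])

# e.g. a 224x224 cattle image with 16x16 patches -> 196 patches + 1 token
print(vit_input_sequence(224, 224, 16, 768))  # -> (197, 768)
```

Both sampler outputs X_r and X_p pass through this same shared backbone, which is why the two branch features F_r and F_p live in the same feature space and can later be fused.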
The two features F_r and F_p are passed to the conventional branch and the balance branch, respectively. Both branches use a Transformer encoder so as to attend better to global features; their output features are F′_r and F′_p, which are fused according to the following formulas.
l = 1 - (n - 1)/T (1)
F = l × F′_r + (1 - l) × F′_p (2)
where n is the current training round, T is the total number of training rounds, l is the fusion degree of the two output features, and F is the final output feature, which is fed into the classifier. By controlling the fusion of the two features with the number of training rounds, the inputs of the random sampler dominate in the early stages of training; as the number of rounds increases, the focus shifts to the inputs of the balance sampler, thereby increasing the attention paid to the cattle in the milking passage. Finally, cattle body recognition is performed by computing the cosine distance between cattle body features.
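The round-dependent fusion of formulas (1) and (2) can be sketched as follows. The feature vectors are toy values, and the element-wise weighted sum is an assumed reading of how the two branch outputs are combined.

```python
def fusion_degree(n, T):
    """Formula (1): l = 1 - (n - 1) / T for training round n of T total."""
    return 1.0 - (n - 1) / T

def fuse_features(f_r, f_p, n, T):
    """Formula (2): F = l * F'_r + (1 - l) * F'_p, element-wise."""
    l = fusion_degree(n, T)
    return [l * a + (1.0 - l) * b for a, b in zip(f_r, f_p)]

# Early training (n = 1): the conventional-branch (random sampler) feature
# dominates completely.
print(fuse_features([1.0, 1.0], [0.0, 0.0], n=1, T=100))  # -> [1.0, 1.0]
# By the last round the weight has shifted almost entirely to the
# balance-branch feature.
print(round(fusion_degree(100, 100), 2))  # -> 0.01
```

The design choice mirrors curriculum-style rebalancing: the abundant cowshed data shapes the representation first, and the scarce milking-passage data receives increasing weight as training converges.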
S2-2, designing a sampler module:
the traditional sampler is a random sampler, which samples according to the number of samples in each class; the sampling probability of the j-th cattle class under the random sampler is given by formula (3).
P_j^r = S_j / (S_1 + S_2 + ... + S_k) (3)
where k is the total number of classes in the training set, S_i is the number of cattle body images of the i-th class in the training set, and P_j^r is the sampling probability of the j-th class under the random sampler. Formula (3) shows that the sampling probability is higher for the cowshed cattle body images, which have more samples, and lower for the milking-passage cattle body images, which have fewer. Under sample imbalance this may lead to underfitting of the cattle body images in the milking passage.
The invention also adopts a balance sampler; the sampling probability of the j-th cattle class under the balance sampler is given by the following formulas.
W_i = N_max / N_i (4)
P_j^b = W_j / (W_1 + W_2 + ... + W_k) (5)
where k is the total number of classes in the training set, N_max is the largest per-class sample count among all cattle in the training set, N_i is the number of cattle body images of the i-th class, W_i is the sampling weight of the i-th class, and P_j^b is the sampling probability of the j-th class under the balance sampler. Formulas (4) and (5) show that classes with few samples, the cattle in the milking passage, receive more attention, while classes with many samples, the cattle in the cowshed, receive less.
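A minimal sketch of the two sampling distributions in formulas (3) to (5). It assumes the balance weights W_i = N_max / N_i are normalized into probabilities; the per-class image counts are made up for illustration.

```python
def random_sampler_probs(counts):
    """Formula (3): P_j proportional to the class sample count S_j."""
    total = sum(counts)
    return [s / total for s in counts]

def balance_sampler_probs(counts):
    """Formulas (4)-(5): W_i = N_max / N_i, normalized over all classes."""
    n_max = max(counts)
    weights = [n_max / n for n in counts]
    w_sum = sum(weights)
    return [w / w_sum for w in weights]

# A cowshed class with 800 images vs. a milking-passage class with 200:
counts = [800, 200]
print(random_sampler_probs(counts))   # -> [0.8, 0.2]
print(balance_sampler_probs(counts))  # -> [0.2, 0.8]
```

The example shows the inversion at the heart of the design: the random sampler reproduces the imbalance, while the balance sampler gives the minority milking-passage class the larger share of draws. In PyTorch either distribution could feed a `WeightedRandomSampler`, though the patent does not name a specific implementation.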
The invention trains the conventional branch and the balance branch with the random sampler and the balance sampler, respectively: the random sampler draws more cattle body image data from the cowshed, while the balance sampler draws more from the milking passage, so both acquisition modes are taken into account.
S2-3, designing a dynamic fusion mixed image enhancement module:
there are few cattle body image data in the milking passage. Although the balance sampler raises their sampling probability, it easily oversamples them. Mixed-image enhancement generalizes a network model through synthetic samples, but these never return to real image data, so training may fail to converge: the overfitting problem is solved while a new problem is introduced. The invention therefore proposes a dynamic-fusion mixed-image-enhancement module, which expresses the degree of image mixing through a dynamic fusion parameter and reduces overfitting on the milking-passage cattle body images. The formulas are given below.
λ = 1 - (T - n)/(2 × T) (6)
(X′_r, Y′_r) = λ(X_r, Y_r) + (1 - λ)(X_p, Y_p) (7)
(X′_p, Y′_p) = (1 - λ)(X_r, Y_r) + λ(X_p, Y_p) (8)
S3, designing a loss function.
In network training, in order to enhance the expressive ability of the model, a label-smoothing cross-entropy loss is adopted to optimize the network; the calculation formulas are given below.
L = l × L_smoo(y_r_true, y_pred) + (1 - l) × L_smoo(y_p_true, y_pred) (9)
L_smoo(y_p_true, y_pred) = -Σ[(1 - ε) × y_p_true + ε/N] ln y_pred (10)
L_smoo(y_r_true, y_pred) = -Σ[(1 - ε) × y_r_true + ε/N] ln y_pred (11)
where ε is the random noise coefficient, set to 0.1 in this algorithm; l changes with the training round in the manner of formula (1); N is the total number of cattle classes in the training set; y_r_true is the true label of the random sampler after dynamic-fusion mixed-image enhancement; y_p_true is the true label of the balance sampler after dynamic-fusion mixed-image enhancement; y_pred is the predicted class distribution output by the model; L denotes the total loss, and L_smoo is the label-smoothing cross-entropy loss. The formulas show that, as the parameter l decreases during training, the loss function increasingly favors the true label of the balance sampler after dynamic-fusion mixed-image enhancement when optimizing the network.
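The label-smoothing cross entropy can be sketched in plain Python as below. The one-hot labels and predicted distribution are toy inputs; the negative sign in L_smoo follows the usual cross-entropy convention, and the pairing of the weight l with the random-sampler term mirrors the feature fusion of formula (2). Because the machine translation leaves the term order in formula (9) ambiguous, that pairing is an assumption.

```python
import math

def smooth_ce(y_true, y_pred, eps=0.1):
    """Cross entropy against a label smoothed toward the uniform
    distribution: target_i = (1 - eps) * y_i + eps / N."""
    n = len(y_true)
    return -sum(((1.0 - eps) * t + eps / n) * math.log(p)
                for t, p in zip(y_true, y_pred))

def total_loss(y_r_true, y_p_true, y_pred, l, eps=0.1):
    """Blend the two samplers' losses with the factor l (assumed pairing:
    l on the random-sampler term, mirroring formula (2))."""
    return l * smooth_ce(y_r_true, y_pred, eps) + \
           (1.0 - l) * smooth_ce(y_p_true, y_pred, eps)

# With a uniform prediction over N = 4 classes, the smoothed targets sum
# to 1, so the loss reduces to ln 4 regardless of eps or l.
y_pred = [0.25] * 4
loss = total_loss([1, 0, 0, 0], [0, 1, 0, 0], y_pred, l=0.7)
print(abs(loss - math.log(4)) < 1e-9)  # -> True
```

The smoothing term ε/N keeps every class's log-probability in the loss, which discourages the over-confident predictions that an unbalanced training set tends to produce.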
S4, training the whole model, as shown in FIG. 1, until the whole training set is iteratively trained for a plurality of times;
S5, inputting the test-set images into the trained model to extract the features of the cattle images, and using the cosine distance for 1:1 comparison and identification.
The invention has the following beneficial effects:
a double-sided recognition algorithm for unbalanced cattle body data based on the Transformer is provided. The overfitting caused by the balance sampler is reduced through the dynamic-fusion mixed-enhancement module; a conventional branch and a balance branch are designed with Transformer encoders to process, respectively, the mixed-enhancement image data of the cowshed random sampler and of the milking-passage balance sampler; and the input features of the two branches are added through a dynamic balance factor, so that the final feature tends toward the milking-passage cattle body data during training, which solves the imbalance problem of the cattle body images.
Previous identification models, such as LATransformer, attend to some key features of the cattle but not to all of them; the BBN algorithm attends to more key features than LATransformer but still attends to some noise. The present algorithm (MBN-Transformer) both reduces the attention paid to noise and focuses more clearly on the key features of the cattle, so it has a better focusing capability on those features.
Drawings
FIG. 1 is a model diagram of the MBN-Transformer-based double-sided recognition algorithm for unbalanced cattle body data.
FIG. 2 shows the actual placement of the cameras in the cowshed and the milking passage.
Fig. 3 shows different acquisition modes of bovine data sets.
Fig. 4 is a visual presentation of different algorithms.
Fig. 5 is a ROC curve for different algorithms in the test set.
Detailed Description
Specific embodiments of the present invention are described further below with reference to the accompanying drawings.
The MBN-Transformer-based double-sided identification algorithm for unbalanced bovine body data of this embodiment comprises the following steps:
s1, constructing a data set
S1-1, the text data comprise two acquisition modes, wherein one acquisition mode is to shoot and acquire through a camera positioned in a cowshed, place the camera at the top of the cowshed, and install the camera according to a proper distance so as to shoot more cattle and avoid repeated coverage of the same area; the other is shooting and collecting through the cameras positioned in the milking passage, the cameras are placed above the milking passage, and three cameras are placed at the same position so as to shoot the cow from the left side, the right side and the right side. Figure 2 shows the placement of the cowshed and milking cameras. The images of the cow's body taken by the cowshed and milking tunnel are shown in fig. 3.
Cattle in the cowshed can move freely within its bounds, so different angles and poses can be acquired for each animal; cattle in the milking passage can only walk in one direction and cannot turn freely, so only a few angles and poses can be acquired for each animal. A cattle body image data set consisting of 389 cattle in total was acquired at 2 cattle farms through cameras installed in the cowsheds and the milking passages. In the first condition, the cattle walk freely, so the pose differences of the cattle body images are large; 60 cattle were acquired, with 40 to 80 images per animal. In the other condition, the cattle are tethered in the cowshed by ropes and their range of movement is small, so the pose variation is small; 249 cattle were collected, with 20 to 50 images per animal. The cattle in the milking passage pass through quickly with a single pose; 80 cattle were collected, with 20 to 40 images per animal, but there are few effective images and many images share the same pose. The total number of images for the 389 cattle was about 17,000.
25 classes from the cowshed and the first 25 classes from the milking passage, 50 classes in total, are taken together as the test set, which contains about 1,800 images. The remaining images are used as the training set and expanded through image enhancement; after expansion each cowshed class contains 500 to 1,000 cattle body images, but the milking-passage images have a single pose and the enhancement effect is not ideal, so each milking-passage class is enhanced to 100 images. The final training set contains 339 classes and about 130,000 cattle body images in total.
S2, designing the MBN-Transformer-based double-sided identification algorithm for unbalanced bovine body data;
S2-1, sampler module design
The invention simultaneously adopts a random sampler and a balance sampler to train the conventional network branch and the balance network branch, respectively: the random sampler collects more cattle body image data from the cowshed and the balance sampler more from the milking passage, so that both acquisition modes are taken into account. The sampling rate of under-represented cattle classes is raised by adding the balance sampler; the overfitting that the balance sampler may cause is reduced through the dynamic-fusion mixed-enhancement module; a Vision Transformer (ViT) is adopted as the backbone network to strengthen the correlation of global information in cattle body images; and a balance branch and a conventional branch module are designed with Transformer encoders to process, respectively, the mixed-enhancement image data of the cowshed random sampler and of the milking-passage balance sampler, their output features being fused through a dynamic feature balance factor that changes with the number of training rounds, which solves the poor recognition rate for single-pose cattle images in the milking passage.
S2-2, dynamic-fusion mixed image enhancement
Because there are few cattle body image data in the milking passage, the balance sampler can raise their sampling probability but easily oversamples them. Mixed-image enhancement generalizes a network model through synthetic samples, but these never return to real image data, so training may fail to converge: the overfitting problem is solved while a new problem is introduced. The algorithm therefore proposes a dynamic-fusion mixed-image-enhancement module, which expresses the degree of image mixing through a dynamic fusion parameter and reduces overfitting on the milking-passage cattle body images. The formulas are given below.
λ = 1 - (T - n)/(2 × T) (6)
(X′_r, Y′_r) = λ(X_r, Y_r) + (1 - λ)(X_p, Y_p) (7)
(X′_p, Y′_p) = (1 - λ)(X_r, Y_r) + λ(X_p, Y_p) (8)
where X_r and Y_r are the output image and label of the random sampler, X_p and Y_p are the output image and label of the balance sampler, X′_r, Y′_r and X′_p, Y′_p are the mixed images and labels, λ is the mixing degree of the two images, n is the current training round, and T is the total number of training rounds. The parameter λ controls the mixing degree of the two samplers, avoiding overfitting on the milking-passage cattle body images; the mixing degree is reduced gradually during training so that the images finally return to the real cattle body images.
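Formulas (6) to (8) can be sketched as below. The pixel lists stand in for images, and the published formula (6) appears garbled by translation; this sketch assumes λ = 1 - (T - n)/(2T), which rises from 1/2 at the start of training to 1 at the end, matching the stated goal of gradually restoring the real images.

```python
def mix_degree(n, T):
    """Assumed formula (6): lambda rises from 0.5 (n = 0) to 1.0 (n = T),
    so mixing is strongest early and vanishes by the final round."""
    return 1.0 - (T - n) / (2.0 * T)

def dynamic_mix(x_r, x_p, n, T):
    """Formulas (7)-(8): cross-mix the two samplers' outputs."""
    lam = mix_degree(n, T)
    x_r_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_r, x_p)]
    x_p_mix = [(1.0 - lam) * a + lam * b for a, b in zip(x_r, x_p)]
    return x_r_mix, x_p_mix

x_r, x_p = [1.0, 0.0], [0.0, 1.0]
# Start of training: both branches see an even blend of the two samplers.
print(dynamic_mix(x_r, x_p, n=0, T=50))   # -> ([0.5, 0.5], [0.5, 0.5])
# End of training: each branch sees its own sampler's real images again.
print(dynamic_mix(x_r, x_p, n=50, T=50))  # -> ([1.0, 0.0], [0.0, 1.0])
```

The labels Y_r and Y_p would be mixed with the same λ, exactly as in standard mixup, so the loss targets stay consistent with the blended inputs.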
S3, design of loss function
In the invention, a label-smoothing cross-entropy loss guides the network training; the random noise coefficient ε is set to 0.1.
S4, training the whole model, as shown in FIG. 1. The whole training set is iterated several times until the total model loss falls to about 0.01.
S5, inputting the test-set images into the trained model to extract the surface features of the cattle images and performing recognition comparison.
The invention adopts the cosine distance as the standard for measuring the similarity of cattle surface features. The cosine distance is calculated as follows:
cos(A, B) = (A · B) / (‖A‖ × ‖B‖)
where A and B are two cattle surface feature vectors. A larger cosine distance indicates a higher similarity between the two cattle surface features; conversely, a smaller one indicates that the two features extracted by the model are less similar. The features extracted from the test cattle images are divided into 6, 4 and 2 equal parts, combined and normalized, and 1:1 comparisons between different classes are performed to obtain the model comparison threshold T. Images of the same class in the test set are compared in the same way; when the comparison value of the combined features extracted from same-class images is greater than T, the comparison is judged successful, and otherwise it is judged failed.
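A minimal sketch of the cosine-distance comparison described above; the threshold value and feature vectors are illustrative, not values from the patent.

```python
import math

def cosine_similarity(a, b):
    """cos(A, B) = (A . B) / (|A| * |B|); higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_cow(feat_a, feat_b, threshold):
    """1:1 comparison: success when the score exceeds the threshold T."""
    return cosine_similarity(feat_a, feat_b) > threshold

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # -> 1.0 (identical)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # -> 0.0 (orthogonal)
print(same_cow([0.9, 0.1], [0.8, 0.2], threshold=0.8))
```

In practice the threshold would be set from the inter-class comparison scores as the text describes, trading off false acceptance against false rejection.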
The GPU server used in the experiments is an NVIDIA TITAN RTX 3090; the deep-learning framework used for training is PyTorch; the training batch size is 64; the number of iterations is 50; the initial learning rate is 0.0024, and the learning rate is multiplied by 0.1 at iterations 10, 30 and 50. Adam is adopted as the optimizer for the whole model.
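The training configuration above could be reproduced in PyTorch roughly as follows. The `torch.nn.Linear` stand-in model is purely illustrative, and interpreting the stated "iterations" as epochs is an assumption.

```python
import torch

model = torch.nn.Linear(768, 100)  # illustrative stand-in for the MBN-Transformer model
optimizer = torch.optim.Adam(model.parameters(), lr=0.0024)
# the learning rate is multiplied by 0.1 at iterations 10, 30 and 50
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[10, 30, 50], gamma=0.1
)

for epoch in range(50):
    # ... one pass over the training DataLoader (batch size 64) would go here ...
    optimizer.step()   # placeholder step so the scheduler sees progress
    scheduler.step()
```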
The following is a visual analysis based on Score-weighted Class Activation Mapping (Score-CAM), comparing ResNet-50, ViT, BBN and LATransformer.
In FIG. 4, cow a was photographed in a cowshed, where images of many different postures of cow a can be acquired; only two postures of cow a are visualized here. Cow b was photographed in a milking passage, where only a few postures are available; again, only two postures of cow b are visualized. As can be seen from Table 3, the ResNet algorithm has difficulty attending to the key features of the cow; the ViT algorithm begins to attend to key features but does not find the real cow features; the LATransformer algorithm attends to some key features of the cow, but not all of them; the BBN algorithm focuses on more key features than LATransformer but still attends to some noise; the present algorithm (MBN-Transformer) not only reduces attention to noise but also focuses more clearly on the key features of the cow. The present algorithm therefore has the best ability to focus on the key features of the cow.
To verify the performance of the proposed model, the ResNet-50, ViT, BBN, LATransformer and MBN-Transformer models were compared on the two test sets. All models were tested 3 times, and the run with the highest recognition rate on the test set was selected, yielding the ROC curves shown in FIG. 5, in which the abscissa is the false acceptance rate (False Acceptance Rate, FAR) and the ordinate is the false rejection rate (False Rejection Rate, FRR).
FIG. 5 shows the ROC curves for the test set, i.e., the rejection rates at different false acceptance rates; the right-hand plot is the left-hand plot with the logarithmic scale removed. The cow-body features extracted by the present algorithm are more robust: as can be seen from FIG. 5, the present algorithm (MBN-Transformer) performs best on the test set at a FAR of 0, with the FRR reduced by 8.25%, 5.82%, 4.61% and 1.47% compared to ResNet-50, ViT, BBN and LATransformer, respectively.
As can be seen from Table 1, the present algorithm (MBN-Transformer) performs best in Top-1 accuracy and can extract more stable cow-body features from unbalanced cow-body data, improving on ResNet-50, ViT, BBN and LATransformer by 11.57%, 7.87%, 5.46% and 1.12% on the test set, respectively.
A Transformer-based bilateral recognition algorithm for unbalanced cow-body data is thus provided. The overfitting caused by the balance sampler is reduced by a dynamic-fusion mixed-image enhancement module; meanwhile, a conventional branch and a balance branch are designed with Transformer encoders to process, respectively, the mixed-enhanced image data of the cowshed random sampler and of the milking-passage balance sampler, and the output features of the two branches are combined through a dynamic balance factor, so that the final features tend toward the milking-passage cow-body data during training, solving the imbalance problem of cow-body images.
The present invention is not limited to the above embodiments; various changes may be made within the knowledge of those skilled in the art without departing from the spirit of the invention, and such changes shall fall within the scope of the present invention.

Claims (5)

1. The bilateral identification method for unbalanced cow-body data based on the MBN-Transformer model is characterized in that the method increases the sampling rate of under-represented cow-body classes by adding a balance sampler, reduces the overfitting that the balance sampler may cause through a dynamic-fusion mixed-image enhancement module, and selects Vision Transformer as the shared backbone network structure to strengthen the association of global information in cow-body images; a conventional branch and a balance branch module are designed with Transformer encoders to process, respectively, the mixed-enhanced image data of the cowshed random sampler and of the milking-passage balance sampler; the output features of the two branches are fused by a feature dynamic balance factor; and the multi-head self-attention mechanism of the Transformer encoder is used to mine the associations and global information in the cow-body data produced by the dynamic-fusion mixed-image enhancement module.
2. The bilateral identification method for unbalanced cow-body data based on the MBN-Transformer model according to claim 1, characterized by comprising the following steps:
s1, constructing a data set, and dividing the data set into a training set and a testing set;
s2, designing an MBN-Transformer model;
s3, designing a loss function;
s4, training the whole model until the whole training set is iteratively trained for a plurality of times;
s5, inputting the test-set images into the trained model to extract cow-body image features, and performing 1:1 comparison and identification using the cosine distance.
3. The bilateral identification method for unbalanced cow-body data based on the MBN-Transformer model according to claim 1 or 2, wherein the data set used consists of cow-body images obtained by a cow-body object detection and segmentation network:
s1-1, acquiring image data of a plurality of types of cattle through a cattle group target detection segmentation network;
s1-2, expanding the training set using translation, rotation and scaling operations, and normalizing the image data of the expanded training set to the same resolution.
4. The bilateral identification method for unbalanced cow-body data based on the MBN-Transformer model according to claim 1 or 2, wherein step S2 is specifically implemented as follows:
s2-1, the MBN-Transformer model comprises a sampler module, a dynamic-fusion mixed-image enhancement module, a shared backbone network and two branch subnets; the sampler module comprises a random sampler and a balance sampler; the random sampler randomly samples all samples of the training set, but since the training set contains more cow-body images from the cowshed it is biased toward cowshed images, and its output image is X_r; the balance sampler oversamples the cow-body data from the milking passage and is biased toward milking-passage images, and its output image is X_p; the two branch subnets comprise a conventional branch and a balance branch; the dynamic-fusion mixed-image enhancement module prevents oversampling of single-pose cow-body images in the milking passage by dynamically fusing the output images of the random sampler and the balance sampler as training proceeds, and its outputs are X′_r and X′_p; the shared backbone network adopts Vision Transformer, i.e., the ViT model; the ViT model comprises patch embedding and several layers of Transformer encoders; patch embedding divides the image into several identical patches, after which position codes representing cow-body position information and a learnable global token for cow-body information are added to obtain the input sequence; the input sequence is fed into the Transformer encoder, which outputs the features F_r and F_p; the two features F_r and F_p are passed to the conventional branch and the balance branch respectively, both of which use a Transformer encoder so as to better attend to global features; the output features of the conventional branch and the balance branch are F′_r and F′_p, and the two output features F′_r and F′_p are fused according to the following formulas;
l = 1 − (n − 1)/T (1)
F = l × F′_r + (1 − l) × F′_p (2)
wherein n is the current training epoch, T is the total number of training epochs, l is the fusion degree of the two output features, and F is the final output feature, which is passed to the classifier;
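The branch fusion of formulas (1) and (2) can be sketched as below; the function name and the 1-based epoch counter n are assumptions.

```python
import torch

def fuse_branches(f_r, f_p, n, T):
    """Fuse conventional-branch and balance-branch features per formulas (1)-(2).

    f_r: output feature of the conventional branch; f_p: of the balance branch.
    n: current training epoch (assumed 1-based, so l starts at 1); T: total epochs.
    """
    l = 1.0 - (n - 1) / T               # formula (1): weight shifts toward the balance branch
    return l * f_r + (1.0 - l) * f_p    # formula (2): final output feature F
```

Early in training the conventional branch dominates; as n grows the weight shifts to the balance branch, so the final feature tends toward the milking-passage cow-body data.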
s2-2, designing a sampler module:
sampling is carried out by the random sampler according to the number of samples of each class; the sampling probability of the j-th cow class for the random sampler is given by formula (3);
p_j^R = S_j / (Σ_{i=1..k} S_i) (3)
wherein k is the total number of classes in the training set, S_i is the number of cow-body images of the i-th cow class in the training set, and p_j^R is the sampling probability of the j-th cow class for the random sampler;
the sampling probability of the j-th cow class for the balance sampler is given by the following formulas;
W_i = N_max / N_i (4)
p_j^B = W_j / (Σ_{i=1..k} W_i) (5)
wherein k is the total number of classes in the training set, N_max is the maximum number of samples over all cow classes in the training set, N_i is the number of cow-body images of the i-th cow class, W_i is the sampling weight of the i-th cow class, and p_j^B is the sampling probability of the j-th cow class for the balance sampler; as can be seen from formulas (4) and (5), classes with few samples (the cows in the milking passage) receive more attention, while classes with many samples (the cows in the cowshed) receive less;
the sampler module uses the random sampler and the balance sampler to train the conventional branch and the balance branch respectively; the random sampler collects more cow-body images from the cowshed, while the balance sampler collects more cow-body images from the milking passage, so that cow-body image data from the two different collection scenes are both taken into account;
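The two sampling distributions of formulas (3)-(5) can be sketched as follows; the use of NumPy and the function names are illustrative choices, with per-class image counts passed as a list.

```python
import numpy as np

def random_sampler_probs(counts):
    """Formula (3): sample class j with probability proportional to its size S_j."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def balance_sampler_probs(counts):
    """Formulas (4)-(5): weight W_i = N_max / N_i, then normalize the weights."""
    counts = np.asarray(counts, dtype=float)
    w = counts.max() / counts
    return w / w.sum()
```

With, say, 90 cowshed images and 10 milking-passage images, the random sampler picks the milking-passage class 10% of the time while the balance sampler picks it 90% of the time, exactly the reversal of attention described above.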
s2-3, designing a dynamic fusion mixed image enhancement module:
a dynamic-fusion mixed-image enhancement module is provided; the degree of image mixing is reflected by a dynamic fusion parameter, which reduces overfitting to the cow-body image data in the milking passage; the specific formulas are shown below;
λ = 1 − (T + n)/(2 × T) (6)
(X′_r, Y′_r) = λ(X_r, Y_r) + (1 − λ)(X_p, Y_p) (7)
(X′_p, Y′_p) = (1 − λ)(X_r, Y_r) + λ(X_p, Y_p) (8)
5. The bilateral identification method for unbalanced cow-body data based on the MBN-Transformer model according to claim 3, wherein step S3 is specifically implemented as follows:
a label-smoothing cross-entropy loss is adopted to optimize the network; the specific calculation formulas are shown below;
L = l × L_smoo(y_p_true, y_pred) + (1 − l) × L_smoo(y_r_true, y_pred) (9)
L_smoo(y_p_true, y_pred) = −Σ[(1 − ε) × y_p_true + ε/N] ln y_pred (10)
L_smoo(y_r_true, y_pred) = −Σ[(1 − ε) × y_r_true + ε/N] ln y_pred (11)
wherein ε is the random noise (smoothing) coefficient, set to 0.1 in this algorithm; l varies with the training epoch according to formula (1); N is the total number of cow classes in the training set; y_r_true is the ground-truth label of the random sampler after dynamic-fusion mixed-image enhancement; y_p_true is the ground-truth label of the balance sampler after dynamic-fusion mixed-image enhancement; y_pred is the predicted class output by the model; L denotes the total loss and L_smoo the label-smoothing cross-entropy loss; it can be seen from the formulas that, as the l parameter changes during training, the loss function shifts its weighting between the two samplers' ground-truth labels after dynamic-fusion mixed-image enhancement when optimizing the network.
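Formulas (9)-(11) can be sketched as below, assuming y_pred is obtained from model logits via softmax and that the sums in (10) and (11) carry the usual negative sign of cross entropy; function names and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def smooth_ce(y_true, logits, eps=0.1):
    """Label-smoothing cross entropy: -sum[(1-eps)*y_true + eps/N] * ln y_pred."""
    n_classes = logits.shape[-1]
    log_p = F.log_softmax(logits, dim=-1)               # ln y_pred
    smoothed = (1.0 - eps) * y_true + eps / n_classes   # (1-eps)*y + eps/N
    return -(smoothed * log_p).sum(dim=-1).mean()

def total_loss(y_p_true, y_r_true, logits, n, T, eps=0.1):
    """Formula (9): l-weighted sum of the two smoothed losses, l = 1 - (n-1)/T."""
    l = 1.0 - (n - 1) / T
    return l * smooth_ce(y_p_true, logits, eps) + (1.0 - l) * smooth_ce(y_r_true, logits, eps)
```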
CN202310117831.6A 2023-02-15 2023-02-15 Double-sided identification method for unbalanced bovine body data based on MBN-transducer model Pending CN116311357A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310117831.6A CN116311357A (en) 2023-02-15 2023-02-15 Double-sided identification method for unbalanced bovine body data based on MBN-transducer model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310117831.6A CN116311357A (en) 2023-02-15 2023-02-15 Double-sided identification method for unbalanced bovine body data based on MBN-transducer model

Publications (1)

Publication Number Publication Date
CN116311357A true CN116311357A (en) 2023-06-23

Family

ID=86826569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310117831.6A Pending CN116311357A (en) 2023-02-15 2023-02-15 Double-sided identification method for unbalanced bovine body data based on MBN-transducer model

Country Status (1)

Country Link
CN (1) CN116311357A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116741372A (en) * 2023-07-12 2023-09-12 东北大学 Auxiliary diagnosis system and device based on double-branch characterization consistency loss
CN116741372B (en) * 2023-07-12 2024-01-23 东北大学 Auxiliary diagnosis system and device based on double-branch characterization consistency loss

Similar Documents

Publication Publication Date Title
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN110458844B (en) Semantic segmentation method for low-illumination scene
CN106778902B (en) Dairy cow individual identification method based on deep convolutional neural network
CN108921051B (en) Pedestrian attribute identification network and technology based on cyclic neural network attention model
CN111898736B (en) Efficient pedestrian re-identification method based on attribute perception
CN111582225B (en) Remote sensing image scene classification method and device
CN110263697A (en) Pedestrian based on unsupervised learning recognition methods, device and medium again
CN105590099B (en) A kind of more people's Activity recognition methods based on improvement convolutional neural networks
CN114067143B (en) Vehicle re-identification method based on double sub-networks
CN111709311A (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
KR102036957B1 (en) Safety classification method of the city image using deep learning-based data feature
CN113297936B (en) Volleyball group behavior identification method based on local graph convolution network
CN110390308B (en) Video behavior identification method based on space-time confrontation generation network
CN113761259A (en) Image processing method and device and computer equipment
CN113095370A (en) Image recognition method and device, electronic equipment and storage medium
CN104463194A (en) Driver-vehicle classification method and device
CN114360038A (en) Weak supervision RPA element identification method and system based on deep learning
CN116311357A (en) Double-sided identification method for unbalanced bovine body data based on MBN-transducer model
CN111242028A (en) Remote sensing image ground object segmentation method based on U-Net
CN116385432B (en) Light-weight decoupling wheat scab spore detection method
CN117333948A (en) End-to-end multi-target broiler behavior identification method integrating space-time attention mechanism
CN117131436A (en) Radiation source individual identification method oriented to open environment
CN114972434B (en) Cascade detection and matching end-to-end multi-target tracking system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination