CN112861855A - Group-raised pig instance segmentation method based on an adversarial network model - Google Patents

Group-raised pig instance segmentation method based on an adversarial network model

Info

Publication number
CN112861855A
CN112861855A
Authority
CN
China
Prior art keywords
segmentation
mask
network model
network
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110148643.0A
Other languages
Chinese (zh)
Inventor
涂淑琴
万华
袁伟俊
黄健
王帆
林跃庭
张加冲
邱鸿鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Agricultural University
Original Assignee
South China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Agricultural University
Priority to CN202110148643.0A
Publication of CN112861855A
Legal status: Pending

Classifications

    • G06V 10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045 — Combinations of networks
    • G06N 3/08 — Learning methods
    • G06V 10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The application relates to an instance segmentation method for group-raised pigs. The method performs instance segmentation of group-raised pigs with an adversarial network model built from a correction network and a Mask Scoring R-CNN model. Because the correction network uses a smooth loss function and the Mask Scoring R-CNN model contains a segmentation-quality scoring network, the proposed scheme achieves fast convergence of the instance-segmentation mask loss and improves the segmentation quality of the model, thereby alleviating the missed detections of pigs and rough edge-contour segmentation caused by overlapping and occlusion among pigs in group-raised pig instance segmentation.

Description

Group-raised pig instance segmentation method based on an adversarial network model
Technical Field
The application relates to the technical field of computer vision, and in particular to a method for instance segmentation of group-raised pigs based on an adversarial network model.
Background
Pork is one of the most common meats in daily life. In recent years, large-scale pig farming has increasingly applied artificial intelligence and image recognition and processing technology to breeding and management, enabling automatic measurement of various indicators of live pigs and raising pork yield. Automatic monitoring of the growth and health of live pigs also greatly reduces breeding costs and makes pig farming more efficient, informatized and automated. In natural group-raising scenes, however, live pigs overlap and occlude one another severely, which makes their detection and segmentation very challenging; automatic instance segmentation of group-raised pigs is therefore of great significance for efficient, scientific, large-scale pig farming.
In the related art, the patent with publication number CN113207563A discloses an instance segmentation method for adhered group-raised pigs that fuses Mask R-CNN and Soft-NMS: on the basic Mask R-CNN framework, the Soft-NMS algorithm replaces the traditional NMS algorithm to reduce the missed-detection rate of adhered pigs.
However, this solution handles poorly the inaccurate segmentation of live pigs caused by dense overlapping and mutual occlusion.
Disclosure of Invention
To overcome the problems in the related art, the application provides a group-raised pig instance segmentation method based on an adversarial network model, which remedies the above technical deficiencies and thereby achieves accurate segmentation and detection of group-raised pigs.
The application provides a group-raised pig instance segmentation method based on an adversarial network model, comprising the following steps:
acquiring group-raised pig image data;
processing the group-raised pig image data with an adversarial network model to obtain the coordinate set, segmentation mask and classification information of each pig;
the adversarial network model is formed by correcting an instance segmentation model with a correction network; the correction network contains a smooth loss function used to calculate the distance between the instance-segmentation predicted value and the real value; the correction network adds the smooth loss function to the loss function of the instance segmentation model as compensation, forming the corrected adversarial network model;
the instance segmentation model includes the Mask Scoring R-CNN model.
In one embodiment, the Mask Scoring R-CNN model comprises: a basic backbone network, a region proposal network RPN, a head network Mask Head, and a segmentation-quality scoring network MaskIoU;
the basic backbone network processes the pig image data and feeds the resulting feature map to the RPN;
the RPN processes the feature map and feeds the resulting regions of interest to the Mask Head and to the correction network;
the Mask Head performs classification and bounding-box regression on the regions of interest to obtain the coordinate set, segmentation mask and classification information of each pig, and its predicted segmentation mask is max-pooled and then fed to the MaskIoU;
the MaskIoU scores the quality of the predicted segmentation mask produced by the Mask Head.
In one embodiment, the MaskIoU comprises four convolutional layers and three fully connected layers; among the four convolutional layers, the convolution kernel of the first layer is 3 × 3 × 257 and the kernels of the other three layers are all 3 × 3 × 256; among the three fully connected layers, the outputs of the first and second layers are 1024 and the output of the third layer is the number of categories;
the MaskIoU obtains its input features from the predicted segmentation mask and the region of interest;
the loss function of the MaskIoU is
L_IoU(T_pred, T_gt) = ∑ (T_pred - T_gt)²
where L_IoU is the loss function of the MaskIoU, T_pred is the predicted segmentation mask, and T_gt is the labeled (ground-truth) segmentation mask.
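By way of illustration only, the MaskIoU structure described above can be sketched in PyTorch as follows; the padding, the ReLU activations, the stride-2 fourth convolution (reducing the 14 × 14 input to 7 × 7 before the fully connected layers) and the two-category output are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskIoUHead(nn.Module):
    """Segmentation-quality scoring head: four conv layers followed by three FC layers."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # First conv consumes 257 channels: 256 RoI feature channels + 1 predicted-mask channel.
        self.conv1 = nn.Conv2d(257, 256, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
        # Assumption: stride 2 here reduces the 14x14 map to 7x7 before the FC layers.
        self.conv4 = nn.Conv2d(256, 256, kernel_size=3, stride=2, padding=1)
        self.fc1 = nn.Linear(256 * 7 * 7, 1024)
        self.fc2 = nn.Linear(1024, 1024)
        self.fc3 = nn.Linear(1024, num_classes)  # one predicted MaskIoU score per category

    def forward(self, roi_feat: torch.Tensor, pred_mask: torch.Tensor) -> torch.Tensor:
        # roi_feat: (N, 256, 14, 14) RoIAlign features; pred_mask: (N, 1, 28, 28) predicted mask.
        mask_14 = F.max_pool2d(pred_mask, kernel_size=2, stride=2)   # (N, 1, 14, 14)
        x = torch.cat([roi_feat, mask_14], dim=1)                    # (N, 257, 14, 14)
        for conv in (self.conv1, self.conv2, self.conv3, self.conv4):
            x = F.relu(conv(x))
        x = torch.flatten(x, start_dim=1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

def maskiou_loss(iou_pred: torch.Tensor, iou_target: torch.Tensor) -> torch.Tensor:
    # L_IoU(T_pred, T_gt) = sum (T_pred - T_gt)^2, as given above.
    return ((iou_pred - iou_target) ** 2).sum()
```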
In one embodiment, the RPN is a fully convolutional module with three convolutional layers arranged in a tree structure: the trunk is one 3 × 3 convolutional layer and the branches are two 1 × 1 convolutional layers;
the ROIs output by the RPN are processed by bilinear interpolation in a RoIAlign layer to obtain the feature map corresponding to each ROI, which serves as the input of the Mask Head.
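For illustration, a minimal sketch of such a tree-structured RPN head is given below; the channel count, the number of anchors per location and the assignment of the two 1 × 1 branches to objectness and box regression are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RPNHead(nn.Module):
    """Tree-structured RPN head: one shared 3x3 trunk conv and two 1x1 branch convs."""

    def __init__(self, in_channels: int = 256, num_anchors: int = 3):
        super().__init__()
        self.trunk = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.objectness = nn.Conv2d(in_channels, num_anchors, kernel_size=1)      # branch 1
        self.box_deltas = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)  # branch 2

    def forward(self, feature_map: torch.Tensor):
        t = F.relu(self.trunk(feature_map))
        return self.objectness(t), self.box_deltas(t)
```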
In one embodiment, the loss function of the adversarial network model is
L_RoI = L_cls + L_box + L_mask + L_IoU + L_Dis
where L_cls is the classification loss function, L_box is the bounding-box detection loss function, L_mask is the segmentation mask loss, L_IoU is the loss function of the MaskIoU, and L_Dis is the smooth loss function of the correction network.
In one embodiment, the correction network is a fully convolutional module of two convolutional layers trained on positive and negative samples; the convolution kernels of both layers are 5 × 5, the channels are 64 and 128 respectively, and the convolution stride of both layers is 2;
the input of the convolutional layers in the correction network consists of positive sample instances and negative sample instances;
the smooth loss function of the correction network is
L_Dis(θ_G, θ_D) = (1/N) ∑_{n=1..N} Smooth_L1( fc(r_n · x_n) - fc(r_n · y_n) )
where θ_G denotes the generator parameters of the adversarial network, θ_D denotes the discriminator parameters of the adversarial network, N denotes the total number of predicted segmentation masks generated by the instance segmentation network, x_n denotes the n-th predicted segmentation mask generated by the instance segmentation network, r_n denotes the region of interest of the original input image corresponding to x_n, and y_n denotes the real segmentation mask corresponding to x_n; the one-dimensional vector fc(r_n · x_n) is composed of the pixel-level features of the fake (negative) sample and the features output by the first and second convolutional layers; fc(r_n · y_n) is composed of the pixel-level features of the real (positive) sample and the features output by the first and second convolutional layers.
In one embodiment, the positive sample instance is obtained by element-wise (dot) multiplication of the real segmentation mask with the instance image crop;
the negative sample instance is obtained by element-wise (dot) multiplication of the predicted segmentation mask with the instance image crop.
In one embodiment, the real segmentation mask is obtained by cropping the original annotation mask with the anchor-box coordinates corresponding to the ROI;
the instance image crop is obtained by cropping the original input image with the anchor-box coordinates corresponding to the ROI;
the predicted segmentation mask is processed by cropping and a sigmoid activation function.
In one embodiment, the classification loss function is
L_cls = -log( exp(X_label) / ∑_{j=1..N} exp(X_j) )
where L_cls is the classification loss function, X_label is the score of the predicted category, label is the category index, and N is the number of categories;
the bounding-box detection loss function is
L_box(T_pred, T_gt) = Smooth_L1(T_pred - T_gt)
where L_box is the bounding-box detection loss function and T_pred and T_gt are the predicted and labeled (ground-truth) values;
the segmentation mask loss is obtained by multiplying the classification loss function and the MaskIoU loss function.
In one embodiment, before processing the pig image data with the adversarial network model to obtain the coordinate set, segmentation mask and classification information of each pig, the method includes:
compensating the loss function of the instance segmentation model with the smooth loss function of the correction network to obtain the adversarial network model;
calling a validation set to evaluate the adversarial network model and obtain its hyper-parameters;
calling a test set to test the adversarial network model and judging whether it meets a target condition; if so, outputting the adversarial network model; if not, rebuilding the hyper-parameters of the adversarial network model until the target condition is met;
the target condition includes: the segmentation accuracy of the adversarial network model is not lower than an accuracy threshold and the recall is not lower than a recall threshold;
the accuracy threshold ranges from 75% to 100%;
the recall threshold ranges from 80% to 100%.
The technical scheme provided by the application can have the following beneficial effects. The scheme uses an adversarial network model built from a correction network and a Mask Scoring R-CNN model to perform instance segmentation of group-raised pigs. Because the Mask Scoring R-CNN model is used as the instance segmentation model, a segmentation-quality scoring network is added compared with the Mask R-CNN model, and high-quality segmentation results are selected by scoring the quality confidence of each segmented region, which improves the instance segmentation performance of the model. The correction network uses a smooth loss function and corrects the loss function of the instance segmentation model with it to obtain the adversarial network model; because the smooth loss function is robust to noise and outliers and does not easily mislead network training, the stability and accuracy of training are improved, so the resulting adversarial network model can segment overlapping pigs accurately and the accuracy of instance segmentation is improved. Performing instance segmentation of group-raised pigs with this adversarial network model therefore effectively relieves the missed detections and inaccurate segmentation caused by overlapping, adhesion and other complex conditions.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The foregoing and other objects, features and advantages of the application will be apparent from the following more particular descriptions of exemplary embodiments of the application, as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the application.
Fig. 1 is a schematic flowchart of a group-raised pig instance segmentation method based on an adversarial network model according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for constructing an adversarial network model according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for constructing an instance segmentation model according to an embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of a positive and negative sample acquisition method according to an embodiment of the present application;
FIG. 5 is another schematic flowchart of a method for constructing an adversarial network model according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a group-raised pig instance segmentation result based on the adversarial network model according to an embodiment of the present application;
Fig. 7 is another schematic diagram of a group-raised pig instance segmentation result based on the adversarial network model according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a group-raised pig instance segmentation result based on the MS R-CNN model according to an embodiment of the present application;
Fig. 9 is another schematic diagram of a group-raised pig instance segmentation result based on the MS R-CNN model according to an embodiment of the present application.
Detailed Description
Preferred embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms "first," "second," "third," etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
To solve the above problems, the embodiments of the application provide a group-raised pig instance segmentation method based on an adversarial network model, which can achieve accurate segmentation and detection of group-raised pigs.
The technical solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Example 1
Fig. 1 is a schematic flowchart of a group-raised pig instance segmentation method based on an adversarial network model according to an embodiment of the present application.
Referring to fig. 1, the group-raised pig instance segmentation method based on an adversarial network model comprises the following steps:
101. acquiring group-raised pig image data;
in the embodiment of the application, a camera at a front-view (horizontal) angle or at an overhead (top-down) angle can be used to acquire the image data of the group-raised pigs.
It should be noted that, compared with the overhead angle, images of group-raised pigs obtained at the front-view angle are more complex, and occlusion and overlap between pigs are more severe in front-view images; in practical applications, the position and angle of the camera can therefore be set according to the actual situation.
It should be understood that the above description of acquiring group-raised pig image data is only an example of the embodiments of the present application and should not be construed as limiting the present invention.
102. Processing the group-raised pig image data with the adversarial network model to obtain the coordinate set, segmentation mask and classification information of each pig.
The adversarial network model is formed by correcting an instance segmentation model with a correction network; the correction network contains a smooth loss function used to calculate the distance between a predicted value and a true value; the correction network adds the smooth loss function to the loss function of the instance segmentation model as compensation, forming the corrected adversarial network model;
the instance segmentation model includes the Mask Scoring R-CNN model.
In the embodiment of the present application, the smooth loss function is the Smooth L1 loss function, and the loss function of the adversarial network model is obtained by correcting and compensating the loss function of the instance segmentation model with this smooth loss function.
In the embodiment of the present application, the Mask Scoring R-CNN model comprises: a basic backbone network, a region proposal network RPN, a head network Mask Head, and a segmentation-quality scoring network MaskIoU.
In this embodiment of the present application, after the original image is input to the adversarial network model, the convolutional layers of the basic backbone network produce a feature map of the original image, which the backbone feeds to the RPN; the RPN processes this feature map through its fully convolutional layers to obtain the ROIs; before the ROIs are processed by the fully convolutional network (FCN) and the fully connected layers (FC layers) of the Mask Head, a RoIAlign layer applies bilinear interpolation to each ROI to obtain the feature map corresponding to that ROI, which serves as the input of the Mask Head; the FCN of the Mask Head produces a predicted segmentation mask from the ROI feature map, which is max-pooled and fed to the MaskIoU for quality scoring, while the FC layers of the Mask Head extract the classification information and coordinate set from the ROI feature map; finally, based on the coordinate set and classification information, the computer outputs the instance segmentation result of the original image to the screen as a visualized segmented image.
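The data flow just described can be summarized by the following sketch; the component interfaces (backbone, rpn, roi_align, mask_head, maskiou_head) are assumed callables whose exact signatures are not taken from the original text.

```python
import torch

def segment_group_raised_pigs(image: torch.Tensor,
                              backbone, rpn, roi_align, mask_head, maskiou_head):
    """Forward pass of the adversarial instance-segmentation model (component interfaces assumed)."""
    feature_map = backbone(image)                       # base backbone, e.g. ResNet50 + FPN
    rois = rpn(feature_map)                             # candidate regions of interest
    roi_feats = roi_align(feature_map, rois)            # bilinear interpolation per ROI
    boxes, class_scores, pred_masks = mask_head(roi_feats)
    mask_quality = maskiou_head(roi_feats, pred_masks)  # MaskIoU quality score per predicted mask
    return boxes, class_scores, pred_masks, mask_quality
```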
In the embodiment of the application, an adversarial network model built from a correction network and a Mask Scoring R-CNN model is used to perform instance segmentation of group-raised pigs. Because the Mask Scoring R-CNN model is selected as the instance segmentation model, a segmentation-quality scoring network is added compared with the Mask R-CNN model, and high-quality segmentation results are selected by scoring the quality confidence of each segmented region, improving the instance segmentation performance of the model. The correction network uses the smooth loss function and corrects the loss function of the instance segmentation model with it to obtain the adversarial network model; the smooth loss function is robust to noise and outliers and does not easily mislead network training, which improves the stability and accuracy of training, so the resulting adversarial network model can segment overlapping pigs accurately and the accuracy of instance segmentation is improved. Performing instance segmentation of group-raised pigs with this adversarial network model therefore effectively relieves the missed detections and inaccurate segmentation caused by overlapping, adhesion and other complex conditions.
Example 2
In practical applications, before the group-raised pig image data is processed with the group-raised pig instance segmentation method based on an adversarial network model, the adversarial network model needs to be constructed.
The embodiment of the application provides a method for constructing the adversarial network model; see fig. 2, which is a flowchart of the construction method of the adversarial network model.
201. Constructing an instance segmentation model;
in the embodiment of the application, the instance segmentation model is the Mask Scoring R-CNN model, which comprises: a basic backbone network, a region proposal network RPN, a head network Mask Head, and a segmentation-quality scoring network MaskIoU.
In the embodiment of the application, the basic backbone network adopts an improved ResNet50 + FPN; the RPN is a fully convolutional module with three convolutional layers arranged in a tree structure, the trunk being one 3 × 3 convolutional layer and the branches being two 1 × 1 convolutional layers; the Mask Head comprises a fully convolutional network (FCN) and fully connected layers (FC layers); the MaskIoU is a network of four convolutional layers and three fully connected layers, where the convolution kernel of the first layer is 3 × 3 × 257 and the kernels of the other three layers are all 3 × 3 × 256, and where the outputs of the first and second fully connected layers are 1024 and the output of the third is the number of categories. In the embodiment of the present application, the number of categories is 2.
It should be noted that the network structure of the Mask Scoring R-CNN model is not strictly limited by the present invention; in practical applications the structure may be adjusted, for example the basic backbone network may also adopt ResNet18 + FPN or ResNet101 + FPN.
It should be understood that the above description of the instance segmentation model is only an example of the embodiments of the present application and should not be construed as limiting the present invention.
202. Constructing a correction network;
the correction network comprises a smooth loss function used for calculating the distance between a predicted value and a true value;
in the embodiment of the present application, for example, the modified network construction process specifically includes:
constructing a network structure of a correction network;
constructing an input of a correction network;
a loss function is constructed that corrects for the network.
In the embodiment of the application, the network structure of the correction network is a full convolution layer formed by combining two convolution layers; the convolution kernels of the two convolution layers of the full convolution layer are both 5 multiplied by 5, the channels are respectively 64 and 128, and the convolution step length is both 2.
In the embodiments of the present application, the input of the convolutional layer in the correction network is a positive sample instance and a negative sample instance.
In the embodiment of the present application, the smoothing loss function of the modified network may be obtained according to the following formula:
L_Dis(θ_G, θ_D) = (1/N) ∑_{n=1..N} Smooth_L1( fc(r_n · x_n) - fc(r_n · y_n) )
where θ_G denotes the generator parameters of the adversarial network, θ_D denotes the discriminator parameters of the adversarial network, N denotes the total number of predicted segmentation masks generated by the instance segmentation network, x_n denotes the n-th predicted segmentation mask, and r_n denotes the region of interest of the original input image corresponding to x_n; y_n denotes the real segmentation mask corresponding to x_n; the one-dimensional vector fc(r_n · x_n) is composed of the pixel-level features of the negative sample and the features output by the first and second convolutional layers; fc(r_n · y_n) is composed of the pixel-level features of the positive sample and the features output by the first and second convolutional layers.
It is to be understood that the above description of the modified network construction process is only an example and should not be construed as limiting the present invention.
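As a rough sketch under stated assumptions, the correction network and its smooth loss can be written as follows; the input channel count (masked RGB crops), the padding, the ReLU activations and the use of PyTorch's built-in Smooth L1 reduction are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrectionNetwork(nn.Module):
    """Discriminator (correction network): two conv layers, 5x5 kernels, stride 2, 64/128 channels."""

    def __init__(self, in_channels: int = 3):  # assumption: masked RGB crops as input
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 64, kernel_size=5, stride=2, padding=2)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=5, stride=2, padding=2)

    def feature_vector(self, sample: torch.Tensor) -> torch.Tensor:
        # fc(.): concatenate pixel-level features with the outputs of both conv layers into one vector.
        f1 = F.relu(self.conv1(sample))
        f2 = F.relu(self.conv2(f1))
        return torch.cat([sample.flatten(1), f1.flatten(1), f2.flatten(1)], dim=1)

def smooth_correction_loss(net: CorrectionNetwork,
                           negative_samples: torch.Tensor,
                           positive_samples: torch.Tensor) -> torch.Tensor:
    """L_Dis: Smooth L1 distance between the feature vectors of fake (r_n . x_n)
    and real (r_n . y_n) masked instances."""
    fake = net.feature_vector(negative_samples)   # predicted mask * image crop
    real = net.feature_vector(positive_samples)   # real mask * image crop
    return F.smooth_l1_loss(fake, real)
```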
It should be noted that, the timing sequence of the step 202 and the step 201 is not strictly limited in the present invention, that is, the step 202 may be executed before the step 201 or in parallel with the step 201.
It is understood that the above description of the sequence of step 201 and step 202 is only an example and is not intended to limit the present invention.
203. Training the correction network with positive and negative samples;
in an embodiment of the present application, the positive and negative samples include positive sample instances and negative sample instances. The positive sample instance is obtained by element-wise (dot) multiplication of the real segmentation mask with the instance image crop; the negative sample instance is obtained by element-wise (dot) multiplication of the predicted segmentation mask with the instance image crop.
In the embodiment of the application, the real segmentation mask is obtained by cropping the original annotation mask with the anchor-box coordinates corresponding to the ROI; the instance image crop is obtained by cropping the original input image with the anchor-box coordinates corresponding to the ROI; and the predicted segmentation mask is processed by cropping and a sigmoid activation function.
It is to be understood that the above description of the positive and negative samples is only one example of the embodiments of the present application and should not be taken as limiting the invention.
204. Performing adversarial training with the trained correction network and the instance segmentation model to obtain the adversarial network model.
In the embodiment of the application, the Mask Scoring R-CNN model serves as the generator and the correction network as the discriminator; the adversarial network model is constructed by optimizing both with the alternating strategy of generative adversarial training.
It is to be understood that the above description of the formation process of the countermeasure network model is only an example and does not necessarily constitute a limitation of the present invention.
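The alternating optimization can be sketched as a single training step like the one below; the loss callables and the optimizers are assumed to be supplied by the surrounding training code rather than taken from the original text.

```python
import torch

def adversarial_training_step(batch,
                              generator_loss_fn, discriminator_loss_fn,
                              g_optimizer: torch.optim.Optimizer,
                              d_optimizer: torch.optim.Optimizer):
    """One alternating optimization round (assumed interfaces)."""
    # Discriminator step: learn to tell real masked pig instances from predicted ones.
    d_optimizer.zero_grad()
    d_loss = discriminator_loss_fn(batch)
    d_loss.backward()
    d_optimizer.step()

    # Generator step: minimize L_RoI = L_cls + L_box + L_mask + L_IoU + L_Dis.
    g_optimizer.zero_grad()
    g_loss = generator_loss_fn(batch)
    g_loss.backward()
    g_optimizer.step()
    return d_loss.item(), g_loss.item()
```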
The embodiment of the application provides a method for generating an adversarial network model from an instance segmentation network and a correction network: the correction network is first trained on positive and negative samples, and the adversarial network model is then generated from the instance segmentation network and the trained correction network through adversarial training. Training on positive and negative samples optimizes the correction network's estimate of the distance between predicted and true values, which improves the classification accuracy of the adversarial network model built on it; the adversarial training lets the instance segmentation model (as generator) and the correction network (as discriminator) improve jointly toward a Nash equilibrium, raising the model's performance on the task, so the resulting adversarial network model achieves a satisfactory effect on the group-raised pig instance segmentation task.
Example 3
In practical applications, this embodiment details the instance segmentation model construction method of step 201 in embodiment 2. Fig. 3 is a schematic flowchart of the instance segmentation model construction method; referring to fig. 3, the method includes:
301. constructing a Mask R-CNN model;
in the embodiment of the present application, the Mask R-CNN model includes a basic backbone network, a region proposal network RPN and a Mask Head. The basic backbone network adopts an improved ResNet50 + FPN; the Mask R-CNN model uses Soft-NMS to remove redundant detection boxes from the RPN; it applies a RoIAlign layer to perform bilinear interpolation on each ROI obtained by the RPN; and it uses the head network Mask Head to perform the ROI detection, classification and segmentation operations.
It is understood that the above description of the Mask R-CNN model is only an example of the embodiments of the present application and is not necessarily intended as a limitation on the present invention.
302. Constructing the segmentation-quality scoring network MaskIoU;
in the embodiment of the application, the specific process of constructing the segmentation-quality scoring network MaskIoU is, by way of example, as follows:
constructing the network structure of the MaskIoU;
taking a feature map obtained from the predicted segmentation mask and the region of interest as the input of the MaskIoU;
constructing the loss function of the MaskIoU.
In an embodiment of the present application, the specific steps for building the MaskIoU input are: max-pooling the predicted segmentation mask output by the Mask Head to obtain a 14 × 14 × 1 feature map as the first input; obtaining a 14 × 14 × 256 feature map from the RoIAlign layer as the second input; and stacking (concatenating) the first input and the second input as the input feature of the MaskIoU head network.
In the embodiment of the present application, the MaskIoU loss function may be constructed according to the following calculation formula:
L_IoU(T_pred, T_gt) = ∑ (T_pred - T_gt)²
where L_IoU is the loss function of the MaskIoU, T_pred is the predicted segmentation mask, and T_gt is the labeled (ground-truth) segmentation mask.
It is understood that the above description of constructing the Mask R-CNN model and the MaskIoU is only an example and does not limit the present invention.
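For illustration, the MaskIoU input construction described in step 302 can be sketched as follows; the tensor shapes follow the description above (N being the number of ROIs), and the 2 × 2 max pooling used to go from 28 × 28 to 14 × 14 is an assumption.

```python
import torch
import torch.nn.functional as F

def build_maskiou_input(pred_mask_28: torch.Tensor, roi_feat_14: torch.Tensor) -> torch.Tensor:
    """Stack the max-pooled predicted mask onto the RoIAlign features as the MaskIoU head input.

    pred_mask_28: (N, 1, 28, 28) predicted segmentation mask from the Mask Head
    roi_feat_14:  (N, 256, 14, 14) feature map from the RoIAlign layer
    """
    mask_14 = F.max_pool2d(pred_mask_28, kernel_size=2, stride=2)   # first input:  (N, 1, 14, 14)
    return torch.cat([roi_feat_14, mask_14], dim=1)                 # stacked input: (N, 257, 14, 14)
```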
It should be noted that, in the embodiment of the present application, there is no strict limitation on the timing sequence of step 301 and step 302, that is, step 302 may be executed before step 301 or in parallel with step 301.
It is understood that the above description of the sequence of step 301 and step 302 is only an example of the embodiment of the present application, and is not necessarily taken as a limitation on the present invention.
303. Combining the Mask R-CNN model and the MaskIoU to form the instance segmentation model.
In this embodiment of the present application, the instance segmentation model formed by combining the Mask R-CNN model and the MaskIoU can be described as follows: on the basis of the Mask Head, the segmentation-quality scoring network MaskIoU is added, and the predicted segmentation mask output by the Mask Head is concatenated with the features extracted by the RoIAlign layer as the input of the MaskIoU, which completes the scoring of instance segmentation quality.
It is to be understood that the above description of how the instance segmentation model is formed is only one example of the embodiments of the present application and is not necessarily a limitation on the present invention.
The embodiment of the application provides a construction method for the instance segmentation model in which a segmentation-quality scoring network MaskIoU is added on the basis of the Mask R-CNN model. The scoring network scores the predicted segmentation masks generated by the Mask R-CNN model; the score evaluates the quality of each mask, and the instance segmentation model gives priority to the more reliable predicted masks on the basis of this score, which improves the classification and segmentation accuracy of the Mask R-CNN model.
Example 4
In practical applications, this embodiment details the positive and negative sample acquisition process of step 203 in embodiment 2. Fig. 4 is a schematic flowchart of the positive and negative sample acquisition method; referring to fig. 4, the acquisition process includes:
401. Acquiring a picture data set of group-raised pigs;
in the embodiment of the present application, the picture data set is obtained as follows: video data of various pig behaviors is collected at side-view and overhead angles, covering group-raised pigs under different illumination, various occlusions and overlapping pig bodies; after the video data is obtained, frames are extracted by software at a fixed sampling ratio.
During the experiments, this embodiment followed group-raised pigs in 5 pig pens, with about 3 to 20 pigs per pen. Video was captured from 9 a.m. to 4 p.m., about 7 hours per day, and stored in AVI format at a frame rate of 25 fps. After the video data was obtained, 315 frames were extracted by software at a fixed sampling ratio.
It should be noted that the collection method and equipment for the picture data set are not strictly limited by the present invention; in practical applications, a camera can be used directly to capture picture data of several pig herds to form the image data set. In addition, the number of pictures in the data set is not strictly limited and can be adjusted according to the actual situation.
It is to be understood that the above description of the picture data set is only an example and not necessarily as a limitation of the invention.
402. Preprocessing the picture data set with mean filtering and annotating it to obtain an annotated data set;
in the embodiment of the application, during the experimental stage the images were annotated with the Labelme software and the annotations were organized into the COCO dataset format.
It should be noted that the above description of the annotated data set is an example from the experimental stage; in practical applications, different image processing software and preprocessing methods, for example Gaussian filtering or Kalman filtering, may be adopted according to the actual situation to preprocess the picture data set and obtain the annotated data set.
It is to be understood that the above description of the pre-processing procedure of the picture data set is only an example and does not constitute a limitation of the present invention.
403. Dividing the annotated data set into a training set, a validation set and a test set;
in the embodiment of the application, the training set, validation set and test set contain 180, 180 and 135 pictures respectively; the training set and the validation set each comprise 105 front-view pictures and 75 overhead pictures, and the test set comprises 65 front-view pictures and 70 overhead pictures.
It should be noted that the embodiment of the present application does not limit how the pictures are split; in practice, the annotated data set may be split randomly or with a fixed proportion of front-view and overhead pictures.
It is to be understood that the above description of the training set, the validation set, and the test set is only an example, and should not be construed as limiting the present invention.
404. Performing data augmentation on the images in the training set to obtain an extended training set;
in an embodiment of the present application, the data augmentation comprises five operations: horizontal flipping, brightness adjustment, contrast adjustment, saturation adjustment and random hue adjustment.
It should be noted that, in practical applications, the data augmentation operation may further include N operations among cropping, scaling, and noise augmentation, where N is a positive integer with a value range of 1 to 3.
It should be understood that the above description of the data augmentation operation is only an example in the embodiment of the present application, and should not be construed as limiting the present invention.
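By way of illustration, the five augmentation operations map onto standard torchvision transforms as in the sketch below; the jitter magnitudes are illustrative assumptions, and in an instance segmentation setting the annotation masks must be flipped together with the images.

```python
from torchvision import transforms

# The five training-set augmentation operations; the jitter magnitudes are illustrative.
train_augmentation = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05),
])
```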
405. Constructing positive and negative samples from the extended training set.
In the embodiment of the present application, the positive and negative samples are constructed, by way of example, as follows:
according to the positive sample boxes detected by the RPN, the anchor-box coordinates corresponding to each valid ROI are located in the original input image; the corresponding region is cropped from the original annotation mask according to these coordinates and scaled to 28 × 28 to obtain the real segmentation mask;
the same anchor-box coordinates are used to crop the corresponding region from the original input image, and the region is resized to 28 × 28 pixels to obtain the instance image crop;
the real segmentation mask and the instance image crop are multiplied element-wise (dot product) to obtain the positive sample instance;
the positive sample boxes at the original input image size are obtained through the RPN; each positive sample is processed by a RoIAlign layer to obtain its feature map in the basic backbone network, i.e. a 14 × 14 ROI, which is then fed into the original Mask Head network to obtain a 28 × 28 predicted segmentation mask;
the predicted segmentation mask is processed with a sigmoid activation function to map its values into the [0, 1] range and is then multiplied element-wise with the instance image crop to obtain the negative sample instance.
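A compact sketch of the final dot-product step is given below; it assumes the instance image crop, the annotation-mask crop and the predicted mask logits have already been cropped and resized to 28 × 28 as described above.

```python
import torch

def build_correction_samples(image_crop: torch.Tensor,
                             gt_mask_crop: torch.Tensor,
                             pred_mask_logits: torch.Tensor):
    """Build one positive/negative sample pair for the correction network.

    image_crop:       instance region cropped from the original image, resized to 28x28
    gt_mask_crop:     region of the annotation mask for the same anchor box, resized to 28x28
    pred_mask_logits: 28x28 predicted segmentation mask produced by the Mask Head
    """
    pred_mask = torch.sigmoid(pred_mask_logits)    # map the prediction into [0, 1]
    positive_sample = gt_mask_crop * image_crop    # real mask (.) image crop
    negative_sample = pred_mask * image_crop       # predicted mask (.) image crop
    return positive_sample, negative_sample
```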
It is to be understood that the above description of the positive and negative sample building process is only an example in the embodiments of the present application, and is not necessarily taken as a limitation on the present invention.
The embodiment of the application provides a positive and negative sample acquisition method in which the collected picture data is first augmented with data augmentation techniques and the augmented data is then processed to obtain the positive and negative samples. Data augmentation enlarges and diversifies the training data set, giving the trained model stronger generalization ability; positive and negative samples built from such a diversified data set effectively avoid uneven sample distribution, so a model trained on them is sufficiently reliable.
Example 5
In practical applications, this embodiment details the method for generating the adversarial network model in step 204 in embodiment 2. Fig. 5 is another schematic flowchart of the adversarial network model construction method; referring to fig. 5, the generation method includes:
501. Compensating the loss function of the instance segmentation model with the smooth loss function of the correction network to obtain the adversarial network model;
in this embodiment of the present application, compensating the loss function of the instance segmentation model with the smooth loss function of the correction network can be expressed as obtaining the loss function of the adversarial network model according to the following formula:
L_RoI = L_cls + L_box + L_mask + L_IoU + L_Dis
where L_cls is the classification loss function, L_box is the bounding-box detection loss function, L_mask is the segmentation mask loss, L_IoU is the loss function of the MaskIoU, and L_Dis is the smooth loss function of the correction network.
In the embodiment of the present application, the classification loss function is
L_cls = -log( exp(X_label) / ∑_{j=1..N} exp(X_j) )
where L_cls is the classification loss function, X_label is the score of the predicted category, label is the category index, and N is the number of categories;
the bounding-box detection loss function is
L_box(T_pred, T_gt) = Smooth_L1(T_pred - T_gt)
where L_box is the bounding-box detection loss function and T_pred and T_gt are the predicted and labeled (ground-truth) values;
the segmentation mask loss is obtained by multiplying the classification loss function and the MaskIoU loss function.
502. Evaluating the adversarial network model and determining its hyper-parameters;
the validation set is called to evaluate the adversarial network model and obtain its hyper-parameters.
In the embodiment of the present application, before the validation set is called to evaluate the adversarial network model, data augmentation also needs to be applied to the validation set.
In an embodiment of the present application, during the experimental stage the evaluation process included: setting the parameters of the instance segmentation network; performing instance segmentation of the individual group-raised pigs with the adversarial network model obtained after adversarial training; and determining the hyper-parameters of the adversarial network model based on the segmentation results.
The parameter settings were as follows: the number of network output categories was changed from 81 to 2; the preset anchor aspect ratios of the RPN part were set to [0.5, 1.2], and the anchor sizes were modified from [32, 64, 128, 256, 512] to [32, 64, 128, 256, 384]; the initial learning rate of the model was set to 0.001 with a learning-rate decay factor of 0.1, the learning rate was updated every 1,000 iterations, the maximum number of iterations was set to 50,000, and the model was saved every 5,000 iterations; training images were uniformly rescaled to 1333 × 800 when read.
In this embodiment of the present application, the parameter settings during the experimental stage also include two key switches: MaskIoU_on and MaskDis_on.
The MaskIoU_on option controls the MaskIoU head and the mask segmentation-quality scoring; the MaskDis_on option controls the MaskDis head, i.e. the discriminator (correction) network. These two switches provide a simple interface: by directly modifying the corresponding variables in the macro definitions, the user can easily turn either network on or off.
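Gathered as a plain configuration dictionary, the settings above look roughly like the sketch below; the key names are illustrative and do not correspond to any particular framework's configuration schema.

```python
# Experimental training configuration as a plain dictionary (illustrative key names).
train_config = {
    "num_classes": 2,                              # changed from 81 to 2 (background + pig)
    "rpn_anchor_aspect_ratios": [0.5, 1.2],
    "rpn_anchor_sizes": [32, 64, 128, 256, 384],   # modified from [32, 64, 128, 256, 512]
    "base_learning_rate": 0.001,
    "lr_decay_factor": 0.1,
    "lr_update_every_iters": 1000,
    "max_iters": 50000,
    "checkpoint_every_iters": 5000,
    "train_image_size": (1333, 800),
    "maskiou_on": True,                            # enable the MaskIoU scoring head
    "maskdis_on": True,                            # enable the correction (discriminator) network
}
```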
It should be noted that the model evaluation process described above is the specific procedure used in the experimental stage of this embodiment; in practical applications its steps may be adjusted according to the actual situation. For example, K-fold cross-validation is commonly used for model evaluation.
It should be understood that the above description of model evaluation is only an example of the embodiments of the present application and should not be construed as limiting the invention.
503. Testing the adversarial network model to obtain an adversarial network model that meets the target condition.
By way of example: the test set is called to test the adversarial network model and judge whether it meets the target condition; if so, the adversarial network model is output; if not, the hyper-parameters of the adversarial network model are rebuilt until the target condition is met.
The target condition includes: the segmentation accuracy of the adversarial network model is not lower than an accuracy threshold and the recall is not lower than a recall threshold;
the accuracy threshold ranges from 75% to 100%;
the recall threshold ranges from 80% to 100%.
In the embodiment of the present application, because the data set used for testing has been preprocessed, the noise of the test set data is low, and instance segmentation tests on this test set yield high accuracy and recall; therefore, in this embodiment the accuracy threshold is set to 90% and the recall threshold to 90%.
It should be noted that, in practical applications, the accuracy threshold and the recall threshold may be set according to actual requirements.
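A trivial sketch of this acceptance check is given below; computing the accuracy rate as detection precision TP / (TP + FP) is an assumption, and the 0.90 defaults follow the 90% thresholds chosen in this embodiment.

```python
def accuracy_and_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    # "Accuracy rate" taken here as detection precision TP / (TP + FP); recall is TP / (TP + FN).
    accuracy = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

def meets_target(accuracy: float, recall: float,
                 accuracy_threshold: float = 0.90, recall_threshold: float = 0.90) -> bool:
    # Target condition: both metrics must reach their thresholds for the model to be accepted.
    return accuracy >= accuracy_threshold and recall >= recall_threshold
```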
It should be understood that the above description of the accuracy threshold and the recall threshold is only an example of the embodiments of the present application and should not be taken as a limitation on the present invention.
It should be noted that, in the practical application stage of the embodiment of the present application, if testing finds that the adversarial network model does not meet the target condition, an appropriate loss function can be redesigned or a sufficient training data set prepared again until the resulting adversarial network model meets the target condition.
In the embodiment of the present application, the target condition may further include: the F1 score is not below an F1 threshold, where the F1 threshold ranges from 0.8 to 1.
In the embodiment of the present application, during the experimental stage the accuracy, recall and F1 score of the adversarial network model were studied; the results are detailed in Table 1, and the segmentation effect is shown in fig. 6 and fig. 7.
TABLE 1. Group-raised pig instance segmentation results based on the adversarial network model (table reproduced as an image in the original publication)
The group-raised pig instance segmentation effect based on the MS R-CNN model is shown in detail in fig. 8 and fig. 9, and the accuracy, recall and F1 score of the MS R-CNN model are shown in Table 2.
TABLE 2. Group-raised pig instance segmentation results based on the MS R-CNN model (table reproduced as an image in the original publication)
The comparison shows that, relative to the MS R-CNN model, the adversarial network model improves recall by 0.08 percentage points to 92.18%, accuracy by 2.25 percentage points to 92.03%, and the F1 score by 0.0118 to 0.9210. On the front-view test set the adversarial network model detects 494 pigs, of which 413 are correct; recall is improved by 1.66 percentage points to 85.68%, accuracy by 3.4 percentage points to 83.60%, and the F1 score by 0.0256 to 0.8463. On the overhead test set the adversarial network model detects 710 group-raised pigs, of which 695 are correct; accuracy is improved by 1.46 percentage points to 97.89%, and the F1 score rises slightly from 0.9696 to 0.9720.
As can be seen from fig. 6 to fig. 9, compared with the MS R-CNN model, the adversarial network model greatly improves instance segmentation of group-raised pigs at the front-view angle and can accurately segment pig-ear and pig-leg edges, relieving the missed detections and inaccurate segmentation edges in instance segmentation.
The embodiment of the application provides a method for generating the adversarial network model in which the loss function of the model is further corrected with the Smooth L1 loss function, making the adversarial network model more stable during training and improving the accuracy of instance segmentation of overlapping pigs. In addition, after the adversarial network model is trained, model evaluation and model testing are performed: the preferred hyper-parameter values are determined through evaluation, and an adversarial network model that meets the segmentation requirements is then selected through testing, so the final adversarial network model effectively relieves the missed detections and inaccurate segmentation of pigs caused by overlapping, adhesion and other complex conditions.
The aspects of the present application have been described in detail hereinabove with reference to the accompanying drawings. In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. Those skilled in the art should also appreciate that the acts and modules referred to in the specification are not necessarily required in the present application. In addition, it can be understood that the steps in the method of the embodiment of the present application may be sequentially adjusted, combined, and deleted according to actual needs, and the modules in the device of the embodiment of the present application may be combined, divided, and deleted according to actual needs.
Furthermore, the method according to the present application may also be implemented as a computer program or computer program product comprising computer program code instructions for performing some or all of the steps of the above-described method of the present application.
Alternatively, the present application may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or electronic device, server, etc.), causes the processor to perform part or all of the various steps of the above-described method according to the present application.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present application, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A group-raising pig instance segmentation method based on an antagonistic network model is characterized by comprising the following steps:
acquiring pig group image data;
processing the pig group image data based on an antagonistic network model to obtain a coordinate set, a segmentation mask and classification information for each pig;
wherein the antagonistic network model is formed by correcting an instance segmentation model through a correction network; the correction network comprises a smoothing loss function used for calculating the distance between the instance segmentation predicted value and the ground-truth value; the correction network adds the smoothing loss function to the loss function of the instance segmentation model as compensation, forming the corrected antagonistic network model;
the instance segmentation model comprises a Mask Scoring R-CNN model.
2. The group-raising pig instance segmentation method based on an antagonistic network model according to claim 1, wherein
the Mask Scoring R-CNN model comprises: a basic backbone network, a candidate region network RPN, a Mask Head, and a segmentation quality scoring network MaskIoU;
the basic backbone network is used for processing the pig image data and inputting the obtained feature map into the RPN;
the RPN is used for processing the feature map and inputting the obtained regions of interest into the Mask Head and the correction network;
the Mask Head carries out classification and bounding-box regression based on the regions of interest to obtain the coordinate set, segmentation mask and classification information of each pig, and the predicted segmentation mask is input into the MaskIoU after max pooling;
the MaskIoU is used for scoring the quality of the predicted segmentation mask from the Mask Head.
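For illustration, a minimal PyTorch-style sketch of the data flow described in claim 2 follows. It is a toy stand-in, not the patented implementation: the single-convolution "backbone", the simplified RPN and heads, the channel counts and the pooling step are all assumptions used only to show how the backbone, RPN, Mask Head and MaskIoU branch connect.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PipelineSketch(nn.Module):
    """Toy sketch of the backbone -> RPN -> Mask Head -> MaskIoU data flow (claim 2)."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.backbone = nn.Conv2d(3, 256, kernel_size=3, padding=1)    # stand-in backbone
        self.rpn = nn.Conv2d(256, 256, kernel_size=3, padding=1)       # stand-in region proposal stage
        self.mask_head = nn.Conv2d(256, num_classes, kernel_size=1)    # stand-in mask/classification head
        self.mask_iou = nn.AdaptiveAvgPool2d(1)                        # stand-in mask quality scorer

    def forward(self, images: torch.Tensor):
        feats = self.backbone(images)                 # feature maps are fed to the RPN
        rois = self.rpn(feats)                        # regions of interest go to the Mask Head (and correction network)
        masks = self.mask_head(rois)                  # per-pig masks / classes (simplified)
        pooled = F.max_pool2d(masks, kernel_size=2)   # max pooling before the MaskIoU branch
        iou_score = self.mask_iou(pooled)             # quality score for the predicted masks
        return masks, iou_score

if __name__ == "__main__":
    model = PipelineSketch()
    out_masks, out_score = model(torch.randn(1, 3, 64, 64))
    print(out_masks.shape, out_score.shape)
```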
3. The group-raising pig instance segmentation method based on an antagonistic network model according to claim 2, wherein
the MaskIoU comprises four convolutional layers and three fully-connected layers; among the four convolutional layers, the convolution kernel size of the first layer is 3 × 3 × 257, and the convolution kernels of the other three layers are all 3 × 3 × 256; among the three fully-connected layers, the outputs of the first and second layers are 1024, and the output of the third layer is the number of categories;
the MaskIoU obtains its input features based on the predicted segmentation mask and the region of interest;
the loss function of the MaskIoU is
L_IoU(T_pred, T_gt) = Σ (T_pred − T_gt)²
wherein L_IoU represents the loss function of the MaskIoU; T_pred represents the predicted segmentation mask; T_gt represents the labeled segmentation mask.
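A minimal PyTorch-style sketch of the layer dimensions stated in claim 3 is given below. The 14 × 14 spatial ROI size, the padding, the ReLU activations and the absence of stride are assumptions not fixed by the claim; the 257-channel input reflects 256 ROI feature channels concatenated with 1 predicted-mask channel.

```python
import torch
import torch.nn as nn

class MaskIoUHead(nn.Module):
    """Sketch of the MaskIoU branch: four convolutions and three fully-connected layers (claim 3)."""
    def __init__(self, num_classes: int = 2, roi_size: int = 14):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(257, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),  # first layer: 3x3 over 257 channels
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),  # remaining layers: 3x3 over 256 channels
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        flat = 256 * roi_size * roi_size
        self.fcs = nn.Sequential(
            nn.Linear(flat, 1024), nn.ReLU(inplace=True),   # first fully-connected layer: 1024 outputs
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),   # second fully-connected layer: 1024 outputs
            nn.Linear(1024, num_classes),                   # third layer: one score per category
        )

    def forward(self, roi_feature: torch.Tensor, pred_mask: torch.Tensor) -> torch.Tensor:
        x = torch.cat([roi_feature, pred_mask], dim=1)      # build the 257-channel input
        x = self.convs(x)
        return self.fcs(x.flatten(1))

def mask_iou_loss(t_pred: torch.Tensor, t_gt: torch.Tensor) -> torch.Tensor:
    # L_IoU = sum((T_pred - T_gt)^2), as stated in claim 3.
    return ((t_pred - t_gt) ** 2).sum()

if __name__ == "__main__":
    head = MaskIoUHead()
    scores = head(torch.randn(2, 256, 14, 14), torch.randn(2, 1, 14, 14))
    print(scores.shape)   # (2, num_classes)
```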
4. The group-raising pig instance segmentation method based on an antagonistic network model according to claim 2, wherein
the RPN is a fully convolutional network with three convolutional layers arranged in a tree structure: the trunk is a 3 × 3 convolutional layer and the branches are two 1 × 1 convolutional layers;
the ROIs output by the RPN are resampled by bilinear interpolation through a RoIAlign network to obtain a feature map corresponding to each ROI, and these feature maps are used as the input of the Mask Head.
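As an illustration of the tree-structured RPN in claim 4, the following sketch uses one 3 × 3 trunk convolution and two 1 × 1 branch convolutions; the per-anchor output sizes (2 objectness scores and 4 box offsets) follow common RPN practice and are assumptions, not statements from the claim.

```python
import torch
import torch.nn as nn

class RPNHeadSketch(nn.Module):
    """Tree-structured RPN head: a 3x3 trunk convolution with two 1x1 branch convolutions (claim 4)."""
    def __init__(self, channels: int = 256, num_anchors: int = 3):
        super().__init__()
        self.trunk = nn.Conv2d(channels, channels, kernel_size=3, padding=1)   # 3x3 trunk
        self.cls_branch = nn.Conv2d(channels, num_anchors * 2, kernel_size=1)  # 1x1 branch: object vs background
        self.box_branch = nn.Conv2d(channels, num_anchors * 4, kernel_size=1)  # 1x1 branch: box regression

    def forward(self, feature_map: torch.Tensor):
        x = torch.relu(self.trunk(feature_map))
        return self.cls_branch(x), self.box_branch(x)
```

The ROIs selected from these outputs would then be resampled with bilinear interpolation (RoIAlign) into fixed-size feature maps for the Mask Head, as the claim describes.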
5. The group-raising pig instance segmentation method based on an antagonistic network model according to claim 1, wherein
the loss function of the antagonistic network model is
L_RoI = L_cls + L_box + L_mask + L_IoU + L_Dis
wherein L_cls is the classification loss function; L_box is the bounding-box detection loss function; L_mask is the segmentation mask loss; L_IoU is the loss function of the MaskIoU; L_Dis is the smoothing loss function of the correction network.
6. The group-raising pig instance segmentation method based on an antagonistic network model according to claim 5, wherein
the correction network is a fully convolutional network formed by two convolutional layers trained with positive and negative samples; the convolution kernels of both layers are 5 × 5, the channel numbers are 64 and 128 respectively, and the convolution stride of both layers is 2;
the inputs of the convolutional layers in the correction network are positive sample instances and negative sample instances;
the smoothing loss function of the correction network is
Figure FDA0002930268050000021
wherein θ_G represents the generator parameters of the antagonistic network, θ_D represents the discriminator parameters of the antagonistic network, N represents the total number of predicted segmentation masks generated by the instance segmentation network, x_n represents the n-th predicted segmentation mask generated by the instance segmentation network, and r_n represents the region of interest of the original input image corresponding to x_n; y_n represents the ground-truth segmentation mask corresponding to x_n; the one-dimensional vector fc(r_n · x_n) is composed of the pixel-level features of the fake sample together with the features output by the first convolutional layer and the features output by the second convolutional layer; fc(r_n · y_n) is composed of the pixel-level features of the real sample together with the features output by the first convolutional layer and the features output by the second convolutional layer.
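Claim 6 gives the shape of the correction (discriminator) network, but the exact loss formula is published only as an image (Figure FDA0002930268050000021). The sketch below therefore shows only a plausible reading, assuming the smoothing loss is a Smooth L1 distance between the feature vectors fc(r_n · x_n) and fc(r_n · y_n); the padding, activations and the use of a mean reduction are further assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrectionNetSketch(nn.Module):
    """Two 5x5 convolutions with 64 and 128 channels and stride 2, as stated in claim 6."""
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 64, kernel_size=5, stride=2, padding=2)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=5, stride=2, padding=2)

    def features(self, sample: torch.Tensor) -> torch.Tensor:
        """Concatenate pixel-level features with the outputs of both convolutional layers into one vector."""
        f1 = torch.relu(self.conv1(sample))
        f2 = torch.relu(self.conv2(f1))
        return torch.cat([sample.flatten(1), f1.flatten(1), f2.flatten(1)], dim=1)

def smoothing_loss(net: CorrectionNetSketch,
                   roi_image: torch.Tensor,
                   pred_mask: torch.Tensor,
                   gt_mask: torch.Tensor) -> torch.Tensor:
    # Fake sample: predicted mask * ROI image; real sample: ground-truth mask * ROI image (claims 7-8).
    fake = net.features(pred_mask * roi_image)
    real = net.features(gt_mask * roi_image)
    # Assumed form of L_Dis: Smooth L1 distance between the two feature vectors.
    return F.smooth_l1_loss(fake, real)

if __name__ == "__main__":
    net = CorrectionNetSketch()
    roi = torch.rand(1, 3, 64, 64)
    pred = torch.rand(1, 1, 64, 64)                    # predicted mask after sigmoid
    gt = (torch.rand(1, 1, 64, 64) > 0.5).float()      # ground-truth mask
    print(smoothing_loss(net, roi, pred, gt))
```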
7. The group-raising pig instance segmentation method based on an antagonistic network model according to claim 6, wherein
the positive sample instance is obtained by element-wise (dot) multiplication of the ground-truth segmentation mask and the instance image crop;
the negative sample instance is obtained by element-wise (dot) multiplication of the predicted segmentation mask and the instance image crop.
8. The group-raising pig instance segmentation method based on an antagonistic network model according to claim 7, wherein
the ground-truth segmentation mask is obtained by cropping the original annotation mask based on the anchor-box coordinates corresponding to the ROI;
the instance image crop is obtained by cropping the original input image based on the anchor-box coordinates corresponding to the ROI;
the predicted segmentation mask is obtained after cropping and a sigmoid activation function.
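A short sketch of the sample construction in claims 7-8 follows; build_samples is a hypothetical helper, and it assumes the mask logits already have the same spatial size as the cropped ROI, which the claims do not specify.

```python
import torch

def build_samples(original_image: torch.Tensor,
                  annotation_mask: torch.Tensor,
                  mask_logits: torch.Tensor,
                  box: tuple[int, int, int, int]):
    """Crop by the ROI anchor-box coordinates, then form positive/negative samples by element-wise multiplication."""
    x1, y1, x2, y2 = box
    roi_image = original_image[..., y1:y2, x1:x2]     # instance image crop
    gt_mask = annotation_mask[..., y1:y2, x1:x2]      # ground-truth segmentation mask (cropped annotation)
    pred_mask = torch.sigmoid(mask_logits)            # predicted mask after the sigmoid activation
    positive = gt_mask * roi_image                    # positive (real) sample instance
    negative = pred_mask * roi_image                  # negative (fake) sample instance
    return positive, negative
```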
9. The group-raising pig instance segmentation method based on an antagonistic network model according to claim 5, wherein
the classification loss function is
Figure FDA0002930268050000031
wherein L_cls represents the classification loss function; X_label represents the score value of the predicted category; label represents the category index; N represents all categories;
the frame detection loss function is
Figure FDA0002930268050000032
wherein L_box represents the bounding-box detection loss function; T_pred represents the predicted segmentation mask; T_gt represents the labeled segmentation mask;
the segmentation mask loss is obtained by multiplying the classification loss function and the MaskIoU loss function.
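The two formulas in claim 9 are published only as images (Figures FDA0002930268050000031 and FDA0002930268050000032). Read against the variable definitions, plausible standard forms would be a softmax cross-entropy for the classification loss and a Smooth L1 term for the bounding-box loss; this is an assumption, not the published equations:

L_{cls} = -\log\frac{e^{X_{label}}}{\sum_{j=1}^{N} e^{X_{j}}},
\qquad
L_{box}(T_{pred}, T_{gt}) = \mathrm{SmoothL1}\left(T_{pred} - T_{gt}\right)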
10. The group-raising pig instance segmentation method based on an antagonistic network model according to claim 1, wherein processing the pig group image data based on the antagonistic network model to obtain the coordinate set, segmentation mask and classification information of each pig comprises:
compensating the loss function of the instance segmentation model with the smoothing loss function of the correction network to obtain the antagonistic network model;
calling a validation set to evaluate the antagonistic network model and obtain the hyper-parameters of the antagonistic network model;
calling a test set to test the antagonistic network model and judging whether the antagonistic network model meets a target condition; if so, outputting the antagonistic network model; if not, reconstructing the hyper-parameters of the antagonistic network model until the antagonistic network model meets the target condition;
wherein the target condition includes: the segmentation accuracy of the antagonistic network model is not lower than an accuracy threshold and the recall rate is not lower than a recall rate threshold;
the accuracy threshold ranges from 75% to 100%;
the recall rate threshold ranges from 80% to 100%.
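A schematic sketch of the evaluate-then-test loop in claim 10 follows, with the thresholds chosen at the lower bounds of the claimed ranges; evaluate_model, test_model, reconstruct_hyperparameters and model.set_hyperparameters are hypothetical helpers introduced only for illustration.

```python
ACCURACY_THRESHOLD = 0.75   # lower bound of the claimed 75%-100% range
RECALL_THRESHOLD = 0.80     # lower bound of the claimed 80%-100% range

def select_model(model, validation_set, test_set,
                 evaluate_model, test_model, reconstruct_hyperparameters, max_rounds=10):
    """Hypothetical selection loop: evaluate to obtain hyper-parameters, then test until
    both segmentation accuracy and recall meet the target condition."""
    # Evaluation on the validation set yields the preferred hyper-parameter values.
    hyperparameters = evaluate_model(model, validation_set)
    for _ in range(max_rounds):
        model.set_hyperparameters(hyperparameters)
        accuracy, recall = test_model(model, test_set)
        # Target condition: accuracy and recall are not below their thresholds.
        if accuracy >= ACCURACY_THRESHOLD and recall >= RECALL_THRESHOLD:
            return model
        # Otherwise reconstruct the hyper-parameters and try again.
        hyperparameters = reconstruct_hyperparameters(hyperparameters)
    raise RuntimeError("target condition not met within the allowed rounds")
```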
CN202110148643.0A 2021-02-02 2021-02-02 Group-raising pig instance segmentation method based on confrontation network model Pending CN112861855A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110148643.0A CN112861855A (en) 2021-02-02 2021-02-02 Group-raising pig instance segmentation method based on confrontation network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110148643.0A CN112861855A (en) 2021-02-02 2021-02-02 Group-raising pig instance segmentation method based on confrontation network model

Publications (1)

Publication Number Publication Date
CN112861855A true CN112861855A (en) 2021-05-28

Family

ID=75986395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110148643.0A Pending CN112861855A (en) 2021-02-02 2021-02-02 Group-raising pig instance segmentation method based on confrontation network model

Country Status (1)

Country Link
CN (1) CN112861855A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109949317A (en) * 2019-03-06 2019-06-28 东南大学 Based on the semi-supervised image instance dividing method for gradually fighting study
CN110619632A (en) * 2019-09-18 2019-12-27 华南农业大学 Mango example confrontation segmentation method based on Mask R-CNN
CN110675415A (en) * 2019-12-05 2020-01-10 北京同方软件有限公司 Road ponding area detection method based on deep learning enhanced example segmentation
CN111178197A (en) * 2019-12-19 2020-05-19 华南农业大学 Mass R-CNN and Soft-NMS fusion based group-fed adherent pig example segmentation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHUQIN TU ET AL.: "Instance Segmentation Based on Mask Scoring R-CNN for Group-housed Pigs", 《2020 INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATION(ICCEA)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255669A (en) * 2021-06-28 2021-08-13 山东大学 Method and system for detecting text of natural scene with any shape
CN115359073A (en) * 2022-10-17 2022-11-18 湖南自兴智慧医疗科技有限公司 Chromosome topological structure segmentation method and device based on countermeasure generation network

Similar Documents

Publication Publication Date Title
CN110222787B (en) Multi-scale target detection method and device, computer equipment and storage medium
CN108986140B (en) Target scale self-adaptive tracking method based on correlation filtering and color detection
CN111814741B (en) Method for detecting embryo-sheltered pronucleus and blastomere based on attention mechanism
CN110276767A (en) Image processing method and device, electronic equipment, computer readable storage medium
CN111178197A (en) Mass R-CNN and Soft-NMS fusion based group-fed adherent pig example segmentation method
CN110443148B (en) Action recognition method, system and storage medium
CN114241548A (en) Small target detection algorithm based on improved YOLOv5
CN109472193A (en) Method for detecting human face and device
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
Liu et al. Image de-hazing from the perspective of noise filtering
CN112861855A (en) Group-raising pig instance segmentation method based on confrontation network model
WO2022127814A1 (en) Method and apparatus for detecting salient object in image, and device and storage medium
CN112164030A (en) Method and device for quickly detecting rice panicle grains, computer equipment and storage medium
CN115482523A (en) Small object target detection method and system of lightweight multi-scale attention mechanism
CN114972789A (en) Large-size light guide plate defect visual detection method and system and storage medium
Huang et al. Underwater image enhancement based on color restoration and dual image wavelet fusion
CN113781523A (en) Football detection tracking method and device, electronic equipment and storage medium
CN113516071A (en) Weight estimation method for pigs
CN112001386B (en) License plate character recognition method, system, medium and terminal
CN112883940A (en) Silent in-vivo detection method, silent in-vivo detection device, computer equipment and storage medium
CN111738964A (en) Image data enhancement method based on modeling
CN117132802A (en) Method, device and storage medium for identifying field wheat diseases and insect pests
Zhang et al. A framework for the efficient enhancement of non-uniform illumination underwater image using convolution neural network
WO2021054217A1 (en) Image processing device, image processing method and program
CN114120359A (en) Method for measuring body size of group-fed pigs based on stacked hourglass network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210528)