CN109766856A - Method for recognizing lactating sow postures with a dual-stream RGB-D Faster R-CNN - Google Patents

Method for recognizing lactating sow postures with a dual-stream RGB-D Faster R-CNN

Info

Publication number
CN109766856A
Authority
CN
China
Prior art keywords
rgb
image
roi
cnn
depth image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910040870.4A
Other languages
Chinese (zh)
Other versions
CN109766856B (en)
Inventor
薛月菊
朱勋沐
郑婵
杨晓帆
陈畅新
王卫星
甘海明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Agricultural University
Original Assignee
South China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Agricultural University filed Critical South China Agricultural University
Priority to CN201910040870.4A priority Critical patent/CN109766856B/en
Publication of CN109766856A publication Critical patent/CN109766856A/en
Application granted granted Critical
Publication of CN109766856B publication Critical patent/CN109766856B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method for recognizing lactating sow postures with a dual-stream RGB-D Faster R-CNN. An end-to-end dual-stream RGB-D Faster R-CNN algorithm that fuses RGB-D image features in the feature extraction stage is proposed, for recognizing five posture classes of lactating sows in free-pen scenes: standing, sitting, prone lying, ventral lying and lateral lying. Based on Faster R-CNN, two CNN networks first extract the RGB image features and the depth image features respectively; then, using the mapping relation of the RGB-D images, a single RPN network generates the regions of interest of both the RGB and the depth feature maps; after region-of-interest pooling, an independent network layer fuses the RGB-D features by concatenation; finally, in the Fast R-CNN stage, an NOC structure continues to convolve the fused features before they are fed to the classifier and the regressor. The invention provides an end-to-end, high-precision, small-model and real-time sow posture recognition method that fuses RGB-D data information, laying a foundation for further analysis of sow behavior.

Description

Method for recognizing the posture of a lactating sow with a dual-stream RGB-D Faster R-CNN
Technical Field
The invention relates to the technical field of multi-modal target detection and recognition in computer vision, and in particular to an end-to-end lactating sow posture recognition method based on the Faster R-CNN detection algorithm, in which RGB-D features extracted from RGB-D data by a two-stream CNN are fused in the feature extraction stage.
Background
The behavior of pigs in a pig farm is an important manifestation of their welfare and health status and directly affects the farm's economic returns. In animal behavior monitoring, compared with traditional manual monitoring and sensor-based technology, automatic recognition by computer vision is a low-cost, efficient and contact-free approach that can continuously provide valuable behavioral information.
In recent years, pig behavior recognition based on computer vision has been studied extensively. For example, the 2018 patent with publication number CN108830144A (Xue Yueju et al., South China Agricultural University) uses depth image data and a Faster R-CNN improved with a residual structure and center loss to automatically recognize five postures of lactating sows in free pens. In 2017, the same team's patent CN201710890676 used depth images, first detecting the sow with a DPM algorithm and then recognizing its posture inside the detection box with a CNN, and their patent CN107527351A uses RGB images and an FCN algorithm to automatically segment the sow from the scene. The 2018 patent CN108717523A of China Agricultural University discloses a sow oestrus behavior detection method based on machine vision. The 2016 patent CN104881636A of China Agricultural University discloses a method and device for recognizing the lying behavior of pigs. In addition, the patent CN107679463A discloses an analysis method that uses machine vision to recognize aggressive behavior in group-housed pigs, and the patent CN107437069A discloses a contour-based method for recognizing pig drinking behavior.
Most existing computer-vision studies of pig behavior use only RGB images or only depth images. This makes robust feature representations hard to obtain in real scenes and quickly runs into an accuracy bottleneck. A camera maps the 3-dimensional world into a 2-dimensional RGB image, which inevitably loses information; using the depth image to compensate for this loss is feasible. Conversely, the depth image lacks the texture and color information of the RGB image and therefore misses the fine detail of the target, so targets with highly similar shapes are hard to recognize accurately from depth alone. In the top-view sow posture recognition task in particular, the height of the target is important evidence for discriminating postures but is not reflected in the RGB image, while some sow postures (for example prone lying and ventral lying) are similar in both height and shape and are hard to distinguish accurately with depth information alone.
Therefore, providing a high-precision method for recognizing lactating sow postures with a dual-stream RGB-D Faster R-CNN is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a method for recognizing the posture of a lactating sow with a dual-stream RGB-D Faster R-CNN, achieving automatic, higher-precision and real-time posture recognition of lactating sows in free pens. The specific scheme for achieving this purpose is as follows:
The invention discloses a method for recognizing the posture of a lactating sow with a dual-stream RGB-D Faster R-CNN, comprising the following steps:
S1, collecting RGB-D video images of lactating sows, including RGB images and depth images, and establishing a sow posture recognition RGB-D video image library;
S2, calculating the mapping relation between the RGB image and the depth image by a camera calibration method;
S3, based on the Faster R-CNN algorithm, convolving the RGB image and the depth image with two separate CNN networks to obtain an RGB image feature map and a depth image feature map;
S4, using only one RPN network, generating regions of interest D-ROI based on the depth image feature map, and generating the corresponding regions of interest RGB-ROI of the RGB image feature map for each D-ROI in one-to-one correspondence through the mapping relation between the RGB image and the depth image;
S5, pooling each D-ROI and RGB-ROI to a fixed size using an ROI Pooling layer, and fusing each group of pooled D-ROI and RGB-ROI feature maps by concatenation fusion;
S6, further extracting fused features from the fused feature map with a Fast R-CNN head of NOC structure, passing them through a global average pooling layer and then a classifier and a regressor, obtaining the dual-stream RGB-D Faster R-CNN sow posture recognition model and outputting the recognition result;
S7, acquiring a training set and a test set from the sow posture recognition RGB-D video image library, training the dual-stream RGB-D Faster R-CNN sow posture recognition model with the training set, testing the model performance with the test set, and finally selecting the best-performing model.
Preferably, the specific process of step S1 is as follows:
S11, fixing an RGB-D sensor to capture top-view RGB-D video images of the pig pen; the RGB-D sensor captures not only the color information of the scene but also the depth information of the target: the captured RGB image contains color, shape and texture information, while the depth image contains clear edge information and depth information robust to lighting;
S12, sampling a training set and a test set from the collected RGB-D video image data, the training set accounting for 70% and the test set accounting for 30% for testing model performance;
S13, preprocessing the depth images in the training and test sets, the preprocessing including filtering, denoising and image enhancement, then annotating the targets in the preprocessed depth images, i.e. marking a bounding box around the target and its posture class, the RGB images needing no processing; and then augmenting the processed training set data by rotation and mirroring for model training.
Preferably, the specific process of step S2 is as follows:
S21, obtaining the intrinsic matrix K_rgb of the RGB image and the intrinsic matrix K_d of the depth image by the camera calibration method, and, for the same checkerboard image used for camera calibration, obtaining the extrinsic matrices R_rgb and T_rgb of the RGB image and the extrinsic matrices R_d and T_d of the depth image; let the homogeneous pixel coordinates of the RGB image be P_rgb = [U_rgb, V_rgb, 1]^T and those of the depth image be P_d = [U_d, V_d, 1]^T; then the rotation matrix R and the translation matrix T that map depth image coordinates to RGB image coordinates are respectively:
R = K_rgb · R_rgb · R_d^(-1) · K_d^(-1)
T = K_rgb · T_rgb - R · K_d · T_d
S22, the mapping relation between the pixel coordinates of the depth image and those of the RGB image is:
P_rgb = (R · Z_d · P_d + T) / Z_rgb
From the above equation, given the coordinates P_d of a point in the depth image, its pixel value Z_d and the shooting distance Z_rgb, the corresponding RGB image coordinates P_rgb can be obtained.
Preferably, the specific process of step S3 is as follows:
In the shared convolutional layer part of the Faster R-CNN, two identical CNN networks are used: the CNN network taking the depth image as input is Conv-D, and the CNN network taking the RGB image as input is Conv-RGB.
Preferably, the specific process of step S4 is as follows:
S41, in the RPN stage of the Faster R-CNN algorithm, using only the depth image feature map output by the Conv-D network as input to generate the regions of interest D-ROI of the depth image;
S42, using the mapping relation between the RGB image and the depth image to generate, for each D-ROI in one-to-one correspondence, the RGB image regions of interest RGB-ROI of the RGB image feature map output by Conv-RGB.
Preferably, the specific process of step S5 is as follows:
S51, pooling each group of D-ROI and RGB-ROI to a fixed size using an ROI Pooling layer;
S52, fusing the pooled D-ROI and RGB-ROI feature maps by series stacking: stacking the channels d (1 ≤ d ≤ D) at the same spatial position (i, j), where 1 ≤ i ≤ H and 1 ≤ j ≤ W, so that for two input feature maps of D channels each the stacked output feature map has 2D channels:
y_(i,j,2d-1) = x^rgb_(i,j,d),  y_(i,j,2d) = x^d_(i,j,d)
where x^rgb, x^d and y denote respectively the RGB-ROI and D-ROI feature maps before fusion and the feature map after fusion.
Preferably, the specific process of step S6 is as follows:
S61, continuing to convolve the fused feature map with an NOC structure composed of several convolutional layers to further extract the fused features;
S62, pooling with the global average pooling layer and feeding the pooled result to the classifier and regressor of Fast R-CNN.
Preferably, the RGB-D sensor of step S11 is a Kinect 2.0 sensor.
Preferably, the Conv-D and Conv-RGB of step S31 have the same convolutional structure, both being ZF structures.
Preferably, in step S51, a spatial pyramid pooling method divides each feature map into a 6 × 6 grid, and max pooling within each grid cell generates a fixed-size feature map of 6 × 6 with 256 channels.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
(1) The invention provides a dual-stream RGB-D Faster R-CNN algorithm that combines the respective strengths of the RGB image and the depth image, greatly improving recognition accuracy without adding much time cost.
(2) The fully convolutional structural design greatly compresses the model size while maintaining real-time performance.
(3) The method establishes an RGB-D video image database of lactating sows, providing a data source for subsequent algorithm design and model training based on RGB-D video images.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a flow chart of the method for recognizing the posture of a lactating sow with a dual-stream RGB-D Faster R-CNN of the present invention;
FIG. 2 is a structural diagram of the dual-stream RGB-D Faster R-CNN sow posture recognition model of the present invention;
FIG. 3 is a schematic diagram of the recognition result of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Deep learning can obtain better image feature representations by fusing different image features. In a target recognition task, fusing RGB-D features allows the complementary properties of RGB and depth image features to be extracted, improving the robustness of feature learning and yielding features with stronger discriminative power. The invention provides an end-to-end fusion strategy for the RGB-D feature extraction stage: first, two CNN networks separately extract the distinctive features of the two modalities; after the two sets of features are fused, further CNN layers extract the complementary information inherent between them. Finally, based on the Faster R-CNN algorithm, a dual-stream RGB-D Faster R-CNN algorithm that fully exploits RGB-D data information is proposed for high-precision recognition of lactating sow postures.
In the first stage shown in FIG. 1, to generate the ROIs of the RGB-D video image, the CNN of the depth image stream, Conv-D, takes the depth image as input and extracts the depth image features, and the CNN of the RGB image stream, Conv-RGB, takes the RGB image as input and extracts the RGB image features. The RPN network then takes the depth image feature map as input to generate the regions of interest D-ROI, and the mapping relation between the RGB image and the depth image (the RGB-D mapping) generates the RGB-ROIs in one-to-one correspondence. An ROI Pooling layer pools the D-ROI and RGB-ROI feature maps to a fixed size, after which they are fused by concatenation. The second stage is the Fast R-CNN stage that classifies and refines the ROIs: an NOC structure continues to convolve the fused features to further fuse the RGB and depth image features and extract robust RGB-D features, which are finally processed by the classifier and regressor to output the recognition result.
The sow posture recognition model is trained and tested on an Nvidia GTX 980Ti GPU hardware platform, with the Caffe deep learning framework built on the Ubuntu 14.04 operating system and Python as the programming language.
The concrete implementation is as follows:
step one, collecting RGB-D video images of lactating sows, including RGB images and depth images, and establishing a sow posture recognition RGB-D video image library;
step two, obtaining the mapping relation between the RGB image and the depth image through camera calibration;
step three, based on the Faster R-CNN algorithm, convolving the RGB image and the depth image with two separate CNN networks;
step four, using only one RPN network, generating regions of interest D-ROI based on the depth image feature map, and generating the corresponding regions of interest RGB-ROI of the RGB image feature map for each D-ROI in one-to-one correspondence through the mapping relation between the RGB image and the depth image;
step five, pooling each D-ROI and RGB-ROI to a fixed size using an ROI Pooling layer, and fusing each group of pooled D-ROI and RGB-ROI features by concatenation fusion;
step six, continuing to convolve the fused features, i.e. the RGB-D features, with an NOC structure composed of several convolutional layers, passing them through a global average pooling layer and then a classifier and a regressor, obtaining the dual-stream RGB-D Faster R-CNN sow posture recognition model and outputting the recognition result;
step seven, training the dual-stream RGB-D Faster R-CNN sow posture recognition model with the training set of the sow posture recognition RGB-D video image library, testing the model performance with the test set, and finally selecting the best-performing model.
The database establishing method of the first step specifically comprises the following steps:
1) Data were collected from 28 pens; each pen measures about 3.8 m × 2.0 m and houses one lactating sow and 8-10 piglets. A Microsoft Kinect v2.0 sensor, mounted 190-270 cm above the pen floor and looking straight down, acquired RGB-D data at 5 frames per second. The RGB images were acquired at 1080 × 1920 pixels and scaled to 540 × 960 resolution to save video memory (GPU memory) and speed up processing in subsequent use of the algorithm. The depth images were acquired at 424 × 512 pixels; the pixel values of a depth image reflect the distance of the object from the sensor.
2) From the 21 pens recorded in the first three shooting sessions, groups of consecutive video images were selected at random intervals of 10-40 frames, and RGB-D image groups of the 5 posture classes were randomly sampled: 2522 standing, 2568 sitting, 2505 prone lying, 2497 ventral lying and 2508 lateral lying groups, giving 12600 RGB-D image groups as the original training set. From the 7 pens of the fourth session, 1127 standing, 1033 sitting, 1151 prone lying, 1076 ventral lying and 1146 lateral lying groups were randomly sampled, giving 5533 RGB-D image groups as the test set for evaluating model performance. Each RGB-D image group comprises an RGB image and its corresponding depth image. In the overall data set, the training set accounts for about 70% and the test set for about 30%.
3) The sampled depth images were first processed with median filtering and adaptive histogram equalization to improve contrast; the RGB images were not preprocessed. In the manual annotation stage, the depth image of each RGB-D video image group was annotated: a bounding box around the sow was marked on every depth image in the data set, giving the coordinate position of the sow in the image. To enhance the generalization capability and robustness of subsequent model training, the original training set data were augmented by clockwise rotations of 90°, 180° and 270° and by horizontal and vertical mirroring. The processed RGB-D data reached 75600 groups, used as the training data set for training the model.
Table 1. The five posture classes of lactating sows
The method of obtaining the RGB-D mapping relation by camera calibration in step two specifically comprises:
obtaining the intrinsic matrix K_rgb of the RGB image and the intrinsic matrix K_d of the depth image by the camera calibration method, and, for the same checkerboard image, obtaining the extrinsic matrices R_rgb and T_rgb of the RGB image and the extrinsic matrices R_d and T_d of the depth image; here the checkerboard image is the printed checkerboard used for camera calibration in the experiment. Let the homogeneous pixel coordinates of the RGB image be P_rgb = [U_rgb, V_rgb, 1]^T and those of the depth image be P_d = [U_d, V_d, 1]^T. Then the rotation matrix R and the translation matrix T that map depth image coordinates to RGB image coordinates are respectively:
R = K_rgb · R_rgb · R_d^(-1) · K_d^(-1)
T = K_rgb · T_rgb - R · K_d · T_d
Therefore, the mapping relation between the pixel coordinates of the depth image and those of the RGB image is:
P_rgb = (R · Z_d · P_d + T) / Z_rgb
From the above equation, given the coordinates P_d of a point in the depth image, its pixel value Z_d and the shooting distance Z_rgb, the corresponding RGB image coordinates P_rgb can be obtained.
The step-three method of convolving the RGB image and the depth image with two separate CNN networks based on the Faster R-CNN algorithm specifically comprises:
1) Based on the Faster R-CNN algorithm and taking the ZF network as an example, the network structure first processes the two inputs independently with a series of convolutional and max pooling layers of the ZF structure to extract the features of the two image modalities. Conv1-Conv5 together with Pool1 and Pool2 form Conv-D for extracting depth image features, and Conv1_1-Conv5_1 together with Pool1_1 and Pool2_1 form Conv-RGB for extracting RGB image features. Conv-D takes a 512 × 424 × 1 depth image as input and outputs a feature map of size 33 × 28 with 256 channels; Conv-RGB takes a 960 × 540 × 3 RGB image as input and outputs a feature map of size 61 × 35 with 256 channels, as shown in FIG. 2.
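For illustration only, a compact PyTorch sketch of two such independent stems is shown below; the layer widths follow the ZF structure only loosely and are assumptions for exposition, not the patent's exact configuration.

    import torch
    import torch.nn as nn

    def zf_stem(in_channels):
        # A loose ZF-style stem: 5 convolutions and 2 max-pooling layers,
        # ending in a 256-channel feature map (overall stride 16).
        return nn.Sequential(
            nn.Conv2d(in_channels, 96, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),                       # Pool1
            nn.Conv2d(96, 256, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),                       # Pool2
            nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(inplace=True),
        )

    conv_d, conv_rgb = zf_stem(1), zf_stem(3)         # depth and RGB streams
    d_feat = conv_d(torch.zeros(1, 1, 424, 512))      # -> (1, 256, 27, 32)
    rgb_feat = conv_rgb(torch.zeros(1, 3, 540, 960))  # -> (1, 256, 34, 60)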
The step-four method of using only one RPN network to generate the ROIs of the RGB-D data features specifically comprises:
1) In the RPN stage, the feature maps output by the two streams share one RPN network: the D-ROIs are generated from the depth image feature map, and the RGB-ROIs are generated for the RGB image feature map through the RGB-D mapping relation. For the RPN network, 9 anchors are taken at each sliding-window position, combining 3 area scales {96, 192, 384} and 3 aspect ratios {1:1, 1:3, 3:1}.
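A minimal sketch of this box mapping might look as follows; using only the two box corners and a single representative depth Z_d for the whole box is a simplifying assumption for illustration, not a detail stated in the patent.

    import numpy as np

    def d_roi_to_rgb_roi(box_d, Z_d, Z_rgb, R, T):
        # Map a D-ROI (x1, y1, x2, y2) in depth-image pixels to the
        # corresponding RGB-ROI by mapping its two corner points through
        # P_rgb = (R * Z_d * P_d + T) / Z_rgb.
        x1, y1, x2, y2 = box_d
        corners = np.array([[x1, y1, 1.0], [x2, y2, 1.0]])
        mapped = (Z_d * corners @ R.T + T.reshape(1, 3)) / Z_rgb
        (u1, v1), (u2, v2) = mapped[:, :2]
        return min(u1, u2), min(v1, v2), max(u1, u2), max(v1, v2)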
The step-five method of pooling each group of D-ROI and RGB-ROI to a fixed size and then fusing their features specifically comprises:
1) Two ROI Pooling layers (spatial pyramid pooling layers) divide each D-ROI and RGB-ROI, whatever its size, into an H × W grid (H and W are set to 6) and apply max pooling within each grid cell, pooling every ROI feature map to a fixed 6 × 6 feature map with 256 channels (a code sketch of this pooling follows item 2) below).
2) In the feature fusion stage, define the fusion function F of the ROI fusion layer, F: X^rgb_t, X^d_t → Y_t, where X^rgb_t is the feature map of an RGB-ROI, X^d_t is the feature map of the corresponding D-ROI, t indexes the t-th group of ROI feature maps (t is 128 in the experiments here), H and W are the height and width of the feature map, and D is the number of channels; after ROI Pooling, the RGB-ROI and D-ROI sizes in this method are identical (set to 6 × 6 in the experiments). The output fused feature map Y_t has the same spatial size H × W as the inputs and D' channels. For ease of discussion the subscript t is omitted in the analysis, regardless of the number of features in a group (each group uses the same feature fusion).
The concatenation fusion formula is Y_cat = F_cat(X^rgb, X^d), i.e. the two feature maps are stacked in series: the channels d (1 ≤ d ≤ D) at the same spatial position (i, j) (1 ≤ i ≤ H, 1 ≤ j ≤ W) are stacked, and for two input feature maps of D channels each, the stacked output feature map has 2D channels:
y_(i,j,2d-1) = x^rgb_(i,j,d),  y_(i,j,2d) = x^d_(i,j,d)
Serial stacking does not by itself exchange information between the two feature maps, but the subsequent convolutional layers realize the information flow and fusion of the two modalities, as shown in FIG. 2.
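For illustration, the two operations above (single-level spatial pyramid pooling of an ROI to 6 × 6, then concatenation fusion) can be sketched in PyTorch as follows; the function names are chosen for exposition, and the interleaved channel ordering of the formula differs from a plain concatenation only by a fixed permutation, which the subsequent convolutions can absorb.

    import torch
    import torch.nn.functional as F

    def roi_pool_6x6(feature_map, box):
        # Crop an ROI from a (256, H, W) feature map and max-pool it to a
        # fixed 6 x 6 grid, as in single-level spatial pyramid pooling.
        x1, y1, x2, y2 = [int(round(c)) for c in box]
        roi = feature_map[:, y1:y2 + 1, x1:x2 + 1]        # (256, h, w)
        return F.adaptive_max_pool2d(roi, output_size=6)  # (256, 6, 6)

    def concat_fuse(x_rgb, x_d):
        # Concatenation fusion of pooled ROI features:
        # (T, D, 6, 6) and (T, D, 6, 6) -> (T, 2D, 6, 6)
        return torch.cat([x_rgb, x_d], dim=1)

    fused = concat_fuse(torch.randn(128, 256, 6, 6), torch.randn(128, 256, 6, 6))
    print(fused.shape)  # torch.Size([128, 512, 6, 6])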
The step-six method of applying the Fast R-CNN with NOC structure to the fused features specifically comprises:
In the Fast R-CNN stage, an NOC structure consisting of four convolutional layers, Conv6, Conv7, Conv8 and Conv9, continues to convolve the fused feature map to promote information flow among the fused feature channels and thereby further abstract the RGB-D features; finally, after a global average pooling layer, the classifier and regressor of Fast R-CNN are attached, as shown in FIG. 2.
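A PyTorch sketch of such an NOC head follows; the kernel sizes, channel widths and output dimensions (5 posture classes plus background giving 6 class outputs) are assumptions for illustration, since the patent specifies only the four convolutional layers, the global average pooling and the classifier and regressor.

    import torch.nn as nn

    class NOCHead(nn.Module):
        # NOC-style per-ROI head: Conv6-Conv9 over the fused 512-channel
        # 6 x 6 ROI features, then global average pooling, then the
        # Fast R-CNN classifier and bounding-box regressor.
        def __init__(self, in_ch=512, mid_ch=512, num_classes=6):
            super().__init__()
            self.convs = nn.Sequential(
                nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),   # Conv6
                nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),  # Conv7
                nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),  # Conv8
                nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),  # Conv9
            )
            self.gap = nn.AdaptiveAvgPool2d(1)             # global average pooling
            self.cls = nn.Linear(mid_ch, num_classes)      # posture classifier
            self.reg = nn.Linear(mid_ch, 4 * num_classes)  # box regressor

        def forward(self, x):                              # x: (T, 512, 6, 6)
            f = self.gap(self.convs(x)).flatten(1)
            return self.cls(f), self.reg(f)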
The step-seven process of training the dual-stream RGB-D Faster R-CNN model with the training set, testing model performance with the test set and finally selecting the best-performing model specifically comprises:
Model training uses the training set of the prepared RGB-D database. The mini-batch of image input is set to 1, the momentum to 0.9 and the weight decay coefficient to 5 × 10^-4; the maximum number of iterations is 14 × 10^5, the base learning rate is 10^-4 with a decay step of 6 × 10^5 and a decay coefficient gamma of 0.1. After 8 × 10^5 iterations, a model is saved every 1 × 10^5 iterations, and the model with the highest test-set accuracy is selected by comparison. The optimal model is taken as the final model.
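Under the Caffe framework these settings correspond to an SGD solver configuration; the sketch below merely restates them as a Python dictionary together with the implied step-decay rule, for illustration.

    solver = {
        "base_lr": 1e-4,        # base learning rate
        "momentum": 0.9,        # impulse (momentum)
        "weight_decay": 5e-4,   # weight decay coefficient
        "lr_policy": "step",
        "stepsize": int(6e5),   # learning-rate decay step
        "gamma": 0.1,           # decay coefficient
        "max_iter": int(14e5),  # maximum number of iterations
        "iter_size": 1,         # mini-batch of one image
    }

    def learning_rate(it):
        # Step-decay schedule implied by the solver settings.
        return solver["base_lr"] * solver["gamma"] ** (it // solver["stepsize"])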
The experimental results of the present invention are explained in detail below:
Three evaluation indexes accepted in the field are used to assess the sow posture recognition results on the test set. The proposed method is compared with a method using only the depth image, a method using only the RGB image, an early-fusion method (RGB-D early fusion) that simply stacks the RGB-D data as a four-channel image input, and a late-fusion method (RGB-D late fusion) that uses two CNNs, two RPNs and two Fast R-CNN heads and fuses their output results. For the early-fusion and late-fusion methods, the depth image is scaled to 540 × 960 resolution and registered with the RGB image as input. The results are as follows:
Evaluation uses AP (Average Precision), MAP (Mean Average Precision), recognition speed and model size, as shown in Table 2 below:
Table 2. Recognition performance comparison of the models
After the RGB data and the depth image data are fused, the AP (Average Precision) of the five postures (standing, sitting, prone lying, ventral lying and lateral lying) reaches 99.74%, 96.49%, 90.77%, 90.91% and 99.45% respectively, and the MAP (Mean Average Precision) over the five postures reaches 95.47%, which is 7.11 percentage points higher than the method using only RGB images, 5.36 points higher than the method using only depth images, 1.55 points higher than the early-fusion method and 0.15 points higher than the late-fusion method. The recognition speed of 12.3 FPS meets the requirement of real-time recognition. The model size is only 70.1 MB, far smaller than the other methods, showing a great advantage. In conclusion, the proposed method performs excellently in recognition accuracy and model size while maintaining real-time recognition.
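As a quick sanity check, the reported MAP is simply the mean of the five per-class APs:

    aps = [99.74, 96.49, 90.77, 90.91, 99.45]  # per-class AP (%)
    print(round(sum(aps) / len(aps), 2))       # 95.47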
The method for recognizing the posture of a lactating sow with a dual-stream RGB-D Faster R-CNN provided by the invention has been described in detail above. A specific example is used herein to explain the principle and implementation of the invention, and the description of the embodiment is only intended to help understand the method and its core idea. Meanwhile, for those skilled in the art, there may be variations in specific implementation and application scope according to the idea of the invention. In summary, the content of this specification should not be construed as limiting the invention.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for recognizing the posture of a lactating sow with a dual-stream RGB-D Faster R-CNN, characterized by comprising the following steps:
S1, collecting RGB-D video images of lactating sows, including RGB images and depth images, and establishing a sow posture recognition RGB-D video image library;
S2, calculating the mapping relation between the RGB image and the depth image by a camera calibration method;
S3, based on the Faster R-CNN algorithm, convolving the RGB image and the depth image with two separate CNN networks to obtain an RGB image feature map and a depth image feature map;
S4, using only one RPN network, generating regions of interest D-ROI based on the depth image feature map, and generating the corresponding regions of interest RGB-ROI of the RGB image feature map for each D-ROI in one-to-one correspondence through the mapping relation between the RGB image and the depth image;
S5, pooling each D-ROI and RGB-ROI to a fixed size using an ROI Pooling layer, and fusing each group of pooled D-ROI and RGB-ROI feature maps by concatenation fusion;
S6, further extracting fused features from the fused feature map with a Fast R-CNN head of NOC structure, passing them through a global average pooling layer and then a classifier and a regressor, obtaining the dual-stream RGB-D Faster R-CNN sow posture recognition model and outputting the recognition result;
S7, acquiring a training set and a test set from the sow posture recognition RGB-D video image library, training the dual-stream RGB-D Faster R-CNN sow posture recognition model with the training set, testing the model performance with the test set, and finally selecting the best-performing model.
2. The method for recognizing the posture of a lactating sow according to claim 1, wherein the specific process of step S1 is as follows:
S11, fixing an RGB-D sensor to capture top-view RGB-D video images of the pig pen;
S12, sampling a training set and a test set from the collected RGB-D video image data, the training set accounting for 70% and the test set accounting for 30% for testing model performance;
S13, preprocessing the depth images in the training and test sets, the preprocessing including filtering, denoising and image enhancement, then annotating the targets in the preprocessed depth images, i.e. marking a bounding box around the target and its posture class, the RGB images needing no processing; and then augmenting the processed training set data by rotation and mirroring for model training.
3. The method for recognizing the posture of a lactating sow according to claim 1, wherein the specific process of step S2 is as follows:
S21, obtaining the intrinsic matrix K_rgb of the RGB image and the intrinsic matrix K_d of the depth image by the camera calibration method, and, for the same checkerboard image used for camera calibration, obtaining the extrinsic matrices R_rgb and T_rgb of the RGB image and the extrinsic matrices R_d and T_d of the depth image; letting the homogeneous pixel coordinates of the RGB image be P_rgb = [U_rgb, V_rgb, 1]^T and those of the depth image be P_d = [U_d, V_d, 1]^T, the rotation matrix R and the translation matrix T that map depth image coordinates to RGB image coordinates are respectively:
R = K_rgb · R_rgb · R_d^(-1) · K_d^(-1)
T = K_rgb · T_rgb - R · K_d · T_d
S22, the mapping relation between the pixel coordinates of the depth image and those of the RGB image is:
P_rgb = (R · Z_d · P_d + T) / Z_rgb
from which, given the coordinates P_d of a point in the depth image, its pixel value Z_d and the shooting distance Z_rgb, the corresponding RGB image coordinates P_rgb are obtained.
4. The method for recognizing the posture of a lactating sow according to claim 1, wherein the specific process of step S3 is as follows:
in the shared convolutional layer part of the Faster R-CNN, two identical CNN networks are used with the depth image and the RGB image as inputs respectively, the CNN network taking the depth image as input being Conv-D and the CNN network taking the RGB image as input being Conv-RGB.
5. The method for recognizing the posture of a lactating sow according to claim 4, wherein the specific process of step S4 is as follows:
S41, in the RPN stage of the Faster R-CNN algorithm, using only one RPN network, which takes the depth image feature map output by Conv-D as input, to generate the regions of interest D-ROI of the depth image;
S42, using the mapping relation between the RGB image and the depth image to generate, for each D-ROI in one-to-one correspondence, the RGB image regions of interest RGB-ROI of the RGB image feature map output by Conv-RGB.
6. The method for recognizing the posture of a lactating sow according to claim 1, wherein the specific process of step S5 is as follows:
S51, pooling each group of D-ROI and RGB-ROI to a fixed size using an ROI Pooling layer;
S52, fusing the pooled D-ROI and RGB-ROI feature maps by series stacking: stacking the channels d (1 ≤ d ≤ D) at the same spatial position (i, j), where 1 ≤ i ≤ H and 1 ≤ j ≤ W, so that for two input feature maps of D channels each the stacked output feature map has 2D channels:
y_(i,j,2d-1) = x^rgb_(i,j,d),  y_(i,j,2d) = x^d_(i,j,d)
where x^rgb, x^d and y denote respectively the RGB-ROI and D-ROI feature maps before fusion and the feature map after fusion.
7. The method for recognizing the posture of a lactating sow according to claim 1, wherein the specific process of step S6 is as follows:
S61, continuing to convolve the fused feature map with an NOC structure composed of several convolutional layers to further extract the fused features;
S62, pooling with the global average pooling layer and feeding the pooled result to the classifier and regressor of Fast R-CNN.
8. The method for recognizing the posture of a lactating sow according to claim 2, wherein the RGB-D sensor of step S11 is a Kinect 2.0 sensor.
9. The method for recognizing the posture of a lactating sow according to claim 4, wherein the Conv-D and Conv-RGB of step S31 have the same convolutional structure, both being ZF structures.
10. The method of claim 6, wherein in step S51 a spatial pyramid pooling method divides each feature map into a 6 × 6 grid, and max pooling within each grid cell generates a fixed-size feature map of 6 × 6 with 256 channels.
CN201910040870.4A 2019-01-16 2019-01-16 Method for recognizing postures of lactating sows through double-current RGB-D Faster R-CNN Active CN109766856B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910040870.4A CN109766856B (en) 2019-01-16 2019-01-16 Method for recognizing postures of lactating sows through double-current RGB-D Faster R-CNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910040870.4A CN109766856B (en) 2019-01-16 2019-01-16 Method for recognizing postures of lactating sows through double-current RGB-D Faster R-CNN

Publications (2)

Publication Number Publication Date
CN109766856A true CN109766856A (en) 2019-05-17
CN109766856B CN109766856B (en) 2022-11-15

Family

ID=66452306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910040870.4A Active CN109766856B (en) 2019-01-16 2019-01-16 Method for recognizing postures of lactating sows through double-current RGB-D Faster R-CNN

Country Status (1)

Country Link
CN (1) CN109766856B (en)



Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102282570A (en) * 2008-10-30 2011-12-14 聪慧系统公司 System and method for stereo-view multiple animal behavior characterization
CN102521563A (en) * 2011-11-19 2012-06-27 江苏大学 Method for indentifying pig walking postures based on ellipse fitting
CN106456057A (en) * 2014-03-21 2017-02-22 凯耐特赛斯公司 Motion capture and analysis system for assessing mammalian kinetics
CN104881636A (en) * 2015-05-08 2015-09-02 中国农业大学 Method and device for identifying lying behavior of pig
CN106295558A (en) * 2016-08-08 2017-01-04 华南农业大学 A kind of pig Behavior rhythm analyzes method
CN108074224A (en) * 2016-11-09 2018-05-25 环境保护部环境规划院 A kind of terrestrial mammal and the monitoring method and its monitoring device of birds
CN106778784A (en) * 2016-12-20 2017-05-31 江苏大学 Pig individual identification and drinking behavior analysis method based on machine vision
CN106815579A (en) * 2017-01-22 2017-06-09 深圳市唯特视科技有限公司 A kind of motion detection method based on multizone double fluid convolutional neural networks model
CN107527351A (en) * 2017-08-31 2017-12-29 华南农业大学 A kind of fusion FCN and Threshold segmentation milking sow image partition method
CN107844797A (en) * 2017-09-27 2018-03-27 华南农业大学 A kind of method of the milking sow posture automatic identification based on depth image
CN108830144A (en) * 2018-05-03 2018-11-16 华南农业大学 A kind of milking sow gesture recognition method based on improvement Faster-R-CNN
CN108846326A (en) * 2018-05-23 2018-11-20 盐城工学院 The recognition methods of pig posture, device and electronic equipment
CN108921037A (en) * 2018-06-07 2018-11-30 四川大学 A kind of Emotion identification method based on BN-inception binary-flow network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHAN ZHENG et al.: "Automatic recognition of lactating sow postures from depth images by deep learning detector", Computers and Electronics in Agriculture *
薛月菊 (Xue Yueju) et al.: "Lactating sow postures recognition from depth video images based on improved Faster R-CNN", Transactions of the Chinese Society of Agricultural Engineering *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309786A (en) * 2019-07-03 2019-10-08 华南农业大学 A kind of milking sow posture conversion identification method based on deep video
CN110532854B (en) * 2019-07-11 2021-11-26 中国农业大学 Live pig crawling and crossing behavior detection method and system
CN110532854A (en) * 2019-07-11 2019-12-03 中国农业大学 A kind of live pig mounting behavioral value method and system
CN110378953A (en) * 2019-07-17 2019-10-25 重庆市畜牧科学院 A kind of method of spatial distribution behavior in intelligent recognition swinery circle
CN110598658A (en) * 2019-09-18 2019-12-20 华南农业大学 Convolutional network identification method for sow lactation behaviors
CN110598658B (en) * 2019-09-18 2022-03-01 华南农业大学 Convolutional network identification method for sow lactation behaviors
CN111104921A (en) * 2019-12-30 2020-05-05 西安交通大学 Multi-mode pedestrian detection model and method based on Faster rcnn
CN111368666A (en) * 2020-02-25 2020-07-03 上海蠡图信息科技有限公司 Living body detection method based on novel pooling and attention mechanism double-current network
CN111368666B (en) * 2020-02-25 2023-08-18 上海蠡图信息科技有限公司 Living body detection method based on novel pooling and attention mechanism double-flow network
CN111753658A (en) * 2020-05-20 2020-10-09 高新兴科技集团股份有限公司 Post sleep warning method and device and computer equipment
CN112088795A (en) * 2020-07-07 2020-12-18 南京农业大学 Method and system for identifying postures of piggery with limiting fence based on laser positioning
CN112088795B (en) * 2020-07-07 2022-04-29 南京农业大学 Method and system for identifying postures of piggery with limiting fence based on laser positioning
CN112101259A (en) * 2020-09-21 2020-12-18 中国农业大学 Single pig body posture recognition system and method based on stacked hourglass network
CN113313688A (en) * 2021-05-28 2021-08-27 武汉乾峯智能科技有限公司 Energetic material medicine barrel identification method and system, electronic equipment and storage medium
CN113313688B (en) * 2021-05-28 2022-08-05 武汉乾峯智能科技有限公司 Energetic material medicine barrel identification method and system, electronic equipment and storage medium
CN113822185A (en) * 2021-09-09 2021-12-21 安徽农业大学 Method for detecting daily behavior of group health pigs
CN113869271A (en) * 2021-10-13 2021-12-31 南京华捷艾米软件科技有限公司 Face detection method and device and electronic equipment
CN115019391A (en) * 2022-05-27 2022-09-06 南京农业大学 Piglet milk eating behavior detection system based on YOLOv5 and C3D
CN116519106A (en) * 2023-06-30 2023-08-01 中国农业大学 Method, device, storage medium and equipment for determining weight of live pigs
CN116519106B (en) * 2023-06-30 2023-09-15 中国农业大学 Method, device, storage medium and equipment for determining weight of live pigs

Also Published As

Publication number Publication date
CN109766856B (en) 2022-11-15

Similar Documents

Publication Publication Date Title
CN109766856B (en) Method for recognizing postures of lactating sows through double-current RGB-D Faster R-CNN
CN108830144B (en) Lactating sow posture identification method based on improved Faster-R-CNN
Tian et al. Automated pig counting using deep learning
Kamilaris et al. Deep learning in agriculture: A survey
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN105335725B (en) A kind of Gait Recognition identity identifying method based on Fusion Features
CN106897673B (en) Retinex algorithm and convolutional neural network-based pedestrian re-identification method
CN107844797A (en) A kind of method of the milking sow posture automatic identification based on depth image
Nielsen et al. Vision-based 3D peach tree reconstruction for automated blossom thinning
CN114241031A (en) Fish body ruler measurement and weight prediction method and device based on double-view fusion
CN113762009B (en) Crowd counting method based on multi-scale feature fusion and double-attention mechanism
Badhan et al. Real-time weed detection using machine learning and stereo-vision
Wang et al. An efficient attention module for instance segmentation network in pest monitoring
CN108664942A (en) The extracting method and video classification methods of mouse video multidimensional characteristic value
CN109684941A (en) One kind picking region partitioning method based on MATLAB image procossing litchi fruits
CN110969182A (en) Convolutional neural network construction method and system based on farmland image
Zine-El-Abidine et al. Assigning apples to individual trees in dense orchards using 3D colour point clouds
CN113011404A (en) Dog leash identification method and device based on time-space domain features
CN115100688A (en) Fish resource rapid identification method and system based on deep learning
CN109166127B (en) Wearable plant phenotype sensing system
Sujatha et al. Enhancing Object Detection with Mask R-CNN: A Deep Learning Perspective
CN111985472A (en) Trough hay temperature image processing method based on artificial intelligence and active ball machine
CN117197836A (en) Traditional Chinese medicine physique identification method based on multi-modal feature depth fusion
CN116935296A (en) Orchard environment scene detection method and terminal based on multitask deep learning
CN108967246B (en) Shrimp larvae positioning method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant