CN115909401A - Cattle face identification method and device integrating deep learning, electronic equipment and medium


Info

Publication number
CN115909401A
CN115909401A
Authority
CN
China
Prior art keywords
face, cow, image, sample, cattle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211460737.2A
Other languages
Chinese (zh)
Inventor
曹寅
秦俊平
任维
赵志燕
马千里
任家琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology filed Critical Inner Mongolia University of Technology
Priority to CN202211460737.2A priority Critical patent/CN115909401A/en
Publication of CN115909401A publication Critical patent/CN115909401A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a cattle face identification method integrating deep learning, which comprises the following steps: acquiring paired RGB images and depth images of sample cattle faces and making an image data set of the sample cattle; sending the image data set to a cattle face segmentation algorithm that segments the cattle faces from the image backgrounds of the RGB images and depth images; constructing a sample cattle face data set from the segmented images and labeling each sample pair to classify whether its two sample cattle come from the same cow; inputting the data set into a cattle face recognition network fusing depth information and image information and training until the network distinguishes the pictures of different sample cattle in the data set, at which point training ends and the trained cattle face recognition network is obtained; and finally recognizing a cattle face with the trained network to determine the identity information of the cow. By fusing depth information with image information, the method improves cattle face recognition technology, reduces the influence of changes in cattle posture and illumination on cattle face recognition, and improves the robustness of cattle face recognition across different scenes.

Description

Cattle face identification method and device integrating deep learning, electronic equipment and medium
Technical Field
The invention belongs to the technical field of visual recognition, and particularly relates to a method and a device for recognizing a cattle face by fusing deep learning, electronic equipment and a medium.
Background
The RGB image and the depth image are combined. A cattle face segmentation algorithm is provided that performs cattle face segmentation by combining the RGB image and the depth image. The pixel values of the depth image represent the distance between each point of the photographed object and the camera and are unaffected by the object's color; exploiting this property, the depth image is converted into an HSV-space picture and segmented with a threshold on V (brightness). A method of obtaining the segmentation threshold is proposed: in the cattle face identification scene, the last large waveform of the depth-map histogram is the wave of the cattle face, so threshold segmentation takes the last large trough of the depth-map histogram as the segmentation threshold. The RGB image segmentation algorithm GrabCut is used as auxiliary segmentation. This cattle face segmentation algorithm avoids the adverse effect on segmentation caused by the cattle face color being identical or similar to the background color; the foreground of the segmented cattle face picture is separated from the background, which avoids interference with subsequent operations on the cattle face picture.
Disclosure of Invention
The invention aims to solve the technical problem of providing a method, a device, electronic equipment and a readable storage medium for recognizing the cattle face by fusing deep learning, wherein the method adopts a method of fusing depth information and image information to improve the cattle face recognition technology, reduces the influence of the change of the posture and illumination of the cattle on the cattle face recognition, and improves the robustness of the cattle face recognition in different scenes.
In order to solve the technical problems, the invention adopts the technical scheme that: a cattle face identification method integrating deep learning is characterized by comprising the following steps:
acquiring paired RGB images and depth images of the sample cattle faces to prepare an image data set of the sample cattle;
inputting the image data set of the sample cattle into a cattle face segmentation algorithm combining the RGB image and the depth image, and segmenting the cattle faces from the picture backgrounds of the RGB images and depth images to obtain RGB cattle face images and depth cattle face images; forming cattle face picture pairs from the RGB cattle face images and depth cattle face images of the sample cattle, and constructing a sample cattle face data set from these pairs, wherein the sample cattle face data set comprises the cattle face RGB images and depth images of a plurality of sample pairs, each sample pair comprises the cattle face RGB images and depth images of two samples, the two sample cattle come from different cows or the same cow, and each sample pair is labeled, the label classifying whether the two sample cattle of the pair come from the same cow;
inputting a sample cattle face data set into a cattle face recognition network with depth information and image information fused for training until the cattle face recognition network distinguishes pictures of different sample cattle in the sample cattle face data set, and finishing training to obtain a trained cattle face recognition network;
firstly, identity registration is carried out with the paired RGB images and depth images of each cow in the cattle farm; one cow is selected as the cow to be recognized, its paired RGB image and depth image are acquired and segmented into the RGB cattle face image and depth cattle face image of the cow to be recognized by the cattle face segmentation algorithm combining the RGB image and the depth image, and these are then input into the trained cattle face recognition network to obtain the identity information of the cow to be recognized.
Further, in S1, the depth camera is a binocular camera, and a depth map and an RGB map of the same picture of the sample bovine face are acquired, that is, the RGB map and the depth map of the sample bovine face are paired.
Further, in the step S2, the image data set of the sample cow is input into a cow face segmentation algorithm combining the RGB image and the depth map, and the cow face is segmented from the picture backgrounds of the RGB image and the depth map to obtain an RGB cow face image and a depth cow face image; the method specifically comprises the following steps:
s201, clustering and displaying dark to bright pixels in a depth map in an image data set of the sample cattle to obtain a histogram of the depth map;
s202, segmenting valleys of the histogram of the depth map in S201 by adopting a valley obtaining algorithm based on continuous wavelet transformation, calculating a bovine face rectangular frame coordinate array in the depth map according to a threshold value of the last valley, and finally obtaining a bovine face rectangular frame in the RGB map according to the bovine face rectangular frame coordinate array in the depth map;
s203, carrying out GrabCT segmentation algorithm of the RGB image on the cattle face in the cattle face rectangular frame in the RGB image in the S202 to obtain a preliminarily segmented cattle face image A;
s204, based on the brightness of the depth map where the last wave trough is located in the S202 as a threshold, the depth map is segmented, the part larger than the threshold is reserved, and the part smaller than the threshold is removed and replaced by black as a background color; and then obtaining coordinates of foreground pixel points of the depth map, and segmenting the RGB map according to the coordinates to obtain a primarily segmented cattle face map B.
Further, the sample pair in S2 is a positive sample pair or a negative sample pair, when the sample pair is a positive sample pair, it indicates that the two sample cows of the sample pair are from the same cow, and when the sample pair is a negative sample pair, it indicates that the two sample cows of the sample pair are from different cows.
Further, the form of the label in S2 is an array, the array subscript is the sample pair number, and the array value is 0 or 1; specifically, assume X1 and X2 are the two sample cows of a sample pair and Y is the label of the sample pair: when the sample pair composed of X1 and X2 comes from the same cow, the sample pair matches and is a positive sample pair, and the label Y is set to 1, indicating that the samples come from the same sample; when the sample pair composed of X1 and X2 comes from different cows, the sample pair does not match and is a negative sample pair, and the label Y is set to 0, indicating that the samples come from different samples.
Furthermore, the cattle face recognition network fusing depth information and image information in S3 uses a twin network as the backbone network for cattle face recognition; two weight-sharing sub-networks are the key components of the twin network, which uses convolutional neural networks to map the original images to a high-dimensional feature space. Weight sharing means that in the two convolutional neural networks, trainable parameters such as the weights of the convolution kernels in the convolutional layers, the per-channel biases in the convolutional layers, and the weights and biases in the fully-connected layers are updated synchronously as the training epochs increase.
Further, when the twin network processes picture information, the input samples x1 and x2 are both RGB images; when the twin network processes the three-dimensional modality, the input samples x1 and x2 are both depth maps; f(x) comprises convolutional layers, pooling layers, dropout layers and fully-connected layers; the distance of each sample pair is obtained through parameter sharing during training, the shared parameters being denoted w, and both f(x1) and f(x2) use w as their parameters; the distance <f(x1), f(x2)> represents the distance between the two samples finally obtained after processing by the network; and according to the labels assigned in advance, the loss of same-class sample pairs is reduced and the loss of different-class sample pairs is amplified.
Further, the RGB cattle face image and the depth cattle face image are each trained independently with a twin network; the extracted features are multiplied by their corresponding weights, and finally a late decision-level fusion method averages the sample distances <f(x1), f(x2)> output by the network to obtain the prediction result;
the weights are calculated by the following method: the RGB cattle face image is converted into HSV space, lightness is taken as the basis for judging illumination intensity, and a lightness at which the illumination is suitable and the clarity of the photo is unaffected by illumination is selected as the standard value; the weights are expressed by the following formulas:

Figure SMS_1

W_d = 1 - W_R

in the formulas, W_R represents the weight of the RGB cattle face image, V_S the standard lightness value, V_P the lightness value of the RGB cattle face image, and W_d the weight of the depth map;
and the fusion adopts late decision-level fusion, combining the weights using the voting method from deep learning. The voting method is mostly used in classical classification and identification networks, where the last fully-connected layer outputs the probability of a sample belonging to each class and the voting method averages the probabilities of the different modalities to obtain the final result. In this method, the twin network judges whether a sample pair belongs to the same cow by distance, so the decision-level fusion obtains the prediction result by averaging the sample distances output by the networks. The data of the two modalities are trained separately with the twin network and then fused at decision time.
The invention also discloses a cattle face recognition device integrating deep learning, which comprises:
the acquisition module is used for acquiring an RGB (red, green and blue) image and a depth image of the face of the sample cattle in a pair to prepare an image data set of the sample cattle;
the processing module is used for inputting an image data set of a sample cow into a cow face segmentation algorithm combining an RGB (red, green and blue) image and a depth image, segmenting a cow face from image backgrounds of the RGB image and the depth image to obtain an RGB cow face image and a depth cow face image, forming a cow face image pair according to the RGB cow face image and the depth cow face image of the sample cow, constructing a sample cow face data set by the cow face image pair of the sample, wherein the sample cow face data set comprises the cow face RGB image and the depth image of a plurality of sample pairs, the cow face RGB image and the depth image of each sample pair comprise the cow face RGB image and the depth image of two samples, the two sample cows are from different cows or the same cow, the cow face RGB image and the depth image of each sample pair are labeled, and the label is used for classifying whether the two sample cow pairs are from the same cow;
the training module is used for inputting the sample cattle face data set into a cattle face recognition network with depth information and image information fused for training until the cattle face recognition network distinguishes pictures of different sample cattle in the sample cattle face data set from each other, and the training is finished to obtain a trained cattle face recognition network;
the identification module is used for firstly carrying out identity registration with the paired RGB images and depth images of each cow in the cattle farm, selecting one cow as the cow to be identified, acquiring the paired RGB image and depth image of the cow to be identified, segmenting them into the RGB cattle face image and depth cattle face image of the cow to be identified through the cattle face segmentation algorithm combining the RGB image and the depth image, and inputting the RGB cattle face image and the depth cattle face image into the trained cattle face identification network to obtain the identity information of the cow to be identified.
The invention also discloses an electronic device, which is characterized by comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of fused deep learning face recognition as described above.
The invention also discloses a computer readable storage medium which stores computer instructions, wherein the computer instructions are operated to execute the cattle face identification method integrating deep learning.
Compared with the prior art, the invention has the following advantages:
1. According to the cattle face identification algorithm, in the process of individual cattle identification by the cattle face identification technology combining depth information and image information, features such as the contour and color of the cattle face are extracted through the two-dimensional image identification process to serve as identification bases, while the depth information of the cattle face yields the length information of the cattle face in the direction from the forehead to the nose; this serves as an additionally extracted feature that helps two-dimensional cattle face identification improve its identification effect. In addition, the contour information of the cattle face in the depth map is more complete and is not disturbed by body regions of the same color as the face. When the natural illumination changes violently, the depth map is unaffected. The method therefore remains effective under illumination changes and has extremely high robustness.
The technical solution of the present invention is further described in detail by the accompanying drawings and examples.
Drawings
Fig. 1 is a schematic flow chart of a cattle face identification method with depth information fused according to embodiment 1 of the present invention.
Fig. 2 is a reference diagram of a pair of an RGB image and a depth image of a bovine face provided in embodiment 1 of the present invention.
Fig. 3 is a flowchart of a cow face segmentation algorithm combining an RGB map and a depth map according to embodiment 1 of the present invention.
Fig. 4 is a histogram correspondence diagram of the depth map provided in embodiment 1 of the present invention.
Fig. 5 is a representation of the meaning of different waves in the histogram of the depth map provided in embodiment 1 of the present invention.
Fig. 6 is a schematic diagram of a histogram after the smoothing processing provided in embodiment 1 of the present invention.
FIG. 7 is a comparison graph of different threshold segmentations provided in example 1 of the present invention.
Fig. 8 is a diagram of a twin network structure provided in embodiment 1 of the present invention.
Fig. 9 is a flowchart of the recognition algorithm's decision-level fusion of the RGB map with the depth map.
Fig. 10 is a comparison graph of the early-stage fusion and late-stage decision fusion effects of the recognition network provided by the present invention.
FIG. 11 is a comparison graph of different methods of identifying networks provided by the present invention.
Detailed Description
Example 1
As shown in fig. 1, an embodiment of the present invention provides a method for recognizing a bovine face by fusing depth information, where the method includes:
s1, obtaining RGB (red, green and blue) images and depth images of a plurality of sample cattle face pairs to prepare an image data set of the sample cattle;
furthermore, a depth camera is adopted for collection, specifically a binocular camera in which two lenses set a very short distance apart capture simultaneously, which makes it convenient to obtain a depth map and an RGB map of the same picture at the same moment, as shown in FIG. 2;
s2, inputting an image data set of a sample cow into a cow face segmentation algorithm combining an RGB (red, green and blue) image and a depth image, segmenting a cow face from image backgrounds of the RGB image and the depth image to obtain an RGB cow face image and a depth cow face image, forming a cow face image pair according to the RGB cow face image and the depth cow face image of the sample cow, constructing a sample cow face data set by the cow face image pair of the sample, wherein the sample cow face data set comprises a plurality of sample pairs of the cow face RGB image and the depth image, the cow face RGB image and the depth image of each sample pair comprise two sample cow face RGB images and depth images, the two sample cows are from different cows or the same cow, the cow face RGB image and the depth image of each sample pair are labeled, and the label is used for classifying whether the two sample cow pairs are from the same cow;
s3, inputting the sample cattle face data set into a cattle face recognition network with depth information and image information fused for training until the cattle face recognition network distinguishes pictures of different sample cattle in the sample cattle face data set, and finishing training to obtain a trained cattle face recognition network;
S4, identity registration is first carried out with the paired RGB images and depth images of each cow in the cattle farm; one cow is selected as the cow to be recognized, its paired RGB image and depth image are acquired and segmented into the RGB cattle face image and depth cattle face image of the cow to be recognized by the cattle face segmentation algorithm combining the RGB image and the depth image, and these are then input into the trained cattle face recognition network to obtain the identity information of the cow to be recognized.
In this embodiment, the binocular camera is fixed in the breeding base to collect the RGB maps and depth maps. Before data collection, the collectors enter the base at an appropriate time with the permission of the managers of the breeding base and are disinfected to prevent bacteria or viruses from infecting the cattle. Multiple sample cattle are collected, and for each cow the cattle face RGB maps and depth maps are collected in multiple postures and under multiple illumination conditions.
In this embodiment, the inputting of the image data set of the sample cow into the cow face segmentation algorithm combining the RGB map and the depth map, and segmenting the cow face from the picture background of the RGB map and the depth map to obtain the RGB cow face image and the depth cow face image specifically include:
firstly, segmenting an RGB image by using an image segmentation algorithm based on the RGB image, replacing a background with black after segmenting an ideal region, reserving a foreground part, then segmenting by using a segmentation algorithm based on a depth map, and further removing an inaccurate segmentation part caused by only depending on color segmentation in the segmentation algorithm based on the RGB image;
the depth map based segmentation algorithm is a main body of a cow face segmentation algorithm combining an RGB map and a depth map, and the depth map segmentation algorithm relies on depth map threshold segmentation, so that mistaken segmentation caused by the fact that the color of an object is the same as the background color is effectively avoided. In the segmentation algorithm based on the depth map, compared with a common threshold acquisition method, the method for acquiring the threshold by utilizing the histogram can accurately acquire the threshold under the condition that the color of an object is the same as or similar to the background color, and is suitable for a cattle face segmentation scene.
The cattle face segmentation algorithm combining the RGB image and the depth image comprises the following steps: the method comprises the steps of obtaining a rectangular frame by a segmentation algorithm based on a depth map, using the rectangular frame for the segmentation algorithm based on the RGB map, reusing the segmentation algorithm based on the depth map after segmentation by the segmentation algorithm based on the RGB map, and fusing results of the two algorithms. As shown in fig. 3, the detailed steps are as follows:
s201, clustering dark-to-bright pixels in a depth map D in the image data set of the sample cattle:
f 1 (D)=H
f 1 a function represents a clustering method, wherein H represents light and dark pixel array data obtained after clustering, and a histogram related to a depth map is obtained according to clustering data, as shown in FIG. 4, the left side of the depth map is shown in FIG. 4, and the right side of the depth map is shown in a corresponding histogram;
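Read concretely, this clustering amounts to counting how many pixels fall at each depth level; a minimal sketch of such an f1 in Python (function and variable names are illustrative, not from the patent):

```python
import numpy as np

def depth_histogram(depth):
    """f1(D) = H: pixel counts per depth level, from dark to bright (0-255)."""
    # In this scene, brighter depth values correspond to points nearer the camera.
    return np.bincount(np.asarray(depth, dtype=np.uint8).ravel(), minlength=256)
```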
s202, segmenting valleys of a histogram of the depth map by adopting a valley obtaining algorithm based on continuous wavelet transform, calculating a cattle face rectangular frame coordinate array in the depth map according to a threshold value of the last valley, and finally obtaining a cattle face rectangular frame in the RGB map according to the cattle face rectangular frame coordinate array;
in the depth map of the bovine face, the target is the bovine face closest to the depth camera, and in the histogram of the depth map, a plurality of waves appear due to the complexity of the shooting scene, and objects with continuous depths form one wave independently. In the shooting environment of the cattle face, each point of the cattle face is continuous in depth and discontinuous with background information such as a cattle body, a cattle shed or a pasture, so that the cattle face is represented as an independent waveform in a histogram, and the head of the cattle is closer to a depth camera than the background information such as the cattle body, the cattle shed or the pasture; therefore, the last wave in the histogram of the depth map is the wave of the bovine face, and the position where the last large trough is located is the threshold value for segmenting the bovine face. As shown in fig. 5, green is the background and red is the bovine face.
In practical applications, the peaks and troughs of the depth-map histogram are not clearly separated, and noise points exist. A noise point is higher or lower than the surrounding pixels, forming a small peak or small valley, which can mislead the continuous-wavelet-transform-based valley acquisition algorithm into mistaking the noise point for a large valley. Therefore, the histogram curve is smoothed by an interpolation smoothing method:
first, the step size covered by each smoothing operation is set; then, for a given point x on the histogram, its smoothed value temp is calculated:
Figure SMS_2
the smoothed histogram is shown in fig. 6. The frequency value of each pixel level in the smoothed histogram is affected and deviates from the true value, but since the histogram is used only to obtain the threshold, this deviation does not affect the image segmentation. After smoothing, the larger peaks and troughs are preserved, while noise points and small troughs are smoothed away;
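The exact interpolation formula (Figure SMS_2) is not reproduced in the source text; a plausible reading, given the step size described above, is a centered moving average, sketched here under that assumption:

```python
import numpy as np

def smooth_histogram(hist, step=5):
    """Assumed smoothing: centered moving average over a window of 2*step + 1 bins."""
    kernel = np.ones(2 * step + 1) / (2 * step + 1)
    # mode='same' preserves the histogram length; edge bins average over fewer neighbours.
    return np.convolve(np.asarray(hist, dtype=float), kernel, mode="same")
```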
generally, peaks and troughs are determined by an extremum method that distinguishes the background from the target. In this application, however, the last trough is not necessarily the minimum of the whole histogram, and the last peak is not necessarily the maximum, so the extremum method does not apply to this scene. A peak detection method based on the continuous wavelet transform is used instead:
C(a, b) = ∫ s(t) ψ_{a,b}(t) dt,

ψ_{a,b}(t) = (1/√a) ψ((t - b)/a)

where ψ_{a,b}(t) represents the scaled and translated wavelet, ψ(t) is the mother wavelet, a ∈ R+ is the scale of the mother wavelet's dilation, b ∈ R is the distance of the mother wavelet's translation, s(t) represents the signal, and C represents the two-dimensional matrix of wavelet coefficients.
In peak detection, for better performance the wavelet should have the basic features of a peak, including approximate symmetry and one dominant positive peak; the Mexican hat wavelet is therefore used as the mother wavelet. In 3-D space, the 2-D CWT coefficients are visualized with the amplitude of the CWT coefficient as the third dimension, and the problem of peak detection can be transformed into the problem of finding ridges on the 2-D CWT coefficient matrix.
The ridge-finding algorithm is an algorithm for obtaining the peaks, namely: initialize the ridge lines; for each ridge line, search the (n-1)-th row of the coefficient matrix, i.e. the next adjacent scale, for a maximum point whose distance from the ridge is smaller than a certain threshold. If none is found, the number of gaps in the ridge is increased by 1. Ridge lines whose gap number exceeds the threshold are stored and deleted from the search list. Maximum points not connected to a point in the layer above are taken as new ridge lines. These steps are repeated. A ridge is considered a peak if the length of the ridge is greater than a certain threshold and the scale corresponding to the maximum amplitude on the ridge lies within a certain range, the scale being proportional to the width of the peak.
In this embodiment, it is not the peaks that must be acquired but each trough, so the peak-acquisition algorithm is adapted into a trough-acquisition algorithm based on the continuous wavelet transform. The idea is to invert the histogram and obtain the peaks of the inverted histogram with the ridge-line search, which yields the troughs of the original histogram: the pixel count of each brightness value in the histogram is negated and smoothed, and the ridge-line search then gives the coordinate of each trough in the histogram;
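This inversion trick maps directly onto an off-the-shelf CWT peak detector; the sketch below substitutes SciPy's find_peaks_cwt (which uses a Ricker, i.e. Mexican hat, wavelet by default) for the patent's own ridge-line search:

```python
import numpy as np
from scipy.signal import find_peaks_cwt

def find_valleys(hist_smooth, max_width=16):
    """Troughs of the histogram = peaks of its negation under the CWT ridge-line search."""
    return find_peaks_cwt(-np.asarray(hist_smooth, dtype=float), np.arange(1, max_width))

# The segmentation threshold is the brightness at the last large trough, e.g.:
#   threshold = find_valleys(smoothed)[-1]
```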
after the wave troughs are obtained, calculating a coordinate array of the cattle face rectangular frame in the RGB image according to the threshold value of the last wave trough, wherein the coordinate array is specifically represented as follows:
f2(H1) = arr_rectangle[4] = [x1, y1, x2, y2]

where the function f2 represents the histogram-valley thresholding algorithm described above and H1 is the smoothed H data. This step finds the rectangular-frame coordinate array arr_rectangle of the cattle face in the image, i.e. the upper-left position coordinates (x1, y1) and the lower-right position coordinates (x2, y2) of the rectangular frame;
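Given that threshold, f2 reduces to reading off the bounding box of all pixels nearer than the threshold; a sketch under the conventions above (assuming the depth and RGB images are pixel-aligned):

```python
import numpy as np

def face_rectangle(depth, threshold):
    """f2: bounding box [x1, y1, x2, y2] of depth pixels brighter (nearer) than the threshold."""
    ys, xs = np.nonzero(depth > threshold)
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]
```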
For threshold acquisition, the OTSU method is compared with the method herein on the histogram shown in fig. 5, all compared methods using the cattle face depth-map histogram data set for threshold acquisition. Taking fig. 5 as an example, measurement and analysis of the original depth map show that a threshold between 210 and 220 is a suitable segmentation boundary for the cattle face. The thresholds obtained by the different methods are compared in table 1:
TABLE 1 Thresholds obtained by different algorithms

Segmentation algorithm | Threshold
OTSU algorithm | 126
Proposed algorithm without smoothing | 231
Proposed algorithm | 216
The last large wave in the histogram is the wave where the cattle face lies, and the threshold obtained by the OTSU method falls short of it. The segmentation results of the three thresholds are shown in fig. 7: when the threshold lies before the last large trough, the segmented foreground contains too much background information; when the threshold lies beyond the last large trough, over-segmentation occurs. The threshold of the proposed method lies at the last large trough, and its segmentation effect is optimal.
S203, applying the GrabCut segmentation algorithm of the RGB image to the cattle face within the cattle face rectangular frame in the RGB image to obtain the preliminarily segmented cattle face image A; specifically:

firstly, the color data in the RGB map R are modeled, segmentation proceeds by iterative energy minimization, and the target rectangular frame is normally drawn through interaction with the user; here, the automatically obtained cattle face rectangular frame arr_rectangle replaces the user-drawn target rectangular frame. Finally, the GrabCut segmentation algorithm, denoted by the function f3, replaces the entire background with black, specifically expressed as:

f3(R, arr_rectangle) = R1 ∪ R3

where R1 and R3 are both subregions of R; R1 represents the cattle face region in the image, and R3 represents the non-cattle-face region segmented in because of the limitations of the GrabCut segmentation algorithm.
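With OpenCV, seeding GrabCut from the automatically computed rectangle instead of a user-drawn one looks roughly as follows (a sketch; the patent specifies f3 only at this level of detail):

```python
import cv2
import numpy as np

def grabcut_face(rgb, arr_rectangle, iterations=5):
    """f3(R, arr_rectangle): GrabCut seeded by the depth-derived rectangle instead of user input."""
    x1, y1, x2, y2 = arr_rectangle
    mask = np.zeros(rgb.shape[:2], np.uint8)           # rgb: 8-bit 3-channel image
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(rgb, mask, (x1, y1, x2 - x1, y2 - y1),
                bgd_model, fgd_model, iterations, cv2.GC_INIT_WITH_RECT)
    fg = ((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)).astype(np.uint8)
    return rgb * fg[:, :, None]    # cattle face image A: background replaced by black
```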
S204, using the brightness of the depth map D at the last trough found in S202 as the threshold, the depth map is segmented: the part above the threshold is kept, and the part below the threshold is removed and replaced with black as the background color, specifically expressed as:

f4(D) = D1 ∪ D2

where the function f4 is the depth-map segmentation algorithm, D1 and D2 are subsets of the depth map D, D1 represents the cattle face region in the depth map, and D2 represents the non-cattle-face region segmented in because of the limitations of the depth-map segmentation algorithm;

the coordinates of the foreground pixels of the depth map are then obtained, and the RGB map is segmented according to these coordinates to obtain the preliminarily segmented cattle face map B, specifically expressed as:

f5(D1 ∪ D2) = R1 ∪ R2

where the function f5 segments the RGB image according to the cattle face coordinates obtained by the depth-map segmentation algorithm, and R2, a subregion of R, represents the non-cattle-face region redundantly segmented in the RGB map according to the depth-map segmentation algorithm.
S205, taking the intersection of the preliminarily segmented cattle face image A from S203 and the preliminarily segmented cattle face map B from S204:

(R1 ∪ R3) ∩ (R1 ∪ R2) = R1

which yields the complete RGB cattle face image R1; the coordinates of the foreground pixels of the complete RGB cattle face image are then obtained, and the depth map is segmented according to these coordinates to obtain the complete depth cattle face image D1.
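Steps S204 and S205 are plain mask algebra; a sketch, where grabcut_fg is the 0/1 foreground mask produced by the GrabCut step:

```python
import numpy as np

def fuse_segmentations(rgb, depth, threshold, grabcut_fg):
    """f4/f5 plus the intersection (R1 u R3) n (R1 u R2) = R1."""
    depth_fg = (depth > threshold).astype(np.uint8)   # f4: keep the part above the threshold
    face_fg = depth_fg & grabcut_fg                   # intersect face maps A and B
    rgb_face = rgb * face_fg[:, :, None]              # complete RGB cattle face image R1
    depth_face = depth * face_fg                      # complete depth cattle face image D1
    return rgb_face, depth_face
```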
To verify that the cattle face segmentation algorithm combining the RGB map and the depth map proposed in this embodiment can focus on a specific target close to the camera and separate foreground from background in scenes affected by the background, it is compared in the cattle face segmentation scene with a contour-detection segmentation algorithm and the OTSU algorithm, where the contour-detection segmentation algorithm uses only the RGB image data of the data set and the OTSU algorithm uses only the depth image data.
To evaluate the segmentation effect objectively, PA (Pixel Accuracy) is used as the evaluation index, and the cattle face data of this example are used for the comparison. Because no ground-truth segmentation standard exists for the cattle face pictures, each picture is segmented manually, the background part removed and the foreground part retained, and the result is used as the standard segmentation image. The picture segmented by the cattle face segmentation algorithm combining the RGB map and the depth map is then compared with the standard segmentation image, and PA is calculated as the proportion of correctly segmented pixels among all pixels. The comparison results are shown in table 2:
TABLE 2 Comparison of segmentation results for different segmentation algorithms

Segmentation algorithm | PA
Contour-detection segmentation algorithm | 60.21%
OTSU algorithm | 53.39%
Cattle face segmentation algorithm combining RGB map and depth map | 97.13%
As can be seen from table 2, the existing image segmentation methods cannot segment the cattle face effectively and cannot accurately determine background and foreground information, whereas the segmentation of this embodiment is closer to the standard segmentation and removes most of the cattle face background.
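PA as used here is simply the fraction of pixels whose foreground/background assignment agrees with the manually made standard mask; a minimal sketch:

```python
import numpy as np

def pixel_accuracy(pred_mask, ref_mask):
    """PA: proportion of pixels whose label matches the manual standard segmentation."""
    return float(np.mean(np.asarray(pred_mask) == np.asarray(ref_mask)))
```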
In this embodiment, the cattle face data set is labeled; the label form is an array, the array subscript is the sample pair number, and the array value is 0 or 1. Assume X1 and X2 are the two samples of a pair to be learned and Y is the label of the sample pair. When the sample pair composed of X1 and X2 comes from the same cow, the pair matches and the label Y is set to 1, indicating that the samples come from the same cow. When the sample pair comes from different cows, the pair does not match and the label Y is set to 0, indicating that the samples come from different cows.

The sample cattle face data set covers C different cows, each with E paired RGB maps and depth maps. For positive sample pairs, the different RGB-depth image pairs (image groups) of the same cow are numbered, and sample pairs are formed from them using the idea of 'combination' from permutation-and-combination:

Figure SMS_4

For negative sample pairs, the different sample cattle are numbered, and the concept of 'combination' is used so that the two members of each negative pair come from different cows:

Figure SMS_5
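Read literally (the counting formulas themselves, Figure SMS_4 and Figure SMS_5, are not reproduced in the source), positive pairs are 2-combinations within one cow's image groups and negative pairs cross two different cows; a hedged sketch with illustrative names:

```python
from itertools import combinations

def build_pairs(images_by_cow):
    """images_by_cow: dict cow_id -> list of paired (rgb, depth) image groups."""
    pairs, labels = [], []
    # Positive pairs: 2-combinations of the same cow's image groups, label Y = 1.
    for groups in images_by_cow.values():
        for a, b in combinations(groups, 2):
            pairs.append((a, b))
            labels.append(1)
    # Negative pairs: image groups drawn from two different cows, label Y = 0.
    for cow_a, cow_b in combinations(images_by_cow, 2):
        for a in images_by_cow[cow_a]:
            for b in images_by_cow[cow_b]:
                pairs.append((a, b))
                labels.append(0)
    return pairs, labels
```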
In this embodiment, it is noted that common image recognition is two-dimensional image recognition, and the scene of cattle face recognition differs from that of human face recognition. Human face recognition is applied at the entrances and exits of buildings such as railway stations, dormitories and office buildings, mostly indoors, and outdoor face recognition installations usually shade the light, so face recognition is little affected by light changes. Pastures, by contrast, are usually outdoors: strong daytime sunlight easily makes the RGB image too bright and harms identification, while at dusk or at night reduced sunlight makes the RGB image too dark, and cowshed lighting is dim with a small coverage, so reduced illumination is common at night. Two-dimensional cattle face recognition is affected by factors such as cattle posture and illumination and performs worse in complex environments than in normal ones: a two-dimensional cattle face picture reflects only partial information of the cattle face, and posture or illumination changes can greatly alter the information in two-dimensional pictures of the same cow. In the individual-identification process of the cattle face recognition technology fusing depth information and image information, features such as the cattle face contour and color are extracted through the two-dimensional image recognition process as identification bases, and the depth information of the cattle face additionally yields the length information of the cattle face in the direction from the forehead to the nose, an extra feature that helps two-dimensional cattle face identification improve its recognition effect. In addition, the contour information of the cattle face in the depth map is more complete and is not disturbed by body regions of the same color as the face; when the natural illumination changes violently, the depth map is unaffected. The method therefore remains effective under illumination changes and has extremely high robustness.
In this embodiment, the cattle face picture pairs of the collected sample cattle face data set, which contains data of multiple cattle face postures and various illumination conditions, are recognized; 75% of the cattle face data are randomly selected as the training set and 25% as the testing set;
the RGB cattle face image and depth cattle face image of a sample cow, with the background segmented away, form the cattle face picture pair sent into the cattle face recognition network for recognition. The twin network serves as the backbone network for cattle face recognition; its two weight-sharing sub-networks are its key components, and the twin network uses convolutional neural networks to map the original images into a high-dimensional feature space, which reduces the influence of geometric distortion. The twin network structure is shown in figure 8.
The twin network has two inputs, each receiving one sample, so the data it receives are sample pairs: a positive sample pair has both samples from the same cow, and a negative sample pair has its two samples from different cattle. The two samples of a pair are trained in two networks of identical structure, both therefore expressed by the function f(x). The weight sharing of the twin network's two branches ensures that two similar images are not mapped to distant locations in the high-dimensional space after passing through their respective networks. Convolutional neural networks serve as the two sub-networks of the twin network, and weight sharing means that the trainable parameters of the two convolutional neural networks, such as the weights of the convolution kernels in the convolutional layers, the per-channel biases in the convolutional layers, and the weights and biases in the fully-connected layers, are updated synchronously as the training epochs increase.
When the twin network processes picture information, the input samples x1 and x2 are both RGB images; when it processes the three-dimensional modality, the input samples x1 and x2 are both depth maps. f(x) comprises convolutional layers, pooling layers, dropout layers and fully-connected layers. The distance of each sample pair is obtained through parameter sharing during training; the shared parameters are denoted w, and both f(x1) and f(x2) use w as their parameters. The distance <f(x1), f(x2)> represents the distance between the two samples finally obtained after processing by the network. According to the labels assigned in advance, the loss of same-class sample pairs is reduced and the loss of different-class sample pairs is amplified;
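This description matches a standard Siamese setup trained with a contrastive objective that shrinks distances for label 1 and pushes them past a margin for label 0; a minimal PyTorch sketch under that reading, with illustrative layer sizes (use in_channels=1 for the depth branch):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseNetwork(nn.Module):
    """Twin network: one branch f(x), its parameters w shared by both inputs."""
    def __init__(self, in_channels=3, embedding_dim=128):
        super().__init__()
        self.branch = nn.Sequential(   # f(x): conv, pooling, dropout, fully-connected
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Dropout(0.3),
            nn.Flatten(),
            nn.LazyLinear(embedding_dim),
        )

    def forward(self, x1, x2):
        # Weight sharing: the same parameters w produce both f(x1) and f(x2).
        return self.branch(x1), self.branch(x2)

def contrastive_loss(f1, f2, y, margin=2.0):
    """Shrinks <f(x1), f(x2)> for same-cow pairs (y=1); amplifies it up to a margin for y=0."""
    d = F.pairwise_distance(f1, f2)           # y: float tensor of 0/1 labels
    return torch.mean(y * d.pow(2) + (1.0 - y) * torch.clamp(margin - d, min=0).pow(2))
```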
the data fusing image information and depth information are the RGB cattle face images and depth cattle face images of the sample cattle; the RGB cattle face image and the depth cattle face image are trained separately with the twin network, the extracted features are multiplied by their corresponding weights, and a late decision-level fusion method judges after fusion whether the input sample pair is the same cow; the training and fusion process is shown in fig. 9.
Since the depth map is unaffected by illumination changes, the weights are determined by the illumination intensity and are calculated as follows: the RGB cattle face image is converted into HSV space, lightness (V) is taken as the basis for judging illumination intensity, and a lightness at which the illumination is suitable and the photo's clarity is unaffected by illumination is selected as the standard value. When the illumination intensity is too high and the light too bright, the lightness exceeds the standard value; when the illumination intensity is too low and the light too dim, the lightness falls below the standard value. The weights are expressed by the following formulas:

Figure SMS_6

W_d = 1 - W_R

in the formulas, W_R represents the weight of the RGB cattle face image, V_S the standard lightness value, V_P the lightness value of the RGB cattle face image, and W_d the weight of the depth map. The more the picture's lightness deviates from the standard value, the greater the influence of illumination on the RGB map, so the smaller the weight of the RGB map and the greater the weight of the depth map; in that case recognition from the RGB map suffers, and the depth map complements the RGB recognition under poor illumination. Conversely, the closer the picture's lightness is to the standard value, the smaller the influence of illumination on the RGB map, so the higher the RGB map's weight and the lower the depth map's weight, the depth map then serving mainly to supply additional three-dimensional spatial feature information as an aid.
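The exact expression for W_R (Figure SMS_6) is not reproduced in the source; any function equal to 1 at V_P = V_S and decreasing with the deviation fits the description. The sketch below assumes one such choice, with an assumed standard value:

```python
import cv2
import numpy as np

def modality_weights(rgb_face, v_standard=128.0):
    """Assumed form W_R = 1 - |V_P - V_S| / V_S, clipped to [0, 1]; W_d = 1 - W_R."""
    hsv = cv2.cvtColor(rgb_face, cv2.COLOR_BGR2HSV)     # rgb_face: 8-bit BGR image
    v_p = float(hsv[:, :, 2].mean())                    # V_P: mean lightness of the picture
    w_r = float(np.clip(1.0 - abs(v_p - v_standard) / v_standard, 0.0, 1.0))
    return w_r, 1.0 - w_r                               # (W_R, W_d)
```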
The fusion adopts late decision-level fusion, combining the weights with the voting method from deep learning. The voting method is mainly used in classical classification and recognition networks, where the last fully-connected layer outputs the probability of a sample belonging to each class. In this embodiment, the twin network judges whether a sample pair belongs to the same cow by the feature-difference distance between the samples, so the twin network is first used to train the data of the two modalities separately; at decision time the voting method obtains the prediction result as the weighted average of the feature-difference distances of the different modalities output by the networks, judges whether the sample pair belongs to the same cow, and the network is trained according to this prediction result.
Through the fusion of depth and image information, when the recognition prediction of one modality deviates, the other modality can correct it, improving the overall recognition rate and completing the complementation between modalities. A prediction deviation of the two-dimensional cattle face caused by the lack of depth information can be corrected by the depth-map prediction; when the illumination condition is not ideal and the two-dimensional cattle face cannot supply enough features, causing false recognition, the depth-map prediction can likewise correct the deviation;
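At decision time the fusion then reduces to a weighted average of the two branch distances compared against a threshold; a sketch (the decision threshold is an assumed hyperparameter, not given in the patent):

```python
def fused_same_cow(d_rgb, d_depth, w_r, w_d, decision_threshold=1.0):
    """Decision-level fusion: weighted average of the two modality distances."""
    fused_distance = w_r * d_rgb + w_d * d_depth
    return fused_distance < decision_threshold   # True -> pair predicted to be the same cow
```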
In this example, the effects of early fusion and late decision-level fusion are compared. Early fusion is image fusion: the two-dimensional image is added to the three-dimensional image with weights, and a neural network is then trained for recognition. The early-fusion method used in this example directly fuses the RGB cattle face image with the depth cattle face image, inputs the fused image into a twin network for training, and judges whether the sample pair comes from the same cow. The comparison results are shown in fig. 10: the recognition rate of early fusion is 89.682%, and that of late decision-level fusion is 93.619%. Early fusion performs no feature extraction; direct fusion superimposes pixels and masks features, making the cattle face features of a single modality hard to extract later, which causes interference in the cattle face recognition scene;
the recognition rates of three recognition algorithms are also compared. The first uses only the RGB images collected in S1 for cattle face recognition, the current conventional method. The second labels an image data set that has not passed through the cattle face segmentation algorithm of S2 combining the RGB map and the depth map, and uses it, in place of the sample cattle face data set of S3, to train the cattle face recognition network fusing depth and image information. The third is the identification method disclosed in this embodiment. The comparison results are shown in fig. 11. Compared with recognition from a single RGB image, introducing the recognition method combining the depth map with the RGB map improves the recognition rate by 2.403%. Compared with the combined depth-and-RGB method without segmentation, performing the image segmentation preprocessing first improves the recognition rate by a further 0.472%. Whether or not the recognition rate rises, removing the background is necessary: it lets the neural network focus only on the features of the face region in the picture, so recognition depends only on differences in facial features. Since the segmentation algorithm that removes the background is still used at the application stage, the model remains effective when the background in the pasture changes unpredictably. In addition, the image segmentation method proposed herein is well suited to scenes where the object is close to the camera and the background must be ignored, and its new idea of introducing a depth map for segmentation and using a histogram to obtain the threshold is of significance for subsequent work in the image recognition field.
Example 2
As shown in fig. 2, this embodiment further provides a cattle face recognition apparatus with depth information fused, comprising:

the acquisition module, which is used for acquiring paired RGB images and depth images of the sample cattle faces to prepare an image data set of the sample cattle;
the processing module is used for inputting an image data set of a sample cow into a cow face segmentation algorithm combining an RGB (red, green and blue) image and a depth image, segmenting a cow face from image backgrounds of the RGB image and the depth image to obtain an RGB cow face image and a depth cow face image, forming a cow face image pair according to the RGB cow face image and the depth cow face image of the sample cow, constructing a sample cow face data set by the cow face image pair of the sample, wherein the sample cow face data set comprises the cow face RGB image and the depth image of a plurality of sample pairs, the cow face RGB image and the depth image of each sample pair comprise the cow face RGB image and the depth image of two samples, the two sample cows are from different cows or the same cow, the cow face RGB image and the depth image of each sample pair are labeled, and the label is used for classifying whether the two sample cow pairs are from the same cow;
the training module is used for inputting the sample cattle face data set into a cattle face recognition network with depth information and image information fused for training until the cattle face recognition network distinguishes pictures of different sample cattle in the sample cattle face data set from each other, and the training is finished to obtain a trained cattle face recognition network;
the identification module is used for firstly carrying out identity registration with the paired RGB images and depth images of each cow in the cattle farm, selecting one cow as the cow to be identified, acquiring the paired RGB image and depth image of the cow to be identified, segmenting them into the RGB cattle face image and depth cattle face image of the cow to be identified through the cattle face segmentation algorithm combining the RGB image and the depth image, and inputting the RGB cattle face image and the depth cattle face image into the trained cattle face identification network to obtain the identity information of the cow to be identified.
Example 3
The present embodiment provides an electronic device, including:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of fused depth information bovine face identification of embodiment 1.
Example 4
The present embodiment provides a computer-readable storage medium storing computer instructions that are operated to execute the method for recognizing a bovine face by fusing depth information according to embodiment 1.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the present invention in any way. Any simple modifications, alterations and equivalent changes of the above embodiments according to the technical essence of the invention are still within the protection scope of the technical solution of the invention.

Claims (10)

1. A cattle face identification method integrating deep learning is characterized by comprising the following steps:
acquiring an RGB (red, green and blue) image and a depth image of a sample cattle face in pair to prepare an image data set of the sample cattle;
inputting an image data set of a sample cow into a cow face segmentation algorithm combining an RGB (red, green and blue) image and a depth image, segmenting a cow face from image backgrounds of the RGB image and the depth image to obtain an RGB cow face image and a depth cow face image, forming a cow face image pair according to the RGB cow face image and the depth cow face image of the sample cow, constructing a sample cow face data set by the cow face image pair of the sample, wherein the sample cow face data set comprises the cow face RGB images and the depth images of a plurality of sample pairs, the cow face RGB image and the depth image of each sample pair comprise the cow face RGB images and the depth images of two samples, the two sample cows are from different cows or the same cow, the cow face RGB image and the depth image of each sample pair are labeled, and the label is used for classifying whether the two sample cow pairs are from the same cow;
inputting a sample cattle face data set into a cattle face recognition network with depth information and image information fused for training until the cattle face recognition network distinguishes pictures of different sample cattle in the sample cattle face data set, and finishing training to obtain a trained cattle face recognition network;
firstly, identity registration is carried out on paired RGB (red, green and blue) images and depth images of each cow in a cow farm, one cow is selected as a to-be-recognized cow, the paired RGB images and depth images of the to-be-recognized cow are obtained, the to-be-recognized cow is divided into RGB (red, green, blue) cow face images and depth cow face images of the to-be-recognized cow through a cow face division algorithm combining the RGB images and the depth images, and then the RGB images and the depth cow face images are input into a trained cow face recognition network to obtain identity information of the to-be-recognized cow.
2. The method for recognizing the cattle face through the fusion deep learning as claimed in claim 1, wherein in S1, the depth camera is a binocular camera, and a depth map and an RGB map of the same frame of the sample cattle face are acquired, that is, the RGB map and the depth map of the sample cattle face are paired.
3. The method for recognizing the cattle face by fusing the deep learning as claimed in claim 1, wherein in S2, the image dataset of the sample cattle is input into a cattle face segmentation algorithm combining an RGB image and a depth map, and the cattle face is segmented from the picture background of the RGB image and the depth map to obtain an RGB cattle face image and a deep cattle face image; the method specifically comprises the following steps:
s201, clustering and displaying dark pixels to bright pixels in a depth map in the image data set of the sample cattle to obtain a histogram of the depth map;
s202, segmenting valleys of the histogram of the depth map in S201 by adopting a valley obtaining algorithm based on continuous wavelet transformation, calculating a bovine face rectangular frame coordinate array in the depth map according to a threshold value of the last valley, and finally obtaining a bovine face rectangular frame in the RGB map according to the bovine face rectangular frame coordinate array in the depth map;
s203, performing GrabCut segmentation algorithm of the RGB image on the cattle face in the cattle face rectangular frame in the RGB image in the S202 to obtain a cattle face image A subjected to preliminary segmentation;
s204, thresholding the depth map with the brightness at the last trough in S202 as the threshold, keeping the part above the threshold and replacing the part below it with black as the background color; then taking the coordinates of the foreground pixels of the depth map and segmenting the RGB map at those coordinates to obtain a preliminarily segmented cattle face map B.
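A minimal sketch of steps S201-S204, assuming 8-bit single-channel depth maps registered to the RGB image; the CWT widths and helper names are assumptions, and find_peaks_cwt stands in for the patent's valley-finding algorithm.

import cv2
import numpy as np
from scipy.signal import find_peaks_cwt

def segment_cow_face(rgb: np.ndarray, depth: np.ndarray):
    # S201: grey-level histogram of the depth map, dark to bright.
    hist = cv2.calcHist([depth], [0], None, [256], [0, 256]).ravel()

    # S202: valleys found as CWT peaks of the negated histogram; the last
    # valley separates the cow face from the background.
    valleys = find_peaks_cwt(-hist, widths=np.arange(5, 20))
    if len(valleys) == 0:
        raise ValueError("no valley found in the depth histogram")
    thresh = int(valleys[-1])

    # Foreground mask and its bounding rectangle in the depth map, reused
    # in the registered RGB image as the cattle face rectangular frame.
    fg = (depth > thresh).astype(np.uint8)
    x, y, w, h = cv2.boundingRect(fg)

    # S203: GrabCut on the RGB image, initialised with that rectangle.
    mask = np.zeros(rgb.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(rgb, mask, (x, y, w, h), bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    keep = ((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)).astype(np.uint8)
    face_a = rgb * keep[:, :, None]   # preliminary cattle face map A

    # S204: keep RGB pixels whose depth exceeds the threshold, black elsewhere.
    face_b = rgb * fg[:, :, None]     # preliminary cattle face map B
    return face_a, face_b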
4. The method for recognizing the cattle face fusing deep learning according to claim 1, wherein each sample pair in S2 is a positive sample pair or a negative sample pair; when it is a positive sample pair, the two samples of the pair come from the same cow, and when it is a negative sample pair, the two samples come from different cattle;
the label in S2 takes the form of an array whose index (subscript) is the number of the sample pair and whose value is 0 or 1; specifically, let X1 and X2 be the two samples of a pair and Y the label of the pair: when X1 and X2 come from the same cow, the pair is matched and is a positive sample pair, and the label Y is set to 1, indicating that the two come from the same sample; when X1 and X2 come from different cattle, the pair is unmatched and is a negative sample pair, and the label Y is set to 0, indicating that the two come from different samples.
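A minimal sketch of how such labeled pairs might be assembled from per-cow image lists; the data layout (images_by_cow mapping a cow id to a list of (rgb, depth) tuples) and the sampling scheme are assumptions.

import itertools
import random

def build_pairs(images_by_cow, n_negative):
    pairs, labels = [], []
    # Positive pairs (Y = 1): all combinations of two images of the same cow.
    for imgs in images_by_cow.values():
        for a, b in itertools.combinations(imgs, 2):
            pairs.append((a, b))
            labels.append(1)
    # Negative pairs (Y = 0): random images of two different cattle.
    ids = list(images_by_cow)
    for _ in range(n_negative):
        i, j = random.sample(ids, 2)
        pairs.append((random.choice(images_by_cow[i]),
                      random.choice(images_by_cow[j])))
        labels.append(0)
    return pairs, labels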
5. The method for recognizing the cattle face fusing deep learning according to claim 4, wherein the cattle face recognition network fusing depth information and image information in S3 uses a twin (Siamese) network as the backbone network for cattle face recognition; two weight-sharing sub-networks are the key components of the twin network, which uses a convolutional neural network to map the original images into a high-dimensional feature space; weight sharing means that in the two convolutional neural networks the trainable parameters, such as the weights of the convolution kernels in the convolutional layers, the biases of the channels in the convolutional layers, and the weights and biases in the fully-connected layers, are updated synchronously as the training epochs increase.
6. The method for recognizing the cattle face fusing deep learning according to claim 5, wherein when the twin network processes picture information, the input samples x1 and x2 are both RGB images, and when it processes the three-dimensional modality, the input samples x1 and x2 are both depth maps; f(x) comprises convolutional layers, pooling layers, a dropout layer and fully-connected layers; the distance of each sample pair is obtained through parameter sharing during training, the shared parameters being denoted w, so that f(x1) and f(x2) both use w as their parameters; distance<f(x1), f(x2)> denotes the distance between the two samples finally produced by the network; according to the labels assigned in advance, the loss of same-class sample pairs is reduced and the loss of different-class sample pairs is amplified.
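A minimal PyTorch sketch of the weight-sharing twin network of claims 5-6, under the label convention above (Y = 1 for matched pairs); the layer sizes, 64x64 input, and margin are assumptions, and the loss shown is the standard contrastive loss rather than the patent's exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TwinNet(nn.Module):
    def __init__(self, in_channels: int, embed_dim: int = 128):
        super().__init__()
        # One sub-network f(x); calling it on both inputs shares all weights.
        self.f = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(64 * 16 * 16, embed_dim),  # assumes 64x64 inputs
        )

    def forward(self, x1, x2):
        return self.f(x1), self.f(x2)

def contrastive_loss(e1, e2, y, margin: float = 1.0):
    # y = 1: matched pair -> pull the embeddings together;
    # y = 0: unmatched pair -> push them at least `margin` apart.
    d = F.pairwise_distance(e1, e2)
    return (y * d.pow(2) + (1 - y) * F.relu(margin - d).pow(2)).mean()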
7. The method for recognizing the cattle face fusing deep learning according to claim 6, wherein the RGB cattle face image and the depth cattle face image are each trained with their own twin network, the extracted features are multiplied by the corresponding weights, and finally a late decision-level fusion method is used: the sample distances distance<f(x1), f(x2)> output by the two networks are averaged to obtain the prediction result;
the weights are calculated as follows: the RGB cattle face image is converted into the HSV space, lightness is taken as the basis for judging the illumination intensity, and the lightness at which a photo is properly lit and its definition is unaffected by illumination is chosen as the standard value; the weight is expressed by the following formula:
W_R = f(V_S, V_P)   [the exact expression is given only as image FDA0003955227850000031 in the original]

W_d = 1 - W_R

where W_R is the weight of the RGB cattle face image, V_S is the standard lightness value, V_P is the lightness value of the RGB cattle face image, and W_d is the weight of the depth map.
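Since the W_R formula survives only as an image, the following sketch assumes a simple linear fall-off of the RGB weight as the measured lightness V_P deviates from the standard value V_S; this assumed formula, and all names below, are illustrative rather than the patent's.

import cv2
import numpy as np

def rgb_weight(rgb_face: np.ndarray, v_standard: float = 128.0) -> float:
    # Assumed stand-in for the patent's image-only formula: W_R falls off
    # linearly as the mean lightness V_P deviates from the standard V_S.
    hsv = cv2.cvtColor(rgb_face, cv2.COLOR_BGR2HSV)
    v_photo = float(hsv[:, :, 2].mean())
    w_r = 1.0 - abs(v_photo - v_standard) / v_standard
    return float(np.clip(w_r, 0.0, 1.0))

def fused_distance(d_rgb: float, d_depth: float, rgb_face: np.ndarray) -> float:
    # Late decision-level fusion: weight each network's sample distance;
    # since W_d = 1 - W_R, this is their weighted average.
    w_r = rgb_weight(rgb_face)
    return w_r * d_rgb + (1.0 - w_r) * d_depth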
8. A cattle face recognition device fusing deep learning, characterized by comprising:
the acquisition module is used for acquiring the RGB image and the depth image of the sample cattle face in pairs to make the image data set of the sample cattle;
the processing module is used for inputting the image data set of the sample cattle into the cattle face segmentation algorithm combining the RGB image and the depth image, segmenting the cattle face from the picture background of the RGB image and the depth image to obtain an RGB cattle face image and a depth cattle face image, forming cattle face image pairs from the RGB cattle face images and depth cattle face images of the sample cattle, and constructing a sample cattle face data set from these pairs; the sample cattle face data set comprises the cattle face RGB images and depth images of a plurality of sample pairs, each sample pair comprising the cattle face RGB images and depth images of two samples, where the two samples come from different cattle or from the same cow; each sample pair is labeled, the label classifying whether the two samples of the pair come from the same cow;
the training module is used for inputting the sample cattle face data set into the cattle face recognition network fusing depth information and image information for training, until the network can distinguish pictures of different sample cattle in the data set; training then ends, yielding a trained cattle face recognition network;
the identification module is used for registering the identity of each cow in the cattle farm with its paired RGB image and depth image, acquiring the paired RGB image and depth image of a cow to be recognized, segmenting them into an RGB cattle face image and a depth cattle face image with the cattle face segmentation algorithm combining the RGB image and the depth image, and inputting both into the trained cattle face recognition network to obtain the identity information of the cow to be recognized.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for recognizing the cattle face fusing deep learning according to any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for performing the method for recognizing the cattle face fusing deep learning according to any one of claims 1 to 7.
CN202211460737.2A 2022-11-17 2022-11-17 Cattle face identification method and device integrating deep learning, electronic equipment and medium Pending CN115909401A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211460737.2A CN115909401A (en) 2022-11-17 2022-11-17 Cattle face identification method and device integrating deep learning, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211460737.2A CN115909401A (en) 2022-11-17 2022-11-17 Cattle face identification method and device integrating deep learning, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN115909401A true CN115909401A (en) 2023-04-04

Family

ID=86487638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211460737.2A Pending CN115909401A (en) 2022-11-17 2022-11-17 Cattle face identification method and device integrating deep learning, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN115909401A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117496189A (en) * 2024-01-02 2024-02-02 中国石油大学(华东) Rectangular tray hole identification method and system based on depth camera
CN117496189B (en) * 2024-01-02 2024-03-22 中国石油大学(华东) Rectangular tray hole identification method and system based on depth camera


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination