CN114022509A - Target tracking method based on monitoring videos of multiple animals and related equipment - Google Patents


Info

Publication number
CN114022509A
CN114022509A
Authority
CN
China
Prior art keywords
tracking
frame
neural network
features
detection object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111121343.XA
Other languages
Chinese (zh)
Inventor
牛凯
贺志强
陈云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202111121343.XA
Publication of CN114022509A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4007 Interpolation-based scaling, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30232 Surveillance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30241 Trajectory

Abstract

The present disclosure provides a target tracking method based on surveillance videos of a plurality of animals, and related devices. The target tracking method comprises the following steps: monitoring a plurality of animals in real time to obtain original images; preprocessing the original images to obtain processed images; extracting the processed image of the t-th frame and detecting the animals in it; recording each detected animal as a detection object, and recording the animals that have been assigned identity identifiers in frames t-1 through t-k as tracking objects; extracting attribute features of the tracking objects and the detection objects and association features between them; matching the detection objects in the t-th frame image with the tracking objects in frames t-1 through t-k according to the association features; and outputting the tracking result. The surveillance video is processed through deep learning to detect the animals and track their trajectories, assisting managers in tracking, observing and locating the farmed animals.

Description

Target tracking method based on monitoring videos of multiple animals and related equipment
Technical Field
The present disclosure relates to the technical field of artificial intelligence and machine learning, and in particular to a neural-network-based target tracking method using surveillance videos of a plurality of animals, and to related devices.
Background
Deep learning networks are among the most widely studied machine learning techniques and architectures. With the continuous growth of computing power, the potential of deep learning has been unlocked, and deep learning algorithms have broken through problems that traditional machine learning algorithms find difficult to solve. Today, deep learning achieves extremely strong results on image, speech and text problems. Intelligent image and video processing by deep learning methods is a hot topic of current research. The present disclosure aims to solve the monitoring and management problems in the breeding process through deep learning technology and to contribute to the informatization and intelligentization of the farming industry.
In traditional farm monitoring and management, managers mainly rely on manual work to monitor and manage the farm, typically learning about its condition through on-site inspection or surveillance cameras. However, a farm is generally large in scale and area and has many surveillance cameras, while the number and energy of managers are limited. Managers therefore cannot know at all times what is happening everywhere on the farm and cannot respond in time to every hidden danger or accident, which ultimately causes economic loss; hiring more managers would obviously increase costs. The traditional management mode is thus relatively inefficient and clearly disadvantageous for large-scale farming.
Disclosure of Invention
In view of the above, the present disclosure is directed to a target tracking method based on surveillance videos of a plurality of animals, and related devices, which can reduce the manpower and material resources required for monitoring.
In view of the above, the present disclosure provides a target tracking method based on surveillance videos of a plurality of animals, the target tracking method comprising:
S1: monitoring a plurality of animals in real time to obtain original images;
S2: preprocessing the original images to obtain processed images;
S3: extracting the processed image of the t-th frame and detecting the animals in it, where t is an integer greater than 0;
S4: recording each detected animal as a detection object, and recording the animals that have been assigned identity identifiers in frames t-1 through t-k as tracking objects, where t > k > 2;
S5: extracting attribute features of the tracking objects and the detection objects and association features between the tracking objects and the detection objects;
S6: matching the detection objects in the t-th frame image with the tracking objects in frames t-1 through t-k according to the association features;
S7: outputting the tracking result;
in step S6, in response to determining that a detection object matches a tracking object based on the association features, the detection object is assigned the same identity identifier as that of the tracking object.
As a further improvement of the present disclosure, step S5 includes:
S51: extracting pixel features and spatial features of the tracking objects and the detection objects;
S52: obtaining the attribute features of the tracking objects and the detection objects from their pixel features;
S53: obtaining the association features between the tracking objects and the detection objects from their pixel features and spatial features;
S54: setting the overall feature of the video according to the dimensions of the attribute features and the association features;
S55: iteratively updating the attribute features, the association features and the overall feature of the tracking objects and the detection objects;
S56: updating the association features between the tracking objects and the detection objects once more based on the updated attribute features, association features and overall feature.
As a further improvement of the present disclosure, step S51 includes:
the pixel features of the tracking objects and the detection objects are computed and extracted by a neural network, and when the neural network is trained, the evaluation criterion for the tracking objects and detection objects satisfies the following formulas:
max ∑ |f(O_i) - f(D_j)|, if id(O_i) ≠ id(D_j), i = 1,2,…,p,
min ∑ |f(O_i) - f(D_j)|, if id(O_i) = id(D_j), j = 1,2,…,q,
where O_i is a tracking object, D_j is a detection object, p and q respectively denote the numbers of tracking objects and detection objects, f(·) denotes the neural network, id(·) denotes the digital identifier, and the extracted pixel feature is denoted L;
the relative distance between a tracking object and a detection object is calculated and recorded as the spatial feature s_ij, where (x_i, y_i, h_i, w_i) denote the abscissa, ordinate, height and width of the top-left vertex of the tracking object, and (x_j, y_j, h_j, w_j) denote the abscissa, ordinate, height and width of the top-left vertex of the detection object.
As a further improvement of the present disclosure, step S52 includes:
the pixel features of the tracking objects and the pixel features of the detection objects are processed by a convolutional neural network to obtain the attribute features A_i of the tracking objects, i = 0,1,2,…,p, and the attribute features B_j of the detection objects, j = 0,1,2,…,q;
step S53 includes:
the cosine similarity between the pixel features of a tracking object and the pixel features of a detection object, together with the spatial feature s_ij between them, is passed through a neural network to obtain the association feature R_ij of the tracking object and the detection object;
step S54 includes: the overall feature V is obtained by adjusting the dimensions according to the attribute features and the association features.
As a further improvement of the present disclosure, step S55 includes:
the association features are updated through a neural network to obtain the updated association features, where f_R denotes the neural network that performs the association-feature update, i = 0,1,2,…,p, j = 0,1,2,…,q;
the attribute features of the detection objects are updated through a neural network to obtain the updated attribute features; of course, this update may also keep cycling in the neural network; here f_B denotes the neural network that performs the attribute-feature update of the detection objects, i = 0,1,2,…,p, j = 0,1,2,…,q;
the overall feature is updated through a neural network to obtain the updated overall feature V_1; of course, this update may also keep cycling in the neural network. First, the attribute features A_i of the tracking objects, the updated attribute features of the detection objects and the association features between the two are aggregated, where E(·) denotes an averaging aggregation function, i = 0,1,2,…,p, j = 0,1,2,…,q; the overall feature is then updated, where f_V denotes the neural network that performs the overall-feature update.
As a further improvement of the present disclosure, step S56 includes:
the association features are updated for the last time through the neural network to obtain the final updated association features, where α denotes the number of attribute-feature updates, β denotes the number of association-feature updates, γ denotes the number of overall-feature updates, and i = 0,1,2,…,p, j = 0,1,2,…,q.
As a further improvement of the present disclosure, step S6 includes:
S61: matching the detection objects in the t-th frame image with the tracking objects in the t-1-th frame image;
S62: matching the remaining detection objects with the tracking objects that were not successfully matched in the images of frames t-2 through t-k, where t > k > 2;
S63: assigning the detection object the same identity identifier and digital identifier as the matched tracking object;
S64: creating a new identity identifier and assigning it to the detection object;
in step S61, in response to determining that a detection object in the t-th frame image matches a tracking object in the t-1-th frame image, the method proceeds to step S63 and ends;
in response to determining that a detection object in the t-th frame image does not match any tracking object in the t-1-th frame image, the method proceeds to step S62;
in step S62, in response to determining that a detection object in the t-th frame image is successfully matched with a tracking object in the images of frames t-2 through t-k, the method proceeds to step S63 and ends; in response to determining that no match is found, the method proceeds to step S64 and ends.
As a further improvement of the present disclosure, step S3 includes:
S31: extracting the processed image of the t-th frame, and obtaining shallow texture features and deep semantic features of the processed image through a neural network;
S32: fusing the shallow texture features and the deep semantic features to obtain fused feature maps;
S33: processing the fused feature maps to obtain the bounding box of each animal in the original image and the confidence information of whether an animal exists.
In view of the above, the present disclosure provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as described above when executing the program.
In view of the above, the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the above-described method.
From the above, the target tracking method based on surveillance videos of a plurality of animals and the related devices provided by the present disclosure process the surveillance images through deep learning, detect the animals present in the images and track their movement trajectories, thereby assisting managers in locating the farmed animals and tracking their whereabouts. The method can analyze the monitored images and extract their features, so as to detect and identify the animals in the images, and can judge, from the features extracted from adjacent frames, whether the animals detected in those frames belong to the same individual. For managers, having a computer identify the animals in the surveillance video reduces the energy and time spent recognizing animals in the images, makes it easier to determine the number of animals in the farm, and alleviates other potential problems; tracking the animals' movement trajectories helps managers confirm the direction in which a given animal moved and can provide help with problems such as missing animals. For the farm, the level of information-based management can be improved, labor costs can be reduced, and management efficiency can be increased.
Drawings
In order to more clearly illustrate the technical solutions in the present disclosure or in the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. It is apparent that the drawings in the following description show only embodiments of the present disclosure, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method of target tracking based on surveillance videos of a plurality of animals according to the present disclosure;
fig. 2 is a schematic flow chart illustrating the preprocessing of the original image in the target tracking method based on the surveillance videos of multiple animals according to the present disclosure;
FIG. 3 is a schematic view illustrating a process of detecting an animal in a processed image in the target tracking method based on surveillance videos of a plurality of animals according to the present disclosure;
fig. 4 is a schematic flowchart of a feature extraction method for a detected object in the target tracking method based on surveillance videos of multiple animals according to the present disclosure;
fig. 5 is a schematic flowchart illustrating comparison between information of a detection object and information of a tracking object in the target tracking method based on surveillance videos of multiple animals according to the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device in the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present disclosure should have a general meaning as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in the embodiments of the disclosure is not intended to indicate any order, quantity, or importance, but rather to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be noted that the following embodiments are only particular examples of the method according to the invention, not all of them; all other embodiments obtained on the basis of these embodiments without inventive work are within the scope of protection of the invention.
As shown in fig. 1 to 5, the present disclosure provides a target tracking method based on surveillance videos of a plurality of animals, and specifically, the tracking method includes the following steps:
s1: monitoring the farm in real time to obtain an original image;
The raw monitoring data is in video format, and images can be obtained from it by extracting frames.
S2: preprocessing an original image to obtain a processed image;
Specifically, this step preprocesses the original image with an image-processing method in order to address uneven illumination and weak contrast.
In an animal house, sunlight shines in through the windows: animals in the sunlight appear relatively bright while animals in the shadows appear relatively dim, so the illumination and brightness of different animals differ greatly and their contrast with the environment varies widely, which is unfavorable for classifying the animals in the image and locating their positions. The image therefore needs to be preprocessed so that the illumination and contrast of animals in sunlight and animals in shadow become as consistent as possible.
In this embodiment, this step preprocesses the original image by histogram equalization to balance the uneven illumination and weak contrast. As shown in fig. 2, the step specifically includes:
s21: sequentially extracting channel images in an original image, wherein the channel images comprise an r channel image, a g channel image and a b channel image;
s22: histogram equalization is carried out on the channel image, and the specific method comprises the following steps:
first, the initial histogram of the current channel image is obtained, which can be expressed as:
p(r_k) = n_k / (M × N), k = 0,1,…,L-1,
where M × N is the size of the image, r_k is a pixel value with value range [0, L-1], and n_k is the number of pixels in the image whose value is r_k;
for the animal monitoring image of the animal house, the pixel values r_k of animals in sunlight are relatively large and the corresponding n_k are large; for animals in shadow, the pixel values r_k are small and the corresponding n_k are also relatively large; for pixels with values between the two, the corresponding n_k are smaller. The purpose of histogram equalization is to make the n_k corresponding to all pixel values roughly similar in size, so that the contrast and illumination of the image tend to be uniform;
secondly, the transformed histogram is computed from the initial histogram, which can be expressed as:
s_k = (L-1) × ∑_{j=0}^{k} n_j / (M × N),
where s_k denotes the pixel value obtained after transforming a pixel whose value is r_k.
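As an illustration of the per-channel histogram equalization described above, the following is a minimal sketch; it relies on OpenCV's built-in equalizer, and the function name preprocess_frame is illustrative rather than part of the disclosure.

```python
import cv2
import numpy as np

def preprocess_frame(frame_bgr: np.ndarray) -> np.ndarray:
    """Equalize each colour channel of a surveillance frame independently.

    frame_bgr: H x W x 3 uint8 image (OpenCV channel order b, g, r).
    Each channel is histogram-equalized so that animals in sunlight and
    animals in shadow end up with more similar brightness and contrast.
    """
    b, g, r = cv2.split(frame_bgr)                  # extract the channel images in turn
    return cv2.merge([cv2.equalizeHist(c) for c in (b, g, r)])
```

For 8-bit single-channel images, cv2.equalizeHist applies exactly the histogram-equalization mapping given by the two formulas above.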
S3: and extracting a t frame processing image, and detecting animals in the t frame processing image, wherein t is greater than 0.
This step detects animals in the processed image by means of a neural network, to facilitate the subsequent target tracking. The neural network in the present disclosure may be a commonly used neural network model, such as VGG, ResNet, RepVGG, MLP or ResNeXt, or any other neural network with a similar structure known to those skilled in the art.
In this embodiment, as shown in fig. 3, the present step specifically includes:
s31: extracting the processed image of the t frame, and acquiring shallow texture features and deep semantic features in the processed image through a neural network.
For the animals to be detected, even if the individuals do not differ greatly in size, the shooting angle of the camera makes animals near the camera appear large and animals far away appear small, so the sizes of individuals shown in the monitored image differ considerably; image features therefore need to be extracted at multiple scales, and different neural network layers are used to perceive and extract them. Features such as coat color and body shape help distinguish animals of different species and can be obtained from the deep semantic features extracted by the neural network; distinguishing different individuals of the same species, however, requires shallow texture features of the individual, such as the position of the animal, the skin color of the face and the thickness of the hair. This information is available when the number of neural network layers is small and is lost as the network deepens.
Therefore, the step S31 specifically includes:
firstly, a down-sampling module is constructed from one down-sampling convolution kernel with stride s and an activation layer, together with n convolution kernels with activation layers that keep the scale and the number of channels unchanged;
secondly, the processed image of the t-th frame is passed sequentially through the n + 1 down-sampling modules to obtain feature maps of different sizes;
finally, m feature maps of different sizes are selected and denoted C_1, C_2, …, C_m.
In this way, the feature maps produced by a small number of down-sampling modules contain rich texture features and retain information about the individual animals to be detected; the feature maps produced after many down-sampling modules contain rich semantic information; and the feature maps obtained in between retain both texture features and semantic features of the animals. The size of the feature map changes each time it passes through a down-sampling module, so the feature maps have different dimensions: the size of C_i is s times that of C_{i+1}, and different feature maps carry different shallow texture features and deep semantic features.
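One possible reading of the down-sampling module described above (one stride-s convolution with an activation, followed by n stride-1 convolutions that keep scale and channel count) is sketched below in PyTorch; the channel widths and the choice of ReLU are assumptions, not values fixed by the disclosure.

```python
import torch
import torch.nn as nn

class DownSampleModule(nn.Module):
    """One stride-s down-sampling conv plus n stride-1 convs, each followed by an activation."""
    def __init__(self, in_ch, out_ch, s=2, n=2):
        super().__init__()
        layers = [nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=s, padding=1), nn.ReLU(inplace=True)]
        for _ in range(n):
            layers += [nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1), nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)

class Backbone(nn.Module):
    """Stack of down-sampling modules; returns the multi-scale maps C_1..C_m used for fusion."""
    def __init__(self, channels=(3, 32, 64, 128, 256), s=2, n=2):
        super().__init__()
        self.stages = nn.ModuleList(
            [DownSampleModule(channels[i], channels[i + 1], s, n) for i in range(len(channels) - 1)]
        )

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)          # spatial size shrinks by a factor of s per stage
            feats.append(x)       # C_i for each scale
        return feats
```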
S32: and fusing the shallow texture features and the deep semantic features to obtain a fused feature map.
Since both the texture features of the shallow layer and the semantic features of the higher layer are necessary for detection, it is necessary to fuse these two features. For different feature maps extracted by the neural network, the two features need to be fused, so the neural network needs to perform two fusion schemes of top-down and bottom-up.
Therefore, the step S32 specifically includes:
firstly, feature fusion is carried out from top to bottom so as to propagate the deep semantic features to the feature map of each scale:
P_i = Conv1x1(C_i) + UpSample(C_{i+1}), i = 1,2,…,m-1,
where UpSample denotes an up-sampling module with stride s, which may take the form of interpolation or deconvolution, and Conv1x1 denotes a convolution kernel with stride 1 and kernel size 1;
then, feature fusion is carried out from bottom to top so as to propagate the shallow texture features to the feature map of each scale:
N_{i+1} = Conv1x1(P_{i+1} + Conv3x3(P_i)), i = 1,2,…,m-1,
where Conv3x3 denotes a convolution kernel with stride s and kernel size 3, and Conv1x1 denotes a convolution kernel with stride 1 and kernel size 1.
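The two fusion passes can be sketched as follows. This is a schematic rendering of the formulas P_i = Conv1x1(C_i) + UpSample(C_{i+1}) and N_{i+1} = Conv1x1(P_{i+1} + Conv3x3(P_i)); the 1x1 lateral convolutions that map every level to a common channel count, the nearest-neighbour upsampling, and the handling of the deepest and shallowest levels are assumptions made so that the additions are well-defined.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    """Top-down then bottom-up fusion over the backbone maps C_1..C_m (shallow to deep)."""
    def __init__(self, in_channels, out_ch=128, s=2):
        super().__init__()
        # 1x1 lateral convs mapping every C_i to a common channel count (an assumption)
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_ch, kernel_size=1) for c in in_channels])
        self.fuse = nn.ModuleList([nn.Conv2d(out_ch, out_ch, kernel_size=1) for _ in in_channels])
        self.down = nn.ModuleList([nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=s, padding=1)
                                   for _ in in_channels])
        self.s = s

    def forward(self, C):
        m = len(C)
        L = [lat(c) for lat, c in zip(self.lateral, C)]
        # top-down: P_i = Conv1x1(C_i) + UpSample(C_{i+1}); the deepest level has no deeper
        # neighbour, so P_m is taken as Conv1x1(C_m) here (an assumption)
        P = [None] * m
        P[m - 1] = L[m - 1]
        for i in range(m - 2, -1, -1):
            P[i] = L[i] + F.interpolate(L[i + 1], scale_factor=self.s, mode="nearest")
        # bottom-up: N_{i+1} = Conv1x1(P_{i+1} + Conv3x3(P_i)); N_1 is taken as P_1 (an assumption)
        N = [None] * m
        N[0] = P[0]
        for i in range(m - 1):
            N[i + 1] = self.fuse[i + 1](P[i + 1] + self.down[i](P[i]))
        return N
```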
S33: and processing the fusion feature map to acquire a boundary frame of the animal in the original image and confidence information of whether the animal exists or not.
The purpose of this step is to calculate, from the fused feature maps obtained in step S32, the confidence that an animal exists, together with the class and bounding box of the animal. There are m fused feature maps, denoted N_1, N_2, …, N_m; each fused feature map is passed through a convolution kernel to obtain a prediction map, and the animal information present in the original image is then calculated from the information of the prediction maps.
Specifically, the step S33 includes:
calculating the confidence of possible objects, their bounding boxes and their classes;
sorting all bounding boxes with the same predicted class according to confidence;
taking the bounding box with the maximum confidence as a prediction result;
calculating the intersection ratio between each remaining bounding box and the prediction result; if the intersection ratio is greater than a threshold, discarding that bounding box, otherwise keeping it;
repeating the above steps until the intersection ratio between every two bounding boxes is smaller than the threshold.
In this example, there are P classes of animals in total. Suppose a fused feature map N_j (j = 1,2,…,m) has a down-sampling ratio of S and dimensions M × N × C, denoting its width, height and number of channels; the prediction map obtained after a convolution kernel has dimensions M × N × C', where C' = num_anchor × (5 + P) and num_anchor denotes the number of prior anchors. The values on the channels represent the confidence conf that an animal exists at this anchor, the offsets offset_x, offset_y of the center of the animal's bounding box relative to the point on the feature map, the offsets offset_h, offset_w of the height and width of the bounding box, and the probabilities p_i of belonging to each of the P classes of animals.
The position information of a detection target on the original image can be calculated by the following formulas:
x_center = S × (i + offset_x), i = 0,1,2,…,M-1,
y_center = S × (j + offset_y), j = 0,1,2,…,N-1,
and the width and height of the bounding box are obtained from Anchor_k.w and Anchor_k.h together with offset_w and offset_h, where Anchor_k.w denotes the width of the k-th anchor and Anchor_k.h denotes the height of the k-th anchor.
If the confidence of an object is less than the confidence threshold conf_threshold, the prediction is discarded; the confidence is updated by the following formula:
conf = conf × max(p_i).
Then, for all bounding boxes with the same predicted class, the boxes are sorted by confidence and the box with the maximum confidence is kept as one of the prediction results; for each remaining box, the intersection ratio with this prediction result is calculated. If the intersection ratio between a bounding box and the prediction result is greater than the threshold IOU_THRES, the two boxes are considered to belong to the same animal individual and the box is discarded; otherwise the box is kept. This process is repeated until the intersection ratio between every two bounding boxes of the same predicted class is smaller than the threshold IOU_THRES; the bounding boxes that remain at the end are all the final prediction results. For each prediction result, the predicted boundary information and class information determine the position of the animal in the image and its class.
Each fused feature map is passed through a convolution kernel to obtain prediction information, which comprises, for each pixel location, the confidence that an animal exists, the center coordinates, the height and width, and the probability of each class; the bounding box on the original image is then calculated from this information. Whether an animal exists is judged from the confidence; if an animal exists, its class is judged from the maximum class probability; finally, whether two detections of the same class belong to the same individual is judged from the intersection ratio of their bounding boxes, and if so, the bounding box with the lower probability is discarded. The animal detection results are obtained through these steps.
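For illustration, a hedged sketch of this post-processing is given below: the objectness confidence is multiplied by the best class probability, low-confidence predictions are dropped with conf_threshold, and class-wise suppression uses the intersection ratio with threshold IOU_THRES. The boxes are assumed to have already been decoded to corner coordinates using the centre and size formulas above, and the dictionary layout is illustrative.

```python
import numpy as np

def iou(a, b):
    """Intersection ratio (IoU) of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def postprocess(preds, conf_threshold=0.3, iou_thres=0.5):
    """preds: list of dicts {"conf", "class_probs", "box"} already decoded from the prediction maps.
    Returns the surviving detections after confidence gating and per-class suppression."""
    cands = []
    for p in preds:
        conf = p["conf"] * np.max(p["class_probs"])          # conf = conf x max(p_i)
        if conf >= conf_threshold:                            # discard low-confidence predictions
            cands.append({"conf": conf, "cls": int(np.argmax(p["class_probs"])), "box": p["box"]})
    kept = []
    for cls in {c["cls"] for c in cands}:                     # per-class greedy suppression
        boxes = sorted([c for c in cands if c["cls"] == cls], key=lambda c: c["conf"], reverse=True)
        while boxes:
            best = boxes.pop(0)                               # highest-confidence box becomes a result
            kept.append(best)
            boxes = [b for b in boxes if iou(best["box"], b["box"]) <= iou_thres]
    return kept
```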
After the above steps, the processed image of the t-th frame either contains detected animals or does not. If no animal is detected in the processed image of the t-th frame, that frame is discarded, the t+1-th frame is extracted, and the method returns to step S2 for image processing;
if it is determined that at least one animal exists in the t-th processed image, the process proceeds to step S4.
S4: recording each detected animal as a detection object, and recording the animals that have been assigned identity identifiers and digital identifiers in frames t-1 through t-k as tracking objects. In the following steps the detection objects are compared with the tracking objects, and identity identifiers and digital identifiers are assigned to the detection objects wherever possible so as to track the animals.
S5: and extracting attribute features of the tracking object and the detection object and association features between the tracking object and the detection object. As shown in fig. 4, the S5 step includes:
s51: and extracting pixel characteristics and spatial characteristics of the tracking object and the detection object.
The goal of this step is to preliminarily extract the features of the tracking objects and the detection objects by a convolutional-neural-network-based method, as input for the subsequent steps.
The pixel feature refers to the pixel-value information of the cropped image patches of the tracking objects O_i, i = 1,2,…,p, of the previous frame and of the detection objects D_j, j = 1,2,…,q, of the current frame, where p and q denote the numbers of tracking objects and detection objects, respectively.
Because individual animals of the same species need to be distinguished, the neural network should enlarge as much as possible the pixel-feature difference between tracking objects and detection objects with different digital identifiers, and reduce the pixel-feature difference between those with the same digital identifier.
Therefore, step S51 specifically includes: the pixel features of the tracking objects and the detection objects are computed and extracted by the neural network, and when the neural network is trained, the evaluation criterion for the tracking objects and detection objects satisfies the following formulas:
max ∑ |f(O_i) - f(D_j)|, if id(O_i) ≠ id(D_j),
min ∑ |f(O_i) - f(D_j)|, if id(O_i) = id(D_j),
where O_i is a tracking object, i = 1,2,…,p, D_j is a detection object, j = 1,2,…,q, p and q respectively denote the numbers of tracking objects and detection objects, f(·) denotes the neural network, id(·) denotes the digital identifier, and the extracted pixel feature is denoted L.
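One way to realise this training criterion as a loss over a batch of tracking and detection crops is sketched below; the batched L1 form and the absence of a margin follow the two formulas literally, while the embedding network f and all tensor layouts are assumptions (in practice a margin would normally be added to bound the push term).

```python
import torch

def identity_embedding_loss(f_tracks, f_dets, track_ids, det_ids):
    """f_tracks: (p, d) embeddings f(O_i); f_dets: (q, d) embeddings f(D_j).
    track_ids, det_ids: integer identity labels id(O_i), id(D_j) as 1-D tensors.
    Minimising this loss decreases |f(O_i) - f(D_j)| for same-identity pairs and
    increases it for different-identity pairs."""
    diff = (f_tracks[:, None, :] - f_dets[None, :, :]).abs().sum(dim=-1)   # (p, q) pairwise L1 distances
    same = (track_ids[:, None] == det_ids[None, :]).float()                # 1 where id(O_i) == id(D_j)
    pull = (diff * same).sum()              # min term over same-id pairs
    push = (diff * (1.0 - same)).sum()      # max term over different-id pairs
    return pull - push
```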
The spatial feature refers to the position information, in the image, of a tracking object of the previous frame and a detection object of the current frame. In the video, the time interval between two adjacent frames is small and the animals do not move fast, so the positions of the same individual in the two frames are not far apart. Therefore, if the relative distance between a tracking object O_i and a detection object D_j in the original image is greater than a threshold distance_threshold, the spatial information between them can be considered invalid and need not be extracted. For animals in a farm, differences in appearance are not very large, so the spatial information has a relatively larger effect in identifying whether the animals in the two frames are the same individual.
Therefore, the step S51 specifically further includes:
the relative distance between a tracking object and a detection object is calculated and recorded as the spatial feature s_ij, where (x_i, y_i, h_i, w_i) denote the abscissa, ordinate, height and width of the top-left vertex of the tracking object, and (x_j, y_j, h_j, w_j) denote the abscissa, ordinate, height and width of the top-left vertex of the detection object.
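Since the distance formula itself appears only as an image in the original publication, the sketch below uses a simple normalized centre distance as a stand-in for s_ij and applies the distance_threshold gating described above; the normalization is an assumption.

```python
import math

def spatial_feature(box_i, box_j, distance_threshold=200.0):
    """box = (x, y, h, w): top-left corner, height and width, as in the text.
    Returns the spatial feature s_ij, or None when the two boxes are so far apart
    that their spatial information is treated as invalid."""
    xi, yi, hi, wi = box_i
    xj, yj, hj, wj = box_j
    dx = (xi + wi / 2.0) - (xj + wj / 2.0)
    dy = (yi + hi / 2.0) - (yj + hj / 2.0)
    dist = math.hypot(dx, dy)                       # centre-to-centre distance in pixels
    if dist > distance_threshold:
        return None                                 # too far apart: spatial information invalid
    # normalize by the mean box diagonal (an assumed stand-in for the original formula)
    diag = 0.5 * (math.hypot(hi, wi) + math.hypot(hj, wj))
    return dist / (diag + 1e-9)
```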
S52: and acquiring the attribute characteristics of the tracking object and the detection object according to the pixel characteristics of the tracking object and the detection object.
The objective of this step is to further process the pixel characteristics preliminarily extracted in the above step to extract the attribute characteristics of the tracking object of the previous frame and the detection object of the current frame.
The attribute feature represents the features of the tracking object of the previous frame and the detection object of the current frame, namely, the attribute feature only depends on the information of the image of the current frame and is unrelated to the information of the images of other frames. Therefore, the attribute feature can be obtained by processing the pixel feature.
Specifically, step S52 includes:
the pixel features of the tracking objects and the pixel features of the detection objects are processed by a convolutional neural network to obtain the attribute features A_i of the tracking objects, i = 0,1,2,…,p, and the attribute features B_j of the detection objects, j = 0,1,2,…,q.
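A minimal sketch of such an attribute-feature extractor is shown below: a small convolutional network applied to the cropped pixel features of each object, producing A_i for tracking objects and B_j for detection objects; the architecture and feature dimension are assumptions.

```python
import torch
import torch.nn as nn

class AttributeNet(nn.Module):
    """Maps a cropped, resized object patch to a fixed-length attribute feature."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, crops):                # crops: (n, 3, H, W) pixel features of n objects
        x = self.conv(crops).flatten(1)      # (n, 64)
        return self.fc(x)                    # attribute features A_i or B_j, one row per object
```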
S53: and acquiring the correlation characteristics between the tracking object and the detection object according to the pixel characteristics and the space characteristics of the tracking object and the detection object.
The objective of this step is to further process the pixel features and spatial features preliminarily extracted in the above steps to extract the correlation features between the tracked object of the previous frame and the detected object of the current frame.
The association feature refers to the internal relation between a tracking object of the previous frame and a detection object of the current frame, and it comprehensively considers their pixel features and spatial features. However, an association feature does not exist between every tracking object and every detection object: a tracking object and a detection object whose relative distance in the image is large cannot be the same individual, that is, their spatial information is not extracted, so they can be considered unrelated and the association feature between them need not be extracted.
Specifically, step S53 includes:
the cosine similarity between the pixel features of a tracking object and the pixel features of a detection object, together with the spatial feature s_ij between them, is passed through a neural network to obtain the association feature R_ij of the tracking object and the detection object.
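A hedged sketch of this step is given below, with cosine similarity of the pixel features and a small multilayer perceptron standing in for the neural network that produces R_ij; the network shape is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AssociationNet(nn.Module):
    """Maps (cosine similarity, spatial feature s_ij) to an association feature R_ij."""
    def __init__(self, out_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, 64), nn.ReLU(inplace=True), nn.Linear(64, out_dim))

    def forward(self, pix_track, pix_det, s):
        # pix_track: (p, d) pixel features L of the tracking objects
        # pix_det:   (q, d) pixel features L of the detection objects
        # s:         (p, q) spatial features s_ij (pairs with invalid spatial info should be masked out)
        cos = F.cosine_similarity(pix_track[:, None, :], pix_det[None, :, :], dim=-1)   # (p, q)
        x = torch.stack([cos, s], dim=-1)                                               # (p, q, 2)
        return self.mlp(x)                                                              # (p, q, out_dim) = R_ij
```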
S54: and extracting the overall characteristics according to the dimension of the attribute characteristics and the associated characteristics.
The step S54 specifically includes: and obtaining the overall characteristic V according to the dimension adjustment of the attribute characteristic and the associated characteristic.
Because the attribute features and association features obtained above each depend on the information of one or two particular frames, they are local information, and local information alone cannot reflect the characteristics of the whole video well. Therefore, an overall feature that is independent of any single frame is introduced to represent the characteristics of the whole video. The dimension of the overall feature is K × Q and can be adjusted according to the dimensions of the attribute features and the association features; the overall feature is randomly initialized to values within a certain range and is denoted V.
S55: and performing iterative updating on the attribute characteristics, the associated characteristics and the overall characteristics of the tracked object and the detected object.
The attribute features, association features and overall feature in step S54 have not yet undergone any information interaction, so their expressive power and reliability are insufficient, and the information between the two frames of images needs to be fused and exchanged. Specifically, the attribute features A_i of the tracking objects, the attribute features B_j of the detection objects, the association features R_ij between them, and the overall feature V are updated iteratively.
Specifically, step S55 includes:
updating the association features through a neural network to obtain the updated association features, where f_R denotes the neural network that performs the association-feature update, i = 0,1,2,…,p, j = 0,1,2,…,q.
Firstly, the association features are updated through the neural network. Since the association features represent the internal connection between the tracking objects and the detection objects, the attribute features of the tracking objects and detection objects and the initial association features are all required for this update, and the overall feature also participates in the update so as to represent the information of the whole video.
The attribute features of the detection objects are updated through the neural network to obtain the updated attribute features; of course, multiple updates may be performed in the neural network; here f_B denotes the neural network that performs the attribute-feature update of the detection objects, i = 0,1,2,…,p, j = 0,1,2,…,q.
Secondly, the attribute features of the detection objects are updated through the neural network. This update targets only the attribute features of the detection objects: the tracking objects have already been assigned digital identifiers while the detection objects have not, and the digital identifiers of the detection objects depend on the tracking objects. Therefore the attribute features of the detection objects need to be updated, while the attribute features of the tracking objects can be regarded as reference values and need not be updated.
The overall feature is updated through a neural network to obtain the updated overall feature V_1; of course, multiple updates may be performed in the neural network. First, the attribute features A_i of the tracking objects, the updated attribute features of the detection objects and the association features between the two are aggregated, where E(·) denotes an averaging aggregation function, i = 0,1,2,…,p, j = 0,1,2,…,q; the overall feature is then updated, where f_V denotes the neural network that performs the overall-feature update.
The overall feature needs to reflect the characteristics of the whole video, so when training between two frames, all the information between the two frames must be used, namely the attribute features of the tracking objects, the attribute features of the detection objects and the association features between them; meanwhile, the original overall feature also participates in the iterative update.
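Because the update formulas appear only as images in the original publication, the sketch below shows one plausible wiring of a single S55-style round: f_R, f_B and f_V are small networks, the averaging aggregation E(·) is a mean, and the way their inputs are concatenated is an assumption.

```python
import torch
import torch.nn as nn

class IterativeUpdater(nn.Module):
    """One round of the S55-style update: association features, then detection attribute
    features, then the overall feature, each refreshed by its own small network."""
    def __init__(self, d_attr, d_rel, d_global):
        super().__init__()
        self.f_R = nn.Linear(2 * d_attr + d_rel + d_global, d_rel)      # association-feature update
        self.f_B = nn.Linear(d_attr + d_rel, d_attr)                    # detection attribute update
        self.f_V = nn.Linear(2 * d_attr + d_rel + d_global, d_global)   # overall-feature update

    def forward(self, A, B, R, V):
        # A: (p, d_attr), B: (q, d_attr), R: (p, q, d_rel), V: (d_global,)
        p, q = R.shape[:2]
        Vpq = V.expand(p, q, -1)
        R1 = self.f_R(torch.cat([A[:, None].expand(p, q, -1),
                                 B[None].expand(p, q, -1), R, Vpq], dim=-1))
        B1 = self.f_B(torch.cat([B, R1.mean(dim=0)], dim=-1))           # each detection sees its relations
        # averaging aggregation E(.) over objects and pairs, then the overall-feature update
        agg = torch.cat([A.mean(dim=0), B1.mean(dim=0), R1.mean(dim=(0, 1)), V], dim=-1)
        V1 = self.f_V(agg)
        return R1, B1, V1
```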
Moreover, since judging whether a tracking object and a detection object in two frames belong to the same individual depends mainly on the internal relation between them, the association features need to be updated a second time using the once-updated attribute features of the detection objects, the once-updated association features and the once-updated overall feature. Therefore, step S5 further includes:
S56: the association features are updated for the last time through the neural network to obtain the final updated association features, where α denotes the number of attribute-feature updates, β denotes the number of association-feature updates, γ denotes the number of overall-feature updates, and i = 0,1,2,…,p, j = 0,1,2,…,q.
Through the above-described steps S1 to S5, several features of the tracking object and the detection object have been obtained, so that the detection object can be matched by these features.
S6: matching the detection objects in the t-th frame image with the tracking objects in frames t-1 through t-k according to the association features, where t > k > 2.
S7: and outputting a tracking result.
In step S6, if the matching is successful, the detection object is assigned the same identity identifier and digital identifier as the corresponding tracking object; if the matching is unsuccessful, a new identity identifier is created and assigned to the detection object.
Since the tracking objects of the previous frames already have assigned digital identifiers while the detection objects of the current frame do not, the association features obtained in step S53, which reflect the inherent link between the tracking objects and the detection objects, can be used as the basis for assigning digital identifiers. Matching is therefore performed by an object-matching algorithm, and a digital identifier is assigned to each detection object in combination with the digital identifiers of the tracking objects: if a tracking object O_i matches a detection object D_j, the detection object is assigned the same digital identifier as the tracking object.
If the similarity between the attribute features of a detection object and a tracking object is greater than a threshold, the two are considered to be the same individual and are given the same digital identifier; otherwise, the tracking object is considered to have disappeared.
Specifically, as shown in fig. 5, the step S6 includes:
S61: matching the detection objects in the t-th frame image with the tracking objects in the t-1-th frame image;
S62: matching the remaining detection objects with the tracking objects that were not successfully matched in the images of frames t-2 through t-k, where t > k > 2;
S63: assigning the detection object the same identity identifier and digital identifier as the matched tracking object;
S64: creating a new identity identifier and assigning it to the detection object;
in step S61, if a detection object in the t-th frame image is successfully matched with a tracking object in the t-1-th frame image, the method proceeds to step S63 and ends;
if a detection object in the t-th frame image is not successfully matched with any tracking object in the t-1-th frame image, the method proceeds to step S62;
in step S62, if a detection object in the t-th frame image is successfully matched with a tracking object in the images of frames t-2 through t-k, the method proceeds to step S63 and ends; if the matching is unsuccessful, the method proceeds to step S64 and ends.
If no tracking object of the previous frame matches a detection object of the current frame, that detection object is matched against the tracking objects in frames t-1 through t-k. If the matching succeeds, the detection object is given the same digital identifier as the matched tracking object; if it does not succeed, the detection object is a new animal and a new identity identifier is created for it, as sketched below.
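The cascade just described can be sketched as follows; the per-frame assignment routine match_frame is assumed (one possibility is the Hungarian-based routine sketched a little further below), as are the identity attribute and the bookkeeping names.

```python
def track_step(detections, track_history, next_id, match_frame):
    """detections: detection objects of frame t (no identity yet).
    track_history: per-frame track lists [tracks(t-1), tracks(t-2), ..., tracks(t-k)],
                   each track carrying an .identity attribute.
    match_frame(dets, tracks) -> list of (det_index, track_index) pairs.
    Returns the detections with identities assigned and the next free identity number."""
    unmatched = list(range(len(detections)))
    for tracks in track_history:                        # frame t-1 first, then t-2 ... t-k
        if not unmatched or not tracks:
            continue
        pairs = match_frame([detections[i] for i in unmatched], tracks)
        for local_di, ti in pairs:
            di = unmatched[local_di]
            detections[di].identity = tracks[ti].identity   # same identity and digital id as the track
        matched_local = {local_di for local_di, _ in pairs}
        unmatched = [di for idx, di in enumerate(unmatched) if idx not in matched_local]
    for di in unmatched:                                 # no match in any of the k frames: new animal
        detections[di].identity = next_id
        next_id += 1
    return detections, next_id
```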
In step S6, if there is a tracked object that does not match any of the current detected objects, the attribute feature of the tracked object is stored to continue matching.
In addition, in this embodiment, the obtained association-feature values are first binarized so that each association feature takes a value in {0, 1}, where 0 indicates that the detection object and the tracking object cannot be the same individual and 1 indicates that they may be the same individual. The problem is thus converted into a maximum matching problem, which can be solved by the Hungarian algorithm; the tracking result is finally obtained and output.
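A minimal sketch of this binarized maximum matching, using SciPy's implementation of the Hungarian algorithm; how the association features R_ij are reduced to a single score per (track, detection) pair is an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(assoc_scores, binarize_threshold=0.5):
    """assoc_scores: (p, q) array of per-pair scores derived from the association features R_ij.
    Pairs are first binarized (1 = may be the same individual, 0 = cannot be), then a
    maximum matching is found with the Hungarian algorithm.
    Returns a list of (track_index, detection_index) pairs."""
    allowed = (assoc_scores >= binarize_threshold).astype(float)   # binarized association features
    rows, cols = linear_sum_assignment(-allowed)                   # maximize the number of allowed pairs
    return [(i, j) for i, j in zip(rows, cols) if allowed[i, j] > 0]
```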
It can be seen from the foregoing embodiments that the target tracking method based on surveillance videos of a plurality of animals of the present disclosure processes the surveillance images through deep learning, detects the animals present in the images and tracks their movement trajectories, thereby assisting managers in locating the farmed animals and tracking their whereabouts. The method can analyze the monitored images and extract their features, so as to detect and identify the animals in the images, and can judge, from the features extracted from adjacent frames, whether the animals detected in those frames belong to the same individual. For managers, having a computer identify the animals in the surveillance video reduces the energy and time spent recognizing animals in the images, makes it easier to determine the number of animals in the farm, and alleviates other potential problems; tracking the animals' movement trajectories helps managers confirm the direction in which a given animal moved and can provide help with problems such as missing animals. For the farm, the level of information-based management can be improved, labor costs can be reduced, and management efficiency can be increased.
It should be noted that the above describes some embodiments of the disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, corresponding to the method of any embodiment, the disclosure also provides an image monitoring device for an animal farm. The image monitoring device can be used for realizing the target tracking method.
Specifically, the image monitoring device includes: a camera module to implement step S1; an image processing module to implement step S2; an animal detection module to implement step S3; a marking module to implement step S4; a feature extraction module to implement step S5; a matching module for implementing step S6; an output module for implementing step S7.
Therefore, the modules in the image monitoring device are sequentially in communication connection and are mutually matched, so that the target tracking method in the disclosure is realized.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the various modules may be implemented in the same one or more software and/or hardware implementations of the present disclosure.
The apparatus of the foregoing embodiment is used to implement the corresponding target tracking method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above-mentioned embodiments, the present disclosure further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the program, the target tracking method described in any of the above embodiments is implemented.
Fig. 6 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device of the foregoing embodiment is used to implement the corresponding target tracking method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above-described embodiment methods, the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the target tracking method according to any of the above embodiments.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The computer instructions stored in the storage medium of the foregoing embodiment are used to enable the computer to execute the target tracking method according to any one of the foregoing embodiments, and have the beneficial effects of the corresponding method embodiment, which are not described herein again.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, technical features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and many other variations of the different aspects of the embodiments of the present disclosure exist as described above, which are not described in detail for the sake of brevity.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description.
The disclosed embodiments are intended to embrace all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements, and the like made within the spirit and principles of the embodiments of the disclosure are intended to be included within the scope of the disclosure.

Claims (10)

1. A target tracking method based on monitoring videos of a plurality of animals is characterized by comprising the following steps:
s1: monitoring a plurality of animals in real time to obtain original images;
s2: preprocessing an original image to obtain a processed image;
s3: extracting a t frame processing image, and detecting animals in the t frame processing image, wherein t is an integer larger than 0;
s4: marking the detected animal as a detection object, and meanwhile, marking the animal which is distributed with the identity identification from the t-1 frame to the t-k frame as a tracking object, wherein t & gt k & gt 2;
s5: extracting attribute features of the tracking object and the detection object and correlation features between the tracking object and the detection object;
s6: matching the detection object in the image of the t frame with the tracking object in the frames from t-1 to t-k according to the correlation characteristics;
s7: outputting a tracking result;
wherein in step S6, in response to determining that the detection object matches the tracking object based on the association features, the detection object is assigned the same identity identifier as that of the tracking object.
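As an illustrative, non-limiting sketch of the per-frame flow of steps S1 to S7 above, the following Python fragment walks one frame through preprocessing, detection, association, and identity assignment; the helper functions (preprocess, detect_animals, association_scores), the dictionary-based track records, and the greedy score threshold are assumptions introduced purely for exposition and are not part of the claimed method.

```python
import numpy as np

def preprocess(frame):
    # placeholder for step S2: scale pixel values to [0, 1]
    return frame.astype(np.float32) / 255.0

def detect_animals(image):
    # placeholder for step S3: a real detector (see claim 8) would return one
    # (x, y, h, w) box per animal; here a fixed box stands in for illustration
    return [np.array([10.0, 20.0, 50.0, 40.0])]

def association_scores(tracks, detections):
    # placeholder for step S5: a real system would use the learned association
    # features; here a simple inverse centre distance stands in
    scores = np.zeros((len(tracks), len(detections)))
    for i, tr in enumerate(tracks):
        for j, det in enumerate(detections):
            scores[i, j] = 1.0 / (1.0 + np.linalg.norm(tr["box"][:2] - det[:2]))
    return scores

def track_frame(raw_frame, tracks, next_id, threshold=0.05):
    """One pass of steps S2-S7 over the t-th frame (step S1 supplies raw_frame)."""
    image = preprocess(raw_frame)
    detections = detect_animals(image)
    scores = association_scores(tracks, detections)
    results = []
    for j, det in enumerate(detections):
        i = int(np.argmax(scores[:, j])) if tracks else -1
        if i >= 0 and scores[i, j] >= threshold:
            identity = tracks[i]["id"]                    # S6: matched detection keeps the ID
        else:
            identity, next_id = next_id, next_id + 1      # unmatched detection gets a new ID
        results.append({"id": identity, "box": det})
    return results, next_id                               # S7: tracking result for this frame
```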
2. The object tracking method according to claim 1, wherein step S5 includes:
s51: extracting pixel characteristics and spatial characteristics of a tracking object and a detection object;
s52: acquiring the attribute characteristics of the tracking object and the detection object according to the pixel characteristics of the tracking object and the detection object;
s53: acquiring the correlation characteristics between the tracking object and the detection object according to the pixel characteristics and the space characteristics of the tracking object and the detection object;
s54: setting the integral characteristics of the video according to the dimensionality of the attribute characteristics and the dimensionality of the associated characteristics;
s55: iteratively updating the attribute characteristics, the associated characteristics and the overall characteristics of the tracked object and the detected object;
s56: and updating the association feature between the tracking object and the detection object again based on the updated attribute feature, the association feature and the overall feature.
3. The object tracking method according to claim 2, wherein step S51 includes:
the pixel features of the tracking object and the detection object are computed and extracted through a neural network, and when the neural network is trained, the evaluation criterion for the tracking object and the detection object satisfies the following formulas:
max Σ |f(O_i) − f(D_j)|, if id(O_i) ≠ id(D_j), i = 1, 2, ..., p,
min Σ |f(O_i) − f(D_j)|, if id(O_i) = id(D_j), j = 1, 2, ..., q,
wherein O_i denotes a tracking object, D_j denotes a detection object, p and q respectively denote the numbers of tracking objects and detection objects, f(·) denotes the neural network, id(·) denotes the digital identifier, and the extracted pixel features are denoted L;
calculating the relative distance between the tracking object and the detection object:
[relative-distance formula, shown only as image FDA0003277304280000021 in the original]
wherein (x_i, y_i, h_i, w_i) denote the abscissa, ordinate, height, and width of the top-left corner vertex of the tracking object, (x_j, y_j, h_j, w_j) denote the abscissa, ordinate, height, and width of the top-left corner vertex of the detection object, and the calculated relative distance is recorded as the spatial feature s_ij.
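The training criterion and the spatial feature in claim 3 can be illustrated by the minimal sketch below; because the relative-distance formula itself appears only as an image in the published text, the size-normalised centre distance used here, and the single scalar loss that folds together the max/min criteria, are assumptions introduced for exposition only.

```python
import numpy as np

def embedding_criterion(track_emb, det_emb, track_ids, det_ids):
    """Sketch of the claim-3 criterion: |f(O_i) - f(D_j)| should be large when the
    identities differ and small when they are equal. Combining both cases into one
    scalar loss (to be minimised) is an assumption made only for illustration."""
    loss = 0.0
    for i, tid in enumerate(track_ids):
        for j, did in enumerate(det_ids):
            dist = float(np.abs(track_emb[i] - det_emb[j]).sum())   # |f(O_i) - f(D_j)|
            loss += dist if tid == did else -dist                   # pull same IDs, push different IDs
    return loss

def spatial_feature(box_i, box_j):
    """Sketch of the spatial feature s_ij between an (x, y, h, w) tracking box and an
    (x, y, h, w) detection box; the exact formula is an image in the original claim,
    so normalising the centre distance by the mean box size is only a guess."""
    (xi, yi, hi, wi), (xj, yj, hj, wj) = box_i, box_j
    scale = (hi + wi + hj + wj) / 4.0
    return float(np.hypot(xi - xj, yi - yj)) / scale
```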
4. The target tracking method of claim 2,
step S52 includes:
the pixel features of the tracking object and the pixel features of the detection object are processed by a convolutional neural network to obtain the attribute features A_i, i = 0, 1, 2, ..., p, of the tracking object and the attribute features B_j, j = 0, 1, 2, ..., q, of the detection object;
Step S53 includes:
the cosine similarity between the pixel features of the tracking object and the pixel features of the detection object [shown as image FDA0003277304280000022 in the original] and the spatial information s_ij between the two are passed through a neural network to obtain the association features R_ij of the tracking object and the detection object;
Step S54 includes: obtaining the overall feature V by adjusting the dimensions of the attribute features and the association features.
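To make the roles of the cosine similarity, the spatial feature s_ij, and the association feature R_ij in claim 4 concrete, a small sketch follows; the two-layer MLP standing in for the unspecified neural network, and its weight shapes, are assumptions for illustration only.

```python
import numpy as np

def cosine_similarity(l_i, l_j):
    # cosine similarity between the pixel feature of a tracking object and that of a detection
    return float(np.dot(l_i, l_j) / (np.linalg.norm(l_i) * np.linalg.norm(l_j) + 1e-8))

def association_feature(l_i, l_j, s_ij, w1, b1, w2, b2):
    """R_ij sketch: the cosine similarity of the pixel features and the spatial feature
    s_ij are fed to a small network; the two-layer ReLU MLP (w1: (h, 2), b1: (h,),
    w2: (d, h), b2: (d,)) is a stand-in for the unspecified network in the claim."""
    x = np.array([cosine_similarity(l_i, l_j), s_ij])
    hidden = np.maximum(w1 @ x + b1, 0.0)     # hidden layer with ReLU
    return w2 @ hidden + b2                   # association feature vector R_ij

# toy usage with random weights
rng = np.random.default_rng(0)
r_ij = association_feature(rng.normal(size=16), rng.normal(size=16), 0.3,
                           rng.normal(size=(8, 2)), np.zeros(8),
                           rng.normal(size=(4, 8)), np.zeros(4))
```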
5. The object tracking method according to claim 2, wherein step S55 includes:
updating the association features through a neural network to obtain the updated association features [symbol shown as image FDA0003277304280000025 in the original]:
[update formula, shown as image FDA0003277304280000024 in the original]
wherein f_R denotes the neural network that completes the association feature update, i = 0, 1, 2, ..., p, j = 0, 1, 2, ..., q;
updating the attribute features of the detection object through a neural network to obtain the updated attribute features [symbol shown as image FDA00032773042800000310 in the original]; of course, multiple updates may be performed in the neural network, wherein:
[update formula, shown as image FDA0003277304280000031 in the original]
wherein f_B denotes the neural network that completes the update of the attribute features of the detection object, i = 0, 1, 2, ..., p, j = 0, 1, 2, ..., q;
updating the overall features through a neural network to obtain the updated overall features V^1; of course, multiple updates may be performed in the neural network, wherein:
first, the attribute features A_i of the tracking object, the updated attribute features of the detection object [symbol shown as image FDA00032773042800000311 in the original], and the association features between the two [symbol shown as image FDA0003277304280000033 in the original] are subjected to an aggregation operation:
[aggregation formulas, shown as images FDA0003277304280000034, FDA0003277304280000035, and FDA0003277304280000036 in the original]
wherein E(·) denotes an averaging aggregation function, i = 0, 1, 2, ..., p, j = 0, 1, 2, ..., q;
the overall features are then updated:
[update formula, shown as image FDA0003277304280000037 in the original]
wherein f_V denotes the neural network that completes the overall feature update.
6. The object tracking method according to claim 2, wherein step S56 includes:
updating the association features for the last time through a neural network to obtain the finally updated association features [symbol shown as image FDA0003277304280000038 in the original]:
[update formula, shown as image FDA0003277304280000039 in the original]
wherein [the network symbol shown as image FDA00032773042800000312 in the original] denotes the neural network that completes the last update of the association features, α is used for updating the attribute features, β is used for updating the association features, γ is used for updating the overall features, and i = 0, 1, 2, ..., p, j = 0, 1, 2, ..., q.
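Claims 5 and 6 describe an iterative, message-passing-style refinement of the attribute, association, and overall features whose exact formulas appear only as images in the published text; the sketch below therefore shows one generic update round under assumed wiring, with f_R, f_B, and f_V supplied by the caller as stand-ins for the claimed networks and mean pooling standing in for the aggregation operation E(·).

```python
import numpy as np

def update_round(A, B, R, V, f_R, f_B, f_V):
    """One assumed update round in the spirit of claims 5-6.

    A: (p, d) tracking-object attribute features, B: (q, d) detection attribute
    features, R: (p, q, r) association features, V: (v,) overall feature.
    f_R, f_B, f_V are caller-supplied update networks; the way their inputs are
    wired together here is an assumption, not the claimed formula."""
    p, q = R.shape[:2]
    # update every association feature from the pair of attribute features it connects
    R_new = np.stack([[f_R(A[i], B[j], R[i, j], V) for j in range(q)] for i in range(p)])
    # update each detection attribute feature from its incoming association features
    B_new = np.stack([f_B(B[j], R_new[:, j].mean(axis=0), V) for j in range(q)])
    # aggregate by averaging (a stand-in for E(.)) and update the overall feature
    V_new = f_V(V, A.mean(axis=0), B_new.mean(axis=0), R_new.mean(axis=(0, 1)))
    return R_new, B_new, V_new

# toy usage: update "networks" that simply average their (equally shaped) inputs
def avg(*xs):
    return sum(xs) / len(xs)

A = np.ones((2, 3)); B = np.ones((4, 3)); R = np.ones((2, 4, 3)); V = np.ones(3)
R1, B1, V1 = update_round(A, B, R, V, avg, avg, avg)
```

In the claimed method the round would be repeated as needed, and a final pass (claim 6) would recompute the association features from the updated attribute and overall features before matching.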
7. The object tracking method according to claim 1, wherein step S6 includes:
s61: matching the detection object in the t frame image with the tracking object in the t-1 frame image;
s62: matching the detected object with a tracking object which is unsuccessfully matched in the images from the t-2 th frame to the t-k frame, wherein t & gt k & gt 2;
s63: the detection object is endowed with the identity identification and the digital identification which are the same as those of the tracking object;
s64: creating a new identity and endowing the detection object with the new identity;
in step S61, in response to determining that a detection object in the t-th frame image matches a tracking object in the (t-1)-th frame image, the method proceeds to step S63 and ends;
in response to determining that a detection object in the t-th frame image is unsuccessfully matched with a tracking object in the (t-1)-th frame image, the method proceeds to step S62;
in step S62, in response to determining that a certain detection object in the t-th frame image is successfully matched with a certain tracking object in the (t-2)-th to (t-k)-th frame images, the method proceeds to step S63 and ends; in response to determining that the match is not successful, the method proceeds to step S64 and ends.
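The cascade in claim 7, matching first against frame t-1 and only then against the older frames t-2 to t-k before creating new identities, can be sketched as follows; the score matrices, the greedy rule, and the threshold are illustrative assumptions rather than the claimed matching procedure.

```python
import numpy as np

def cascade_match(assoc_by_age, ids_by_age, next_id, threshold=0.5):
    """Sketch of steps S61-S64: assoc_by_age[a] is a (tracks_of_age_a, q) score matrix,
    ids_by_age[a] the corresponding identities; age 0 holds the frame t-1 tracks and
    later entries hold frames t-2 .. t-k. The greedy matching rule is an assumption."""
    q = assoc_by_age[0].shape[1]
    det_ids = [None] * q
    for assoc, ids in zip(assoc_by_age, ids_by_age):          # S61, then S62 for older frames
        used = set()
        pairs = sorted(((i, j) for i in range(assoc.shape[0]) for j in range(q)),
                       key=lambda ij: -assoc[ij])
        for i, j in pairs:
            if det_ids[j] is not None or i in used or assoc[i, j] < threshold:
                continue
            det_ids[j] = ids[i]                               # S63: inherit the identity
            used.add(i)
    for j in range(q):                                        # S64: create new identities
        if det_ids[j] is None:
            det_ids[j], next_id = next_id, next_id + 1
    return det_ids, next_id

# toy usage: two detections, one frame t-1 track (id 3) and one older track (id 7)
ids, nxt = cascade_match([np.array([[0.9, 0.1]]), np.array([[0.2, 0.8]])],
                         [[3], [7]], next_id=10)
# ids -> [3, 7]; nxt stays 10 because both detections were matched
```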
8. The object tracking method according to claim 1, wherein step S3 includes:
s31: extracting a t-th frame of processed image, and acquiring shallow texture features and deep semantic features in the processed image through a neural network;
s32: fusing the shallow texture features and the deep semantic features to obtain a fused feature map;
s33: and processing the fusion feature map to acquire a boundary frame of the animal in the original image and confidence information of whether the animal exists or not.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 8 when executing the program.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 8.
CN202111121343.XA 2021-09-24 2021-09-24 Target tracking method based on monitoring videos of multiple animals and related equipment Pending CN114022509A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111121343.XA CN114022509A (en) 2021-09-24 2021-09-24 Target tracking method based on monitoring videos of multiple animals and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111121343.XA CN114022509A (en) 2021-09-24 2021-09-24 Target tracking method based on monitoring videos of multiple animals and related equipment

Publications (1)

Publication Number Publication Date
CN114022509A true CN114022509A (en) 2022-02-08

Family

ID=80054759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111121343.XA Pending CN114022509A (en) 2021-09-24 2021-09-24 Target tracking method based on monitoring videos of multiple animals and related equipment

Country Status (1)

Country Link
CN (1) CN114022509A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665177A (en) * 2023-07-31 2023-08-29 福思(杭州)智能科技有限公司 Data processing method, device, electronic device and storage medium
CN116665177B (en) * 2023-07-31 2023-10-13 福思(杭州)智能科技有限公司 Data processing method, device, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination