CN107808122B - Target tracking method and device - Google Patents

Target tracking method and device

Info

Publication number
CN107808122B
CN107808122B
Authority
CN
China
Prior art keywords
neural network
target
bounding box
convolutional neural
different
Prior art date
Legal status
Active
Application number
CN201710920018.7A
Other languages
Chinese (zh)
Other versions
CN107808122A (en)
Inventor
杨依凡
王宇庆
杨航
Current Assignee
Changchun Institute of Optics Fine Mechanics and Physics of CAS
Original Assignee
Changchun Institute of Optics Fine Mechanics and Physics of CAS
Priority date
Filing date
Publication date
Application filed by Changchun Institute of Optics Fine Mechanics and Physics of CAS
Priority to CN201710920018.7A
Publication of CN107808122A
Application granted
Publication of CN107808122B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a target tracking method and a target tracking device, which combine two convolutional neural networks with a time recursive neural network model and solve the problem of a low detection rate for small targets. Moreover, information associated with the target in the background is extracted for target detection, so that the speed and the accuracy of the target tracking model in video target detection are improved.

Description

Target tracking method and device
Technical Field
The present application relates to the field of target detection technologies, and in particular, to a target tracking method and apparatus.
Background
Target tracking has always been a hot topic in the fields of computer vision and pattern recognition, and has wide applications in video monitoring, man-machine interaction, vehicle navigation and the like. In the process of realizing the present application, the inventors found that current target tracking methods have a poor detection effect on small targets.
Therefore, how to improve the accuracy of the target detection result is an urgent problem to be solved.
Disclosure of Invention
The application aims to provide a target tracking method and a target tracking device so as to improve the accuracy of a target detection result.
In order to achieve the purpose, the application provides the following technical scheme:
a target tracking method for detecting the target of each frame of image in a video stream through a pre-trained target tracking model comprises the following steps:
a first convolution neural network in the target tracking model performs target detection on the image to obtain the position of the detected target in the image and the category of the detected target;
a second convolutional neural network in the target tracking model performs target detection based on the background on the image to obtain information associated with different types of targets in the background;
and the time recursive neural network in the target tracking model associates the detected target with different backgrounds at different moments based on the information associated with the targets of different types in the backgrounds to obtain a target detection result.
In the method, preferably, the process of performing target detection on the image by the first convolutional neural network includes:
dividing the image into n x n meshes;
predicting a plurality of bounding boxes in each grid, and recording the position and the size of each bounding box, and a trust value and a category value corresponding to each bounding box;
calculating, based on the trust value and the category value corresponding to each bounding box, the trust value score of each bounding box for the category to which it belongs;
and deleting the bounding boxes in the grid whose trust value scores for their categories are less than a preset threshold, and performing non-maximum suppression separately on the retained bounding boxes of each different category, to obtain the position and category information of the target.
In the method, preferably, the process of performing target detection on the image by the first convolutional neural network includes:
dividing the image into m × m grids according to L different division granularities, wherein m takes L different values;
predicting a plurality of bounding boxes in each grid corresponding to each division granularity, and recording the position and the size of each bounding box, and a trust value and a category value corresponding to each bounding box;
calculating, based on the trust value and the category value corresponding to each bounding box in the grid, the trust value score of each bounding box for the category to which it belongs;
and deleting the bounding boxes in the grid whose trust value scores for their categories are less than a preset threshold, and performing non-maximum suppression separately on the bounding boxes of each different category retained under the different division granularities, to obtain the position and category information of the target.
Preferably, in the above method, the associating, by the temporal recurrent neural network, the detected target with different backgrounds at different times based on the information associated with the targets of different categories in the backgrounds to obtain the target detection result includes:
the time recursive neural network correlates the detected target with different backgrounds at different times through the pre-learned correlation relationship between the target of the same type and different backgrounds at different times to obtain a target detection result.
In the above method, preferably, the training process of the target tracking model includes:
assigning the weight of the parameter of the convolution layer in the YOLO convolution neural network to the first convolution neural network, and initializing the weights of other parameters of the first convolution neural network by adopting Gaussian random distribution; performing end-to-end training on the first convolutional neural network on a target detection and classification task to obtain a first convolutional neural network model;
assigning the weight of the parameter of the convolution layer in the first convolution neural network to the second convolution neural network, and initializing the weights of other parameters of the second convolution neural network by selecting Gaussian random distribution; performing end-to-end training on the second convolutional neural network on a background-based target type detection task to obtain a second convolutional neural network model;
assigning parameters of the weight of the convolutional layer of the second convolutional neural network model to the convolutional layer of the first convolutional neural network model, training again through the steps, and repeating the steps twice to obtain a final first convolutional neural network model and a final second convolutional neural network model;
training a time recurrent neural network on a task of associating the same type of target with different backgrounds at different moments through a preselected video training set to obtain a time recurrent neural network model; the video training set comprises a first type of video and a second type of video which are equal in quantity, the time lengths of the first type of video and the second type of video are the same, and the variation amplitude of a target in the first type of video is larger than that of a target in the second type of video;
constructing an initial target tracking model: connecting all convolutional layers of a first convolutional neural network model into the time recursive neural network model through a first fully-connected layer, connecting at least one part (for example, all convolutional layers or the first 12 layers) of the convolutional layers of the second convolutional neural network model into the time recursive neural network model through a second fully-connected layer, connecting the output end of the time recursive neural network model with the input ends of the first fully-connected layer and the second fully-connected layer, and the input end of a third fully-connected layer,
and training the initial target tracking model on a preset target detection task to obtain the target tracking model.
Preferably, in the method, the end-to-end training of the first convolutional neural network on the target detection and classification task includes: the first convolutional neural network performs target detection and classification by the following method:
dividing the image into n x n grids;
predicting a plurality of bounding boxes in each grid, and recording the position and the size of each bounding box, and a trust value and a category value corresponding to each bounding box;
calculating, based on the trust value and the category value corresponding to each bounding box, the trust value score of each bounding box for the category to which it belongs;
deleting the bounding boxes in the grids whose trust value scores for their categories are smaller than a preset threshold, and performing non-maximum suppression separately on the bounding boxes of each different category retained over all the grids, to obtain a target detection result;
calculating the error degree of the target detection result of the first convolutional neural network through a preset loss function, wherein the loss function is as follows:

$$\mathrm{Loss}=\lambda_{1}\sum_{i=1}^{S^{2}}\sum_{j=1}^{B}\mathbb{1}_{ij}^{\mathrm{obj}}\Big[(x_{ij}-\hat{x}_{ij})^{2}+(y_{ij}-\hat{y}_{ij})^{2}+(w_{ij}-\hat{w}_{ij})^{2}+(h_{ij}-\hat{h}_{ij})^{2}\Big]+\lambda_{3}\sum_{i=1}^{S^{2}}\sum_{j=1}^{B}\mathbb{1}_{ij}^{\mathrm{obj}}\big(C_{ij}-\hat{C}_{ij}\big)^{2}+\lambda_{2}\sum_{i=1}^{S^{2}}\sum_{j=1}^{B}\mathbb{1}_{ij}^{\mathrm{noobj}}\big(C_{ij}-\hat{C}_{ij}\big)^{2}+\lambda_{3}\sum_{i=1}^{S^{2}}\mathbb{1}_{i}^{\mathrm{obj}}\sum_{c}\big(p_{i}(c)-\hat{p}_{i}(c)\big)^{2}$$

wherein Loss is the error degree of the target detection result of the first convolutional neural network; \(\lambda_{1}\) is the loss weight of the coordinate prediction loss and may take the value 5; \(\lambda_{2}\) is the loss weight of the confidence value loss for bounding boxes that do not contain a target and may take the value 0.5; \(\lambda_{3}\) is the loss weight of the confidence value loss and the category loss for bounding boxes that contain a target and may take the value 1; i is used to distinguish different grids and j is used to distinguish different bounding boxes; \(x_{ij}\), \(y_{ij}\), \(w_{ij}\), \(h_{ij}\) and \(C_{ij}\) denote predicted values and \(\hat{x}_{ij}\), \(\hat{y}_{ij}\), \(\hat{w}_{ij}\), \(\hat{h}_{ij}\) and \(\hat{C}_{ij}\) denote the corresponding calibrated values; \(S^{2}\) denotes the number of divided grids, B denotes the number of bounding boxes in a grid, \(C_{ij}\) denotes the confidence score of the j-th bounding box in the i-th grid, and \(p_{i}(c)\) denotes the probability that a target of category c appears in the i-th grid; \(\mathbb{1}_{ij}^{\mathrm{obj}}\) takes the value 1 if the pre-calibrated bounding box has the same item type as that detected by the j-th bounding box in the i-th grid, and 0 otherwise; \(\mathbb{1}_{ij}^{\mathrm{noobj}}\) takes the value 0 in that case, and 1 otherwise; \(\mathbb{1}_{i}^{\mathrm{obj}}\) takes the value 1 if a target center falls within grid i, and 0 otherwise;
and if the error degree is greater than or equal to a preset threshold, updating the weights using the back-propagation algorithm and the Adam update method, and inputting unused data from the training library for the next round of training, until the difference between the loss degree and the minimum value of the loss function is less than a preset threshold.
An object detection device comprising:
the first detection module is used for carrying out target detection on each frame of image in the video stream through a first convolutional neural network to obtain the position of a detected target in the image and the category of the detected target;
the second detection module is used for carrying out target detection based on the background on the image through a second convolutional neural network to obtain information associated with different types of targets in the background;
and the association module is used for associating the detected targets with different backgrounds at different moments based on the information associated with the targets of different types in the backgrounds to obtain target detection results.
Preferably, the first detection module is specifically configured to divide the image into n × n grids through a first convolutional neural network; predict a plurality of bounding boxes in each grid, and record the position and the size of each bounding box, and a trust value and a category value corresponding to each bounding box; calculate, based on the trust value and the category value corresponding to each bounding box, the trust value score of each bounding box for the category to which it belongs; and delete the bounding boxes in the grid whose trust value scores for their categories are less than a preset threshold, and perform non-maximum suppression separately on the retained bounding boxes of each different category, to obtain the position and category information of the target.
Preferably, in the apparatus, the first detection module is specifically configured to divide the image into m × m grids according to L different division granularities through a first convolutional neural network, wherein m takes L different values; predict a plurality of bounding boxes in each grid corresponding to each division granularity, and record the position and the size of each bounding box, and a trust value and a category value corresponding to each bounding box; calculate, based on the trust value and the category value corresponding to each bounding box in the grid, the trust value score of each bounding box for the category to which it belongs; and delete the bounding boxes in the grid whose trust value scores for their categories are less than a preset threshold, and perform non-maximum suppression separately on the bounding boxes of each different category retained under the different division granularities, to obtain the position and category information of the target.
The above apparatus, preferably, the association module is specifically configured to,
and associating the detected target with different backgrounds at different moments to obtain a target detection result through the association relationship between the target of the same type and different backgrounds at different moments learned in advance.
According to the above scheme, the target tracking method and the target tracking device provided by the present application combine two convolutional neural networks with a time recursive neural network model, and solve the problem of a low detection rate for small targets. Moreover, information associated with the target in the background is extracted for target detection, so that the speed and the accuracy of the target tracking model in video target detection are improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is an exemplary diagram of a target tracking model provided by an embodiment of the present application;
FIG. 2 is a flowchart of an implementation of a target tracking method provided in an embodiment of the present application;
fig. 3 is a flowchart of an implementation of the object detection apparatus according to the embodiment of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be practiced otherwise than as specifically illustrated.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, which is an exemplary diagram of a target tracking model provided in an embodiment of the present application, the target tracking model provided in the present application includes two Convolutional Neural Networks (CNNs) and a time recursive Neural network LSTM (Long Short-Term Memory). The convolutional network 1 is a convolutional layer of one convolutional neural network (hereinafter referred to as a first convolutional neural network for convenience of distinction), and the convolutional network 2 is a convolutional layer of another convolutional neural network (hereinafter referred to as a second convolutional neural network for convenience of distinction).
The following first explains the training process of the target tracking model.
In the embodiment of the application, the two convolutional neural networks and the time recursive neural network are trained independently, an initial target tracking model of the application is constructed based on the result obtained by training, and then the initial target tracking model is trained to obtain a final target tracking model.
In the embodiment of the application, the first convolutional neural network is mainly responsible for extracting the target and marking the type and the position of the target. The first convolutional neural network comprises 24 convolutional layers and 2 fully-connected layers. It can be obtained by training on the basis of a YOLO (You Only Look Once) convolutional neural network. Specifically, the weights of the convolutional-layer parameters in the YOLO convolutional neural network are assigned to the convolutional layers of the first convolutional neural network, and the weights of the fully-connected layers of the first convolutional neural network are initialized using a Gaussian random distribution (for example, a Gaussian random distribution with a mean of zero and a variance of 0.01); the first convolutional neural network is then trained end-to-end on a target detection and classification task to obtain a first convolutional neural network initial model.
in the training process, one way for the first convolutional neural network to perform the target detection and classification task may be as follows:
each frame of image in the training video is divided into n x n grids, wherein n is a positive integer. In an alternative embodiment, n may take a value of 7. The position and the class value of the target are marked in each frame of image in the training video.
Predicting a plurality of bounding boxes (usually rectangular boxes for marking detected targets) in each grid, and recording the position and the size of each predicted bounding box and a corresponding trust value and a corresponding category value of each bounding box; the class value represents the class of the target in the bounding box, the trust value represents two important pieces of information, namely the confidence degree of the target in the predicted bounding box and the prediction accuracy of the bounding box, and the calculation formula of the trust value is as follows:
$$\mathrm{Confidence}=\Pr(\mathrm{Object})\times \mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}}$$

wherein the value of Pr(Object) is determined according to whether a target falls within the bounding box: Pr(Object) takes the value 1 when a target falls within the bounding box, and 0 otherwise; \(\mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}}\) denotes the IOU (Intersection-over-Union ratio) between the predicted bounding box and the calibrated target bounding box. Whether a target falls within the bounding box can be judged from the calibration values, and a target falling within the bounding box includes both the case where the target falls entirely within the bounding box and the case where part of the target falls within the bounding box.
Generally, the location of the bounding box is the coordinates of the upper left corner of the bounding box, and the size of the bounding box is the length and width of the bounding box.
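By way of illustration, the following minimal sketch computes the trust value from these quantities under the box convention just described; the function names and the plain-Python formulation are assumptions made only for this example.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x, y, w, h),
    where (x, y) is the top-left corner."""
    ax1, ay1, ax2, ay2 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1, bx2, by2 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def trust_value(pred_box, truth_box, target_in_box):
    """Trust value = Pr(Object) * IOU(pred, truth); Pr(Object) is 1 when a target
    falls (entirely or partly) within the predicted box, and 0 otherwise."""
    pr_object = 1.0 if target_in_box else 0.0
    return pr_object * iou(pred_box, truth_box)
```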
And calculating, based on the trust value and the category value corresponding to each bounding box, the trust value score of each bounding box for the category to which it belongs.
And multiplying the trust value corresponding to each bounding box by the class value to obtain the specific class trust value score of each bounding box, namely the trust value score of the class to which each bounding box belongs.
And deleting the bounding boxes of which the scores of the trust values of the categories are smaller than a preset score threshold value in the grids, and performing non-maximum suppression on the bounding boxes belonging to the same category in the bounding boxes reserved in the grids to obtain a target detection result of each grid.
The processing mode of each grid is the same, and is not described in detail here.
In an alternative embodiment, the predetermined score threshold may be 0.6.
After the target detection result of each grid is obtained, the non-maximum value suppression is carried out on the bounding boxes belonging to the same category in the whole image, and the final target detection result is obtained.
The non-maximum suppression process for bounding boxes belonging to the same category in the bounding boxes reserved in the grid may be:
determining the bounding box with the highest trust value score among the bounding boxes of the same category (denoted as the first bounding box for convenience of description);
and calculating the coincidence rate between each of the other bounding boxes of the same category (for convenience of description, called second bounding boxes) and the first bounding box; if the coincidence rate is higher than a set value, the second bounding box is deleted, and otherwise the second bounding box is kept.
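The score filtering and per-category non-maximum suppression described above can be sketched as follows, reusing the iou helper from the earlier example; the record layout and the 0.5 overlap threshold are assumptions, and the 0.6 score threshold is the value mentioned for the alternative embodiment above.

```python
def nms_per_category(detections, score_threshold=0.6, overlap_threshold=0.5):
    """detections: list of dicts {'box': (x, y, w, h), 'category': str, 'score': float},
    where 'score' is the class-specific trust value score (trust value * category value).
    Keeps, per category, the highest-scoring boxes whose coincidence rate with an
    already-kept box (measured here by IOU) stays below overlap_threshold."""
    # Delete boxes whose class-specific trust value score is below the threshold.
    detections = [d for d in detections if d['score'] >= score_threshold]
    kept = []
    for cat in {d['category'] for d in detections}:
        boxes = sorted([d for d in detections if d['category'] == cat],
                       key=lambda d: d['score'], reverse=True)
        while boxes:
            first = boxes.pop(0)   # first bounding box: highest score in this category
            kept.append(first)
            # Second bounding boxes overlapping the first one too much are deleted.
            boxes = [d for d in boxes if iou(first['box'], d['box']) < overlap_threshold]
    return kept
```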
Calculating the error degree of the target detection result of the first convolutional neural network through a preset loss function, wherein the error degree represents the error between the predicted values (namely, the detection result) and the calibrated values, and the loss function is as follows:

$$\mathrm{Loss}=\lambda_{1}\sum_{i=1}^{S^{2}}\sum_{j=1}^{B}\mathbb{1}_{ij}^{\mathrm{obj}}\Big[(x_{ij}-\hat{x}_{ij})^{2}+(y_{ij}-\hat{y}_{ij})^{2}+(w_{ij}-\hat{w}_{ij})^{2}+(h_{ij}-\hat{h}_{ij})^{2}\Big]+\lambda_{3}\sum_{i=1}^{S^{2}}\sum_{j=1}^{B}\mathbb{1}_{ij}^{\mathrm{obj}}\big(C_{ij}-\hat{C}_{ij}\big)^{2}+\lambda_{2}\sum_{i=1}^{S^{2}}\sum_{j=1}^{B}\mathbb{1}_{ij}^{\mathrm{noobj}}\big(C_{ij}-\hat{C}_{ij}\big)^{2}+\lambda_{3}\sum_{i=1}^{S^{2}}\mathbb{1}_{i}^{\mathrm{obj}}\sum_{c}\big(p_{i}(c)-\hat{p}_{i}(c)\big)^{2}$$

wherein Loss is the error degree of the target detection result of the first convolutional neural network; \(\lambda_{1}\) is the loss weight of the coordinate prediction loss and may take the value 5; \(\lambda_{2}\) is the loss weight of the confidence value loss for bounding boxes that do not contain a target and may take the value 0.5; \(\lambda_{3}\) is the loss weight of the confidence value loss and the category loss for bounding boxes that contain a target and may take the value 1; i is used to distinguish different grids and j is used to distinguish different bounding boxes. \(x_{ij}\), \(y_{ij}\), \(w_{ij}\), \(h_{ij}\) and \(C_{ij}\) denote predicted values: \(x_{ij}\) and \(y_{ij}\) are the predicted coordinates of the j-th bounding box in the i-th grid, \(w_{ij}\) is its predicted width, \(h_{ij}\) is its predicted height, and \(C_{ij}\) is its predicted confidence score. \(\hat{x}_{ij}\), \(\hat{y}_{ij}\), \(\hat{w}_{ij}\), \(\hat{h}_{ij}\) and \(\hat{C}_{ij}\) denote the corresponding calibrated values: \(\hat{x}_{ij}\) and \(\hat{y}_{ij}\) are the calibrated coordinates of the j-th bounding box in the i-th grid, \(\hat{w}_{ij}\) is its calibrated width, \(\hat{h}_{ij}\) is its calibrated height, and \(\hat{C}_{ij}\) is the calibrated confidence score of the j-th bounding box in the i-th grid. \(S^{2}\) denotes the number of divided grids and B denotes the number of bounding boxes in a grid. \(p_{i}(c)\) denotes the predicted probability that bounding boxes of category c appear in the i-th grid, and \(\hat{p}_{i}(c)\) denotes the calibrated probability of bounding boxes of category c in the i-th grid; the probability of bounding boxes of category c appearing in the i-th grid is the quotient of the number of category-c bounding boxes in the i-th grid and the total number of bounding boxes in the i-th grid.

The value of \(\mathbb{1}_{ij}^{\mathrm{obj}}\) is determined according to whether the j-th bounding box in the i-th grid contains a set detection target: if the pre-calibrated bounding box has the same item type as that detected by the j-th bounding box in the i-th grid, \(\mathbb{1}_{ij}^{\mathrm{obj}}\) takes the value 1; otherwise it takes the value 0. The term weighted by \(\lambda_{3}\mathbb{1}_{ij}^{\mathrm{obj}}\) is the product of the confidence value prediction loss of bounding boxes containing a target and its loss weight, and the term weighted by \(\lambda_{2}\mathbb{1}_{ij}^{\mathrm{noobj}}\) is the product of the confidence value prediction loss of bounding boxes without a target and its loss weight; \(\mathbb{1}_{ij}^{\mathrm{noobj}}\) takes the value 0 where \(\mathbb{1}_{ij}^{\mathrm{obj}}\) takes the value 1, and 1 otherwise. The last term is the product of the category prediction loss and the loss weight indicating whether a target center falls within grid i: \(\mathbb{1}_{i}^{\mathrm{obj}}\) takes the value 1 if a target center falls within grid i and 0 otherwise, and c denotes a category.
In the embodiment of the present application, in order to detect both small targets and large targets and to make the individual losses in the loss function more balanced, the coordinate prediction loss is represented by the Euclidean distance, so that only the coordinates are finely adjusted while the first convolutional neural network is optimized, which alleviates the problems of false detection, missed detection and duplicate detection of targets.
And if the error degree is greater than or equal to a preset threshold, updating the weights using the back-propagation (BP) algorithm and the Adam update method, and inputting other data from the training database for the next round of training, until the error degree is less than the preset threshold.
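Under the term definitions above, the loss computation can be sketched as follows; the array layout (one row per grid cell i, one column per bounding box j) and the variable names are assumptions made only for this illustration.

```python
import numpy as np

def detection_loss(pred, truth, obj_mask, cell_mask,
                   lambda1=5.0, lambda2=0.5, lambda3=1.0):
    """pred / truth: dicts of arrays with keys 'x', 'y', 'w', 'h', 'C' of shape
    (S*S, B) and 'p' of shape (S*S, num_classes).
    obj_mask : (S*S, B) array, 1 where box j of grid i is responsible for a target.
    cell_mask: (S*S,)  array, 1 where a target center falls within grid i."""
    noobj_mask = 1.0 - obj_mask
    coord = np.sum(obj_mask * ((pred['x'] - truth['x']) ** 2 +
                               (pred['y'] - truth['y']) ** 2 +
                               (pred['w'] - truth['w']) ** 2 +
                               (pred['h'] - truth['h']) ** 2))
    conf_obj = np.sum(obj_mask * (pred['C'] - truth['C']) ** 2)
    conf_noobj = np.sum(noobj_mask * (pred['C'] - truth['C']) ** 2)
    cls = np.sum(cell_mask[:, None] * (pred['p'] - truth['p']) ** 2)
    return lambda1 * coord + lambda3 * conf_obj + lambda2 * conf_noobj + lambda3 * cls
```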
In the training process, another way for the first convolutional neural network to perform the target detection and classification task may be:
dividing the image into m × m grids according to L different division granularities, wherein m takes L different values; in an alternative embodiment, L may take the value 4, and the 4 values of m may be 7, 5, 3 and 1, respectively. Then, corresponding to each division granularity,
predicting a plurality of bounding boxes in each grid, and recording the position and the size of each predicted bounding box, and a trust value and a category value corresponding to each bounding box;
calculating, based on the trust value and the category value corresponding to each bounding box, the trust value score of each bounding box for the category to which it belongs;
and deleting the bounding boxes in the grid whose trust value scores for their categories are smaller than a preset threshold, and performing non-maximum suppression separately on the bounding boxes of each different category retained in the grid, that is, performing non-maximum suppression on the bounding boxes belonging to the same category among the bounding boxes retained in the grid, to obtain a target detection result for each grid.
The processing mode of each grid is the same, and is not described in detail here.
After the target detection result of each grid is obtained, carrying out non-maximum suppression on bounding boxes of different types in the whole image respectively, namely carrying out non-maximum suppression on bounding boxes belonging to the same type in the whole image to obtain a final target detection result.
And calculating the error degree of the target detection result of the first convolution neural network through a preset loss function.
And if the error degree is greater than or equal to the preset threshold, updating the weights using the back-propagation (BP) algorithm and the Adam update method, and inputting other data from the training database for the next round of training, until the error degree is less than the preset threshold.
The above target detection and classification process in each granularity division can be referred to as the foregoing process, that is, when the image is divided into 7 × 7 grids, the above target detection process is performed once, when the image is divided into 5 × 5 grids, the above target detection process is performed once, and so on, until the above target detection is performed in each granularity division. The target detection process at each granularity is not described in detail here.
In each training process, the union of the detection results under all the granularities is the final target detection result in the training process.
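A sketch of the multi-granularity procedure is given below; detect_at_granularity is a hypothetical helper standing in for the per-grid prediction, score filtering and within-granularity suppression described above, and nms_per_category is the helper from the earlier sketch.

```python
def multi_granularity_detect(image, granularities=(7, 5, 3, 1)):
    """Runs the per-grid detection procedure once for each division granularity
    and takes the union of the retained bounding boxes as the final result."""
    all_detections = []
    for m in granularities:
        # Hypothetical helper: divide the image into m x m grids, predict boxes,
        # filter by class trust value score, and suppress within this granularity.
        all_detections.extend(detect_at_granularity(image, m))
    # Per-category non-maximum suppression over the whole image, across granularities.
    return nms_per_category(all_detections)
```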
In the embodiment of the application, the target detection and classification are carried out through various granularity divisions, so that the accuracy of the target detection is higher.
The second convolutional neural network is primarily responsible for extracting information associated with different classes of targets in the background. It has the same structure as the first convolutional neural network but performs a different task and produces a different output: its task is background-based detection of the target type, and its output is the information associated with the different types of targets in the background. The second convolutional neural network is optimized using the Softmax function as its loss function, and its parameter updating process is the same as that of the first convolutional network.
When a second convolutional neural network is trained, assigning the weight of the parameter of the convolutional layer in the trained first convolutional neural network to the second convolutional neural network, and initializing the weight of the parameter of the full-connection layer of the second convolutional neural network by selecting Gaussian random distribution; performing end-to-end training on the second convolutional neural network on a background-based target type detection task to obtain a second convolutional neural network model; background-based object type detection may use common detection methods.
Then, the weights of the convolutional-layer parameters of the second convolutional neural network model are assigned to the convolutional layers of the first convolutional neural network model, and the first convolutional neural network model and the second convolutional neural network model are trained again by the above method; this cycle is repeated twice (that is, three rounds of training are performed in total) to obtain the final first convolutional neural network model and the final second convolutional neural network model.
In the embodiment of the application, the first convolutional neural network and the second convolutional neural network are jointly trained, so that the calculation speed in the training process is increased.
From the training process of the two convolutional neural networks, the convolutional layer parameters of the first convolutional neural network and the second convolutional neural network are the same. In order to reduce the calculation time, the first convolutional neural network and the second convolutional neural network can share convolutional layer parameters, so that the occupied storage space can be reduced.
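The convolutional-layer weight assignment between the two networks can be sketched in PyTorch as follows; the shallow DetectionCNN stand-in and its layer sizes are illustrative assumptions rather than the 24-convolutional-layer architecture described above.

```python
import torch.nn as nn

class DetectionCNN(nn.Module):
    """Simplified stand-in for the two networks: a shared-structure convolutional
    stack followed by fully-connected layers specific to each task."""
    def __init__(self, num_outputs):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(16, 32, 3, padding=1), nn.LeakyReLU(0.1),
        )
        self.fc = nn.Sequential(nn.Flatten(), nn.LazyLinear(num_outputs))

    def forward(self, x):
        return self.fc(self.conv(x))

def copy_conv_weights(src, dst):
    # Assign the convolutional-layer weights of src to dst; the fully-connected
    # layers of dst are left untouched.
    dst.conv.load_state_dict(src.conv.state_dict())

cnn1 = DetectionCNN(num_outputs=7 * 7 * 30)   # target detection and classification head
cnn2 = DetectionCNN(num_outputs=20)           # background-based target-type head
copy_conv_weights(cnn1, cnn2)                 # one assignment step of the alternating training
```

Since the convolutional parameters are identical after the final assignment, the two networks can simply reference one shared convolutional module at inference time, which gives the storage saving noted above.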
The time recursive neural network is mainly used for correlating the detected target with different backgrounds at different moments, so that the target detection accuracy in the video is improved.
In the embodiment of the application, a training set comprising two types of videos is selected to train the time recurrent neural network. The quantity of the first type of video and the quantity of the second type of video are equal, the time lengths of the first type of video and the second type of video are the same, and the change amplitude of the target in the first type of video is larger than that of the target in the second type of video; the large variation range of the target may mean that the target suddenly appears, disappears, or the posture and the like have large variation. The small change amplitude of the target may mean that the target changes slowly, does not appear or disappear suddenly, and has a small posture change.
And analyzing the incidence relation between the same target in each video and different backgrounds at different moments by the time recurrent neural network, and obtaining the incidence relation between the same type of target and different backgrounds at different moments by machine learning.
And in the training process, the weights are updated according to the back-propagation-through-time algorithm and the Adam update method.
The respective training processes of the convolutional neural network and the time recursive neural network have been described previously. The following describes a process of training a target tracking model formed by the above-described trained convolutional neural network and temporal recurrent neural network.
An initial target tracking model is constructed from the two trained convolutional neural network models and the time recursive neural network model: all convolutional layers of the first convolutional neural network model are connected into the time recursive neural network model through a first fully-connected layer, at least part of the convolutional layers of the second convolutional neural network model are connected into the time recursive neural network model through a second fully-connected layer, and the output end of the time recursive neural network model is connected with the input ends of the first and second fully-connected layers and with the input end of a third fully-connected layer.
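The wiring just described can be sketched in PyTorch roughly as follows, assuming an LSTM as the time recursive neural network; the feature dimensions and the use of lazily-sized fully-connected layers are illustrative choices, and the feedback connection into the first and second fully-connected layers is sketched separately further below.

```python
import torch
import torch.nn as nn

class TargetTrackingModel(nn.Module):
    """Two convolutional feature extractors feed an LSTM through a first and a
    second fully-connected layer; a third fully-connected layer gives the output."""
    def __init__(self, conv1, conv2, feat_dim=512, hidden=1024, out_dim=7 * 7 * 30):
        super().__init__()
        self.conv1, self.conv2 = conv1, conv2        # trained convolutional stacks
        self.fc1 = nn.LazyLinear(feat_dim)           # first fully-connected layer
        self.fc2 = nn.LazyLinear(feat_dim)           # second fully-connected layer
        self.lstm = nn.LSTM(2 * feat_dim, hidden, batch_first=True)
        self.fc3 = nn.Linear(hidden, out_dim)        # third fully-connected layer

    def forward(self, frames):                       # frames: (batch, time, C, H, W)
        b, t = frames.shape[:2]
        x = frames.flatten(0, 1)
        f1 = self.fc1(self.conv1(x).flatten(1))      # target position / category features
        f2 = self.fc2(self.conv2(x).flatten(1))      # background-association features
        seq = torch.cat([f1, f2], dim=1).view(b, t, -1)
        out, _ = self.lstm(seq)                      # association across time steps
        return self.fc3(out)                         # per-frame detection output
```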
And training the initial target tracking model on a preset target detection task to obtain the target tracking model.
The preset target detection task may be:
the method comprises the steps that a first convolution neural network carries out target detection on an image to obtain the position of a detected target in the image and the type of the detected target;
the second convolutional neural network carries out target detection based on the background on the image to obtain information associated with different types of targets in the background;
and the time recursive neural network correlates the detected target with different backgrounds at different moments based on the information correlated with the targets of different types in the backgrounds to obtain a target detection result, and the target detection result is output through a third full-connection layer.
In a preferred embodiment, after the time recursive neural network obtains a target detection result, it does not output the result directly but feeds the target detection result back to the convolutional neural networks, specifically to the fully-connected layers of the convolutional neural networks. The fully-connected layer of the preceding stage randomly selects between the data output by the convolutional network and the data fed back by the LSTM, the randomly selected values are processed by the time recursive neural network to obtain a final target detection result, and the final target detection result is output through the final fully-connected layer. In the embodiment of the present application, this feedback mechanism improves the target detection precision.
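A per-frame sketch of this feedback mechanism, using the TargetTrackingModel above, is given below; it assumes the LSTM hidden size equals the width of the concatenated fully-connected features so the fed-back output can be substituted directly, and the 0.5 selection probability is likewise an assumption since the document only states that the selection is random.

```python
import random
import torch

def forward_with_feedback(model, frames):
    """Process the frames one time step at a time; the input to the time recursive
    neural network is chosen at random between the current convolutional features
    and the LSTM output fed back from the previous time step."""
    state, feedback, outputs = None, None, []
    for x in frames.unbind(dim=1):                   # iterate over time steps
        f1 = model.fc1(model.conv1(x).flatten(1))    # features from the first CNN
        f2 = model.fc2(model.conv2(x).flatten(1))    # features from the second CNN
        feats = torch.cat([f1, f2], dim=1)
        if feedback is not None and random.random() < 0.5:
            feats = feedback                         # use the fed-back LSTM data instead
        out, state = model.lstm(feats.unsqueeze(1), state)
        feedback = out.squeeze(1)                    # fed back to the next time step
        outputs.append(model.fc3(feedback))          # output via the third FC layer
    return torch.stack(outputs, dim=1)
```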
In the target tracking model training process, the weights of the parameters in the convolutional neural networks are updated using the back-propagation (BP) algorithm and the Adam update method, and the weights of the parameters in the time recursive neural network are updated using the back-propagation-through-time algorithm and the Adam update method.
In an alternative embodiment, the process of the first convolutional neural network performing target detection on the image may include:
dividing the image into n x n meshes;
predicting a plurality of bounding boxes in each grid, and recording the position and the size of each bounding box, and a trust value and a category value corresponding to each bounding box;
calculating, based on the trust value and the category value corresponding to each bounding box, the trust value score of each bounding box for the category to which it belongs;
and deleting the bounding boxes in the grid, of which the trust values to the categories are less than a preset threshold value, and performing non-maximum value suppression on the bounding boxes which are reserved in the grid and belong to the same category to obtain the position and category information of the target in the grid.
After the target detection result of each grid is obtained, the non-maximum value suppression is carried out on the bounding boxes belonging to the same category in the whole image, and the final target detection result is obtained.
In an alternative embodiment, the process of the first convolutional neural network performing target detection on the image may include:
dividing the image into m × m grids according to L different division granularities, wherein m takes L different values;
corresponding to each division granularity, predicting a plurality of bounding boxes in each grid, and recording the position and the size of each bounding box, and a trust value and a category value corresponding to each bounding box;
calculating, based on the trust value and the category value corresponding to each bounding box, the trust value score of each bounding box for its category;
and deleting the bounding boxes in the grid, of which the trust values of the categories are less than a preset threshold value, and performing non-maximum suppression on the bounding boxes belonging to the same category in the bounding boxes reserved in the grid to obtain the position and category information of the target.
After the target detection result of each grid is obtained, the non-maximum value suppression is carried out on the bounding boxes belonging to the same category in the whole image, and the final target detection result is obtained.
Target detection was performed at each granularity by the method described above.
After the target tracking model is trained, the target tracking model can be used for target detection.
Referring to fig. 2, fig. 2 is a flowchart of an implementation of a target tracking method according to an embodiment of the present application, where the implementation of the target tracking method includes:
step S21: the first convolution neural network carries out target detection on the image to obtain the position of the detected target in the image and the category of the detected target;
step S22: the second convolutional neural network carries out target detection based on the background on the image to obtain information associated with different types of targets in the background;
step S23: and the time recursive neural network associates the detected targets with different backgrounds at different moments based on the information associated with the targets of different classes in the backgrounds, to obtain target detection results.
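Putting steps S21 to S23 together, a frame-level driver might look like the following sketch; decode_grid is a hypothetical helper that converts one frame's raw output grid into bounding-box records, and nms_per_category is the helper sketched earlier.

```python
import torch

def track_video(frames, model, score_threshold=0.6):
    """frames: list of H x W x 3 uint8 images from the video stream.
    Returns, per frame, the bounding boxes retained after association by the
    time recursive neural network and per-category non-maximum suppression."""
    batch = torch.stack([torch.from_numpy(f).permute(2, 0, 1).float() / 255.0
                         for f in frames]).unsqueeze(0)   # (1, T, C, H, W)
    with torch.no_grad():
        outputs = model(batch)                            # (1, T, out_dim)
    results = []
    for t in range(outputs.shape[1]):
        detections = decode_grid(outputs[0, t])           # hypothetical grid decoder
        results.append(nms_per_category(detections, score_threshold))
    return results
```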
The process of performing target detection on the image by the first convolutional neural network may include:
dividing the image into n x n meshes;
predicting a plurality of bounding boxes in each grid, and recording the position and the size of each bounding box, and a trust value and a category value corresponding to each bounding box;
calculating the trust value score of each bounding box to the category information based on the trust value and the category value corresponding to each bounding box;
and deleting the bounding boxes in the grids, of which the trust value scores to the category information are smaller than a preset threshold value, and performing non-maximum suppression on the bounding boxes reserved in the grids, which belong to the same category, to obtain the position and the category information of the target in each grid.
After the target detection result of each grid is obtained, the non-maximum value suppression is carried out on the bounding boxes belonging to the same category in the whole image, and the final target detection result is obtained.
In another alternative embodiment, the process of performing target detection on the image by the first convolutional neural network may include:
dividing the image into m × m grids according to L different division granularities, wherein m takes L different values; in an alternative embodiment, L may take the value 4, and the 4 values of m may be 7, 5, 3 and 1, respectively. Then, corresponding to each division granularity,
predicting a plurality of bounding boxes in each grid, and recording the position and the size of each predicted bounding box, and a trust value and a category value corresponding to each bounding box;
calculating the trust value score of each bounding box to the category information based on the trust value and the category value corresponding to each bounding box;
and deleting the bounding boxes in the grids, of which the trust value scores to the category information are smaller than a preset threshold value, and performing non-maximum suppression on the bounding boxes belonging to the same category in the bounding boxes reserved in the grids to obtain the position and the category information of the target in each grid.
After the target detection result of each grid is obtained, the non-maximum value suppression is carried out on the bounding boxes belonging to the same category in the whole image, and the final target detection result is obtained.
The process of target detection is the same for each granularity division, which is not described herein.
In an alternative embodiment, the associating, by the temporal recurrent neural network, the detected object with different contexts at different times based on the information associated with the different classes of objects in the contexts to obtain the object detection result may include:
the time recursive neural network correlates the detected target with different backgrounds at different times through the pre-learned correlation relationship between the target of the same type and different backgrounds at different times to obtain a target detection result.
Corresponding to the method embodiment, the present application further provides a target detection apparatus, and an implementation flowchart of the target detection apparatus provided in the embodiment of the present application is shown in fig. 3, and may include:
a first detection module 31, a second detection module 32 and an association module 33; wherein,
the first detection module 31 is configured to perform target detection on each frame of image in the video stream through a first convolutional neural network, so as to obtain a position of a detected target in the image and a category of the detected target;
the second detection module 32 is configured to perform background-based target detection on the image through a second convolutional neural network, so as to obtain information associated with different types of targets in the background;
the association module 33 is configured to associate the detected target with different backgrounds at different times based on the information associated with the targets of different categories in the backgrounds, so as to obtain a target detection result.
The target detection device provided by the present application combines two convolutional neural networks with a time recursive neural network model, solving the problem of a low detection rate for small targets. Moreover, information associated with the target in the background is extracted for target detection, so that the speed and the accuracy of the target tracking model in video target detection are improved.
In an optional embodiment, the first detection module 31 may be specifically configured to divide the image into n × n grids through a first convolutional neural network; predict a plurality of bounding boxes in each grid, and record the position and the size of each bounding box, and a trust value and a category value corresponding to each bounding box; calculate, based on the trust value and the category value corresponding to each bounding box, the trust value score of each bounding box for the category to which it belongs; and delete the bounding boxes in the grid whose trust value scores for their categories are less than a preset threshold, and perform non-maximum suppression separately on the retained bounding boxes of each different category, to obtain the position and category information of the target.
In another optional embodiment, the first detecting module 31 may be specifically configured to divide the image into m × m grids according to L different division granularities through a first convolutional neural network, where m has L different values; predicting a plurality of bounding boxes in each grid corresponding to each partition granularity, and recording the position and the size of each bounding box, and a trust value and a category value corresponding to each bounding box; calculating the trust value score of each bounding box to the category of each bounding box based on the trust value and the category value corresponding to each bounding box in the grid; and deleting the bounding boxes in the grid, of which the trust values of the categories are less than a preset threshold value, and respectively inhibiting the non-maximum values of the bounding boxes of different categories reserved under different partition granularities to obtain the position and category information of the target.
In an alternative embodiment, the association module 33 may be specifically adapted to,
and associating the detected target with different backgrounds at different moments to obtain a target detection result through the association relationship between the target of the same type and different backgrounds at different moments learned in advance.
In an optional embodiment, the target detection apparatus may further include:
the training module is used for training a target tracking model, and specifically used for assigning the weight of the parameter of the convolution layer in the YOLO convolutional neural network to the first convolutional neural network, and the weights of other parameters of the first convolutional neural network are initialized by adopting Gaussian random distribution; performing end-to-end training on the first convolutional neural network on a target detection and classification task to obtain a first convolutional neural network model;
assigning the weight of the parameter of the convolution layer in the first convolution neural network to the second convolution neural network, and initializing the weights of other parameters of the second convolution neural network by selecting Gaussian random distribution; performing end-to-end training on the second convolutional neural network on a background-based target type detection task to obtain a second convolutional neural network model;
assigning parameters of the weight of the convolutional layer of the second convolutional neural network model to the convolutional layer of the first convolutional neural network model, training again through the steps, and repeating the steps twice to obtain a final first convolutional neural network model and a final second convolutional neural network model;
training a time recurrent neural network on a task of associating the same type of target with different backgrounds at different moments through a preselected video training set to obtain a time recurrent neural network model; the video training set comprises a first type of video and a second type of video which are equal in quantity, the time lengths of the first type of video and the second type of video are the same, and the variation amplitude of a target in the first type of video is larger than that of a target in the second type of video;
constructing an initial target tracking model: connecting all convolutional layers of a first convolutional neural network model into the time recursive neural network model through a first fully-connected layer, connecting at least one part (for example, all convolutional layers or the first 12 layers) of the convolutional layers of the second convolutional neural network model into the time recursive neural network model through a second fully-connected layer, and connecting the output end of the time recursive neural network model with the input ends of the first fully-connected layer and the second fully-connected layer and the input end of a third fully-connected layer.
And training the initial target tracking model on a preset target detection task to obtain the target tracking model.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
It should be understood that the technical problems can be solved by combining the features of the embodiments and of the claims with one another.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A target tracking method is characterized in that target detection is carried out on each frame of image in a video stream through a pre-trained target tracking model, and the method comprises the following steps:
a first convolution neural network in the target tracking model performs target detection on the image to obtain the position of the detected target in the image and the category of the detected target;
a second convolutional neural network in the target tracking model performs target detection based on the background on the image to obtain information associated with different types of targets in the background;
the time recurrent neural network in the target tracking model associates the detected target with different backgrounds at different moments based on the information associated with the targets of different classes in the backgrounds to obtain a target detection result;
after obtaining a target detection result, the time recurrent neural network feeds back the target detection result to a first full connection layer of the first convolutional neural network and a second full connection layer of the second convolutional neural network; the first full connection layer and the second full connection layer randomly select between the data output by the convolutional neural networks and the data fed back by the time recurrent neural network, the randomly selected data are processed by the time recurrent neural network to obtain a final target detection result, and the final target detection result is output through a third full connection layer;
the training process of the target tracking model comprises the following steps:
assigning the weight parameters of the convolution layers in the YOLO convolutional neural network to the first convolutional neural network, and initializing the other weight parameters of the first convolutional neural network with a Gaussian random distribution; performing end-to-end training of the first convolutional neural network on a target detection and classification task to obtain a first convolutional neural network model;
assigning the weight parameters of the convolution layers in the first convolutional neural network to the second convolutional neural network, and initializing the other weight parameters of the second convolutional neural network with a Gaussian random distribution; performing end-to-end training of the second convolutional neural network on a background-based target type detection task to obtain a second convolutional neural network model;
assigning the weight parameters of the convolution layers of the second convolutional neural network model to the convolution layers of the first convolutional neural network model, training again through the above steps, and repeating these steps twice to obtain a final first convolutional neural network model and a final second convolutional neural network model;
training a time recurrent neural network on a task of associating the same type of target with different backgrounds at different moments through a preselected video training set to obtain a time recurrent neural network model; the video training set comprises a first type of video and a second type of video which are equal in quantity, the time lengths of the first type of video and the second type of video are the same, and the variation amplitude of a target in the first type of video is larger than that of a target in the second type of video;
constructing an initial target tracking model: connecting all convolutional layers of a first convolutional neural network model into the time recursive neural network model through a first fully-connected layer, connecting at least one part of convolutional layers of a second convolutional neural network model into the time recursive neural network model through a second fully-connected layer, and connecting the output end of the time recursive neural network model with the input ends of the first fully-connected layer and the second fully-connected layer and the input end of a third fully-connected layer;
and training the initial target tracking model on a preset target detection task to obtain the target tracking model.
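As a concrete illustration of the wiring recited in claim 1, the following is a minimal PyTorch sketch. The layer sizes, the use of an LSTM cell as the time recurrent neural network, and the simplification of the random selection step to a concatenation of the CNN outputs with the fed-back state are illustrative assumptions, not the claimed design.

# Minimal sketch of the two-branch CNN + temporal RNN architecture (assumed sizes).
import torch
import torch.nn as nn

class TrackerSketch(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=256, num_outputs=10):
        super().__init__()
        # First CNN branch: target detection (stand-in for a YOLO-style backbone).
        self.cnn1 = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())
        # Second CNN branch: background-based target type detection.
        self.cnn2 = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.fc1 = nn.Linear(32 * 16 + hidden_dim, feat_dim)   # first fully connected layer
        self.fc2 = nn.Linear(32 * 16 + hidden_dim, feat_dim)   # second fully connected layer
        self.rnn = nn.LSTMCell(2 * feat_dim, hidden_dim)       # time recurrent network stand-in
        self.fc3 = nn.Linear(hidden_dim, num_outputs)          # third fully connected layer (output)

    def forward(self, frames):
        # frames: (T, 3, H, W) video clip; the LSTM state carries the fed-back result.
        h = frames.new_zeros(1, self.rnn.hidden_size)
        c = frames.new_zeros(1, self.rnn.hidden_size)
        outputs = []
        for t in range(frames.shape[0]):
            x = frames[t:t + 1]
            f1 = torch.cat([self.cnn1(x), h], dim=1)   # feedback into the first FC layer
            f2 = torch.cat([self.cnn2(x), h], dim=1)   # feedback into the second FC layer
            z = torch.cat([torch.relu(self.fc1(f1)), torch.relu(self.fc2(f2))], dim=1)
            h, c = self.rnn(z, (h, c))
            outputs.append(self.fc3(h))                # final result through the third FC layer
        return torch.stack(outputs)

clip = torch.randn(4, 3, 64, 64)       # four frames of a toy video stream
print(TrackerSketch()(clip).shape)     # -> torch.Size([4, 1, 10])

In this sketch the hidden state h carries the fed-back detection result into both fully connected layers; a faithful implementation would use YOLO-scale backbones for the two branches and would implement the random selection between CNN output and fed-back data rather than a plain concatenation.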
2. The method of claim 1, wherein the first convolutional neural network performs a target detection process on the image, comprising:
dividing the image into n x n meshes;
predicting a plurality of bounding boxes in each grid, and recording the position and the size of each bounding box, and a trust value and a category value corresponding to each bounding box;
calculating, based on the trust value and the category value corresponding to each bounding box, a trust value score of each bounding box for the category to which it belongs;
and deleting the bounding boxes in the grids whose category trust value scores are less than a preset threshold, and performing non-maximum suppression separately on the retained bounding boxes of each category to obtain the position and category information of the target.
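As a concrete illustration of the post-processing recited in claim 2, the following is a minimal NumPy sketch of per-box class scoring, thresholding, and per-class non-maximum suppression. The box layout (x1, y1, x2, y2), the score threshold, and the IoU cut-off are illustrative assumptions.

# Minimal sketch: class-specific scores, thresholding, and per-class NMS.
import numpy as np

def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    x1, y1 = np.maximum(a[0], b[0]), np.maximum(a[1], b[1])
    x2, y2 = np.minimum(a[2], b[2]), np.minimum(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def postprocess(boxes, confidences, class_probs, score_thr=0.2, iou_thr=0.5):
    """boxes: (N, 4); confidences: (N,); class_probs: (N, C)."""
    scores = confidences[:, None] * class_probs          # class-specific trust value scores
    classes = scores.argmax(axis=1)
    best = scores.max(axis=1)
    keep = best >= score_thr                              # delete low-scoring bounding boxes
    boxes, classes, best = boxes[keep], classes[keep], best[keep]
    kept = []
    for c in np.unique(classes):                           # NMS separately for each category
        idx = np.where(classes == c)[0]
        idx = idx[np.argsort(-best[idx])]
        while idx.size:
            i, idx = idx[0], idx[1:]
            kept.append((boxes[i], int(c), float(best[i])))
            idx = np.array([j for j in idx if iou(boxes[i], boxes[j]) < iou_thr])
    return kept

# toy example: two overlapping boxes of one class and one box of another class
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
conf = np.array([0.9, 0.8, 0.7])
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
print(postprocess(boxes, conf, probs))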
3. The method of claim 1, wherein the first convolutional neural network performs a target detection process on the image, comprising:
dividing the image into m × m grids according to L different division granularities, wherein m takes L different values;
predicting a plurality of bounding boxes in each grid corresponding to each partition granularity, and recording the position and the size of each bounding box, and a trust value and a category value corresponding to each bounding box;
calculating, based on the trust value and the category value corresponding to each bounding box in the grids, a trust value score of each bounding box for the category to which it belongs;
and deleting the bounding boxes in the grids whose category trust value scores are less than a preset threshold, and performing non-maximum suppression separately on the bounding boxes of each category retained under the different division granularities to obtain the position and category information of the target.
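As a concrete illustration of the multi-granularity division recited in claim 3, the following minimal NumPy sketch divides a square image into m × m grids for L = 3 assumed granularities and pools one candidate per cell before joint per-class suppression would be applied; the granularity values, the image size, and the random stand-in confidences are illustrative assumptions.

# Minimal sketch: pooling grid-cell candidates over several division granularities.
import numpy as np

granularities = [7, 11, 14]                       # L = 3 assumed values of m

def cell_centers(m, img_size=448):
    """Centre coordinates of every cell in an m-by-m grid over a square image."""
    step = img_size / m
    xs = (np.arange(m) + 0.5) * step
    return np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)   # (m*m, 2)

all_candidates = []
for m in granularities:
    centers = cell_centers(m)
    conf = np.random.rand(len(centers))            # stand-in for per-cell trust values
    all_candidates.append(np.hstack([centers, conf[:, None]]))

candidates = np.vstack(all_candidates)             # pooled over all granularities
print(candidates.shape)                            # (7*7 + 11*11 + 14*14, 3) = (366, 3)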
4. The method of claim 1, wherein the time recurrent neural network associates the detected target with different backgrounds at different moments based on the information associated with targets of different classes in the backgrounds to obtain a target detection result, comprising:
the time recursive neural network correlates the detected target with different backgrounds at different times through the pre-learned correlation relationship between the target of the same type and different backgrounds at different times to obtain a target detection result.
5. The method of claim 1, wherein the end-to-end training of the first convolutional neural network on a target detection and classification task comprises: the first convolutional neural network performs target detection and classification by the following method:
dividing the image into n x n grids;
predicting a plurality of bounding boxes in each grid, and recording the position and the size of each bounding box, and a trust value and a category value corresponding to each bounding box;
calculating, based on the trust value and the category value corresponding to each bounding box, a trust value score of each bounding box for the category to which it belongs;
deleting the bounding boxes in the grids whose category trust value scores are smaller than a preset threshold, and performing non-maximum suppression separately on the bounding boxes of each category retained in all the grids to obtain a target detection result;
calculating the error degree of the target detection result of the first convolutional neural network through a preset loss function, wherein the loss function is as follows:
$$
\begin{aligned}
\mathrm{Loss} ={}& \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left[ (x_{ij}-\hat{x}_{ij})^2 + (y_{ij}-\hat{y}_{ij})^2 \right] \\
&+ \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left[ \left(\sqrt{w_{ij}}-\sqrt{\hat{w}_{ij}}\right)^2 + \left(\sqrt{h_{ij}}-\sqrt{\hat{h}_{ij}}\right)^2 \right] \\
&+ \lambda_{\mathrm{obj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}} \left(C_{ij}-\hat{C}_{ij}\right)^2 + \lambda_{\mathrm{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{noobj}} \left(C_{ij}-\hat{C}_{ij}\right)^2 \\
&+ \lambda_{\mathrm{obj}} \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\mathrm{obj}} \sum_{c \in \mathrm{classes}} \left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
$$
wherein Loss is the error degree of the target detection result of the first convolutional neural network; λ_coord is the loss weight of the coordinate prediction loss, and its value may be 5; λ_noobj is the loss weight of the trust value loss for bounding boxes that do not contain a target, and its value may be 0.5; λ_obj is the loss weight of the trust value loss and the category loss for bounding boxes that contain a target, and its value may be 1; i is used to distinguish different grids and j is used to distinguish different bounding boxes; the hatted quantities denote predicted values: x̂_ij and ŷ_ij are the predicted coordinates of the jth bounding box in the ith grid, ŵ_ij is its predicted width, and ĥ_ij is its predicted height; the unhatted quantities denote calibrated values: x_ij and y_ij are the calibrated coordinates of the jth bounding box in the ith grid, w_ij is its calibrated width, and h_ij is its calibrated height; S² represents the number of divided grids and B represents the number of bounding boxes in a grid; Ĉ_ij represents the predicted trust value score of the jth bounding box in the ith grid and C_ij represents the calibrated trust value score of the jth bounding box in the ith grid; p̂_i(c) represents the probability that a target of category c occurs in the ith grid and p_i(c) represents the calibrated probability of a bounding box of category c in the ith grid; 1_ij^obj takes the value 1 if the item type detected by the jth bounding box in the ith grid is the same as that of the pre-calibrated bounding box, and 0 otherwise; 1_ij^noobj takes the value 0 if they are the same, and 1 otherwise;
and if the error degree is greater than or equal to a preset threshold, updating the weights by a back-propagation algorithm with the Adam update method, and inputting unused data from the training library for the next round of training, until the difference between the loss value and the minimum value of the loss function is less than the preset threshold.
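As a concrete illustration of a loss of the form given in claim 5, the following is a minimal NumPy sketch. The packed array layout, the precomputed indicator masks, and the application of the category term per bounding box rather than per grid are simplifying assumptions.

# Minimal sketch of the claim-5 loss over packed prediction/calibration arrays.
import numpy as np

def yolo_loss(pred, target, obj_mask, lam_coord=5.0, lam_obj=1.0, lam_noobj=0.5):
    """
    pred, target: (S*S, B, 5 + C) arrays laid out as [x, y, w, h, C, p(c1)...p(cC)]
    obj_mask:     (S*S, B) array, 1 where the calibrated box matches the detected class, else 0
    """
    noobj_mask = 1.0 - obj_mask
    xy_loss = ((pred[..., 0] - target[..., 0]) ** 2 +
               (pred[..., 1] - target[..., 1]) ** 2)
    wh_loss = ((np.sqrt(pred[..., 2]) - np.sqrt(target[..., 2])) ** 2 +
               (np.sqrt(pred[..., 3]) - np.sqrt(target[..., 3])) ** 2)
    conf_loss = (pred[..., 4] - target[..., 4]) ** 2
    class_loss = ((pred[..., 5:] - target[..., 5:]) ** 2).sum(axis=-1)
    return (lam_coord * (obj_mask * (xy_loss + wh_loss)).sum()
            + lam_obj * (obj_mask * (conf_loss + class_loss)).sum()
            + lam_noobj * (noobj_mask * conf_loss).sum())

S, B, C = 7, 2, 3
pred = np.random.rand(S * S, B, 5 + C)
target = np.random.rand(S * S, B, 5 + C)
obj_mask = (np.random.rand(S * S, B) > 0.9).astype(float)
print(yolo_loss(pred, target, obj_mask))

With obj_mask built from the class-match rule in claim 5, the default weights lam_coord = 5, lam_noobj = 0.5 and lam_obj = 1 correspond to the values stated above.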
6. An object detection device, comprising:
the first detection module is used for carrying out target detection on each frame of image in the video stream through a first convolution neural network in a target tracking model to obtain the position of a detected target in the image and the category of the detected target;
the second detection module is used for carrying out background-based target detection on the image through a second convolutional neural network in the target tracking model to obtain information associated with different types of targets in the background;
the correlation module is used for correlating the detected target with different backgrounds at different moments through a time recurrent neural network in the target tracking model based on the information associated with the targets of different classes in the backgrounds to obtain a target detection result; after obtaining a target detection result, the time recurrent neural network feeds back the target detection result to a first full connection layer of the first convolutional neural network and a second full connection layer of the second convolutional neural network; the first full connection layer and the second full connection layer randomly select between the data output by the convolutional neural networks and the data fed back by the time recurrent neural network, the randomly selected data are processed by the time recurrent neural network to obtain a final target detection result, and the final target detection result is output through a third full connection layer;
the training module is used for training the target tracking model, the specific training process being: assigning the weight parameters of the convolution layers in the YOLO convolutional neural network to the first convolutional neural network, and initializing the other weight parameters of the first convolutional neural network with a Gaussian random distribution; performing end-to-end training of the first convolutional neural network on a target detection and classification task to obtain a first convolutional neural network model;
assigning the weight parameters of the convolution layers in the first convolutional neural network to the second convolutional neural network, and initializing the other weight parameters of the second convolutional neural network with a Gaussian random distribution; performing end-to-end training of the second convolutional neural network on a background-based target type detection task to obtain a second convolutional neural network model;
assigning the weight parameters of the convolution layers of the second convolutional neural network model to the convolution layers of the first convolutional neural network model, training again through the above steps, and repeating these steps twice to obtain a final first convolutional neural network model and a final second convolutional neural network model;
training a time recurrent neural network on a task of associating the same type of target with different backgrounds at different moments through a preselected video training set to obtain a time recurrent neural network model; the video training set comprises a first type of video and a second type of video which are equal in quantity, the time lengths of the first type of video and the second type of video are the same, and the variation amplitude of a target in the first type of video is larger than that of a target in the second type of video;
constructing an initial target tracking model: connecting all convolutional layers of a first convolutional neural network model into the time recursive neural network model through a first fully-connected layer, connecting at least one part of convolutional layers of a second convolutional neural network model into the time recursive neural network model through a second fully-connected layer, and connecting the output end of the time recursive neural network model with the input ends of the first fully-connected layer and the second fully-connected layer and the input end of a third fully-connected layer;
and training the initial target tracking model on a preset target detection task to obtain the target tracking model.
7. The apparatus of claim 6, wherein the first detection module is specifically configured to divide the image into n x n grids through the first convolutional neural network; predict a plurality of bounding boxes in each grid, and record the position and size of each bounding box and a trust value and a category value corresponding to each bounding box; calculate, based on the trust value and the category value corresponding to each bounding box, a trust value score of each bounding box for the category to which it belongs; and delete the bounding boxes in the grids whose category trust value scores are less than a preset threshold, and perform non-maximum suppression separately on the retained bounding boxes of each category to obtain the position and category information of the target.
8. The apparatus of claim 6, wherein the first detection module is specifically configured to divide the image into m × m grids according to L different division granularities through the first convolutional neural network, wherein m takes L different values; predict a plurality of bounding boxes in each grid corresponding to each division granularity, and record the position and size of each bounding box and a trust value and a category value corresponding to each bounding box; calculate, based on the trust value and the category value corresponding to each bounding box in the grids, a trust value score of each bounding box for the category to which it belongs; and delete the bounding boxes in the grids whose category trust value scores are less than a preset threshold, and perform non-maximum suppression separately on the bounding boxes of each category retained under the different division granularities to obtain the position and category information of the target.
9. The apparatus according to claim 6, wherein the association module is specifically configured to associate the detected target with different backgrounds at different moments, through the pre-learned association relationship between targets of the same type and different backgrounds at different moments, to obtain a target detection result.
CN201710920018.7A 2017-09-30 2017-09-30 Target tracking method and device Active CN107808122B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710920018.7A CN107808122B (en) 2017-09-30 2017-09-30 Target tracking method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710920018.7A CN107808122B (en) 2017-09-30 2017-09-30 Target tracking method and device

Publications (2)

Publication Number Publication Date
CN107808122A CN107808122A (en) 2018-03-16
CN107808122B true CN107808122B (en) 2020-08-11

Family

ID=61584759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710920018.7A Active CN107808122B (en) 2017-09-30 2017-09-30 Target tracking method and device

Country Status (1)

Country Link
CN (1) CN107808122B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008792B (en) * 2018-01-05 2021-10-22 比亚迪股份有限公司 Image detection method, image detection device, computer equipment and storage medium
CN110619254B (en) * 2018-06-19 2023-04-18 海信集团有限公司 Target tracking method and device based on disparity map and terminal
CN108968811A (en) * 2018-06-20 2018-12-11 四川斐讯信息技术有限公司 A kind of object identification method and system of sweeping robot
CN108764215A (en) * 2018-06-21 2018-11-06 郑州云海信息技术有限公司 Target search method for tracing, system, service centre and terminal based on video
CN109145781B (en) * 2018-08-03 2021-05-04 北京字节跳动网络技术有限公司 Method and apparatus for processing image
CN110826572B (en) * 2018-08-09 2023-04-21 京东方科技集团股份有限公司 Non-maximum value inhibition method, device and equipment for multi-target detection
CN110826379B (en) * 2018-08-13 2022-03-22 中国科学院长春光学精密机械与物理研究所 Target detection method based on feature multiplexing and YOLOv3
CN111104831B (en) * 2018-10-29 2023-09-29 香港城市大学深圳研究院 Visual tracking method, device, computer equipment and medium
CN111178495B (en) * 2018-11-10 2023-06-30 杭州凝眸智能科技有限公司 Lightweight convolutional neural network for detecting very small objects in an image
CN109410251B (en) * 2018-11-19 2022-05-03 南京邮电大学 Target tracking method based on dense connection convolution network
CN109817009A (en) * 2018-12-31 2019-05-28 天合光能股份有限公司 A method of obtaining unmanned required dynamic information
CN109753931A (en) * 2019-01-04 2019-05-14 广州广电卓识智能科技有限公司 Convolutional neural networks training method, system and facial feature points detection method
CN110007366B (en) * 2019-03-04 2020-08-25 中国科学院深圳先进技术研究院 Life searching method and system based on multi-sensor fusion
CN110087041B (en) * 2019-04-30 2021-01-08 中国科学院计算技术研究所 Video data processing and transmitting method and system based on 5G base station
CN110443789B (en) * 2019-08-01 2021-11-26 四川大学华西医院 Method for establishing and using immune fixed electrophoretogram automatic identification model
CN110487211B (en) * 2019-09-29 2020-07-24 中国科学院长春光学精密机械与物理研究所 Aspheric element surface shape detection method, device and equipment and readable storage medium
CN112306104A (en) * 2020-11-17 2021-02-02 广西电网有限责任公司 Image target tracking holder control method based on grid weighting
CN112911171B (en) * 2021-02-04 2022-04-22 上海航天控制技术研究所 Intelligent photoelectric information processing system and method based on accelerated processing
CN115482417B (en) * 2022-09-29 2023-08-08 珠海视熙科技有限公司 Multi-target detection model, training method, device, medium and equipment thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503723A (en) * 2015-09-06 2017-03-15 华为技术有限公司 A kind of video classification methods and device
CN106846364A (en) * 2016-12-30 2017-06-13 明见(厦门)技术有限公司 A kind of method for tracking target and device based on convolutional neural networks

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682697B (en) * 2016-12-29 2020-04-14 华中科技大学 End-to-end object detection method based on convolutional neural network
CN106911930A (en) * 2017-03-03 2017-06-30 深圳市唯特视科技有限公司 It is a kind of that the method for perceiving video reconstruction is compressed based on recursive convolution neutral net

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503723A (en) * 2015-09-06 2017-03-15 华为技术有限公司 A kind of video classification methods and device
CN106846364A (en) * 2016-12-30 2017-06-13 明见(厦门)技术有限公司 A kind of method for tracking target and device based on convolutional neural networks

Also Published As

Publication number Publication date
CN107808122A (en) 2018-03-16

Similar Documents

Publication Publication Date Title
CN107808122B (en) Target tracking method and device
Li et al. Adaptively constrained dynamic time warping for time series classification and clustering
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN107169463B (en) Method for detecting human face, device, computer equipment and storage medium
CN107220618B (en) Face detection method and device, computer readable storage medium and equipment
CN108470354A (en) Video target tracking method, device and realization device
CN113240936B (en) Parking area recommendation method and device, electronic equipment and medium
CN110348437B (en) Target detection method based on weak supervised learning and occlusion perception
CN109272016A (en) Object detection method, device, terminal device and computer readable storage medium
CN109740416B (en) Target tracking method and related product
Xu et al. Stochastic Online Anomaly Analysis for Streaming Time Series.
CN110796141A (en) Target detection method and related equipment
CN111553488A (en) Risk recognition model training method and system for user behaviors
CN113239914B (en) Classroom student expression recognition and classroom state evaluation method and device
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
CN111008631A (en) Image association method and device, storage medium and electronic device
CN113065593A (en) Model training method and device, computer equipment and storage medium
CN115545103A (en) Abnormal data identification method, label identification method and abnormal data identification device
CN115719294A (en) Indoor pedestrian flow evacuation control method and system, electronic device and medium
CN109919043B (en) Pedestrian tracking method, device and equipment
CN113296089A (en) LMB density fusion method and device for multi-early-warning-machine target tracking system
CN113065379B (en) Image detection method and device integrating image quality and electronic equipment
CN110147768B (en) Target tracking method and device
CN115346125B (en) Target detection method based on deep learning
CN108133234B (en) Sparse subset selection algorithm-based community detection method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant