CN115294490A - Dynamic multi-target identification method under intermittent shielding - Google Patents

Dynamic multi-target identification method under intermittent shielding

Info

Publication number
CN115294490A
CN115294490A (application CN202210782158.3A)
Authority
CN
China
Prior art keywords
target
frame
network
representing
motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210782158.3A
Other languages
Chinese (zh)
Inventor
曹政才
李俊年
张东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Chemical Technology
Original Assignee
Beijing University of Chemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Chemical Technology filed Critical Beijing University of Chemical Technology
Priority to CN202210782158.3A priority Critical patent/CN115294490A/en
Publication of CN115294490A publication Critical patent/CN115294490A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dynamic multi-target identification method under intermittent shielding (occlusion). First, continuous image sequences and video data sets for target recognition affected by intermittent shielding are collected and constructed, and are used to train and test the dynamic multi-target recognition algorithm. Second, a target identification module based on a central network (CenterNet) is designed to obtain the bounding-box, motion-offset and heatmap feature information of the targets in two adjacent frames of the input video. Third, a feature extraction module based on a gated recurrent unit (GRU) network is used to update the feature information of the targets in the current frame and, combined with the feature information of the targets in the previous frame, to obtain the motion track of each target. Finally, a target track matching module based on the Hungarian algorithm matches the targets with their motion tracks and assigns a corresponding ID value to each target, thereby realizing dynamic multi-target identification. The method can be applied to the dynamic multi-target identification problem under intermittent shielding conditions.

Description

Dynamic multi-target identification method under intermittent shielding
Technical Field
The invention relates to the field of image processing, in particular to a dynamic multi-target identification method under intermittent shielding (occlusion) of targets.
Background
Multi-target identification is a fundamental research problem in computer vision and is widely applied in fields such as intelligent monitoring, industrial inspection and human-computer interaction, so it has important research and application value; however, it still faces many challenges in complex scenes involving multiple targets, mutual shielding, obvious changes in ambient illumination and background interference.
In recent years, deep learning algorithms have further improved environment perception performance and have reached or even exceeded human recognition accuracy on tasks such as target classification, and many deep-learning-based target recognition methods have therefore been proposed. Compared with traditional methods, deep-learning-based target identification has attracted wide attention from researchers because of its stronger generalization and robustness.
Mutual shielding between targets occurs very easily during dynamic multi-target identification. Traditional methods, built on single-frame target detection, mitigate the low confidence of single-target identification by combining the target features of the preceding and following frames of the input video, which further improves recognition accuracy. However, such methods cannot adequately capture the feature information of targets between adjacent image frames, so the targets cannot be effectively matched with their motion tracks, and they perform poorly when targets are shielded for a long time.
In recent years, a series of deep-learning-based multi-target recognition methods have been proposed. For example, Zhou et al. proposed using a central network (CenterNet) for dynamic multi-target recognition at the European Conference on Computer Vision; by extracting the bounding-box, heatmap and motion-offset feature information of targets in two consecutive frames of the input video and matching the targets with their motion tracks, the method achieves high dynamic target recognition accuracy with a comparatively simple network structure. Experiments show that the method obtains excellent dynamic target recognition performance, demonstrating that deep learning algorithms perform well on dynamic target recognition. However, the method cannot adequately extract target feature information across multiple frames of the input video and has difficulty accurately identifying dynamic targets affected by intermittent shielding. A search of related technologies shows that no deep-learning-based dynamic multi-target identification method under the influence of intermittent shielding of targets has yet been reported.
Disclosure of Invention
In order to solve the problem that dynamic multi-target identification methods based on a central network cannot handle dynamic multi-target identification under intermittent shielding, the invention provides a new method for identifying dynamic multiple targets under intermittent shielding.
The invention provides a dynamic multi-target identification method under intermittent shielding, which comprises two stages of training and testing, wherein,
the training stage is realized by a target identification module based on a central network, a feature extraction module based on a gated cycle unit network and a target track matching module based on a Hungarian algorithm. The target identification module based on the central network comprises a reference network and a feature extraction module, wherein the reference network adopts a DLA-34 network structure, and the feature extraction module adopts the central network.
In the testing stage, after the training stage is finished, the video to be recognized is input into the target identification module based on the central network; after processing, the targets and their motion tracks are matched through the target track matching module based on the Hungarian algorithm, thereby realizing dynamic multi-target identification. The method comprises the following steps:
step 1: training data set preparation.
Step 2: inputting training video data with total length of T frames into a reference network, and acquiring adjacent frame sequence (x) marked with target type and bounding box information through the reference network t-1 ,x t ) T =2, \ 8230;, T, where x t And the image of the t-th frame marked with the target type and the boundary frame information is represented, the target type information is used for identifying the target, the type information of the same type of target is the same, and the boundary frame information is used for calculating the position of the central point of the target.
And step 3: sequence of adjacent frames (x) to be acquired by reference network t-1 ,x t ) An input feature extraction module for predicting x t+1 The motion offset of all target center points on the frame sequence is used for calculating a motion track, and the hot spot graph is used for judging whether the marked target type is quasi-standard or notAnd (8) determining.
Step 4: Calculate the loss function. The overall loss function is:

L = (1/T) Σ_t (λ_f L_f + λ_s L_s + λ_o L_o)

where T denotes the sequence length of the input video, and λ_f, λ_s and λ_o are hyper-parameters defining the weight of each branch in the overall loss function, with λ_f = 1, λ_s = 0.1, λ_o = 0.1;
L_f represents the prediction loss of the target heatmap and adopts a focal-loss-based loss function:

L_f = -(1/N) Σ_{c,a,b} { (1 - Ŷ_cab)^α log(Ŷ_cab),                     if Y_cab = 1;
                         (1 - Y_cab)^β (Ŷ_cab)^α log(1 - Ŷ_cab),       otherwise }

where Y_cab is the ground-truth heatmap value of a target belonging to category c at position (a, b) of the t-th frame, obtained directly from the input video; Ŷ_cab is the heatmap of the target at the corresponding position of the t-th frame predicted from the adjacent frame sequence (x_{t-1}, x_t); N is the number of targets in the t-th frame image; α and β are hyper-parameters; and the summation Σ_{c,a,b} runs over all categories of targets at all positions of the t-th frame image.
L_s represents the prediction loss of the target bounding-box position and adopts an L1-based loss function:

L_s = (1/N) Σ_{i=1}^{N} | ŝ_i^t - s_i^t |

where ŝ_i^t is the bounding-box position of the i-th target center point on the t-th frame image predicted by the reference network, and s_i^t is the ground-truth bounding-box position of the i-th target center point on the t-th frame image, obtained directly from the input video.
L_o represents the prediction loss of the target motion offset and adopts a regression-based loss function; its main term is an L1 regression between the predicted and true center-point displacements, of the form

L_o = (1/N) Σ_{i=1}^{N} | ô_i^t - (p_i^{t-1} - p_i^t) |

where ô_i^t denotes the motion track (offset) of target i predicted for each bounding box on the t-th frame image, and p_i^t and p_i^{t-1} denote the true center-point positions of target i detected on the t-th and (t-1)-th frame images, obtained directly from the input video. In addition, based on the updated target feature information M_t, a positive alignment head P_t is used on the input t-th frame image for supervised learning of the target center-point position, and a negative alignment head V_t is likewise used on the input t-th frame image for supervised learning of the target center-point position. The update of M_t is completed by the feature extraction module based on the gated recurrent unit (GRU) network: the update gate parameter z_t, the hidden-layer recursive representation M̃_t and the reset gate parameter r_t of the GRU corresponding to the t-th frame are calculated in turn, and the updated target feature information M_t of the t-th frame image is then obtained; the target feature information comprises the target bounding-box size, motion offset and heatmap. The specific calculation formula is:
M_t = (1 - z_t) ⊙ M_{t-1} + z_t ⊙ M̃_t

where z_t is the update gate parameter; M_{t-1} is the updated target bounding-box size, motion offset and heatmap of the (t-1)-th frame, with M_1 = 0 for the input 1st frame image; and M̃_t is the hidden-layer recursive representation (candidate state). The update gate parameter z_t is calculated as:
z_t = δ(W_z F_t + U_z M_{t-1} + b_z)

where δ(·) denotes the Logistic (sigmoid) function with output interval (0, 1); W_z, U_z and b_z are learnable network parameters of the GRU update gate; F_t denotes the target bounding-box size, motion offset and heatmap of the t-th frame before updating; F_1 and F_2 take the ground-truth values of the target bounding-box size, motion offset and heatmap on the 1st and 2nd frame images, obtained directly from the input video, and predicted values are used from the third frame onward.
The hidden-layer recursive representation M̃_t is calculated as:

M̃_t = tanh(W_M ⊛ F_t + U_M ⊛ (r_t ⊙ M_{t-1}) + b_M)

where ⊛ denotes the convolution operation, ⊙ denotes element-wise multiplication, and W_M, U_M and b_M are learnable network parameters of the GRU hidden layer.
The reset gate parameter r_t is calculated as:

r_t = δ(W_r F_t + U_r M_{t-1} + b_r)

where W_r, U_r and b_r are learnable network parameters of the GRU reset gate.
And 5: based on the intersection ratio of the true value and the predicted value of the target motion trail obtained by the target motion offset, matching a plurality of targets and motion trails which still appear after shielding by adopting a Hungarian algorithm based on the intersection ratio, wherein the calculation process of the intersection ratio is as follows:
definition G = { G 1 ,…,g T The real value of the target motion track is directly obtained from the input video, and D = { D = is defined 1 ,…,d T The predicted value of the target motion track is expressed as the following relation function:
Figure BDA0003726444370000051
and for the targets which do not appear any more from the t-th frame, matching the targets with the predicted track by adopting a pedestrian re-recognition method, and traversing the training video to finish training by analogy.
After training is finished, the video to be recognized is input into the target identification module based on the central network; after processing, the targets and their motion tracks are matched through the target track matching module based on the Hungarian algorithm, thereby realizing dynamic multi-target identification.
Beneficial effects:
The method can accurately identify dynamic multiple targets affected by intermittent shielding (less than or equal to 30 FPS) with high robustness; it can identify dynamic multiple targets affected by intermittent shielding in different scenes (campus, shopping mall, street, etc.), with a recognition accuracy of 70% or higher.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention.
Fig. 1 is a flow chart of a dynamic multi-target identification method in the present invention.
Fig. 2 is a schematic diagram of a dynamic multi-target identification method in the present invention.
Fig. 3 is a schematic diagram of a feature extraction module based on a central network in the present invention.
FIG. 4 is a diagram illustrating the effect of the present invention on the constructed data set for dynamic multi-target recognition affected by intermittent occlusion.
Detailed Description
For a better understanding of the technical solutions of the present invention, the following further describes embodiments of the present invention with reference to the accompanying drawings and specific examples. It is noted that the aspects described below in connection with the figures and the specific embodiments are only illustrative and should not be construed as imposing any limitation on the scope of the present invention.
A flow chart of the dynamic multi-target identification method under the influence of intermittent shielding is shown in Fig. 1, and a schematic diagram of the method is shown in Fig. 2. The method comprises two stages, training and testing. The training stage is realized by a target identification module based on a central network, a feature extraction module based on a gated recurrent unit (GRU) network and a target track matching module based on the Hungarian algorithm, and specifically comprises the following steps:
step 1: the method comprises the steps of collecting continuous image sequences and video data used for dynamic multi-target recognition in a plurality of different scenes (such as schools, shopping malls, streets and the like) by adopting a Logitech C920 camera, selecting the continuous image sequences and the video data influenced by intermittent shielding, and finishing data set manufacturing, wherein the data set comprises a training set, a verification set and a test set.
Step 2: object identification module based on central networkIn the method, a DLA-34 network structure is adopted as a reference network, and the network comprises an encoder-decoder structure, so that the accuracy of target identification can be improved. Inputting training image data of total T frames into a reference network, and acquiring adjacent frame sequence (x) marked with target type and bounding box information thereof through the reference network t-1 ,x t ) T =2, \ 8230;, T, where x t The method comprises the steps that a t frame image marked with target types and boundary frame information is represented, the target type information is used for identifying targets, the type information of the same type of targets is the same, and the boundary frame information is used for calculating the position of a target center point;
Step 3: In the target identification module based on the central network, the feature extraction module adopts the central network. The adjacent frame sequences (x_{t-1}, x_t) acquired by the reference network are input into the feature extraction module to predict the motion offsets of all target center points in frame x_{t+1}, which are used to calculate the motion tracks; the heatmap is used to judge whether the marked target category is accurate. This module can accelerate target feature extraction, reduce network parameters and improve the running speed of the algorithm; a schematic diagram of the module is shown in Fig. 3.
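The feature extraction described above predicts, per frame, a heatmap, a bounding-box size and a motion offset. The following PyTorch sketch illustrates one possible form of these three prediction heads; the channel widths and layer arrangement are assumptions for illustration, not the exact architecture of the embodiment.

# Sketch of the three per-frame prediction heads (heatmap, bounding-box size,
# motion offset). Channel widths and the shared 3x3 convolution are assumed.
import torch
import torch.nn as nn


class PredictionHeads(nn.Module):
    def __init__(self, in_channels: int = 64, num_classes: int = 1):
        super().__init__()

        def head(out_channels: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(in_channels, 256, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(256, out_channels, kernel_size=1),
            )

        self.heatmap = head(num_classes)   # one channel per target category
        self.size = head(2)                # bounding-box width and height
        self.offset = head(2)              # center-point motion offset (dx, dy)

    def forward(self, feat: torch.Tensor) -> dict:
        return {
            "heatmap": torch.sigmoid(self.heatmap(feat)),
            "size": self.size(feat),
            "offset": self.offset(feat),
        }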
Step 4: Calculate the loss function. The overall loss function is:

L = (1/T) Σ_t (λ_f L_f + λ_s L_s + λ_o L_o)

where T denotes the sequence length of the input video, and λ_f, λ_s and λ_o are hyper-parameters defining the weight of each branch in the overall loss function, with λ_f = 1, λ_s = 0.1, λ_o = 0.1.
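As an illustrative sketch of how the three branch losses might be combined with the stated weights (the per-frame averaging over T reflects the reconstruction of the garbled formula above and is an assumption):

# Sketch of the weighted total loss with lambda_f = 1, lambda_s = 0.1,
# lambda_o = 0.1; averaging over the number of frames is assumed.
def total_loss(per_frame_losses):
    """per_frame_losses: list of (L_f, L_s, L_o) tuples, one per frame."""
    lam_f, lam_s, lam_o = 1.0, 0.1, 0.1
    total = sum(lam_f * lf + lam_s * ls + lam_o * lo
                for lf, ls, lo in per_frame_losses)
    return total / max(len(per_frame_losses), 1)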
L_f represents the prediction loss of the target heatmap and adopts a focal-loss-based loss function:

L_f = -(1/N) Σ_{c,a,b} { (1 - Ŷ_cab)^α log(Ŷ_cab),                     if Y_cab = 1;
                         (1 - Y_cab)^β (Ŷ_cab)^α log(1 - Ŷ_cab),       otherwise }

where Y_cab is the ground-truth heatmap value of a target belonging to category c at position (a, b) of the t-th frame, obtained directly from the input video; Ŷ_cab is the heatmap of the target at the corresponding position of the t-th frame predicted from the adjacent frame sequence (x_{t-1}, x_t); N is the number of targets in the t-th frame of the training data; α and β are hyper-parameters; and the summation Σ_{c,a,b} runs over all categories of targets at all positions of the t-th frame image.
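An illustrative PyTorch sketch of a CenterNet-style focal loss consistent with the symbols above (Y, Ŷ, N, α, β); it is a sketch of the standard formulation, not code taken from the embodiment:

# Heatmap focal loss sketch; pred and gt are heatmaps of shape (C, H, W)
# with values in (0, 1); positions where gt == 1 are ground-truth centers.
import torch


def heatmap_focal_loss(pred: torch.Tensor, gt: torch.Tensor,
                       alpha: float = 2.0, beta: float = 4.0) -> torch.Tensor:
    eps = 1e-6
    pred = pred.clamp(eps, 1.0 - eps)
    pos = gt.eq(1.0).float()              # ground-truth center points
    neg = 1.0 - pos

    pos_loss = ((1.0 - pred) ** alpha) * torch.log(pred) * pos
    neg_loss = ((1.0 - gt) ** beta) * (pred ** alpha) * torch.log(1.0 - pred) * neg

    num_pos = pos.sum().clamp(min=1.0)    # N: number of targets in the frame
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos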
L_s represents the prediction loss of the target bounding-box position and adopts an L1-based loss function:

L_s = (1/N) Σ_{i=1}^{N} | ŝ_i^t - s_i^t |

where ŝ_i^t is the bounding-box position of the i-th target center point on the t-th frame image predicted by the reference network, and s_i^t is the ground-truth bounding-box position of the i-th target center point on the t-th frame image, obtained directly from the input video.
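A corresponding sketch of the L1 bounding-box loss, averaging over the N targets of a frame (the averaging is an assumption consistent with the heatmap loss):

# L1 loss on predicted bounding-box positions for the N targets of one frame.
import torch


def bbox_l1_loss(pred_boxes: torch.Tensor, gt_boxes: torch.Tensor) -> torch.Tensor:
    """pred_boxes, gt_boxes: tensors of shape (N, 4)."""
    return torch.abs(pred_boxes - gt_boxes).sum() / max(pred_boxes.shape[0], 1)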
L_o represents the prediction loss of the target motion offset and adopts a regression-based loss function; its main term is an L1 regression between the predicted and true center-point displacements, of the form

L_o = (1/N) Σ_{i=1}^{N} | ô_i^t - (p_i^{t-1} - p_i^t) |

where ô_i^t denotes the motion track (offset) of target i predicted for each bounding box on the t-th frame image, and p_i^t and p_i^{t-1} denote the true center-point positions of target i on the t-th and (t-1)-th frame images, obtained directly from the input video. In addition, based on the updated target feature information M_t, a positive alignment head P_t is used on the input t-th frame image for supervised learning of the target center-point position, and a negative alignment head V_t is likewise used on the input t-th frame image for supervised learning of the target center-point position. The update of M_t is completed by the feature extraction module based on the gated recurrent unit (GRU) network: the update gate parameter z_t, the hidden-layer recursive representation M̃_t and the reset gate parameter r_t of the GRU corresponding to the t-th frame are calculated in turn, and the updated target feature information M_t of the t-th frame is then obtained; the target feature information comprises the target bounding-box size, motion offset and heatmap. The specific calculation formula is:
M_t = (1 - z_t) ⊙ M_{t-1} + z_t ⊙ M̃_t

where z_t is the update gate parameter; M_{t-1} is the updated target bounding-box size, motion offset and heatmap of the (t-1)-th frame, with M_1 = 0 for the input 1st frame image; and M̃_t is the hidden-layer recursive representation (candidate state). The update gate parameter z_t is calculated as:
z_t = δ(W_z F_t + U_z M_{t-1} + b_z)

where W_z, U_z and b_z are learnable network parameters of the GRU update gate; F_t denotes the target bounding-box size, motion offset and heatmap of the t-th frame before updating; F_1 and F_2 take the ground-truth values of the target bounding-box size, motion offset and heatmap on the 1st and 2nd frame images, obtained directly from the input video, and predicted values are used from the third frame onward.
The hidden-layer recursive representation M̃_t is calculated as:

M̃_t = tanh(W_M ⊛ F_t + U_M ⊛ (r_t ⊙ M_{t-1}) + b_M)

where ⊛ denotes the convolution operation, ⊙ denotes element-wise multiplication, and W_M, U_M and b_M are learnable network parameters of the GRU hidden layer.
The reset gate parameter r_t is calculated as:

r_t = δ(W_r F_t + U_r M_{t-1} + b_r)

where W_r, U_r and b_r are learnable network parameters of the GRU reset gate.
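For illustration, a minimal ConvGRU-style cell consistent with the gate equations above can be sketched in PyTorch as follows; the tanh non-linearity of the candidate state and the use of a single convolution over concatenated inputs (in place of separate W and U transforms) are implementation assumptions.

# ConvGRU-style feature update cell: z_t and r_t use a sigmoid (the Logistic
# function), the candidate state uses tanh, and the blend
# M_t = (1 - z_t) * M_{t-1} + z_t * M~_t follows the standard GRU form.
import torch
import torch.nn as nn


class ConvGRUCell(nn.Module):
    def __init__(self, channels: int = 256, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # one convolution over concatenated [F_t, M_{t-1}] plays the role of
        # the separate W and U transforms in the equations above
        self.conv_z = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)
        self.conv_r = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)
        self.conv_m = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)

    def forward(self, f_t: torch.Tensor, m_prev: torch.Tensor) -> torch.Tensor:
        z_t = torch.sigmoid(self.conv_z(torch.cat([f_t, m_prev], dim=1)))   # update gate
        r_t = torch.sigmoid(self.conv_r(torch.cat([f_t, m_prev], dim=1)))   # reset gate
        m_cand = torch.tanh(self.conv_m(torch.cat([f_t, r_t * m_prev], dim=1)))
        return (1.0 - z_t) * m_prev + z_t * m_cand                          # updated M_t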
And 5: based on the intersection and combination ratio of the real value and the predicted value of the target motion track obtained by the target motion offset, matching a plurality of predicted targets with corresponding tracks by adopting Hungarian algorithm, and matching targets which still appear after the t-th frame is shielded based on the intersection and combination ratio, wherein the calculation process of the intersection and combination ratio is as follows:
definition G = { G 1 ,…,g T The real value of the target motion track is directly obtained from the input video, and D = { D = is defined 1 ,…,d T The predicted value of the target motion track is expressed as the following relation function:
Figure BDA0003726444370000091
when the temperature is higher than the set temperature
Figure BDA0003726444370000092
And matching the target and the corresponding track. And for the target which does not appear any more from the t-th frame, matching the target with the predicted track by adopting a pedestrian re-recognition method, and repeating the training video to finish the training.
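An illustrative sketch of IoU-based Hungarian matching between true and predicted track boxes, using scipy's linear_sum_assignment; the 0.5 acceptance threshold is an assumed example value, since the threshold is not stated here.

# IoU-based Hungarian matching sketch; boxes are (x1, y1, x2, y2).
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(box_a: np.ndarray, box_b: np.ndarray) -> float:
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def match_tracks(gt_boxes: np.ndarray, pred_boxes: np.ndarray,
                 iou_threshold: float = 0.5):
    """Return (gt_index, pred_index) pairs whose IoU exceeds the threshold."""
    cost = np.zeros((len(gt_boxes), len(pred_boxes)))
    for i, g in enumerate(gt_boxes):
        for j, d in enumerate(pred_boxes):
            cost[i, j] = 1.0 - iou(g, d)      # Hungarian algorithm minimizes cost
    rows, cols = linear_sum_assignment(cost)
    return [(i, j) for i, j in zip(rows, cols)
            if 1.0 - cost[i, j] >= iou_threshold]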
After training is finished, the video to be recognized is input into the target identification module based on the central network; after processing, the targets and their motion tracks are matched through the target track matching module based on the Hungarian algorithm, thereby realizing dynamic multi-target identification.
The dynamic multi-target recognition algorithm is trained on the public data sets MOT17, KITTI and COCO. The server is equipped with an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10 GHz × 16 and a 4-core NVIDIA GeForce GTX 2080 graphics card, with 128 GB of memory; the operating system is Ubuntu 16.04, and the algorithm is implemented with the Python 3.6 programming language and the PyTorch deep learning framework. The training hyper-parameters are: batch size = 2, epochs = 100, iterations = 1600, optimizer = Adam. The feature dimension of the gated recurrent unit network is set to 256 with 3 × 3 filters. The results of running the algorithm are shown in Fig. 4.
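A hedged sketch of a training loop with the stated hyper-parameters (batch size 2, 100 epochs, Adam); the learning rate, the data loader and the model's compute_loss method are placeholders for illustration, not elements taken from the embodiment.

# Training loop sketch; "model.compute_loss" and the loader format are
# hypothetical placeholders.
import torch


def train(model, train_loader, epochs: int = 100, lr: float = 1e-4):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        for frame_pair, targets in train_loader:   # loader built with batch size 2
            loss = model.compute_loss(frame_pair.to(device), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()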
The method can accurately identify dynamic multiple targets affected by intermittent shielding (less than or equal to 30 FPS) with high robustness; it can identify dynamic multiple targets affected by intermittent shielding in different scenes (such as campuses, shopping malls and streets), with a recognition accuracy of 70% or higher.
While, for purposes of simplicity of explanation, the method is shown and described as a series of acts, it is to be understood and appreciated that the method is not limited by the order of acts, as some acts may occur in different orders and/or concurrently with other acts from those shown and described herein, as will be understood and appreciated by those skilled in the art.
Although illustrative embodiments of the present invention have been described above in some detail so that those skilled in the art can understand them, the invention is not limited thereto, and various changes and modifications can be made within the spirit and scope of the invention as defined by the appended claims.

Claims (3)

1. A dynamic multi-target identification method under intermittent shielding is characterized in that: the method comprises two stages of training and testing, wherein,
the training stage is realized by a target identification module based on a central network, a feature extraction module based on a gated recurrent unit (GRU) network and a target track matching module based on the Hungarian algorithm;
a testing stage: after training is finished, the video to be recognized is input into the target identification module based on the central network; after processing, the targets and their motion tracks are matched through the target track matching module based on the Hungarian algorithm, thereby realizing dynamic multi-target identification.
2. The dynamic multi-target identification method under intermittent shielding according to claim 1, characterized in that: the target identification module based on the central network comprises a reference network and a feature extraction module, wherein the reference network adopts a DLA-34 network structure, and the feature extraction module adopts the central network.
3. The method for dynamic multi-target recognition under intermittent shielding according to claim 1, wherein the training process comprises the following steps:
step 1, training data set preparation.
Step 2: total length of the tubeInputting training video data of T frames into a reference network, and acquiring adjacent frame sequences (x) marked with target types and bounding box information thereof through the reference network t-1 ,x t ) T =2, \8230, T, where x t And the t frame image marked with the target type and the boundary frame information thereof is represented, the target type information is used for identifying the target, and the boundary frame information is used for calculating the position of the central point of the target.
And step 3: sequence of adjacent frames (x) to be acquired by reference network t-1 ,x t ) An input feature extraction module for predicting x t+1 The motion offset of the center point of all the targets in the frame sequence is used for calculating a motion track, and the hot spot graph is used for judging whether the marked target type is accurate or not.
Step 4: Calculate the loss function. The overall loss function is:

L = (1/T) Σ_t (λ_f L_f + λ_s L_s + λ_o L_o)

where T denotes the sequence length of the input video, and λ_f, λ_s and λ_o are hyper-parameters defining the weight of each branch in the overall loss function;
L_f represents the prediction loss of the target heatmap and adopts a focal-loss-based loss function:

L_f = -(1/N) Σ_{c,a,b} { (1 - Ŷ_cab)^α log(Ŷ_cab),                     if Y_cab = 1;
                         (1 - Y_cab)^β (Ŷ_cab)^α log(1 - Ŷ_cab),       otherwise }

where Y_cab is the value of the real heatmap for a target belonging to category c at position (a, b) of the t-th frame; Ŷ_cab is the heatmap of the target at the corresponding position of the t-th frame predicted from the adjacent frame sequence (x_{t-1}, x_t); N is the number of targets in the t-th frame of the training data; α and β are hyper-parameters; and the summation runs over all categories of targets at all positions of the t-th frame image;
L_s represents the prediction loss of the target bounding-box position and adopts an L1-based loss function:

L_s = (1/N) Σ_{i=1}^{N} | ŝ_i^t - s_i^t |

where ŝ_i^t is the bounding-box position of the i-th target center point on the t-th frame image predicted by the reference network, and s_i^t is the true bounding-box position of the i-th target center point on the t-th frame image;
L_o represents the prediction loss of the target motion offset and adopts a regression-based loss function; its main term is an L1 regression between the predicted and true center-point displacements, of the form

L_o = (1/N) Σ_{i=1}^{N} | ô_i^t - (p_i^{t-1} - p_i^t) |

where ô_i^t denotes the motion track (offset) of target i predicted for each bounding box on the t-th frame image, and p_i^t and p_i^{t-1} denote the true center-point positions of target i on the t-th and (t-1)-th frame images; in addition, based on the updated target feature information M_t, a positive alignment head P_t is used on the input t-th frame image for supervised learning of the target center-point position, and a negative alignment head V_t is likewise used on the input t-th frame image for supervised learning of the target center-point position; the update of M_t is completed by the feature extraction module based on the gated recurrent unit (GRU) network: the update gate parameter z_t, the hidden-layer recursive representation M̃_t and the reset gate parameter r_t of the GRU corresponding to the t-th frame are calculated in turn, and the updated target feature information M_t on the t-th frame image is then obtained; the target feature information comprises the target bounding-box size, motion offset and heatmap; the specific calculation formula is:
M_t = (1 - z_t) ⊙ M_{t-1} + z_t ⊙ M̃_t

where z_t is the update gate parameter, M_{t-1} is the updated target bounding-box size, motion offset and heatmap of the (t-1)-th frame, and M̃_t is the hidden-layer recursive representation; the update gate parameter z_t is calculated as:
z_t = δ(W_z F_t + U_z M_{t-1} + b_z)

where δ(·) denotes the Logistic function with output interval (0, 1); W_z, U_z and b_z are learnable network parameters of the GRU update gate; and F_t denotes the target bounding-box size, motion offset and heatmap of the t-th frame before updating;
the hidden-layer recursive representation M̃_t is calculated as:

M̃_t = tanh(W_M ⊛ F_t + U_M ⊛ (r_t ⊙ M_{t-1}) + b_M)

where ⊛ denotes the convolution operation, ⊙ denotes element-wise multiplication, and W_M, U_M and b_M are learnable network parameters of the GRU hidden layer;
the reset gate parameter r_t is calculated as:

r_t = δ(W_r F_t + U_r M_{t-1} + b_r)

where W_r, U_r and b_r are learnable network parameters of the GRU reset gate.
And 5: based on the intersection and combination ratio of the true value and the predicted value of the target motion trajectory obtained by the target motion offset, matching a plurality of predicted targets with corresponding trajectories by adopting a Hungarian algorithm, and matching targets which appear after the t-th frame is shielded based on the intersection and combination ratio, wherein the calculation process of the intersection and combination ratio is as follows:
definition G = { G 1 ,…,g T D = { D } is the true value of the target motion trajectory 1 ,…,d T The predicted value of the target motion track is expressed as the following relation function:
Figure FDA0003726444360000041
and for the targets which do not appear any more from the t-th frame, matching the targets with the predicted track by adopting a pedestrian re-recognition method, and traversing the training video to finish training by analogy.
After training is finished, the video to be recognized is input into the target identification module based on the central network; after processing, the targets and their motion tracks are matched through the target track matching module based on the Hungarian algorithm, thereby realizing dynamic multi-target identification.
CN202210782158.3A 2022-07-01 2022-07-01 Dynamic multi-target identification method under intermittent shielding Pending CN115294490A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210782158.3A CN115294490A (en) 2022-07-01 2022-07-01 Dynamic multi-target identification method under intermittent shielding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210782158.3A CN115294490A (en) 2022-07-01 2022-07-01 Dynamic multi-target identification method under intermittent shielding

Publications (1)

Publication Number Publication Date
CN115294490A true CN115294490A (en) 2022-11-04

Family

ID=83821460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210782158.3A Pending CN115294490A (en) 2022-07-01 2022-07-01 Dynamic multi-target identification method under intermittent shielding

Country Status (1)

Country Link
CN (1) CN115294490A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726821A (en) * 2024-02-05 2024-03-19 武汉理工大学 Medical behavior identification method for region shielding in medical video
CN117726821B (en) * 2024-02-05 2024-05-10 武汉理工大学 Medical behavior identification method for region shielding in medical video

Similar Documents

Publication Publication Date Title
CN110070074B (en) Method for constructing pedestrian detection model
Feichtenhofer et al. Detect to track and track to detect
CN109816689B (en) Moving target tracking method based on adaptive fusion of multilayer convolution characteristics
CN111476302B (en) fast-RCNN target object detection method based on deep reinforcement learning
CN112836640B (en) Single-camera multi-target pedestrian tracking method
Xu et al. Deepmot: A differentiable framework for training multiple object trackers
CN112884742B (en) Multi-target real-time detection, identification and tracking method based on multi-algorithm fusion
CN111582349B (en) Improved target tracking algorithm based on YOLOv3 and kernel correlation filtering
CN112489081B (en) Visual target tracking method and device
CN112085765B (en) Video target tracking method combining particle filtering and metric learning
CN111476161A (en) Somatosensory dynamic gesture recognition method fusing image and physiological signal dual channels
CN113361370B (en) Abnormal behavior detection method based on deep learning
CN116402850A (en) Multi-target tracking method for intelligent driving
Xiao et al. MeMu: Metric correlation Siamese network and multi-class negative sampling for visual tracking
CN112634368A (en) Method and device for generating space and OR graph model of scene target and electronic equipment
CN111340842A (en) Correlation filtering target tracking algorithm based on joint model
WO2024093209A1 (en) Method for dynamic target tracking by legged robot
CN113256690A (en) Pedestrian multi-target tracking method based on video monitoring
Chebli et al. Pedestrian detection based on background compensation with block-matching algorithm
Lee et al. Online multiple object tracking using rule distillated siamese random forest
Shen et al. Infrared multi-pedestrian tracking in vertical view via siamese convolution network
CN116311063A (en) Personnel fine granularity tracking method and system based on face recognition under monitoring video
CN112288778A (en) Infrared small target detection method based on multi-frame regression depth network
CN115294490A (en) Dynamic multi-target identification method under intermittent shielding
Abdullah et al. Vehicle counting using deep learning models: a comparative study

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination