CN111833380A - Multi-view image fusion space target tracking system and method - Google Patents

Multi-view image fusion space target tracking system and method

Info

Publication number
CN111833380A
Authority
CN
China
Prior art keywords
target
tracking
class
clustering
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010977186.1A
Other languages
Chinese (zh)
Other versions
CN111833380B (en)
Inventor
姜益民
晏世武
洪勇
罗书培
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Optics Valley Information Technology Co ltd
Original Assignee
Wuhan Optics Valley Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Optics Valley Information Technology Co ltd filed Critical Wuhan Optics Valley Information Technology Co ltd
Priority to CN202010977186.1A priority Critical patent/CN111833380B/en
Publication of CN111833380A publication Critical patent/CN111833380A/en
Application granted granted Critical
Publication of CN111833380B publication Critical patent/CN111833380B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The invention relates to a multi-view image fusion space target tracking system and method. A camera shooting unit comprises at least two cameras symmetrically arranged around a scene in which a spatial target is tracked; the cameras shoot the scene in an obliquely downward posture to obtain multi-shot same-frame images, i.e., at least two images of the same scene taken by the at least two cameras at the same time. A target frame extraction module marks the targets detected in the multi-shot same-frame images in the form of target frames. A spatial feature extraction module extracts the feature information in each target frame and clusters the feature information to obtain clustering features. A tracker module tracks the targets according to the clustering features. The invention solves the recognition errors caused by occlusion, greatly reduces confusion in target tracking, and is suitable for target tracking in a single scene with high requirements on tracking accuracy.

Description

Multi-view image fusion space target tracking system and method
Technical Field
The invention relates to the field of target tracking, in particular to a multi-view image fused space target tracking system and method.
Background
With rising security requirements, especially in the field of target tracking, there is a growing desire to monitor a target's actions through video streams. Thanks to the efficiency of deep learning for image feature extraction and the continuous improvement of GPU computing power, deep-learning-based frameworks for image classification and target detection have developed rapidly and are quickly replacing the corresponding traditional methods.
However, current target tracking technology is still mainly based on the traditional image retrieval approach: the similarity distance between images is computed directly, and that distance is used to judge whether the images belong to the same object. If the target is occluded, the matching is likely to fail, so this approach performs poorly.
Frameworks based on convolutional neural networks extract effective target features from the image feature information and then compute the similarity distance between images, which greatly improves tracking performance; however, such frameworks require the target frames in the video to be annotated in advance, so they are still not practical.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a multi-view image fusion space target tracking system and method, which solve the problem of poor space target tracking performance in the prior art.
The technical scheme for solving the technical problems is as follows: a multi-view image fused space target tracking system comprises: the system comprises a camera shooting unit, a target frame extraction module, a spatial feature extraction module and a tracker module;
the camera shooting unit comprises at least two cameras symmetrically arranged around a scene for space target tracking, the cameras shoot the scene in an inclined downward posture to obtain multiple shooting frame images, and the multiple shooting frame images are at least two image frame images of the same scene shot by the at least two cameras at the same time;
the target frame extraction module marks targets detected from the multiple shot same frame images in a target frame mode;
the spatial feature extraction module extracts feature information in each target frame, and clusters each feature information to obtain each clustering feature;
and the tracker module carries out target tracking according to the clustering characteristics.
A tracking method of a multi-view image fusion space target tracking system comprises the following steps:
step 1, at least two cameras are symmetrically arranged around a scene for space target tracking, the cameras shoot the scene in an inclined downward posture to obtain multiple shooting frame images, and the multiple shooting frame images are at least two image frame images of the same scene shot by the at least two cameras at the same time;
step 2, labeling the targets detected from the multiple shot same frame images in a target frame mode;
step 3, extracting characteristic information in each target frame, and clustering each characteristic information to obtain each clustering characteristic;
and 4, tracking the target according to the clustering characteristics.
The invention has the following beneficial effects. The invention provides a multi-view image fusion space target tracking system and method. In the prior art, once occlusion occurs between targets during spatial target tracking, judging identity by computing the similarity distance of features between images very easily leads to confused recognition. For special scenes, such as rectangular spaces (sports fields and the like) and other places with high requirements on tracking accuracy, at least two cameras are symmetrically arranged around the scene and shoot it in an obliquely downward posture, which solves the recognition errors caused by occlusion. The spatial feature extraction module combines the features extracted from every view angle, extracts top-view spatial features for the specific target frames, and clusters them; by exploiting the uniqueness of a target's position in space, confusion in target tracking is greatly reduced, making the invention suitable for target tracking in a single scene with high requirements on tracking accuracy.
On the basis of the technical scheme, the invention can be further improved as follows.
Furthermore, the scene is a rectangular space, the number of the cameras is four, the cameras are arranged on four corners of the rectangular scene, and the cameras face to the center of the scene simultaneously.
Further, the target frame extraction module predicts the target frame based on the YoloV3 detection network.
Further, the spatial feature extraction module comprises a convolutional neural network module and a spatial clustering module;
the convolutional neural network extracts the characteristic information of each target frame, and the characteristic information is spliced to form a spatial information characteristic matrix;
the spatial clustering module clusters the spatial information characteristic matrix to obtain clustering centers of each category, and outputs a clustering center matrix formed by the clustering centers to the tracker module.
Further, the tracker module includes respective tracker classes corresponding to respective ones of the cluster centers;
when the continuous occurrence time of the target exceeds a set target occurrence time threshold, initializing the corresponding tracker class;
and when the target loss time exceeds a set target loss time threshold, discarding the corresponding tracker class.
Further, the tracker class includes a tracking target class attribute, a live index attribute, a true live index, and a global live index attribute;
the tracking target class attribute is used for storing the tracked target;
the survival index attribute is used for recording the index of the tracked target;
the real survival index is used for recording the tracking target class which is not influenced by the time threshold value and is not removed;
the global live index attribute is used for recording the live tracking target class index.
Further, the initialization process of the tracker class includes:
initializing a global category serial number, wherein the size of the global category serial number is related to the category number of the current clustering center; and classifying the categories in the multi-shot same-frame images in an index tracking index mode.
Further, the maintenance process of the tracker class in the tracking process includes:
calculating the similar distance between each cluster center in the current cluster center matrix and each cluster center in the last cluster center matrix, judging whether each category in the current cluster center matrix exists or not by adopting an optimal distribution principle, if so, matching with the last category, and tracking; otherwise, a non-existent class is added to the tracked class for storage.
The further schemes have the following beneficial effects. The YoloV3 detection network, which currently performs well among deep-learning-based detectors, is integrated to fully automatically detect targets in video frames and label them in the form of target frames; the target frames output by the detection network are combined with the global image feature extraction network to obtain feature maps that contain only the target frames, removing most useless feature information. A time threshold is set so that targets that have not appeared for a period of time are removed from the tracked target classes; this also keeps the number of tracked target classes in the cache within a safe memory range and prevents cache overflow. Real-time target tracking is realized on the basis of deep learning by combining the efficient detection of YoloV3, the strong capability of the image spatial feature extraction network, and the decision rules for judging whether observations belong to the same object, so that multiple cameras in the same scene track targets in real time without manual labeling, and the effect meets practical requirements.
Drawings
Fig. 1 is a block diagram of a multi-view image fusion space target tracking system according to the present invention;
fig. 2 is a diagram illustrating the target detection effect of YoloV3 on multi-shot same-frame images according to an embodiment of the present invention;
fig. 3 is a flowchart of spatial feature extraction according to an embodiment of the present invention;
fig. 4 is a block diagram illustrating a multi-view image fusion space target tracking method according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a block diagram of a multi-view image fusion space target tracking system according to the present invention, and as shown in fig. 1, the system includes: the system comprises a camera shooting unit, a target frame extraction module, a spatial feature extraction module and a tracker module.
The camera shooting unit comprises at least two cameras symmetrically arranged around the scene in which spatial targets are tracked. The cameras shoot the scene in an obliquely downward posture to obtain multi-shot same-frame images, i.e., at least two images of the same scene taken by the at least two cameras at the same time. Because each camera shoots obliquely downward, its images are taken from a viewpoint looking down on the scene.
The target frame extraction module marks targets detected from each of the multiple shot frame images in the form of target frames. The target frame detected by the target detection network is combined with the global image information extraction network to obtain a feature map only containing the target frame, and most useless feature information is removed.
And the spatial feature extraction module extracts feature information in each target frame and clusters each feature information to obtain each clustering feature.
And the tracker module tracks the target according to the clustering characteristics.
The invention provides a multi-view image fusion space target tracking system. In the prior art, once occlusion occurs between targets during spatial target tracking, judging identity by computing the similarity distance of features between images very easily leads to confused recognition. For special scenes, such as rectangular spaces (sports fields and the like) and other places with high requirements on tracking accuracy, at least two cameras are symmetrically arranged around the scene and shoot it in an obliquely downward posture, which solves the recognition errors caused by occlusion. The spatial feature extraction module combines the features extracted from every view angle, extracts top-view spatial features for the specific target frames, and clusters them; by exploiting the uniqueness of a target's position in space, confusion in target tracking is greatly reduced, making the system suitable for target tracking in a single scene with high requirements on tracking accuracy.
Example 1
Embodiment 1 of the present invention is an embodiment of the multi-view image fusion space target tracking system of the invention. In this embodiment, targets in the video stream are tracked in real time, i.e., every frame of image is processed, to realize a real-time deep-learning-based tracking framework. YoloV3 is first used to detect the targets in the same-frame images from the four cameras; at the same time, the four image frames are input into a Resnet18 network and image features are extracted with transfer learning. The YoloV3 detection boxes and the image feature extraction network are then combined into a spatial feature network, and the different spatial features of the multiple targets under the four cameras are clustered to obtain top-view features. By exploiting the uniqueness of a target's position in the top-view space, the confusion previously caused by occlusion in target tracking is greatly reduced, and the final clustering matrix is passed to the tracker module to track the targets.
Specifically, the embodiment of the system comprises: the system comprises a camera shooting unit, a target frame extraction module, a spatial feature extraction module and a tracker module.
The camera shooting unit comprises at least two cameras symmetrically arranged around the scene in which spatial targets are tracked. The cameras shoot the scene in an obliquely downward posture to obtain multi-shot same-frame images, i.e., at least two images of the same scene taken by the at least two cameras at the same time. Because each camera shoots obliquely downward, its images are taken from a viewpoint looking down on the scene.
At present, both traditional target tracking algorithms and deep-learning-based ones compute the similarity distance between images. The difference is that traditional methods compute the distance directly from pixel values, whereas deep learning first extracts effective image features and then computes the distance from those features, which makes it faster and more accurate than the traditional methods. However, when occlusion occurs between targets, computing the similarity distance of features between images very easily produces confused recognition. Partial occlusion between targets can be mitigated by continuously adjusting camera poses or adding more cameras, but both have drawbacks: adjusting cameras raises labor costs, the pose cannot be adjusted every time occlusion occurs, and the approach does not suit most scenes; adding cameras only lets some cameras observe unoccluded pictures, while the occluded views still suffer recognition errors caused by occlusion. A top-view spatial feature extraction network solves these problems well by exploiting the uniqueness of the target object in the top-view two-dimensional plane. Such top-view spatial features cannot be extracted from a single camera image, so the number of cameras must be increased.
Preferably, the test scene of the invention is a rectangular space, the four cameras are placed at its four corners, and all four cameras face the center of the floor of the rectangular space in an inclined posture. The purpose is that, when the features extracted from the same-frame images of the four cameras are spatially clustered, they are converted into top-view features, and the uniqueness of the target's position in space can then be used to improve the accuracy of target re-identification.
The target frame extraction module marks targets detected from each of the multiple shot frame images in the form of target frames.
Preferably, the target frame extraction module predicts the target frames based on the YoloV3 detection network. Fig. 2 illustrates the target detection effect of YoloV3 on multi-shot same-frame images according to an embodiment of the present invention.
Aiming at the problem that manually labeling target frames in the early stage of current tracking frameworks is time-consuming and labor-intensive, the invention integrates the YoloV3 detection network, which currently performs well among deep-learning detectors, to fully automatically detect targets in video frames and label them in the form of target frames; the output multi-object target frames and the image information extraction network are then combined into the subsequent spatial information feature extraction network.
YoloV3 is a target detection network with high precision and good detection speed. It uses Darknet-53 as its backbone, takes 256x256 images as input, stacks large numbers of 1x1 and 3x3 convolutional layers, and uses residual connections to pass shallow information to deep layers, increasing network depth without causing gradient problems such as exploding gradients. For detection, YoloV3 adopts a multi-scale strategy: three feature maps of different sizes, namely 32x32, 16x16 and 8x8, are used for detection output, each point of each detection feature layer is mapped back to the input size, and each point carries 3 prediction boxes, so the three feature layers produce 4032 prediction boxes in total, which amply meets the requirement for detecting many kinds of objects. Finally, logistic regression scores the objectness of each prediction box, and the target boxes that meet the requirement are selected according to this score to predict the target frames.
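As an illustration only, the following sketch shows how such a YoloV3 detector could be run to obtain target frames; it uses OpenCV's DNN module with the public Darknet configuration and weight files, and the file paths, thresholds and 256x256 input size are assumptions rather than details prescribed by the invention.

```python
# Sketch only: obtaining target frames with YoloV3 through OpenCV's DNN module.
# File paths, thresholds and the 256x256 input size are placeholders/assumptions.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")  # placeholder paths
out_names = net.getUnconnectedOutLayersNames()

def detect_targets(image, conf_thresh=0.5, nms_thresh=0.4, input_size=256):
    h, w = image.shape[:2]
    # Scale pixels to [0, 1] and resize to the network input size described above.
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (input_size, input_size), swapRB=True, crop=False)
    net.setInput(blob)
    boxes, scores = [], []
    for layer_out in net.forward(out_names):      # three detection scales
        for det in layer_out:                     # det = [cx, cy, w, h, objectness, class scores...]
            confidence = float(det[4]) * float(det[5:].max())
            if confidence < conf_thresh:
                continue
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            scores.append(confidence)
    keep = cv2.dnn.NMSBoxes(boxes, scores, conf_thresh, nms_thresh)   # non-maximum suppression
    return [boxes[i] for i in np.array(keep).flatten()]
```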
And the spatial feature extraction module extracts feature information in each target frame and clusters each feature information to obtain each clustering feature.
Preferably, as shown in fig. 3, a flowchart of spatial feature extraction provided in the embodiment of the present invention is provided, and the spatial feature extraction module includes a convolutional neural network module and a spatial clustering module.
The feature information of the multi-shot same-frame images is extracted with an image feature extraction network pre-trained on the Imagenet database; this network can be a classical convolutional neural network. The convolutional neural network extracts the feature information of each target frame, and the features are concatenated to form one large spatial information feature matrix.
The spatial clustering module clusters the spatial information feature matrix to obtain the cluster center of each category and outputs the cluster center matrix formed by these centers to the tracker module. Since a cluster center is equivalent to the mean of the features of one category, the cluster centers over the images of the symmetrically distributed cameras serve as the top-view features of those images.
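A minimal sketch of this feature extraction and spatial clustering step is given below for illustration; the choice of an ImageNet-pretrained ResNet18 backbone, a 224x224 crop size, k-means clustering and a known number of targets are all assumptions, since the embodiment only requires a classical convolutional neural network and a clustering step.

```python
# Sketch only: per-target-frame feature extraction with an ImageNet-pretrained ResNet18,
# followed by clustering into one center per target; backbone, crop size and k-means are assumptions.
import numpy as np
import torch
import torch.nn as nn
from torchvision import models, transforms
from sklearn.cluster import KMeans

backbone = models.resnet18(pretrained=True)
backbone.fc = nn.Identity()          # keep the 512-d pooled feature, drop the classifier head
backbone.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def spatial_cluster_centers(same_frame_images, boxes_per_view, n_targets):
    """Build the spatial information feature matrix and return the cluster-center matrix."""
    feats = []
    with torch.no_grad():
        for img, boxes in zip(same_frame_images, boxes_per_view):
            for (x, y, w, h) in boxes:
                crop = img[max(y, 0):y + h, max(x, 0):x + w]   # the target frame region
                feats.append(backbone(preprocess(crop).unsqueeze(0)).squeeze(0).numpy())
    feature_matrix = np.stack(feats)                            # spatial information feature matrix
    km = KMeans(n_clusters=n_targets, n_init=10).fit(feature_matrix)
    return km.cluster_centers_                                  # one center (top-view feature) per target
```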
The tracker module tracks the targets according to the clustering features. It mainly combines the cluster centers extracted by the spatial feature network with dynamic addition and removal of tracked targets to achieve a relatively accurate tracking effect.
Preferably, the tracker module comprises respective tracker classes corresponding to respective cluster centers.
When the continuous appearance time of a target does not exceed the set target appearance time threshold, the target does not need to be tracked; this avoids tracking targets that appear only briefly and saves storage space. When the continuous appearance time of the target exceeds the set target appearance time threshold, the target needs to be tracked, so its cluster center is stored in a tracked target class and the corresponding tracker class is initialized.
And when the target loss time exceeds a set target loss time threshold, discarding the corresponding tracker class.
Setting the time thresholds is very important when initializing the tracker class, because it governs the dynamic addition and removal of targets. The target loss time threshold is set mainly to remove targets that have not appeared for a period of time from the tracked target classes; this also keeps the number of tracked target classes in the cache within a safe memory range and prevents the cache from overflowing.
The tracker module mainly utilizes the combination of the clustering center extracted by the spatial characteristic information network and the dynamic increase and decrease of the tracked target to realize a more accurate target tracking effect.
In particular, the tracker class includes a track target class attribute, a live index attribute, a true live index, and a global live index attribute.
The tracked target class attribute is used for storing the tracked target.
The live index attribute is used for recording the index of the tracked target and corresponds to the tracked target class attribute.
The true live index is used to record the trace-target classes that are not affected by the time threshold and are not removed.
The global live index attribute is used to record the live tracked target class index.
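For illustration, the sketch below shows one possible layout of such a tracker class holding the four attributes together with the two time thresholds; the attribute names, counters and default values are paraphrases and placeholders, not the claimed implementation.

```python
# Sketch only: one possible tracker class holding the four attributes and the two time
# thresholds described above; names, counters and default values are illustrative assumptions.
import numpy as np

class TrackerClass:
    def __init__(self, appear_threshold=5, lost_threshold=30):
        self.track_targets = {}        # tracking target class attribute: global id -> cluster center
        self.live_index = []           # live index attribute: indices of tracked targets
        self.true_live_index = []      # true live index: classes kept regardless of the time threshold
        self.global_live_index = []    # global live index attribute: all live tracked-class indices
        self.appear_threshold = appear_threshold   # target appearance time threshold (frames)
        self.lost_threshold = lost_threshold       # target loss time threshold (frames)
        self._appear_count = {}        # consecutive-appearance counter per candidate
        self._lost_count = {}          # consecutive-loss counter per tracked class

    def add_candidate(self, global_id, center):
        """Initialize the class only once the target has appeared long enough."""
        self._appear_count[global_id] = self._appear_count.get(global_id, 0) + 1
        if self._appear_count[global_id] >= self.appear_threshold and global_id not in self.track_targets:
            self.track_targets[global_id] = np.asarray(center)
            self.live_index.append(global_id)
            self.true_live_index.append(global_id)
            self.global_live_index.append(global_id)
            self._lost_count[global_id] = 0

    def mark_lost(self, global_id):
        """Discard the class once the target has been lost for longer than the threshold."""
        self._lost_count[global_id] = self._lost_count.get(global_id, 0) + 1
        if self._lost_count[global_id] >= self.lost_threshold:
            self.track_targets.pop(global_id, None)
            if global_id in self.live_index:
                self.live_index.remove(global_id)   # true_live_index is deliberately left untouched
```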
Further, the initialization process of the tracker class includes:
initializing a global category serial number, wherein the size of the global category serial number is related to the category number of the current cluster center; and classifying the categories in the images of the multiple shot frames by adopting an index tracking index mode.
Further, the maintenance process of the tracker class in the tracking process includes:
calculating the similar distance between each cluster center in the current cluster center matrix and each cluster center in the last cluster center matrix, judging whether each category in the current cluster center matrix exists or not by adopting an optimal distribution principle, if so, matching with the last category, and tracking; otherwise, the non-existent category is added to the tracked class for storage, which is equivalent to putting the data into the database.
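One common way to realize the optimal distribution principle is the Hungarian algorithm applied to the pairwise distance matrix between the current and previous cluster centers. The sketch below uses SciPy's linear_sum_assignment with a Euclidean similarity distance and an arbitrary matching threshold; both the algorithm and the metric are assumptions, since the embodiment does not name a specific assignment method.

```python
# Sketch only: matching the current cluster-center matrix to the previous one with an optimal
# assignment; the Hungarian algorithm, Euclidean distance and threshold are assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_centers(prev_centers, curr_centers, max_distance=10.0):
    """Return (matches, new_categories): matches maps current row index -> previous row index."""
    # Pairwise similarity distance between every current and every previous cluster center.
    dist = np.linalg.norm(curr_centers[:, None, :] - prev_centers[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(dist)        # optimal one-to-one distribution
    matches, new_categories = {}, set(range(len(curr_centers)))
    for r, c in zip(rows, cols):
        if dist[r, c] <= max_distance:              # existing category: keep tracking it
            matches[r] = c
            new_categories.discard(r)
    # Categories left over did not exist before and are added to the tracked class for storage.
    return matches, sorted(new_categories)
```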
In the specific implementation, the more cameras are used, the better the test effect should theoretically be. A test environment was therefore built with two cameras, and the test space was a leisure area, as shown in fig. 4. Two mobile phones were set up at opposite corners of this environment to shoot video simultaneously, forming a rectangular space, with both phones facing the center of the rectangle in a downward-looking posture. A person walks gradually into the center of the rectangle and moves within the rectangular space, and the number of persons is then increased. The two phone videos shot in this way are same-frame video images of the same scene.
The two videos are input into the target tracking system in a fixed order for tracking and identification. The IDs of the same target under the two cameras are identified consistently and accurately; if the number of cameras is increased, the additional spatial features collected can be used to accurately track and re-identify multiple people.
Example 2
Embodiment 2 of the present invention is an embodiment of the multi-view image fusion space target tracking method. Fig. 4 is a flowchart of this method embodiment, and as can be seen from fig. 4, the method embodiment includes:
step 1, at least two cameras are symmetrically arranged around a scene for space target tracking, the cameras shoot the scene in an inclined downward posture to obtain a multi-shooting frame image, and the multi-shooting frame image is at least two image frame images of the same scene shot by the at least two cameras at the same time.
Preferably, the scene is a rectangular space, the number of the cameras is four, the cameras are arranged at four corners of the rectangular scene, and each camera faces to the center of the scene at the same time.
And 2, marking the targets detected from the multiple shot frame images in a target frame mode.
Preferably, the target frame is predicted based on the YoloV3 detection network.
And 3, extracting the characteristic information in each target frame, and clustering each characteristic information to obtain each clustering characteristic.
Preferably, the feature information of each target frame is extracted, and the feature information is spliced to form a large spatial information feature matrix.
Clustering the spatial information characteristic matrix to obtain the clustering center of each category, and outputting the clustering center matrix formed by the clustering centers as the clustering characteristic.
And 4, tracking the target according to the clustering characteristics.
Preferably, step 4 comprises:
and establishing each tracker class corresponding to each clustering center.
When the continuous appearance time of the target exceeds the set target appearance time threshold, the target needs to be tracked, so that the cluster center of the target is stored in a tracking target class, and a corresponding tracker class is initialized.
And when the target loss time exceeds a set target loss time threshold, discarding the corresponding tracker class.
Setting the time thresholds is very important when initializing the tracker class, because it governs the dynamic addition and removal of targets. The target loss time threshold is set mainly to remove targets that have not appeared for a period of time from the tracked target classes; this also keeps the number of tracked target classes in the cache within a safe memory range and prevents the cache from overflowing.
In particular, the tracker class includes a track target class attribute, a live index attribute, a true live index, and a global live index attribute.
The tracked target class attribute is used for storing the tracked target.
The live index attribute is used for recording the index of the tracked target and corresponds to the tracked target class attribute.
The true live index is used to record the trace-target classes that are not affected by the time threshold and are not removed.
The global live index attribute is used to record the live tracked target class index.
Further, the initialization process of the tracker class includes:
a global class number is initialized whose size is related to the number of classes in the current cluster center. And classifying the categories in the images of the multiple shot frames by adopting an index tracking index mode.
Further, the maintenance process of the tracker class in the tracking process includes:
calculating the similar distance between each cluster center in the current cluster center matrix and each cluster center in the last cluster center matrix, judging whether each category in the current cluster center matrix exists or not by adopting an optimal distribution principle, if so, matching with the last category, and tracking; otherwise, the non-existent category is added to the tracked class for storage, which is equivalent to putting the data into the database.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the tracking method of the multi-view image fusion space target tracking system provided in the foregoing embodiments, for example comprising: step 1, symmetrically arranging at least two cameras around the scene in which spatial targets are tracked, the cameras shooting the scene in an obliquely downward posture to obtain multi-shot same-frame images; step 2, labeling the targets detected from the multi-shot same-frame images in the form of target frames; step 3, extracting the feature information in each target frame and clustering it to obtain the clustering features; and step 4, tracking the target according to the clustering features.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A multi-view image fused space target tracking system is characterized by comprising: the system comprises a camera shooting unit, a target frame extraction module, a spatial feature extraction module and a tracker module;
the camera shooting unit comprises at least two cameras symmetrically arranged around a scene for space target tracking, the cameras shoot the scene in an inclined downward posture to obtain multiple shooting frame images, and the multiple shooting frame images are at least two image frame images of the same scene shot by the at least two cameras at the same time;
the target frame extraction module marks targets detected from the multiple shot same frame images in a target frame mode;
the spatial feature extraction module extracts feature information in each target frame, and clusters each feature information to obtain each clustering feature;
the tracker module carries out target tracking according to the clustering characteristics;
the spatial feature extraction module comprises a convolutional neural network module and a spatial clustering module;
the convolutional neural network extracts the characteristic information of each target frame, and the characteristic information is spliced to form a spatial information characteristic matrix;
the spatial clustering module clusters the spatial information characteristic matrix to obtain clustering centers of each category, and outputs a clustering center matrix formed by the clustering centers to the tracker module;
the tracker module comprises tracker classes corresponding to the clustering centers;
when the continuous occurrence time of the target exceeds a set target occurrence time threshold, initializing the corresponding tracker class;
when the target loss time exceeds a set target loss time threshold, discarding the corresponding tracker class;
the tracker class includes a tracking target class attribute, a live index attribute, a true live index, and a global live index attribute;
the tracking target class attribute is used for storing the tracked target;
the survival index attribute is used for recording the index of the tracked target;
the real survival index is used for recording the tracking target class which is not influenced by the time threshold value and is not removed;
the global live index attribute is used for recording the live tracking target class index.
2. The spatial target tracking system of claim 1, wherein the scene is a rectangular space, the number of the cameras is four, the four cameras are arranged at four corners of the rectangular scene, and each camera faces to a center position of the scene at the same time.
3. The spatial target tracking system of claim 1, wherein the target box extraction module predicts the target box based on a YoloV3 detection network.
4. The spatial target tracking system of claim 1, wherein the initialization of the tracker class comprises:
initializing a global category serial number, wherein the size of the global category serial number is related to the category number of the current clustering center; and classifying the categories in the multi-shot same-frame images in an index tracking index mode.
5. The spatial target tracking system of claim 1, wherein the maintenance procedure of the tracker class in the tracking procedure comprises:
calculating the similar distance between each cluster center in the current cluster center matrix and each cluster center in the last cluster center matrix, judging whether each category in the current cluster center matrix exists or not by adopting an optimal distribution principle, if so, matching with the last category, and tracking; otherwise, a non-existent class is added to the tracked class for storage.
6. A tracking method of a multi-view image fused space target tracking system is characterized by comprising the following steps:
step 1, at least two cameras are symmetrically arranged around a scene for space target tracking, the cameras shoot the scene in an inclined downward posture to obtain multiple shooting frame images, and the multiple shooting frame images are at least two image frame images of the same scene shot by the at least two cameras at the same time;
step 2, labeling the targets detected from the multiple shot same frame images in a target frame mode;
step 3, extracting characteristic information in each target frame, and clustering each characteristic information to obtain each clustering characteristic;
step 4, tracking the target according to the clustering characteristics;
the step 4 comprises the following steps: establishing each tracker class corresponding to each clustering center;
when the continuous occurrence time of the target exceeds a set target occurrence time threshold, initializing the corresponding tracker class;
when the target loss time exceeds a set target loss time threshold, discarding the corresponding tracker class;
the tracker class includes a tracking target class attribute, a live index attribute, a true live index, and a global live index attribute;
the tracking target class attribute is used for storing the tracked target;
the survival index attribute is used for recording the index of the tracked target;
the real survival index is used for recording the tracking target class which is not influenced by the time threshold value and is not removed;
the global live index attribute is used for recording the live tracking target class index.
7. A non-transitory computer readable storage medium, having a computer program stored thereon, wherein the computer program, when being executed by a processor, implements the steps of the tracking method of the multi-view image fused spatial target tracking system according to claim 6.
CN202010977186.1A 2020-09-17 2020-09-17 Multi-view image fusion space target tracking system and method Active CN111833380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010977186.1A CN111833380B (en) 2020-09-17 2020-09-17 Multi-view image fusion space target tracking system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010977186.1A CN111833380B (en) 2020-09-17 2020-09-17 Multi-view image fusion space target tracking system and method

Publications (2)

Publication Number Publication Date
CN111833380A true CN111833380A (en) 2020-10-27
CN111833380B CN111833380B (en) 2020-12-15

Family

ID=72918483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010977186.1A Active CN111833380B (en) 2020-09-17 2020-09-17 Multi-view image fusion space target tracking system and method

Country Status (1)

Country Link
CN (1) CN111833380B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114707014A (en) * 2022-06-06 2022-07-05 科大天工智能装备技术(天津)有限公司 FOV-based image data fusion indexing method
CN114782865A (en) * 2022-04-20 2022-07-22 清华大学 Intersection vehicle positioning method and system based on multi-view angle and re-recognition
CN115695818A (en) * 2023-01-05 2023-02-03 广东瑞恩科技有限公司 Efficient management method for intelligent park monitoring data based on Internet of things

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020989A (en) * 2012-12-05 2013-04-03 河海大学 Multi-view target tracking method based on on-line scene feature clustering
CN104680559A (en) * 2015-03-20 2015-06-03 青岛科技大学 Multi-view indoor pedestrian tracking method based on movement behavior mode
CN107992791A (en) * 2017-10-13 2018-05-04 西安天和防务技术股份有限公司 Target following failure weight detecting method and device, storage medium, electronic equipment
CN109635721A (en) * 2018-12-10 2019-04-16 山东大学 Video human fall detection method and system based on track weighting depth convolution sequence poolization description
US20200134329A1 (en) * 2018-10-26 2020-04-30 Cartica Ai Ltd. Tracking after objects

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020989A (en) * 2012-12-05 2013-04-03 河海大学 Multi-view target tracking method based on on-line scene feature clustering
CN104680559A (en) * 2015-03-20 2015-06-03 青岛科技大学 Multi-view indoor pedestrian tracking method based on movement behavior mode
CN107992791A (en) * 2017-10-13 2018-05-04 西安天和防务技术股份有限公司 Target following failure weight detecting method and device, storage medium, electronic equipment
US20200134329A1 (en) * 2018-10-26 2020-04-30 Cartica Ai Ltd. Tracking after objects
CN109635721A (en) * 2018-12-10 2019-04-16 山东大学 Video human fall detection method and system based on track weighting depth convolution sequence poolization description

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782865A (en) * 2022-04-20 2022-07-22 清华大学 Intersection vehicle positioning method and system based on multi-view angle and re-recognition
CN114707014A (en) * 2022-06-06 2022-07-05 科大天工智能装备技术(天津)有限公司 FOV-based image data fusion indexing method
CN114707014B (en) * 2022-06-06 2022-08-26 科大天工智能装备技术(天津)有限公司 FOV-based image data fusion indexing method
CN115695818A (en) * 2023-01-05 2023-02-03 广东瑞恩科技有限公司 Efficient management method for intelligent park monitoring data based on Internet of things

Also Published As

Publication number Publication date
CN111833380B (en) 2020-12-15

Similar Documents

Publication Publication Date Title
CN111833380B (en) Multi-view image fusion space target tracking system and method
CN107832672B (en) Pedestrian re-identification method for designing multi-loss function by utilizing attitude information
Sharma et al. YOLOrs: Object detection in multimodal remote sensing imagery
CN103824070B (en) A kind of rapid pedestrian detection method based on computer vision
Kong et al. Detecting abandoned objects with a moving camera
WO2016131300A1 (en) Adaptive cross-camera cross-target tracking method and system
KR101781358B1 (en) Personal Identification System And Method By Face Recognition In Digital Image
Feng et al. Cityflow-nl: Tracking and retrieval of vehicles at city scale by natural language descriptions
Bouma et al. Re-identification of persons in multi-camera surveillance under varying viewpoints and illumination
CN112668557A (en) Method for defending image noise attack in pedestrian re-identification system
Di Benedetto et al. An embedded toolset for human activity monitoring in critical environments
Saif et al. Crowd density estimation from autonomous drones using deep learning: challenges and applications
Špaňhel et al. Vehicle fine-grained recognition based on convolutional neural networks for real-world applications
CN111767839A (en) Vehicle driving track determining method, device, equipment and medium
Vajhala et al. Weapon detection in surveillance camera images
Huang et al. Whole-body detection, recognition and identification at altitude and range
Essmaeel et al. A new 3D descriptor for human classification: Application for human detection in a multi-kinect system
Fatichah et al. Optical flow feature based for fire detection on video data
Micheal et al. Comparative analysis of SIFT and SURF on KLT tracker for UAV applications
Park et al. Video surveillance system based on 3d action recognition
Southey et al. Object discovery through motion, appearance and shape
Holla et al. Vehicle re-identification in smart city transportation using hybrid surveillance systems
Chen et al. Vehicle re-identification method based on vehicle attribute and mutual exclusion between cameras
Chen et al. Dynamic visual saliency modeling based on spatiotemporal analysis
Ferreira et al. Human detection and tracking using a Kinect camera for an autonomous service robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A space target tracking system and method based on multi view image fusion

Effective date of registration: 20210611

Granted publication date: 20201215

Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.

Pledgor: WUHAN OPTICS VALLEY INFORMATION TECHNOLOGY Co.,Ltd.

Registration number: Y2021420000035

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20220615

Granted publication date: 20201215

Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.

Pledgor: WUHAN OPTICS VALLEY INFORMATION TECHNOLOGY CO.,LTD.

Registration number: Y2021420000035

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A space target tracking system and method based on multi view image fusion

Effective date of registration: 20220617

Granted publication date: 20201215

Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.

Pledgor: WUHAN OPTICS VALLEY INFORMATION TECHNOLOGY CO.,LTD.

Registration number: Y2022420000164

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20230615

Granted publication date: 20201215

Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.

Pledgor: WUHAN OPTICS VALLEY INFORMATION TECHNOLOGY CO.,LTD.

Registration number: Y2022420000164