CN112396627A - Target tracking method and device and computer readable storage medium - Google Patents

Target tracking method and device and computer readable storage medium

Info

Publication number
CN112396627A
CN112396627A (application CN201910741611.4A)
Authority
CN
China
Prior art keywords
video
frame
result
target tracking
temporary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910741611.4A
Other languages
Chinese (zh)
Inventor
杜奎
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Suzhou Software Technology Co Ltd
Priority to CN201910741611.4A
Publication of CN112396627A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 - Classification techniques relating to the classification model based on distances to training or reference patterns
    • G06F 18/24147 - Distances to closest patterns, e.g. nearest neighbour classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning

Abstract

The embodiment of the invention discloses a target tracking method, a target tracking device, and a computer-readable storage medium. The method includes: loading the i-th frame of a video, and acquiring the target tracking result of the (i-1)-th frame and the detection classification model corresponding to the (i-1)-th frame, where i is not equal to 1; tracking the i-th frame according to the target tracking result of the (i-1)-th frame to obtain a temporary tracking result; obtaining a prediction region from the i-th frame using a region prediction algorithm, and detecting the prediction region with the detection classification model corresponding to the (i-1)-th frame to obtain a temporary detection result, where the prediction region represents the area of the i-th frame with the redundant part removed; integrating the temporary tracking result and the temporary detection result to determine the target tracking result of the i-th frame; and loading the (i+1)-th frame to continue target tracking until i equals N, where N is the total number of frames in the video. In this way, redundant image blocks can be eliminated and the real-time performance of target tracking improved.

Description

Target tracking method and device and computer readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a target tracking method and apparatus, and a computer-readable storage medium.
Background
Target tracking refers to the process of following a target in a video image in real time using image processing and computer vision techniques; it has important applications in video surveillance, behavior recognition, intelligent driving, and other fields. The Tracking-Learning-Detection (TLD) method is a target tracking method with strong performance: its detection module and tracking module run in parallel, combining traditional tracking and detection approaches, while an online learning mechanism updates the parameters of the detection module to maintain tracking accuracy.
When performing target tracking based on TLD, the video image must first be partitioned into image blocks, all of which are fed to the detection module according to a global scanning strategy. However, since most image blocks contain only the background of the image, classifying every block under the global scanning strategy wastes hardware resources on redundant blocks that contain no foreground target and severely degrades the real-time performance of target tracking.
Disclosure of Invention
The embodiment of the invention provides a target tracking method and device and a computer-readable storage medium, which can reduce redundant image blocks to be classified and improve the real-time performance of target tracking.
The technical scheme of the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a target tracking method, including:
loading the i-th frame of a video, and acquiring the target tracking result of the (i-1)-th frame and the detection classification model corresponding to the (i-1)-th frame; wherein i is not equal to 1;
tracking the i-th frame according to the target tracking result of the (i-1)-th frame to obtain a temporary tracking result;
obtaining a prediction region from the i-th frame using a region prediction algorithm, and detecting the prediction region with the detection classification model corresponding to the (i-1)-th frame to obtain a temporary detection result; the prediction region represents the area of the i-th frame with the redundant part removed;
integrating the temporary tracking result and the temporary detection result to determine the target tracking result of the i-th frame;
continuing to load the (i+1)-th frame for target tracking until i equals N, at which point target tracking finishes; where N is the total number of frames in the video.
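The per-frame loop described in the first aspect can be sketched as follows. This is a minimal Python sketch, not the patent's implementation: every helper callable (`track`, `predict_region`, `detect`, `integrate`, `update_model`) and its signature is a hypothetical placeholder for the corresponding step named above.

```python
def track_video(frames, init_result, init_model,
                track, predict_region, detect, integrate, update_model):
    """frames[0] is the initial frame, used only for initialization
    (it supplies init_result and init_model)."""
    result, model = init_result, init_model
    results = [result]
    for i in range(1, len(frames)):              # i-th frame, i != first
        tmp_track = track(frames[i], result)     # temporary tracking result
        region = predict_region(frames[i], result)   # redundancy removed
        tmp_detect = detect(region, model)       # temporary detection result
        model = update_model(model, tmp_track, tmp_detect)
        result = integrate(tmp_track, tmp_detect)    # final result, frame i
        results.append(result)
    return results                               # one result per frame
```

With trivial stand-in callables, the loop reproduces the claimed data flow: each frame's temporary results are derived from the previous frame's final result and model, the model is updated in between, and the loop ends after the last frame.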
In the above scheme, before loading the i-th frame and acquiring the target tracking result of the (i-1)-th frame and the detection classification model corresponding to the (i-1)-th frame, the method further includes:
loading an initial video frame; wherein the initial video frame is used for representing a first frame of a video;
and receiving initialization operation, and determining a target tracking result of the initial video frame and a detection classification model corresponding to the initial video frame according to the initialization operation.
In the above scheme, tracking the i-th frame according to the target tracking result of the (i-1)-th frame to obtain a temporary tracking result includes:
determining tracking feature points according to the target tracking result of the (i-1)-th frame;
searching for the position information of the tracking feature points in the i-th frame;
and determining the temporary tracking result by using the position information.
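One common way to realize these three steps, shown here as a hedged sketch since the patent does not fix the tracking method, is to re-locate the feature points in the i-th frame (for example with optical flow) and shift the previous tracking box by the median displacement of the points that were found again:

```python
import statistics

def track_step(prev_box, prev_pts, new_pts):
    """Shift the previous tracking box by the median displacement of the
    feature points re-found in the i-th frame (a TLD-style heuristic used
    here for illustration). Boxes are (x, y, w, h); prev_pts and new_pts
    are index-aligned (x, y) pairs, with None for points that were lost."""
    pairs = [(n, p) for n, p in zip(new_pts, prev_pts) if n is not None]
    if not pairs:
        return None                      # no temporary tracking result
    dx = statistics.median(n[0] - p[0] for n, p in pairs)
    dy = statistics.median(n[1] - p[1] for n, p in pairs)
    x, y, w, h = prev_box
    return (x + dx, y + dy, w, h)        # the temporary tracking result
```

Returning `None` when no feature point is re-found corresponds to the case, discussed later, where the temporary tracking result "does not exist".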
In the foregoing solution, obtaining a prediction region from the i-th frame using a region prediction algorithm, and detecting the prediction region with the detection classification model corresponding to the (i-1)-th frame to obtain a temporary detection result, includes:
determining a first region corresponding to the target tracking result of the (i-1)-th frame;
determining at least one moving object from the i-th frame using the region prediction algorithm, and determining at least one second region according to the at least one moving object;
removing, from the at least one second region, any second region that satisfies a preset region screening relation with the first region, and determining the candidate regions;
expanding the candidate regions according to a preset region expansion rule to obtain the prediction region; the preset region expansion rule expands the width and height of each candidate region proportionally;
dividing the prediction region into a plurality of image blocks according to a preset image block generation strategy;
and detecting the plurality of image blocks with the detection classification model corresponding to the (i-1)-th frame to obtain the temporary detection result.
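The expansion and block-generation steps can be illustrated with a short sketch. The expansion ratio, window size, and stride below are assumptions: the patent only requires a proportional expansion rule and a preset block generation strategy, without fixing the parameters.

```python
def expand_region(region, ratio, frame_w, frame_h):
    """Expand a candidate region's width and height proportionally
    (ratio is an assumed parameter), clipping to the frame bounds."""
    x, y, w, h = region
    dw, dh = w * ratio / 2, h * ratio / 2
    nx, ny = max(0, x - dw), max(0, y - dh)
    return (nx, ny, min(frame_w - nx, w + 2 * dw), min(frame_h - ny, h + 2 * dh))

def make_blocks(region, win, stride):
    """Slide a win x win window over the prediction region only, instead of
    over the whole frame as a global scan would; win and stride are assumed."""
    x, y, w, h = region
    return [(x + dx, y + dy, win, win)
            for dy in range(0, int(h) - win + 1, stride)
            for dx in range(0, int(w) - win + 1, stride)]
```

Scanning only the prediction region is what removes the redundant background blocks: the window count scales with the prediction region's area rather than the frame's.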
In the foregoing solution, removing, from the at least one second region, any second region that satisfies the preset region screening relation with the first region, and determining the candidate regions, includes:
acquiring a first size corresponding to the first region;
acquiring at least one second size corresponding to the at least one second region;
and removing, from the at least one second region, each second region whose second size satisfies the preset region screening relation with the first size, and determining the remaining second regions as the candidate regions.
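As an illustration, one plausible "preset region screening relation" is a size-mismatch test: second regions whose width or height differs from the first region's by more than a factor are removed as unlikely to contain the tracked target. Both the factor and the mismatch criterion are assumptions, since the patent leaves the relation unspecified.

```python
def screen_regions(first_size, second_regions, factor=2.0):
    """Remove second regions whose size differs too much from the first
    region's size (an assumed screening relation). first_size is (w, h);
    regions are (x, y, w, h). Returns the remaining candidate regions."""
    fw, fh = first_size
    candidates = []
    for (x, y, w, h) in second_regions:
        mismatched = (w > fw * factor or w < fw / factor or
                      h > fh * factor or h < fh / factor)
        if not mismatched:               # region survives the screening
            candidates.append((x, y, w, h))
    return candidates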
In the above scheme, integrating the temporary tracking result and the temporary detection result to determine the target tracking result of the i-th frame includes:
when the temporary tracking result exists and the temporary detection result exists, clustering the temporary detection result to obtain a clustering result, and weighting the clustering result and the temporary tracking result to obtain the target tracking result of the i-th frame;
when the temporary tracking result exists and the temporary detection result does not exist, taking the temporary tracking result as the target tracking result of the i-th frame;
when the temporary tracking result does not exist and the temporary detection result exists, clustering the temporary detection result to obtain a clustering result, and taking the clustering result as the target tracking result of the i-th frame;
and when neither the temporary tracking result nor the temporary detection result exists, determining that the target tracking result of the i-th frame is an invalid result.
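The four cases can be written down directly. In this sketch the clustering and weighting functions are injected as callables, because the patent does not fix their exact form; `None` and an empty list stand for a result that "does not exist".

```python
def integrate(tmp_track, tmp_detect, cluster, weight):
    """Four-case integration of the temporary results.
    cluster: reduces a list of detection boxes to one box.
    weight: fuses the clustering result with the tracking box.
    Both are caller-supplied; their exact form is an assumption."""
    if tmp_track is not None and tmp_detect:
        return weight(cluster(tmp_detect), tmp_track)
    if tmp_track is not None:
        return tmp_track                 # detection result does not exist
    if tmp_detect:
        return cluster(tmp_detect)       # tracking result does not exist
    return None                          # invalid result
```

For example, with a centroid-style `cluster` and a simple averaging `weight`, two detection boxes near the tracking box are fused into one intermediate box.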
In the above solution, after integrating the temporary tracking result and the temporary detection result and determining the target tracking result of the i-th frame, the method further includes:
displaying the target tracking result of the i-th frame on the current display interface.
In the above scheme, after obtaining the prediction region from the i-th frame using the region prediction algorithm and detecting the prediction region with the detection classification model corresponding to the (i-1)-th frame to obtain the temporary detection result, and before integrating the temporary tracking result and the temporary detection result to determine the target tracking result of the i-th frame, the method further includes:
updating the detection classification model corresponding to the (i-1)-th frame according to the temporary tracking result and the temporary detection result to obtain the detection classification model corresponding to the i-th frame.
In the foregoing scheme, updating the detection classification model corresponding to the (i-1)-th frame according to the temporary tracking result and the temporary detection result to obtain the detection classification model corresponding to the i-th frame includes:
evaluating the temporary tracking result of the i-th frame using the temporary tracking result of the (i-1)-th frame;
when the temporary tracking result of the i-th frame is reliable, correcting the temporary detection result of the i-th frame according to the temporary tracking result of the i-th frame to obtain a correction result;
and updating the detection classification model corresponding to the (i-1)-th frame with the correction result to obtain the detection classification model corresponding to the i-th frame.
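A sketch of this conditional update, with the reliability test, correction step, and learning step left as injected callables since the patent does not specify them:

```python
def update_model(model, prev_tmp_track, tmp_track, tmp_detect,
                 is_reliable, correct, learn):
    """Update the detection classification model only when the i-th frame's
    temporary tracking result is judged reliable against the (i-1)-th
    frame's. is_reliable, correct, and learn are caller-supplied stand-ins
    for steps the patent leaves open."""
    if tmp_track is not None and is_reliable(prev_tmp_track, tmp_track):
        correction = correct(tmp_track, tmp_detect)
        return learn(model, correction)  # model for the i-th frame
    return model                         # keep the (i-1)-th frame's model
```

Keeping the old model when the tracking result is unreliable prevents a drifting tracker from corrupting the classifier, which is the point of gating the update on the evaluation step.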
In a second aspect, an embodiment of the present invention provides a target tracking apparatus, including:
the acquisition module is used for loading the i-th frame and acquiring the target tracking result of the (i-1)-th frame and the detection classification model corresponding to the (i-1)-th frame, where i is not equal to 1; it is also used for continuing to load the (i+1)-th frame for target tracking until i equals N, at which point target tracking finishes, where N is the total number of frames in the video;
the tracking module is used for tracking the i-th frame according to the target tracking result of the (i-1)-th frame to obtain a temporary tracking result;
the detection module is used for obtaining a prediction region from the i-th frame using a region prediction algorithm and detecting the prediction region with the detection classification model corresponding to the (i-1)-th frame to obtain a temporary detection result; the prediction region represents the area of the i-th frame with the redundant part removed;
and the integration module is used for integrating the temporary tracking result and the temporary detection result and determining the target tracking result of the i-th frame.
In a third aspect, an embodiment of the present invention further provides a target tracking apparatus, including: a memory and a processor;
the memory is used for storing executable target tracking instructions;
the processor is configured to execute the executable target tracking instructions stored in the memory to implement the method according to any of the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing executable target tracking instructions which, when executed by a processor, implement the method according to any one of the above first aspects.
The embodiments of the invention provide a target tracking method and device and a computer-readable storage medium. The i-th frame of a video is loaded, and the target tracking result of the (i-1)-th frame and the detection classification model corresponding to the (i-1)-th frame are acquired, where i is not equal to 1; the i-th frame is tracked according to the target tracking result of the (i-1)-th frame to obtain a temporary tracking result; a prediction region is obtained from the i-th frame using a region prediction algorithm and detected with the detection classification model corresponding to the (i-1)-th frame to obtain a temporary detection result, the prediction region representing the area of the i-th frame with the redundant part removed; the temporary tracking result and the temporary detection result are integrated to determine the target tracking result of the i-th frame; and the (i+1)-th frame is loaded to continue target tracking until i equals N, where N is the total number of frames in the video. With this implementation, the target tracking device removes image blocks that contain only background during tracking, greatly reducing the redundancy of the image blocks to be classified and improving the real-time performance of target tracking while preserving its accuracy and robustness.
Drawings
Fig. 1 is a system framework diagram of a TLD target tracking method according to an embodiment of the present invention;
fig. 2 is a first flowchart of a target tracking method according to an embodiment of the present invention;
fig. 3 is a flowchart of a target tracking method according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of a target tracking method according to an embodiment of the present invention;
fig. 5 is a flowchart three of a target tracking method according to an embodiment of the present invention;
fig. 6 is a fourth flowchart of a target tracking method according to an embodiment of the present invention;
fig. 7 is a fifth flowchart of a target tracking method according to an embodiment of the present invention;
fig. 8 is a sixth flowchart of a target tracking method according to an embodiment of the present invention;
FIG. 9 is a diagram illustrating an effect of determining candidate regions by the target tracking apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram illustrating a process of classifying image blocks by the target tracking apparatus according to an embodiment of the present invention;
fig. 11 is a seventh flowchart of a target tracking method according to an embodiment of the present invention;
fig. 12 is a flowchart eight of a target tracking method according to an embodiment of the present invention;
fig. 13 is a ninth flowchart of a target tracking method according to an embodiment of the present invention;
FIG. 14 is a diagram illustrating a process of updating a detection classification model according to an embodiment of the present invention;
fig. 15 is a first schematic structural diagram of a target tracking apparatus according to an embodiment of the present invention;
fig. 16 is a schematic structural diagram of a target tracking apparatus according to an embodiment of the present invention;
fig. 17 is a schematic structural diagram of a target tracking apparatus according to an embodiment of the present invention;
fig. 18 is a schematic structural diagram of a target tracking apparatus according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Target tracking refers to the process of following a specific target in a video image in real time using image processing and computer vision techniques. Many target tracking methods exist; among them, the Tracking-Learning-Detection (TLD) method offers strong performance. The TLD method comprises three modules: a tracking module responsible for tracking the target, a detection module responsible for finding the target, and a learning module responsible for integrating the tracking module's result with the target found by the detection module into the final target tracking result, and for learning from both so as to update the parameters of the tracking and detection modules. During target tracking, the three modules run simultaneously and complement each other, bringing target tracking to a better state. For ease of understanding, fig. 1 draws the integration responsibility of the learning module as a separate, virtual integration module.
As shown in fig. 1, when the TLD method is used for tracking, the target to be tracked is first determined by an initial tracking box, and a video frame is read in. The tracking module and the detection module generate positive and negative samples and send them to the learning module and the integration module. The integration module fuses the samples, determines the target tracking result, and outputs the tracking box; it also forwards the integrated samples to the learning module, which learns from the positive and negative samples together with the integrated samples and uses the resulting target model to update the parameters of the tracking and detection modules, ensuring the accuracy of subsequent tracking.
When the TLD method is used for tracking, the detection module slides a window over the video frame according to a global scanning strategy, partitioning it into blocks, and detects each resulting image block. Experiments show that for a video frame with a resolution of 320 × 240 and a target bounding box of size 20 × 20, roughly 96,000 image blocks are generated. Of these nearly one hundred thousand blocks, only a small fraction contain the foreground target; the vast majority contain only background.
Given this problem with the TLD target tracking method, the basic idea of the embodiment of the invention is that the target tracking device uses a region prediction algorithm to eliminate the redundant image blocks in a video frame that contain only background, thereby reducing the redundancy of the image blocks to be classified and improving the real-time performance of target tracking.
Example one
Based on the idea of the foregoing embodiment of the present invention, an embodiment of the present invention provides a target tracking method, which may include, referring to fig. 2:
s101, loading an ith frame of video, and acquiring a target tracking result of the ith-1 frame of video and a detection classification model corresponding to the ith-1 frame of video; where i is not equal to 1.
The target tracking method provided by the embodiment of the invention processes the video image, and the video image is composed of video frames, so that when the target tracking device tracks the target of the video image, each frame of video frame in the video image is processed substantially. Because the target tracking device needs to use the target tracking result corresponding to the previous frame of video of the current video frame and the detection classification model corresponding to the previous frame of video of the current video frame to process the current video frame to obtain the target tracking result and update the detection classification model, when the target tracking device performs target tracking, the ith frame of video corresponding to the current moment needs to be loaded first, and then the previous frame of video of the ith frame of video, namely the target tracking result of the ith-1 frame of video and the detection classification model corresponding to the ith-1 frame of video are obtained in the target tracking device. In the target tracking process, the 1 st frame of video is usually used for initialization, and only after a target object to be tracked is selected on the 1 st frame of video through the clicking operation of the mouse, the target tracking device can automatically load the rest of video frames and obtain a target tracking result and a detection classification model corresponding to the previous frame of video of the current video frame, that is, when the target tracking device performs target tracking, the operations performed on the first frame of the video image and the non-first frame of the video image are different, so in the embodiment of the invention, i is not equal to 1.
It should be noted that the i-th frame is the frame the target tracking device needs to load at the current moment, and the (i-1)-th frame is the frame immediately before it.
In the embodiment of the invention, the target tracking result of the (i-1)-th frame is the final target tracking result obtained after the target tracking device has processed the (i-1)-th frame. Whenever the device tracks the target in a video frame, it obtains the corresponding target tracking result and stores it in the device.
Specifically, the target tracking result of the (i-1)-th frame is the target tracking object finally determined by the device in that frame. For example, if the device has determined that pedestrian A is the target tracking object, the target tracking result of the (i-1)-th frame is the pedestrian A finally determined in that frame after processing.
In the embodiment of the invention, the detection classification model corresponding to the (i-1)-th frame is the model corresponding to the frame immediately before the i-th frame. Whenever the device tracks the target in a video frame, it obtains the corresponding detection classification model and stores it in the device for subsequent processing.
Optionally, the target tracking apparatus in the embodiment of the present invention may be an electronic device with computing capability, such as a server, a personal computer, and the like, and the embodiment of the present invention is not limited in particular here.
S102, tracking the i-th frame according to the target tracking result of the (i-1)-th frame to obtain a temporary tracking result.
After obtaining the target tracking result of the (i-1)-th frame, the target tracking device can track the i-th frame according to that result and take the tracking output of the i-th frame as the temporary tracking result.
In the embodiment of the invention, the temporary tracking result is an intermediate result obtained by processing the i-th frame; after obtaining it, the target tracking device still needs to perform a series of operations to arrive at the final target tracking result.
Specifically, the target tracking device determines tracking feature points according to the target tracking result of the (i-1)-th frame, searches for the position information of those feature points in the i-th frame, and finally determines the temporary tracking result from that position information.
The temporary tracking result is the target tracking object determined by the target tracking device from the i-th frame. For example, if the device has determined that pedestrian A is the target tracking object, the temporary tracking result of the i-th frame is the pedestrian A that the tracking module finds in the i-th frame.
It should be noted that when the target tracking device determines the target tracking object in the i-th frame using the target tracking result of the (i-1)-th frame, the device considers the temporary tracking result to exist and takes the object determined from the i-th frame as the temporary tracking result. In practice, however, the device cannot always determine the target tracking object from a video frame; when it fails to determine the object in the i-th frame using the (i-1)-th frame's result, the temporary tracking result is considered not to exist.
For example, if the target tracking result of the (i-1)-th frame is pedestrian A, then when the device finds pedestrian A in the i-th frame using that result, it considers the temporary tracking result to exist and takes the pedestrian A determined from the i-th frame as the temporary tracking result; when it cannot find pedestrian A in the i-th frame, it considers the temporary tracking result not to exist.
S103, obtaining a prediction region from the ith frame of video by using a region prediction algorithm, and detecting the prediction region according to a detection classification model corresponding to the ith-1 frame of video to obtain a temporary detection result; the prediction area is used for representing an area with the redundant part removed in the ith frame of video.
The target tracking device eliminates redundant parts only containing background images in the ith frame of video by using a region prediction algorithm, and then processes the residual image parts containing foreground targets to obtain a prediction region in the ith frame of video. Then, the target tracking device performs detection operation on the prediction region by using the detection classification model corresponding to the i-1 frame of video obtained before to obtain a detection result corresponding to the i-frame of video as a temporary detection result.
Specifically, the target tracking device determines a first area corresponding to a target tracking result of the i-1 frame of video; then, the target tracking device determines at least one moving object from the ith frame of video by using a region detection algorithm, and determines at least one second region according to the at least one moving object; then, second areas meeting the preset area screening relation with the first area are removed from at least one second area, and candidate areas are determined; finally, the target tracking device performs expansion operation on the candidate area according to a preset area expansion rule to obtain a prediction area; the preset region expansion rule is used for representing that the width and the height of the candidate region are expanded proportionally.
It should be noted that the region prediction algorithm in the embodiment of the present invention can separate the background image from the foreground objects in a moving state in a video frame, so that the target tracking device can determine and remove the redundant portion of the video frame that contains only the background image.
Optionally, the region prediction algorithm in the embodiment of the present invention may adopt the Visual Background Extraction algorithm (ViBe), or any other algorithm capable of achieving the same purpose; the embodiment of the present invention is not limited herein.
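ViBe keeps a set of past samples per pixel; as a minimal stand-in that illustrates the same foreground/background separation the region prediction step relies on, the following hedged sketch uses simple background differencing (the function name and threshold are assumptions, not part of the patent):

```python
import numpy as np

def foreground_mask(frame, background, thresh=25):
    """Mark pixels that differ from the background model as foreground.

    A crude stand-in for ViBe: real ViBe maintains per-pixel sample
    sets, but the foreground/background split it produces is consumed
    the same way by the region prediction step.
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > thresh

# Toy example: a static background with one bright moving "object".
background = np.zeros((8, 8), dtype=np.uint8)
frame = background.copy()
frame[2:5, 3:6] = 200                      # 3x3 moving object
mask = foreground_mask(frame, background)
print(mask.sum())                          # → 9 foreground pixels
```

The connected foreground pixels in `mask` would then yield the moving-object regions used as second areas.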
After obtaining the prediction region from the i-th frame video, the target tracking device can detect the prediction region according to the detection classification model corresponding to the i-1 th frame video obtained before, and obtain a temporary detection result.
Specifically, the target tracking device divides the prediction area into a plurality of image blocks according to a preset image block generation strategy; and then, the target tracking device detects the plurality of image blocks by using a detection classification model corresponding to the i-1 frame of video to obtain a temporary detection result.
The temporary detection result is an intermediate result obtained by the target tracking apparatus performing detection processing on the image block into which the prediction region is divided by using the detection classification model corresponding to the i-1 th frame video.
Specifically, the target tracking apparatus takes all image blocks containing the target tracking object, detected from among the plurality of image blocks into which the prediction area is divided, as the temporary detection result; in this case the temporary detection result is considered to exist. In practical applications, however, the target tracking apparatus may fail to detect any image block containing the target tracking object, in which case the temporary detection result is considered not to exist.
For example, if the target tracking device determines that the pedestrian a is the target tracking object, when the target tracking device detects an image block including the pedestrian a from a plurality of image blocks divided by the prediction area by using the detection classification model corresponding to the i-1 th frame of video, the target tracking device takes the detected image block as a temporary detection result and considers that the temporary detection result exists; and when the target tracking device does not detect an image block containing the pedestrian A from the plurality of image blocks divided by the prediction area by using the detection classification model corresponding to the i-1 frame video, the target tracking device considers that the temporary detection result does not exist.
And S104, integrating the temporary tracking result and the temporary detection result, and determining the target tracking result of the ith frame of video.
After the temporary tracking result and the temporary detection result are determined, the target tracking device needs to integrate the temporary tracking result and the temporary detection result, and comprehensively determines the final target tracking result of the ith frame of video according to the temporary tracking result and the existence condition of the temporary detection result.
Specifically, when a temporary tracking result exists and a temporary detection result exists, the target tracking device clusters the temporary detection result to obtain a clustering result, and weights the clustering result and the temporary tracking result to obtain a target tracking result of the ith frame of video; when the temporary tracking result exists and the temporary detection result does not exist, the target tracking device takes the temporary tracking result as the target tracking result of the ith frame of video; when the temporary tracking result does not exist and the temporary detection result exists, the target tracking device clusters the temporary detection result to obtain a clustering result, and the clustering result is used as the target tracking result of the ith frame of video; and when the temporary tracking result does not exist and the temporary detection result does not exist, the target tracking device determines that the target tracking result of the ith frame of video is an invalid result.
It should be noted that the invalid result means that the target tracking apparatus does not obtain the final target tracking result from the ith frame of video, that is, the target tracking apparatus fails to track the target on the ith frame of video.
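The four-case integration of S104 can be sketched as follows; the clustering and weighting functions are caller-supplied stand-ins (simple box averaging, an assumption for illustration), since the patent describes them separately:

```python
def integrate(temp_track, temp_detect, cluster, weight):
    """Four-case integration of S104 (a sketch): both results present,
    only tracking, only detection, or neither (invalid result)."""
    if temp_track is not None and temp_detect is not None:
        return weight(cluster(temp_detect), temp_track)
    if temp_track is not None:
        return temp_track
    if temp_detect is not None:
        return cluster(temp_detect)
    return None            # invalid result: tracking failed on this frame

# Hypothetical stand-ins: cluster averages detected boxes (x, y, w, h),
# weight averages the clustering result with the temporary tracking result.
avg = lambda boxes: tuple(sum(v) / len(boxes) for v in zip(*boxes))
cluster = lambda dets: avg(dets)
weight = lambda a, b: avg([a, b])

print(integrate((10, 10, 20, 20),
                [(12, 12, 20, 20), (14, 14, 20, 20)],
                cluster, weight))            # → (11.5, 11.5, 20.0, 20.0)
print(integrate(None, None, cluster, weight))  # → None (invalid result)
```

When only one of the two intermediate results exists, it is passed through unchanged, matching the second and third cases of S104.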
S105, continuously loading the (i + 1) th frame of video to track the target until i is equal to N, and finishing target tracking; where N is the total number of frames of the video.
After the target tracking result of the i-th frame video is determined, the target tracking device automatically loads the next frame, i.e. the (i+1)-th frame video, for target tracking, and the above processing steps are repeated until the last video frame has been processed, at which point the target tracking process ends.
In the embodiment of the invention, the target tracking device removes the redundant image blocks containing only the background image according to the region prediction algorithm, so that only image blocks containing foreground targets need to be detected. This greatly reduces the redundancy of the image blocks to be classified, ensuring target tracking accuracy and robustness while improving the real-time performance of target tracking.
Example two
Based on the same inventive concept as the first embodiment, referring to fig. 3 based on fig. 2, before loading the ith frame of video, acquiring the target tracking result of the ith-1 frame of video, and the detection classification model corresponding to the ith-1 frame of video, that is, before S101, the method may further include: S106-S107, as follows:
S106, loading an initial video frame; wherein the initial video frame is used to characterize the first frame of the video.
Because the target tracking object to be tracked is determined through a mouse click operation on the 1st frame of the video image, i.e. the first video frame, the target tracking device needs to load the initial video frame first; only after the initialization operation on the initial video frame is completed can subsequent video frames be loaded to start target tracking.
It should be noted that, after loading the initial frame, the target tracking apparatus interrupts the target tracking process so that the initial frame is displayed statically on the display interface, allowing the target tracking object to be determined on the initial frame.
S107, receiving initialization operation, and determining a target tracking result of the initial video frame and a detection classification model corresponding to the initial video frame according to the initialization operation.
After the initial video frame is loaded, the target tracking device receives the initialization operation, takes the target tracking object determined by the initialization operation as the target tracking result of the initial video frame, and determines the detection classification model corresponding to the initial video frame according to that target tracking object. The target tracking device can then load subsequent video frames: it performs tracking processing on the i-th frame video according to the target tracking result of the i-1-th frame video, performs detection processing on the i-th frame video according to the detection classification model corresponding to the i-1-th frame video, integrates the temporary tracking result and the temporary detection result to determine the target tracking result of the i-th frame video, and feeds the target tracking result of the i-th frame video into the cycle for processing the (i+1)-th frame video.
In the embodiment of the present invention, the target tracking apparatus may determine an area on the initial video frame according to the received initialization operation, and use the image content located in the area as the target tracking object to be tracked.
It should be noted that the target tracking device determines the target tracking result of the initial video frame according to the initialization operation, which can be regarded as that the target tracking device initializes the tracking feature point by using the initialization operation on the initial video frame; the target tracking device determines a detection classification model corresponding to the initial video frame according to the initialization operation, which can be regarded as that the target tracking device initializes the detection classification model by using the initialization operation on the initial video frame.
For example, fig. 4 is a schematic flowchart of a target tracking method according to an embodiment of the present invention. In fig. 4, after the target tracking process starts, an initialization operation is first received to initialize the first frame of the video image. The target tracking apparatus then initializes the tracking feature points and the detection classification model according to the initialization operation. After initialization is completed, the target tracking apparatus loads the next frame of image, processes the loaded video frame according to the tracking feature points and the detection classification model, integrates the obtained temporary detection result and temporary tracking result, sends the integrated result into a learning module for Positive-Negative sample Learning (PN Learning), and outputs a target tracking result. Finally, the target tracking device judges whether the current video frame is the last frame: if not, it continues to load the next frame of video for target tracking; if it is the last frame of the video image, the target tracking device ends the target tracking.
In the embodiment of the invention, the target tracking device can receive the initialization operation of the initial video frame, determine the target tracking result of the initial video frame and the detection classification model corresponding to the initial video frame according to the initialization operation, and then load the subsequent video frame for processing, so that the target tracking device can determine the target tracking object to be tracked according to the initialization operation of a user, thereby realizing the target tracking of the target tracking object in the subsequent video frame.
In some embodiments, based on fig. 2 and referring to fig. 5, after integrating the temporary tracking result and the temporary detection result and determining the target tracking result of the ith frame of video, continuing to load the (i + 1) th frame of video for target tracking until i is equal to N, before completing the target tracking, that is, after S104, before S105, the method may further include: s108, the following steps are carried out:
and S108, displaying the target tracking result of the ith frame of video on a display interface.
After the target tracking device integrates the temporary tracking result and the temporary detection result to obtain the final target tracking result of the ith frame of video, the target tracking result can be displayed on a display interface, so that the target tracking result can be displayed more intuitively.
It should be noted that, in the embodiment of the present invention, when the target tracking device displays the target tracking result of the ith frame of video on the display interface, a rectangular area is determined according to the height and width of the pixel area occupied by the target tracking result of the ith frame of video, and when the target tracking device displays the target tracking result, the rectangular area is used to frame the target tracking result, so as to display the target tracking result more intuitively.
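As a minimal sketch of framing the target tracking result with a rectangle on the display (the array-based canvas and function name are assumptions; a real implementation would draw on the display interface):

```python
import numpy as np

def draw_box(img, x, y, w, h, value=255):
    """Frame the target tracking result with a rectangle outline whose
    size comes from the height and width of the result's pixel area."""
    img[y, x:x + w] = value          # top edge
    img[y + h - 1, x:x + w] = value  # bottom edge
    img[y:y + h, x] = value          # left edge
    img[y:y + h, x + w - 1] = value  # right edge
    return img

canvas = np.zeros((10, 10), dtype=np.uint8)
draw_box(canvas, 2, 3, 5, 4)                 # box at (2, 3), 5 wide, 4 high
print(int(canvas.sum() // 255))              # → 14 perimeter pixels
```

Overlaying such a rectangle each frame is what makes the tracked object visible in real time on the display interface.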
In the embodiment of the invention, the target tracking device can display the target tracking result of the ith frame of video on the display interface after integrating the temporary tracking result and the temporary detection result to obtain the target tracking result of the ith frame of video, so that the target tracking device can display the tracked target tracking object in real time when the target tracking device performs target tracking processing on the video image, and the target tracking process can be displayed intuitively.
EXAMPLE III
Based on the same inventive concept as the first embodiment, referring to fig. 6, in the first embodiment, according to the target tracking result of the i-1 th frame of video, the tracking processing is performed on the i-th frame of video to obtain a temporary tracking result, that is, the specific implementation process of S102 in the first embodiment may include: S1021-S1023, as follows:
and S1021, determining tracking characteristic points according to the target tracking result of the i-1 frame video.
After obtaining the target tracking result of the i-1 frame video, the target tracking device can determine a plurality of tracking feature points according to the target tracking result of the i-1 frame video, and further determine a temporary tracking result according to the tracking feature points.
In the embodiment of the invention, the target tracking result of the i-1-th frame video is obtained by integrating the temporary tracking result and the temporary detection result corresponding to the i-1-th frame video, so it is more accurate than the temporary tracking result obtained by the target tracking device from processing the i-1-th frame video alone. Therefore, determining the tracking feature points from the target tracking result of the i-1-th frame video describes the feature points better, and allows the target tracking device to update the target tracking object more reliably in subsequent processing.
Optionally, in the embodiment of the present invention, the target tracking device may determine the tracking feature point corresponding to the target tracking result of the i-1 th frame of video by using an optical flow tracking method, for example, a median optical flow tracker, or may determine the tracking feature point corresponding to the target tracking result of the i-1 th frame of video by using other methods capable of achieving the same purpose, which is not limited in the embodiment of the present invention.
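A median-flow or pyramidal Lucas-Kanade tracker would normally be used for S1022 (e.g. via an optical flow library); as a self-contained illustration of "find the tracking feature point's position in the i-th frame", the following toy tracker does exhaustive patch matching instead — a stand-in, not the patent's method, with all names and window sizes assumed:

```python
import numpy as np

def track_point(prev, curr, pt, patch=1, search=3):
    """Find a feature point's new position by exhaustive patch matching:
    search a small window of the current frame for the patch that best
    matches the neighbourhood of `pt` in the previous frame."""
    y, x = pt
    tmpl = prev[y - patch:y + patch + 1, x - patch:x + patch + 1].astype(float)
    best, best_pos = None, pt
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ny, nx = y + dy, x + dx
            cand = curr[ny - patch:ny + patch + 1,
                        nx - patch:nx + patch + 1].astype(float)
            if cand.shape != tmpl.shape:
                continue                       # window fell off the frame
            err = ((cand - tmpl) ** 2).sum()   # sum of squared differences
            if best is None or err < best:
                best, best_pos = err, (ny, nx)
    return best_pos

prev = np.zeros((12, 12)); prev[4:7, 4:7] = 1.0   # feature centred at (5, 5)
curr = np.zeros((12, 12)); curr[6:9, 6:9] = 1.0   # object moved by (+2, +2)
print(track_point(prev, curr, (5, 5)))            # → (7, 7)
```

Repeating this for every tracking feature point yields the position information that S1023 turns into the temporary tracking result.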
And S1022, searching the position information of the tracking characteristic points in the ith frame of video.
After the target tracking device determines the tracking feature points corresponding to the target tracking result of the i-1 th frame of video, the target tracking device needs to find the tracking feature points in the i-th frame of video and determine the position information of the tracking feature points in the i-th frame of video.
It should be noted that, in the embodiment of the present invention, the position information of the tracking feature points refers to the position information, in the i-th frame video, of all tracking feature points determined by the target tracking device.
And S1023, determining a temporary tracking result by using the position information.
After the target tracking device obtains the position information of all the tracking feature points in the ith frame of video, the temporary tracking result can be determined by using the position information of the feature tracking points.
It should be noted that the position information of the tracking feature points refers to the position information of all the tracking feature points determined in the i-th frame video. Therefore, the target tracking apparatus can determine a minimum area containing all the tracking feature points based on this position information, and take the image content in this area as the temporary tracking result.
In the embodiment of the invention, the target tracking device determines the tracking characteristic points according to the target tracking result of the i-1 th frame video and determines the position information of the tracking characteristic points in the i-th frame video so as to determine the temporary tracking result of the i-th frame video.
In the first embodiment, a region prediction algorithm is used to obtain a prediction region from the ith frame of video, and the prediction region is detected according to a detection classification model corresponding to the i-1 th frame of video to obtain a temporary detection result, that is, as shown in fig. 7, a specific implementation process of S103 in the first embodiment may include: S1031-S1036, as follows:
and S1031, determining a first area corresponding to the target tracking result of the i-1 frame video.
Since the target tracking device has already obtained the target tracking result of the i-1-th frame video, it can determine a rectangular area from that result and the height and width of the pixel area it occupies in the i-1-th frame video, and take this rectangular area as the first area corresponding to the target tracking result of the i-1-th frame video.
It should be noted that, because there is only one target tracking result of the i-1-th frame video, the target tracking device determines only one corresponding first area from that result.
S1032, determining at least one moving object from the ith frame of video by using a region prediction algorithm, and determining at least one second region according to the at least one moving object.
After determining a first region corresponding to a target tracking result of an i-1 frame video, the target tracking device can separate a background image and a foreground target in the i frame video by using a region prediction algorithm, determine at least one moving object from the i frame video, and determine at least one second region corresponding to the at least one moving object.
Specifically, the target tracking device extracts an object in a motion state in the ith frame of video as a foreground target by using a region prediction algorithm, and uses the remaining image part as a background image.
It should be noted that, since there is a high possibility that more than one object is in motion in the ith frame of video, the target tracking apparatus can determine at least one moving object by using the region prediction algorithm.
After the target tracking device determines at least one moving object, the second area corresponding to each moving object can be determined according to the height and width of the pixel area occupied by that moving object. Since the target tracking device can determine at least one moving object, it can correspondingly determine at least one second area.
S1033, removing the second area which meets the preset area screening relation with the first area from the at least one second area, and determining the candidate area.
The target tracking device can judge the first area and the second area after determining the first area and the at least one second area, and when the target tracking device judges that the second area meeting the preset area screening relation with the first area exists, the target tracking device can reject the second area so as to determine the candidate area.
Specifically, referring to fig. 8, in the embodiment of the present invention, a second region that satisfies a preset region screening relationship with a first region is removed from at least one second region, and a candidate region is determined, which may be determined according to S1033a-S1033c, as follows:
and S1033a, acquiring a first size corresponding to the first area.
Since the first area is determined by the target tracking apparatus from the target tracking result of the i-1-th frame video together with the height and width of the pixel area it occupies in that frame, the target tracking apparatus can obtain the height and width of the first area and record them as the first size.
Illustratively, the target tracking device determines the height and width of the pixel area occupied in the i-1-th frame video according to the target tracking result of the i-1-th frame video, and further determines the corresponding first area BB_last_frame. The target tracking apparatus then acquires the height H_last_frame and the width W_last_frame of BB_last_frame, and takes H_last_frame and W_last_frame as the first size.
S1033b, obtaining at least one second size corresponding to the at least one second area.
Since the target tracking device can determine at least one moving object in the i-th frame video using the region prediction algorithm, and hence at least one second region, obtaining the second sizes means obtaining the heights and widths of all the determined second regions; the target tracking device therefore obtains at least one second size corresponding to the at least one second region.
Illustratively, the target tracking device obtains N moving objects in the i-th frame video by using the region prediction algorithm, and determines the N second regions corresponding to the N moving objects, namely T_candidate_region = {BB_0, BB_1, …, BB_N}, where BB_i is the i-th second region corresponding to the i-th moving object, and the height and width of BB_i are H_i and W_i respectively. At this time, the at least one second size obtained by the target tracking device is H_i, W_i.
S1033c, removing second areas corresponding to second sizes, of which the first sizes meet the preset area screening relation, from the at least one second area, and determining the remaining second areas as candidate areas.
After obtaining the first size and the second sizes, the target tracking device judges whether each second size satisfies the preset area screening relation with the first size. When a second size satisfying the relation with the first size is found among the at least one second size, the target tracking device rejects the corresponding second area. After screening the at least one second area one by one in this way, the remaining second areas, which do not satisfy the preset area screening relation, are determined as the candidate areas.
It should be noted that, when rejecting second areas, the target tracking apparatus compares the second sizes of all second areas with the first size one by one, and rejects every second area whose second size satisfies the preset area screening relation with the first size, so as to screen the candidate areas out of the at least one second area obtained.
Optionally, the preset region screening relationship may be set by a developer according to an actual situation, as long as the purpose of screening the candidate region from the determined at least one second region can be achieved, and the embodiment of the present invention is not specifically limited herein.
Illustratively, after determining the first size H_last_frame, W_last_frame and the at least one second size H_i, W_i, the target tracking apparatus judges whether H_last_frame, W_last_frame and H_i, W_i satisfy any sub-formula of the preset area screening relation, and rejects the second areas whose H_i, W_i satisfy a sub-formula. After all second areas satisfying a sub-formula have been rejected, the remaining second areas are determined as the candidate areas; the candidate areas determined by the target tracking device are T_candidate_region = {BB_0, BB_1, …, BB_M}, where N ≥ M.
[The sub-formulas of the preset area screening relation appear in the source only as a formula image (Figure BDA0002164147130000181) comparing H_i, W_i with H_last_frame, W_last_frame.]
At this point, the target tracking apparatus completes the process of determining the candidate region.
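The exact screening sub-formulas survive in the source only as a formula image, so as an illustration of this screening step the following sketch applies a hypothetical rule (the function name and the 2x ratio are assumptions): reject any second region whose height or width differs from the first region's by more than the ratio.

```python
def screen_regions(first_size, second_sizes, ratio=2.0):
    """Keep only second regions whose size is comparable to the first
    region's size; rejected regions are those satisfying a (hypothetical)
    screening sub-formula, i.e. too large or too small in either dimension."""
    h0, w0 = first_size
    keep = []
    for h, w in second_sizes:
        if h > h0 * ratio or h < h0 / ratio or w > w0 * ratio or w < w0 / ratio:
            continue                 # satisfies a screening sub-formula: reject
        keep.append((h, w))
    return keep

# First region 40x80 (e.g. a car); candidates: a car-sized box (kept)
# and a pedestrian-sized box (screened out).
print(screen_regions((40, 80), [(44, 76), (90, 30)]))   # → [(44, 76)]
```

Whatever the patent's actual sub-formulas are, the control flow is the same: every second size is tested against the first size, and only the survivors become candidate areas.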
S1034, according to a preset region expansion rule, performing expansion operation on the candidate region to obtain a prediction region; the preset region expansion rule is used for representing that the width and the height of the candidate region are expanded proportionally.
After the target tracking device determines the candidate region, in order to facilitate image blocking operation in subsequent processing, the target tracking device also expands the candidate region according to a preset region expansion rule to obtain a prediction region, and then performs image blocking operation on the prediction region.
The target tracking apparatus performs the expansion operation on the candidate region, which is to expand the area of the pixel region occupied by the candidate region so that the area of the pixel region occupied by the prediction region is several times the area of the pixel region occupied by the candidate region.
Optionally, in the embodiment of the present invention, the candidate region may be expanded according to a preset region expansion rule that expands the width and the height of the candidate region by 1 time, respectively, to generate a prediction region having an area 4 times that of the candidate region.
Optionally, the preset region expansion rule may also be set according to an actual situation, and the embodiment of the present invention is not limited herein.
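The default expansion rule above (doubling width and height, giving 4x the area) can be sketched as follows; the centre-anchored expansion and the clamping to the frame boundary are assumptions, since the patent does not state how the expanded region is positioned:

```python
def expand_region(x, y, w, h, frame_w, frame_h, scale=2.0):
    """Expand a candidate region about its centre so its width and height
    are each doubled (4x the area), clamped to the frame boundary."""
    cx, cy = x + w / 2, y + h / 2
    nw, nh = w * scale, h * scale
    nx = max(0, cx - nw / 2)
    ny = max(0, cy - nh / 2)
    nw = min(nw, frame_w - nx)       # keep the region inside the frame
    nh = min(nh, frame_h - ny)
    return (nx, ny, nw, nh)

print(expand_region(40, 40, 20, 10, 640, 480))   # → (30.0, 35.0, 40.0, 20.0)
```

The extra margin makes the subsequent sliding-window blocking operation cover the target even if it moves slightly between frames.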
As an example, fig. 9 shows the effect of the target tracking device determining a candidate region in the embodiment of the present invention. Assume the target tracking result of the i-1-th frame video obtained by the target tracking device is an automobile, and the first region (not shown) corresponding to the automobile in the i-1-th frame video has been determined. When the target tracking device obtains the i-th frame video, i.e. the leftmost video frame in fig. 9, it extracts the moving objects using the ViBe algorithm and obtains the regions where they are located: the automobile 1 and the pedestrian 2 shown in the second image from the left. The target tracking device then takes the height and width of the pixel region where the automobile 1 is located as the first size and the height and width of the pixel region where the pedestrian 2 is located as the second size, compares the first size with the second size, and eliminates the second areas meeting the preset area screening condition, i.e. areas differing obviously from the area of the target tracking result of the i-1-th frame video. At this time, as shown in the third image from the left in fig. 9, the pixel area where the pedestrian 2 is located, which differs greatly from the pixel area where the automobile 1 is located, is eliminated by the target tracking device. Then, as shown in the fourth image from the left in fig. 9, the target tracking device determines the remaining pixel area where the automobile 1 is located in the i-th frame video as the candidate area, and expands the width and height of the candidate area by 1 time each according to the preset area expansion rule to obtain the prediction area 3.
And S1035, dividing the prediction area into a plurality of image blocks according to a preset image block generation strategy.
After obtaining the predicted area, the target tracking apparatus may divide the predicted area into a plurality of image blocks according to a preset image block generation strategy, so as to detect a temporary detection result from the predicted area by using a detection classification model corresponding to the i-1 frame video frame.
The target tracking apparatus dividing the prediction area into a plurality of image blocks means dividing the prediction area into a series of image blocks of different sizes, where each image block is obtained by sliding a window step by step from the upper left of the prediction area according to the step size. In the embodiment of the invention, the size of the sliding window can be reduced and enlarged on the basis of the initial image block size, and the horizontal and vertical displacement steps can be varied on the basis of the initial image block size, so as to generate a series of image blocks of different sizes.
Optionally, in the embodiment of the present invention, the size of the initial image block may be set to 20 × 20, or may be set according to actual requirements, and the embodiment of the present invention is not limited herein.
Specifically, the target tracking device may divide the prediction area into a plurality of image blocks according to the image block generation strategy in table 1:
TABLE 1

Transformation description for image blocks | Step size coefficient
Shrinking                                   | 1.2
Amplification                               | 1.2
Horizontal displacement                     | Initial image block width × 10%
Vertical displacement                       | Initial image block height × 10%
In table 1, a series of image blocks of different sizes can be generated from the transform description, the step size coefficient, and the initial image block size. Assuming the initial image block size is 20 × 20, the target tracking device generates sliding windows by shrinking or enlarging the initial image block by a factor of 1.2, and moves each sliding window horizontally in steps of 10% of the initial image block width and vertically in steps of 10% of the initial image block height, so as to capture a series of image blocks of different sizes.
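The Table 1 strategy can be sketched as a simple generator; the choice of exactly three scales (one shrink, the initial size, one enlargement) and the integer rounding are assumptions for illustration:

```python
def generate_blocks(region_w, region_h, init=20, scale_step=1.2,
                    stride=0.10, scales=(-1, 0, 1)):
    """Enumerate sliding-window image blocks over a prediction region,
    following the Table 1 strategy: window sizes are the initial size
    shrunk/enlarged by 1.2x, and the step is 10% of the initial size."""
    blocks = []
    step = max(1, int(init * stride))            # 10% of the initial block size
    for s in scales:                             # shrink, initial, enlarge
        size = int(round(init * scale_step ** s))
        for y in range(0, region_h - size + 1, step):
            for x in range(0, region_w - size + 1, step):
                blocks.append((x, y, size, size))
    return blocks

blocks = generate_blocks(40, 40)
print(len(blocks))   # → 346 blocks over a 40x40 prediction region
```

Even for a small 40 × 40 prediction region this yields hundreds of blocks, which is why restricting the search to the prediction region (instead of the whole frame) matters for real-time performance.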
S1036, detecting the image blocks by using a detection classification model corresponding to the i-1 frame of video to obtain a temporary detection result.
After the target tracking device obtains a plurality of image blocks, the image blocks can be detected by using a detection classification model corresponding to the i-1 frame video to obtain a temporary detection result, so that the target tracking device integrates the temporary tracking result and the temporary detection result to obtain a final target classification result.
It should be noted that the detection classification model of the target tracking apparatus may include an image element variance classifier, an integrated classifier, and a nearest neighbor classifier, so that the target tracking apparatus may perform image element variance classification, integrated classification, and nearest neighbor classification on the generated plurality of image blocks, and determine a temporary detection result according to results of the three classifications.
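The three-stage cascade can be sketched as follows; the variance threshold and the stand-in ensemble/nearest-neighbour classifiers are hypothetical, since the patent does not specify their internals:

```python
import numpy as np

def variance_pass(block, min_var):
    """Stage 1: reject flat, background-like blocks whose pixel
    variance falls below a threshold."""
    return block.var() >= min_var

def cascade(blocks, min_var, ensemble, nearest_neighbour):
    """Run the three-stage detection cascade: a block must pass the
    variance, ensemble, and nearest-neighbour stages to be kept as
    part of the temporary detection result."""
    out = []
    for b in blocks:
        if variance_pass(b, min_var) and ensemble(b) and nearest_neighbour(b):
            out.append(b)
    return out

flat = np.zeros((4, 4))                    # background-like block, var = 0
textured = np.arange(16.0).reshape(4, 4)   # target-like block, var > 0
accepted = cascade([flat, textured], min_var=1.0,
                   ensemble=lambda b: b.mean() > 1,     # stand-in classifier
                   nearest_neighbour=lambda b: True)    # stand-in classifier
print(len(accepted))   # → 1 (only the textured block survives)
```

Ordering the stages from cheapest (variance) to most expensive (nearest neighbour) lets most background blocks be discarded early, which is the usual rationale for such a cascade.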
For example, fig. 10 shows a process in which the target tracking device classifies image blocks by using the detection classification model corresponding to the (i-1)-th frame of video in the embodiment of the present invention. In fig. 10, the target tracking device reads in the i-th frame of video, that is, the image 01 to be classified. The target tracking device then obtains the prediction region of the i-th frame of video by using ViBe region prediction 02, divides the prediction region into a plurality of image blocks, and sends the obtained image blocks to the image element variance classifier 03, the integrated classifier 04, and the nearest neighbor classifier 05 in the detection classification model for image element variance classification, integrated classification, and nearest neighbor classification, so as to obtain reliable image blocks 06, that is, the temporary detection result.
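The three-stage cascade of fig. 10 can be sketched as below. The thresholds, the use of pixel variance as the first stage, and the averaged-vote form of the integrated stage are illustrative assumptions of this sketch:

```python
# Hypothetical sketch of the three-stage detection cascade: an image block
# is a temporary detection only if it passes the variance, integrated, and
# nearest-neighbor stages in turn. All thresholds are assumed values.
import statistics

def variance_pass(patch, min_var=100.0):
    """Reject near-uniform (background-like) patches by pixel variance."""
    return statistics.pvariance(patch) >= min_var

def ensemble_pass(votes, threshold=0.5):
    """Average the posteriors voted by the base classifiers."""
    return sum(votes) / len(votes) >= threshold

def nn_pass(similarity, threshold=0.6):
    """Accept patches sufficiently similar to stored positive templates."""
    return similarity >= threshold

def classify_patch(patch, votes, similarity):
    """Cascade: each stage can only reject, never rehabilitate, a patch."""
    return (variance_pass(patch)
            and ensemble_pass(votes)
            and nn_pass(similarity))
```

Because each stage only filters, cheap tests (variance) run on every block while the expensive nearest-neighbor comparison runs only on the few survivors.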
In the embodiment of the invention, the target tracking device can remove the background image in the video image by using a region prediction algorithm to obtain the prediction region, and the prediction region is divided into a plurality of image blocks, so that the target tracking device only needs to detect and calculate the image blocks containing the foreground target through the detection module, and the real-time performance of target tracking is improved.
In some embodiments, referring to fig. 11, the step of integrating the temporary tracking result and the temporary detection result to determine the target tracking result of the i-th frame of video, that is, S104 in the first embodiment, may be specifically implemented as S1041-S1044, as follows:
S1041, when the temporary tracking result exists and the temporary detection result exists, clustering the temporary detection result to obtain a clustering result, and weighting the clustering result and the temporary tracking result to obtain a target tracking result of the ith frame of video.
When the target tracking device processes the i-th frame of video through the tracking module, the temporary tracking result may or may not exist; likewise, when the target tracking device processes the i-th frame of video through the detection module, the temporary detection result may or may not exist. Therefore, when integrating the temporary tracking result and the temporary detection result, the target tracking device needs to consider whether each of them exists. When the target tracking device determines that both a temporary tracking result and a temporary detection result exist, the temporary detection result obtained through the detection module may not be unique, so the target tracking device needs to cluster the temporary detection results to obtain a clustering result, and weight the clustering result and the temporary tracking result to obtain the final target tracking result of the i-th frame of video.
It should be noted that, in the embodiment of the present invention, clustering the temporary detection results means clustering, by an unsupervised method, all the temporary detection results detected by the detection module, that is, the image blocks containing the target tracking object, so that a clustering result that better describes the target tracking object can be obtained.
Optionally, when clustering is performed on the temporary detection result, a specific unsupervised method may be set according to actual requirements, and the embodiment of the present invention is not limited herein.
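As one illustrative choice of unsupervised method, detected boxes can be grouped greedily by overlap and each group reduced to its mean box. The IoU cutoff of 0.5 and the mean reduction are assumptions of this sketch, not requirements of the embodiment:

```python
# Illustrative unsupervised clustering of detected boxes (x, y, w, h):
# boxes whose IoU with a cluster's seed exceeds an assumed cutoff join the
# cluster, and each cluster is reduced to the mean of its member boxes.
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def cluster_boxes(boxes, iou_cut=0.5):
    clusters = []
    for box in boxes:
        for group in clusters:
            if iou(box, group[0]) >= iou_cut:  # compare against cluster seed
                group.append(box)
                break
        else:
            clusters.append([box])
    # reduce each cluster to its coordinate-wise mean box
    return [tuple(sum(v) / len(g) for v in zip(*g)) for g in clusters]
```

Overlapping detections of the same object collapse to a single representative box, while detections far apart remain separate clusters.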
It should be noted that the target tracking device weights the clustering result and the temporary tracking result by performing a weighted average of their coordinates and sizes, so as to obtain the final target tracking result of the i-th frame of video. During weighting, the temporary tracking result is given the larger weight.
Optionally, when the target tracking device weights the clustering result and the temporary tracking result, the weighted weight may be set according to actual conditions, and the embodiment of the present invention is not limited herein.
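The coordinate-and-size weighted average can be sketched as follows; the weight of 0.6 for the tracking result is an assumed value that merely satisfies the stated condition that the tracking result receives the larger weight:

```python
# Hedged sketch of the weighted fusion in S1041: the (x, y, w, h) boxes of
# the temporary tracking result and the clustering result are averaged,
# with the tracking result weighted more heavily (0.6 is an assumption).
def fuse_boxes(track_box, cluster_box, track_weight=0.6):
    """Weighted average of two (x, y, w, h) boxes; track_weight > 0.5."""
    cw = 1.0 - track_weight
    return tuple(track_weight * t + cw * c
                 for t, c in zip(track_box, cluster_box))
```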
And S1042, when the temporary tracking result exists and the temporary detection result does not exist, taking the temporary tracking result as the target tracking result of the ith frame of video.
When the target tracking device determines that a temporary tracking result exists and no temporary detection result exists, this indicates that the detection module cannot detect the target tracking object in the i-th frame of video. At this time, the target tracking device directly determines the temporary tracking result as the final target tracking result of the i-th frame of video.
And S1043, clustering the temporary detection result to obtain a clustering result when the temporary tracking result does not exist and the temporary detection result exists, and taking the clustering result as a target tracking result of the ith frame of video.
When the target tracking device determines that no temporary tracking result exists but a temporary detection result exists, this indicates that the tracking module cannot obtain the target tracking object in the i-th frame of video. At this time, the target tracking device clusters the detected temporary detection results to obtain a clustering result, and directly determines the clustering result as the final target tracking result of the i-th frame of video.
And S1044, when the temporary tracking result does not exist and the temporary detection result does not exist, determining that the target tracking result of the ith frame of video is an invalid result.
When the target tracking device determines that neither a temporary tracking result nor a temporary detection result exists, this indicates that the target tracking device cannot obtain the target tracking object from the i-th frame of video. At this time, the target tracking device determines the target tracking result to be an invalid result, and target tracking fails.
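The four cases S1041-S1044 reduce to a single dispatch. The helper parameters `cluster` and `fuse` stand for the clustering and weighting steps of S1041 and are placeholders of this sketch:

```python
# The four integration cases of S1041-S1044 as one dispatch. `track` is the
# temporary tracking box or None; `detections` is the (possibly empty) list
# of temporary detection boxes; `cluster`/`fuse` are supplied callables.
def integrate(track, detections, cluster, fuse):
    if track is not None and detections:
        return fuse(track, cluster(detections))   # S1041: weight both
    if track is not None:
        return track                              # S1042: tracking only
    if detections:
        return cluster(detections)                # S1043: detection only
    return None                                   # S1044: invalid result
```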
In the embodiment of the invention, the target tracking device can integrate the temporary tracking result and the temporary detection result so as to obtain the final target tracking result of the ith frame of video, so that the target tracking device can comprehensively consider various conditions when determining the target tracking result, and the accuracy and the robustness of target tracking are ensured.
EXAMPLE four
Based on the same inventive concept as the first embodiment, referring to fig. 2 and fig. 12, after the prediction region is obtained from the i-th frame of video by using the region prediction algorithm and detected according to the detection classification model corresponding to the (i-1)-th frame of video to obtain the temporary detection result, and before the temporary tracking result and the temporary detection result are integrated to determine the target tracking result of the i-th frame of video (that is, after S103 and before S104), the method may further include S109, as follows:
S109, updating the detection classification model corresponding to the (i-1)-th frame of video according to the temporary tracking result and the temporary detection result to obtain the detection classification model corresponding to the i-th frame of video.
After the target tracking device integrates the final target tracking result of the i-th frame of video, it can update the detection classification model corresponding to the (i-1)-th frame of video according to the temporary tracking result and the temporary detection result obtained from the i-th frame of video, so as to obtain the detection classification model corresponding to the i-th frame of video. In this way, when the (i+1)-th frame of video is loaded, the target tracking device can use the updated detection classification model for processing.
Specifically, the step of updating the detection classification model corresponding to the (i-1)-th frame of video according to the temporary tracking result and the temporary detection result to obtain the detection classification model corresponding to the i-th frame of video, that is, S109, may be specifically implemented as S1091-S1093, as shown in fig. 13, as follows:
S1091, evaluating the temporary tracking result of the i-th frame of video by using the temporary tracking result of the (i-1)-th frame of video.
After obtaining the temporary tracking result of the i-th frame of video, the target tracking device may use the temporary tracking result of the (i-1)-th frame of video to evaluate whether the temporary tracking result of the i-th frame of video is reliable.
Specifically, the target tracking object moves along a motion track in the video images; that is, the video images have a certain structural feature in the time domain, so the positions of the target tracking object in two adjacent video frames cannot be too far apart. Therefore, the target tracking device can predict, from the temporary tracking result of the (i-1)-th frame of video, the position where the target tracking object may appear in the i-th frame of video, and evaluate whether the temporary tracking result of the i-th frame of video is reliable according to the predicted position. When the temporary tracking result of the i-th frame of video is located at the predicted position, the target tracking device considers the temporary tracking result reliable; when the temporary tracking result of the i-th frame of video is far from the predicted position, the target tracking device considers it unreliable.
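This reliability check can be sketched as below. The constant-velocity prediction and the distance threshold are illustrative assumptions; the embodiment only requires that the new result not lie too far from the predicted position:

```python
# Illustrative reliability evaluation of S1091: predict the i-th frame
# target center from the (i-1)-th result under an assumed constant-velocity
# model, and accept the new temporary tracking result only if its center
# lies within an assumed radius of the prediction.
import math

def predict_center(prev_center, velocity):
    """Constant-velocity prediction of the target center."""
    return (prev_center[0] + velocity[0], prev_center[1] + velocity[1])

def tracking_reliable(prev_center, velocity, new_center, max_dist=30.0):
    """True if the new center is close enough to the predicted position."""
    px, py = predict_center(prev_center, velocity)
    return math.hypot(new_center[0] - px, new_center[1] - py) <= max_dist
```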
It should be noted that, when the target tracking device determines that the temporary tracking result of the ith frame of video does not exist, the target tracking device cannot evaluate the temporary tracking result of the ith frame of video, and at this time, the target tracking device directly uses the detection classification model corresponding to the i-1 th frame of video as the detection classification model corresponding to the ith frame of video, and ends the learning and updating.
S1092, when the temporary tracking result of the ith frame of video is reliable, correcting the temporary detection result of the ith frame of video according to the temporary tracking result of the ith frame of video to obtain a correction result.
When the target tracking device determines that the temporary tracking result of the i-th frame of video is reliable, it can correct the temporary detection result of the i-th frame of video to obtain a correction result comprising positive and negative samples, so that the detection classification model corresponding to the (i-1)-th frame of video can subsequently be updated.
Specifically, the target tracking device acquires the temporary detection result of the i-th frame of video and judges whether the temporary detection result contains image blocks that overlap the pixel area where the temporary tracking result of the i-th frame of video is located. When overlapping image blocks exist, the target tracking device takes the remaining image blocks that do not overlap the image area of the temporary tracking result as negative samples of the correction result; when no overlapping image blocks exist, the target tracking device cuts out an image block according to the temporary tracking result and takes that image block as a positive sample of the correction result.
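The sample correction can be sketched as below. The overlap cutoff of 0.2 is an assumed value, and the empty negative set in the no-overlap branch is an assumption of this sketch (the embodiment only specifies the positive sample in that case):

```python
# Hedged sketch of the P/N sample correction: detections overlapping the
# reliable tracking box confirm it, the non-overlapping remainder become
# negative samples, and if nothing overlaps, a patch cut at the tracking
# box becomes the positive sample. The 0.2 overlap cutoff is assumed.
def correct_samples(track_box, detections, overlaps):
    """detections: list of boxes; overlaps: their overlap with track_box."""
    overlapping = [d for d, o in zip(detections, overlaps) if o > 0.2]
    if overlapping:
        negatives = [d for d, o in zip(detections, overlaps) if o <= 0.2]
        return {"positive": [], "negative": negatives}
    # no detection overlaps the tracking result: cut a positive sample there
    return {"positive": [track_box], "negative": []}
```

The returned positive and negative samples then drive the update of the three classifiers in S1093.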
It should be noted that, when the temporary tracking result of the i-th frame of video is determined, a minimum area containing all the tracking feature points is determined according to the position information of all the tracking feature points, and the image content in that area is used as the temporary tracking result; therefore, the target tracking device can always cut out an image block according to the temporary tracking result and use it as a positive sample of the correction result.
Optionally, in the embodiment of the present invention, the temporary detection result of the ith frame video may be corrected according to the temporary tracking result of the ith frame video by using PN learning, or may be corrected by using other algorithms that can achieve the same purpose, which is not limited herein.
S1093, updating the detection classification model corresponding to the i-1 frame of video by using the correction result to obtain the detection classification model corresponding to the i frame of video.
After the correction result is obtained, the target tracking device can train and update the detection classification model corresponding to the (i-1)-th frame of video by using the positive and negative samples of the correction result to obtain the detection classification model corresponding to the i-th frame of video.
It should be noted that, since the detection classification model includes the image element variance classifier, the integrated classifier and the nearest neighbor classifier, updating the detection classification model corresponding to the i-1 th frame of video by using the positive and negative samples in the correction result means that the image element variance classifier, the integrated classifier and the nearest neighbor classifier are trained and updated, and the training results of the three classifiers are collectively used as the detection classification model corresponding to the i-th frame of video.
The embodiment of the present invention provides an example of the process of updating the detection classification model, as shown in fig. 14. The target tracking device first performs detection processing 02 and tracking processing 03 on the loaded i-th frame of video, that is, the image 01 to be classified. In the tracking processing 03, the target tracking device obtains the temporary tracking result of the i-th frame of video by using the median optical flow tracker 08. In the detection processing 02, the target tracking device extracts the background 04 by using the ViBe algorithm to obtain the prediction region, divides the prediction region into a plurality of image blocks, and processes the image blocks by using the image element variance classifier 05, the integrated classifier 06, and the nearest neighbor classifier 07 in the detection classification model to obtain the temporary detection result. Then, when the temporary tracking result exists, the target tracking device corrects the temporary detection result by using PN learning 11, performs learning processing 09 on the detection classification model corresponding to the (i-1)-th frame of video by using the correction result, and performs integration processing 10 on the temporary tracking result and the temporary detection result to obtain the target tracking result. When the temporary tracking result does not exist, the target tracking device directly uses the detection classification model corresponding to the (i-1)-th frame of video as the detection classification model corresponding to the i-th frame of video, and uses the clustering result of the temporary detection result as the target tracking result of the i-th frame of video.
In the embodiment of the invention, the target tracking device can update and train the detection classification model corresponding to the (i-1)-th frame of video by using the temporary tracking result and the temporary detection result of the i-th frame of video to obtain the detection classification model corresponding to the i-th frame of video, which can then be used for target tracking on the (i+1)-th frame of video. In this way, the target tracking device updates the detection classification model in time, so that the detection processing is more accurate and the accuracy of target tracking is further improved.
EXAMPLE five
Based on the same inventive concept as the first to fourth embodiments, as shown in fig. 15, an embodiment of the present invention provides a target tracking apparatus 1, which may include:
The acquisition module 10 is configured to load the i-th frame of video and acquire the target tracking result of the (i-1)-th frame of video and the detection classification model corresponding to the (i-1)-th frame of video, wherein i is not equal to 1; and is further configured to continue loading the (i+1)-th frame of video for target tracking until i is equal to N, and then end target tracking, where N is the total number of frames of the video.
And the tracking module 11 is configured to perform tracking processing on the ith frame of video according to the target tracking result of the ith-1 frame of video to obtain a temporary tracking result.
The detection module 12 is configured to obtain a prediction region from the i-th frame of video by using a region prediction algorithm, and detect the prediction region according to a detection classification model corresponding to the i-1 th frame of video to obtain a temporary detection result; and the prediction region is used for representing a region with a redundant part removed in the ith frame of video.
And an integrating module 13, configured to integrate the temporary tracking result and the temporary detection result, and determine a target tracking result of the i-th frame of video.
In some embodiments of the present invention, the obtaining module 10 may be further configured to load an initial video frame; wherein the initialization video frame is used to characterize the first frame of the video.
The tracking module 11 may be further configured to receive an initialization operation, and determine a target tracking result of the initial video frame according to the initialization operation.
The detection module 12 may be further configured to receive an initialization operation, and determine a detection classification model corresponding to the initial video frame according to the initialization operation.
In some embodiments of the present invention, the tracking module 11 is specifically configured to determine a tracking feature point according to a target tracking result of the i-1 th frame video; searching the position information of the tracking feature points in the ith frame of video; and determining the temporary tracking result by using the position information.
In some embodiments of the present invention, the detecting module 12 is specifically configured to determine a first area corresponding to a target tracking result of the i-1 th frame of video; determining at least one moving object from the ith frame of video by using the region prediction algorithm, and determining at least one second region according to the at least one moving object; removing second areas which meet a preset area screening relation with the first area from the at least one second area, and determining candidate areas; according to a preset region expansion rule, performing expansion operation on the candidate region to obtain the prediction region; the preset region expansion rule is used for representing that the width and the height of the candidate region are expanded according to a proportion; dividing the prediction area into a plurality of image blocks according to a preset image block generation strategy; and detecting the plurality of image blocks by using a detection classification model corresponding to the i-1 frame of video to obtain the temporary detection result.
In some embodiments of the present invention, the detecting module 12 is specifically configured to obtain a first size corresponding to the first area; acquiring at least one second size corresponding to the at least one second area; and removing second areas corresponding to second sizes, of which the first sizes meet the preset area screening relation, from the at least one second area, and determining the remaining second areas as candidate areas.
In some embodiments of the present invention, the integrating module 13 is specifically configured to cluster the temporary detection results to obtain a clustering result when the temporary tracking result exists and the temporary detection result exists, and weight the clustering result and the temporary tracking result to obtain a target tracking result of the i-th frame of video; when the temporary tracking result exists and the temporary detection result does not exist, take the temporary tracking result as the target tracking result of the i-th frame of video; when the temporary tracking result does not exist and the temporary detection result exists, cluster the temporary detection result to obtain a clustering result, and take the clustering result as the target tracking result of the i-th frame of video; and when the temporary tracking result does not exist and the temporary detection result does not exist, determine that the target tracking result of the i-th frame of video is an invalid result.
In some embodiments of the present invention, as shown in fig. 16, the target tracking apparatus 1 further includes a display module 14, configured to display the target tracking result of the ith frame of video on a current display interface.
In some embodiments of the present invention, as shown in fig. 17, the target tracking apparatus 1 further includes a learning module 15, where the learning module is configured to update the detection classification model corresponding to the i-1 th frame of video according to the temporary tracking result and the temporary detection result, so as to obtain the detection classification model corresponding to the i-th frame of video.
In some embodiments of the present invention, the learning module 15 is specifically configured to evaluate the temporary tracking result of the i-th frame of video by using the temporary tracking result of the (i-1)-th frame of video; when the temporary tracking result of the i-th frame of video is reliable, correct the temporary detection result of the i-th frame of video according to the temporary tracking result of the i-th frame of video to obtain a correction result; and update the detection classification model corresponding to the (i-1)-th frame of video by using the correction result to obtain the detection classification model corresponding to the i-th frame of video.
EXAMPLE six
Based on the same inventive concept of the first to fourth embodiments, fig. 18 is a schematic structural diagram of a target tracking device according to an embodiment of the present invention, and as shown in fig. 18, the target tracking device according to the present invention may include a processor 01 and a memory 02 storing executable instructions of the processor 01. Wherein, the processor 01 is configured to execute the executable target tracking instructions stored in the memory to implement the method in any one or more of the first to fourth embodiments.
In an embodiment of the present invention, the Processor 01 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a CPU, a controller, a microcontroller, and a microprocessor. It will be appreciated that the electronic devices used to implement the processor functions described above may be other devices, and embodiments of the present invention are not limited in particular. The terminal further comprises a memory 02, which memory 02 may be connected to the processor 01, wherein the memory 02 may comprise a high speed RAM memory, and may further comprise a non-volatile memory, such as at least two disk memories.
In practical applications, the Memory 02 may be a volatile Memory (volatile Memory), such as a Random-Access Memory (RAM); or a non-volatile Memory (non-volatile Memory), such as a Read-Only Memory (ROM), a flash Memory (flash Memory), a Hard Disk (Hard Disk Drive, HDD) or a Solid-State Drive (SSD); or a combination of the above types of memories and provides instructions and data to the processor 01.
In addition, each functional module in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.
Based on the understanding that the technical solution of the present embodiment essentially or a part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, and include several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to execute all or part of the steps of the method of the present embodiment. And the aforementioned storage medium includes: u disk, removable hard disk, read only memory, random access memory, magnetic or optical disk, etc. for storing program codes.
An embodiment of the invention provides a computer-readable storage medium storing an executable program, applied to a target tracking device, wherein when the program is executed by a processor, the method in any one or more of the first to fourth embodiments is implemented.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of implementations of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart block or blocks and/or flowchart block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks in the flowchart and/or block diagram block or blocks.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the description of the present invention and have no specific meaning in themselves. Thus, "module", "component", and "unit" may be used interchangeably.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (12)

1. A method of target tracking, the method comprising:
loading an i-th frame of video, and acquiring a target tracking result of an (i-1)-th frame of video and a detection classification model corresponding to the (i-1)-th frame of video; wherein i is not equal to 1;
tracking the ith frame of video according to the target tracking result of the ith-1 frame of video to obtain a temporary tracking result;
obtaining a prediction region from the ith frame of video by using a region prediction algorithm, and detecting the prediction region according to a detection classification model corresponding to the (i-1) th frame of video to obtain a temporary detection result; the prediction area is used for representing an area with a redundant part removed in the ith frame of video;
integrating the temporary tracking result and the temporary detection result to determine a target tracking result of the ith frame of video;
continuously loading the (i + 1) th frame of video for target tracking until the i is equal to N, and finishing target tracking; where N is the total number of frames of the video.
2. The method according to claim 1, wherein before the loading of the i-th frame of video, the acquiring of the target tracking result of the (i-1)-th frame of video, and the detection classification model corresponding to the (i-1)-th frame of video, the method further comprises:
loading an initial video frame; wherein the initial video frame is used for representing a first frame of a video;
and receiving initialization operation, and determining a target tracking result of the initial video frame and a detection classification model corresponding to the initial video frame according to the initialization operation.
3. The method according to claim 1, wherein the tracking processing on the i-th frame video according to the target tracking result of the i-1 th frame video to obtain a temporary tracking result comprises:
determining tracking characteristic points according to a target tracking result of the i-1 frame video;
searching the position information of the tracking feature points in the ith frame of video;
and determining the temporary tracking result by using the position information.
4. The method according to claim 1, wherein the obtaining a prediction region from the i-th frame of video by using a region prediction algorithm, and detecting the prediction region according to a detection classification model corresponding to the i-1 th frame of video to obtain a temporary detection result comprises:
determining a first area corresponding to a target tracking result of the i-1 frame video;
determining at least one moving object from the ith frame of video by using the region prediction algorithm, and determining at least one second region according to the at least one moving object;
removing second areas which meet a preset area screening relation with the first area from the at least one second area, and determining candidate areas;
according to a preset region expansion rule, performing expansion operation on the candidate region to obtain the prediction region; the preset region expansion rule is used for representing that the width and the height of the candidate region are expanded according to a proportion;
dividing the prediction area into a plurality of image blocks according to a preset image block generation strategy;
and detecting the plurality of image blocks by using a detection classification model corresponding to the i-1 frame of video to obtain the temporary detection result.
5. The method according to claim 4, wherein the removing, from the at least one second region, second regions that satisfy the preset region screening relation with the first region, to determine a candidate region comprises:
acquiring a first size corresponding to the first region;
acquiring at least one second size corresponding to the at least one second region;
and removing, from the at least one second region, second regions whose second sizes satisfy the preset region screening relation with the first size, and determining the remaining second regions as candidate regions.
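Claim 5 leaves the "preset region screening relation" abstract. One plausible reading, sketched below under that assumption, is a size-ratio window: second regions whose area is implausibly small or large relative to the previous target are removed, and the rest survive as candidate regions. The thresholds and names are illustrative.

```python
def screen_candidates(first_box, second_boxes, low=0.5, high=2.0):
    """Claim-5-style screening sketch (illustrative thresholds).

    first_box: (x, y, w, h) region of the frame i-1 target.
    second_boxes: regions derived from moving objects in frame i.
    Keeps only second regions whose area is within [low, high]
    times the first region's area.
    """
    first_size = first_box[2] * first_box[3]   # w * h of the first region
    candidates = []
    for box in second_boxes:
        ratio = (box[2] * box[3]) / first_size
        if low <= ratio <= high:
            candidates.append(box)             # remaining second region
    return candidates
```

A region a quarter of the previous target's area, or four times it, would be screened out; one at comparable scale is kept as a candidate.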
6. The method according to claim 1, wherein the integrating the temporary tracking result and the temporary detection result to determine the target tracking result of the i-th frame of video comprises:
when the temporary tracking result exists and the temporary detection result exists, clustering the temporary detection result to obtain a clustering result, and weighting the clustering result and the temporary tracking result to obtain the target tracking result of the i-th frame of video;
when the temporary tracking result exists and the temporary detection result does not exist, taking the temporary tracking result as the target tracking result of the i-th frame of video;
when the temporary tracking result does not exist and the temporary detection result exists, clustering the temporary detection result to obtain a clustering result, and taking the clustering result as the target tracking result of the i-th frame of video;
and when the temporary tracking result does not exist and the temporary detection result does not exist, determining that the target tracking result of the i-th frame of video is an invalid result.
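The four-way case analysis of claim 6 can be sketched directly. Here clustering is reduced to a per-coordinate mean of the detection boxes and weighting to a fixed linear blend; both are illustrative stand-ins for the unspecified clustering and weighting algorithms, and the `det_weight` value is an assumption.

```python
def integrate(tracking, detections, det_weight=0.5):
    """Claim-6-style integration sketch.

    tracking: temporary tracking box (x, y, w, h) or None.
    detections: possibly empty list of temporary detection boxes.
    Returns the target tracking result for frame i, or None
    (the 'invalid result' branch).
    """
    def cluster(boxes):
        # Stand-in clustering: per-coordinate mean of all boxes.
        n = len(boxes)
        return tuple(sum(b[k] for b in boxes) / n for k in range(4))

    if tracking is not None and detections:
        # Both exist: cluster the detections, then weight with tracking.
        c = cluster(detections)
        return tuple(det_weight * c[k] + (1 - det_weight) * tracking[k]
                     for k in range(4))
    if tracking is not None:
        return tracking              # only the tracking result exists
    if detections:
        return cluster(detections)   # only the detection result exists
    return None                      # neither exists: invalid result
```

With equal weights, a tracking box at (0, 0) and a single detection at (10, 10) integrate to a box at (5, 5).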
7. The method according to claim 1, wherein after the integrating the temporary tracking result and the temporary detection result to determine the target tracking result of the i-th frame of video, the method further comprises:
displaying the target tracking result of the i-th frame of video on a current display interface.
8. The method according to claim 1, wherein after the obtaining a prediction region from the i-th frame of video by using a region prediction algorithm and detecting the prediction region according to the detection classification model corresponding to the (i-1)-th frame of video to obtain a temporary detection result, and before the integrating the temporary tracking result and the temporary detection result to determine the target tracking result of the i-th frame of video, the method further comprises:
updating the detection classification model corresponding to the (i-1)-th frame of video according to the temporary tracking result and the temporary detection result to obtain the detection classification model corresponding to the i-th frame of video.
9. The method according to claim 8, wherein the updating the detection classification model corresponding to the (i-1)-th frame of video according to the temporary tracking result and the temporary detection result to obtain the detection classification model corresponding to the i-th frame of video comprises:
evaluating the temporary tracking result of the i-th frame of video by using the temporary tracking result of the (i-1)-th frame of video;
when the temporary tracking result of the i-th frame of video is reliable, correcting the temporary detection result of the i-th frame according to the temporary tracking result of the i-th frame of video to obtain a correction result;
and updating the detection classification model corresponding to the (i-1)-th frame of video by using the correction result to obtain the detection classification model corresponding to the i-th frame of video.
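Claim 9 specifies neither the reliability test nor the correction rule. The sketch below assumes a simple motion-continuity check (the new tracking result is reliable if it did not jump too far from frame i-1's result) and corrects each detection by averaging it with the tracking box; the `DetectionClassifier` stub, the `max_jump` threshold, and the averaging are all hypothetical.

```python
class DetectionClassifier:
    """Stub standing in for the patent's detection classification model."""
    def __init__(self):
        self.samples = []

    def update(self, corrected_boxes):
        # A real model would retrain or refresh itself here; the stub
        # just records the correction results it was updated with.
        self.samples.extend(corrected_boxes)

def update_model(model, prev_tracking, tracking, detections, max_jump=50.0):
    """Claim-9-style update sketch (illustrative reliability test).

    Evaluates frame i's temporary tracking result against frame i-1's;
    if reliable, corrects each temporary detection toward the tracking
    box and updates the model, yielding the model for frame i.
    """
    if tracking is None or prev_tracking is None:
        return model                 # nothing to evaluate against
    jump = (abs(tracking[0] - prev_tracking[0])
            + abs(tracking[1] - prev_tracking[1]))
    if jump > max_jump:
        return model                 # unreliable: keep the old model
    # Correction: average each detection box with the tracking box.
    corrected = [tuple((t + d) / 2 for t, d in zip(tracking, box))
                 for box in detections]
    model.update(corrected)          # model corresponding to frame i
    return model
```

A detection at (4, 4) corrected by a reliable tracking box at (2, 2) becomes a sample at (3, 3); a tracking result that leaps across the frame leaves the model untouched.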
10. An object tracking apparatus, characterized in that the apparatus comprises:
an acquisition module, which is used for loading the i-th frame of video and acquiring a target tracking result of the (i-1)-th frame of video and a detection classification model corresponding to the (i-1)-th frame of video, wherein i is not equal to 1; and is further used for continuing to load the (i+1)-th frame of video for target tracking until i is equal to N, whereupon target tracking is finished, wherein N is the total number of frames of the video;
a tracking module, which is used for tracking the i-th frame of video according to the target tracking result of the (i-1)-th frame of video to obtain a temporary tracking result;
a detection module, which is used for obtaining a prediction region from the i-th frame of video by using a region prediction algorithm and detecting the prediction region according to the detection classification model corresponding to the (i-1)-th frame of video to obtain a temporary detection result, wherein the prediction region represents a region of the i-th frame of video with redundant parts removed;
and an integration module, which is used for integrating the temporary tracking result and the temporary detection result and determining the target tracking result of the i-th frame of video.
11. An object tracking apparatus, characterized in that the apparatus comprises: a memory and a processor;
the memory is used for storing executable target tracking instructions;
the processor is used for executing the executable target tracking instructions stored in the memory, so as to implement the method according to any one of claims 1 to 9.
12. A computer-readable storage medium having stored thereon executable target tracking instructions which, when executed, cause a processor to perform the method according to any one of claims 1 to 9.
CN201910741611.4A 2019-08-12 2019-08-12 Target tracking method and device and computer readable storage medium Pending CN112396627A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910741611.4A CN112396627A (en) 2019-08-12 2019-08-12 Target tracking method and device and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN112396627A true CN112396627A (en) 2021-02-23

Family

ID=74602405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910741611.4A Pending CN112396627A (en) 2019-08-12 2019-08-12 Target tracking method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112396627A (en)


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHENG Zhengguo et al., "Research on Visual Tracking Technology in Dynamic Background Based on TLD", Video Engineering *
DU Kui, "Research on Target Tracking Algorithm Based on TLD", China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116883915A (en) * 2023-09-06 2023-10-13 常州星宇车灯股份有限公司 Target detection method and system based on front and rear frame image association
CN116883915B (en) * 2023-09-06 2023-11-21 常州星宇车灯股份有限公司 Target detection method and system based on front and rear frame image association

Similar Documents

Publication Publication Date Title
CN106570453B (en) Method, device and system for pedestrian detection
EP3410351B1 (en) Learning program, learning method, and object detection device
CN105631418B (en) People counting method and device
CN110765860B (en) Tumble judging method, tumble judging device, computer equipment and storage medium
US9142011B2 (en) Shadow detection method and device
CN110189256B (en) Panoramic image stitching method, computer readable storage medium and panoramic camera
US9619753B2 (en) Data analysis system and method
JP2006338313A (en) Similar image retrieving method, similar image retrieving system, similar image retrieving program, and recording medium
KR102655789B1 (en) Face detecting method and apparatus
CN112348116B (en) Target detection method and device using space context and computer equipment
CN111461145A (en) Method for detecting target based on convolutional neural network
CN110807767A (en) Target image screening method and target image screening device
WO2018121414A1 (en) Electronic device, and target image recognition method and apparatus
CN113762220B (en) Object recognition method, electronic device, and computer-readable storage medium
JP5192437B2 (en) Object region detection apparatus, object region detection method, and object region detection program
JP7014005B2 (en) Image processing equipment and methods, electronic devices
JP2011186780A (en) Information processing apparatus, information processing method, and program
US11250269B2 (en) Recognition method and apparatus for false detection of an abandoned object and image processing device
CN112396627A (en) Target tracking method and device and computer readable storage medium
KR20180082680A (en) Method for learning classifier and prediction classification apparatus using the same
US10026181B2 (en) Method and apparatus for detecting object
CN113033593A (en) Text detection training method and device based on deep learning
JP7396115B2 (en) Template image update program, template image update method, and template image update device
KR20190052785A (en) Method and apparatus for detecting object, and computer program for executing the method
TW201939354A (en) License plate recognition methods and systems thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210223