CN110929560B - Video semi-automatic target labeling method integrating target detection and tracking - Google Patents

Video semi-automatic target labeling method integrating target detection and tracking

Info

Publication number: CN110929560B
Application number: CN201910963482.3A
Authority: CN (China)
Prior art keywords: target, frame, value, tracking, detection
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN110929560A
Inventors: 徐英, 谷雨, 刘俊, 彭冬亮, 陈庆林
Current assignee: Hangzhou Dianzi University
Original assignee: Hangzhou Dianzi University
Application filed by Hangzhou Dianzi University

Classifications

    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/253 Fusion techniques of extracted features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; projection analysis
    • G06V10/56 Extraction of image or video features relating to colour
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V2201/07 Target detection

Abstract

The invention discloses a video semi-automatic target labeling method integrating target detection and tracking. A frame of the video is selected as the initial frame, in which the target position is labeled manually and its category label is assigned. In subsequent frames, an image-based target detection algorithm is fused with an image-sequence-based video target tracking algorithm to estimate the target position in each image, and the tracking algorithm is used to judge whether target labeling has finished. If labeling has finished, video key frames are extracted according to the saliency value of the target in each frame to obtain the labeling result; otherwise the target position continues to be estimated in the following video images. Because key frames are extracted according to target saliency, they reflect the diversity of target changes. Experiments on a multi-shot, multi-ship video verify the effectiveness of the proposed method.

Description

Video semi-automatic target labeling method integrating target detection and tracking
Technical Field
The invention belongs to the field of video data annotation, and relates to a video target labeling method that fuses target detection and target tracking and extracts video key frames according to target saliency.
Background
In recent years, deep learning has developed rapidly, driving continuous breakthroughs in target detection and target tracking. Because deep learning requires the support of big data, obtaining a large amount of accurately labeled training data with sample diversity is the key to its excellent performance.
At present, training data are acquired mainly by two methods: manual labeling and automatic labeling. Manual labeling marks the target position and label in single images by hand; since a video contains a large number of consecutive frames, manual labeling is inefficient, while the spatio-temporal continuity of targets in video makes automatic labeling possible. In the prior art, using only a correlation-filter-based target tracking algorithm for video target labeling yields results whose accuracy cannot meet the requirements of training data. Using only a target detection algorithm, the detector labels every target in subsequent frames that matches the category of the initial-frame target and cannot judge whether it is the same target as in the initial frame, or it misses detections because of target jitter, blur and similar factors, leading to inconsistent video target labels. The invention fuses the detection algorithm and the tracking algorithm and combines the advantages of both: it improves the accuracy of automatic labeling, uses the spatio-temporal continuity of the tracking algorithm to identify the same target, compensates for missed detections of the detector, automatically judges that the target has disappeared, and improves labeling efficiency.
The invention provides a video semi-automatic labeling method in which the target position is labeled manually in the initial frame, labeled automatically in subsequent frames, and a number of key frames are finally extracted automatically to obtain the labeling result. The main problems to be solved are: (1) how to improve the accuracy and consistency of video target labeling; (2) how to automatically judge target disappearance and the end of labeling, in order to reduce manual participation and improve labeling efficiency; (3) how to make the extracted key frames reflect the diversity of changes in target scale, angle, illumination and so on.
Since neither a stand-alone target detection algorithm nor a stand-alone target tracking algorithm can meet the requirements of automatic video target labeling, the invention fuses target detection and target tracking through reasonable rules, greatly improving the efficiency and accuracy of video target labeling; in addition, a method for extracting video key frames based on target saliency is provided, so that the extracted key frames accurately reflect the diversity of target changes.
Disclosure of Invention
To address the technical problems that existing automatic labeling means have low precision and poor continuity while manual labeling is slow, the invention provides a video semi-automatic target labeling method integrating target detection and tracking.
First, a frame of the video is selected as the initial frame, the initial target position is labeled manually, and the category label of the target is determined. In subsequent frames, an image-based target detection algorithm is fused with an image-sequence-based video target tracking algorithm to estimate the target position in each image, and the tracking algorithm is used to judge whether target labeling has finished. If labeling has finished, video key frames are extracted according to the saliency value of the target in each frame to obtain the labeling result; otherwise the target position continues to be estimated in the video images. The disclosed method thus fuses a target detection algorithm and a target tracking algorithm to label the video target accurately, judges the end of target labeling automatically, and extracts video key frames according to target saliency to obtain the labeling result.
The technical scheme adopted by the invention comprises the following steps:
1. A video semi-automatic target labeling method integrating target detection and tracking, characterized by comprising the following steps:
Step (1): selecting a frame in a shot of the video as the initial frame, manually labeling the initial position and size of the target, and determining the category label of the target;
Step (2): labeling the subsequent frames after the initial frame automatically, specifically by fusing an image-based target detection algorithm and an image-sequence-based video target tracking algorithm to estimate the position of the target in each image; the method comprises the following steps:
2.1 detecting targets in each frame of image with YOLO V3 and marking detection boxes;
for YOLO V3, the labeled target images are resized to a fixed scale and used as training samples to train the network; the number of YOLO layers is increased to 4, and four receptive-field feature maps of different scales, 13×13, 26×26, 52×52 and 104×104, are obtained through multi-scale feature fusion; the 13×13 feature map is predicted with the three prior boxes (116×90), (156×198) and (373×326) to detect large objects; the 26×26 feature map is predicted with the three prior boxes (30×61), (62×45) and (59×119) to detect medium-sized objects; the 52×52 feature map is predicted with the three prior boxes (10×13), (16×30) and (33×23) to detect small objects; the 104×104 feature map is predicted with the three newly added prior boxes (5×6), (8×15) and (16×10) to detect even smaller targets;
2.2 acquiring the tracking box of the target with the KCF correlation filtering tracking algorithm;
first, HOG features are extracted at the target position and size of the previous frame, transformed to the frequency domain by the Fourier transform, and mapped to a high-dimensional space with a Gaussian kernel function; the filter template α is obtained from equation (1):

$$\hat{\alpha} = \frac{\hat{g}}{\hat{k}^{xx} + \lambda} \qquad (1)$$

where x denotes the HOG features of the sample, ^ denotes the Fourier transform, g is a two-dimensional Gaussian function peaked at the center, and λ is a regularization parameter controlling overfitting during training; k^xx denotes the kernel autocorrelation of x in the high-dimensional space, computed as in equation (2):

$$k^{xx} = \exp\left(-\frac{1}{\sigma^{2}}\left(2\|x\|^{2} - 2\,\mathcal{F}^{-1}\Big(\sum_{c}\hat{x}_{c}^{*}\odot\hat{x}_{c}\Big)\right)\right) \qquad (2)$$

where σ is the width parameter of the Gaussian kernel function, controlling its radial extent, * denotes the complex conjugate, ⊙ denotes element-wise multiplication, F^-1 denotes the inverse Fourier transform, and c is the number of channels of the HOG feature x;
to adapt to changes in the target appearance, the filter is updated online; when tracking on the t-th frame image, the correlation filter α is updated as:

$$\hat{\alpha}_{t} = (1-\eta)\,\hat{\alpha}_{t-1} + \eta\,\hat{\alpha} \qquad (3)$$

where η is the update parameter;
to adapt to changes in the target scale, the filter α_t of the current frame is scaled before predicting the target size in the next frame, with scaling ratios [1.1, 1.05, 1, 0.95, 0.9];
candidate-sample HOG features z are extracted on the (t+1)-th frame image at the target position of the t-th frame; combined with each of the scaled filters above, each corresponding filter output response map f is given by equation (4):

$$f_{m} = \mathcal{F}^{-1}\left(\hat{k}^{xz}\odot\hat{\alpha}_{t,m}\right) \qquad (4)$$

where m = 1, 2, 3, 4, 5 corresponds to the scaling ratios [1.1, 1.05, 1, 0.95, 0.9], α_{t,m} is the filter scaled by the m-th ratio, and x denotes the HOG features of the t-th frame target;
the maximum value f_max is selected from the maxima max(f) of the 5 response maps f; the position of f_max is the target center, the scaling ratio corresponding to f_max gives the target size, and the tracking box of the (t+1)-th frame is obtained;
2.3 fusing the results of target detection and target tracking to determine the labeled target box;
first, judge whether the current frame image contains a detection box; if not, the target box is the tracking box; if there is exactly one detection box, compute the IOU of the tracking box and the detection box: if the IOU is greater than a threshold, the target box is the detection box and the KCF tracking algorithm is initialized with it, otherwise the target box is the tracking box; if there are multiple detection boxes, compute the IOU of the tracking box with every detection box and screen out the maximum IOU: if the maximum IOU is greater than the threshold, the target box is the detection box corresponding to the maximum IOU and the KCF tracking algorithm is initialized with it, otherwise the target box is the tracking box;
the IOU value evaluates the degree of overlap between the tracking box and each detection box in the current frame:

$$IOU = \frac{S_{I}}{S_{U}} \qquad (5)$$

where S_I is the overlapping area of the tracking box and a detection box in the same frame, and S_U is the area of their union, i.e. the sum of the areas of the tracking box and the detection box minus the overlapping area;
Step (3): judging whether target labeling has finished according to the target tracking algorithm;
according to the response map f of the KCF correlation filtering tracker, judge whether max(f) is smaller than a set threshold θ and the peak-to-sidelobe ratio PSR is smaller than a set threshold θ_PSR, i.e.:

$$\max(f) < \theta \;\;\text{and}\;\; PSR < \theta_{PSR} \qquad (7)$$

if so, target labeling is judged to have finished and the method turns to step (4) to select key frames; otherwise it returns to step (2) and continues to estimate the target position in the next frame image;
the PSR is computed as:

$$PSR = \frac{\max(f) - \mu_{\Phi}(f)}{\sigma_{\Phi}(f)} \qquad (6)$$

where max(f) is the peak of the correlation-filter response map f, Φ = 0.5, and μ_Φ(f) and σ_Φ(f) are the mean and standard deviation of the 50% response region centered on the peak of f;
Step (4): computing a saliency value for the target in every frame of the current shot, and extracting a set number of video key frames according to these saliency values to obtain the target labeling result; the method comprises the following steps:
4.1 the local binary pattern LBP extracts the texture features of the image; the basic idea is to work in a 3×3 pixel neighborhood, taking the center pixel as a threshold: the gray values of the 8 neighboring pixels are compared with it, and if the gray value of a neighboring pixel is greater than the center pixel value, that position is marked 1, otherwise 0; comparing the 8 points of the 3×3 neighborhood produces an 8-bit binary number, which is converted to a decimal number to obtain the LBP value of the center pixel, and this value reflects the LBP texture information of the region; the calculation formula is given in (8):

$$LBP(x_{0}, y_{0}) = \sum_{p=0}^{7} 2^{p}\, s(j_{p} - j_{0}) \qquad (8)$$

where (x_0, y_0) are the coordinates of the center pixel, p indexes the p-th pixel of the neighborhood, j_p is the gray value of that neighborhood pixel, and j_0 is the gray value of the center pixel; s(x) is the sign function:

$$s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \qquad (9)$$
4.2 the color saliency feature map is computed as:

$$C(x, y) = \sum_{i} \left| patch_{i}(x, y) - patch^{gaussian}_{i}(x, y) \right| \qquad (10)$$

where patch is the original image of the target-box region, patch_gaussian is the image obtained by filtering patch with a 5×5 Gaussian kernel of standard deviation 0, |·| denotes the absolute value, i is the channel index, and (x, y) are the pixel coordinates;
4.3 obtaining the edge saliency feature map from the pixels of the target edge region inside each frame's target box;
in the target edge region inside the target box the pixel values jump; taking derivatives of these pixel values, the first derivative has an extremum at the edge position, i.e. the extremum marks the edge, which is the principle used by the Sobel operator; if the second derivative is computed for the pixel values, its value at the edge is 0; the Laplacian is implemented by first computing the second-order x and y derivatives with the Sobel operator and then summing them to obtain the edge saliency feature map:

$$E(x, y) = \frac{\partial^{2} I(x, y)}{\partial x^{2}} + \frac{\partial^{2} I(x, y)}{\partial y^{2}} \qquad (11)$$

where I denotes the image inside the target box and (x, y) are the pixel coordinates of the target edge region inside the target box;
4.4 averaging the LBP texture feature, the color saliency feature and the edge saliency feature by weighted fusion to obtain the fusion value mean; the fusion formula is:

$$mean_{t} = \frac{1}{3N}\sum_{(x,y)}\Big( LBP_{t}(x, y) + C_{t}(x, y) + E_{t}(x, y) \Big) \qquad (12)$$

where LBP_t(x, y), C_t(x, y) and E_t(x, y) are the values of pixel (x, y) in the LBP texture feature map, the color saliency feature map and the edge saliency feature map of the t-th frame, and N is the number of pixels in the target box;
4.5 the color-histogram change value Dist is obtained by computing the Bhattacharyya distance between the color histogram of the target region selected in the initial frame and that of the target region in the t-th frame:

$$Dist(H_{0}, H_{t}) = \sqrt{1 - \frac{1}{\sqrt{\bar{H}_{0}\,\bar{H}_{t}\,n^{2}}}\sum_{i=1}^{n}\sqrt{H_{0}(i)\,H_{t}(i)}} \qquad (13)$$

where H_0 is the color histogram of the target box labeled manually in the initial frame, H_t is the color histogram of the target box labeled automatically in the t-th frame, the barred quantities are obtained from H_0 and H_t by equation (14), and n is the total number of color histogram bins; equation (14) is:

$$\bar{H}_{k} = \frac{1}{n}\sum_{i=1}^{n} H_{k}(i) \qquad (14)$$

where k = 0 or t;
4.6 the scale change value is obtained from the change in width and height between the target box of the initial frame and that of the t-th frame:

$$Scale_{t} = \frac{\left| w_{t}h_{t} - w_{0}h_{0} \right|}{w_{0}h_{0}} \qquad (15)$$

where w_0 and h_0 are the width and height of the target box in the initial frame, and w_t and h_t are the width and height of the target box in the t-th frame;
4.7 based on the fusion value, the color-histogram change value and the scale change value of the image target-box region, the target saliency value of the t-th frame is computed as:

$$S_{t} = \frac{mean_{t}}{\sum_{t=1}^{T} mean_{t}} + \frac{Dist_{t}}{\sum_{t=1}^{T} Dist_{t}} + \frac{Scale_{t}}{\sum_{t=1}^{T} Scale_{t}} \qquad (16)$$

where T is the total number of frames of the video;
4.8 a saliency line graph is constructed from the saliency value S_t of the target in each frame of the video, and all peaks and their corresponding frames are found;
assume the video has T frames, the number of key frames to be extracted is set to a, and b saliency peaks are found; if a < b, the peaks are sorted in descending order and the frames corresponding to the first a peaks are extracted as key frames; if b < a < T, the frames corresponding to all peaks are extracted and the remaining a - b key frames are drawn randomly without repetition; if a > T, all video frames are used as key frames;
Step (5): return to step (1) to label the target of the next video shot.
Compared with the prior art, the invention has the following remarkable advantages: (1) the target detection algorithm and the target tracking algorithm are fused, improving the accuracy of target localization and the continuity of target state estimation in the video images; (2) only the initial target position is labeled manually in the initial frame, and the end of labeling is judged automatically during the labeling process, reducing the amount of manual participation; (3) the LBP texture feature, color saliency feature and edge saliency feature of the target region are fused, and the target saliency is computed together with the color-histogram change and the scale change, so that the extracted key frames reflect the diversity of target changes.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of fused target detection and target tracking;
FIG. 3 is a flow chart of target saliency calculation;
FIG. 4 is the detection result for the 2nd frame image of the example video;
FIG. 5 is the tracking result for the 2nd frame image of the example video;
FIG. 6 is the fused detection and tracking result for the 2nd frame image of the example video;
FIG. 7 is the KCF response-map peak change curve for the 2nd shot of the example video;
FIG. 8 is the KCF response-map PSR change curve for the 2nd shot of the example video;
FIG. 9 is the 243rd frame image of the 2nd shot of the example video;
FIG. 10 is the 1st frame image of the 3rd shot of the example video;
FIG. 11 is the target saliency curve for the 6th shot of the example video;
FIG. 12 shows the key frames extracted for the 6th shot of the example video.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, the method comprises the following steps:
Step (1): a frame of the video is selected as the initial frame, the initial target position is labeled manually, and the category label of the target is determined.
Step (2): in subsequent frames, an image-based target detection algorithm is fused with an image-sequence-based video target tracking algorithm to estimate the target position in each image. The YOLO V3 detection algorithm and the KCF correlation filtering tracking algorithm are adopted; the fusion scheme is shown in fig. 2, and the specific steps are as follows:
2.1 The detector of the invention adopts YOLO V3, a faster algorithm among current mainstream detection networks that meets the real-time and accuracy requirements of video annotation. It consists of the feature-extraction network Darknet-53 and a prediction network; Darknet-53 uses ResNet shortcut connections to avoid vanishing gradients. In the prediction stage, the algorithm extracts regions of interest with anchors, as in the RPN, and uses feature maps of 3 scales as in an FPN (feature pyramid network): small feature maps provide semantic information, large feature maps carry finer-grained information, and the small feature maps are fused with the larger scales through upsampling, achieving a better detection effect.
The invention makes the following improvements and optimizations to the original model:
First, the training parameters of the feature-extraction part are initialized with the darknet53.conv.74 pre-trained model. The number of YOLO layers of the original model is then increased to 4, and four receptive-field feature maps of different scales, 13×13, 26×26, 52×52 and 104×104, are obtained through multi-scale feature fusion. The 13×13 feature map is predicted with the three prior boxes (116×90), (156×198) and (373×326) to detect large objects; the 26×26 feature map with (30×61), (62×45) and (59×119) to detect medium-sized objects; the 52×52 feature map with (10×13), (16×30) and (33×23) to detect small objects; and the 104×104 feature map with the newly added prior boxes (5×6), (8×15) and (16×10) to detect even smaller targets. Compared with the original model, the improved detection network fuses lower-level features, which improves the detection rate for small targets.
In each detection operation, the (t+1)-th frame image is input and first resized to a fixed scale; after passing through the feature-extraction network and the prediction network, detection boxes containing the object category and a score value are obtained as the detection result of the (t+1)-th frame.
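As an illustration of this per-frame detection operation, the following is a minimal Python sketch using OpenCV's Darknet loader. The configuration and weight file names, the 416×416 input size and the confidence/NMS thresholds are illustrative assumptions, not values fixed by the patent.

```python
import cv2
import numpy as np

# Hypothetical file names for the trained ship detector assumed here.
net = cv2.dnn.readNetFromDarknet("yolov3-ship.cfg", "yolov3-ship.weights")
out_names = net.getUnconnectedOutLayersNames()

def detect(frame, conf_thresh=0.5, nms_thresh=0.4):
    """Run YOLO V3 on one frame; return detection boxes as (x, y, w, h)."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    boxes, scores = [], []
    for output in net.forward(out_names):
        for det in output:                 # [cx, cy, bw, bh, objectness, class scores...]
            conf = float(det[4] * det[5:].max())
            if conf > conf_thresh:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                scores.append(conf)
    keep = cv2.dnn.NMSBoxes(boxes, scores, conf_thresh, nms_thresh)
    return [boxes[i] for i in np.array(keep).flatten()]
```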
2.2 The KCF correlation filtering tracking algorithm first extracts HOG features at the target position and size of the t-th frame, transforms them to the frequency domain with the Fourier transform, maps the resulting frequency-domain features to a high-dimensional space with a Gaussian kernel function, and obtains the filter template α from equation (1):

$$\hat{\alpha} = \frac{\hat{g}}{\hat{k}^{xx} + \lambda} \qquad (1)$$

where x denotes the HOG features of the sample, ^ denotes the Fourier transform, g is a two-dimensional Gaussian function peaked at the center, and λ is a regularization parameter controlling overfitting during training. k^xx denotes the kernel autocorrelation of x in the high-dimensional space, computed as in equation (2):

$$k^{xx} = \exp\left(-\frac{1}{\sigma^{2}}\left(2\|x\|^{2} - 2\,\mathcal{F}^{-1}\Big(\sum_{c}\hat{x}_{c}^{*}\odot\hat{x}_{c}\Big)\right)\right) \qquad (2)$$

where σ is the width parameter of the Gaussian kernel function, controlling its radial extent, * denotes the complex conjugate, ⊙ denotes element-wise multiplication, F^-1 denotes the inverse Fourier transform, and c is the number of channels of the HOG feature x.
To adapt to changes in the target appearance, the filter is updated online. When tracking on the t-th frame image, the correlation filter α is updated as in equation (3):

$$\hat{\alpha}_{t} = (1-\eta)\,\hat{\alpha}_{t-1} + \eta\,\hat{\alpha} \qquad (3)$$

where η is the update parameter. To adapt to changes in the target scale, the filter α_t of the current frame is scaled with the ratios [1.1, 1.05, 1, 0.95, 0.9] before predicting the target size in the next frame. Candidate-sample HOG features z are extracted on the (t+1)-th frame image at the target position of the t-th frame, and each scaled filter yields a response map f as in equation (4), where m = 1, 2, 3, 4, 5 corresponds to the scaling ratios [1.1, 1.05, 1, 0.95, 0.9] and x denotes the HOG features of the t-th frame target.
The maximum value f_max is selected from the maxima max(f) of the 5 response maps f; the position of f_max is the target center, the scaling ratio corresponding to f_max gives the target size, and the tracking box of the (t+1)-th frame is obtained.
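A minimal numpy sketch of this KCF core is given below. It assumes the HOG features are supplied as an H×W×C array and covers the filter training of equation (1), the Gaussian kernel correlation of equation (2) and the response map of equation (4); the online update of equation (3) and the five-scale search are left to the caller.

```python
import numpy as np

def gaussian_correlation(af, bf, sigma):
    """Gaussian kernel correlation of two feature maps given as per-channel 2-D FFTs
    of shape (H, W, C); returns the FFT of the kernel map (equation (2) when af == bf)."""
    spatial = af.shape[0] * af.shape[1]
    aa = np.real(np.vdot(af, af)) / spatial          # ||a||^2 via Parseval
    bb = np.real(np.vdot(bf, bf)) / spatial          # ||b||^2
    cross = np.real(np.fft.ifft2(np.sum(af * np.conj(bf), axis=2)))
    k = np.exp(-np.clip(aa + bb - 2.0 * cross, 0, None) / (sigma ** 2 * af.size))
    return np.fft.fft2(k)

def train_filter(x, g, sigma=0.5, lam=1e-4):
    """Equation (1): alpha_hat = g_hat / (k_hat^xx + lambda); g is the 2-D Gaussian label."""
    xf = np.fft.fft2(x, axes=(0, 1))
    kf = gaussian_correlation(xf, xf, sigma)
    return np.fft.fft2(g) / (kf + lam), xf

def response(alpha_f, xf, z, sigma=0.5):
    """Equation (4): f = F^-1(k_hat^xz * alpha_hat) for one candidate feature map z."""
    zf = np.fft.fft2(z, axes=(0, 1))
    kf = gaussian_correlation(zf, xf, sigma)
    return np.real(np.fft.ifft2(alpha_f * kf))
```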
2.3 Fusing the results of target detection and target tracking to determine the labeled target box.
First, judge whether the current frame image contains a detection box. If not, the target box is the tracking box. If there is exactly one detection box, compute the IOU of the tracking box and the detection box: if the IOU is greater than a threshold, the target box is the detection box and the KCF tracking algorithm is initialized with it; otherwise the target box is the tracking box. If there are multiple detection boxes, compute the IOU of the tracking box with every detection box and screen out the maximum IOU: if the maximum IOU is greater than the threshold, the target box is the detection box corresponding to the maximum IOU and the KCF tracking algorithm is initialized with it; otherwise the target box is the tracking box.
The IOU value evaluates the degree of overlap between the tracking box and each detection box in the current frame:

$$IOU = \frac{S_{I}}{S_{U}} \qquad (5)$$

where S_I is the overlapping area of the tracking box and a detection box in the same frame, and S_U is the area of their union, i.e. the total area of the tracking box and the detection box minus the overlapping area.
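The fusion rule can be written compactly as below; the 0.5 IOU threshold is the value used in this embodiment, while the (x, y, w, h) box format is an assumption of the sketch.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes (equation (5))."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def fuse(track_box, det_boxes, iou_thresh=0.5):
    """Step 2.3: prefer the best-overlapping detection box, otherwise keep the tracking box.
    Returns (target_box, reinit_tracker)."""
    if not det_boxes:
        return track_box, False
    best = max(det_boxes, key=lambda b: iou(track_box, b))
    if iou(track_box, best) > iou_thresh:
        return best, True       # re-initialize the KCF tracker with this detection box
    return track_box, False
```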
Step (3): The peak of the response map f of the KCF correlation filtering tracker represents the confidence that the corresponding position is the target; the higher the peak, the higher the probability that the position is the target. The PSR measures the peak strength of the correlation-filter output; the higher the PSR value, the more reliable the tracking result. If both the peak and the PSR fall below their set thresholds, the target has probably disappeared, and the video target labeling is therefore judged to have finished. The PSR is computed as:

$$PSR = \frac{\max(f) - \mu_{\Phi}(f)}{\sigma_{\Phi}(f)} \qquad (6)$$

where max(f) is the peak of the correlation-filter response map f, Φ = 0.5, and μ_Φ(f) and σ_Φ(f) are the mean and standard deviation of the 50% response region centered on the peak of f. If max(f) is smaller than the set threshold θ and the PSR is smaller than the set threshold θ_PSR, i.e.

$$\max(f) < \theta \;\;\text{and}\;\; PSR < \theta_{PSR} \qquad (7)$$

target labeling is judged to have finished and the method turns to step (4) to select key frames; otherwise it returns to step (2) and continues to estimate the target position in the next frame image.
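A sketch of this stopping test follows; the thresholds θ = 0.3 and θ_PSR = 3.5 are the values given in this embodiment, while the exact definition of the 50% window around the peak is an assumption.

```python
import numpy as np

def psr(response, frac=0.5):
    """Peak-to-sidelobe ratio of a response map (equation (6)), using a window
    that covers `frac` of the map centered on the peak."""
    h, w = response.shape
    py, px = np.unravel_index(np.argmax(response), response.shape)
    rh, rw = max(1, int(h * frac / 2)), max(1, int(w * frac / 2))
    win = response[max(0, py - rh):py + rh + 1, max(0, px - rw):px + rw + 1]
    return (response.max() - win.mean()) / (win.std() + 1e-12)

def labeling_finished(response, theta=0.3, theta_psr=3.5):
    """Equation (7): labeling of the current shot ends when both tests hold."""
    return response.max() < theta and psr(response) < theta_psr
```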
Step (4): computing a saliency value for the target in each frame. As shown in fig. 3, during labeling the target region is obtained from the target box of step (2); the LBP texture feature, color saliency feature and edge saliency feature of the target region are then fused, and the target saliency value is computed together with the color-histogram change and the scale change. The specific steps are as follows:
4.1 LBP extracts the texture features of the target region. The basic idea is to work in a 3×3 pixel neighborhood, taking the center pixel as a threshold: the gray values of the 8 neighboring pixels are compared with it, and if the gray value of a neighboring pixel is greater than the center pixel value, that position is marked 1, otherwise 0. Comparing the 8 points of the 3×3 neighborhood produces an 8-bit binary number, which is converted to a decimal number to obtain the LBP value of the center pixel; this value reflects the LBP texture information of the region. The calculation formula is given in (8):

$$LBP(x_{0}, y_{0}) = \sum_{p=0}^{7} 2^{p}\, s(j_{p} - j_{0}) \qquad (8)$$

where (x_0, y_0) are the coordinates of the center pixel, p indexes the p-th pixel of the neighborhood, j_p is the gray value of that neighborhood pixel, and j_0 is the gray value of the center pixel; s(x) is the sign function:

$$s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \qquad (9)$$
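A sketch of the LBP map of equation (8) is shown below, assuming the gray-scale target patch is given as a 2-D array; the ordering of the eight neighbors is an arbitrary choice here.

```python
import numpy as np

def lbp_map(gray):
    """8-neighbor LBP of equation (8); border pixels are left at 0."""
    g = gray.astype(np.float32)
    out = np.zeros_like(g)
    center = g[1:-1, 1:-1]
    # neighbor offsets, ordered p = 0..7 clockwise from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for p, (dy, dx) in enumerate(offsets):
        neigh = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        out[1:-1, 1:-1] += (neigh >= center) * (2 ** p)
    return out
```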
4.2 The color saliency feature map is computed as:

$$C(x, y) = \sum_{i} \left| patch_{i}(x, y) - patch^{gaussian}_{i}(x, y) \right| \qquad (10)$$

where patch is the target-region image, patch_gaussian is the image obtained by filtering patch with a 5×5 Gaussian kernel of standard deviation 0, |·| denotes the absolute value, i is the index of the image channel, and (x, y) are the horizontal and vertical pixel coordinates.
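A sketch of the color saliency map of equation (10); summing the per-channel absolute differences is one reading of the formula and is an assumption here.

```python
import cv2
import numpy as np

def color_saliency(patch):
    """Color saliency: absolute difference between the patch and its 5x5
    Gaussian-blurred version, summed over the color channels (equation (10))."""
    blurred = cv2.GaussianBlur(patch, (5, 5), 0)
    diff = cv2.absdiff(patch, blurred).astype(np.float32)
    return diff.sum(axis=2) if diff.ndim == 3 else diff
```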
4.3 In the edge region of the target-region image the pixel values "jump". Taking derivatives of these pixel values, the first derivative has an extremum at the edge position; this is the principle used by the Sobel operator, for which the extremum marks the edge. If the second derivative is computed for the pixel values, its value at the edge is 0. The Laplacian is implemented by first computing the second-order x and y derivatives with the Sobel operator and then summing them to obtain the edge saliency feature map:

$$E(x, y) = \frac{\partial^{2} I(x, y)}{\partial x^{2}} + \frac{\partial^{2} I(x, y)}{\partial y^{2}} \qquad (11)$$

where I denotes the image and (x, y) are the pixel coordinates of the target edge region inside the target box.
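A sketch of the edge saliency map of equation (11), built from second-order Sobel derivatives; taking the absolute value so the map is non-negative is an added assumption.

```python
import cv2
import numpy as np

def edge_saliency(patch):
    """Edge saliency: sum of the second-order Sobel derivatives in x and y (equation (11))."""
    gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY) if patch.ndim == 3 else patch
    dxx = cv2.Sobel(gray, cv2.CV_32F, 2, 0, ksize=3)   # d2I/dx2
    dyy = cv2.Sobel(gray, cv2.CV_32F, 0, 2, ksize=3)   # d2I/dy2
    return np.abs(dxx + dyy)
```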
4.4 The LBP texture feature, the color saliency feature and the edge saliency feature are averaged by weighted fusion to obtain the fusion value mean; the fusion formula is:

$$mean_{t} = \frac{1}{3N}\sum_{(x,y)}\Big( LBP_{t}(x, y) + C_{t}(x, y) + E_{t}(x, y) \Big) \qquad (12)$$

where LBP_t(x, y), C_t(x, y) and E_t(x, y) are the values of pixel (x, y) in the LBP texture feature map, the color saliency feature map and the edge saliency feature map of the t-th frame, and N is the number of pixels in the target box.
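The fusion value of step 4.4 then reduces to an average over the three maps; equal weights are assumed here, as suggested by "average weighted fusion".

```python
import numpy as np

def fusion_value(lbp, color, edge):
    """Scalar fusion value `mean` of equation (12): equal-weight average of the
    three per-pixel feature maps over the target box."""
    return float(np.stack([lbp, color, edge]).astype(np.float32).mean())
```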
4.5 The color histogram of the target-region image represents the distribution of color components in the image, showing the different colors and the number of pixels of each color. The color-histogram change value Dist is obtained by computing the Bhattacharyya distance between the color histogram of the target region selected in the initial frame and that of the target region in the t-th frame; the larger the Dist value, the lower the similarity and the more obvious the target change. The calculation formula is:

$$Dist(H_{0}, H_{t}) = \sqrt{1 - \frac{1}{\sqrt{\bar{H}_{0}\,\bar{H}_{t}\,n^{2}}}\sum_{i=1}^{n}\sqrt{H_{0}(i)\,H_{t}(i)}} \qquad (13)$$

where H_0 is the color histogram of the target region selected in the initial frame, H_t is the color histogram of the target region of the t-th frame, the barred quantities are obtained from H_0 and H_t by equation (14), and n is the total number of color histogram bins; equation (14) is:

$$\bar{H}_{k} = \frac{1}{n}\sum_{i=1}^{n} H_{k}(i) \qquad (14)$$

where k = 0 or t.
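The Bhattacharyya distance of equations (13) and (14) matches OpenCV's HISTCMP_BHATTACHARYYA comparison, so a sketch can lean on it; the choice of a 16-bin-per-channel BGR histogram is an assumption, not a value from the patent.

```python
import cv2

def hist_distance(patch0, patch_t, bins=16):
    """Color-histogram change value Dist between the initial-frame and t-th-frame
    target patches (equations (13)-(14))."""
    def hist(p):
        h = cv2.calcHist([p], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
        cv2.normalize(h, h)
        return h
    return cv2.compareHist(hist(patch0), hist(patch_t), cv2.HISTCMP_BHATTACHARYYA)
```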
4.6 The scale change value is obtained from the change in width and height between the target box of the initial frame and that of the t-th frame:

$$Scale_{t} = \frac{\left| w_{t}h_{t} - w_{0}h_{0} \right|}{w_{0}h_{0}} \qquad (15)$$

where w_0 and h_0 are the width and height of the target box in the initial frame, and w_t and h_t are the width and height of the target box in the t-th frame.
4.7 From the above quantities, the target saliency value of the t-th frame is computed as:

$$S_{t} = \frac{mean_{t}}{\sum_{t=1}^{T} mean_{t}} + \frac{Dist_{t}}{\sum_{t=1}^{T} Dist_{t}} + \frac{Scale_{t}}{\sum_{t=1}^{T} Scale_{t}} \qquad (16)$$

where T is the total number of video frames of the shot.
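The per-frame saliency of equation (16) can be sketched as below; normalizing each of the three terms by its sum over the shot is an assumption about the exact form of the formula.

```python
import numpy as np

def saliency_values(means, dists, scales):
    """Target saliency S_t for every frame of a shot from the fusion values,
    histogram-change values and scale-change values (step 4.7)."""
    def norm(v):
        v = np.asarray(v, dtype=np.float32)
        s = v.sum()
        return v / s if s > 0 else v
    return norm(means) + norm(dists) + norm(scales)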
4.8 A saliency line graph is drawn from the saliency value of the target in each frame of the shot, and all peaks and their corresponding frames are obtained. Assume the shot has T video frames, the number of key frames to be extracted is a, and the number of peaks is b. If a < b, the peaks are sorted in descending order and the frames corresponding to the first a peaks are extracted as key frames; if b < a < T, the frames corresponding to all peaks are extracted and the remaining a - b key frames are drawn randomly without repetition; if a > T, all video frames are used as key frames.
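Finally, the key-frame selection of step 4.8 can be sketched as follows; treating peaks as simple interior local maxima of the saliency curve and drawing the random top-up from non-peak frames are assumptions about details the text leaves open.

```python
import numpy as np

def select_key_frames(saliency, a, rng=None):
    """Select `a` key frames of one shot from its per-frame saliency curve (step 4.8)."""
    s = np.asarray(saliency, dtype=np.float32)
    T = len(s)
    if a > T:
        return list(range(T))
    peaks = [t for t in range(1, T - 1) if s[t] > s[t - 1] and s[t] > s[t + 1]]
    peaks.sort(key=lambda t: s[t], reverse=True)     # descending peak height
    if a <= len(peaks):
        return sorted(peaks[:a])
    rng = rng or np.random.default_rng()
    rest = [t for t in range(T) if t not in peaks]
    extra = rng.choice(rest, size=min(a - len(peaks), len(rest)), replace=False)
    return sorted(peaks + [int(t) for t in extra])
```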
Step (5): return to step (1) to label the target of the next shot.
To verify the effectiveness of the proposed method, a multi-shot, multi-ship video was used for experimental testing. The video contains 9 shots with multiple ships; the number of frames in each shot is shown in table 1. To speed up computation, the experiment labels one frame in every 5.
TABLE 1 video shot and frame number
In the target detection stage, the single-stage detection algorithm YOLO V3 is first trained on a large number of labeled samples carrying ship labels and position information to obtain a detection model, which is then used as the detector. Since the original algorithm has limited ability to detect small targets, small-scale anchors are added to improve its detection precision, raising the detection capability for targets of various scales while maintaining the detection speed and achieving accurate real-time detection. In the target tracking stage, the KCF tracking algorithm parameters are set to λ = 1×10^-4, σ = 0.5 and η = 0.02. Since the original algorithm cannot adapt to changes in target scale, scale estimation is added to the KCF tracking algorithm, and the improved KCF tracker is used as the tracker.
In the stage of fusing the detection and tracking results, the IOU threshold is set to 0.5. If the IOU between the tracking box and every detection box is less than 0.5, the detector has not detected the target to be labeled, and the target box is the tracking box. If the IOU between the tracking box and one or more detection boxes is greater than 0.5, the detector has detected the target to be labeled, and the target box is the detection box corresponding to the maximum IOU. For example, after the target is labeled manually in the 1st frame of shot 1, the detection and tracking results of the 2nd frame are shown in figs. 4 and 5. As can be seen, the detector returns multiple targets, whereas the tracker returns only one. Computing the IOU between the tracking box and each detection box, only one detection box has an IOU with the tracking box greater than the 0.5 threshold, and the fused output, shown in fig. 6, is that detection box.
When judging whether target labeling has finished, the peak threshold θ of the KCF tracker is set to 0.3 and θ_PSR to 3.5; when both the peak and the PSR fall below their thresholds, labeling ends. For example, when the target disappears while labeling the target of the 2nd shot, the peak and the PSR of the KCF response map become small, as shown in figs. 7 and 8. For labeled frames 0-47 of this shot (one frame labeled in every 5), the peak and PSR of the KCF response map are large, while at the 48th labeled frame they become small, indicating that the target has disappeared in that frame; this corresponds to the 243rd frame of the shot, where the video switches to the next shot. The 243rd frame image of shot 2 and the 1st frame image of shot 3 are shown in figs. 9 and 10. It can be seen that the target disappears when the video switches from shot 2 to shot 3, which shows that the method judges the end of labeling accurately.
When the tracker judges that labeling of the shot's target has finished, the target saliency curve of the shot is obtained from the per-frame saliency values, and key frames are extracted at the local maxima of the curve; in the experiment, 10 frames are extracted from each shot as key frames. For example, the target saliency curve of shot 6 is shown in fig. 11. The local maxima are first sorted from large to small, and the frames corresponding to the first 10 local maxima are taken as key frames; the extracted key frames are shown in fig. 12 (a-j). As can be seen, the extracted key frames are highly representative and accurately reflect the diversity of changes in target size, angle and so on.
The results of this experiment are shown in table 2.
TABLE 2 Key frames of each shot
Shot Key frames
1 5,10,25,30,40,50,55,65,75,80
2 90,110,125,135,145,160,180,195,205,215
3 325,340,365,380,400,420,430,445,460,480
4 1099,1109,1119,1139,1149,1159,1169,1179,1329,1369
5 1424,1519,1559,1594,1604,1624,1634,1674,1754,1764
6 1779,1854,1869,1994,2054,2064,2089,2114,2144,2154
7 2194,2199,2214,2229,2249,2269,2279,2289,2294,2314
8 2349,2359,2379,2399,2414,2424,2444,2459,2474,2539
9 2974,3094,3164,3179,3189,3199,3214,3229,3259,3274
It can be seen from the table that the extracted key frames all lie within their corresponding shots, which further shows that the method can distinguish different shots and automatically judge the end of target labeling. Because local maxima of the target saliency value are used as the basis for key-frame extraction, the extracted key frames are representative. The experimental results show that the proposed video target labeling method, which fuses a target detection algorithm and a target tracking algorithm, achieves high accuracy.

Claims (1)

1. A video semi-automatic target labeling method integrating target detection and tracking, characterized by comprising the following steps:
Step (1): selecting a frame in a shot of the video as the initial frame, manually labeling the initial position and size of the target, and determining the category label of the target;
Step (2): labeling the subsequent frames after the initial frame automatically, specifically by fusing an image-based target detection algorithm and an image-sequence-based video target tracking algorithm to estimate the position of the target in each image; the method comprises the following steps:
2.1 detecting targets in each frame of image with YOLO V3 and marking detection boxes;
for YOLO V3, the labeled target images are resized to a fixed scale and used as training samples to train the network; the number of YOLO layers is increased to 4, and four receptive-field feature maps of different scales, 13×13, 26×26, 52×52 and 104×104, are obtained through multi-scale feature fusion; the 13×13 feature map is predicted with the three prior boxes (116×90), (156×198) and (373×326) to detect large objects; the 26×26 feature map is predicted with the three prior boxes (30×61), (62×45) and (59×119) to detect medium-sized objects; the 52×52 feature map is predicted with the three prior boxes (10×13), (16×30) and (33×23) to detect small objects; the 104×104 feature map is predicted with the three newly added prior boxes (5×6), (8×15) and (16×10) to detect even smaller targets;
2.2 acquiring the tracking box of the target with the KCF correlation filtering tracking algorithm;
first, HOG features are extracted at the target position and size of the previous frame, transformed to the frequency domain by the Fourier transform, and mapped to a high-dimensional space with a Gaussian kernel function; the filter template α is obtained from equation (1):

$$\hat{\alpha} = \frac{\hat{g}}{\hat{k}^{xx} + \lambda} \qquad (1)$$

where x denotes the HOG features of the sample, ^ denotes the Fourier transform, g is a two-dimensional Gaussian function peaked at the center, and λ is a regularization parameter controlling overfitting during training; k^xx denotes the kernel autocorrelation of x in the high-dimensional space, computed as in equation (2):

$$k^{xx} = \exp\left(-\frac{1}{\sigma^{2}}\left(2\|x\|^{2} - 2\,\mathcal{F}^{-1}\Big(\sum_{c}\hat{x}_{c}^{*}\odot\hat{x}_{c}\Big)\right)\right) \qquad (2)$$

where σ is the width parameter of the Gaussian kernel function, controlling its radial extent, * denotes the complex conjugate, ⊙ denotes element-wise multiplication, F^-1 denotes the inverse Fourier transform, and c is the number of channels of the HOG feature x;
to adapt to changes in the target appearance, the filter is updated online; when tracking on the t-th frame image, the correlation filter α is updated as:

$$\hat{\alpha}_{t} = (1-\eta)\,\hat{\alpha}_{t-1} + \eta\,\hat{\alpha} \qquad (3)$$

where η is the update parameter;
to adapt to changes in the target scale, the filter α_t of the current frame is scaled before predicting the target size in the next frame, with scaling ratios [1.1, 1.05, 1, 0.95, 0.9];
candidate-sample HOG features z are extracted on the (t+1)-th frame image at the target position of the t-th frame; combined with each of the scaled filters above, each corresponding correlation-filter output response map f is given by equation (4):

$$f_{m} = \mathcal{F}^{-1}\left(\hat{k}^{xz}\odot\hat{\alpha}_{t,m}\right) \qquad (4)$$

where m = 1, 2, 3, 4, 5 corresponds to the scaling ratios [1.1, 1.05, 1, 0.95, 0.9], α_{t,m} is the filter scaled by the m-th ratio, and x denotes the HOG features of the t-th frame target;
the maximum value f_max is selected from the maxima max(f) of the 5 response maps f; the position of f_max is the target center, the scaling ratio corresponding to f_max gives the target size, and the tracking box of the (t+1)-th frame is obtained;
2.3 fusing the results of target detection and target tracking to determine the labeled target box;
first, judge whether the current frame image contains a detection box; if not, the target box is the tracking box; if there is exactly one detection box, compute the IOU of the tracking box and the detection box: if the IOU is greater than a threshold, the target box is the detection box and the KCF tracking algorithm is initialized with it, otherwise the target box is the tracking box; if there are multiple detection boxes, compute the IOU of the tracking box with every detection box and screen out the maximum IOU: if the maximum IOU is greater than the threshold, the target box is the detection box corresponding to the maximum IOU and the KCF tracking algorithm is initialized with it, otherwise the target box is the tracking box;
the IOU value evaluates the degree of overlap between the tracking box and each detection box in the current frame:

$$IOU = \frac{S_{I}}{S_{U}} \qquad (5)$$

where S_I is the overlapping area of the tracking box and a detection box in the same frame, and S_U is the area of their union, i.e. the sum of the areas of the tracking box and the detection box minus the overlapping area;
Step (3): judging whether target labeling has finished according to the target tracking algorithm;
according to the response map f of the KCF correlation filtering tracker, judge whether max(f) is smaller than a set threshold θ and the peak-to-sidelobe ratio PSR is smaller than a set threshold θ_PSR, i.e.:

$$\max(f) < \theta \;\;\text{and}\;\; PSR < \theta_{PSR} \qquad (7)$$

if so, target labeling is judged to have finished and the method turns to step (4) to select key frames; otherwise it returns to step (2) and continues to estimate the target position in the next frame image;
the PSR is computed as:

$$PSR = \frac{\max(f) - \mu_{\Phi}(f)}{\sigma_{\Phi}(f)} \qquad (6)$$

where max(f) is the peak of the correlation-filter response map f, Φ = 0.5, and μ_Φ(f) and σ_Φ(f) are the mean and standard deviation of the 50% response region centered on the peak of f;
Step (4): computing a saliency value for the target in every frame of the current shot, and extracting a set number of video key frames according to these saliency values to obtain the target labeling result; the method comprises the following steps:
4.1 the local binary pattern LBP extracts the texture features of the image; the basic idea is to work in a 3×3 pixel neighborhood, taking the center pixel as a threshold: the gray values of the 8 neighboring pixels are compared with it, and if the gray value of a neighboring pixel is greater than the center pixel value, that position is marked 1, otherwise 0; comparing the 8 points of the 3×3 neighborhood produces an 8-bit binary number, which is converted to a decimal number to obtain the LBP value of the center pixel, and this value reflects the LBP texture information of the region; the calculation formula is given in (8):

$$LBP(x_{0}, y_{0}) = \sum_{p=0}^{7} 2^{p}\, s(j_{p} - j_{0}) \qquad (8)$$

where (x_0, y_0) are the coordinates of the center pixel, p indexes the p-th pixel of the neighborhood, j_p is the gray value of that neighborhood pixel, and j_0 is the gray value of the center pixel; s(x) is the sign function:

$$s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \qquad (9)$$

4.2 the color saliency feature map is computed as:

$$C(x, y) = \sum_{i} \left| patch_{i}(x, y) - patch^{gaussian}_{i}(x, y) \right| \qquad (10)$$

where patch is the original image of the target-box region, patch_gaussian is the image obtained by filtering patch with a 5×5 Gaussian kernel of standard deviation 0, |·| denotes the absolute value, i is the channel index, and (x, y) are the pixel coordinates;
4.3 obtaining the edge saliency feature map from the pixels of the target edge region inside each frame's target box;
in the target edge region inside the target box the pixel values jump; taking derivatives of these pixel values, the first derivative has an extremum at the edge position, i.e. the extremum marks the edge, which is the principle used by the Sobel operator; if the second derivative is computed for the pixel values, its value at the edge is 0; the Laplacian is implemented by first computing the second-order x and y derivatives with the Sobel operator and then summing them to obtain the edge saliency feature map:

$$E(x, y) = \frac{\partial^{2} I(x, y)}{\partial x^{2}} + \frac{\partial^{2} I(x, y)}{\partial y^{2}} \qquad (11)$$

where I denotes the image inside the target box and (x, y) are the pixel coordinates of the target edge region inside the target box;
4.4 averaging the LBP texture feature, the color saliency feature and the edge saliency feature by weighted fusion to obtain the fusion value mean; the fusion formula is:

$$mean_{t} = \frac{1}{3N}\sum_{(x,y)}\Big( LBP_{t}(x, y) + C_{t}(x, y) + E_{t}(x, y) \Big) \qquad (12)$$

where LBP_t(x, y), C_t(x, y) and E_t(x, y) are the values of pixel (x, y) in the LBP texture feature map, the color saliency feature map and the edge saliency feature map of the t-th frame, and N is the number of pixels in the target box;
4.5 the color-histogram change value Dist is obtained by computing the Bhattacharyya distance between the color histogram of the target region selected in the initial frame and that of the target region in the t-th frame:

$$Dist(H_{0}, H_{t}) = \sqrt{1 - \frac{1}{\sqrt{\bar{H}_{0}\,\bar{H}_{t}\,n^{2}}}\sum_{i=1}^{n}\sqrt{H_{0}(i)\,H_{t}(i)}} \qquad (13)$$

where H_0 is the color histogram of the target box labeled manually in the initial frame, H_t is the color histogram of the target box labeled automatically in the t-th frame, the barred quantities are obtained from H_0 and H_t by equation (14), and n is the total number of color histogram bins; equation (14) is:

$$\bar{H}_{k} = \frac{1}{n}\sum_{i=1}^{n} H_{k}(i) \qquad (14)$$

where k = 0 or t;
4.6 the scale change value is obtained from the change in width and height between the target box of the initial frame and that of the t-th frame:

$$Scale_{t} = \frac{\left| w_{t}h_{t} - w_{0}h_{0} \right|}{w_{0}h_{0}} \qquad (15)$$

where w_0 and h_0 are the width and height of the target box in the initial frame, and w_t and h_t are the width and height of the target box in the t-th frame;
4.7 based on the fusion value, the color-histogram change value and the scale change value of the image target-box region, the target saliency value of the t-th frame is computed as:

$$S_{t} = \frac{mean_{t}}{\sum_{t=1}^{T} mean_{t}} + \frac{Dist_{t}}{\sum_{t=1}^{T} Dist_{t}} + \frac{Scale_{t}}{\sum_{t=1}^{T} Scale_{t}} \qquad (16)$$

where T is the total number of frames of the video;
4.8 a saliency line graph is constructed from the saliency value S_t of the target in each frame of the video, and all peaks and their corresponding frames are found;
assume the video has T frames, the number of key frames to be extracted is set to a, and b saliency peaks are found; if a < b, the peaks are sorted in descending order and the frames corresponding to the first a peaks are extracted as key frames; if b < a < T, the frames corresponding to all peaks are extracted and the remaining a - b key frames are drawn randomly without repetition; if a > T, all video frames are used as key frames;
Step (5): return to step (1) to label the target of the next video shot.
CN201910963482.3A (filed 2019-10-11, priority 2019-10-11): Video semi-automatic target labeling method integrating target detection and tracking. Granted as CN110929560B. Status: Active.

Priority Applications (1)

CN201910963482.3A, priority date 2019-10-11, filing date 2019-10-11: Video semi-automatic target labeling method integrating target detection and tracking

Publications (2)

CN110929560A, published 2020-03-27
CN110929560B, published 2022-10-14

Family

ID=69848801

Country status: CN (CN110929560B)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107403175A (en) * 2017-09-21 2017-11-28 昆明理工大学 Visual tracking method and Visual Tracking System under a kind of movement background
CN107767405A (en) * 2017-09-29 2018-03-06 华中科技大学 A kind of nuclear phase for merging convolutional neural networks closes filtered target tracking

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A comparative study of texture measures with classification based on feature distributions; Timo Ojala et al.; Pattern Recognition; 1996; vol. 29, no. 1; pp. 1-9 *

Also Published As

Publication number Publication date
CN110929560A (en) 2020-03-27

Similar Documents

Publication Publication Date Title
CN110929560B (en) Video semi-automatic target labeling method integrating target detection and tracking
CN111223088B (en) Casting surface defect identification method based on deep convolutional neural network
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN111062973B (en) Vehicle tracking method based on target feature sensitivity and deep learning
US11922615B2 (en) Information processing device, information processing method, and storage medium
CN107610114B (en) optical satellite remote sensing image cloud and snow fog detection method based on support vector machine
CN102426649B (en) Simple steel seal digital automatic identification method with high accuracy rate
CN111640157B (en) Checkerboard corner detection method based on neural network and application thereof
CN111310558A (en) Pavement disease intelligent extraction method based on deep learning and image processing method
CN113139521B (en) Pedestrian boundary crossing monitoring method for electric power monitoring
CN114677554A (en) Statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort
CN112734761B (en) Industrial product image boundary contour extraction method
CN107944354B (en) Vehicle detection method based on deep learning
CN113052170B (en) Small target license plate recognition method under unconstrained scene
CN108319961B (en) Image ROI rapid detection method based on local feature points
CN114155527A (en) Scene text recognition method and device
Zhang et al. Automatic detection of road traffic signs from natural scene images based on pixel vector and central projected shape feature
CN110689003A (en) Low-illumination imaging license plate recognition method and system, computer equipment and storage medium
Wang et al. Unstructured road detection using hybrid features
Gou et al. Pavement crack detection based on the improved faster-rcnn
CN110458019B (en) Water surface target detection method for eliminating reflection interference under scarce cognitive sample condition
CN110097524B (en) SAR image target detection method based on fusion convolutional neural network
CN111754525A (en) Industrial character detection process based on non-precise segmentation
CN111369570A (en) Multi-target detection tracking method for video image
CN113111878B (en) Infrared weak and small target detection method under complex background

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant