CN114882523A - Task target detection method and system based on fragmented video information
- Publication number: CN114882523A
- Application number: CN202210375278.1A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06F18/2415—Classification techniques relating to the classification model based on parametric or probabilistic models
- G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/08—Learning methods
- G06V10/764—Arrangements for image or video recognition or understanding using classification, e.g. of video objects
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V2201/07—Target detection
Abstract
The invention discloses a task target detection method and system based on fragmented video information. A target person detection model is constructed from an effective frame sequence extraction module, a depth convolution feature map extraction module, an optical flow information module, a deformation feature module and a weight coefficient calculation module, and is applied to complete the detection of a preset target person.
Description
Technical Field
The invention relates to a task target detection method based on fragmented video information, and also relates to a system for realizing the task target detection method based on fragmented video information.
Background
Deep learning networks have made significant advances in object detection, and in recent years excellent image-based object detection algorithms have been transferred directly to video object detection. Video object detection is more challenging than still-image object detection: the detection scene is usually complex, the detected person may be uncooperative, and the available video information may be discontinuous, so the extracted image information is often degraded by motion blur, defocus, rare postures and the like, which greatly reduces detection accuracy.
Existing feature-aggregation-based methods compensate for the misalignment between frames by aggregating the features of multiple adjacent frames, and a key issue is whether these frames should be treated equally. There are two common approaches: one treats every frame equally and gives all frames the same weight, and the other learns the weights with a lightweight network during training; neither gives special consideration to the effect of blur.
The invention provides a human target detection method based on fragmented video information. The method improves the weight distribution of the aggregated frames with a blur prior. Specifically, a fuzzy mapping network is introduced to mark each pixel as blurred or non-blurred; because only the degree of blur of the target is of concern and the background is not, a saliency detection network is adopted to focus on the target, and the saliency map is used for calibration to obtain a corrected blur map centered on the degree of blur of the target, from which the weight of each frame of image is calculated. The method performs better than current state-of-the-art video object detection algorithms at the cost of increased computation.
Disclosure of Invention
The purpose of the invention is as follows: to provide a task target detection method and system based on fragmented video information that improve the weight distribution of the aggregated frames through a blur prior, computing a weight for each frame of image instead of giving every frame the same weight, thereby effectively improving the accuracy and reliability of person detection.
In order to realize the above functions, the task target detection method based on fragmented video information executes steps S1 to S7 according to a preset period to obtain a target person detection model, and then applies the target person detection model to complete the detection of the target person;
S1, acquiring a video containing walking of a target person in real time, converting the video containing walking of the target person into a video frame sequence arranged according to a time sequence, extracting continuous video frames with a preset frame number at a preset position in the video frame sequence as an effective frame sequence, and constructing an effective frame sequence extraction module by taking the video frame sequence as input and the effective frame sequence as output;
s2, taking an effective frame sequence output by an effective frame sequence extraction module as input, and constructing a depth convolution feature map extraction module based on a depth convolution neural network and taking each depth convolution feature map corresponding to each video frame in the effective frame sequence as output;
s3, taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating optical flow parameters of each group of video frame pairs as motion information of a target person aiming at each group of video frame pairs formed by two video frames with preset frame numbers spaced in the effective frame sequence based on an optical flow neural network, and taking the optical flow parameters of each group of video frame pairs in the effective frame sequence as output to construct an optical flow information module;
s4, taking the depth convolution feature maps output by the depth convolution feature map extraction module and the optical flow parameters of each group of video frame pairs in the effective frame sequence output by the optical flow information module as input, and taking the deformation features of each group of video frame pairs as output to construct a deformation feature module based on a bilinear distortion function;
s5, taking an effective frame sequence output by an effective frame sequence extraction module as input, respectively obtaining fuzzy characteristics and saliency characteristics corresponding to each video frame in the effective frame sequence based on a fuzzy mapping network and a saliency detection network, and obtaining a weight coefficient of each video frame in the effective frame sequence based on the fuzzy characteristics and the saliency characteristics and a softmax classification network; taking the weight coefficient of each video frame in the effective frame sequence as output, and constructing a weight coefficient calculation module;
s6, obtaining aggregation characteristics corresponding to video frame groups formed by video frame pairs by taking deformation characteristics of each group of video frame pairs output by a deformation characteristic module, depth convolution characteristic graphs corresponding to each video frame output by a depth convolution characteristic graph extraction module and weight coefficients of each video frame output by a weight coefficient calculation module as input, and constructing a detection network module by taking preset information of a target person as output through a detection neural network;
and S7, acquiring a video frame sequence corresponding to a video containing the walking of the target person in real time as input, outputting preset information of the target person, constructing a model to be trained for the detection of the target person based on an effective frame sequence extraction module, a depth convolution feature map extraction module, an optical flow information module, a deformation feature module and a weight coefficient calculation module, and obtaining a target person detection model based on the participation training of a video sample containing the walking of the target person to finish the detection of the target person.
As a preferred technical scheme of the invention: step S3 is a specific step of taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating optical flow parameters of each group of video frame pairs constituted by two video frames spaced by a preset number of frames in the effective frame sequence as motion information of a target person based on an optical flow neural network, and taking the optical flow parameters of each group of video frame pairs in the effective frame sequence as output, and constructing an optical flow information module as follows:
S31: define the t-th video frame I_t in the effective frame sequence as the reference frame, and the (t-τ)-th video frame I_{t-τ} and the (t+τ)-th video frame I_{t+τ} in the effective frame sequence as support frames; input the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} into the optical flow neural network;
S32: the optical flow neural network comprises convolution layers and an enlargement layer; the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} pass through the contraction part formed by the convolution layers of the optical flow neural network to obtain the feature maps respectively corresponding to I_t, I_{t-τ} and I_{t+τ};
S33: the feature maps respectively corresponding to the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} pass through the enlargement layer of the optical flow neural network to obtain feature maps enlarged to the size of the original images;
S34: optical flow prediction is performed on the feature maps obtained in step S33, taking the reference frame I_t and the support frame I_{t-τ} as one video frame pair and the reference frame I_t and the support frame I_{t+τ} as another video frame pair, to obtain the optical flow parameter M_{t-τ→t} between the feature maps corresponding to I_t and I_{t-τ} and the optical flow parameter M_{t+τ→t} between the feature maps corresponding to I_t and I_{t+τ}, according to the following formulas:
M_{t-τ→t} = FlowNet(I_{t-τ}, I_t)
M_{t+τ→t} = FlowNet(I_{t+τ}, I_t)
where M_{t-τ→t} is the optical flow parameter between the feature maps corresponding to the reference frame I_t and the support frame I_{t-τ}, t-τ→t denotes the correspondence from I_{t-τ} to I_t, M_{t+τ→t} is the optical flow parameter between the feature maps corresponding to the reference frame I_t and the support frame I_{t+τ}, t+τ→t denotes the correspondence from I_{t+τ} to I_t, and FlowNet denotes the optical flow neural network computation.
As a preferred technical scheme of the invention: step S4 is implemented by taking as input each depth convolution feature map output by the depth convolution feature map extraction module and optical flow parameters of each set of video frame pairs in the effective frame sequence output by the optical flow information module, and based on a bilinear warping function, taking as output the deformation features of each set of video frame pairs, and constructing a deformation feature module according to the following formula:
f_{t-τ→t} = W(f_{t-τ}, M_{t-τ→t})
f_{t+τ→t} = W(f_{t+τ}, M_{t+τ→t})
where f_{t-τ→t} is the deformation feature of the reference frame I_t and the support frame I_{t-τ}, f_{t+τ→t} is the deformation feature of the reference frame I_t and the support frame I_{t+τ}, W denotes the bilinear warping function calculation, f_{t-τ} is the feature map corresponding to the support frame I_{t-τ} output by the depth convolution feature map extraction module, and f_{t+τ} is the feature map corresponding to the support frame I_{t+τ} output by the depth convolution feature map extraction module.
As a preferred technical scheme of the invention: step S5, taking the effective frame sequence output by the effective frame sequence extraction module as input, respectively obtaining fuzzy characteristics and saliency characteristics corresponding to each video frame in the effective frame sequence based on a fuzzy mapping network and a saliency detection network, and obtaining the weight coefficient of each video frame in the effective frame sequence based on the fuzzy characteristics and the saliency characteristics and a softmax classification network; the specific steps of constructing the weight coefficient calculation module by taking the weight coefficient of each video frame in the effective frame sequence as output are as follows:
S51, respectively inputting each video frame in the effective frame sequence into the fuzzy mapping network and the saliency detection network to obtain the fuzzy features and saliency features corresponding to each video frame;
S52, dot-multiplying the fuzzy features and saliency features corresponding to each video frame obtained in step S51 to obtain the corrected blur map M_blur-sali corresponding to each video frame;
S53, binarizing the corrected blur map M_blur-sali corresponding to each video frame with a step function whose threshold is 0.5, i.e. mapping values of at least 0.5 to 1 and values below 0.5 to 0, where M denotes a value of the corrected blur map M_blur-sali corresponding to each video frame and u(M) denotes the corresponding value of the binarized corrected blur map M_blur-sali;
S54, for each video frame, adding up all the values u(M) obtained in step S53 to obtain the blurriness parameter Vcb of that video frame, and normalizing the blurriness parameter Vcb of each video frame, where Vcb_i denotes the blurriness parameter of video frame i, VcbNorm_i denotes the normalized blurriness parameter of video frame i, and i takes values in {t-τ, t, t+τ};
S55, inputting the normalized blurriness parameters VcbNorm_i obtained in step S54 into the softmax classification network to obtain the weight coefficients ω_{t-τ}, ω_t and ω_{t+τ} respectively corresponding to the support frame I_{t-τ}, the reference frame I_t and the support frame I_{t+τ}.
As a preferred technical scheme of the invention: step S6 is implemented by taking the deformation features of each group of video frame pairs output by the deformation feature module, the depth convolution feature maps corresponding to each video frame output by the depth convolution feature map extraction module, and the weight coefficients of each video frame output by the weight coefficient calculation module as inputs to obtain the aggregation features corresponding to the video frame groups formed by the video frame pairs, and constructing the detection network module by using the preset information of the target person as an output through the detection neural network, which specifically includes the following steps:
S61: based on the deformation features f_{t-τ→t} and f_{t+τ→t} output by the deformation feature module, the feature map f_t corresponding to the reference frame I_t output by the depth convolution feature map extraction module, and the weight coefficients ω_{t-τ}, ω_t and ω_{t+τ} output by the weight coefficient calculation module, the aggregation feature J of the video frame group composed of the support frame I_{t-τ}, the reference frame I_t and the support frame I_{t+τ} is obtained according to the following formula:
J = f_{t-τ→t}·ω_{t-τ} + f_t·ω_t + f_{t+τ→t}·ω_{t+τ}
S62: the aggregation feature J is input into the detection neural network to obtain the preset information of the target person.
The invention also designs a task target detection system based on fragmented video information, which comprises:
one or more processors;
a memory storing instructions operable, when executed by the one or more processors, to cause the one or more processors to perform operations to obtain a target person detection model and then apply the target person detection model to accomplish detection of a preset target person:
s1, acquiring a video containing walking of a target person in real time, converting the video containing walking of the target person into a video frame sequence arranged according to a time sequence, extracting continuous video frames with preset frame numbers at preset positions in the video frame sequence as an effective frame sequence, taking the video frame sequence as input, and taking the effective frame sequence as output, and constructing an effective frame sequence extraction module;
s2, taking an effective frame sequence output by an effective frame sequence extraction module as input, and constructing a depth convolution feature map extraction module based on a depth convolution neural network and taking each depth convolution feature map corresponding to each video frame in the effective frame sequence as output;
s3, taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating optical flow parameters of each group of video frame pairs as motion information of a target person aiming at each group of video frame pairs formed by two video frames with preset frame numbers spaced in the effective frame sequence based on an optical flow neural network, and taking the optical flow parameters of each group of video frame pairs in the effective frame sequence as output to construct an optical flow information module;
s4, taking the depth convolution feature maps output by the depth convolution feature map extraction module and the optical flow parameters of each group of video frame pairs in the effective frame sequence output by the optical flow information module as input, and taking the deformation features of each group of video frame pairs as output to construct a deformation feature module based on a bilinear distortion function;
s5, taking an effective frame sequence output by an effective frame sequence extraction module as input, respectively obtaining fuzzy characteristics and saliency characteristics corresponding to each video frame in the effective frame sequence based on a fuzzy mapping network and a saliency detection network, and obtaining a weight coefficient of each video frame in the effective frame sequence based on the fuzzy characteristics and the saliency characteristics and a softmax classification network; taking the weight coefficient of each video frame in the effective frame sequence as output, and constructing a weight coefficient calculation module;
s6, obtaining aggregation characteristics corresponding to video frame groups formed by video frame pairs by taking deformation characteristics of each group of video frame pairs output by a deformation characteristic module, depth convolution characteristic graphs corresponding to each video frame output by a depth convolution characteristic graph extraction module and weight coefficients of each video frame output by a weight coefficient calculation module as input, and constructing a detection network module by taking preset information of a target person as output through a detection neural network;
and S7, acquiring a video frame sequence corresponding to a video containing the walking of the target person in real time as input, outputting preset information of the target person, constructing a model to be trained for the detection of the target person based on an effective frame sequence extraction module, a depth convolution feature map extraction module, an optical flow information module, a deformation feature module and a weight coefficient calculation module, and obtaining a target person detection model based on the participation training of a video sample containing the walking of the target person to finish the detection of the target person.
The invention also relates to a computer readable medium storing software, which is characterized in that the readable medium includes instructions executable by one or more computers, and the instructions, when executed by the one or more computers, perform the operations of the fragmented video information-based task object detection method.
Beneficial effects: compared with the prior art, the invention has the following advantages:
1. The invention introduces an optical flow neural network to calculate the optical flow between any two frames, so that attention is paid not only to the features of a single frame but also to the relationship between preceding and following frames.
2. The invention provides a new video object detection algorithm that mainly studies the influence of blur on video object detection: a frame in which the object appears clear contributes more to the result than a frame in which the object appears blurred.
3. The human target detection method based on fragmented video information helps detect person targets when the video is discontinuous and improves the precision of video object detection.
Drawings
Fig. 1 is a flowchart of a task object detection method based on fragmented video information according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a network framework for task object detection based on fragmented video information according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Referring to fig. 1 and 2, in the task target detection method based on fragmented video information according to the embodiment of the present invention, steps S1 to S7 are executed according to a preset period to obtain a target person detection model, and then the target person detection model is applied to complete detection of a preset target person;
s1, acquiring a video containing walking of a target person in real time, converting the video containing walking of the target person into a video frame sequence arranged according to a time sequence, extracting continuous video frames with preset frame numbers at preset positions in the video frame sequence as an effective frame sequence, taking the video frame sequence as input, and taking the effective frame sequence as output, and constructing an effective frame sequence extraction module;
s2, taking an effective frame sequence output by an effective frame sequence extraction module as input, and constructing a depth convolution feature map extraction module based on a depth convolution neural network and taking each depth convolution feature map corresponding to each video frame in the effective frame sequence as output;
in one embodiment, the deep convolutional neural network is Restnet-101.
S3, taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating optical flow parameters of each group of video frame pairs as motion information of a target person aiming at each group of video frame pairs formed by two video frames with preset frame numbers spaced in the effective frame sequence based on an optical flow neural network, and taking the optical flow parameters of each group of video frame pairs in the effective frame sequence as output to construct an optical flow information module;
since the resolution of the optical flow parameters output by the optical flow neural network does not match the resolution of the deep convolution feature map, the optical flow parameters need to be sized to match the feature map.
In step S3, the specific steps of taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating optical flow parameters of each set of video frame pairs as motion information of a target person for each set of video frame pairs composed of two video frames spaced by a preset number of frames in the effective frame sequence based on an optical flow neural network, and taking the optical flow parameters of each set of video frame pairs in the effective frame sequence as output, and constructing an optical flow information module are as follows:
S31: define the t-th video frame I_t in the effective frame sequence as the reference frame, and the (t-τ)-th video frame I_{t-τ} and the (t+τ)-th video frame I_{t+τ} in the effective frame sequence as support frames; input the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} into the optical flow neural network;
S32: the optical flow neural network comprises convolution layers and an enlargement layer; the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} pass through the contraction part formed by the convolution layers of the optical flow neural network to obtain the feature maps respectively corresponding to I_t, I_{t-τ} and I_{t+τ};
S33: since each feature map is reduced in size in step S32, its size must be enlarged back to the original size by an enlargement layer; the feature maps respectively corresponding to the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} pass through the enlargement layer of the optical flow neural network to obtain feature maps enlarged to the size of the original images;
S34: the optical flow parameters describe the motion information of the target person, obtained by using the temporal change of the pixels in each video frame of the effective frame sequence and the correlation between two video frames to find the correspondence between the two frames.
Optical flow prediction is performed on the feature maps obtained in step S33, taking the reference frame I_t and the support frame I_{t-τ} as one video frame pair and the reference frame I_t and the support frame I_{t+τ} as another video frame pair, to obtain the optical flow parameter M_{t-τ→t} between the feature maps corresponding to I_t and I_{t-τ} and the optical flow parameter M_{t+τ→t} between the feature maps corresponding to I_t and I_{t+τ}, according to the following formulas:
M_{t-τ→t} = FlowNet(I_{t-τ}, I_t)
M_{t+τ→t} = FlowNet(I_{t+τ}, I_t)
where M_{t-τ→t} is the optical flow parameter between the feature maps corresponding to the reference frame I_t and the support frame I_{t-τ}, t-τ→t denotes the correspondence from I_{t-τ} to I_t, M_{t+τ→t} is the optical flow parameter between the feature maps corresponding to the reference frame I_t and the support frame I_{t+τ}, t+τ→t denotes the correspondence from I_{t+τ} to I_t, and FlowNet denotes the optical flow neural network computation.
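The patent does not prescribe a specific optical flow network. The sketch below only illustrates the interface of step S34 together with the resizing noted above; `flow_net` is a hypothetical placeholder for any FlowNet-style network returning a two-channel flow field, and rescaling the displacement values along with the spatial resizing is an assumption of this sketch.

```python
import torch
import torch.nn.functional as F

def compute_flow_to_reference(flow_net, support, reference):
    """M_{support->t} = FlowNet(I_support, I_t): a 2-channel flow field in pixel units."""
    # flow_net stands in for the optical flow neural network of steps S31-S34
    # (contracting convolution layers followed by an enlargement layer)
    return flow_net(support, reference)                     # (N, 2, H, W)

def resize_flow_to_feature_map(flow, feat_hw):
    """Resize the flow to the feature-map resolution; rescaling the displacements is an assumption."""
    n, _, h, w = flow.shape
    fh, fw = feat_hw
    resized = F.interpolate(flow, size=(fh, fw), mode="bilinear", align_corners=False)
    scale = torch.tensor([fw / w, fh / h], device=flow.device).view(1, 2, 1, 1)
    return resized * scale                                  # channel 0: x, channel 1: y

# usage sketch for the two video frame pairs of step S34 (I_prev, I_t, I_next are hypothetical tensors)
# M_prev = resize_flow_to_feature_map(compute_flow_to_reference(flow_net, I_prev, I_t), f_t.shape[-2:])
# M_next = resize_flow_to_feature_map(compute_flow_to_reference(flow_net, I_next, I_t), f_t.shape[-2:])
```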
S4, referring to FIG. 2, in which WARP denotes the bilinear warping function and Aggregation denotes the aggregation of the deformation features: taking the depth convolution feature maps output by the depth convolution feature map extraction module and the optical flow parameters of each group of video frame pairs in the effective frame sequence output by the optical flow information module as input, and taking the deformation features of each group of video frame pairs as output, the deformation feature module is constructed based on a bilinear warping function;
step S4 is implemented by taking as input each depth convolution feature map output by the depth convolution feature map extraction module and optical flow parameters of each set of video frame pairs in the effective frame sequence output by the optical flow information module, and based on a bilinear warping function, taking as output the deformation features of each set of video frame pairs, and constructing a deformation feature module according to the following formula:
f_{t-τ→t} = W(f_{t-τ}, M_{t-τ→t})
f_{t+τ→t} = W(f_{t+τ}, M_{t+τ→t})
where f_{t-τ→t} is the deformation feature of the reference frame I_t and the support frame I_{t-τ}, f_{t+τ→t} is the deformation feature of the reference frame I_t and the support frame I_{t+τ}, W denotes the bilinear warping function calculation, f_{t-τ} is the feature map corresponding to the support frame I_{t-τ} output by the depth convolution feature map extraction module, and f_{t+τ} is the feature map corresponding to the support frame I_{t+τ} output by the depth convolution feature map extraction module.
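A minimal sketch of the bilinear warping W(f, M) used in step S4 follows, implemented with `torch.nn.functional.grid_sample`; converting pixel displacements to the normalized grid expected by `grid_sample` is an implementation assumption of this sketch, not a detail specified by the patent.

```python
import torch
import torch.nn.functional as F

def warp_features(feat, flow):
    """f_{support->t} = W(f_support, M_{support->t}) via bilinear sampling.

    feat: (N, C, H, W) support-frame feature map; flow: (N, 2, H, W) flow toward the reference frame.
    """
    n, _, h, w = feat.shape
    # base sampling grid in pixel coordinates
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feat.device)         # (2, H, W), x then y
    coords = base.unsqueeze(0) + flow                                    # displaced sampling positions
    # normalize to [-1, 1] as required by grid_sample
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)                     # (N, H, W, 2)
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)

# usage sketch: deformation features of the two video frame pairs (names are hypothetical)
# f_prev_to_t = warp_features(f_prev, M_prev)
# f_next_to_t = warp_features(f_next, M_next)
```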
S5, taking an effective frame sequence output by an effective frame sequence extraction module as input, respectively obtaining fuzzy characteristics and saliency characteristics corresponding to each video frame in the effective frame sequence based on a fuzzy mapping network and a saliency detection network, and obtaining a weight coefficient of each video frame in the effective frame sequence based on the fuzzy characteristics and the saliency characteristics and a softmax classification network; taking the weight coefficient of each video frame in the effective frame sequence as output, and constructing a weight coefficient calculation module;
In one embodiment, the fuzzy mapping network is DBM and the saliency detection network is CSNet; the fuzzy mapping network is used to obtain the degree of blur of the video frames, and the saliency detection network is used to eliminate background interference in the images.
Step S5, taking the effective frame sequence output by the effective frame sequence extraction module as input, respectively obtaining fuzzy characteristics and saliency characteristics corresponding to each video frame based on a fuzzy mapping network and a saliency detection network, and obtaining a weight coefficient of each video frame according to the fuzzy characteristics and the saliency characteristics and based on a softmax classification network; the specific steps of constructing the weight coefficient calculation module by taking the weight coefficient of each video frame in the effective frame sequence as output are as follows:
S51, respectively inputting each video frame in the effective frame sequence into the fuzzy mapping network and the saliency detection network to obtain the fuzzy features and saliency features corresponding to each video frame;
S52, dot-multiplying the fuzzy features and saliency features corresponding to each video frame obtained in step S51 to obtain the corrected blur map M_blur-sali corresponding to each video frame;
S53, binarizing the corrected blur map M_blur-sali corresponding to each video frame with a step function whose threshold is 0.5, i.e. mapping values of at least 0.5 to 1 and values below 0.5 to 0, where M denotes a value of the corrected blur map M_blur-sali corresponding to each video frame and u(M) denotes the corresponding value of the binarized corrected blur map M_blur-sali;
S54, for each video frame, adding up all the values u(M) obtained in step S53 to obtain the blurriness parameter Vcb of that video frame, and normalizing the blurriness parameter Vcb of each video frame, where Vcb_i denotes the blurriness parameter of video frame i, VcbNorm_i denotes the normalized blurriness parameter of video frame i, and i takes values in {t-τ, t, t+τ};
S55, inputting the normalized blurriness parameters VcbNorm_i obtained in step S54 into the softmax classification network to obtain the weight coefficients ω_{t-τ}, ω_t and ω_{t+τ} respectively corresponding to the support frame I_{t-τ}, the reference frame I_t and the support frame I_{t+τ}.
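The sketch below walks through steps S51 to S55 under stated assumptions: `blur_net` and `saliency_net` are hypothetical placeholders for the DBM fuzzy mapping network and the CSNet saliency detection network, both assumed to output per-pixel maps in [0, 1]; the normalization of Vcb is assumed to be division by the sum over the three frames; and the softmax classification network is reduced to a plain softmax.

```python
import torch

def frame_weights(frames, blur_net, saliency_net, threshold=0.5):
    """Steps S51-S55: per-frame weight coefficients from a saliency-corrected blur map.

    frames: (3, 3, H, W) tensor holding I_{t-τ}, I_t, I_{t+τ}.
    blur_net / saliency_net: placeholder networks returning (3, 1, H, W) maps in [0, 1].
    """
    blur = blur_net(frames)                        # S51: blur maps (assumed in [0, 1])
    sali = saliency_net(frames)                    # S51: saliency maps (assumed in [0, 1])
    corrected = blur * sali                        # S52: corrected blur map M_blur-sali
    binary = (corrected >= threshold).float()      # S53: step function with threshold 0.5
    vcb = binary.sum(dim=(1, 2, 3))                # S54: blurriness parameter Vcb per frame
    vcb_norm = vcb / vcb.sum().clamp(min=1e-6)     # S54: normalization (division by the sum is an assumption)
    # S55: softmax classification network reduced here to a plain softmax (assumption);
    # depending on the convention of the blur map, the sign may need flipping so that
    # sharper frames receive the larger weights.
    return torch.softmax(vcb_norm, dim=0)          # ω_{t-τ}, ω_t, ω_{t+τ}
```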
S6, obtaining aggregation characteristics corresponding to video frame groups formed by video frame pairs by taking deformation characteristics of each group of video frame pairs output by a deformation characteristic module, depth convolution characteristic graphs corresponding to each video frame output by a depth convolution characteristic graph extraction module and weight coefficients of each video frame output by a weight coefficient calculation module as input, and constructing a detection network module by taking preset information of a target person as output through a detection neural network;
in one embodiment, the neural network is Faster R-CNN.
Step S6 is implemented by taking the deformation features of each group of video frame pairs output by the deformation feature module, the depth convolution feature maps corresponding to each video frame output by the depth convolution feature map extraction module, and the weight coefficients of each video frame output by the weight coefficient calculation module as inputs to obtain the aggregation features corresponding to the video frame groups formed by the video frame pairs, and constructing the detection network module by using the preset information of the target person as an output through the detection neural network, which specifically includes the following steps:
S61: based on the deformation features f_{t-τ→t} and f_{t+τ→t} output by the deformation feature module, the feature map f_t corresponding to the reference frame I_t output by the depth convolution feature map extraction module, and the weight coefficients ω_{t-τ}, ω_t and ω_{t+τ} output by the weight coefficient calculation module, the aggregation feature J of the video frame group composed of the support frame I_{t-τ}, the reference frame I_t and the support frame I_{t+τ} is obtained according to the following formula:
J = f_{t-τ→t}·ω_{t-τ} + f_t·ω_t + f_{t+τ→t}·ω_{t+τ}
S62: the aggregation feature J is input into the detection neural network to obtain the preset information of the target person.
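A minimal sketch of steps S61 and S62 follows, chaining the pieces sketched above; `detector` is a hypothetical placeholder for the detection neural network (Faster R-CNN in the embodiment above), and the way it consumes the aggregated feature map is an assumption of this sketch.

```python
import torch

def aggregate_and_detect(f_prev_to_t, f_t, f_next_to_t, weights, detector):
    """S61: J = f_{t-τ→t}·ω_{t-τ} + f_t·ω_t + f_{t+τ→t}·ω_{t+τ};  S62: detect on J."""
    w_prev, w_t, w_next = weights                            # ω_{t-τ}, ω_t, ω_{t+τ}
    aggregated = f_prev_to_t * w_prev + f_t * w_t + f_next_to_t * w_next
    # detector stands in for the detection neural network operating on the aggregated
    # feature map; its exact interface and outputs are assumptions of this sketch
    return detector(aggregated)

# usage sketch, chaining the modules sketched above (all names are hypothetical):
# weights = frame_weights(frames, blur_net, saliency_net)
# detections = aggregate_and_detect(f_prev_to_t, f_t, f_next_to_t, weights, detector)
```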
And S7, acquiring a video frame sequence corresponding to a video containing the walking of the target person in real time as input, outputting preset information of the target person, constructing a model to be trained for the detection of the target person based on an effective frame sequence extraction module, a depth convolution feature map extraction module, an optical flow information module, a deformation feature module and a weight coefficient calculation module, and obtaining a target person detection model based on the participation training of a video sample containing the walking of the target person to finish the detection of the target person.
The embodiment of the invention provides a task target detection system based on fragmented video information, which comprises:
one or more processors;
a memory storing instructions operable, when executed by the one or more processors, to cause the one or more processors to perform operations to obtain a target person detection model and then apply the target person detection model to accomplish detection of a preset target person:
s1, acquiring a video containing walking of a target person in real time, converting the video containing walking of the target person into a video frame sequence arranged according to a time sequence, extracting continuous video frames with preset frame numbers at preset positions in the video frame sequence as an effective frame sequence, taking the video frame sequence as input, and taking the effective frame sequence as output, and constructing an effective frame sequence extraction module;
s2, taking an effective frame sequence output by an effective frame sequence extraction module as input, and constructing a depth convolution feature map extraction module based on a depth convolution neural network and taking each depth convolution feature map corresponding to each video frame in the effective frame sequence as output;
s3, taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating optical flow parameters of each group of video frame pairs as motion information of a target person aiming at each group of video frame pairs formed by two video frames with preset frame numbers spaced in the effective frame sequence based on an optical flow neural network, and taking the optical flow parameters of each group of video frame pairs in the effective frame sequence as output to construct an optical flow information module;
s4, taking the depth convolution feature maps output by the depth convolution feature map extraction module and the optical flow parameters of each group of video frame pairs in the effective frame sequence output by the optical flow information module as input, and taking the deformation features of each group of video frame pairs as output to construct a deformation feature module based on a bilinear distortion function;
s5, taking the effective frame sequence output by the effective frame sequence extraction module as input, respectively obtaining fuzzy characteristics and saliency characteristics corresponding to each video frame in the effective frame sequence based on a fuzzy mapping network and a saliency detection network, and obtaining a weight coefficient of each video frame in the effective frame sequence based on the fuzzy characteristics and the saliency characteristics and a softmax classification network; taking the weight coefficient of each video frame in the effective frame sequence as output, and constructing a weight coefficient calculation module;
s6, obtaining aggregation characteristics corresponding to video frame groups formed by video frame pairs by taking deformation characteristics of each group of video frame pairs output by a deformation characteristic module, depth convolution characteristic graphs corresponding to each video frame output by a depth convolution characteristic graph extraction module and weight coefficients of each video frame output by a weight coefficient calculation module as input, and constructing a detection network module by taking preset information of a target person as output through a detection neural network;
and S7, acquiring a video frame sequence corresponding to a video containing the walking of the target person in real time as input, outputting preset information of the target person, constructing a model to be trained for the detection of the target person based on an effective frame sequence extraction module, a depth convolution feature map extraction module, an optical flow information module, a deformation feature module and a weight coefficient calculation module, and obtaining a target person detection model based on the participation training of a video sample containing the walking of the target person to finish the detection of the target person.
The computer readable medium for storing software provided by the embodiment of the invention comprises instructions which can be executed by one or more computers, and when the instructions are executed by the one or more computers, the instructions execute the operation of the task object detection method based on the fragmented video information.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.
Claims (7)
1. A task object detection method based on fragmented video information is characterized in that steps S1-S7 are executed according to a preset period to obtain a target character detection model, and then the target character detection model is applied to complete detection of a target character;
s1, acquiring a video containing walking of a target person in real time, converting the video containing walking of the target person into a video frame sequence arranged according to a time sequence, extracting continuous video frames with preset frame numbers at preset positions in the video frame sequence as an effective frame sequence, taking the video frame sequence as input, and taking the effective frame sequence as output, and constructing an effective frame sequence extraction module;
s2, taking an effective frame sequence output by an effective frame sequence extraction module as input, and constructing a depth convolution feature map extraction module based on a depth convolution neural network and taking each depth convolution feature map corresponding to each video frame in the effective frame sequence as output;
s3, taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating optical flow parameters of each group of video frame pairs as motion information of a target person aiming at each group of video frame pairs formed by two video frames with preset frame numbers spaced in the effective frame sequence based on an optical flow neural network, and taking the optical flow parameters of each group of video frame pairs in the effective frame sequence as output to construct an optical flow information module;
s4, taking the depth convolution feature maps output by the depth convolution feature map extraction module and the optical flow parameters of each group of video frame pairs in the effective frame sequence output by the optical flow information module as input, and taking the deformation features of each group of video frame pairs as output to construct a deformation feature module based on a bilinear distortion function;
s5, taking an effective frame sequence output by an effective frame sequence extraction module as input, respectively obtaining fuzzy characteristics and saliency characteristics corresponding to each video frame in the effective frame sequence based on a fuzzy mapping network and a saliency detection network, and obtaining a weight coefficient of each video frame in the effective frame sequence based on the fuzzy characteristics and the saliency characteristics and a softmax classification network; taking the weight coefficient of each video frame in the effective frame sequence as output, and constructing a weight coefficient calculation module;
s6, obtaining aggregation characteristics corresponding to video frame groups formed by video frame pairs by taking deformation characteristics of each group of video frame pairs output by a deformation characteristic module, depth convolution characteristic graphs corresponding to each video frame output by a depth convolution characteristic graph extraction module and weight coefficients of each video frame output by a weight coefficient calculation module as input, and constructing a detection network module by taking preset information of a target person as output through a detection neural network;
and S7, acquiring a video frame sequence corresponding to a video containing the walking of the target person in real time as input, outputting preset information of the target person, constructing a model to be trained for the detection of the target person based on an effective frame sequence extraction module, a depth convolution feature map extraction module, an optical flow information module, a deformation feature module and a weight coefficient calculation module, and obtaining a target person detection model based on the participation training of a video sample containing the walking of the target person to finish the detection of the target person.
2. The method of claim 1, wherein the step S3 is implemented by inputting the effective frame sequence outputted from the effective frame sequence extraction module, calculating optical flow parameters of each video frame pair as the motion information of the target person for each video frame pair consisting of two video frames spaced apart from each other by a predetermined number of frames in the effective frame sequence based on the optical flow neural network, and constructing the optical flow information module by outputting the optical flow parameters of each video frame pair in the effective frame sequence, as follows:
S31: define the t-th video frame I_t in the effective frame sequence as the reference frame, and the (t-τ)-th video frame I_{t-τ} and the (t+τ)-th video frame I_{t+τ} in the effective frame sequence as support frames; input the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} into the optical flow neural network;
S32: the optical flow neural network comprises convolution layers and an enlargement layer; the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} pass through the contraction part formed by the convolution layers of the optical flow neural network to obtain the feature maps respectively corresponding to I_t, I_{t-τ} and I_{t+τ};
S33: the feature maps respectively corresponding to the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} pass through the enlargement layer of the optical flow neural network to obtain feature maps enlarged to the size of the original images;
S34: optical flow prediction is performed on the feature maps obtained in step S33, taking the reference frame I_t and the support frame I_{t-τ} as one video frame pair and the reference frame I_t and the support frame I_{t+τ} as another video frame pair, to obtain the optical flow parameter M_{t-τ→t} between the feature maps corresponding to I_t and I_{t-τ} and the optical flow parameter M_{t+τ→t} between the feature maps corresponding to I_t and I_{t+τ}, according to the following formulas:
M_{t-τ→t} = FlowNet(I_{t-τ}, I_t)
M_{t+τ→t} = FlowNet(I_{t+τ}, I_t)
where M_{t-τ→t} is the optical flow parameter between the feature maps corresponding to the reference frame I_t and the support frame I_{t-τ}, t-τ→t denotes the correspondence from I_{t-τ} to I_t, M_{t+τ→t} is the optical flow parameter between the feature maps corresponding to the reference frame I_t and the support frame I_{t+τ}, t+τ→t denotes the correspondence from I_{t+τ} to I_t, and FlowNet denotes the optical flow neural network computation.
3. The method as claimed in claim 2, wherein the step S4 takes as input the depth convolution feature maps outputted from the depth convolution feature map extraction module, the optical flow parameters of each set of video frame pairs in the effective frame sequence outputted from the optical flow information module, and the deformation features of each set of video frame pairs as output based on the bilinear warping function, and the specific method for constructing the deformation feature module is as follows:
f_{t-τ→t} = W(f_{t-τ}, M_{t-τ→t})
f_{t+τ→t} = W(f_{t+τ}, M_{t+τ→t})
where f_{t-τ→t} is the deformation feature of the reference frame I_t and the support frame I_{t-τ}, f_{t+τ→t} is the deformation feature of the reference frame I_t and the support frame I_{t+τ}, W denotes the bilinear warping function calculation, f_{t-τ} is the feature map corresponding to the support frame I_{t-τ} output by the depth convolution feature map extraction module, and f_{t+τ} is the feature map corresponding to the support frame I_{t+τ} output by the depth convolution feature map extraction module.
4. The method according to claim 3, wherein step S5 is implemented by taking the valid frame sequence output by the valid frame sequence extraction module as input, respectively obtaining the blur characteristic and the saliency characteristic corresponding to each video frame in the valid frame sequence based on the blur mapping network and the saliency detection network, and obtaining the weight coefficient of each video frame in the valid frame sequence based on the blur characteristic and the saliency characteristic and on the softmax classification network; the specific steps of constructing the weight coefficient calculation module by taking the weight coefficient of each video frame in the effective frame sequence as output are as follows:
S51: respectively inputting each video frame in the effective frame sequence into the fuzzy mapping network and the saliency detection network to obtain the fuzzy features and saliency features corresponding to each video frame;
S52: dot-multiplying the fuzzy features and saliency features corresponding to each video frame obtained in step S51 to obtain the corrected blur map M_blur-sali corresponding to each video frame;
S53: binarizing the corrected blur map M_blur-sali corresponding to each video frame with a step function whose threshold is 0.5, i.e. mapping values of at least 0.5 to 1 and values below 0.5 to 0, where M denotes a value of the corrected blur map M_blur-sali corresponding to each video frame and u(M) denotes the corresponding value of the binarized corrected blur map M_blur-sali;
S54: for each video frame, adding up all the values u(M) obtained in step S53 to obtain the blurriness parameter Vcb of that video frame, and normalizing the blurriness parameter Vcb of each video frame, where Vcb_i denotes the blurriness parameter of video frame i, VcbNorm_i denotes the normalized blurriness parameter of video frame i, and i takes values in {t-τ, t, t+τ};
S55: inputting the normalized blurriness parameters VcbNorm_i obtained in step S54 into the softmax classification network to obtain the weight coefficients ω_{t-τ}, ω_t and ω_{t+τ} respectively corresponding to the support frame I_{t-τ}, the reference frame I_t and the support frame I_{t+τ}.
5. The method according to claim 4, wherein the step S6 is implemented by taking the distortion features of each group of video frame pairs output by the distortion feature module, the depth convolution feature maps corresponding to each video frame output by the depth convolution feature map extraction module, and the weight coefficients of each video frame output by the weight coefficient calculation module as input, obtaining the aggregation features corresponding to the video frame group formed by the video frame pairs, and constructing the detection network module by taking the preset information of the target person as output through the detection neural network, and comprises the specific steps of:
S61: based on the deformation features f_{t-τ→t} and f_{t+τ→t} output by the deformation feature module, the feature map f_t corresponding to the reference frame I_t output by the depth convolution feature map extraction module, and the weight coefficients ω_{t-τ}, ω_t and ω_{t+τ} output by the weight coefficient calculation module, the aggregation feature J of the video frame group composed of the support frame I_{t-τ}, the reference frame I_t and the support frame I_{t+τ} is obtained according to the following formula:
J = f_{t-τ→t}·ω_{t-τ} + f_t·ω_t + f_{t+τ→t}·ω_{t+τ}
S62: the aggregation feature J is input into the detection neural network to obtain the preset information of the target person.
6. A system for task object detection based on fragmented video information, comprising:
one or more processors;
a memory storing instructions operable, when executed by the one or more processors, to cause the one or more processors to perform operations to obtain a target person detection model and then apply the target person detection model to accomplish detection of a preset target person:
S1, acquiring a video containing the walking of a target person in real time, converting the video into a video frame sequence arranged in time order, and extracting consecutive video frames of a preset frame number at a preset position in the video frame sequence as an effective frame sequence; and taking the video frame sequence as input and the effective frame sequence as output, constructing an effective frame sequence extraction module;
S2, taking the effective frame sequence output by the effective frame sequence extraction module as input, and, based on a depth convolution neural network, taking the depth convolution feature map corresponding to each video frame in the effective frame sequence as output, constructing a depth convolution feature map extraction module;
S3, taking the effective frame sequence output by the effective frame sequence extraction module as input; for each group of video frame pairs formed by two video frames spaced a preset number of frames apart in the effective frame sequence, calculating the optical flow parameters of the video frame pair, based on an optical flow neural network, as the motion information of the target person; and taking the optical flow parameters of each group of video frame pairs in the effective frame sequence as output, constructing an optical flow information module;
S4, taking the depth convolution feature maps output by the depth convolution feature map extraction module and the optical flow parameters of each group of video frame pairs in the effective frame sequence output by the optical flow information module as input, and taking the deformation features of each group of video frame pairs as output, constructing a deformation feature module based on a bilinear warping function (an illustrative code sketch of this warping follows this claim);
S5, taking the effective frame sequence output by the effective frame sequence extraction module as input, respectively obtaining the blur feature and the saliency feature corresponding to each video frame in the effective frame sequence based on a blur mapping network and a saliency detection network, and obtaining the weight coefficient of each video frame in the effective frame sequence from the blur feature and the saliency feature through a softmax classification network; and taking the weight coefficient of each video frame in the effective frame sequence as output, constructing a weight coefficient calculation module;
S6, taking the deformation features of each group of video frame pairs output by the deformation feature module, the depth convolution feature map corresponding to each video frame output by the depth convolution feature map extraction module, and the weight coefficient of each video frame output by the weight coefficient calculation module as input, obtaining the aggregation feature corresponding to the video frame group formed by the video frame pairs, and taking the preset information of the target person, obtained through a detection neural network, as output, constructing a detection network module;
and S7, taking a video frame sequence corresponding to a video containing the walking of the target person acquired in real time as input and the preset information of the target person as output, constructing a to-be-trained target person detection model based on the effective frame sequence extraction module, the depth convolution feature map extraction module, the optical flow information module, the deformation feature module, the weight coefficient calculation module and the detection network module, and obtaining the target person detection model through training with video samples containing the walking of the target person, so as to complete the detection of the target person.
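A minimal PyTorch sketch (not part of the claims) of the bilinear warping used in step S4, which resamples a support-frame feature map toward the reference frame using the optical flow produced by the optical flow information module. The function name warp_features, the pixel-unit flow convention and the use of torch.nn.functional.grid_sample are assumptions made for illustration; the patent does not prescribe a particular implementation.

```python
import torch
import torch.nn.functional as F

def warp_features(feat, flow):
    """Warp a support-frame feature map toward the reference frame (sketch of step S4).

    feat: (N, C, H, W) feature map of the support frame.
    flow: (N, 2, H, W) optical flow in pixels, channel 0 = dx, channel 1 = dy.
    """
    n, _, h, w = feat.shape
    # Base grid of pixel coordinates (x, y) for every spatial location.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat.dtype, device=feat.device),
        torch.arange(w, dtype=feat.dtype, device=feat.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=0).unsqueeze(0).expand(n, -1, -1, -1)
    coords = base + flow                          # displaced sampling positions
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    grid_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    grid_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (N, H, W, 2)
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)
```

In terms of claim 5, warp_features applied to the support-frame feature map and the flow toward the reference frame would play the role of the deformation feature f_(t-τ→t) that is aggregated in step S61.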
7. A computer-readable medium storing software and comprising instructions executable by one or more computers, the instructions, when executed by the one or more computers, performing the operations of the task target detection method based on fragmented video information according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210375278.1A CN114882523A (en) | 2022-04-11 | 2022-04-11 | Task target detection method and system based on fragmented video information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210375278.1A CN114882523A (en) | 2022-04-11 | 2022-04-11 | Task target detection method and system based on fragmented video information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114882523A true CN114882523A (en) | 2022-08-09 |
Family
ID=82669897
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210375278.1A Pending CN114882523A (en) | 2022-04-11 | 2022-04-11 | Task target detection method and system based on fragmented video information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114882523A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210327031A1 (en) * | 2020-04-15 | 2021-10-21 | Tsinghua Shenzhen International Graduate School | Video blind denoising method based on deep learning, computer device and computer-readable storage medium |
CN111476314A (en) * | 2020-04-27 | 2020-07-31 | 中国科学院合肥物质科学研究院 | Fuzzy video detection method integrating optical flow algorithm and deep learning |
CN111814884A (en) * | 2020-07-10 | 2020-10-23 | 江南大学 | Target detection network model upgrading method based on deformable convolution |
CN113239825A (en) * | 2021-05-19 | 2021-08-10 | 四川中烟工业有限责任公司 | High-precision tobacco beetle detection method in complex scene |
Non-Patent Citations (1)
Title |
---|
Li Sen et al.: "Video frame prediction model based on spatio-temporal modeling" (基于时空建模的视频帧预测模型), Internet of Things Technologies (物联网技术), no. 02, 20 February 2020 (2020-02-20) *
Similar Documents
Publication | Title | |
---|---|---|
CN111354017B (en) | Target tracking method based on twin neural network and parallel attention module | |
CN111080675B (en) | Target tracking method based on space-time constraint correlation filtering | |
CN111627044B (en) | Target tracking attack and defense method based on deep network | |
WO2019136591A1 (en) | Salient object detection method and system for weak supervision-based spatio-temporal cascade neural network | |
CN110120064B (en) | Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning | |
Xu et al. | Multi-stream attention-aware graph convolution network for video salient object detection | |
CN103761710B (en) | The blind deblurring method of efficient image based on edge self-adaption | |
CN110853074B (en) | Video target detection network system for enhancing targets by utilizing optical flow | |
CN107688829A (en) | A kind of identifying system and recognition methods based on SVMs | |
CN109902667A (en) | Human face in-vivo detection method based on light stream guide features block and convolution GRU | |
CN112949493A (en) | Lane line detection method and system combining semantic segmentation and attention mechanism | |
CN113361542A (en) | Local feature extraction method based on deep learning | |
CN112561879B (en) | Ambiguity evaluation model training method, image ambiguity evaluation method and image ambiguity evaluation device | |
CN115937254B (en) | Multi-aerial flying target tracking method and system based on semi-supervised learning | |
CN111310609A (en) | Video target detection method based on time sequence information and local feature similarity | |
CN115588030B (en) | Visual target tracking method and device based on twin network | |
CN112016454A (en) | Face alignment detection method | |
CN111368831B (en) | Positioning system and method for vertical text | |
CN117561540A (en) | System and method for performing computer vision tasks using a sequence of frames | |
CN111753670A (en) | Human face overdividing method based on iterative cooperation of attention restoration and key point detection | |
CN111145221A (en) | Target tracking algorithm based on multi-layer depth feature extraction | |
CN114882523A (en) | Task target detection method and system based on fragmented video information | |
CN116612355A (en) | Training method and device for face fake recognition model, face recognition method and device | |
CN116188265A (en) | Space variable kernel perception blind super-division reconstruction method based on real degradation | |
US11989927B2 (en) | Apparatus and method for detecting keypoint based on deep learning using information change across receptive fields |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||