CN114882523A - Task target detection method and system based on fragmented video information - Google Patents

Task target detection method and system based on fragmented video information

Info

Publication number
CN114882523A
Authority
CN
China
Prior art keywords
frame
video
video frame
output
effective
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210375278.1A
Other languages
Chinese (zh)
Inventor
陈志�
何丽
岳文静
周晨
王悦
艾虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202210375278.1A priority Critical patent/CN114882523A/en
Publication of CN114882523A publication Critical patent/CN114882523A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a task target detection method and system based on fragmented video information. A target person detection model is constructed from an effective frame sequence extraction module, a depth convolution feature map extraction module, an optical flow information module, a deformation feature module and a weight coefficient calculation module, and the model is applied to complete the detection of a preset target person.

Description

Task target detection method and system based on fragmented video information
Technical Field
The invention relates to a task target detection method based on fragmented video information, and also relates to a system for realizing the task target detection method based on fragmented video information.
Background
Deep learning networks have made significant advances in object detection, and in recent years excellent image-based object detection algorithms have been transferred directly to video object detection. Video object detection is more challenging than still-image object detection: the detection scene is usually complex, the detected person may not be matched across frames, and the captured video information is discontinuous, so the extracted image information is often incomplete, suffering from motion blur, defocus, rare poses and the like, which greatly reduces detection accuracy.
Existing feature-aggregation-based methods compensate for the misalignment between frames by aggregating the features of multiple adjacent frames, and a key issue is whether these frames should be treated equally. There are two ways to address this: one treats every frame equally and gives each the same weight, and the other learns the weights with a lightweight network during training; neither gives special consideration to the influence of blur.
The invention provides a person target detection method based on fragmented video information. The method improves the weight distribution of the aggregated frames using a blur prior. Specifically, a blur mapping network is introduced to mark each pixel as blurred or non-blurred; because only the degree of blur of the target is of concern and the background is not, a saliency detection network is adopted to focus on the target, and the saliency map is used for calibration to obtain a calibrated blur map centered on the blur of the target, from which the weight of each frame image is calculated. The method performs better than the current state-of-the-art video object detection algorithms, at the cost of some additional computation.
Disclosure of Invention
The purpose of the invention is as follows: to provide a task target detection method and system based on fragmented video information that improves the weight distribution of the aggregated frames by a blur-prior method, calculating a weight for each frame image instead of giving every frame the same weight, thereby effectively improving the accuracy and reliability of person detection.
In order to realize this function, the task target detection method based on fragmented video information comprises executing steps S1 to S7 according to a preset period to obtain a target person detection model, and then applying the target person detection model to complete the detection of the target person;
S1, collecting a video containing walking of the target person in real time, converting the video containing walking of the target person into a video frame sequence arranged according to time sequence, extracting continuous video frames with a preset frame number at a preset position in the video frame sequence as an effective frame sequence, and constructing an effective frame sequence extraction module by taking the video frame sequence as input and the effective frame sequence as output;
s2, taking an effective frame sequence output by an effective frame sequence extraction module as input, and constructing a depth convolution feature map extraction module based on a depth convolution neural network and taking each depth convolution feature map corresponding to each video frame in the effective frame sequence as output;
s3, taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating optical flow parameters of each group of video frame pairs as motion information of a target person aiming at each group of video frame pairs formed by two video frames with preset frame numbers spaced in the effective frame sequence based on an optical flow neural network, and taking the optical flow parameters of each group of video frame pairs in the effective frame sequence as output to construct an optical flow information module;
s4, taking the depth convolution feature maps output by the depth convolution feature map extraction module and the optical flow parameters of each group of video frame pairs in the effective frame sequence output by the optical flow information module as input, and taking the deformation features of each group of video frame pairs as output to construct a deformation feature module based on a bilinear distortion function;
s5, taking an effective frame sequence output by an effective frame sequence extraction module as input, respectively obtaining fuzzy characteristics and saliency characteristics corresponding to each video frame in the effective frame sequence based on a fuzzy mapping network and a saliency detection network, and obtaining a weight coefficient of each video frame in the effective frame sequence based on the fuzzy characteristics and the saliency characteristics and a softmax classification network; taking the weight coefficient of each video frame in the effective frame sequence as output, and constructing a weight coefficient calculation module;
s6, obtaining aggregation characteristics corresponding to video frame groups formed by video frame pairs by taking deformation characteristics of each group of video frame pairs output by a deformation characteristic module, depth convolution characteristic graphs corresponding to each video frame output by a depth convolution characteristic graph extraction module and weight coefficients of each video frame output by a weight coefficient calculation module as input, and constructing a detection network module by taking preset information of a target person as output through a detection neural network;
and S7, acquiring a video frame sequence corresponding to a video containing the walking of the target person in real time as input, outputting preset information of the target person, constructing a model to be trained for the detection of the target person based on an effective frame sequence extraction module, a depth convolution feature map extraction module, an optical flow information module, a deformation feature module and a weight coefficient calculation module, and obtaining a target person detection model based on the participation training of a video sample containing the walking of the target person to finish the detection of the target person.
As a preferred technical scheme of the invention: step S3 is a specific step of taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating optical flow parameters of each group of video frame pairs constituted by two video frames spaced by a preset number of frames in the effective frame sequence as motion information of a target person based on an optical flow neural network, and taking the optical flow parameters of each group of video frame pairs in the effective frame sequence as output, and constructing an optical flow information module as follows:
S31: define the t-th video frame I_t in the effective frame sequence as the reference frame, and the (t-τ)-th video frame I_{t-τ} and the (t+τ)-th video frame I_{t+τ} in the effective frame sequence as support frames; input the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} into the optical flow neural network;
S32: the optical flow neural network comprises convolution layers and enlargement layers; the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} pass through the contraction part formed by the convolution layers of the optical flow neural network to obtain the feature maps corresponding to the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} respectively;
S33: the feature maps corresponding to the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} pass through the enlargement layer of the optical flow neural network to obtain feature maps, enlarged to the size of the original image, corresponding to the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} respectively;
S34: optical flow prediction is performed based on the feature maps obtained in step S33 for the reference frame I_t and the support frames I_{t-τ} and I_{t+τ}; taking the reference frame I_t and the support frame I_{t-τ} as one video frame pair, and the reference frame I_t and the support frame I_{t+τ} as another video frame pair, the optical flow parameters M_{t-τ→t} between the feature maps corresponding to I_{t-τ} and I_t and the optical flow parameters M_{t+τ→t} between the feature maps corresponding to I_{t+τ} and I_t are obtained by the following formulas:
M_{t-τ→t} = FlowNet(I_{t-τ}, I_t)
M_{t+τ→t} = FlowNet(I_{t+τ}, I_t)
where M_{t-τ→t} is the optical flow parameter between the feature maps corresponding to the reference frame I_t and the support frame I_{t-τ}, t-τ→t denotes the correspondence from I_{t-τ} to I_t, M_{t+τ→t} is the optical flow parameter between the feature maps corresponding to the reference frame I_t and the support frame I_{t+τ}, t+τ→t denotes the correspondence from I_{t+τ} to I_t, and FlowNet denotes the optical flow neural network computation.
As a preferred technical scheme of the invention: step S4 is implemented by taking as input each depth convolution feature map output by the depth convolution feature map extraction module and optical flow parameters of each set of video frame pairs in the effective frame sequence output by the optical flow information module, and based on a bilinear warping function, taking as output the deformation features of each set of video frame pairs, and constructing a deformation feature module according to the following formula:
f_{t-τ→t} = W(f_{t-τ}, M_{t-τ→t})
f_{t+τ→t} = W(f_{t+τ}, M_{t+τ→t})
where f_{t-τ→t} is the deformation feature of the support frame I_{t-τ} with respect to the reference frame I_t, f_{t+τ→t} is the deformation feature of the support frame I_{t+τ} with respect to the reference frame I_t, W denotes the bilinear warping function calculation, f_{t-τ} is the feature map corresponding to the support frame I_{t-τ} output by the depth convolution feature map extraction module, and f_{t+τ} is the feature map corresponding to the support frame I_{t+τ} output by the depth convolution feature map extraction module.
As a preferred technical scheme of the invention: step S5, taking the effective frame sequence output by the effective frame sequence extraction module as input, respectively obtaining fuzzy characteristics and saliency characteristics corresponding to each video frame in the effective frame sequence based on a fuzzy mapping network and a saliency detection network, and obtaining the weight coefficient of each video frame in the effective frame sequence based on the fuzzy characteristics and the saliency characteristics and a softmax classification network; the specific steps of constructing the weight coefficient calculation module by taking the weight coefficient of each video frame in the effective frame sequence as output are as follows:
s51, respectively inputting each video frame in the effective frame sequence into a fuzzy mapping network and a significance detection network to obtain fuzzy characteristics and significance characteristics corresponding to each video frame;
S52, the corrected fuzzy mapping M_{blur-sali} corresponding to each video frame is obtained by dot-multiplying the fuzzy characteristics and the saliency characteristics corresponding to each video frame obtained in step S51;
S53, the corrected fuzzy mapping M_{blur-sali} corresponding to each video frame is binarized with a step function whose threshold value is 0.5, the step function being:
u(m) = 1 if m ≥ 0.5, and u(m) = 0 if m < 0.5
where m is a value of the corrected fuzzy mapping M_{blur-sali} corresponding to each video frame, and u(m) is the corresponding value of the binarized corrected fuzzy mapping M_{blur-sali};
S54, for each video frame, all values u(m) obtained in step S53 are summed to obtain the fuzziness parameter Vcb of that video frame, and the fuzziness parameter Vcb of each video frame is normalized; the normalization method is:
[normalization formula shown as an image in the original publication]
where Vcb_i denotes the fuzziness parameter of video frame i, VcbNorm_i denotes the normalized fuzziness parameter of video frame i, and i takes values in {t-τ, t, t+τ};
S55, the normalized fuzziness parameter VcbNorm_i of each video frame obtained in step S54 is input into the softmax classification network to obtain the weight coefficients ω_{t-τ}, ω_t and ω_{t+τ} corresponding respectively to the support frame I_{t-τ}, the reference frame I_t and the support frame I_{t+τ}.
As a preferred technical scheme of the invention: step S6 is implemented by taking the deformation features of each group of video frame pairs output by the deformation feature module, the depth convolution feature maps corresponding to each video frame output by the depth convolution feature map extraction module, and the weight coefficients of each video frame output by the weight coefficient calculation module as inputs to obtain the aggregation features corresponding to the video frame groups formed by the video frame pairs, and constructing the detection network module by using the preset information of the target person as an output through the detection neural network, which specifically includes the following steps:
s61: module based on shape characteristicsDeformation characteristic f of output t-τ→t 、f t+τ→t Reference frame I output by deep convolution characteristic image extraction module t Corresponding characteristic diagram f t And the weight coefficient omega corresponding to each deformation characteristic output by the weight coefficient calculation module t-τ 、ω t 、ω t+τ Obtaining a support frame I according to the following formula t-τ Reference frame I t Support frame I t+τ Aggregation characteristics J of the composed video frame group;
J=f t-τ→t ω t-τ +f t ω t +f t+τ→t ω t+τ
s62: and inputting the aggregation characteristics into a detection neural network to obtain preset information of the target person.
The invention also designs a task target detection system based on fragmented video information, which comprises:
one or more processors;
a memory storing instructions operable, when executed by the one or more processors, to cause the one or more processors to perform operations to obtain a target person detection model and then apply the target person detection model to accomplish detection of a preset target person:
s1, acquiring a video containing walking of a target person in real time, converting the video containing walking of the target person into a video frame sequence arranged according to a time sequence, extracting continuous video frames with preset frame numbers at preset positions in the video frame sequence as an effective frame sequence, taking the video frame sequence as input, and taking the effective frame sequence as output, and constructing an effective frame sequence extraction module;
s2, taking an effective frame sequence output by an effective frame sequence extraction module as input, and constructing a depth convolution feature map extraction module based on a depth convolution neural network and taking each depth convolution feature map corresponding to each video frame in the effective frame sequence as output;
s3, taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating optical flow parameters of each group of video frame pairs as motion information of a target person aiming at each group of video frame pairs formed by two video frames with preset frame numbers spaced in the effective frame sequence based on an optical flow neural network, and taking the optical flow parameters of each group of video frame pairs in the effective frame sequence as output to construct an optical flow information module;
s4, taking the depth convolution feature maps output by the depth convolution feature map extraction module and the optical flow parameters of each group of video frame pairs in the effective frame sequence output by the optical flow information module as input, and taking the deformation features of each group of video frame pairs as output to construct a deformation feature module based on a bilinear distortion function;
s5, taking an effective frame sequence output by an effective frame sequence extraction module as input, respectively obtaining fuzzy characteristics and saliency characteristics corresponding to each video frame in the effective frame sequence based on a fuzzy mapping network and a saliency detection network, and obtaining a weight coefficient of each video frame in the effective frame sequence based on the fuzzy characteristics and the saliency characteristics and a softmax classification network; taking the weight coefficient of each video frame in the effective frame sequence as output, and constructing a weight coefficient calculation module;
s6, obtaining aggregation characteristics corresponding to video frame groups formed by video frame pairs by taking deformation characteristics of each group of video frame pairs output by a deformation characteristic module, depth convolution characteristic graphs corresponding to each video frame output by a depth convolution characteristic graph extraction module and weight coefficients of each video frame output by a weight coefficient calculation module as input, and constructing a detection network module by taking preset information of a target person as output through a detection neural network;
and S7, acquiring a video frame sequence corresponding to a video containing the walking of the target person in real time as input, outputting preset information of the target person, constructing a model to be trained for the detection of the target person based on an effective frame sequence extraction module, a depth convolution feature map extraction module, an optical flow information module, a deformation feature module and a weight coefficient calculation module, and obtaining a target person detection model based on the participation training of a video sample containing the walking of the target person to finish the detection of the target person.
The invention also relates to a computer readable medium storing software, which is characterized in that the readable medium includes instructions executable by one or more computers, and the instructions, when executed by the one or more computers, perform the operations of the fragmented video information-based task object detection method.
Beneficial effects: compared with the prior art, the invention has the following advantages:
1. The invention introduces an optical flow neural network to calculate the optical flow between any two frames, focusing not only on the features of a single frame but also on the relationship between preceding and following frames.
2. The invention provides a new video target detection algorithm that mainly studies the influence of blur on video target detection: a frame in which the object appears clear contributes more to the result than a frame in which the object appears blurred.
3. The person target detection method based on fragmented video information is beneficial to detecting persons when the video is discontinuous, and improves the precision of video target detection.
Drawings
Fig. 1 is a flowchart of a task object detection method based on fragmented video information according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a network framework for task object detection based on fragmented video information according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Referring to fig. 1 and 2, in the task target detection method based on fragmented video information according to the embodiment of the present invention, steps S1 to S7 are executed according to a preset period to obtain a target person detection model, and then the target person detection model is applied to complete detection of a preset target person;
s1, acquiring a video containing walking of a target person in real time, converting the video containing walking of the target person into a video frame sequence arranged according to a time sequence, extracting continuous video frames with preset frame numbers at preset positions in the video frame sequence as an effective frame sequence, taking the video frame sequence as input, and taking the effective frame sequence as output, and constructing an effective frame sequence extraction module;
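As an illustration only (not part of the original publication), the following minimal Python sketch shows one way the effective frame sequence of step S1 could be obtained with OpenCV; the start index and window length are hypothetical parameters, since the text only specifies "continuous video frames with a preset frame number at a preset position".

```python
# Hypothetical sketch of the effective-frame-sequence extraction in step S1.
# The start index and window length are illustrative assumptions.
import cv2

def extract_effective_frames(video_path, start=0, length=3):
    """Decode a video into a time-ordered frame list and return `length`
    consecutive frames beginning at `start` (the effective frame sequence)."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)            # frames are read in temporal order
    cap.release()
    return frames[start:start + length]
```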
s2, taking an effective frame sequence output by an effective frame sequence extraction module as input, and constructing a depth convolution feature map extraction module based on a depth convolution neural network and taking each depth convolution feature map corresponding to each video frame in the effective frame sequence as output;
in one embodiment, the deep convolutional neural network is Restnet-101.
S3, taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating optical flow parameters of each group of video frame pairs as motion information of a target person aiming at each group of video frame pairs formed by two video frames with preset frame numbers spaced in the effective frame sequence based on an optical flow neural network, and taking the optical flow parameters of each group of video frame pairs in the effective frame sequence as output to construct an optical flow information module;
since the resolution of the optical flow parameters output by the optical flow neural network does not match the resolution of the deep convolution feature map, the optical flow parameters need to be sized to match the feature map.
In step S3, the specific steps of taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating optical flow parameters of each set of video frame pairs as motion information of a target person for each set of video frame pairs composed of two video frames spaced by a preset number of frames in the effective frame sequence based on an optical flow neural network, and taking the optical flow parameters of each set of video frame pairs in the effective frame sequence as output, and constructing an optical flow information module are as follows:
S31: define the t-th video frame I_t in the effective frame sequence as the reference frame, and the (t-τ)-th video frame I_{t-τ} and the (t+τ)-th video frame I_{t+τ} in the effective frame sequence as support frames; input the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} into the optical flow neural network;
S32: the optical flow neural network comprises convolution layers and enlargement layers; the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} pass through the contraction part formed by the convolution layers of the optical flow neural network to obtain the feature maps corresponding to the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} respectively;
S33: since each feature map is reduced in size in step S32, it is necessary to enlarge each feature map back to the original size through an enlargement layer. The feature maps corresponding to the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} pass through the enlargement layer of the optical flow neural network to obtain feature maps, enlarged to the size of the original image, corresponding to the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} respectively;
S34: the optical flow parameters calculate the motion information of the target person by finding the correspondence between two video frames, using the temporal change of the pixels in each video frame of the effective frame sequence and the correlation between the two frames.
Optical flow prediction is performed based on the feature maps obtained in step S33 for the reference frame I_t and the support frames I_{t-τ} and I_{t+τ}; taking the reference frame I_t and the support frame I_{t-τ} as one video frame pair, and the reference frame I_t and the support frame I_{t+τ} as another video frame pair, the optical flow parameters M_{t-τ→t} between the feature maps corresponding to I_{t-τ} and I_t and the optical flow parameters M_{t+τ→t} between the feature maps corresponding to I_{t+τ} and I_t are obtained by the following formulas:
M_{t-τ→t} = FlowNet(I_{t-τ}, I_t)
M_{t+τ→t} = FlowNet(I_{t+τ}, I_t)
where M_{t-τ→t} is the optical flow parameter between the feature maps corresponding to the reference frame I_t and the support frame I_{t-τ}, t-τ→t denotes the correspondence from I_{t-τ} to I_t, M_{t+τ→t} is the optical flow parameter between the feature maps corresponding to the reference frame I_t and the support frame I_{t+τ}, t+τ→t denotes the correspondence from I_{t+τ} to I_t, and FlowNet denotes the optical flow neural network computation.
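For illustration only, the sketch below computes the two optical flow parameters M_{t-τ→t} and M_{t+τ→t} of step S3. The publication uses a FlowNet-style network; torchvision's RAFT model is substituted here purely as a stand-in, and the frame tensors are random placeholders.

```python
# Illustrative stand-in for step S3: RAFT (torchvision) replaces FlowNet here,
# and the frame tensors are random placeholders.
import torch
from torchvision.models.optical_flow import raft_small

flow_net = raft_small(weights=None).eval()

I_t    = torch.randn(1, 3, 256, 256)   # reference frame I_t
I_prev = torch.randn(1, 3, 256, 256)   # support frame I_{t-tau}
I_next = torch.randn(1, 3, 256, 256)   # support frame I_{t+tau}

with torch.no_grad():
    # RAFT returns a list of iteratively refined flow fields; the last one is used.
    M_prev_to_t = flow_net(I_prev, I_t)[-1]   # M_{t-tau->t}, shape (1, 2, 256, 256)
    M_next_to_t = flow_net(I_next, I_t)[-1]   # M_{t+tau->t}
```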
S4, referring to FIG. 2, in which WARP denotes the bilinear warping function and Aggregation denotes the deformation features: the depth convolution feature maps output by the depth convolution feature map extraction module and the optical flow parameters of each group of video frame pairs in the effective frame sequence output by the optical flow information module are taken as input, and the deformation features of each group of video frame pairs are taken as output, based on the bilinear warping function, to construct the deformation feature module;
step S4 is implemented by taking as input each depth convolution feature map output by the depth convolution feature map extraction module and optical flow parameters of each set of video frame pairs in the effective frame sequence output by the optical flow information module, and based on a bilinear warping function, taking as output the deformation features of each set of video frame pairs, and constructing a deformation feature module according to the following formula:
f_{t-τ→t} = W(f_{t-τ}, M_{t-τ→t})
f_{t+τ→t} = W(f_{t+τ}, M_{t+τ→t})
where f_{t-τ→t} is the deformation feature of the support frame I_{t-τ} with respect to the reference frame I_t, f_{t+τ→t} is the deformation feature of the support frame I_{t+τ} with respect to the reference frame I_t, W denotes the bilinear warping function calculation, f_{t-τ} is the feature map corresponding to the support frame I_{t-τ} output by the depth convolution feature map extraction module, and f_{t+τ} is the feature map corresponding to the support frame I_{t+τ} output by the depth convolution feature map extraction module.
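As a hedged illustration of step S4 (not the publication's own code), the bilinear warping W can be expressed with torch.nn.functional.grid_sample; the flow is first resized to the feature-map resolution, as noted above, and its pixel units rescaled accordingly.

```python
# Illustrative sketch of the bilinear warping function W used in step S4.
import torch
import torch.nn.functional as F

def warp(feature: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp `feature` (N, C, h, w) with `flow` (N, 2, H, W) given in pixels."""
    n, _, h, w = feature.shape
    fh, fw = flow.shape[-2:]
    # Match the flow to the feature-map resolution and rescale its pixel units.
    flow = F.interpolate(flow, size=(h, w), mode="bilinear", align_corners=False)
    flow = torch.stack((flow[:, 0] * (w / fw), flow[:, 1] * (h / fh)), dim=1)
    # Base sampling grid plus flow displacement, normalized to [-1, 1].
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    gx = 2.0 * (xs.float() + flow[:, 0]) / max(w - 1, 1) - 1.0
    gy = 2.0 * (ys.float() + flow[:, 1]) / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                  # (N, h, w, 2)
    return F.grid_sample(feature, grid, mode="bilinear", align_corners=True)

# f_warped = warp(f_prev, M_prev_to_t)   # f_{t-tau->t} = W(f_{t-tau}, M_{t-tau->t})
```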
S5, taking an effective frame sequence output by an effective frame sequence extraction module as input, respectively obtaining fuzzy characteristics and saliency characteristics corresponding to each video frame in the effective frame sequence based on a fuzzy mapping network and a saliency detection network, and obtaining a weight coefficient of each video frame in the effective frame sequence based on the fuzzy characteristics and the saliency characteristics and a softmax classification network; taking the weight coefficient of each video frame in the effective frame sequence as output, and constructing a weight coefficient calculation module;
the fuzzy mapping network is DBM, the significance detection network is CSNet, the fuzzy mapping network is used for obtaining the fuzzy degree of the video frames, and the significance detection network is used for eliminating background interference in the images.
Step S5, taking the effective frame sequence output by the effective frame sequence extraction module as input, respectively obtaining fuzzy characteristics and saliency characteristics corresponding to each video frame based on a fuzzy mapping network and a saliency detection network, and obtaining a weight coefficient of each video frame according to the fuzzy characteristics and the saliency characteristics and based on a softmax classification network; the specific steps of constructing the weight coefficient calculation module by taking the weight coefficient of each video frame in the effective frame sequence as output are as follows:
s51, respectively inputting each video frame in the effective frame sequence into a fuzzy mapping network and a significance detection network to obtain fuzzy characteristics and significance characteristics corresponding to each video frame;
S52, the corrected fuzzy mapping M_{blur-sali} corresponding to each video frame is obtained by dot-multiplying the fuzzy characteristics and the saliency characteristics corresponding to each video frame obtained in step S51;
S53, the corrected fuzzy mapping M_{blur-sali} corresponding to each video frame is binarized with a step function whose threshold value is 0.5, the step function being:
u(m) = 1 if m ≥ 0.5, and u(m) = 0 if m < 0.5
where m is a value of the corrected fuzzy mapping M_{blur-sali} corresponding to each video frame, and u(m) is the corresponding value of the binarized corrected fuzzy mapping M_{blur-sali};
S54, for each video frame, all values u(m) obtained in step S53 are summed to obtain the fuzziness parameter Vcb of that video frame, and the fuzziness parameter Vcb of each video frame is normalized; the normalization method is:
[normalization formula shown as an image in the original publication]
where Vcb_i denotes the fuzziness parameter of video frame i, VcbNorm_i denotes the normalized fuzziness parameter of video frame i, and i takes values in {t-τ, t, t+τ};
S55, the normalized fuzziness parameter VcbNorm_i of each video frame obtained in step S54 is input into the softmax classification network to obtain the weight coefficients ω_{t-τ}, ω_t and ω_{t+τ} corresponding respectively to the support frame I_{t-τ}, the reference frame I_t and the support frame I_{t+τ}.
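The weight computation of steps S51 to S55 can be condensed into the following illustrative sketch; the DBM and CSNet outputs are replaced by random tensors, and the normalization is assumed to divide by the sum over the three frames because the exact formula appears only as an image in the original publication.

```python
# Illustrative sketch of steps S51-S55; blur/saliency maps are placeholders and
# the normalization step is an assumption (the original formula is an image).
import torch

def frame_weights(blur_maps: torch.Tensor, sali_maps: torch.Tensor) -> torch.Tensor:
    """blur_maps, sali_maps: (3, H, W) maps for frames {t-tau, t, t+tau}."""
    m_blur_sali = blur_maps * sali_maps                # S52: calibrated blur map
    u = (m_blur_sali >= 0.5).float()                   # S53: step function, threshold 0.5
    vcb = u.sum(dim=(1, 2))                            # S54: blurriness parameter Vcb per frame
    vcb_norm = vcb / vcb.sum().clamp(min=1e-8)         # S54: assumed normalization
    return torch.softmax(vcb_norm, dim=0)              # S55: (w_{t-tau}, w_t, w_{t+tau})

weights = frame_weights(torch.rand(3, 64, 64), torch.rand(3, 64, 64))
```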
S6, obtaining aggregation characteristics corresponding to video frame groups formed by video frame pairs by taking deformation characteristics of each group of video frame pairs output by a deformation characteristic module, depth convolution characteristic graphs corresponding to each video frame output by a depth convolution characteristic graph extraction module and weight coefficients of each video frame output by a weight coefficient calculation module as input, and constructing a detection network module by taking preset information of a target person as output through a detection neural network;
in one embodiment, the neural network is Faster R-CNN.
Step S6 is implemented by taking the deformation features of each group of video frame pairs output by the deformation feature module, the depth convolution feature maps corresponding to each video frame output by the depth convolution feature map extraction module, and the weight coefficients of each video frame output by the weight coefficient calculation module as inputs to obtain the aggregation features corresponding to the video frame groups formed by the video frame pairs, and constructing the detection network module by using the preset information of the target person as an output through the detection neural network, which specifically includes the following steps:
s61: deformation characteristic f based on shape characteristic module output t-τ→t 、f t+τ→t Reference frame I output by deep convolution characteristic image extraction module t Corresponding characteristic diagram f t And the weight coefficient omega corresponding to each deformation characteristic output by the weight coefficient calculation module t-τ 、ω t 、ω t+τ Obtaining a support frame I according to the following formula t-τ Reference frame I t Support frame I t+τ Aggregation characteristics J of the composed video frame group;
J=f t-τ→t ω t-τ +f t ω t +f t+τ→t ω t+τ
s62: and inputting the aggregation characteristics into a detection neural network to obtain preset information of the target person.
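Finally, the aggregation of step S61 is a weighted sum, sketched below with placeholder tensors; the detection network of step S62 (e.g. Faster R-CNN) would consume the aggregated feature J and is omitted here.

```python
# Illustrative sketch of step S61: J = f_{t-tau->t}*w_{t-tau} + f_t*w_t + f_{t+tau->t}*w_{t+tau}.
import torch

def aggregate(f_prev_warped, f_t, f_next_warped, w_prev, w_t, w_next):
    return f_prev_warped * w_prev + f_t * w_t + f_next_warped * w_next

f = torch.randn(1, 2048, 7, 7)                 # placeholder feature maps
J = aggregate(f, f, f, 0.3, 0.4, 0.3)          # placeholder weights
```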
And S7, acquiring a video frame sequence corresponding to a video containing the walking of the target person in real time as input, outputting preset information of the target person, constructing a model to be trained for the detection of the target person based on an effective frame sequence extraction module, a depth convolution feature map extraction module, an optical flow information module, a deformation feature module and a weight coefficient calculation module, and obtaining a target person detection model based on the participation training of a video sample containing the walking of the target person to finish the detection of the target person.
The embodiment of the invention provides a task target detection system based on fragmented video information, which comprises:
one or more processors;
a memory storing instructions operable, when executed by the one or more processors, to cause the one or more processors to perform operations to obtain a target person detection model and then apply the target person detection model to accomplish detection of a preset target person:
s1, acquiring a video containing walking of a target person in real time, converting the video containing walking of the target person into a video frame sequence arranged according to a time sequence, extracting continuous video frames with preset frame numbers at preset positions in the video frame sequence as an effective frame sequence, taking the video frame sequence as input, and taking the effective frame sequence as output, and constructing an effective frame sequence extraction module;
s2, taking an effective frame sequence output by an effective frame sequence extraction module as input, and constructing a depth convolution feature map extraction module based on a depth convolution neural network and taking each depth convolution feature map corresponding to each video frame in the effective frame sequence as output;
s3, taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating optical flow parameters of each group of video frame pairs as motion information of a target person aiming at each group of video frame pairs formed by two video frames with preset frame numbers spaced in the effective frame sequence based on an optical flow neural network, and taking the optical flow parameters of each group of video frame pairs in the effective frame sequence as output to construct an optical flow information module;
s4, taking the depth convolution feature maps output by the depth convolution feature map extraction module and the optical flow parameters of each group of video frame pairs in the effective frame sequence output by the optical flow information module as input, and taking the deformation features of each group of video frame pairs as output to construct a deformation feature module based on a bilinear distortion function;
s5, taking the effective frame sequence output by the effective frame sequence extraction module as input, respectively obtaining fuzzy characteristics and saliency characteristics corresponding to each video frame in the effective frame sequence based on a fuzzy mapping network and a saliency detection network, and obtaining a weight coefficient of each video frame in the effective frame sequence based on the fuzzy characteristics and the saliency characteristics and a softmax classification network; taking the weight coefficient of each video frame in the effective frame sequence as output, and constructing a weight coefficient calculation module;
s6, obtaining aggregation characteristics corresponding to video frame groups formed by video frame pairs by taking deformation characteristics of each group of video frame pairs output by a deformation characteristic module, depth convolution characteristic graphs corresponding to each video frame output by a depth convolution characteristic graph extraction module and weight coefficients of each video frame output by a weight coefficient calculation module as input, and constructing a detection network module by taking preset information of a target person as output through a detection neural network;
and S7, acquiring a video frame sequence corresponding to a video containing the walking of the target person in real time as input, outputting preset information of the target person, constructing a model to be trained for the detection of the target person based on an effective frame sequence extraction module, a depth convolution feature map extraction module, an optical flow information module, a deformation feature module and a weight coefficient calculation module, and obtaining a target person detection model based on the participation training of a video sample containing the walking of the target person to finish the detection of the target person.
The computer readable medium for storing software provided by the embodiment of the invention comprises instructions which can be executed by one or more computers, and when the instructions are executed by the one or more computers, the instructions execute the operation of the task object detection method based on the fragmented video information.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (7)

1. A task object detection method based on fragmented video information is characterized in that steps S1-S7 are executed according to a preset period to obtain a target character detection model, and then the target character detection model is applied to complete detection of a target character;
s1, acquiring a video containing walking of a target person in real time, converting the video containing walking of the target person into a video frame sequence arranged according to a time sequence, extracting continuous video frames with preset frame numbers at preset positions in the video frame sequence as an effective frame sequence, taking the video frame sequence as input, and taking the effective frame sequence as output, and constructing an effective frame sequence extraction module;
s2, taking an effective frame sequence output by an effective frame sequence extraction module as input, and constructing a depth convolution feature map extraction module based on a depth convolution neural network and taking each depth convolution feature map corresponding to each video frame in the effective frame sequence as output;
s3, taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating optical flow parameters of each group of video frame pairs as motion information of a target person aiming at each group of video frame pairs formed by two video frames with preset frame numbers spaced in the effective frame sequence based on an optical flow neural network, and taking the optical flow parameters of each group of video frame pairs in the effective frame sequence as output to construct an optical flow information module;
s4, taking the depth convolution feature maps output by the depth convolution feature map extraction module and the optical flow parameters of each group of video frame pairs in the effective frame sequence output by the optical flow information module as input, and taking the deformation features of each group of video frame pairs as output to construct a deformation feature module based on a bilinear distortion function;
s5, taking an effective frame sequence output by an effective frame sequence extraction module as input, respectively obtaining fuzzy characteristics and saliency characteristics corresponding to each video frame in the effective frame sequence based on a fuzzy mapping network and a saliency detection network, and obtaining a weight coefficient of each video frame in the effective frame sequence based on the fuzzy characteristics and the saliency characteristics and a softmax classification network; taking the weight coefficient of each video frame in the effective frame sequence as output, and constructing a weight coefficient calculation module;
s6, obtaining aggregation characteristics corresponding to video frame groups formed by video frame pairs by taking deformation characteristics of each group of video frame pairs output by a deformation characteristic module, depth convolution characteristic graphs corresponding to each video frame output by a depth convolution characteristic graph extraction module and weight coefficients of each video frame output by a weight coefficient calculation module as input, and constructing a detection network module by taking preset information of a target person as output through a detection neural network;
and S7, acquiring a video frame sequence corresponding to a video containing the walking of the target person in real time as input, outputting preset information of the target person, constructing a model to be trained for the detection of the target person based on an effective frame sequence extraction module, a depth convolution feature map extraction module, an optical flow information module, a deformation feature module and a weight coefficient calculation module, and obtaining a target person detection model based on the participation training of a video sample containing the walking of the target person to finish the detection of the target person.
2. The method of claim 1, wherein the step S3 is implemented by inputting the effective frame sequence outputted from the effective frame sequence extraction module, calculating optical flow parameters of each video frame pair as the motion information of the target person for each video frame pair consisting of two video frames spaced apart from each other by a predetermined number of frames in the effective frame sequence based on the optical flow neural network, and constructing the optical flow information module by outputting the optical flow parameters of each video frame pair in the effective frame sequence, as follows:
S31: define the t-th video frame I_t in the effective frame sequence as the reference frame, and the (t-τ)-th video frame I_{t-τ} and the (t+τ)-th video frame I_{t+τ} in the effective frame sequence as support frames; input the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} into the optical flow neural network;
S32: the optical flow neural network comprises convolution layers and enlargement layers; the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} pass through the contraction part formed by the convolution layers of the optical flow neural network to obtain the feature maps corresponding to the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} respectively;
S33: the feature maps corresponding to the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} pass through the enlargement layer of the optical flow neural network to obtain feature maps, enlarged to the size of the original image, corresponding to the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} respectively;
S34: optical flow prediction is performed based on the feature maps obtained in step S33 for the reference frame I_t and the support frames I_{t-τ} and I_{t+τ}; taking the reference frame I_t and the support frame I_{t-τ} as one video frame pair, and the reference frame I_t and the support frame I_{t+τ} as another video frame pair, the optical flow parameters M_{t-τ→t} between the feature maps corresponding to I_{t-τ} and I_t and the optical flow parameters M_{t+τ→t} between the feature maps corresponding to I_{t+τ} and I_t are obtained by the following formulas:
M_{t-τ→t} = FlowNet(I_{t-τ}, I_t)
M_{t+τ→t} = FlowNet(I_{t+τ}, I_t)
where M_{t-τ→t} is the optical flow parameter between the feature maps corresponding to the reference frame I_t and the support frame I_{t-τ}, t-τ→t denotes the correspondence from I_{t-τ} to I_t, M_{t+τ→t} is the optical flow parameter between the feature maps corresponding to the reference frame I_t and the support frame I_{t+τ}, t+τ→t denotes the correspondence from I_{t+τ} to I_t, and FlowNet denotes the optical flow neural network computation.
3. The method as claimed in claim 2, wherein the step S4 takes as input the depth convolution feature maps outputted from the depth convolution feature map extraction module, the optical flow parameters of each set of video frame pairs in the effective frame sequence outputted from the optical flow information module, and the deformation features of each set of video frame pairs as output based on the bilinear warping function, and the specific method for constructing the deformation feature module is as follows:
f_{t-τ→t} = W(f_{t-τ}, M_{t-τ→t})
f_{t+τ→t} = W(f_{t+τ}, M_{t+τ→t})
where f_{t-τ→t} is the deformation feature of the support frame I_{t-τ} with respect to the reference frame I_t, f_{t+τ→t} is the deformation feature of the support frame I_{t+τ} with respect to the reference frame I_t, W denotes the bilinear warping function calculation, f_{t-τ} is the feature map corresponding to the support frame I_{t-τ} output by the depth convolution feature map extraction module, and f_{t+τ} is the feature map corresponding to the support frame I_{t+τ} output by the depth convolution feature map extraction module.
4. The method according to claim 3, wherein step S5 is implemented by taking the valid frame sequence output by the valid frame sequence extraction module as input, respectively obtaining the blur characteristic and the saliency characteristic corresponding to each video frame in the valid frame sequence based on the blur mapping network and the saliency detection network, and obtaining the weight coefficient of each video frame in the valid frame sequence based on the blur characteristic and the saliency characteristic and on the softmax classification network; the specific steps of constructing the weight coefficient calculation module by taking the weight coefficient of each video frame in the effective frame sequence as output are as follows:
s51: respectively inputting each video frame in the effective frame sequence into a fuzzy mapping network and a significance detection network to obtain fuzzy characteristics and significance characteristics corresponding to each video frame;
S52: the corrected blur map M_{blur-sali} corresponding to each video frame is obtained by dot-multiplying the blur features and saliency features corresponding to each video frame obtained in step S51;
S53: the corrected blur map M_{blur-sali} corresponding to each video frame is binarized with a step function whose threshold value is 0.5, the step function being:
u(m) = 1 if m ≥ 0.5, and u(m) = 0 if m < 0.5
where m is a value of the corrected blur map M_{blur-sali} corresponding to each video frame, and u(m) is the corresponding value of the binarized corrected blur map M_{blur-sali};
S54: for each video frame, all values u(m) obtained in step S53 are summed to obtain the blurriness parameter Vcb of that video frame, and the blurriness parameter Vcb of each video frame is normalized; the normalization method is:
[normalization formula shown as an image in the original publication]
where Vcb_i denotes the blurriness parameter of video frame i, VcbNorm_i denotes the normalized blurriness parameter of video frame i, and i takes values in {t-τ, t, t+τ};
S55: the normalized blurriness parameter VcbNorm_i of each video frame obtained in step S54 is input into the softmax classification network to obtain the weight coefficients ω_{t-τ}, ω_t and ω_{t+τ} corresponding respectively to the support frame I_{t-τ}, the reference frame I_t and the support frame I_{t+τ}.
5. The method according to claim 4, wherein the step S6 is implemented by taking the distortion features of each group of video frame pairs output by the distortion feature module, the depth convolution feature maps corresponding to each video frame output by the depth convolution feature map extraction module, and the weight coefficients of each video frame output by the weight coefficient calculation module as input, obtaining the aggregation features corresponding to the video frame group formed by the video frame pairs, and constructing the detection network module by taking the preset information of the target person as output through the detection neural network, and comprises the specific steps of:
S61: based on the deformation features f_{t-τ→t} and f_{t+τ→t} output by the deformation feature module, the feature map f_t corresponding to the reference frame I_t output by the depth convolution feature map extraction module, and the weight coefficients ω_{t-τ}, ω_t and ω_{t+τ} corresponding to each deformation feature output by the weight coefficient calculation module, the aggregation feature J of the video frame group composed of the support frame I_{t-τ}, the reference frame I_t and the support frame I_{t+τ} is obtained according to the following formula:
J = f_{t-τ→t} ω_{t-τ} + f_t ω_t + f_{t+τ→t} ω_{t+τ}
S62: the aggregation feature is input into the detection neural network to obtain the preset information of the target person.
6. A system for task object detection based on fragmented video information, comprising:
one or more processors;
a memory storing instructions operable, when executed by the one or more processors, to cause the one or more processors to perform operations to obtain a target person detection model and then apply the target person detection model to accomplish detection of a preset target person:
S1, acquiring a video containing the target person walking in real time, converting the video into a video frame sequence arranged in time order, extracting a preset number of consecutive video frames at a preset position in the video frame sequence as an effective frame sequence, and constructing an effective frame sequence extraction module that takes the video frame sequence as input and the effective frame sequence as output;
S2, constructing, based on a deep convolutional neural network, a depth convolution feature map extraction module that takes the effective frame sequence output by the effective frame sequence extraction module as input and the depth convolution feature map corresponding to each video frame in the effective frame sequence as output;
S3, taking the effective frame sequence output by the effective frame sequence extraction module as input, calculating, based on an optical flow neural network, the optical flow parameters of each group of video frame pairs (each pair formed by two video frames spaced a preset number of frames apart in the effective frame sequence) as the motion information of the target person, and constructing an optical flow information module that takes the optical flow parameters of each group of video frame pairs in the effective frame sequence as output;
S4, constructing, based on a bilinear warping function, a deformation feature module that takes the depth convolution feature maps output by the depth convolution feature map extraction module and the optical flow parameters of each group of video frame pairs in the effective frame sequence output by the optical flow information module as input and the deformation features of each group of video frame pairs as output;
S5, taking the effective frame sequence output by the effective frame sequence extraction module as input, obtaining the blur features and saliency features corresponding to each video frame in the effective frame sequence based on a blur mapping network and a saliency detection network respectively, obtaining the weight coefficient of each video frame in the effective frame sequence from the blur features and saliency features via a softmax classification network, and constructing a weight coefficient calculation module that takes the weight coefficient of each video frame in the effective frame sequence as output;
S6, taking the deformation features of each group of video frame pairs output by the deformation feature module, the depth convolution feature map corresponding to each video frame output by the depth convolution feature map extraction module and the weight coefficient of each video frame output by the weight coefficient calculation module as input, obtaining the aggregation feature corresponding to the video frame group formed by the video frame pairs, and constructing a detection network module that takes the preset information of the target person as output through the detection neural network;
and S7, taking a video frame sequence corresponding to a video, acquired in real time, containing the target person walking as input and the preset information of the target person as output, constructing a model to be trained for target person detection based on the effective frame sequence extraction module, the depth convolution feature map extraction module, the optical flow information module, the deformation feature module, the weight coefficient calculation module and the detection network module, and training it on video samples containing the target person walking to obtain the target person detection model and complete the detection of the target person (a pipeline sketch of steps S1-S7 follows below).
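An end-to-end sketch of how the S1-S7 modules might compose at inference time, purely illustrative: every sub-network (deep feature extractor, optical-flow network, weight computation, detection head) is passed in as a callable, the reference frame is assumed to sit at the centre of the effective frame sequence, and bilinear_warp and detect_target are hypothetical helpers standing in for the deformation feature module and the overall pipeline rather than the patent's exact implementation.

    import numpy as np

    def bilinear_warp(feat, flow):
        """Warp a C x H x W feature map by a 2 x H x W flow field with bilinear
        sampling (assumed form of the bilinear warping in step S4)."""
        C, H, W = feat.shape
        ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
        x = np.clip(xs + flow[0], 0, W - 1)
        y = np.clip(ys + flow[1], 0, H - 1)
        x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
        x1, y1 = np.clip(x0 + 1, 0, W - 1), np.clip(y0 + 1, 0, H - 1)
        wx, wy = x - x0, y - y0
        return ((1 - wx) * (1 - wy) * feat[:, y0, x0] + wx * (1 - wy) * feat[:, y0, x1]
                + (1 - wx) * wy * feat[:, y1, x0] + wx * wy * feat[:, y1, x1])

    def detect_target(frames, tau, feat_net, flow_net, weight_fn, det_net):
        """frames: effective frame sequence (list of H x W x 3 arrays, S1); tau: support-frame
        offset; feat_net (S2), flow_net (S3), weight_fn (S5) and det_net (S6) are the modules."""
        t = len(frames) // 2                                      # assumed centre reference frame
        f_prev, f_ref, f_next = (feat_net(frames[t - tau]),
                                 feat_net(frames[t]),
                                 feat_net(frames[t + tau]))       # S2: deep convolution features
        f_prev_w = bilinear_warp(f_prev, flow_net(frames[t - tau], frames[t]))  # S3 + S4
        f_next_w = bilinear_warp(f_next, flow_net(frames[t + tau], frames[t]))
        w = weight_fn([frames[t - tau], frames[t], frames[t + tau]])            # S5: weights
        J = f_prev_w * w[0] + f_ref * w[1] + f_next_w * w[2]                    # S6: aggregation
        return det_net(J)                                                       # detection output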
7. A computer-readable medium storing software comprising instructions executable by one or more computers, wherein the instructions, when executed by the one or more computers, perform the operations of the task target detection method based on fragmented video information according to any one of claims 1 to 5.
CN202210375278.1A 2022-04-11 2022-04-11 Task target detection method and system based on fragmented video information Pending CN114882523A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210375278.1A CN114882523A (en) 2022-04-11 2022-04-11 Task target detection method and system based on fragmented video information

Publications (1)

Publication Number Publication Date
CN114882523A true CN114882523A (en) 2022-08-09

Family

ID=82669897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210375278.1A Pending CN114882523A (en) 2022-04-11 2022-04-11 Task target detection method and system based on fragmented video information

Country Status (1)

Country Link
CN (1) CN114882523A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210327031A1 (en) * 2020-04-15 2021-10-21 Tsinghua Shenzhen International Graduate School Video blind denoising method based on deep learning, computer device and computer-readable storage medium
CN111476314A (en) * 2020-04-27 2020-07-31 中国科学院合肥物质科学研究院 Fuzzy video detection method integrating optical flow algorithm and deep learning
CN111814884A (en) * 2020-07-10 2020-10-23 江南大学 Target detection network model upgrading method based on deformable convolution
CN113239825A (en) * 2021-05-19 2021-08-10 四川中烟工业有限责任公司 High-precision tobacco beetle detection method in complex scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Sen et al.: "Video frame prediction model based on spatio-temporal modeling", 物联网技术 (Internet of Things Technology), no. 02, 20 February 2020 (2020-02-20) *

Similar Documents

Publication Publication Date Title
CN111354017B (en) Target tracking method based on twin neural network and parallel attention module
CN111080675B (en) Target tracking method based on space-time constraint correlation filtering
CN111627044B (en) Target tracking attack and defense method based on deep network
WO2019136591A1 (en) Salient object detection method and system for weak supervision-based spatio-temporal cascade neural network
CN110120064B (en) Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning
Xu et al. Multi-stream attention-aware graph convolution network for video salient object detection
CN103761710B (en) The blind deblurring method of efficient image based on edge self-adaption
CN110853074B (en) Video target detection network system for enhancing targets by utilizing optical flow
CN107688829A (en) A kind of identifying system and recognition methods based on SVMs
CN109902667A (en) Human face in-vivo detection method based on light stream guide features block and convolution GRU
CN112949493A (en) Lane line detection method and system combining semantic segmentation and attention mechanism
CN113361542A (en) Local feature extraction method based on deep learning
CN112561879B (en) Ambiguity evaluation model training method, image ambiguity evaluation method and image ambiguity evaluation device
CN115937254B (en) Multi-aerial flying target tracking method and system based on semi-supervised learning
CN111310609A (en) Video target detection method based on time sequence information and local feature similarity
CN115588030B (en) Visual target tracking method and device based on twin network
CN112016454A (en) Face alignment detection method
CN111368831B (en) Positioning system and method for vertical text
CN117561540A (en) System and method for performing computer vision tasks using a sequence of frames
CN111753670A (en) Human face overdividing method based on iterative cooperation of attention restoration and key point detection
CN111145221A (en) Target tracking algorithm based on multi-layer depth feature extraction
CN114882523A (en) Task target detection method and system based on fragmented video information
CN116612355A (en) Training method and device for face fake recognition model, face recognition method and device
CN116188265A (en) Space variable kernel perception blind super-division reconstruction method based on real degradation
US11989927B2 (en) Apparatus and method for detecting keypoint based on deep learning using information change across receptive fields

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination