CN114882523A - Task target detection method and system based on fragmented video information - Google Patents
Task target detection method and system based on fragmented video information
- Publication number
- CN114882523A (application number CN202210375278.1A)
- Authority
- CN
- China
- Prior art keywords
- frame
- video
- output
- video frame
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a task target detection method and system based on fragmented video information. A target person detection model is constructed from a valid frame sequence extraction module, a deep convolutional feature map extraction module, an optical flow information module, a warped feature module and a weight coefficient calculation module, and is used to detect a preset target person. The invention improves the weight assignment of the aggregated frames by means of a blur-prior method: a weight is computed for each frame of the image instead of assigning every frame the same weight, which effectively improves the accuracy and reliability of person detection.
Description
Technical Field
The invention relates to a task target detection method based on fragmented video information, and further relates to a system for implementing the task target detection method based on fragmented video information.
Background Art
Deep learning networks have made significant progress in object detection, and in recent years excellent image-based object detection algorithms have been transferred directly to video object detection. Compared with still-image object detection, video object detection is more challenging. Video detection scenes are usually complex: the person being detected may not cooperate and the acquired video information may be discontinuous, so the extracted image information is incomplete, exhibiting motion blur, defocus and rare poses, which greatly reduces detection accuracy.
Existing feature-aggregation methods compensate for misalignment between frames by aggregating the features of several adjacent frames; a key question is whether those frames should be treated equally. There are currently two ways to address this: one is to treat every frame equally and give all frames the same weight, the other is to use a lightweight network to learn the weights during training. Neither solution gives special consideration to the effect of blur.
In the present invention, we propose a person target detection method based on fragmented video information. A blur prior is used to improve the weight assignment of the aggregated frames. In particular, a blur mapping network is introduced to mark each pixel as blurred or non-blurred. Since the invention is only concerned with the degree of blur of the target rather than the background, a saliency detection network is used to focus on the target; the blur map is calibrated with the saliency map to obtain a calibrated blur map focused on the blurriness of the target, from which the weight of each frame is computed. At the cost of some additional computation, the method outperforms current state-of-the-art video object detection algorithms.
Summary of the Invention
The purpose of the invention is to provide a task target detection method and system based on fragmented video information that improves the weight assignment of aggregated frames through a blur-prior method and computes a weight for each frame of the image instead of giving every frame the same weight, thereby effectively improving the accuracy and reliability of person detection.
To achieve the above, a task target detection method based on fragmented video information executes steps S1 to S7 at a preset period to obtain a target person detection model, and then applies the target person detection model to complete detection of the target person:
S1. Acquire, in real time, a video containing the target person walking, convert it into a time-ordered sequence of video frames, and extract a preset number of consecutive video frames at a preset position in the sequence as the valid frame sequence; taking the video frame sequence as input and the valid frame sequence as output, construct the valid frame sequence extraction module;
S2. Taking the valid frame sequence output by the valid frame sequence extraction module as input and, based on a deep convolutional neural network, the deep convolutional feature map of each video frame in the valid frame sequence as output, construct the deep convolutional feature map extraction module;
S3. Taking the valid frame sequence output by the valid frame sequence extraction module as input and, based on an optical flow neural network, for each video frame pair formed by two video frames separated by a preset number of frames in the valid frame sequence, compute the optical flow parameters of that pair as the motion information of the target person; taking the optical flow parameters of each video frame pair in the valid frame sequence as output, construct the optical flow information module;
S4. Taking the deep convolutional feature maps output by the deep convolutional feature map extraction module and the optical flow parameters of each video frame pair in the valid frame sequence output by the optical flow information module as input and, based on a bilinear warping function, the warped features of each video frame pair as output, construct the warped feature module;
S5. Taking the valid frame sequence output by the valid frame sequence extraction module as input, obtain the blur feature and the saliency feature of each video frame in the valid frame sequence with a blur mapping network and a saliency detection network respectively, and, from the blur and saliency features and a softmax classification network, obtain the weight coefficient of each video frame; taking the weight coefficients of the video frames in the valid frame sequence as output, construct the weight coefficient calculation module;
S6. Taking the warped features of each video frame pair output by the warped feature module, the deep convolutional feature map of each video frame output by the deep convolutional feature map extraction module, and the weight coefficient of each video frame output by the weight coefficient calculation module as input, obtain the aggregated feature of the video frame group formed by the video frame pairs and, through a detection neural network, with the preset information of the target person as output, construct the detection network module;
S7. Taking the video frame sequence corresponding to the video of the target person walking acquired in real time as input and the preset information of the target person as output, construct the target person detection model to be trained from the valid frame sequence extraction module, the deep convolutional feature map extraction module, the optical flow information module, the warped feature module and the weight coefficient calculation module; train it with video samples containing the target person walking to obtain the target person detection model and complete detection of the target person.
As a preferred technical solution of the invention, in step S3 the valid frame sequence output by the valid frame sequence extraction module is taken as input; based on an optical flow neural network, for each video frame pair formed by two video frames separated by a preset number of frames in the valid frame sequence, the optical flow parameters of that pair are computed as the motion information of the target person, and with the optical flow parameters of each video frame pair as output the optical flow information module is constructed through the following specific steps:
S31: Define the t-th video frame I_t of the valid frame sequence as the reference frame and the (t-τ)-th frame I_{t-τ} and the (t+τ)-th frame I_{t+τ} as support frames, and input the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} into the optical flow neural network;
S32: The optical flow neural network comprises convolutional layers and an expansion layer; the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} pass through the contracting part formed by the convolutional layers of the optical flow neural network to obtain the feature maps corresponding to I_t, I_{t-τ} and I_{t+τ} respectively;
S33: The feature maps corresponding to I_t, I_{t-τ} and I_{t+τ} pass through the expansion layer of the optical flow neural network to obtain feature maps enlarged to the size of the original images;
S34: Optical flow is predicted from the feature maps of I_t, I_{t-τ} and I_{t+τ} obtained in step S33. Taking the reference frame I_t and the support frame I_{t-τ} as one video frame pair, and the reference frame I_t and the support frame I_{t+τ} as another, the optical flow parameters M_{t-τ→t} between the feature maps of I_{t-τ} and I_t, and M_{t+τ→t} between the feature maps of I_{t+τ} and I_t, are obtained as:
M_{t-τ→t} = FlowNet(I_{t-τ}, I_t)
M_{t+τ→t} = FlowNet(I_{t+τ}, I_t)
where M_{t-τ→t} is the optical flow parameter between the feature maps of the reference frame I_t and the support frame I_{t-τ}, t-τ→t denotes the correspondence from I_{t-τ} to I_t, M_{t+τ→t} is the optical flow parameter between the feature maps of the reference frame I_t and the support frame I_{t+τ}, t+τ→t denotes the correspondence from I_{t+τ} to I_t, and FlowNet denotes the optical flow neural network computation.
As a preferred technical solution of the invention, in step S4 the deep convolutional feature maps output by the deep convolutional feature map extraction module and the optical flow parameters of each video frame pair in the valid frame sequence output by the optical flow information module are taken as input; based on the bilinear warping function, with the warped features of each video frame pair as output, the warped feature module is constructed as follows:
f_{t-τ→t} = W(f_{t-τ}, M_{t-τ→t})
f_{t+τ→t} = W(f_{t+τ}, M_{t+τ→t})
where f_{t-τ→t} is the warped feature between the reference frame I_t and the support frame I_{t-τ}, f_{t+τ→t} is the warped feature between the reference frame I_t and the support frame I_{t+τ}, W denotes computation with the bilinear warping function, f_{t-τ} is the feature map of the support frame I_{t-τ} output by the deep convolutional feature map extraction module, and f_{t+τ} is the feature map of the support frame I_{t+τ} output by the deep convolutional feature map extraction module.
As a preferred technical solution of the invention, in step S5 the valid frame sequence output by the valid frame sequence extraction module is taken as input; based on a blur mapping network and a saliency detection network, the blur feature and the saliency feature of each video frame in the valid frame sequence are obtained; from the blur and saliency features and a softmax classification network the weight coefficient of each video frame is obtained, and with the weight coefficients of the video frames in the valid frame sequence as output the weight coefficient calculation module is constructed through the following specific steps:
S51: Input each video frame of the valid frame sequence into the blur mapping network and the saliency detection network respectively to obtain the blur feature and the saliency feature of each video frame;
S52: Obtain the corrected blur map M_blur-sali of each video frame by element-wise multiplication of the blur feature and the saliency feature obtained in step S51;
S53: Binarize the corrected blur map M_blur-sali of each video frame with a step function whose threshold is 0.5, the step function being u(m) = 1 if m ≥ 0.5 and u(m) = 0 otherwise, where m is the value of the corrected blur map M_blur-sali of a video frame and u(m) is the binarized value;
S54: For each video frame, sum all the values u(m) obtained in step S53 to obtain the blurriness parameter Vcb of that frame, and normalize the blurriness parameter Vcb of each frame to obtain VcbNorm_i, where Vcb_i denotes the blurriness parameter of video frame i, VcbNorm_i denotes its normalized value, and i takes the values {t-τ, t, t+τ};
S55: Input the normalized blurriness parameters VcbNorm_i obtained in step S54 into the softmax classification network to obtain the weight coefficients ω_{t-τ}, ω_t and ω_{t+τ} corresponding to the support frame I_{t-τ}, the reference frame I_t and the support frame I_{t+τ} respectively.
As a preferred technical solution of the invention, in step S6 the warped features of each video frame pair output by the warped feature module, the deep convolutional feature map of each video frame output by the deep convolutional feature map extraction module, and the weight coefficient of each video frame output by the weight coefficient calculation module are taken as input; the aggregated feature of the video frame group formed by the video frame pairs is obtained and, through the detection neural network, with the preset information of the target person as output, the detection network module is constructed through the following specific steps:
S61: From the warped features f_{t-τ→t} and f_{t+τ→t} output by the warped feature module, the feature map f_t of the reference frame I_t output by the deep convolutional feature map extraction module, and the weight coefficients ω_{t-τ}, ω_t and ω_{t+τ} output by the weight coefficient calculation module, obtain the aggregated feature J of the video frame group formed by the support frame I_{t-τ}, the reference frame I_t and the support frame I_{t+τ} according to
J = f_{t-τ→t} ω_{t-τ} + f_t ω_t + f_{t+τ→t} ω_{t+τ}
S62: Input the aggregated feature into the detection neural network to obtain the preset information of the target person.
The invention further provides a task target detection system based on fragmented video information, comprising:
one or more processors;
a memory storing instructions which, when executed by the one or more processors, cause the one or more processors to obtain a target person detection model through the following steps and then apply the target person detection model to complete detection of a preset target person:
S1. Acquire, in real time, a video containing the target person walking, convert it into a time-ordered sequence of video frames, and extract a preset number of consecutive video frames at a preset position in the sequence as the valid frame sequence; taking the video frame sequence as input and the valid frame sequence as output, construct the valid frame sequence extraction module;
S2. Taking the valid frame sequence output by the valid frame sequence extraction module as input and, based on a deep convolutional neural network, the deep convolutional feature map of each video frame in the valid frame sequence as output, construct the deep convolutional feature map extraction module;
S3. Taking the valid frame sequence output by the valid frame sequence extraction module as input and, based on an optical flow neural network, for each video frame pair formed by two video frames separated by a preset number of frames in the valid frame sequence, compute the optical flow parameters of that pair as the motion information of the target person; taking the optical flow parameters of each video frame pair in the valid frame sequence as output, construct the optical flow information module;
S4. Taking the deep convolutional feature maps output by the deep convolutional feature map extraction module and the optical flow parameters of each video frame pair in the valid frame sequence output by the optical flow information module as input and, based on a bilinear warping function, the warped features of each video frame pair as output, construct the warped feature module;
S5. Taking the valid frame sequence output by the valid frame sequence extraction module as input, obtain the blur feature and the saliency feature of each video frame in the valid frame sequence with a blur mapping network and a saliency detection network respectively, and, from the blur and saliency features and a softmax classification network, obtain the weight coefficient of each video frame; taking the weight coefficients of the video frames in the valid frame sequence as output, construct the weight coefficient calculation module;
S6. Taking the warped features of each video frame pair output by the warped feature module, the deep convolutional feature map of each video frame output by the deep convolutional feature map extraction module, and the weight coefficient of each video frame output by the weight coefficient calculation module as input, obtain the aggregated feature of the video frame group formed by the video frame pairs and, through a detection neural network, with the preset information of the target person as output, construct the detection network module;
S7. Taking the video frame sequence corresponding to the video of the target person walking acquired in real time as input and the preset information of the target person as output, construct the target person detection model to be trained from the valid frame sequence extraction module, the deep convolutional feature map extraction module, the optical flow information module, the warped feature module and the weight coefficient calculation module; train it with video samples containing the target person walking to obtain the target person detection model and complete detection of the target person.
The invention further provides a computer-readable medium storing software, wherein the readable medium comprises instructions executable by one or more computers which, when executed by the one or more computers, perform the operations of the task target detection method based on fragmented video information.
Beneficial effects: compared with the prior art, the advantages of the invention include:
1. The invention introduces an optical flow neural network to compute the optical flow between any two frames, attending to the relationship between preceding and following frames rather than only to the features of a single frame.
2. The invention proposes a new video object detection algorithm that focuses on the influence of blur on video object detection; frames in which the object appears clearly contribute more to the result than frames in which the object appears blurred.
3. Through the person target detection method based on fragmented video information, the invention helps detect persons when the video is discontinuous and improves the accuracy of video object detection.
Brief Description of the Drawings
Fig. 1 is a flowchart of the task target detection method based on fragmented video information provided by an embodiment of the invention;
Fig. 2 is a schematic diagram of the network framework for task target detection based on fragmented video information provided by an embodiment of the invention.
Detailed Description of the Embodiments
The invention is further described below with reference to the drawings. The following embodiments are only intended to illustrate the technical solution of the invention more clearly and do not limit the scope of protection of the invention.
Referring to Fig. 1 and Fig. 2, an embodiment of the invention provides a task target detection method based on fragmented video information, which executes steps S1 to S7 at a preset period to obtain a target person detection model and then applies the target person detection model to complete detection of a preset target person.
S1. Acquire, in real time, a video containing the target person walking, convert it into a time-ordered sequence of video frames, and extract a preset number of consecutive video frames at a preset position in the sequence as the valid frame sequence; taking the video frame sequence as input and the valid frame sequence as output, construct the valid frame sequence extraction module.
S2. Taking the valid frame sequence output by the valid frame sequence extraction module as input and, based on a deep convolutional neural network, the deep convolutional feature map of each video frame in the valid frame sequence as output, construct the deep convolutional feature map extraction module.
In one embodiment, the deep convolutional neural network is ResNet-101.
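For illustration only (this is not part of the claimed method), the feature extraction of step S2 with a ResNet-101 backbone could be sketched in PyTorch roughly as follows; the use of torchvision's pretrained model and the choice of the last convolutional stage as the output feature map are assumptions of this sketch, since the patent only names the backbone.

```python
import torch
import torchvision

# Minimal sketch of the deep convolutional feature map extraction module (S2).
# Assumption: features are taken from the last convolutional stage of a
# torchvision ResNet-101; the patent does not specify the exact layer.
class FeatureExtractor(torch.nn.Module):
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet101(weights="IMAGENET1K_V1")
        # Drop global pooling and the fully connected head, keep the conv stages.
        self.body = torch.nn.Sequential(*list(backbone.children())[:-2])

    def forward(self, frames):            # frames: (N, 3, H, W), normalized RGB
        return self.body(frames)          # feature maps: (N, 2048, H/32, W/32)
```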
S3. Taking the valid frame sequence output by the valid frame sequence extraction module as input and, based on an optical flow neural network, for each video frame pair formed by two video frames separated by a preset number of frames in the valid frame sequence, compute the optical flow parameters of that pair as the motion information of the target person; taking the optical flow parameters of each video frame pair in the valid frame sequence as output, construct the optical flow information module.
Because the resolution of the optical flow parameters output by the optical flow neural network does not match the resolution of the deep convolutional feature maps, the optical flow parameters must be resized to match the feature maps.
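A minimal sketch of this resizing step is given below; the patent only requires that the sizes be matched, so rescaling the displacement values together with the spatial size is an assumption of this illustration.

```python
import torch
import torch.nn.functional as F

def resize_flow(flow, target_hw):
    """Resize a flow field (N, 2, H, W) to the feature-map resolution.

    Both the spatial size and the displacement values are rescaled, since a
    displacement of k pixels at full resolution corresponds to k * (w'/W)
    pixels horizontally (and k * (h'/H) vertically) after resizing.
    """
    n, _, h, w = flow.shape
    th, tw = target_hw
    resized = F.interpolate(flow, size=(th, tw), mode="bilinear", align_corners=False)
    scale = torch.tensor([tw / w, th / h], dtype=flow.dtype, device=flow.device)
    return resized * scale.view(1, 2, 1, 1)
```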
In step S3, the valid frame sequence output by the valid frame sequence extraction module is taken as input; based on the optical flow neural network, for each video frame pair formed by two video frames separated by a preset number of frames in the valid frame sequence, the optical flow parameters of that pair are computed as the motion information of the target person, and with the optical flow parameters of each video frame pair as output the optical flow information module is constructed through the following specific steps:
S31: Define the t-th video frame I_t of the valid frame sequence as the reference frame and the (t-τ)-th frame I_{t-τ} and the (t+τ)-th frame I_{t+τ} as support frames, and input the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} into the optical flow neural network;
S32: The optical flow neural network comprises convolutional layers and an expansion layer; the reference frame I_t and the support frames I_{t-τ} and I_{t+τ} pass through the contracting part formed by the convolutional layers of the optical flow neural network to obtain the feature maps corresponding to I_t, I_{t-τ} and I_{t+τ} respectively;
S33: Because step S32 reduces the size of the feature maps, an expansion layer is needed to enlarge them back to the size of the original images. The feature maps corresponding to I_t, I_{t-τ} and I_{t+τ} pass through the expansion layer of the optical flow neural network to obtain feature maps enlarged to the original image size;
S34: The optical flow parameters use the temporal change of the pixels of the video frames in the valid frame sequence and the correlation between two video frames to find the correspondence that exists between them, and thereby compute the motion information of the target person.
Optical flow is predicted from the feature maps of I_t, I_{t-τ} and I_{t+τ} obtained in step S33. Taking the reference frame I_t and the support frame I_{t-τ} as one video frame pair, and the reference frame I_t and the support frame I_{t+τ} as another, the optical flow parameters M_{t-τ→t} between the feature maps of I_{t-τ} and I_t, and M_{t+τ→t} between the feature maps of I_{t+τ} and I_t, are obtained as:
M_{t-τ→t} = FlowNet(I_{t-τ}, I_t)
M_{t+τ→t} = FlowNet(I_{t+τ}, I_t)
where M_{t-τ→t} is the optical flow parameter between the feature maps of the reference frame I_t and the support frame I_{t-τ}, t-τ→t denotes the correspondence from I_{t-τ} to I_t, M_{t+τ→t} is the optical flow parameter between the feature maps of the reference frame I_t and the support frame I_{t+τ}, t+τ→t denotes the correspondence from I_{t+τ} to I_t, and FlowNet denotes the optical flow neural network computation.
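As a non-authoritative illustration of S31-S34, the pairwise flow computation can be wrapped as below; flow_net stands for any FlowNet-style network, and its interface (a concatenated image pair in, a two-channel flow field out) is an assumption rather than something fixed by the patent.

```python
import torch

def compute_flows(flow_net, frame_ref, frame_prev, frame_next):
    """Optical flow step of S3 (a sketch).

    flow_net is assumed to map a concatenated image pair (N, 6, H, W) to a
    flow field (N, 2, H, W); frame_ref is I_t, frame_prev is I_{t-tau},
    frame_next is I_{t+tau}.
    """
    # Flow from each support frame towards the reference frame I_t.
    flow_prev_to_ref = flow_net(torch.cat([frame_prev, frame_ref], dim=1))  # M_{t-tau -> t}
    flow_next_to_ref = flow_net(torch.cat([frame_next, frame_ref], dim=1))  # M_{t+tau -> t}
    return flow_prev_to_ref, flow_next_to_ref
```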
S4. Referring to Fig. 2, in which WARP denotes the bilinear warping function and aggregation denotes the warped features: taking the deep convolutional feature maps output by the deep convolutional feature map extraction module and the optical flow parameters of each video frame pair in the valid frame sequence output by the optical flow information module as input and, based on the bilinear warping function, the warped features of each video frame pair as output, construct the warped feature module.
In step S4, the deep convolutional feature maps output by the deep convolutional feature map extraction module and the optical flow parameters of each video frame pair in the valid frame sequence output by the optical flow information module are taken as input; based on the bilinear warping function, with the warped features of each video frame pair as output, the warped feature module is constructed as follows:
f_{t-τ→t} = W(f_{t-τ}, M_{t-τ→t})
f_{t+τ→t} = W(f_{t+τ}, M_{t+τ→t})
where f_{t-τ→t} is the warped feature between the reference frame I_t and the support frame I_{t-τ}, f_{t+τ→t} is the warped feature between the reference frame I_t and the support frame I_{t+τ}, W denotes computation with the bilinear warping function, f_{t-τ} is the feature map of the support frame I_{t-τ} output by the deep convolutional feature map extraction module, and f_{t+τ} is the feature map of the support frame I_{t+τ} output by the deep convolutional feature map extraction module.
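The bilinear warping function W can be illustrated with torch.nn.functional.grid_sample; the construction of the sampling grid and its normalization to [-1, 1] are implementation assumptions of this sketch, since the patent only states that a bilinear warping function is used.

```python
import torch
import torch.nn.functional as F

def warp_features(feat, flow):
    """Warp support-frame features towards the reference frame (W(f, M) in S4).

    feat: (N, C, H, W) feature map of the support frame.
    flow: (N, 2, H, W) flow for that support frame, already resized to the
          feature-map resolution.
    """
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=feat.device),
                            torch.arange(w, device=feat.device), indexing="ij")
    grid_x = xs.unsqueeze(0) + flow[:, 0]          # displaced x coordinates
    grid_y = ys.unsqueeze(0) + flow[:, 1]          # displaced y coordinates
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    grid_x = 2.0 * grid_x / (w - 1) - 1.0
    grid_y = 2.0 * grid_y / (h - 1) - 1.0
    grid = torch.stack([grid_x, grid_y], dim=-1)   # (N, H, W, 2)
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)
```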
S5. Taking the valid frame sequence output by the valid frame sequence extraction module as input, obtain the blur feature and the saliency feature of each video frame in the valid frame sequence with a blur mapping network and a saliency detection network respectively, and, from the blur and saliency features and a softmax classification network, obtain the weight coefficient of each video frame; taking the weight coefficients of the video frames in the valid frame sequence as output, construct the weight coefficient calculation module.
Here the blur mapping network is DBM and the saliency detection network is CSNet; the blur mapping network is used to obtain the degree of blur of a video frame, and the saliency detection network is used to suppress background interference in the image.
In step S5, the valid frame sequence output by the valid frame sequence extraction module is taken as input; based on the blur mapping network and the saliency detection network, the blur feature and the saliency feature of each video frame are obtained; from the blur and saliency features and the softmax classification network the weight coefficient of each video frame is obtained, and with the weight coefficients of the video frames in the valid frame sequence as output the weight coefficient calculation module is constructed through the following specific steps:
S51: Input each video frame of the valid frame sequence into the blur mapping network and the saliency detection network respectively to obtain the blur feature and the saliency feature of each video frame;
S52: Obtain the corrected blur map M_blur-sali of each video frame by element-wise multiplication of the blur feature and the saliency feature obtained in step S51;
S53: Binarize the corrected blur map M_blur-sali of each video frame with a step function whose threshold is 0.5, the step function being u(m) = 1 if m ≥ 0.5 and u(m) = 0 otherwise, where m is the value of the corrected blur map M_blur-sali of a video frame and u(m) is the binarized value;
S54: For each video frame, sum all the values u(m) obtained in step S53 to obtain the blurriness parameter Vcb of that frame, and normalize the blurriness parameter Vcb of each frame to obtain VcbNorm_i, where Vcb_i denotes the blurriness parameter of video frame i, VcbNorm_i denotes its normalized value, and i takes the values {t-τ, t, t+τ};
S55: Input the normalized blurriness parameters VcbNorm_i obtained in step S54 into the softmax classification network to obtain the weight coefficients ω_{t-τ}, ω_t and ω_{t+τ} corresponding to the support frame I_{t-τ}, the reference frame I_t and the support frame I_{t+τ} respectively.
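Steps S51-S55 can be sketched as follows, assuming that the blur mapping network (DBM) and the saliency detection network (CSNet) both output per-pixel maps in [0, 1] and that the normalization in S54 divides by the sum over the three frames; both the network interfaces and the exact normalization scheme are assumptions of this illustration.

```python
import torch

def frame_weights(blur_maps, sali_maps):
    """Weight coefficient steps S51-S55 (a sketch).

    blur_maps, sali_maps: (N, H, W) per-pixel outputs in [0, 1] for the N
    frames {I_{t-tau}, I_t, I_{t+tau}} from the blur mapping and saliency
    networks (assumed interfaces).
    Returns one softmax weight per frame, in input order.
    """
    corrected = blur_maps * sali_maps            # M_blur-sali (S52)
    binary = (corrected >= 0.5).float()          # step function, threshold 0.5 (S53)
    vcb = binary.sum(dim=(1, 2))                 # per-frame blurriness Vcb (S54)
    vcb_norm = vcb / (vcb.sum() + 1e-8)          # assumed normalization scheme
    return torch.softmax(vcb_norm, dim=0)        # omega_{t-tau}, omega_t, omega_{t+tau} (S55)
```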
S6. Taking the warped features of each video frame pair output by the warped feature module, the deep convolutional feature map of each video frame output by the deep convolutional feature map extraction module, and the weight coefficient of each video frame output by the weight coefficient calculation module as input, obtain the aggregated feature of the video frame group formed by the video frame pairs and, through a detection neural network, with the preset information of the target person as output, construct the detection network module.
In one embodiment, the detection neural network is Faster R-CNN.
In step S6, the warped features of each video frame pair output by the warped feature module, the deep convolutional feature map of each video frame output by the deep convolutional feature map extraction module, and the weight coefficient of each video frame output by the weight coefficient calculation module are taken as input; the aggregated feature of the video frame group formed by the video frame pairs is obtained and, through the detection neural network, with the preset information of the target person as output, the detection network module is constructed through the following specific steps:
S61: From the warped features f_{t-τ→t} and f_{t+τ→t} output by the warped feature module, the feature map f_t of the reference frame I_t output by the deep convolutional feature map extraction module, and the weight coefficients ω_{t-τ}, ω_t and ω_{t+τ} output by the weight coefficient calculation module, obtain the aggregated feature J of the video frame group formed by the support frame I_{t-τ}, the reference frame I_t and the support frame I_{t+τ} according to
J = f_{t-τ→t} ω_{t-τ} + f_t ω_t + f_{t+τ→t} ω_{t+τ}
S62: Input the aggregated feature into the detection neural network to obtain the preset information of the target person.
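Steps S61-S62 reduce to a weighted sum of feature maps followed by the detection head; in the sketch below, detector is an assumed callable standing in for the Faster R-CNN detection head operating on the aggregated feature map, not a specific library API.

```python
def detect_target(feat_prev_w, feat_ref, feat_next_w, weights, detector):
    """Aggregation and detection steps S61-S62 (a sketch).

    feat_prev_w, feat_next_w: support-frame features already warped to the
    reference frame (f_{t-tau->t}, f_{t+tau->t}); feat_ref: f_t.
    weights: tensor (omega_{t-tau}, omega_t, omega_{t+tau}) from the weight
    coefficient module.
    detector: assumed callable wrapping the detection head; returns person
    detections for the aggregated feature J.
    """
    w_prev, w_ref, w_next = weights[0], weights[1], weights[2]
    aggregated = feat_prev_w * w_prev + feat_ref * w_ref + feat_next_w * w_next  # J
    return detector(aggregated)
```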
S7. Taking the video frame sequence corresponding to the video of the target person walking acquired in real time as input and the preset information of the target person as output, construct the target person detection model to be trained from the valid frame sequence extraction module, the deep convolutional feature map extraction module, the optical flow information module, the warped feature module and the weight coefficient calculation module; train it with video samples containing the target person walking to obtain the target person detection model and complete detection of the target person.
An embodiment of the invention provides a task target detection system based on fragmented video information, comprising:
one or more processors;
a memory storing instructions which, when executed by the one or more processors, cause the one or more processors to obtain a target person detection model through the following steps and then apply the target person detection model to complete detection of a preset target person:
S1. Acquire, in real time, a video containing the target person walking, convert it into a time-ordered sequence of video frames, and extract a preset number of consecutive video frames at a preset position in the sequence as the valid frame sequence; taking the video frame sequence as input and the valid frame sequence as output, construct the valid frame sequence extraction module;
S2. Taking the valid frame sequence output by the valid frame sequence extraction module as input and, based on a deep convolutional neural network, the deep convolutional feature map of each video frame in the valid frame sequence as output, construct the deep convolutional feature map extraction module;
S3. Taking the valid frame sequence output by the valid frame sequence extraction module as input and, based on an optical flow neural network, for each video frame pair formed by two video frames separated by a preset number of frames in the valid frame sequence, compute the optical flow parameters of that pair as the motion information of the target person; taking the optical flow parameters of each video frame pair in the valid frame sequence as output, construct the optical flow information module;
S4. Taking the deep convolutional feature maps output by the deep convolutional feature map extraction module and the optical flow parameters of each video frame pair in the valid frame sequence output by the optical flow information module as input and, based on a bilinear warping function, the warped features of each video frame pair as output, construct the warped feature module;
S5. Taking the valid frame sequence output by the valid frame sequence extraction module as input, obtain the blur feature and the saliency feature of each video frame in the valid frame sequence with a blur mapping network and a saliency detection network respectively, and, from the blur and saliency features and a softmax classification network, obtain the weight coefficient of each video frame; taking the weight coefficients of the video frames in the valid frame sequence as output, construct the weight coefficient calculation module;
S6. Taking the warped features of each video frame pair output by the warped feature module, the deep convolutional feature map of each video frame output by the deep convolutional feature map extraction module, and the weight coefficient of each video frame output by the weight coefficient calculation module as input, obtain the aggregated feature of the video frame group formed by the video frame pairs and, through a detection neural network, with the preset information of the target person as output, construct the detection network module;
S7. Taking the video frame sequence corresponding to the video of the target person walking acquired in real time as input and the preset information of the target person as output, construct the target person detection model to be trained from the valid frame sequence extraction module, the deep convolutional feature map extraction module, the optical flow information module, the warped feature module and the weight coefficient calculation module; train it with video samples containing the target person walking to obtain the target person detection model and complete detection of the target person.
An embodiment of the invention provides a computer-readable medium storing software, the readable medium comprising instructions executable by one or more computers which, when executed by the one or more computers, perform the operations of the task target detection method based on fragmented video information.
The embodiments of the invention have been described in detail above with reference to the drawings, but the invention is not limited to these embodiments; various changes may be made within the knowledge of a person of ordinary skill in the art without departing from the spirit of the invention.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210375278.1A CN114882523B (en) | 2022-04-11 | 2022-04-11 | A mission target detection method and system based on fragmented video information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210375278.1A CN114882523B (en) | 2022-04-11 | 2022-04-11 | A mission target detection method and system based on fragmented video information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114882523A true CN114882523A (en) | 2022-08-09 |
CN114882523B CN114882523B (en) | 2024-11-05 |
Family
ID=82669897
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210375278.1A Active CN114882523B (en) | 2022-04-11 | 2022-04-11 | A mission target detection method and system based on fragmented video information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114882523B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111476314A (en) * | 2020-04-27 | 2020-07-31 | 中国科学院合肥物质科学研究院 | Fuzzy video detection method integrating optical flow algorithm and deep learning |
CN111814884A (en) * | 2020-07-10 | 2020-10-23 | 江南大学 | An upgrade method of target detection network model based on deformable convolution |
CN113239825A (en) * | 2021-05-19 | 2021-08-10 | 四川中烟工业有限责任公司 | High-precision tobacco beetle detection method in complex scene |
US20210327031A1 (en) * | 2020-04-15 | 2021-10-21 | Tsinghua Shenzhen International Graduate School | Video blind denoising method based on deep learning, computer device and computer-readable storage medium |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210327031A1 (en) * | 2020-04-15 | 2021-10-21 | Tsinghua Shenzhen International Graduate School | Video blind denoising method based on deep learning, computer device and computer-readable storage medium |
CN111476314A (en) * | 2020-04-27 | 2020-07-31 | 中国科学院合肥物质科学研究院 | Fuzzy video detection method integrating optical flow algorithm and deep learning |
CN111814884A (en) * | 2020-07-10 | 2020-10-23 | 江南大学 | An upgrade method of target detection network model based on deformable convolution |
CN113239825A (en) * | 2021-05-19 | 2021-08-10 | 四川中烟工业有限责任公司 | High-precision tobacco beetle detection method in complex scene |
Non-Patent Citations (1)
Title |
---|
- LI SEN et al.: "Video frame prediction model based on spatio-temporal modeling" (基于时空建模的视频帧预测模型), Internet of Things Technologies (物联网技术), No. 02, 20 February 2020 (2020-02-20) *
Also Published As
Publication number | Publication date |
---|---|
CN114882523B (en) | 2024-11-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111627044B (en) | Target tracking attack and defense method based on deep network | |
CN109583340B (en) | A video object detection method based on deep learning | |
CN112434655B (en) | A Gait Recognition Method Based on Adaptive Confidence Graph Convolutional Network | |
CN110807742B (en) | A low-light image enhancement method based on integrated network | |
CN107452015B (en) | A Target Tracking System with Redetection Mechanism | |
CN111047543B (en) | Image enhancement method, device and storage medium | |
CN107688829A (en) | A kind of identifying system and recognition methods based on SVMs | |
CN113361542A (en) | Local feature extraction method based on deep learning | |
CN111429485B (en) | Cross-modal filter tracking method based on adaptive regularization and high confidence update | |
CN106951826B (en) | Face detection method and device | |
CN105374039A (en) | Monocular image depth information estimation method based on contour acuity | |
CN114529946A (en) | Pedestrian re-identification method, device, equipment and storage medium based on self-supervision learning | |
CN102025919A (en) | Method and device for detecting image flicker and camera applying device | |
CN112037109A (en) | Improved image watermarking method and system based on saliency target detection | |
CN114925848A (en) | Target detection method based on transverse federated learning framework | |
CN110135435B (en) | Saliency detection method and device based on breadth learning system | |
CN109978858B (en) | A dual-frame thumbnail image quality assessment method based on foreground detection | |
CN111861949A (en) | A method and system for multi-exposure image fusion based on generative adversarial network | |
CN113920171B (en) | Bimodal target tracking method based on feature level and decision level fusion | |
CN115527050A (en) | Image feature matching method, computer device and readable storage medium | |
CN114882523B (en) | A mission target detection method and system based on fragmented video information | |
CN111145221A (en) | A Target Tracking Algorithm Based on Multi-layer Depth Feature Extraction | |
CN117496205A (en) | A heterogeneous scene matching method based on ITHM-Net | |
CN117151207A (en) | Antagonistic patch generation method based on dynamic optimization integrated model | |
CN110705568A (en) | Optimization method for image feature point extraction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |