CN110287826B - Video target detection method based on attention mechanism - Google Patents


Info

Publication number
CN110287826B
Authority
CN
China
Prior art keywords
feature
detected
frame
candidate
video
Prior art date
Legal status
Active
Application number
CN201910499786.9A
Other languages
Chinese (zh)
Other versions
CN110287826A (en)
Inventor
李建强
白骏
刘雅琦
Current Assignee
Kuaima (Beijing) Electronic Technology Co.,Ltd.
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201910499786.9A priority Critical patent/CN110287826B/en
Publication of CN110287826A publication Critical patent/CN110287826A/en
Application granted granted Critical
Publication of CN110287826B publication Critical patent/CN110287826B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames


Abstract

The invention relates to a video target detection method based on an attention mechanism, in the field of computer vision. The invention includes the following steps: step S1, extracting the candidate feature map of the current time frame; step S2, setting a fusion window over the past time period, computing the Laplacian variance of each frame in the window, normalizing the variances to serve as the weights of the frames in the window, computing the weighted sum of the candidate feature maps of all frames in the window to obtain the time sequence feature, and concatenating the candidate feature of the current time frame with the time sequence feature to obtain the feature map to be detected; step S3, extracting feature maps of additional scales from the feature map to be detected using convolutional layers; step S4, predicting target classes and positions on the feature maps of different scales using convolutional layers. The feature fusion method of the invention assigns different weights to frame features of different quality in the past time period, so that time sequence information is fused more fully and the performance of the detection model is improved.

Description

Video target detection method based on attention mechanism
Technical Field
The invention relates to computer vision, deep learning and video target detection technology.
Background
Deep-learning-based image target detection has made great progress in the last five years, with methods such as the RCNN series, the SSD network and the YOLO series. However, in fields such as video surveillance and vehicle-assisted driving, there is a broader demand for video-based target detection. Because video suffers from motion blur, occlusion, diverse shape changes, diverse illumination changes and similar problems, good detection results cannot be obtained by applying an image target detection technique to video frames alone. Adjacent frames in a video are continuous in time and similar in space, and the positions of targets across frames are correlated; how to exploit this temporal information of targets in video is therefore the key to improving video target detection performance.
Current video target detection frameworks mainly fall into three types. The first treats video frames as independent images and applies an image target detection algorithm; it ignores temporal information and detects each frame independently, so the results are unsatisfactory. The second combines target detection with target tracking, post-processing the detection results in order to track targets; the tracking accuracy depends on the detections, so errors propagate easily. The third detects only a few key frames and then generates the features of the remaining frames from optical flow information and the key-frame features; this method uses temporal information, but optical flow is very expensive to compute, making fast detection difficult.
Disclosure of Invention
The invention aims to provide a rapid and accurate video target detection method which fully integrates time sequence characteristics.
In order to solve the technical problem, the invention provides a video target detection method based on an attention mechanism, which comprises the following steps:
step S1, inputting the video frame image of the current time point into a Mobilenet network to extract a candidate feature map;
step S2, setting a time sequence feature fusion window in the past time period adjacent to the current time point; computing the Laplacian variance of each video frame image to be fused within the feature fusion window; normalizing the Laplacian variances and using them as the fusion weights of the frames to be fused; computing the weighted sum of the candidate feature maps of all frames to be fused according to the fusion weights to obtain the time sequence feature required by the current frame; and concatenating the candidate feature of the video frame at the current time point with the time sequence feature along the channel dimension to obtain the feature map to be detected that fuses the time sequence information;
step S3, extracting feature maps to be detected of additional scales from the feature map to be detected by using convolutional feature extraction layers and max pooling layers;
and step S4, on the feature maps to be detected of different scales, predicting the target class and the bounding box coordinates on the current frame by using convolutional layers.
Further, in step S1, the video frame at the current time point t is detected. First, the video frame image I_t at the current time point, where H_I and W_I are respectively the height and width of the video frame, is input into a Mobilenet network for feature extraction, yielding the candidate feature map F_t ∈ ℝ^(C_1×H_1×W_1), where ℝ denotes the real numbers and C_1, H_1 and W_1 are respectively the number of feature channels, the height and the width of the candidate feature map.
Further, in step S2, a feature fusion window of width w = s is set in the past time period of the current time point t. The video frame images to be fused within the feature fusion window are {I_{t−i}}, i ∈ [1, s], and the corresponding candidate feature maps are {F_{t−i}}, i ∈ [1, s]. Each video frame image I_{t−i} to be fused is converted into a gray-scale map G_{t−i}, and the Laplacian variance of the image is computed on the basis of the gray-scale map. The Laplacian operator at coordinates (x, y) of the gray-scale map G is
∇²G(x, y) = G(x+1, y) + G(x−1, y) + G(x, y+1) + G(x, y−1) − 4·G(x, y).
The Laplacian operator captures regions of the image whose pixel values change rapidly by computing second derivatives of each pixel in each direction, and can be used to detect corners in an image. The Laplacian variance of an image therefore reflects how strongly the pixel values vary over the whole image: a large Laplacian variance indicates a sharp image, while a small one indicates a blurry image.
First, the Laplacian mean of each gray-scale map G_{t−i} is calculated, where H_I and W_I are respectively the height and width of the gray-scale map:
μ_{t−i} = (1 / (H_I·W_I)) · Σ_{x=1}^{H_I} Σ_{y=1}^{W_I} ∇²G_{t−i}(x, y).
Next, the Laplacian variance σ²_{t−i} of each gray-scale map G_{t−i} is calculated:
σ²_{t−i} = (1 / (H_I·W_I)) · Σ_{x=1}^{H_I} Σ_{y=1}^{W_I} ( ∇²G_{t−i}(x, y) − μ_{t−i} )².
If a video frame is sharp, its candidate features contribute to detecting the target, whereas some frames are blurred by moving objects and their candidate features are not helpful for detection. Video frames of different sharpness should therefore be assigned different fusion weights, so that the detection model attends more to sharp features than to blurry ones. First, the fusion weight α_{t−i} of each video frame to be fused is calculated by normalizing the Laplacian variances:
α_{t−i} = σ²_{t−i} / Σ_{j=1}^{s} σ²_{t−j}.
The frame candidate features within the feature fusion window are then fused by weighted summation to obtain the time sequence feature of the current time point, F̂_t = Σ_{i=1}^{s} α_{t−i}·F_{t−i}, and the time sequence feature is concatenated with the candidate feature of the current frame along the channel dimension to complete the fusion of the time sequence information, obtaining the first feature map to be detected for detection.
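As a purely illustrative example with made-up numbers (not taken from the patent): for a window of s = 3 frames whose Laplacian variances are σ²_{t−1} = 300, σ²_{t−2} = 150 and σ²_{t−3} = 50, the normalized fusion weights are α_{t−1} = 300/500 = 0.6, α_{t−2} = 0.3 and α_{t−3} = 0.1, so the sharpest frame dominates the time sequence feature F̂_t = 0.6·F_{t−1} + 0.3·F_{t−2} + 0.1·F_{t−3}.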
Further, in step S3, a feature map to be detected is obtained at the current time point, in which the time series feature is fused
Figure GDA0003199200410000034
Then, in order to obtain more scales of characteristic diagrams to be detected, a 3 x 3 convolutional layer and a 2 x 2 pooling layer are utilized to perform further characteristic extraction on the characteristic diagrams to be detected and reduce the size of the characteristic diagrams to be detected, so that local information in the characteristic diagrams to be detected with large size is rich, the characteristic diagrams to be detected with small size are suitable for predicting small-size targets, the characteristic diagrams to be detected with small size contain stronger global semantic information and are suitable for detecting targets with large size, and e-1 times of characteristic extraction are performed to finally obtain e characteristic diagrams to be detectedCharacteristic diagram to be detected:
Figure GDA0003199200410000035
Further, in step S4, the multi-scale feature maps to be detected obtained through the additional feature extraction are used. Anchor boxes with prior positions are set on the feature maps to be detected at different scales, and two 3×3 convolutional layers are applied to each feature map to be detected to predict, along the channel dimension, the offsets of the target bounding box relative to the anchor boxes and the class of the target, respectively. Let the number of classes be d (including the background). For each feature map to be detected F'_{t,i}, prediction by a 3×3 convolutional class prediction layer and a 3×3 convolutional bounding-box prediction layer yields a classification prediction result P^cls_i and a bounding-box prediction result P^box_i.
Drawings
FIG. 1 is a schematic of the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views, and merely illustrate the basic structure of the present invention in a schematic manner, and thus show only the constitution related to the present invention.
Example 1
As shown in Fig. 1, the present example provides a video target detection method based on an attention mechanism, comprising the following steps:
step S1, inputting the video frame image of the current time point into a Mobilenet network to extract a candidate feature map;
step S2, setting a time sequence feature fusion window in the past time period adjacent to the current time point; computing the Laplacian variance of each video frame image to be fused within the feature fusion window; normalizing the Laplacian variances and using them as the fusion weights of the frames to be fused; computing the weighted sum of the candidate feature maps of all frames to be fused according to the weights to obtain the time sequence feature required by the current frame; and concatenating the candidate feature of the video frame at the current time point with the time sequence feature along the channel dimension to obtain the feature map to be detected that fuses the time sequence information;
step S3, extracting feature maps to be detected of additional scales from the feature map to be detected by using convolutional feature extraction layers and max pooling layers;
and step S4, on the feature maps to be detected of different scales, predicting the target class and the bounding box coordinates on the current frame by using convolutional layers.
In step S1, to detect the video frame at the current time point t, the video frame image I_t at the current time point, where H_I and W_I are respectively the height and width of the frame image, is first input into the Mobilenet network for feature extraction, yielding the candidate feature map F_t ∈ ℝ^(C_1×H_1×W_1), where C_1, H_1 and W_1 are respectively the number of channels, the height and the width of the candidate feature map.
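As an illustration only (the patent names a Mobilenet backbone but does not fix a specific variant, framework or input size), the following sketch shows how a candidate feature map F_t of shape C_1×H_1×W_1 might be extracted from the current frame; the use of torchvision's MobileNetV2 feature extractor and the printed output shape are assumptions of this sketch.

```python
# Hypothetical sketch of step S1: extracting the candidate feature map F_t
# for the current frame with a MobileNet backbone (MobileNetV2 from
# torchvision is an illustrative choice, not mandated by the patent).
import torch
import torchvision

backbone = torchvision.models.mobilenet_v2(weights=None).features.eval()

def extract_candidate_feature(frame_rgb: torch.Tensor) -> torch.Tensor:
    """frame_rgb: float tensor of shape (3, H_I, W_I) with values in [0, 1].
    Returns the candidate feature map F_t of shape (C_1, H_1, W_1)."""
    with torch.no_grad():
        features = backbone(frame_rgb.unsqueeze(0))   # (1, C_1, H_1, W_1)
    return features.squeeze(0)

# Example: a 300x300 frame yields a 1280-channel candidate feature map.
frame = torch.rand(3, 300, 300)
F_t = extract_candidate_feature(frame)
print(F_t.shape)   # expected: torch.Size([1280, 10, 10])
```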
In step S2, a feature fusion window of width w is set in the past time period of the current time point t; let s be the maximum window width and q the length of the past time period. The window width is set as follows: if q ≥ s, the fusion window width is set to s; if q < s, so that not enough past features are available, the fusion window width is set to q; that is, w = min(s, q).
Let the video frame images to be fused within the feature fusion window be {I_{t−i}}, i ∈ [1, s], and the candidate feature maps corresponding to the video frames to be fused within the window be {F_{t−i}}, i ∈ [1, s]. Each video frame image I_{t−i} to be fused is converted into a gray-scale map G_{t−i}, and the Laplacian variance of the image is computed on the basis of the gray-scale map. The Laplacian operator at coordinates (x, y) of the gray-scale map G is
∇²G(x, y) = G(x+1, y) + G(x−1, y) + G(x, y+1) + G(x, y−1) − 4·G(x, y),
where G(x, y) denotes the pixel value of the gray-scale map G at coordinates (x, y). The Laplacian operator captures regions of the image whose pixel values change rapidly by computing second derivatives of each pixel in each direction, and can be used to detect corners in an image. The Laplacian variance of an image therefore reflects how strongly the pixel values vary over the whole image: a large Laplacian variance indicates a sharp image, while a small one indicates a blurry image.
First, the Laplacian mean of each gray-scale map G_{t−i} is calculated, where H_I and W_I are respectively the height and width of the gray-scale map:
μ_{t−i} = (1 / (H_I·W_I)) · Σ_{x=1}^{H_I} Σ_{y=1}^{W_I} ∇²G_{t−i}(x, y).
Next, the Laplacian variance σ²_{t−i} of each gray-scale map G_{t−i} is calculated:
σ²_{t−i} = (1 / (H_I·W_I)) · Σ_{x=1}^{H_I} Σ_{y=1}^{W_I} ( ∇²G_{t−i}(x, y) − μ_{t−i} )².
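A minimal sketch of this sharpness measure is given below: the frame is converted to gray scale, the discrete Laplacian is applied, and the variance of the response over the whole image is returned. Using OpenCV's cv2.Laplacian and a synthetic test frame are implementation assumptions of the sketch, not details prescribed by the patent.

```python
# Hypothetical sketch of the Laplacian-variance sharpness measure that the
# method uses as its attention signal for past frames.
import cv2
import numpy as np

def laplacian_variance(frame_bgr: np.ndarray) -> float:
    """frame_bgr: uint8 image of shape (H_I, W_I, 3).
    Returns sigma^2_{t-i}: larger values indicate a sharper frame,
    smaller values indicate a blurrier frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)   # G_{t-i}
    lap = cv2.Laplacian(gray, cv2.CV_64F)                # discrete Laplacian response
    return float(lap.var())                              # variance around the Laplacian mean

# Example with a synthetic frame: blurring lowers the Laplacian variance.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)
blurred = cv2.GaussianBlur(frame, (9, 9), 0)
print(laplacian_variance(frame), laplacian_variance(blurred))
```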
If a video frame is sharp, its candidate features contribute to detecting the target, whereas some frames are blurred by moving objects and their candidate features are not helpful for detection. Video frames of different sharpness should therefore be assigned different fusion weights, with sharper frames receiving larger weights, so that the detection model attends more to sharp features than to blurry ones. First, the fusion weight α_{t−i} of each video frame to be fused is calculated by normalizing the Laplacian variances:
α_{t−i} = σ²_{t−i} / Σ_{j=1}^{s} σ²_{t−j}.
The frame candidate features within the feature fusion window are then fused by weighted summation to obtain the time sequence feature of the current time point:
F̂_t = Σ_{i=1}^{s} α_{t−i}·F_{t−i}, with F̂_t ∈ ℝ^(C_1×H_1×W_1).
The time sequence feature is concatenated with the candidate feature of the current frame along the channel dimension to complete the fusion of the time sequence information, yielding the first feature map to be detected:
F'_{t,1} = concat(F_t, F̂_t) ∈ ℝ^(2C_1×H_1×W_1).
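The fusion step itself can be sketched as follows: the Laplacian variances of the frames in the window are normalized into weights α_{t−i}, the candidate feature maps are combined by a weighted sum, and the result is concatenated with the current frame's candidate feature along the channel dimension. The function name, the explicit min(s, q) window rule and the handling of the very first frame (when no past frames exist yet) are assumptions of this sketch.

```python
# Hypothetical sketch of step S2: variance-weighted fusion of past candidate
# feature maps and channel-wise concatenation with the current frame.
from typing import List
import torch

def fuse_temporal_features(F_t: torch.Tensor,
                           past_features: List[torch.Tensor],
                           past_variances: List[float],
                           s: int = 4) -> torch.Tensor:
    """F_t: candidate feature map of the current frame, shape (C_1, H_1, W_1).
    past_features / past_variances: candidate maps and Laplacian variances of
    the q most recent past frames (newest first).
    Returns the feature map to be detected F'_{t,1}, shape (2*C_1, H_1, W_1)."""
    w = min(s, len(past_features))            # window width: s if q >= s, otherwise q
    if w == 0:
        # assumption: with no past frames, duplicate the current feature
        return torch.cat([F_t, F_t], dim=0)
    feats = torch.stack(past_features[:w])    # (w, C_1, H_1, W_1)
    var = torch.tensor(past_variances[:w])
    alpha = var / var.sum()                   # normalized Laplacian variances
    F_hat = (alpha.view(-1, 1, 1, 1) * feats).sum(dim=0)   # time sequence feature
    return torch.cat([F_t, F_hat], dim=0)     # concatenate along the channel dimension

# Example with toy tensors: sharper frames (larger variance) dominate the fusion.
F_t = torch.rand(1280, 10, 10)
past = [torch.rand(1280, 10, 10) for _ in range(3)]
fused = fuse_temporal_features(F_t, past, [310.0, 45.0, 120.0])
print(fused.shape)   # torch.Size([2560, 10, 10])
```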
In step S3, after obtaining the feature map to be detected F'_{t,1} that fuses the time sequence features at the current time point, the convolutional layers and pooling layers are used to further extract features from the feature map to be detected while reducing its size, in order to obtain feature maps to be detected at more scales. Feature maps of large size are rich in local information and are suitable for predicting small targets, while feature maps of small size contain stronger global semantic information and are suitable for detecting large targets. After e−1 rounds of feature extraction, e feature maps to be detected are finally obtained: {F'_{t,1}, F'_{t,2}, …, F'_{t,e}}.
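For step S3, the extra-scale extraction can be sketched as repeated blocks of a 3×3 convolution followed by 2×2 max pooling, each block halving the spatial size; the outputs of the e−1 blocks, together with the input, form the e feature maps to be detected. The channel counts and the ReLU activations used here are placeholder assumptions.

```python
# Hypothetical sketch of step S3: building e multi-scale feature maps to be
# detected from F'_{t,1} with repeated 3x3 conv + 2x2 max-pool blocks.
from typing import List
import torch
import torch.nn as nn

class ExtraScaleExtractor(nn.Module):
    def __init__(self, in_channels: int = 2560, mid_channels: int = 512, e: int = 4):
        super().__init__()
        blocks = []
        channels = in_channels
        for _ in range(e - 1):                    # e-1 rounds of feature extraction
            blocks.append(nn.Sequential(
                nn.Conv2d(channels, mid_channels, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2)))     # halves the height and width
            channels = mid_channels
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
        maps = [x]                                # F'_{t,1}
        for block in self.blocks:
            x = block(x)
            maps.append(x)                        # F'_{t,2} ... F'_{t,e}
        return maps

extractor = ExtraScaleExtractor()
maps = extractor(torch.rand(1, 2560, 10, 10))
print([tuple(m.shape[2:]) for m in maps])   # [(10, 10), (5, 5), (2, 2), (1, 1)]
```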
In step S4, the multi-scale feature maps to be detected obtained through the additional feature extraction are used. Anchor boxes with prior positions are set on the feature maps to be detected at different scales, and two convolutional layers are applied to each feature map to be detected to predict, along the channel dimension, the offsets of the target bounding box relative to the anchor boxes and the class of the target, respectively. Let the number of classes be d (including the background). For each feature map to be detected F'_{t,i} ∈ ℝ^(C_Fi×H_Fi×W_Fi), where C_Fi, H_Fi and W_Fi are respectively the number of channels, the height and the width of the feature map, and the number of anchor boxes at each pixel position is n_i, prediction by the convolutional class prediction layer and the convolutional bounding-box prediction layer yields a classification prediction result P^cls_i ∈ ℝ^((n_i·d)×H_Fi×W_Fi) and a bounding-box prediction result P^box_i ∈ ℝ^((n_i·4)×H_Fi×W_Fi).
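For step S4, the per-scale prediction heads can be sketched as two convolutional layers on each feature map to be detected: one producing n_i·d class scores per spatial location and one producing n_i·4 bounding-box offsets relative to the anchor boxes. The concrete numbers used below (C_Fi = 512, n_i = 4, d = 21) are illustrative assumptions, and decoding the offsets into final boxes is outside this sketch.

```python
# Hypothetical sketch of step S4: convolutional class and bounding-box
# prediction heads applied to one feature map to be detected F'_{t,i}.
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    def __init__(self, in_channels: int, num_anchors: int, num_classes: int):
        super().__init__()
        # class prediction layer: n_i * d output channels (d includes background)
        self.cls_head = nn.Conv2d(in_channels, num_anchors * num_classes,
                                  kernel_size=3, padding=1)
        # bounding-box prediction layer: n_i * 4 offsets per location
        self.box_head = nn.Conv2d(in_channels, num_anchors * 4,
                                  kernel_size=3, padding=1)

    def forward(self, feature_map: torch.Tensor):
        return self.cls_head(feature_map), self.box_head(feature_map)

# Example on a single scale with illustrative sizes.
head = PredictionHead(in_channels=512, num_anchors=4, num_classes=21)
cls_pred, box_pred = head(torch.rand(1, 512, 5, 5))
print(cls_pred.shape, box_pred.shape)   # (1, 84, 5, 5) and (1, 16, 5, 5)
```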

Claims (4)

1. A video target detection method based on an attention mechanism, characterized by comprising the following steps:
step S1, inputting the video frame image of the current time point into a Mobilenet network to extract a candidate feature map;
step S2, setting a time sequence feature fusion window in the past time period adjacent to the current time point; for the video frames to be fused within the feature fusion window, computing their image Laplacian variances respectively and normalizing them to serve as the fusion weight of each frame to be fused; computing the weighted sum of the candidate feature maps of all frames to be fused according to the weights to obtain the time sequence feature required by the current frame; and concatenating the candidate feature of the video frame at the current time point with the time sequence feature along the channel dimension to obtain a feature map to be detected that fuses the time sequence information;
step S3, extracting feature maps to be detected of additional scales from the feature map to be detected by using convolutional feature extraction layers and max pooling layers;
step S4, on the feature maps to be detected of different scales, predicting the target class and the bounding box coordinates on the current frame by using convolutional layers.

2. The video target detection method based on an attention mechanism according to claim 1, characterized in that, in step S1, the video frame at the current time point t is detected; first, the video frame image I_t at the current time point, where H_I and W_I are respectively the height and width of the video frame, is input into the Mobilenet network for feature extraction to obtain the candidate feature map F_t ∈ ℝ^(C_1×H_1×W_1), where ℝ denotes the real numbers and C_1, H_1 and W_1 are respectively the number of feature channels, the height and the width of the candidate feature map.

3. The video target detection method based on an attention mechanism according to claim 2, characterized in that, in step S2, a feature fusion window of width w = s is set in the past time period of the current time point t; the video frame images to be fused within the feature fusion window are {I_{t−i}}, i ∈ [1, s], and the candidate feature maps corresponding to the video frames to be fused within the feature fusion window are {F_{t−i}}, i ∈ [1, s]; each video frame image I_{t−i} to be fused is converted into a gray-scale map G_{t−i}; the Laplacian variance σ²_{t−i} of each gray-scale map G_{t−i} is calculated; the fusion weights α_{t−i} of all video frames to be fused are calculated by normalizing the Laplacian variances; the frame candidate features within the feature fusion window are fused by weighted summation to obtain the time sequence feature of the current time point, F̂_t = Σ_{i=1}^{s} α_{t−i}·F_{t−i}; and the time sequence feature is concatenated with the candidate feature of the current frame along the channel dimension to complete the fusion of the time sequence information, obtaining the first feature map to be detected for detection, F'_{t,1}.

4. The video target detection method based on an attention mechanism according to claim 3, characterized in that, in step S3, after the feature map to be detected F'_{t,1} that fuses the time sequence features at the current time point is obtained, 3×3 convolutional layers and 2×2 pooling layers are used to perform further feature extraction on the feature map to be detected while reducing its size, and after e−1 rounds of feature extraction, e feature maps to be detected are finally obtained: {F'_{t,1}, F'_{t,2}, …, F'_{t,e}}.
CN201910499786.9A 2019-06-11 2019-06-11 Video target detection method based on attention mechanism Active CN110287826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910499786.9A CN110287826B (en) 2019-06-11 2019-06-11 Video target detection method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910499786.9A CN110287826B (en) 2019-06-11 2019-06-11 Video target detection method based on attention mechanism

Publications (2)

Publication Number Publication Date
CN110287826A CN110287826A (en) 2019-09-27
CN110287826B true CN110287826B (en) 2021-09-17

Family

ID=68003699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910499786.9A Active CN110287826B (en) 2019-06-11 2019-06-11 Video target detection method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN110287826B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674886B (en) * 2019-10-08 2022-11-25 中兴飞流信息科技有限公司 Video target detection method fusing multi-level features
CN110751646A (en) * 2019-10-28 2020-02-04 支付宝(杭州)信息技术有限公司 Method and device for identifying damage by using multiple image frames in vehicle video
CN111310609B (en) * 2020-01-22 2023-04-07 西安电子科技大学 Video target detection method based on time sequence information and local feature similarity
CN113393491B (en) * 2020-03-12 2025-02-21 优酷文化科技(北京)有限公司 Method, device and electronic device for detecting target object from video
WO2022036567A1 (en) * 2020-08-18 2022-02-24 深圳市大疆创新科技有限公司 Target detection method and device, and vehicle-mounted radar
CN112016472B (en) * 2020-08-31 2023-08-22 山东大学 Driver attention area prediction method and system based on target dynamic information
CN112434607B (en) * 2020-11-24 2023-05-26 北京奇艺世纪科技有限公司 Feature processing method, device, electronic equipment and computer readable storage medium
CN112686913B (en) * 2021-01-11 2022-06-10 天津大学 Object Boundary Detection and Object Segmentation Models Based on Boundary Attention Consistency
CN112561001A (en) * 2021-02-22 2021-03-26 南京智莲森信息技术有限公司 Video target detection method based on space-time feature deformable convolution fusion
CN113688801B (en) * 2021-10-22 2022-02-15 南京智谱科技有限公司 Chemical gas leakage detection method and system based on spectrum video
CN114594770B (en) * 2022-03-04 2024-04-26 深圳市千乘机器人有限公司 Inspection method for inspection robot without stopping
CN115131710B (en) * 2022-07-05 2024-09-03 福州大学 Real-time action detection method based on multi-scale feature fusion attention

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103152513B (en) * 2011-12-06 2016-05-25 瑞昱半导体股份有限公司 Image processing method and relevant image processing apparatus
CN103702032B (en) * 2013-12-31 2017-04-12 华为技术有限公司 Image processing method, device and terminal equipment
US10395118B2 (en) * 2015-10-29 2019-08-27 Baidu Usa Llc Systems and methods for video paragraph captioning using hierarchical recurrent neural networks
US10169656B2 (en) * 2016-08-29 2019-01-01 Nec Corporation Video system using dual stage attention based recurrent neural network for future event prediction
CN109829398B (en) * 2019-01-16 2020-03-31 北京航空航天大学 A method for object detection in video based on 3D convolutional network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102393958A (en) * 2011-07-16 2012-03-28 西安电子科技大学 Multi-focus image fusion method based on compressive sensing
CN105913404A (en) * 2016-07-01 2016-08-31 湖南源信光电科技有限公司 Low-illumination imaging method based on frame accumulation
CN107481238A (en) * 2017-09-20 2017-12-15 众安信息技术服务有限公司 Image quality measure method and device
CN108921803A (en) * 2018-06-29 2018-11-30 华中科技大学 A kind of defogging method based on millimeter wave and visual image fusion
CN109104568A (en) * 2018-07-24 2018-12-28 苏州佳世达光电有限公司 The intelligent cleaning driving method and drive system of monitoring camera
CN109684912A (en) * 2018-11-09 2019-04-26 中国科学院计算技术研究所 A kind of video presentation method and system based on information loss function

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Infrared dim target detection based on visual attention;Xin Wang;《Infrared Physics & Technology》;20121130;513-521 *
Image sharpness evaluation algorithm based on lifting wavelet transform;Wang Xin;《Wanfang Data Knowledge Service Platform》;20100322;52-57 *

Also Published As

Publication number Publication date
CN110287826A (en) 2019-09-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20241211

Address after: 518000 1002, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Patentee after: Shenzhen Wanzhida Technology Co.,Ltd.

Country or region after: China

Address before: 100124 No. 100 Chaoyang District Ping Tian Park, Beijing

Patentee before: Beijing University of Technology

Country or region before: China

TR01 Transfer of patent right

Effective date of registration: 20250521

Address after: Room 1-103, 1st Floor, Building 3, No. 5 Guangmao Street, Daxing Economic Development Zone, Daxing District, Beijing, 102600

Patentee after: Kuaima (Beijing) Electronic Technology Co.,Ltd.

Country or region after: China

Address before: 518000 1002, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Patentee before: Shenzhen Wanzhida Technology Co.,Ltd.

Country or region before: China