CN110287826B - Video target detection method based on attention mechanism - Google Patents

Info

Publication number
CN110287826B
CN110287826B
Authority
CN
China
Prior art keywords
feature
detected
frame
candidate
fused
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910499786.9A
Other languages
Chinese (zh)
Other versions
CN110287826A (en)
Inventor
李建强
白骏
刘雅琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201910499786.9A priority Critical patent/CN110287826B/en
Publication of CN110287826A publication Critical patent/CN110287826A/en
Application granted granted Critical
Publication of CN110287826B publication Critical patent/CN110287826B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a video target detection method based on an attention mechanism, in the field of computer vision. The method comprises the following steps: step S1, extracting a candidate feature map of the current-time frame; step S2, setting a fusion window over the past time period, computing the Laplacian variance of each frame in the window, normalizing the variances as the weights of the frames in the window, performing a weighted summation of the candidate feature maps of all frames in the window to obtain a temporal feature, and concatenating the candidate feature of the current-time frame with the temporal feature to obtain a feature map to be detected; step S3, extracting feature maps at additional scales from the feature map to be detected using convolutional layers; step S4, predicting the object class and position on the feature maps of different scales using convolutional layers. The feature fusion method of the invention assigns different weights to frame features of different quality in the past time period, so that the temporal information is fused more fully and the performance of the detection model is improved.

Description

Video target detection method based on attention mechanism
Technical Field
The invention relates to computer vision, deep learning and video target detection technology.
Background
Deep-learning-based image target detection has made great progress over the last five years, with the R-CNN series, the SSD network and the YOLO series as representative examples. However, in fields such as video surveillance and vehicle-assisted driving, there is a broader demand for video-based target detection. Because video suffers from motion blur, occlusion, diverse shape changes, diverse illumination changes and similar problems, applying an image target detection technique alone to targets in video does not yield good detection results. Adjacent frames in a video are continuous in time and similar in space, and the positions of targets are correlated across frames; how to exploit this temporal information about targets in video has therefore become the key to improving video target detection performance.
Current video target detection frameworks fall mainly into three types. The first treats video frames as independent images and applies an image target detection algorithm; it ignores temporal information and detects each frame independently, so the results are not ideal. The second combines target detection with target tracking, post-processing the detection results in order to track targets; the tracking accuracy depends on the detection, which easily causes error propagation. The third detects only a few key frames and then generates the features of the remaining frames from optical flow information and the key-frame features; this approach uses temporal information, but optical flow is expensive to compute, making fast detection difficult.
Disclosure of Invention
The invention aims to provide a fast and accurate video target detection method that fully integrates temporal features.
In order to solve the technical problem, the invention provides a video target detection method based on an attention mechanism, which comprises the following steps:
step S1, inputting the video frame image at the current time point into a MobileNet network to extract a candidate feature map;
step S2, setting a temporal feature fusion window over the past time period adjacent to the current time point, computing the Laplacian variance of each video frame image to be fused within the fusion window, normalizing these variances to obtain the fusion weight of each frame to be fused, performing a weighted summation of the candidate feature maps of all frames to be fused according to the fusion weights to obtain the temporal feature required by the current frame, and concatenating the candidate feature of the current-time video frame with the temporal feature along the channel dimension to obtain a feature map to be detected that fuses the temporal information;
step S3, extracting feature maps to be detected at additional scales from the feature map to be detected using convolutional feature extraction layers and max pooling layers;
and step S4, on the feature maps to be detected at the different scales, predicting the object classes and bounding box coordinates for the current frame using convolutional layers.
Further, in step S1, to detect the video frame at the current time point t, the video frame image $I_t \in \mathbb{R}^{3 \times H_I \times W_I}$ is first input into a MobileNet network for feature extraction, where $\mathbb{R}$ denotes the real numbers and $H_I$ and $W_I$ are the height and width of the video frame, and the candidate feature map

$$F_t \in \mathbb{R}^{C_1 \times H_1 \times W_1}$$

is extracted, where $C_1$, $H_1$ and $W_1$ are the number of feature channels, the height and the width of the candidate feature map, respectively.
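Step S1 can be illustrated with a short sketch. The lines below are not the patented implementation; they assume a torchvision MobileNetV2 backbone and a NumPy frame as input, and merely show how a candidate feature map F_t could be obtained from a frame I_t.

```python
import torch
import torchvision

# Illustrative backbone (assumption): torchvision's MobileNetV2 feature extractor.
backbone = torchvision.models.mobilenet_v2(weights=None).features
backbone.eval()

def extract_candidate_features(frame_rgb):
    """frame_rgb: H_I x W_I x 3 uint8 NumPy array -> candidate feature map F_t (C1 x H1 x W1)."""
    x = torch.from_numpy(frame_rgb).float().permute(2, 0, 1) / 255.0   # 3 x H_I x W_I
    with torch.no_grad():
        feat = backbone(x.unsqueeze(0))                                 # 1 x C1 x H1 x W1
    return feat.squeeze(0)
```

Any lightweight backbone could play the same role; MobileNetV2 is used here only because the description names MobileNet.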
Further, in step S2, a feature fusion window of width $w = s$ is set over the past time period preceding the current time point t. The video frame images to be fused within the feature fusion window are $\{I_{t-i}\}_{i \in [1, s]}$, and the candidate feature maps corresponding to these frames are $\{F_{t-i}\}_{i \in [1, s]}$. Each video frame image $I_{t-i}$ to be fused is converted into a grayscale image $G_{t-i}$, and the Laplacian variance of the image is computed from this grayscale image, where the Laplacian at coordinate $(x, y)$ of a grayscale image $G$ is

$$\nabla^2 G(x, y) = \frac{\partial^2 G(x, y)}{\partial x^2} + \frac{\partial^2 G(x, y)}{\partial y^2}.$$

The Laplacian of an image captures regions where pixel values change rapidly by computing the second derivative at each pixel in each direction, and can be used to detect corners in the image. The Laplacian variance of an image reflects how the pixel values vary over the whole image: if the Laplacian variance is large the image is sharp, otherwise the image is blurred.
First, the Laplacian mean $\mu_{t-i}$ of each grayscale image $G_{t-i}$ is computed, where $H_I$ and $W_I$ are the height and width of the grayscale image:

$$\mu_{t-i} = \frac{1}{H_I W_I} \sum_{x=1}^{H_I} \sum_{y=1}^{W_I} \nabla^2 G_{t-i}(x, y).$$

Next, the Laplacian variance $\sigma_{t-i}^2$ of each grayscale image $G_{t-i}$ is computed:

$$\sigma_{t-i}^2 = \frac{1}{H_I W_I} \sum_{x=1}^{H_I} \sum_{y=1}^{W_I} \left( \nabla^2 G_{t-i}(x, y) - \mu_{t-i} \right)^2.$$
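As an illustration of this computation (an assumption about tooling, not the patented code), OpenCV's Laplacian operator can be applied to the grayscale frame and the variance of the response taken directly:

```python
import cv2

def laplacian_variance(frame_bgr):
    """Convert a frame to grayscale and return the variance of its Laplacian.
    A larger value indicates a sharper frame, a smaller value a blurrier one."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)   # G_{t-i}
    lap = cv2.Laplacian(gray, cv2.CV_64F)                # second derivatives at each pixel
    return lap.var()                                     # sigma^2_{t-i}
```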
If a video frame is sharp, its candidate features help detect the target; conversely, some frames are blurred because of moving objects, and the candidate features of such frames are not helpful for detecting the target. Video frames of different sharpness should therefore be assigned different fusion weights, so that the detection model attends more to sharp features than to blurred ones. The fusion weights $\alpha_{t-i}$ of all video frames to be fused are first computed by normalizing the Laplacian variances:

$$\alpha_{t-i} = \frac{\sigma_{t-i}^2}{\sum_{j=1}^{s} \sigma_{t-j}^2}.$$

The frame candidate features within the feature fusion window are then fused by weighted summation to obtain the temporal feature at the current time point:

$$\tilde{F}_t = \sum_{i=1}^{s} \alpha_{t-i} F_{t-i}.$$

The temporal feature is concatenated with the candidate feature of the current frame along the channel dimension, completing the fusion of temporal information and yielding the first feature map to be detected.
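A minimal sketch of this fusion step, assuming (as above) that the weights are the Laplacian variances normalized by their sum; the function name and tensor layout are illustrative only:

```python
import torch

def fuse_temporal_features(candidate_feats, laplacian_vars, current_feat):
    """candidate_feats: list of s tensors F_{t-i}, each C1 x H1 x W1;
    laplacian_vars: list of s Laplacian variances sigma^2_{t-i};
    current_feat: F_t, C1 x H1 x W1.
    Returns the first feature map to be detected, 2*C1 x H1 x W1."""
    sigmas = torch.tensor(laplacian_vars, dtype=torch.float32)
    alphas = sigmas / sigmas.sum()                              # normalized fusion weights alpha_{t-i}
    stacked = torch.stack(candidate_feats)                      # s x C1 x H1 x W1
    temporal = (alphas.view(-1, 1, 1, 1) * stacked).sum(dim=0)  # weighted sum over the window
    return torch.cat([current_feat, temporal], dim=0)           # channel-wise concatenation
```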
Further, in step S3, after the feature map to be detected $\hat{F}_t^{(1)}$ that fuses the temporal feature has been obtained at the current time point, 3 × 3 convolutional layers and 2 × 2 pooling layers are used to extract further features from it and to reduce its size, in order to obtain feature maps to be detected at more scales. Large feature maps to be detected are rich in local information and suited to predicting small targets, while small feature maps to be detected contain stronger global semantic information and are suited to detecting large targets. After e-1 rounds of feature extraction, e feature maps to be detected are finally obtained:

$$\hat{F}_t^{(1)}, \hat{F}_t^{(2)}, \ldots, \hat{F}_t^{(e)}.$$
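The extra-scale extraction could, for example, be realized by a stack of 3 × 3 convolutions each followed by 2 × 2 max pooling; the channel counts and the number of scales below are assumptions for illustration, not values fixed by the invention:

```python
import torch.nn as nn

class ExtraScales(nn.Module):
    """Produces e multi-scale feature maps from the fused map via e-1 conv+pool stages."""
    def __init__(self, in_channels, num_scales=4):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2),   # halves the spatial size
            )
            for _ in range(num_scales - 1)
        ])

    def forward(self, fused):                  # fused: N x 2*C1 x H1 x W1
        maps = [fused]
        for stage in self.stages:
            maps.append(stage(maps[-1]))
        return maps                            # e feature maps of decreasing size
```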
Further, in step S4, the multi-scale feature maps to be detected have been obtained through the additional feature extraction. Anchor boxes with prior positions are set on the feature maps to be detected at the different scales, and two 3 × 3 convolutional layers are applied to each feature map to be detected to predict, along the channel dimension, the offsets of the target bounding boxes relative to the anchor boxes and the classes of the targets, respectively. Let the number of classes be d (including the background); for each feature map to be detected $\hat{F}_t^{(i)}$, after prediction by a 3 × 3 convolutional class prediction layer and a 3 × 3 convolutional bounding box prediction layer, the classification prediction result

$$P_{cls}^{(i)} \in \mathbb{R}^{(n_i \cdot d) \times H_{F_i} \times W_{F_i}}$$

and the bounding box prediction result

$$P_{box}^{(i)} \in \mathbb{R}^{(n_i \cdot 4) \times H_{F_i} \times W_{F_i}}$$

are obtained, where $H_{F_i}$ and $W_{F_i}$ are the height and width of the feature map and $n_i$ is the number of anchor boxes at each pixel position.
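As a hedged sketch (the helper name and PyTorch framing are assumptions), the two prediction layers at one scale can be plain 3 × 3 convolutions whose output channels encode n_i · d class scores and n_i · 4 box offsets per spatial position, in the style of SSD-type heads:

```python
import torch.nn as nn

def make_prediction_heads(channels, num_anchors, num_classes):
    """channels: C_Fi of one feature map; num_anchors: n_i; num_classes: d (incl. background).
    Returns (class_head, box_head), each a 3x3 convolution applied to that feature map."""
    class_head = nn.Conv2d(channels, num_anchors * num_classes, kernel_size=3, padding=1)
    box_head = nn.Conv2d(channels, num_anchors * 4, kernel_size=3, padding=1)
    return class_head, box_head
```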
Drawings
FIG. 1 is a schematic of the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views, and merely illustrate the basic structure of the present invention in a schematic manner, and thus show only the constitution related to the present invention.
Example 1
As shown in FIG. 1, the present example provides a video target detection method based on an attention mechanism, comprising the following steps:
Step S1, inputting the video frame image at the current time point into a MobileNet network to extract a candidate feature map;
Step S2, setting a temporal feature fusion window over the past time period adjacent to the current time point, computing the Laplacian variance of each video frame image to be fused within the fusion window, normalizing these variances to obtain the fusion weight of each frame to be fused, performing a weighted summation of the candidate feature maps of all frames to be fused according to the weights to obtain the temporal feature required by the current frame, and concatenating the candidate feature of the current-time video frame with the temporal feature along the channel dimension to obtain a feature map to be detected that fuses the temporal information;
Step S3, extracting feature maps to be detected at additional scales from the feature map to be detected using convolutional feature extraction layers and max pooling layers;
and Step S4, on the feature maps to be detected at the different scales, predicting the object classes and bounding box coordinates for the current frame using convolutional layers.
In step S1, to detect the video frame at the current time point t, the video frame image $I_t \in \mathbb{R}^{3 \times H_I \times W_I}$ is first input into the MobileNet network for feature extraction, where $H_I$ and $W_I$ are the height and width of the frame image, and the candidate feature map $F_t \in \mathbb{R}^{C_1 \times H_1 \times W_1}$ is extracted, where $C_1$, $H_1$ and $W_1$ are the number of channels, the height and the width of the candidate feature map, respectively.
In step S2, a feature fusion window of width w is set over the past time period preceding the current time point t. With s the maximum window width and q the length of the past time period, the window width is set as follows: if the length of the past time period is at least s, the fusion window width is set to s; if the length of the past time period is less than s, so that there are not enough features, the fusion window width is set to the length of the past time period:

$$w = \begin{cases} s, & q \geq s \\ q, & q < s. \end{cases}$$
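In other words, the window width is the smaller of the past-period length and s. A small sketch (the buffer handling is assumed, not specified by the invention):

```python
def select_fusion_window(past_frames, s):
    """past_frames: frames preceding time t, most recent last; s: maximum window width.
    Returns the last w = min(len(past_frames), s) frames to be fused."""
    w = min(len(past_frames), s)
    return past_frames[-w:] if w > 0 else []
```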
Denote the video frame images to be fused within the feature fusion window as $\{I_{t-i}\}_{i \in [1, s]}$, and the candidate feature maps corresponding to these frames as $\{F_{t-i}\}_{i \in [1, s]}$. Each video frame image $I_{t-i}$ to be fused is converted into a grayscale image $G_{t-i}$, and the Laplacian variance of the image is computed from this grayscale image, where the Laplacian at coordinate $(x, y)$ of a grayscale image $G$ is:

$$\nabla^2 G(x, y) = \frac{\partial^2 G(x, y)}{\partial x^2} + \frac{\partial^2 G(x, y)}{\partial y^2}$$
where $G(x, y)$ denotes the pixel value of the grayscale image $G$ at coordinate $(x, y)$. The Laplacian of an image captures regions where pixel values change rapidly by computing the second derivative at each pixel in each direction, and can be used to detect corners in the image. The Laplacian variance of an image reflects how the pixel values vary over the whole image: if the Laplacian variance is large the image is sharp, otherwise the image is blurred.
First, the Laplacian mean $\mu_{t-i}$ of each grayscale image $G_{t-i}$ is computed, where $H_I$ and $W_I$ are the height and width of the grayscale image:

$$\mu_{t-i} = \frac{1}{H_I W_I} \sum_{x=1}^{H_I} \sum_{y=1}^{W_I} \nabla^2 G_{t-i}(x, y).$$

Next, the Laplacian variance $\sigma_{t-i}^2$ of each grayscale image $G_{t-i}$ is computed:

$$\sigma_{t-i}^2 = \frac{1}{H_I W_I} \sum_{x=1}^{H_I} \sum_{y=1}^{W_I} \left( \nabla^2 G_{t-i}(x, y) - \mu_{t-i} \right)^2.$$
If a video frame is sharp, its candidate features help detect the target; conversely, some frames are blurred because of moving objects, and the candidate features of such frames are not helpful for detecting the target. Video frames of different sharpness should therefore be assigned different fusion weights, with sharper frames receiving larger feature weights, so that the detection model attends more to sharp features than to blurred ones. The fusion weights $\alpha_{t-i}$ of all video frames to be fused are first computed by normalizing the Laplacian variances:

$$\alpha_{t-i} = \frac{\sigma_{t-i}^2}{\sum_{j=1}^{s} \sigma_{t-j}^2}.$$

The frame candidate features within the feature fusion window are fused by weighted summation to obtain the temporal feature at the current time point:

$$\tilde{F}_t = \sum_{i=1}^{s} \alpha_{t-i} F_{t-i}.$$

The temporal feature is concatenated with the candidate feature of the current frame along the channel dimension, completing the fusion of temporal information and yielding the first feature map to be detected:

$$\hat{F}_t^{(1)} = \mathrm{concat}(F_t, \tilde{F}_t) \in \mathbb{R}^{2 C_1 \times H_1 \times W_1}.$$
In step S3, after the feature map to be detected $\hat{F}_t^{(1)}$ that fuses the temporal feature has been obtained at the current time point, convolutional layers and pooling layers are used to extract further features from it and to reduce its size, in order to obtain feature maps to be detected at more scales. Large feature maps to be detected are rich in local information and suited to predicting small targets, while small feature maps to be detected contain stronger global semantic information and are suited to detecting large targets. After e-1 rounds of feature extraction, e feature maps to be detected are finally obtained:

$$\hat{F}_t^{(1)}, \hat{F}_t^{(2)}, \ldots, \hat{F}_t^{(e)}.$$
In step S4, the multi-scale feature maps to be detected have been obtained through the additional feature extraction. Anchor boxes with prior positions are set on the feature maps to be detected at the different scales, and two convolutional layers are applied to each feature map to be detected to predict, along the channel dimension, the offsets of the target bounding boxes relative to the anchor boxes and the classes of the targets, respectively. Let the number of classes be d (including the background). For each feature map to be detected

$$\hat{F}_t^{(i)} \in \mathbb{R}^{C_{F_i} \times H_{F_i} \times W_{F_i}},$$

where $C_{F_i}$, $H_{F_i}$ and $W_{F_i}$ are the number of channels, the height and the width of the feature map and $n_i$ is the number of anchor boxes at each pixel position, after prediction by the convolutional class prediction layer and the convolutional bounding box prediction layer the classification prediction result

$$P_{cls}^{(i)} \in \mathbb{R}^{(n_i \cdot d) \times H_{F_i} \times W_{F_i}}$$

and the bounding box prediction result

$$P_{box}^{(i)} \in \mathbb{R}^{(n_i \cdot 4) \times H_{F_i} \times W_{F_i}}$$

are obtained.

Claims (4)

1. A video target detection method based on an attention mechanism is characterized by comprising the following steps:
step S1, inputting the video frame image at the current time point into a MobileNet network to extract a candidate feature map;
step S2, setting a temporal feature fusion window over the past time period adjacent to the current time point, computing the Laplacian variance of each video frame image to be fused within the fusion window, normalizing these variances to obtain the fusion weight of each frame to be fused, performing a weighted summation of the candidate feature maps of all frames to be fused according to the weights to obtain the temporal feature required by the current frame, and concatenating the candidate feature of the current-time video frame with the temporal feature along the channel dimension to obtain a feature map to be detected that fuses the temporal information;
step S3, extracting feature maps to be detected at additional scales from the feature map to be detected using convolutional feature extraction layers and max pooling layers;
and step S4, on the feature maps to be detected at the different scales, predicting the object classes and bounding box coordinates for the current frame using convolutional layers.
2. The attention mechanism-based video object detection method of claim 1,
in step S1, to detect the video frame at the current time point t, the video frame image $I_t \in \mathbb{R}^{3 \times H_I \times W_I}$ is first input into a MobileNet network to extract features and obtain the candidate feature map $F_t \in \mathbb{R}^{C_1 \times H_1 \times W_1}$, where $\mathbb{R}$ denotes the real numbers, $H_I$ and $W_I$ are the height and width of the video frame, and $C_1$, $H_1$ and $W_1$ are the number of feature channels, the height and the width of the candidate feature map, respectively.
3. The attention mechanism-based video object detection method of claim 2,
in step S2, a feature fusion window of width $w = s$ is set over the past time period preceding the current time point t; the video frame images to be fused within the feature fusion window are $\{I_{t-i}\}_{i \in [1, s]}$, and the candidate feature maps corresponding to these frames are $\{F_{t-i}\}_{i \in [1, s]}$; each video frame image $I_{t-i}$ to be fused is converted into a grayscale image $G_{t-i}$;
the Laplacian variance $\sigma_{t-i}^2$ of each grayscale image $G_{t-i}$ is computed;
the fusion weights $\alpha_{t-i}$ of all video frames to be fused are computed by normalizing the Laplacian variances; the frame candidate features within the feature fusion window are fused by weighted summation to obtain the temporal feature $\tilde{F}_t$ at the current time point;
the temporal feature is concatenated with the candidate feature of the current frame along the channel dimension, completing the fusion of temporal information and yielding the first feature map to be detected $\hat{F}_t^{(1)}$.
4. The attention mechanism-based video object detection method of claim 3,
in step S3, after the feature map to be detected $\hat{F}_t^{(1)}$ that fuses the temporal feature has been obtained at the current time point, 3 × 3 convolutional layers and 2 × 2 pooling layers are used to extract further features from it while reducing its size, and after e-1 rounds of feature extraction, e feature maps to be detected are finally obtained:

$$\hat{F}_t^{(1)}, \hat{F}_t^{(2)}, \ldots, \hat{F}_t^{(e)}.$$
CN201910499786.9A 2019-06-11 2019-06-11 Video target detection method based on attention mechanism Active CN110287826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910499786.9A CN110287826B (en) 2019-06-11 2019-06-11 Video target detection method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910499786.9A CN110287826B (en) 2019-06-11 2019-06-11 Video target detection method based on attention mechanism

Publications (2)

Publication Number Publication Date
CN110287826A CN110287826A (en) 2019-09-27
CN110287826B true CN110287826B (en) 2021-09-17

Family

ID=68003699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910499786.9A Active CN110287826B (en) 2019-06-11 2019-06-11 Video target detection method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN110287826B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674886B (en) * 2019-10-08 2022-11-25 中兴飞流信息科技有限公司 Video target detection method fusing multi-level features
CN110751646A (en) * 2019-10-28 2020-02-04 支付宝(杭州)信息技术有限公司 Method and device for identifying damage by using multiple image frames in vehicle video
CN111310609B (en) * 2020-01-22 2023-04-07 西安电子科技大学 Video target detection method based on time sequence information and local feature similarity
CN114450720A (en) * 2020-08-18 2022-05-06 深圳市大疆创新科技有限公司 Target detection method and device and vehicle-mounted radar
CN112016472B (en) * 2020-08-31 2023-08-22 山东大学 Driver attention area prediction method and system based on target dynamic information
CN112434607B (en) * 2020-11-24 2023-05-26 北京奇艺世纪科技有限公司 Feature processing method, device, electronic equipment and computer readable storage medium
CN112686913B (en) * 2021-01-11 2022-06-10 天津大学 Object boundary detection and object segmentation model based on boundary attention consistency
CN112561001A (en) * 2021-02-22 2021-03-26 南京智莲森信息技术有限公司 Video target detection method based on space-time feature deformable convolution fusion
CN113688801B (en) * 2021-10-22 2022-02-15 南京智谱科技有限公司 Chemical gas leakage detection method and system based on spectrum video
CN114594770B (en) * 2022-03-04 2024-04-26 深圳市千乘机器人有限公司 Inspection method for inspection robot without stopping
CN115131710B (en) * 2022-07-05 2024-09-03 福州大学 Real-time action detection method based on multiscale feature fusion attention

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102393958A (en) * 2011-07-16 2012-03-28 西安电子科技大学 Multi-focus image fusion method based on compressive sensing
CN105913404A (en) * 2016-07-01 2016-08-31 湖南源信光电科技有限公司 Low-illumination imaging method based on frame accumulation
CN107481238A (en) * 2017-09-20 2017-12-15 众安信息技术服务有限公司 Image quality measure method and device
CN108921803A (en) * 2018-06-29 2018-11-30 华中科技大学 A kind of defogging method based on millimeter wave and visual image fusion
CN109104568A (en) * 2018-07-24 2018-12-28 苏州佳世达光电有限公司 The intelligent cleaning driving method and drive system of monitoring camera
CN109684912A (en) * 2018-11-09 2019-04-26 中国科学院计算技术研究所 A kind of video presentation method and system based on information loss function

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103152513B (en) * 2011-12-06 2016-05-25 瑞昱半导体股份有限公司 Image processing method and relevant image processing apparatus
CN103702032B (en) * 2013-12-31 2017-04-12 华为技术有限公司 Image processing method, device and terminal equipment
US10395118B2 (en) * 2015-10-29 2019-08-27 Baidu Usa Llc Systems and methods for video paragraph captioning using hierarchical recurrent neural networks
US10169656B2 (en) * 2016-08-29 2019-01-01 Nec Corporation Video system using dual stage attention based recurrent neural network for future event prediction
CN109829398B (en) * 2019-01-16 2020-03-31 北京航空航天大学 Target detection method in video based on three-dimensional convolution network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102393958A (en) * 2011-07-16 2012-03-28 西安电子科技大学 Multi-focus image fusion method based on compressive sensing
CN105913404A (en) * 2016-07-01 2016-08-31 湖南源信光电科技有限公司 Low-illumination imaging method based on frame accumulation
CN107481238A (en) * 2017-09-20 2017-12-15 众安信息技术服务有限公司 Image quality measure method and device
CN108921803A (en) * 2018-06-29 2018-11-30 华中科技大学 A kind of defogging method based on millimeter wave and visual image fusion
CN109104568A (en) * 2018-07-24 2018-12-28 苏州佳世达光电有限公司 The intelligent cleaning driving method and drive system of monitoring camera
CN109684912A (en) * 2018-11-09 2019-04-26 中国科学院计算技术研究所 A kind of video presentation method and system based on information loss function

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Infrared dim target detection based on visual attention; Xin Wang; 《Infrared Physics & Technology》; 20121130; 513-521 *
Image sharpness evaluation algorithm based on lifting wavelet transform; Wang Xin; 《Wanfang Data Knowledge Service Platform》; 20100322; 52-57 *

Also Published As

Publication number Publication date
CN110287826A (en) 2019-09-27

Similar Documents

Publication Publication Date Title
CN110287826B (en) Video target detection method based on attention mechanism
CN108154118B (en) A kind of target detection system and method based on adaptive combined filter and multistage detection
CN108830285B (en) Target detection method for reinforcement learning based on fast-RCNN
Zhou et al. Efficient road detection and tracking for unmanned aerial vehicle
US9042648B2 (en) Salient object segmentation
CN110738673A (en) Visual SLAM method based on example segmentation
CN111738055B (en) Multi-category text detection system and bill form detection method based on same
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN108564598B (en) Improved online Boosting target tracking method
CN111723693A (en) Crowd counting method based on small sample learning
CN110942471A (en) Long-term target tracking method based on space-time constraint
CN112232371A (en) American license plate recognition method based on YOLOv3 and text recognition
CN112883850A (en) Multi-view aerospace remote sensing image matching method based on convolutional neural network
Lu et al. Superthermal: Matching thermal as visible through thermal feature exploration
CN114299383A (en) Remote sensing image target detection method based on integration of density map and attention mechanism
CN117949942B (en) Target tracking method and system based on fusion of radar data and video data
CN115147418B (en) Compression training method and device for defect detection model
CN111723660A (en) Detection method for long ground target detection network
CN111833353B (en) Hyperspectral target detection method based on image segmentation
CN109377511A (en) Motion target tracking method based on sample combination and depth detection network
CN113496480A (en) Method for detecting weld image defects
CN111414938B (en) Target detection method for bubbles in plate heat exchanger
CN116342894A (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN116645592A (en) Crack detection method based on image processing and storage medium
CN116758340A (en) Small target detection method based on super-resolution feature pyramid and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant