CN112861733A - Night traffic video significance detection method based on space-time double coding - Google Patents

Night traffic video significance detection method based on space-time double coding Download PDF

Info

Publication number
CN112861733A
Authority
CN
China
Prior art keywords
convolution
space
time
blocks
coding structure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110183195.8A
Other languages
Chinese (zh)
Other versions
CN112861733B (en)
Inventor
颜红梅
蒋莲芳
田晗
高港耀
吴江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110183195.8A priority Critical patent/CN112861733B/en
Publication of CN112861733A publication Critical patent/CN112861733A/en
Application granted granted Critical
Publication of CN112861733B publication Critical patent/CN112861733B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a night traffic video saliency detection method based on space-time double coding, applied to the technical field of computer vision and aimed at the shortcomings of the prior art in saliency detection for night traffic scenes. The network model of the invention comprises three parts: a space-time double encoder, an attention fusion module and a decoder. The temporal coding module adopts a convolutional LSTM to learn the temporal information across the continuous frame sequence of the night traffic video and highlights the motion features in the traffic video; the spatial coding module extracts spatial features under different receptive fields with pyramid dilated convolution (PDC); the Attention module fuses the temporal and spatial features while enhancing the features that contribute most to the driving task; finally, the decoder accurately predicts the salient regions that are important for the night driving task.

Description

Night traffic video significance detection method based on space-time double coding
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a video saliency detection technology in a night traffic scene.
Background
Over the past few decades, advanced driver assistance systems have played an increasingly important role in automobile driving, achieving good results in easing the driving task and ensuring safe driving. However, the traffic environment is a complex and constantly changing dynamic scene flooded with a large amount of information. A road traffic environment contains not only driving-related information, such as traffic lights, signs and pedestrians, but also driving-unrelated distractions, such as billboards and neon lights. The brain's capacity to process information is limited, so a driver must remain highly focused; distracted driving greatly increases the chance of an accident. Under dim light, mixed light sources and low visibility, a driver in a night traffic scene is prone to visual fatigue and distraction during prolonged driving and may miss or overlook important targets. The traffic accident and fatality rates caused by night driving are high, so real-time reminders of important information during driving are particularly important. Visual saliency detection for traffic scenes computes the regions and objects that a driver should attend to while driving, which are important for the driving task. Learning the attention distribution of experienced, alert drivers during night driving can support saliency detection in night traffic scenes and thereby improve driving safety.
Night driving scenes are more complex than daytime ones. A night scene is complicated by: 1. insufficient illumination and low contrast; 2. cluttered lights, with visual interference increased by uneven brightness; 3. heavy noise interference and severely blurred details; 4. color distortion, and so on. These factors greatly increase the difficulty of processing nighttime images. Saliency detection in night traffic scenes is therefore one of the challenges to be addressed.
To address these problems, the invention designs a double-coding neural network model to predict the salient regions of the driver's visual search in night traffic video scenes, with the aim of reminding the driver in real time to attend to important information that is useful for driving. The salient regions predicted by the model match the driver's real attention distribution closely.
Disclosure of Invention
In order to solve the technical problem, the invention provides a night traffic video saliency detection method based on space-time double coding.
The technical scheme adopted by the invention is as follows: a night traffic video saliency detection method based on space-time dual coding comprises the following steps:
S1, acquiring a standard fixation point saliency map;
S2, establishing a network model, wherein the network model is used for performing saliency detection on the input standard fixation point saliency map;
the network model comprises a space-time coding structure, an Attention fusion module and a decoding module, wherein the space-time coding structure is used for extracting the spatial features and the temporal features of the input standard fixation point saliency map, the Attention fusion module is used for fusing the extracted spatial and temporal features, and the decoding module computes the saliency map from the fusion result;
and S3, training the network model, and detecting image saliency by adopting the trained network model.
The space-time coding structure comprises a spatial coding structure and a temporal coding structure; the spatial coding structure is used for extracting the spatial features of the input standard fixation point saliency map, and the temporal coding structure is used for extracting the temporal features of the input standard fixation point saliency map.
In the process of training the network model: the current frame is used for extracting the spatial features, and the current frame and the previous 5 frames form a continuous sequence for extracting the temporal features.
The spatial coding structure comprises: 4 groups of convolution blocks and a pyramid dilated convolution block;
each group of convolution blocks specifically comprises 2 convolution operation layers, wherein each convolution operation layer comprises a 3 × 3 convolution, a batch normalization unit and a rectified linear unit; each group of convolution blocks comprises a 2 × 2 max pooling layer with a stride of 2;
the pyramid dilated convolution block acquires spatial features through a parallel architecture of dilated convolutions with different dilation rates.
The temporal coding structure comprises: 4 groups of convolution blocks and a convolutional long short-term memory network;
each group of convolution blocks specifically comprises 2 convolution operation layers, wherein each convolution operation layer comprises a 3 × 3 convolution, a batch normalization unit and a rectified linear unit; each group of convolution blocks comprises a 2 × 2 max pooling layer with a stride of 2.
The temporal coding structure extracts the features of a continuous sequence through the 4 groups of convolution blocks, and the extracted features are then input into the convolutional long short-term memory network to learn information across preceding and following frames.
The structure of the decoding module sequentially comprises: 3 upsampling layers, 3 groups of convolution blocks, one 1 × 1 convolution layer and a Sigmoid layer, wherein a ×2 upsampling layer is arranged in front of each group of convolution blocks;
each group of convolution blocks specifically comprises 2 convolution operation layers, wherein each convolution operation layer comprises a 3 × 3 convolution, a batch normalization unit and a rectified linear unit; each group of convolution blocks comprises a 2 × 2 max pooling layer with a stride of 2.
The beneficial effects of the invention are as follows: the invention is the first to propose top-down saliency detection for night traffic scenes; the model extracts both temporal and spatial information and selectively strengthens the space-time information through an integrated attention mechanism, so that the saliency detection map is better and the predicted region is more accurate.
Drawings
FIG. 1 is a flow chart of an eye movement experiment provided by the present invention;
FIG. 2 is a schematic diagram of a network architecture employed in the present invention;
fig. 3 is a diagram illustrating an example of a night traffic video image saliency prediction according to an embodiment of the present invention;
fig. 3(a) shows an input original image, fig. 3(b) shows a standard eye movement saliency map, and fig. 3(c) shows a model prediction map according to the present invention.
Detailed Description
In order to facilitate the understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.
The method comprises two main steps: calculation of the standard eye movement saliency map and training of the network model:
A. calculation of standard eye movement saliency map:
step A1: eye movement data (including fixation point information of each frame) of 30 drivers with driving ages of two years and more are recorded by an eye tracker, and the experimental process is shown in fig. 1. And eliminating abnormal data and integrating all tested eye movement data to each frame.
Step A2: a blank matrix of the same size as the input image is generated, and the position corresponding to each fixation point of the frame is assigned the value 1, yielding a binary image, namely the standard gaze point binary map. Next, 2-dimensional Gaussian smoothing (δ = 30) is applied to the binary image to obtain the standard fixation point saliency map, which is used as the label for network training (one standard fixation point saliency map for each input picture).
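For illustration, a minimal Python sketch of this label-generation step, assuming the fixation points of a frame are given as (row, column) pixel coordinates and using SciPy's Gaussian filter; the final normalization is an assumption added for convenient use as a training label.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_saliency_map(fixations, height, width, sigma=30.0):
    """Build the standard fixation point saliency map for one frame.

    fixations: iterable of (row, col) gaze positions pooled over all subjects.
    Returns a float map normalized to [0, 1].
    """
    binary = np.zeros((height, width), dtype=np.float32)
    for r, c in fixations:
        if 0 <= r < height and 0 <= c < width:       # discard out-of-frame points
            binary[int(r), int(c)] = 1.0              # standard gaze point binary map
    smoothed = gaussian_filter(binary, sigma=sigma)   # 2-D Gaussian smoothing
    if smoothed.max() > 0:
        smoothed /= smoothed.max()                    # normalize for use as a label
    return smoothed
```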
B. Training a network model:
B1. The model designed by the invention mainly comprises three parts: a space-time coding structure (divided into a spatial coding structure and a temporal coding structure), an Attention fusion module and a decoding module.
The spatial coding structure is used for extracting the spatial features of an image. Specifically:
the spatial features are very important features of the traffic scene, and the convolution operation can effectively extract the spatial characteristics of the image. The spatial coding structure consists of 4 sets of rolling blocks and a pyramid void rolling block (PDC). Each set of volume blocks consists of two convolution operation layers, each convolution operation layer comprising a 3 × 3 convolution, a batch normalization unit (BN), and a correction linear unit Relu. There is a2 x 2 maximum pooling layer of step 2 between the volume blocks.
The PDC module is aimed at detecting regions of different sizes and acquires spatial features with a parallel architecture of dilated convolutions with different dilation rates. In this embodiment, four dilated convolutions with dilation rates of 1, 2, 4 and 8 are used to obtain local information, and global features are obtained with global average pooling (GAP). A final convolution operation layer follows.
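A sketch of the PDC idea described here: parallel dilated convolutions with rates 1, 2, 4 and 8 for local information, a global-average-pooling branch for global features, and a final convolution layer fusing the branches. The 3 × 3 kernel, the branch channel count and the bilinear upsampling of the global branch are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidDilatedConv(nn.Module):
    """Parallel dilated convolutions (rates 1, 2, 4, 8) plus a global average pooling branch."""
    def __init__(self, in_ch, branch_ch=128, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=r, dilation=r),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # Global branch: GAP -> 1x1 conv, later upsampled back to the feature size.
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, branch_ch, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        # Final convolution operation layer fusing all branches.
        self.project = nn.Sequential(
            nn.Conv2d(branch_ch * (len(rates) + 1), branch_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(branch_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        local_feats = [b(x) for b in self.branches]               # local information
        g = F.interpolate(self.global_branch(x), size=(h, w),
                          mode='bilinear', align_corners=False)   # global features
        return self.project(torch.cat(local_feats + [g], dim=1))
```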
The temporal coding structure is used for extracting the temporal features of the image. Specifically:
the temporal coding structure consists of 4 sets of volume blocks and a convLSTM (Convolitional Long Short-Term Memory Network Convolutional Long Short-Term Memory Network). The volume block and the spatial coding structure are the same. There is a2 x 2 maximum pooling layer of step 2 between the volume blocks. Compared with FC-LSTM, convLSTM effectively preserves the spatial structure of the image when learning temporal features, and is therefore better suited to processing video sequences.
Unlike the spatial coding, the input to the temporal coding structure is a video sequence of T frames (1 < T < 10). The features Z_t, ..., Z_{t-T} of the T frames of the video sequence are extracted by the 4 groups of convolution blocks, and Z_t, ..., Z_{t-T} are then input to the convolutional LSTM to learn information across preceding and following frames. To obtain the maximum dynamic information of the T consecutive frames, the feature H_{t-T} of the last time step is finally retained.
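To make the role of the convolutional LSTM concrete, a sketch of a standard ConvLSTM cell and of running it over the per-frame features Z_t, ..., Z_{t-T}, keeping the last hidden state; the gate layout is the usual ConvLSTM formulation, and the kernel size and zero-initialized states are assumptions.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell: the gates are computed by convolutions, so the spatial
    structure of the feature maps is preserved while learning temporal dynamics."""
    def __init__(self, in_ch, hidden_ch, kernel_size=3):
        super().__init__()
        self.hidden_ch = hidden_ch
        # One convolution produces all four gates at once.
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)   # updated cell state (a feature map)
        h = o * torch.tanh(c)           # new hidden state
        return h, c

def temporal_encode(cell, clip_features):
    """clip_features: (B, T, C, H, W) per-frame outputs of the convolution blocks,
    i.e. Z_t, ..., Z_{t-T}. Returns the hidden state after the last time step."""
    b, t, _, hgt, wid = clip_features.shape
    h = clip_features.new_zeros(b, cell.hidden_ch, hgt, wid)
    c = clip_features.new_zeros(b, cell.hidden_ch, hgt, wid)
    for step in range(t):
        h, c = cell(clip_features[:, step], (h, c))
    return h
```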
The Attention fusion module is used for fusing the space-time features. Specifically:
the fusion module fuses the time and space characteristics on the channel by applying an Attention mechanism. The method mainly calculates the weight of the channel through the correlation on the characteristic diagram channel, and then weights the weight to the image characteristic to update the characteristic, so that the channel which is more important to the detection result is more prominent. Splicing the output of the time code and the output of the space code to obtain a characteristic F, and deforming the characteristic F through a shape function to obtain F1Then transpose to obtain F2。F1And F2Multiplying the matrix and obtaining F through softmax3。F3And weighting to F as the channel weight to obtain the final fusion result.
The decoding module is configured to calculate a final saliency map, specifically:
the decoding structure consists of 3 sets of convolutional blocks, 3 upsampled layers, one layer of 1 x 1 convolutional layers, and one Sigmoid layer. Wherein the convolutional blocks are identical to those of the spatial coding structure. Each set of convolutional blocks is preceded by a x 2 upsampling. The last layer of Sigmoid function controls the output value to be in the range of [0,1 ]. The predicted driver saliency map is a grayscale map of the same size as the input image.
B2. The data set is divided into a training set, a validation set and a test set at a ratio of approximately 8 : 2 : 3. To shorten the training time, the input pictures are resized to 320 × 192 × 3 (height H × width W × number of channels C).
B3. First, the parameters of the network model are randomly initialized (see fig. 2 for the network model; the network features are all denoted as H × W × C). A training-set picture F_t (320 × 192) is input to the spatial encoder, while the current frame and its previous 5 frames, i.e., F_t, ..., F_{t-5}, form a continuous time sequence that is input to the temporal encoder. The BCE (binary cross-entropy) function is used to compute the loss between the predicted saliency map and the corresponding label (the standard fixation point saliency map). An Adam optimizer with a learning rate of 10^-3, a momentum value of 0.9 and a decay rate of 10^-4 is used to update the parameters, and the model parameters are saved after each epoch.
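A hedged sketch of this training step in PyTorch, interpreting the stated momentum of 0.9 as Adam's beta1 and the decay rate of 10^-4 as weight decay; the data-loader format and the model's two-input signature are assumptions for illustration.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs, device='cuda'):
    """Training sketch: each batch yields the current frame, its clip of the current
    frame plus the previous 5 frames, and the standard fixation point saliency map."""
    model.to(device)
    criterion = nn.BCELoss()  # predictions are already in [0, 1] via the final Sigmoid
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                                 betas=(0.9, 0.999), weight_decay=1e-4)
    for epoch in range(epochs):
        model.train()
        for frame, clip, label in loader:      # frame: (B,3,192,320), clip: (B,6,3,192,320)
            frame, clip, label = frame.to(device), clip.to(device), label.to(device)
            pred = model(frame, clip)          # assumed signature: spatial + temporal inputs
            loss = criterion(pred, label)      # BCE against the standard saliency map
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        torch.save(model.state_dict(), f'model_epoch_{epoch:03d}.pth')  # save every epoch
```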
B4. After each epoch of training, the model is verified on the validation set. The training step is repeated iteratively until the fluctuation of the loss value is essentially stable, i.e., the parameters in the network are essentially stable, yielding the optimal model parameters.
The network model of the present invention is verified with specific data as follows:
step 1: and B4, importing the optimal model parameters in the step B4 into the model, and randomly inputting test set data to obtain a prediction result.
Step 2: to verify the performance of the model, the results are analyzed qualitatively and quantitatively. In the qualitative analysis, for a more intuitive comparison and evaluation, the predicted grayscale map is colorized and then superimposed on the original image for comparison with the standard eye movement saliency map. The qualitative effect is shown in fig. 3: fig. 3(a) shows the input original image, and the distribution of the model prediction map shown in fig. 3(c) is similar to that of the standard eye movement saliency map shown in fig. 3(b), indicating good prediction performance. In the quantitative analysis, the main evaluation indexes include: AUC_Borji, AUC_Judd, NSS (normalized scanpath saliency), CC (linear correlation coefficient), KLD (Kullback-Leibler divergence), EMD (earth mover's distance) and SIM (similarity). The quantitative results are shown in Table 1. These indexes evaluate the model effect, i.e., how accurately the salient region is predicted: lower KLD and EMD values indicate a better model, while higher AUC_Borji, AUC_Judd, NSS, CC and SIM values indicate a better model.
Table 1: evaluation index results of the method of the invention on night traffic video images
(Table 1 is provided as an image in the original publication; its numerical values are not reproduced here.)
Those skilled in the art should note that, in Table 1, ↑ indicates that a higher value is better and ↓ indicates that a lower value is better.
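For reference, minimal NumPy sketches of three of the listed indexes (CC, KLD and NSS), following their common definitions in the saliency literature; these are illustrative and not necessarily the exact implementations used to produce Table 1.

```python
import numpy as np

def cc(pred, gt):
    """Linear correlation coefficient between predicted and ground-truth saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def kld(pred, gt, eps=1e-8):
    """Kullback-Leibler divergence of the ground-truth distribution from the prediction."""
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float((g * np.log(eps + g / (p + eps))).sum())

def nss(pred, fixation_binary):
    """Normalized scanpath saliency: mean of the standardized prediction at fixated pixels."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    mask = fixation_binary > 0
    return float(p[mask].mean()) if mask.any() else 0.0
```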
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (6)

1. A night traffic video saliency detection method based on space-time dual coding is characterized by comprising the following steps:
S1, acquiring a standard fixation point saliency map;
S2, establishing a network model, wherein the network model is used for performing saliency detection on the input standard fixation point saliency map;
the network model comprises a space-time coding structure, an Attention fusion module and a decoding module, wherein the space-time coding structure is used for extracting the spatial features and the temporal features of the input standard fixation point saliency map, the Attention fusion module is used for fusing the extracted spatial and temporal features, and the decoding module computes the saliency map from the fusion result;
and S3, training the network model, and detecting image saliency by adopting the trained network model.
The space-time coding structure comprises a spatial coding structure and a temporal coding structure; the spatial coding structure is used for extracting the spatial features of the input standard fixation point saliency map, and the temporal coding structure is used for extracting the temporal features of the input standard fixation point saliency map.
2. The method for detecting the saliency of night traffic video based on space-time dual coding according to claim 1, wherein in the training process of the network model: the current frame is used for extracting the spatial features, and the current frame and the previous 5 frames form a continuous sequence for extracting the temporal features.
3. The method according to claim 2, wherein the spatial coding structure comprises: 4 groups of convolution blocks and a pyramid dilated convolution block;
each group of convolution blocks specifically comprises 2 convolution operation layers, wherein each convolution operation layer comprises a 3 × 3 convolution, a batch normalization unit and a rectified linear unit; each group of convolution blocks comprises a 2 × 2 max pooling layer with a stride of 2;
the pyramid dilated convolution block acquires spatial features through a parallel architecture of dilated convolutions with different dilation rates.
4. The method according to claim 3, wherein the temporal coding structure comprises: 4 groups of convolution blocks and a convolutional long short-term memory network;
each group of convolution blocks specifically comprises 2 convolution operation layers, wherein each convolution operation layer comprises a 3 × 3 convolution, a batch normalization unit and a rectified linear unit; each group of convolution blocks comprises a 2 × 2 max pooling layer with a stride of 2.
5. The method as claimed in claim 4, wherein the temporal coding structure extracts the features of a continuous sequence through the 4 groups of convolution blocks, and the extracted features are then input into the convolutional long short-term memory network to learn information across preceding and following frames.
6. The method as claimed in claim 5, wherein the decoding module sequentially comprises: 3 upsampling layers, 3 groups of convolution blocks, one 1 × 1 convolution layer and a Sigmoid layer, wherein a ×2 upsampling layer is arranged in front of each group of convolution blocks;
each group of convolution blocks specifically comprises 2 convolution operation layers, wherein each convolution operation layer comprises a 3 × 3 convolution, a batch normalization unit and a rectified linear unit; each group of convolution blocks comprises a 2 × 2 max pooling layer with a stride of 2.
CN202110183195.8A 2021-02-08 2021-02-08 Night traffic video significance detection method based on space-time double coding Active CN112861733B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110183195.8A CN112861733B (en) 2021-02-08 2021-02-08 Night traffic video significance detection method based on space-time double coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110183195.8A CN112861733B (en) 2021-02-08 2021-02-08 Night traffic video significance detection method based on space-time double coding

Publications (2)

Publication Number Publication Date
CN112861733A true CN112861733A (en) 2021-05-28
CN112861733B CN112861733B (en) 2022-09-02

Family

ID=75988373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110183195.8A Active CN112861733B (en) 2021-02-08 2021-02-08 Night traffic video significance detection method based on space-time double coding

Country Status (1)

Country Link
CN (1) CN112861733B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101305735B1 (en) * 2012-06-15 2013-09-06 성균관대학교산학협력단 Method and apparatus for providing of tactile effect
US20160210528A1 (en) * 2014-02-24 2016-07-21 Beijing University Of Technology Method for detecting visual saliencies of video image based on spatial and temporal features
US20180285683A1 (en) * 2017-03-30 2018-10-04 Beihang University Methods and apparatus for image salient object detection
CN109376611A (en) * 2018-09-27 2019-02-22 方玉明 A kind of saliency detection method based on 3D convolutional neural networks
CN110705566A (en) * 2019-09-11 2020-01-17 浙江科技学院 Multi-mode fusion significance detection method based on spatial pyramid pool
CN110909594A (en) * 2019-10-12 2020-03-24 杭州电子科技大学 Video significance detection method based on depth fusion
CN112308005A (en) * 2019-11-15 2021-02-02 电子科技大学 Traffic video significance prediction method based on GAN
CN111461043A (en) * 2020-04-07 2020-07-28 河北工业大学 Video significance detection method based on deep network
CN111563418A (en) * 2020-04-14 2020-08-21 浙江科技学院 Asymmetric multi-mode fusion significance detection method based on attention mechanism
CN112040222A (en) * 2020-08-07 2020-12-04 深圳大学 Visual saliency prediction method and equipment
CN112016476A (en) * 2020-08-31 2020-12-01 山东大学 Method and system for predicting visual saliency of complex traffic guided by target detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KYUNG HWA CHAE et al.: "Visual tracking of objects for unmanned surface vehicle navigation", 2016 16th International Conference on Control, Automation and Systems (ICCAS) *
YANG TIAN: "Research on eye movement distribution patterns under global scene perception and a gaze region prediction model", China Master's Theses Full-text Database *

Also Published As

Publication number Publication date
CN112861733B (en) 2022-09-02

Similar Documents

Publication Publication Date Title
CN113688723B (en) Infrared image pedestrian target detection method based on improved YOLOv5
CN109753913B (en) Multi-mode video semantic segmentation method with high calculation efficiency
CN111639524B (en) Automatic driving image semantic segmentation optimization method
CN110363770B (en) Training method and device for edge-guided infrared semantic segmentation model
CN113642390B (en) Street view image semantic segmentation method based on local attention network
Zhang et al. Lightweight and efficient asymmetric network design for real-time semantic segmentation
CN113762209A (en) Multi-scale parallel feature fusion road sign detection method based on YOLO
Fang et al. Traffic accident detection via self-supervised consistency learning in driving scenarios
CN111191608A (en) Improved traffic sign detection and identification method based on YOLOv3
CN112308005A (en) Traffic video significance prediction method based on GAN
CN116343144B (en) Real-time target detection method integrating visual perception and self-adaptive defogging
Cheng et al. A highway traffic image enhancement algorithm based on improved GAN in complex weather conditions
CN112861733B (en) Night traffic video significance detection method based on space-time double coding
Cho et al. Modified perceptual cycle generative adversarial network-based image enhancement for improving accuracy of low light image segmentation
CN116740362A (en) Attention-based lightweight asymmetric scene semantic segmentation method and system
CN116993987A (en) Image semantic segmentation method and system based on lightweight neural network model
CN116704194A (en) Street view image segmentation algorithm based on BiSeNet network and attention mechanism
CN111612803A (en) Vehicle image semantic segmentation method based on image definition
CN113343903B (en) License plate recognition method and system in natural scene
CN114283288B (en) Method, system, equipment and storage medium for enhancing night vehicle image
CN113673527B (en) License plate recognition method and system
Yuan et al. RM-IQA: A new no-reference image quality assessment framework based on range mapping method
Liu et al. Deep memory and prediction neural network for video prediction
CN112487986A (en) Driving assistance recognition method based on high-precision map
CN112101382A (en) Space-time combined model and video significance prediction method based on space-time combined model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant