CN111738267A - Visual perception method and visual perception device based on linear multi-step residual learning - Google Patents

Visual perception method and visual perception device based on linear multi-step residual learning

Info

Publication number
CN111738267A
CN111738267A (application CN202010473221.6A)
Authority
CN
China
Prior art keywords
visual perception
information
linear multi-step
model
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010473221.6A
Other languages
Chinese (zh)
Other versions
CN111738267B (en)
Inventor
张寒波
邵文泽
李海波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202010473221.6A priority Critical patent/CN111738267B/en
Publication of CN111738267A publication Critical patent/CN111738267A/en
Application granted granted Critical
Publication of CN111738267B publication Critical patent/CN111738267B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a visual perception method and a visual perception device based on linear multi-step residual learning. The method comprises the following steps: acquiring a real-time image; performing visual perception on the image with a pre-established visual perception model to obtain semantic information or distance parameters in the image. The visual perception model is established by performing depth convolution on the acquired image and then applying linear multi-step residual learning; it takes data in a training set as training samples, takes the original shared features and task features of an image as input, and outputs a semantic segmentation map or a depth map. The method and the device greatly reduce the parameter count of the model and improve the running efficiency of the perception model while maintaining calculation accuracy.

Description

Visual perception method and visual perception device based on linear multi-step residual learning
Technical Field
The invention relates to a visual perception method and a visual perception device based on linear multi-step residual learning, and belongs to the field of computer vision and image processing.
Background
During automatic driving, an unmanned vehicle not only needs to judge what the surrounding objects are based on its visual perception system, but also needs to quickly judge the distance between those objects and the vehicle so that correct decisions can be made. The quality of the visual perception system directly affects the safety and reliability of the unmanned vehicle, and the maturity of visual perception technology directly determines the viability and subsequent development of automatic driving. The sensors mainly adopted in current automatic driving perception systems include cameras, lidar, GPS and other technologies. In industry there are currently two main ways to design a visual perception system. The first realizes recognition and ranging of surrounding objects through active scanning by lidar; the other is a purely visual perception scheme, in which a camera collects information about the environment around the unmanned vehicle and a vision algorithm analyzes the semantic information and distance parameters of surrounding objects.
Compared with a lidar perception scheme, the purely visual perception scheme is cheap and easy to deploy, making it an attractive solution. However, designing an efficient visual perception algorithm remains a difficult problem. Although the development of computer vision and deep learning has produced many highly accurate perception algorithms in academia, in industrial deployment scenarios the limited computing capability of unmanned-vehicle computing platforms means that both the accuracy of the perception algorithm and its efficiency and parameter count must be considered. Therefore, how to maximize algorithm accuracy on the unmanned-vehicle platform while ensuring high execution efficiency is the core problem of perception algorithm design.
In recent years, many efforts have been made on automatic driving visual perception algorithms. For example, MultiNet [Real-time Joint Semantic Reasoning for Autonomous Driving] proposes a unified architecture for real-time classification, detection and semantic segmentation. Cross-stitch Networks for Multi-task Learning studies the influence of network weight sharing at different levels on multi-task learning, and proposes a cross-stitch unit that automatically learns an optimal network sharing structure. UberNet [Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory] proposes a general-purpose network as a "Swiss knife" solution for vision tasks, which can jointly process low-, mid- and high-level vision tasks using diverse datasets and limited memory. Although these models have high accuracy, their parameter counts are too large, making computation complex and inefficient. In particular, when applied to an unmanned vehicle, the computational complexity and efficiency of the model become factors that prevent the vehicle from making correct decisions quickly.
Disclosure of Invention
The invention provides a visual perception method based on linear multi-step residual learning, which greatly reduces the parameter count of the model and improves the running efficiency of the perception model while maintaining calculation accuracy.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows: a visual perception method based on linear multi-step residual error learning comprises the following steps: acquiring a real-time image; performing visual perception on the image by adopting a pre-established visual perception model to acquire semantic information or distance parameters in the image; the visual perception model is established by performing depth convolution on the acquired image and based on linear multi-step residual learning; the visual perception model takes data in a training set as a training sample, inputs original shared features and task features of an image, and outputs a semantic segmentation map or a depth map.
Further, the training set is obtained by applying random rotation and horizontal flipping to the original Cityscapes training set.
Further, after the deep convolution is performed on the acquired image, the establishing of the visual perception model based on the linear multi-step residual learning comprises the following steps: carrying out shared feature extraction on the acquired real-time image to obtain global scene information; extracting task features of the global scene information to obtain semantic segmentation information or depth estimation information; and performing depth convolution on the global scene information and the semantic segmentation information or the depth estimation information, and then performing linear multi-step residual learning to obtain a visual perception model.
Further, the depth convolution is a depthwise separable convolution comprising a depthwise convolution and a pointwise (point-by-point) convolution.
Further, the visual perception model is calculated by formula (1):
x_{n+1} = k_n · (b_n(x_n) + x_n) + b_n(b_n(x_n) + x_n) + (1 − k_n) · x_n    (1)
where k_n is the learnable parameter of the n-th module in the visual perception model, b_n is the n-th processing unit, x_n is the original information input to the n-th processing unit, and x_{n+1} is the processed information.
A visual perception device based on linear multi-step residual learning comprises: a data set establishing module for establishing a training set and a validation set; a model establishing module for establishing a visual perception model; and a model training module for training the visual perception model with the training set.
Optionally, the model establishing module includes: a shared feature extraction module for extracting shared features to obtain global scene information, performing depth convolution on the global scene information, and then performing linear multi-step residual learning; a semantic segmentation module for extracting semantic segmentation information from the global scene information, performing depth convolution on it, and then performing linear multi-step residual learning; and a depth estimation module for extracting depth estimation information from the global scene information, performing depth convolution on it, and then performing linear multi-step residual learning.
According to the invention, depth convolution is performed on the acquired image and the visual perception model is established based on linear multi-step residual learning, so that the parameter count of the model is greatly reduced while calculation accuracy is maintained, and the running efficiency of the perception model is improved.
Drawings
Fig. 1 is a block diagram of a visual perception method based on linear multi-step residual error learning according to an embodiment of the present invention;
fig. 2 is a block diagram of a visual perception apparatus based on linear multi-step residual learning according to an embodiment of the present invention.
Detailed Description
For a better understanding of the nature of the invention, its description is further set forth below in connection with the specific embodiments and the drawings.
The invention discloses a visual perception method based on linear multi-step residual learning, which is particularly suitable for visual perception in automatic driving. As shown in Fig. 1, the method specifically comprises the following steps:
Step one: establish a training set and a validation set.
The original Cityscapes training set is randomly rotated and horizontally flipped to form the training set; the original Cityscapes validation set is used as the validation set. Both sets contain the following image categories: RGB images, ground-truth semantic segmentation maps and depth maps. The training set and validation set are saved in the .npy data format as input to the visual perception model.
A total of 2975 images are selected for the training set and 500 images for the validation set. The semantic segmentation and depth estimation labels in the Cityscapes validation set are grouped into 7 categories, as follows:
[1] flat: corresponding to road, sidewalk
[2] construction: corresponding to building, wall, fence
[3] object: corresponding to pole, traffic light, traffic sign
[4] nature: corresponding to vegetation, terrain
[5] sky: corresponding to sky
[6] human: corresponding to person, rider
[7] vehicle: corresponding to car, truck, bus, caravan, trailer, train, motorcycle
Saving the training set and the validation set in the .npy format reduces the required data storage space.
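As an illustration only (not part of the patent text), the data preparation described above can be sketched in PyTorch as follows; the file paths, the rotation range and the flip probability are assumptions, since they are not specified here.

import random
import numpy as np
from PIL import Image
import torchvision.transforms.functional as TF

def augment_sample(rgb, seg, depth, max_deg=10):
    # Apply the same random rotation and horizontal flip to the RGB image,
    # its semantic segmentation map and its depth map.
    angle = random.uniform(-max_deg, max_deg)
    rgb, seg, depth = (TF.rotate(x, angle) for x in (rgb, seg, depth))
    if random.random() < 0.5:
        rgb, seg, depth = (TF.hflip(x) for x in (rgb, seg, depth))
    return rgb, seg, depth

# Augment one (hypothetical) Cityscapes training sample and save it as .npy.
rgb = Image.open("cityscapes/train/rgb_0001.png")
seg = Image.open("cityscapes/train/seg_0001.png")
depth = Image.open("cityscapes/train/depth_0001.png")
rgb, seg, depth = augment_sample(rgb, seg, depth)
np.save("train/rgb_0001.npy", np.asarray(rgb))
np.save("train/seg_0001.npy", np.asarray(seg))
np.save("train/depth_0001.npy", np.asarray(depth))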
Step two: perform depth convolution on the acquired image and establish the visual perception model based on linear multi-step residual learning.
S1. An image of height H and width W (denoted H × W) is acquired in real time, and global scene information of the image is extracted at different scales.
S2. Task features are extracted from the global scene information to obtain semantic segmentation information or depth estimation information.
S3. Depthwise separable convolution is applied to the global scene information and to the semantic segmentation information or the depth estimation information. The depthwise separable convolution consists of two parts: a depthwise convolution and a pointwise convolution.
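A minimal PyTorch sketch of the depthwise separable convolution used in step S3 is given below; the channel counts and kernel size are illustrative assumptions.

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        # Depthwise convolution: one filter per input channel (groups=in_ch),
        # so spatial regions are processed per channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # Pointwise (1x1) convolution: mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 128, 256)           # N x C x H x W feature map
y = DepthwiseSeparableConv(64, 128)(x)     # -> shape (1, 128, 128, 256)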
S4. Linear multi-step residual learning is performed on the global scene information and the semantic segmentation information or the depth estimation information to obtain the visual perception model:
x_{n+1} = k_n · (b_n(x_n) + x_n) + b_n(b_n(x_n) + x_n) + (1 − k_n) · x_n    (1)
where k_n is the learnable parameter of the n-th module in the visual perception model, b_n is the n-th processing unit, x_n is the original information input to the n-th processing unit, and x_{n+1} is the processed information.
By using depthwise separable convolution, the spatial regions and channels of the image are treated separately, which reduces the number of model parameters. Meanwhile, the linear multi-step scheme can mine deep-level associated information and extract task-specific features, so the accuracy and efficiency of the model are both improved.
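A minimal sketch of the linear multi-step residual update of formula (1) is given below. The internal structure chosen for the processing unit b_n (a depthwise separable convolution with batch normalization and ReLU) and the initialization of k_n are assumptions for illustration; only the update formula itself is taken from the description above.

import torch
import torch.nn as nn

class LinearMultiStepBlock(nn.Module):
    # Implements x_{n+1} = k_n*(b_n(x_n)+x_n) + b_n(b_n(x_n)+x_n) + (1-k_n)*x_n
    def __init__(self, channels):
        super().__init__()
        self.b = nn.Sequential(                       # processing unit b_n
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.k = nn.Parameter(torch.zeros(1))         # learnable parameter k_n

    def forward(self, x):
        y = self.b(x) + x                             # b_n(x_n) + x_n
        return self.k * y + self.b(y) + (1 - self.k) * x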
Step three: train the visual perception model.
The learnable parameters k_n of the visual perception model are configured using the PyTorch deep learning framework, and the training parameters of the visual perception model are configured at the same time: the optimization function is set to the Adam algorithm, the base learning rate is set to 5e-3, batch_size is set to 2, and the total number of iterations is set to 200. The data in the training set are then imported into the visual perception model for iterative training.
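The training configuration above can be reproduced with a few lines of PyTorch. The tiny stand-in network, heads and random tensors below are illustrative assumptions that merely make the sketch self-contained; only the optimizer, learning rate, batch size and iteration count come from the text.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for the real multi-task model and the npy-format Cityscapes data.
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, 8, 3, padding=1), nn.ReLU())
seg_head = nn.Conv2d(8, 7, 1)      # 7 semantic categories (see step one)
depth_head = nn.Conv2d(8, 1, 1)

params = list(backbone.parameters()) + list(seg_head.parameters()) + list(depth_head.parameters())
optimizer = torch.optim.Adam(params, lr=5e-3)          # Adam, base learning rate 5e-3

rgb = torch.randn(8, 3, 64, 128)                       # random stand-in images
seg = torch.randint(0, 7, (8, 64, 128))
depth = torch.rand(8, 1, 64, 128)
loader = DataLoader(TensorDataset(rgb, seg, depth), batch_size=2, shuffle=True)

seg_loss, depth_loss = nn.CrossEntropyLoss(), nn.L1Loss()
for it in range(200):                                  # total number of iterations: 200
    for x, seg_gt, depth_gt in loader:
        feats = backbone(x)
        loss = seg_loss(seg_head(feats), seg_gt) + depth_loss(depth_head(feats), depth_gt)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()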
Step four: verify the visual perception model.
The visual perception model is verified on the validation set using the learnable parameters k_n and the base learning rate saved in step three.
The semantic segmentation map and depth map output by the proposed model are compared with those output by other models using the semantic segmentation evaluation indicators Pixel Accuracy (PA) and mean Intersection over Union (mIoU), and the depth estimation indicators Absolute Error and Relative Error.
1. Pixel accuracy (PA):
PA = Σ_{i=0}^{k} P_ii / Σ_{i=0}^{k} Σ_{j=0}^{k} P_ij
where k is the number of target classes, P_ii is the total number of pixels that belong to class i and are predicted as class i, and P_ij is the total number of pixels that belong to class i and are predicted as class j.
2. Mean intersection over union (mIoU):
mIoU = (1 / (k + 1)) · Σ_{i=0}^{k} P_ii / (Σ_{j=0}^{k} P_ij + Σ_{j=0}^{k} P_ji − P_ii)
where k + 1 is the k target classes plus one background class, and P_ji is the total number of pixels that belong to class j and are predicted as class i.
3. Absolute error (Abs Err):
Abs Err = (1 / (m · n)) · Σ_{i=1}^{m} Σ_{j=1}^{n} |Y(i, j) − Ŷ(i, j)|
where Y(i, j) is the true depth value, Ŷ(i, j) is the predicted depth value, m is the height of the image and n is the width of the image.
4. Relative error (Rel Err):
Rel Err = (1 / (m · n)) · Σ_{i=1}^{m} Σ_{j=1}^{n} |Y(i, j) − Ŷ(i, j)| / Y(i, j)
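A minimal NumPy sketch of these four indicators is given below; the brute-force confusion-matrix construction and the handling of classes absent from an image (ignored via nanmean) are simplifying assumptions.

import numpy as np

def seg_metrics(pred, gt, num_classes):
    # Pixel accuracy (PA) and mean intersection over union (mIoU).
    conf = np.zeros((num_classes, num_classes), dtype=np.float64)
    for i in range(num_classes):
        for j in range(num_classes):
            conf[i, j] = np.sum((gt == i) & (pred == j))   # P_ij
    pa = np.diag(conf).sum() / conf.sum()
    iou = np.diag(conf) / (conf.sum(1) + conf.sum(0) - np.diag(conf))
    return pa, np.nanmean(iou)

def depth_errors(pred, gt):
    # Absolute and relative depth errors averaged over all m x n pixels.
    abs_err = np.mean(np.abs(gt - pred))
    rel_err = np.mean(np.abs(gt - pred) / gt)
    return abs_err, rel_err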
the invention is compared and verified with MTAN model and Dense model. The MTAN model is mentioned in Shikun Liu, Edward Jons et al. The evaluation parameter calculation is respectively carried out on the three models by utilizing 7 types of semantic segmentation and depth estimation results in the CityScaps verification set, and the calculation results are shown in Table 1
TABLE 1
(Table 1 is provided as an image in the original publication and is not reproduced here.)
As can be seen from Table 1, the invention greatly reduces the model parameter count without degrading the mean intersection over union, pixel accuracy, relative error or absolute error, thereby improving the efficiency of the visual perception model and making the model lightweight.
The invention also provides a visual perception device based on linear multi-step residual learning, which comprises a data set establishing module for establishing a training set and a validation set, a model establishing module for establishing a visual perception model, and a model training module for training the visual perception model with the training set.
The model establishing module comprises: a shared feature extraction module for extracting shared features to obtain global scene information and performing linear multi-step residual learning after depth convolution of the global scene information; a semantic segmentation module for extracting semantic segmentation information from the global scene information, performing depth convolution on it, and then performing linear multi-step residual learning; and a depth estimation module for extracting depth estimation information from the global scene information, performing depth convolution on it, and then performing linear multi-step residual learning.
1. Shared feature module
The shared feature module is used for extracting global scene information and transmitting it to the depth estimation module or the semantic segmentation module.
The shared feature module mainly adopts an encoder-decoder architecture. The encoding layer has five sequentially connected modules: a first, second, third, fourth and fifth feature encoding block. Each encoding block is followed by a max pooling layer that down-samples the image by a factor of 2. The decoding layer has five modules for decoding information, connected in sequence: a first, second, third, fourth and fifth feature decoding block. A max unpooling layer is placed before each decoding block to up-sample the image by a factor of 2.
The first feature encoding block of the shared feature module uses two basic convolution blocks, each consisting mainly of a standard 3×3 convolution, batch normalization and the ReLU activation function. The other feature encoding blocks perform linear multi-step residual learning on the global scene information.
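One encoder stage of the shared feature module can be sketched as below: two basic convolution blocks (standard 3×3 convolution, batch normalization, ReLU) followed by 2× max pooling whose indices can be reused by the max unpooling layers of the decoder. The channel counts are illustrative assumptions.

import torch.nn as nn

def basic_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

first_encoding_block = nn.Sequential(basic_block(3, 64), basic_block(64, 64))
pool = nn.MaxPool2d(kernel_size=2, return_indices=True)   # indices feed nn.MaxUnpool2d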
2. Depth estimation module
The depth estimation module is used for extracting depth estimation information from the global scene information, performing depth convolution on it and then performing linear multi-step residual learning.
The depth estimation module also adopts an encoder-decoder architecture. The encoding layer is mainly used to encode the global scene information from the shared feature module and to extract depth estimation information from it. The encoding layer comprises five modules: a first, second, third, fourth and fifth task feature encoding block. Each encoding block is followed by a max pooling layer that down-samples the image by a factor of 2. The input of the first task feature encoding block comes from the original image; the input of the other four blocks comes from two parts, the down-sampled output of the previous task block and the down-sampled output of the corresponding feature encoding block of the shared feature module. The decoding layer also contains five modules and is mainly used to decode the shared features as well as the features oriented toward estimating image depth. The decoding layer comprises a first, second, third, fourth and fifth task feature decoding block. A max unpooling layer is placed immediately before each decoding block to up-sample the image by a factor of 2. The input of each task feature decoding block comes from two parts, the up-sampled output of the previous task feature decoding block and the up-sampled output of the corresponding decoding block of the shared module.
The first task feature encoding block of the depth estimation module performs depth convolution on the depth estimation information using two standard 3×3 convolutions. The other task feature encoding blocks perform linear multi-step residual learning on the depth estimation information.
The input of the depth estimation module mainly comes from two parts: the information in the corresponding shared feature module and the information from the previous layer of the module. The two parts are concatenated along their channels (the cat operation), a lightweight convolution is applied after the concatenation, and the geometric information of object surfaces that benefits depth estimation is extracted. The final output of the depth estimation module is fed into a task prediction module that estimates the depth map using two layers of standard 3×3 convolution, allowing accurate predictions to be made for each pixel in the image.
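The fusion and prediction described above can be sketched as follows: the shared feature and the previous-layer task feature are concatenated along the channel dimension (torch.cat), reduced by a lightweight 1×1 convolution, and the task prediction module applies two standard 3×3 convolutions to produce a per-pixel depth map. All channel sizes are assumptions.

import torch
import torch.nn as nn

shared_feat = torch.randn(1, 64, 64, 128)   # from the corresponding shared-module block
task_feat = torch.randn(1, 64, 64, 128)     # from the previous layer of the depth branch

fuse = nn.Conv2d(128, 64, kernel_size=1)    # lightweight convolution after concatenation
predict_depth = nn.Sequential(              # task prediction module
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 1, 3, padding=1),
)

fused = fuse(torch.cat([shared_feat, task_feat], dim=1))
depth_map = predict_depth(fused)            # shape (1, 1, 64, 128): per-pixel depth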
3. Semantic segmentation module
The semantic segmentation module is used for extracting semantic segmentation information from the global scene information, performing depth convolution on it and then performing linear multi-step residual learning.
The semantic segmentation module adopts the same encoder-decoder architecture as the depth estimation module. The first task feature encoding block of the semantic segmentation module performs depth convolution on the semantic segmentation information using two standard 3×3 convolutions. The other task feature encoding blocks perform linear multi-step residual learning on the semantic segmentation information.
The input of the semantic segmentation module mainly comes from two parts: the information in the corresponding shared feature module and the information from the previous layer of the module. The two parts are concatenated along their channels (the cat operation), a lightweight convolution is applied after the concatenation, and the object semantic information that benefits semantic segmentation is extracted. The final output of the semantic segmentation module is fed into a task prediction module that estimates the segmentation map using two layers of standard 3×3 convolution, allowing accurate predictions to be made for each pixel in the image.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The present invention is not limited to the above embodiments; any modifications, equivalent replacements or improvements made within the spirit and principles of the present invention fall within the scope of the claims of the present application.

Claims (7)

1. A visual perception method based on linear multi-step residual learning is characterized by comprising the following steps:
acquiring a real-time image;
performing visual perception on the image by adopting a pre-established visual perception model to acquire semantic information or distance parameters in the image;
the visual perception model is established by performing depth convolution on the acquired image and based on linear multi-step residual learning;
the visual perception model takes data in a training set as a training sample, inputs original shared features and task features of an image, and outputs a semantic segmentation map or a depth map.
2. The visual perception method based on linear multi-step residual learning according to claim 1, characterized in that: the training set is a data set obtained from the original Cityscapes training set after random rotation and horizontal flipping.
3. The visual perception method based on linear multi-step residual learning according to claim 1, characterized in that: performing depth convolution on the acquired image and establishing the visual perception model based on linear multi-step residual learning comprise the following steps:
carrying out shared feature extraction on the acquired real-time image to obtain global scene information;
extracting task features of the global scene information to obtain semantic segmentation information or depth estimation information;
and performing depth convolution on the global scene information and the semantic segmentation information or the depth estimation information, and then performing linear multi-step residual learning to obtain a visual perception model.
4. The visual perception method based on linear multi-step residual learning according to claim 3, characterized in that: the depth convolution is a depthwise separable convolution comprising a depthwise convolution and a pointwise (point-by-point) convolution.
5. The visual perception method based on linear multi-step residual learning according to claim 1, characterized in that: the visual perception model is calculated by formula (1):
x_{n+1} = k_n · (b_n(x_n) + x_n) + b_n(b_n(x_n) + x_n) + (1 − k_n) · x_n    (1)
where k_n is the learnable parameter of the n-th module in the visual perception model, b_n is the n-th processing unit, x_n is the original information input to the n-th processing unit, and x_{n+1} is the processed information.
6. A visual perception device based on linear multi-step residual learning, characterized by comprising:
The data set establishing module is used for establishing a training set and a validation set;
the model establishing module is used for establishing a visual perception model;
and the model training module is used for training the visual perception model by a training set.
7. The visual perception device based on linear multi-step residual learning according to claim 6, wherein: the model building module comprises:
the shared feature extraction module is used for extracting shared features to obtain global scene information, performing deep convolution on the global scene information and then performing linear multi-step residual error learning;
the semantic segmentation module is used for extracting semantic segmentation information from global scene information, performing deep convolution on the semantic segmentation information and then performing linear multi-step residual error learning;
and the depth estimation module is used for extracting depth estimation information from the global scene information, performing depth convolution on the depth estimation information and then performing linear multi-step residual error learning.
CN202010473221.6A 2020-05-29 2020-05-29 Visual perception method and visual perception device based on linear multi-step residual learning Active CN111738267B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010473221.6A CN111738267B (en) 2020-05-29 2020-05-29 Visual perception method and visual perception device based on linear multi-step residual learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010473221.6A CN111738267B (en) 2020-05-29 2020-05-29 Visual perception method and visual perception device based on linear multi-step residual learning

Publications (2)

Publication Number Publication Date
CN111738267A true CN111738267A (en) 2020-10-02
CN111738267B CN111738267B (en) 2023-04-18

Family

ID=72647974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010473221.6A Active CN111738267B (en) 2020-05-29 2020-05-29 Visual perception method and visual perception device based on linear multi-step residual learning

Country Status (1)

Country Link
CN (1) CN111738267B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075581A1 (en) * 2016-09-15 2018-03-15 Twitter, Inc. Super resolution using a generative adversarial network
CN108280814A (en) * 2018-02-08 2018-07-13 重庆邮电大学 Light field image angle super-resolution rate method for reconstructing based on perception loss
CN108764112A (en) * 2018-05-23 2018-11-06 上海理工大学 A kind of Remote Sensing Target object detecting method and equipment
CN108876737A (en) * 2018-06-06 2018-11-23 武汉大学 A kind of image de-noising method of joint residual error study and structural similarity
US20190147335A1 (en) * 2017-11-15 2019-05-16 Uber Technologies, Inc. Continuous Convolution and Fusion in Neural Networks


Also Published As

Publication number Publication date
CN111738267B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN113780296B (en) Remote sensing image semantic segmentation method and system based on multi-scale information fusion
CN111860227B (en) Method, apparatus and computer storage medium for training trajectory planning model
CN110879994A (en) Three-dimensional visual inspection detection method, system and device based on shape attention mechanism
SE541962C2 (en) Method and apparatus for detecting vehicle contour based on point cloud data
CN111209780A (en) Lane line attribute detection method and device, electronic device and readable storage medium
CN112183482A (en) Dangerous driving behavior recognition method, device and system and readable storage medium
CN108288047A (en) A kind of pedestrian/vehicle checking method
CN111860072A (en) Parking control method and device, computer equipment and computer readable storage medium
CN115631344B (en) Target detection method based on feature self-adaptive aggregation
CN111091023A (en) Vehicle detection method and device and electronic equipment
CN112613434A (en) Road target detection method, device and storage medium
US20230326055A1 (en) System and method for self-supervised monocular ground-plane extraction
CN114037640A (en) Image generation method and device
CN115147598A (en) Target detection segmentation method and device, intelligent terminal and storage medium
CN115249321A (en) Method for training neural network, system for training neural network and neural network
Ouyang et al. PV-EncoNet: Fast object detection based on colored point cloud
CN115861601A (en) Multi-sensor fusion sensing method and device
CN114048536A (en) Road structure prediction and target detection method based on multitask neural network
CN114067142A (en) Method for realizing scene structure prediction, target detection and lane level positioning
CN112654998B (en) Lane line detection method and device
CN111738267B (en) Visual perception method and visual perception device based on linear multi-step residual learning
CN117115690A (en) Unmanned aerial vehicle traffic target detection method and system based on deep learning and shallow feature enhancement
CN109657556B (en) Method and system for classifying road and surrounding ground objects thereof
CN116824317A (en) Water infrared target detection method based on multi-scale feature self-adaptive fusion
CN111144361A (en) Road lane detection method based on binaryzation CGAN network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant