CN113591770A - Multimode fusion obstacle detection method and device based on artificial intelligence blind guiding - Google Patents

Info

Publication number
CN113591770A
CN113591770A (application CN202110913691.4A)
Authority
CN
China
Prior art keywords
feature map
neural network
images
convolutional neural
infrared
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110913691.4A
Other languages
Chinese (zh)
Other versions
CN113591770B (en)
Inventor
秦文健
张旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202110913691.4A priority Critical patent/CN113591770B/en
Publication of CN113591770A publication Critical patent/CN113591770A/en
Priority to PCT/CN2021/138104 priority patent/WO2023015799A1/en
Application granted granted Critical
Publication of CN113591770B publication Critical patent/CN113591770B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multimodal fusion obstacle detection method based on artificial-intelligence blind guiding, comprising the following steps: an infrared camera and a color camera respectively acquire an infrared image and a color image of a scene; the acquired infrared and color bimodal images are transmitted to a convolutional neural network Q1 and a convolutional neural network Q2 respectively, which convert the images into a first multichannel feature map and a second multichannel feature map that are subsequently flattened into vectors; the first and second multichannel feature maps are vectorized, and the resulting feature-vector sequence is encoded to generate a plurality of prediction vectors; and classification and position prediction are performed on the generated prediction vectors. By introducing a Transformer structure (a Transformer block) into the obstacle-detection process, the invention realizes multimodal fusion more effectively: the features of the infrared and color images are fully fused, and the obstacle-detection accuracy in low-illumination scenes is improved.

Description

Multimode fusion obstacle detection method and device based on artificial intelligence blind guiding
Technical Field
The invention relates to the technical field of natural image processing, in particular to a multimode fusion obstacle detection method and device based on artificial intelligence blind guiding.
Background
According to statistics from the China Disabled Persons' Federation, there are at least five million blind people in China at present, and the number increases year by year as the population ages. Guiding the blind has long been a hot research problem, and intelligent blind guiding has always been the solution pursued by researchers. Since the breakout of artificial intelligence this century, that pursuit has increasingly become reality: the emergence of deep learning and convolutional neural networks allows computer vision to gradually replace traditional blind-guiding technologies that rely on ultrasound and the like for obstacle avoidance, and addresses the complexity of the obstacle-detection problem.
At present, most of the latest blind-guiding technologies that apply deep object detection upload the collected images to a server, process them with a network trained by supervised or unsupervised methods, and guide the blind by combining other sensing information. This approach makes full use of deep learning's strength in processing complex images and performs well in common blind-guiding situations: experiments show that such devices can accurately identify common objects in the daily scenes of the blind, such as garbage cans, chairs and people. Although this type of method works well, its detection results are unsatisfactory in dark scenes. Most vision-based blind-guiding technologies train their networks on color images taken under bright illumination, and a bright image of a dark scene is difficult to obtain. One solution is multimodal image fusion: an infrared image and an ordinary color image of the dark scene are acquired, and a more reliable detection result is obtained by separately extracting and then fusing the features of the two. In dark scenes the effectiveness of color-image features is greatly reduced and object contours are hard to identify, whereas an infrared image captures contour information much more easily. Fusing the features the neural network extracts from the two images can therefore greatly improve its object-detection performance. Most existing multimodal image fusion is based on CNNs, which cannot fully fuse multimodal features; a Transformer structure is therefore introduced so that the image features of different modalities can be fully fused and the detection accuracy improved.
At present, the obstacle-detection methods used by blind-guiding equipment fall into four categories: traditional non-vision methods, traditional machine-vision methods, machine-vision methods based on deep learning, and CNN-based multimodal methods.
(1) Traditional non-vision approaches mostly use only ultrasonic and infrared sensors; their judgment of an obstacle is limited to bearing and distance, and their precision is low.
(2) Traditional machine vision mainly uses hand-written algorithms to identify target features in an image; such methods transfer poorly to new settings and are not intelligent.
(3) Machine vision based on deep learning trains on a data set to learn image features, can recognize images of various scenes and perform object detection, and detects well; in a dark scene, however, the color image carries little object information and obstacles are difficult to detect effectively.
(4) CNN-based multimodal obstacle detection extracts and fuses infrared and color bimodal image features to detect obstacles better, but the features cannot be fully fused.
Disclosure of Invention
The invention aims to introduce a Transformer structure into the obstacle-detection process so as to realize multimodal fusion more effectively: a Transformer block is introduced, the features of the infrared and color images are fully fused, and the obstacle-detection accuracy in low-illumination situations is improved.
In a first aspect, the invention provides a multimode fusion obstacle detection method based on artificial intelligence blind guiding, which comprises the following steps:
the method comprises the steps that an infrared camera and a color camera are respectively responsible for acquiring an infrared image and a color image of a scene;
the acquired infrared and color bimodal images are transmitted to a convolutional neural network Q1 and a convolutional neural network Q2 respectively, and Q1 and Q2 convert the images into a first multichannel feature map and a second multichannel feature map, which are subsequently flattened into vectors;
vectorizing the first multichannel feature map and the second multichannel feature map, and performing feature-vector encoding on the resulting sequence to generate a plurality of prediction vectors;
and performing classification and position prediction on the generated multiple prediction vectors.
Preferably, transmitting the acquired infrared and color bimodal images to the convolutional neural networks Q1 and Q2, converting them into the first and second multichannel feature maps, and subsequently flattening them into vectors specifically comprises: scaling, padding and deforming color or infrared images of different specifications to a size of 227 × 227, inputting them into a VGG-16 backbone network with the fully connected layers cut off, and obtaining 512 feature maps of size 7 × 7 after convolution and pooling.
Preferably, vectorizing the first and second multichannel feature maps, performing feature-vector encoding on their sequence, and generating the plurality of prediction vectors comprises: first flattening the first and second multichannel feature maps of the infrared and color images to obtain 512 × 49 feature maps; then regarding these as 49 512-dimensional feature vectors, so that the patches can fully attend to one another at the pixel level; and finally splicing the vectors of the two modalities into 98 feature vectors of dimension 512.
Preferably, classifying and predicting the position of the generated prediction vectors specifically comprises performing a loss calculation on the plurality of prediction vectors with a set loss function and the labels.
Preferably, before the loss calculation is performed on the plurality of prediction vectors with the set loss function and the labels, a bipartite-graph matching method is used to find the best match between each prediction vector and a label; the category loss is then calculated with cross entropy and added to the position loss, calculated by regression, to give the global loss.
In a second aspect, the invention further provides a multimode fusion obstacle detection device based on artificial intelligence blind guiding, which comprises
The image acquisition module consists of an infrared camera and a color camera and is used for respectively acquiring an infrared image and a color image of a scene;
the feature extraction module is used for transmitting the acquired infrared and color bimodal images to a convolutional neural network Q1 and a convolutional neural network Q2 respectively, wherein Q1 and Q2 convert the images into a first multichannel feature map and a second multichannel feature map, which are subsequently flattened into vectors;
the feature fusion module is used for vectorizing and representing the first multi-channel feature map and the second multi-channel feature map, and performing feature vector coding on the first multi-channel feature map and the second multi-channel feature map sequence to generate a plurality of prediction vectors;
a classification module to classify and position predict the generated plurality of prediction vectors.
Preferably, the feature fusion module comprises an encoder and a decoder.
Preferably, the encoder comprises an embedded-token layer, a regularization layer, a multi-head self-attention layer and a feed-forward neural network layer; the decoder comprises a regularization layer, a multi-head self-attention layer and a feed-forward neural network layer.
The method of the invention has the following advantages:
in the invention, a Transformer structure is introduced into the obstacle-detection process to realize multimodal fusion more effectively; a Transformer block is introduced, the features of the infrared and color images are fully fused, and the obstacle-detection accuracy in low-illumination situations is improved.
Drawings
Fig. 1 is a flow chart of a multimode fusion obstacle detection method based on artificial intelligence blind guiding provided by the invention.
Fig. 2 is a schematic diagram of a sensor space structure of the multi-modal fusion obstacle detection method based on artificial intelligence blind guiding provided by the invention.
FIG. 3 is a schematic diagram of a feature fusion module provided in the present invention.
Detailed Description
The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention. In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without any inventive step, are within the scope of protection of the invention.
As shown in fig. 1, the invention provides a multi-modal fusion obstacle detection method based on artificial intelligence blind guiding,
the method comprises the following steps:
s1, acquiring an infrared image and a color image of a scene through the infrared camera and the color camera respectively;
s2, respectively transmitting the acquired infrared and color bimodal images to a convolutional neural network Q1 and a convolutional neural network Q2, respectively converting the images into a first multichannel feature map and a second multichannel feature map by the convolutional neural network Q1 and the convolutional neural network Q2, and flattening the images into vectors in preparation for later use;
s3, vectorizing and representing the first multichannel feature map and the second multichannel feature map, and performing feature vector coding on the first multichannel feature map and the second multichannel feature map sequence to generate a plurality of prediction vectors;
and S4, classifying and predicting the position of the generated prediction vectors.
Transmitting the acquired infrared and color bimodal images to the convolutional neural networks Q1 and Q2, converting them into the first and second multichannel feature maps, and subsequently flattening them into vectors specifically comprises: scaling, padding and deforming color or infrared images of different specifications to a size of 227 × 227, inputting them into a VGG-16 backbone network with the fully connected layers cut off, and obtaining 512 feature maps of size 7 × 7 after convolution and pooling.
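The backbone step above can be sketched as follows. This is an illustrative PyTorch stand-in, not the patented network: the layer widths and depths are chosen only so that a 227 × 227 input yields the described 512 feature maps of size 7 × 7 once the fully connected layers are removed.

```python
# Illustrative sketch of the feature-extraction step: a VGG-style convolutional
# stack (fully connected layers cut off) mapping a 3x227x227 image to a
# 512-channel 7x7 feature map. Layer widths are assumptions, not the patent's.
import torch
import torch.nn as nn

class TruncatedBackbone(nn.Module):
    """Maps one 3x227x227 color or infrared image to 512 feature maps of 7x7."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # -> 113
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 56
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # -> 28
            nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # -> 14
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # -> 7
        )

    def forward(self, x):
        return self.features(x)

img = torch.randn(1, 3, 227, 227)   # one scaled-and-padded input image
fmap = TruncatedBackbone()(img)
print(fmap.shape)                   # torch.Size([1, 512, 7, 7])
```

In practice one would load torchvision's pretrained VGG-16 and keep only its convolutional `features` part; the hand-rolled stack here merely mirrors the output shape.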
Vectorizing the first and second multichannel feature maps, performing feature-vector encoding on their sequence, and generating the plurality of prediction vectors comprises: first flattening the first and second multichannel feature maps of the infrared and color images to obtain 512 × 49 feature maps; then regarding these as 49 512-dimensional feature vectors, so that the patches can fully attend to one another at the pixel level; and finally splicing the vectors of the two modalities into 98 feature vectors of dimension 512.
Classifying and predicting the position of the generated prediction vectors specifically comprises performing a loss calculation on the plurality of prediction vectors with a set loss function and the labels.
Before the loss calculation is performed on the plurality of prediction vectors with the set loss function and the labels, a bipartite-graph matching method is used to find the best match between each prediction vector and a label; the category loss is then calculated with cross entropy and added to the position loss, calculated by regression, to give the global loss.
As shown in fig. 2, the invention further provides a multi-modal fusion obstacle detection device based on artificial intelligence blind guiding, which comprises an image acquisition module, a feature extraction module, a feature fusion module and a classification module.
An image acquisition module: the system consists of an infrared camera and a color camera which are respectively responsible for acquiring an infrared image and a color image of a scene.
The feature extraction module is used for transmitting the acquired infrared and color bimodal images to the convolutional neural network Q1 and the convolutional neural network Q2 respectively; Q1 and Q2 convert the images into a first multichannel feature map and a second multichannel feature map, which are subsequently flattened into vectors. In this module, the convolutional neural network can be a classic CNN framework such as VGG-16. Color images of different specifications are transformed by scaling and padding into 227 × 227 images and input into a VGG-16 backbone network with the fully connected layers cut off; 512 feature maps of size 7 × 7 are obtained after convolution and pooling. Similarly, 512 feature maps of 7 × 7 are obtained for the infrared image.
The feature fusion module (a Transformer block) is configured to vectorize the first and second multichannel feature maps, perform feature-vector encoding on them, and generate a plurality of prediction vectors. As shown in fig. 3, the feature fusion module mainly comprises an encoder and a decoder. The encoder comprises an embedded-token layer, a regularization layer, a multi-head self-attention layer and a feed-forward neural network layer; the decoder comprises a regularization layer, a multi-head self-attention layer and a feed-forward neural network layer.
The function of the embedded tokens is to vectorize the image so that it matches the input form expected by the Transformer encoder. The Transformer was originally a model for natural language processing (NLP), whose encoder takes word vectors as input; when a Transformer is used to process images, the image information must therefore be converted into vector form before being input to the encoder. Here the embedded tokens are the vector forms obtained by passing the two modal images through the convolutional neural networks, and thus conform to the encoder's input form.
The module first flattens the multichannel feature maps of the infrared and color images separately to obtain 512 × 49 feature maps, then regards them as 49 512-dimensional feature vectors, so that the patches can fully attend to one another at the pixel level. The vectors of the two modalities are then spliced into 98 feature vectors of dimension 512 and input into the Transformer block; the encoder encodes the feature vectors and sends the results to the classification module. The number of encoders is selectable, and stacking more of them improves performance to a certain extent.
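The flattening and splicing just described amounts to a few tensor reshapes. A minimal PyTorch sketch, with random tensors standing in for the backbone outputs:

```python
# Sketch of the flatten-and-splice step: 512x7x7 maps from each modality become
# 49 tokens of dimension 512, and the two token sets are concatenated into 98.
import torch

f_ir  = torch.randn(512, 7, 7)            # infrared feature maps (from Q1)
f_rgb = torch.randn(512, 7, 7)            # color feature maps (from Q2)

t_ir  = f_ir.flatten(1).t()               # 512x49, transposed: 49 tokens of dim 512
t_rgb = f_rgb.flatten(1).t()              # likewise for the color modality
tokens = torch.cat([t_ir, t_rgb], dim=0)  # splice the modalities: 98 x 512
```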
In each encoder, each feature vector is added to its corresponding position-encoding vector and input into a multi-head attention layer composed of several single-head self-attentions. The single-head case works as follows: assuming the input is X, single-head attention applies three different linear transformations Wq, Wk and Wv to X; the three results are called the query, key and value, written Q = WqX, K = WkX and V = WvX. Each Q is then multiplied with each K, giving QK^T, which passes through a softmax layer and is multiplied with the corresponding V to obtain the final result O = softmax(QK^T)V.
This is the attention of a single head; a multi-head layer cuts the input X into n segments, applies the linear transformations to each segment separately, and splices the results back together after the transformation.
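Under the row-vector convention (X @ W rather than WX), the single-head formula O = softmax(QK^T)V and the segment-wise multi-head variant described above can be sketched as follows; the head count n = 8 and the scaling of the random weights are illustrative assumptions:

```python
# Single-head self-attention and a segment-wise multi-head variant, exactly
# following the text's recipe (cut X into n segments, attend, splice back).
import torch

def self_attention(X, Wq, Wk, Wv):
    # O = softmax(Q K^T) V, with Q = X Wq, K = X Wk, V = X Wv (row vectors)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return torch.softmax(Q @ K.t(), dim=-1) @ V

d, n = 512, 8                                  # n = 8 heads is an assumption
X = torch.randn(98, d)                         # the 98 spliced feature vectors
W = [torch.randn(d, d) / d**0.5 for _ in range(3)]
O = self_attention(X, *W)                      # single-head result

# multi-head: each of the n segments gets its own smaller Wq/Wk/Wv
Wh = [[torch.randn(d // n, d // n) / (d // n)**0.5 for _ in range(3)]
      for _ in range(n)]
multi = torch.cat([self_attention(seg, *w)
                   for seg, w in zip(X.split(d // n, dim=1), Wh)], dim=1)
```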
The previous input X and the output O are then joined by a residual connection and passed through a layer norm (regularization layer); the normalized result is input into a fully connected feed-forward neural network layer, and after another residual connection and regularization one encoder layer is complete.
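One encoder layer as just described (attention, residual connection, layer norm, feed-forward layer, then residual and norm again) might look like this in PyTorch; the head count (8) and feed-forward width (2048) are assumptions not given in the text:

```python
# Sketch of a single encoder layer: multi-head self-attention plus residual
# and layer norm, then a feed-forward layer plus residual and layer norm.
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d=512, heads=8, ffn_dim=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.ffn = nn.Sequential(nn.Linear(d, ffn_dim), nn.ReLU(),
                                 nn.Linear(ffn_dim, d))

    def forward(self, x, pos):
        q = x + pos                                # add position encoding
        x = self.norm1(x + self.attn(q, q, x)[0])  # residual, then layer norm
        return self.norm2(x + self.ffn(x))         # residual, then layer norm

tokens = torch.randn(1, 98, 512)   # the 98 fused feature vectors (batch of 1)
pos = torch.randn(1, 98, 512)      # position-encoding vectors
out = EncoderLayer()(tokens, pos)
```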
The decoder has basically the same structure as the encoder, except that its input differs: what is multiplied by Wq are the object queries. Each of these vectors acts on a different location in the attention map, similar to asking what is at a certain place in the image; the vectors are randomly initialized and learned during training. The encoder's output is transformed by Wv and Wk simultaneously, and position encoding is added in advance to the part transformed by Wk.
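A minimal sketch of the decoder's cross-attention with object queries, assuming 20 queries (the text does not fix the number): the queries supply Q, while the encoder output supplies K (with position encoding added, as described) and V.

```python
# Cross-attention of learned object queries over the encoder memory.
# The query count (20) and random initializations are illustrative assumptions.
import torch
import torch.nn as nn

d, n_queries = 512, 20
object_queries = torch.randn(n_queries, 1, d)  # learned in practice; random here
memory = torch.randn(98, 1, d)                 # encoder output for 98 tokens
pos = torch.randn(98, 1, d)                    # position encoding on the key side

cross = nn.MultiheadAttention(d, 8)            # shapes are (seq, batch, dim)
out, _ = cross(object_queries, memory + pos, memory)
```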
The classification module classifies the generated prediction vectors and predicts their positions; it is mainly a set loss function. The parallel vectors output by the decoder are the prediction vectors, and the number of object-query vectors determines the number of predictions. Each vector predicts the category and position information of one target, and a loss calculation is performed with the set loss function and the labels (ground truth). Before the calculation, the best match between each prediction vector and a label is found using a bipartite-graph matching method (the Hungarian algorithm); the category loss is then calculated with cross entropy and added to the position loss, calculated by regression, to give the global loss.
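The matching-plus-loss procedure can be sketched with SciPy's Hungarian solver; the cost terms, the equal weighting of class and box loss, and the sizes (20 predictions over 5 classes, 2 ground-truth objects) are illustrative assumptions:

```python
# Sketch of the set-prediction loss: Hungarian matching between predictions
# and labels, then cross-entropy class loss plus an L1 regression box loss.
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def detection_loss(pred_logits, pred_boxes, gt_classes, gt_boxes):
    # cost of assigning prediction i to ground-truth j: class prob + L1 box gap
    prob = pred_logits.softmax(-1)
    cost = -prob[:, gt_classes] + torch.cdist(pred_boxes, gt_boxes, p=1)
    rows, cols = linear_sum_assignment(cost.detach().numpy())  # best bipartite match
    cls_loss = F.cross_entropy(pred_logits[rows], gt_classes[cols])
    box_loss = F.l1_loss(pred_boxes[rows], gt_boxes[cols])
    return cls_loss + box_loss                                 # global loss

loss = detection_loss(torch.randn(20, 5), torch.rand(20, 4),
                      torch.tensor([1, 3]), torch.rand(2, 4))
```

A full implementation would also penalize unmatched predictions as "no object"; that bookkeeping is omitted here for brevity.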
In the invention, a Transformer structure is introduced into the obstacle-detection process to realize multimodal fusion more effectively; a Transformer block is introduced, the features of the infrared and color images are fully fused, and the obstacle-detection accuracy in low-illumination situations is improved.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (8)

1. A multimodal fusion obstacle detection method based on artificial-intelligence blind guiding, characterized by comprising:
The method comprises the steps that an infrared camera and a color camera are respectively responsible for acquiring an infrared image and a color image of a scene;
the acquired infrared and color bimodal images are transmitted to a convolutional neural network Q1 and a convolutional neural network Q2 respectively, and Q1 and Q2 convert the images into a first multichannel feature map and a second multichannel feature map, which are subsequently flattened into vectors;
vectorizing the first multichannel feature map and the second multichannel feature map, and performing feature-vector encoding on the resulting sequence to generate a plurality of prediction vectors;
and performing classification and position prediction on the generated multiple prediction vectors.
2. The method for detecting multimodal fusion obstacles based on artificial-intelligence blind guiding as claimed in claim 1, characterized in that:
transmitting the acquired infrared and color bimodal images to the convolutional neural networks Q1 and Q2, converting them into the first and second multichannel feature maps, and subsequently flattening them into vectors specifically comprises: scaling, padding and deforming color or infrared images of different specifications to a size of 227 × 227, inputting them into a VGG-16 backbone network with the fully connected layers cut off, and obtaining 512 feature maps of size 7 × 7 after convolution and pooling.
3. The method for detecting multimodal fusion obstacles based on artificial-intelligence blind guiding as claimed in claim 1, characterized in that: vectorizing the first and second multichannel feature maps, performing feature-vector encoding on their sequence, and generating the plurality of prediction vectors comprises: first flattening the first and second multichannel feature maps of the infrared and color images to obtain 512 × 49 feature maps; then regarding these as 49 512-dimensional feature vectors, so that the patches can fully attend to one another at the pixel level; and finally splicing the vectors of the two modalities into 98 feature vectors of dimension 512.
4. The method for detecting multimodal fusion obstacles based on artificial-intelligence blind guiding as claimed in claim 1, characterized in that: classifying and predicting the position of the generated prediction vectors specifically comprises performing a loss calculation on the plurality of prediction vectors with a set loss function and the labels.
5. The method for detecting multimodal fusion obstacles based on artificial-intelligence blind guiding as claimed in claim 4, characterized in that: before the loss calculation is performed on the plurality of prediction vectors with the set loss function and the labels, a bipartite-graph matching method is used to find the best match between each prediction vector and a label; the category loss is then calculated with cross entropy and added to the position loss, calculated by regression, to give the global loss.
6. A multimodal fusion obstacle detection device based on artificial-intelligence blind guiding, characterized by comprising:
The image acquisition module consists of an infrared camera and a color camera and is used for respectively acquiring an infrared image and a color image of a scene;
the feature extraction module is used for transmitting the acquired infrared and color bimodal images to a convolutional neural network Q1 and a convolutional neural network Q2 respectively, wherein Q1 and Q2 convert the images into a first multichannel feature map and a second multichannel feature map, which are subsequently flattened into vectors;
the feature fusion module is used for vectorizing and representing the first multi-channel feature map and the second multi-channel feature map, and performing feature vector coding on the first multi-channel feature map and the second multi-channel feature map sequence to generate a plurality of prediction vectors;
a classification module to classify and position predict the generated plurality of prediction vectors.
7. The multi-modal fusion obstacle detection device based on artificial intelligence blind guiding of claim 6, wherein: the feature fusion module includes an encoder and a decoder.
8. The multi-modal fusion obstacle detection device based on artificial intelligence blind guiding of claim 7, wherein: the encoder comprises an embedded tokens, a regularization layer, a multi-head self-attention layer and a feedforward neural network layer; the decoder includes a regularization layer, a multi-headed self-attention layer, and a feed-forward neural network layer.
CN202110913691.4A 2021-08-10 2021-08-10 Multi-mode fusion obstacle detection method and device based on artificial intelligence blind guiding Active CN113591770B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110913691.4A CN113591770B (en) 2021-08-10 2021-08-10 Multi-mode fusion obstacle detection method and device based on artificial intelligence blind guiding
PCT/CN2021/138104 WO2023015799A1 (en) 2021-08-10 2021-12-14 Multimodal fusion obstacle detection method and apparatus based on artificial intelligence blindness guiding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110913691.4A CN113591770B (en) 2021-08-10 2021-08-10 Multi-mode fusion obstacle detection method and device based on artificial intelligence blind guiding

Publications (2)

Publication Number Publication Date
CN113591770A true CN113591770A (en) 2021-11-02
CN113591770B CN113591770B (en) 2023-07-18

Family

ID=78256776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110913691.4A Active CN113591770B (en) 2021-08-10 2021-08-10 Multi-mode fusion obstacle detection method and device based on artificial intelligence blind guiding

Country Status (2)

Country Link
CN (1) CN113591770B (en)
WO (1) WO2023015799A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071226B (en) * 2023-03-06 2023-07-18 中国科学技术大学 Electronic microscope image registration system and method based on attention network
CN116403163B (en) * 2023-04-20 2023-10-27 慧铁科技有限公司 Method and device for identifying opening and closing states of handles of cut-off plug doors
CN117274899B (en) * 2023-09-20 2024-05-28 中国人民解放军海军航空大学 Storage hidden danger detection method based on visible light and infrared light image feature fusion
CN117091588B (en) * 2023-10-16 2024-01-26 珠海太川云社区技术股份有限公司 Hospital diagnosis guiding method and system based on multi-mode fusion
CN117173639B (en) * 2023-11-01 2024-02-06 伊特拉姆成都能源科技有限公司 Behavior analysis and safety early warning method and system based on multi-source equipment
CN117726991B (en) * 2024-02-07 2024-05-24 金钱猫科技股份有限公司 High-altitude hanging basket safety belt detection method and terminal

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368118A (en) * 2020-02-13 2020-07-03 中山大学 Image description generation method, system, device and storage medium
CN111523534A (en) * 2020-03-31 2020-08-11 华东师范大学 Image description method
CN112418163A (en) * 2020-12-09 2021-02-26 北京深睿博联科技有限责任公司 Multispectral target detection blind guiding system
CN112926700A (en) * 2021-04-27 2021-06-08 支付宝(杭州)信息技术有限公司 Class identification method and device for target image

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9542626B2 (en) * 2013-09-06 2017-01-10 Toyota Jidosha Kabushiki Kaisha Augmenting layer-based object detection with deep convolutional neural networks
CN113591770B (en) * 2021-08-10 2023-07-18 中国科学院深圳先进技术研究院 Multi-mode fusion obstacle detection method and device based on artificial intelligence blind guiding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZOU Wei et al.: "Low-discernibility target detection method for autonomous driving vehicles based on multi-modal feature fusion" (in Chinese) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023015799A1 (en) * 2021-08-10 2023-02-16 中国科学院深圳先进技术研究院 Multimodal fusion obstacle detection method and apparatus based on artificial intelligence blindness guiding
CN114999637A (en) * 2022-07-18 2022-09-02 华东交通大学 Pathological image diagnosis method and system based on multi-angle coding and embedded mutual learning
CN116485729A (en) * 2023-04-03 2023-07-25 兰州大学 Multistage bridge defect detection method based on transformer
CN116485729B (en) * 2023-04-03 2024-01-12 兰州大学 Multistage bridge defect detection method based on transformer

Also Published As

Publication number Publication date
WO2023015799A1 (en) 2023-02-16
CN113591770B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
CN113591770A (en) Multimode fusion obstacle detection method and device based on artificial intelligence blind guiding
CN109800648B (en) Face detection and recognition method and device based on face key point correction
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN112132197B (en) Model training, image processing method, device, computer equipment and storage medium
CN115713679A (en) Target detection method based on multi-source information fusion, thermal infrared and three-dimensional depth map
IL281302B1 (en) Method and device for classifying objects
CN112651262A (en) Cross-modal pedestrian re-identification method based on self-adaptive pedestrian alignment
CN114663514B (en) Object 6D attitude estimation method based on multi-mode dense fusion network
CN112560579A (en) Obstacle detection method based on artificial intelligence
CN112348033B (en) Collaborative saliency target detection method
CN113989718A (en) Human body target detection method facing radar signal heat map
CN116434143A (en) Cross-modal pedestrian re-identification method and system based on feature reconstruction
CN116704554A (en) Method, equipment and medium for estimating and identifying hand gesture based on deep learning
CN116543338A (en) Student classroom behavior detection method based on gaze target estimation
CN112200840B (en) Moving object detection system in visible light and infrared image combination
CN114898429A (en) Thermal infrared-visible light cross-modal face recognition method
CN115049894A (en) Target re-identification method of global structure information embedded network based on graph learning
CN114694174A (en) Human body interaction behavior identification method based on space-time diagram convolution
CN117953590B (en) Ternary interaction detection method, system, equipment and medium
CN113191943B (en) Multi-path parallel image content characteristic separation style migration method and system
CN116185182B (en) Controllable image description generation system and method for fusing eye movement attention
CN117274690A (en) Weak supervision target positioning method based on multiple modes
CN117975499A (en) Complex scene pedestrian detection method and system and electronic equipment
CN116503949A (en) Video motion recognition method based on improved long-term cyclic convolution network
CN114898397A (en) ViT-fused cross-modal pedestrian re-identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant