CN112710310A - Visual language indoor navigation method, system, terminal and application - Google Patents
Visual language indoor navigation method, system, terminal and application
- Publication number
- CN112710310A (application number CN202011428332.1A)
- Authority
- CN
- China
- Prior art keywords
- visual
- information
- language
- robot
- indoor navigation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/20—Instruments for performing navigational calculations
Landscapes
- Engineering & Computer Science (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Automation & Control Theory (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Image Analysis (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
The invention belongs to the technical field of visual language navigation and discloses a visual language indoor navigation method, system, terminal and application. The invention combines the robot's visual information with natural language information to perform indoor navigation of the robot, and adopts an attention mechanism so that the robot can understand human language instructions more effectively and combine them with visual information, enabling the robot to reach the destination and complete the task as instructed. The invention mainly designs an attention mechanism that effectively combines natural language and visual information so that the robot can find an optimal path in an unknown room.
Description
Technical Field
The invention belongs to the technical field of visual language navigation, and particularly relates to a visual language indoor navigation method, a system, a terminal and application.
Background
At present, visual language navigation technology is a recently developed intelligent navigation method; the navigation task requires that, under a given language instruction, the robot reach a specified target location from a random initial position using the visual image information it acquires itself. For example, given the command "go straight down the hallway, enter the bedroom on the right, and stop at the bedside", the robot follows the command and, in combination with its own observations, continuously adjusts its direction of travel until the destination is reached. The method can be widely applied in many scenarios such as unmanned vehicles, intelligent robots and unmanned delivery carts. Unlike tasks based on visual navigation alone, visual language navigation requires the comprehensive use of natural language information and computer vision information; the robot continuously interacts with the environment it perceives to acquire the necessary information about the environment and thereby completes the designated task given by a human. After integrating the natural language information and the computer vision information, the agent needs to plan its own actions.
Through the above analysis, the problems and defects of the prior art are as follows: in the prior art, complex data raises the computing power requirement on the one hand, while the multi-dimensional input information makes key information difficult to extract on the other; at the same time, the high complexity of the network must also be faced, which reduces the accuracy and efficiency of information extraction.
The difficulty in solving the above problems and defects is as follows: the system is complex and the input information is high-dimensional; the task involves two branches of artificial intelligence, natural language processing and computer vision, so improvement is difficult and presents a considerable challenge.
The significance of solving the above problems and defects is as follows: solving the problems that the information is complicated and key information cannot be extracted can effectively reduce the computational complexity, improve the navigation effect, reduce the interference of noise and useless features on the model, improve the efficiency of the model, and increase its accuracy.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a visual language indoor navigation method, a system, a terminal and application.
The invention is realized in such a way that the visual language indoor navigation method combines natural language commands and visual information using a sequence-to-sequence method, extracts features from the natural language command information and the visual image information respectively, and, after the feature extraction is completed, applies attention screening to each set of extracted features to screen out the key information related to the task.
Furthermore, the visual language indoor navigation method performs fusion coding of the natural language command information and the visual image information so that the depth model attends to certain local information; local information is selectively screened out of a large amount of information and focused on, the feature vector is encoded, the vector is then decoded, and the decoding yields the command for the robot's action.
Further, the visual language indoor navigation method specifically comprises the following steps:
firstly, initializing, namely inputting a language description instruction into a robot, wherein the robot is positioned at an initial position;
secondly, extracting natural language features of the language description instruction by using the LSTM;
thirdly, extracting key information of the language description instruction by using a natural language attention mechanism, and screening out interference of irrelevant information;
fourthly, extracting computer vision features from the acquired image by using a CNN convolutional neural network;
fifthly, extracting visual key information from the acquired visual features in the fourth step by using a visual attention mechanism;
sixthly, mutually fusing the extracted visual key information in the fifth step and the key information of the language description instruction in the third step;
seventhly, extracting key information of the features fused in the sixth step by using an attention mechanism again;
eighthly, decoding and evaluating the key information obtained from the seventh step to obtain the advancing direction of the robot;
ninthly, repeating the second step to the eighth step;
and tenthly, reaching the destination and stopping advancing.
Further, the visual language indoor navigation method adopts the classical convolutional neural network ResNet-50 to extract features; before the ResNet-50 network extracts features it is pre-trained on the internationally known ImageNet image dataset, the trained ResNet-50 network is used to extract feature vectors, and the feature vector of the panoramic image observed by the robot at time t is V_t.
The attention feature vector v_t is extracted using the attention mechanism:
v_t = attention(H_{t-1}, V_t);
which expands to:
v_t = ∑_j softmax(H_{t-1} W_h (W_v V_t)^T) V_t;
H_t = LSTM([V_t, A_{t-1}], H_{t-1});
wherein v_t denotes the feature vector extracted by the attention mechanism, V_t denotes the feature vector extracted by the trained ResNet-50 network, H_{t-1} denotes the historical feature vector at time t-1, H_t denotes the historical feature vector at time t, A_t and A_{t-1} denote the actions taken by the robot at time t and time t-1 respectively, and W_h and W_v denote weight matrices.
Further, in the visual language indoor navigation method, for an input string of natural language instructions W = (w_1, w_2, w_3, ...), where the instruction consists of a string of words, features are extracted using the long short-term memory neural network LSTM as C = LSTM(W), where C is the feature extracted from the natural language; the natural language feature is then re-extracted using the attention mechanism, formally expressed as:
Y_t = attention(H_t, C).
Further, the visual language indoor navigation method encodes and extracts the robot's visual information and the natural language information separately, then performs fused attention extraction on the two feature vectors, fuses all the extracted information with the robot's historical information, evaluates the robot's next step, estimates the probability P of each heading direction, and determines the direction the robot should move in according to the maximum probability:
D_t = attention(Y_t, v_t);
P = softmax([H_t, v_t, Y_t, D_t] W_c W_b);
wherein D_t denotes the fused feature vector, P denotes the probability of the heading direction, and W_c and W_b denote weight matrices.
The invention also aims to provide a robot visual language navigation information data processing terminal which is used for realizing the visual language indoor navigation method.
Another object of the present invention is to provide a visual language indoor navigation system implementing the visual language indoor navigation method, the visual language indoor navigation system comprising:
a command and information combining module for combining natural language commands with visual information using a sequence-to-sequence approach;
the characteristic extraction module is used for respectively extracting the characteristics of the natural language command information and the visual image information;
and the key information screening module is used for screening the attention characteristics of the extracted characteristics respectively after completing the characteristic extraction, and screening the key information related to the task.
By combining all the technical schemes, the invention has the following advantages and positive effects: the attention mechanism takes human attention as its reference; when the human brain processes visual information, it quickly scans the global image to find the key regions that need attention, which greatly improves the efficiency and accuracy of visual processing. The attention mechanism aims to select key information with important meaning from a large amount of information; it was first borrowed by natural language processing to screen out phrases with important semantics, and since then has been widely used in many scenarios such as speech recognition and image processing. The invention combines the robot's visual information with natural language information to perform indoor navigation of the robot, and adopts the attention mechanism so that the robot can understand human language instructions more effectively and combine them with visual information, enabling the robot to reach the destination and complete the task as instructed. The invention mainly designs an attention mechanism that effectively combines natural language and visual information so that the robot can find an optimal path in an unknown room.
The visual language indoor navigation task proposed by the invention needs to combine natural language command information with visual image information; the data volume is large and the relevant key information is extensive, so without an attention mechanism the complicated data raises the computing power requirement and the high complexity of the network must be faced. To improve the accuracy and efficiency of information extraction, the invention provides a visual language indoor navigation method based on an attention mechanism.
The invention performs fusion coding of natural language command information and visual image information and makes the depth model attend to certain local information. Local information is selectively screened out of a large amount of information and focused on, the feature vector is encoded, the vector is then decoded, and the decoding yields the command for the robot's action. The attention mechanism is adopted in the process of encoding the feature vector; the attention extraction applied to natural language features differs from that applied to computer vision features, and the fused features also need attention to extract key information, so the encoding is more efficient and the extracted information is more valuable.
The invention provides an advanced visual language indoor navigation method for a robot that effectively combines natural language commands with visual information, enabling the robot to reach a destination in an unknown indoor space according to human commands, which brings the navigation closer to real-scene applications. The invention designs an attention mechanism that refines the language features and the visual features; because a large amount of natural language and visual information must be acquired during visual language indoor navigation, the attention mechanism refines the acquired information and makes the obtained features more precise. This reduces the interference of noise and useless features on the model, improves the efficiency of the model, and increases its accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained from the drawings without creative efforts.
Fig. 1 is a flowchart of a visual language indoor navigation method according to an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of a visual language indoor navigation system provided by an embodiment of the present invention;
in the figure: 1. a command and information combining module; 2. a feature extraction module; 3. and a key information screening module.
Fig. 3 is a flowchart of an implementation of a visual language indoor navigation method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Aiming at the problems in the prior art, the invention provides a visual language indoor navigation method, a system, a terminal and application thereof, and the invention is described in detail with reference to the accompanying drawings.
As shown in fig. 1, the visual language indoor navigation method provided by the present invention comprises the following steps:
s101: combining natural language commands with visual information by a sequence-to-sequence method;
s102: respectively extracting the characteristics of the natural language command information and the visual image information;
s103: after the feature extraction is completed, the attention features are respectively screened for the extracted features, and key information related to the task is screened out.
Those skilled in the art can also implement the visual language indoor navigation method provided by the present invention by adopting other steps, and the visual language indoor navigation method provided by the present invention in fig. 1 is only one specific embodiment.
As shown in fig. 2, the visual language indoor navigation system provided by the present invention comprises:
a command and information combining module 1 for combining natural language commands and visual information by a sequence-to-sequence method;
the characteristic extraction module 2 is used for respectively extracting the characteristics of the natural language command information and the visual image information;
and the key information screening module 3 is used for screening attention features of the extracted features respectively after feature extraction is completed, and screening key information related to the task.
The technical solution of the present invention is further described below with reference to the accompanying drawings.
As shown in fig. 3, the method provided by the invention is mainly applied to the robot's language-vision navigation module and does not involve the design of the whole robot; the current implementation mainly relies on a computer to simulate this module (an illustrative code sketch of the overall loop follows the step list below), and specifically includes the following steps:
firstly, initializing, namely inputting a language description instruction into a robot, wherein the robot is positioned at an initial position;
secondly, extracting natural language features of the language description instruction by using the LSTM;
thirdly, extracting key information of the language description instruction by using a natural language attention mechanism, and screening out interference of irrelevant information;
fourthly, extracting computer vision features from the acquired image by using a CNN convolutional neural network;
fifthly, extracting visual key information from the acquired visual features in the fourth step by using a visual attention mechanism;
sixthly, mutually fusing the extracted visual key information in the fifth step and the key information of the language description instruction in the third step;
seventhly, extracting key information of the features fused in the sixth step by using an attention mechanism again;
eighthly, decoding and evaluating the key information obtained from the seventh step to obtain the advancing direction of the robot;
ninthly, repeating the second step to the eighth step;
and tenthly, reaching the destination and stopping advancing.
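To make the data flow of these ten steps concrete, a minimal Python sketch of the loop is given below. It is only an illustrative outline: every helper it calls (encode_instruction, init_hidden, language_attention, extract_visual_features, visual_attention, fuse_features, fused_attention, decode_action) and the robot interface (observe, step, at_goal) are hypothetical placeholders standing in for the modules described in this embodiment, not functions defined by the invention.

```python
# Hypothetical helpers stand in for the modules of this embodiment; none of
# these names are defined by the invention itself.

def navigate(robot, instruction, max_steps=30):
    C = encode_instruction(instruction)               # step 2: LSTM language features
    H = init_hidden()                                 # initial historical feature vector
    for _ in range(max_steps):
        Y = language_attention(H, C)                  # step 3: key language information
        V = extract_visual_features(robot.observe())  # step 4: CNN (ResNet-50) features
        v = visual_attention(H, V)                    # step 5: key visual information
        D = fused_attention(fuse_features(Y, v))      # steps 6-7: fusion, then attention
        action, H = decode_action(H, v, Y, D)         # step 8: decode the heading direction
        if action == "STOP" or robot.at_goal():       # step 10: stop at the destination
            return
        robot.step(action)                            # step 9: advance and repeat
```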
In the invention, for computer vision images, a ResNet-50 network is adopted to extract features; before ResNet-50 extracts features it is pre-trained on the ImageNet dataset, the trained ResNet-50 is used to extract feature vectors, and the feature vector of the panoramic image observed by the robot at time t is V_t.
The attention feature vector v_t is extracted using the attention mechanism:
v_t = attention(H_{t-1}, V_t)    (1)
which expands to:
v_t = ∑_j softmax(H_{t-1} W_h (W_v V_t)^T) V_t    (2)
H_t = LSTM([V_t, A_{t-1}], H_{t-1})    (3)
wherein v_t denotes the feature vector extracted by the attention mechanism, V_t denotes the feature vector extracted by the trained ResNet-50 network, H_{t-1} denotes the historical feature vector at time t-1, H_t denotes the historical feature vector at time t, A_t and A_{t-1} denote the actions taken by the robot at time t and time t-1 respectively, and W_h and W_v denote weight matrices.
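The following PyTorch sketch illustrates one plausible implementation of equations (1)-(3). The shapes, the mean-pooling of the panorama before the LSTM update, and the use of torch.nn.LSTMCell are illustrative assumptions rather than details fixed by the invention.

```python
import torch
import torch.nn.functional as F

# Assumed shapes (illustrative only):
#   V_t:    (J, d_v)  per-view features from the pretrained ResNet-50
#   H_prev: (d_h,)    historical feature vector H_{t-1}
#   A_prev: (d_a,)    embedding of the previous action A_{t-1}
#   W_h:    (d_h, d_k)  and  W_v: (d_k, d_v)  trainable weight matrices

def visual_attention(H_prev, V_t, W_h, W_v):
    query = H_prev @ W_h                      # H_{t-1} W_h              -> (d_k,)
    keys = V_t @ W_v.T                        # (W_v V_t^T)^T            -> (J, d_k)
    weights = F.softmax(keys @ query, dim=0)  # softmax over the J views -> (J,)
    return weights @ V_t                      # v_t: attention-weighted view features

def update_history(lstm_cell, V_t, A_prev, H_prev, c_prev):
    # H_t = LSTM([V_t, A_{t-1}], H_{t-1}); lstm_cell is an nn.LSTMCell whose
    # input size is d_v + d_a, and the panorama is mean-pooled over its views
    x = torch.cat([V_t.mean(dim=0), A_prev], dim=-1)
    H_t, c_t = lstm_cell(x.unsqueeze(0), (H_prev.unsqueeze(0), c_prev.unsqueeze(0)))
    return H_t.squeeze(0), c_t.squeeze(0)
```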
In the present invention, for an input string of natural language instructions W = (w_1, w_2, w_3, ...), where the instruction consists of a string of words, features are extracted using the LSTM as C = LSTM(W), where C is the feature extracted from the natural language; the natural language feature then needs to be re-extracted using the attention mechanism, formally expressed as:
Y_t = attention(H_t, C)    (4)
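A short sketch of the instruction encoder C = LSTM(W) and the language attention of equation (4) is given below. The vocabulary size, embedding size and hidden size are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstructionEncoder(nn.Module):
    # Sketch of C = LSTM(W) and Y_t = attention(H_t, C); all sizes are assumptions.
    def __init__(self, vocab_size=1000, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.W_q = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, word_ids):                 # word_ids: (1, L) token indices
        C, _ = self.lstm(self.embed(word_ids))   # C: (1, L, hidden_dim)
        return C.squeeze(0)                      # one context vector per word

    def attend(self, H_t, C):                    # H_t: (hidden_dim,), C: (L, hidden_dim)
        scores = C @ self.W_q(H_t)               # relevance of each word to the history
        alpha = F.softmax(scores, dim=0)         # attention weights over the words
        return alpha @ C                         # Y_t: re-weighted instruction summary
```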
In the invention, the robot's visual information and the natural language information are first encoded and extracted separately; because the natural language is a close description of the visual scene, the correlation between the two is good. Fused attention extraction is then performed on the two feature vectors, all the extracted information is fused together with the robot's historical information, the robot's next step is evaluated, the probability P of each heading direction is estimated, and the direction the robot should move in is determined according to the maximum probability:
D_t = attention(Y_t, v_t)    (5)
P = softmax([H_t, v_t, Y_t, D_t] W_c W_b)    (6)
wherein D_t denotes the fused feature vector, P denotes the probability of the heading direction, and W_c and W_b denote weight matrices.
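The sketch below follows equations (5)-(6). Reading D_t = attention(Y_t, v_t) as a similarity-gated copy of the language summary is only one plausible interpretation; it further assumes Y_t and v_t have been projected to a common dimension, and W_c, W_b and the number of candidate headings are assumed model parameters.

```python
import torch
import torch.nn.functional as F

def fuse_and_score(H_t, v_t, Y_t, W_c, W_b):
    # D_t = attention(Y_t, v_t): one plausible reading, gating the language
    # summary Y_t by its agreement with the visual summary v_t (same dimension assumed)
    gate = torch.sigmoid((Y_t * v_t).sum(dim=-1, keepdim=True))
    D_t = gate * Y_t

    # P = softmax([H_t, v_t, Y_t, D_t] W_c W_b): concatenate, project, normalise
    features = torch.cat([H_t, v_t, Y_t, D_t], dim=-1)
    logits = features @ W_c @ W_b                # one logit per candidate heading
    return F.softmax(logits, dim=-1)

# The robot then moves in the most probable direction:
# action = int(torch.argmax(fuse_and_score(H_t, v_t, Y_t, W_c, W_b)))
```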
The effect of the invention was tested on the public simulation dataset R2R, which collects data from 99 different scenes; the test results show that the method proposed by the invention significantly improves navigation performance. The test results of the invention are shown in Table 1.
TABLE 1 test results
Method | TL↓ | NE↓ | OSR↑ | SR↑ | SPL↑ |
Seq2Seq | 8.40 | 3.67 | 0.43 | 0.25 | 0.35 |
RCM | 10.65 | 3.53 | 0.75 | 0.46 | 0.43 |
Our | 7.86 | 3.54 | 0.78 | 0.53 | 0.58 |
In Table 1, Our denotes the method proposed by the invention, Seq2Seq denotes the existing basic navigation method, and RCM denotes another well-known navigation method. TL is the trajectory (path) length, NE is the navigation error, OSR is the oracle success rate, SR is the success rate, and SPL is the success rate weighted by inverse path length; these five indexes are internationally accepted metrics for evaluating navigation accuracy. A downward arrow indicates that a smaller value is better under that criterion, an upward arrow indicates that a larger value is better, and bold font indicates the best result. As can be seen from the table, on four of the five evaluation indexes the method proposed by the invention obtains the best result, and on the remaining index it obtains the second best.
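For reference, the SR and SPL columns in Table 1 are conventionally computed as sketched below in the R2R literature; these are the standard public definitions of the metrics rather than formulas stated in the patent, and the episode fields and the 3 m success threshold are the commonly used assumptions.

```python
# Each episode dict is a hypothetical record with fields nav_error,
# path_length and shortest_path (all in metres).

def success_rate(episodes, threshold=3.0):
    # an episode counts as a success when the final navigation error is within 3 m
    return sum(e["nav_error"] <= threshold for e in episodes) / len(episodes)

def spl(episodes, threshold=3.0):
    # success weighted by (shortest-path length / length of the path actually taken)
    total = 0.0
    for e in episodes:
        success = e["nav_error"] <= threshold
        total += success * e["shortest_path"] / max(e["path_length"], e["shortest_path"])
    return total / len(episodes)
```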
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD- or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.
Claims (8)
1. A visual language indoor navigation method, characterized in that a sequence-to-sequence method is used to combine natural language commands with visual information, feature extraction is carried out on the natural language command information and the visual image information respectively, and after the feature extraction is completed, attention screening is carried out on each set of extracted features to screen out the key information related to the task.
2. The visual language indoor navigation method of claim 1, wherein the visual language indoor navigation method performs fusion coding of the natural language command information and the visual image information so that a depth model attends to certain local information; local information is selectively screened out of a large amount of information and focused on, the feature vector is encoded, the vector is then decoded, and the decoding yields the command for the robot's action.
3. The visual language indoor navigation method of claim 1, wherein the visual language indoor navigation method specifically comprises:
firstly, initializing, namely inputting a language description instruction into a robot, wherein the robot is positioned at an initial position;
secondly, extracting natural language features of the language description instruction by using the LSTM;
thirdly, extracting key information of the language description instruction by using a natural language attention mechanism, and screening out interference of irrelevant information;
fourthly, extracting computer vision features from the acquired image by using a CNN convolutional neural network;
fifthly, extracting visual key information from the acquired visual features in the fourth step by using a visual attention mechanism;
sixthly, mutually fusing the extracted visual key information in the fifth step and the key information of the language description instruction in the third step;
seventhly, extracting key information of the features fused in the sixth step by using an attention mechanism again;
eighthly, decoding and evaluating the key information obtained from the seventh step to obtain the advancing direction of the robot;
ninthly, repeating the second step to the eighth step;
and tenthly, reaching the destination and stopping advancing.
4. The visual language indoor navigation method of claim 3, wherein the visual language indoor navigation method adopts a ResNet-50 network for feature extraction; the ResNet-50 network is pre-trained on the ImageNet dataset before it extracts features, the trained ResNet-50 network is used to extract feature vectors, and the feature vector of the panoramic image observed by the robot at time t is V_t;
the attention feature vector v_t is extracted using the attention mechanism:
v_t = attention(H_{t-1}, V_t);
which expands to:
v_t = ∑_j softmax(H_{t-1} W_h (W_v V_t)^T) V_t;
H_t = LSTM([V_t, A_{t-1}], H_{t-1});
wherein v_t denotes the feature vector extracted by the attention mechanism, V_t denotes the feature vector extracted by the trained ResNet-50 network, H_{t-1} denotes the historical feature vector at time t-1, H_t denotes the historical feature vector at time t, A_t and A_{t-1} denote the actions taken by the robot at time t and time t-1 respectively, and W_h and W_v denote weight matrices.
5. The visual language indoor navigation method of claim 3, wherein, for an input string of natural language instructions W = (w_1, w_2, w_3, ...), where the instruction consists of a string of words, features are extracted using the LSTM as C = LSTM(W), where C is the feature extracted from the natural language; the natural language feature is then re-extracted using the attention mechanism, formally expressed as:
Y_t = attention(H_t, C).
6. The visual language indoor navigation method of claim 3, wherein the visual language indoor navigation method encodes and extracts the robot's visual information and the natural language information separately, then performs fused attention extraction on the two feature vectors, fuses all the extracted information with the robot's historical information, evaluates the robot's next step, estimates the probability P of each heading direction, and determines the direction the robot should move in according to the maximum probability:
D_t = attention(Y_t, v_t);
P = softmax([H_t, v_t, Y_t, D_t] W_c W_b);
wherein D_t denotes the fused feature vector, P denotes the probability of the heading direction, and W_c and W_b denote weight matrices.
7. A robot visual language navigation information data processing terminal, characterized in that the robot visual language navigation information data processing terminal is used for realizing the visual language indoor navigation method of any one of claims 1 to 6.
8. A visual language indoor navigation system for implementing the visual language indoor navigation method according to any one of claims 1 to 6, wherein the visual language indoor navigation system comprises:
a command and information combining module for combining natural language commands with visual information using a sequence-to-sequence approach;
the characteristic extraction module is used for respectively extracting the characteristics of the natural language command information and the visual image information;
and the key information screening module is used for screening the attention characteristics of the extracted characteristics respectively after completing the characteristic extraction, and screening the key information related to the task.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011428332.1A CN112710310B (en) | 2020-12-07 | 2020-12-07 | Visual language indoor navigation method, system, terminal and application |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011428332.1A CN112710310B (en) | 2020-12-07 | 2020-12-07 | Visual language indoor navigation method, system, terminal and application |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112710310A true CN112710310A (en) | 2021-04-27 |
CN112710310B CN112710310B (en) | 2024-04-19 |
Family
ID=75542756
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011428332.1A Active CN112710310B (en) | 2020-12-07 | 2020-12-07 | Visual language indoor navigation method, system, terminal and application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112710310B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113420606A (en) * | 2021-05-31 | 2021-09-21 | 华南理工大学 | Method for realizing autonomous navigation of robot based on natural language and machine vision |
CN113670310A (en) * | 2021-07-27 | 2021-11-19 | 际络科技(上海)有限公司 | Visual voice navigation method, device, equipment and storage medium |
CN113984052A (en) * | 2021-06-16 | 2022-01-28 | 北京小米移动软件有限公司 | Indoor navigation method, indoor navigation device, equipment and storage medium |
CN115082915A (en) * | 2022-05-27 | 2022-09-20 | 华南理工大学 | Mobile robot vision-language navigation method based on multi-modal characteristics |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130237179A1 (en) * | 2012-03-08 | 2013-09-12 | Rajesh Chandra Potineni | System and method for guided emergency exit |
CN108245384A (en) * | 2017-12-12 | 2018-07-06 | 清华大学苏州汽车研究院(吴江) | Binocular vision apparatus for guiding blind based on enhancing study |
CN108981712A (en) * | 2018-08-15 | 2018-12-11 | 深圳市烽焌信息科技有限公司 | Robot goes on patrol method and robot |
CN109491613A (en) * | 2018-11-13 | 2019-03-19 | 深圳龙岗智能视听研究院 | A kind of continuous data protection storage system and its storage method using the system |
CN110348462A (en) * | 2019-07-09 | 2019-10-18 | 北京金山数字娱乐科技有限公司 | A kind of characteristics of image determination, vision answering method, device, equipment and medium |
Non-Patent Citations (1)
Title |
---|
LIU Wenqing: "Machine vision application development technology based on an AI open platform", Hunan Electric Power, vol. 39, no. 6, pages 13 - 15 *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113420606A (en) * | 2021-05-31 | 2021-09-21 | 华南理工大学 | Method for realizing autonomous navigation of robot based on natural language and machine vision |
CN113984052A (en) * | 2021-06-16 | 2022-01-28 | 北京小米移动软件有限公司 | Indoor navigation method, indoor navigation device, equipment and storage medium |
CN113984052B (en) * | 2021-06-16 | 2024-03-19 | 北京小米移动软件有限公司 | Indoor navigation method, indoor navigation device, equipment and storage medium |
CN113670310A (en) * | 2021-07-27 | 2021-11-19 | 际络科技(上海)有限公司 | Visual voice navigation method, device, equipment and storage medium |
CN113670310B (en) * | 2021-07-27 | 2024-05-31 | 际络科技(上海)有限公司 | Visual voice navigation method, device, equipment and storage medium |
CN115082915A (en) * | 2022-05-27 | 2022-09-20 | 华南理工大学 | Mobile robot vision-language navigation method based on multi-modal characteristics |
CN115082915B (en) * | 2022-05-27 | 2024-03-29 | 华南理工大学 | Multi-modal feature-based mobile robot vision-language navigation method |
Also Published As
Publication number | Publication date |
---|---|
CN112710310B (en) | 2024-04-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112710310A (en) | Visual language indoor navigation method, system, terminal and application | |
CN113705769B (en) | Neural network training method and device | |
CN110766044B (en) | Neural network training method based on Gaussian process prior guidance | |
US20220004744A1 (en) | Human posture detection method and apparatus, device and storage medium | |
CN108960407B (en) | Recurrent neural network language model training method, device, equipment and medium | |
US11915128B2 (en) | Neural network circuit device, neural network processing method, and neural network execution program | |
CN112580369B (en) | Sentence repeating method, method and device for training sentence repeating model | |
Mishra et al. | The understanding of deep learning: A comprehensive review | |
CN109766557B (en) | Emotion analysis method and device, storage medium and terminal equipment | |
CN109817276A (en) | A kind of secondary protein structure prediction method based on deep neural network | |
CN114676234A (en) | Model training method and related equipment | |
CN111626134B (en) | Dense crowd counting method, system and terminal based on hidden density distribution | |
CN112733768A (en) | Natural scene text recognition method and device based on bidirectional characteristic language model | |
CN111259735B (en) | Single-person attitude estimation method based on multi-stage prediction feature enhanced convolutional neural network | |
CN110807335A (en) | Translation method, device, equipment and storage medium based on machine learning | |
CN113159236A (en) | Multi-focus image fusion method and device based on multi-scale transformation | |
US11948078B2 (en) | Joint representation learning from images and text | |
CN113747168A (en) | Training method of multimedia data description model and generation method of description information | |
CN116739071A (en) | Model training method and related device | |
CN111783688B (en) | Remote sensing image scene classification method based on convolutional neural network | |
CN112905754B (en) | Visual dialogue method and device based on artificial intelligence and electronic equipment | |
CN113407820A (en) | Model training method, related system and storage medium | |
CN116975347A (en) | Image generation model training method and related device | |
CN116434058A (en) | Image description generation method and system based on visual text alignment | |
CN115824213A (en) | Visual language navigation method based on follower model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||