CN112710310A - Visual language indoor navigation method, system, terminal and application - Google Patents

Visual language indoor navigation method, system, terminal and application

Info

Publication number
CN112710310A
Authority
CN
China
Prior art keywords
visual
information
language
robot
indoor navigation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011428332.1A
Other languages
Chinese (zh)
Other versions
CN112710310B (en)
Inventor
张世雄
李楠楠
龙仕强
朱鑫懿
魏文应
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Original Assignee
Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Instritute Of Intelligent Video Audio Technology Longgang Shenzhen filed Critical Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority to CN202011428332.1A priority Critical patent/CN112710310B/en
Publication of CN112710310A publication Critical patent/CN112710310A/en
Application granted granted Critical
Publication of CN112710310B publication Critical patent/CN112710310B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention belongs to the technical field of visual language navigation and discloses a visual language indoor navigation method, system, terminal and application. The invention combines the robot's visual information with natural language information to perform indoor robot navigation, and adopts an attention mechanism so that the robot can understand human language instructions more effectively and combine them with visual information, allowing the robot to reach the destination and complete the task according to human instructions. The invention mainly designs an attention mechanism that effectively combines natural language and visual information so that the robot can find an optimal path in an unknown room.

Description

Visual language indoor navigation method, system, terminal and application
Technical Field
The invention belongs to the technical field of visual language navigation, and particularly relates to a visual language indoor navigation method, a system, a terminal and application.
Background
At present, visual language navigation is a recently developed intelligent navigation technology. The navigation task requires that, given a language instruction, the robot reaches a specified target position from an initial random position using the visual image information it acquires itself. For example, given the command "go straight down the hallway, enter the bedroom on the right, and stop at the bedside", the robot follows the instruction and, in combination with its own observations, continuously adjusts its direction of travel until the destination is reached. The method can be widely applied in many scenarios such as unmanned vehicles, intelligent robots, and unmanned food-delivery carts. Unlike tasks based on vision-only navigation, visual language navigation requires the combined use of natural language information and computer vision information: the robot continuously interacts with the environment to acquire the necessary environmental information and thereby completes the task assigned by a human. After integrating the natural language information and the computer vision information, the agent needs to plan its own actions.
Through the above analysis, the problems and defects of the prior art are as follows: in the prior art, on the one hand the complex data raise the computing-power requirement, and on the other hand the multi-dimensional input information makes the key information difficult to extract; at the same time the problem of high network complexity must also be faced, so the accuracy and efficiency of information extraction are reduced.
The difficulty in solving the above problems and defects is: the system is complex, the dimensionality of the input information is high, and the task spans two branches of artificial intelligence, natural language processing and computer vision, so improvement is difficult and presents certain challenges.
The significance of solving the above problems and defects is: solving the problems that the information is complicated and the key information cannot be extracted can effectively reduce the computational complexity, improve the navigation effect, reduce the interference of noise and useless features on the model, improve the efficiency of the model, and increase the accuracy of the model.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a visual language indoor navigation method, a system, a terminal and application.
The invention is realized in such a way that the visual language indoor navigation method combines natural language commands and visual information using a sequence-to-sequence method, extracts features from the natural language command information and the visual image information respectively, and, after feature extraction is completed, applies attention-based screening to the extracted features to select the key information relevant to the task.
Furthermore, the visual language indoor navigation method performs fusion coding of the natural language command information and the visual image information, so that the deep model pays attention to specific local information; it selectively screens out this local information from a large amount of information, focuses on it, encodes it as a feature vector, and then decodes the vector to obtain a command for the robot's action.
Further, the visual language indoor navigation method specifically comprises the following steps:
firstly, initializing, namely inputting a language description instruction into a robot, wherein the robot is positioned at an initial position;
secondly, extracting natural language features of the language description instruction by using the LSTM;
thirdly, extracting key information of the language description instruction by using a natural language attention mechanism, and screening out interference of irrelevant information;
fourthly, extracting the visual features of the computer by using a CNN convolutional neural network for the acquired image;
fifthly, extracting visual key information from the acquired visual features in the fourth step by using a visual attention mechanism;
sixthly, mutually fusing the extracted visual key information in the fifth step and the key information of the language description instruction in the third step;
seventhly, extracting key information of the features fused in the sixth step by using an attention mechanism again;
eighthly, decoding and evaluating the key information obtained from the seventh step to obtain the advancing direction of the robot;
ninthly, repeating the second step through the eighth step;
and tenthly, reaching the destination and stopping.
Further, the visual language indoor navigation method adopts the classical convolutional neural network ResNet-50 for feature extraction; before feature extraction, the ResNet-50 network is pre-trained on the internationally known image data set ImageNet, and the trained ResNet-50 network is used to extract feature vectors, giving the feature vector V_t of the panoramic image observed by the robot at time t.
The attention feature vector v_t is extracted using the attention mechanism:
v_t = attention(H_{t-1}, V_t);
which expands to:
v_t = Σ_j softmax(H_{t-1} W_h (W_v V_t)^T) V_t
H_t = LSTM([V_t; A_{t-1}], H_{t-1});
where v_t represents the feature vector extracted by the attention mechanism, V_t represents the feature vector extracted by the trained ResNet-50 network, H_{t-1} represents the historical feature vector at time t-1 and H_t the historical feature vector at time t, A_t and A_{t-1} represent the actions taken by the machine at time t and time t-1 respectively, and W_h and W_v represent weight matrices.
Further, for an input string of natural language instructions W = (w_1, w_2, w_3, ...), which is composed of a string of words, the visual language indoor navigation method extracts features using the long short-term memory neural network LSTM, i.e. C = LSTM(W), where C is the feature extracted from the natural language; the natural language features are then re-extracted using the attention mechanism, formally expressed as:
Y_t = attention(H_t, C).
Further, the visual language indoor navigation method encodes and extracts the robot's visual information and the natural language information respectively, then performs fusion attention extraction on their feature vectors, fuses all the extracted information with the robot's historical information, evaluates the robot's next step, estimates the probability P of each advancing direction, and determines the direction in which the robot should move from the maximum probability:
D_t = attention(Y_t, v_t);
P = softmax([H_t, v_t, Y_t, D_t] W_c W_b);
where D_t represents the fused feature vector, P represents the probability of the heading, and W_c and W_b respectively represent weight matrices.
The invention also aims to provide a robot visual language navigation information data processing terminal which is used for realizing the visual language indoor navigation method.
Another object of the present invention is to provide a visual language indoor navigation system implementing the visual language indoor navigation method, the visual language indoor navigation system comprising:
a command and information combining module for combining natural language commands with visual information using a sequence-to-sequence approach;
the characteristic extraction module is used for respectively extracting the characteristics of the natural language command information and the visual image information;
and the key information screening module is used for screening the attention characteristics of the extracted characteristics respectively after completing the characteristic extraction, and screening the key information related to the task.
By combining all the above technical schemes, the advantages and positive effects of the invention are: the attention mechanism is a method that takes human attention as its reference; when the human brain processes visual information, it quickly scans the global image to find the key regions that need attention, which greatly improves the efficiency and accuracy of visual processing. The attention mechanism aims to select the key information with important meaning from a large amount of information; it was first borrowed by natural language processing to screen out phrases with important semantics, and has since been widely used in many scenarios such as speech recognition and image processing. The invention combines the robot's visual information with natural language information to perform indoor robot navigation, and adopts an attention mechanism so that the robot can understand human language instructions more effectively and combine them with visual information, allowing the robot to reach the destination and complete the task according to human instructions. The invention mainly designs an attention mechanism that effectively combines natural language and visual information so that the robot can find an optimal path in an unknown room.
The visual language indoor navigation task proposed by the invention needs to combine natural language command information with visual image information; the data volume is large and much key information is involved, so without an attention mechanism the complex data raise the computing-power requirement and the problem of high network complexity must be faced. In order to improve the accuracy and efficiency of information extraction, the invention provides a visual language indoor navigation method based on an attention mechanism.
The invention performs fusion coding of the natural language command information and the visual image information, so that the deep model pays attention to specific local information. It selectively screens out this local information from a large amount of information, focuses on it, encodes it as a feature vector, and then decodes the vector to obtain a command for the robot's action. The attention mechanism is used in the process of encoding the feature vector; the attention extraction of natural language features differs from that of computer vision features, and the fused features also require attention to extract the key information, so that the encoding is more efficient and the extracted information more valuable.
The invention provides an advanced visual language indoor navigation method for robots, which effectively combines natural language commands with visual information and enables the robot to reach a destination in an unknown indoor space according to human commands, bringing navigation closer to real-scene applications. The invention designs an attention mechanism which refines the language features and the visual features, since a large amount of natural language information and visual information must be processed in visual language indoor navigation, and distils the obtained information so that the resulting features are more precise. This reduces the interference of noise and useless features on the model, improves the efficiency of the model, and increases its accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained from the drawings without creative efforts.
Fig. 1 is a flowchart of a visual language indoor navigation method according to an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of a visual language indoor navigation system provided by an embodiment of the present invention;
In the figure: 1. command and information combining module; 2. feature extraction module; 3. key information screening module.
Fig. 3 is a flowchart of an implementation of a visual language indoor navigation method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Aiming at the problems in the prior art, the invention provides a visual language indoor navigation method, a system, a terminal and application thereof, and the invention is described in detail with reference to the accompanying drawings.
As shown in fig. 1, the visual language indoor navigation method provided by the present invention comprises the following steps:
s101: combining natural language commands with visual information by a sequence-to-sequence method;
s102: respectively extracting the characteristics of the natural language command information and the visual image information;
s103: after the feature extraction is completed, the attention features are respectively screened for the extracted features, and key information related to the task is screened out.
Those skilled in the art can also implement the visual language indoor navigation method provided by the present invention by adopting other steps, and the visual language indoor navigation method provided by the present invention in fig. 1 is only one specific embodiment.
As shown in fig. 2, the visual language indoor navigation system provided by the present invention comprises:
a command and information combining module 1 for combining natural language commands and visual information by a sequence-to-sequence method;
the characteristic extraction module 2 is used for respectively extracting the characteristics of the natural language command information and the visual image information;
and the key information screening module 3 is used for screening attention features of the extracted features respectively after feature extraction is completed, and screening key information related to the task.
The technical solution of the present invention is further described below with reference to the accompanying drawings.
As shown in fig. 3, the method provided by the present invention is mainly applied to the robot's language-vision navigation module and does not involve the design of the whole robot; the current implementation mainly relies on a computer to simulate this module. The method specifically includes the following steps (a minimal end-to-end sketch of this loop is given after the list):
firstly, initializing, namely inputting a language description instruction into a robot, wherein the robot is positioned at an initial position;
secondly, extracting natural language features of the language description instruction by using the LSTM;
thirdly, extracting key information of the language description instruction by using a natural language attention mechanism, and screening out interference of irrelevant information;
fourthly, extracting the visual features of the computer by using a CNN convolutional neural network for the acquired image;
fifthly, extracting visual key information from the acquired visual features in the fourth step by using a visual attention mechanism;
sixthly, mutually fusing the extracted visual key information in the fifth step and the key information of the language description instruction in the third step;
seventhly, extracting key information of the features fused in the sixth step by using an attention mechanism again;
eighthly, decoding and evaluating the key information obtained from the seventh step to obtain the advancing direction of the robot;
ninthly, repeating the second step through the eighth step;
and tenthly, reaching the destination and stopping.
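As a reference for this flow, the following is a hypothetical PyTorch simulation of steps one to ten: the dummy tokenized instruction, random camera observations, module sizes, number of candidate headings, and the argmax-until-stop criterion are all illustrative assumptions, not details fixed by the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB, HID, N_VIEWS, N_ACTIONS, STOP = 32, 64, 8, 6, 0   # assumed sizes; action 0 means "stop"

embed = nn.Embedding(100, EMB)                  # word embedding (vocabulary size assumed)
instr_lstm = nn.LSTM(EMB, HID, batch_first=True)
hist_cell = nn.LSTMCell(HID + N_ACTIONS, HID)   # history over [attended view; previous action]
vis_proj = nn.Linear(2048, HID)                 # stand-in projection of ResNet-50 view features
act_head = nn.Linear(4 * HID, N_ACTIONS)

def attend(query, values):
    """Dot-product attention of a query vector over a set of value vectors."""
    alpha = F.softmax(values @ query, dim=0)
    return alpha @ values

# First and second steps: the instruction is given once and encoded with an LSTM.
word_ids = torch.randint(0, 100, (1, 12))       # dummy tokenized language instruction
C = instr_lstm(embed(word_ids))[0].squeeze(0)   # per-word language features, (12, HID)

h, c = torch.zeros(HID), torch.zeros(HID)       # history state of the agent

for step in range(20):                          # ninth step: repeat until stop or step limit
    V_t = vis_proj(torch.randn(N_VIEWS, 2048))  # fourth step: per-view visual features (dummy)
    v_t = attend(h, V_t)                        # fifth step: visual attention
    Y_t = attend(h, C)                          # third step: language attention
    D_t = attend(Y_t, torch.stack([v_t, Y_t]))  # sixth/seventh steps: fuse and re-attend (illustrative)
    P = F.softmax(act_head(torch.cat([h, v_t, Y_t, D_t])), dim=0)  # eighth step: heading probabilities
    a = int(P.argmax())
    if a == STOP:                               # tenth step: destination reached, stop advancing
        break
    A_prev = F.one_hot(torch.tensor(a), N_ACTIONS).float()
    h, c = hist_cell(torch.cat([v_t, A_prev]).unsqueeze(0),
                     (h.unsqueeze(0), c.unsqueeze(0)))
    h, c = h.squeeze(0), c.squeeze(0)
```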
In the invention, for computer vision images, a ResNet-50 network is adopted to extract features. Before feature extraction, the ResNet-50 network is pre-trained on the ImageNet data set; the trained ResNet-50 network is then used to extract feature vectors, giving the feature vector V_t of the panoramic image observed by the robot at time t.
The attention feature vector v_t is extracted using the attention mechanism:
v_t = attention(H_{t-1}, V_t)    (1)
which expands to:
v_t = Σ_j softmax(H_{t-1} W_h (W_v V_t)^T) V_t    (2)
H_t = LSTM([V_t; A_{t-1}], H_{t-1})    (3)
where v_t represents the feature vector extracted by the attention mechanism, V_t represents the feature vector extracted by the trained ResNet-50 network, H_{t-1} represents the historical feature vector at time t-1 and H_t the historical feature vector at time t, A_t and A_{t-1} represent the actions taken by the machine at time t and time t-1 respectively, and W_h and W_v represent weight matrices.
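A minimal sketch of formulas (1)-(3) is given below, assuming torchvision's ImageNet-pretrained ResNet-50 (torchvision 0.13 or later) as the visual backbone; the number of panoramic views, all tensor sizes, the random weight matrices, and the use of the attended view feature in the history update are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

# Trained ResNet-50 with the classification head removed, used purely as a feature extractor.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()
backbone.eval()

views = torch.randn(8, 3, 224, 224)             # 8 views of the panorama at time t (dummy images)
with torch.no_grad():
    V_t = backbone(views)                       # V_t: (8, 2048), one feature vector per view

d_h, d_feat = 512, V_t.shape[1]
W_h = torch.randn(d_h, d_h)                     # weight matrices W_h and W_v of formula (2)
W_v = torch.randn(d_h, d_feat)
H_prev = torch.randn(d_h)                       # historical feature vector H_{t-1}

# Formula (2): v_t = sum_j softmax(H_{t-1} W_h (W_v V_t)^T)_j V_t,j
scores = (H_prev @ W_h) @ (V_t @ W_v.T).T       # one attention score per view, (8,)
alpha = F.softmax(scores, dim=0)
v_t = alpha @ V_t                               # attended visual feature, (2048,)

# Formula (3): H_t = LSTM([V_t; A_{t-1}], H_{t-1}); here the attended view feature and a
# one-hot previous action stand in for the concatenated input.
n_actions = 6
A_prev = F.one_hot(torch.tensor(2), n_actions).float()
hist_cell = nn.LSTMCell(d_feat + n_actions, d_h)
H_t, c_t = hist_cell(torch.cat([v_t, A_prev]).unsqueeze(0),
                     (H_prev.unsqueeze(0), torch.zeros(1, d_h)))
```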
In the present invention, for an input string of natural language instructions W = (w_1, w_2, w_3, ...), which is composed of a string of words, features are extracted using the LSTM, i.e. C = LSTM(W), where C is the feature extracted from the natural language; the natural language features then need to be re-extracted using the attention mechanism, formally expressed as:
Y_t = attention(H_t, C)    (4)
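A minimal sketch of C = LSTM(W) and formula (4) follows; the vocabulary size, embedding and hidden dimensions, and the dot-product form of the attention score are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, emb_dim, hid = 1000, 64, 512
embedding = nn.Embedding(vocab, emb_dim)
lstm = nn.LSTM(emb_dim, hid, batch_first=True)
W_q = nn.Linear(hid, hid, bias=False)           # query projection used in the attention score

word_ids = torch.randint(0, vocab, (1, 15))     # W = (w_1, w_2, ..., w_15), a tokenized instruction
C = lstm(embedding(word_ids))[0].squeeze(0)     # C = LSTM(W): per-word features, (15, hid)

H_t = torch.randn(hid)                          # historical feature vector at time t
scores = C @ W_q(H_t)                           # one relevance score per word, (15,)
alpha = F.softmax(scores, dim=0)
Y_t = alpha @ C                                 # formula (4): attended language feature, (hid,)
```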
In the invention, after the robot's visual information and the natural language information are respectively encoded and extracted, and because the natural language information is a high-level description of the visual information so that the two are strongly correlated, fusion attention extraction is performed on the feature vectors of the visual information and the natural language information; all the extracted information is then fused with the robot's historical information, the robot's next step is evaluated, the probability P of each advancing direction is estimated, and the direction in which the robot should move is determined from the maximum probability:
D_t = attention(Y_t, v_t)    (5)
P = softmax([H_t, v_t, Y_t, D_t] W_c W_b)    (6)
where D_t represents the fused feature vector, P represents the probability of the heading, and W_c and W_b respectively represent weight matrices.
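A minimal sketch of formulas (5) and (6) follows; the concrete form of the attention used for D_t (here a similarity gate between the two vectors), the tensor sizes, and the number of candidate headings are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

hid, n_actions = 512, 6
H_t = torch.randn(hid)                          # historical feature vector
v_t = torch.randn(hid)                          # attended visual feature (projected to hid)
Y_t = torch.randn(hid)                          # attended language feature

# Formula (5): D_t = attention(Y_t, v_t); here the language feature is re-weighted by its
# similarity to the visual feature, one plausible reading of attention between two vectors.
W_d = torch.randn(hid, hid)
gate = torch.sigmoid((Y_t @ W_d) @ v_t)         # scalar relevance of Y_t to v_t
D_t = gate * Y_t                                # fused feature vector

# Formula (6): P = softmax([H_t, v_t, Y_t, D_t] W_c W_b)
W_c = torch.randn(4 * hid, 256)
W_b = torch.randn(256, n_actions)
P = F.softmax(torch.cat([H_t, v_t, Y_t, D_t]) @ W_c @ W_b, dim=0)
direction = int(P.argmax())                     # the robot moves toward the most probable heading
```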
The effect of the invention was tested on the public simulation data set R2R, which collects data from 99 different scenes. The test results show that the method provided by the invention significantly improves navigation performance; the test results are shown in Table 1.
TABLE 1 test results
Method TL↓ NE↓ OSR↑ SR↑ SPL↑
Seq2Seq 8.40 3.67 0.43 0.25 0.35
RCM 10.65 3.53 0.75 0.46 0.43
Our 7.86 3.54 0.78 0.53 0.58
In Table 1, Our represents the method proposed by the invention, Seq2Seq represents an existing basic navigation method, and RCM represents another well-known navigation method. TL is the trajectory-length evaluation, NE is the navigation-error evaluation, OSR is the oracle success rate, SR is the success rate, and SPL is the success rate weighted by (normalized inverse) path length; these five indexes are internationally accepted indexes for evaluating navigation accuracy. A downward arrow indicates that a smaller value is better under that criterion, an upward arrow indicates that a larger value is better, and bold indicates the best result. It can be seen from the table that, among the five evaluation indexes, the method provided by the invention obtains the best result on four and the second-best result on the remaining one.
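For reference, SR and SPL as they are commonly defined for R2R-style benchmarks (success rate, and success weighted by normalized path length) can be computed as in the sketch below; the sample episodes are made up purely for illustration.

```python
def navigation_metrics(episodes):
    """episodes: list of dicts with 'success' (bool), 'path_length' (length of the agent's
    trajectory) and 'shortest_path' (ground-truth shortest path length), e.g. in meters."""
    spl_terms = [
        (1.0 if ep["success"] else 0.0)
        * ep["shortest_path"] / max(ep["path_length"], ep["shortest_path"])
        for ep in episodes
    ]
    sr = sum(1.0 for ep in episodes if ep["success"]) / len(episodes)
    spl = sum(spl_terms) / len(episodes)
    return sr, spl

demo = [
    {"success": True,  "path_length": 9.0,  "shortest_path": 8.0},
    {"success": True,  "path_length": 8.0,  "shortest_path": 8.0},
    {"success": False, "path_length": 12.0, "shortest_path": 7.0},
]
print(navigation_metrics(demo))                 # -> approximately (0.667, 0.630)
```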
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating specific embodiments of the present invention and is not intended to limit the scope of protection; all modifications, equivalents and improvements made within the spirit and scope of the invention as defined by the appended claims are intended to be covered.

Claims (8)

1. A visual language indoor navigation method is characterized in that a sequence-to-sequence method is utilized, natural language commands and visual information are combined, feature extraction is respectively carried out on natural language command information and visual image information, after the feature extraction is completed, attention features are respectively screened on the extracted features, and key information related to tasks is screened out.
2. The visual language indoor navigation method of claim 1, wherein the visual language indoor navigation method performs fusion coding of natural language command information and visual image information so that a deep model pays attention to specific local information; the method selectively screens out the local information from a large amount of information, focuses on it, encodes it as a feature vector, and decodes the vector to obtain a command for the robot action.
3. The visual language indoor navigation method of claim 1, wherein the visual language indoor navigation method specifically comprises:
firstly, initializing, namely inputting a language description instruction into a robot, wherein the robot is positioned at an initial position;
secondly, extracting natural language features of the language description instruction by using the LSTM;
thirdly, extracting key information of the language description instruction by using a natural language attention mechanism, and screening out interference of irrelevant information;
fourthly, extracting the visual features of the computer by using a CNN convolutional neural network for the acquired image;
fifthly, extracting visual key information from the acquired visual features in the fourth step by using a visual attention mechanism;
sixthly, mutually fusing the extracted visual key information in the fifth step and the key information of the language description instruction in the third step;
seventhly, extracting key information of the features fused in the sixth step by using an attention mechanism again;
eighthly, decoding and evaluating the key information obtained from the seventh step to obtain the advancing direction of the robot;
ninthly, repeating the second step through the eighth step;
and tenthly, reaching the destination and stopping.
4. The visual language indoor navigation method of claim 3, wherein the visual language indoor navigation method adopts a ResNet-50 network for feature extraction, the ResNet-50 network is pre-trained on the ImageNet data set before extracting features, and the trained ResNet-50 network is used to extract feature vectors, giving the feature vector V_t of the panoramic image observed by the robot at time t;
the attention feature vector v_t is extracted using the attention mechanism:
v_t = attention(H_{t-1}, V_t);
which expands to:
v_t = Σ_j softmax(H_{t-1} W_h (W_v V_t)^T) V_t
H_t = LSTM([V_t; A_{t-1}], H_{t-1});
where v_t represents the feature vector extracted by the attention mechanism, V_t represents the feature vector extracted by the trained ResNet-50 network, H_{t-1} represents the historical feature vector at time t-1 and H_t the historical feature vector at time t, A_t and A_{t-1} represent the actions taken by the machine at time t and time t-1 respectively, and W_h and W_v represent weight matrices.
5. The visual language indoor navigation method of claim 3, wherein for an input string of natural language instructions W = (w_1, w_2, w_3, ...), which is composed of a string of words, features are extracted using the LSTM, i.e. C = LSTM(W), where C is the feature extracted from the natural language, and the natural language features are re-extracted using the attention mechanism, formally expressed as:
Y_t = attention(H_t, C).
6. The visual language indoor navigation method of claim 3, wherein the visual language indoor navigation method encodes and extracts the robot's visual information and the natural language information respectively, then performs fusion attention extraction on their feature vectors, fuses all the extracted information with the robot's historical information, evaluates the robot's next step, estimates the probability P of each advancing direction, and determines the direction in which the robot should move from the maximum probability:
D_t = attention(Y_t, v_t);
P = softmax([H_t, v_t, Y_t, D_t] W_c W_b);
where D_t represents the fused feature vector, P represents the probability of the heading, and W_c and W_b respectively represent weight matrices.
7. A robot visual language navigation information data processing terminal, characterized in that the robot visual language navigation information data processing terminal is used for realizing the visual language indoor navigation method of any one of claims 1 to 6.
8. A visual language indoor navigation system for implementing the visual language indoor navigation method according to any one of claims 1 to 6, wherein the visual language indoor navigation system comprises:
a command and information combining module for combining natural language commands with visual information using a sequence-to-sequence approach;
the characteristic extraction module is used for respectively extracting the characteristics of the natural language command information and the visual image information;
and the key information screening module is used for screening the attention characteristics of the extracted characteristics respectively after completing the characteristic extraction, and screening the key information related to the task.
CN202011428332.1A 2020-12-07 2020-12-07 Visual language indoor navigation method, system, terminal and application Active CN112710310B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011428332.1A CN112710310B (en) 2020-12-07 2020-12-07 Visual language indoor navigation method, system, terminal and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011428332.1A CN112710310B (en) 2020-12-07 2020-12-07 Visual language indoor navigation method, system, terminal and application

Publications (2)

Publication Number Publication Date
CN112710310A true CN112710310A (en) 2021-04-27
CN112710310B CN112710310B (en) 2024-04-19

Family

ID=75542756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011428332.1A Active CN112710310B (en) 2020-12-07 2020-12-07 Visual language indoor navigation method, system, terminal and application

Country Status (1)

Country Link
CN (1) CN112710310B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130237179A1 (en) * 2012-03-08 2013-09-12 Rajesh Chandra Potineni System and method for guided emergency exit
CN108245384A (en) * 2017-12-12 2018-07-06 清华大学苏州汽车研究院(吴江) Binocular vision apparatus for guiding blind based on enhancing study
CN108981712A (en) * 2018-08-15 2018-12-11 深圳市烽焌信息科技有限公司 Robot goes on patrol method and robot
CN109491613A (en) * 2018-11-13 2019-03-19 深圳龙岗智能视听研究院 A kind of continuous data protection storage system and its storage method using the system
CN110348462A (en) * 2019-07-09 2019-10-18 北京金山数字娱乐科技有限公司 A kind of characteristics of image determination, vision answering method, device, equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘文清: "Machine Vision Application Development Technology Based on an AI Open Platform" (基于AI开放平台的机器视觉应用开发技术), Hunan Electric Power (湖南电力), vol. 39, no. 6, pages 13-15 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420606A (en) * 2021-05-31 2021-09-21 华南理工大学 Method for realizing autonomous navigation of robot based on natural language and machine vision
CN113984052A (en) * 2021-06-16 2022-01-28 北京小米移动软件有限公司 Indoor navigation method, indoor navigation device, equipment and storage medium
CN113984052B (en) * 2021-06-16 2024-03-19 北京小米移动软件有限公司 Indoor navigation method, indoor navigation device, equipment and storage medium
CN113670310A (en) * 2021-07-27 2021-11-19 际络科技(上海)有限公司 Visual voice navigation method, device, equipment and storage medium
CN113670310B (en) * 2021-07-27 2024-05-31 际络科技(上海)有限公司 Visual voice navigation method, device, equipment and storage medium
CN115082915A (en) * 2022-05-27 2022-09-20 华南理工大学 Mobile robot vision-language navigation method based on multi-modal characteristics
CN115082915B (en) * 2022-05-27 2024-03-29 华南理工大学 Multi-modal feature-based mobile robot vision-language navigation method

Also Published As

Publication number Publication date
CN112710310B (en) 2024-04-19

Similar Documents

Publication Publication Date Title
CN112710310A (en) Visual language indoor navigation method, system, terminal and application
CN113705769B (en) Neural network training method and device
CN110766044B (en) Neural network training method based on Gaussian process prior guidance
US20220004744A1 (en) Human posture detection method and apparatus, device and storage medium
CN108960407B (en) Recurrent neural network language model training method, device, equipment and medium
US11915128B2 (en) Neural network circuit device, neural network processing method, and neural network execution program
CN112580369B (en) Sentence repeating method, method and device for training sentence repeating model
Mishra et al. The understanding of deep learning: A comprehensive review
CN109766557B (en) Emotion analysis method and device, storage medium and terminal equipment
CN109817276A (en) A kind of secondary protein structure prediction method based on deep neural network
CN114676234A (en) Model training method and related equipment
CN111626134B (en) Dense crowd counting method, system and terminal based on hidden density distribution
CN112733768A (en) Natural scene text recognition method and device based on bidirectional characteristic language model
CN111259735B (en) Single-person attitude estimation method based on multi-stage prediction feature enhanced convolutional neural network
CN110807335A (en) Translation method, device, equipment and storage medium based on machine learning
CN113159236A (en) Multi-focus image fusion method and device based on multi-scale transformation
US11948078B2 (en) Joint representation learning from images and text
CN113747168A (en) Training method of multimedia data description model and generation method of description information
CN116739071A (en) Model training method and related device
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
CN112905754B (en) Visual dialogue method and device based on artificial intelligence and electronic equipment
CN113407820A (en) Model training method, related system and storage medium
CN116975347A (en) Image generation model training method and related device
CN116434058A (en) Image description generation method and system based on visual text alignment
CN115824213A (en) Visual language navigation method based on follower model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant