CN113627249B

CN113627249B - Navigation system training method and device based on contrast learning and navigation system

Info

Publication number: CN113627249B
Application number: CN202110759056.5A
Authority: CN
Inventors: 梁小丹; 龙衍鑫; 林冰倩
Original assignee: Sun Yat Sen University; Sun Yat Sen University Shenzhen Campus
Current assignee: Sun Yat Sen University; Sun Yat Sen University Shenzhen Campus
Priority date: 2021-07-05
Filing date: 2021-07-05
Publication date: 2023-04-28
Anticipated expiration: 2041-07-05
Also published as: CN113627249A

Abstract

The invention discloses a navigation system training method and device based on contrast learning and a navigation system, wherein the method comprises the following steps: collecting modal information of an intelligent agent in different modes when the intelligent agent moves, and encoding the modal information into feature vectors; when determining that the intelligent agent stops moving, acquiring a hidden state vector according to the characteristic vector; performing track coding on the hidden state vector to obtain track coding data; and calling a training model for performing contrast resistance learning on the track coding data by using preset obstacle scene track data and preset barrier-free scene track data to obtain a navigation training system. The invention can optimize the long-term planning capability of the intelligent body through contrast learning and training, can greatly improve the robustness of the navigator under the obstacle condition and improve the navigation accuracy.

Description

Navigation system training method and device based on contrast learning and navigation system

Technical Field

The invention relates to the technical field of visual language navigation, in particular to a navigation system training method and device based on contrast learning and a navigation system.

Background

The visual language navigation refers to that an intelligent body is guided along with natural language instructions, meanwhile, the instructions and image information visible in the visual angle are understood, then the state of the intelligent body is adjusted and repaired in the environment, corresponding actions are made, and finally the intelligent body reaches the target position.

For successful navigation, visual language navigation allows the agent to understand the intent of the instruction and gradually base the instruction on surrounding observations, in turn making the correct action decisions to move in dynamically changing scenes. The current common training method is to design a cross modal information alignment module for training, or train through a data enhancement strategy or train through a learning example, so that the navigation accuracy of the visual language navigation system is improved.

However, the training methods commonly used at present have the following technical problems: because the training modes are all to presume obstacle points to train, the intelligent agent uses the predefined candidate operation space to navigate without interference, but in the actual use process, unexpected obstacles (road shoulders, road blocks or various furniture buildings and the like) can appear at different positions, so that the intelligent agent can not find a driving path under the unexpected obstacles, and the intelligent agent can not reach the final navigation point.

Disclosure of Invention

The invention provides a navigation system training method, a device and a navigation system based on contrast learning, wherein the method can lead visual language navigation to carry out contrast learning training in an environment without barriers and under barrier conditions so as to improve the accuracy of navigation of an intelligent body under different conditions.

A first aspect of an embodiment of the present invention provides a navigation system training method based on contrast learning, the method including:

collecting modal information of an intelligent agent in different modes when the intelligent agent moves, and encoding the modal information into feature vectors;

when determining that the intelligent agent stops moving, acquiring a hidden state vector according to the characteristic vector;

performing track coding on the hidden state vector to obtain track coding data;

and calling preset obstacle scene track data and preset barrier-free scene track data to perform a training model of contrast resistance learning on the track coding data to obtain a navigation training system, wherein the preset obstacle scene track data is navigation track data of an agent under the barrier condition, and the preset barrier-free scene track data is navigation track data of the agent under the barrier-free condition.

In a possible implementation manner of the first aspect, the invoking a training model for performing contrast learning on the track encoded data by using preset obstacle scene track data and preset no-obstacle scene track data includes:

performing contrast training on the preset obstacle scene track data and the preset barrier-free scene track data to respectively obtain training obstacle scene track data and training barrier-free scene track data;

the training obstacle scene track data and the training obstacle-free scene track data are zoomed in through a comparison loss function and a gradient descent algorithm to obtain zoomed-in track data;

and training the track coding data by utilizing the zoomed-in track data.

In a possible implementation manner of the first aspect, the performing contrast training on the preset obstacle-free scene track data and the preset obstacle track data includes:

and performing contrast training by taking the preset obstacle scene track data as a negative sample of the preset barrier-free scene track data.

In a possible implementation manner of the first aspect, the modality information includes instruction information composed of natural language and image information acquired by a plurality of viewpoints, and the feature vectors include language feature vectors and visual feature vectors;

The encoding the modality information into feature vectors includes:

calculating the instruction information by adopting a preset two-way long-short-term memory network to obtain an instruction feature vector;

and calculating the image information by using a preset convolutional neural network to obtain a visual feature vector, wherein the preset convolutional neural network is obtained by training an ImageNet data set.

In a possible implementation manner of the first aspect, the obtaining a hidden state vector according to the feature vector includes:

and inputting the characteristic vector into a preset hidden long-short-term memory neural network to be converted into a hidden state vector.

In a possible implementation manner of the first aspect, the performing track encoding on the hidden state vector to obtain track encoded data includes:

acquiring a plurality of hidden state vectors when an intelligent agent starts to move to stop;

forming the plurality of hidden state vectors into a serialized hidden state set;

and carrying out track coding on the hidden state set by adopting a preset coding long-period and short-period memory network to obtain track coding data.

In a possible implementation manner of the first aspect, the method further includes:

gradient calculation is carried out on the navigation training system by adopting a learning loss function, so as to obtain gradient parameters;

And carrying out parameter updating of gradient calculation by adopting the gradient parameters.

In a possible implementation manner of the first aspect, before the step of determining that the agent stops moving, the method further includes:

performing coding calculation on the feature vector to obtain a navigation probability value set of the moving of the agent to N directions, wherein the navigation probability value set comprises N corresponding movement probability values and a stopping probability value, each movement probability value corresponds to a movement direction, and N is a positive integer greater than or equal to 0;

selecting the probability value with the largest numerical value from the navigation probability value set as a target probability value;

if the target probability value is any one of the N movement probability values, controlling the intelligent body to move according to the movement direction corresponding to the target probability value;

and if the target probability value is the stopping probability value, controlling the intelligent agent to stop moving.

A second aspect of an embodiment of the present invention provides a navigation system training device based on contrast learning, the device including:

the acquisition module is used for acquiring the modal information of the intelligent agent in different modes when the intelligent agent moves, and encoding the modal information into feature vectors;

The acquisition module is used for acquiring a hidden state vector according to the characteristic vector when the intelligent agent is determined to stop moving;

the encoding module is used for carrying out track encoding on the hidden state vector to obtain track encoded data;

the training module is used for calling preset obstacle scene track data and preset barrier-free scene track data to perform contrast learning on the track coding data to obtain a navigation training system, wherein the preset obstacle scene track data is navigation track data of an agent under barrier conditions, and the preset barrier-free scene track data is navigation track data of the agent under barrier-free conditions.

A third aspect of the embodiments of the present invention provides a navigation system based on contrast learning, including:

the system comprises a visual language coding module, an action decoding module, a navigation module, a track coding module and an antagonism contrast learning module which are connected in sequence;

the visual language coding module is used for collecting instruction information and image information, converting the instruction information into language feature vectors and converting the image information into visual feature vectors;

The action decoding module is used for generating a hidden state vector and a navigation probability value set by adopting the language feature vector and the visual feature vector;

the navigation module is used for controlling the intelligent body to move or stop by adopting the navigation probability value set;

the track coding module is used for converting the hidden state vector into track coding data;

the contrast learning module is used for performing contrast training on the track coding data to generate a navigation model.

Compared with the prior art, the navigation system training method, the device and the navigation system based on contrast learning have the beneficial effects that: the intelligent navigation system can enable the intelligent body to acquire instruction information and image information under different modes in the moving process, convert the instruction information and the image information in the moving process into the data of the specified track to be moved after stopping moving, and conduct contrast-resistant learning and training on the moving track data, so that the long-term planning capability of the intelligent body is optimized, the robustness of the navigator under the obstacle condition can be greatly improved, and the navigation accuracy is improved.

Drawings

FIG. 1 is a flow chart of a method for training a navigation system based on contrast learning according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating operations for contrast training according to one embodiment of the present invention;

FIG. 3 is a comparative schematic diagram of a navigation result sample provided by an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a navigation system training device based on contrast learning according to an embodiment of the present invention;

fig. 5 is a schematic structural diagram of a navigation system based on contrast learning according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The currently common training mode has the following technical problems: because the training modes are all to presume obstacle points to train, the intelligent agent uses the predefined candidate operation space to navigate without interference, but in the actual use process, unexpected obstacles (road shoulders, road blocks or various furniture buildings and the like) can appear at different positions, so that the intelligent agent can not find a driving path under the unexpected obstacles, and the intelligent agent can not reach the final navigation point.

In order to solve the above-mentioned problems, a navigation system training method based on contrast learning provided in the embodiments of the present application will be described and illustrated in detail by the following specific examples.

Referring to fig. 1, a flowchart of a navigation system training method based on contrast learning is shown in an embodiment of the present invention.

The method for training the navigation system based on contrast learning can be applied to a navigation system which can be arranged in an agent to control the movement of the agent in a navigation way.

Wherein, as an example, the navigation system training method based on contrast learning may include:

s11, collecting the modal information of the intelligent agent in different modes when the intelligent agent moves, and encoding the modal information into feature vectors.

In order to improve training efficiency, in this embodiment, the agent may be set in a preset scene, and the scene may include a plurality of modifiable or movable obstacle points, so that the scene may exhibit different modes.

It should be noted that different modes refer to movement states of the agent in different modes of the scene.

After information of the intelligent agent in different modes is collected, the information can be encoded to obtain a feature vector, so that subsequent training can be carried out through the feature vector.

In this embodiment, the modality information includes instruction information composed of natural language and image information acquired by a plurality of viewpoints, and the feature vector includes a language feature vector and a visual feature vector.

The instruction information may be voice data that instructs the agent to move. The image information may be image data of the agent in different modalities.

As an example, step S11 may include the following sub-steps:

and S111, calculating the instruction information by adopting a preset two-way long-short-term memory network to obtain an instruction feature vector.

In a specific implementation, the instruction information composed of natural language is i= { w _o ,…, _l -w is _i Is a word or symbol and l is the length of the sentence. The instruction information may include stepwise instruction contents from the start point to the target. The instruction information I can be calculated by utilizing a two-way long-short-term memory network to obtain the feature vector of the language, which is specifically calculated as

And step S112, calculating the image information by using a preset convolutional neural network to obtain a visual feature vector, wherein the preset convolutional neural network is obtained by training an ImageNet data set.

In this embodiment, a certain number of discrete view points may be set in the scene, and images of the agent under different modes may be acquired according to the view points, and it may also be determined whether the agent has reached the endpoint.

In a specific implementation, panoramic image information of a current point is acquired at different view angles according to a first person view angle to obtain image information; the shortest path between them is then taken, navigation is started from the start point in the manner of a first person perspective and then the path is manually described, which generates corresponding instruction information, for example, "Exit the bedroom, cross the room to the left and stop near the room divider on the left.

When the panoramic visual image information is obtained, the image information can be divided into a plurality of pictures at equal intervals, and then the images are encoded by using a convolutional neural network which is pre-trained on an image data set to obtain visual feature vectors.

In particular, the image information may be divided into 36 pictures

Each picture may contain an RGB map b _t,i And its direction->

Wherein->

And->

Heading angle and elevation angle, respectively. Then, the convolutional neural network pre-trained on the ImageNet data set is used for respectively calculating 36 pictures to obtain visual eigenvectors +.>

If the agent performs training once every moving direction, the processing time of the data is prolonged, the training efficiency is reduced, and the movement in a single direction is difficult to represent the state of the agent in the whole moving path, in order to perfect the training operation, in one embodiment, before step S12, the method may further include the following steps:

And S21, performing coding calculation on the feature vectors to obtain a navigation probability value set of the intelligent body moving towards N directions, wherein the navigation probability value set comprises N corresponding movement probability values and a stopping probability value, each movement probability value corresponds to a movement direction, and N is a positive integer greater than or equal to 0.

Specifically, the language feature vector and the visual feature vector may be input to the first attention mechanism module, the direction in which N agents can navigate and move and a stopping direction (the stopped feature vector is denoted by all 1) are output, then the direction in which N agents can navigate and move and a stopping direction are input to the second attention mechanism module, i.e. n+1 probability values including N movement probability values corresponding to the N directions and a stopping probability value corresponding to the stopping direction are output, and finally n+1 probability values are integrated into a navigation probability value set.

And S22, selecting the probability value with the largest numerical value from the navigation probability value set as a target probability value.

And S23, if the target probability value is any one of the N movement probability values, controlling the intelligent body to move according to the movement direction corresponding to the target probability value.

And step S24, if the target probability value is the stopping probability value, controlling the intelligent body to stop moving.

Then, the probability value with the largest value is selected from the N+1 probability values to be the target probability value of the last operation; if the target probability value is any one of the N movement probability values, the intelligent agent is controlled to move continuously according to the movement direction corresponding to the target probability value; and if the target probability value is the stopping probability value, controlling the intelligent agent to stop moving.

In actual operation, the agent may receive or collect instruction information and image information at each time t to generate a hidden state vector h _t And motion decision vector a _t . Then by hiding the state vector h _t The agent can predict the next action a ^′ _t (i.e., the next viewpoint it will move to). Next, the next a is identified in the obstacle scene ^′ _t Whether it is an obstacle point. If a is ^′ _t Is an obstacle point, the agent will again make action decisions or stop for the remaining navigable points. By this, the forced agent can search other paths of the target, and if stopped, all hidden state vectors can be integrated into a hidden state set

The hidden state set is input to a track encoder to generate track encoded data.

And S12, when the fact that the intelligent agent stops moving is determined, acquiring a hidden state vector according to the feature vector.

If the agent stops moving, it is determined that the agent may encounter an obstacle such that the agent's current location has no direction of movement, or the agent has reached the destination. When the intelligent agent stops moving, the hidden state vector of the intelligent agent in the moving process can be obtained according to the language feature vector and the visual feature vector, so that the hidden state vector can be obtained according to the hidden state vector.

In an alternative embodiment, since the agent may perform movement in multiple directions during the movement process, the hidden state vector corresponding to the present feature vector may be collected after each feature vector is obtained.

In order to improve the processing efficiency of the data, in an alternative embodiment, step S12 may include the following sub-steps:

and S121, inputting the characteristic vector into a preset hidden long-short-term memory neural network to convert the characteristic vector into a hidden state vector.

In actual operation, at each moving moment, the language feature vector and the visual feature vector can be respectively input into the first attention mechanism module, and then the output result of the first attention mechanism module is input into a preset hidden long-short-period memory neural network to obtain a hidden state vector.

Each hidden state vector may implicitly represent a fusion of the visual feature vector and the linguistic feature vector at that time.

S13, carrying out track coding on the hidden state vector to obtain track coding data.

Because the instruction information describes the navigation track of the intelligent agent, the path of the intelligent agent after the intelligent agent navigates according to the instruction information is track coding data, and the track coding data are serialized, and specifically comprise all visual information and language information on the track. The hidden state vector is a fusion condition comprising a visual feature vector and a language feature vector at each moment, and can be specifically encoded, so that corresponding serialization information, namely track encoding data, can be obtained to represent a path of an intelligent agent.

Since the agent may make multiple movements or movements in multiple directions, the hidden state vectors included in the movement track are multiple, in order to accurately calculate the track coding data of the agent in movement, in one embodiment, step S13 may include the following substeps:

in the substep S131, a plurality of hidden state vectors are acquired when the agent moves from the start to the stop.

Specifically, each time after the language feature vector and the visual feature vector are obtained, a hidden state vector is generated and recorded once, until the agent stops moving, a plurality of previously recorded hidden state vectors are obtained.

Substep S132, constructing the plurality of hidden state vectors into a serialized hidden state set.

Because each hidden state vector is serialized information, a plurality of hidden state vectors can be constructed into a serialized set to obtain a hidden state set.

And S133, performing track coding on the hidden state set by adopting a preset coding long-period and short-period memory network to obtain track coding data.

The serialized hidden state set can be encoded by a preset encoding long-term memory network to obtain track encoded data.

The track coding data comprises the determination of the movement action of the intelligent agent in each navigation and the navigation directions among different times, so that the coding of the whole track of the intelligent agent is obtained, the long-term planning capacity of the intelligent agent is enhanced, and the long-term planning capacity of the intelligent agent is improved.

It should be noted that the preset encoding long-term and short-term memory network may be set in the encoder, and the hidden state set may be input to the encoder to obtain the corresponding track encoded data.

The encoder may receive a set of hidden states

Wherein h is _t Is the hidden state of the recurrent neural network in the decoder, generates the track code e, e=lstm (h ₁ ,…,h _T ) Wherein T represents each time step, T represents the navigation plot length, and the hidden state set is encoded to obtain track encoded data.

S14, invoking a training model for performing contrast resistance learning on the track coding data by using preset obstacle scene track data and preset barrier-free scene track data to obtain a navigation training system, wherein the preset obstacle scene track data is navigation track data of an agent under barrier conditions, and the preset barrier-free scene track data is navigation track data of the agent under barrier-free conditions.

Referring to fig. 2, an operation flowchart of contrast training provided by an embodiment of the present invention is shown, in which two different scenes (an obstacle scene and an unobstructed scene) may be provided, and a model is trained by performing contrast learning in the obstacle scene and the unobstructed scene.

In an alternative embodiment, in order to improve training efficiency, the obstacle scene and the non-obstacle scene may be set in a training sample, where the training sample may include preset obstacle scene track data e _o And preset barrier-free scene track data e _f Preset obstacle scene trajectory data e _o And preset barrier-free scene track data e _f The intelligent agent is obtained by navigation calculation under the obstacle condition and the non-obstacle condition respectively. The acquisition mode can also be obtained through calculation through the step flow.

Then reuse the obstacle scene track data e _o And barrier-free scene track data e _f And performing model training on the track coding data to obtain a navigation training system, so that the navigation training system can be provided with the functions of dealing with obstacles and no obstacles, and the navigation stability and the robustness of the navigation training system can be improved.

In order for the navigation training system to learn slowly and to converge to navigate out a trajectory similar to that taken under unobstructed conditions during the training process, to avoid the effects of obstructions, step S14 may include, as an example, the sub-steps of:

and S141, performing contrast training on the preset obstacle scene track data and the preset barrier-free scene track data to obtain training obstacle scene track data and training barrier-free scene track data respectively.

The preset obstacle scene track data and the preset barrier-free scene track data are subjected to contrast training, so that the learning convergence efficiency of the model can be improved, and the data processing efficiency is improved.

To further optimize the obstacle scene trajectory data in an obstacle conditioned environment, wherein, as an example, the sub-step S141 may comprise the following sub-steps:

in the substep S1411, the preset obstacle scene track data is taken as a negative sample of the preset barrier-free scene track data to perform contrast training.

Specifically, referring to fig. 2, in the setting of the obstacle-free scene, the preset obstacle scene trajectory data e _o Can also be used as preset barrier-free scene track data e _f Is a negative sample of (a). Under the condition of obstacle, the preset obstacle scene track data e _o Is forced to approach the preset barrier-free scene track data e _f In the unobstructed scene, the preset unobstructed scene track data e _f Optimized to be matched with preset obstacle scene track data e _o The differences are as follows:

sim (a,) represents the similarity between a and b. f, o represent an unobstructed setting and a setting with an obstructed condition, respectively. Track coding under two different settings can be jointly optimized by contrast training to further improve the robustness of the navigator.

And S142, pulling the training obstacle scene track data and the training obstacle-free scene track data by comparing a loss function and a gradient descent algorithm to obtain pulled track data.

In particular, contrast loss l can be exploited _o And gradient descent algorithm, training to pull-in e _o And e _f The intelligent agent will learn to imitate the barrier-free track, and then gradually learn to converge to navigate out the track which is similar to the track which is walked out under the barrier-free condition in the training process under the barrier scene, so as to avoid the influence caused by the barrier.

In an unobstructed scene, by collecting unobstructedTrack code e _f And navigating the obtained trajectory codes along the supervised path to construct positive sample pairs for the specific instance. While the agent will learn to make decisions in the long term by mimicking the supervised trajectory.

In this embodiment, different training sample examples may include different negative samples, which may be represented as S ^- . Alternatively, the negative samples are track samples formed by fixing the same starting point and end point and then randomly sampling the connection of nodes, and are expressed by re-track coding.

Wherein, under the obstacle scene, the contrast loss l _o Can be represented by the following formula:

where τ is a temperature parameter.

In an unobstructed scene, contrast loss l _f Can be represented by the following formula:

where λ is the weight between two penalty terms.

In practice, in both unobstructed and obstructed scenarios, different instances of micro-batch processing may also be used as negative examples to encourage more discriminatory trajectory representations.

Substep S143, using the zoomed-in trajectory data to simulate training the trajectory encoding data.

Then, the track coding data can be simulated and trained by using the zoomed track data, so as to obtain the lead training system.

In order to perform parameter updating, learning ability of the agent is improved. In an alternative embodiment, the method may further comprise the steps of:

and S15, carrying out gradient calculation on the navigation training system by adopting a learning loss function to obtain gradient parameters. In the present embodiment, the learning loss function is as follows

S16, carrying out parameter updating of gradient calculation by adopting the gradient parameters.

After the gradient parameters are calculated, the parameters can be updated according to the returned gradient.

Referring to fig. 3, a comparative schematic diagram of a navigation result sample is shown according to an embodiment of the present invention. The MACL is a result of navigation in an obstacle scene after the intelligent agent is trained based on a proposed model-independent contrast learning mode, and the marker represents an obstacle point and the star represents a terminal point.

The results presented are based on the assumption that an agent may encounter only one obstacle point during navigation. In this comparison, the baseline model used in this example was EnvDrop. As can be seen from fig. 3, in an obstacle scenario, an agent may face an obstacle at different stages of the navigation process.

Although the navigation performance of the agent program is good in an unobstructed environment, it is less robust and therefore unusable in our obstructed conditions. After the method is used, the robustness of the intelligent body can be effectively improved, and the intelligent body can be moved to the target position after facing the obstacle.

In this embodiment, the present invention provides a navigation system training method based on contrast learning, which has the following beneficial effects: the intelligent navigation system can enable the intelligent body to acquire instruction information and image information under different modes in the moving process, convert the instruction information and the image information in the moving process into the data of the specified track to be moved after stopping moving, and conduct contrast-resistant learning and training on the moving track data, so that the long-term planning capability of the intelligent body is optimized, the robustness of the navigator under the obstacle condition can be greatly improved, and the navigation accuracy is improved.

The embodiment of the invention also provides a navigation system training device based on contrast learning, and referring to fig. 4, a schematic structural diagram of the navigation system training device based on contrast learning is shown.

Wherein, as an example, the navigation system training device based on contrast learning may include:

the acquisition module 401 is configured to acquire modality information of an agent in different modalities when the agent moves, and encode the modality information into feature vectors;

an obtaining module 402, configured to obtain a hidden state vector according to the feature vector when it is determined that the agent stops moving;

the encoding module 403 is configured to perform track encoding on the hidden state vector to obtain track encoded data;

the training module 404 is configured to invoke a training model for performing contrast learning on the track encoded data by using preset obstacle scene track data and preset barrier-free scene track data, so as to obtain a navigation training system, where the preset obstacle scene track data is navigation track data of an agent under a barrier condition, and the preset barrier-free scene track data is navigation track data of the agent under the barrier-free condition.

Optionally, the training module is further configured to:

and training the track coding data by utilizing the zoomed-in track data.

Optionally, the training module is further configured to:

Optionally, the modal information includes instruction information composed of natural language and image information collected by multiple viewpoints, and the feature vectors include language feature vectors and visual feature vectors;

the acquisition module is also used for:

Optionally, the acquiring module is further configured to:

Optionally, the encoding module is further configured to:

Optionally, the apparatus further comprises:

the calculation module is used for carrying out gradient calculation on the navigation training system by adopting a learning loss function to obtain gradient parameters;

and the updating module is used for carrying out parameter updating of gradient calculation by adopting the gradient parameters.

Optionally, the apparatus further comprises:

the probability module is used for carrying out coding calculation on the feature vectors to obtain a navigation probability value set of the intelligent body moving towards N directions, wherein the navigation probability value set comprises N corresponding movement probability values and a stopping probability value, each movement probability value corresponds to one movement direction, and N is a positive integer greater than or equal to 0;

the screening module is used for screening the probability value with the largest numerical value from the navigation probability value set as a target probability value;

the mobile module is used for controlling the intelligent body to move according to the moving direction corresponding to the target probability value if the target probability value is any one of the N moving probability values;

And the stopping module is used for controlling the intelligent body to stop moving if the target probability value is the stopping probability value.

The embodiment of the invention also provides a navigation system based on contrast learning, and referring to fig. 5, a schematic structural diagram of the navigation system based on contrast learning is shown.

Wherein, as an example, the navigation system based on contrast learning comprises: the system comprises a visual language coding module, an action decoding module, a navigation module, a track coding module and an antagonism contrast learning module which are connected in sequence;

Further, an embodiment of the present application further provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, which when executed implements the method of training a navigation system based on contrast learning as described in the above embodiments.

Further, the embodiment of the application further provides a computer readable storage medium, which stores computer executable instructions for causing a computer to execute the navigation system training method based on contrast learning as described in the above embodiment.

While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.

Claims

1. A method of training a navigation system based on contrast learning, the method comprising:

performing track coding on the hidden state vector to obtain track coding data;

invoking preset obstacle scene track data and preset barrier-free scene track data to perform a training model of contrast resistance learning on the track coding data to obtain a navigation training system, wherein the preset obstacle scene track data is navigation track data of an agent under barrier conditions, and the preset barrier-free scene track data is navigation track data of the agent under barrier-free conditions;

the training model for invoking preset obstacle scene track data and preset barrier-free scene track data to perform contrast resistance learning on the track coding data comprises the following steps:

for the preset obstacle scene track data e _o And the preset barrier-free scene track data e _f Performing contrast training to obtain training obstacle scene track data and training obstacle-free scene track data respectively; wherein, under the condition of having obstacle, the preset obstacle scene track data e _o Is forced to approach the preset barrier-free scene track data e _f In the unobstructed scene, the preset unobstructed scene track data e _f Optimized to be matched with preset obstacle scene track data e _o With zonesThe following steps:

sim (a, b) represents the similarity between a and b, f, o represents the barrier-free setting and the setting with barrier conditions, respectively;

simulating training the trajectory encoding data using the zoomed-in trajectory data;

and drawing the training obstacle scene track data and the training non-obstacle scene track data by comparing a loss function and a gradient descent algorithm to obtain drawing track data, wherein the drawing track data comprises the following steps of:

training to approach e using contrast loss and gradient descent algorithms _o And e _f The intelligent agent will learn to imitate the barrier-free track, and then gradually learn to converge to navigate out the track which is similar to the track which is walked out under the barrier-free condition in the training process under the barrier scene; wherein, under the obstacle scene, the contrast loss l _o Can be represented by the following formula:

τ is a temperature parameter; in an unobstructed scene, contrast loss l _f Can be represented by the following formula: />

λ is the weight between two loss terms;

encoding e in an unobstructed scene by collecting unobstructed trajectories _f And navigating the obtained trajectory codes along the supervised path to construct positive sample pairs for the specific instance.

2. The method for training a navigation system based on contrast learning of claim 1, wherein the performing contrast training on the preset obstacle-free scene trajectory data and the preset obstacle trajectory data includes:

3. The method for training a navigation system based on contrast learning according to claim 1, wherein the modal information includes instruction information composed of natural language and image information collected by a plurality of viewpoints, and the feature vectors include language feature vectors and visual feature vectors;

the encoding the modality information into feature vectors includes:

4. The method for training a navigation system based on contrast learning of claim 1, wherein the obtaining a hidden state vector from the feature vector comprises:

5. The method for training a navigation system based on contrast learning of claim 4, wherein the performing track encoding on the hidden state vector to obtain track encoded data comprises:

6. The method of training a navigation system based on contrast learning of claim 1, further comprising:

7. The method of training a navigation system based on contrast learning of any of claims 1-6, wherein prior to the step of determining when the agent stops moving, the method further comprises:

8. A navigation system training device based on contrast learning, the device comprising:

the training module is used for calling preset obstacle scene track data and preset barrier-free scene track data to perform contrast learning on the track coding data to obtain a navigation training system, wherein the preset obstacle scene track data is navigation track data of an agent under the barrier condition, and the preset barrier-free scene track data is navigation track data of the agent under the barrier-free condition;

for the preset obstacle scene track data e _o And the preset barrier-free scene track data e _f Performing contrast training to obtain training obstacle scene track data and training obstacle-free scene track data respectively; wherein, under the condition of having obstacle, the preset obstacle scene track data e _o Is forced to approach the preset barrier-free scene track data e _f In the unobstructed scene, the preset unobstructed scene track data e _f Optimized to be matched with preset obstacle scene track data e _o The differences are as follows:

λ is the weight between two loss terms;

9. A navigation system based on contrast learning, the navigation system comprising: the system comprises a visual language coding module, an action decoding module, a navigation module, a track coding module and an antagonism contrast learning module which are connected in sequence;

the contrast learning module is used for performing contrast training on the track coded data to generate a navigation model;

Invoking a training model for performing contrast learning on the track coding data by using preset obstacle scene track data and preset barrier-free scene track data, wherein the training model comprises the following components:

Training to approach e using contrast loss and gradient descent algorithms _o And e _f Is expressed so that the agent will learn to imitate barrier-freeThe track is blocked, and then the track which is similar to the track which is walked under the barrier-free condition is gradually learned and converged in the training process under the barrier scene; wherein, under the obstacle scene, the contrast loss l _o Can be represented by the following formula:

λ is the weight between two loss terms;