CN113589810B - Dynamic autonomous obstacle avoidance movement method and device for intelligent body, server and storage medium - Google Patents


Info

Publication number
CN113589810B
CN113589810B (grant publication) · Application CN202110844941.3A
Authority
CN
China
Prior art keywords
obstacle avoidance
dimensional
neural network
parameters
parent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110844941.3A
Other languages
Chinese (zh)
Other versions
CN113589810A (en
Inventor
杨琪
杨鹏
唐珂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology filed Critical Southwest University of Science and Technology
Priority to CN202110844941.3A
Publication of CN113589810A
Application granted
Publication of CN113589810B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0246 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
    • G05D1/0253 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means extracting relative motion information from a plurality of images taken successively, e.g. visual odometry, optical flow
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0219 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory ensuring the processing of the whole working surface
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Electromagnetism (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the invention discloses a dynamic autonomous obstacle avoidance movement method, device, server and storage medium for an agent. The method comprises the following steps: determining a corresponding motion model according to the actual motion situation of the agent, wherein the motion model comprises a simulation environment and a basic neural network; extracting the high-dimensional parameters of the basic neural network and reducing their dimension to obtain low-dimensional variables; selecting a preset evolution algorithm, based on the simulation environment, to optimize and restore the low-dimensional variables to obtain obstacle avoidance strategy parameters, and generating an obstacle avoidance strategy network from the basic neural network and the obstacle avoidance strategy parameters; and inputting the current scene perception of the agent into the obstacle avoidance strategy network to generate an obstacle avoidance action, so that the agent moves according to the obstacle avoidance action. The embodiment of the invention requires neither artificial global planning of the agent's path nor hand-designed obstacle avoidance rules; it is easy to accelerate with hardware and to parallelize; and because only low-dimensional parameters are optimized when obtaining the obstacle avoidance strategy network, no complex algorithm is needed and the method is easy to implement.

Description

Dynamic autonomous obstacle avoidance movement method and device for intelligent body, server and storage medium
Technical Field
The invention relates to the technical field of computers, and in particular to a dynamic autonomous obstacle avoidance movement method, device, server and storage medium for an agent.
Background
Autonomous dynamic obstacle avoidance refers to the process by which an agent (such as a self-driving cart, an underwater robot or an unmanned aerial vehicle), placed in an environment containing dynamically moving obstacles, uses sensors (such as lidar, cameras or sonar) to perceive the current environment, automatically avoids the obstacles and completes a given goal (such as reaching a destination). Existing autonomous obstacle avoidance control methods mainly fall into two categories. The first performs path planning on a continuously corrected map, finds an unobstructed (obstacle-free) path and executes it; such local path planning methods are mostly based on heuristic optimization such as evolutionary algorithms (including genetic algorithms) and fuzzy control (Fuzzy Control), the artificial potential field method (Artificial Potential Field), or reinforcement learning (Reinforcement Learning), and decision methods combined with deep learning have been widely used in recent years. The second category is based on computer vision: objects (obstacles) are detected in real time and avoided according to control rules.
In the problem considered here, the aim is to avoid obstacles neither by local path planning nor by visual recognition, and without human intervention; instead, the agent should autonomously select a sequence of movements according to the real-time observations it currently receives, so as to avoid obstacles and complete the task. The method uses an end-to-end deep neural network (DNN) to implicitly represent the agent's control strategy (policy), automatically learns a strategy that completes the task while avoiding obstacles from a large number of simulated interactions, and, once deployed, makes autonomous decisions directly from the perception information. After this transformation of the problem, the core question becomes how to efficiently optimize the strategy parameters x so as to maximize the implicit objective function y. The main difficulty is that images in real scenes tend to carry a large amount of information, with large picture sizes and many pixel values. To handle larger-scale inputs and cope with more complex tasks, the parameter dimension of the neural network grows rapidly. High-dimensional parameters, however, cause optimization performance to degrade quickly: the optimization is more likely to fall into local optima, the optimization time increases, and there is no good means of acceleration.
Disclosure of Invention
In view of the above, the invention provides a dynamic autonomous obstacle avoidance movement method, device, server and storage medium for an agent, which use a neural network to abstractly represent the agent's obstacle avoidance strategy, so that hardware acceleration is easy to use and parallelization is easy to realize.
In a first aspect, the present invention provides a dynamic autonomous obstacle avoidance movement method for an agent, including:
Determining a corresponding motion model according to the actual motion condition of the intelligent body, wherein the motion model comprises a simulation environment and a basic neural network;
Extracting high-dimensional parameters of the basic neural network, and reducing the dimensions of the high-dimensional parameters to obtain low-dimensional variables;
Selecting a preset evolution algorithm based on the simulation environment, optimizing and restoring the low-dimensional variable to obtain an obstacle avoidance strategy parameter, and generating an obstacle avoidance strategy network based on the basic neural network and the obstacle avoidance strategy parameter;
And inputting the current scene perception of the intelligent agent into the obstacle avoidance strategy network to generate an obstacle avoidance action so as to move according to the obstacle avoidance action.
In a second aspect, the present invention provides a dynamic autonomous obstacle avoidance movement device for an agent, comprising:
the motion modeling module is used for determining a corresponding motion model according to the actual motion condition of the intelligent body, and the motion model comprises a feedback function and a basic neural network;
the parameter dimension reduction module is used for extracting the high-dimension parameters of the basic neural network and reducing the dimension of the high-dimension parameters to obtain low-dimension variables;
the parameter optimization module is used for selecting a preset evolution algorithm to optimize the low-dimensional variable to obtain an obstacle avoidance strategy parameter, and generating an obstacle avoidance strategy network based on the basic neural network and the obstacle avoidance strategy parameter;
the obstacle avoidance action module is used for inputting the current scene perception of the intelligent agent into the obstacle avoidance strategy network to generate an obstacle avoidance action so as to move according to the obstacle avoidance action.
In a third aspect, the present invention provides a server comprising:
one or more processors;
a storage means for storing one or more programs;
The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the dynamic autonomous obstacle avoidance movement method for an agent provided by any embodiment of the present invention.
In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program comprising program instructions that, when executed, implement the dynamic autonomous obstacle avoidance movement method for an agent provided by any embodiment of the present invention.
In the embodiment of the invention, the obstacle avoidance movement of an agent is converted into a sequential decision problem, and a neural network is used to produce the sequential decisions. To obtain a suitable neural network, a motion model comprising a simulation environment and a basic neural network is first determined according to the actual motion situation of the agent; the high-dimensional parameters of the basic neural network are then extracted and reduced in dimension to obtain low-dimensional variables; a preset evolution algorithm is selected to optimize the low-dimensional variables based on the simulation environment to obtain obstacle avoidance strategy parameters, and an obstacle avoidance strategy network, which is the required neural network, is generated from the basic neural network and the obstacle avoidance strategy parameters; finally, the current scene perception of the agent is input into the obstacle avoidance strategy network to generate an obstacle avoidance action, and the agent moves according to that action.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a flow chart of a dynamic autonomous obstacle avoidance movement method for an agent according to a first embodiment of the present invention;
Fig. 2 is a sub-flowchart of the dynamic autonomous obstacle avoidance movement method for an agent according to a second embodiment of the present invention;
Fig. 3 is a sub-flowchart of the dynamic autonomous obstacle avoidance movement method for an agent according to the second embodiment of the present invention;
Fig. 4 is a sub-flowchart of the dynamic autonomous obstacle avoidance movement method for an agent according to the second embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a dynamic autonomous obstacle avoidance movement device for an agent according to a third embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a server according to a fourth embodiment of the present invention.
Detailed Description
The technical scheme in the implementation of the present application is clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of, and not restrictive on, some, but not all embodiments of the application. It should be further noted that, based on the embodiments of the present application, all other embodiments obtained by a person having ordinary skill in the art without making any inventive effort are within the scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Furthermore, the terms "first," "second," and the like, may be used herein to describe various directions, acts, steps, or elements, etc., but these directions, acts, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another direction, action, step or element. For example, a first region may be referred to as a second region, and similarly, a second region may be referred to as a first region, without departing from the scope of the invention. Both the first region and the second region are regions, but they are not the same region. The terms "first," "second," and the like, are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise. It should be noted that when one portion is referred to as being "fixed to" another portion, it may be directly on the other portion or there may be a portion in the middle. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only and do not represent the only embodiment.
Before discussing exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts steps as a sequential process, many of the steps may be implemented in parallel, concurrently, or with other steps. Furthermore, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example One
This embodiment provides a dynamic autonomous obstacle avoidance movement method for an agent, which is used for automatically controlling the obstacle avoidance movement of the agent without making an artificial global plan for the agent's path or designing specific obstacle avoidance rules. The method can be implemented by a terminal, by a server, or by interaction between the terminal and the server; in this embodiment the terminal, which may specifically be the agent itself, is taken as the example. Referring to Fig. 1, the method comprises the following steps:
s110, determining a corresponding motion model according to the actual motion condition of the intelligent body, wherein the motion model comprises a simulation environment and a basic neural network.
The agent refers to an intelligent device capable of perceiving environmental information and completing a preset movement goal, such as a self-driving cart, an underwater robot or an unmanned aerial vehicle. The actual motion situation describes the agent's movement environment and movement goal, for example crossing a road on the ground to reach a designated place, or completing the exploration of a designated underwater area while avoiding obstacles (animals, sediments, etc.). The motion model is used to establish a simulation environment for the agent's motion and a basic neural network for deciding its movement actions: the simulation environment describes the agent's current movement environment, its movement goal and its motion parameters, while the basic neural network is the neural network that makes the movement decisions, deciding the agent's movement actions according to the environment it currently perceives.
S120, extracting high-dimensional parameters of the basic neural network, and reducing the dimensions of the high-dimensional parameters to obtain low-dimensional variables.
The basic neural network abstractly represents the agent's obstacle avoidance strategy; its input is the agent's scene perception, such as images of the agent's surroundings captured by a camera. The high-dimensional parameters are the parameter set that determines the concrete configuration of the basic neural network; the more complex the basic neural network, the larger this parameter set. For example, image input generally means a large number of pixels, and a more complex image input together with a larger action space for the agent requires a more complex network. The low-dimensional variable is a parameter set obtained from the high-dimensional parameters, usually by mapping them into a low-dimensional space.
S130, selecting a preset evolution algorithm based on the simulation environment, optimizing and restoring the low-dimensional variable to obtain an obstacle avoidance strategy parameter, and generating an obstacle avoidance strategy network based on the basic neural network and the obstacle avoidance strategy parameter.
The evolution algorithm trains and optimizes the low-dimensional variables so that the corresponding high-dimensional parameters yield a neural network with better decision results. For ease of distinction, the high-dimensional parameters obtained at this point are called obstacle avoidance strategy parameters, and the corresponding neural network is called the obstacle avoidance strategy network (it can be obtained by replacing the high-dimensional parameters of the basic neural network with the obstacle avoidance strategy parameters). The evolution algorithm trains and optimizes the low-dimensional variables in a data-driven way: it simulates the interaction between the agent and the environment in the simulation environment, performs screening and optimization according to the different interaction results, and restores the optimization result to obtain the required high-dimensional obstacle avoidance strategy parameters. The specific evolution algorithm can be chosen according to the actual motion situation and is not limited here.
S140, inputting the current scene perception of the intelligent agent into the obstacle avoidance strategy network to generate an obstacle avoidance action so as to move according to the obstacle avoidance action.
After the obstacle avoidance strategy network is obtained from the trained obstacle avoidance strategy parameters, it can be applied directly on the agent: the agent continuously acquires its current scene perception, inputs it into the obstacle avoidance strategy network, the network outputs the corresponding obstacle avoidance action, and the agent moves according to that action. As the agent moves, the current scene perception keeps changing, the corresponding obstacle avoidance action changes accordingly, and the agent gradually accomplishes the movement goal determined by the actual motion situation.
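For illustration only, the deployment loop of step S140 can be summarized by the following sketch; the agent interface (get_scene_perception, execute_action, is_task_finished), the preprocess helper and the use of PyTorch are assumptions made for the example and are not prescribed by this disclosure.

```python
# Minimal sketch of the deployment loop in step S140 (illustrative only).
import torch

def run_obstacle_avoidance(policy_network, agent, preprocess):
    """Continuously map the agent's scene perception to obstacle avoidance actions."""
    while not agent.is_task_finished():
        obs = torch.as_tensor(preprocess(agent.get_scene_perception()),
                              dtype=torch.float32)          # e.g. an 84x84x4 observation
        obs = obs.permute(2, 0, 1)                           # channels first, assuming NCHW input
        with torch.no_grad():
            action_scores = policy_network(obs.unsqueeze(0)) # add a batch dimension
        action = int(action_scores.argmax(dim=1).item())     # greedy choice of obstacle avoidance action
        agent.execute_action(action)                         # move according to the action
```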
This embodiment provides a dynamic autonomous obstacle avoidance movement method for an agent in which the agent's obstacle avoidance movement is turned into a sequential decision problem and a neural network produces the sequential decisions. To obtain a suitable neural network, a motion model comprising a simulation environment and a basic neural network is determined according to the actual motion situation of the agent; the high-dimensional parameters of the basic neural network are extracted and reduced in dimension to obtain low-dimensional variables; a preset evolution algorithm is selected to optimize the low-dimensional variables based on the simulation environment to obtain obstacle avoidance strategy parameters; an obstacle avoidance strategy network, which is the required neural network, is generated from the basic neural network and the obstacle avoidance strategy parameters; and finally the current scene perception of the agent is input into the obstacle avoidance strategy network to generate an obstacle avoidance action, so that the agent moves according to the obstacle avoidance action.
Example Two
On the basis of the first embodiment, this embodiment further explains and exemplifies parts of the dynamic autonomous obstacle avoidance movement method for an agent, for example how the low-dimensional variables are optimized to obtain the obstacle avoidance strategy parameters. The method specifically comprises the following steps:
As shown in fig. 2, step S110 includes steps S111-112:
S111, determining the simulation environment according to the actual motion situation of the agent, wherein the simulation environment comprises a state space, an action space and a feedback function; the state space is formed by the environment images that the agent may observe, the action space is formed by the movement actions the agent can take, and the feedback function is used to count the reward points obtained by the agent while completing the motion.
S112, selecting a basic neural network with input and output meeting requirements according to the state space and the action space, wherein the input size of the basic neural network is determined by the image size of the environment image, and the output size of the basic neural network is determined by the size of the action space.
Steps S111-S112 describe the concrete process of determining the motion model. The simulation environment requires determining <S, A, R, P>: a state space S and an action space A, where all images the agent may observe (such as water surface pictures or traffic flow pictures) form the state space, and the actions the agent can take (such as up, down, left, right and staying still) form the action space. The feedback function determines the feedback corresponding to different actions and can be defined according to the specific problem, for example as a score. In one embodiment, reaching the opposite side of the street once scores 1 point, and the number of times the opposite side is successfully reached within a certain time (2 minutes) is used as the final score of a strategy. In another embodiment, an underwater robot scores -10 points when it contacts a fish shoal or falls into the water, +10 points when it reaches floating ice in a water area of a specified depth, and +100 points when it returns to shore. Successfully returning to shore ends the task successfully, whereas 5 contacts or falls into the water terminate the task unsuccessfully.
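As a purely illustrative sketch of how such a feedback function might be written, the fragment below encodes the example scores of the underwater embodiment above; the event names and the function signature are hypothetical and not part of the disclosure.

```python
# Illustrative feedback (reward) function for the underwater embodiment.
def underwater_feedback(event, contact_count):
    """Return (reward, done) for one simulation step."""
    if event == "contact_or_fall":          # touched a fish shoal or fell into the water
        done = (contact_count + 1) >= 5     # task fails after 5 contacts or falls
        return -10.0, done
    if event == "reached_floating_ice":     # reached floating ice at the specified depth
        return +10.0, False
    if event == "returned_to_shore":        # task completed successfully
        return +100.0, True
    return 0.0, False                       # nothing scored this step
```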
For the basic neural network, the input size is the image size of the observed environment state after uniform preprocessing, and the output size is the size of the agent's action space. In one embodiment, the observed images are simply resized: all images are cropped to 84 by 84 pixels, grayscale processing and jitter elimination are applied, and every 4 consecutive frames are fed into the 4 channels of the strategy network, so the strategy network input size is 84x84x4. In the traffic-flow obstacle avoidance problem, the size of the action space is 3, comprising the three actions of staying still, moving up and moving down. In the water-surface obstacle avoidance problem the degree of freedom is higher and the action space is 18: staying still, jumping in place, walking in the four cardinal directions (east, south, west, north), walking in the four diagonal directions (southeast, northeast, southwest, northwest), jumping in the four cardinal directions and jumping in the four diagonal directions, giving 18 actions in total.
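A minimal sketch of this preprocessing is given below, assuming OpenCV and NumPy are used (the disclosure does not mandate any particular library); the jitter-elimination step is omitted here.

```python
# Sketch of the image preprocessing described above (84x84 grayscale, 4-frame stack).
import cv2
import numpy as np

def preprocess_frames(frames):
    """Convert the 4 most recent raw frames into an 84x84x4 network input."""
    channels = []
    for frame in frames[-4:]:                                 # keep the last 4 frames
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)        # grayscale processing
        small = cv2.resize(gray, (84, 84))                    # crop/resize to 84x84 pixels
        channels.append(small.astype(np.float32) / 255.0)     # normalise to [0, 1]
    return np.stack(channels, axis=-1)                        # shape (84, 84, 4)
```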
Optionally, in some embodiments, step S120 includes steps S121-123 as shown in fig. 3:
S121, initializing parameters, namely initializing high-dimensional parameters of the basic neural network and dimension reduction parameters corresponding to the high-dimensional parameters.
The dimension-reduction parameters are set according to the high-dimensional parameters and are used to reduce the high-dimensional parameters to a dimension that meets certain requirements. They are generally set according to the actual computational requirements and typically determine the reduced dimension and the number of low-dimensional subspaces (onto which the high-dimensional parameters are projected for dimension reduction).
S122, sampling according to the high-dimensional parameters and the dimension reduction parameters to generate a random matrix.
Step S122 is in fact executed by a plurality of parallel threads: to reduce the dimension of the high-dimensional parameters quickly, each parallel thread generates one random matrix, which increases the dimension-reduction speed. That is, step S122 generates a plurality of random matrices in parallel according to the high-dimensional parameters and the dimension-reduction parameters, where each random matrix determines a low-dimensional subspace into which the high-dimensional parameters are projected (or embedded).
S123, reducing the dimension of the high-dimension parameter to a low-dimension subspace through a random embedding matrix based on the random matrix to obtain a low-dimension variable.
Similar to step S122, step S123 is also executed by a plurality of parallel threads: to reduce the high-dimensional parameters quickly, each parallel thread uses one random matrix to reduce the high-dimensional parameters to a low-dimensional subspace and obtain a low-dimensional variable.
Steps S121-S123 constitute the dimension-reduction process for the high-dimensional parameters: the size d of the low-dimensional subspace (the low dimension, determined by the dimension-reduction parameters) is fixed; λ D-dimensional individual obstacle avoidance strategies (D being the high dimension, i.e. the individual parameters of the high-dimensional parameter set) and λ d-dimensional vectors (the low dimension) are initialized; a [D, d] random matrix A is sampled, each entry of which is drawn independently from the Gaussian distribution N(0, 1), and the seed of the random number generator is kept; the parameters are then reduced to the d-dimensional subspace using the random embedding technique, i.e. x = Ay and y = A^(-1)x. Because D (and hence the matrix A) is large, the matrix inverse operation is relatively time-consuming and not easy to accelerate; therefore the initial solutions of the λ d-dimensional sub-problems are generated directly at initialization, which is equivalent to projecting the original D-dimensional space onto λ different d-dimensional subspaces, and the optimization algorithm always operates in the d-dimensional space. In the solution-restoration step, the solution is restored correspondingly in the form x_{i+1} = x_i + A y_i, so the matrix inverse operation is avoided.
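The random embedding described above can be sketched as follows; the concrete sizes D, d and λ below are illustrative placeholders rather than values taken from the disclosure.

```python
# Sketch of the random-embedding dimension reduction in steps S121-S123: the
# high-dimensional parameter vector (dimension D) is optimised through a
# low-dimensional variable y (dimension d) via a random Gaussian matrix A.
import numpy as np

def make_embedding(D, d, seed):
    """Sample a [D, d] random embedding matrix; the seed is kept for reproducibility."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((D, d))       # each entry drawn i.i.d. from N(0, 1)

def restore_solution(x_base, A, y):
    """Restore a low-dimensional variable to the full parameter space: x' = x + A @ y."""
    return x_base + A @ y                    # avoids any matrix inverse operation

# Example: lambda_ parallel subspaces, each holding one d-dimensional sub-problem.
D, d, lambda_ = 100_000, 50, 4               # illustrative sizes only
embeddings = [make_embedding(D, d, seed) for seed in range(lambda_)]
```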
Optionally, in some embodiments, as shown in fig. 4, step S130 includes steps S131-134:
S131, taking the low-dimensional variable as a parent and generating a plurality of offspring through mutation or crossover.
In one example, this takes the form of generating a plurality of optimization vectors on each low-dimensional subspace using Gaussian noise and the dimension-reduced vector, e.g. generating μ d-dimensional vectors on each subspace. They are generated by adding Gaussian noise N(0, σ) to the original d-dimensional vector, where σ is adjusted along with the evolution process. Steps S121-S123 in fact provide the basis for the optimization algorithm in step S130, in which λ denotes the number of parent individuals, μ the number of offspring individuals, X the parent individual vectors and Y the offspring individual vectors.
S132, evaluating fitness based on the plurality of offspring and the simulation environment.
In one example, this takes the form of restoring the plurality of optimization vectors into a plurality of optimization strategy networks and determining the fitness of each optimization strategy network based on simulated scene perception and the simulation environment. For example, the solutions are restored through x_{i+1} = x_i + A y_i into μ strategy network individuals; in fact, because the dimension is very high, the D-dimensional vector is divided into vector segments of length s and the restoration is carried out segment by segment, so that this step can conveniently be executed on a GPU.
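A hedged sketch of this segment-wise restoration, assuming PyTorch tensors; the segment length s and the function name are illustrative choices, not mandated by the disclosure.

```python
# The D-dimensional vector is split into segments of length s so that
# x = x + A @ y can be evaluated block by block (e.g. on a GPU).
import torch

def restore_in_segments(x_base, A, y, s=65536):
    """Compute x_base + A @ y one row-segment of A at a time."""
    out = x_base.clone()
    for start in range(0, x_base.numel(), s):
        end = min(start + s, x_base.numel())
        out[start:end] += A[start:end, :] @ y   # only an [s, d] block is touched at once
    return out
```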
More specifically, in one embodiment, step S132 includes steps S1321-1324 (not shown).
S1321, inputting the simulated scene perception into the optimization strategy network to obtain a proposed decision.
S1322, determining a feedback value of the proposed decision according to the proposed decision and the simulation environment.
S1323, accumulating the feedback values as the fitness of the optimization strategy network.
S1324, iterating the above steps until a motion termination condition is reached to obtain the final fitness, wherein the motion termination conditions include success and failure. This step determines whether the iteration terminates; a motion termination condition is, for example, that the number of times the agent in the traffic flow has been knocked down by an obstacle reaches an upper limit, or that the agent falls into the water. A minimal code sketch of this evaluation loop is given below.
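The sketch assumes a simulation-environment interface with reset/observe/step methods and a policy network exposing a decide method; these names are assumptions made for the example.

```python
# Minimal sketch of the fitness evaluation in steps S1321-S1324: the candidate policy
# network interacts with the simulation environment and its accumulated feedback
# is returned as the fitness.
def evaluate_fitness(policy_network, env, max_steps=10_000):
    """Roll out one candidate policy in the simulation environment."""
    env.reset()
    fitness = 0.0
    for _ in range(max_steps):
        observation = env.observe()                   # simulated scene perception
        action = policy_network.decide(observation)   # proposed decision (S1321)
        reward, done = env.step(action)               # feedback value (S1322)
        fitness += reward                             # accumulate feedback (S1323)
        if done:                                      # success or failure (S1324)
            break
    return fitness
```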
S133, updating the parent according to the fitness and the plurality of offspring.
S134, repeating the above steps based on the updated parent until the iteration requirement is met, outputting the finally obtained parent, and determining the obstacle avoidance strategy parameters according to that parent.
Steps S131-S134 are the process of optimizing the low-dimensional variables to determine the obstacle avoidance strategy parameters. During training an iterative scheme is used: the low-dimensional variables serve as parents and generate new offspring through mutation or crossover; based on these offspring, the decision results of the different offspring are simulated in the simulation environment; the fitness of each offspring is evaluated from its decision result; the offspring whose fitness better meets the requirement (i.e. better fulfils the movement goal) are selected, and the parents are updated according to the selection result. The optimization continues in this way, the finally obtained parent is the required low-dimensional variable, and the required high-dimensional parameters, i.e. the obstacle avoidance strategy parameters, are obtained by inverting the dimension-reduction process.
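The outer optimization loop of steps S131-S134 can be sketched, under simplifying assumptions, as follows: each parent owns one d-dimensional search process, offspring are generated by Gaussian mutation, and replacement is done greedily by fitness; the distribution-based negatively correlated replacement of steps S1331-S1332 is omitted here for brevity.

```python
# Condensed, simplified sketch of the evolutionary optimisation loop.
import numpy as np

def evolve(parents, sigmas, fitness_fn, generations, rng=np.random.default_rng(0)):
    """parents: list of d-dimensional vectors; fitness_fn maps a low-dimensional
    vector to a score (it is assumed to wrap the restoration x = x0 + A @ y and
    the simulation rollout)."""
    scores = [fitness_fn(p) for p in parents]
    for _ in range(generations):
        for i, parent in enumerate(parents):
            child = parent + rng.normal(0.0, sigmas[i], size=parent.shape)  # mutation (S131)
            child_score = fitness_fn(child)                                 # evaluation (S132)
            if child_score > scores[i]:                                     # update parent (S133)
                parents[i], scores[i] = child, child_score
    return parents, scores
```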
More specifically, in some embodiments step S133 includes steps S1331-1332 (not shown):
S1331, calculating, within the whole population, the degree of correlation of the offspring probability distribution of each offspring and of the parent probability distribution of the parent.
In this embodiment the probability distribution is a Gaussian distribution whose mean is the current offspring (or parent) vector value and whose variance is the dynamically updated variance matrix, and the whole population refers to the probability distributions of the other search processes. Specifically, the correlation in this embodiment is measured by the Bhattacharyya distance between distributions (other distribution distances may of course be used in practice).
Step S1331 in fact computes the negatively correlated distance (i.e. the distribution distance; since it measures negative correlation, it is called the negatively correlated distance in the final expression) from the low-dimensional variables (i.e. the optimization variables) corresponding to the offspring and the parent. Specifically, the correlation of a distribution p_i with the whole population is taken as its Bhattacharyya distance to the distributions of the other search processes, i.e. the minimum of D_B(p_i, p_j) over the other processes j, where D_B denotes the Bhattacharyya distance, which for two Gaussian distributions is calculated as:
D_B(p_i, p_j) = (1/8) (x_i - x_j)^T Σ^{-1} (x_i - x_j) + (1/2) ln( det Σ / sqrt(det Σ_i · det Σ_j) ), with Σ = (Σ_i + Σ_j) / 2,
where p_i denotes the parent probability distribution to be computed (the p_i' appearing later denotes the offspring probability distribution, whose correlation is computed analogously), p_j denotes the probability distribution of another search process, Σ_i denotes the covariance matrix of the distribution p_i, Σ_j denotes the covariance matrix of the distribution p_j, x_i denotes the mean of the distribution p_i, and x_j denotes the mean of the distribution p_j.
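For reference, the closed-form Bhattacharyya distance between two Gaussian distributions, together with the resulting negatively correlated distance (taken here as the minimum distance to the other search processes, which is one reading of the formula above), can be sketched as:

```python
# Bhattacharyya distance between Gaussian search distributions p_i = N(x_i, Sigma_i)
# and p_j = N(x_j, Sigma_j), using the standard closed form.
import numpy as np

def bhattacharyya_distance(x_i, cov_i, x_j, cov_j):
    cov = 0.5 * (cov_i + cov_j)                              # averaged covariance matrix
    diff = x_i - x_j
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)    # mean-difference term
    _, logdet = np.linalg.slogdet(cov)
    _, logdet_i = np.linalg.slogdet(cov_i)
    _, logdet_j = np.linalg.slogdet(cov_j)
    term_cov = 0.5 * (logdet - 0.5 * (logdet_i + logdet_j))  # covariance term
    return term_mean + term_cov

def negatively_correlated_distance(p_index, means, covs):
    """Minimum Bhattacharyya distance from distribution p_index to all other processes."""
    return min(bhattacharyya_distance(means[p_index], covs[p_index], means[j], covs[j])
               for j in range(len(means)) if j != p_index)
```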
S1332, replacing the parent with the offspring that meets the replacement condition according to the degree of correlation, so as to generate a new parent, and updating the mutation or crossover mode.
The replacement condition is a judgment condition related to the negatively correlated distance: for example, the searched offspring replaces the parent when this condition is satisfied, otherwise the original parent is kept. Updating the mutation or crossover mode in this embodiment specifically means modifying the variance of the Gaussian noise used when the next generation of offspring is generated.
More specifically, in some embodiments, step S1332 is followed by step S1333 (not shown):
S1333, every preset number of generations, increasing or decreasing the variance matrix according to the number of times the parent was successfully replaced.
Here a generation refers to one execution of steps S1331-S1332, so the preset number of generations counts how many such updates have been performed. The variance matrix is adjusted after the preset number of generations in order to avoid the adverse effect of optimizing with the same variance matrix throughout: when parent replacements remain very frequent, or occur too rarely, the variance matrix needs to be adjusted accordingly. Specifically, in one example, when the number of successful parent replacements within the preset number of generations is greater than a first preset number, the variance matrix is increased, and when it is smaller than a second preset number, the variance matrix is decreased. Of course, this specific example is only illustrative and not limiting; the relationship between the number of successful parent replacements within the preset number of generations and the adjustment of the variance matrix should be set according to the actual situation.
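A hedged sketch of such a variance adjustment rule; the thresholds, the scaling factor and the function signature are illustrative assumptions, not values given by the disclosure.

```python
# Every `period` generations the mutation variance is scaled up or down according to
# how often offspring successfully replaced their parents.
def adjust_sigma(sigma, success_count, period, factor=1.1):
    hi_threshold = period // 5    # stand-in for the "first preset number of times"
    lo_threshold = period // 10   # stand-in for the "second preset number of times"
    if success_count > hi_threshold:
        return sigma * factor     # replacements frequent: enlarge the search variance
    if success_count < lo_threshold:
        return sigma / factor     # replacements rare: shrink the search variance
    return sigma
```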
The dynamic autonomous obstacle avoidance movement method for an agent provided by this embodiment further specifies, on the basis of the first embodiment, the process of optimizing the low-dimensional parameters. An evolution algorithm, being a population-based optimization method, is naturally suited to parallelization, so the wall-clock training time of the algorithm can be reduced by multiplying the computing resources, and the system overhead is lower than that of policy gradient methods.
Example Three
Fig. 5 is a schematic structural diagram of an intelligent dynamic autonomous obstacle avoidance movement device according to a third embodiment of the present invention, where, as shown in fig. 5, the device includes:
The motion modeling module 310 is configured to determine a corresponding motion model according to an actual motion situation of the agent, where the motion model includes a feedback function and a basic neural network;
The parameter dimension reduction module 320 is configured to extract a high-dimension parameter of the basic neural network, and reduce the dimension of the high-dimension parameter to obtain a low-dimension variable;
The parameter optimization module 330 is configured to select a preset evolution algorithm to optimize the low-dimensional variable to obtain an obstacle avoidance policy parameter, and generate an obstacle avoidance policy network based on the basic neural network and the obstacle avoidance policy parameter;
The obstacle avoidance action module 340 is configured to input the current scene perception of the agent into the obstacle avoidance policy network to generate an obstacle avoidance action, so that the agent moves according to the obstacle avoidance action.
More specifically, in some embodiments, the motion modeling module 310 includes:
The simulation environment determining unit is used for determining the simulation environment according to the actual motion situation of the intelligent body, the simulation environment comprises a state space, an action space and a feedback function, the state space is formed by an environment image which the intelligent body can observe, the action space is formed by motion actions which the intelligent body can take, and the feedback function is used for counting the rewarding points which are obtained by the intelligent body when the intelligent body finishes the motion;
And the basic neural network determining unit is used for selecting a basic neural network with input and output meeting requirements according to the state space and the action space, wherein the input size of the basic neural network is determined by the image size of the environment image, and the output size of the basic neural network is determined by the size of the action space.
Optionally, in some embodiments, the parameter dimension reduction module 320 includes:
The initialization module is used for initializing parameters, and initializing high-dimensional parameters of the basic neural network and dimension reduction parameters corresponding to the high-dimensional parameters;
The random matrix generation module is used for sampling and generating a random matrix according to the high-dimensional parameters and the dimension reduction parameters;
And the dimension reduction module is used for reducing the dimension of the high-dimension parameter to a low-dimension subspace through a random embedding matrix based on the random matrix to obtain a low-dimension variable.
Optionally, in some embodiments, the parameter optimization module 330 includes:
a child generation unit for generating a plurality of children by variation or intersection with the low-dimensional variable as a parent;
An fitness evaluation unit configured to evaluate fitness of the plurality of children based on the simulation environment;
a parent updating unit for updating a parent according to the fitness and the plurality of children;
Repeating the steps until the iteration requirement is met based on the parent, outputting the finally obtained parent, and determining obstacle avoidance strategy parameters according to the parent.
More specifically, in some embodiments, taking the low-dimensional variable as a parent and generating a plurality of children through mutation or crossover comprises:
generating a plurality of optimization vectors using gaussian noise and the dimension-reduction vectors on each of the low-dimensional subspaces;
The evaluating fitness based on the plurality of children and the simulated environment includes:
restoring the plurality of optimization vectors into a plurality of optimization strategy networks;
And determining the fitness of the optimized strategy network based on the simulated scene perception and the simulated environment.
More specifically, in some embodiments, the determining the fitness of the optimization strategy network based on the simulated scene perception and the simulated environment includes:
inputting the simulated scene perception into the optimization strategy network to obtain a proposal decision;
determining a feedback value of the proposed decision according to the proposed decision and the simulation environment;
Accumulating the feedback value as the fitness of the optimization strategy network;
The above steps are iterated until a motion termination condition is reached to obtain the final fitness, wherein the motion termination conditions include success and failure.
More specifically, in some embodiments, said updating the parent according to the fitness and the plurality of children comprises:
Calculating the correlation degree of the child probability distribution of each child and the parent probability distribution of the parent in the whole population;
and replacing the parent with the child that meets the substitution condition according to the degree of correlation to generate a new parent, and updating the mutation or crossover mode.
This embodiment provides a dynamic autonomous obstacle avoidance movement device for an agent, which requires neither artificial global planning of the agent's path nor specifically designed obstacle avoidance rules, is easy to accelerate with hardware and easy to parallelize; when the obstacle avoidance strategy network is obtained, only low-dimensional parameters are optimized, so no complex algorithm is needed and the device is easy to implement.
Example Four
Fig. 6 is a schematic structural diagram of a server 400 according to the fourth embodiment of the present invention. As shown in Fig. 6, the server includes a memory 410 and a processor 420; the number of processors 420 in the server may be one or more, and one processor 420 is taken as an example in Fig. 6. The memory 410 and the processor 420 in the server may be connected by a bus or in other ways; connection by a bus is taken as the example in Fig. 6.
The memory 410 is used as a computer readable storage medium, and can be used to store software programs, computer executable programs, and modules, such as program instructions/modules corresponding to the method of dynamic autonomous obstacle avoidance movement of an agent in an embodiment of the present invention (for example, the movement modeling module 310, the parameter dimension reduction module 320, the parameter optimization module 330, and the obstacle avoidance action module 340 in the dynamic autonomous obstacle avoidance movement device of the agent). The processor 420 executes various functional applications and data processing of the server by running software programs, instructions and modules stored in the memory 410, i.e. implementing the above-described method for dynamic autonomous obstacle avoidance movement of the agent.
Wherein the processor 420 is configured to execute a computer executable program stored in the memory 410 to implement the following steps: step S110, determining a corresponding motion model according to the actual motion condition of the intelligent body, wherein the motion model comprises a simulation environment and a basic neural network; step S120, extracting high-dimensional parameters of the basic neural network, and reducing the dimensions of the high-dimensional parameters to obtain low-dimensional variables; step S130, selecting a preset evolution algorithm based on the simulation environment, optimizing and restoring the low-dimensional variable to obtain an obstacle avoidance strategy parameter, and generating an obstacle avoidance strategy network based on the basic neural network and the obstacle avoidance strategy parameter; and step 140, inputting the current scene perception of the intelligent agent into the obstacle avoidance strategy network to generate an obstacle avoidance action so as to move according to the obstacle avoidance action.
Of course, the server provided by the embodiment of the invention is not limited to the method operation described above, and may also perform the related operation in the method for dynamic autonomous obstacle avoidance movement of an agent provided by any embodiment of the invention.
The memory 410 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created according to the use of the terminal, etc. In addition, the memory 410 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some examples, the memory 410 may further include memory located remotely relative to the processor 420, which may be connected to the server via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
This embodiment provides a server for performing dynamic autonomous obstacle avoidance movement of an agent, which requires neither artificial global planning of the agent's path nor specifically designed obstacle avoidance rules, is easy to accelerate with hardware and easy to parallelize; when the obstacle avoidance strategy network is obtained, only low-dimensional parameters are optimized, so no complex algorithm is needed and the method is easy to implement.
Example Five
The fifth embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a dynamic autonomous obstacle avoidance movement method for an agent, the method comprising:
Determining a corresponding motion model according to the actual motion condition of the intelligent body, wherein the motion model comprises a simulation environment and a basic neural network;
Extracting high-dimensional parameters of the basic neural network, and reducing the dimensions of the high-dimensional parameters to obtain low-dimensional variables;
Selecting a preset evolution algorithm based on the simulation environment, optimizing and restoring the low-dimensional variable to obtain an obstacle avoidance strategy parameter, and generating an obstacle avoidance strategy network based on the basic neural network and the obstacle avoidance strategy parameter;
And inputting the current scene perception of the intelligent agent into the obstacle avoidance strategy network to generate an obstacle avoidance action so as to move according to the obstacle avoidance action.
Of course, the storage medium containing the computer executable instructions provided by the embodiments of the present invention is not limited to the above-mentioned method operations, and may also perform the related operations in the method for dynamic autonomous obstacle avoidance movement of an agent provided by any embodiment of the present invention.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a FLASH Memory (FLASH), a hard disk, or an optical disk of a computer, etc., including several instructions for causing a computer server (which may be a personal computer, a server, or a network server, etc.) to execute the method according to the embodiments of the present invention.
It should be noted that, in the embodiment of the dynamic autonomous obstacle avoidance movement device of the intelligent agent, each unit and module included are only divided according to the functional logic, but not limited to the above-mentioned division, so long as the corresponding functions can be realized; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (8)

1. A dynamic autonomous obstacle avoidance movement method for an agent, characterized by comprising the following steps:
Determining a corresponding motion model according to the actual motion condition of the intelligent body, wherein the motion model comprises a simulation environment and a basic neural network;
Extracting high-dimensional parameters of the basic neural network, and reducing the dimensions of the high-dimensional parameters to obtain low-dimensional variables;
Selecting a preset evolution algorithm based on the simulation environment, optimizing and restoring the low-dimensional variable to obtain an obstacle avoidance strategy parameter, and generating an obstacle avoidance strategy network based on the basic neural network and the obstacle avoidance strategy parameter;
Inputting the current scene perception of the intelligent agent into the obstacle avoidance strategy network to generate an obstacle avoidance action so as to move according to the obstacle avoidance action;
The selecting a preset evolution algorithm to optimize the low-dimensional variable to obtain obstacle avoidance strategy parameters comprises the following steps:
generating a plurality of optimization vectors using gaussian noise and a dimension reduction vector on each low-dimensional subspace;
restoring the plurality of optimization vectors into a plurality of optimization strategy networks;
Determining a fitness of the optimized policy network based on a simulated scene perception and the simulated environment;
updating the parent according to the fitness and the plurality of children;
repeating the above steps based on the updated parent until the iteration requirement is met, outputting the finally obtained parent, and determining the obstacle avoidance strategy parameters according to that parent.
2. The method for dynamic autonomous obstacle avoidance movement of an agent of claim 1, wherein determining a corresponding movement model based on actual movement conditions of the agent comprises:
determining the simulation environment according to the actual motion condition of the intelligent body, wherein the simulation environment comprises a state space, an action space and a feedback function, the state space is formed by an environment image observed by the intelligent body, the action space is formed by motion actions which can be taken by the intelligent body, and the feedback function is used for counting the reward points obtained by the intelligent body when the intelligent body completes the motion;
and selecting a basic neural network with input and output meeting requirements according to the state space and the action space, wherein the input size of the basic neural network is determined by the image size of the environment image, and the output size of the basic neural network is determined by the size of the action space.
3. The dynamic autonomous obstacle avoidance movement method for an agent of claim 1, wherein extracting the high-dimensional parameters of the basic neural network and reducing the dimensions of the high-dimensional parameters to obtain the low-dimensional variable comprises:
initializing parameters, namely initializing the high-dimensional parameters of the basic neural network and dimension-reduction parameters corresponding to the high-dimensional parameters;
sampling according to the high-dimensional parameters and the dimension-reduction parameters to generate a random matrix;
and reducing the dimensions of the high-dimensional parameters to a low-dimensional subspace through random embedding based on the random matrix, to obtain the low-dimensional variable.
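One possible reading of the dimension-reduction step in claim 3, sketched with NumPy; the sizes D and d, the Gaussian sampling of the random matrix, and the least-squares projection used to obtain the low-dimensional variable are all assumptions made for the example.

```python
import numpy as np

D, d = 10_000, 20                     # assumed high- and low-dimensional sizes
rng = np.random.default_rng(0)

theta = rng.standard_normal(D)        # high-dimensional parameters of the basic neural network
A = rng.standard_normal((D, d))       # random matrix sampled from a Gaussian (dimension-reduction parameters)

# Project the high-dimensional parameters onto the low-dimensional subspace spanned by A
# (least-squares projection), then restore them through the random embedding theta ~= A @ z.
z, *_ = np.linalg.lstsq(A, theta, rcond=None)   # low-dimensional variable
theta_restored = A @ z                          # back in the original parameter space
```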
4. The method of claim 1, wherein determining the fitness of the optimization strategy network based on the simulated scene perception and the simulation environment comprises:
inputting the simulated scene perception into the optimization strategy network to obtain a proposed decision;
determining a feedback value of the proposed decision according to the proposed decision and the simulation environment;
accumulating the feedback values as the fitness of the optimization strategy network;
and repeating the above steps until a motion termination condition, including success or failure, is reached, to obtain the final fitness.
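A minimal sketch of the fitness evaluation in claim 4, assuming a Gym-style environment interface (reset/step returning observation, reward and a done flag) and a step cap; neither is specified by the claim.

```python
def evaluate_fitness(policy, env, max_steps=200):
    """Accumulate feedback values until a motion termination condition (success or
    failure) is reached; the accumulated sum is the fitness of the strategy network."""
    observation = env.reset()                         # simulated scene perception
    total_feedback = 0.0
    for _ in range(max_steps):
        action = policy(observation)                  # proposed decision
        observation, reward, done = env.step(action)  # feedback value of the proposed decision
        total_feedback += reward
        if done:                                      # episode ended in success or failure
            break
    return total_feedback
```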
5. The method of claim 1, wherein updating the parent according to the fitness and the plurality of children comprises:
calculating the correlation degree between the child probability distribution of each child and the parent probability distribution of the parent over the whole population;
and replacing the parent with a child that meets the replacement condition according to the correlation degree, so as to generate a new parent, and updating the mutation or crossover mode.
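An illustrative sketch of the parent update in claim 5; the Pearson correlation used as the "correlation degree" and the fixed threshold for the replacement condition are assumptions, not the claimed criterion.

```python
import numpy as np

def update_parent(parent, children, fitnesses, max_corr=0.8):
    """Scan the children from best to worst fitness and replace the parent with the first
    child whose correlation with the parent falls below an assumed threshold."""
    for i in np.argsort(fitnesses)[::-1]:              # best child first
        corr = np.corrcoef(parent, children[i])[0, 1]  # assumed correlation degree
        if abs(corr) < max_corr:                       # assumed replacement condition
            return children[i]                         # new parent
    return parent                                      # keep the old parent otherwise
```

In the full method the mutation or crossover mode would also be adjusted at this point; that bookkeeping is omitted from the sketch.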
6. A dynamic autonomous obstacle avoidance movement device for an agent, comprising:
a motion modeling module for determining a corresponding motion model according to the actual motion condition of the agent, wherein the motion model comprises a feedback function and a basic neural network;
a parameter dimension-reduction module for extracting high-dimensional parameters of the basic neural network and reducing the dimensions of the high-dimensional parameters to obtain a low-dimensional variable;
a parameter optimization module for selecting a preset evolution algorithm to optimize the low-dimensional variable to obtain obstacle avoidance strategy parameters, and generating an obstacle avoidance strategy network based on the basic neural network and the obstacle avoidance strategy parameters;
and an obstacle avoidance action module for inputting the current scene perception of the agent into the obstacle avoidance strategy network to generate an obstacle avoidance action, so as to move according to the obstacle avoidance action;
wherein the parameter optimization module comprises:
a child generation unit for generating, as children, a plurality of optimization vectors using Gaussian noise and a dimension-reduction vector on each low-dimensional subspace;
a fitness evaluation unit for restoring the plurality of optimization vectors into a plurality of optimization strategy networks, and determining a fitness of each optimization strategy network based on a simulated scene perception and the simulation environment;
and a parent updating unit for updating the parent according to the fitness and the plurality of children, repeating the above steps based on the parent until an iteration requirement is met, outputting the finally obtained parent, and determining the obstacle avoidance strategy parameters according to the finally obtained parent.
7. A server, comprising:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the dynamic autonomous obstacle avoidance movement method for an agent according to any one of claims 1-5.
8. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the dynamic autonomous obstacle avoidance movement method for an agent according to any one of claims 1 to 5.
CN202110844941.3A 2021-07-26 2021-07-26 Dynamic autonomous obstacle avoidance movement method and device for intelligent body, server and storage medium Active CN113589810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110844941.3A CN113589810B (en) 2021-07-26 2021-07-26 Dynamic autonomous obstacle avoidance movement method and device for intelligent body, server and storage medium

Publications (2)

Publication Number Publication Date
CN113589810A CN113589810A (en) 2021-11-02
CN113589810B true CN113589810B (en) 2024-04-30

Family

ID=78249960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110844941.3A Active CN113589810B (en) 2021-07-26 2021-07-26 Dynamic autonomous obstacle avoidance movement method and device for intelligent body, server and storage medium

Country Status (1)

Country Link
CN (1) CN113589810B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110471444A (en) * 2019-08-19 2019-11-19 西安微电子技术研究所 UAV Intelligent barrier-avoiding method based on autonomous learning
CN111553475A (en) * 2020-04-26 2020-08-18 大连理工大学 High-dimensional multi-mode evolution optimization method based on random embedding technology
US10960539B1 (en) * 2016-09-15 2021-03-30 X Development Llc Control policies for robotic agents

Similar Documents

Publication Publication Date Title
Liang et al. Cirl: Controllable imitative reinforcement learning for vision-based self-driving
Zhang et al. Robot navigation of environments with unknown rough terrain using deep reinforcement learning
CN110514206B (en) Unmanned aerial vehicle flight path prediction method based on deep learning
CN112888612A (en) Autonomous vehicle planning
CN111061277A (en) Unmanned vehicle global path planning method and device
Min et al. Deep Q learning based high level driving policy determination
Salt et al. Parameter optimization and learning in a spiking neural network for UAV obstacle avoidance targeting neuromorphic processors
KR20180051335A (en) A method for input processing based on neural network learning algorithm and a device thereof
CN111578940A (en) Indoor monocular navigation method and system based on cross-sensor transfer learning
CN112232490A (en) Deep simulation reinforcement learning driving strategy training method based on vision
CN110181508A (en) Underwater robot three-dimensional Route planner and system
CN114162146B (en) Driving strategy model training method and automatic driving control method
CN113421345B (en) Bionic robot fish cluster navigation simulation method based on deep reinforcement learning technology
CN116540731B (en) Path planning method and system integrating LSTM and SAC algorithms
EP3955082A1 (en) Computer-implemented method and device for controlling a mobile robot based on semantic environment maps
Liu et al. Reinforcement learning-based collision avoidance: Impact of reward function and knowledge transfer
Jaafra et al. Context-aware autonomous driving using meta-reinforcement learning
Jaafra et al. Robust reinforcement learning for autonomous driving
CN113589810B (en) Dynamic autonomous obstacle avoidance movement method and device for intelligent body, server and storage medium
CN111811532A (en) Path planning method and device based on impulse neural network
Zakaria et al. A study of multiple reward function performances for vehicle collision avoidance systems applying the DQN algorithm in reinforcement learning
Salt et al. Differential evolution and bayesian optimisation for hyper-parameter selection in mixed-signal neuromorphic circuits applied to UAV obstacle avoidance
Yu et al. MAVRL: Learn to Fly in Cluttered Environments with Varying Speed
Kim et al. Evolved neural networks based on cellular automata for sensory-motor controller
Botteghi et al. Entropy-based exploration for mobile robot navigation: a learning-based approach

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant