CN117539266B - Route planning method and device in logistics system based on vision and electronic equipment

Publication number
CN117539266B
Authority
CN
China
Prior art keywords
action
alternative
intelligent logistics
parameter
parameters
Legal status
Active
Application number
CN202410009844.6A
Other languages
Chinese (zh)
Other versions
CN117539266A (en)
Inventor
王鑫
舒子珊
彭举彬
李大琳
苏洋
Current Assignee
Zhuhai Genu Technology Co ltd
University of Electronic Science and Technology of China
Application filed by Zhuhai Genu Technology Co ltd and University of Electronic Science and Technology of China
Priority to CN202410009844.6A
Publication of CN117539266A
Application granted
Publication of CN117539266B

Landscapes

  • Manipulator (AREA)

Abstract

The embodiment of the invention provides a vision-based route planning method and device in a logistics system, and electronic equipment. The method comprises the following steps: acquiring a target position and first state data of an intelligent logistics robot, and inputting the first state data into an action positioning model to obtain an alternative action set, the action positioning model comprising a first long short-term memory network layer; for each alternative action parameter in the alternative action set, inputting the alternative action parameter and the first state data into an evaluation model and outputting an initial evaluation parameter; obtaining an evaluation parameter corresponding to the alternative action parameter based on the initial evaluation parameter and a historical evaluation parameter; and sending the evaluation parameters to the action positioning model, which determines a target action parameter from the alternative action set based on the evaluation parameters, and controlling the intelligent logistics robot to move according to the target movement action corresponding to the target action parameter. This approach preserves the correlation between successive actions of the intelligent logistics robot and improves the traffic efficiency of the robot.

Description

Route planning method and device in logistics system based on vision and electronic equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a device for path planning in a logistics system based on vision and electronic equipment.
Background
In recent years, with the rise of artificial intelligence, path planning methods based on deep learning have emerged, and intelligent logistics robots make it convenient for factories to realize intelligent transportation. In the prior art, when path planning is performed for an intelligent logistics robot, although the robot can be moved to a target position without colliding with obstacles, the next movement action can only be determined from the robot's current environment state. As a result, successive movement actions of the intelligent logistics robot lack correlation, which in severe cases reduces the traffic efficiency of the robot.
Disclosure of Invention
Therefore, the invention aims to provide a route planning method and device in a vision-based logistics system, and electronic equipment, so as to ensure the correlation between successive actions of an intelligent logistics robot and improve the traffic efficiency of the robot.
In a first aspect, an embodiment of the present invention provides a path planning method in a vision-based logistics system, where the logistics system includes an intelligent logistics robot; the intelligent logistics robot is located in a target site, and the target site is provided with obstacles; the intelligent logistics robot interacts with the surrounding environment through a vision detection module. The method comprises the following steps: acquiring a target position and first state data of the intelligent logistics robot, wherein the first state data indicates a first environment state of the intelligent logistics robot at the current moment; inputting the first state data into an action positioning model to obtain an alternative action set corresponding to the first state data, wherein the alternative action set comprises at least one alternative action parameter, each alternative action parameter represents an alternative movement action of the intelligent logistics robot and comprises an alternative linear velocity parameter and an alternative angular velocity parameter, and the action positioning model comprises a first long short-term memory network layer; for each alternative action parameter in the alternative action set, inputting the alternative action parameter and the first state data into an evaluation model, and outputting an initial evaluation parameter corresponding to the alternative action parameter; obtaining an evaluation parameter corresponding to the alternative action parameter based on the initial evaluation parameter and the historical evaluation parameter corresponding to the alternative action parameter, wherein the evaluation parameter is used for determining the degree of advantage of the intelligent logistics robot executing the alternative movement action corresponding to the alternative action parameter in the first environment state, the historical evaluation parameter is determined according to the execution results obtained when the intelligent logistics robot executed the alternative movement action corresponding to the alternative action parameter in the first environment state before the current moment, and the evaluation model comprises a second long short-term memory network layer; and sending the evaluation parameters to the action positioning model, determining, by the action positioning model, a target action parameter from the alternative action set based on the evaluation parameters, and controlling the intelligent logistics robot to move to the target position according to the target movement action corresponding to the target action parameter.
In a second aspect, an embodiment of the present invention provides a route planning device in a vision-based logistics system, where the logistics system includes an intelligent logistics robot; the intelligent logistics robot is located in a target site, and the target site is provided with obstacles; the intelligent logistics robot interacts with the surrounding environment through a vision detection module. The device comprises: a first acquisition module, configured to acquire a target position and first state data of the intelligent logistics robot, wherein the first state data indicates a first environment state of the intelligent logistics robot at the current moment; a first input module, configured to input the first state data into an action positioning model to obtain an alternative action set corresponding to the first state data, wherein the alternative action set comprises at least one alternative action parameter, each alternative action parameter represents an alternative movement action of the intelligent logistics robot and comprises an alternative linear velocity parameter and an alternative angular velocity parameter, and the action positioning model comprises a first long short-term memory network layer; a second input module, configured to, for each alternative action parameter in the alternative action set, input the alternative action parameter and the first state data into an evaluation model and output an initial evaluation parameter corresponding to the alternative action parameter, and obtain an evaluation parameter corresponding to the alternative action parameter based on the initial evaluation parameter and the historical evaluation parameter corresponding to the alternative action parameter, wherein the evaluation parameter is used for determining the degree of advantage of the intelligent logistics robot executing the alternative movement action corresponding to the alternative action parameter in the first environment state, the historical evaluation parameter is determined according to the execution results obtained when the intelligent logistics robot executed the alternative movement action corresponding to the alternative action parameter in the first environment state before the current moment, and the evaluation model comprises a second long short-term memory network layer; and a first movement module, configured to send the evaluation parameters to the action positioning model, determine a target action parameter from the alternative action set based on the evaluation parameters, and control the intelligent logistics robot to move to the target position according to the target movement action corresponding to the target action parameter.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory stores machine executable instructions executable by the processor, and the processor executes the machine executable instructions to implement the method for path planning in a vision-based logistics system.
The embodiment of the invention has the following beneficial effects:
The embodiment of the invention provides a vision-based route planning method and device in a logistics system, and electronic equipment. The method comprises the following steps: acquiring a target position and first state data of the intelligent logistics robot, wherein the first state data indicates a first environment state of the intelligent logistics robot at the current moment; inputting the first state data into an action positioning model to obtain an alternative action set corresponding to the first state data, wherein the alternative action set comprises at least one alternative action parameter, each alternative action parameter represents an alternative movement action of the intelligent logistics robot and comprises an alternative linear velocity parameter and an alternative angular velocity parameter, and the action positioning model comprises a first long short-term memory network layer; for each alternative action parameter in the alternative action set, inputting the alternative action parameter and the first state data into an evaluation model, and outputting an initial evaluation parameter corresponding to the alternative action parameter; obtaining an evaluation parameter corresponding to the alternative action parameter based on the initial evaluation parameter and the historical evaluation parameter corresponding to the alternative action parameter, wherein the evaluation parameter is used for determining the degree of advantage of the intelligent logistics robot executing the alternative movement action corresponding to the alternative action parameter in the first environment state, the historical evaluation parameter is determined according to the execution results obtained when the intelligent logistics robot executed the alternative movement action corresponding to the alternative action parameter in the first environment state before the current moment, and the evaluation model comprises a second long short-term memory network layer; and sending the evaluation parameters to the action positioning model, determining, by the action positioning model, a target action parameter from the alternative action set based on the evaluation parameters, and controlling the intelligent logistics robot to move to the target position according to the target movement action corresponding to the target action parameter.
According to the method, the target position of the intelligent logistics robot and first state data indicating its first environment state are acquired, and the first state data is input into an action positioning model comprising a first long short-term memory network layer to obtain an alternative action set. Then, for each alternative action parameter in the alternative action set, the alternative action parameter and the first state data are input into an evaluation model comprising a second long short-term memory network layer, and an initial evaluation parameter corresponding to the alternative action parameter is output; the evaluation parameter corresponding to the alternative action parameter is obtained from the initial evaluation parameter and the historical evaluation parameter corresponding to that alternative action parameter. Finally, the target action parameter is determined from the evaluation parameters, and the intelligent logistics robot is controlled to move to the target position according to the target movement action corresponding to the target action parameter. Because both models contain long short-term memory network layers, past state data of the intelligent logistics robot is taken into account, which ensures the correlation between successive actions of the robot and improves its traffic efficiency.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are some embodiments of the invention and that other drawings may be obtained from these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a route planning method in a vision-based logistics system according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the detection range of a vision detection module according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the network architecture of an action positioning model according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the network architecture of an evaluation model according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a route planning device in a vision-based logistics system according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
To facilitate understanding of the present embodiment, a path planning method in a vision-based logistics system according to an embodiment of the present invention is first described in detail, as shown in Fig. 1. Here, the logistics system may include intelligent logistics robots applied to operations such as transferring and transporting goods in warehouses, sorting centers or along transport routes, including AGV (Automated Guided Vehicle) robots and sorting robots that transport objects to a target location. The intelligent logistics robot is located in a target site in which obstacles are present, and detects the surrounding environment through a vision detection module installed on the robot. The vision detection module may be a vision sensor comprising a camera and an optical projector; by acquiring the distance and azimuth information of obstacles in the surrounding environment through vision, laser or sound waves, the intelligent logistics robot can move safely.
Step S102, acquiring a target position and first state data of the intelligent logistics robot, wherein the first state data indicates a first environment state of the intelligent logistics robot at the current moment;
the target position is the destination to be finally reached by the intelligent logistics robot in the target site. The state data can be used for describing the environment state of the intelligent logistics robot, wherein the environment state comprises the distance between the intelligent logistics robot and surrounding obstacles, the position relationship between the intelligent logistics robot and the target position in the target site, the current moving state of the intelligent logistics robot and the like. The first environmental state is the environmental state of the intelligent logistics robot at the current moment, and the first environmental state can be indicated through the first state data.
In one mode, the first state data is determined based on a plurality of historical action parameters, a detection result parameter of the vision detection module, a first distance parameter, a first angle parameter, a first yaw angle and a second angle parameter. Here, the historical action parameters are the action parameters last executed by the intelligent logistics robot, and include a historical linear velocity parameter and a historical angular velocity parameter. The detection result parameter of the vision detection module indicates the distance and azimuth information between the intelligent logistics robot and surrounding obstacles. For example, taking the direction directly in front of the robot as 0 degrees as in Fig. 2, the angular detection range of the vision detection module spans from -90 degrees to 90 degrees, the minimum detection distance is 0.3 m and the maximum detection distance is 2 m; if the distance between the intelligent logistics robot and an obstacle is smaller than 0.3 m, the robot is considered to have collided with the obstacle. The information detected by the vision detection module covers 10 angular dimensions with an interval of 20 degrees between adjacent dimensions; the detection range of the vision detection module is shown in Fig. 2. The detection result parameter includes the distance from the obstacle detected in each angular dimension to the vision detection module; if no obstacle is detected in a dimension, the distance is 0. The first distance parameter and the first angle parameter respectively indicate the distance and the angle between the current intelligent logistics robot and the target position, the first yaw angle indicates the yaw angle of the intelligent logistics robot, and the second angle parameter is determined by the first angle parameter and the first yaw angle.
In the step, a target position and first state data of the intelligent logistics robot are obtained, wherein the first state data is the basis for the intelligent logistics robot to select alternative movement actions.
Step S104, inputting the first state data into an action positioning model to obtain an alternative action set corresponding to the first state data; the alternative action set comprises at least one alternative action parameter; each alternative action parameter represents an alternative movement action of the intelligent logistics robot and comprises an alternative linear velocity parameter and an alternative angular velocity parameter; the action positioning model comprises a first long short-term memory network layer;
The action positioning model and the evaluation model are connected in the same neural network. The action positioning model takes the first state data as input and obtains the corresponding alternative action set; the evaluation model receives each alternative action parameter in the alternative action set and, from the alternative action parameter and the first state data, outputs the corresponding initial evaluation parameter, which serves as a preliminary assessment of the feasibility of the intelligent logistics robot executing that alternative action parameter in the environment state corresponding to the first state data. The evaluation model then determines the evaluation parameter from the initial evaluation parameter and the historical evaluation parameter corresponding to the alternative action parameter; the evaluation parameter indicates the degree of advantage of the alternative movement action corresponding to the alternative action parameter.
The action positioning model and the evaluation model may be connected in the same neural network structure. The action positioning model can be obtained by improving the Actor network structure in the DDPG (Deep Deterministic Policy Gradient) algorithm, and the evaluation model by improving the Critic network structure in the DDPG neural network. The conventional Actor network consists only of fully connected layers; the action positioning model is obtained by replacing part of the fully connected layers with a first long short-term memory network layer and several batch normalization layers.
A long short-term memory (LSTM) network is a recurrent neural network with a specific structure: its output depends not only on the input at the current moment but also on earlier information. By introducing three gating mechanisms, the information at each moment is judged by the gates and updated in a timely manner, so that earlier information can be fully exploited; this makes the LSTM suitable for processing and predicting applications with long time series. The first long short-term memory network layer serves as the first network layer of the action positioning model. Using the ability of the LSTM to memorize past state data of the intelligent logistics robot, it processes the first state data and can combine the past state data of the robot with the first state data obtained at the current moment, ensuring the correlation between successive alternative action parameters of the intelligent logistics robot. The batch normalization layers scale and shift the input data of each layer so that the input values of the activation functions fall in a more suitable range, which benefits the stability of the path planning algorithm, improves the efficiency of path planning, and thereby improves the traffic efficiency of the robot.
The intelligent logistics robot is trained in a simulation environment configured identically to the target site, and the training process uses a neural network comprising the action positioning model and the evaluation model to complete path planning of the intelligent logistics robot from a departure point to the target position. Each piece of state data generated by the interaction between the robot and the environment, the action parameters executed by the robot to achieve collision-free travel under that state data, the state data corresponding to the environment state after the corresponding action parameter has been executed, the initial evaluation parameter corresponding to each action parameter, and the historical evaluation parameter corresponding to each action parameter are combined into one or more experience samples, which are stored in a designated parameter storage space. The action positioning model and the evaluation model then train their model parameters extensively on the experience samples in the designated parameter storage space until a preset maximum number of training iterations is reached.
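For illustration, a minimal Python sketch of a designated parameter storage space holding such experience samples is given below; the class name, field names and capacity are illustrative assumptions, not taken from the original text.

import random
from collections import deque, namedtuple

# One experience sample as described above: state data, executed action
# parameters, resulting state data, initial evaluation parameter and
# historical evaluation parameter.
Experience = namedtuple(
    "Experience",
    ["state", "action", "next_state", "initial_eval", "historical_eval"],
)

class ReplayBuffer:
    """Designated parameter storage space holding experience samples."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, experience: Experience):
        self.buffer.append(experience)

    def sample(self, batch_size=64):
        # Random mini-batch used to update the action positioning model and
        # the evaluation model during training.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))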
An action parameter includes a linear velocity parameter and an angular velocity parameter. The linear velocity parameter can be understood as the velocity of linear motion along the axis running from the front wheels to the rear wheels of the intelligent logistics robot in the plane of its chassis, and the angular velocity parameter as the angular velocity with which the robot rotates about the axis perpendicular to the plane of its chassis. Here, the first state data is input into the trained action positioning model, which outputs at least one alternative action parameter; the output alternative action parameters form the alternative action set, and each alternative action parameter represents one alternative movement action of the intelligent logistics robot.
Step S106, for each alternative action parameter in the alternative action set, inputting the alternative action parameter and the first state data into an evaluation model, and outputting an initial evaluation parameter corresponding to the alternative action parameter; obtaining an evaluation parameter corresponding to the alternative action parameter based on the initial evaluation parameter and the historical evaluation parameter corresponding to the alternative action parameter; the evaluation parameter is used for determining the degree of advantage of the intelligent logistics robot executing the alternative movement action corresponding to the alternative action parameter in the first environment state; the historical evaluation parameter is determined according to the execution results obtained when the intelligent logistics robot executed the alternative movement action corresponding to the alternative action parameter in the first environment state before the current moment; the evaluation model comprises a second long short-term memory network layer;
In one mode, the evaluation model can be obtained by improving the Critic network in the DDPG neural network structure, and it also contains a long short-term memory network layer. The conventional Critic network consists only of fully connected layers; the evaluation model is obtained by replacing part of the fully connected layers with a second long short-term memory network layer and an advantage evaluation module. In one mode, the advantage evaluation module may be a dueling network module, which uses a state evaluation function and an action evaluation function to comprehensively evaluate the degree of advantage of the first environment state and the degree of advantage of the environment state the intelligent logistics robot is in after executing the alternative movement action corresponding to the alternative action parameter. The initial evaluation parameter is the result output after the alternative action parameter and the first state data are input into the evaluation model, and is a preliminary assessment of the feasibility of executing the alternative movement action corresponding to the alternative action parameter in the first environment state.
When considering whether the alternative movement action corresponding to an alternative action parameter is advantageous to execute, one must consider not only the degree of advantage of the current environment state of the intelligent logistics robot and of the environment state after the alternative movement action is executed, but also the historical execution results obtained when the robot executed that alternative movement action in the same environment state in the past. The historical evaluation parameter is therefore determined according to the execution results obtained when the intelligent logistics robot executed the alternative movement action corresponding to the alternative action parameter in the first environment state before the current moment. A reward-and-penalty table is set up, which stores the reward values of the different action parameters executed by the intelligent logistics robot under each piece of state data. A reward value is determined according to the execution result of an action parameter under that state data and is feedback on the actual execution result of the robot; it can be positive or negative. For example, if the intelligent logistics robot reaches the target position or is located in a safe area after executing an action parameter under certain state data, a corresponding positive reward value can be given; if the robot hits an obstacle or is in a dangerous area after executing an action parameter under the same state data, a penalty is needed and a corresponding negative reward value can be given. When the intelligent logistics robot starts to move in the target site, the state data input into the action positioning model and the action parameter executed by the robot are recorded, the reward value of executing that action parameter under that state data is determined, and this reward value is added to the accumulated reward value of that action parameter under that state data to obtain an updated accumulated reward value; in this way the accumulated reward value of each action parameter under the corresponding state data is continuously updated while the robot moves through the target site. For each alternative action parameter, the accumulated reward value obtained by the intelligent logistics robot executing that alternative action parameter in the first environment state before the current moment is taken as its historical evaluation parameter.
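For illustration, a minimal sketch of such a reward-and-penalty table is given below, assuming that state data and action parameters can be reduced to hashable keys; the class and method names are illustrative.

from collections import defaultdict

class RewardTable:
    """Accumulates reward values per (state, action) pair; the accumulated
    value serves as the historical evaluation parameter."""

    def __init__(self):
        # Maps (state_key, action_key) -> accumulated reward value.
        self.cumulative = defaultdict(float)

    def update(self, state_key, action_key, reward):
        # Add the latest reward (positive for reaching the target or staying
        # in a safe area, negative for collisions or danger zones).
        self.cumulative[(state_key, action_key)] += reward

    def historical_eval(self, state_key, action_key):
        # Historical evaluation parameter of this action under this state data.
        return self.cumulative[(state_key, action_key)]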
In this step, for each alternative action parameter in the alternative action set output by the action positioning model, the alternative action parameter and the first state data are grouped and input into the evaluation model, which outputs the initial evaluation parameter corresponding to that alternative action parameter. A functional operation is then performed on the initial evaluation parameter and the historical evaluation parameter corresponding to the alternative action parameter to obtain the evaluation parameter corresponding to that alternative action parameter.
Step S108, the evaluation parameters are sent to the action positioning model, and based on the evaluation parameters, the action positioning model determines target action parameters from the alternative action set and controls the intelligent logistics robot to move to the target position according to target movement actions corresponding to the target action parameters.
The evaluation parameters are sent to the action positioning model; after obtaining the evaluation parameter corresponding to each alternative action parameter, the action positioning model determines the alternative action parameter corresponding to the maximum evaluation parameter in the alternative action set, takes it as the target action parameter, and controls the intelligent logistics robot to move toward the target position according to the target movement action corresponding to that target action parameter.
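For illustration, the selection of the target action parameter reduces to taking the alternative action parameter with the largest evaluation parameter; a short sketch with illustrative values follows.

def select_target_action(alternative_actions, evaluation_params):
    """Pick the alternative (linear velocity, angular velocity) pair whose
    evaluation parameter is largest."""
    best_index = max(range(len(evaluation_params)),
                     key=lambda i: evaluation_params[i])
    return alternative_actions[best_index]

# Example: three alternative (v, w) pairs and their evaluation parameters.
actions = [(0.4, 0.0), (0.3, 0.2), (0.2, -0.1)]
scores = [1.7, 2.3, 0.9]
target_v, target_w = select_target_action(actions, scores)  # -> (0.3, 0.2)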
If the intelligent logistics robot has not reached the target position after moving, it continues to acquire state data and, through the action positioning model and the evaluation model, output action parameters for that state data, moving in the target site until the target position is reached.
According to the route planning method in the vision-based logistics system described above, the target position and first state data of the intelligent logistics robot are acquired, wherein the first state data indicates the first environment state of the intelligent logistics robot at the current moment; the first state data is input into an action positioning model to obtain an alternative action set corresponding to the first state data, wherein the alternative action set comprises at least one alternative action parameter, each alternative action parameter represents an alternative movement action of the intelligent logistics robot and comprises an alternative linear velocity parameter and an alternative angular velocity parameter, and the action positioning model comprises a first long short-term memory network layer; for each alternative action parameter in the alternative action set, the alternative action parameter and the first state data are input into an evaluation model, and an initial evaluation parameter corresponding to the alternative action parameter is output; an evaluation parameter corresponding to the alternative action parameter is obtained based on the initial evaluation parameter and the historical evaluation parameter corresponding to the alternative action parameter, wherein the evaluation parameter is used for determining the degree of advantage of the intelligent logistics robot executing the alternative movement action corresponding to the alternative action parameter in the first environment state, the historical evaluation parameter is determined according to the execution results obtained when the intelligent logistics robot executed the alternative movement action corresponding to the alternative action parameter in the first environment state before the current moment, and the evaluation model comprises a second long short-term memory network layer; and the evaluation parameters are sent to the action positioning model, which determines a target action parameter from the alternative action set based on the evaluation parameters and controls the intelligent logistics robot to move to the target position according to the target movement action corresponding to the target action parameter.
According to the method, the target position of the intelligent logistics robot and first state data indicating its first environment state are acquired, and the first state data is input into an action positioning model comprising a first long short-term memory network layer to obtain an alternative action set. Then, for each alternative action parameter in the alternative action set, the alternative action parameter and the first state data are input into an evaluation model comprising a second long short-term memory network layer, and an initial evaluation parameter corresponding to the alternative action parameter is output; the evaluation parameter corresponding to the alternative action parameter is obtained from the initial evaluation parameter and the historical evaluation parameter corresponding to that alternative action parameter. Finally, the target action parameter is determined from the evaluation parameters, and the intelligent logistics robot is controlled to move to the target position according to the target movement action corresponding to the target action parameter. Because both models contain long short-term memory network layers, past state data of the intelligent logistics robot is taken into account, which ensures the correlation between successive actions of the robot and improves its traffic efficiency.
In one mode, the first state data is determined based on a plurality of historical action parameters, a detection result parameter of the vision detection module, a first distance parameter, a first angle parameter, a first yaw angle and a second angle parameter. The historical action parameters include the historical linear velocity parameter and the historical angular velocity parameter of the intelligent logistics robot; the detection result parameter is determined according to the detection results of the vision detection module in each angular dimension; the first distance parameter indicates the distance between the current position of the intelligent logistics robot and the target position; the first angle parameter indicates the angle between a designated position on the intelligent logistics robot and the target position; the first yaw angle indicates the yaw angle of the intelligent logistics robot; and the second angle parameter is determined by the first angle parameter and the first yaw angle.
The historical action parameters include a historical linear velocity parameter and a historical angular velocity parameter. The historical linear velocity parameter is the linear velocity parameter corresponding to the movement action last executed by the intelligent logistics robot before the current moment; the historical angular velocity parameter is the angular velocity parameter corresponding to the movement action last executed by the intelligent logistics robot before the current moment.
The detection result parameter of the vision detection module may be the distance between an obstacle and the vision detection module in each angular dimension of the module. For example, taking a vision detection module with 10 angular dimensions, the detection result parameter T may be denoted as [t1, t2, t3, t4, t5, t6, t7, t8, t9, t10], where t1 to t10 are the distances between the obstacles detected by the vision detection module and the module in the 10 angular dimensions within the range of -90 degrees to 90 degrees in front of the vision detection module.
The first distance parameter indicates the distance between the current position of the intelligent logistics robot and the target position. In one mode, the first distance parameter may be determined from the current position of the intelligent logistics robot, the target position and the diagonal of the target site map; an example calculation formula for the first distance parameter D is D = L / h, where L is the distance between the current position of the intelligent logistics robot in the target site and the target position, and h is the diagonal distance of the target site map.
A three-dimensional coordinate system is established in the target site, with the X axis and Y axis lying in the plane of the target site and perpendicular to each other, and the Z axis perpendicular to the plane of the target site. The first angle parameter indicates the angle between a designated position on the intelligent logistics robot and the target position; in one mode, it can be the angle formed between the line connecting the projection of the vision detection module center onto the target site plane with the chassis center of the intelligent logistics robot, and the line connecting the target position with the chassis center. The first yaw angle is the angle formed between the line connecting the projection of the vision detection module center onto the target site plane with the chassis center and the Y axis. The second angle parameter is the absolute value of the difference between the first yaw angle and the first angle parameter.
In actual implementation, the first state data may be determined based on the historical action parameters, the detection result parameter of the vision detection module, the first distance parameter, the first angle parameter, the first yaw angle and the second angle parameter, or based on some of these parameters. In one mode, the first state data is a set of 16 parameters composed of the historical action parameters, the detection result parameter of the vision detection module, the first distance parameter, the first angle parameter, the first yaw angle and the second angle parameter, and may be written as St = [Vt-1, Wt-1, T, D, φ1, A, φ2], where St is the first state data; Vt-1 and Wt-1 are the historical linear velocity parameter and the historical angular velocity parameter respectively; and T, D, φ1, A and φ2 are respectively the detection result parameter of the vision detection module, the first distance parameter, the first angle parameter, the first yaw angle and the second angle parameter.
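For illustration, a minimal sketch of assembling such a 16-dimensional state vector (2 historical action parameters, 10 detection distances, D, φ1, A and φ2) is given below, assuming D and the second angle parameter are computed from the formulas above; the helper name and argument names are illustrative.

import math

def build_state(v_prev, w_prev, detections, robot_pos, target_pos,
                map_diagonal, angle_to_target, yaw):
    """Assemble the 16-dimensional first state data S_t."""
    assert len(detections) == 10  # distances in the 10 angular dimensions

    # First distance parameter D = L / h (distance to target normalized by
    # the diagonal of the target site map).
    L = math.dist(robot_pos, target_pos)
    D = L / map_diagonal

    # Second angle parameter: absolute difference between the first angle
    # parameter and the first yaw angle.
    phi2 = abs(angle_to_target - yaw)

    # S_t = [V_{t-1}, W_{t-1}, T (10 values), D, phi1, A, phi2] -> 16 values.
    return [v_prev, w_prev, *detections, D, angle_to_target, yaw, phi2]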
The following embodiments provide implementations for obtaining a set of alternative actions.
In one mode, the action positioning model consists of a first long short-term memory network layer, a first fully connected layer, a second fully connected layer and a plurality of batch normalization layers; the first fully connected layer and the second fully connected layer are each arranged between two batch normalization layers; the activation functions of the first fully connected layer and the second fully connected layer are piecewise linear activation functions; the first fully connected layer and the second fully connected layer have different numbers of nodes; and the plurality of batch normalization layers includes a first batch normalization layer, a second batch normalization layer and a third batch normalization layer. The first state data is input into the first long short-term memory network layer, which outputs first output data; the first output data is input into the first batch normalization layer for normalization to obtain first normalized data; the first normalized data is input into the first fully connected layer, where correlation analysis is performed using the channel characteristics of the different channels and the first normalized data is automatically weighted and summed to obtain first summed data; the first summed data is input into the second batch normalization layer for normalization to obtain second normalized data; the second normalized data is input into the second fully connected layer, where correlation analysis is performed using the channel characteristics of the different channels and the second normalized data is automatically weighted and summed to obtain second summed data; and the second summed data is input into the third batch normalization layer for normalization to obtain at least one alternative action parameter, and the alternative action set corresponding to the first state data is determined based on the alternative action parameters.
In one mode, as shown in Fig. 3, the action positioning model is composed of a first long short-term memory network layer, a first fully connected layer, a second fully connected layer and three batch normalization layers. In the action positioning model, the first network layer is the first long short-term memory network layer; the third and fifth network layers are the first and second fully connected layers, with 400 and 300 nodes respectively and piecewise linear activation functions; and the second, fourth and sixth network layers are the first, second and third batch normalization layers, which normalize the input data to ensure the stability of the network algorithm.
After training on the experience samples has been completed in advance, when the action positioning model network receives the input first state data, the first long short-term memory network layer first processes the first state data and outputs first output data. The first output data is input into the first batch normalization layer for normalization to obtain first normalized data; the first normalized data is input into the first fully connected layer, where correlation analysis is performed using the channel characteristics of the different channels and the first normalized data is automatically weighted and summed to obtain first summed data; the first summed data is input into the second batch normalization layer for normalization to obtain second normalized data; the second normalized data is input into the second fully connected layer, where correlation analysis is performed using the channel characteristics of the different channels and the second normalized data is automatically weighted and summed to obtain second summed data; and the second summed data is input into the third batch normalization layer for normalization to obtain at least one alternative action parameter, each comprising an alternative linear velocity parameter and an alternative angular velocity parameter. The output alternative action parameters are collected to form the alternative action set corresponding to the first state data.
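For illustration, the layer ordering just described can be sketched in PyTorch roughly as follows. This is a minimal sketch, not the patented implementation: the 16-dimensional input, the LSTM hidden size, the tanh scaling and the final projection from 300 nodes to the two action parameters are assumptions, since the text only specifies the layer order, the 400/300 node counts and the piecewise linear activations.

import torch
import torch.nn as nn

class ActionPositioningModel(nn.Module):
    def __init__(self, state_dim=16, hidden_dim=128, action_dim=2):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)  # 1st layer
        self.bn1 = nn.BatchNorm1d(hidden_dim)                         # 2nd layer
        self.fc1 = nn.Linear(hidden_dim, 400)                         # 3rd layer
        self.bn2 = nn.BatchNorm1d(400)                                # 4th layer
        self.fc2 = nn.Linear(400, 300)                                # 5th layer
        self.bn3 = nn.BatchNorm1d(300)                                # 6th layer
        # Projection to (v, w); an assumption, the text does not state how
        # the 300-dimensional output maps to the two velocity parameters.
        self.out = nn.Linear(300, action_dim)
        self.act = nn.ReLU()  # piecewise linear activation

    def forward(self, state_seq):
        # state_seq: (batch, seq_len, state_dim); the LSTM lets past state
        # data influence the current alternative action parameters.
        lstm_out, _ = self.lstm(state_seq)
        x = self.bn1(lstm_out[:, -1, :])
        x = self.bn2(self.act(self.fc1(x)))
        x = self.bn3(self.act(self.fc2(x)))
        return torch.tanh(self.out(x))  # alternative (v, w), scaled to [-1, 1]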
In a conventional DDPG-based network, the Actor network consists only of fully connected layers. The action positioning model provided by the invention replaces part of the fully connected layers with a long short-term memory network layer and batch normalization layers. The strong memory capability of the long short-term memory network layer is used to process the time-series data, so that the influence of historical movement actions on the current environment state can be taken into account when making decisions, giving the neural network long-term prediction capability and allowing it to output more reasonable alternative linear velocity and angular velocity parameters. The batch normalization layers scale and shift the input data of each layer so that the input values of the activation functions fall in a more suitable range, which benefits the stability of the network algorithm and improves the efficiency of path planning.
In one mode, the evaluation model may consist of a second long short-term memory network layer, a third fully connected layer and an advantage evaluation module. The first state data is input into the second long short-term memory network layer, which outputs second output data. For each alternative action parameter in the alternative action set, the alternative action parameter is input into the third fully connected layer, which outputs third output data. The advantage evaluation module takes the second output data and the third output data as input and outputs a state evaluation value and an action evaluation value: the state evaluation value indicates the degree of advantage of the first environment state of the intelligent logistics robot, and the action evaluation value indicates the degree of advantage of the second environment state the intelligent logistics robot is in after executing the alternative movement action corresponding to the alternative action parameter. Based on the state evaluation value and the action evaluation value, the initial evaluation parameter corresponding to each alternative action parameter in the alternative action set is determined.
In one mode, as shown in Fig. 4, the evaluation model includes a second long short-term memory network layer, a third fully connected layer and an advantage evaluation module, where the activation function of the third fully connected layer is a piecewise linear activation function.
Specifically, after an alternative action parameter and the first state data are input into the evaluation model, the second long short-term memory network layer processes the input first state data and outputs second output data, and the third fully connected layer processes the input alternative action parameter and outputs third output data.
Further, the second output data and the third output data are input together into the advantage evaluation module, which outputs the state evaluation value and the action evaluation value. The advantage evaluation module may be a dueling network, which includes a state evaluation function and an action evaluation function. The state evaluation function, which may be written as V(S), evaluates the degree of advantage of the current environment state, where the input S is the state data corresponding to the environment state of the robot; when S is the first state data, the state evaluation function outputs a state evaluation value, a numerical value indicating the degree of advantage of the first environment state of the intelligent logistics robot. The action evaluation function, which may be written as C(S, A), evaluates the degree of advantage of the second environment state the intelligent logistics robot is in after executing the alternative movement action corresponding to the alternative action parameter, where S is the state data corresponding to the environment state of the robot and A is an alternative action parameter of the robot; when S is the first state data and A is an alternative action parameter, the action evaluation function outputs an action evaluation value indicating the degree of advantage of the second environment state the robot would move into if it executed that alternative movement action. The second environment state is the environment state of the intelligent logistics robot after the alternative movement action has been executed.
Finally, after the state evaluation value and the action evaluation value are output, the initial evaluation parameter corresponding to each alternative action parameter in the alternative action set is determined from them: different weights can be assigned to the state evaluation value and the action evaluation value, and their weighted sum is computed to obtain the initial evaluation parameter corresponding to each alternative action parameter. By using the state evaluation value and the action evaluation value as factors in evaluating the degree of advantage of an alternative action in the first environment state, the environment states of the intelligent logistics robot before and after executing the alternative action are comprehensively considered, and the advantage evaluation values of those environment states become a factor in selecting the alternative action. This makes the planned route of the intelligent logistics robot safer and more reliable and improves the traffic efficiency of the robot.
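For illustration, a minimal PyTorch sketch of such an evaluation model is given below: a second LSTM layer processes the state data, a third fully connected layer processes the alternative action parameter, and a dueling-style advantage module combines a state evaluation value V(S) and an action evaluation value C(S, A) into the initial evaluation parameter. The hidden sizes and the fixed example weights are assumptions.

import torch
import torch.nn as nn

class EvaluationModel(nn.Module):
    def __init__(self, state_dim=16, action_dim=2, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)  # 2nd LSTM layer
        self.fc_action = nn.Linear(action_dim, hidden_dim)            # 3rd FC layer
        self.act = nn.ReLU()  # piecewise linear activation
        # Dueling-style advantage evaluation module.
        self.state_value = nn.Linear(hidden_dim, 1)        # V(S)
        self.action_value = nn.Linear(2 * hidden_dim, 1)   # C(S, A)

    def forward(self, state_seq, action, w_state=0.5, w_action=0.5):
        lstm_out, _ = self.lstm(state_seq)                 # second output data
        s = lstm_out[:, -1, :]
        a = self.act(self.fc_action(action))               # third output data
        v = self.state_value(s)                            # state evaluation value
        c = self.action_value(torch.cat([s, a], dim=-1))   # action evaluation value
        # Weighted sum of the two evaluation values gives the initial
        # evaluation parameter (weights here are illustrative).
        return w_state * v + w_action * c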
In a specific implementation, the evaluation parameter corresponding to each alternative action parameter is sent to the action positioning model, which determines the maximum evaluation parameter among them, determines the alternative action parameter corresponding to the maximum evaluation parameter in the alternative action set as the target action parameter, and controls the intelligent logistics robot to move to the target position according to the target movement action corresponding to the target action parameter.
That is, the evaluation model may perform addition or another functional operation on the initial evaluation parameter and the historical evaluation parameter corresponding to each alternative action parameter in the alternative action set to obtain the evaluation parameter corresponding to each alternative action parameter; the evaluation parameter indicates the degree of advantage of the intelligent logistics robot executing the alternative movement action corresponding to the alternative action parameter in the first environment state. The evaluation parameters corresponding to the alternative action parameters are sent to the action positioning model, which determines the maximum among the evaluation parameters, takes the corresponding alternative action parameter as the target action parameter, and controls the intelligent logistics robot to move to the target position according to the target movement action corresponding to the target action parameter.
In one mode, a reward value is determined based on the execution result of the intelligent logistics robot executing the target movement action, where the magnitude of the reward value indicates how good or bad the execution result is; the historical evaluation parameter corresponding to the target action parameter is then updated based on the reward value.
After the intelligent logistics robot executes the target movement action, a reward value is determined according to the execution result; the reward value is feedback on the actual execution result, and its magnitude indicates how good or bad the result of executing the target movement action is. After the reward value has been determined, the historical evaluation parameter corresponding to the target action parameter is updated according to the reward value.
That is, after the intelligent logistics robot executes a movement action A in the first environment state corresponding to the first state data, the reward value is determined according to the execution result and, through addition, the historical evaluation parameter corresponding to movement action A under the first state data is updated to a value a. If the intelligent logistics robot has not yet reached the target position, it continues to acquire state data and to output action parameters through the action positioning model and the evaluation model so as to keep moving in the target site; if the state data acquired next time is the same as the first state data, the action parameter corresponding to movement action A can again be one of the alternative action parameters, and at that point the evaluation model computes the evaluation parameter by superimposing the initial evaluation parameter and the historical evaluation parameter a.
The following embodiments provide specific implementations of determining the reward value.
The execution result comprises: the intelligent logistics robot reaches the target position; the intelligent logistics robot touches an obstacle in the process of executing the target moving action; the intelligent logistics robot does not touch the obstacle and is located in a safe area; the intelligent logistics robot does not touch the obstacle but is located in a dangerous area; and the intelligent logistics robot does not touch the obstacle and is located in a designated area. The distance between the safe area and the target position is smaller than a first preset distance threshold, and the distance between the safe area and the obstacle position is larger than a second preset distance threshold; the distance between the dangerous area and the obstacle position is not more than a third preset distance threshold; the designated area is the area other than the safe area and the dangerous area. If the intelligent logistics robot reaches the target position, the reward value is determined to be the maximum increment value; if the intelligent logistics robot touches an obstacle in the process of executing the target moving action, the reward value is determined to be the maximum reduction value; if the intelligent logistics robot does not touch the obstacle and is located in the safe area, a first distance between the current position of the intelligent logistics robot and the target position is acquired, and the reward value is obtained based on the first distance and the first preset distance threshold; if the intelligent logistics robot does not touch the obstacle but is located in the dangerous area, a second distance between the current position of the intelligent logistics robot and the obstacle position is acquired, and the reward value is determined based on the second distance and the third preset distance threshold; if the intelligent logistics robot does not touch the obstacle and is located in the designated area, a first distance between the current position of the intelligent logistics robot and the target position and a third distance between the current position of the intelligent logistics robot and the starting position of the intelligent logistics robot in the target site are acquired, and the reward value is determined based on the third distance and the first distance.
The reward value is designed to consider not only whether the intelligent logistics robot can reach the target position along an optimal path, but also the safety of the intelligent logistics robot.
Here, the first preset distance threshold, the second preset distance threshold, and the third preset distance threshold are set in advance as required, and the target site is divided into a dangerous area, a safe area, and a designated area according to these three thresholds. The distance between the safe area and the target position is smaller than the first preset distance threshold, and the distance between the safe area and the obstacle position is larger than the second preset distance threshold; the distance between the dangerous area and the obstacle position is not more than the third preset distance threshold; the designated area is the area of the target site other than the safe area and the dangerous area.
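The following sketch illustrates one way such an area classification could be implemented; the argument names, the priority given to the obstacle check, and the treatment of overlapping regions are assumptions, since the text does not specify them.

```python
def classify_area(d_goal, d_obstacle, d1, d2, d3):
    """d_goal: distance to the target position; d_obstacle: distance to the nearest obstacle;
    d1, d2, d3: first, second and third preset distance thresholds."""
    if d_obstacle <= d3:
        return "dangerous"                   # within the third threshold of an obstacle
    if d_goal < d1 and d_obstacle > d2:
        return "safe"                        # close to the target and clear of obstacles
    return "designated"                      # remaining area of the target site
```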
Thus, all possible results after the intelligent logistics robot performs the target moving action are classified into five categories, including: the intelligent logistics robot reaches the target position; the intelligent logistics robot touches an obstacle in the process of executing the target moving action; the intelligent logistics robot does not touch the obstacle and is located in the safe area; the intelligent logistics robot does not touch the obstacle but is located in the dangerous area; and the intelligent logistics robot does not touch the obstacle and is located in the designated area.
Further, the reward value is determined based on which of these execution results occurs.
Specifically, if the intelligent logistics robot reaches the target position, the reward value is determined to be the maximum increment value. The maximum increment value is a preset positive number; that is, if the intelligent logistics robot directly reaches the target position after executing the target moving action, the target moving action under the first state data is given the maximum positive reward.
If the intelligent logistics robot touches an obstacle in the process of executing the target moving action, the reward value is determined to be the maximum reduction value. The maximum reduction value is a negative number whose absolute value equals the maximum reward magnitude; that is, if the intelligent logistics robot touches an obstacle while executing the target moving action, the target moving action under the first state data is given the maximum negative reward.
If the intelligent logistics robot does not touch the obstacle and is located in the safety area, a first distance between the current position and the target position of the intelligent logistics robot is obtained, and a reward value is obtained based on the first distance and a first preset distance threshold. The first distance is a distance between a current position of the intelligent logistics robot and a target position, and in one mode, if the intelligent logistics robot does not touch an obstacle and is located in a safety area, under the first state data, a reward value of the target action parameter can be a numerical result obtained by dividing the first distance by a first preset distance threshold.
If the intelligent logistics robot does not touch the obstacle but is located in the dangerous area, a second distance between the current position of the intelligent logistics robot and the position of the obstacle is obtained, and a reward value is determined based on the second distance and a third preset distance threshold. The second distance is a distance between a current position of the intelligent logistics robot and a position of the obstacle, and in one mode, if the intelligent logistics robot does not touch the obstacle but is located in a dangerous area, in the first state data, the reward value of the target action parameter may be an inverse number of a numerical result obtained by dividing the second distance by a third preset distance threshold.
If the intelligent logistics robot does not touch the obstacle and is located in the designated area, a first distance between the current position of the intelligent logistics robot and the target position and a third distance between the current position of the intelligent logistics robot and the starting position of the intelligent logistics robot in the target site are obtained, and a reward value is determined based on the third distance and the first distance. The first distance is a distance between a current position of the intelligent logistics robot and a target position, and the third distance is a distance between the current position of the intelligent logistics robot and a starting position of the intelligent logistics robot in the target site.
The design of the rewarding value not only considers that the intelligent logistics robot can reach the target position through the optimal path, but also considers the safety of the intelligent logistics robot, and improves the safety passing efficiency of the logistics robot.
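Putting the five cases together, a minimal reward function might look like the sketch below. The threshold and distance names are assumptions; the formulas for the safe and dangerous areas follow the ratios described above, while the formula for the designated area is purely illustrative, since the text only says the value is based on the first and third distances.

```python
R_MAX = 10.0   # assumed magnitude of the maximum increment / maximum reduction value

def reward(result, d_goal, d_obstacle, d_start, d1, d3):
    """result: 'reached', 'collision', 'safe', 'danger', or 'designated';
    d_goal: first distance (current position to target position);
    d_obstacle: second distance (current position to obstacle position);
    d_start: third distance (current position to starting position);
    d1, d3: first and third preset distance thresholds."""
    if result == 'reached':
        return R_MAX                    # maximum increment value
    if result == 'collision':
        return -R_MAX                   # maximum reduction value
    if result == 'safe':
        return d_goal / d1              # ratio of first distance to first threshold
    if result == 'danger':
        return -(d_obstacle / d3)       # negative ratio of second distance to third threshold
    return d_start - d_goal             # designated area: assumed combination of third and first distances
```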
Corresponding to the above method embodiment, referring to fig. 5, a path planning device in a vision-based logistics system is shown, where the device is disposed in a wearable device, and the device includes:
a first obtaining module 502, configured to obtain a target position and first state data of the intelligent logistics robot, where the first state data indicates a first environmental state where the intelligent logistics robot is located at a current moment;
The first input module 504 is configured to input the first state data to the action positioning model, so as to obtain an alternative action set corresponding to the first state data; the alternative action set comprises at least one alternative action parameter; the alternative motion parameters are used for representing alternative movement motions of the intelligent logistics robot; the alternative action parameters comprise alternative linear velocity parameters and alternative angular velocity parameters; the action positioning model comprises a first long-time and short-time memory network layer;
A second input module 506, configured to input, for each candidate action parameter in the candidate action set, each candidate action parameter and the first state data into the evaluation model, and output an initial evaluation parameter corresponding to the candidate action parameter; obtaining an evaluation parameter corresponding to the alternative action parameter based on the initial evaluation parameter corresponding to the alternative action parameter and the historical evaluation parameter corresponding to the alternative action parameter; the evaluation parameters are used for determining the dominance degree of the alternative movement action corresponding to the alternative action parameters executed by the intelligent logistics robot in the first environment state; the history evaluation parameters are determined according to the execution results of the alternative movement actions corresponding to the execution alternative action parameters of the intelligent logistics robot in the first environment state before the current moment; the evaluation model comprises a second long-short time memory network layer;
The first movement module 508 is configured to send the evaluation parameter to the action positioning model, determine a target action parameter from the candidate action set based on the evaluation parameter, and control the intelligent logistics robot to move towards the target position according to a target movement corresponding to the target action parameter.
According to the method, the target position of the intelligent logistics robot and first state data indicating a first environment state of the intelligent logistics robot are obtained, the first state data are input into an action positioning model comprising a first long-short-time memory network layer to obtain alternative action sets, then, for each alternative action parameter in the alternative action sets, each alternative action parameter and the first state data are input into an evaluation model comprising a second long-short-time memory network layer, initial evaluation parameters corresponding to the alternative action parameters are output, evaluation parameters corresponding to the alternative action parameters are obtained according to the initial evaluation parameters and the historical evaluation parameters corresponding to the alternative action parameters, finally, the target action parameters are determined according to the evaluation parameters, the intelligent logistics robot is controlled to move to the target position according to target moving actions corresponding to the target action parameters.
The first state data is determined based on a plurality of historical action parameters, a detection result parameter of the visual detection module, a first distance parameter, a first angle parameter, a first yaw angle and a second angle parameter; the historical action parameters comprise historical linear speed parameters and historical angular speed parameters of the intelligent logistics robot; the detection result parameters are determined according to the detection results of the visual detection module in each angle dimension; the first distance parameter indicates the distance between the current position of the intelligent logistics robot and the target position; the first angle parameter is used for indicating an included angle between a designated position on the intelligent logistics robot and a target position; the first yaw angle indicates a yaw angle of the intelligent logistics robot; the second angle parameter is determined by the first angle parameter and the first yaw angle.
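As an illustration of how the first state data could be assembled from these quantities, consider the sketch below; the vector ordering, the number of history steps, and the simple angle difference used for the second angle parameter are assumptions made only for this example.

```python
def build_first_state(history_actions, vision_readings, d_goal, angle_to_goal, yaw):
    """history_actions: recent (linear velocity, angular velocity) pairs of the robot;
    vision_readings: detection results of the vision module in each angle dimension;
    d_goal: first distance parameter; angle_to_goal: first angle parameter; yaw: first yaw angle."""
    second_angle = angle_to_goal - yaw            # second angle parameter, derived from the other two (assumed)
    state = []
    for v, w in history_actions:                  # historical linear and angular speed parameters
        state.extend([v, w])
    state.extend(vision_readings)                 # detection result parameters
    state.extend([d_goal, angle_to_goal, yaw, second_angle])
    return state
```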
The action positioning model consists of a first long-time and short-time memory network layer, a first full-connection layer, a second full-connection layer and a plurality of batch normalization layers; the first full-connection layer and the second full-connection layer are respectively arranged between the two batch normalization layers; the activation functions of the first full connection layer and the second full connection layer are piecewise linear activation functions; the first full-connection layer and the second full-connection layer have different node numbers; the plurality of batch normalization layers comprise a first batch normalization layer, a second batch normalization layer and a third batch normalization layer; the first input module is configured to input first state data to a first long-short-time memory network layer, and output first output data; inputting the first output data into a first batch of normalization layers for normalization processing to obtain first normalization data; inputting the first standardized data into a first full-connection layer, performing correlation analysis by using channel characteristics of different channels in the first full-connection layer, and automatically performing weighted summation on the first standardized data to obtain first summation data; inputting the first summation data into a second batch of normalization layers for normalization processing to obtain second normalization data; inputting the second standardized data into a second full-connection layer, performing correlation analysis by using channel characteristics of different channels in the second full-connection layer, and automatically performing weighted summation on the second standardized data to obtain second summation data; and inputting the second summation data into a third batch of normalization layers for normalization processing to obtain at least one alternative action parameter, and determining an alternative action set corresponding to the first state data based on the alternative action parameter.
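A minimal PyTorch-style sketch of the layer ordering just described (LSTM, then alternating batch normalization and fully connected layers) is given below; the layer sizes, the number of candidate actions, and the final reshaping into (linear velocity, angular velocity) pairs are assumptions, and batch normalization is assumed to see batches of more than one sample during training.

```python
import torch
import torch.nn as nn

class ActionPositioningModel(nn.Module):
    """Sketch of the action positioning model: LSTM -> BN -> FC -> BN -> FC -> BN."""
    def __init__(self, state_dim=32, hidden_dim=128, fc1_dim=256, n_candidates=8):
        super().__init__()
        self.n_candidates = n_candidates
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)   # first long short-term memory layer
        self.bn1 = nn.BatchNorm1d(hidden_dim)                          # first batch normalization layer
        self.fc1 = nn.Linear(hidden_dim, fc1_dim)                      # first fully connected layer
        self.bn2 = nn.BatchNorm1d(fc1_dim)                             # second batch normalization layer
        self.fc2 = nn.Linear(fc1_dim, n_candidates * 2)                # second fully connected layer (different node count)
        self.bn3 = nn.BatchNorm1d(n_candidates * 2)                    # third batch normalization layer
        self.act = nn.ReLU()                                           # piecewise linear activation function

    def forward(self, state_seq):                    # state_seq: (batch, time, state_dim)
        out, _ = self.lstm(state_seq)                # first output data
        h = self.bn1(out[:, -1, :])                  # first normalized data (last time step)
        h = self.bn2(self.act(self.fc1(h)))          # first summation data, normalized
        h = self.bn3(self.act(self.fc2(h)))          # second summation data, normalized
        return h.view(-1, self.n_candidates, 2)      # candidate (linear velocity, angular velocity) pairs
```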
The evaluation model consists of a second long-short-time memory network layer, a third full-connection layer and an advantage evaluation module; the second input module is configured to input the first state data into a second long-short-time memory network layer, and output second output data; inputting the alternative action parameters into a third full-connection layer for each alternative action parameter in the alternative action set, and outputting third output data; the advantage evaluation module inputs the second output data and the third output data, outputs a state evaluation value and a motion evaluation value, and the state evaluation value is used for indicating the advantage degree of the first environment state of the intelligent logistics robot; the action evaluation value is used for indicating the dominance degree of the second environment state of the intelligent logistics robot after executing the alternative movement action corresponding to the alternative action parameter; based on the state evaluation value and the action evaluation value, an initial evaluation parameter corresponding to each candidate action parameter in the candidate action set is determined.
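The evaluation model can be sketched in the same style; the additive combination of the state evaluation value and the action evaluation value into the initial evaluation parameter is modelled on dueling-style critics and is an assumption, as are all layer sizes.

```python
import torch
import torch.nn as nn

class EvaluationModel(nn.Module):
    """Sketch of the evaluation model: LSTM for the state, FC for the action, advantage module."""
    def __init__(self, state_dim=32, action_dim=2, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)   # second long short-term memory layer
        self.fc_action = nn.Linear(action_dim, hidden_dim)             # third fully connected layer
        self.state_head = nn.Linear(hidden_dim, 1)                     # advantage module: state evaluation value
        self.action_head = nn.Linear(hidden_dim * 2, 1)                # advantage module: action evaluation value

    def forward(self, state_seq, action):            # action: (batch, action_dim) candidate parameters
        out, _ = self.lstm(state_seq)                # second output data
        s = out[:, -1, :]
        a = torch.relu(self.fc_action(action))       # third output data
        v = self.state_head(s)                       # how advantageous the first environment state is
        adv = self.action_head(torch.cat([s, a], dim=-1))   # how advantageous the post-action state is
        return v + adv                               # initial evaluation parameter (assumed combination)
```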
The first mobile module is further configured to send an evaluation parameter corresponding to each candidate action parameter to an action positioning model, where the action positioning model determines a maximum evaluation parameter from the evaluation parameters; determining an alternative motion parameter corresponding to the maximum evaluation parameter from the alternative motion set, and determining the alternative motion parameter corresponding to the maximum evaluation parameter as a target motion parameter; and controlling the intelligent logistics robot to move to the target position according to the target movement action corresponding to the target action parameter.
The device further comprises a first updating module, a second updating module and a third updating module, wherein the first updating module is used for determining a reward value based on an execution result of the intelligent logistics robot for executing the target movement action; the magnitude of the rewarding value indicates the advantages and disadvantages of the execution result of the intelligent logistics robot; and updating the historical evaluation parameters corresponding to the target action parameters based on the reward values.
The device also comprises a first updating module, a second updating module and a third updating module, wherein the first updating module is used for updating the first updating module; the execution result comprises: the intelligent logistics robot reaches a target position, the intelligent logistics robot touches an obstacle in the process of executing a target moving action, the intelligent logistics robot does not touch the obstacle and is positioned in a safe area, the intelligent logistics robot does not touch the obstacle but is positioned in a dangerous area, and the intelligent logistics robot does not touch the obstacle and is positioned in a designated area; the safety area and the target position are smaller than a first preset distance threshold value, and the obstacle position is larger than a second preset distance threshold value; the dangerous area and the obstacle position are not more than a third preset distance threshold value; designating the area as an area except for a safe area and a dangerous area in the target site; if the intelligent logistics robot reaches the target position, determining that the rewarding value is a designated increment value; if the intelligent logistics robot touches an obstacle in the process of executing the target moving action, determining that the rewarding value is a designated decrement value; if the intelligent logistics robot does not touch the obstacle and is located in the safety area, acquiring a first distance between the current position and the target position of the intelligent logistics robot, and acquiring a reward value based on the first distance and a first preset distance threshold; if the intelligent logistics robot does not touch the obstacle but is located in the dangerous area, acquiring a second distance between the current position of the intelligent logistics robot and the position of the obstacle, and determining a reward value based on the second distance and a third preset distance threshold; if the intelligent logistics robot does not touch the obstacle and is located in the designated area, a first distance between the current position of the intelligent logistics robot and the target position and a third distance between the current position of the intelligent logistics robot and the starting position of the intelligent logistics robot in the target site are obtained, and a reward value is determined based on the third distance and the first distance.
The embodiment also provides an electronic device, which comprises a processor and a memory, wherein the memory stores machine executable instructions capable of being executed by the processor, and the processor executes the machine executable instructions to realize the path planning method in the vision-based logistics system. The electronic device may be a server or a terminal device.
Referring to fig. 6, the electronic device includes a processor 100 and a memory 101, the memory 101 storing machine executable instructions executable by the processor 100, the processor 100 executing the machine executable instructions to implement the above-described method of path planning in a vision-based logistics system.
Further, the electronic device shown in fig. 6 also includes a bus 102 and a communication interface 103, and the processor 100, the communication interface 103, and the memory 101 are connected through the bus 102. The memory 101 may include a high-speed random access memory (RAM, Random Access Memory), and may further include a non-volatile memory, such as at least one disk memory. The communication connection between the system network element and at least one other network element is implemented via at least one communication interface 103 (which may be wired or wireless), and may use the internet, a wide area network, a local area network, a metropolitan area network, and the like. The bus 102 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bi-directional arrow is shown in fig. 6, but this does not mean that there is only one bus or only one type of bus. The processor 100 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 100 or by instructions in the form of software. The processor 100 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps, and logic blocks disclosed in the embodiments of the present disclosure may be implemented or performed accordingly. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present disclosure may be embodied as being performed directly by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, a register, or another storage medium well known in the art. The storage medium is located in the memory 101, and the processor 100 reads the information in the memory 101 and, in combination with its hardware, performs the steps of the method of the foregoing embodiments.
The processor in the electronic device may implement the following operations of the path planning method in the vision-based logistics system by executing machine executable instructions: acquiring a target position and first state data of the intelligent logistics robot, wherein the first state data indicates a first environment state of the intelligent logistics robot at the current moment; inputting the first state data into an action positioning model to obtain an alternative action set corresponding to the first state data; the alternative action set comprises at least one alternative action parameter; the alternative motion parameters are used for representing alternative movement motions of the intelligent logistics robot; the alternative action parameters comprise alternative linear velocity parameters and alternative angular velocity parameters; the action positioning model comprises a first long-time and short-time memory network layer; inputting each alternative action parameter and the first state data into an evaluation model aiming at each alternative action parameter in the alternative action set, and outputting initial evaluation parameters corresponding to the alternative action parameters; obtaining an evaluation parameter corresponding to the alternative action parameter based on the initial evaluation parameter corresponding to the alternative action parameter and the historical evaluation parameter corresponding to the alternative action parameter; the evaluation parameters are used for determining the dominance degree of the alternative movement action corresponding to the alternative action parameters executed by the intelligent logistics robot in the first environment state; the history evaluation parameters are determined according to the execution results of the alternative movement actions corresponding to the execution alternative action parameters of the intelligent logistics robot in the first environment state before the current moment; the evaluation model comprises a second long-short time memory network layer; and sending the evaluation parameters to an action positioning model, and determining target action parameters from the alternative action set based on the evaluation parameters by the action positioning model, and controlling the intelligent logistics robot to move to the target position according to target movement actions corresponding to the target action parameters.
According to the method, the target position of the intelligent logistics robot and first state data indicating a first environment state of the intelligent logistics robot are obtained, the first state data are input into an action positioning model comprising a first long-short-time memory network layer to obtain alternative action sets, then, for each alternative action parameter in the alternative action sets, each alternative action parameter and the first state data are input into an evaluation model comprising a second long-short-time memory network layer, initial evaluation parameters corresponding to the alternative action parameters are output, evaluation parameters corresponding to the alternative action parameters are obtained according to the initial evaluation parameters and the historical evaluation parameters corresponding to the alternative action parameters, finally, the target action parameters are determined according to the evaluation parameters, the intelligent logistics robot is controlled to move to the target position according to target moving actions corresponding to the target action parameters.
The first state data is determined based on a plurality of historical action parameters, a detection result parameter of the visual detection module, a first distance parameter, a first angle parameter, a first yaw angle and a second angle parameter; the historical action parameters comprise historical linear speed parameters and historical angular speed parameters of the intelligent logistics robot; the detection result parameters are determined according to the detection results of the visual detection module in each angle dimension; the first distance parameter indicates the distance between the current position of the intelligent logistics robot and the target position; the first angle parameter is used for indicating an included angle between a designated position on the intelligent logistics robot and a target position; the first yaw angle indicates a yaw angle of the intelligent logistics robot; the second angle parameter is determined by the first angle parameter and the first yaw angle.
The action positioning model consists of a first long-time and short-time memory network layer, a first full-connection layer, a second full-connection layer and a plurality of batch normalization layers; the first full-connection layer and the second full-connection layer are respectively arranged between the two batch normalization layers; the activation functions of the first full connection layer and the second full connection layer are piecewise linear activation functions; the first full-connection layer and the second full-connection layer have different node numbers; the plurality of batch normalization layers comprise a first batch normalization layer, a second batch normalization layer and a third batch normalization layer; the processor in the electronic device may implement the following operations of the path planning method in the vision-based logistics system by executing machine executable instructions: inputting the first state data into a first long-short-time memory network layer and outputting first output data; inputting the first output data into a first batch of normalization layers for normalization processing to obtain first normalization data; inputting the first standardized data into a first full-connection layer, performing correlation analysis by using channel characteristics of different channels in the first full-connection layer, and automatically performing weighted summation on the first standardized data to obtain first summation data; inputting the first summation data into a second batch of normalization layers for normalization processing to obtain second normalization data; inputting the second standardized data into a second full-connection layer, performing correlation analysis by using channel characteristics of different channels in the second full-connection layer, and automatically performing weighted summation on the second standardized data to obtain second summation data; and inputting the second summation data into a third batch of normalization layers for normalization processing to obtain at least one alternative action parameter, and determining an alternative action set corresponding to the first state data based on the alternative action parameter.
The evaluation model consists of a second long-short-time memory network layer, a third full-connection layer and an advantage evaluation module; the processor in the electronic device may implement the following operations of the path planning method in the vision-based logistics system by executing machine executable instructions: inputting the first state data into a second long-short-time memory network layer, and outputting second output data; inputting the alternative action parameters into a third full-connection layer for each alternative action parameter in the alternative action set, and outputting third output data; the advantage evaluation module inputs the second output data and the third output data, outputs a state evaluation value and a motion evaluation value, and the state evaluation value is used for indicating the advantage degree of the first environment state of the intelligent logistics robot; the action evaluation value is used for indicating the dominance degree of the second environment state of the intelligent logistics robot after executing the alternative movement action corresponding to the alternative action parameter; based on the state evaluation value and the action evaluation value, an initial evaluation parameter corresponding to each candidate action parameter in the candidate action set is determined.
The processor in the electronic device may implement the following operations of the path planning method in the vision-based logistics system by executing machine executable instructions: transmitting the evaluation parameters corresponding to each candidate action parameter to an action positioning model, and determining the maximum evaluation parameter from the evaluation parameters by the action positioning model; determining an alternative motion parameter corresponding to the maximum evaluation parameter from the alternative motion set, and determining the alternative motion parameter corresponding to the maximum evaluation parameter as a target motion parameter; and controlling the intelligent logistics robot to move to the target position according to the target movement action corresponding to the target action parameter.
The processor in the electronic device may implement the following operations of the path planning method in the vision-based logistics system by executing machine executable instructions: determining a reward value based on an execution result of the intelligent logistics robot executing the target movement action; the magnitude of the rewarding value indicates the advantages and disadvantages of the execution result of the intelligent logistics robot; and updating the historical evaluation parameters corresponding to the target action parameters based on the reward values.
The processor in the electronic device may implement the following operations of the path planning method in the vision-based logistics system by executing machine executable instructions: the execution result comprises: the intelligent logistics robot reaches a target position, the intelligent logistics robot touches an obstacle in the process of executing a target moving action, the intelligent logistics robot does not touch the obstacle and is positioned in a safe area, the intelligent logistics robot does not touch the obstacle but is positioned in a dangerous area, and the intelligent logistics robot does not touch the obstacle and is positioned in a designated area; the safety area and the target position are smaller than a first preset distance threshold value, and the obstacle position is larger than a second preset distance threshold value; the dangerous area and the obstacle position are not more than a third preset distance threshold value; designating the area as an area except for a safe area and a dangerous area in the target site; if the intelligent logistics robot reaches the target position, determining that the rewarding value is a designated increment value; if the intelligent logistics robot touches an obstacle in the process of executing the target moving action, determining that the rewarding value is a designated decrement value; if the intelligent logistics robot does not touch the obstacle and is located in the safety area, acquiring a first distance between the current position and the target position of the intelligent logistics robot, and acquiring a reward value based on the first distance and a first preset distance threshold; if the intelligent logistics robot does not touch the obstacle but is located in the dangerous area, acquiring a second distance between the current position of the intelligent logistics robot and the position of the obstacle, and determining a reward value based on the second distance and a third preset distance threshold; if the intelligent logistics robot does not touch the obstacle and is located in the designated area, a first distance between the current position of the intelligent logistics robot and the target position and a third distance between the current position of the intelligent logistics robot and the starting position of the intelligent logistics robot in the target site are obtained, and a reward value is determined based on the third distance and the first distance.
The present embodiment also provides a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the above-described method of path planning in a vision-based logistics system.
The machine-executable instructions stored on the machine-readable storage medium may implement, by executing the machine-executable instructions, the following operations in the path planning method in the vision-based logistics system: acquiring a target position and first state data of the intelligent logistics robot, wherein the first state data indicates a first environment state of the intelligent logistics robot at the current moment; inputting the first state data into an action positioning model to obtain an alternative action set corresponding to the first state data; the alternative action set comprises at least one alternative action parameter; the alternative motion parameters are used for representing alternative movement motions of the intelligent logistics robot; the alternative action parameters comprise alternative linear velocity parameters and alternative angular velocity parameters; the action positioning model comprises a first long-time and short-time memory network layer; inputting each alternative action parameter and the first state data into an evaluation model aiming at each alternative action parameter in the alternative action set, and outputting initial evaluation parameters corresponding to the alternative action parameters; obtaining an evaluation parameter corresponding to the alternative action parameter based on the initial evaluation parameter corresponding to the alternative action parameter and the historical evaluation parameter corresponding to the alternative action parameter; the evaluation parameters are used for determining the dominance degree of the alternative movement action corresponding to the alternative action parameters executed by the intelligent logistics robot in the first environment state; the history evaluation parameters are determined according to the execution results of the alternative movement actions corresponding to the execution alternative action parameters of the intelligent logistics robot in the first environment state before the current moment; the evaluation model comprises a second long-short time memory network layer; and sending the evaluation parameters to an action positioning model, and determining target action parameters from the alternative action set based on the evaluation parameters by the action positioning model, and controlling the intelligent logistics robot to move to the target position according to target movement actions corresponding to the target action parameters.
According to the method, the target position of the intelligent logistics robot and first state data indicating a first environment state of the intelligent logistics robot are obtained, the first state data are input into an action positioning model comprising a first long-short-time memory network layer to obtain alternative action sets, then, for each alternative action parameter in the alternative action sets, each alternative action parameter and the first state data are input into an evaluation model comprising a second long-short-time memory network layer, initial evaluation parameters corresponding to the alternative action parameters are output, evaluation parameters corresponding to the alternative action parameters are obtained according to the initial evaluation parameters and the historical evaluation parameters corresponding to the alternative action parameters, finally, the target action parameters are determined according to the evaluation parameters, the intelligent logistics robot is controlled to move to the target position according to target moving actions corresponding to the target action parameters.
The first state data is determined based on a plurality of historical action parameters, a detection result parameter of the visual detection module, a first distance parameter, a first angle parameter, a first yaw angle and a second angle parameter; the historical action parameters comprise historical linear speed parameters and historical angular speed parameters of the intelligent logistics robot; the detection result parameters are determined according to the detection results of the visual detection module in each angle dimension; the first distance parameter indicates the distance between the current position of the intelligent logistics robot and the target position; the first angle parameter is used for indicating an included angle between a designated position on the intelligent logistics robot and a target position; the first yaw angle indicates a yaw angle of the intelligent logistics robot; the second angle parameter is determined by the first angle parameter and the first yaw angle.
The action positioning model consists of a first long-time and short-time memory network layer, a first full-connection layer, a second full-connection layer and a plurality of batch normalization layers; the first full-connection layer and the second full-connection layer are respectively arranged between the two batch normalization layers; the activation functions of the first full connection layer and the second full connection layer are piecewise linear activation functions; the first full-connection layer and the second full-connection layer have different node numbers; the plurality of batch normalization layers comprise a first batch normalization layer, a second batch normalization layer and a third batch normalization layer; the machine-executable instructions stored on the machine-readable storage medium may implement, by executing the machine-executable instructions, the following operations in the path planning method in the vision-based logistics system: inputting the first state data into the first long-short-time memory network layer and outputting first output data; inputting the first output data into the first batch normalization layer for normalization processing to obtain first normalization data; inputting the first standardized data into the first full-connection layer, performing correlation analysis by using channel characteristics of different channels in the first full-connection layer, and automatically performing weighted summation on the first standardized data to obtain first summation data; inputting the first summation data into the second batch normalization layer for normalization processing to obtain second normalization data; inputting the second standardized data into the second full-connection layer, performing correlation analysis by using channel characteristics of different channels in the second full-connection layer, and automatically performing weighted summation on the second standardized data to obtain second summation data; and inputting the second summation data into the third batch normalization layer for normalization processing to obtain at least one alternative action parameter, and determining an alternative action set corresponding to the first state data based on the alternative action parameter.
The evaluation model consists of a second long-short-time memory network layer, a third full-connection layer and an advantage evaluation module; the machine-executable instructions stored on the machine-readable storage medium may implement, by executing the machine-executable instructions, the following operations in the path planning method in the vision-based logistics system: inputting the first state data into a second long-short-time memory network layer, and outputting second output data; inputting the alternative action parameters into a third full-connection layer for each alternative action parameter in the alternative action set, and outputting third output data; the advantage evaluation module inputs the second output data and the third output data, outputs a state evaluation value and a motion evaluation value, and the state evaluation value is used for indicating the advantage degree of the first environment state of the intelligent logistics robot; the action evaluation value is used for indicating the dominance degree of the second environment state of the intelligent logistics robot after executing the alternative movement action corresponding to the alternative action parameter; based on the state evaluation value and the action evaluation value, an initial evaluation parameter corresponding to each candidate action parameter in the candidate action set is determined.
The machine-executable instructions stored on the machine-readable storage medium may implement, by executing the machine-executable instructions, the following operations in the path planning method in the vision-based logistics system: transmitting the evaluation parameters corresponding to each candidate action parameter to an action positioning model, and determining the maximum evaluation parameter from the evaluation parameters by the action positioning model; determining an alternative motion parameter corresponding to the maximum evaluation parameter from the alternative motion set, and determining the alternative motion parameter corresponding to the maximum evaluation parameter as a target motion parameter; and controlling the intelligent logistics robot to move to the target position according to the target movement action corresponding to the target action parameter.
The machine-executable instructions stored on the machine-readable storage medium may implement, by executing the machine-executable instructions, the following operations in the path planning method in the vision-based logistics system: determining a reward value based on an execution result of the intelligent logistics robot executing the target movement action; the magnitude of the rewarding value indicates the advantages and disadvantages of the execution result of the intelligent logistics robot; and updating the historical evaluation parameters corresponding to the target action parameters based on the reward values.
The machine-executable instructions stored on the machine-readable storage medium may implement, by executing the machine-executable instructions, the following operations in the path planning method in the vision-based logistics system: the execution result comprises: the intelligent logistics robot reaches a target position, the intelligent logistics robot touches an obstacle in the process of executing a target moving action, the intelligent logistics robot does not touch the obstacle and is positioned in a safe area, the intelligent logistics robot does not touch the obstacle but is positioned in a dangerous area, and the intelligent logistics robot does not touch the obstacle and is positioned in a designated area; the safety area and the target position are smaller than a first preset distance threshold value, and the obstacle position is larger than a second preset distance threshold value; the dangerous area and the obstacle position are not more than a third preset distance threshold value; designating the area as an area except for a safe area and a dangerous area in the target site; if the intelligent logistics robot reaches the target position, determining that the rewarding value is a designated increment value; if the intelligent logistics robot touches an obstacle in the process of executing the target moving action, determining that the rewarding value is a designated decrement value; if the intelligent logistics robot does not touch the obstacle and is located in the safety area, acquiring a first distance between the current position and the target position of the intelligent logistics robot, and acquiring a reward value based on the first distance and a first preset distance threshold; if the intelligent logistics robot does not touch the obstacle but is located in the dangerous area, acquiring a second distance between the current position of the intelligent logistics robot and the position of the obstacle, and determining a reward value based on the second distance and a third preset distance threshold; if the intelligent logistics robot does not touch the obstacle and is located in the designated area, a first distance between the current position of the intelligent logistics robot and the target position and a third distance between the current position of the intelligent logistics robot and the starting position of the intelligent logistics robot in the target site are obtained, and a reward value is determined based on the third distance and the first distance.
The method, the device and the computer program product of the electronic device for path planning in the vision-based logistics system provided by the embodiment of the invention comprise a computer readable storage medium storing program codes, wherein the instructions included in the program codes can be used for executing the method described in the method embodiment, and specific implementation can be seen in the method embodiment and will not be repeated here.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
In addition, in the description of embodiments of the present invention, unless explicitly stated and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood by those skilled in the art in specific cases.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially, or in the part contributing to the prior art, or in part, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disk, or other media capable of storing program codes.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the above examples are only specific embodiments of the present invention for illustrating the technical solution of the present invention, but not for limiting the scope of the present invention, and although the present invention has been described in detail with reference to the foregoing examples, it will be understood by those skilled in the art that the present invention is not limited thereto: any person skilled in the art may modify or easily conceive of the technical solution described in the foregoing embodiments, or perform equivalent substitution of some of the technical features, while remaining within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (8)

1. A route planning method in a logistics system based on vision is characterized in that the logistics system comprises an intelligent logistics robot; the intelligent logistics robot is located in a target site, and the target site is provided with an obstacle; the intelligent logistics robot interacts with the surrounding environment through the vision detection module; the method comprises the following steps:
Acquiring a target position and first state data of the intelligent logistics robot, wherein the first state data indicates a first environment state of the intelligent logistics robot at the current moment;
Inputting the first state data into an action positioning model to obtain an alternative action set corresponding to the first state data; the alternative action set comprises at least one alternative action parameter; the alternative action parameters are used for representing alternative movement actions of the intelligent logistics robot; the alternative action parameters comprise alternative linear velocity parameters and alternative angular velocity parameters; the action positioning model comprises a first long-time and short-time memory network layer; the action positioning model consists of a first long-time and short-time memory network layer, a first full-connection layer, a second full-connection layer and a plurality of batch normalization layers; the first full-connection layer and the second full-connection layer are respectively arranged between the two batch normalization layers; the activation functions of the first full-connection layer and the second full-connection layer are piecewise linear activation functions; the first full-connection layer and the second full-connection layer have different node numbers; the plurality of batch normalization layers comprise a first batch normalization layer, a second batch normalization layer and a third batch normalization layer;
Inputting each alternative action parameter and the first state data into an evaluation model aiming at each alternative action parameter in the alternative action set, and outputting initial evaluation parameters corresponding to the alternative action parameters; obtaining an evaluation parameter corresponding to the alternative action parameter based on the initial evaluation parameter corresponding to the alternative action parameter and the historical evaluation parameter corresponding to the alternative action parameter; the evaluation parameters are used for determining the dominance degree of the alternative movement corresponding to the alternative movement parameters executed by the intelligent logistics robot in the first environment state; the history evaluation parameters are determined according to the execution result of executing the alternative movement action corresponding to the alternative action parameters by the intelligent logistics robot in the first environment state before the current moment; the evaluation model comprises a second long-short-time memory network layer; the evaluation model consists of a second long-short-time memory network layer, a third full-connection layer and an advantage evaluation module;
The evaluation parameters are sent to the action positioning model, and based on the evaluation parameters, the action positioning model determines target action parameters from the alternative action set and controls the intelligent logistics robot to move to the target position according to target movement actions corresponding to the target action parameters;
the step of inputting the first state data to an action positioning model to obtain an alternative action set corresponding to the first state data includes:
Inputting the first state data to the first long-time and short-time memory network layer, and outputting first output data;
inputting the first output data into the first batch normalization layer for normalization processing to obtain first normalization data;
Inputting the first standardized data into the first full-connection layer, performing correlation analysis by using channel characteristics of different channels in the first full-connection layer, and automatically performing weighted summation on the first standardized data to obtain first summation data;
inputting the first summation data into the second batch normalization layer for normalization processing to obtain second normalization data;
Inputting the second standardized data into the second full-connection layer, performing correlation analysis by using channel characteristics of different channels in the second full-connection layer, and automatically performing weighted summation on the second standardized data to obtain second summation data;
Inputting the second summation data into the third batch normalization layer for normalization processing to obtain at least one alternative action parameter, and determining an alternative action set corresponding to the first state data based on the alternative action parameter;
Inputting each candidate action parameter and the first state data into an evaluation model for each candidate action parameter in the candidate action set, and outputting an initial evaluation parameter corresponding to the candidate action parameter, wherein the step comprises the steps of:
inputting the first state data into the second long-short time memory network layer and outputting second output data;
Inputting the alternative action parameters into the third full connection layer for each alternative action parameter in the alternative action set, and outputting third output data;
Inputting the second output data and the third output data into the advantage evaluation module, and outputting a state evaluation value and an action evaluation value, wherein the state evaluation value is used for indicating the degree of advantage of the first environment state of the intelligent logistics robot, and the action evaluation value is used for indicating the degree of advantage of the second environment state of the intelligent logistics robot after executing the alternative movement action corresponding to the alternative action parameter;
And determining initial evaluation parameters corresponding to each alternative action parameter in the alternative action set based on the state evaluation value and the action evaluation value.
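To make the two networks described in the steps above concrete, the following is a minimal PyTorch sketch. It is illustrative only: the hidden sizes, the ReLU activations, the bounded output mapping, and all class and variable names are assumptions; the claim fixes only the layer types and their ordering (LSTM, batch normalization and fully connected layers for the action positioning model; an LSTM state branch, a fully connected action branch and an advantage evaluation module for the evaluation model).

```python
import torch
import torch.nn as nn

class ActionPositioningModel(nn.Module):
    """Actor sketch: LSTM -> BN -> FC -> BN -> FC -> BN, following the claimed step order."""
    def __init__(self, state_dim=28, hidden_dim=128, action_dim=2):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)  # first LSTM layer
        self.bn1 = nn.BatchNorm1d(hidden_dim)                         # first batch normalization layer
        self.fc1 = nn.Linear(hidden_dim, hidden_dim)                  # first fully connected layer
        self.bn2 = nn.BatchNorm1d(hidden_dim)                         # second batch normalization layer
        self.fc2 = nn.Linear(hidden_dim, action_dim)                  # second fully connected layer
        self.bn3 = nn.BatchNorm1d(action_dim)                         # third batch normalization layer

    def forward(self, state_seq):
        # state_seq: (batch, seq_len, state_dim) -- a short history of state vectors
        lstm_out, _ = self.lstm(state_seq)
        h = lstm_out[:, -1, :]                         # summary of the sequence (last time step)
        h = torch.relu(self.fc1(self.bn1(h)))          # activation is an assumption, not stated in the claim
        a = self.bn3(self.fc2(self.bn2(h)))
        v = torch.sigmoid(a[:, 0:1])                   # alternative linear velocity, scaled to [0, 1]
        w = torch.tanh(a[:, 1:2])                      # alternative angular velocity, scaled to [-1, 1]
        return torch.cat([v, w], dim=-1)


class EvaluationModel(nn.Module):
    """Critic sketch: state branch (LSTM), action branch (FC), advantage evaluation module."""
    def __init__(self, state_dim=28, action_dim=2, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)  # second LSTM layer
        self.action_fc = nn.Linear(action_dim, hidden_dim)            # third fully connected layer
        self.value_head = nn.Linear(hidden_dim, 1)                    # state evaluation value
        self.adv_head = nn.Linear(2 * hidden_dim, 1)                  # action evaluation value

    def forward(self, state_seq, action):
        lstm_out, _ = self.lstm(state_seq)
        s = lstm_out[:, -1, :]                          # second output data
        a = torch.relu(self.action_fc(action))          # third output data
        v = self.value_head(s)                          # how advantageous the current state is
        adv = self.adv_head(torch.cat([s, a], dim=-1))  # how advantageous the post-action state is
        return v + adv                                  # initial evaluation parameter (the sum is an assumption)
```

A multi-element alternative action set could then be formed, for example, by perturbing the actor output with exploration noise; the claim itself only requires that at least one alternative action parameter is produced.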
2. The method for path planning in a vision-based logistics system of claim 1, wherein the first state data is determined based on a plurality of historical action parameters, a detection result parameter of the vision detection module, a first distance parameter, a first angle parameter, a first yaw angle, and a second angle parameter;
The historical action parameters comprise historical linear velocity parameters and historical angular velocity parameters of the intelligent logistics robot; the detection result parameters are determined according to the detection results of the vision detection module in each angle dimension; the first distance parameter indicates the distance between the current position of the intelligent logistics robot and the target position; the first angle parameter is used for indicating the included angle between a designated position on the intelligent logistics robot and the target position; the first yaw angle indicates the yaw angle of the intelligent logistics robot; and the second angle parameter is determined by the first angle parameter and the first yaw angle.
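As an illustration of how the quantities in claim 2 might be packed into one state vector, consider the helper below; the names, ordering, dimensionality and the way the second angle parameter is derived are assumptions for illustration only.

```python
import numpy as np

def build_state_vector(hist_v, hist_w, detections, d_goal, angle_to_goal, yaw):
    """Assemble the first state data from the quantities listed in claim 2 (illustrative)."""
    # hist_v, hist_w: historical linear / angular velocity parameters
    # detections:     per-angle detection results from the vision detection module
    # d_goal:         first distance parameter (current position to target position)
    # angle_to_goal:  first angle parameter (designated point on the robot to the target)
    # yaw:            first yaw angle
    second_angle = angle_to_goal - yaw  # second angle parameter; the difference is an assumed derivation
    return np.concatenate([
        np.asarray(hist_v, dtype=np.float32),
        np.asarray(hist_w, dtype=np.float32),
        np.asarray(detections, dtype=np.float32),
        np.array([d_goal, angle_to_goal, yaw, second_angle], dtype=np.float32),
    ])
```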
3. The method for path planning in a vision-based logistics system of claim 1, wherein the step of sending the evaluation parameters to the action positioning model, determining, by the action positioning model, the target action parameters from the alternative action set based on the evaluation parameters, and controlling the intelligent logistics robot to move to the target position according to the target movement action corresponding to the target action parameters comprises:
Transmitting the evaluation parameter corresponding to each alternative action parameter to the action positioning model, wherein the action positioning model determines the maximum evaluation parameter from the evaluation parameters; determining, from the alternative action set, the alternative action parameter corresponding to the maximum evaluation parameter, and determining that alternative action parameter as the target action parameter;
And controlling the intelligent logistics robot to move towards the target position according to the target movement action corresponding to the target action parameter.
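The selection step in claim 3 amounts to an argmax over the evaluation parameters; a trivial sketch follows, with all names purely illustrative.

```python
def select_target_action(alternative_actions, evaluation_params):
    """Return the alternative action parameter with the largest evaluation parameter."""
    best_index = max(range(len(alternative_actions)), key=lambda i: evaluation_params[i])
    return alternative_actions[best_index]  # e.g. a (linear velocity, angular velocity) pair
```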
4. The method for path planning in a vision-based logistics system of claim 1, wherein after the step of controlling the intelligent logistics robot to move to the target position according to the target movement action corresponding to the target action parameter, the method further comprises:
determining a reward value based on the execution result of the intelligent logistics robot executing the target movement action, wherein the magnitude of the reward value indicates how favorable the execution result of the intelligent logistics robot is;
And updating the historical evaluation parameters corresponding to the target action parameters based on the reward values.
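Claim 4 does not fix how the reward value updates the historical evaluation parameter; one plausible, purely illustrative choice is an exponential moving average.

```python
def update_history_eval(history_eval, reward, alpha=0.1):
    """Blend a new reward into the stored historical evaluation parameter.
    The moving-average form and the value of alpha are assumptions, not part of the claim."""
    return (1.0 - alpha) * history_eval + alpha * reward
```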
5. The method for path planning in a vision-based logistics system of claim 4, wherein the step of determining a reward value based on the execution result of the intelligent logistics robot executing the target movement action comprises:
The execution result comprises: the intelligent logistics robot reaches the target position; the intelligent logistics robot touches an obstacle in the process of executing the target movement action; the intelligent logistics robot does not touch an obstacle and is located in a safety area; the intelligent logistics robot does not touch an obstacle but is located in a dangerous area; and the intelligent logistics robot does not touch an obstacle and is located in a designated area;
The distance between the safety area and the target position is smaller than a first preset distance threshold, and the distance between the safety area and the obstacle position is larger than a second preset distance threshold; the distance between the dangerous area and the obstacle position is not greater than a third preset distance threshold; the designated area is the area of the target site other than the safety area and the dangerous area;
if the intelligent logistics robot reaches the target position, determining that the reward value is a designated increment value;
if the intelligent logistics robot touches an obstacle in the process of executing the target movement action, determining that the reward value is a designated decrement value;
if the intelligent logistics robot does not touch an obstacle and is located in the safety area, acquiring a first distance between the current position of the intelligent logistics robot and the target position, and determining a reward value based on the first distance and the first preset distance threshold;
if the intelligent logistics robot does not touch an obstacle but is located in the dangerous area, acquiring a second distance between the current position of the intelligent logistics robot and the position of the obstacle, and determining a reward value based on the second distance and the third preset distance threshold;
if the intelligent logistics robot does not touch an obstacle and is located in the designated area, acquiring a first distance between the current position of the intelligent logistics robot and the target position and a third distance between the current position of the intelligent logistics robot and the starting position of the intelligent logistics robot in the target site, and determining a reward value based on the third distance and the first distance.
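The five cases of claim 5 can be folded into a single reward function. The sketch below is illustrative only: the threshold values, the bonus and penalty magnitudes, and the exact shaping formulas are assumptions; the claim fixes only which quantities each branch depends on.

```python
def compute_reward(result, d_goal=None, d_obstacle=None, d_start=None,
                   d1_thresh=1.0, d3_thresh=0.5,
                   arrival_bonus=100.0, collision_penalty=-100.0):
    """Map an execution result to a reward value (illustrative shaping)."""
    if result == "reached_goal":
        return arrival_bonus                          # designated increment value
    if result == "collision":
        return collision_penalty                      # designated decrement value
    if result == "safe_area":
        # reward based on the first distance and the first preset distance threshold
        return (d1_thresh - d_goal) / d1_thresh
    if result == "danger_area":
        # penalty based on the second distance and the third preset distance threshold
        return -(d3_thresh - d_obstacle) / d3_thresh
    if result == "designated_area":
        # reward based on the third distance (from the start) and the first distance (to the goal)
        return d_start - d_goal
    raise ValueError(f"unknown execution result: {result}")
```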
6. A path planning apparatus in a vision-based logistics system, configured to implement the method for path planning in a vision-based logistics system of any one of claims 1-5, wherein the logistics system comprises an intelligent logistics robot; the intelligent logistics robot is located in a target site, and the target site is provided with an obstacle; the intelligent logistics robot interacts with the surrounding environment through the vision detection module; the apparatus comprises:
The first acquisition module is used for acquiring a target position and first state data of the intelligent logistics robot, wherein the first state data indicates a first environment state of the intelligent logistics robot at the current moment;
The first input module is used for inputting the first state data into an action positioning model to obtain an alternative action set corresponding to the first state data; the alternative action set comprises at least one alternative action parameter; the alternative action parameters are used for representing alternative movement actions of the intelligent logistics robot; the alternative action parameters comprise alternative linear velocity parameters and alternative angular velocity parameters; and the action positioning model comprises a first long short-term memory network layer;
The second input module is used for inputting, for each alternative action parameter in the alternative action set, the alternative action parameter and the first state data into an evaluation model, outputting an initial evaluation parameter corresponding to the alternative action parameter, and obtaining an evaluation parameter corresponding to the alternative action parameter based on the initial evaluation parameter corresponding to the alternative action parameter and a historical evaluation parameter corresponding to the alternative action parameter; the evaluation parameter is used for determining the degree of advantage of the intelligent logistics robot executing, in the first environment state, the alternative movement action corresponding to the alternative action parameter; the historical evaluation parameter is determined according to execution results of the intelligent logistics robot executing, before the current moment and in the first environment state, the alternative movement action corresponding to the alternative action parameter; and the evaluation model comprises a second long short-term memory network layer;
The first movement module is used for sending the evaluation parameters to the action positioning model, determining, by the action positioning model, target action parameters from the alternative action set based on the evaluation parameters, and controlling the intelligent logistics robot to move towards the target position according to the target movement action corresponding to the target action parameters.
7. An electronic device comprising a processor and a memory, the memory storing machine-executable instructions executable by the processor, the processor executing the machine-executable instructions to implement the method of path planning in a vision-based logistics system of any one of claims 1-5.
8. A machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the method of path planning in a vision-based logistics system of any one of claims 1-5.
CN202410009844.6A 2024-01-04 2024-01-04 Route planning method and device in logistics system based on vision and electronic equipment Active CN117539266B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410009844.6A CN117539266B (en) 2024-01-04 2024-01-04 Route planning method and device in logistics system based on vision and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410009844.6A CN117539266B (en) 2024-01-04 2024-01-04 Route planning method and device in logistics system based on vision and electronic equipment

Publications (2)

Publication Number Publication Date
CN117539266A CN117539266A (en) 2024-02-09
CN117539266B true CN117539266B (en) 2024-04-19

Family

ID=89782580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410009844.6A Active CN117539266B (en) 2024-01-04 2024-01-04 Route planning method and device in logistics system based on vision and electronic equipment

Country Status (1)

Country Link
CN (1) CN117539266B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111415048A (en) * 2020-04-10 2020-07-14 大连海事大学 Vehicle path planning method based on reinforcement learning
CN112629542A (en) * 2020-12-31 2021-04-09 山东师范大学 Map-free robot path navigation method and system based on DDPG and LSTM
CN113467487A (en) * 2021-09-06 2021-10-01 中国科学院自动化研究所 Path planning model training method, path planning device and electronic equipment
CN114815813A (en) * 2022-03-29 2022-07-29 山东交通学院 Efficient path planning method, device and medium based on improved DDPG algorithm
CN116300906A (en) * 2023-02-27 2023-06-23 集美大学 Obstacle avoidance path planning method and system for intelligent ship
CN116343169A (en) * 2023-03-23 2023-06-27 北京京东乾石科技有限公司 Path planning method, target object motion control device and electronic equipment
CN116540731A (en) * 2023-06-02 2023-08-04 东莞理工学院 Path planning method and system integrating LSTM and SAC algorithms

Also Published As

Publication number Publication date
CN117539266A (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN109782763B (en) Mobile robot path planning method in dynamic environment
Yoon et al. The multilayer perceptron approach to lateral motion prediction of surrounding vehicles for autonomous vehicles
CN114323054B (en) Method and device for determining running track of automatic driving vehicle and electronic equipment
Bansal et al. A hamilton-jacobi reachability-based framework for predicting and analyzing human motion for safe planning
Liu et al. Episodic memory-based robotic planning under uncertainty
CN111736592A (en) Route determination device, robot, and route determination method
CN111123934A (en) Trajectory evaluation method, trajectory evaluation device, and mobile robot
Mohamed et al. Autonomous navigation of AGVs in unknown cluttered environments: log-MPPI control strategy
CN114485673B (en) Service robot crowd sensing navigation method and system based on deep reinforcement learning
CN117539266B (en) Route planning method and device in logistics system based on vision and electronic equipment
CN113433937A (en) Heuristic exploration-based layered navigation obstacle avoidance system and layered navigation obstacle avoidance method
Doellinger et al. Environment-aware multi-target tracking of pedestrians
CN116858253A (en) Lightweight predictive navigation method and system suitable for indoor environment
Hliwa et al. Multi objective path planning in static environment using region of sight
CN115903773A (en) Mobile object control device, mobile object, learning device and method, and storage medium
Quinones-Ramirez et al. Robot path planning using deep reinforcement learning
CN113910221A (en) Mechanical arm autonomous motion planning method, device, equipment and storage medium
CN114945961B (en) Lane changing prediction regression model training method, lane changing prediction method and apparatus
CN114964247A (en) Crowd sensing navigation method and system based on high-order graph convolution neural network
Hwang et al. Model Learning for Multistep Backward Prediction in Dyna-Q Learning
CN113483775A (en) Path prediction method and device, electronic equipment and computer readable storage medium
Raj et al. Dynamic Obstacle Avoidance Technique for Mobile Robot Navigation Using Deep Reinforcement Learning
Arndt et al. Safe predictive mobile robot navigation in aware environments
Gao et al. Efficient hierarchical reinforcement learning for mapless navigation with predictive neighbouring space scoring
CN114012725B (en) Robot repositioning method, system, robot and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant