CN108520327B - Loading method and device of vehicle-mounted cargo and computer readable medium

Info

Publication number: CN108520327B
Application number: CN201810357903.3A
Authority: CN
Inventors: 金忠孝; 戴昌志
Original assignee: SAIC Motor Corp Ltd; Anji Automotive Logistics Co Ltd
Current assignee: SAIC Motor Corp Ltd; Anji Automotive Logistics Co Ltd
Priority date: 2018-04-19
Filing date: 2018-04-19
Publication date: 2021-03-23
Anticipated expiration: 2038-04-19
Also published as: CN108520327A

Abstract

A loading method and device for vehicle-mounted cargo and a computer-readable medium are provided. The loading method comprises the following steps: generating a random number; determining a selection strategy for the loading action based on the random number and a preset probability parameter, wherein the selection strategy is either a random selection strategy or a neural network-based selection strategy; and selecting a corresponding loading action and placing the corresponding goods based on the selection strategy. With this scheme, loading actions can be selected based on the neural network, so that different loading actions can be evaluated in a probabilistic, statistical manner; the calculation speed is high, and the large-scale logistics loading problem can be solved.

Description

Loading method and device of vehicle-mounted cargo and computer readable medium

Technical Field

The embodiment of the invention relates to the field of solving combinatorial optimization problems, in particular to a loading method and device of vehicle-mounted cargos and a computer readable medium.

Background

For a logistics system, the loading of vehicle-mounted cargo is an important technical problem. Loading vehicle-mounted cargo is a bin packing problem, and bin packing problems arise widely in industrial production, computer science and other fields: fabric cutting in the clothing industry, container loading in the transportation industry, sheet cutting in the processing industry, layout in the printing industry, packing and arranging objects in everyday life, and low-level operations in the computer field such as multiprocessor task scheduling, resource allocation, file allocation and memory management.

In terms of computational complexity, the bin packing problem is a non-deterministic polynomial (NP) problem, for which an exact global optimal solution is difficult to obtain, so heuristic algorithms and search algorithms are generally used. A heuristic algorithm looks for a heuristic rule that produces a feasible solution, in order to find an optimal or near-optimal solution to the problem. This approach solves efficiently, but a specific heuristic rule must be found for each problem, and such a rule is generally not universal and does not carry over to other problems. For the packing problem, heuristic algorithms include the First Fit (FF) algorithm, the Best Fit (BF) algorithm, genetic algorithms, simulated annealing, particle swarm optimization, and the like. A search algorithm searches the solution space for an optimal or approximately optimal solution. This approach does not guarantee an optimal solution, but if heuristic knowledge is used appropriately, a good balance can be struck between the quality of the approximate solution and the efficiency of finding it.

Conventional loading methods for vehicle-mounted cargo mainly use heuristic algorithms to enumerate packing schemes within a limited space; they achieve a low packing rate, take a long time to compute, and are not suitable for large-scale operation. Heuristic algorithms that solve quickly, such as the particle swarm algorithm, obtain a solution fast, but the solution quality is not high.

Therefore, existing schemes cannot solve the large-scale logistics loading problem.

Disclosure of Invention

The technical problem solved by the embodiment of the invention is how to solve the problem of large-scale logistics loading.

In order to solve the technical problem, an embodiment of the present invention provides a method for loading a vehicle-mounted cargo, where the method includes: generating a random number; determining a selection strategy of the loading action based on the random number and a preset probability parameter, wherein the selection strategy of the loading action comprises any one of the following strategies: a random selection strategy and a neural network-based selection strategy; and selecting a corresponding loading action and placing corresponding goods based on the selection strategy of the loading action.

Optionally, the preset probability parameter decreases linearly as the number of execution steps increases.

Optionally, when the preset probability parameter is not greater than a preset parameter threshold, the preset probability parameter is set to be a preset fixed value.

Optionally, the determining a selection policy of the loading action based on the random number and a preset probability parameter includes: when the random number is smaller than the preset probability parameter, determining that the selection strategy of the loading action is a random selection strategy; and when the random number is not less than the preset probability parameter, determining that the selection strategy of the loading action is a selection strategy based on a neural network.

Optionally, the loading action comprises: identification information of the goods to be placed, position information of the goods to be placed, and orientation information of the goods to be placed.

Optionally, the loading action satisfies at least one of the following constraints: the goods to be placed are adjacent to at least one previously placed item of goods, and the goods to be placed do not overlap the previously placed goods in the horizontal direction.

Optionally, when the selection policy of the loading action is a selection policy based on a neural network, selecting a corresponding loading action based on the selection policy of the loading action includes: calculating BP neural network output values corresponding to different loading actions based on the BP neural network; and selecting the loading action with the maximum corresponding BP neural network output value as the corresponding loading action.

Optionally, the output value of the BP neural network is $Q(s, a; \theta_i)$, a loading rate indicator for evaluating a loading action set in a state set; wherein a is a loading action set, s is a current state set, the state set comprises: the size of the enclosed space of the vehicle compartment, the optional placement positions and the identification information of the goods to be placed, and $\theta_i$ is the network parameter of the BP neural network.

Optionally, after selecting the corresponding loading action and placing the corresponding cargo, the method further includes: executing the loading action, and storing the loading action into a training set; updating the BP neural network parameters based on the training set.

Optionally, the updating the BP neural network parameters based on the training set comprises: extracting samples in the training set; calculating, based on the samples, the output value of the target neural network as:

$$y_i = r + \gamma \max_{a'} Q(s', a'; \theta_i^{-})$$

wherein r is an instantaneous feedback value for performing the loading action, a is a loading action set, a' is a future loading action set, s is a current state set, the state set comprising: the size of the closed space of the carriage, the optional placement positions and the identification information of the goods to be placed, s' is the state set updated after the loading action, $\gamma$ is a preset second proportionality coefficient, $\theta_i^{-}$ is the network parameter of the target neural network, and $Q(s', a'; \theta_i^{-})$ is used for evaluating the loading rate indicators of different a in different s; calculating the mean square error between the output value of the target neural network and the output value of the BP neural network as:

$$L_i(\theta_i) = \mathbb{E}\left[\left(y_i - Q(s, a; \theta_i)\right)^2\right]$$

wherein $L_i(\theta_i)$ is the mean square error between the output value of the target neural network and the output value of the BP neural network;

and updating the parameters of the BP neural network based on the principle of minimum mean square error.

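Optionally, the updating the parameters of the BP neural network based on the principle of minimum mean square error further includes: calculating the partial derivative function of the mean square error with respect to each neural network parameter by adopting a stochastic gradient descent algorithm as:

$$\nabla_{\theta_i} L_i(\theta_i) = -2\,\mathbb{E}\left[\left(y_i - Q(s, a; \theta_i)\right) \nabla_{\theta_i} Q(s, a; \theta_i)\right]$$

wherein $y_i$ is the output value of the target neural network, $L_i(\theta_i)$ is the mean square error, $\nabla_{\theta_i}$ is the partial derivative with respect to $\theta_i$, and $\nabla_{\theta_i} Q(s, a; \theta_i)$ is the partial derivative function of the neural network parameter; and calculating and updating the parameters of the BP neural network based on the partial derivative function.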

Optionally, the loading method of the vehicle-mounted cargo further includes: updating the parameters of the target neural network based on the parameters of the BP neural network.

The embodiment of the invention provides a loading device for vehicle-mounted cargos, which comprises: a generating unit adapted to generate a random number; a determining unit adapted to determine a selection policy of the loading action based on the random number and a preset probability parameter, the selection policy of the loading action including any one of: a random selection strategy and a neural network-based selection strategy; and the placing unit is suitable for selecting the corresponding loading action and placing the corresponding goods based on the selection strategy of the loading action.

Optionally, the determining unit includes: a first determining subunit, adapted to determine, when the random number is smaller than the probability parameter, that the selection policy of the loading action is a random selection policy; a second determining subunit, adapted to determine that the selection policy of the loading action is a neural network-based selection policy when the random number is not less than the probability parameter.

Optionally, when the selection policy of the loading action is a neural network-based selection policy, the placing unit includes: a first calculating subunit, adapted to calculate BP neural network output values corresponding to different loading actions based on the BP neural network; a selecting subunit, adapted to select the loading action with the maximum corresponding BP neural network output value as the corresponding loading action; and a placing subunit, adapted to place the respective goods based on the corresponding loading action.

Optionally, the output value of the BP neural network is $Q(s, a; \theta_i)$, a loading rate indicator for evaluating a loading action set in a state set; wherein a is a loading action set, s is a current state set, the state set comprises: the size of the enclosed space of the carriage, the optional placement positions and the identification information of the box to be placed, and $\theta_i$ is the network parameter of the BP neural network.

Optionally, for use after the placing unit selects the corresponding loading action and places the corresponding goods, the device further includes: an execution unit, adapted to execute the loading action and store the loading action into a training set; and a training unit, adapted to update the BP neural network parameters based on the training set.

Optionally, the training unit comprises: an extraction subunit adapted to extract samples in the training set; a second calculation subunit adapted to calculate, based on the samples, the output value of the target neural network as:

$$y_i = r + \gamma \max_{a'} Q(s', a'; \theta_i^{-})$$

wherein r is an instantaneous feedback value for performing the loading action, a is a loading action set, a' is a future loading action set, s is a current state set, the state set comprising: the size of the closed space of the carriage, the optional placement positions and the identification information of the boxes to be placed, s' is the state set updated after the loading action, $\gamma$ is a preset second proportionality coefficient, $\theta_i^{-}$ is the network parameter of the target neural network, and $Q(s', a'; \theta_i^{-})$ is used for evaluating the loading rate indicators of different a in different s; a third calculation subunit adapted to calculate the mean square error between the output value of the target neural network and the output value of the BP neural network as:

$$L_i(\theta_i) = \mathbb{E}\left[\left(y_i - Q(s, a; \theta_i)\right)^2\right]$$

wherein $L_i(\theta_i)$ is the mean square error between the output value of the target neural network and the output value of the BP neural network; and an updating subunit adapted to update the parameters of the BP neural network based on the principle of minimum mean square error.

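Optionally, the update subunit includes: a calculation module adapted to calculate the partial derivative function of the mean square error with respect to each neural network parameter by adopting a stochastic gradient descent algorithm as:

$$\nabla_{\theta_i} L_i(\theta_i) = -2\,\mathbb{E}\left[\left(y_i - Q(s, a; \theta_i)\right) \nabla_{\theta_i} Q(s, a; \theta_i)\right]$$

wherein $y_i$ is the output value of the target neural network, $L_i(\theta_i)$ is the mean square error, $\nabla_{\theta_i}$ is the partial derivative with respect to $\theta_i$, and $\nabla_{\theta_i} Q(s, a; \theta_i)$ is the partial derivative function of the neural network parameter; and an updating module adapted to calculate and update the parameters of the BP neural network based on the partial derivative function.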

Optionally, the loading device for vehicle-mounted cargo further comprises: an updating unit, adapted to update the parameters of the target neural network based on the parameters of the BP neural network.

An embodiment of the present invention provides a computer-readable storage medium, which is a non-volatile storage medium or a non-transitory storage medium, and has stored thereon computer instructions, where the computer instructions, when executed, perform any of the steps of the method described above.

The embodiment of the invention provides a loading device for vehicle-mounted cargos, which comprises a memory and a processor, wherein computer instructions capable of being operated on the processor are stored on the memory, and the processor executes any one of the steps of the method when executing the computer instructions.

Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:

according to the embodiment of the invention, the selection strategy of the loading action is determined to be a random selection strategy or a selection strategy based on a neural network based on the random number and the preset probability parameter, and then the corresponding loading action is selected and the corresponding goods are placed based on the selection strategy of the loading action. Because different loading actions can be selected based on the neural network, different loading actions can be evaluated in a probability statistic manner, the calculation speed is high, and the large-scale logistics loading problem can be solved.

Furthermore, the neural network can be designed into a function-like form, so that multi-core and multi-thread parallel operation can be realized, and the calculation speed is further improved.

Furthermore, the loading action comprises identification information of the goods to be placed, position information of the goods to be placed and orientation information of the goods to be placed, so that the conditions of changeability of carriages, goods and the like can be met, the method is suitable for various scenes, and the universality of the loading method of the vehicle-mounted goods is improved.

Drawings

Fig. 1 is a flowchart of a loading method for vehicle-mounted cargo according to an embodiment of the present invention;

Fig. 2 is a flowchart of a method for selecting a loading action based on a neural network according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a loading device for vehicle-mounted cargo according to an embodiment of the present invention.

Detailed Description

Conventional loading methods for vehicle-mounted cargo mainly use heuristic algorithms to enumerate packing schemes within a limited space; they achieve a low packing rate, take a long time to compute, and are not suitable for large-scale operation. Heuristic algorithms that solve quickly, such as the particle swarm algorithm, obtain a solution fast, but the solution quality is not high. Therefore, existing schemes cannot solve the large-scale logistics loading problem.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.

Referring to fig. 1, an embodiment of the present invention provides a loading method of vehicle-mounted cargo, which may include the following steps:

Step S101, generating a random number.

In particular implementations, the selection strategy for different loading actions may be determined based on random numbers.

Step S102, determining a selection strategy of the loading action based on the random number and a preset probability parameter, wherein the selection strategy of the loading action comprises any one of the following strategies: a random selection strategy and a neural network based selection strategy.

In a specific implementation, the selection strategy of the loading action may be chosen by a roulette algorithm or a greedy algorithm, based on the random number and a preset probability parameter. For example, at the start of training the neural network model is still imperfect, so the random selection strategy can be chosen comparatively more often; as the number of execution steps increases, the neural network model is continuously optimized and becomes more accurate, so it can be chosen more often to improve the loading rate of the loading action.

In a specific implementation, the preset probability parameter may be a value in the interval [0, 1] that decreases linearly as the number of execution steps increases. For example, at step 0 the preset probability parameter is 1; at step 1 it is 0.9; ...; and finally it is 0.1.

In a specific implementation, as the number of execution steps increases, once the preset probability parameter is not greater than (that is, less than or equal to) the preset parameter threshold, it may be set to a preset fixed value to ensure the learning efficiency of the neural network. For example, once the preset probability parameter has decreased to 0.1, it may be held at 0.1.

It can be understood that the preset parameter threshold and the preset fixed value may be set to the same value or may be set to different values, which is not limited in the embodiment of the present invention.
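The linear decay with a floor can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the per-step decrement of 0.1, and the default threshold and fixed value of 0.1 are assumptions taken from the worked example in the text.

```python
def probability_parameter(step: int,
                          start: float = 1.0,
                          decrement: float = 0.1,
                          threshold: float = 0.1,
                          fixed_value: float = 0.1) -> float:
    """Return the preset probability parameter for a given execution step.

    All default values are illustrative assumptions.
    """
    # Linear decrease with the number of execution steps.
    value = start - decrement * step
    # Once the value is not greater than the threshold, hold it at the fixed value.
    if value <= threshold:
        return fixed_value
    return value
```

The threshold and the fixed value are separate arguments because the text allows them to be equal or different.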

In an embodiment of the present invention, the determining a selection policy of the loading action based on the random number and a preset probability parameter includes: when the random number is smaller than the preset probability parameter, determining that the selection strategy of the loading action is a random selection strategy; and when the random number is not less than the preset probability parameter, determining that the selection strategy of the loading action is a selection strategy based on a neural network.
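The comparison rule above amounts to the following sketch. The constant and function names are hypothetical; only the comparison itself comes from the text.

```python
import random

RANDOM_STRATEGY = "random"            # hypothetical label
NETWORK_STRATEGY = "neural_network"   # hypothetical label


def select_strategy(random_number: float, probability_parameter: float) -> str:
    """Pick the selection strategy for the next loading action."""
    # Random number smaller than the parameter: random selection strategy.
    if random_number < probability_parameter:
        return RANDOM_STRATEGY
    # Otherwise (not less than the parameter): neural network-based strategy.
    return NETWORK_STRATEGY


# Usage: draw a random number in [0, 1) and compare it with the parameter.
rng = random.Random(0)
strategy = select_strategy(rng.random(), probability_parameter=0.3)
```

With a parameter of 1 the random strategy is always chosen, and with a parameter of 0 the neural network is always chosen, which matches the intended shift from exploration to use of the trained model.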

In a specific implementation, the loading action may include: identification information of goods to be put in, position information of the goods to be put in, and orientation information of the goods to be put in. Based on the loading action, a particular cargo may be selected to be placed in its corresponding location.
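One possible in-memory representation of such a loading action is sketched below. The field names and the encoding of the orientation as a permutation of the three axes are assumptions made for illustration; the text only states which three pieces of information an action carries.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LoadingAction:
    cargo_id: str                          # identification information of the goods
    position: Tuple[float, float, float]   # position information (x, y, z)
    orientation: Tuple[int, int, int]      # orientation, assumed to be an axis permutation


def placed_dimensions(dimensions: Tuple[float, float, float],
                      action: LoadingAction) -> Tuple[float, ...]:
    """Return the extents of the goods along x, y, z after applying the orientation."""
    return tuple(dimensions[axis] for axis in action.orientation)


# Usage: a 2.0 x 1.0 x 0.5 box rotated so that its first two axes are swapped.
action = LoadingAction("box-1", (0.0, 0.0, 0.0), (1, 0, 2))
extents = placed_dimensions((2.0, 1.0, 0.5), action)
```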

It should be understood that the cargo may also be referred to by other names such as boxes, and the loading action may also be referred to by other names such as executing action and boxing action, all of which are within the scope of the present invention as long as the meanings are the same.

In a specific implementation, to improve the loading rate, the loading action may include at least one of the following constraint relationships: the goods to be put and at least one previously placed goods are close together, and the goods to be put and the previously placed goods are not overlapped in the horizontal direction.
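A hedged sketch of how the two constraints could be checked is given below. It assumes axis-aligned rectangular footprints on the compartment floor and enforces both constraints together, although the text only requires at least one of them; the `Footprint` type and all function names are hypothetical.

```python
from typing import Iterable, NamedTuple


class Footprint(NamedTuple):
    x: float
    y: float
    width: float
    depth: float


def overlaps(a: Footprint, b: Footprint) -> bool:
    """True if the two footprints share interior area in the horizontal plane."""
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.depth and b.y < a.y + a.depth)


def touches(a: Footprint, b: Footprint, tol: float = 1e-9) -> bool:
    """True if the footprints share part of an edge (are placed close together)."""
    x_gap = max(a.x, b.x) - min(a.x + a.width, b.x + b.width)
    y_gap = max(a.y, b.y) - min(a.y + a.depth, b.y + b.depth)
    shares_side_edge = abs(x_gap) <= tol and y_gap < -tol
    shares_front_edge = abs(y_gap) <= tol and x_gap < -tol
    return shares_side_edge or shares_front_edge


def is_feasible(candidate: Footprint, placed: Iterable[Footprint]) -> bool:
    """Check adjacency to at least one placed item and no horizontal overlap."""
    placed = list(placed)
    if not placed:
        return True  # the first item has nothing to be adjacent to
    no_overlap = all(not overlaps(candidate, p) for p in placed)
    adjacent = any(touches(candidate, p) for p in placed)
    return no_overlap and adjacent
```

Filtering candidate actions with a check like this before evaluating them keeps the action set small, which is one plausible reason the constraints help the loading rate.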

In particular embodiments, the left cargo may be constrained to be higher than the right cargo, the right cargo may be constrained to be higher than the left cargo, or the right cargo may be as high as the left cargo, since there is always one case in which this constraint can be implemented from a probabilistic perspective.

Because the loading action includes the identification information of the goods to be placed, the position information of the goods to be placed and the orientation information of the goods to be placed, the method can accommodate varying carriages, goods and the like, is suitable for a variety of scenarios, and improves the universality of the loading method for vehicle-mounted goods.

In a specific implementation, when the selection policy of the loading action is a selection policy based on a neural network, selecting a corresponding loading action based on the selection policy of the loading action may include: calculating BP neural network output values corresponding to different loading actions based on a Back Propagation (BP) neural network; and selecting the loading action with the maximum corresponding BP neural network output value as the corresponding loading action.
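The selection rule can be sketched as an argmax over candidate actions. The `q_function` argument stands in for the BP neural network and is an assumption; any callable that maps a (state, action) pair to an output value works for the illustration.

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")
A = TypeVar("A")


def select_greedy_action(state: S,
                         candidate_actions: Sequence[A],
                         q_function: Callable[[S, A], float]) -> A:
    """Return the candidate action with the largest network output value."""
    if not candidate_actions:
        raise ValueError("no candidate loading actions available")
    return max(candidate_actions, key=lambda action: q_function(state, action))


# Usage with a toy lookup table in place of the network (illustrative values).
toy_outputs = {"a1": 0.2, "a2": 0.7, "a3": 0.5}
best = select_greedy_action(None, ["a1", "a2", "a3"],
                            lambda state, action: toy_outputs[action])
```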

In an embodiment of the present invention, the output value of the BP neural network is used to evaluate a loading rate indicator of a loading action set in a state set, and is defined as the feedback value $Q(s, a; \theta_i)$ of the loading action, wherein a is a loading action set, s is a current state set, the state set includes: the size of the enclosed space of the vehicle compartment, the optional placement positions and the identification information of the goods to be placed, and $\theta_i$ is the network parameter of the BP neural network.

Because a traditional reinforcement learning algorithm stores the action estimation function in a table, and table storage is only suitable when the action estimates are discrete (unrelated) and is not suitable for large-scale operation, a BP neural network is adopted so that the executed action can be evaluated in a probabilistic, statistical manner, that is, the feedback value of the executed action is estimated, which makes the method suitable for large-scale operation.

The neural network can be designed into a function-like form, so that multi-core and multi-thread parallel operation can be realized, and the calculation speed is further improved.

In a specific implementation, the loading Rate (Packing Rate) may be: the higher the loading rate, the larger the output value of the BP neural network.

In an embodiment of the present invention, the output value of the BP neural network is a product of an instantaneous loading rate and a preset first scaling factor.
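As a small worked illustration of this product, the sketch below assumes that the loading rate is the occupied fraction of the compartment volume. That definition is an assumption, since the text here does not spell the formula out, and the function names and the scaling factor value are hypothetical.

```python
from typing import Iterable


def loading_rate(placed_volumes: Iterable[float], compartment_volume: float) -> float:
    """Assumed definition: total placed volume divided by compartment volume."""
    if compartment_volume <= 0:
        raise ValueError("compartment volume must be positive")
    return sum(placed_volumes) / compartment_volume


def scaled_output(instantaneous_loading_rate: float, first_scaling_factor: float) -> float:
    """Product of the instantaneous loading rate and the first scaling factor."""
    return instantaneous_loading_rate * first_scaling_factor


# Usage: two boxes of 2 and 3 cubic units in a compartment of 10 cubic units.
rate = loading_rate([2.0, 3.0], 10.0)
value = scaled_output(rate, first_scaling_factor=10.0)
```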

Step S103, selecting a corresponding loading action and placing corresponding goods based on the selection strategy of the loading action.

In a specific implementation, after selecting a corresponding loading action and placing a corresponding cargo, the loading action may be stored in a training set for training the neural network.

In an embodiment of the present invention, after selecting a corresponding loading action and placing a corresponding cargo, the method further includes executing the loading action, and storing the loading action into a training set; updating the BP neural network parameters based on the training set.

In a specific implementation, stored data may be randomly selected from the training set according to a normal distribution, a uniform distribution, or another distribution to train the BP neural network, which is not described in detail here.

In a specific implementation, in order to ensure the stability and convergence of BP neural network training, a target neural network may further be defined, and the BP neural network is trained while the target neural network is updated in stages.
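A training set with random sampling might look like the sketch below. It uses uniform sampling, one of the options the text names, and stores whole transitions (state, action, feedback, next state) rather than only the action; the class name, the fixed capacity and the stored tuple layout are all assumptions.

```python
import random
from collections import deque
from typing import Any, List, Optional, Tuple

Transition = Tuple[Any, Any, float, Any]  # (state, action, feedback, next_state)


class TrainingSet:
    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        # A bounded deque discards the oldest entries once capacity is reached.
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, state: Any, action: Any, feedback: float, next_state: Any) -> None:
        self._items.append((state, action, feedback, next_state))

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw up to batch_size stored transitions uniformly without replacement."""
        count = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), count)

    def __len__(self) -> int:
        return len(self._items)
```

Sampling at random rather than replaying the latest steps in order reduces the correlation between consecutive training samples.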

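In an embodiment of the present invention, the updating the BP neural network parameters based on the training set includes: extracting samples in the training set; based on the samples, calculating the output value of the target neural network as:

$$y_i = r + \gamma \max_{a'} Q(s', a'; \theta_i^{-})$$

wherein r is an instantaneous feedback value for executing the loading action, a is a loading action set, a' is a future loading action set, s is a current state set, the state set comprising: the size of the closed space of the carriage, the optional placement positions and the identification information of the goods to be placed, s' is the state set updated after the loading action, $\gamma$ is a preset second proportionality coefficient, $\theta_i^{-}$ is the network parameter of the target neural network, and $Q(s', a'; \theta_i^{-})$ is used for evaluating the loading rate indicators of different a in different s. The mean square error between the output value of the target neural network and the output value of the BP neural network is then calculated as:

$$L_i(\theta_i) = \mathbb{E}\left[\left(y_i - Q(s, a; \theta_i)\right)^2\right]$$

wherein $L_i(\theta_i)$ is the mean square error between the output value of the target neural network and the output value of the BP neural network. The parameters of the BP neural network are updated based on the principle of minimum mean square error.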
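The two quantities described above, the output value of the target neural network and the mean square error, can be computed for a batch of samples as in the following sketch. The sample layout and function names are assumptions; the network outputs are passed in as plain numbers so that the arithmetic stays visible.

```python
from typing import Sequence, Tuple

# (feedback r, target-network outputs for every action a' in s', BP-network output for (s, a))
Sample = Tuple[float, Sequence[float], float]


def target_output(feedback: float, next_outputs: Sequence[float], gamma: float) -> float:
    """r + gamma * max over a' of the target network output in the next state."""
    return feedback + gamma * max(next_outputs)


def mean_square_error(samples: Sequence[Sample], gamma: float) -> float:
    """Average squared difference between target outputs and BP-network outputs."""
    squared_errors = [
        (target_output(feedback, next_outputs, gamma) - current_output) ** 2
        for feedback, next_outputs, current_output in samples
    ]
    return sum(squared_errors) / len(squared_errors)


# Usage: the first sample is already consistent, the second is off by 0.5.
batch = [(0.5, [0.1, 0.4], 0.86), (0.0, [1.0], 0.4)]
error = mean_square_error(batch, gamma=0.9)
```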

In a specific implementation, the N most recently executed samples in the training set, that is, the N most recently executed loading actions, may be extracted, where N is a positive integer; other samples may also be extracted, which is not limited in the embodiment of the present invention.

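In a specific implementation, the updating the parameters of the BP neural network based on the principle of minimum mean square error further includes:

calculating the partial derivative function of the mean square error with respect to each neural network parameter by using a Stochastic Gradient Descent (SGD) algorithm as:

$$\nabla_{\theta_i} L_i(\theta_i) = -2\,\mathbb{E}\left[\left(y_i - Q(s, a; \theta_i)\right) \nabla_{\theta_i} Q(s, a; \theta_i)\right]$$

wherein $y_i$ is the output value of the target neural network, $L_i(\theta_i)$ is the mean square error, $\nabla_{\theta_i}$ is the partial derivative with respect to $\theta_i$, and $\nabla_{\theta_i} Q(s, a; \theta_i)$ is the partial derivative function of the neural network parameter.

The parameters of the BP neural network are then calculated and updated based on the partial derivative function.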
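To make the update step concrete, the sketch below applies one stochastic gradient descent step to a linear model that stands in for the BP neural network. The linear form, the feature vector and the learning rate are assumptions for illustration only; for a real multi-layer network the partial derivatives would come from back propagation instead of the closed form used here.

```python
from typing import List, Sequence


def q_value(theta: Sequence[float], features: Sequence[float]) -> float:
    """Linear stand-in for the network output: the dot product theta . features."""
    return sum(t * f for t, f in zip(theta, features))


def sgd_step(theta: Sequence[float], features: Sequence[float],
             target: float, learning_rate: float) -> List[float]:
    """One gradient step on the squared error (target - q_value)^2."""
    error = target - q_value(theta, features)
    # d/d(theta_k) of (target - Q)^2 is -2 * error * features[k] for a linear Q.
    gradient = [-2.0 * error * f for f in features]
    return [t - learning_rate * g for t, g in zip(theta, gradient)]


# Usage: a single step moves the output for these features toward the target.
theta = [0.0, 0.0]
updated = sgd_step(theta, features=[1.0, 2.0], target=1.0, learning_rate=0.1)
```

The target is treated as a constant during the step, which is why only the BP-network output contributes to the gradient.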

In an embodiment of the present invention, in order to improve the effectiveness of the target neural network, the method for loading the vehicle-mounted cargo may further include: updating the parameters of the target neural network based on the parameters of the BP neural network.
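A staged update of the target network can be sketched as a periodic copy of the BP network parameters. The fixed `sync_interval` is an assumption; the text says only that the target network is updated based on the BP network parameters.

```python
import copy
from typing import List


def sync_target_parameters(bp_parameters: List[float],
                           target_parameters: List[float],
                           step: int,
                           sync_interval: int) -> List[float]:
    """Return the target parameters to use after the given execution step."""
    if step % sync_interval == 0:
        # Copy, so later BP updates do not silently change the target network.
        return copy.deepcopy(bp_parameters)
    return target_parameters


# Usage: the target parameters change only on multiples of the interval.
bp_params = [1.0, 2.0]
target_params = [0.0, 0.0]
unchanged = sync_target_parameters(bp_params, target_params, step=3, sync_interval=5)
synced = sync_target_parameters(bp_params, target_params, step=10, sync_interval=5)
```

Holding the target parameters fixed between synchronizations keeps the training target from moving at every step, which supports the stability goal mentioned earlier.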

In a specific implementation, updating the neural network with the Q-learning method is essentially an optimization by approximate dynamic programming.

In an embodiment of the present invention, after selecting a loading action for evaluation each time, it is determined whether the iteration process is finished, and when the iteration process is finished, the loading action selected this time is a final value; and when the iteration process is not finished, inputting the next processed state set into a target neural network, obtaining a maximum feedback value Q by using the target neural network parameters, and then calculating the feedback value of the current action according to a Bellman formula.

By applying the scheme, the selection strategy of the loading action is determined to be a random selection strategy or a selection strategy based on a neural network based on the random number and the preset probability parameter, and then the corresponding loading action is selected and the corresponding goods are placed based on the selection strategy of the loading action. Because different loading actions can be selected based on the neural network, different loading actions can be evaluated in a probability statistic manner, the calculation speed is high, and the large-scale logistics loading problem can be solved.

To enable those skilled in the art to better understand and implement the present invention, the embodiment of the present invention provides a method for selecting a loading action based on a neural network, which may include the following steps:

Step S201, initializing variables.

In a specific implementation, the variable initialization includes: definition of feedback values of loading actions, state set and setting of loading action set.

Step S202, calculating a feedback value of the BP neural network, and selecting the loading action with the maximum feedback value to place the goods.

In a specific implementation, the output value of the BP neural network is defined as the feedback value $Q(s, a; \theta_i)$ of the loading action, wherein a is a loading action set, s is a current state set, the state set includes: the size of the enclosed space of the vehicle compartment, the optional placement positions and the identification information of the goods to be placed, and $\theta_i$ is the network parameter of the BP neural network.

Step S203, storing the loading action into an experience set.

Step S204, selecting N loading actions from the experience set, and calculating a feedback value of the target neural network.

Step S205, training and updating the parameters of the BP neural network based on the feedback value of the target neural network.

Step S206, updating the parameters of the target neural network based on the parameters of the BP neural network.
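Steps S201 to S206 can be put together in a toy loop such as the one below. It is a sketch under strong simplifying assumptions, not the patented method: the compartment is one-dimensional, a loading action is just the length of the item to place, a three-parameter linear model replaces the BP neural network, and the random/network choice from steps S101 and S102 is folded into step S202. All names and default values are hypothetical.

```python
import random
from typing import List


def features(remaining: int, item: int, capacity: int) -> List[float]:
    return [1.0, remaining / capacity, item / capacity]


def q_value(theta, remaining, item, capacity) -> float:
    return sum(t * f for t, f in zip(theta, features(remaining, item, capacity)))


def run_episode(items, capacity, theta, target_theta, experience, rng,
                epsilon=0.1, gamma=0.9, learning_rate=0.05, batch_size=4) -> float:
    """Load items until nothing fits; theta, target_theta, experience are updated in place."""
    remaining, pending = capacity, list(items)   # S201: variables set up by the caller and here
    while True:
        feasible = [i for i in pending if i <= remaining]
        if not feasible:
            break
        # S202: choose an action (randomly with probability epsilon, else max output).
        if rng.random() < epsilon:
            item = rng.choice(feasible)
        else:
            item = max(feasible, key=lambda i: q_value(theta, remaining, i, capacity))
        pending.remove(item)
        next_remaining = remaining - item
        feedback = item / capacity               # gain in loading rate from this action
        next_feasible = [i for i in pending if i <= next_remaining]
        # S203: store the transition in the experience set.
        experience.append((remaining, item, feedback, next_remaining, next_feasible))
        # S204: sample transitions and compute target-network feedback values.
        for s, a, r, s2, next_actions in rng.sample(
                experience, min(batch_size, len(experience))):
            future = max((q_value(target_theta, s2, a2, capacity)
                          for a2 in next_actions), default=0.0)
            error = r + gamma * future - q_value(theta, s, a, capacity)
            # S205: gradient step on the squared error for the linear model.
            for k, f in enumerate(features(s, a, capacity)):
                theta[k] += learning_rate * 2.0 * error * f
        remaining = next_remaining
    target_theta[:] = theta                      # S206: copy parameters to the target model
    return (capacity - remaining) / capacity


# Usage: one episode with four items and a compartment of length 10.
rng = random.Random(0)
theta, target_theta, experience = [0.0] * 3, [0.0] * 3, []
final_rate = run_episode([5, 3, 2, 4], 10, theta, target_theta, experience, rng)
```

Running several episodes with the same `theta`, `target_theta` and `experience` objects would continue training across episodes; here the target model is synchronized once per episode, which is only one of many possible schedules.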

In order to enable those skilled in the art to better understand and implement the invention, an embodiment of the invention also provides a loading device for vehicle-mounted cargo capable of implementing the above loading method, as shown in Fig. 3.

Referring to fig. 3, the loading device 30 for vehicle cargo may include: a generating unit 31, a determining unit 32 and a placing unit 33, wherein:

The generating unit 31 is adapted to generate a random number.

The determining unit 32 is adapted to determine a selection policy of the loading action based on the random number and a preset probability parameter, where the selection policy of the loading action includes any one of: a random selection strategy and a neural network based selection strategy.

The placing unit 33 is adapted to select a corresponding loading action and place the corresponding goods based on the selection policy of the loading action.

In an embodiment of the invention, the preset probability parameter decreases linearly as the number of execution steps increases.

In an embodiment of the present invention, when the preset probability parameter is not greater than the preset parameter threshold, the preset probability parameter is set to be a preset fixed value.

In a specific implementation, the determining unit 32 includes: a first determining subunit 321 and a second determining subunit 322, wherein:

The first determining subunit 321 is adapted to determine that the selection policy of the loading action is a random selection policy when the random number is smaller than the probability parameter.

The second determining subunit 322 is adapted to determine that the selection policy of the loading action is a neural network-based selection policy when the random number is not less than the probability parameter.

In a specific implementation, the loading action includes: identification information of the goods to be placed, position information of the goods to be placed, and orientation information of the goods to be placed.

In a specific implementation, the loading action satisfies at least one of the following constraints: the goods to be placed are adjacent to at least one previously placed item of goods, and the goods to be placed do not overlap the previously placed goods in the horizontal direction.

In a specific implementation, when the selection strategy of the loading action is the neural network-based selection strategy, the placing unit may include a first calculating subunit (not shown in FIG. 3), a selecting subunit (not shown in FIG. 3), and a placing subunit (not shown in FIG. 3), wherein:

The first calculating subunit is adapted to calculate, based on the BP neural network, the BP neural network output values corresponding to different loading actions.

The selecting subunit is adapted to select the loading action with the maximum BP neural network output value as the corresponding loading action.

The placing subunit is adapted to place the corresponding goods based on the corresponding loading action.

In a specific implementation, the output value of the BP neural network is Q(s, a; θ_i), a loading-rate index for evaluating the loading action set in the state set, where a is the loading action set, s is the current state set, the state set comprises the size of the enclosed space of the carriage, the optional placement positions, and the identification information of the box to be placed, and θ_i denotes the network parameters of the BP neural network.
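
Selecting the loading action with the maximum output value can be sketched as follows. The function `toy_q_value` is a hypothetical stand-in for the BP neural network (it simply prefers positions near the origin), and the tuple layout of an action (identifier, position, orientation) is an assumed encoding of the three pieces of information named in the text.

```python
def select_greedy_action(q_value, state, candidate_actions):
    """Return the candidate action with the largest Q(s, a) estimate."""
    if not candidate_actions:
        raise ValueError("no feasible loading action")
    return max(candidate_actions, key=lambda action: q_value(state, action))


def toy_q_value(state, action):
    """Hypothetical scorer standing in for the BP neural network output."""
    cargo_id, position, orientation = action
    x, y, z = position
    return -(x + y + z)  # prefers placements close to the origin corner


# Each action: (cargo identifier, (x, y, z) position, orientation index).
actions = [
    ("box-1", (2, 0, 0), 0),
    ("box-1", (0, 0, 0), 1),
    ("box-2", (1, 1, 0), 0),
]
best = select_greedy_action(toy_q_value, state=None, candidate_actions=actions)
```

In a real system the scorer would be the trained network evaluated on an encoding of the state set and each candidate action.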

In a specific implementation, for use after the placing unit has selected a corresponding loading action and placed the corresponding goods, the loading device 30 for vehicle-mounted cargo may further include an execution unit (not shown in FIG. 3) and a training unit (not shown in FIG. 3), wherein:

The execution unit is adapted to execute the loading action and store the loading action in a training set.

The training unit is adapted to update the parameters of the BP neural network based on the training set.
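
A training set of this kind is commonly kept as a bounded buffer from which samples are drawn at random. The sketch below assumes that each stored entry is a (state, action, reward, next state) tuple and that the buffer has a fixed capacity; the text itself only says that the loading action is stored, so these details and the class name are illustrative.

```python
import random
from collections import deque


class TrainingSet:
    """Bounded store of executed loading steps (hypothetical layout)."""

    def __init__(self, capacity=10000):
        # Oldest entries are discarded automatically once capacity is reached.
        self._items = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self._items.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Draw up to batch_size distinct entries uniformly at random."""
        k = min(batch_size, len(self._items))
        return rng.sample(list(self._items), k)

    def __len__(self):
        return len(self._items)
```

Sampling at random rather than replaying steps in order is a common choice because consecutive loading steps are strongly correlated.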

In a specific implementation, the training unit may include an extraction subunit (not shown in FIG. 3), a second calculating subunit (not shown in FIG. 3), a third calculating subunit (not shown in FIG. 3), and an updating subunit (not shown in FIG. 3), wherein:

The extraction subunit is adapted to extract samples from the training set.

The second calculating subunit is adapted to calculate, based on the samples, an output value of the target neural network as:

which is used to evaluate the loading-rate index of different loading actions a in different states s.

The third computing subunit is adapted to compute a mean square error between the output value of the target neural network and the output value of the BP neural network as:

wherein

is the mean square error between the output value of the target neural network and the output value of the BP neural network.

The updating subunit is adapted to update the parameters of the BP neural network so as to minimize the mean square error.
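
The formulas for the target output and the mean square error are not reproduced in this text. For orientation only, a form commonly used in deep Q-learning that is consistent with the definitions above is given below. The reward r, the discount factor γ, the next state s', the next action a', and the target-network parameters θ^- are assumed symbols that do not appear in the surrounding text, and the patent's own formulas may differ.

```latex
\begin{aligned}
y_i &= r + \gamma \max_{a'} Q(s', a'; \theta^{-}) \\
L_i(\theta_i) &= \mathbb{E}_{(s, a, r, s')}\left[\left(y_i - Q(s, a; \theta_i)\right)^2\right]
\end{aligned}
```

Here y_i plays the role of the output value of the target neural network, and L_i(θ_i) plays the role of the mean square error between that value and the output value of the BP neural network.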

In a specific implementation, the updating subunit includes a calculation module and an updating module, wherein:

The calculation module is adapted to calculate the partial derivative function of each neural network parameter by using a stochastic gradient descent algorithm as follows:

wherein

denotes the partial derivative with respect to θ_i, and

is the partial derivative function of the neural network parameters;

The updating module is adapted to calculate and update the parameters of the BP neural network based on the partial derivative function.
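
A single stochastic gradient descent step on the squared error can be sketched as below. To keep the example short and self-contained, a linear approximator Q = θ · φ replaces the BP neural network; this substitution, the feature vector `features`, and the learning rate are assumptions and are not taken from the text.

```python
def sgd_step(theta, features, target, learning_rate=0.01):
    """One gradient step on (target - theta . features)^2.

    A linear model stands in for the BP neural network (an assumption);
    with a real network the gradient would come from backpropagation.
    """
    prediction = sum(t * f for t, f in zip(theta, features))
    error = target - prediction
    # d/d(theta_k) of (target - prediction)^2 is -2 * error * features[k].
    gradient = [-2.0 * error * f for f in features]
    return [t - learning_rate * g for t, g in zip(theta, gradient)]


theta = [0.0, 0.0]
for _ in range(200):
    theta = sgd_step(theta, features=[1.0, 2.0], target=5.0, learning_rate=0.05)
```

Repeating the step drives the prediction toward the target, which is the sense in which the parameters are updated to minimize the mean square error.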

In a specific implementation, the loading device 30 for vehicle-mounted cargo further includes: an updating unit (not shown in fig. 3) adapted to update the parameters of the target neural network based on the parameters of the BP neural network.
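
One common way to update the target network from the BP neural network is to copy the parameters at a fixed interval. The sketch below assumes such a periodic hard copy; the interval `sync_every`, the function name, and the dictionary layout of the parameters are hypothetical, since the text does not say how or how often the update is performed.

```python
import copy


def sync_target(online_params, target_params, step, sync_every=100):
    """Return the target-network parameters to use after this step.

    Assumes a periodic hard copy; the interval is an illustrative value.
    """
    if step % sync_every == 0:
        # Deep copy so later changes to the online network do not leak through.
        return copy.deepcopy(online_params)
    return target_params


online = {"w": [1.0, 2.0]}
target = {"w": [0.0, 0.0]}
target = sync_target(online, target, step=100)
online["w"][0] = 9.0  # a later update to the online network
```

Keeping the target parameters fixed between copies gives the training step a stable reference value to regress toward.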

In a specific implementation, for the working process and principle of the loading device 30 for vehicle-mounted cargo, reference may be made to the description of the method provided in the above embodiments, which is not repeated here.

An embodiment of the present invention provides a computer-readable storage medium, which is a non-volatile storage medium or a non-transitory storage medium and on which computer instructions are stored, where the computer instructions, when run, execute the steps of any of the foregoing methods; details are not repeated here.

An embodiment of the present invention provides a loading device for vehicle-mounted cargo, comprising a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor, when running the computer instructions, executes the steps of any of the foregoing methods; details are not repeated here.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, and the like.

Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method for loading vehicle-mounted cargo, comprising:

generating a random number;

determining a selection strategy of the loading action based on the random number and a preset probability parameter, wherein the selection strategy of the loading action comprises any one of the following strategies: a random selection strategy and a neural network-based selection strategy; the determining a selection strategy of the loading action based on the random number and a preset probability parameter comprises: when the random number is smaller than the preset probability parameter, determining that the selection strategy of the loading action is a random selection strategy; when the random number is not smaller than the preset probability parameter, determining that a selection strategy of the loading action is a selection strategy based on a neural network;

selecting a corresponding loading action and placing corresponding goods based on the selection strategy of the loading action, wherein the loading action comprises the following steps: identification information of goods to be put in, position information of the goods to be put in and orientation information of the goods to be put in;

when the selection strategy of the loading action is a neural network-based selection strategy, selecting a corresponding loading action based on the selection strategy of the loading action comprises: calculating BP neural network output values corresponding to different loading actions based on the BP neural network; and selecting the loading action with the maximum corresponding BP neural network output value as the corresponding loading action; wherein the output value of the BP neural network is Q(s, a; θ_i), a loading-rate index for evaluating a loading action set in a state set, where a is the loading action set, s is the current state set, the state set comprises the size of the enclosed space of the vehicle compartment, the optional placement positions, and the identification information of the goods to be placed, and θ_i denotes the network parameters of the BP neural network.

2. The loading method of vehicle-mounted cargo according to claim 1, wherein the preset probability parameter decreases linearly as the number of executed steps increases.

3. The loading method of vehicle-mounted cargo according to claim 2, wherein when the preset probability parameter is not greater than a preset parameter threshold, the preset probability parameter is set to a preset fixed value.

4. The loading method of vehicle-mounted cargo according to claim 1, wherein the loading action satisfies at least one of the following constraints: the goods to be placed abut at least one previously placed item of goods, and the goods to be placed do not overlap the previously placed goods in the horizontal direction.

5. The method for loading vehicle-mounted cargos according to claim 1, further comprising, after selecting a corresponding loading action and placing the corresponding cargo:

executing the loading action, and storing the loading action into a training set;

updating network parameters of the BP neural network based on the training set.

6. The loading method of the vehicle-mounted cargo according to claim 5, wherein the updating the network parameters of the BP neural network based on the training set comprises:

extracting samples in the training set;

based on the samples, calculating an output value of the target neural network as:

used for evaluating the loading rate indexes of different a in different s;

calculating the mean square error between the output value of the target neural network and the output value of the BP neural network as:

wherein

and updating the network parameters of the BP neural network based on the principle of minimum mean square error.

7. The loading method of the vehicle-mounted cargo according to claim 6, wherein the updating the network parameters of the BP neural network based on the principle of minimum mean square error further comprises:

the partial derivative function of each neural network parameter is calculated by adopting a stochastic gradient descent algorithm as follows:

wherein

denotes the partial derivative with respect to θ_i, and

is the partial derivative function of the neural network parameters;

and calculating and updating the network parameters of the BP neural network based on the partial derivative function.

8. The loading method of vehicle-mounted cargo according to claim 6, further comprising:

updating the network parameters of the target neural network based on the network parameters of the BP neural network.

9. A loading device for vehicle-mounted cargos, characterized by comprising:

a generating unit adapted to generate a random number;

a determining unit adapted to determine a selection policy of the loading action based on the random number and a preset probability parameter, the selection policy of the loading action including any one of: a random selection strategy and a neural network-based selection strategy; the determination unit includes: a first determining subunit, adapted to determine, when the random number is smaller than the probability parameter, that the selection policy of the loading action is a random selection policy; a second determining subunit, adapted to determine, when the random number is not less than the probability parameter, that the selection policy of the loading action is a neural network-based selection policy;

a placing unit adapted to select a corresponding loading action and place a corresponding cargo based on a selection policy of the loading action, the loading action including: identification information of goods to be put in, position information of the goods to be put in and orientation information of the goods to be put in;

the placing unit includes: a first calculating subunit adapted to calculate BP neural network output values corresponding to different loading actions based on the BP neural network; a selecting subunit adapted to select the loading action with the maximum corresponding BP neural network output value as the corresponding loading action; and a placing subunit adapted to place the corresponding goods based on the corresponding loading action; wherein the output value of the BP neural network is Q(s, a; θ_i), a loading-rate index for evaluating a loading action set in a state set, where a is the loading action set, s is the current state set, the state set comprises the size of the enclosed space of the carriage, the optional placement positions, and the identification information of the box to be placed, and θ_i denotes the network parameters of the BP neural network.

10. The loading device for vehicle-mounted cargo according to claim 9, wherein the preset probability parameter decreases linearly as the number of executed steps increases.

11. The loading device for vehicle-mounted cargo according to claim 10, wherein when the preset probability parameter is not greater than a preset parameter threshold, the preset probability parameter is set to a preset fixed value.

12. The loading device for vehicle-mounted cargo according to claim 9, wherein the loading action satisfies at least one of the following constraints: the goods to be placed abut at least one previously placed item of goods, and the goods to be placed do not overlap the previously placed goods in the horizontal direction.

13. The loading device for vehicle-mounted cargos according to claim 9, wherein after the placing unit selects the corresponding loading action and places the corresponding cargo, the loading device further comprises:

the execution unit is suitable for executing the loading action and storing the loading action into a training set;

and the training unit is suitable for updating the network parameters of the BP neural network based on the training set.

14. The loading device for vehicle cargo according to claim 13, wherein the training unit comprises:

an extraction subunit adapted to extract samples in the training set;

a second calculation subunit adapted to calculate, based on the samples, an output value of the target neural network as:

used for evaluating the loading rate indexes of different a in different s;

a third computing subunit adapted to compute a mean square error between the output value of the target neural network and the output value of the BP neural network as:

wherein

and the updating subunit is suitable for updating the network parameters of the BP neural network based on the principle of minimum mean square error.

15. The loading device for vehicle cargo according to claim 14, wherein the updating subunit comprises:

a calculation module adapted to calculate the partial derivative function of each neural network parameter by using a stochastic gradient descent algorithm as follows:

wherein

denotes the partial derivative with respect to θ_i, and

is the partial derivative function of the neural network parameters;

and the updating module is suitable for calculating and updating the network parameters of the BP neural network based on the partial derivative function.

16. The loading device for vehicle cargo according to claim 14, further comprising:

and the updating unit is suitable for updating the network parameters of the target neural network based on the network parameters of the BP neural network.

17. A computer-readable storage medium, being a non-volatile storage medium or a non-transitory storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the method according to any of the claims 1 to 8.

18. A loading device for vehicle cargo, comprising a memory and a processor, said memory having stored thereon a computer program being executable on said processor, characterized in that said processor, when executing said computer program, executes the steps of the method according to any of the claims 1 to 8.