CN108520327B - Loading method and device of vehicle-mounted cargo and computer readable medium - Google Patents

Loading method and device of vehicle-mounted cargo and computer readable medium

Info

Publication number
CN108520327B
Authority
CN
China
Prior art keywords
neural network
loading
loading action
goods
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810357903.3A
Other languages
Chinese (zh)
Other versions
CN108520327A (en)
Inventor
金忠孝
戴昌志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SAIC Motor Corp Ltd
Anji Automotive Logistics Co Ltd
Original Assignee
SAIC Motor Corp Ltd
Anji Automotive Logistics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SAIC Motor Corp Ltd and Anji Automotive Logistics Co Ltd
Priority to CN201810357903.3A
Publication of CN108520327A
Application granted
Publication of CN108520327B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/08Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083Shipping

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Game Theory and Decision Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A loading method and device of vehicle-mounted cargos and a computer readable medium are provided, wherein the loading method of the vehicle-mounted cargos comprises the following steps: generating a random number; determining a selection strategy of the loading action based on the random number and a preset probability parameter, wherein the selection strategy of the loading action comprises any one of the following strategies: a random selection strategy and a neural network-based selection strategy; and selecting a corresponding loading action and placing corresponding goods based on the selection strategy of the loading action. By applying the scheme, different loading actions can be selected based on the neural network, so that the different loading actions can be evaluated in a probability statistics manner, the calculation speed is high, and the large-scale logistics loading problem can be solved.

Description

Loading method and device of vehicle-mounted cargo and computer readable medium
Technical Field
The embodiment of the invention relates to the field of solving combinatorial optimization problems, in particular to a loading method and device of vehicle-mounted cargos and a computer readable medium.
Background
For a logistics system, the method of loading vehicle-mounted cargo is an important technical problem. Loading vehicle-mounted goods is a bin packing problem. Bin packing problems arise widely in industrial production, computer science, and other fields, for example fabric cutting in the clothing industry, container loading in the transportation industry, sheet cutting in the processing industry, layout in the printing industry, and the packaging and arrangement of objects in everyday life, as well as low-level operations in the computer field such as multiprocessor task scheduling, resource allocation, file allocation, and memory management.
In terms of computational complexity, the bin packing problem is an NP-hard problem (NP: non-deterministic polynomial time), for which an exact global optimum is difficult to obtain, so heuristic algorithms and search algorithms are generally used to solve it. The idea of a heuristic algorithm is to find a heuristic rule that produces a feasible solution, so as to obtain an optimal or near-optimal solution to the problem. This approach solves quickly, but a specific heuristic rule must be found for each problem, and such a rule generally lacks universality and does not carry over to other problems. For the packing problem, heuristic algorithms include the First Fit (FF) algorithm, the Best Fit (BF) algorithm, genetic algorithms, simulated annealing, particle swarm optimization, and the like. A search algorithm searches the solution space for an optimal or approximately optimal solution to the problem. This approach does not guarantee an optimal solution, but if heuristic knowledge is used properly, it can strike a better balance between the quality of the approximate solution and the efficiency of finding it.
The conventional loading method for vehicle-mounted cargo mainly uses a heuristic algorithm to enumerate packing schemes within a limited space; it achieves a low packing rate, takes a long time to compute, and is not suitable for large-scale operation. Heuristic algorithms that solve quickly, for example the particle swarm algorithm, obtain a solution fast, but the solution quality is not high.
Therefore, existing schemes cannot solve the large-scale logistics loading problem.
Disclosure of Invention
The technical problem solved by the embodiment of the invention is how to solve the problem of large-scale logistics loading.
In order to solve the technical problem, an embodiment of the present invention provides a method for loading a vehicle-mounted cargo, where the method includes: generating a random number; determining a selection strategy of the loading action based on the random number and a preset probability parameter, wherein the selection strategy of the loading action comprises any one of the following strategies: a random selection strategy and a neural network-based selection strategy; and selecting a corresponding loading action and placing corresponding goods based on the selection strategy of the loading action.
Optionally, the preset probability parameter decreases linearly as the number of execution steps increases.
Optionally, when the preset probability parameter is not greater than a preset parameter threshold, the preset probability parameter is set to be a preset fixed value.
Optionally, the determining a selection policy of the loading action based on the random number and a preset probability parameter includes: when the random number is smaller than the preset probability parameter, determining that the selection strategy of the loading action is a random selection strategy; and when the random number is not less than the preset probability parameter, determining that the selection strategy of the loading action is a selection strategy based on a neural network.
Optionally, the loading action comprises: identification information of the goods to be placed, position information of the goods to be placed, and orientation information of the goods to be placed.
Optionally, the loading action satisfies at least one of the following constraints: the goods to be placed abut at least one previously placed item of goods, and the goods to be placed do not overlap the previously placed goods in the horizontal direction.
Optionally, when the selection policy of the loading action is a selection policy based on a neural network, selecting a corresponding loading action based on the selection policy of the loading action includes: calculating BP neural network output values corresponding to different loading actions based on the BP neural network; and selecting the loading action with the maximum corresponding BP neural network output value as the corresponding loading action.
Optionally, the output value of the BP neural network is $Q(s, a; \theta_i)$, which is used to evaluate a loading rate indicator of a loading action set in a state set; wherein a is a loading action set, s is a current state set, the state set comprises: the size of the enclosed space of the vehicle compartment, the optional placement positions, and the identification information of the goods to be placed, and $\theta_i$ is the network parameters of the BP neural network.
Optionally, after selecting the corresponding loading action and placing the corresponding cargo, the method further includes: executing the loading action, and storing the loading action into a training set; updating the BP neural network parameters based on the training set.
Optionally, the updating the BP neural network parameters based on the training set comprises: extracting samples in the training set; calculating, based on the samples, the output value of the target neural network as:
$$y_i = r + \gamma \max_{a'} Q(s', a'; \theta_i^-)$$
wherein r is an instantaneous feedback value for performing the loading action, a is a loading action set, a' is a future loading action set, and s is a current state set, the state set comprising: the size of the enclosed space of the carriage, the optional placement positions, and the identification information of the goods to be placed; s' is the updated state set after the loading action; $\theta_i^-$ is the network parameters of the target neural network; $\gamma$ is a preset second proportionality coefficient; and $Q(s', a'; \theta_i^-)$ is used for evaluating the loading rate indicators of different a in different s; calculating the mean square error between the output value of the target neural network and the output value of the BP neural network as:
$$L_i(\theta_i) = \mathbb{E}_{s,a,r,s'}\left[\left(y_i - Q(s, a; \theta_i)\right)^2\right]$$
wherein $L_i(\theta_i)$ is the mean square error between the output value of the target neural network and the output value of the BP neural network;
and updating the parameters of the BP neural network based on the principle of minimum mean square error.
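Optionally, the updating the parameters of the BP neural network based on the principle of minimum mean square error further includes: calculating the partial derivative function of each neural network parameter by a stochastic gradient descent algorithm as:
$$\nabla_{\theta_i} L_i(\theta_i) = -2\,\mathbb{E}_{s,a,r,s'}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i)\right) \nabla_{\theta_i} Q(s, a; \theta_i)\right]$$
wherein $\nabla_{\theta_i}$ denotes the partial derivative with respect to $\theta_i$; $L_i(\theta_i)$ is the mean square error between the output value of the target neural network and the output value of the BP neural network; and $\nabla_{\theta_i} L_i(\theta_i)$ is the partial derivative function of the neural network parameters; and calculating and updating the parameters of the BP neural network based on the partial derivative function.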
Optionally, the loading method of the vehicle-mounted cargo further includes: updating the parameters of the target neural network based on the parameters of the BP neural network.
The embodiment of the invention provides a loading device for vehicle-mounted cargos, which comprises: a generating unit adapted to generate a random number; a determining unit adapted to determine a selection policy of the loading action based on the random number and a preset probability parameter, the selection policy of the loading action including any one of: a random selection strategy and a neural network-based selection strategy; and the placing unit is suitable for selecting the corresponding loading action and placing the corresponding goods based on the selection strategy of the loading action.
Optionally, the preset probability parameter decreases linearly as the number of execution steps increases.
Optionally, when the preset probability parameter is not greater than a preset parameter threshold, the preset probability parameter is set to be a preset fixed value.
Optionally, the determining unit includes: a first determining subunit, adapted to determine, when the random number is smaller than the probability parameter, that the selection policy of the loading action is a random selection policy; a second determining subunit, adapted to determine that the selection policy of the loading action is a neural network-based selection policy when the random number is not less than the probability parameter.
Optionally, the loading action comprises: identification information of the goods to be placed, position information of the goods to be placed, and orientation information of the goods to be placed.
Optionally, the loading action satisfies at least one of the following constraints: the goods to be placed abut at least one previously placed item of goods, and the goods to be placed do not overlap the previously placed goods in the horizontal direction.
Optionally, when the selection policy of the loading action is a neural network-based selection policy, the placing unit includes: a first calculating subunit adapted to calculate the BP neural network output values corresponding to different loading actions based on the BP neural network; a selecting subunit adapted to select the loading action with the maximum corresponding BP neural network output value as the corresponding loading action; and a placing subunit adapted to place the corresponding goods based on the corresponding loading action.
Optionally, the output value of the BP neural network is $Q(s, a; \theta_i)$, which is used to evaluate a loading rate indicator of a loading action set in a state set; wherein a is a loading action set, s is a current state set, the state set comprises: the size of the enclosed space of the carriage, the optional placement positions, and the identification information of the box to be placed, and $\theta_i$ is the network parameters of the BP neural network.
Optionally, after the placing unit selects the corresponding loading action and places the corresponding goods, the method further includes: the execution unit is suitable for executing the loading action and storing the loading action into a training set; a training unit adapted to update the BP neural network parameters based on the training set.
Optionally, the training unit comprises: an extraction subunit adapted to extract samples in the training set; a second calculation subunit adapted to calculate, based on the samples, the output value of the target neural network as:
$$y_i = r + \gamma \max_{a'} Q(s', a'; \theta_i^-)$$
wherein r is an instantaneous feedback value for performing the loading action, a is a loading action set, a' is a future loading action set, and s is a current state set, the state set comprising: the size of the enclosed space of the carriage, the optional placement positions, and the identification information of the boxes to be placed; s' is the updated state set after the loading action; $\theta_i^-$ is the network parameters of the target neural network; $\gamma$ is a preset second proportionality coefficient; and $Q(s', a'; \theta_i^-)$ is used for evaluating the loading rate indicators of different a in different s; a third calculation subunit adapted to calculate the mean square error between the output value of the target neural network and the output value of the BP neural network as:
$$L_i(\theta_i) = \mathbb{E}_{s,a,r,s'}\left[\left(y_i - Q(s, a; \theta_i)\right)^2\right]$$
wherein $L_i(\theta_i)$ is the mean square error between the output value of the target neural network and the output value of the BP neural network; and an updating subunit adapted to update the parameters of the BP neural network based on the principle of minimum mean square error.
Optionally, the update subunit includes: a calculation module adapted to calculate the partial derivative function of each neural network parameter by a stochastic gradient descent algorithm as:
$$\nabla_{\theta_i} L_i(\theta_i) = -2\,\mathbb{E}_{s,a,r,s'}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i)\right) \nabla_{\theta_i} Q(s, a; \theta_i)\right]$$
wherein $\nabla_{\theta_i}$ denotes the partial derivative with respect to $\theta_i$; $L_i(\theta_i)$ is the mean square error between the output value of the target neural network and the output value of the BP neural network; and $\nabla_{\theta_i} L_i(\theta_i)$ is the partial derivative function of the neural network parameters; and an updating module adapted to calculate and update the parameters of the BP neural network based on the partial derivative function.
Optionally, the loading device for vehicle-mounted cargo further comprises: an updating unit adapted to update the parameters of the target neural network based on the parameters of the BP neural network.
An embodiment of the present invention provides a computer-readable storage medium, which is a non-volatile storage medium or a non-transitory storage medium, and has stored thereon computer instructions, where the computer instructions, when executed, perform any of the steps of the method described above.
The embodiment of the invention provides a loading device for vehicle-mounted cargos, which comprises a memory and a processor, wherein computer instructions capable of being operated on the processor are stored on the memory, and the processor executes any one of the steps of the method when executing the computer instructions.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
according to the embodiment of the invention, the selection strategy of the loading action is determined to be a random selection strategy or a selection strategy based on a neural network based on the random number and the preset probability parameter, and then the corresponding loading action is selected and the corresponding goods are placed based on the selection strategy of the loading action. Because different loading actions can be selected based on the neural network, different loading actions can be evaluated in a probability statistic manner, the calculation speed is high, and the large-scale logistics loading problem can be solved.
Furthermore, the neural network can be designed into a function-like form, so that multi-core and multi-thread parallel operation can be realized, and the calculation speed is further improved.
Furthermore, the loading action comprises identification information of the goods to be placed, position information of the goods to be placed and orientation information of the goods to be placed, so that the conditions of changeability of carriages, goods and the like can be met, the method is suitable for various scenes, and the universality of the loading method of the vehicle-mounted goods is improved.
Drawings
Fig. 1 is a flowchart of a loading method for vehicle-mounted cargo according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for selecting a loading action based on a neural network according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a loading device for vehicle-mounted cargo according to an embodiment of the present invention.
Detailed Description
The conventional loading method for vehicle-mounted cargo mainly uses a heuristic algorithm to enumerate packing schemes within a limited space; it achieves a low packing rate, takes a long time to compute, and is not suitable for large-scale operation. Heuristic algorithms that solve quickly, for example the particle swarm algorithm, obtain a solution fast, but the solution quality is not high. Therefore, existing schemes cannot solve the large-scale logistics loading problem.
According to the embodiment of the invention, the selection strategy of the loading action is determined to be a random selection strategy or a selection strategy based on a neural network based on the random number and the preset probability parameter, and then the corresponding loading action is selected and the corresponding goods are placed based on the selection strategy of the loading action. Because different loading actions can be selected based on the neural network, different loading actions can be evaluated in a probability statistic manner, the calculation speed is high, and the large-scale logistics loading problem can be solved.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Referring to fig. 1, an embodiment of the present invention provides a loading method of vehicle-mounted cargo, which may include the following steps:
Step S101, generating a random number.
In particular implementations, the selection strategy for different loading actions may be determined based on random numbers.
Step S102, determining a selection strategy of the loading action based on the random number and a preset probability parameter, wherein the selection strategy of the loading action comprises any one of the following strategies: a random selection strategy and a neural network based selection strategy.
In a specific implementation, the selection strategy for the loading action may be chosen by a roulette algorithm or a greedy algorithm, based on the random number and the preset probability parameter. For example, at the start of training the neural network model is still imperfect, so the random selection strategy may be chosen comparatively more often; as the number of execution steps increases, the neural network model is continuously optimized and its accuracy improves, so the neural network-based selection strategy may be chosen more frequently to improve the loading rate of the loading action.
In a specific implementation, the preset probability parameter may be a value in the interval [0, 1] that decreases linearly as the number of execution steps increases. For example, at step 0 the preset probability parameter is 1; at step 1 it is 0.9; and so on, until the preset probability parameter reaches 0.1.
In a specific implementation, as the number of execution steps increases, when the preset probability parameter is not greater than, i.e., less than or equal to the preset parameter threshold, the preset probability parameter may be set to a preset fixed value to ensure the learning efficiency of the neural network. For example, when the preset probability parameter is decreased to 0.1, it may be set to 0.1 to ensure the learning efficiency of the neural network.
It can be understood that the preset parameter threshold and the preset fixed value may be set to the same value or may be set to different values, which is not limited in the embodiment of the present invention.
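The schedule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `exploration_probability` and its arguments are hypothetical, and the default values simply mirror the 1, 0.9, ..., 0.1 example in the text.

```python
def exploration_probability(step, start=1.0, decay_per_step=0.1,
                            threshold=0.1, fixed_value=0.1):
    """Return the preset probability parameter for a given execution step.

    All names and defaults here are illustrative assumptions.
    """
    # Linear decrease with the number of execution steps.
    value = start - decay_per_step * step
    # Once the decayed value is not greater than the threshold,
    # hold the parameter at the preset fixed value.
    if value <= threshold:
        return fixed_value
    return value


if __name__ == "__main__":
    for step in (0, 1, 5, 20):
        print(step, exploration_probability(step))
```

Keeping `threshold` and `fixed_value` as separate arguments reflects the remark that the two may be set to the same value or to different values.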
In an embodiment of the present invention, the determining a selection policy of the loading action based on the random number and a preset probability parameter includes: when the random number is smaller than the preset probability parameter, determining that the selection strategy of the loading action is a random selection strategy; and when the random number is not less than the preset probability parameter, determining that the selection strategy of the loading action is a selection strategy based on a neural network.
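The comparison rule above can be sketched in a few lines. The helper names `choose_strategy` and `select_action`, and the `q_value` callable standing in for the neural network, are assumptions made for illustration only.

```python
import random


def choose_strategy(random_number, probability_parameter):
    """Random strategy if the number is smaller than the parameter,
    otherwise the neural network-based strategy."""
    if random_number < probability_parameter:
        return "random"
    return "neural_network"


def select_action(candidate_actions, q_value, probability_parameter, rng=random):
    """Pick one loading action from the candidates.

    q_value is a hypothetical callable that scores an action; it stands in
    for the neural network evaluation.
    """
    strategy = choose_strategy(rng.random(), probability_parameter)
    if strategy == "random":
        return rng.choice(candidate_actions), strategy
    return max(candidate_actions, key=q_value), strategy


if __name__ == "__main__":
    scores = {"a": 0.2, "b": 0.9, "c": 0.5}
    print(select_action(list(scores), scores.get, probability_parameter=0.3))
```

Because `rng.random()` returns a value in [0, 1), a probability parameter of 0 always yields the neural network-based strategy and a parameter of 1 always yields the random strategy.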
In a specific implementation, the loading action may include: identification information of goods to be put in, position information of the goods to be put in, and orientation information of the goods to be put in. Based on the loading action, a particular cargo may be selected to be placed in its corresponding location.
It should be understood that the cargo may also be referred to by other names such as boxes, and the loading action may also be referred to by other names such as executing action and boxing action, all of which are within the scope of the present invention as long as the meanings are the same.
In a specific implementation, to improve the loading rate, the loading action may satisfy at least one of the following constraints: the goods to be placed abut at least one previously placed item of goods, and the goods to be placed do not overlap the previously placed goods in the horizontal direction.
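One possible way to represent a loading action and check these constraints is sketched below. The model is an assumption: goods are treated as axis-aligned boxes, "abut" is read as sharing part of a face, overlap is checked on all three axes rather than only horizontally, and the sketch enforces both constraints together although the text requires only at least one. The names `LoadingAction`, `touches`, and `is_feasible` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LoadingAction:
    cargo_id: str                                   # identification information
    position: Tuple[float, float, float]            # minimum corner (x, y, z)
    dims: Tuple[float, float, float]                # cargo length, width, height
    orientation: Tuple[int, int, int] = (0, 1, 2)   # cargo dimension placed along x, y, z

    def bounds(self, axis):
        """Interval occupied along the given axis after applying the orientation."""
        lo = self.position[axis]
        return lo, lo + self.dims[self.orientation[axis]]


def _intervals_overlap(a, b, axis):
    a_lo, a_hi = a.bounds(axis)
    b_lo, b_hi = b.bounds(axis)
    return a_lo < b_hi and b_lo < a_hi


def interiors_overlap(a, b):
    return all(_intervals_overlap(a, b, axis) for axis in range(3))


def touches(a, b):
    """True when the two boxes share part of a face."""
    for axis in range(3):
        a_lo, a_hi = a.bounds(axis)
        b_lo, b_hi = b.bounds(axis)
        if a_hi == b_lo or b_hi == a_lo:
            if all(_intervals_overlap(a, b, k) for k in range(3) if k != axis):
                return True
    return False


def is_feasible(action, placed):
    """No overlap with placed goods, and contact with at least one of them
    (the first item placed has nothing to touch and is accepted)."""
    if any(interiors_overlap(action, p) for p in placed):
        return False
    return not placed or any(touches(action, p) for p in placed)
```

Filtering candidate actions with a check of this kind before scoring them keeps the action set small, which matters when every remaining candidate is evaluated by the network.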
In particular embodiments, the left cargo may be constrained to be higher than the right cargo, the right cargo may be constrained to be higher than the left cargo, or the right cargo may be as high as the left cargo, since there is always one case in which this constraint can be implemented from a probabilistic perspective.
Since the loading action includes the identification information of the goods to be placed, the position information of the goods to be placed, and the orientation information of the goods to be placed, it can accommodate variable carriages, goods, and the like, is suitable for various scenarios, and improves the universality of the loading method for vehicle-mounted goods.
In a specific implementation, when the selection policy of the loading action is a selection policy based on a neural network, selecting a corresponding loading action based on the selection policy of the loading action may include: calculating BP neural network output values corresponding to different loading actions based on a Back Propagation (BP) neural network; and selecting the loading action with the maximum corresponding BP neural network output value as the corresponding loading action.
In an embodiment of the present invention, the output value of the BP neural network is used to evaluate a loading rate indicator of a loading action set in a state set, and is defined as the feedback value $Q(s, a; \theta_i)$ of the loading action, wherein a is a loading action set, s is a current state set, the state set includes: the size of the enclosed space of the vehicle compartment, the optional placement positions, and the identification information of the goods to be placed, and $\theta_i$ is the network parameters of the BP neural network.
A traditional reinforcement learning algorithm stores the action-value function in a table, and table storage is only suitable when the action estimates are discrete (unrelated), which does not scale to large problems. By adopting a BP neural network, the executed action can instead be evaluated in a probabilistic, statistical manner, that is, the feedback value of the executed action is estimated, which makes the approach suitable for large-scale operation.
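The neural network-based selection can be illustrated with a very small feed-forward network. This is a sketch under assumptions: the single hidden layer with tanh activation, the feature encoding, and the names `forward` and `select_by_network` are all illustrative, and the weights would in practice come from training.

```python
import math


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer network returning a scalar score."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def select_by_network(state_features, candidate_actions, encode_action, params):
    """Score every candidate action in the current state and return the best one.

    encode_action is a hypothetical function mapping an action to a feature list.
    """
    scores = [forward(state_features + encode_action(a), *params)
              for a in candidate_actions]
    best = max(range(len(candidate_actions)), key=scores.__getitem__)
    return candidate_actions[best], scores


if __name__ == "__main__":
    params = ([[0.0, 1.0]], [0.0], [1.0], 0.0)   # toy weights, not trained
    action, scores = select_by_network([1.0], [0.1, 0.5, 0.3], lambda a: [a], params)
    print(action, scores)
```

Scoring each candidate independently is what allows the evaluations to be run in parallel across cores or threads.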
The neural network can be designed into a function-like form, so that multi-core and multi-thread parallel operation can be realized, and the calculation speed is further improved.
In a specific implementation, the output value of the BP neural network tracks the loading rate (packing rate): the higher the loading rate, the larger the output value of the BP neural network.
In an embodiment of the present invention, the output value of the BP neural network is a product of an instantaneous loading rate and a preset first scaling factor.
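As a small worked example of this product, the sketch below assumes that the instantaneous loading rate is the volume of the placed goods divided by the volume of the carriage; the text does not spell out the definition, so this volume ratio and the names `loading_rate` and `feedback_value` are assumptions.

```python
def loading_rate(placed_sizes, compartment_size):
    """Fraction of the compartment volume occupied by the placed goods."""
    used = sum(length * width * height for length, width, height in placed_sizes)
    c_length, c_width, c_height = compartment_size
    return used / (c_length * c_width * c_height)


def feedback_value(placed_sizes, compartment_size, first_scale=1.0):
    """Instantaneous loading rate multiplied by the preset first scaling factor."""
    return first_scale * loading_rate(placed_sizes, compartment_size)


if __name__ == "__main__":
    placed = [(2, 1, 1), (1, 1, 1)]          # total volume 3
    print(feedback_value(placed, (4, 2, 2), first_scale=10.0))
```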
Step S103, selecting a corresponding loading action and placing corresponding goods based on the selection strategy of the loading action.
In a specific implementation, after selecting a corresponding loading action and placing a corresponding cargo, the loading action may be stored in a training set for training the neural network.
In an embodiment of the present invention, after selecting a corresponding loading action and placing a corresponding cargo, the method further includes executing the loading action, and storing the loading action into a training set; updating the BP neural network parameters based on the training set.
In a specific implementation, the stored data may be randomly selected from the training set according to a normal distribution to train the BP neural network, or according to a uniform distribution, or according to another distribution, which is not described herein again.
In a specific implementation, in order to ensure the stability and convergence of the BP neural network training, a target neural network may be further defined, and the BP neural network may be trained by updating the target neural network in stages.
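A training set with random sampling can be sketched as follows. The assumptions are that each stored entry is a full transition (state, action, feedback, next state, finished flag) rather than the action alone, that the set has a fixed capacity, and that sampling is uniform; the class name `TrainingSet` and its methods are hypothetical.

```python
import random
from collections import deque


class TrainingSet:
    """Bounded store of executed loading actions with uniform random sampling."""

    def __init__(self, capacity=10000, seed=None):
        self._items = deque(maxlen=capacity)   # oldest entries are dropped first
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state, done):
        self._items.append((state, action, reward, next_state, done))

    def sample_uniform(self, batch_size):
        """Draw up to batch_size distinct entries uniformly at random."""
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self):
        return len(self._items)
```

Sampling at random rather than replaying actions in execution order reduces the correlation between consecutive training samples.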
In an embodiment of the present invention, the updating the BP neural network parameters based on the training set includes: extracting samples in the training set; and calculating, based on the samples, the output value of the target neural network as:
$$y_i = r + \gamma \max_{a'} Q(s', a'; \theta_i^-)$$
wherein r is an instantaneous feedback value for executing the loading action, a is a loading action set, a' is a future loading action set, and s is a current state set, the state set comprising: the size of the enclosed space of the carriage, the optional placement positions, and the identification information of the goods to be placed; s' is the updated state set after the loading action; $\theta_i^-$ is the network parameters of the target neural network; $\gamma$ is a preset second proportionality coefficient; and $Q(s', a'; \theta_i^-)$ is used for evaluating the loading rate indicators of different a in different s. The mean square error between the output value of the target neural network and the output value of the BP neural network is calculated as:
$$L_i(\theta_i) = \mathbb{E}_{s,a,r,s'}\left[\left(y_i - Q(s, a; \theta_i)\right)^2\right]$$
wherein $L_i(\theta_i)$ is the mean square error between the output value of the target neural network and the output value of the BP neural network. The parameters of the BP neural network are then updated based on the principle of minimum mean square error.
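The two quantities described above, the target network output and the mean square error, can be computed as in the sketch below. It assumes the usual Q-learning form of the target (instantaneous feedback plus the second proportionality coefficient times the best next value); the `done` flag for a finished iteration and the function names are illustrative additions.

```python
def target_value(reward, next_q_values, gamma=0.9, done=False):
    """Target output: feedback plus gamma times the best target-network value.

    next_q_values holds the target network's scores for the future actions.
    """
    if done or not next_q_values:
        # No future action to evaluate: the target is the feedback alone.
        return reward
    return reward + gamma * max(next_q_values)


def mean_squared_error(targets, predictions):
    """Mean of squared differences between targets and BP network outputs."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


if __name__ == "__main__":
    y = target_value(1.0, [0.2, 0.5], gamma=0.9)
    print(y, mean_squared_error([y], [1.0]))
```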
In a specific implementation, the N most recently executed samples in the training set, that is, the N most recently executed loading actions, may be extracted, where N is a positive integer; other samples may also be extracted, which is not limited in the embodiment of the present invention.
In a specific implementation, updating the parameters of the BP neural network based on the principle of minimum mean square error further includes:
calculating the partial derivative function of each neural network parameter by using a Stochastic Gradient Descent (SGD) algorithm as follows:
$$\nabla_{\theta_i} L_i(\theta_i) = -2\,\mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i)\right)\nabla_{\theta_i} Q(s, a; \theta_i)\right]$$
wherein $L_i(\theta_i)$ is the mean square error, $\nabla_{\theta_i}$ denotes the partial derivative with respect to $\theta_i$, and $\nabla_{\theta_i} Q(s, a; \theta_i)$ is the partial derivative function of the neural network parameters;
and then calculating and updating the parameters of the BP neural network based on the partial derivative function.
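To make the gradient step concrete, here is a minimal sketch that replaces the BP neural network with a linear function of hand-picked features, so that the partial derivative of Q with respect to each parameter is simply the corresponding feature. The linear model, the feature vector, and the learning rate are assumptions chosen for brevity, not the network described in the source.

```python
def q_value(theta, features):
    """Linear stand-in for the BP network: Q(s, a; theta) = theta . phi(s, a)."""
    return sum(t * f for t, f in zip(theta, features))

def sgd_step(theta, features, target, learning_rate):
    """One stochastic gradient step on the squared error (target - Q)^2."""
    error = target - q_value(theta, features)
    # d/d theta_k of (target - Q)^2 is -2 * error * phi_k, with the target held fixed.
    gradient = [-2.0 * error * f for f in features]
    return [t - learning_rate * g for t, g in zip(theta, gradient)]

theta = [0.0, 0.0]
features = [1.0, 2.0]   # hypothetical encoding of one (state, action) pair
target = 1.0            # hypothetical target-network output
before = (target - q_value(theta, features)) ** 2
theta = sgd_step(theta, features, target, learning_rate=0.05)
after = (target - q_value(theta, features)) ** 2
```

With a small enough learning rate the squared error for the sampled pair shrinks after the step, which is the behavior the minimum mean square error principle relies on.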
In an embodiment of the present invention, in order to improve the effectiveness of the target neural network, the method for loading the vehicle-mounted cargo may further include: updating the parameters of the target neural network based on the parameters of the BP neural network.
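One common way to update the target neural network from the BP neural network is a periodic hard copy of the parameters. The sketch below assumes that scheme; the source only states that the target parameters are updated based on the BP parameters, so the `period` argument and the dictionary layout of the parameters are hypothetical.

```python
import copy

def sync_target(step, bp_params, target_params, period):
    """Copy BP-network parameters into the target network every `period` steps."""
    if period <= 0:
        raise ValueError("period must be positive")
    if step % period == 0:
        # Deep copy so later training updates do not leak into the target.
        return copy.deepcopy(bp_params)
    return target_params

bp_params = {"w": [0.1, 0.2]}
target_params = {"w": [0.0, 0.0]}
for step in range(1, 7):
    bp_params["w"][0] += 0.1   # stand-in for a training update
    target_params = sync_target(step, bp_params, target_params, period=3)
```

Between copies the target network stays frozen, which keeps the regression target from moving at every training step.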
In a specific implementation, updating the neural network with the Q-learning method is essentially an optimization method that approximates dynamic programming.
In an embodiment of the present invention, each time a loading action is selected for evaluation, it is determined whether the iteration process has finished. When the iteration process has finished, the loading action selected this time is taken as the final value; when the iteration process has not finished, the next state set is input into the target neural network, the maximum feedback value Q is obtained using the target neural network parameters, and the feedback value of the current action is then calculated according to the Bellman equation.
By applying the above scheme, the selection strategy of the loading action is determined, based on the random number and the preset probability parameter, to be either a random selection strategy or a neural-network-based selection strategy; the corresponding loading action is then selected and the corresponding goods are placed according to that strategy. Because different loading actions can be selected based on the neural network, they can be evaluated in a probabilistic, statistical manner, the calculation is fast, and large-scale logistics loading problems can be solved.
To enable those skilled in the art to better understand and implement the present invention, the embodiment of the present invention provides a method for selecting a loading action based on a neural network, which may include the following steps:
Step S201, initializing variables.
In a specific implementation, the variable initialization includes: defining the feedback value of the loading action, and setting the state set and the loading action set.
Step S202, calculating a feedback value of the BP neural network, and selecting the loading action with the maximum feedback value to place the goods.
In a specific implementation, the output value of the BP neural network is defined as the feedback value $Q(s, a; \theta_i)$ of the loading action, wherein a is the loading action set, s is the current state set, the state set includes: the size of the enclosed space of the carriage, the optional placement positions, and the identification information of the goods to be placed, and $\theta_i$ is the network parameter of the BP neural network.
Step S203, storing the loading action into an experience set.
Step S204, selecting N loading actions from the experience set, and calculating a feedback value of the target neural network.
Step S205, training and updating the parameters of the BP neural network based on the feedback value of the target neural network.
Step S206, updating the parameters of the target neural network based on the parameters of the BP neural network.
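Steps S201 to S206 can be sketched end to end on a toy problem. The example below is only an illustration under strong simplifications: the carriage is a one-dimensional space of length `CAPACITY`, the goods are the hypothetical lengths in `ITEMS`, and a lookup table stands in for the BP neural network (a table is the simplest function approximator, not the network of the source). The random exploration inside the loop and all numeric settings are likewise assumptions.

```python
import random
from collections import deque

CAPACITY = 10
ITEMS = (6, 5, 4, 3)  # hypothetical 1-D lengths of the goods

def legal_actions(state):
    free, remaining = state
    return [i for i in remaining if ITEMS[i] <= free]

def step(state, action):
    """Place one item; feedback is the fraction of the space it fills."""
    free, remaining = state
    next_state = (free - ITEMS[action], tuple(i for i in remaining if i != action))
    reward = ITEMS[action] / CAPACITY
    done = not legal_actions(next_state)
    return next_state, reward, done

def train(episodes=300, gamma=1.0, lr=0.5, batch=8, sync_every=20, seed=0):
    rng = random.Random(seed)
    q, q_target = {}, {}                  # S201: initialize value tables
    experience = deque(maxlen=500)
    steps = 0
    for ep in range(episodes):
        eps = max(0.05, 1.0 - ep / (0.8 * episodes))
        state = (CAPACITY, tuple(range(len(ITEMS))))
        done = False
        while not done:
            actions = legal_actions(state)
            if rng.random() < eps:
                action = rng.choice(actions)
            else:                         # S202: action with the largest feedback value
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            next_state, reward, done = step(state, action)
            experience.append((state, action, reward, next_state, done))  # S203
            sample = rng.sample(list(experience), min(batch, len(experience)))
            for s, a, r, s2, d in sample:  # S204: target-table feedback values
                future = 0.0 if d else max(
                    q_target.get((s2, b), 0.0) for b in legal_actions(s2))
                y = r + gamma * future
                old = q.get((s, a), 0.0)
                q[(s, a)] = old + lr * (y - old)   # S205: move toward the target
            steps += 1
            if steps % sync_every == 0:
                q_target = dict(q)         # S206: refresh the target table
            state = next_state
    return q

def greedy_fill(q):
    """Total length placed when always taking the highest-valued action."""
    state = (CAPACITY, tuple(range(len(ITEMS))))
    filled = 0
    while legal_actions(state):
        action = max(legal_actions(state), key=lambda a: q.get((state, a), 0.0))
        filled += ITEMS[action]
        state, _, _ = step(state, action)
    return filled

if __name__ == "__main__":
    print(greedy_fill(train()))
```

The structure mirrors the steps: act, store, sample, compute targets from a lagging copy, update, and periodically refresh that copy. A real implementation would replace the table with a trained network and the one-dimensional space with the three-dimensional carriage.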
To enable those skilled in the art to better understand and implement the present invention, an embodiment of the present invention further provides a loading device for vehicle-mounted cargo capable of implementing the above loading method, as shown in fig. 3.
Referring to fig. 3, the loading device 30 for vehicle cargo may include: a generating unit 31, a determining unit 32 and a placing unit 33, wherein:
the generating unit 31 is adapted to generate a random number.
The determining unit 32 is adapted to determine a selection policy of the loading action based on the random number and a preset probability parameter, where the selection policy of the loading action includes any one of: a random selection strategy and a neural network based selection strategy.
The placing unit 33 is adapted to select a corresponding loading action and place the corresponding goods based on the selection policy of the loading action.
In an embodiment of the present invention, the preset probability parameter decreases linearly as the number of executed steps increases.
In an embodiment of the present invention, when the preset probability parameter is not greater than the preset parameter threshold, the preset probability parameter is set to be a preset fixed value.
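The linear decrease with a fixed floor can be written as a small schedule function. The initial value, the decrement per step, the threshold, and the fixed value below are hypothetical defaults; the source does not give concrete numbers.

```python
def probability_parameter(step, initial=1.0, decay_per_step=0.001,
                          threshold=0.1, fixed_value=0.1):
    """Preset probability parameter after `step` executed steps."""
    if step < 0:
        raise ValueError("step must be non-negative")
    value = initial - decay_per_step * step   # linear decrease
    if value <= threshold:
        # At or below the threshold, hold the parameter at the fixed value.
        return fixed_value
    return value
```

A large value early on favors random selection and therefore exploration, while the fixed floor keeps a small amount of random selection for the rest of the run.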
In a specific implementation, the determining unit 32 includes: a first determining subunit 321 and a second determining subunit 322, wherein:
the first determining subunit 321 is adapted to determine that the selection policy of the loading action is a random selection policy when the random number is smaller than the probability parameter.
The second determining subunit 322 is adapted to determine that the selection policy of the loading action is a neural network-based selection policy when the random number is not less than the probability parameter.
In a specific implementation, the loading action includes: identification information of the goods to be placed, position information of the goods to be placed, and orientation information of the goods to be placed.
In a specific implementation, the loading action satisfies at least one of the following constraints: the goods to be placed are placed tightly against at least one previously placed item, and the goods to be placed do not overlap previously placed goods in the horizontal direction.
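A loading action and the two constraints can be sketched as follows. The sketch reads "horizontal direction" as the floor plan of the carriage, uses integer coordinates, models orientation as a single 90-degree rotation flag, and checks both constraints together; all of these, and the names `LoadingAction`, `overlaps`, `touches`, and `is_feasible`, are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoadingAction:
    item_id: str           # identification of the goods to be placed
    x: int                 # position of the footprint's lower-left corner
    y: int
    length: int
    width: int
    rotated: bool = False  # orientation: swap length and width if True

    def footprint(self):
        l, w = (self.width, self.length) if self.rotated else (self.length, self.width)
        return self.x, self.y, self.x + l, self.y + w

def _shared(a0, a1, b0, b1):
    """Length shared by intervals [a0, a1] and [b0, b1]; negative means a gap."""
    return min(a1, b1) - max(a0, b0)

def overlaps(a, b):
    ax0, ay0, ax1, ay1 = a.footprint()
    bx0, by0, bx1, by1 = b.footprint()
    return _shared(ax0, ax1, bx0, bx1) > 0 and _shared(ay0, ay1, by0, by1) > 0

def touches(a, b):
    """True when the footprints share part of an edge without overlapping."""
    ax0, ay0, ax1, ay1 = a.footprint()
    bx0, by0, bx1, by1 = b.footprint()
    dx = _shared(ax0, ax1, bx0, bx1)
    dy = _shared(ay0, ay1, by0, by1)
    return (dx == 0 and dy > 0) or (dy == 0 and dx > 0)

def is_feasible(candidate, placed):
    if any(overlaps(candidate, p) for p in placed):
        return False
    # The first item has nothing to rest against, so it is always accepted.
    return not placed or any(touches(candidate, p) for p in placed)
```

Filtering candidate actions with such a check before evaluating them keeps the action set small, since only placements that are physically plausible are scored.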
In a specific implementation, when the selection policy of the loading action is a neural network-based selection policy, the placing unit may include: a first calculation subunit (not shown in fig. 3), a selection subunit (not shown in fig. 3), and a placement subunit (not shown in fig. 3), wherein:
the first calculating subunit is suitable for calculating BP neural network output values corresponding to different loading actions based on the BP neural network.
And the selecting subunit is suitable for selecting the loading action with the maximum corresponding BP neural network output value as the corresponding loading action.
The placing subunit is adapted to place the corresponding cargo based on the corresponding loading action.
In a specific implementation, the output value of the BP neural network is $Q(s, a; \theta_i)$, which is used for evaluating the loading rate index of a loading action set in a state set; wherein a is the loading action set, s is the current state set, the state set comprises: the size of the enclosed space of the carriage, the optional placement positions, and the identification information of the box to be placed, and $\theta_i$ is the network parameter of the BP neural network.
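The interplay of the determining unit and the placing unit can be sketched as a single selection function: a random number below the probability parameter yields a random loading action, otherwise the action with the largest network output is taken. The callable `q_function` stands in for the trained BP neural network, and the function name and return format are hypothetical.

```python
import random

def select_loading_action(q_function, state, actions, probability_parameter, rng=None):
    """Return (strategy, action) for one placement decision."""
    rng = rng or random.Random()
    if not actions:
        raise ValueError("no feasible loading action")
    if rng.random() < probability_parameter:
        # Random selection strategy.
        return "random", rng.choice(actions)
    # Neural-network-based strategy: take the action with the largest output value.
    best = max(actions, key=lambda a: q_function(state, a))
    return "neural_network", best
```

Returning the strategy alongside the action is a convenience for logging how often each branch is taken; it is not required by the method.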
In a specific implementation, after the placing unit selects a corresponding loading action and places a corresponding cargo, the loading device 30 for vehicle-mounted cargo may further include: an execution unit (not shown in fig. 3) and a training unit (not shown in fig. 3), wherein:
the execution unit is suitable for executing the loading action and storing the loading action into a training set.
The training unit is suitable for updating the BP neural network parameters based on the training set.
In a specific implementation, the training unit may include: an extraction subunit (not shown in fig. 3), a second calculation subunit (not shown in fig. 3), a third calculation subunit (not shown in fig. 3), and an update subunit (not shown in fig. 3), wherein:
the extraction subunit is adapted to extract the samples in the training set.
The second calculating subunit is adapted to calculate, based on the samples, an output value of the target neural network as:
$$y_i = r + \gamma \max_{a'} Q(s', a'; \theta_i^-)$$
wherein r is the instantaneous feedback value for performing the loading action, a is the loading action set, a' is the future loading action set, s is the current state set, the state set comprising: the size of the enclosed space of the carriage, the optional placement positions, and the identification information of the boxes to be placed, s' is the state set as updated after the loading action, $\gamma$ is a preset second proportionality coefficient, $\theta_i^-$ is the network parameter of the target neural network, and $Q(s', a'; \theta_i^-)$ is used for evaluating the loading rate index of different a in different s.
The third computing subunit is adapted to compute the mean square error between the output value of the target neural network and the output value of the BP neural network as:
$$L_i(\theta_i) = \mathbb{E}\left[\left(y_i - Q(s, a; \theta_i)\right)^2\right]$$
wherein $L_i(\theta_i)$ is the mean square error between the output value of the target neural network and the output value of the BP neural network, and $\theta_i$ is the network parameter of the BP neural network.
And the updating subunit is suitable for updating the parameters of the BP neural network based on the principle of minimum mean square error.
In a specific implementation, the update subunit includes: a calculation module and an update module, wherein:
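the calculation module is adapted to calculate the partial derivative function of each neural network parameter by using a stochastic gradient descent algorithm as follows:
$$\nabla_{\theta_i} L_i(\theta_i) = -2\,\mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i)\right)\nabla_{\theta_i} Q(s, a; \theta_i)\right]$$
wherein $L_i(\theta_i)$ is the mean square error, $\nabla_{\theta_i}$ denotes the partial derivative with respect to $\theta_i$, and $\nabla_{\theta_i} Q(s, a; \theta_i)$ is the partial derivative function of the neural network parameters;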
and the updating module is suitable for calculating and updating the parameters of the BP neural network based on the partial derivative function.
In a specific implementation, the loading device 30 for vehicle-mounted cargo further includes: an updating unit (not shown in fig. 3) adapted to update the parameters of the target neural network based on the parameters of the BP neural network.
In a specific implementation, the working process and the principle of the loading device 30 for vehicle-mounted cargo may refer to the description of the method provided in the above embodiment, and are not described herein again.
An embodiment of the present invention provides a computer-readable storage medium, which is a non-volatile storage medium or a non-transitory storage medium, and on which a computer instruction is stored, where the computer instruction executes, when running, any of the steps corresponding to the foregoing methods, and details are not described here again.
The embodiment of the invention provides a loading device for vehicle-mounted cargos, which comprises a memory and a processor, wherein a computer instruction capable of running on the processor is stored in the memory, and when the processor runs the computer instruction, the corresponding steps of any one of the methods are executed, and the description is omitted here.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, and the like.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (18)

1. A method for loading vehicle-mounted cargo, comprising:
generating a random number;
determining a selection strategy of the loading action based on the random number and a preset probability parameter, wherein the selection strategy of the loading action comprises any one of the following strategies: a random selection strategy and a neural network-based selection strategy; the determining a selection strategy of the loading action based on the random number and a preset probability parameter comprises: when the random number is smaller than the preset probability parameter, determining that the selection strategy of the loading action is a random selection strategy; when the random number is not smaller than the preset probability parameter, determining that a selection strategy of the loading action is a selection strategy based on a neural network;
selecting a corresponding loading action and placing corresponding goods based on the selection strategy of the loading action, wherein the loading action comprises the following steps: identification information of goods to be put in, position information of the goods to be put in and orientation information of the goods to be put in;
when the selection policy of the loading action is a selection policy based on a neural network, selecting a corresponding loading action based on the selection policy of the loading action comprises: calculating BP neural network output values corresponding to different loading actions based on the BP neural network; selecting the loading action with the maximum corresponding BP neural network output value as the corresponding loading action; the output value of the BP neural network is $Q(s, a; \theta_i)$, which is used for evaluating a loading rate index of a loading action set in a state set, wherein a is the loading action set, s is the current state set, the state set comprises: the size of the enclosed space of the carriage, the optional placement positions and the identification information of the goods to be placed, and $\theta_i$ is the network parameter of the BP neural network.
2. The loading method of vehicle-mounted cargo according to claim 1, wherein the preset probability parameter decreases linearly as the number of executed steps increases.
3. The loading method of vehicle-mounted cargo according to claim 2, wherein when the preset probability parameter is not greater than a preset parameter threshold, the preset probability parameter is set to a preset fixed value.
4. The loading method of vehicle-mounted cargo according to claim 1, wherein the loading action satisfies at least one of the following constraints: the goods to be placed are placed tightly against at least one previously placed item, and the goods to be placed do not overlap previously placed goods in the horizontal direction.
5. The method for loading vehicle-mounted cargos according to claim 1, further comprising, after selecting a corresponding loading action and placing the corresponding cargo:
executing the loading action, and storing the loading action into a training set;
updating network parameters of the BP neural network based on the training set.
6. The loading method of the vehicle-mounted cargo according to claim 5, wherein the updating the network parameters of the BP neural network based on the training set comprises:
extracting samples in the training set;
based on the samples, calculating an output value of the target neural network as:
$$y_i = r + \gamma \max_{a'} Q(s', a'; \theta_i^-)$$
wherein r is an instantaneous feedback value for performing the loading action, a is a loading action set, a' is a future loading action set, s is a current state set, the state set comprising: the size of the enclosed space of the carriage, the optional placement positions and the identification information of the goods to be placed, s' is the state set as updated after the loading action, $\gamma$ is a preset second proportionality coefficient, $\theta_i^-$ is the network parameter of the target neural network, and $Q(s', a'; \theta_i^-)$ is used for evaluating the loading rate index of different a in different s;
calculating the mean square error between the output value of the target neural network and the output value of the BP neural network as:
$$L_i(\theta_i) = \mathbb{E}\left[\left(y_i - Q(s, a; \theta_i)\right)^2\right]$$
wherein $L_i(\theta_i)$ is the mean square error between the output value of the target neural network and the output value of the BP neural network;
and updating the network parameters of the BP neural network based on the principle of minimum mean square error.
7. The loading method of the vehicle-mounted cargo according to claim 6, wherein the updating the network parameters of the BP neural network based on the principle of minimum mean square error further comprises:
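calculating the partial derivative function of each neural network parameter by using a stochastic gradient descent algorithm as follows:
$$\nabla_{\theta_i} L_i(\theta_i) = -2\,\mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i)\right)\nabla_{\theta_i} Q(s, a; \theta_i)\right]$$
wherein $L_i(\theta_i)$ is the mean square error, $\nabla_{\theta_i}$ denotes the partial derivative with respect to $\theta_i$, and $\nabla_{\theta_i} Q(s, a; \theta_i)$ is the partial derivative function of the neural network parameters;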
and calculating and updating the network parameters of the BP neural network based on the partial derivative function.
8. The loading method of vehicle-mounted cargo according to claim 6, further comprising:
updating the network parameters of the target neural network based on the network parameters of the BP neural network.
9. A loading device for vehicle-mounted cargos, characterized by comprising:
a generating unit adapted to generate a random number;
a determining unit adapted to determine a selection policy of the loading action based on the random number and a preset probability parameter, the selection policy of the loading action including any one of: a random selection strategy and a neural network-based selection strategy; the determination unit includes: a first determining subunit, adapted to determine, when the random number is smaller than the probability parameter, that the selection policy of the loading action is a random selection policy; a second determining subunit, adapted to determine, when the random number is not less than the probability parameter, that the selection policy of the loading action is a neural network-based selection policy;
a placing unit adapted to select a corresponding loading action and place a corresponding cargo based on a selection policy of the loading action, the loading action including: identification information of goods to be put in, position information of the goods to be put in and orientation information of the goods to be put in;
the placing unit includes: a first calculating subunit adapted to calculate BP neural network output values corresponding to different loading actions based on the BP neural network; a selecting subunit adapted to select the loading action with the maximum corresponding BP neural network output value as the corresponding loading action; a placing subunit adapted to place the respective goods based on the corresponding loading actions; the output value of the BP neural network is $Q(s, a; \theta_i)$, which is used for evaluating a loading rate index of a loading action set in a state set; wherein a is the loading action set, s is the current state set, the state set comprises: the size of the enclosed space of the carriage, the optional placement positions and the identification information of the box to be placed, and $\theta_i$ is the network parameter of the BP neural network.
10. The loading device for vehicle-mounted cargo according to claim 9, wherein the preset probability parameter decreases linearly as the number of executed steps increases.
11. The loading device for vehicle-mounted cargo according to claim 10, wherein when the preset probability parameter is not greater than a preset parameter threshold, the preset probability parameter is set to a preset fixed value.
12. The loading device for vehicle-mounted cargo according to claim 9, wherein the loading action satisfies at least one of the following constraints: the goods to be placed are placed tightly against at least one previously placed item, and the goods to be placed do not overlap previously placed goods in the horizontal direction.
13. The loading device for vehicle-mounted cargos according to claim 9, wherein after the placing unit selects the corresponding loading action and places the corresponding cargo, the loading device further comprises:
the execution unit is suitable for executing the loading action and storing the loading action into a training set;
and the training unit is suitable for updating the network parameters of the BP neural network based on the training set.
14. The loading device for vehicle cargo according to claim 13, wherein the training unit comprises:
an extraction subunit adapted to extract samples in the training set;
a second calculation subunit adapted to calculate, based on the samples, an output value of the target neural network as:
$$y_i = r + \gamma \max_{a'} Q(s', a'; \theta_i^-)$$
wherein r is an instantaneous feedback value for performing the loading action, a is a loading action set, a' is a future loading action set, s is a current state set, the state set comprising: the size of the enclosed space of the carriage, the optional placement positions and the identification information of the boxes to be placed, s' is the state set as updated after the loading action, $\gamma$ is a preset second proportionality coefficient, $\theta_i^-$ is the network parameter of the target neural network, and $Q(s', a'; \theta_i^-)$ is used for evaluating the loading rate index of different a in different s;
a third computing subunit adapted to compute the mean square error between the output value of the target neural network and the output value of the BP neural network as:
$$L_i(\theta_i) = \mathbb{E}\left[\left(y_i - Q(s, a; \theta_i)\right)^2\right]$$
wherein $L_i(\theta_i)$ is the mean square error between the output value of the target neural network and the output value of the BP neural network;
and the updating subunit is suitable for updating the network parameters of the BP neural network based on the principle of minimum mean square error.
15. The loading device for vehicle cargo according to claim 14, wherein the updating subunit comprises:
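a calculation module adapted to calculate the partial derivative function of each neural network parameter by using a stochastic gradient descent algorithm as follows:
$$\nabla_{\theta_i} L_i(\theta_i) = -2\,\mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i)\right)\nabla_{\theta_i} Q(s, a; \theta_i)\right]$$
wherein $L_i(\theta_i)$ is the mean square error, $\nabla_{\theta_i}$ denotes the partial derivative with respect to $\theta_i$, and $\nabla_{\theta_i} Q(s, a; \theta_i)$ is the partial derivative function of the neural network parameters;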
and the updating module is suitable for calculating and updating the network parameters of the BP neural network based on the partial derivative function.
16. The loading device for vehicle cargo according to claim 14, further comprising:
and the updating unit is suitable for updating the network parameters of the target neural network based on the network parameters of the BP neural network.
17. A computer-readable storage medium, being a non-volatile storage medium or a non-transitory storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the method according to any of the claims 1 to 8.
18. A loading device for vehicle cargo, comprising a memory and a processor, said memory having stored thereon a computer program being executable on said processor, characterized in that said processor, when executing said computer program, executes the steps of the method according to any of the claims 1 to 8.
CN201810357903.3A 2018-04-19 2018-04-19 Loading method and device of vehicle-mounted cargo and computer readable medium Active CN108520327B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810357903.3A CN108520327B (en) 2018-04-19 2018-04-19 Loading method and device of vehicle-mounted cargo and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810357903.3A CN108520327B (en) 2018-04-19 2018-04-19 Loading method and device of vehicle-mounted cargo and computer readable medium

Publications (2)

Publication Number Publication Date
CN108520327A CN108520327A (en) 2018-09-11
CN108520327B true CN108520327B (en) 2021-03-23

Family

ID=63429827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810357903.3A Active CN108520327B (en) 2018-04-19 2018-04-19 Loading method and device of vehicle-mounted cargo and computer readable medium

Country Status (1)

Country Link
CN (1) CN108520327B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant