CN115329683A

CN115329683A - Aviation luggage online loading planning method, device, equipment and medium

Info

Publication number: CN115329683A
Application number: CN202211264060.5A
Authority: CN
Inventors: 张攀; 程九廪; 田金涛; 张威
Original assignee: Civil Aviation University of China
Current assignee: Civil Aviation University of China
Priority date: 2022-10-17
Filing date: 2022-10-17
Publication date: 2022-11-11
Anticipated expiration: 2042-10-17
Also published as: CN115329683B

Abstract

The invention discloses an aviation luggage online loading planning method, device, equipment and medium. The method comprises the steps of obtaining luggage size information of current luggage to be loaded and stacking type information of a stacking area; inputting the luggage size information and the stack type information into a hierarchical tree search model matched with the stacking area, and acquiring target node characteristics corresponding to each alternative luggage stacking position; inputting the luggage size information, the stack type information and the characteristics of each target node into a deep reinforcement learning model, and acquiring a target luggage stacking position matched with the luggage to be loaded currently; and controlling the mechanical arm to stack the current luggage to be loaded to a target luggage stacking position in the stacking area. The technical scheme of the embodiment of the invention realizes accurate, quick and automatic online loading planning of the aviation luggage, thereby ensuring that the loading of each aviation luggage is compact and stable, reducing the waste of space and effectively improving the economy and the operating efficiency of an airport.

Description

Aviation luggage online loading planning method, device, equipment and medium

Technical Field

The invention relates to the technical field of intelligent aviation logistics, in particular to an aviation luggage online loading planning method, device, equipment and medium.

Background

With the development of China's civil aviation sector, the number of civil airports and their annual passenger throughput are increasing rapidly, and airports urgently need intelligent infrastructure to improve overall operating efficiency and safeguard the passenger travel experience. At present, most domestic airports still load aviation luggage manually, which is inefficient, costly and wasteful of resources, so designing an intelligent loading algorithm is a key task.

The aviation luggage loading problem is an online three-dimensional bin packing problem: given a number of cuboid articles of known length, width and height, all articles are to be loaded into a number of containers so that container utilization is maximized while specific packing constraints (stability, volume, weight and the like) are satisfied. The most common baggage packing algorithms at present are mathematical programming algorithms and heuristic algorithms.

A mathematical programming algorithm treats the packing problem as a constrained optimization problem and plans the placement of articles using the branch-and-bound method and a 0-1 integer programming model. Although such an algorithm can find the optimal solution exactly, its computational complexity grows exponentially with the number of articles and containers, causing a "combinatorial explosion", so it is difficult to apply to large-scale packing problems. A heuristic algorithm is built on manual loading experience and incorporates the relevant stacking constraint rules to obtain a near-optimal solution. Although a heuristic algorithm can obtain a feasible solution and runs considerably faster than a mathematical programming algorithm, its solution quality has no theoretical guarantee and its running time cost is still substantial.

Disclosure of Invention

The embodiment of the invention provides an aviation luggage online loading planning method, device, equipment and medium, which are used for realizing accurate, rapid and automatic aviation luggage online loading planning.

According to an aspect of the embodiment of the invention, an online loading planning method for aviation luggage is provided, which comprises the following steps:

acquiring the size information of the luggage to be loaded at present and the stacking type information of a stacking area;

inputting the luggage size information and the stack type information into a hierarchical tree search model matched with the stacking area, and acquiring target node characteristics respectively corresponding to each alternative luggage stacking position in the stacking area;

inputting the luggage size information, the stacking type information and the target node characteristics corresponding to each alternative luggage stacking position in the stacking area into a deep reinforcement learning model, and acquiring a target luggage stacking position matched with the current luggage to be loaded;

and, after controlling the mechanical arm to stack the current luggage to be loaded to the target luggage stacking position in the stacking area, updating the hierarchical tree search model according to the target luggage stacking position.

According to another aspect of the invention, an on-line loading planning device for aviation luggage is provided, which comprises:

the real-time information acquisition module is used for acquiring the size information of the luggage to be loaded at present and the stack type information of the stacking area;

the target node characteristic acquisition module is used for inputting the luggage size information and the stacking type information into a hierarchical tree search model matched with the stacking area and acquiring target node characteristics corresponding to each alternative luggage stacking position in the stacking area;

the target luggage stacking position acquisition module is used for inputting luggage size information, stack type information and target node characteristics corresponding to each alternative luggage stacking position in the stacking area into the deep reinforcement learning model, and acquiring a target luggage stacking position matched with the current luggage to be loaded;

and the stacking control module is used for controlling the mechanical arm to stack the current luggage to be loaded to the target luggage stacking position in the stacking area and then updating the hierarchical tree search model according to the target luggage stacking position.

According to another aspect of the present invention, there is provided an electronic apparatus including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of online loading planning of airline baggage according to any of the embodiments of the present invention.

According to another aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions for causing a processor to implement the method for planning the online loading of airline baggage according to any one of the embodiments of the present invention when executed.

According to the technical scheme of the embodiment of the invention, target node characteristics respectively corresponding to each alternative luggage stacking position in the stacking area are obtained from a hierarchical tree search model matched with the stacking area, according to the luggage size information of the current luggage to be loaded and the stack type information of the stacking area; a target luggage stacking position matched with the current luggage to be loaded is then obtained by using a deep reinforcement learning model; and finally a mechanical arm is controlled to stack the current luggage to be loaded to the target luggage stacking position in the stacking area. Combining hierarchical tree search with a deep reinforcement learning model realizes accurate, rapid and automatic online loading planning of aviation luggage, so that the luggage is loaded compactly and stably, wasted space is reduced, and the economy and operating efficiency of the airport are effectively improved.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

Fig. 1a is a flowchart of an aviation luggage online loading planning method according to a first embodiment of the present invention;

Fig. 1b is a flowchart of an aviation luggage online loading method to which an embodiment of the present invention is applicable;

Fig. 2a is a flowchart of an aviation luggage online loading planning method according to a second embodiment of the present invention;

Fig. 2b is a network structure diagram of a hierarchical tree search model to which an embodiment of the present invention is applicable;

Fig. 3a is a flowchart of an aviation luggage online loading planning method according to a third embodiment of the present invention;

Fig. 3b is a network structure diagram of a deep reinforcement learning model to which an embodiment of the present invention is applicable;

Fig. 3c is an overall structure diagram of an aviation luggage online loading planning method to which an embodiment of the present invention is applicable;

Fig. 3d is a filling-rate curve, recorded during model training, of the aviation luggage online loading planning method to which an embodiment of the present invention is applicable;

Fig. 4 is a structural diagram of an aviation luggage online loading planning device according to a fourth embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an electronic device implementing the aviation luggage online loading planning method according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example one

Fig. 1a is a flowchart of an aviation luggage online loading planning method according to a first embodiment of the present invention. This embodiment is applicable to controlling a mechanical arm to load aviation luggage online onto a suitable luggage stacking position in a stacking area. The method may be performed by an aviation luggage online loading planning device, which may be implemented in hardware and/or software and is generally configured in a terminal or server having a data processing function. As shown in fig. 1a, the method comprises:

S110, acquiring the luggage size information of the current luggage to be loaded and the stack type information of the stacking area.

In this embodiment, a baggage image of the current baggage to be loaded may be acquired by a baggage camera, and baggage size information of the current baggage to be loaded may be acquired by an image recognition technology. Specifically, the baggage size information may include length information, width information, and height information of baggage to be currently loaded.

Similarly, an area image of the stacking area, taken before the current luggage to be loaded is loaded, may be acquired in real time by a stack type camera, and the stack type information of the stacking area may be obtained by an image recognition technology. Specifically, the stack type information may include the stacking position of each stacked baggage in the stacking area, and the baggage size information of each stacked baggage.

It will be appreciated that, before the current baggage to be loaded is loaded, the stacking area may contain zero, one or more stacked baggage items.

S120, inputting the luggage size information and the stack type information into a hierarchical tree search model matched with the stacking area, and acquiring target node characteristics corresponding to each alternative luggage stacking position in the stacking area.

In this embodiment, the hierarchical tree search model is matched with the stacking area and is configured to store the stacking position of each stacked baggage in the stacking area; it further stores each alternative baggage stacking position calculated from the stacking positions of the stacked baggage in the stacking area.

The alternative luggage stacking position can be understood as a stacking position in the stacking area which is currently in an empty state.

In this embodiment, the stack type information acquired in real time is used to further correct the hierarchical tree search model, which ensures the accuracy of the information stored in it. The luggage size information of the current luggage to be loaded is then used in the hierarchical tree search model to quantify the target node characteristics corresponding to each alternative luggage stacking position, so that the subsequent decision step can select the target luggage stacking position most suitable for the current luggage to be loaded.

Specifically, the target node feature may be understood as a node feature formed by each alternative baggage stacking position with respect to the baggage size information of the baggage to be loaded.

S130, inputting the luggage size information, the stack type information and the target node characteristics corresponding to each alternative luggage stacking position in the stacking area into a deep reinforcement learning model, and acquiring the target luggage stacking position matched with the current luggage to be loaded.

In this embodiment, after the target node features corresponding to each alternative baggage stacking position in the stacking area are obtained, the optimal alternative baggage stacking position may be selected from all the alternative baggage stacking positions as the target baggage stacking position matched with the current baggage to be loaded, based on a pre-trained deep reinforcement learning model and in combination with the current reward function.

The reward function may be updated and determined in real time from the luggage size information of the current luggage to be loaded and the stack type information.

S140, after the mechanical arm is controlled to stack the current luggage to be loaded to the target luggage stacking position in the stacking area, updating the hierarchical tree search model according to the target luggage stacking position.

After the target luggage stacking position is determined from all the alternative luggage stacking positions, the mechanical arm can be controlled to stack the current luggage to be loaded to the target luggage stacking position in the stacking area, so that the real-time online loading of the luggage is realized.

Meanwhile, after the current luggage to be loaded is stacked, the current luggage to be loaded becomes loaded luggage stacked to the target luggage stacking position. Furthermore, the hierarchical tree search model corresponding to the stacking area needs to be updated again in combination with the target baggage stacking position, so that the new current baggage to be loaded can continue to use the hierarchical tree search model.
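Steps S110 to S140 form a per-item decision loop. The following is a minimal Python sketch of one pass through that loop. All names (`plan_one_item`, `candidate_features`, `choose_index`, `place`) and the position encoding are hypothetical, and the feature extractor and the policy are simple stand-in callables rather than the trained hierarchical tree search and deep reinforcement learning models.

```python
from typing import Callable, List, Sequence, Tuple

Size = Tuple[float, float, float]       # (length, width, height)
Position = Tuple[float, float, float]   # hypothetical encoding: placement origin (x, y, z)


def plan_one_item(
    size: Size,
    candidates: Sequence[Position],
    candidate_features: Callable[[Size, Sequence[Position]], List[float]],
    choose_index: Callable[[List[float]], int],
    place: Callable[[Size, Position], None],
) -> Position:
    """One pass of S120 to S140 for an item whose size was measured in S110."""
    if not candidates:
        raise ValueError("no alternative stacking position left")
    scores = candidate_features(size, candidates)  # S120: one feature per candidate
    idx = choose_index(scores)                     # S130: policy picks a candidate index
    target = candidates[idx]
    place(size, target)                            # S140: arm stacks; caller then updates the tree
    return target


# Stand-ins for the two models (assumptions for illustration only).
def lowest_first(size, candidates):
    return [-z for (_, _, z) in candidates]        # prefer lower positions


def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)


placed = []
target = plan_one_item(
    (0.7, 0.4, 0.3),
    [(0.0, 0.0, 0.3), (0.7, 0.0, 0.0)],
    lowest_first,
    argmax,
    lambda s, p: placed.append((s, p)),
)
```

Passing the models in as callables keeps the loop itself independent of how features and decisions are computed.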

According to the technical scheme of the embodiment of the invention, target node characteristics respectively corresponding to each alternative luggage stacking position in the stacking area are obtained from a hierarchical tree search model matched with the stacking area, according to the luggage size information of the current luggage to be loaded and the stack type information of the stacking area; a target luggage stacking position matched with the current luggage to be loaded is then obtained by using a deep reinforcement learning model; and finally a mechanical arm is controlled to stack the current luggage to be loaded to the target luggage stacking position in the stacking area. Combining hierarchical tree search with a deep reinforcement learning model realizes accurate, quick and automatic online loading planning of aviation luggage, so that the luggage is loaded compactly and stably, space waste is reduced, and the economy and operating efficiency of the airport are effectively improved.

Fig. 1b shows a flowchart of an online loading method for aviation luggage, to which an embodiment of the present invention is applied.

As shown in fig. 1b, on the aviation luggage loading and transport line, each item of aviation luggage is moved in turn to the position of a luggage van compartment, and each compartment corresponds to one stacking area. All aviation luggage is conveyed to the stacking area by a sorting conveyor belt and loaded immediately in order of arrival. Only the information of the luggage currently at the stacking area and the stack type information of the luggage already loaded are known; the three-dimensional information and arrival order of subsequent luggage are unknown. The stacking action is executed by a loading mechanical arm to complete the loading of the luggage.

Because aviation luggage is poorly regular in three-dimensional shape, the following simplifying assumptions are made about the luggage and the luggage van compartment in order to reduce the computational complexity of the algorithm: the compartment is a regular cuboid whose bottom surface is flat and uniform and whose bearing capacity is sufficient; each item of aviation luggage is a regular cuboid with uniform mass distribution and is regarded during loading as a rigid body with sufficient bearing capacity that does not deform. Each item of aviation luggage can therefore be characterized by three features: length, width and height.
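Under these assumptions, both a luggage item and a compartment reduce to three numbers. A minimal Python sketch of that representation follows, together with the filling rate (loaded volume divided by compartment volume). The names `Cuboid` and `fill_rate` are hypothetical and the dimensions are illustrative values, not taken from the invention.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Cuboid:
    """A rigid, regular cuboid described only by length, width and height."""
    length: float
    width: float
    height: float

    @property
    def volume(self) -> float:
        return self.length * self.width * self.height


def fill_rate(compartment: Cuboid, loaded: Iterable[Cuboid]) -> float:
    """Fraction of the compartment volume occupied by the loaded items."""
    return sum(item.volume for item in loaded) / compartment.volume


compartment = Cuboid(2.0, 1.0, 1.0)  # illustrative dimensions
bags = [Cuboid(1.0, 0.5, 0.5), Cuboid(1.0, 1.0, 0.5)]
rate = fill_rate(compartment, bags)  # (0.25 + 0.5) / 2.0
```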

In fig. 1b, a baggage camera arranged on the baggage transportation assembly line acquires images in real time to obtain the baggage information (also called baggage size information) of the current baggage to be loaded, and a stack type camera arranged close to the baggage car compartment acquires images in real time to obtain the stack type information before each online loading of the current baggage to be loaded. By inputting the stack type information and the baggage information into the hierarchical tree search model, the target node features respectively corresponding to each alternative baggage stacking position can be obtained and input into a pre-trained deep reinforcement learning model. The deep reinforcement learning model then decides the optimal action position from these target node features, that is, it selects the target baggage stacking position from all the alternative baggage stacking positions.

Further, the mechanical arm may be controlled to perform the loading operation based on the target baggage stacking position, stacking the current baggage to be loaded at that position in the baggage car compartment (the stacking area). When a baggage car compartment cannot accommodate any new aviation luggage, the compartment is considered fully loaded and can be transported to beneath the aircraft for loading onto the aircraft. If new baggage to be loaded remains on the baggage transportation assembly line at that time, a new baggage car compartment can be put in place to continue loading the aviation luggage.

Example two

Fig. 2a is a flowchart of an aviation baggage online loading method according to a second embodiment of the present invention. This embodiment further refines the operation of inputting the baggage size information and the stack type information into the hierarchical tree search model matched with the stacking area and obtaining the target node features respectively corresponding to each alternative baggage stacking position in the stacking area.

Accordingly, as shown in fig. 2a, the method comprises:

S210, acquiring the luggage size information of the current luggage to be loaded and the stack type information of the stacking area.

S220, updating each internal node and each leaf node in the hierarchical tree search model according to the stack type information.

Each internal node is used for describing description information of each stacked luggage in the stacking area, and each leaf node is used for describing description information of each alternative luggage stacking position in the stacking area.

In this embodiment, the hierarchical tree search model stores description information of each palletized baggage in the palletizing region and description information of each candidate baggage stacking position in the palletizing region in the form of an internal node and a leaf node.

The description information of the palletized baggage may include baggage size information (length, width, and height) of the palletized baggage, and a baggage stacking position of the palletized baggage in the palletizing area. The description information of the alternative baggage stacking position may be description information of a set region range in the palletizing region, for example, a rectangular coordinate range represented by coordinates of four corner points with a central point of the palletizing region as a coordinate origin.

The internal nodes of the hierarchical tree search model can be uniquely determined from the stacked luggage in the stacking area, and the leaf nodes of the hierarchical tree search model can be dynamically generated from the internal nodes. One leaf node corresponds to one alternative baggage stacking position. When the stacking area is empty, the hierarchical tree search model may include only one root node as an internal node and several leaf nodes. The baggage loading process may be regarded as an iterative process in which a leaf node is replaced with an internal node and several new leaf nodes are generated; the loading algorithm ends when no internal node meeting the constraint requirements can be generated.
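The iterative "replace a leaf with an internal node, then generate new leaves" process can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `PackingTree` is hypothetical, the root node is left implicit, and new leaves are produced by a simple corner-point rule that stands in for the candidate generation actually used by the model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PackingTree:
    container: Vec3  # (length, width, height) of the stacking area
    # Internal nodes: (position, size) of every stacked item.
    internal: List[Tuple[Vec3, Vec3]] = field(default_factory=list)
    # Leaf nodes: alternative stacking positions; an empty area offers the origin.
    leaves: List[Vec3] = field(default_factory=lambda: [(0.0, 0.0, 0.0)])

    def place(self, leaf_index: int, size: Vec3) -> None:
        """Replace the chosen leaf with an internal node and grow new leaves."""
        x, y, z = self.leaves.pop(leaf_index)
        l, w, h = size
        self.internal.append(((x, y, z), size))
        # ASSUMPTION: corner-point expansion, a simple stand-in rule.
        for cand in ((x + l, y, z), (x, y + w, z), (x, y, z + h)):
            inside = all(c < limit for c, limit in zip(cand, self.container))
            if inside and cand not in self.leaves:
                self.leaves.append(cand)


tree = PackingTree(container=(2.0, 1.0, 1.0))
tree.place(0, (1.0, 1.0, 0.5))
```

In this example the item spans the full width of the container, so only two of the three corner points remain as leaves.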

As described above, in order for the internal nodes and leaf nodes stored in the hierarchical tree search model to accurately describe the stacking situation of the stacked baggage in the stacking area, the stack type information of the current stacking area is acquired together with the size information of the current luggage to be loaded, and the hierarchical tree search model is updated once (which can also be understood as calibrated) based on the stack type information, thereby ensuring the accuracy of the subsequent calculation process.

S230, generating, through the multilayer perceptrons in the hierarchical tree search model, low-dimensional node characteristics respectively corresponding to each alternative luggage stacking position in the stacking area according to the luggage size information and the internal nodes and leaf nodes in the hierarchical tree search model.

In this embodiment, one or more multilayer perceptrons are provided in the hierarchical tree search model for generating the low-dimensional node features corresponding to each alternative baggage stacking position (leaf node).

A typical multilayer perceptron (MLP) has a three-layer structure consisting of an input layer, a hidden layer and an output layer. Adjacent layers of the MLP are fully connected, which allows it to classify data that is not linearly separable.
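As a small illustration of why a hidden layer matters, the Python sketch below runs a three-layer perceptron forward pass on the XOR function, a standard example of data that is not linearly separable. The ReLU activation and the hand-picked weights are assumptions made for this example; the source does not specify the activation or the layer sizes.

```python
from typing import List

Matrix = List[List[float]]


def dense(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def mlp_forward(x, w1, b1, w2, b2):
    """Input layer -> hidden layer (ReLU) -> output layer."""
    return dense(relu(dense(x, w1, b1)), w2, b2)


# Hand-picked weights that realize XOR: out = relu(a + b) - 2 * relu(a + b - 1).
w1 = [[1.0, 1.0], [1.0, 1.0]]
b1 = [0.0, -1.0]
w2 = [[1.0, -2.0]]
b2 = [0.0]
outputs = [mlp_forward([a, b], w1, b1, w2, b2)[0]
           for a, b in ((0, 0), (0, 1), (1, 0), (1, 1))]
```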

In this embodiment, three independent node-type multilayer perceptrons may be used to produce the low-dimensional node features corresponding to each alternative baggage stacking position.

Correspondingly, in an optional implementation manner of this embodiment, generating, through the multilayer perceptrons in the hierarchical tree search model, the low-dimensional node features respectively corresponding to each alternative baggage stacking position in the stacking area according to the baggage size information and the internal nodes and leaf nodes in the hierarchical tree search model may include:

inputting each internal node in the hierarchical tree search model into a first node-type multilayer perceptron to obtain a first class of features; inputting each leaf node in the hierarchical tree search model into a second node-type multilayer perceptron to obtain a second class of features corresponding to each leaf node; inputting the baggage size information into a third node-type multilayer perceptron to obtain a third class of features; and combining the second class of features corresponding to each leaf node with the first class of features and the third class of features to generate the low-dimensional node features respectively corresponding to each alternative baggage stacking position in the stacking area.

Wherein [formula] may denote the first node-type multilayer perceptron and [formula] the description information of each internal node (i.e., each stacked baggage item) before the k-th current baggage to be loaded is loaded; [formula] then denotes the first class of features.

Similarly, [formula] may denote the second node-type multilayer perceptron and [formula] the description information of the i-th leaf node in the hierarchical tree search model before the k-th current baggage to be loaded is loaded; [formula] ([formula]) then denotes the second class of features corresponding to the i-th leaf node.

In addition, [formula] may denote the third node-type multilayer perceptron and [formula] the k-th current baggage to be loaded; [formula] then denotes the third class of features.

Accordingly, [formula] may denote the low-dimensional node feature of the i-th leaf node (i.e., the i-th alternative baggage stacking position), wherein [formula], and n is the total number of leaf nodes (alternative baggage stacking positions) in the hierarchical tree search model.
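The combination of the three classes of features can be sketched in Python as follows. This is a hedged illustration only: the function names are hypothetical, the encoders are identity stand-ins for trained multilayer perceptrons, and both the mean pooling of the internal-node features and the use of concatenation as the combining operation are assumptions, since the text does not state how the features are combined.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[float]], Vector]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def low_dim_features(internal, leaves, item,
                     enc_internal: Encoder, enc_leaf: Encoder, enc_item: Encoder):
    first = mean_pool([enc_internal(b) for b in internal])  # first class (pooling assumed)
    third = enc_item(item)                                  # third class
    # One vector per leaf: second class joined with first and third (concatenation assumed).
    return [first + enc_leaf(leaf) + third for leaf in leaves]


identity = lambda v: list(v)  # stand-in for a trained multilayer perceptron
feats = low_dim_features(
    internal=[(0.0, 0.0, 0.0, 1.0, 1.0, 0.5), (1.0, 0.0, 0.0, 1.0, 1.0, 0.5)],
    leaves=[(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)],
    item=(0.7, 0.4, 0.3),
    enc_internal=identity, enc_leaf=identity, enc_item=identity,
)
```

Each leaf thus receives its own feature vector that also carries information about the stacked items and the current item.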

S240, converting the low-dimensional node characteristics corresponding to each alternative luggage stacking position in the stacking area into high-dimensional node characteristics through a graph attention network in the hierarchical tree search model.

The graph attention network (GAT) is used to extract the high-dimensional node features.

S250, calculating, through a scaled dot-product attention network, embedded node characteristics respectively corresponding to each alternative luggage stacking position according to the leaf node relation weights in the hierarchical tree search model and the high-dimensional node characteristics respectively corresponding to each alternative luggage stacking position in the stacking area.

In this embodiment, after the high-dimensional node features respectively corresponding to each alternative baggage stacking position are obtained, the weight relationships between each alternative baggage stacking position and the other alternative baggage stacking positions are further fused into its high-dimensional node feature, so that the relevance between different leaf nodes is taken into account and the target node feature of each alternative baggage stacking position is described more accurately.

Accordingly, in the scaled dot-product attention network in the hierarchical tree search model, the embedded node feature [formula] corresponding to the p-th alternative baggage stacking position may be calculated by the following formula:

[formula]

wherein [formula], [formula], [formula] and [formula] are weight matrices pre-trained in the hierarchical tree search model, n is the total number of leaf nodes in the hierarchical tree search model, [formula] is the high-dimensional node feature corresponding to the p-th alternative baggage stacking position, [formula] is the high-dimensional node feature corresponding to the j-th alternative baggage stacking position, [formula] is the dimension of the projected features, and [formula] is the transpose operator.
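For orientation, the standard form of scaled dot-product attention that is consistent with the quantities listed above (four weight matrices, n leaf nodes, a projection dimension and a transpose) can be written as follows. The symbol names $W_q$, $W_k$, $W_v$, $W_o$, $h_p$, $h_j$, $d$ and $x_p$ are assumptions chosen for this sketch, and the expression is the textbook formulation rather than a reproduction of the formula of the invention.

```latex
% Assumed notation: h_p, h_j are high-dimensional node features,
% d is the projection dimension, x_p is the embedded node feature.
\alpha_{pj} = \frac{\exp\!\left( (W_q h_p)^{\mathsf{T}} (W_k h_j) / \sqrt{d} \right)}
                   {\sum_{m=1}^{n} \exp\!\left( (W_q h_p)^{\mathsf{T}} (W_k h_m) / \sqrt{d} \right)},
\qquad
x_p = W_o \sum_{j=1}^{n} \alpha_{pj} \, W_v h_j
```

Here $\alpha_{pj}$ plays the role of the relation weight between leaf nodes p and j, and dividing by $\sqrt{d}$ keeps the dot products in a range where the exponentials remain well conditioned.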

S260, normalizing the embedded node characteristics corresponding to each alternative luggage stacking position through a normalization network to obtain the target node characteristics corresponding to each alternative luggage stacking position in the stacking area.

Specifically, the normalization network may be a softmax layer, in which the target node feature [formula] corresponding to the p-th alternative baggage stacking position may be calculated according to the formula

[formula]

wherein [formula] denotes processing [formula] with a fourth node-type multilayer perceptron.
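The softmax normalization itself can be illustrated with a short Python sketch. The input scores below are illustrative numbers standing in for the per-leaf outputs of the fourth multilayer perceptron; subtracting the maximum before exponentiating is a common numerical-stability measure and is an implementation choice of this sketch.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])     # illustrative per-leaf scores
large = softmax([1000.0, 1000.0])    # would overflow without the max shift
```

The outputs are positive and sum to one, so they can be read as a distribution over the alternative baggage stacking positions.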

Specifically, a specific implementation manner of inputting the baggage size information and the stack type information into the hierarchical tree search model matched with the stacking area to obtain the target node characteristics corresponding to the stacking positions of the optional baggage in the stacking area is shown in fig. 2b, and fig. 2b shows a network structure diagram of the hierarchical tree search model to which the embodiment of the present invention is applied.

In Fig. 2b, the hierarchical tree search model specifically includes three multilayer perceptrons, a graph attention network, a scaled dot-product attention network, and a softmax layer. Through the processing of these network structures, the final node feature (i.e., the target node feature) of each leaf node can be output to the deep reinforcement learning model according to the input stack type information and baggage information (i.e., the baggage size information).

S270, inputting the baggage size information, the stack type information, and the target node features corresponding to each candidate baggage stacking position in the stacking area into the deep reinforcement learning model, and acquiring the target baggage stacking position matched with the current baggage to be loaded.

In this embodiment, unlike the discretized action-state space used in traditional reinforcement learning, the action-state space of the deep reinforcement learning model is defined by the hierarchical tree search model. The target baggage stacking position finally output by the deep reinforcement learning model is therefore the index of one of the leaf nodes currently included in the hierarchical tree search model.

S280, after controlling the mechanical arm to stack the current baggage to be loaded at the target baggage stacking position in the stacking area, updating the hierarchical tree search model according to the target baggage stacking position.

In an optional implementation manner of this embodiment, the updating the hierarchical tree search model according to the target baggage stacking position may specifically include:

in the hierarchical tree search model, replacing leaf nodes matched with the target luggage stacking position with internal nodes;

acquiring at least one maximal empty subspace matched with the stacking area according to each internal node currently obtained by updating in the hierarchical tree search model;

and updating each leaf node in the hierarchical tree search model according to the positions of the bottom-face corner points corresponding to each maximal empty subspace.

As described above, after each piece of aviation baggage is stacked, the internal nodes and leaf nodes included in the hierarchical tree search model need to be updated accordingly so that online loading planning can be performed for the next baggage to be loaded. Updating the internal nodes is simple: the leaf node matched with the target baggage stacking position is directly replaced with an internal node.

The leaf nodes may be updated using an Empty Maximal Space (EMS) strategy. The coordinates of the left-front-lower corner point of an EMS are taken as its spatial origin, and the current EMS is described by this origin together with its size. After the k-th baggage to be stacked, with its three-dimensional size, has been stacked at the stacking position located at the spatial origin, the current EMS is divided into three maximal empty subspaces, each described by the coordinates of its own spatial origin and by its own size.

For each maximal empty subspace, the new candidate baggage stacking positions are located at the four corner points of its bottom face, namely the lower-left, lower-right, upper-left, and upper-right corner points.
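The EMS update and the generation of corner-point candidates can be sketched as follows. This is a hedged illustration only: the exact subspace origins and sizes of the embodiment are not reproduced in this text, so the sketch assumes one common EMS convention in which the item sits at the origin corner and the three subspaces lie beside, behind, and above it. The names `Box`, `split_ems`, and `bottom_corners` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned space: origin (x, y, z) and size (dx, dy, dz), in mm."""
    x: int
    y: int
    z: int
    dx: int
    dy: int
    dz: int


def split_ems(ems, l, w, h):
    """Split `ems` after an l x w x h item is placed at its origin corner.

    Assumed convention: each subspace keeps the full extent of the EMS
    in the two directions it is not cut along, so subspaces may overlap.
    """
    if l > ems.dx or w > ems.dy or h > ems.dz:
        raise ValueError("item does not fit in this EMS")
    candidates = [
        Box(ems.x + l, ems.y, ems.z, ems.dx - l, ems.dy, ems.dz),
        Box(ems.x, ems.y + w, ems.z, ems.dx, ems.dy - w, ems.dz),
        Box(ems.x, ems.y, ems.z + h, ems.dx, ems.dy, ems.dz - h),
    ]
    # Drop degenerate subspaces with no remaining volume.
    return [b for b in candidates if b.dx > 0 and b.dy > 0 and b.dz > 0]


def bottom_corners(ems):
    """Bottom-face corners: lower-left, lower-right, upper-left, upper-right."""
    return [
        (ems.x, ems.y, ems.z),
        (ems.x + ems.dx, ems.y, ems.z),
        (ems.x, ems.y + ems.dy, ems.z),
        (ems.x + ems.dx, ems.y + ems.dy, ems.z),
    ]
```

For example, `bottom_corners(split_ems(Box(0, 0, 0, 100, 80, 60), 40, 30, 20)[0])` lists the four candidate corner points of the subspace beside the placed item.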

To load the aviation baggage accurately, the coordinates of the planned position in the EMS space need to be converted into the coordinate system of the luggage van. The coordinates of the lower-left corner point remain unchanged, while the coordinates of the lower-right, upper-left, and upper-right corner points are each transformed.

According to the technical scheme of the embodiment of the invention, the optimal stacking position of the aviation luggage is calculated by using the deep reinforcement learning algorithm, and the related parameters are adjusted through interaction with the environment, so that the loading of each luggage is compact and stable, the waste of space is reduced, and the economy and the operation efficiency of an airport are improved.

Example three

Fig. 3a is a flowchart of an aviation baggage online loading planning method according to a third embodiment of the present invention. This embodiment further refines the operation of inputting the baggage size information, the stack type information, and the target node features corresponding to each candidate baggage stacking position in the stacking area into the deep reinforcement learning model and acquiring the target baggage stacking position matched with the current baggage to be loaded.

Correspondingly, as shown in fig. 3a, the method specifically includes:

S310, acquiring the baggage size information of the current baggage to be loaded and the stack type information of the stacking area.

S320, inputting the baggage size information and the stack type information into a hierarchical tree search model matched with the stacking area, and acquiring target node features corresponding to each candidate baggage stacking position in the stacking area.

S330, determining the baggage category of the current baggage to be loaded according to the baggage size information through the deep reinforcement learning model.

In this embodiment, the deep reinforcement learning model is pre-trained using the advantage actor-critic (A2C) algorithm. It is obtained through multiple training iterations, using real data of aviation baggage and luggage van compartments at a civil aviation airport as the training sample set and the filling rate as the evaluation criterion.

Fig. 3b shows a network structure diagram of the deep reinforcement learning model to which the embodiment of the present invention is applicable. As shown in Fig. 3b, the deep reinforcement learning model based on the A2C algorithm includes an Actor network and a Critic network. The attention module in the Actor network comprises three attention (Attention) models, each of which comprises two linear (Linear) layers connected by a ReLU activation function. The graph attention encoding layer in the Actor network comprises a skip connection (Skip Connection) layer, a linear skip connection (Linear Skip Connection) layer, and two linear layers, where the linear skip connection layer comprises two linear layers connected by a ReLU activation function. The Critic network comprises a linear layer, and the optimizer used is the Adam optimizer.

In this embodiment, the Actor decision network is designed based on a pointer mechanism and outputs the probability distribution over the selectable actions. The global feature is mapped through a projection matrix to obtain a query $q$, and the target node features of the leaf nodes are mapped through a projection matrix to obtain a set of keys $k_i$. The compatibility function is obtained from the vector dot product of the query with each key:

$$u_i = c_{clip} \cdot \tanh\left(\frac{q^{T} k_i}{\sqrt{d_k}}\right)$$

where $u_i$ is the logit for selecting the i-th leaf node. The tanh function is used to clip the logits, and the clipping range is controlled by the parameter $c_{clip}$. $d_k$ is the dimension of the projected features, and $q^{T}$ is the transpose of $q$.
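The pointer-style selection described here, including the softmax normalization of the logits into selection probabilities, can be sketched as follows. This is a minimal sketch under assumptions: the function name `leaf_selection_probs` is hypothetical, and the clipping value `clip=10.0` is an illustrative choice that is not stated in the text.

```python
import math


def leaf_selection_probs(q, keys, clip=10.0):
    """Turn a query and per-leaf keys into leaf-selection probabilities.

    q    : query vector projected from the global feature
    keys : one key vector per leaf node, same length as q
    clip : assumed clipping parameter bounding each logit to [-clip, clip]
    """
    d_k = len(q)  # dimension of the projected features
    # Scaled dot product per leaf, squashed by tanh and scaled by `clip`.
    logits = [
        clip * math.tanh(sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k))
        for k in keys
    ]
    # Numerically stable softmax over the leaf logits.
    m = max(logits)
    exps = [math.exp(u - m) for u in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Because tanh bounds each logit, no single leaf node can dominate the distribution arbitrarily strongly, which leaves some room for exploration during training.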

Accordingly, the probability of selecting each leaf node may be expressed as:

$$\pi(i) = \frac{\exp(u_i)}{\sum_{j} \exp(u_j)}$$

where $u_i$ is the logit for selecting the i-th leaf node.

The Critic network uses an advantage function in place of the feedback value used in a traditional Critic network, as an index for measuring how good the currently selected action is. The advantage function $A^{\pi}(s, a)$ is defined as:

$$A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s)$$

where $Q^{\pi}(s, a)$ is the value function corresponding to executing action $a$ under policy $\pi$, and $V^{\pi}(s)$ is the sum, over all actions that may be taken under policy $\pi$, of the product of the action probability and the value function. The objective of the algorithm is to find the optimal policy $\pi^{*}$ that maximizes the cumulative reward.
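The advantage computation described above can be illustrated with a short sketch. It assumes a discrete set of actions with known action values; the function name `advantages` and the input layout are hypothetical.

```python
def advantages(action_probs, q_values):
    """Advantage of each action: its value minus the policy's state value.

    action_probs : probability of each action under the current policy
    q_values     : value of executing each action in the current state
    """
    if len(action_probs) != len(q_values):
        raise ValueError("one probability is needed per action value")
    # State value: sum of action probability times action value.
    state_value = sum(p * q for p, q in zip(action_probs, q_values))
    return [q - state_value for q in q_values]
```

A positive advantage means the action is better than the policy's average behaviour in that state; weighted by the action probabilities, the advantages sum to zero.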

In this embodiment, aviation baggage may be classified by size into five categories A, B, C, D, and E (for example: 35cm × 30cm × 15cm and below; 35cm × 30cm × 15cm to 50cm × 45cm × 30cm; 46cm × 41cm × 31cm to 60cm × 50cm × 40cm; 61cm × 51cm × 41cm to 75cm × 70cm × 55cm; and others). Each baggage item currently to be loaded is assigned to exactly one category according to its baggage size information.
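The size-based classification can be sketched as follows. Because the example ranges listed above overlap at their boundaries, this sketch makes its own assumption: it uses only the upper bound of each category and assigns the first category whose bound the item fits within, which guarantees exactly one category per item. The names `UPPER_BOUNDS_CM` and `baggage_category` are hypothetical.

```python
# Upper size bounds (length, width, height) in cm, taken from the example
# ranges in the text; the first-fit rule below is an assumption.
UPPER_BOUNDS_CM = [
    ("A", (35, 30, 15)),
    ("B", (50, 45, 30)),
    ("C", (60, 50, 40)),
    ("D", (75, 70, 55)),
]


def baggage_category(length_cm, width_cm, height_cm):
    """Return the first category whose upper bound contains the item."""
    dims = (length_cm, width_cm, height_cm)
    for name, bound in UPPER_BOUNDS_CM:
        if all(d <= b for d, b in zip(dims, bound)):
            return name
    return "E"  # everything larger than category D
```

For instance, `baggage_category(55, 50, 35)` returns `"C"`.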

S340, calculating, through the deep reinforcement learning model, the node weights corresponding to each candidate baggage stacking position in the stacking area according to the baggage category of the current baggage to be loaded and the stack type information.

The inventors found through practical tests that, in the optimal case, aviation baggage of the same category should be loaded as close together as possible in the stacking area, so as to reduce the unevenness of the bottom surface of the space above and increase the stability of the loaded stack.

As in the previous example, after the maximal empty subspaces are generated by the EMS method and the leaf nodes of the hierarchical tree search model are updated from them, each leaf node can be given a node weight according to the size information of the k-th baggage currently to be loaded and the categories of the baggage already stacked in the stacking area. The deep reinforcement learning model can then use the node weights of the leaf nodes together to decide a target leaf node from all leaf nodes of the hierarchical tree search model.

In an optional implementation of this embodiment, calculating, through the deep reinforcement learning model, the node weights corresponding to each candidate baggage stacking position in the stacking area according to the baggage category of the current baggage to be loaded and the stack type information may include:

calculating, according to a formula, the node weight of the t-th candidate baggage stacking position with respect to the current baggage k to be loaded;

where the formula uses the volume of the current baggage to be loaded, a preset empirical constant c, and the average distance between the current baggage k to be loaded and all stacked baggage of the same category in the stacking area.
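One input to the node weight, the average distance to already stacked baggage of the same category, can be sketched as follows. The weight formula itself is not reproduced in this text, so the sketch computes only this distance term. It assumes Euclidean distance between reference points (for example, the origin corners of the items); the function name and the `(category, point)` data layout are hypothetical.

```python
import math


def mean_same_category_distance(position, category, stacked):
    """Average distance from a candidate position to same-category items.

    position : (x, y, z) reference point of the candidate stacking position
    category : category label of the baggage currently to be loaded
    stacked  : list of (category, (x, y, z)) for baggage already stacked
    Returns None when no baggage of this category has been stacked yet.
    """
    same = [point for cat, point in stacked if cat == category]
    if not same:
        return None
    return sum(math.dist(position, point) for point in same) / len(same)
```

A smaller value indicates a candidate position closer to baggage of the same category, which is the situation the embodiment aims to favour.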

S350, setting, through the deep reinforcement learning model, a reward function of the deep reinforcement learning model according to the node weights corresponding to each candidate baggage stacking position.

In this embodiment, after the node weights corresponding to each candidate baggage stacking position are obtained, a reward function corresponding to the t-th candidate baggage stacking position may be set according to a formula.

The formula uses: a scaling constant; the volume of the k-th baggage currently to be loaded; the total volume of the current luggage van compartment, i.e., L × W × H, where L, W, and H are the length, width, and height of the luggage van compartment, respectively; and the average of the reward function over all training iterations. To avoid collisions, a position reward function is also set, with the following provision: if the baggage is placed on the side close to the mechanical arm, the space on the inner side of the mechanical arm is regarded as an unsafe space, and a corresponding position reward is assigned to the position after placement.
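The volume terms that enter the reward, and the filling rate used as the evaluation criterion, can be computed as in the sketch below. How the embodiment combines these terms with the node weight, the scaling constant, and the position reward is not reproduced in this text, so only the volume ratio is shown. The function names are hypothetical.

```python
def volume_ratio(item_dims, compartment_dims):
    """Volume of one item divided by the compartment volume L * W * H."""
    l, w, h = item_dims
    L, W, H = compartment_dims
    if min(l, w, h, L, W, H) <= 0:
        raise ValueError("all dimensions must be positive")
    return (l * w * h) / (L * W * H)


def filling_rate(loaded_items, compartment_dims):
    """Share of the compartment volume occupied by all loaded items."""
    return sum(volume_ratio(dims, compartment_dims) for dims in loaded_items)
```

All dimensions must use the same unit, for example the integer millimetres used by the data set of this embodiment.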

S360, deciding, through the deep reinforcement learning model, a target leaf node from all leaf nodes of the hierarchical tree search model according to the target node features corresponding to each candidate baggage stacking position in the stacking area and the reward function, and taking the target leaf node as the target baggage stacking position matched with the current baggage to be loaded.

In this embodiment, the Actor network weights the target node features of each leaf node and outputs the current policy distribution. The Critic network maps the global feature to a state-value function in order to estimate the cumulative reward obtainable from the current baggage loading state. Finally, a target leaf node is decided and taken as the target baggage stacking position matched with the current baggage to be loaded.

S370, after controlling the mechanical arm to stack the current baggage to be loaded at the target baggage stacking position in the stacking area, updating the hierarchical tree search model according to the target baggage stacking position.

According to the technical scheme of the embodiment of the invention, target node features corresponding to each candidate baggage stacking position in the stacking area are obtained from the hierarchical tree search model matched with the stacking area according to the baggage size information of the current baggage to be loaded and the stack type information of the stacking area; a target baggage stacking position matched with the current baggage to be loaded is then obtained using the deep reinforcement learning model; and finally the mechanical arm is controlled to stack the current baggage to be loaded at the target baggage stacking position in the stacking area. By combining hierarchical tree search with a deep reinforcement learning model, accurate, rapid, and automatic online loading planning of aviation baggage is achieved, so that the baggage is loaded compactly and stably, wasted space is reduced, and the economy and operating efficiency of the airport are effectively improved.

On the basis of the above embodiments, Fig. 3c is an overall schematic diagram of an aviation baggage online loading planning method to which the embodiments of the present invention are applicable.

With reference to Fig. 3c, the training and inference processes of the hierarchical tree search model and the deep reinforcement learning model according to the embodiment of the present invention are briefly described.

Before the hierarchical tree search model and the deep reinforcement learning model are trained, the training samples and evaluation criteria need to be determined. In this embodiment, real data of aviation baggage and luggage van compartments at a civil aviation airport are collected and preprocessed to construct a data set for model training. The data set is stored as a data file in .pt format and divided into a training set and a test set at a ratio of 8:2.

The data set contains the length, width, and height of each piece of baggage. Each group of data arranges the information of multiple pieces of baggage into a two-dimensional tensor:

$$\begin{bmatrix} l_1 & w_1 & h_1 \\ l_2 & w_2 & h_2 \\ \vdots & \vdots & \vdots \\ l_n & w_n & h_n \end{bmatrix}$$

where $l_i$, $w_i$, and $h_i$ are the length, width, and height of the i-th piece of baggage, respectively, and $i \in [1, n]$.

The three-dimensional sizes in the data set are integers in millimeters (mm); if the acquired raw data contain decimals, the data need to be preprocessed. The data set is generated randomly: three-dimensional size information is randomly drawn from the preprocessed data. The data set has a length of 3000, i.e., it contains 3000 random sequences, each of which contains the three-dimensional size information of 100 pieces of baggage.
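The data set construction described above can be sketched as follows. This is an illustration under assumptions: rounding to the nearest millimetre stands in for the unspecified preprocessing, sampling is with replacement, and the function name `build_dataset` and its parameters are hypothetical. Saving to a file is omitted.

```python
import random


def build_dataset(pool, n_sequences=3000, seq_len=100,
                  train_fraction=0.8, seed=0):
    """Draw random baggage sequences and split them into train and test.

    pool : measured (length, width, height) tuples in mm, possibly decimal
    Returns (train, test), each a list of sequences of integer triples.
    """
    rng = random.Random(seed)
    # Assumed preprocessing: round decimal measurements to integer mm.
    cleaned = [tuple(int(round(v)) for v in dims) for dims in pool]
    sequences = [
        [rng.choice(cleaned) for _ in range(seq_len)]
        for _ in range(n_sequences)
    ]
    split = round(n_sequences * train_fraction)  # 8:2 split by default
    return sequences[:split], sequences[split:]
```

With the default arguments this yields 2400 training sequences and 600 test sequences of 100 items each.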

Next, an action-state space representation is established, including the definition of the action-state space and the methods for generating and selecting leaf nodes, so as to construct the hierarchical tree search model. A Gym environment for deep reinforcement learning is then established, including the design of the action space and the state space. During the interaction between the model and the Gym environment, each time a loading plan is made for a piece of baggage to be loaded, the target node features of each leaf node obtained from the hierarchical tree search model are provided to the deep reinforcement learning model. The Actor network weights the leaf nodes and outputs the current policy distribution, from which the action probability distribution is obtained. An action is then executed to obtain a new state, a reward, and a termination flag done. When no leaf node satisfying the constraints can be selected, the termination flag is set to 1, the current round of iteration ends, and the parameters of the Actor network are updated.
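The interaction loop with the termination flag can be sketched with a toy stand-in environment. This is not the embodiment's Gym environment: the class `ToyLoadingEnv` reduces the compartment to a volume budget, the three leaf indices are placeholders, and the reward is simply the loaded volume share. All names are hypothetical, and the parameter update of the Actor network is omitted.

```python
class ToyLoadingEnv:
    """Toy environment: the compartment is only a remaining volume budget."""

    def __init__(self, capacity, item_volumes):
        self.capacity = capacity
        self.item_volumes = list(item_volumes)
        self.remaining = capacity
        self.index = 0

    def reset(self):
        self.remaining = self.capacity
        self.index = 0
        return (self.remaining, self.index)

    def feasible_leaves(self):
        """Placeholder leaf indices that satisfy the (toy) constraint."""
        if self.index >= len(self.item_volumes):
            return []
        if self.item_volumes[self.index] > self.remaining:
            return []
        return [0, 1, 2]

    def step(self, action):
        if action not in self.feasible_leaves():
            raise ValueError("action is not a feasible leaf node")
        volume = self.item_volumes[self.index]
        self.remaining -= volume
        self.index += 1
        reward = volume / self.capacity
        # Termination flag is 1 when no feasible leaf node remains.
        done = 0 if self.feasible_leaves() else 1
        return (self.remaining, self.index), reward, done


def run_episode(env, policy):
    """Run one round: act until no feasible leaf node can be selected."""
    state = env.reset()
    total_reward = 0.0
    done = 0 if env.feasible_leaves() else 1
    while not done:
        action = policy(state, env.feasible_leaves())
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward
```

In a real setup, `policy` would sample a leaf index from the Actor network's action probability distribution rather than apply a fixed rule.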

Further, the parameters of the stored decision network are loaded, and the current state s of the luggage van compartment and the baggage is acquired in the actual application environment and used as the input of the Actor network; the output of the Actor network is taken as the optimal solution for the current baggage stacking position.

Training takes the filling rate as the optimization target and is carried out in units of rounds, for a total of 120000 rounds of model training; the filling rate optimization curve is shown in Fig. 3d. The filling rate shows an overall upward trend. At around 110000 rounds of training, the filling rate achieved by the agent reaches about 69% and then fluctuates around 69% because of action exploration. This fluctuation indicates that the policy obtained by the agent at the end of training is not yet the optimal policy, although its reward curve still shows an upward trend.

Specifically, the baggage conveyor line transports the baggage to be loaded to the shooting areas of the stack type camera and the baggage camera. The stack type information and baggage information collected by the cameras are input as the state s, the parameters of the stored decision network are loaded, and the output of the network is taken as the current baggage stacking position and sent to the end of the mechanical arm, which executes the action to complete the loading task.

Example four

Fig. 4 is a schematic structural diagram of an aviation baggage online loading planning device according to a fourth embodiment of the present invention. As shown in Fig. 4, the device includes: a real-time information acquisition module 410, a target node feature acquisition module 420, a target baggage stacking position acquisition module 430, and a stacking control module 440. Wherein:

a real-time information obtaining module 410, configured to obtain baggage size information of current baggage to be loaded and stack type information of a stacking area;

a target node feature obtaining module 420, configured to input the baggage size information and the stacking type information into a hierarchical tree search model matched with the stacking area, and obtain target node features corresponding to respective candidate baggage stacking positions in the stacking area;

a target baggage stacking position obtaining module 430, configured to input baggage size information, stacking type information, and target node characteristics corresponding to each candidate baggage stacking position in the stacking area into the deep reinforcement learning model, and obtain a target baggage stacking position matched with a current baggage to be loaded;

and the stacking control module 440 is used for controlling the mechanical arm to stack the current luggage to be loaded to the target luggage stacking position in the stacking area, and then updating the hierarchical tree search model according to the target luggage stacking position.

On the basis of the foregoing embodiments, the target node characteristic obtaining module 420 specifically includes:

the hierarchical tree search model updating unit is used for updating each internal node and each leaf node in the hierarchical tree search model according to the stack type information;

each internal node is used for describing description information of each palletized finished baggage in the palletizing area, and each leaf node is used for describing description information of each alternative baggage stacking position in the palletizing area;

the low-dimensional node feature generation unit is used for generating, through the multilayer perceptrons in the hierarchical tree search model, low-dimensional node features respectively corresponding to each candidate baggage stacking position in the stacking area according to the baggage size information and the internal nodes and leaf nodes in the hierarchical tree search model;

the high-dimensional node feature conversion unit is used for converting low-dimensional node features corresponding to each alternative luggage stacking position in the stacking area into high-dimensional node features through a graph attention network in the hierarchical tree search model;

the embedded node feature calculating unit is used for calculating, through a scaled dot-product attention network, embedded node features respectively corresponding to each candidate baggage stacking position in the stacking area according to the leaf node relation weights in the hierarchical tree search model and the high-dimensional node features respectively corresponding to each candidate baggage stacking position in the stacking area;

and the normalization processing unit is used for performing normalization processing on the embedded node characteristics respectively corresponding to the alternative luggage stacking positions through a normalization network to obtain target node characteristics respectively corresponding to the alternative luggage stacking positions in the stacking area.

On the basis of the foregoing embodiments, the low-dimensional node feature generation unit may be specifically configured to:

inputting each internal node in the hierarchical tree search model into a first node type multilayer perceptron to obtain a first type of characteristics;

inputting each leaf node in the hierarchical tree search model into a second node type multilayer perceptron respectively, and acquiring second type characteristics corresponding to each leaf node;

inputting the luggage size information into a third node type multilayer perceptron to obtain a third type of characteristics;

and combining the second class of characteristics respectively corresponding to each leaf node with the first class of characteristics and the third class of characteristics respectively to generate low-dimensional node characteristics respectively corresponding to each alternative luggage stacking position in the stacking area.

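On the basis of the foregoing embodiments, the embedded node feature calculating unit may be specifically configured to:

calculate, according to the formula:

$$h'_p = W^{O} \sum_{j=1}^{n} \operatorname{softmax}\left(\frac{(W^{Q}\hat{h}_p)^{T}\, W^{K}\hat{h}_j}{\sqrt{d_k}}\right) W^{V}\hat{h}_j$$

the embedded node feature $h'_p$ corresponding to the p-th candidate baggage stacking position;

where $W^{Q}$, $W^{K}$, $W^{V}$, and $W^{O}$ are weight matrices pre-trained in the hierarchical tree search model, $n$ is the total number of leaf nodes in the hierarchical tree search model, $\hat{h}_p$ and $\hat{h}_j$ are the high-dimensional node features corresponding to the p-th and j-th candidate baggage stacking positions, $d_k$ is the dimension of the projected features, and $T$ is the transpose operator.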

On the basis of the foregoing embodiments, the normalization processing unit may be specifically configured to:

calculate the target node feature corresponding to the p-th candidate baggage stacking position according to a formula that includes a term that is the output of a fourth node-type multilayer perceptron.

On the basis of the foregoing embodiments, the target baggage stacking position obtaining module 430 may specifically include:

the luggage type determining unit is used for determining the luggage type of the luggage to be loaded currently according to the luggage size information through the deep reinforcement learning model;

the node weight calculation unit is used for calculating the node weights respectively corresponding to the stacking positions of each candidate luggage in the stacking area according to the luggage category and the stacking type information of the current luggage to be loaded through a deep reinforcement learning model;

the reward function setting unit is used for setting a reward function of the deep reinforcement learning model according to the node weight corresponding to each alternative luggage stacking position through the deep reinforcement learning model;

the target leaf node decision unit is used for deciding a target leaf node from all leaf nodes of the hierarchical tree search model according to target node characteristics respectively corresponding to each alternative luggage stacking position in the stacking area and the reward function through a deep reinforcement learning model, and the target leaf node is used as a target luggage stacking position matched with the current luggage to be loaded;

the deep reinforcement learning model is pre-trained using the advantage actor-critic (A2C) algorithm, and is obtained through multiple training iterations using real data of aviation baggage and luggage van compartments at a civil aviation airport as the training sample set and the filling rate as the evaluation criterion.

On the basis of the foregoing embodiments, the node weight calculating unit may be specifically configured to:

calculate, according to a formula, the node weight of the t-th candidate baggage stacking position with respect to the current baggage k to be loaded;

where the formula uses the volume of the current baggage to be loaded, a preset empirical constant c, and the average distance between the current baggage k to be loaded and all stacked baggage of the same category in the stacking area.

On the basis of the foregoing embodiments, the stacking control module 440 may be specifically configured to:

and updating each leaf node in the hierarchical tree search model according to the positions of the bottom-face corner points corresponding to each maximal empty subspace.

The aviation luggage online loading planning device provided by the embodiment of the invention can execute the aviation luggage online loading planning method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.

Example five

FIG. 5 illustrates a block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.

As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 may also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.

A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, for example, the aviation baggage online loading planning method.

In some embodiments, the aviation baggage online loading planning method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the aviation baggage online loading planning method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the aviation baggage online loading planning method in any other suitable manner (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Computer programs for implementing the methods of the present invention can be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.

In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the internet.

The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the high management difficulty and weak service scalability of traditional physical hosts and VPS services.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.

The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An aviation luggage online loading planning method is characterized by comprising the following steps:

inputting the luggage size information and the stack type information into a hierarchical tree search model matched with the stacking area, and acquiring target node characteristics corresponding to each alternative luggage stacking position in the stacking area;

2. The method according to claim 1, wherein inputting the baggage size information and the stacking type information into a hierarchical tree search model matched with a stacking area, and acquiring target node features respectively corresponding to alternative baggage stacking positions in the stacking area comprises:

updating each internal node and each leaf node in the hierarchical tree search model according to the stack type information;

each internal node is used for describing description information of each stacked luggage in the stacking area, and each leaf node is used for describing description information of each alternative luggage stacking position in the stacking area;

generating low-dimensional node features respectively corresponding to each alternative luggage stacking position in the stacking area according to the luggage size information, the internal nodes and the leaf nodes in the hierarchical tree search model through a multilayer perceptron in the hierarchical tree search model;

converting low-dimensional node features respectively corresponding to each alternative luggage stacking position in the stacking area into high-dimensional node features through a graph attention network in a hierarchical tree search model;

calculating embedded node characteristics respectively corresponding to each alternative luggage stacking position according to leaf node relation weight in the hierarchical tree search model and high-dimensional node features respectively corresponding to each alternative luggage stacking position in the stacking area through a scaled dot-product attention network;

and carrying out normalization processing on the embedded node characteristics respectively corresponding to the alternative luggage stacking positions through a normalization network to obtain target node characteristics respectively corresponding to the alternative luggage stacking positions in the stacking area.
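The two node kinds named in claim 2 can be pictured with a minimal Python sketch. It only illustrates the bookkeeping: internal nodes hold already-stacked luggage and leaf nodes hold alternative stacking positions. The names `PlacedBag`, `StackTree` and `update` are hypothetical, and the corner-point rule used to generate new leaf nodes is an assumption for illustration, not a rule taken from the claims.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class PlacedBag:
    """Internal-node payload: one already-stacked piece of luggage."""
    origin: Point                      # front-left-bottom corner (x, y, z)
    size: Tuple[float, float, float]   # (length, width, height)


@dataclass
class StackTree:
    """Hypothetical container for the internal and leaf nodes."""
    internal_nodes: List[PlacedBag] = field(default_factory=list)
    # Start with a single candidate position at the container origin.
    leaf_nodes: List[Point] = field(default_factory=lambda: [(0.0, 0.0, 0.0)])

    def update(self, bag: PlacedBag) -> None:
        """Record a stacked bag and refresh the candidate positions.

        Assumed rule (not from the source): the used position is removed
        and the three corner points adjacent to the bag become candidates.
        """
        x, y, z = bag.origin
        dx, dy, dz = bag.size
        self.internal_nodes.append(bag)
        if bag.origin in self.leaf_nodes:
            self.leaf_nodes.remove(bag.origin)
        for candidate in ((x + dx, y, z), (x, y + dy, z), (x, y, z + dz)):
            if candidate not in self.leaf_nodes:
                self.leaf_nodes.append(candidate)
```

A real system would also filter candidates for collisions, support and container bounds; this sketch omits those checks.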

3. The method of claim 2, wherein generating, by a multilayer perceptron in the hierarchical tree search model, low-dimensional node features respectively corresponding to alternative baggage stacking positions in the stacking area according to the baggage size information, internal nodes and leaf nodes in the hierarchical tree search model comprises:

4. The method of claim 2, wherein calculating, through a scaled dot-product attention network, embedded node features respectively corresponding to each alternative luggage stacking position according to leaf node relation weights in the hierarchical tree search model and high-dimensional node features respectively corresponding to each alternative luggage stacking position in the stacking area comprises:

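according to the formula:

[formula missing from the source text];

wherein [symbols and their definitions missing from the source text], [symbol missing] is the high-dimensional node feature corresponding to the p-th alternative luggage stacking position, [symbol missing] is the dimension of the projected features, and [symbol missing] is the transpose operator.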
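Because the formula of claim 4 is not reproduced in this text, the following Python sketch shows only the standard form of scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, which is consistent with the projection dimension and transpose operator named above. Treating the inputs as already-projected query, key and value matrices is an assumption, as are the function names; the sketch does not model the leaf node relation weights.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Return softmax(q k^T / sqrt(d)) v for row-major matrices.

    q, k, v are assumed to be already-projected node features, with one
    row per alternative stacking position; d is the key dimension.
    """
    d = len(k[0])
    output = []
    for q_row in q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [
            sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d)
            for k_row in k
        ]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        output.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return output
```

Dividing by sqrt(d) keeps the scores from growing with the feature dimension, which would otherwise push the softmax toward a one-hot distribution.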

5. The method according to claim 4, wherein the normalization processing is performed on the embedded node features respectively corresponding to the alternative baggage stacking positions through a normalization network to obtain target node features respectively corresponding to the alternative baggage stacking positions in the stacking area, and the method comprises the following steps:

according to the formula:

[formula missing from the source text];

wherein [symbol missing] is the result of processing [symbol missing] with a fourth node-type multilayer perceptron.

6. The method according to any one of claims 2 to 5, wherein the step of inputting the baggage size information, the stacking type information and the target node characteristics corresponding to the candidate baggage stacking positions in the stacking area into the deep reinforcement learning model to obtain the target baggage stacking position matched with the current baggage to be loaded comprises the steps of:

determining the luggage category of the luggage to be loaded at present according to the luggage size information through a deep reinforcement learning model;

calculating node weights respectively corresponding to the stacking positions of all the alternative luggage in the stacking area according to the luggage category and the stacking type information of the current luggage to be loaded through a deep reinforcement learning model;

setting a reward function of the deep reinforcement learning model according to the node weight corresponding to each alternative luggage stacking position through the deep reinforcement learning model;

determining, through the deep reinforcement learning model, a target leaf node from all leaf nodes of the hierarchical tree search model according to the reward function and the target node characteristics respectively corresponding to each alternative luggage stacking position in the stacking area, and taking the target leaf node as the target luggage stacking position matched with the current luggage to be loaded;

the deep reinforcement learning model is obtained by pre-training with an advantage actor-critic algorithm over multiple rounds of iterative training, using real data on aviation luggage loaded in baggage cart compartments at a civil aviation airport as the training sample set and the filling rate as the evaluation criterion.
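As a rough illustration of two quantities in claim 6, the Python sketch below computes a filling rate and picks a target leaf node from per-leaf scores. The filling rate is taken here as loaded luggage volume divided by compartment volume, which is the usual definition but is not spelled out in the claim. The feasibility mask and the greedy (highest-score) choice are assumptions, and both function names are hypothetical.

```python
from typing import Sequence, Tuple

Size = Tuple[float, float, float]


def filling_rate(bag_sizes: Sequence[Size], container_size: Size) -> float:
    """Loaded volume divided by container volume (assumed definition)."""
    loaded = sum(length * width * height for length, width, height in bag_sizes)
    c_length, c_width, c_height = container_size
    return loaded / (c_length * c_width * c_height)


def select_target_leaf(scores: Sequence[float], feasible: Sequence[bool]) -> int:
    """Return the index of the highest-scoring feasible leaf node.

    scores stands in for the model's per-leaf output; feasible is a
    hypothetical mask marking positions where the bag can be placed.
    """
    best = None
    for index, (score, ok) in enumerate(zip(scores, feasible)):
        if ok and (best is None or score > scores[best]):
            best = index
    if best is None:
        raise ValueError("no feasible stacking position")
    return best
```

During training a policy would normally sample from the score distribution rather than always taking the maximum; the greedy choice here reflects inference only.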

7. The method according to claim 6, wherein calculating, by a deep reinforcement learning model, node weights respectively corresponding to each candidate baggage stacking position in the stacking area according to the baggage category of the current baggage to be loaded and the stacking type information comprises:

according to the formula:

[formula missing from the source text].

8. An aviation luggage online loading planning device, characterized by comprising:

and the stacking control module is used for controlling the mechanical arm to stack the current luggage to be loaded to a target luggage stacking position in the stacking area and then updating the hierarchical tree search model according to the target luggage stacking position.

9. An electronic device, characterized in that the electronic device comprises:

at least one processor; and

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the aviation luggage online loading planning method according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions which, when executed, cause a processor to implement the aviation luggage online loading planning method according to any one of claims 1 to 7.