CN115271833B - Method and system for predicting demand of shared bicycle

Info

Publication number: CN115271833B
Application number: CN202211191739.6A
Authority: CN
Inventors: 徐博; 唐懋然; 彭凯; 胡梦兰; 徐晓慧; 谢江山; 彭聪
Original assignee: Hubei Chutianyun Co ltd; Huazhong University of Science and Technology
Current assignee: Hubei Chutianyun Co ltd; Huazhong University of Science and Technology
Priority date: 2022-09-28
Filing date: 2022-09-28
Publication date: 2023-08-25
Anticipated expiration: 2042-09-28
Also published as: CN115271833A

Abstract

The invention provides a method and a system for predicting the demand for shared bicycles. The method comprises the following steps: acquiring historical demand data of each shared bicycle station to obtain a historical demand feature matrix, and generating an adjacency matrix representing the station adjacency relation from historical order data; inputting the historical demand feature matrix and the adjacency matrix into a graph convolutional neural network to obtain a feature matrix containing neighbor site demand information, then inputting that feature matrix into a deep self-attention network to extract a shared bicycle demand time-domain information matrix; and inputting the time-domain information matrix into a convolutional neural network, a residual structure and a fully connected layer, and outputting the predicted demand of each shared bicycle station in the next time period. The native multi-head attention mechanism of the deep self-attention network learns the features of interest in the time-domain and spatial-domain features more effectively, which improves the accuracy of demand prediction to a certain extent and better addresses the problem of short-term shared bicycle demand prediction.

Description

Method and system for predicting demand of shared bicycle

Technical Field

The invention relates to the field of traffic prediction, in particular to a method and a system for predicting the demand of a shared bicycle.

Background

An enterprise operating a shared bicycle system faces many difficulties, one of which is how to build an efficient network for balancing shared bicycle supply and demand. The conventional management method relies mainly on manual monitoring: bicycles are moved between stations by various means of transport to balance supply and demand at different stations and times. In most scenarios, however, the number of shared bicycles a station will need in a future period is estimated from experience by the drivers who transport the bicycles. Misestimated bicycle counts and unplanned driver movements often leave supply and demand unbalanced, which inconveniences consumers and causes losses for enterprises. More importantly, an unreasonable estimate can leave far too many bicycles at a station, with a serious negative effect on traffic and urban management. Because the number of bicycles rented out of and flowing into any station is uncertain, it is important to take a more proactive approach and accurately predict the number of bicycles available to users at any time and place.

Because long short-term memory (LSTM) networks are good at processing time-series models and graph convolutional neural networks are good at extracting spatial-domain features, many models based on LSTM networks and graph convolutional neural networks have been proposed for shared bicycle demand prediction. However, the LSTM model is not good at capturing spatial-domain features and cannot efficiently extract the features of interest in the input data. Spatio-temporal graph convolution models were proposed in response, but they still have shortcomings: the handling of the attention mechanism has not been sufficiently considered or optimized, nor has the question of how to combine the attention mechanism with multi-source data features.

Disclosure of Invention

Aiming at the technical problems existing in the prior art, the invention provides a method and a system for predicting the demand of a shared bicycle.

According to a first aspect of the present invention, there is provided a shared bicycle demand prediction method, comprising:

acquiring historical demand data of each shared bicycle station, splicing the historical demand data to obtain a historical demand feature matrix, and generating an adjacency matrix representing the station adjacency relation according to the historical order data;

the historical demand characteristic matrix and the adjacency matrix are input into a graph convolutional neural network to obtain the characteristics of the shared bicycle demand space and the topology information in each time period, and a characteristic matrix containing neighbor site demand information is obtained;

inputting the feature matrix containing the neighbor site demand information into a deep self-attention network, and calculating and extracting a shared bicycle demand time domain information matrix through a multi-head attention mechanism and a feedforward neural network;

and inputting the time domain information matrix of the shared bicycle demand into a convolutional neural network, carrying out dimension ascending on the time domain information matrix of the shared bicycle demand, connecting the time domain information matrix of the shared bicycle demand after dimension ascending with the characteristic matrix of the historical demand through a residual structure, and outputting the predicted value of the demand of each shared bicycle station in the next time period through a full connection layer.

According to a second aspect of the present invention, there is provided a shared bicycle demand prediction system comprising:

the first acquisition module is used for acquiring the historical demand data of each shared bicycle site, splicing the historical demand data to obtain a historical demand characteristic matrix, and acquiring a generated adjacency matrix representing the site adjacency relation according to the historical order data;

the second acquisition module is used for inputting the historical demand characteristic matrix and the adjacency matrix into a graph convolutional neural network so as to acquire the shared bicycle demand space and topology information characteristics in each time period and obtain a characteristic matrix containing neighbor site demand information;

the extraction module is used for inputting the feature matrix containing the neighbor site demand information into a deep self-attention network, and extracting a shared bicycle demand time domain information matrix through calculation of a multi-head attention mechanism and a feedforward neural network;

the prediction output module is used for inputting the time domain information matrix of the shared bicycle demand into the convolutional neural network, carrying out dimension ascending on the time domain information matrix of the shared bicycle demand, connecting the time domain information matrix of the shared bicycle demand after dimension ascending with the characteristic matrix of the historical demand through a residual structure, and outputting the predicted value of the demand of each shared bicycle station in the next time period through the full connection layer.

According to a third aspect of the present invention, there is provided an electronic device comprising a memory, a processor for implementing the steps of the shared bicycle demand prediction method when executing a computer management class program stored in the memory.

According to a fourth aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a computer management class program which, when executed by a processor, implements the steps of a shared bicycle demand prediction method.

According to the method and the system for predicting the demand of the shared bicycle provided by the invention, the multi-head attention mechanism of the deep self-attention network better learns the features of interest in the time-domain and spatial-domain features, so that the accuracy of demand prediction is improved to a certain extent and the problem of short-term shared bicycle demand prediction is better solved.

Drawings

FIG. 1 is a flow chart of a method for predicting demand of a shared bicycle;

FIG. 2 is a schematic diagram of the flow of network data;

FIG. 3 is a diagram of the loss values of the deep self-attention network for different hyperparameter values;

FIG. 4 is a schematic diagram of a system for predicting demand of a shared bicycle according to the present invention;

FIG. 5 is a schematic hardware structure diagram of one possible electronic device according to the present invention;

FIG. 6 is a schematic hardware structure diagram of one possible computer-readable storage medium according to the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. In addition, the technical features of each embodiment or the single embodiment provided by the invention can be combined with each other at will to form a feasible technical scheme, and the combination is not limited by the sequence of steps and/or the structural composition mode, but is necessarily based on the fact that a person of ordinary skill in the art can realize the combination, and when the technical scheme is contradictory or can not realize, the combination of the technical scheme is not considered to exist and is not within the protection scope of the invention claimed.

In view of the problems described in the background, the invention is the first to apply the deep self-attention network model to scenarios such as shared bicycle demand prediction, which is a new development for the field. The deep self-attention network is combined with the graph convolutional neural network model to predict shared bicycle demand, and the experimental section shows that the method is practical to operate. The method is therefore not only novel in theory but also of practical significance in fields related to shared bicycles, such as the revenue and operations of a specific shared bicycle enterprise and policies on urban transport capacity.

Fig. 1 is a flowchart of a method for predicting demand of a shared bicycle according to the present invention, as shown in fig. 1, where the method mainly includes:

S1, acquiring historical demand data of each shared bicycle station, splicing the historical demand data to obtain a historical demand feature matrix, and generating an adjacency matrix representing the station adjacency relation according to historical order data.

It can be understood that the historical demand data of each shared bicycle station are acquired and spliced to obtain the historical demand feature matrix $X \in \mathbb{R}^{N \times M}$, where N is the number of stations and M is the number of time steps; that is, the time period is divided into time steps and the historical demand of each shared bicycle station is counted for every step. The station at which each trajectory starts is determined from the start and end positions of the order, and an adjacency matrix A representing the adjacency relationship between stations is generated.
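As an illustration of step S1, the following minimal sketch builds a demand matrix X and a 0/1 adjacency matrix A from a handful of made-up orders. The order format `(start station, end station, start minute)`, the function names, and the `min_trips` threshold are assumptions introduced here; the patent only states that the adjacency values are set manually after analysing the orders.

```python
from collections import defaultdict

def build_demand_matrix(orders, num_stations, num_steps, step_minutes):
    """Count orders per (start station, time step) to form X with shape N x M."""
    X = [[0] * num_steps for _ in range(num_stations)]
    for start_station, _end_station, start_minute in orders:
        step = start_minute // step_minutes
        if 0 <= step < num_steps:
            X[start_station][step] += 1
    return X

def build_adjacency(orders, num_stations, min_trips=1):
    """Mark stations i and j as adjacent when at least min_trips orders link them.

    The threshold rule is an assumption of this sketch, not the patent's rule.
    """
    trips = defaultdict(int)
    for start_station, end_station, _start_minute in orders:
        if start_station != end_station:
            pair = (min(start_station, end_station), max(start_station, end_station))
            trips[pair] += 1
    A = [[0] * num_stations for _ in range(num_stations)]
    for (i, j), count in trips.items():
        if count >= min_trips:
            A[i][j] = A[j][i] = 1
    return A

# Hypothetical orders: (start station, end station, start time in minutes).
orders = [(0, 1, 3), (0, 2, 17), (1, 0, 20), (2, 2, 31), (1, 2, 44)]
X = build_demand_matrix(orders, num_stations=3, num_steps=3, step_minutes=15)
A = build_adjacency(orders, num_stations=3)
print(X)  # rows are stations, columns are 15-minute time steps
print(A)  # symmetric 0/1 matrix with a zero diagonal
```

Raising `min_trips` makes the graph sparser, which is one simple way to mimic a manual choice of which station pairs count as connected.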

S2, inputting the historical demand feature matrix and the adjacency matrix into a graph convolutional neural network to obtain the spatial and topological features of shared bicycle demand in each time period, and obtaining the feature matrix containing the neighbor site demand information.

It will be appreciated that, to better represent the historical demand data of each shared bicycle station and the adjacency between stations, a graph structure may be used. The graph is defined as $G = (V, E, A)$, where $V$ represents the set of $N$ vertices, each vertex representing a station, and $x_i$ is the vector representation of vertex $i$, which represents the shared bicycle demand of a single station. $E$ represents the set of edges; the existence of an edge indicates that two stations are connected through orders and that their bicycle demand values influence each other. $A \in \mathbb{R}^{N \times N}$ represents the adjacency matrix, whose elements are only 0 and 1; a 1 indicates that an edge connects two stations, and the specific values are set manually after the global edge adjacency relationships have been analyzed. $A_{ij}$ represents the connection between vertex i and vertex j.

In the feature matrix obtained after two graph convolution operations, the demand vector representation of each vertex contains the demand information of its adjacent vertices, so the spatial information of the demand is extracted. In other words, the graph convolutional neural network captures the spatial and topological features of shared bicycle demand in each time period and yields a feature matrix containing neighbor site demand information.
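The effect of two graph convolution operations can be sketched as follows. The text does not give the propagation rule, so this sketch assumes the widely used symmetric normalisation $D^{-1/2}(A + I)D^{-1/2}$ with a ReLU between the two layers; the function names and the weight matrices `W0` and `W1` are illustrative, not taken from the patent.

```python
import math

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}; this normalisation is an assumption."""
    n = len(A)
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    return [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def relu(M):
    return [[max(0.0, v) for v in row] for row in M]

def gcn_two_layers(X, A, W0, W1):
    """Two graph convolutions: each one mixes a station with its neighbours."""
    A_norm = normalize_adjacency(A)
    H = relu(matmul(matmul(A_norm, X), W0))
    return matmul(matmul(A_norm, H), W1)

# Stations 0 and 1 are adjacent; station 2 is isolated (made-up data).
A = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
X = [[4.0, 2.0], [0.0, 6.0], [5.0, 5.0]]   # 3 stations x 2 time steps
identity = [[1.0, 0.0], [0.0, 1.0]]        # identity weights keep the arithmetic visible
Z = gcn_two_layers(X, A, identity, identity)
print(Z)
```

With identity weights, the rows for stations 0 and 1 become the average of their two demand vectors, while the isolated station 2 keeps its own values, which is the neighbour-mixing behaviour the paragraph describes.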

And S3, inputting the feature matrix containing the neighbor site demand information into a deep self-attention network, and calculating and extracting a shared bicycle demand time domain information matrix through a multi-head attention mechanism and a feedforward neural network.

As an embodiment, the deep self-attention network includes a plurality of encoder layers connected in series, each including a multi-head attention mechanism layer and a feedforward neural network layer. Inputting the feature matrix containing the neighbor site demand information into the deep self-attention network and extracting the shared bicycle demand time-domain information matrix through the multi-head attention mechanism and the feedforward neural network includes: dividing the feature matrix containing the neighbor site demand information into per-time-step shared bicycle demand vectors, where each vector represents the shared bicycle demand of every station in the corresponding time period and its length is the number of stations; inputting the demand vectors into the encoder layers connected in series, and performing the multi-head attention operation in the multi-head attention mechanism layer of each encoder layer to obtain the corresponding operation result; and computing the shared bicycle demand time-domain information matrix through the feedforward neural network layer based on the operation results of the multi-head attention mechanism layers.
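The splitting step described above amounts to reading the feature matrix column by column. A minimal sketch, assuming the matrix is stored as a list of station rows (the function name and the values are made up for illustration):

```python
def split_by_time_step(feature_matrix):
    """Turn an N x M matrix (stations x time steps) into M vectors of length N."""
    return [list(column) for column in zip(*feature_matrix)]

Z = [[2.0, 4.0, 1.0],
     [2.0, 4.0, 3.0]]   # 2 stations, 3 time steps
tokens = split_by_time_step(Z)
print(tokens)  # one demand vector per time step, each of length 2
```

Each resulting vector plays the role of one input position for the encoder layers.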

It will be appreciated that, referring to FIG. 2, the deep self-attention network includes a plurality of encoder layers, each including a multi-head attention mechanism layer and a feedforward neural network layer, and each multi-head attention mechanism layer includes a plurality of attention heads. After the feature matrix containing neighbor site demand information is input into a Transformer encoder layer of the deep self-attention network, it is split by time step into demand vectors whose length is the number of stations to be predicted. These vectors serve as Q, K, and V in the Transformer encoder layer for the multi-head attention operation:

$$Q = K = V$$

Q, K, and V in the attention mechanism represent the query, key, and value, respectively. The weights of the values V are computed from the query Q and the key K, and the weighted sum of the values is then calculated. Note that within a given encoder layer Q, K, and V take the same value, whereas across different encoder layers their values differ. The attention mechanism is calculated as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where $d_k$ represents the dimension of the keys. In the multi-head attention mechanism, the i-th attention head is calculated as follows:

$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$

where $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ represent the linear transformations of Q, K, and V in the i-th attention head, respectively. The multi-head attention mechanism operates as follows: the results of the individual attention heads are concatenated and projected through a linear layer so that the output has the same dimension as the original inputs Q, K, and V; in this way, attention can be computed on Q, K, and V under different projections. If n denotes the number of heads in the multi-head attention mechanism and $W^{O}$ denotes the weights of the projection linear layer, multi-head attention can be expressed as:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_n)\,W^{O}$$

Through multi-head attention, the vectors within the input time steps that influence demand more strongly receive larger weights, so that demand can be predicted along the time dimension. After the multi-head attention mechanism layer, the output demand feature matrix enters the Transformer feedforward neural network for a nonlinear transformation, which yields the final feature matrix.
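The multi-head attention computation can be sketched in plain Python as follows. This is a dependency-free illustration of standard scaled dot-product attention with the same input used as query, key and value; the helper names, the toy input and the projection weights are assumptions, and a real implementation would use learned weights in a framework such as PyTorch.

```python
import math

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(M):
    return [list(col) for col in zip(*M)]

def softmax(row):
    m = max(row)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention; returns the output and the weight matrix."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights

def multi_head_attention(X, heads, W_O):
    """heads is a list of (W_Q, W_K, W_V) projection triples, one per head."""
    outputs = [attention(matmul(X, W_Q), matmul(X, W_K), matmul(X, W_V))[0]
               for W_Q, W_K, W_V in heads]
    # Concatenate the per-head outputs for every time step, then project.
    concat = [sum((head[t] for head in outputs), []) for t in range(len(X))]
    return matmul(concat, W_O)

# 3 time steps x 2 stations (made-up values); X is used as Q, K and V.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
head_a = ([[1.0], [0.0]],) * 3        # head 1 looks at station 0 only
head_b = ([[0.0], [1.0]],) * 3        # head 2 looks at station 1 only
W_O = [[1.0, 0.0], [0.0, 1.0]]
out = multi_head_attention(X, [head_a, head_b], W_O)
print(out)
```

Because each head's output is a weighted average of its values, every entry of `out` stays within the range of the corresponding input column, and the output has the same 3 x 2 shape as the input.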

The feedforward neural network in the Transformer encoder layer can be defined as:

$$\mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2$$

where $W_1$, $W_2$ and $b_1$, $b_2$ are the weights and biases, respectively, of the two linear layers in the feedforward neural network.

The invention uses the exponential linear unit (ELU) as the activation function of the feedforward neural network. The ELU activation function is defined as:

$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ a\,(e^{x} - 1), & x \le 0 \end{cases}$$

where a is an adjustable constant. The feedforward neural network based on the ELU activation function can be expressed as:

$$\mathrm{FFN}(x) = \mathrm{ELU}(xW_1 + b_1)\,W_2 + b_2$$
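A small sketch of an ELU-based feedforward layer, assuming a single input row vector `x` and weight matrices stored as lists of rows. The function names and example values are illustrative only.

```python
import math

def elu(x, a=1.0):
    """Exponential linear unit: identity for x > 0, a * (e^x - 1) otherwise."""
    return x if x > 0 else a * (math.exp(x) - 1.0)

def feed_forward(x, W1, b1, W2, b2, a=1.0):
    """Compute ELU(x W1 + b1) W2 + b2 for one row vector x."""
    hidden = [elu(sum(xi * w for xi, w in zip(x, col)) + b, a)
              for col, b in zip(zip(*W1), b1)]
    return [sum(h * w for h, w in zip(hidden, col)) + b
            for col, b in zip(zip(*W2), b2)]

identity = [[1.0, 0.0], [0.0, 1.0]]
y = feed_forward([1.0, -2.0], identity, [0.0, 0.0], identity, [0.0, 0.0])
print(y)  # the positive entry passes through; the negative one is squashed toward -a
```

Unlike ReLU, ELU keeps a small negative output for negative inputs, so those units still pass gradient information.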

As an embodiment, the parameters of the deep self-attention network include the time slot length, the number of heads in the multi-head attention mechanism, and the amount of data processed per batch (the batch size), where every multi-head attention mechanism layer has the same number of heads. For the parameters of the deep self-attention network, one parameter is changed while the others are kept unchanged; the value of that parameter at which the loss function is smallest is its optimal value, and the optimal values of all parameters are obtained in this way.

It will be appreciated that, in the present invention, the hyperparameters of the deep self-attention network are the time slot length, the number of heads in the multi-head attention mechanism, and the batch size. In the specific experiment, the time slot length ranged from 5 to 15, and the candidate batch sizes were (4, 8, 12, 16, 20, 24, 28, 32).

Every multi-head attention mechanism in the deep self-attention network layers has the same number of heads, with a range of 2 to 10. The hyperparameters are adjusted by the control variable method: only the parameter under adjustment is changed in an experiment, and all other variables are kept unchanged.

Specifically, during hyperparameter adjustment, a value is first selected at random from the range of each parameter to be adjusted and used as its initial value. From the second round of experiments onward, the batch size is adjusted while the other hyperparameters are kept unchanged. The batch size at which the loss function value is smallest is taken as the optimal value of that hyperparameter; it is then fixed and not adjusted again in subsequent experiments. By analogy, the same control variable method is used to adjust the remaining hyperparameters in turn, such as the time slot length and the number of heads in each multi-head attention mechanism. Once all hyperparameters have been adjusted, the combination of their optimal values is the optimal hyperparameter setting of the deep self-attention network.
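The one-parameter-at-a-time procedure can be sketched as below. The search ranges follow the text; the function name, the tuning order argument, and in particular `toy_loss` are assumptions: `toy_loss` is a made-up stand-in for training the model and reading its loss, not an experimental result.

```python
import random

def tune_one_at_a_time(search_space, loss_fn, order, seed=0):
    """Control-variable search: vary one hyperparameter, keep the rest fixed."""
    rng = random.Random(seed)
    # Start from a random value for every hyperparameter.
    best = {name: rng.choice(values) for name, values in search_space.items()}
    for name in order:
        losses = {}
        for value in search_space[name]:
            trial = dict(best, **{name: value})
            losses[value] = loss_fn(trial)
        # Fix this hyperparameter at its best value before tuning the next one.
        best[name] = min(losses, key=losses.get)
    return best

search_space = {
    "batch_size": [4, 8, 12, 16, 20, 24, 28, 32],
    "slot_length": list(range(5, 16)),   # 5 to 15 inclusive
    "num_heads": list(range(2, 11)),     # 2 to 10 inclusive
}

def toy_loss(params):
    # Hypothetical stand-in for a training run; chosen only to make the demo deterministic.
    return (abs(params["batch_size"] - 4)
            + abs(params["slot_length"] - 12)
            + abs(params["num_heads"] - 4))

best = tune_one_at_a_time(search_space, toy_loss,
                          order=["batch_size", "slot_length", "num_heads"])
print(best)
```

This search evaluates the sum of the range sizes rather than their product, which is far cheaper than a full grid search but can miss interactions between hyperparameters.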

Referring to FIG. 3, the losses of the deep self-attention network for different hyperparameter values are shown, with MAE and RMSE used as the loss functions. To make the variation of RMSE and MAE with the hyperparameter values easier to see, the RMSE and MAE values in FIG. 3 are the experimental values multiplied by 100. From FIG. 3, in the shared bicycle demand prediction model that combines the Transformer encoder layer of the deep self-attention network with the graph convolutional network (GCN), the query Q, key K, and value V can each be set to 12, the number of heads of the multi-head attention mechanism to 4, the time slot length to 12, and the batch size to 4.

S4, inputting the time domain information matrix of the shared bicycle demand into a convolutional neural network, carrying out dimension ascending on the time domain information matrix of the shared bicycle demand, connecting the time domain information matrix of the shared bicycle demand after dimension ascending with the characteristic matrix of the historical demand through a residual structure, and outputting the demand predicted value of each shared bicycle station in the next time period through a full connection layer.

It can be understood that the feature matrix output by the Transformer encoder layer is used as the input of a 1×1 convolutional neural network, which raises the dimension of the feature matrix; the convolutional layer has 1 input channel and 8 output channels, so that deep spatio-temporal features are aggregated. The dimension-raised feature matrix is connected with the original feature matrix through a residual structure, which prevents gradients from vanishing during learning. Finally, a fully connected layer projects the tensor to the predicted value $Y_t$, that is, the predicted bicycle demand of each station in the next time period.
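The output stage can be sketched as follows, under several assumptions that the text does not spell out: the 1×1 convolution with one input channel is modelled as a per-channel scale and bias, the residual connection adds the original matrix to every channel, and the fully connected layer maps each station's flattened features to one value. All names and numbers are illustrative.

```python
def conv1x1(feature, weights, biases):
    """1x1 convolution with one input channel: each output channel rescales every entry."""
    return [[[w * v + b for v in row] for row in feature]
            for w, b in zip(weights, biases)]

def add_residual(channels, original):
    """Add the original N x M matrix to every channel (broadcasting is an assumption)."""
    return [[[c + o for c, o in zip(c_row, o_row)]
             for c_row, o_row in zip(channel, original)]
            for channel in channels]

def fully_connected(channels, fc_weights, fc_bias):
    """Flatten each station's (channel, time) features and project them to one value."""
    num_stations = len(channels[0])
    predictions = []
    for s in range(num_stations):
        flat = [v for channel in channels for v in channel[s]]
        predictions.append(sum(f * w for f, w in zip(flat, fc_weights)) + fc_bias)
    return predictions

T = [[1.0, 2.0], [3.0, 4.0]]   # time-domain information matrix, 2 stations x 2 steps
X = [[1.0, 0.0], [0.0, 1.0]]   # historical demand feature matrix, same shape
raised = conv1x1(T, weights=[0.5] * 8, biases=[0.0] * 8)   # 1 channel -> 8 channels
merged = add_residual(raised, X)
pred = fully_connected(merged, fc_weights=[1.0 / 16] * 16, fc_bias=0.0)
print(pred)  # one predicted demand value per station
```

The residual term gives the prediction a direct path from the raw demand history, so the convolution only has to learn a correction on top of it.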

FIG. 2 shows the data processing flow through the networks. The historical bicycle demand matrix of each station and the adjacency matrix between stations are input into the graph convolutional neural network, which outputs a feature matrix containing neighbor site demand information. That feature matrix is input into the deep self-attention network, which includes a multi-head attention mechanism layer and a feedforward neural network layer, and the shared bicycle demand time-domain information matrix is extracted through multi-head attention and the feedforward neural network. The time-domain information matrix is then input into two 1×1 convolutional neural networks that raise its dimension, the dimension-raised matrix is connected with the historical demand feature matrix through a residual structure, and the predicted demand of each shared bicycle station in the next time period is output through a fully connected layer.

The shared bicycle demand prediction method based on the combined GCN-Transformer model is run with PyTorch, a Python-based deep learning framework. The data set consists of shared bicycle travel order data from Xiamen, Fujian Province, with 9336117 records in total, covering the trajectories of 310867 bicycles. In the experimental comparison between the baseline models and the model proposed by the invention, the station demand after 15, 30, 45 and 60 minutes was predicted. Tables 1 and 2 show the results of comparing nine baseline models (SVR, XGBoost, LSTM, CNN-1d, CNN-2d, ResNet, GCN, GCN-LSTM and Transformer) with the proposed GCN-Transformer shared bicycle demand prediction method.

TABLE 1

TABLE 2

Tables 1 and 2 compare the models on predicting the demand of each station. In Table 1, which uses MAE as the loss metric, the GCN-Transformer model has the smallest loss. Similarly, in Table 2, which uses RMSE as the loss metric, the loss is smallest when the bicycle demand of each station is predicted with the GCN-Transformer model. Tables 1 and 2 therefore show that the GCN-Transformer model predicts the bicycle demand at each station with higher accuracy.
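For reference, the two metrics used in Tables 1 and 2 can be computed as in the sketch below. The demand values are made up for illustration and are not taken from the experiments.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; penalises large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = [3.0, 0.0, 5.0, 2.0]   # hypothetical observed demand per station
y_pred = [2.0, 1.0, 5.0, 4.0]   # hypothetical predicted demand per station
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

RMSE is never smaller than MAE on the same data, so reporting both shows whether a model's errors are evenly spread or dominated by a few large misses.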

Referring to fig. 4, the shared bicycle demand prediction system provided by the invention includes a first acquisition module 401, a second acquisition module 402, an extraction module 403, and a prediction output module 404, where:

a first obtaining module 401, configured to obtain historical demand data of each shared bicycle site, splice the historical demand data to obtain a historical demand feature matrix, and obtain, according to historical order data, a generated adjacency matrix representing a site adjacency relationship;

a second obtaining module 402, configured to input the historical demand feature matrix and the adjacency matrix into a graph convolutional neural network, so as to obtain the spatial and topological features of shared bicycle demand in each time period and obtain a feature matrix containing neighbor site demand information;

the extracting module 403 is configured to input the feature matrix containing the neighbor site demand information into a deep self-attention network, and extract a shared bicycle demand time domain information matrix through multi-head attention calculation and a feedforward neural network;

the prediction output module 404 is configured to input the time domain information matrix of the shared bicycle demand into a convolutional neural network, raise the dimension of the time domain information matrix of the shared bicycle demand, connect the dimension-raised time domain information matrix with the feature matrix of the historical demand through a residual structure, and output the predicted value of the demand of each shared bicycle station in the next time period through a full connection layer.

It may be understood that the shared bicycle demand prediction system provided by the present invention corresponds to the shared bicycle demand prediction method provided in the foregoing embodiments, and the relevant technical features of the shared bicycle demand prediction system may refer to the relevant technical features of the shared bicycle demand prediction method, which are not described herein.

Referring to fig. 5, fig. 5 is a schematic diagram of an embodiment of an electronic device according to an embodiment of the invention. As shown in fig. 5, an embodiment of the present invention provides an electronic device 500, including a memory 510, a processor 520, and a computer program 511 stored in the memory 510 and executable on the processor 520, wherein the processor 520 executes the computer program 511 to implement the steps of the method for predicting the demand of a shared bicycle.

Referring to fig. 6, fig. 6 is a schematic diagram of an embodiment of a computer readable storage medium according to the present invention. As shown in fig. 6, the present embodiment provides a computer-readable storage medium 600 on which a computer program 611 is stored, which computer program 611, when executed by a processor, implements the steps of the shared bicycle demand prediction method.

According to the shared bicycle demand prediction method and system, the multi-head attention mechanism of the deep self-attention network better learns the features of interest in the time-domain and spatial-domain features, so that the accuracy of demand prediction is improved to a certain extent and the problem of short-term shared bicycle demand prediction is better solved.

In the foregoing embodiments, the descriptions of the embodiments are focused on, and for those portions of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A method for predicting demand of a shared bicycle, comprising:

inputting the shared bicycle demand time domain information matrix into a convolutional neural network to raise the dimension of the matrix, connecting the dimension-raised shared bicycle demand time domain information matrix with the historical demand feature matrix through a residual structure, and outputting the predicted demand value of each shared bicycle station in the next time period through a fully connected layer;
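The output stage described in this step can be pictured with a minimal NumPy sketch. It rests on several assumptions that the claim does not fix: the convolution is taken to be a kernel-size-1 convolution (equivalent to a per-station linear map), the raised dimension is taken to equal the number of time steps M so that the residual addition with the historical matrix is shape-compatible, and the names `output_head`, `w_conv`, `b_conv`, `w_fc` and `b_fc` are hypothetical.

```python
import numpy as np

def output_head(T, X, w_conv, b_conv, w_fc, b_fc):
    """Sketch of the output stage (shapes are assumptions).

    T: (N, C) time-domain information matrix, one row per station.
    X: (N, M) historical demand feature matrix.
    w_conv: (C, M), b_conv: (M,)  kernel-size-1 convolution weights.
    w_fc: (M,), b_fc: scalar      fully connected output layer.
    """
    # Kernel-size-1 convolution: raises each station's C features to M channels.
    raised = T @ w_conv + b_conv
    # Residual structure: add the historical demand features back in.
    fused = raised + X
    # Fully connected layer: one predicted demand value per station.
    return fused @ w_fc + b_fc

# Usage with random placeholder weights (not trained values).
rng = np.random.default_rng(0)
N, C, M = 5, 3, 12
pred = output_head(rng.normal(size=(N, C)), rng.normal(size=(N, M)),
                   rng.normal(size=(C, M)), np.zeros(M),
                   rng.normal(size=M), 0.0)
print(pred.shape)
```

The residual addition is the reason the dimension has to be raised first: the two matrices can only be summed when their shapes agree.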

the historical demand feature matrix is expressed as $X^{N \times M}$, wherein N represents the number of shared bicycle stations and M represents the number of time steps, the time steps being divided according to the time-step length;

generating the adjacency matrix representing the station adjacency relation according to the historical order data comprises: according to the final position and the initial position in each piece of historical order data, calculating the station at which the initial position of each track is located, and generating an adjacency matrix A representing the adjacency relation of the stations;

the deep self-attention network comprises a plurality of coding layers connected in series, each coding layer comprising a multi-head attention mechanism layer and a feedforward neural network layer; inputting the feature matrix containing neighbor station demand information into the deep self-attention network and extracting the shared bicycle demand time domain information matrix through the multi-head attention mechanism calculation and the feedforward neural network comprises:

dividing the feature matrix containing the neighbor station demand information into station shared bicycle demand vectors according to time steps, wherein each station shared bicycle demand vector represents the shared bicycle demand of every station in the corresponding time period, and the length of each vector is the number of stations;

inputting the station shared bicycle demand vectors into the plurality of coding layers connected in series, and performing a multi-head attention operation through the multi-head attention mechanism layer in each coding layer to obtain a corresponding operation result;

based on the operation results of the plurality of multi-head attention mechanism layers, calculating through the feedforward neural network layer to obtain the shared bicycle demand time domain information matrix;

the attention operation is expressed as:

$$\text{Attention output} = \text{Attention}(Q, K, V),$$

wherein Q, K and V respectively represent the query, key and value in the multi-head attention mechanism layer; the weights applied to the values V are calculated from the query Q and the key K, and the weighted sum of the values V is then computed:

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

wherein $d_k$ represents the dimension of the key; in the multi-head attention mechanism, the i-th head is calculated as follows:

$$\text{head}_i = \text{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$

wherein $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$ respectively represent the linear transformations of Q, K and V in the i-th head of the multi-head attention mechanism layer; the output of the multi-head attention mechanism layer is:

$$\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \ldots, \text{head}_n)\,W^{O}$$

wherein $W^{O}$ represents the weight of the projection linear layer and n is the number of heads in the multi-head attention mechanism; obtaining the shared bicycle demand time domain information matrix through the feedforward neural network layer, based on the operation results of the plurality of multi-head attention mechanism layers, comprises:

taking the output of the multi-head attention mechanism layer as the input of the feedforward neural network layer to obtain a shared bicycle demand feature matrix of each station after nonlinear transformation in different time periods;

the formula of the feedforward neural network layer is expressed as:

$$\text{FFN}(x) = \text{ELU}(xW_1 + b_1)W_2 + b_2;$$

the ELU activation function is defined as:

$$\text{ELU}(x) = \begin{cases} x, & x > 0 \\ a\,(e^{x} - 1), & x \le 0 \end{cases}$$

wherein $W_1$, $W_2$ and $b_1$, $b_2$ are respectively the weights and biases of the two linear layers in the feedforward neural network, and a is an adjustable constant;
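For illustration, one coding layer as recited above (multi-head attention followed by an ELU feedforward network) might be sketched in NumPy as follows. This is a sketch under stated assumptions, not the patented implementation: the attention function is assumed to take the standard scaled dot-product form softmax(QK^T / sqrt(d_k)) V, consistent with the d_k term named in the claim; ELU is assumed to have its usual definition (x for x > 0, a(e^x - 1) otherwise); and all function names, shapes and the random weights are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Assumed scaled dot-product form: weights from Q and K, then weighted sum of V.
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V

def multi_head(Q, K, V, W_q, W_k, W_v, W_o):
    # head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V); heads are concatenated and projected by W^O.
    heads = [attention(Q @ wq, K @ wk, V @ wv)
             for wq, wk, wv in zip(W_q, W_k, W_v)]
    return np.concatenate(heads, axis=-1) @ W_o

def elu(x, a=1.0):
    # np.minimum keeps exp() from overflowing on large positive inputs.
    return np.where(x > 0, x, a * (np.exp(np.minimum(x, 0.0)) - 1.0))

def ffn(x, W1, b1, W2, b2, a=1.0):
    # FFN(x) = ELU(x W1 + b1) W2 + b2
    return elu(x @ W1 + b1, a) @ W2 + b2

# Usage: M = 12 time-step vectors, each of length N = 8 stations, n = 4 heads.
rng = np.random.default_rng(0)
M, N, n, d_head, d_ff = 12, 8, 4, 2, 16
x = rng.normal(size=(M, N))
W_q = [rng.normal(size=(N, d_head)) for _ in range(n)]
W_k = [rng.normal(size=(N, d_head)) for _ in range(n)]
W_v = [rng.normal(size=(N, d_head)) for _ in range(n)]
W_o = rng.normal(size=(n * d_head, N))
attn_out = multi_head(x, x, x, W_q, W_k, W_v, W_o)   # self-attention: Q = K = V = x
layer_out = ffn(attn_out, rng.normal(size=(N, d_ff)), np.zeros(d_ff),
                rng.normal(size=(d_ff, N)), np.zeros(N))
print(layer_out.shape)
```

Each row of `x` plays the role of one station shared bicycle demand vector, so attention is computed across time steps while every vector carries the demand of all stations.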

the stations and their relationships are expressed by a graph structure defined as $\mathcal{G} = (\mathcal{V}, \mathcal{X}, \mathcal{E})$, where $\mathcal{V}$ represents a vertex set of size N, each vertex representing a station; $\mathcal{X} \in \mathbb{R}^{N}$ is the vector representation of the vertices, the value at each vertex representing the shared bicycle demand of a single station; $\mathcal{E}$ represents the set of edges, the existence of an edge indicating that two stations are connected through orders and that their bicycle demand values are related to each other; $A \in \mathbb{R}^{N \times N}$ represents the adjacency matrix, whose elements are only 0 and 1 and indicate whether an edge exists between two stations, the specific values being set manually after the adjacency relationships of the global edges are analyzed; and $A_{ij}$ represents the connection between vertex i and vertex j.
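As a rough illustration of how a 0/1 adjacency matrix could be derived from order data, the NumPy sketch below assigns each order's initial and final position to its nearest station and marks two stations as adjacent when enough trips connect them. Several parts are assumptions rather than claim language: the nearest-station rule, the undirected (symmetric) treatment of trips, and the `min_trips` threshold, which merely stands in for the manually set values mentioned in the claim. The function names are hypothetical.

```python
import numpy as np

def nearest_station(points, stations):
    # points: (P, 2) coordinates, stations: (N, 2) coordinates.
    # Returns, for each point, the index of the closest station.
    d = np.linalg.norm(points[:, None, :] - stations[None, :, :], axis=-1)
    return d.argmin(axis=1)

def build_adjacency(starts, ends, stations, min_trips=1):
    n = len(stations)
    s = nearest_station(starts, stations)
    e = nearest_station(ends, stations)
    counts = np.zeros((n, n), dtype=int)
    # Accumulate one trip per order; np.add.at handles repeated index pairs.
    np.add.at(counts, (s, e), 1)
    # Assumption: treat trips as undirected, so i->j and j->i count together.
    counts = counts + counts.T
    A = (counts >= min_trips).astype(int)
    # No self-loops in this sketch.
    np.fill_diagonal(A, 0)
    return A

# Usage with three made-up stations and two made-up orders.
stations = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
starts = np.array([[0.1, 0.2], [0.0, 9.5]])
ends = np.array([[9.8, 0.1], [0.3, 0.1]])
print(build_adjacency(starts, ends, stations))
```

Raising `min_trips` prunes weakly connected station pairs, which is one simple way to mimic a manually chosen cut-off.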

2. The method of claim 1, wherein the parameters of the deep self-attention network include a time slot length, the number of heads in the multi-head attention mechanism, and a batch size, wherein the number of heads in each multi-head attention mechanism layer is the same;

and for the plurality of parameters of the deep self-attention network, one parameter is varied while the other parameters are kept unchanged, the value of that parameter at which the loss function is minimal is taken as its optimal value, and the plurality of optimal parameters of the deep self-attention network are thereby obtained.
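The one-parameter-at-a-time selection recited in this claim can be sketched as below. The sketch assumes that the parameters not being varied stay at fixed baseline values (the claim only says they are kept unchanged), and it replaces the real training loss with a toy quadratic function; `one_at_a_time_search`, `toy_loss`, the candidate grids and the baseline values are all hypothetical.

```python
def one_at_a_time_search(loss_fn, grid, baseline):
    """Vary one parameter over its candidates while the others stay at baseline."""
    best = {}
    for name, candidates in grid.items():
        def loss_for(value, name=name):
            trial = dict(baseline)   # other parameters unchanged
            trial[name] = value      # only this parameter varies
            return loss_fn(**trial)
        # Keep the candidate value that gives the smallest loss.
        best[name] = min(candidates, key=loss_for)
    return best

def toy_loss(slot_len, heads, batch_size):
    # Toy stand-in for a validation loss; NOT a trained model.
    return (slot_len - 12) ** 2 + (heads - 4) ** 2 + (batch_size - 4) ** 2

grid = {"slot_len": [6, 12, 24], "heads": [1, 2, 4, 8], "batch_size": [2, 4, 8, 16]}
baseline = {"slot_len": 6, "heads": 1, "batch_size": 2}
print(one_at_a_time_search(toy_loss, grid, baseline))
```

In practice `loss_fn` would train and evaluate the network for each trial, so the cost grows with the sum of the grid sizes rather than their product, which is the main appeal of this scheme over a full grid search.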

3. The method for predicting demand for a shared bicycle according to claim 2, wherein the time slot length is 12, the number of heads in the multi-head attention mechanism is 4, and the batch size is 4.

4. A shared bicycle demand prediction system, comprising:

the prediction output module is configured to input the shared bicycle demand time domain information matrix into a convolutional neural network to raise the dimension of the matrix, connect the dimension-raised shared bicycle demand time domain information matrix with the historical demand feature matrix through a residual structure, and then output the predicted demand value of each shared bicycle station in the next time period through a fully connected layer;

the historical demand feature matrix is expressed as $X^{N \times M}$, wherein N represents the number of shared bicycle stations and M represents the number of time steps, the time steps being divided according to the time-step length;

generating the adjacency matrix representing the station adjacency relation according to the historical order data comprises: according to the final position and the initial position in each piece of historical order data, calculating the station at which the initial position of each track is located, and generating an adjacency matrix A representing the adjacency relation of the stations;

the deep self-attention network comprises a plurality of coding layers connected in series, each coding layer comprising a multi-head attention mechanism layer and a feedforward neural network layer; inputting the feature matrix containing neighbor station demand information into the deep self-attention network and extracting the shared bicycle demand time domain information matrix through the multi-head attention mechanism calculation and the feedforward neural network comprises:

dividing the feature matrix containing the neighbor station demand information into station shared bicycle demand vectors according to time steps, wherein each station shared bicycle demand vector represents the shared bicycle demand of every station in the corresponding time period, and the length of each vector is the number of stations;

inputting the station shared bicycle demand vectors into the plurality of coding layers connected in series, and performing a multi-head attention operation through the multi-head attention mechanism layer in each coding layer to obtain a corresponding operation result;

based on the operation results of the plurality of multi-head attention mechanism layers, calculating through the feedforward neural network layer to obtain the shared bicycle demand time domain information matrix;

$\text{Attention output} = \text{Attention}(Q, K, V)$, wherein Q, K and V respectively represent the query, key and value in the multi-head attention mechanism layer; the weights applied to the values V are calculated from the query Q and the key K, and the weighted sum of the values V is then computed;

$$\text{head}_i = \text{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V});$$

wherein $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$ respectively represent the linear transformations of Q, K and V in the i-th head of the multi-head attention mechanism layer; the output of the multi-head attention mechanism layer is:

$$\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \ldots, \text{head}_n)\,W^{O}$$

wherein $W^{O}$ represents the weight of the projection linear layer and n is the number of heads in the multi-head attention mechanism; obtaining the shared bicycle demand time domain information matrix through the feedforward neural network layer, based on the operation results of the plurality of multi-head attention mechanism layers, comprises:

the formula of the feedforward neural network layer is expressed as:

$$\text{FFN}(x) = \text{ELU}(xW_1 + b_1)W_2 + b_2,$$

the ELU activation function is defined as:

$$\text{ELU}(x) = \begin{cases} x, & x > 0 \\ a\,(e^{x} - 1), & x \le 0 \end{cases}$$

5. A computer-readable storage medium, having stored thereon a computer management class program which, when executed by a processor, implements the steps of the shared bicycle demand prediction method as claimed in any one of claims 1 to 3.