CN113052313A - Mass traffic data knowledge mining and parallel processing method - Google Patents
- Publication number
- CN113052313A (application CN202110456757.1A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- compression
- data
- training
- flight
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention discloses a method for knowledge mining and parallel processing of mass traffic data. An LSTM model is stored on each of the N computing servers of a distributed cluster; the training data set is divided into N sets that are input to the N computing servers respectively, and the servers train simultaneously, which reduces the training time of the LSTM model. After a computing server completes one forward and one backward pass, its parameter matrix is transmitted to the parameter server and decomposed, by a matrix decomposition method, into the product of two matrices, which reduces the total number of parameters and hence the communication time. A matrix compression rate is then set and the matrices are compressed with an adaptive threshold filtering method, reducing the parameter count, and the communication time, once more. Error matrices between the pre- and post-compression matrices are computed and carried into the next training round; at the start of that round, the parameter matrix and the error matrices are combined to recover an error-corrected matrix before training continues, which compensates for the error introduced by matrix compression and preserves the accuracy of the LSTM model.
Description
Technical Field
The invention relates to the technical field of traffic big data, in particular to a method for mining and parallel processing mass traffic data knowledge.
Background
As the informatization level of China's transportation systems keeps improving, massive traffic big data is continuously generated; such data is multi-source, heterogeneous and of low knowledge density. How to analyze and exploit this traffic big data and mine the knowledge it contains, so as to guide traffic operations, is a pressing problem in urgent need of a solution.
Taking China's civil aviation industry as an example: with its rapid development, airport passenger traffic increases year by year. According to statistics of the Civil Aviation Administration of China, national passenger airlines operated 4.6111 million flights in 2019, of which 3.7652 million were on time, an average flight punctuality rate of 81.65%; the major airlines operated 3.3047 million flights, of which 2.6911 million were on time, an average punctuality rate of 81.43%. The growing number of flights keeps raising the likelihood of congestion and conflict in China's airspace. An algorithm that can accurately and quickly predict a flight's trajectory over a coming period would let a controller or pilot recognize danger before congestion or a conflict occurs and avoid it in time, greatly reducing risk, ensuring flight safety and protecting the lives and property of passengers.
Although scholars at home and abroad have made considerable progress in flight trajectory prediction through knowledge mining of aviation big data, model training still takes far too long to meet the requirement of rapidity. This slow training makes it impossible to use flight big data to build an accurate and efficient online incremental trajectory prediction model, which leaves the practical availability of such prediction models low. Therefore, to fully exploit the knowledge mined from flight big data and achieve accurate, online, incremental trajectory prediction, the training time of the prediction model must be reduced substantially; doing so is key to making trajectory prediction models practical.
Disclosure of Invention
In view of the above, the invention provides a method for knowledge mining and parallel processing of mass traffic data, which overcomes the drawbacks that existing large neural networks for predicting flight track points train too slowly and cannot effectively perform online incremental prediction.
The invention provides a method for mining and parallel processing mass traffic data knowledge, which comprises the following steps:
s1: acquiring ADS-B data of a flight to be detected in real time, extracting navigation data of six fields in total, namely flight number, position longitude, position latitude, flight altitude, course angle and track point data updating time, from the ADS-B data, and deleting the rest fields;
s2: selecting navigation data of six continuous time slices from the extracted data, inputting the navigation data into an input layer of a flight track prediction model trained in advance, and outputting a prediction result of the navigation data of the flight to be tested through forward propagation;
the training process of the flight path prediction model comprises the following steps:
s11: designing a flight path prediction model, wherein model parameters are designed as follows: the number of nodes of an input layer is set to be 6, the number of nodes of an output layer is set to be 1, the prediction time step length is set to be 6, the number of the hidden layers is set to be 1, the number of nodes of the hidden layers is set to be 60, the training round number is set to be 50, the ReLU is used as an activation function, the cross entropy function is used as a loss function, and the random gradient descent method is used as an optimization method of the loss function;
s12: setting any one server as a parameter server, using the rest N servers as N computing servers, placing all the servers in a local area network to ensure smooth communication among the servers, setting all the servers as a distributed training cluster, copying N copies of a designed flight path prediction model to be respectively deployed in the N computing servers, wherein one flight path prediction model corresponds to one computing server; wherein N is a positive integer;
s13: acquiring historical ADS-B data of a plurality of flights, and extracting flight data of six fields in total, namely flight number, position longitude, position latitude, flight altitude, course angle and track point data updating time from the historical ADS-B data to serve as training data;
s14: averagely dividing the training data into N sets as the training data sets of the N computing servers, one set per computing server, with data belonging to the same flight placed in the same subset within each set; sorting the track points in each subset by the track point data update time field; treating any two track points of the same flight whose time interval exceeds a time threshold as the end point of one track sequence and the start point of the next, thereby dividing all track points of a flight into several different track sequences; and constructing each track point of a track sequence into an input vector x = (callsign, longitude, latitude, height, angle, time), wherein callsign represents the flight number, longitude the position longitude, latitude the position latitude, height the flight altitude, angle the course angle, and time the track point data update time;
s15: starting data input on the N calculation servers simultaneously; in each subset of each set, inputting the input vectors of six consecutive time slices belonging to the same track sequence into the six input-layer nodes of the corresponding calculation server for forward propagation, the input vector of one time slice being input into one node; outputting a prediction of the navigation data of the seventh time slice; calculating the error between the prediction and the actual navigation data and propagating it backwards, thereby obtaining on each of the N calculation servers a parameter matrix W_t after one forward and one backward pass, wherein the subscript t indicates that the parameter matrix was obtained in the t-th training iteration;
s16: hypothesis parameter matrixIs oneThe parameter matrix is decomposed by matrix decompositionDecomposed into three matrices、Andwherein, the matrixIs oneOf a matrix, a matrixIs oneOf a matrix, a matrixIs oneDiagonal moment ofMatrix, matrixHaving non-negative real singular values in descending order, R representing a parameter matrixThe rank of (d); order to,,Matrix ofIs oneOf a matrix, a matrixOne isA matrix of (a);
s17: determining a training pair matrix for each iterationAndaccording to the compression ratio, determining each iteration training pair matrix in a self-adaptive wayAndcompression threshold of, rootAccording to the compression rate and the compression threshold, respectively corresponding to the matrixAndthe parameter number of the first compression matrix is compressed to obtain a first compression matrix and a second compression matrix, and the first compression matrix and the matrix are compressedSubtracting the corresponding parameters to obtain a first error matrix, and compressing the second compression matrix with the first error matrixSubtracting the corresponding parameters to obtain a second error matrix, and inputting the first compression matrix, the second compression matrix, the first error matrix and the second error matrix into the parameter server;
s18: averaging first compression matrixes of N calculation servers to obtain a first average compression matrix, averaging second compression matrixes of N calculation servers to obtain a second average compression matrix, averaging first error matrixes of N calculation servers to obtain a first average error matrix, averaging second error matrixes of N calculation servers to obtain a second average error matrix, and transmitting the first average compression matrix, the second average compression matrix, the first average error matrix and the second average error matrix back to the N calculation servers;
s19: in N calculation servers, a first average compression matrix and a first average error matrix are used for compensating errors generated by the first compression matrix to obtain a first compensation matrix, a second average compression matrix and a second average error matrix are used for compensating errors generated by the second compression matrix to obtain a second compensation matrix, the first compensation matrix and the second compensation matrix are multiplied to obtain an optimized parameter matrix which is used as a new parameter matrix for next iterative training, the step S15 is returned, forward propagation is carried out again by using the new parameter matrix, a new prediction result is output, errors between the new prediction result and actual navigation data are calculated, errors between the new prediction result and the actual navigation data are evaluated by using a loss function, and the training is stopped when the loss function reaches the minimum value or an overfitting phenomenon occurs in the training process.
In a possible implementation manner, in the method for mining and processing mass traffic data knowledge provided by the present invention, in step S17, compressing the numbers of parameters of the matrices P and Q according to the compression rate and the compression threshold to obtain the first compression matrix and the second compression matrix specifically comprises:
will matrixSetting elements with the values smaller than the compression threshold value corresponding to the row in the vectors of the middle rows as 0, keeping other elements unchanged, obtaining a filter operator Mask matrix, and calculating a first compression matrix and a second compression matrix according to the following formulas:
wherein the content of the first and second substances,representing a compression matrix, p =1, 2,representing the multiplication of corresponding elements of two isotypic matrices.
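A minimal sketch of this row-wise threshold filtering. Deriving the per-row threshold from the compression rate by keeping the largest-magnitude entries, and comparing absolute values, are our assumptions — the patent says the threshold is adaptive but does not give its exact rule:

```python
import numpy as np

def compress_by_row_threshold(M, compression_rate):
    """Row-wise threshold filtering sketch for step S17: in every row, keep
    roughly a `compression_rate` fraction of the entries (the largest in
    magnitude) and zero the rest.  The per-row threshold rule and the use
    of absolute values are our assumptions."""
    M = np.asarray(M, dtype=float)
    keep = max(1, int(round(M.shape[1] * compression_rate)))
    mask = np.zeros_like(M)                      # filter-operator Mask matrix
    for i, row in enumerate(M):
        thresh = np.sort(np.abs(row))[-keep]     # keep-th largest magnitude
        mask[i, np.abs(row) >= thresh] = 1.0
    compressed = mask * M                        # element-wise product with the Mask
    error = M - compressed                       # error matrix for the next round
    return compressed, error

C, E = compress_by_row_threshold([[1.0, -4.0, 0.5], [2.0, 0.1, -3.0]], 1 / 3)
print(C)  # only -4.0 (row 0) and -3.0 (row 1) survive
```

Note that `compressed + error` reproduces the original matrix exactly, which is what makes the later error compensation possible.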
In a possible implementation manner, in the method for mining and parallel processing of mass traffic data knowledge provided by the invention, in step S17, the compression threshold is updated after every L iterations of training, where L ranges from 1000 to 1500.
The method for mining and parallel processing of mass traffic data knowledge provided by the invention stores the designed LSTM model on every computing server of the distributed cluster; when training begins, the massive training data set is divided into as many sets as there are computing servers, different sets are input to different computing servers, and all servers train simultaneously, which greatly reduces the overall training time of the LSTM model. After a computing server completes one forward and one backward pass, the parameter matrix to be transmitted to the parameter server is decomposed, by a matrix decomposition method, into the product of two matrices whose total number of parameters is far smaller than that of the original parameter matrix, so the number of parameters each computing server must transmit, and hence the communication time, drops sharply. On top of the two factor matrices, a matrix compression rate is set; an adaptive threshold filtering method determines a filtering threshold within each row vector of the matrices, yielding the corresponding filter-operator Mask matrices, and multiplying these Mask matrices with the factor matrices produces the compression matrices, cutting the overall parameter count, and the communication time, substantially once more. Compressing the parameter matrix inevitably introduces some computation error; if it is not eliminated, this error accumulates over many training rounds and degrades the final model accuracy.
The invention computes the error vector for each row vector of the matrix before and after compression, and passes the error matrix these vectors form to the next training round of each computing server. When the next round begins, the parameter matrix obtained from the parameter server and the error matrix are first combined to recover an error-corrected parameter matrix, and training then proceeds; this compensates well for the error caused by matrix compression and keeps the accuracy of the final model essentially unchanged.
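If the error matrices are formed as original minus compressed, the compensation just described reduces to adding each average error matrix back to its average compression matrix before multiplying the two factors. A sketch under that assumption, with a lossless toy split so the recovery is exact:

```python
import numpy as np

def compensate_and_rebuild(C1, E1, C2, E2):
    """Recover an error-corrected parameter matrix at the start of the next
    round: add each average error matrix back to its average compression
    matrix (first/second compensation matrices), then multiply the two.
    Assumes the error matrices were formed as original minus compressed."""
    return (C1 + E1) @ (C2 + E2)

P = np.array([[1.0, 2.0], [3.0, 4.0]])
Q = np.eye(2)
C1, E1 = np.tril(P), P - np.tril(P)   # toy split of P into compressed + error
C2, E2 = Q, np.zeros_like(Q)
W_next = compensate_and_rebuild(C1, E1, C2, E2)
print(np.allclose(W_next, P @ Q))  # True: the compensation is exact here
```

In a real run the averaged matrices would come back from the parameter server, and the recovered matrix seeds the next forward pass of step S15.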
Drawings
Fig. 1 is a flowchart of a method for mining and processing mass traffic data knowledge in parallel according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only illustrative and are not intended to limit the present invention.
The invention provides a method for mining and parallel processing mass traffic data knowledge, which comprises the following steps:
s1: acquiring ADS-B data of a flight to be detected in real time, extracting navigation data of six fields in total of flight number, position longitude, position latitude, flight altitude, course angle and track point data updating time from the ADS-B data, and deleting the rest fields;
s2: selecting navigation data of six continuous time slices from the extracted data, inputting the navigation data into an input layer of a flight track prediction model trained in advance, and outputting a prediction result of the navigation data of the flight to be tested through forward propagation;
the training process of the flight path prediction model comprises the following steps:
s11: designing a flight path prediction model, wherein model parameters are designed as follows: the number of nodes of an input layer is set to be 6, the number of nodes of an output layer is set to be 1, the prediction time step length is set to be 6, the number of the hidden layers is set to be 1, the number of nodes of the hidden layers is set to be 60, the training round number is set to be 50, the ReLU is used as an activation function, the cross entropy function is used as a loss function, and the random gradient descent method is used as an optimization method of the loss function;
s12: setting any one server as a parameter server, using the rest N servers as N computing servers, placing all the servers in a local area network to ensure smooth communication among the servers, setting all the servers as a distributed training cluster, copying N copies of a designed flight path prediction model to be respectively deployed in the N computing servers, wherein one flight path prediction model corresponds to one computing server; wherein N is a positive integer;
s13: acquiring historical ADS-B data of a plurality of flights, and extracting flight data of six fields in total, namely flight number, position longitude, position latitude, flight altitude, course angle and track point data updating time, from the historical ADS-B data to serve as training data;
s14: the training data is divided equally into N sets,as training data sets of N computing servers, one set corresponds to one computing server, and data belonging to the same flight are divided into the same subset in each set; sequencing track point data update time fields belonging to the same subset, regarding two track points with time intervals larger than a time threshold value in all track points belonging to the same flight as the ending point and the starting point of two track sequences, dividing all track points belonging to the same flight into a plurality of different track sequences, and constructing the track points in each track sequence into input vectorsWherein callsign represents flight number, longitude represents position longitude, latitude represents position latitude, height represents flight altitude, angle represents course angle, and time represents track point data updating time;
s15: starting data input on the N calculation servers simultaneously; in each subset of each set, inputting the input vectors of six consecutive time slices belonging to the same track sequence into the six input-layer nodes of the corresponding calculation server for forward propagation; outputting a prediction of the navigation data of the seventh time slice; calculating the error between the prediction and the actual navigation data and propagating it backwards, thereby obtaining on each of the N calculation servers a parameter matrix W_t after one forward and one backward pass, wherein the subscript t indicates that the parameter matrix was obtained in the t-th training iteration;
s16: hypothesis parameter matrixIs oneThe parameter matrix is decomposed by matrix decompositionDecomposed into three matrices、Andwherein, the matrixIs oneOf a matrix, a matrixIs oneOf a matrix, a matrixIs oneOf a diagonal matrix, matrixHaving non-negative real singular values in descending order, R representing a parameter matrixThe rank of (d); order to,,Matrix ofIs oneOf a matrix, a matrixOne isA matrix of (a);
s17: determining a training pair matrix for each iterationAndaccording to the compression ratio, determining each iteration training pair matrix in a self-adaptive wayAndaccording to the compression ratio and the compression threshold, respectively for the matrixAndthe parameter number of the first compression matrix is compressed to obtain a first compression matrix and a second compression matrix, and the first compression matrix and the matrix are compressedSubtracting the corresponding parameters to obtain a first error matrix, and compressing the second compression matrix with the first error matrixSubtracting the corresponding parameters to obtain a second error matrix, and inputting the first compression matrix, the second compression matrix, the first error matrix and the second error matrix into the parameter server;
s18: averaging first compression matrixes of N calculation servers to obtain a first average compression matrix, averaging second compression matrixes of N calculation servers to obtain a second average compression matrix, averaging first error matrixes of N calculation servers to obtain a first average error matrix, averaging second error matrixes of N calculation servers to obtain a second average error matrix, and transmitting the first average compression matrix, the second average compression matrix, the first average error matrix and the second average error matrix back to the N calculation servers;
s19: in N calculation servers, a first average compression matrix and a first average error matrix are used for compensating errors generated by the first compression matrix to obtain a first compensation matrix, a second average compression matrix and a second average error matrix are used for compensating errors generated by the second compression matrix to obtain a second compensation matrix, the first compensation matrix and the second compensation matrix are multiplied to obtain an optimized parameter matrix which is used as a new parameter matrix for next iterative training, the step S15 is returned, forward propagation is carried out again by using the new parameter matrix, a new prediction result is output, errors between the new prediction result and actual navigation data are calculated, errors between the new prediction result and the actual navigation data are evaluated by using a loss function, and the training is stopped when the loss function reaches the minimum value or an overfitting phenomenon occurs in the training process.
In specific implementation, when step S17 of the method for mining and processing mass traffic data knowledge provided by the present invention is executed, compressing the numbers of parameters of the matrices P and Q according to the compression rate and the compression threshold to obtain the first compression matrix and the second compression matrix can specifically be realized as follows: for each row vector of P and Q, set the elements whose values are smaller than that row's compression threshold to 0 and mark the remaining positions with 1, obtaining the filter-operator Mask matrices, and calculate the first compression matrix and the second compression matrix according to the following formula:

C_p = Mask_p ∘ M_p, p = 1, 2,

wherein M_1 = P and M_2 = Q, C_p represents a compression matrix, and ∘ represents the multiplication of corresponding elements of two same-shaped matrices.
In specific implementation, when step S17 of the method for mining and parallel processing of mass traffic data knowledge provided by the present invention is executed, the compression threshold is updated after every L iterations of training; preferably, L ranges from 1000 to 1500.
The following describes a specific implementation of the knowledge mining and parallel processing method for mass traffic data according to the present invention in detail by using a specific embodiment.
Example 1: the method comprises a model training process and an actual prediction process.
Model training procedure
The first step is as follows: construction of flight path prediction model
The flight path prediction model is designed by LSTM, and the specific parameters of the model are designed as follows: the number of nodes of an input layer is set to be 6, the number of nodes of an output layer is set to be 1, the prediction time step length is set to be 6, the number of the hidden layers is set to be 1, the number of nodes of the hidden layers is set to be 60, the training round is set to be 50, the ReLU is used as an activation function, the cross entropy function is used as a loss function, and the random gradient descent method is used as the optimization method of the loss function.
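As a rough illustration of the forward recurrence such an LSTM performs (not the patent's implementation, which the text does not give), here is a toy NumPy LSTM cell with the stated sizes of 6 input features and 60 hidden units:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMCellSketch:
    """Toy NumPy LSTM cell with the sizes stated above: 6 input features and
    60 hidden units.  Weight layout and random initialization are our
    assumptions; a full model would add the 60 -> 1 output layer and train
    with stochastic gradient descent for the stated 50 rounds."""
    def __init__(self, n_in=6, n_hidden=60, seed=0):
        rng = np.random.default_rng(seed)
        # stacked weights for the input, forget, cell and output gates
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

cell = LSTMCellSketch()
h = c = np.zeros(60)
for _ in range(6):                 # six consecutive time slices, one per step
    h, c = cell.step(np.ones(6), h, c)
print(h.shape)  # (60,)
```

The final hidden state after the six time steps is what the output layer would map to the single predicted node.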
The second step is that: building input data
Acquire historical ADS-B data of a plurality of flights, and extract flight data of six fields in total, namely flight number, position longitude, position latitude, flight altitude, course angle and track point data update time, as training data. Divide all training data evenly into N (N a positive integer) sets to serve as the training data sets of the N computing servers, one set per computing server, with the data of the same flight further placed in the same subset within each set. Then sort the track point data update time fields within each subset, and treat any two track points of the same flight whose time interval exceeds a time threshold as the end point of one track sequence and the start point of the next, so that all track points of one flight are divided into several different track sequences. Construct each track point of a track sequence into an input vector x = (callsign, longitude, latitude, height, angle, time), where callsign represents the flight number, longitude the position longitude, latitude the position latitude, height the flight altitude, angle the heading angle, and time the data update time.
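The segmentation of one flight's track points into track sequences by the update-time gap can be sketched as follows; the 600-second gap threshold is an illustrative value, as the patent does not fix the time threshold:

```python
from datetime import datetime, timedelta

def split_into_sequences(points, gap_seconds=600):
    """Split one flight's trajectory points into separate track sequences
    wherever the update-time gap exceeds a threshold.  Each point is
    (callsign, longitude, latitude, height, angle, time); the 600 s
    threshold is illustrative, not from the patent."""
    points = sorted(points, key=lambda p: p[5])
    sequences, current = [], [points[0]]
    for prev, cur in zip(points, points[1:]):
        if (cur[5] - prev[5]).total_seconds() > gap_seconds:
            sequences.append(current)      # the gap ends one sequence ...
            current = []                   # ... and starts the next
        current.append(cur)
    sequences.append(current)
    return sequences

t0 = datetime(2019, 1, 1)
pts = [("CA123", 116.4, 39.9, 9000.0, 270.0, t0 + timedelta(seconds=s))
       for s in (0, 60, 120, 2000, 2060)]
seqs = split_into_sequences(pts)
print([len(s) for s in seqs])  # [3, 2]
```

Each resulting sequence then yields sliding windows of six consecutive time slices as model inputs.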
The training data set is divided into as many sets as there are computing servers, different sets are input into different computing servers, and all servers train simultaneously, which greatly reduces the overall training time of the LSTM model.
The third step: inputting training data into flight path prediction model
Any one server is set as the parameter server and the remaining N servers serve as the N computing servers; all servers are placed in one local area network to ensure smooth communication among them. All servers are configured as a distributed training cluster, and N copies of the designed flight path prediction model (LSTM) are deployed to the N computing servers respectively, one flight path prediction model per computing server.
Data input is started in the N computing servers simultaneously. The input vectors of six consecutive time slices belonging to the same track sequence in each subset of each set are fed into the six input-layer nodes of the corresponding computing server for forward propagation, the input vector of one time slice corresponding to one node; the prediction result for the navigation data of the seventh time slice is output, the error between this prediction result and the actual navigation data is calculated, and the error is propagated backwards, yielding the parameter matrix W_t of each of the N computing servers after one forward propagation and one backward propagation, where the subscript t indicates that the parameter matrix is obtained in the t-th iteration of training.
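Feeding six consecutive time slices and predicting the seventh amounts to building sliding windows over each track sequence; a minimal sketch (the function name and the stride of one slice per window are assumptions):

```python
def make_training_windows(sequence, step=6):
    """From one track sequence (a list of per-time-slice input vectors),
    build (input window, target) pairs: `step` consecutive slices as the
    model input, the following slice as the prediction target."""
    return [
        (sequence[i:i + step], sequence[i + step])
        for i in range(len(sequence) - step)
    ]
```

Each pair's window goes to the six input-layer nodes, and the target is the seventh-slice navigation data used to compute the training error.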
The fourth step: decomposing the parameter matrix

Assume the parameter matrix W_t is an m×n matrix. Because W_t contains many parameters, it is decomposed by matrix decomposition (singular value decomposition) into three matrices U, Σ and V, where U is an m×R matrix, Σ is an R×R diagonal matrix whose non-negative real singular values are arranged in descending order, V is an n×R matrix, and R represents the rank of the parameter matrix W_t. The decomposition of the entire parameter matrix W_t can be written as:

W_t = U Σ V^T (1)
As expressed by formula (1), the parameter matrix W_t can be written as the product of two smaller matrices P_t and Q_t, i.e. W_t = P_t Q_t, where P_t is equal to UΣ and Q_t is equal to V^T. The original parameter matrix W_t contains m×n parameters, while the two decomposed matrices contain far fewer: P_t is an m×R matrix with m×R parameters, Q_t is an R×n matrix with R×n parameters, and their total is (m+n)×R parameters. If R is much smaller than the minimum of m and n, the total number of parameters of the two decomposed matrices is certainly much smaller than the number of parameters in the original parameter matrix, i.e. (m+n)×R << m×n; therefore the number of parameters each computing server must transmit is greatly reduced, and the time consumed by communication is reduced accordingly. In the actual training process, testing showed that for the original parameter matrix W_t the product m×n is about 5,000,000 and the rank R is 64, so the matrix decomposition method reduces the roughly 5,000,000 parameters of the original parameter matrix to about 300,000, greatly reducing the number of parameters that must be transmitted.
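The decomposition W_t = UΣV^T and the re-grouping into P_t = UΣ and Q_t = V^T can be illustrated with NumPy (a sketch only; the toy sizes m = 40, n = 60, R = 5 are illustrative, not the patent's values):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, R = 40, 60, 5
# Build an m x n parameter matrix of rank R.
W = rng.standard_normal((m, R)) @ rng.standard_normal((R, n))

# Thin SVD: U is m x k, s holds singular values in descending order, Vt is k x n.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
P = U[:, :R] * s[:R]   # P = U.Sigma, an m x R matrix
Q = Vt[:R, :]          # Q = V^T, an R x n matrix

assert np.allclose(P @ Q, W)   # W is recovered exactly for a rank-R matrix
assert m * n > (m + n) * R     # 2400 original parameters vs. 500 after decomposition
```

For a full-rank W the same code gives the best rank-R approximation rather than an exact reconstruction, which is the usual trade-off behind this compression step.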
The fifth step: compressing the decomposed matrices

Before the decomposed matrices P_t and Q_t are transmitted over the network formed by the computing servers and the parameter server, their parameter counts are compressed further. First, the compression rate applied to P_t and Q_t in each iteration of training is determined; next, the compression threshold for P_t and Q_t is determined adaptively in each iteration of training; then the parameters of P_t and Q_t are compressed according to the compression rate and the compression threshold, yielding a first compression matrix and a second compression matrix whose overall parameter count is again greatly reduced compared with that before compression, so that communication time is again greatly reduced. By reducing the total number of parameters communicated over the network formed by the computing servers and the parameter server, the communication overhead among the different servers during training is reduced as far as possible, thereby reducing the total training time.
The compression process is the same for the two matrices P_t and Q_t obtained by the decomposition in the fourth step; therefore the compression of P_t is taken as the example here, and the compression of Q_t, being similar, is not described again.
First, the compression rate is determined. After the compression rate is determined, the K-th largest value of each row vector of the matrix P_t changes little from one iteration of training to the next, where K is the number of elements retained per row as determined by the compression rate; it is therefore unnecessary to update the compression threshold in every iteration of training. When compressing the matrix P_t, the compression threshold is updated once every L iterations of training, where L may be taken as 1000. In the first iteration of training the compression threshold must be calculated: the rows of the matrix P_t are regarded as m R-dimensional vectors, and the values in each vector are arranged in descending order, expressed by the formula:

sort_desc(P_t) = [sort_desc(p_1); sort_desc(p_2); …; sort_desc(p_m)]

where p_i denotes the i-th row vector of P_t.
Here sort_desc(p_1) denotes the 1st row vector of the matrix after descending sorting, sort_desc(p_2) the 2nd row vector, and sort_desc(p_m) the m-th row vector; sort_desc(·) denotes arranging the elements of each row vector of P_t in descending order. Each row vector of the sorted matrix is an R-dimensional vector and is the object of the subsequent operation. Retaining the first K elements of each sorted vector, the smallest of the retained elements is determined to be the compression threshold of that vector, from which the compression threshold vector of the entire matrix P_t is obtained as follows:

T = (T_1, T_2, …, T_m)
Here T_1 denotes the compression threshold of the 1st row vector of the matrix P_t, T_2 that of the 2nd row vector, and T_m that of the m-th row vector. In the subsequent iterative training, if the iteration count is not an integer multiple of L, the compression threshold vector used in the previous iteration of training is reused; if the iteration count is an integer multiple of L, the compression threshold vector is updated, using the same calculation as when it was first computed. Next, the Mask matrix of the filter operator is calculated: in each row vector of the matrix P_t, elements whose value is smaller than the compression threshold of that row are set to 0 and the remaining elements are set to 1; the resulting matrix is the filter-operator Mask matrix. Multiplying the matrix P_t element by element with the filter-operator Mask matrix gives the compressed matrix, i.e. the first compression matrix P_t^c; the calculation formula can be written as:

P_t^c = P_t ⊙ Mask
where ⊙ denotes the element-wise multiplication of the corresponding elements of two matrices of the same type.
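The per-row threshold and the filter-operator Mask can be sketched in pure Python as follows (an illustration; the function name and the reading of the Mask as a 0/1 matrix are assumptions; in practice the thresholds would only be recomputed every L iterations, as described above):

```python
def compress_rows(matrix, K):
    """Keep the K largest values in each row; zero out the rest.

    matrix: list of rows (lists of floats). For each row the K-th largest
    value is that row's compression threshold; a 0/1 Mask marks the
    surviving elements, and element-wise multiplication by the Mask
    yields the compressed row.
    """
    thresholds, mask, compressed = [], [], []
    for row in matrix:
        t = sorted(row, reverse=True)[K - 1]   # K-th largest = threshold
        thresholds.append(t)
        row_mask = [1 if v >= t else 0 for v in row]
        mask.append(row_mask)
        compressed.append([v * b for v, b in zip(row, row_mask)])
    return compressed, mask, thresholds
```

Only the surviving (nonzero) entries and their positions need to be transmitted, which is where the communication saving comes from.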
The sixth step: compensating the compression error and communicating between servers
Compressing the parameter matrix inevitably introduces some calculation error; if not eliminated, this error accumulates over multiple rounds of training and degrades the final model accuracy.
Subtracting the first compression matrix P_t^c obtained in the fifth step from the matrix P_t gives the quantization error between the matrix before and after compression, i.e. a first error matrix E_t of the same type as P_t and P_t^c; here subtraction means the subtraction of corresponding elements of the two matrices. The two matrices participating in communication are now P_t^c and E_t, whose parameter counts are greatly reduced compared with the matrix W_t, so that communication time can be greatly reduced. After obtaining their respective first compression matrices and first error matrices, all computing servers transmit them to the parameter server; if some computing servers have not yet finished their calculation when others have transmitted, the other computing servers wait synchronously. When the parameter server has received the first compression matrices and first error matrices of all N computing servers, it averages them respectively to obtain a first average compression matrix and a first average error matrix, calculated as follows:

P_avg^c = (P_t^c(1) + P_t^c(2) + … + P_t^c(N)) / N
E_avg = (E_t(1) + E_t(2) + … + E_t(N)) / N

where P_t^c(i) and E_t(i) denote the first compression matrix and first error matrix of the i-th computing server.
In the above two formulas, the calculations between matrices are operations between corresponding elements of matrices of the same type. After the first average compression matrix and the first average error matrix are obtained, they are transmitted back to the N computing servers simultaneously. In the N computing servers, the first average compression matrix and the first average error matrix are used to compensate the error generated by the first compression matrix, giving a first compensation matrix; the second average compression matrix and the second average error matrix, obtained in the same way from Q_t, likewise give a second compensation matrix. Multiplying the first compensation matrix by the second compensation matrix yields the optimized (i.e. error-eliminated) parameter matrix, which is used as the new parameter matrix for the next iteration of training. In this way the error caused by matrix compression is well compensated, and the accuracy of the finally obtained model remains essentially unchanged.
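The averaging on the parameter server and the additive error compensation can be sketched with NumPy (an illustration; the function name, and the assumption that compensation is the sum of the average compression matrix and the average error matrix, are not taken verbatim from the patent). A useful property of this design: because each error matrix satisfies E(i) = P(i) − P^c(i), the compensated result P_avg^c + E_avg equals the exact average of the uncompressed matrices:

```python
import numpy as np

def aggregate_and_compensate(compressed, errors):
    """compressed, errors: lists holding the N servers' first compression
    matrices P^c(i) and first error matrices E(i) as NumPy arrays.
    Returns the compensated matrix P_avg^c + E_avg."""
    p_avg = sum(compressed) / len(compressed)   # first average compression matrix
    e_avg = sum(errors) / len(errors)           # first average error matrix
    return p_avg + e_avg                        # first compensation matrix
```

Since E(i) = P(i) − P^c(i), the compensation recovers mean(P(i)) exactly, which is why the final model accuracy is essentially unaffected by the compression.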
The seventh step: end of training
Return to the third step: perform forward propagation again with the new parameter matrix, output a new prediction result, calculate the error between the new prediction result and the actual navigation data, and evaluate this error with the loss function; stop training when the loss function reaches its minimum or an overfitting phenomenon appears during training. The trained flight path prediction model (LSTM) can then be used to predict actual flight track point positions.
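The stopping rule (loss at its minimum, or overfitting appearing) is commonly implemented as patience-based early stopping; a sketch under that assumption (the function name and patience value are illustrative, not from the patent):

```python
def should_stop(val_losses, patience=3):
    """Stop when the loss has not improved for `patience` consecutive
    rounds -- a practical proxy for "the loss function has reached its
    minimum or overfitting has begun"."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before
```

The check is applied after each training round; once it returns True, the current parameter matrix is kept as the trained model.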
(II) actual prediction Process
The first step is as follows: flight big data preprocessing
ADS-B data of domestic flights are acquired; the data include flight number, departure airport, departure airport longitude and latitude, landing airport, landing airport longitude and latitude, planned departure time, planned arrival time, planned route, position longitude and latitude, flight altitude, heading angle and data update time. Data cleaning is performed on this flight data: the six fields of flight number, position longitude, position latitude, flight altitude, heading angle and data update time are extracted from the original data table, and the remaining fields are deleted.
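The field extraction during cleaning can be sketched as follows (the field names are illustrative assumptions, not the patent's actual schema):

```python
# Six fields kept for prediction; all other fields are dropped.
KEEP_FIELDS = ("callsign", "longitude", "latitude", "height", "angle", "time")

def clean_record(raw):
    """Keep only the six fields used by the model; delete the rest.
    raw: one ADS-B record as a dict keyed by field name."""
    return {k: raw[k] for k in KEEP_FIELDS}
```

A record missing one of the six fields would raise a KeyError here, which in a fuller pipeline would be the point to discard or repair the record.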
The second step is that: actual prediction
Selecting navigation data of six continuous time slices from the extracted data, inputting the navigation data into an input layer of a flight track prediction model trained in advance, and outputting a prediction result of the navigation data of the flight to be tested through forward propagation.
The method for mass traffic data knowledge mining and parallel processing provided by the invention stores the designed LSTM model in each computing server of a distributed cluster; when training begins, the massive training data set is divided into as many sets as there are computing servers, different sets are input into different computing servers, and the computing servers train simultaneously, greatly reducing the overall training time of the LSTM model. After a computing server completes one forward propagation and one backward propagation, the parameter matrix to be transmitted to the parameter server is decomposed, by the matrix decomposition method, into the product of two matrices whose total parameter count is far less than that of the parameter matrix, so the number of parameters each computing server must transmit is greatly reduced and communication time is reduced accordingly. On the basis of the two decomposed matrices, the compression rate of the matrices is set, a filtering threshold is determined in each row vector of the matrices by an adaptive threshold filtering method, the corresponding filter-operator Mask matrices are obtained, and the filter-operator Mask matrices are multiplied element-wise with the decomposed matrices to obtain the compression matrices; the overall parameter count is thus again greatly reduced compared with that before compression, and communication time is again greatly reduced. Compressing the parameter matrix inevitably introduces some calculation error; if not eliminated, this error accumulates over multiple rounds of training and degrades the final model accuracy.
The invention calculates the error between each row vector of the matrix before and after compression, forms these row errors into an error matrix, and transmits it to each computing server for the next round of training. When the next round of training begins, the parameter matrix obtained from the parameter server and the error matrix are first used to compute the error-eliminated parameter matrix, after which subsequent training proceeds; this compensates well for the error caused by matrix compression and keeps the accuracy of the finally obtained model essentially unchanged.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (3)
1. A method for mass traffic data knowledge mining and parallel processing, characterized by comprising the following steps:
s1: acquiring ADS-B data of a flight to be detected in real time, extracting navigation data of six fields in total, namely flight number, position longitude, position latitude, flight altitude, course angle and track point data updating time, from the ADS-B data, and deleting the rest fields;
s2: selecting navigation data of six continuous time slices from the extracted data, inputting the navigation data into an input layer of a flight track prediction model trained in advance, and outputting a prediction result of the navigation data of the flight to be tested through forward propagation;
the training process of the flight path prediction model comprises the following steps:
s11: designing the flight path prediction model, with model parameters designed as follows: the number of input-layer nodes is set to 6, the number of output-layer nodes is set to 1, the prediction time step is set to 6, the number of hidden layers is set to 1, the number of hidden-layer nodes is set to 60, the number of training rounds is set to 50, ReLU is used as the activation function, the cross-entropy function is used as the loss function, and the stochastic gradient descent method is used as the optimization method for the loss function;
s12: setting any one server as a parameter server, using the rest N servers as N computing servers, placing all the servers in a local area network to ensure smooth communication among the servers, setting all the servers as a distributed training cluster, copying N copies of a designed flight path prediction model to be respectively deployed in the N computing servers, wherein one flight path prediction model corresponds to one computing server; wherein N is a positive integer;
s13: acquiring historical ADS-B data of a plurality of flights, and extracting flight data of six fields in total, namely flight number, position longitude, position latitude, flight altitude, course angle and track point data updating time from the historical ADS-B data to serve as training data;
s14: dividing the training data evenly into N sets as the training data sets of the N computing servers, one set per computing server, with data belonging to the same flight placed into the same subset of each set; sorting the track point data update time fields belonging to the same subset, taking two track points of the same flight whose time interval exceeds a time threshold as the end point of one track sequence and the start point of the next, thereby dividing all track points belonging to the same flight into several different track sequences, and constructing the track points in each track sequence into input vectors of the form (callsign, longitude, latitude, height, angle, time), wherein callsign represents the flight number, longitude the position longitude, latitude the position latitude, height the flight altitude, angle the heading angle, and time the track point data update time;
s15: simultaneously starting data entry in N computing serversInputting the input vectors of six continuous time slices belonging to the same track sequence in each subset of each set into six nodes of an input layer of a corresponding calculation server for forward propagation, outputting a prediction result of navigation data of a seventh time slice by inputting the input vector of one time slice into one node, calculating the error between the output prediction result and actual navigation data, and performing backward propagation on the error to obtain a parameter matrix of N calculation servers subjected to one-time forward propagation and one-time backward propagationWherein, the subscript t represents that the parameter matrix is obtained by the t-th iterative training;
s16: hypothesis parameter matrixIs oneThe parameter matrix is decomposed by matrix decompositionDecomposed into three matrices、Andwherein, the matrixIs oneOf a matrix, a matrixIs oneOf a matrix, a matrixIs oneOf a diagonal matrix, matrixHaving non-negative real singular values in descending order, R representing a parameter matrixThe rank of (d); order to,,Matrix ofIs oneOf a matrix, a matrixOne isA matrix of (a);
s17: determining a training pair matrix for each iterationAndaccording to the compression ratio, determining each iteration training pair matrix in a self-adaptive wayAndaccording to the compression ratio and the compression threshold, respectively for the matrixAndthe parameter number of the first compression matrix is compressed to obtain a first compression matrix and a second compression matrix, and the first compression matrix and the matrix are compressedSubtracting the corresponding parameters to obtain a first error matrix, and compressing the second compression matrix with the first error matrixSubtracting the corresponding parameters to obtain a second error matrix, and inputting the first compression matrix, the second compression matrix, the first error matrix and the second error matrix into the parameter server;
s18: averaging first compression matrixes of N calculation servers to obtain a first average compression matrix, averaging second compression matrixes of N calculation servers to obtain a second average compression matrix, averaging first error matrixes of N calculation servers to obtain a first average error matrix, averaging second error matrixes of N calculation servers to obtain a second average error matrix, and transmitting the first average compression matrix, the second average compression matrix, the first average error matrix and the second average error matrix back to the N calculation servers;
s19: in N calculation servers, a first average compression matrix and a first average error matrix are used for compensating errors generated by the first compression matrix to obtain a first compensation matrix, a second average compression matrix and a second average error matrix are used for compensating errors generated by the second compression matrix to obtain a second compensation matrix, the first compensation matrix and the second compensation matrix are multiplied to obtain an optimized parameter matrix which is used as a new parameter matrix for next iterative training, the step S15 is returned, forward propagation is carried out again by using the new parameter matrix, a new prediction result is output, errors between the new prediction result and actual navigation data are calculated, errors between the new prediction result and the actual navigation data are evaluated by using a loss function, and the training is stopped when the loss function reaches the minimum value or an overfitting phenomenon occurs in the training process.
2. The method for mass traffic data knowledge mining and parallel processing according to claim 1, wherein in step S17, compressing the parameters of the matrices P_t and Q_t respectively according to the compression rate and the compression threshold to obtain the first compression matrix and the second compression matrix specifically comprises:
setting the elements of each row vector of the matrix P_t (and likewise of Q_t) whose value is smaller than the compression threshold of that row to 0 and the remaining elements to 1 to obtain the filter-operator Mask matrix, and calculating the first compression matrix and the second compression matrix according to the following formulas:

P_t^c = P_t ⊙ Mask_P
Q_t^c = Q_t ⊙ Mask_Q

wherein ⊙ denotes the element-wise multiplication of corresponding elements of two matrices of the same type.
3. The method for mining and parallel processing of mass traffic data knowledge according to claim 1 or 2, wherein in step S17, the compression threshold is updated after each L iterative training, and the value range of L is 1000 to 1500.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110456757.1A CN113052313B (en) | 2021-04-27 | 2021-04-27 | Mass traffic data knowledge mining and parallel processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113052313A true CN113052313A (en) | 2021-06-29 |
CN113052313B CN113052313B (en) | 2021-10-15 |
Family
ID=76520620
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110456757.1A Active CN113052313B (en) | 2021-04-27 | 2021-04-27 | Mass traffic data knowledge mining and parallel processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113052313B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150324501A1 (en) * | 2012-12-12 | 2015-11-12 | University Of North Dakota | Analyzing flight data using predictive models |
CN109508812A (en) * | 2018-10-09 | 2019-03-22 | 南京航空航天大学 | A kind of aircraft Trajectory Prediction method based on profound memory network |
CN110443411A (en) * | 2019-07-16 | 2019-11-12 | 青岛民航凯亚系统集成有限公司 | Method based on the ADS-B data prediction flight landing time |
CN111292563A (en) * | 2020-05-12 | 2020-06-16 | 北京航空航天大学 | Flight track prediction method |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113657687A (en) * | 2021-08-30 | 2021-11-16 | 国家电网有限公司 | Power load prediction method based on feature engineering and multi-path deep learning |
CN113657687B (en) * | 2021-08-30 | 2023-09-29 | 国家电网有限公司 | Power load prediction method based on feature engineering and multipath deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN113052313B (en) | 2021-10-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||