CN113052313A - Mass traffic data knowledge mining and parallel processing method - Google Patents
- Publication number
- CN113052313A (application CN202110456757.1A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- compression
- data
- training
- flight
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention discloses a method for knowledge mining and parallel processing of mass traffic data. An LSTM model is stored on each of the N computing servers of a distributed cluster; the training data set is divided into N sets that are input to the N computing servers respectively, and the servers train simultaneously, which reduces the training time of the LSTM model. After a computing server completes one forward and one backward pass, its parameter matrix is transmitted to the parameter server and decomposed, by a matrix decomposition method, into the product of two matrices, which reduces the total number of parameters and hence the communication time. A matrix compression rate is then set and the matrices are compressed with an adaptive threshold filtering method, reducing the parameter count, and the communication time, once more. Error matrices between the pre- and post-compression matrices are computed and carried into the next training round; at the start of that round, the parameter matrix and the error matrices are combined to recover an error-corrected matrix before training continues, which compensates for the error introduced by matrix compression and preserves the accuracy of the LSTM model.
Description
Technical Field
The invention relates to the technical field of traffic big data, in particular to a method for mining and parallel processing mass traffic data knowledge.
Background
As the informatization level of China's transportation systems keeps improving, massive traffic big data is continuously generated; such data is multi-source, heterogeneous and of low knowledge density. How to analyze and exploit this traffic big data and mine the knowledge it contains, so as to guide traffic operations, is a pressing problem in urgent need of a solution.
Taking China's civil aviation industry as an example: with its rapid development, airport passenger traffic increases year by year. According to statistics of the Civil Aviation Administration of China, national passenger airlines operated 4.6111 million flights in 2019, of which 3.7652 million were on time, an average flight punctuality rate of 81.65%; the major airlines operated 3.3047 million flights, of which 2.6911 million were on time, an average punctuality rate of 81.43%. The growing number of flights keeps raising the likelihood of congestion and conflict in China's airspace. An algorithm that can accurately and quickly predict a flight's trajectory over a coming period would let a controller or pilot recognize danger before congestion or a conflict occurs and avoid it in time, greatly reducing risk, ensuring flight safety and protecting the lives and property of passengers.
Although scholars at home and abroad have made considerable progress in flight trajectory prediction through knowledge mining of aviation big data, model training still takes far too long to meet the requirement of rapidity. This slow training makes it impossible to use flight big data to build an accurate and efficient online incremental trajectory prediction model, which leaves the practical availability of such prediction models low. Therefore, to fully exploit the knowledge mined from flight big data and achieve accurate, online, incremental trajectory prediction, the training time of the prediction model must be reduced substantially; doing so is key to making trajectory prediction models practical.
Disclosure of Invention
In view of the above, the invention provides a method for knowledge mining and parallel processing of mass traffic data, which overcomes the drawbacks that existing large neural networks for predicting flight track points train too slowly and cannot effectively perform online incremental prediction.
The invention provides a method for mining and parallel processing mass traffic data knowledge, which comprises the following steps:
s1: acquiring ADS-B data of a flight to be detected in real time, extracting navigation data of six fields in total, namely flight number, position longitude, position latitude, flight altitude, course angle and track point data updating time, from the ADS-B data, and deleting the rest fields;
s2: selecting navigation data of six continuous time slices from the extracted data, inputting the navigation data into an input layer of a flight track prediction model trained in advance, and outputting a prediction result of the navigation data of the flight to be tested through forward propagation;
the training process of the flight path prediction model comprises the following steps:
s11: designing a flight path prediction model, wherein model parameters are designed as follows: the number of nodes of an input layer is set to be 6, the number of nodes of an output layer is set to be 1, the prediction time step length is set to be 6, the number of the hidden layers is set to be 1, the number of nodes of the hidden layers is set to be 60, the training round number is set to be 50, the ReLU is used as an activation function, the cross entropy function is used as a loss function, and the random gradient descent method is used as an optimization method of the loss function;
s12: setting any one server as a parameter server, using the rest N servers as N computing servers, placing all the servers in a local area network to ensure smooth communication among the servers, setting all the servers as a distributed training cluster, copying N copies of a designed flight path prediction model to be respectively deployed in the N computing servers, wherein one flight path prediction model corresponds to one computing server; wherein N is a positive integer;
s13: acquiring historical ADS-B data of a plurality of flights, and extracting flight data of six fields in total, namely flight number, position longitude, position latitude, flight altitude, course angle and track point data updating time from the historical ADS-B data to serve as training data;
s14: averagely dividing the training data into N sets as the training data sets of the N computing servers, one set per computing server, with data belonging to the same flight placed in the same subset within each set; sorting the track points in each subset by the track point data update time field; treating any two track points of the same flight whose time interval exceeds a time threshold as the end point of one track sequence and the start point of the next, thereby dividing all track points of a flight into several different track sequences; and constructing each track point of a track sequence into an input vector x = (callsign, longitude, latitude, height, angle, time), wherein callsign represents the flight number, longitude the position longitude, latitude the position latitude, height the flight altitude, angle the course angle, and time the track point data update time;
s15: starting data input on the N calculation servers simultaneously; in each subset of each set, inputting the input vectors of six consecutive time slices belonging to the same track sequence into the six input-layer nodes of the corresponding calculation server for forward propagation, the input vector of one time slice being input into one node; outputting a prediction of the navigation data of the seventh time slice; calculating the error between the prediction and the actual navigation data and propagating it backwards, thereby obtaining on each of the N calculation servers a parameter matrix W_t after one forward and one backward pass, wherein the subscript t indicates that the parameter matrix was obtained in the t-th training iteration;
s16: hypothesis parameter matrixIs oneThe parameter matrix is decomposed by matrix decompositionDecomposed into three matrices、Andwherein, the matrixIs oneOf a matrix, a matrixIs oneOf a matrix, a matrixIs oneDiagonal moment ofMatrix, matrixHaving non-negative real singular values in descending order, R representing a parameter matrixThe rank of (d); order to,,Matrix ofIs oneOf a matrix, a matrixOne isA matrix of (a);
s17: determining a training pair matrix for each iterationAndaccording to the compression ratio, determining each iteration training pair matrix in a self-adaptive wayAndcompression threshold of, rootAccording to the compression rate and the compression threshold, respectively corresponding to the matrixAndthe parameter number of the first compression matrix is compressed to obtain a first compression matrix and a second compression matrix, and the first compression matrix and the matrix are compressedSubtracting the corresponding parameters to obtain a first error matrix, and compressing the second compression matrix with the first error matrixSubtracting the corresponding parameters to obtain a second error matrix, and inputting the first compression matrix, the second compression matrix, the first error matrix and the second error matrix into the parameter server;
s18: averaging first compression matrixes of N calculation servers to obtain a first average compression matrix, averaging second compression matrixes of N calculation servers to obtain a second average compression matrix, averaging first error matrixes of N calculation servers to obtain a first average error matrix, averaging second error matrixes of N calculation servers to obtain a second average error matrix, and transmitting the first average compression matrix, the second average compression matrix, the first average error matrix and the second average error matrix back to the N calculation servers;
s19: in N calculation servers, a first average compression matrix and a first average error matrix are used for compensating errors generated by the first compression matrix to obtain a first compensation matrix, a second average compression matrix and a second average error matrix are used for compensating errors generated by the second compression matrix to obtain a second compensation matrix, the first compensation matrix and the second compensation matrix are multiplied to obtain an optimized parameter matrix which is used as a new parameter matrix for next iterative training, the step S15 is returned, forward propagation is carried out again by using the new parameter matrix, a new prediction result is output, errors between the new prediction result and actual navigation data are calculated, errors between the new prediction result and the actual navigation data are evaluated by using a loss function, and the training is stopped when the loss function reaches the minimum value or an overfitting phenomenon occurs in the training process.
In a possible implementation manner, in the method for mining and processing mass traffic data knowledge provided by the present invention, in step S17, compressing the numbers of parameters of the matrices P and Q according to the compression rate and the compression threshold to obtain the first compression matrix and the second compression matrix specifically comprises:
will matrixSetting elements with the values smaller than the compression threshold value corresponding to the row in the vectors of the middle rows as 0, keeping other elements unchanged, obtaining a filter operator Mask matrix, and calculating a first compression matrix and a second compression matrix according to the following formulas:
wherein the content of the first and second substances,representing a compression matrix, p =1, 2,representing the multiplication of corresponding elements of two isotypic matrices.
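A minimal sketch of this row-wise threshold filtering. Deriving the per-row threshold from the compression rate by keeping the largest-magnitude entries, and comparing absolute values, are our assumptions — the patent says the threshold is adaptive but does not give its exact rule:

```python
import numpy as np

def compress_by_row_threshold(M, compression_rate):
    """Row-wise threshold filtering sketch for step S17: in every row, keep
    roughly a `compression_rate` fraction of the entries (the largest in
    magnitude) and zero the rest.  The per-row threshold rule and the use
    of absolute values are our assumptions."""
    M = np.asarray(M, dtype=float)
    keep = max(1, int(round(M.shape[1] * compression_rate)))
    mask = np.zeros_like(M)                      # filter-operator Mask matrix
    for i, row in enumerate(M):
        thresh = np.sort(np.abs(row))[-keep]     # keep-th largest magnitude
        mask[i, np.abs(row) >= thresh] = 1.0
    compressed = mask * M                        # element-wise product with the Mask
    error = M - compressed                       # error matrix for the next round
    return compressed, error

C, E = compress_by_row_threshold([[1.0, -4.0, 0.5], [2.0, 0.1, -3.0]], 1 / 3)
print(C)  # only -4.0 (row 0) and -3.0 (row 1) survive
```

Note that `compressed + error` reproduces the original matrix exactly, which is what makes the later error compensation possible.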
In a possible implementation manner, in the method for mining and parallel processing of mass traffic data knowledge provided by the invention, in step S17, the compression threshold is updated after every L iterations of training, where L ranges from 1000 to 1500.
The method for mining and parallel processing of mass traffic data knowledge provided by the invention stores the designed LSTM model on every computing server of the distributed cluster; when training begins, the massive training data set is divided into as many sets as there are computing servers, different sets are input to different computing servers, and all servers train simultaneously, which greatly reduces the overall training time of the LSTM model. After a computing server completes one forward and one backward pass, the parameter matrix to be transmitted to the parameter server is decomposed, by a matrix decomposition method, into the product of two matrices whose total number of parameters is far smaller than that of the original parameter matrix, so the number of parameters each computing server must transmit, and hence the communication time, drops sharply. On top of the two factor matrices, a matrix compression rate is set; an adaptive threshold filtering method determines a filtering threshold within each row vector of the matrices, yielding the corresponding filter-operator Mask matrices, and multiplying these Mask matrices with the factor matrices produces the compression matrices, cutting the overall parameter count, and the communication time, substantially once more. Compressing the parameter matrix inevitably introduces some computation error; if it is not eliminated, this error accumulates over many training rounds and degrades the final model accuracy.
The invention computes the error vector for each row vector of the matrix before and after compression, and passes the error matrix these vectors form to the next training round of each computing server. When the next round begins, the parameter matrix obtained from the parameter server and the error matrix are first combined to recover an error-corrected parameter matrix, and training then proceeds; this compensates well for the error caused by matrix compression and keeps the accuracy of the final model essentially unchanged.
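If the error matrices are formed as original minus compressed, the compensation just described reduces to adding each average error matrix back to its average compression matrix before multiplying the two factors. A sketch under that assumption, with a lossless toy split so the recovery is exact:

```python
import numpy as np

def compensate_and_rebuild(C1, E1, C2, E2):
    """Recover an error-corrected parameter matrix at the start of the next
    round: add each average error matrix back to its average compression
    matrix (first/second compensation matrices), then multiply the two.
    Assumes the error matrices were formed as original minus compressed."""
    return (C1 + E1) @ (C2 + E2)

P = np.array([[1.0, 2.0], [3.0, 4.0]])
Q = np.eye(2)
C1, E1 = np.tril(P), P - np.tril(P)   # toy split of P into compressed + error
C2, E2 = Q, np.zeros_like(Q)
W_next = compensate_and_rebuild(C1, E1, C2, E2)
print(np.allclose(W_next, P @ Q))  # True: the compensation is exact here
```

In a real run the averaged matrices would come back from the parameter server, and the recovered matrix seeds the next forward pass of step S15.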
Drawings
Fig. 1 is a flowchart of a method for mining and processing mass traffic data knowledge in parallel according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only illustrative and are not intended to limit the present invention.
The invention provides a method for mining and parallel processing mass traffic data knowledge, which comprises the following steps:
s1: acquiring ADS-B data of a flight to be detected in real time, extracting navigation data of six fields in total of flight number, position longitude, position latitude, flight altitude, course angle and track point data updating time from the ADS-B data, and deleting the rest fields;
s2: selecting navigation data of six continuous time slices from the extracted data, inputting the navigation data into an input layer of a flight track prediction model trained in advance, and outputting a prediction result of the navigation data of the flight to be tested through forward propagation;
the training process of the flight path prediction model comprises the following steps:
s11: designing a flight path prediction model, wherein model parameters are designed as follows: the number of nodes of an input layer is set to be 6, the number of nodes of an output layer is set to be 1, the prediction time step length is set to be 6, the number of the hidden layers is set to be 1, the number of nodes of the hidden layers is set to be 60, the training round number is set to be 50, the ReLU is used as an activation function, the cross entropy function is used as a loss function, and the random gradient descent method is used as an optimization method of the loss function;
s12: setting any one server as a parameter server, using the rest N servers as N computing servers, placing all the servers in a local area network to ensure smooth communication among the servers, setting all the servers as a distributed training cluster, copying N copies of a designed flight path prediction model to be respectively deployed in the N computing servers, wherein one flight path prediction model corresponds to one computing server; wherein N is a positive integer;
s13: acquiring historical ADS-B data of a plurality of flights, and extracting flight data of six fields in total, namely flight number, position longitude, position latitude, flight altitude, course angle and track point data updating time, from the historical ADS-B data to serve as training data;
s14: the training data is divided equally into N sets,as training data sets of N computing servers, one set corresponds to one computing server, and data belonging to the same flight are divided into the same subset in each set; sequencing track point data update time fields belonging to the same subset, regarding two track points with time intervals larger than a time threshold value in all track points belonging to the same flight as the ending point and the starting point of two track sequences, dividing all track points belonging to the same flight into a plurality of different track sequences, and constructing the track points in each track sequence into input vectorsWherein callsign represents flight number, longitude represents position longitude, latitude represents position latitude, height represents flight altitude, angle represents course angle, and time represents track point data updating time;
s15: starting data input on the N calculation servers simultaneously; in each subset of each set, inputting the input vectors of six consecutive time slices belonging to the same track sequence into the six input-layer nodes of the corresponding calculation server for forward propagation; outputting a prediction of the navigation data of the seventh time slice; calculating the error between the prediction and the actual navigation data and propagating it backwards, thereby obtaining on each of the N calculation servers a parameter matrix W_t after one forward and one backward pass, wherein the subscript t indicates that the parameter matrix was obtained in the t-th training iteration;
s16: hypothesis parameter matrixIs oneThe parameter matrix is decomposed by matrix decompositionDecomposed into three matrices、Andwherein, the matrixIs oneOf a matrix, a matrixIs oneOf a matrix, a matrixIs oneOf a diagonal matrix, matrixHaving non-negative real singular values in descending order, R representing a parameter matrixThe rank of (d); order to,,Matrix ofIs oneOf a matrix, a matrixOne isA matrix of (a);
s17: determining a training pair matrix for each iterationAndaccording to the compression ratio, determining each iteration training pair matrix in a self-adaptive wayAndaccording to the compression ratio and the compression threshold, respectively for the matrixAndthe parameter number of the first compression matrix is compressed to obtain a first compression matrix and a second compression matrix, and the first compression matrix and the matrix are compressedSubtracting the corresponding parameters to obtain a first error matrix, and compressing the second compression matrix with the first error matrixSubtracting the corresponding parameters to obtain a second error matrix, and inputting the first compression matrix, the second compression matrix, the first error matrix and the second error matrix into the parameter server;
s18: averaging first compression matrixes of N calculation servers to obtain a first average compression matrix, averaging second compression matrixes of N calculation servers to obtain a second average compression matrix, averaging first error matrixes of N calculation servers to obtain a first average error matrix, averaging second error matrixes of N calculation servers to obtain a second average error matrix, and transmitting the first average compression matrix, the second average compression matrix, the first average error matrix and the second average error matrix back to the N calculation servers;
s19: in N calculation servers, a first average compression matrix and a first average error matrix are used for compensating errors generated by the first compression matrix to obtain a first compensation matrix, a second average compression matrix and a second average error matrix are used for compensating errors generated by the second compression matrix to obtain a second compensation matrix, the first compensation matrix and the second compensation matrix are multiplied to obtain an optimized parameter matrix which is used as a new parameter matrix for next iterative training, the step S15 is returned, forward propagation is carried out again by using the new parameter matrix, a new prediction result is output, errors between the new prediction result and actual navigation data are calculated, errors between the new prediction result and the actual navigation data are evaluated by using a loss function, and the training is stopped when the loss function reaches the minimum value or an overfitting phenomenon occurs in the training process.
In specific implementation, when step S17 of the method for mining and processing mass traffic data knowledge provided by the present invention is executed, compressing the numbers of parameters of the matrices P and Q according to the compression rate and the compression threshold to obtain the first compression matrix and the second compression matrix can specifically be realized as follows: for each row vector of P and Q, set the elements whose values are smaller than that row's compression threshold to 0 and mark the remaining positions with 1, obtaining the filter-operator Mask matrices, and calculate the first compression matrix and the second compression matrix according to the following formula:

C_p = Mask_p ∘ M_p, p = 1, 2,

wherein M_1 = P and M_2 = Q, C_p represents a compression matrix, and ∘ represents the multiplication of corresponding elements of two same-shaped matrices.
In specific implementation, when step S17 of the method for mining and parallel processing of mass traffic data knowledge provided by the present invention is executed, the compression threshold is updated after every L iterations of training; preferably, L ranges from 1000 to 1500.
The following describes a specific implementation of the knowledge mining and parallel processing method for mass traffic data according to the present invention in detail by using a specific embodiment.
Example 1: the method comprises a model training process and an actual prediction process.
Model training procedure
The first step is as follows: construction of flight path prediction model
The flight path prediction model is designed by LSTM, and the specific parameters of the model are designed as follows: the number of nodes of an input layer is set to be 6, the number of nodes of an output layer is set to be 1, the prediction time step length is set to be 6, the number of the hidden layers is set to be 1, the number of nodes of the hidden layers is set to be 60, the training round is set to be 50, the ReLU is used as an activation function, the cross entropy function is used as a loss function, and the random gradient descent method is used as the optimization method of the loss function.
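As a rough illustration of the forward recurrence such an LSTM performs (not the patent's implementation, which the text does not give), here is a toy NumPy LSTM cell with the stated sizes of 6 input features and 60 hidden units:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMCellSketch:
    """Toy NumPy LSTM cell with the sizes stated above: 6 input features and
    60 hidden units.  Weight layout and random initialization are our
    assumptions; a full model would add the 60 -> 1 output layer and train
    with stochastic gradient descent for the stated 50 rounds."""
    def __init__(self, n_in=6, n_hidden=60, seed=0):
        rng = np.random.default_rng(seed)
        # stacked weights for the input, forget, cell and output gates
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

cell = LSTMCellSketch()
h = c = np.zeros(60)
for _ in range(6):                 # six consecutive time slices, one per step
    h, c = cell.step(np.ones(6), h, c)
print(h.shape)  # (60,)
```

The final hidden state after the six time steps is what the output layer would map to the single predicted node.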
The second step is that: building input data
Acquire historical ADS-B data of a plurality of flights, and extract flight data of six fields in total, namely flight number, position longitude, position latitude, flight altitude, course angle and track point data update time, as training data. Divide all training data evenly into N (N a positive integer) sets to serve as the training data sets of the N computing servers, one set per computing server, with the data of the same flight further placed in the same subset within each set. Then sort the track point data update time fields within each subset, and treat any two track points of the same flight whose time interval exceeds a time threshold as the end point of one track sequence and the start point of the next, so that all track points of one flight are divided into several different track sequences. Construct each track point of a track sequence into an input vector x = (callsign, longitude, latitude, height, angle, time), where callsign represents the flight number, longitude the position longitude, latitude the position latitude, height the flight altitude, angle the heading angle, and time the data update time.
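The segmentation of one flight's track points into track sequences by the update-time gap can be sketched as follows; the 600-second gap threshold is an illustrative value, as the patent does not fix the time threshold:

```python
from datetime import datetime, timedelta

def split_into_sequences(points, gap_seconds=600):
    """Split one flight's trajectory points into separate track sequences
    wherever the update-time gap exceeds a threshold.  Each point is
    (callsign, longitude, latitude, height, angle, time); the 600 s
    threshold is illustrative, not from the patent."""
    points = sorted(points, key=lambda p: p[5])
    sequences, current = [], [points[0]]
    for prev, cur in zip(points, points[1:]):
        if (cur[5] - prev[5]).total_seconds() > gap_seconds:
            sequences.append(current)      # the gap ends one sequence ...
            current = []                   # ... and starts the next
        current.append(cur)
    sequences.append(current)
    return sequences

t0 = datetime(2019, 1, 1)
pts = [("CA123", 116.4, 39.9, 9000.0, 270.0, t0 + timedelta(seconds=s))
       for s in (0, 60, 120, 2000, 2060)]
seqs = split_into_sequences(pts)
print([len(s) for s in seqs])  # [3, 2]
```

Each resulting sequence then yields sliding windows of six consecutive time slices as model inputs.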
The training data set is divided into as many sets as there are computing servers, different sets are input into different computing servers, and all servers train simultaneously, which greatly reduces the overall training time of the LSTM model.
The third step: inputting training data into flight path prediction model
Any one server is set as the parameter server and the remaining N servers serve as the N computing servers; all servers are placed in one local area network to ensure smooth communication among them. All servers are configured as a distributed training cluster, and N copies of the designed flight path prediction model (LSTM) are deployed to the N computing servers respectively, one flight path prediction model per computing server.
Data input is started in the N computing servers simultaneously. The input vectors of six consecutive time slices belonging to the same track sequence in each subset of each set are fed into the six input-layer nodes of the corresponding computing server for forward propagation, the input vector of one time slice corresponding to one node; the prediction result for the navigation data of the seventh time slice is output, the error between this prediction result and the actual navigation data is calculated, and the error is propagated backwards, yielding the parameter matrix W_t of each of the N computing servers after one forward propagation and one backward propagation, where the subscript t indicates that the parameter matrix is obtained in the t-th iteration of training.
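Feeding six consecutive time slices and predicting the seventh amounts to building sliding windows over each track sequence; a minimal sketch (the function name and the stride of one slice per window are assumptions):

```python
def make_training_windows(sequence, step=6):
    """From one track sequence (a list of per-time-slice input vectors),
    build (input window, target) pairs: `step` consecutive slices as the
    model input, the following slice as the prediction target."""
    return [
        (sequence[i:i + step], sequence[i + step])
        for i in range(len(sequence) - step)
    ]
```

Each pair's window goes to the six input-layer nodes, and the target is the seventh-slice navigation data used to compute the training error.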
The fourth step: decomposing the parameter matrix

Assume the parameter matrix W_t is an m×n matrix. Because W_t contains many parameters, it is decomposed by matrix decomposition (singular value decomposition) into three matrices U, Σ and V, where U is an m×R matrix, Σ is an R×R diagonal matrix whose non-negative real singular values are arranged in descending order, V is an n×R matrix, and R represents the rank of the parameter matrix W_t. The decomposition of the entire parameter matrix W_t can be written as:

W_t = U Σ V^T (1)
As expressed by formula (1), the parameter matrix W_t can be written as the product of two smaller matrices P_t and Q_t, i.e. W_t = P_t Q_t, where P_t is equal to UΣ and Q_t is equal to V^T. The original parameter matrix W_t contains m×n parameters, while the two decomposed matrices contain far fewer: P_t is an m×R matrix with m×R parameters, Q_t is an R×n matrix with R×n parameters, and their total is (m+n)×R parameters. If R is much smaller than the minimum of m and n, the total number of parameters of the two decomposed matrices is certainly much smaller than the number of parameters in the original parameter matrix, i.e. (m+n)×R << m×n; therefore the number of parameters each computing server must transmit is greatly reduced, and the time consumed by communication is reduced accordingly. In the actual training process, testing showed that for the original parameter matrix W_t the product m×n is about 5,000,000 and the rank R is 64, so the matrix decomposition method reduces the roughly 5,000,000 parameters of the original parameter matrix to about 300,000, greatly reducing the number of parameters that must be transmitted.
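The decomposition W_t = UΣV^T and the re-grouping into P_t = UΣ and Q_t = V^T can be illustrated with NumPy (a sketch only; the toy sizes m = 40, n = 60, R = 5 are illustrative, not the patent's values):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, R = 40, 60, 5
# Build an m x n parameter matrix of rank R.
W = rng.standard_normal((m, R)) @ rng.standard_normal((R, n))

# Thin SVD: U is m x k, s holds singular values in descending order, Vt is k x n.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
P = U[:, :R] * s[:R]   # P = U.Sigma, an m x R matrix
Q = Vt[:R, :]          # Q = V^T, an R x n matrix

assert np.allclose(P @ Q, W)   # W is recovered exactly for a rank-R matrix
assert m * n > (m + n) * R     # 2400 original parameters vs. 500 after decomposition
```

For a full-rank W the same code gives the best rank-R approximation rather than an exact reconstruction, which is the usual trade-off behind this compression step.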
The fifth step: compressing the decomposed matrices

Before the decomposed matrices P_t and Q_t are transmitted over the network formed by the computing servers and the parameter server, their parameter counts are compressed further. First, the compression rate applied to P_t and Q_t in each iteration of training is determined; next, the compression threshold for P_t and Q_t is determined adaptively in each iteration of training; then the parameters of P_t and Q_t are compressed according to the compression rate and the compression threshold, yielding a first compression matrix and a second compression matrix whose overall parameter count is again greatly reduced compared with that before compression, so that communication time is again greatly reduced. By reducing the total number of parameters communicated over the network formed by the computing servers and the parameter server, the communication overhead among the different servers during training is reduced as far as possible, thereby reducing the total training time.
The compression process is the same for the two matrices P_t and Q_t obtained by the decomposition in the fourth step; therefore the compression of P_t is taken as the example here, and the compression of Q_t, being similar, is not described again.
First, the compression rate is determined. After the compression rate is determined, the K-th largest value of each row vector of the matrix P_t changes little from one iteration of training to the next, where K is the number of elements retained per row as determined by the compression rate; it is therefore unnecessary to update the compression threshold in every iteration of training. When compressing the matrix P_t, the compression threshold is updated once every L iterations of training, where L may be taken as 1000. In the first iteration of training the compression threshold must be calculated: the rows of the matrix P_t are regarded as m R-dimensional vectors, and the values in each vector are arranged in descending order, expressed by the formula:

sort_desc(P_t) = [sort_desc(p_1); sort_desc(p_2); …; sort_desc(p_m)]

where p_i denotes the i-th row vector of P_t.
Here sort_desc(p_1) denotes the 1st row vector of the matrix after descending sorting, sort_desc(p_2) the 2nd row vector, and sort_desc(p_m) the m-th row vector; sort_desc(·) denotes arranging the elements of each row vector of P_t in descending order. Each row vector of the sorted matrix is an R-dimensional vector and is the object of the subsequent operation. Retaining the first K elements of each sorted vector, the smallest of the retained elements is determined to be the compression threshold of that vector, from which the compression threshold vector of the entire matrix P_t is obtained as follows:

T = (T_1, T_2, …, T_m)
Here T_1 denotes the compression threshold of the 1st row vector of the matrix P_t, T_2 that of the 2nd row vector, and T_m that of the m-th row vector. In the subsequent iterative training, if the iteration count is not an integer multiple of L, the compression threshold vector used in the previous iteration of training is reused; if the iteration count is an integer multiple of L, the compression threshold vector is updated, using the same calculation as when it was first computed. Next, the Mask matrix of the filter operator is calculated: in each row vector of the matrix P_t, elements whose value is smaller than the compression threshold of that row are set to 0 and the remaining elements are set to 1; the resulting matrix is the filter-operator Mask matrix. Multiplying the matrix P_t element by element with the filter-operator Mask matrix gives the compressed matrix, i.e. the first compression matrix P_t^c; the calculation formula can be written as:

P_t^c = P_t ⊙ Mask
where ⊙ denotes the element-wise multiplication of the corresponding elements of two matrices of the same type.
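The per-row threshold and the filter-operator Mask can be sketched in pure Python as follows (an illustration; the function name and the reading of the Mask as a 0/1 matrix are assumptions; in practice the thresholds would only be recomputed every L iterations, as described above):

```python
def compress_rows(matrix, K):
    """Keep the K largest values in each row; zero out the rest.

    matrix: list of rows (lists of floats). For each row the K-th largest
    value is that row's compression threshold; a 0/1 Mask marks the
    surviving elements, and element-wise multiplication by the Mask
    yields the compressed row.
    """
    thresholds, mask, compressed = [], [], []
    for row in matrix:
        t = sorted(row, reverse=True)[K - 1]   # K-th largest = threshold
        thresholds.append(t)
        row_mask = [1 if v >= t else 0 for v in row]
        mask.append(row_mask)
        compressed.append([v * b for v, b in zip(row, row_mask)])
    return compressed, mask, thresholds
```

Only the surviving (nonzero) entries and their positions need to be transmitted, which is where the communication saving comes from.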
The sixth step: compensating the compression error and communicating between servers
Compressing the parameter matrix inevitably introduces some calculation error; if not eliminated, this error accumulates over multiple rounds of training and degrades the final model accuracy.
Subtracting the first compression matrix P_t^c obtained in the fifth step from the matrix P_t gives the quantization error between the matrix before and after compression, i.e. a first error matrix E_t of the same type as P_t and P_t^c; here subtraction means the subtraction of corresponding elements of the two matrices. The two matrices participating in communication are now P_t^c and E_t, whose parameter counts are greatly reduced compared with the matrix W_t, so that communication time can be greatly reduced. After obtaining their respective first compression matrices and first error matrices, all computing servers transmit them to the parameter server; if some computing servers have not yet finished their calculation when others have transmitted, the other computing servers wait synchronously. When the parameter server has received the first compression matrices and first error matrices of all N computing servers, it averages them respectively to obtain a first average compression matrix and a first average error matrix, calculated as follows:

P_avg^c = (P_t^c(1) + P_t^c(2) + … + P_t^c(N)) / N
E_avg = (E_t(1) + E_t(2) + … + E_t(N)) / N

where P_t^c(i) and E_t(i) denote the first compression matrix and first error matrix of the i-th computing server.
In the above two formulas, the calculations between matrices are operations between corresponding elements of matrices of the same type. After the first average compression matrix and the first average error matrix are obtained, they are transmitted back to the N computing servers simultaneously. In the N computing servers, the first average compression matrix and the first average error matrix are used to compensate the error generated by the first compression matrix, giving a first compensation matrix; the second average compression matrix and the second average error matrix, obtained in the same way from Q_t, likewise give a second compensation matrix. Multiplying the first compensation matrix by the second compensation matrix yields the optimized (i.e. error-eliminated) parameter matrix, which is used as the new parameter matrix for the next iteration of training. In this way the error caused by matrix compression is well compensated, and the accuracy of the finally obtained model remains essentially unchanged.
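The averaging on the parameter server and the additive error compensation can be sketched with NumPy (an illustration; the function name, and the assumption that compensation is the sum of the average compression matrix and the average error matrix, are not taken verbatim from the patent). A useful property of this design: because each error matrix satisfies E(i) = P(i) − P^c(i), the compensated result P_avg^c + E_avg equals the exact average of the uncompressed matrices:

```python
import numpy as np

def aggregate_and_compensate(compressed, errors):
    """compressed, errors: lists holding the N servers' first compression
    matrices P^c(i) and first error matrices E(i) as NumPy arrays.
    Returns the compensated matrix P_avg^c + E_avg."""
    p_avg = sum(compressed) / len(compressed)   # first average compression matrix
    e_avg = sum(errors) / len(errors)           # first average error matrix
    return p_avg + e_avg                        # first compensation matrix
```

Since E(i) = P(i) − P^c(i), the compensation recovers mean(P(i)) exactly, which is why the final model accuracy is essentially unaffected by the compression.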
The seventh step: end of training
Return to the third step: perform forward propagation again with the new parameter matrix, output a new prediction result, calculate the error between the new prediction result and the actual navigation data, and evaluate this error with the loss function; stop training when the loss function reaches its minimum or an overfitting phenomenon appears during training. The trained flight path prediction model (LSTM) can then be used to predict actual flight track point positions.
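The stopping rule (loss at its minimum, or overfitting appearing) is commonly implemented as patience-based early stopping; a sketch under that assumption (the function name and patience value are illustrative, not from the patent):

```python
def should_stop(val_losses, patience=3):
    """Stop when the loss has not improved for `patience` consecutive
    rounds -- a practical proxy for "the loss function has reached its
    minimum or overfitting has begun"."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before
```

The check is applied after each training round; once it returns True, the current parameter matrix is kept as the trained model.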
(II) actual prediction Process
The first step is as follows: flight big data preprocessing
ADS-B data of domestic flights are acquired; the data include flight number, departure airport, departure airport longitude and latitude, landing airport, landing airport longitude and latitude, planned departure time, planned arrival time, planned route, position longitude and latitude, flight altitude, heading angle and data update time. Data cleaning is performed on this flight data: the six fields of flight number, position longitude, position latitude, flight altitude, heading angle and data update time are extracted from the original data table, and the remaining fields are deleted.
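The field extraction during cleaning can be sketched as follows (the field names are illustrative assumptions, not the patent's actual schema):

```python
# Six fields kept for prediction; all other fields are dropped.
KEEP_FIELDS = ("callsign", "longitude", "latitude", "height", "angle", "time")

def clean_record(raw):
    """Keep only the six fields used by the model; delete the rest.
    raw: one ADS-B record as a dict keyed by field name."""
    return {k: raw[k] for k in KEEP_FIELDS}
```

A record missing one of the six fields would raise a KeyError here, which in a fuller pipeline would be the point to discard or repair the record.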
The second step is that: actual prediction
Selecting navigation data of six continuous time slices from the extracted data, inputting the navigation data into an input layer of a flight track prediction model trained in advance, and outputting a prediction result of the navigation data of the flight to be tested through forward propagation.
The method for mass traffic data knowledge mining and parallel processing provided by the invention stores the designed LSTM model in each computing server of a distributed cluster; when training begins, the massive training data set is divided into as many sets as there are computing servers, different sets are input into different computing servers, and the computing servers train simultaneously, greatly reducing the overall training time of the LSTM model. After a computing server completes one forward propagation and one backward propagation, the parameter matrix to be transmitted to the parameter server is decomposed, by the matrix decomposition method, into the product of two matrices whose total parameter count is far less than that of the parameter matrix, so the number of parameters each computing server must transmit is greatly reduced and communication time is reduced accordingly. On the basis of the two decomposed matrices, the compression rate of the matrices is set, a filtering threshold is determined in each row vector of the matrices by an adaptive threshold filtering method, the corresponding filter-operator Mask matrices are obtained, and the filter-operator Mask matrices are multiplied element-wise with the decomposed matrices to obtain the compression matrices; the overall parameter count is thus again greatly reduced compared with that before compression, and communication time is again greatly reduced. Compressing the parameter matrix inevitably introduces some calculation error; if not eliminated, this error accumulates over multiple rounds of training and degrades the final model accuracy.
The invention calculates the error between each row vector of the matrix before and after compression, forms these row errors into an error matrix, and transmits it to each computing server for the next round of training. When the next round of training begins, the parameter matrix obtained from the parameter server and the error matrix are first used to compute the error-eliminated parameter matrix, after which subsequent training proceeds; this compensates well for the error caused by matrix compression and keeps the accuracy of the finally obtained model essentially unchanged.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (3)
1. A method for mass traffic data knowledge mining and parallel processing, characterized by comprising the following steps:
s1: acquiring ADS-B data of a flight to be detected in real time, extracting navigation data of six fields in total, namely flight number, position longitude, position latitude, flight altitude, course angle and track point data updating time, from the ADS-B data, and deleting the rest fields;
s2: selecting navigation data of six continuous time slices from the extracted data, inputting the navigation data into an input layer of a flight track prediction model trained in advance, and outputting a prediction result of the navigation data of the flight to be tested through forward propagation;
the training process of the flight path prediction model comprises the following steps:
s11: designing the flight path prediction model, with model parameters designed as follows: the number of input-layer nodes is set to 6, the number of output-layer nodes is set to 1, the prediction time step is set to 6, the number of hidden layers is set to 1, the number of hidden-layer nodes is set to 60, the number of training rounds is set to 50, ReLU is used as the activation function, the cross-entropy function is used as the loss function, and the stochastic gradient descent method is used as the optimization method for the loss function;
s12: setting any one server as a parameter server, using the rest N servers as N computing servers, placing all the servers in a local area network to ensure smooth communication among the servers, setting all the servers as a distributed training cluster, copying N copies of a designed flight path prediction model to be respectively deployed in the N computing servers, wherein one flight path prediction model corresponds to one computing server; wherein N is a positive integer;
s13: acquiring historical ADS-B data of a plurality of flights, and extracting flight data of six fields in total, namely flight number, position longitude, position latitude, flight altitude, course angle and track point data updating time from the historical ADS-B data to serve as training data;
s14: dividing the training data evenly into N sets as the training data sets of the N computing servers, one set per computing server, with data belonging to the same flight placed into the same subset of each set; sorting the track point data update time fields belonging to the same subset, taking two track points of the same flight whose time interval exceeds a time threshold as the end point of one track sequence and the start point of the next, thereby dividing all track points belonging to the same flight into several different track sequences, and constructing the track points in each track sequence into input vectors of the form (callsign, longitude, latitude, height, angle, time), wherein callsign represents the flight number, longitude the position longitude, latitude the position latitude, height the flight altitude, angle the heading angle, and time the track point data update time;
s15: simultaneously starting data entry in N computing serversInputting the input vectors of six continuous time slices belonging to the same track sequence in each subset of each set into six nodes of an input layer of a corresponding calculation server for forward propagation, outputting a prediction result of navigation data of a seventh time slice by inputting the input vector of one time slice into one node, calculating the error between the output prediction result and actual navigation data, and performing backward propagation on the error to obtain a parameter matrix of N calculation servers subjected to one-time forward propagation and one-time backward propagationWherein, the subscript t represents that the parameter matrix is obtained by the t-th iterative training;
s16: hypothesis parameter matrixIs oneThe parameter matrix is decomposed by matrix decompositionDecomposed into three matrices、Andwherein, the matrixIs oneOf a matrix, a matrixIs oneOf a matrix, a matrixIs oneOf a diagonal matrix, matrixHaving non-negative real singular values in descending order, R representing a parameter matrixThe rank of (d); order to,,Matrix ofIs oneOf a matrix, a matrixOne isA matrix of (a);
s17: determining a training pair matrix for each iterationAndaccording to the compression ratio, determining each iteration training pair matrix in a self-adaptive wayAndaccording to the compression ratio and the compression threshold, respectively for the matrixAndthe parameter number of the first compression matrix is compressed to obtain a first compression matrix and a second compression matrix, and the first compression matrix and the matrix are compressedSubtracting the corresponding parameters to obtain a first error matrix, and compressing the second compression matrix with the first error matrixSubtracting the corresponding parameters to obtain a second error matrix, and inputting the first compression matrix, the second compression matrix, the first error matrix and the second error matrix into the parameter server;
s18: averaging first compression matrixes of N calculation servers to obtain a first average compression matrix, averaging second compression matrixes of N calculation servers to obtain a second average compression matrix, averaging first error matrixes of N calculation servers to obtain a first average error matrix, averaging second error matrixes of N calculation servers to obtain a second average error matrix, and transmitting the first average compression matrix, the second average compression matrix, the first average error matrix and the second average error matrix back to the N calculation servers;
s19: in N calculation servers, a first average compression matrix and a first average error matrix are used for compensating errors generated by the first compression matrix to obtain a first compensation matrix, a second average compression matrix and a second average error matrix are used for compensating errors generated by the second compression matrix to obtain a second compensation matrix, the first compensation matrix and the second compensation matrix are multiplied to obtain an optimized parameter matrix which is used as a new parameter matrix for next iterative training, the step S15 is returned, forward propagation is carried out again by using the new parameter matrix, a new prediction result is output, errors between the new prediction result and actual navigation data are calculated, errors between the new prediction result and the actual navigation data are evaluated by using a loss function, and the training is stopped when the loss function reaches the minimum value or an overfitting phenomenon occurs in the training process.
2. The method for mass traffic data knowledge mining and parallel processing according to claim 1, wherein in step S17, compressing the parameters of the matrices P_t and Q_t respectively according to the compression rate and the compression threshold to obtain the first compression matrix and the second compression matrix specifically comprises:
setting the elements of each row vector of the matrix P_t (and likewise of Q_t) whose value is smaller than the compression threshold of that row to 0 and the remaining elements to 1 to obtain the filter-operator Mask matrix, and calculating the first compression matrix and the second compression matrix according to the following formulas:

P_t^c = P_t ⊙ Mask_P
Q_t^c = Q_t ⊙ Mask_Q

wherein ⊙ denotes the element-wise multiplication of corresponding elements of two matrices of the same type.
3. The method for mining and parallel processing of mass traffic data knowledge according to claim 1 or 2, wherein in step S17, the compression threshold is updated after each L iterative training, and the value range of L is 1000 to 1500.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110456757.1A CN113052313B (en) | 2021-04-27 | 2021-04-27 | Mass traffic data knowledge mining and parallel processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113052313A true CN113052313A (en) | 2021-06-29 |
CN113052313B CN113052313B (en) | 2021-10-15 |
Family
ID=76520620
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110456757.1A Active CN113052313B (en) | 2021-04-27 | 2021-04-27 | Mass traffic data knowledge mining and parallel processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113052313B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150324501A1 (en) * | 2012-12-12 | 2015-11-12 | University Of North Dakota | Analyzing flight data using predictive models |
CN109508812A (en) * | 2018-10-09 | 2019-03-22 | 南京航空航天大学 | A kind of aircraft Trajectory Prediction method based on profound memory network |
CN110443411A (en) * | 2019-07-16 | 2019-11-12 | 青岛民航凯亚系统集成有限公司 | Method based on the ADS-B data prediction flight landing time |
CN111292563A (en) * | 2020-05-12 | 2020-06-16 | 北京航空航天大学 | Flight track prediction method |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113657687A (en) * | 2021-08-30 | 2021-11-16 | 国家电网有限公司 | Power load prediction method based on feature engineering and multi-path deep learning |
CN113657687B (en) * | 2021-08-30 | 2023-09-29 | 国家电网有限公司 | Power load prediction method based on feature engineering and multipath deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN113052313B (en) | 2021-10-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||