CN114067069A - Track representation method and system based on deep learning - Google Patents
Track representation method and system based on deep learning
- Publication number
- CN114067069A (application number CN202111354503.5A)
- Authority
- CN
- China
- Prior art keywords
- vector
- track
- space
- deep learning
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/05—Geographic models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/003—Navigation within 3D models or images
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Geometry (AREA)
- Remote Sensing (AREA)
- Computer Graphics (AREA)
- Radar, Positioning & Navigation (AREA)
- Computer Hardware Design (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a trajectory representation method and system based on deep learning. The method comprises the following steps: encoding the sampling points of a trajectory with a neural network; extracting the spatial features of the trajectory with convolution operations; and extracting the time-sequence features of the trajectory with an LSTM model. By combining the neural network with the LSTM, the invention encodes the spatio-temporal features of a trajectory into a dense semantic vector, and computation on the semantic vector is equivalent to computation on the original trajectory. This overcomes the excessive dependence of traditional trajectory computation on sampling-point quality, so the representation can serve as the basis of machine learning models in high-precision map making and helps improve their performance.
Description
Technical Field
The invention belongs to the technical field of high-precision map making, and in particular relates to a trajectory representation method and system based on deep learning.
Background
Guideline generation, which serves intelligent driving, is a pain point in high-precision map making. Traditional kinematics-based methods generate guidelines from a mathematical model, but in practice they ignore the real scene information of the road; actual road scenes are complex and changeable, so guidelines generated from a mathematical model are hard to apply. In crowd-sourced high-precision maps, trajectory-data-driven guideline generation is a promising solution to this problem, and one of its most critical technologies is trajectory representation: the extracted representation vector must preserve the spatio-temporal features of the original trajectory.
Disclosure of Invention
To solve the problems that the traditional guideline-generation process cannot adapt to actual road scenes and does not preserve the spatio-temporal attributes of the guideline, a first aspect of the invention provides a trajectory representation method based on deep learning, comprising the following steps: acquiring trajectory sampling-point data and encoding the sampling points of a trajectory into fixed-dimension vectors with a neural network model; extracting a spatial feature vector from the encoded sampling points with the convolution and pooling operations of the neural network model; and inputting the spatial feature vector into an LSTM model, which extracts the time-sequence features in the spatial feature vector to obtain a trajectory semantic vector with spatio-temporal features.
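As an illustration only, not the patented implementation, these three encoding stages can be sketched in PyTorch; all layer sizes here (embed_dim, conv_dim, hidden_dim) are assumptions for the example:

```python
import torch
import torch.nn as nn

class TrajectoryEncoder(nn.Module):
    """Sketch of the three encoding stages: point embedding, Conv1d + pooling
    for spatial features, LSTM for time-sequence features. Layer sizes are
    illustrative assumptions, not values taken from the patent."""
    def __init__(self, embed_dim=64, conv_dim=128, hidden_dim=256):
        super().__init__()
        self.point_fc = nn.Linear(2, embed_dim)   # encode (longitude, latitude)
        self.conv = nn.Sequential(                # two 1D convolutions + max pooling
            nn.Conv1d(embed_dim, conv_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(conv_dim, conv_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),
        )
        self.lstm = nn.LSTM(conv_dim, hidden_dim, batch_first=True)

    def forward(self, points):                    # points: (batch, seq_len, 2)
        p = self.point_fc(points)                 # fixed-dimension point vectors
        s = self.conv(p.transpose(1, 2))          # Conv1d expects (batch, chan, seq)
        _, (h, _) = self.lstm(s.transpose(1, 2))  # time-sequence features
        return h[-1]                              # trajectory semantic vector
```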
In some embodiments of the invention, extracting spatial features from the fixed-dimension vectors to obtain a spatial feature vector comprises the following step: performing spatial feature extraction on the fixed-dimension vectors with the convolution and pooling operations of several convolutional networks.
Further, this extraction comprises: performing a convolution operation on the encoded sampling points with two one-dimensional convolutional networks to obtain fixed-dimension vectors; and max-pooling the fixed-dimension vectors to obtain the feature vector.
Some embodiments of the invention further comprise: decoding and reconstructing the spatio-temporal trajectory semantic vector with a trained decoder, and outputting a reconstructed sequence of trajectory points.
Further, the autoencoder, and hence its decoder, is trained by the following steps: extracting the time-sequence features of the spatial feature vectors and encoding them together with the corresponding spatial feature vectors to obtain spatio-temporal feature vectors; decoding the spatio-temporal feature vectors to obtain a decoded time-sequence vector and a decoded spatial vector; outputting a reconstructed sequence of trajectory points from the decoded time-sequence vector and the decoded spatial vector; and computing the error of the point sequence and updating the model until the error stabilizes below a threshold.
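A minimal sketch of this stopping rule, assuming a `model` whose forward pass returns the reconstructed point sequence and a data loader yielding point batches; the tolerance values are assumptions:

```python
import torch

def train_autoencoder(model, loader, threshold=1e-3, patience=5):
    """Train until the reconstruction error is both stable and below a threshold."""
    opt = torch.optim.RMSprop(model.parameters())  # RMSprop, as named in the
    loss_fn = torch.nn.MSELoss()                   # training description below
    history = []
    while True:
        epoch_loss = 0.0
        for points in loader:                      # points: (batch, seq_len, 2)
            recon = model(points)                  # encode, then decode/reconstruct
            loss = loss_fn(recon, points)          # error of the track point sequence
            opt.zero_grad()
            loss.backward()
            opt.step()
            epoch_loss += loss.item()
        history.append(epoch_loss / len(loader))
        if len(history) >= patience:               # "stabilizes": small variation
            recent = history[-patience:]           # over the most recent epochs
            if max(recent) - min(recent) < 1e-4 and recent[-1] < threshold:
                return history
```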
In the above embodiments, acquiring the trajectory sampling-point data and encoding it into fixed-dimension vectors comprises: encoding the data of several trajectory sampling points into a fixed-dimension vector through a fully connected layer of the convolutional network.
A second aspect of the invention provides a trajectory representation system based on deep learning, comprising: an acquisition module for acquiring trajectory sampling-point data and encoding the sampling points of a trajectory into fixed-dimension vectors with a neural network model; an extraction module for extracting a spatial feature vector from the encoded sampling points with the convolution and pooling operations of the neural network model; and an output module for inputting the spatial feature vector into an LSTM model, which extracts the time-sequence features in the spatial feature vector to obtain a trajectory semantic vector with spatio-temporal features.
The system further comprises a reconstruction module for decoding and reconstructing the spatio-temporal trajectory semantic vector with a trained decoder and outputting a reconstructed sequence of trajectory points.
In a third aspect of the present invention, there is provided an electronic device comprising: one or more processors; a storage device, configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the deep learning based trajectory representation method provided by the first aspect of the present invention.
In a fourth aspect of the present invention, a computer-readable medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the deep learning based trajectory representation method provided in the first aspect of the present invention.
The invention has the beneficial effects that:
1. the invention provides a trajectory representation method based on deep learning that encodes a trajectory into a dense, fixed-dimension vector with an unsupervised autoencoder model, for use by other algorithms in subsequent scenarios such as high-precision maps or automatic driving;
2. by encoding and decoding the spatial feature vectors, the autoencoder reconstructs the trajectory point sequence together with its temporal characteristics, so the generated trajectory data preserves spatio-temporal features; this further improves the adaptability and degree of automation of the guideline and provides effective support for subsequent high-precision map making.
Drawings
FIG. 1 is a basic flow diagram of a deep learning based trajectory representation method in some embodiments of the invention;
FIG. 2 is a schematic diagram illustrating an exemplary process for extracting semantic vectors of temporal features according to some embodiments of the present invention;
FIG. 3 is a schematic diagram of a trajectory representation method based on deep learning in some embodiments of the invention;
FIG. 4 is a first schematic structural diagram of the autoencoder in some embodiments of the invention;
FIG. 5 is a second schematic structural diagram of the autoencoder in some embodiments of the invention;
FIG. 6 is a diagram illustrating the effect of guidelines generated using the deep learning-based trajectory representation method of the present invention;
FIG. 7 is a diagram illustrating a detailed structure of a deep learning based trajectory representation system in some embodiments of the invention;
FIG. 8 is a schematic structural diagram of an electronic device in some embodiments of the invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Referring to FIG. 1, a first aspect of the invention provides a trajectory representation method based on deep learning, comprising: S100, acquiring trajectory sampling-point data and encoding the sampling points of a trajectory into fixed-dimension vectors with a neural network model; S200, extracting a spatial feature vector from the encoded sampling points with the convolution and pooling operations of the neural network model; and S300, inputting the spatial feature vector into an LSTM model, which extracts the time-sequence features in the spatial feature vector to obtain a trajectory semantic vector with spatio-temporal features.
Referring to FIGS. 2 and 3, in S200 of some embodiments of the invention, the spatial feature extraction comprises the following step: performing spatial feature extraction on the fixed-dimension vectors with the convolution and pooling operations of several convolutional networks to obtain a spatial feature vector.
Schematically, FIG. 3 shows the original trajectory t passing in turn through the sampling-point encoding, the convolution encoding module, the LSTM encoding module, the LSTM decoding module, and the deconvolution decoding module to obtain a reconstructed trajectory t̂. Correspondingly, the data takes the following forms in turn: the original trajectory t, the point sequence P of trajectory t, the spatial feature vector P_S, the spatio-temporal feature vector P_St, the decoded time-sequence vector, and the reconstructed trajectory t̂. The error between the reconstructed trajectory t̂ and the original trajectory t is computed to train or optimize the convolutional network.
Preferably, the spatial feature extraction comprises: performing a convolution operation on the fixed-dimension vectors with two one-dimensional convolutional networks, and max-pooling the convolved vectors to obtain the feature vector. The convolutional network combined with pooling merges several trajectory sampling points and extracts a vector carrying the spatial features.
It will be appreciated that the pipeline from feature extraction to reconstruction can be seen as the original trajectory passing from the encoder to the decoder of an autoencoder. Schematically, FIG. 4 and FIG. 5 show the general network structure of the autoencoder: x_i and y_i denote the longitude and latitude of a trajectory sampling point; FC (fully connected layers) encodes the longitude and latitude; Conv (convolution) and Pool are the Conv1d (one-dimensional convolution) and pooling operations of a convolutional network (CNN), used to extract the spatial features of the trajectory; and LSTM (Long Short-Term Memory), a recurrent neural network (RNN), extracts the time-sequence features of the trajectory.
FIG. 5 shows a special network structure of the autoencoder: on the basis of FIG. 4 it adds a splicing (concatenation) of the spatial feature vector with the spatio-temporal feature vector, at which point the decoder becomes a conditional decoder network; that is, the spatio-temporal feature vector produced by the encoder is processed into a uniform feature vector that conditions the decoding.
In some embodiments of the invention, a trained decoder decodes and reconstructs the spatio-temporal trajectory semantic vector and outputs a reconstructed sequence of trajectory points.
Referring to FIG. 4 or FIG. 5, in S300 of some embodiments the autoencoder is trained by: S301, extracting the time-sequence features of the spatial feature vectors and encoding them with the corresponding spatial feature vectors to obtain spatio-temporal feature vectors; S302, decoding the spatio-temporal feature vectors to obtain a decoded time-sequence vector and a decoded spatial vector; S303, outputting a reconstructed sequence of trajectory points from the decoded vectors; and S304, computing the error of the point sequence and updating the decoder until the error stabilizes below a threshold.
It should be understood that the autoencoder structure in a deep learning model generally comprises an encoding module and a decoding module, corresponding here to the combination of the LSTM and the convolutional network in the encoding and decoding stages. Optionally, the encoding and decoding modules share the same structure, except that each data-processing operation is replaced by its inverse (convolution by deconvolution, encoding by decoding), so the decoder and the encoder are trained in essentially the same way.
On the basis of the spatial feature vectors extracted in S100-S200, an LSTM network extracts the time-sequence features of the trajectory points, and the state of the last LSTM unit is taken as the encoding vector of the trajectory: a trajectory representation vector carrying the semantic features of the original trajectory. This encoding vector is then decoded by an LSTM; to accelerate convergence, this LSTM is a conditional decoder, and the input of each of its units is the concatenation of the trajectory representation vector and the corresponding input of the encoding network. Accordingly, in an embodiment of the invention, the mini-batch computation in the training process comprises:
step 1, taking a mini-batch of trajectory sampling points from the training data;
step 2, converting the trajectory points from global coordinates into local coordinates, the origin of the local coordinates being the centre point of the lane trajectory;
step 3, embedding the x and y coordinates of the trajectory points into a fixed-dimension vector through a fully connected layer;
step 4, extracting the spatial feature vector of the trajectory from the embedding vector with two layers of 1D convolution and max pooling;
step 5, extracting the time-sequence features of the spatial feature vector with the encoding-layer LSTM to obtain the spatio-temporal feature vector;
step 6, splicing the spatio-temporal feature vector with the input of the encoding-layer LSTM as the input of the decoding-layer LSTM, which decodes a time-sequence vector;
step 7, decoding the time-sequence vector into a spatial vector through deconvolution and inverse max pooling;
step 8, outputting the reconstructed sequence of trajectory points through the fully connected layer;
step 9, computing the loss function, here the MSE (mean squared error);
and step 10, back-propagating, computing gradients, and updating the model parameters. Specifically, the training algorithm is a mini-batch-based BP algorithm and the learning strategy is RMSprop; optionally, the learning (optimization) strategy is at least one of GD, SGD, Momentum, and Adam.
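A sketch of one such mini-batch step, assuming `model` bundles the embedding, convolution, conditional LSTM codec, deconvolution, and output layers described above; the lane-centre handling is a simplification, since the patent does not give the projection formula:

```python
import torch
import torch.nn.functional as F

def to_local_coords(points, lane_center):
    """Step 2 (simplified): translate global track points into a local frame
    whose origin is the centre point of the lane trajectory."""
    return points - lane_center.unsqueeze(1)       # (batch, seq, 2) - (batch, 1, 2)

def train_step(model, opt, batch, lane_center):
    """Steps 2-10 of the mini-batch computation."""
    local = to_local_coords(batch, lane_center)    # step 2
    recon = model(local)                           # steps 3-8: embed, conv/pool,
                                                   # LSTM encode, conditional decode,
                                                   # deconv/unpool, output FC
    loss = F.mse_loss(recon, local)                # step 9: MSE loss
    opt.zero_grad()
    loss.backward()                                # step 10: backpropagation
    opt.step()                                     # parameter update
    return loss.item()

# The stated learning strategy:
# opt = torch.optim.RMSprop(model.parameters())
```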
Referring to FIG. 5, to improve the encoding and decoding efficiency of the autoencoder, in step S302 decoding the spatio-temporal feature vector into a decoded time-sequence vector and a decoded spatial vector comprises: S3021, extracting the time-sequence features of the spatial feature vector with a first LSTM model and encoding them with the corresponding spatial feature vector to obtain the spatio-temporal feature vector; and S3022, decoding the spatio-temporal feature vector with a second LSTM model to obtain a decoded time-sequence vector, then applying deconvolution and inverse pooling to that vector to obtain the decoded spatial vector. Optionally, the first or second LSTM model may be replaced by another deep learning or machine learning model capable of extracting time-sequence features, such as an RNN, TCN, or GRU.
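A sketch of this conditional decoding stage under the same assumptions (PyTorch, with illustrative layer sizes matching the encoder sketched earlier; `nn.Upsample` stands in for inverse max pooling):

```python
import torch
import torch.nn as nn

class ConditionalDecoder(nn.Module):
    """Decoding-side sketch: a second LSTM consumes the spatial features spliced
    with the trajectory semantic vector, then deconvolution and unpooling map the
    decoded time-sequence vector back to a point sequence."""
    def __init__(self, conv_dim=128, hidden_dim=256, embed_dim=64):
        super().__init__()
        self.lstm_dec = nn.LSTM(conv_dim + hidden_dim, conv_dim, batch_first=True)
        self.unpool = nn.Upsample(scale_factor=2)          # inverse of MaxPool1d(2)
        self.deconv = nn.ConvTranspose1d(conv_dim, embed_dim, kernel_size=3, padding=1)
        self.out_fc = nn.Linear(embed_dim, 2)              # back to (longitude, latitude)

    def forward(self, spatial_feats, traj_vec):
        # spatial_feats: (batch, seq, conv_dim); traj_vec: (batch, hidden_dim)
        cond = traj_vec.unsqueeze(1).expand(-1, spatial_feats.size(1), -1)
        dec_in = torch.cat([spatial_feats, cond], dim=-1)  # conditional splice
        t_dec, _ = self.lstm_dec(dec_in)                   # decoded time-sequence vector
        s_dec = self.deconv(self.unpool(t_dec.transpose(1, 2)))  # decoded spatial vector
        return self.out_fc(s_dec.transpose(1, 2))          # reconstructed point sequence
```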
In step S100 of the above embodiment, acquiring the trajectory sampling-point data and encoding it into fixed-dimension vectors comprises: encoding the data of several trajectory sampling points into a fixed-dimension vector through the embedding and the fully connected layer of the convolutional network.
Specifically, a sampling point consists essentially of a longitude and a latitude, each of which is encoded into a fixed-dimension vector by a fully connected network (the fully connected layer of the convolutional network). On the basis of this encoding vector, the convolutional network with pooling merges several trajectory sampling points and extracts the vector carrying the spatial features. In effect, a trajectory of non-fixed length is encoded into a single vector, which reduces the conversion steps of vector operations and speeds up the training of the deep learning model.
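To make the fixed-dimension property concrete, a usage note with the `TrajectoryEncoder` sketched earlier: trajectories of different lengths all map to a vector of one size (the shapes are assumptions of that sketch):

```python
import torch

enc = TrajectoryEncoder()              # the illustrative encoder sketched earlier
short_track = torch.randn(1, 20, 2)    # 20 (longitude, latitude) sample points
long_track = torch.randn(1, 200, 2)    # 200 sample points
print(enc(short_track).shape)          # torch.Size([1, 256])
print(enc(long_track).shape)           # torch.Size([1, 256]): same fixed dimension
```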
Schematically, FIG. 6 shows the effect of a guideline generated with the deep-learning-based trajectory representation method of the invention. The largest octagonal frame is the road-level intersection area, with lane lines outside it; the lines inside the frame without small triangle markers are virtual lane lines and have no practical meaning. The line marked with small triangles is the guideline generated by the trajectory representation method of the invention; as the figure shows, it is smooth and essentially conforms to reasonable actual driving behaviour.
Example 2
Referring to FIG. 7, a second aspect of the invention provides a trajectory representation system 1 based on deep learning, comprising: an acquisition module 11 for acquiring trajectory sampling-point data and encoding the sampling points of a trajectory into fixed-dimension vectors with a neural network model; an extraction module 12 for extracting a spatial feature vector from the encoded sampling points with the convolution and pooling operations of the neural network model; and an output module 13 for inputting the spatial feature vector into an LSTM model, which extracts the time-sequence features in the spatial feature vector to obtain a trajectory semantic vector with spatio-temporal features.
The system further comprises a reconstruction module for decoding and reconstructing the spatio-temporal trajectory semantic vector with a trained decoder and outputting a reconstructed sequence of trajectory points.
Furthermore, the reconstruction module comprises an encoding unit, a decoding unit, a reconstruction unit, and a computation unit: the encoding unit extracts the time-sequence features of the spatial feature vector and encodes them with the corresponding spatial feature vector to obtain the spatio-temporal feature vector; the decoding unit decodes the spatio-temporal feature vector into a decoded time-sequence vector and a decoded spatial vector; the reconstruction unit outputs the reconstructed sequence of trajectory points from the decoded time-sequence vector and the decoded spatial vector; and the computation unit computes the error of the point sequence and updates the decoder until the error stabilizes below a threshold.
Example 3
Referring to fig. 8, in a third aspect of the present invention, there is provided an electronic apparatus comprising: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method of the first aspect of the invention.
The electronic device 500 may include a processing device (e.g., a central processing unit or graphics processor) 501 that may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a liquid crystal display (LCD), speakers, vibrators, and the like; a storage device 508 including, for example, a hard disk; and a communication device 509. The communication device 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While FIG. 8 illustrates an electronic device 500 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided. Each block shown in FIG. 8 may represent one device or multiple devices as required.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more computer programs which, when executed by the electronic device, cause the electronic device to carry out the trajectory representation method described above.
computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + +, Python, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (10)
1. A trajectory representation method based on deep learning, characterized by comprising the following steps:
acquiring trajectory sampling-point data, and encoding the sampling points of a trajectory into fixed-dimension vectors with a neural network model;
extracting a spatial feature vector from the encoded sampling points with the convolution and pooling operations of the neural network model; and
inputting the spatial feature vector into an LSTM model, the LSTM model extracting the time-sequence features in the spatial feature vector to obtain a trajectory semantic vector with spatio-temporal features.
2. The trajectory representation method based on deep learning of claim 1, wherein the spatial feature extraction of the fixed-dimension vectors to obtain a spatial feature vector comprises the following step:
performing spatial feature extraction on the fixed-dimension vectors with the convolution and pooling operations of several convolutional networks to obtain the spatial feature vector.
3. The trajectory representation method based on deep learning of claim 2, wherein performing spatial feature extraction on the fixed-dimension vectors with the convolution and pooling operations of several convolutional networks to obtain the spatial feature vector comprises:
performing a convolution operation on the encoded sampling points with two one-dimensional convolutional networks to obtain fixed-dimension vectors; and
max-pooling the fixed-dimension vectors to obtain the feature vector.
4. The trajectory representation method based on deep learning of claim 1, further comprising:
decoding and reconstructing the spatio-temporal trajectory semantic vector with a trained decoder, and outputting a reconstructed sequence of trajectory points.
5. The trajectory representation method based on deep learning of claim 4, wherein the decoder is trained by:
extracting the time-sequence features of the spatial feature vectors, and encoding them with the corresponding spatial feature vectors to obtain spatio-temporal feature vectors;
decoding the spatio-temporal feature vectors to obtain a decoded time-sequence vector and a decoded spatial vector;
outputting a reconstructed sequence of trajectory points from the decoded time-sequence vector and the decoded spatial vector; and
computing the error of the point sequence, and updating the decoder until the error stabilizes below a threshold.
6. The trajectory representation method based on deep learning of any one of claims 1 to 5, wherein acquiring and encoding the trajectory sampling-point data into fixed-dimension vectors comprises:
encoding the data of several trajectory sampling points into a fixed-dimension vector through a fully connected layer of the convolutional network.
7. A trajectory representation system based on deep learning, comprising:
an acquisition module for acquiring trajectory sampling-point data and encoding the sampling points of a trajectory into fixed-dimension vectors with a neural network model;
an extraction module for extracting a spatial feature vector from the encoded sampling points with the convolution and pooling operations of the neural network model; and
an output module for inputting the spatial feature vector into an LSTM model, the LSTM model extracting the time-sequence features in the spatial feature vector to obtain a trajectory semantic vector with spatio-temporal features.
8. The trajectory representation system based on deep learning of claim 7, further comprising a reconstruction module for decoding and reconstructing the spatio-temporal trajectory semantic vector with a trained decoder and outputting a reconstructed sequence of trajectory points.
9. An electronic device, comprising: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the deep learning based trajectory representation method of any one of claims 1 to 6.
10. A computer-readable medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements the deep learning based trajectory representation method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111354503.5A | 2021-11-16 | 2021-11-16 | Track representation method and system based on deep learning
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111354503.5A | 2021-11-16 | 2021-11-16 | Track representation method and system based on deep learning
Publications (1)
Publication Number | Publication Date |
---|---|
CN114067069A | 2022-02-18
Family
ID=80273488
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111354503.5A | Track representation method and system based on deep learning | 2021-11-16 | 2021-11-16
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114067069A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115169526A (en) * | 2022-05-20 | 2022-10-11 | 北京信息科技大学 | Deep learning-based base station representation learning method, system and storage medium |
- 2021-11-16: CN application CN202111354503.5A published as CN114067069A, status pending
Similar Documents
Publication | Title |
---|---|
CN111832570B (en) | Image semantic segmentation model training method and system | |
CN112699991A (en) | Method, electronic device, and computer-readable medium for accelerating information processing for neural network training | |
US20230005181A1 (en) | Reinforcement learning-based label-free six-dimensional object pose prediction method and apparatus | |
CN113436620B (en) | Training method of voice recognition model, voice recognition method, device, medium and equipment | |
CN111523640B (en) | Training method and device for neural network model | |
CN113327599B (en) | Voice recognition method, device, medium and electronic equipment | |
CN115984944B (en) | Expression information recognition method, device, equipment, readable storage medium and product | |
CN116758100A (en) | 3D medical image segmentation system and method | |
CN113780326A (en) | Image processing method and device, storage medium and electronic equipment | |
CN114067069A (en) | Track representation method and system based on deep learning | |
CN118397147A (en) | Image text generation method and device based on deep learning | |
CN114078097A (en) | Method and device for acquiring image defogging model and electronic equipment | |
CN114139703A (en) | Knowledge distillation method and device, storage medium and electronic equipment | |
CN117787380A (en) | Model acquisition method, device, medium and equipment | |
WO2024001653A1 (en) | Feature extraction method and apparatus, storage medium, and electronic device | |
US11501759B1 (en) | Method, system for speech recognition, electronic device and storage medium | |
CN113327265B (en) | Optical flow estimation method and system based on guiding learning strategy | |
CN115482353A (en) | Training method, reconstruction method, device, equipment and medium for reconstructing network | |
CN114494574A (en) | Deep learning monocular three-dimensional reconstruction method and system based on multi-loss function constraint | |
CN114140553A (en) | Intersection guide line generation method and system based on condition variation self-encoder | |
CN114330239A (en) | Text processing method and device, storage medium and electronic equipment | |
CN114399648A (en) | Behavior recognition method and apparatus, storage medium, and electronic device | |
CN114119973A (en) | Spatial distance prediction method and system based on image semantic segmentation network | |
CN113592074A (en) | Training method, generating method and device, and electronic device | |
CN112418405B (en) | Model compression method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |