CN115331037A - Track data clustering method and device - Google Patents

Track data clustering method and device

Info

Publication number
CN115331037A
Authority
CN
China
Prior art keywords
data
track
clustered
pixel points
image
Prior art date
Legal status
Pending
Application number
CN202210697754.1A
Other languages
Chinese (zh)
Inventor
李青
何鑫泰
陈坤
王润泽
Current Assignee
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202210697754.1A
Publication of CN115331037A
Legal status: Pending

Classifications

    • G06V 10/762 — Image or video recognition using machine learning: clustering, e.g. of similar faces in social networks
    • G06N 3/02, G06N 3/08 — Neural networks; learning methods
    • G06T 11/00, G06T 11/001 — 2D image generation; texturing, colouring
    • G06T 5/77 — Image enhancement or restoration: retouching, inpainting, scratch removal
    • G06V 10/20, G06V 10/40 — Image preprocessing; extraction of image or video features
    • G06V 10/774, G06V 10/82 — Generating sets of training patterns; recognition using neural networks
    • G06T 2207/10004, G06T 2207/10024 — Still image; colour image
    • G06T 2207/20081, G06T 2207/20084 — Training/learning; artificial neural networks [ANN]
    • G06T 2207/30241 — Trajectory


Abstract

The application provides a track data clustering method and device. The method comprises: acquiring a plurality of pieces of track data to be clustered, and generating a track image based on each piece of track data to be clustered; inputting the data of all first pixel points into an autoencoder to obtain a feature set determined by the encoder in the autoencoder, wherein all the first pixel points form the track image, the feature set contains features extracted from the data of all the first pixel points, and the autoencoder is trained using a track training image and data obtained by performing deletion processing on the track training image; and clustering the feature sets to obtain a plurality of clusters, the track data to be clustered whose feature sets fall in the same cluster being clustered into one class.

Description

Track data clustering method and device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a trajectory data clustering method and apparatus.
Background
Limited equipment coverage, equipment faults, poor signal quality, transmission loss and the like can cause missing segments in track data. Such missing segments damage the characteristics of a track and interfere with the matching of track points, and therefore reduce the accuracy of clustering the track data.
Therefore, how to ensure the accuracy of clustering when track data contain missing segments becomes a problem to be solved.
Disclosure of Invention
The application provides the following technical scheme:
one aspect of the present application provides a trajectory data clustering method, including:
acquiring a plurality of track data to be clustered, and generating a track image based on the track data to be clustered;
inputting data of all first pixel points into an autoencoder to obtain a feature set determined by the encoder in the autoencoder, wherein all the first pixel points form the track image, the feature set comprises features extracted from the data of all the first pixel points, the autoencoder is trained using a track training image and data obtained by performing deletion processing on the track training image, and the track training image is generated based on track training data;
and clustering each feature set to obtain a plurality of clusters, and clustering the track data to be clustered corresponding to the feature sets in the clusters into one class.
Optionally, the track data to be clustered includes a plurality of track point data, and the track point data includes longitude and latitude attribute data;
the track image generation based on the track data to be clustered comprises the following steps:
and generating a track based on the longitude and latitude attribute data in the track data to be clustered, and generating a track image containing the track.
Optionally, the track data to be clustered includes a plurality of track point data, and the track point data includes longitude and latitude attribute data and at least one non-longitude and latitude attribute data;
the track image generation based on the track data to be clustered comprises the following steps:
generating a track based on a plurality of longitude and latitude attribute data in the track data to be clustered, and generating a first image containing the track;
determining a color value corresponding to the non-latitude and longitude attribute data, and performing color filling on the track in the first image based on the color value to obtain a second image;
and determining to obtain a track image based on at least one second image, wherein the data of all first pixels comprises longitude and latitude attribute data corresponding to pixels in each second pixel set and color values of the pixels in the second pixel sets, the pixels in the second pixel sets form the second image, and all first pixels form the track image.
Optionally, the clustering each feature set to obtain a plurality of clusters includes:
selecting k feature sets from each feature set as k clustering centers respectively, and initially determining k clusters, wherein the clusters contain the clustering centers, and k is not more than the total number of the feature sets;
calculating the distance between each feature set to be clustered and each clustering center to obtain a plurality of distances, and allocating the feature sets to be clustered to a cluster to which the clustering center corresponding to the shortest distance in the plurality of distances belongs, wherein the feature sets to be clustered are one feature set except the k clustering centers in each feature set;
respectively recalculating the clustering centers of the clusters to obtain the current clustering centers;
determining whether a difference between the current cluster center and a historical cluster center is less than a threshold;
if the difference is smaller than the threshold, clustering is finished;
otherwise, returning, for each feature set to be clustered, to the step of calculating the distance between that feature set to be clustered and each clustering center.
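The clustering steps above are standard k-means over the feature sets. Below is a minimal sketch (not part of the patent text), assuming each feature set is flattened into a vector and Euclidean distance is used; the distance metric and initialization are not fixed by the text:

```python
import numpy as np

def kmeans(features, k, threshold=1e-4, max_iter=100, seed=0):
    """Cluster feature vectors with k-means as described above.

    features: (N, D) array, one row per flattened feature set; k <= N.
    Returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Select k feature sets as the k initial cluster centers.
    centers = features[rng.choice(len(features), size=k, replace=False)]
    labels = np.zeros(len(features), dtype=int)
    for _ in range(max_iter):
        # Assign each feature set to the cluster of the nearest center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recalculate each cluster center as the mean of its members.
        new_centers = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop once the centers move less than the threshold.
        if np.linalg.norm(new_centers - centers) < threshold:
            centers = new_centers
            break
        centers = new_centers
    return labels, centers
```

Track data whose feature vectors receive the same label are then clustered into one class.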
Optionally, the autoencoder is trained by the following steps:
acquiring track training data, generating a track to be used based on the track training data, and generating a track training image containing the track to be used;
selecting partial pixel points from all third pixel points, and performing deletion processing on data of the partial pixel points to obtain target data of the partial pixel points, wherein all third pixel points form the track training image;
inputting the target data of the partial pixel points, together with the data of the remaining pixel points among all the third pixel points, into an autoencoder to obtain a first feature extracted by the encoder of the autoencoder from these data, and recovery data determined by the decoder of the autoencoder based on the first feature;
determining a loss function value of the autoencoder, the loss function value characterizing the difference between the recovery data and the data of all the third pixel points;
judging whether the loss function value of the autoencoder is within a preset threshold range;
and if not, updating the parameters of the autoencoder and returning to the step of acquiring the track training data, until the loss function value is within the preset threshold range.
Optionally, the determining a loss function value of the autoencoder includes:
determining the loss function value of the autoencoder based on the loss function calculation formula

    Loss = Σ_{i=1..n} w_i · (y_true_i − y_pred_i)²,  where w_i = α if y_true_i > 0 and w_i = 1 otherwise;

wherein y_true_i is the data of the i-th pixel point among all the third pixel points, namely the data of the i-th pixel point in the track training image (the data of a pixel point on the track in the track training image is greater than 0, and the data of a pixel point not on the track equals 0); y_pred_i is the recovery data of the i-th pixel point determined by the decoder; the summation runs over the differences between the data and the recovery data of all n pixel points, n being the number of pixel points in the track training image; and α denotes a weight, α being greater than 1.
Another aspect of the present application provides a trajectory data clustering device, including:
the acquisition module is used for acquiring a plurality of track data to be clustered;
the generating module is used for generating a track image based on the track data to be clustered;
the determination module is used for inputting the data of all first pixel points into an autoencoder to obtain a feature set determined by the encoder in the autoencoder, wherein all the first pixel points form the track image, the feature set comprises features extracted from the data of all the first pixel points, the autoencoder is trained using a track training image and data obtained by performing deletion processing on the track training image, and the track training image is generated based on track training data;
and the clustering module is used for clustering each feature set to obtain a plurality of clusters, and clustering the track data to be clustered corresponding to the feature sets in the clusters into one class.
Optionally, the track data to be clustered includes multiple track point data, where the track point data includes longitude and latitude attribute data;
the generation module is specifically configured to:
and generating a track based on the plurality of longitude and latitude attribute data in the track data to be clustered, and generating a track image containing the track.
Optionally, the track data to be clustered includes a plurality of track point data, where the track point data includes longitude and latitude attribute data and at least one non-longitude and latitude attribute data;
the generation module is specifically configured to:
generating a track based on a plurality of longitude and latitude attribute data in the track data to be clustered, and generating a first image containing the track;
determining a color value corresponding to the non-latitude and longitude attribute data, and performing color filling on the track in the first image based on the color value to obtain a second image;
and determining to obtain a track image based on at least one second image, wherein the data of all the first pixels comprises longitude and latitude attribute data corresponding to the pixels in each second pixel set and color values of the pixels in the second pixel sets, the pixels in the second pixel sets form the second image, and all the first pixels form the track image.
Optionally, the process by which the clustering module clusters each feature set to obtain a plurality of clusters specifically includes:
selecting k feature sets from each feature set as k clustering centers respectively, and initially determining k clusters, wherein the clusters contain the clustering centers, and k is not more than the total number of the feature sets;
calculating the distance between each feature set to be clustered and each clustering center to obtain a plurality of distances, and assigning the feature set to be clustered to a cluster to which the clustering center corresponding to the shortest distance in the plurality of distances belongs, wherein the feature set to be clustered is one of the feature sets except the k clustering centers in each feature set;
respectively recalculating the clustering centers of the clusters to obtain current clustering centers;
determining whether a difference between the current cluster center and a historical cluster center is less than a threshold;
if the difference is smaller than the threshold, clustering is finished;
otherwise, returning, for each feature set to be clustered, to the step of calculating the distance between that feature set to be clustered and each clustering center.
Compared with the prior art, the beneficial effects of the present application are as follows:
In the present application, a track training image is generated based on track training data, and an autoencoder is obtained by training with the track training image and data obtained by performing deletion processing on the track training image, so that the autoencoder learns to identify and repair missing data. On this basis, the data of each pixel point of a track image is input into the autoencoder; when the track image has missing segments, the autoencoder can identify the missing parts of the track image and repair them, so that richer features are extracted from the repaired data. Clustering the feature sets containing these richer features ensures the clustering precision, and further ensures the precision of clustering the track data.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and for those skilled in the art, other drawings may be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a trajectory data clustering method provided in embodiment 1 of the present application;
FIG. 2 (a) is a second image corresponding to the heading attribute data, FIG. 2 (b) is a second image corresponding to the elevation attribute data, and FIG. 2 (c) is a second image corresponding to the velocity attribute data;
FIG. 3 is a schematic diagram of a track image provided herein;
FIG. 4 is a schematic view of a covered scene provided herein;
fig. 5 is a schematic structural diagram of an autoencoder provided in the present application;
FIG. 6 is a schematic illustration of a data set provided herein;
FIG. 7 is a comparison diagram of a clustering result provided in the present application;
FIG. 8 is a schematic diagram comparing the clustering results provided herein;
fig. 9 is a schematic structural diagram of a trajectory data clustering device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to solve the above problem, the present application provides a trajectory data clustering method, and the trajectory data clustering method provided by the present application is introduced next.
Referring to fig. 1, which is a schematic flowchart of the track data clustering method provided in embodiment 1 of the present application, the method may include, but is not limited to, the following steps:
s11, acquiring a plurality of track data to be clustered, and generating a track image based on the track data to be clustered.
In this embodiment, the track data to be clustered includes a plurality of track point data, and the track point data may include longitude and latitude attribute data. Of course, the trace point data may also include longitude and latitude attribute data and at least one non-longitude and latitude attribute data. Wherein the at least one non-latitude and longitude attribute data may include: any one or more of elevation attribute data, speed attribute data, and heading attribute data.
Corresponding to the embodiment in which the track point data includes only longitude and latitude attribute data, generating the track image based on the track data to be clustered may include, but is not limited to:
s111, generating a track based on the longitude and latitude attribute data in the track data to be clustered, and generating a track image containing the track.
In this embodiment, corresponding to an implementation mode that the trace point data may include longitude and latitude attribute data and at least one non-longitude and latitude attribute data, the track image is generated based on the to-be-clustered track data, which may include but is not limited to:
and S112, generating a track based on the longitude and latitude attribute data in the track data to be clustered, and generating a first image containing the track.
S113, determining a color value corresponding to the non-latitude and longitude attribute data, and performing color filling on the track in the first image based on the color value to obtain a second image.
In this embodiment, normalization processing may be performed on the non-latitude and longitude attribute data to obtain normalized data, and a color value corresponding to the normalized data is determined. For example, if the at least one non-latitude and longitude attribute data includes: and respectively carrying out normalization processing on the elevation attribute data, the speed attribute data and the heading attribute data to obtain normalized elevation attribute data, normalized speed attribute data and normalized heading attribute data, and determining corresponding color values.
For example, the track in the first image is color-filled based on the color value corresponding to the normalized heading attribute data to obtain the second image shown in fig. 2 (a), the track in the first image is color-filled based on the color value corresponding to the normalized elevation attribute data to obtain the second image shown in fig. 2 (b), and the track in the first image is color-filled based on the color value corresponding to the normalized speed attribute data to obtain the second image shown in fig. 2 (c).
S114, determining to obtain a track image based on at least one second image, wherein the data of all first pixel points comprise longitude and latitude attribute data corresponding to the pixel points in each second pixel point set and color values of the pixel points in the second pixel point set, the pixel points in the second pixel point set form the second image, and all first pixel points form the track image.
This step may include: superimposing the at least one second image to obtain the track image. For example, as shown in fig. 3, the second image obtained by color-filling the track in the first image based on the color values corresponding to the normalized heading attribute data, the second image obtained likewise from the normalized elevation attribute data, and the second image obtained likewise from the normalized speed attribute data are superimposed to obtain a track image comprising three layers of second images. For example, if each second image has a size of 108 × 108, the second pixel point set forming a second image contains 108 × 108 pixel points; the data of the pixel points in the second pixel point set includes longitude and latitude attribute data and color values of three channels, and the three-channel color values of the 108 × 108 pixel points form a first color matrix of 108 × 108 × 3. The track image obtained from the three second images comprises three layers of second images; the data of all the first pixel points includes the longitude and latitude attribute data corresponding to the pixel points in the three second pixel point sets and the color values of those pixel points, and the color values of the pixel points in the three second pixel point sets together form a second color matrix of 108 × 108 × 9.
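The color filling of step S113 and the superposition of step S114 can be sketched as follows (not part of the patent text); the attribute-to-color mapping used here (the normalized value copied into all three channels) is an illustrative assumption — the embodiment only requires that each normalized attribute value determines a color value:

```python
import numpy as np

def attribute_layer(track_mask, values, size=108):
    """Color-fill the track pixels of one "second image" using one
    non-longitude/latitude attribute (heading, elevation or speed).

    track_mask: (size, size) binary track image (the first image).
    values: one attribute value per track pixel, in row-major pixel order."""
    v = np.asarray(values, dtype=float)
    v = (v - v.min()) / (v.max() - v.min() + 1e-9)  # min-max normalization
    layer = np.zeros((size, size, 3), dtype=float)
    rows, cols = np.nonzero(track_mask)
    layer[rows, cols] = v[:, None]  # fill the three channels of track pixels
    return layer

def stack_layers(layers):
    """Superimpose the second images: three size x size x 3 layers
    become one size x size x 9 track-image tensor."""
    return np.concatenate(layers, axis=2)
```

With `size=108` and three attribute layers, `stack_layers` yields the 108 × 108 × 9 color matrix described above.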
Fig. 3 mainly describes a mode of obtaining the trajectory image, and the form of the trajectory image is not limited to the trajectory image shown in fig. 3.
It can be understood that the pixel points and the positions of the pixel points included in each second image are the same, so that the data of all the first pixel points may include longitude and latitude attribute data corresponding to the pixel point included in one of the second images.
Through steps S112-S114, the multidimensional attribute data of the trajectory data to be clustered can be introduced more fully as input data from the encoder.
Step S12, inputting data of all first pixel points into an autoencoder to obtain a feature set determined by the encoder in the autoencoder, wherein all the first pixel points form the track image, the feature set comprises features extracted from the data of all the first pixel points, the autoencoder is trained using the track training image and data obtained by performing deletion processing on the track training image, and the track training image is generated based on track training data.
In this embodiment, the autoencoder may be trained by, but not limited to, the following steps:
s121, obtaining trajectory training data, generating a trajectory to be used based on the trajectory training data, and generating a trajectory training image containing the trajectory to be used.
In this embodiment, the trajectory training data includes a plurality of trajectory training point data, and the trajectory training point data may include latitude and longitude attribute data. Of course, the trajectory training point data may also include longitude and latitude attribute data and at least one non-longitude and latitude attribute data. Wherein the at least one non-latitude and longitude attribute data may include: any one or more of elevation attribute data, speed attribute data, and heading attribute data.
Corresponding to the embodiment in which the track training point data includes only longitude and latitude attribute data, generating a to-be-used track based on the track training data and generating a track training image containing the to-be-used track may include, but is not limited to:
s1211, generating a to-be-used track based on the longitude and latitude attribute data in the track training data, and generating a track training image containing the to-be-used track.
In this embodiment, corresponding to an implementation mode in which the trajectory training point data may include longitude and latitude attribute data and at least one non-longitude and latitude attribute data, a trajectory to be used is generated based on the trajectory training data, and a trajectory training image including the trajectory to be used is generated, which may include, but is not limited to:
and S1212, generating a to-be-used track based on the plurality of longitude and latitude attribute data in the track training data, and generating a third image containing the to-be-used track.
S1213, determining a color value corresponding to the non-longitude and latitude attribute data, and performing color filling on the to-be-used track in the third image based on the color value to obtain a fourth image.
And S1214, determining to obtain a track training image based on at least one fourth image, wherein the data of all third pixel points comprises the longitude and latitude attribute data corresponding to the pixel points in each fourth pixel point set and the color values of the pixel points in the fourth pixel point sets, the pixel points in a fourth pixel point set form the fourth image, and all the third pixel points form the track training image.
Similar processes in this step can be referred to related descriptions in step S114, and are not described herein again.
S122, selecting partial pixel points from all the third pixel points, and performing deletion processing on the data of the partial pixel points to obtain target data of the partial pixel points, wherein all the third pixel points form the track training image.
The missing processing is performed on the data of the partial pixel points, which may include but is not limited to:
s1221, obtaining data of background pixel points, wherein the background pixel points belong to the background part except the to-be-used track in the track training image in all the third pixel points;
s1222, replacing the data of the partial pixel points with the data of the background pixel points to obtain the target data of the partial pixel points.
For example, the deletion processing may be described in image-processing terms: as shown in fig. 4, a region A formed by pixel points whose color values are consistent with those of the background pixel points is used to cover a portion of the to-be-used track in the track training image, so that a segment of the to-be-used track is missing, and a covered track training image is obtained.
Of course, the missing processing on the data of the partial pixel points may also include, but is not limited to: and replacing the data of the partial pixel points with 0.
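The deletion processing of step S122 (either variant: background value or 0) can be sketched as follows (not part of the patent text); selecting the dropped pixel points at random, rather than as a contiguous covering region A, is a simplifying assumption:

```python
import numpy as np

def mask_track(image, ratio=0.2, background_value=0.0, seed=0):
    """Deletion processing: replace the data of a chosen fraction of
    track pixels with the background-pixel value, simulating a missing
    track segment in the training image."""
    rng = np.random.default_rng(seed)
    masked = np.array(image, dtype=float, copy=True)
    rows, cols = np.nonzero(image)            # track pixel points
    n_drop = max(1, int(len(rows) * ratio))   # how many pixels to delete
    pick = rng.choice(len(rows), size=n_drop, replace=False)
    masked[rows[pick], cols[pick]] = background_value
    return masked
```

The masked image is the autoencoder input, while the original (unmasked) image remains the reconstruction target.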
And S123, inputting the target data of the partial pixel points, together with the data of the remaining pixel points among all the third pixel points, into the autoencoder to obtain the first feature extracted by the encoder of the autoencoder from these data, and the recovery data determined by the decoder of the autoencoder based on the first feature.
And S124, determining a loss function value of the autoencoder, wherein the loss function value characterizes the difference between the recovery data and the data of all the third pixel points.
This step may include, but is not limited to:
determining the loss function value of the self-encoder based on the loss function calculation formula

Loss = Σ_{i=1}^{n} w_i · (y_true_i − y_pred_i)², with w_i = α when y_true_i > 0 and w_i = 1 otherwise;

wherein y_true_i is the data of the ith pixel point among all the third pixel points, that is, the data of the ith pixel point in the track training image; the data of a pixel point on the track in the track training image is greater than 0, and the data of a pixel point not on the track is equal to 0; y_pred_i is the recovery data of the ith pixel point determined by the decoder; the formula performs a summation over the differences between the data of the n pixel points and the recovered data, where n is the number of pixel points in the track training image; and α denotes a weight larger than 1.
It can be understood that multiplying (y_true_i − y_pred_i)² by the weight α (α > 1) at pixel points on the track makes the loss calculation focus more on the track part.
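The weighted squared-error loss described above can be sketched as follows; the concrete value of alpha is an illustrative assumption, since the source only requires α > 1:

```python
import numpy as np

def weighted_mse_loss(y_true, y_pred, alpha=5.0):
    """Squared errors at track pixels (y_true > 0) are multiplied by
    alpha (> 1) so the loss focuses on the track rather than the empty
    background; background pixels keep weight 1."""
    weights = np.where(y_true > 0, alpha, 1.0)
    return np.mean(weights * (y_true - y_pred) ** 2)
```

Averaging instead of summing only rescales the loss by the constant 1/n and does not change the gradient direction.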
And S125, judging whether the loss function value of the self-encoder is within a preset threshold range.
If not, go to step S126.
And S126, updating the parameters of the self-encoder, and returning to execute the step S121 until the loss function value is within the preset threshold range.
In this embodiment, a convolutional neural network structure is used for image recognition. Because the track image is relatively simple, it does not need an overly complex network model. Specifically, but not exclusively, the convolutional layers of the LeNet-5 model may be used as the encoder for feature extraction of the image, and a symmetrically designed decoder completes the self-encoder. The structure of the final self-encoder is shown in FIG. 5: the encoder comprises an input layer (input), a convolutional layer (conv), a max-pooling layer (maxpool), a convolutional layer (conv), and a max-pooling layer (maxpool); the decoder comprises an upsampling layer (upsample), a convolutional layer (conv), an upsampling layer (upsample), a convolutional layer (conv), and an output layer (output). The feature vector output by the last maxpool layer of the encoder serves as the input to the first upsample layer of the decoder.
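The layer sequence just described can be traced as a simple shape calculation. 'Same'-padded convolutions (which preserve spatial size), 2×2 max-pooling and 2×2 upsampling are assumptions for this sketch; the source does not specify kernel or pool sizes:

```python
def encoder_decoder_shapes(h, w):
    """Trace feature-map sizes through the LeNet-5-style autoencoder:
    encoder conv -> maxpool -> conv -> maxpool, then the mirrored
    decoder upsample -> conv -> upsample -> conv. Convolutions keep the
    spatial size, each maxpool halves it, each upsample doubles it."""
    shapes = [("input", h, w)]
    for name in ("conv", "maxpool", "conv", "maxpool"):   # encoder
        if name == "maxpool":
            h, w = h // 2, w // 2
        shapes.append((name, h, w))
    for name in ("upsample", "conv", "upsample", "conv"):  # decoder
        if name == "upsample":
            h, w = h * 2, w * 2
        shapes.append((name, h, w))
    return shapes
```

The symmetry guarantees that the decoder output matches the input image size, which is what allows the pixel-wise loss above to be computed.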
And S13, clustering each feature set to obtain a plurality of clusters, and clustering the track data to be clustered corresponding to the feature sets in the clusters into one class.
In this embodiment, each feature set may be clustered based on, but not limited to, the k-means method or the DBSCAN method. Specifically, clustering each feature set based on the k-means method to obtain a plurality of clusters may include:
s131, selecting k feature sets from each feature set as k clustering centers respectively, and initially determining k clusters, wherein the clusters contain the clustering centers, and k is not more than the total number of the feature sets;
s132, aiming at each feature set to be clustered, calculating the distance between the feature set to be clustered and each clustering center to obtain a plurality of distances, and allocating the feature set to be clustered to a cluster to which the clustering center corresponding to the shortest distance in the plurality of distances belongs, wherein the feature set to be clustered is one feature set except the k clustering centers in each feature set;
in this embodiment, the distance between the feature set to be clustered and each clustering center may be calculated by, but is not limited to, the following calculation formula:

d_ij = dist(X_i, X_j) = sqrt( Σ_{t=1}^{k} (X_it − X_jt)² )

wherein d_ij = dist(X_i, X_j) represents the distance between the feature set to be clustered X_i and the clustering center X_j, X_it represents one of the features in the feature set to be clustered, X_jt represents the corresponding feature in the clustering center, and k represents the number of features in the feature set to be clustered and in the clustering center.
S133, respectively recalculating the clustering centers of the clusters to obtain current clustering centers;
s134, determining whether the difference between the current clustering center and the historical clustering center is smaller than a threshold value;
if the threshold value is less than the threshold value, executing step S135; if not, the process returns to step S132.
And step S135, finishing clustering.
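Steps S131 to S135 can be sketched as a plain k-means loop over the feature vectors; the convergence tolerance and the random initialization shown here are illustrative assumptions:

```python
import numpy as np

def kmeans(features, k, n_iter=100, tol=1e-6, rng=None):
    """Plain k-means over feature vectors: pick k initial centers (S131),
    assign each vector to its nearest center by Euclidean distance (S132),
    recompute the centers (S133), and stop once they move less than
    `tol` (S134/S135). Library implementations (e.g. scikit-learn) add
    better initialization than the random pick used here."""
    rng = np.random.default_rng(0) if rng is None else rng
    features = np.asarray(features, dtype=float)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iter):
        # S132: distance from every feature vector to every cluster center
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # S133: recompute each cluster center as the mean of its members
        new_centers = np.array([features[labels == j].mean(axis=0)
                                for j in range(k)])
        # S134/S135: stop once the centers stop moving
        if np.linalg.norm(new_centers - centers) < tol:
            break
        centers = new_centers
    return labels, centers
```

Each track then inherits the cluster label of its feature vector, matching the final grouping step described above.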
In this embodiment, the track images corresponding to the feature set in the cluster may be determined, the to-be-clustered track data corresponding to each track image may be determined, and the determined to-be-clustered track data may be clustered into one category.
In this embodiment, a trajectory training image is generated from trajectory training data, and the self-encoder is trained using both the trajectory training image and the data obtained by applying missing processing to it. This ensures that the self-encoder learns to identify and repair missing data even when pixel points are not actually missing. On that basis, when the data of all first pixel points are input to the self-encoder and the trajectory image is incomplete, the self-encoder can identify the missing parts of the trajectory image and repair them, so that richer features are extracted from the repaired data. Clustering the feature sets containing these richer features guarantees the clustering precision, and in turn the clustering precision on the trajectory data set.
Next, the effect of the trajectory data clustering method is verified using aircraft takeoff-phase data. Specifically, trajectory data of the takeoff phase are intercepted as the data set for a simulation experiment: the first 100 ADS-B points of some flights departing from Shanghai Hongqiao Airport are selected as sample data, as shown in fig. 6.
The traces in the figure can be roughly divided into five types; fifty traces are taken from each type and labeled. After preprocessing, starting from the ideal data, continuous track points are deleted at random positions to create missing segments, and the missing rate is measured as the ratio of the number of deleted track points to the total number of points. Data sets with missing rates of 10%, 20%, 30%, 40% and 50% were generated.
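The construction of a missing segment described above, deleting a contiguous run of track points at a random position to reach a target missing rate, can be sketched as:

```python
import numpy as np

def delete_segment(track, missing_rate, rng=None):
    """Delete a contiguous run of track points at a random position so
    that (deleted points / total points) equals `missing_rate`; returns
    the remaining points."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(track)
    n_del = int(round(n * missing_rate))
    start = int(rng.integers(0, n - n_del + 1))   # random segment position
    return np.concatenate([track[:start], track[start + n_del:]])
```

Applying this with rates 0.1 through 0.5 reproduces the five missing data sets used in the experiment.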
To verify the clustering effect of the proposed trajectory data clustering method on missing track data, experiments are performed on the constructed missing data sets; the statistical comparison of the clustering results of the different methods is shown in fig. 7. The curves in fig. 7 plot the change in clustering purity obtained by each algorithm as the missing rate of the data set changes; each point is the average of multiple experiments. As can be seen from fig. 7, when no data are missing, both clustering methods achieve a clustering purity close to 100%, and both degrade as the missing rate increases; however, the method based on the proposed self-encoder degrades more slowly, and its clustering purity is significantly better than that of the ordinary self-encoder, which demonstrates that the proposed method helps with the missing-track clustering problem.
To further compare the proposed method with a commonly used traditional trajectory clustering method, the proposed self-encoder-based method is compared here with the classical DTW algorithm. DTW was first proposed for speech-segment classification; the algorithm is simple and flexible, and thanks to its tolerance of local distortion it is well suited to processing unaligned discrete track sequences, making it one of the most common methods for track similarity matching. It is therefore selected for the comparison experiments.
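For reference, the classical DTW distance used in the comparison can be sketched as follows; this is the textbook dynamic-programming formulation, not necessarily the exact implementation used in the experiments:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two sequences of points
    (one point per row). It handles sequences of unequal length, which
    is why it suits unaligned discrete track sequences. Minimal
    O(len(a) * len(b)) version without windowing constraints."""
    a, b = np.atleast_2d(a), np.atleast_2d(b)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # point-wise distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because each cell considers insertion, deletion and match moves, locally stretched or compressed track segments are aligned before their distances are accumulated.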
It should be noted that when applied to track similarity calculation, most traditional distance measures such as DTW are based on spatial distance, and introducing other features requires carefully designed weights to avoid negative effects. Therefore, in the experiments of this section, longitude, latitude and elevation are converted into a geocentric coordinate system with a unified unit, which makes the spatial-distance calculation of the DTW method convenient. In the imaging and self-encoder pipeline, only the positional features of the track are processed, and only a single-layer image with the elevation attribute introduced is used as input, ensuring the fairness of the experiment.
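The conversion of longitude, latitude and elevation into a geocentric (Earth-centred Earth-fixed) coordinate system with a single metric unit can be sketched as follows; the WGS-84 ellipsoid is an assumption here, since the source does not name the datum used:

```python
import math

def geodetic_to_ecef(lat_deg, lon_deg, h_m):
    """Convert latitude/longitude (degrees) and elevation (metres) to
    Earth-centred Earth-fixed coordinates on the WGS-84 ellipsoid, so
    all three axes share one metric unit and Euclidean distances (as
    needed by DTW) are meaningful."""
    a = 6378137.0                 # WGS-84 semi-major axis (m)
    f = 1.0 / 298.257223563       # WGS-84 flattening
    e2 = f * (2.0 - f)            # first eccentricity squared
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    N = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)  # prime-vertical radius
    x = (N + h_m) * math.cos(lat) * math.cos(lon)
    y = (N + h_m) * math.cos(lat) * math.sin(lon)
    z = (N * (1.0 - e2) + h_m) * math.sin(lat)
    return x, y, z
```

ADS-B points transformed this way can be fed directly to the DTW point-wise distance without any per-feature weighting.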
In addition, to better simulate an actual data set, a mixed missing data set is used in this section: the constructed missing data and the complete data are mixed in varying proportions, and the robustness of each algorithm to missing data is tested at different mixing ratios. The earlier experiments show that algorithm performance drops markedly as the missing rate increases from 20% to 30%; on that basis, the follow-up experiments mix data with a 25% missing rate and complete data in different proportions to simulate a data set in which some tracks are partially missing.
The statistics of the clustering results of the three methods — the proposed self-encoder, the ordinary self-encoder, and DTW — on mixed data sets with different missing proportions are shown in fig. 8.
From the results, the method based on the proposed self-encoder has an obvious advantage in clustering effect, while the DTW method outperforms the ordinary self-encoder in robustness to missing data. Compared with the ordinary self-encoder, the self-encoder proposed here improves robustness to missing data, and its clustering effect and missing-data robustness are not inferior to those of the classical method.
In addition, the running speed of an algorithm is also an important index. To compare the time overhead of the algorithms, the time each algorithm needs to compute the similarity of 100 and of 10,000 pairs of tracks is counted. The results are shown in Table 1:
TABLE 1
[Table 1 is an image in the original publication: running times of the proposed self-encoder, the ordinary self-encoder, and DTW for 100 and 10,000 pairs of tracks.]
As can be seen from the comparison of actual running times, the self-encoder-based algorithms have an advantage in running time over the traditional algorithm. The self-encoder model designed in this application runs slightly slower than the unmodified self-encoder, but the disadvantage is small, and it still retains an advantage over the traditional algorithm.
In addition, the times in the table above were computed track by track without parallel computation; since network-model-based methods have a great advantage in parallel computation, the running speed of the proposed self-encoder method can be further guaranteed.
Corresponding to the embodiment of the track data clustering method provided by the application, the application provides an embodiment of a track data clustering device.
Referring to fig. 9, the trajectory data clustering device includes: an acquisition module 100, a generation module 200, a determination module 300, and a clustering module 400.
An obtaining module 100, configured to obtain a plurality of trajectory data to be clustered;
a generating module 200, configured to generate a track image based on the to-be-clustered track data;
the determining module 300 is configured to input data of all first pixel points to a self-encoder to obtain a feature set determined by an encoder in the self-encoder, where the all first pixel points constitute the track image, the feature set includes features extracted based on the data of all first pixel points, the self-encoder is obtained by using a track training image and training data obtained by performing deletion processing on the track training image, and the track training image is generated based on the track training data;
and the clustering module 400 is configured to cluster each feature set to obtain a plurality of clusters, and cluster the trajectory data to be clustered corresponding to the feature sets in the clusters into one class.
In this embodiment, the track data to be clustered includes a plurality of track point data, and the track point data includes longitude and latitude attribute data;
the generating module 200 may specifically be configured to:
and generating a track based on the longitude and latitude attribute data in the track data to be clustered, and generating a track image containing the track.
In this embodiment, the track data to be clustered includes a plurality of track point data, where the track point data includes longitude and latitude attribute data and at least one non-longitude and latitude attribute data;
the generating module 200 may be specifically configured to:
generating a track based on a plurality of longitude and latitude attribute data in the track data to be clustered, and generating a first image containing the track;
determining a color value corresponding to the non-latitude and longitude attribute data, and performing color filling on the track in the first image based on the color value to obtain a second image;
and determining to obtain a track image based on at least one second image, wherein the data of all the first pixels comprises longitude and latitude attribute data corresponding to the pixels in each second pixel set and color values of the pixels in the second pixel sets, the pixels in the second pixel sets form the second image, and all the first pixels form the track image.
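The image-generation step handled by the generating module — drawing the track from latitude/longitude and optionally filling it with a color value derived from a non-lat/lon attribute such as elevation — can be sketched as follows; the grid size and the min-max scaling are assumptions made for this example:

```python
import numpy as np

def rasterize_track(lats, lons, values=None, size=64):
    """Render a track as a size x size image: scale the latitude and
    longitude ranges onto the pixel grid, then write either a constant
    (plain track image) or a normalized value from a non-lat/lon
    attribute (color-filled track image) at each visited pixel."""
    lats, lons = np.asarray(lats, float), np.asarray(lons, float)
    img = np.zeros((size, size))

    def scale(v):  # map coordinate range onto [0, size-1] pixel indices
        span = v.max() - v.min()
        if span == 0:
            return np.zeros(len(v), dtype=int)
        return ((v - v.min()) / span * (size - 1)).astype(int)

    rows, cols = scale(lats), scale(lons)
    vals = np.ones(len(lats)) if values is None else np.asarray(values, float)
    if values is not None and vals.max() > vals.min():
        vals = (vals - vals.min()) / (vals.max() - vals.min())  # to [0, 1]
    img[rows, cols] = vals
    return img
```

Stacking one such image per non-lat/lon attribute yields the multi-layer track image whose pixel data (all first pixel points) feed the self-encoder.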
The process of clustering each feature set by the clustering module 400 to obtain a plurality of clusters may specifically include:
selecting k feature sets from each feature set as k clustering centers respectively, and initially determining k clusters, wherein the clusters contain the clustering centers, and k is not more than the total number of the feature sets;
calculating the distance between each feature set to be clustered and each clustering center to obtain a plurality of distances, and assigning the feature set to be clustered to a cluster to which the clustering center corresponding to the shortest distance in the plurality of distances belongs, wherein the feature set to be clustered is one of the feature sets except the k clustering centers in each feature set;
respectively recalculating the clustering centers of the clusters to obtain the current clustering centers;
determining whether a difference between the current cluster center and a historical cluster center is less than a threshold;
if the value is smaller than the threshold value, finishing clustering;
and if not, returning to execute the step of calculating the distance between each feature set to be clustered and each clustering center aiming at each feature set to be clustered.
In this embodiment, the trajectory data clustering device may further include:
a training module to:
acquiring track training data, generating a track to be used based on the track training data, and generating a track training image containing the track to be used;
selecting partial pixel points from all third pixel points, and performing deletion processing on data of the partial pixel points to obtain target data of the partial pixel points, wherein all third pixel points form the track training image;
inputting the target data of the partial pixels and the data of the pixels except for the partial pixels in all the third pixels into a self-encoder to obtain first characteristics extracted by an encoder in the self-encoder based on the target data of the partial pixels and the data of the pixels except for the partial pixels in all the third pixels, and recovery data determined by a decoder in the self-encoder based on the first characteristics;
determining a loss function value for said self-encoder, said loss function value characterizing a difference between said recovered data and data of said all third pixels;
judging whether the loss function value of the self-encoder is within a preset threshold range or not;
if not, updating the parameters of the self-encoder, and returning to the step of acquiring the track training data until the loss function value is within the preset threshold range.
The process of determining the loss function value of the self-encoder by the training module may specifically include:
determining the loss function value of the self-encoder based on the loss function calculation formula

Loss = Σ_{i=1}^{n} w_i · (y_true_i − y_pred_i)², with w_i = α when y_true_i > 0 and w_i = 1 otherwise;

wherein y_true_i is the data of the ith pixel point among all the third pixel points, that is, the data of the ith pixel point in the track training image; the data of a pixel point on the track in the track training image is greater than 0, and the data of a pixel point not on the track is equal to 0; y_pred_i is the recovery data of the ith pixel point determined by the decoder; the formula performs a summation over the differences between the data of the n pixel points and the recovered data, where n is the number of pixel points in the track training image; and α denotes a weight larger than 1.
It should be noted that the focus of each embodiment is different from that of other embodiments, and the same and similar parts between the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and reference may be made to the partial description of the method embodiment for relevant points.
Finally, it should also be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the various elements may be implemented in the same one or more pieces of software and/or hardware in the practice of the present application.
From the above description of the embodiments, it is clear for those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application or portions thereof that contribute to the prior art may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the method according to the embodiments or some portions of the embodiments of the present application.
The above provides a detailed description of a trajectory data clustering method and apparatus provided by the present application, and a specific example is applied in the description to explain the principle and the implementation manner of the present application, and the description of the above embodiment is only used to help understand the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, the specific embodiments and the application scope may be changed, and in view of the above, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A trajectory data clustering method is characterized by comprising the following steps:
acquiring a plurality of track data to be clustered, and generating a track image based on the track data to be clustered;
inputting data of all first pixel points into a self-encoder to obtain a feature set determined by an encoder in the self-encoder, wherein all the first pixel points form the track image, the feature set comprises data based on all the first pixel points and extracted features, the self-encoder is obtained by utilizing a track training image and data obtained by carrying out deletion processing on the track training image, and the track training image is generated based on the track training data;
and clustering each feature set to obtain a plurality of clusters, and clustering the track data to be clustered corresponding to the feature sets in the clusters into one class.
2. The method according to claim 1, wherein the track data to be clustered comprises a plurality of track point data, and the track point data comprises longitude and latitude attribute data;
the track image generation based on the track data to be clustered comprises the following steps:
and generating a track based on the plurality of longitude and latitude attribute data in the track data to be clustered, and generating a track image containing the track.
3. The method of claim 1, wherein the trajectory data to be clustered comprises a plurality of trajectory point data, the trajectory point data comprising longitude and latitude attribute data and at least one non-longitude and latitude attribute data;
generating a track image based on the track data to be clustered comprises the following steps:
generating a track based on a plurality of longitude and latitude attribute data in the track data to be clustered, and generating a first image containing the track;
determining a color value corresponding to the non-latitude and longitude attribute data, and performing color filling on the track in the first image based on the color value to obtain a second image;
and determining to obtain a track image based on at least one second image, wherein the data of all the first pixel points comprises longitude and latitude attribute data corresponding to the pixel points in each second pixel point set and color values of the pixel points in the second pixel point set, the pixel points in the second pixel point set form the second image, and all the first pixel points form the track image.
4. The method of claim 1, wherein clustering each of the feature sets to obtain a plurality of clusters comprises:
selecting k feature sets from each feature set as k clustering centers respectively, and initially determining k clusters, wherein the clusters contain the clustering centers, and k is not more than the total number of the feature sets;
for each feature set to be clustered, calculating the distance between the feature set to be clustered and each clustering center to obtain a plurality of distances, and assigning the feature set to be clustered to a cluster to which the clustering center corresponding to the shortest distance in the plurality of distances belongs, wherein the feature set to be clustered is one of the feature sets except the k clustering centers in each feature set;
respectively recalculating the clustering centers of the clusters to obtain current clustering centers;
determining whether a difference between the current cluster center and a historical cluster center is less than a threshold;
if the value is smaller than the threshold value, finishing clustering;
and if not, returning to execute the step of calculating the distance between the feature set to be clustered and each clustering center aiming at each feature set to be clustered.
5. The method of claim 1, wherein the self-encoder is trained by:
acquiring track training data, generating a track to be used based on the track training data, and generating a track training image containing the track to be used;
selecting partial pixel points from all third pixel points, and performing deletion processing on data of the partial pixel points to obtain target data of the partial pixel points, wherein all third pixel points form the track training image;
inputting the target data of the partial pixel points and the data of the pixel points except the partial pixel points in all the third pixel points into a self-encoder to obtain a first characteristic extracted by an encoder in the self-encoder based on the target data of the partial pixel points and the data of the pixel points except the partial pixel points in all the third pixel points, and recovery data determined by a decoder in the self-encoder based on the first characteristic;
determining a loss function value for the self-encoder, the loss function value characterizing a difference between the recovered data and data for all of the third pixels;
judging whether the loss function value of the self-encoder is within a preset threshold range or not;
and if not, updating the parameters of the self-encoder, and returning to the step of acquiring the track training data until the loss function value is within the range of the preset threshold value.
6. The method of claim 5, wherein said determining a loss function value for the self-encoder comprises:
determining the loss function value of the self-encoder based on the loss function calculation formula

Loss = Σ_{i=1}^{n} w_i · (y_true_i − y_pred_i)², with w_i = α when y_true_i > 0 and w_i = 1 otherwise;

wherein y_true_i is the data of the ith pixel point among all the third pixel points, that is, the data of the ith pixel point in the track training image; the data of a pixel point on the track in the track training image is greater than 0, and the data of a pixel point not on the track is equal to 0; y_pred_i is the recovery data of the ith pixel point determined by the decoder; the formula performs a summation over the differences between the data of the n pixel points and the recovered data, where n is the number of pixel points in the track training image; and α denotes a weight larger than 1.
7. A trajectory data clustering device, comprising:
the acquisition module is used for acquiring a plurality of track data to be clustered;
the generating module is used for generating a track image based on the track data to be clustered;
the determination module is used for inputting data of all first pixel points into a self-encoder to obtain a feature set determined by an encoder in the self-encoder, wherein the track image is formed by all the first pixel points, the feature set comprises features extracted based on the data of all the first pixel points, the self-encoder is obtained by utilizing a track training image and data obtained by carrying out deletion processing on the track training image, and the track training image is generated based on the track training data;
and the clustering module is used for clustering each feature set to obtain a plurality of clusters, and clustering the track data to be clustered corresponding to the feature sets in the clusters into one class.
8. The device of claim 7, wherein the track data to be clustered comprises a plurality of track point data, and the track point data comprises longitude and latitude attribute data;
the generation module is specifically configured to:
and generating a track based on the longitude and latitude attribute data in the track data to be clustered, and generating a track image containing the track.
9. The apparatus of claim 7, wherein the trajectory data to be clustered comprises a plurality of trajectory point data, the trajectory point data comprising longitude and latitude attribute data and at least one non-longitude and latitude attribute data;
the generation module is specifically configured to:
generating a track based on a plurality of longitude and latitude attribute data in the track data to be clustered, and generating a first image containing the track;
determining a color value corresponding to the non-latitude and longitude attribute data, and performing color filling on the track in the first image based on the color value to obtain a second image;
and determining to obtain a track image based on at least one second image, wherein the data of all the first pixel points comprises longitude and latitude attribute data corresponding to the pixel points in each second pixel point set and color values of the pixel points in the second pixel point set, the pixel points in the second pixel point set form the second image, and all the first pixel points form the track image.
10. The apparatus according to claim 7, wherein the process of clustering each feature set by the clustering module to obtain a plurality of clusters specifically comprises:
selecting k feature sets from each feature set as k clustering centers respectively, and initially determining k clusters, wherein the clusters contain the clustering centers, and k is not more than the total number of the feature sets;
for each feature set to be clustered, calculating the distance between the feature set to be clustered and each clustering center to obtain a plurality of distances, and assigning the feature set to be clustered to a cluster to which the clustering center corresponding to the shortest distance in the plurality of distances belongs, wherein the feature set to be clustered is one of the feature sets except the k clustering centers in each feature set;
respectively recalculating the clustering centers of the clusters to obtain the current clustering centers;
determining whether a difference between the current cluster center and a historical cluster center is less than a threshold;
if the value is smaller than the threshold value, finishing clustering;
and if not, returning to execute the step of calculating the distance between the feature set to be clustered and each clustering center aiming at each feature set to be clustered.
CN202210697754.1A 2022-06-20 2022-06-20 Track data clustering method and device Pending CN115331037A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210697754.1A CN115331037A (en) 2022-06-20 2022-06-20 Track data clustering method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210697754.1A CN115331037A (en) 2022-06-20 2022-06-20 Track data clustering method and device

Publications (1)

Publication Number Publication Date
CN115331037A true CN115331037A (en) 2022-11-11

Family

ID=83916335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210697754.1A Pending CN115331037A (en) 2022-06-20 2022-06-20 Track data clustering method and device

Country Status (1)

Country Link
CN (1) CN115331037A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116504068A (en) * 2023-06-26 2023-07-28 创辉达设计股份有限公司江苏分公司 Statistical method, device, computer equipment and storage medium for lane-level traffic flow


Similar Documents

Publication Publication Date Title
CN111144490B (en) Fine granularity identification method based on alternative knowledge distillation strategy
CN110837870B (en) Sonar image target recognition method based on active learning
CN110378324B (en) Quality dimension-based face recognition algorithm evaluation method
CN110175615B (en) Model training method, domain-adaptive visual position identification method and device
CN110536257B (en) Indoor positioning method based on depth adaptive network
Xu et al. RPNet: A representation learning-based star identification algorithm
CN109389156B (en) Training method and device of image positioning model and image positioning method
CN107679539B (en) Single convolution neural network local information and global information integration method based on local perception field
CN110378744A Civil aviation frequent flyer value category method and system for incomplete data systems
CN115331037A (en) Track data clustering method and device
CN112884715A (en) Composite insulator grading ring inclination fault detection method based on deep learning
CN110633727A (en) Deep neural network ship target fine-grained identification method based on selective search
CN110334584A (en) A kind of gesture identification method based on the full convolutional network in region
CN107194917B DAP and ARELM-based on-orbit SAR image change detection method
CN106679670B (en) Unmanned aerial vehicle track planning decision-making method based on fusion empowerment
WO2022242018A1 (en) Indoor target positioning method based on improved cnn model
CN109284409B (en) Picture group geographical positioning method based on large-scale street view data
CN114565861A (en) Airborne downward-looking target image positioning method based on probability statistic differential homoembryo set matching
CN114155489A (en) Multi-device cooperative unmanned aerial vehicle flyer detection method, device and storage medium
CN112767429A (en) Ground-snow surface point cloud rapid segmentation method
CN117372877A (en) Star map identification method and device based on neural network and related medium
CN111914930A (en) Density peak value clustering method based on self-adaptive micro-cluster fusion
CN116403007A (en) Remote sensing image change detection method based on target vector
CN116070713A (en) Method for relieving Non-IID influence based on interpretable federal learning
CN110163227B (en) Airport runway pavement airworthiness discrimination method based on pattern recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination