CN114613124A - Traffic information processing method, device, terminal and computer-readable storage medium

Info

Publication number: CN114613124A
Application number: CN202011395101.5A
Authority: CN
Inventors: 代浩; 王洋; 须成忠
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2020-12-03
Filing date: 2020-12-03
Publication date: 2022-06-10
Anticipated expiration: 2040-12-03
Also published as: CN114613124B; WO2022116326A1

Abstract

The present invention is applicable to the field of intelligent transportation and provides a traffic information processing method, device, terminal and computer-readable storage medium. The method includes: acquiring information of traffic points; converting the information of the traffic points into traffic trajectory information; determining, according to the traffic trajectory information, the adjacency relationship between any two adjacent traffic points in a traffic trajectory and the number of trajectory passes; based on the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the number of passes, using a spectral clustering method to obtain a high-dimensional graph feature matrix of the traffic network; extracting category information of the traffic points from the high-dimensional representation; and classifying and storing the traffic points based on the category information to form a database. In this technical solution, the number of trajectory passes is used to measure the similarity between traffic points, and a spectral clustering algorithm is used to characterize the relationships in the spatiotemporal data from the perspective of a graph network, so that the clustering results for the traffic points better match the actual traffic network.

Description

Traffic information processing method, device, terminal and computer-readable storage medium

Technical Field

The present invention belongs to the field of intelligent transportation, and in particular relates to a traffic information processing method, apparatus, device and computer-readable storage medium.

Background

In urban computing and intelligent transportation, planar clustering methods are widely used, for example in analyzing and predicting passenger flow trends and in partitioning public transportation areas. With a clustering method, the city plane can be divided into different classes, each of which may represent a different road, residential community, and so on, and from which business districts, residential areas, congested road sections, and the like can be further mined.

In recent years, with the development of urban big data and intelligent transportation technology, more and more information can be mined from data, which in turn supports applications such as business district mining, targeted advertising, transport capacity analysis, and passenger flow forecasting. To facilitate analysis, all such applications share a basic prerequisite: the city plan must be discretized, that is, the city plane is divided into small blocks so that each block can be given its own label to support further analysis.

The data collected is usually GPS data reported by traffic participants such as buses, ride-hailing vehicles, taxis, subways, and private cars. These GPS data represent the different passable roads in the city, and how to divide these GPS points into different clusters is a basic and important problem.

There are currently many methods for clustering on the spatial plane, such as K-Means, DBSCAN, and a large number of variants of the two. By computing the Euclidean distance between GPS points, these methods can divide the GPS points into different classes, that is, divide the spatial plane into different blocks.

The main problem with the above prior art is that it uses Euclidean distance, that is, spatial proximity, as the metric. In real scenarios, however, objects (such as vehicles or people) usually move within a spatial network, that is, along a network structure rather than freely in the Euclidean plane. For example, vehicles usually move along city roads, and although two roads may be spatially very close, a vehicle cannot in practice cross the barrier from one road to the other. In K-Means or DBSCAN, which measure relationships by Euclidean distance, the GPS points reported by vehicles on two such roads are treated as one class, which clearly does not match reality. In other words, clustering by spatial distance places neighboring points that can never connect into the same class.

Summary of the Invention

In view of this, embodiments of the present invention provide a traffic information processing method, apparatus, terminal, and computer-readable storage medium, to solve the problem that, in traffic information processing, the Euclidean distance used by traditional clustering algorithms cannot represent a complex urban traffic network.

A first aspect of the embodiments of the present invention provides a method for processing traffic information, including:

acquiring information of traffic points;

converting the information of the traffic points into traffic trajectory information;

determining, according to the traffic trajectory information, the adjacency relationship between any two adjacent traffic points in a traffic trajectory and the number of trajectory passes;

based on the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the number of passes, using a spectral clustering method to obtain a high-dimensional representation of the traffic network formed by the traffic points;

extracting category information of the traffic points from the high-dimensional representation;

classifying and storing the information of the traffic points based on the category information to form a database.

A second aspect of the embodiments of the present invention provides an apparatus for processing traffic information, including:

an acquisition unit, configured to acquire information of traffic points;

a conversion unit, configured to convert the information of the traffic points into traffic trajectory information;

a determination unit, configured to determine, according to the traffic trajectory information, the adjacency relationship between any two adjacent traffic points in a traffic trajectory and the number of trajectory passes;

a computing unit, configured to obtain, by a spectral clustering method, a high-dimensional representation of the traffic network formed by the traffic points, based on the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the number of passes;

an extraction unit, configured to extract category information of the traffic points from the high-dimensional representation;

a storage unit, configured to classify and store the information of the traffic points based on the category information to form a database.

A third aspect of the embodiments of the present invention provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.

A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the above method.

Compared with the prior art, the embodiments of the present invention have the following beneficial effects:

In this technical solution, the number of trajectory passes is used to measure the similarity between traffic points, and a spectral clustering algorithm is used to characterize the relationships in the spatiotemporal data from the perspective of a graph network, so that the clustering results for the traffic points better match the actual traffic network.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of the first embodiment of the traffic information processing method of the present invention;

FIG. 2 is a detailed flowchart of S13 in the first embodiment of the traffic information processing method of the present invention;

FIG. 3 is a detailed flowchart of S14 in the first embodiment of the traffic information processing method of the present invention;

FIG. 4 is a schematic structural diagram of the distributed database system in the first embodiment of the traffic information processing method of the present invention;

FIG. 5 shows the results of different clustering algorithms;

FIG. 6 is a flowchart of the second embodiment of the traffic information processing method of the present invention;

FIG. 7 shows the data query latency test;

FIG. 8 is a schematic structural diagram of the first embodiment of the traffic information processing apparatus of the present invention;

FIG. 9 is a schematic structural diagram of the first embodiment of the terminal of the present invention.

Detailed Description

In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, in order to provide a thorough understanding of the embodiments of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

To illustrate the technical solutions of the present invention, specific embodiments are described below.

In the embodiments of the traffic information processing method of the present invention, information of traffic points is acquired and converted into traffic trajectory information; the adjacency relationship between any two adjacent traffic points in a traffic trajectory and the number of trajectory passes are determined from the traffic trajectory information; and, based on the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the number of passes, a spectral clustering method is used to obtain a high-dimensional graph feature matrix of the traffic network formed by the traffic points. The traffic points are then classified and stored, based on the category information extracted from the high-dimensional graph feature matrix, to form a database.

In the embodiments of the present invention, the number of trajectory passes is used to measure the similarity between traffic points, and a spectral clustering algorithm is used to characterize the relationships in the spatiotemporal data from the perspective of a graph network, so that the clustering results for the traffic points better match the actual traffic network.

FIG. 1 is a flowchart of the first embodiment of the traffic information processing method of the present invention. As shown in FIG. 1, the traffic information processing method includes the following steps:

S11: acquire information of traffic points.

The information of the traffic points includes GPS data reported by traffic participants (such as buses, ride-hailing vehicles, taxis, subways, and private cars). Each GPS record can be represented as (participant ID, reporting time, longitude, latitude). The data may be received directly, or historical GPS data may be read from a local or other storage medium on which it is stored.

S12: convert the information of the traffic points into traffic trajectory information.

In this embodiment, the traffic trajectory information includes traffic trajectory sequences. Specifically, the GPS data reported by the same participant is grouped together according to the participant ID, and each group is sorted by reporting time, yielding a traffic trajectory sequence for each traffic participant.
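Step S12 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the `GpsRecord` type, its field names, the `build_trajectories` helper, and the sample values are all hypothetical, and the reporting time is assumed to be an integer number of seconds.

```python
from collections import defaultdict
from typing import NamedTuple


class GpsRecord(NamedTuple):
    """One reported GPS record (field names are illustrative assumptions)."""
    participant_id: str
    time: int          # reporting time; integer seconds is an assumed unit
    longitude: float
    latitude: float


def build_trajectories(records):
    """Group records by participant ID, then sort each group by reporting time."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.participant_id].append(rec)
    return {pid: sorted(recs, key=lambda r: r.time) for pid, recs in groups.items()}


# Made-up sample data, deliberately out of time order.
records = [
    GpsRecord("taxi-1", 30, 116.40, 39.91),
    GpsRecord("bus-7", 10, 116.30, 39.95),
    GpsRecord("taxi-1", 10, 116.38, 39.90),
    GpsRecord("taxi-1", 20, 116.39, 39.90),
]
trajectories = build_trajectories(records)
```

Each value in `trajectories` is one trajectory sequence, ordered by reporting time, for a single participant.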

In other embodiments of the method of the present invention, the GPS data may also be classified according to other parameters in the GPS data to obtain the corresponding traffic trajectory information.

S13: determine, according to the traffic trajectory information, the adjacency relationship between any two adjacent traffic points in a traffic trajectory and the number of trajectory passes.

In this embodiment, as shown in FIG. 2, step S13 specifically includes the following sub-steps:

S131: split the traffic trajectory information into pairs of adjacent traffic points;

S132: determine the adjacency relationship between any two adjacent traffic points from the pairs of adjacent traffic points;

S133: determine the number of trajectory passes between any two adjacent traffic points from the pairs of adjacent traffic points.

Specifically, in sub-step S131, the geohash method can be used to encode the longitude and latitude of a GPS record into a single code lx, so that lx represents one specific traffic point, that is, one longitude-latitude pair. Suppose x1, x2, x3 and x4 are four traffic points, each adjacent to the next. The trajectory sequence can then be written as (i,lx1,t1), (i,lx2,t2), (i,lx3,t3), (i,lx4,t4), where i is the ID of the traffic participant that reported the GPS data, lx1 to lx4 are the geohash codes of the longitude and latitude of traffic points x1 to x4, and t1 to t4 are the corresponding reporting times. The adjacent traffic point pairs split out of this sequence are (lx1,lx2), (lx2,lx3) and (lx3,lx4). In this way, each trajectory sequence is split into a set of adjacent traffic point pairs (la, lb), where a and b denote any two adjacent traffic points.
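Sub-step S131 can be sketched as follows. The `geohash_encode` function below is a plain implementation of the standard geohash encoding (bit interleaving starting with longitude, base-32 output); the chosen precision of 7 characters, the `adjacent_pairs` helper, and the sample coordinates are assumptions for illustration only.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(latitude, longitude, precision=7):
    """Encode a latitude/longitude pair as a geohash string of `precision` chars."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, longitude) if use_lon else (lat_range, latitude)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid   # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid   # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def adjacent_pairs(cells):
    """Split one trajectory (a time-ordered list of codes) into adjacent pairs."""
    return list(zip(cells, cells[1:]))


# One trajectory as (latitude, longitude) points in time order (made-up values).
points = [(39.900, 116.380), (39.900, 116.390), (39.910, 116.400)]
cells = [geohash_encode(lat, lon) for lat, lon in points]
pairs = adjacent_pairs(cells)
```

Whether consecutive records that fall into the same geohash cell should be dropped before pairing is not stated in the text; the sketch keeps them.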

In sub-step S132, each GPS point is treated as a vertex, so that an edge exists between vertices la and lb; this edge is the adjacency relationship e(a,b).

In sub-step S133, since the graph is undirected, counting the occurrences of (la, lb) and (lb, la) in the set of adjacent traffic point pairs gives the number of trajectory passes w(a,b) between any two adjacent traffic points. With w'(a,b) denoting the number of occurrences of the ordered pair (la, lb), w(a,b) is expressed as:

w(a,b) = w'(a,b) + w'(b,a)
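The counting in sub-step S133 can be sketched as follows. The `pass_counts` helper and the sample pairs are hypothetical; using a sorted tuple as the key for an undirected edge is one possible choice, not something the text prescribes.

```python
from collections import Counter


def pass_counts(pairs):
    """Return undirected pass counts w(a, b) = w'(a, b) + w'(b, a)."""
    directed = Counter(pairs)            # w'(a, b): count of each ordered pair
    undirected = Counter()
    for (a, b), count in directed.items():
        edge = tuple(sorted((a, b)))     # (a, b) and (b, a) share one key
        undirected[edge] += count
    return undirected


# Made-up adjacent pairs collected from several trajectories.
sample_pairs = [("a", "b"), ("b", "a"), ("a", "b"), ("b", "c")]
weights = pass_counts(sample_pairs)
```

Here the edge between `a` and `b` is traversed twice in one direction and once in the other, so its weight is 3.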

S14: based on the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the number of passes, use a spectral clustering method to obtain a high-dimensional graph feature matrix of the traffic network formed by the traffic points.

In this embodiment, as shown in FIG. 3, step S14 specifically includes the following sub-steps:

S141: determine an adjacency matrix, taking the number of trajectory passes between any two adjacent traffic points as the weight of the adjacency relationship between those two points;

S142: obtain a weighted undirected adjacency graph of the traffic network formed by the traffic points, according to the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the adjacency matrix;

S143: compute the high-dimensional graph feature matrix of the traffic network formed by the traffic points from the weighted undirected adjacency graph.

Specifically, in sub-step S141, the total weight of the adjacency relationship e(a,b) is set to w(a,b), which serves as the similarity between the two adjacent traffic points; this yields an adjacency matrix W.

In sub-step S142, from the data V of the GPS points, the adjacency relationships E between adjacent traffic points, and the adjacency matrix W (which records whether each pair of vertices is connected and, if so, holds the weight of the connecting edge as the corresponding matrix entry), a weighted undirected adjacency graph G(V,E,W) of the traffic network formed by these GPS points is obtained.

In sub-step S143, the degree d_i of each traffic point can be computed from the weighted undirected adjacency graph G(V,E,W) as the sum of the weights of all edges incident to that point:

d_i = Σ_j w(i, j)

Once the degrees of all traffic points have been computed, the degree matrix Deg is obtained; it is the diagonal matrix whose i-th diagonal entry is d_i.

From the degree matrix Deg and the adjacency matrix W, the unnormalized Laplacian matrix Lap of the weighted undirected adjacency graph G is computed by the following formula:

Lap = Deg - W
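The construction of W, the degrees, and Lap = Deg - W can be sketched in plain Python as follows. The `build_laplacian` helper, the vertex order, and the sample weights are hypothetical; the degree is taken to be the row sum of W, the usual definition for a weighted graph, which is an assumption about the formula omitted from the text.

```python
def build_laplacian(vertices, weights):
    """Build the adjacency matrix W, the degree list, and Lap = Deg - W.

    `weights` maps an undirected edge (a, b) to its pass count w(a, b).
    """
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    W = [[0] * n for _ in range(n)]
    for (a, b), w in weights.items():
        i, j = index[a], index[b]
        W[i][j] = w
        W[j][i] = w                      # undirected graph: W is symmetric
    degrees = [sum(row) for row in W]    # d_i = sum over j of w(i, j)
    lap = [
        [(degrees[i] if i == j else 0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]
    return W, degrees, lap


# Made-up example: three traffic points, two weighted edges.
W, degrees, lap = build_laplacian(["a", "b", "c"], {("a", "b"): 3, ("b", "c"): 1})
```

For this example the degrees are 3, 4 and 1, and every row of `lap` sums to zero, as expected for an unnormalized graph Laplacian.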

Next, the eigenvectors and eigenvalues of the Laplacian matrix Lap are solved for. To reduce the amount of computation and avoid producing a large number of meaningless clusters, the TopK algorithm is applied to the eigenvectors Λ_n:

Λ_n ← solve |Lap - λI| = 0, extract the eigenvalues (λ₁, λ₂, …, λ_n);

Λ_k ← TopK(Λ_n);

Specifically, the Laplacian matrix Lap is n*n, so the step above yields n eigenvalues (n being the number of vertices), each corresponding to an eigenvector of length n. The eigenvalues are then ranked by magnitude and the top k (the k largest) are taken; arranging the eigenvectors corresponding to these k eigenvalues in order gives a new k*n matrix, which is the high-dimensional graph feature matrix of the traffic network formed by these GPS points.
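The eigendecomposition and TopK selection can be sketched with NumPy as follows. The `top_k_feature_matrix` helper and the sample matrix are hypothetical. The sketch follows the text in keeping the k largest eigenvalues; many textbook formulations of spectral clustering instead keep the eigenvectors of the smallest eigenvalues of the Laplacian, so this choice is worth checking against the intended behaviour.

```python
import numpy as np


def top_k_feature_matrix(lap, k):
    """Return the k largest eigenvalues of `lap` and a k x n feature matrix.

    Row r of the returned matrix is the eigenvector of the r-th largest
    eigenvalue; column j is the k-dimensional feature of traffic point j.
    """
    lap = np.asarray(lap, dtype=float)
    # eigh is for symmetric matrices; eigenvalues come back in ascending
    # order and eigenvectors are the columns of the second result.
    eigenvalues, eigenvectors = np.linalg.eigh(lap)
    order = np.argsort(eigenvalues)[::-1][:k]   # indices of the k largest
    return eigenvalues[order], eigenvectors[:, order].T


# Made-up 3 x 3 Laplacian of a small weighted path graph.
sample_lap = [[3, -3, 0], [-3, 4, -1], [0, -1, 1]]
values, features = top_k_feature_matrix(sample_lap, k=2)
```

`np.linalg.eigh` is used because Lap is symmetric for an undirected graph; for a large, sparse traffic network a sparse eigensolver would likely be needed instead.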

S15: extract the category information of the traffic points from the high-dimensional graph feature matrix.

In this embodiment, clustering on the high-dimensional graph feature matrix extracts the category information of the different traffic points. The extracted category information includes a classification table of the GPS points (longitude, latitude, category number).
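The text does not name the clustering algorithm applied to the feature matrix. The sketch below assumes k-means, a common choice in spectral clustering, with a naive deterministic initialization; the `kmeans` helper, the sample feature matrix, and the sample coordinates are all hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centers (an assumption)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])),
            )
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up k x n feature matrix (k = 2 features, n = 4 traffic points).
feature_matrix = [
    [0.0, 0.1, 5.0, 5.1],
    [1.0, 1.1, -3.0, -3.2],
]
# Column j of the feature matrix is the feature vector of traffic point j.
point_features = list(zip(*feature_matrix))
labels = kmeans(point_features, k=2)

# Classification table rows: (longitude, latitude, category number).
coordinates = [(116.38, 39.90), (116.39, 39.90), (116.45, 39.95), (116.46, 39.95)]
classification_table = [(lon, lat, lab) for (lon, lat), lab in zip(coordinates, labels)]
```

In this example the first two points end up in one category and the last two in the other.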

S16: classify and store the information of the traffic points based on the category information to form a database.

In this embodiment, once the classification table of the GPS points has been obtained, the original GPS data is stored in the database according to the table. The corresponding GPS data can then be looked up by the category number clusterID for further analysis.

Specifically, a distributed database can be used to convert and store the original GPS data. For example, the data can be stored with HBase+Phoenix (Phoenix is a plug-in for HBase that gives HBase an SQL-like query interface, so that HBase data can be queried directly with SQL), while the clustering algorithm is implemented on the Spark-SQL framework. The architecture of the whole system is therefore as shown in FIG. 4.

The raw GPS data (RawData) reported by traffic participants is classified by the above algorithm running in the analysis engine (Analysis Engine). The Analysis Engine is the big-data framework and program in which the algorithm runs; Spark-SQL can be used to implement the above algorithm. The Master node is the server node that Spark uses to manage the cluster, and programs are submitted through it. Worker nodes are the server nodes that run the algorithm in a distributed manner: a program submitted to the Master is split and sent to different Workers to run, and each Worker sends its result back to the Master when it finishes.

The client (Query) can run on the customer's terminal device, such as the customer's computer; once an SQL client is installed, the data stored in HBase can be queried.

Spark-SQL implements the above clustering algorithm and stores the GPS data in HBase according to the generated classification table. HBase uses multiple servers (the Region Servers in the figure) as storage media, so the GPS data is distributed across different servers. Specifically, the ClusterID is placed at the beginning of the row key, Rowkey (HBase looks up data by Rowkey). Data whose ClusterID is c1 (whose Rowkey begins with c1) is placed on the first server, Region Server1, data whose ClusterID is c2 is placed on the second server, Region Server2, and so on, so that GPS data of different categories is placed on different servers.

In addition, to optimize query efficiency, the HBase RowKey can be designed with an inverted index:

RowKey = ClusterID + (MaxValue - Timestamp) + ID

Here ClusterID is the category ID produced by clustering and MaxValue is the maximum time value. The difference between MaxValue and the actual time implements the inverted index, so that the newest data is placed at the top of the database and can be queried more efficiently. In general, data is not evenly distributed in space, so using ClusterID as the RowKey prefix inevitably causes hotspot problems. To solve this and allow computation to proceed in parallel, RowKeys are spread as widely as possible. For example, the Phoenix feature Salted Tables can be used: given the number of servers (Region Servers), it transparently prepends a hash prefix to the Rowkey. Since, as described above, data whose Rowkeys begin with different prefixes is placed on different servers, the data is distributed evenly across the servers. Because the second segment of the Rowkey is the ClusterID, data with the same ClusterID is still placed on a single server as far as possible.
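The effect of the MaxValue - Timestamp segment can be sketched as follows. The `make_row_key` helper, the `_` separator, the zero-padded width, and the value of `MAX_VALUE` are all assumptions for illustration; the text specifies only the order of the three segments. The sketch builds plain strings and does not call HBase or Phoenix.

```python
MAX_VALUE = 9_999_999_999  # assumed upper bound for a Unix timestamp in seconds


def make_row_key(cluster_id, timestamp, participant_id):
    """Build ClusterID + (MaxValue - Timestamp) + ID as one sortable string."""
    reversed_time = MAX_VALUE - timestamp
    # Zero padding keeps lexicographic order equal to numeric order.
    return f"{cluster_id}_{reversed_time:010d}_{participant_id}"


older = make_row_key("c1", 1_600_000_000, "taxi-1")
newer = make_row_key("c1", 1_600_000_600, "taxi-1")
keys_in_scan_order = sorted([older, newer])
```

Because row keys are stored in lexicographic order, the key of the newer record sorts before the key of the older one within the same category, which is the "newest data first" behaviour described above.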

For taxi trajectories in Beijing, the GPS data was classified using (a) the KMeans method, (b) the DBSCAN method, and (c) the method of the first embodiment of the present invention (labeled Spectral in FIG. 5). The results are shown in FIG. 5. Clustering with (a) KMeans or (b) DBSCAN usually divides space into fairly regular blocks, which often do not match reality. Clustering with (c) the method of the first embodiment represents space as a graph network, and the resulting categories better match the distribution of real tracks or roads.

In the first embodiment of the traffic information processing method of the present invention, a graph is used to represent the adjacency relationships between traffic points, and the adjacency graph is then used for cluster analysis to partition the urban traffic plane. To measure the correlation between traffic points, the number of trajectory passes between two adjacent traffic points is derived from statistics over historical traffic data. This measure is more robust than simple Euclidean distance, in particular to distance measurement errors for moving objects.

FIG. 6 is a flowchart of the second embodiment of the traffic information processing method of the present invention. In this embodiment, the method includes:

S61: acquire information of traffic points;

S62: convert the information of the traffic points into traffic trajectory information;

S63: determine, according to the traffic trajectory information, the adjacency relationship between any two adjacent traffic points in a traffic trajectory and the number of trajectory passes;

S64: based on the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the number of passes, use a spectral clustering method to obtain a high-dimensional graph feature matrix of the traffic network formed by the traffic points;

S65: extract the category information of the traffic points from the high-dimensional graph feature matrix;

S66: classify and store the information of the traffic points based on the category information to form a database;

S67: receive a request to obtain information of traffic points of a specific category;

S68: based on the category information in the request, obtain the information of the traffic points of the corresponding category from the database in order to respond.

In this embodiment, steps S61-S66 are the same as S11-S16 of the first method embodiment and are not repeated here.

Referring to FIG. 4, in step S67 a request to obtain information of traffic points is received from the client (Query); the request includes the category information of the requested traffic point information. In step S68, based on that category information, the information of the traffic points of the corresponding category is obtained from the database in order to respond. For example, the corresponding traffic point information is sent to the client either directly or after preset processing.
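Steps S67 and S68 amount to looking up all rows of one category. The sketch below imitates that lookup as a prefix scan over an in-memory sorted list of row keys; it does not use HBase or Phoenix, and the `scan_by_cluster` helper, the `_` separator, and the sample keys are hypothetical.

```python
from bisect import bisect_left


def scan_by_cluster(sorted_keys, cluster_id):
    """Return all row keys of one category from a lexicographically sorted list."""
    prefix = f"{cluster_id}_"            # assumed separator after the ClusterID
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break                        # keys are sorted, so the range has ended
        result.append(key)
    return result


# Made-up row keys in the form ClusterID_reversedTime_ID.
stored_keys = sorted([
    "c1_8399999399_taxi-1",
    "c1_8399999999_taxi-1",
    "c2_8399999500_bus-7",
    "c10_8399999000_taxi-2",
])
c1_rows = scan_by_cluster(stored_keys, "c1")
```

Including the separator in the prefix keeps category `c10` out of the result for `c1`; a real deployment would express the same lookup as a query against the database.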

由于采用了HBase+Phoenix的方式存储数据，Phoenix可以为HBase提供一个类似SQL的查询接口，通过它可以直接使用SQL来查询HBase的数据，查询时SQL语句会被Phoenix解析成并行语句，在多个存储介质上并行查询数据，加快效率，可以应用到毫秒级实时查询。Due to the use of HBase+Phoenix to store data, Phoenix can provide a SQL-like query interface for HBase, through which you can directly use SQL to query HBase data. When querying, the SQL statement will be parsed into parallel statements by Phoenix. Data is queried in parallel on the storage medium to speed up efficiency and can be applied to millisecond-level real-time queries.

As shown in FIG. 7, the framework of the present invention was tested on a data set of 4.3 billion GPS records. As the input scale of the query grows, the query latency increases approximately linearly; in most cases the latency stays within 4 s, which meets the requirement for real-time queries.

It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process is determined by its function and internal logic, and the numbering does not limit the implementation of the embodiments of the present invention in any way.

An embodiment of the present invention further provides a traffic information processing device, whose units are used to execute the steps of the embodiment corresponding to FIG. 1. For details, refer to the descriptions of the embodiments corresponding to FIGS. 1-5. FIG. 8 is a schematic structural diagram of the first embodiment of the traffic information processing device 800 of the present invention, which includes:

an acquisition unit 81, configured to acquire information of traffic points;

a conversion unit 82, configured to convert the information of the traffic points into traffic track information;

a determining unit 83, configured to determine, according to the traffic track information, the adjacency relationship between any two adjacent traffic points in a traffic track and the number of track passes;

a computing unit 84, configured to obtain a high-dimensional graph feature matrix of the traffic network formed by the traffic points by using a spectral clustering method, based on the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the number of track passes;

an extraction unit 85, configured to extract category information of the traffic points from the high-dimensional representation;

a storage unit 86, configured to classify and store the information of the traffic points based on the category information to form a database.

The determining unit 83 includes:

a first determining module 831, configured to split the traffic track information into pairs of adjacent traffic points;

a second determining module 832, configured to determine the adjacency relationship between any two adjacent traffic points according to the pairs of adjacent traffic points;

a third determining module 833, configured to determine the number of track passes between any two adjacent traffic points according to the pairs of adjacent traffic points.
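The three determining modules can be illustrated with a short sketch. It assumes that each track is already a time-ordered sequence of traffic-point identifiers, that a pair is treated as unordered (because the resulting graph is undirected), and that consecutive reports at the same point are skipped; the function name `count_adjacent_passes` and the sample tracks are hypothetical.

```python
from collections import Counter


def count_adjacent_passes(trajectories):
    """Split each track into adjacent point pairs and count passes per pair.

    trajectories: iterable of lists of traffic-point IDs, ordered by time.
    Returns a Counter keyed by a sorted (unordered) pair of point IDs.
    """
    passes = Counter()
    for track in trajectories:
        # zip(track, track[1:]) yields every pair of consecutive points.
        for a, b in zip(track, track[1:]):
            if a == b:
                # Assumption: repeated reports at one point form no edge.
                continue
            passes[tuple(sorted((a, b)))] += 1
    return passes


tracks = [["A", "B", "C"], ["C", "B"], ["A", "B"]]
passes = count_adjacent_passes(tracks)
# The keys of `passes` give the adjacency relationship;
# the values give the number of track passes for each pair.
```

The keys answer the question "which points are adjacent", and the counts later serve as edge weights, so one pass over the tracks yields both outputs.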

The computing unit 84 includes:

an adjacency matrix calculation module 841, configured to determine an adjacency matrix by taking the number of track passes between any two adjacent traffic points as the weight of the adjacency relationship between those two traffic points;

an adjacency graph calculation module 842, configured to obtain a weighted undirected adjacency graph of the traffic network formed by the traffic points according to the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the adjacency matrix;

a high-dimensional representation calculation module 843, configured to calculate the high-dimensional graph feature matrix of the traffic network formed by the traffic points according to the weighted undirected adjacency graph.
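A minimal sketch of how pass counts might be turned into a weighted, symmetric adjacency matrix follows. It assumes NumPy is available and uses a dense matrix for readability; a network with many traffic points would more plausibly use a sparse representation. The function name `build_adjacency_matrix` and the sample data are hypothetical.

```python
import numpy as np


def build_adjacency_matrix(points, passes):
    """Build a symmetric weight matrix W from per-pair pass counts.

    points: list of traffic-point IDs; its order fixes the row/column order.
    passes: mapping from (point_a, point_b) to the number of track passes.
    """
    index = {point: i for i, point in enumerate(points)}
    W = np.zeros((len(points), len(points)))
    for (a, b), count in passes.items():
        i, j = index[a], index[b]
        # Undirected graph: the weight is stored in both directions.
        W[i, j] = count
        W[j, i] = count
    return W


points = ["A", "B", "C"]
pair_counts = {("A", "B"): 2, ("B", "C"): 2}
W = build_adjacency_matrix(points, pair_counts)
```

A zero entry means the two points never appear consecutively in any track, so the matrix encodes both the adjacency relationship and its weight.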

Further, the high-dimensional representation calculation module 843 includes the following sub-modules (not shown in the figure):

a first sub-module, configured to calculate the degrees of all traffic points according to the weighted undirected adjacency graph to obtain a degree matrix;

a second sub-module, configured to calculate the Laplacian matrix of the weighted undirected adjacency graph based on the degree matrix and the adjacency matrix;

a third sub-module, configured to solve for the eigenvectors and eigenvalues of the Laplacian matrix;

a fourth sub-module, configured to obtain the high-dimensional graph feature matrix of the traffic network formed by the traffic points by applying the TopK algorithm to the eigenvectors and eigenvalues of the Laplacian matrix.
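The four sub-modules correspond to the usual spectral embedding computation, sketched below with NumPy. Two points are assumptions rather than statements from this section: the Laplacian is taken to be the unnormalized form L = D - W, and the TopK selection is taken to mean the k eigenvectors with the smallest eigenvalues, as in standard spectral clustering. The function name `spectral_embedding` and the sample graph are hypothetical.

```python
import numpy as np


def spectral_embedding(W, k):
    """Return the k smallest eigenvalues of L = D - W and their eigenvectors.

    W: symmetric weighted adjacency matrix (n x n).
    The returned matrix has one row per traffic point and k columns.
    """
    degrees = W.sum(axis=1)          # weighted degree of each point
    D = np.diag(degrees)             # degree matrix
    L = D - W                        # unnormalized graph Laplacian (assumed)
    # eigh is for symmetric matrices; eigenvalues come back in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    return eigenvalues[:k], eigenvectors[:, :k]


# Two disconnected pairs of points: {0, 1} and {2, 3}.
W_demo = np.array(
    [
        [0.0, 2.0, 0.0, 0.0],
        [2.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 3.0],
        [0.0, 0.0, 3.0, 0.0],
    ]
)
values, H = spectral_embedding(W_demo, 2)
```

In this toy graph the two zero eigenvalues reflect the two connected components, and points in the same component receive identical rows in `H`, which is what lets a later clustering step assign them to the same category.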

In the second embodiment of the traffic information processing device of the present invention, which builds on FIG. 8, the device further includes a receiving unit and a response unit that execute the corresponding steps of the embodiment corresponding to FIG. 6. For details, refer to the description of the embodiment corresponding to FIG. 6.

The receiving unit is configured to receive a request for information of traffic points of a specific category.

The response unit is configured to retrieve, based on the category information in the request, the information of the traffic points of the corresponding category from the database in order to respond.

The present invention further provides a terminal. As shown in FIG. 9, the terminal 100 includes a processor 101, a memory 102, and a computer program 103 that is stored in the memory 102 and executable on the processor 101. When the processor 101 executes the computer program 103, the steps of the embodiments of the traffic information processing method described above are implemented. Alternatively, when the processor 101 executes the computer program 103, the functions of the units/modules/sub-modules of the device embodiments described above are implemented.

By way of example, the computer program 103 may be divided into one or more units/modules/sub-modules, which are stored in the memory 102 and executed by the processor 101 to carry out the present invention. The one or more units/modules/sub-modules may be a series of computer program instruction segments capable of performing specific functions; the instruction segments describe the execution of the computer program 103 in the traffic information processing device/terminal 100. For example, the computer program 103 may be divided into an acquisition module, an execution module, and a generation module (modules in a virtual device), whose specific functions are as follows:

acquiring information of traffic points; converting the information of the traffic points into traffic track information; determining, according to the traffic track information, the adjacency relationship between any two adjacent traffic points in a traffic track and the number of track passes; obtaining a high-dimensional graph feature matrix of the traffic network formed by the traffic points by using a spectral clustering method, based on the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the number of track passes; extracting category information of the traffic points from the high-dimensional graph feature matrix; and classifying and storing the traffic points based on the category information to form a database.

The terminal 100 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal 100 may include, but is not limited to, the processor 101 and the memory 102. Those skilled in the art will understand that FIG. 9 is only an example of the terminal 100 and does not limit it; the terminal may include more or fewer components than shown, may combine certain components, or may use different components. For example, the terminal 100 may also include input/output devices, network access devices, buses, and the like.

The processor 101 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.

The memory 102 may be an internal storage unit of the terminal 100, such as a hard disk or main memory of the terminal 100. The memory 102 may also be an external storage device of the terminal 100, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the terminal 100. Further, the memory 102 may include both an internal storage unit of the terminal 100 and an external storage device. The memory 102 stores the computer program 103 and the other programs and data required by the terminal 100. The memory 102 may also be used to temporarily store data that has been output or is about to be output.

The present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any embodiment of the traffic information processing method.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into functional units and modules given above is only an example. In practice, the above functions may be assigned to different functional units or modules as required; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit; the integrated unit may be implemented in hardware or as a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the scope of protection of the present application. For the specific working process of the units and modules in the above system, refer to the corresponding process in the foregoing method embodiments; it is not repeated here.

In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in one embodiment, refer to the related descriptions of the other embodiments.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.

In the embodiments provided by the present invention, it should be understood that the disclosed device/terminal equipment and method may be implemented in other ways. For example, the device/terminal equipment embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.

If the integrated module/unit is implemented as a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, all or part of the processes of the methods of the above embodiments may also be carried out by a computer program that instructs the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be expanded or restricted as required by the legislation and patent practice of a jurisdiction; for example, in some jurisdictions, under legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and all of them shall fall within the scope of protection of the present invention.

Claims

1. A method for processing traffic information, wherein the method comprises:

acquiring information of traffic points;

converting the information of the traffic points into traffic track information;

determining, according to the traffic track information, the adjacency relationship between any two adjacent traffic points in a traffic track and the number of track passes;

obtaining a high-dimensional representation of the traffic network formed by the traffic points by using a spectral clustering method, based on the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the number of track passes;

extracting category information of the traffic points from the high-dimensional representation;

classifying and storing the information of the traffic points based on the category information to form a database.

2. The method according to claim 1, wherein the information of the traffic points comprises GPS data, and converting the information of the traffic points into traffic track information comprises:

sorting the GPS data reported by the same traffic participant by reporting time to obtain the traffic track information of that traffic participant.

3. The method according to claim 1, wherein determining, according to the traffic track information, the adjacency relationship between any two adjacent traffic points in the traffic track and the number of track passes comprises:

splitting the traffic track information into pairs of adjacent traffic points;

determining the adjacency relationship between any two adjacent traffic points according to the pairs of adjacent traffic points;

determining the number of track passes between any two adjacent traffic points according to the pairs of adjacent traffic points.

4. The method according to claim 1, wherein obtaining the high-dimensional representation of the traffic network formed by the traffic points by using the spectral clustering method, based on the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the number of track passes, comprises:

determining an adjacency matrix by taking the number of track passes between any two adjacent traffic points as the weight of the adjacency relationship between those two traffic points;

obtaining a weighted undirected adjacency graph of the traffic network formed by the traffic points according to the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the adjacency matrix;

calculating the high-dimensional representation of the traffic network formed by the traffic points according to the weighted undirected adjacency graph.

5. The method according to claim 4, wherein calculating the high-dimensional representation of the traffic network formed by the traffic points according to the weighted undirected adjacency graph comprises:

calculating the degrees of all traffic points according to the weighted undirected adjacency graph to obtain a degree matrix;

calculating the Laplacian matrix of the weighted undirected adjacency graph based on the degree matrix and the adjacency matrix;

solving for the eigenvectors and eigenvalues of the Laplacian matrix;

obtaining the high-dimensional representation of the traffic network formed by the traffic points by applying the TopK algorithm to the eigenvectors and eigenvalues of the Laplacian matrix.

6. The method according to claim 1, wherein the method further comprises:

receiving a request for information of traffic points of a specific category;

based on the category information in the request, retrieving the information of the traffic points of the corresponding category from the database in order to respond.

7. A device for processing traffic information, wherein the device comprises:

an acquisition unit, configured to acquire information of traffic points;

a conversion unit, configured to convert the information of the traffic points into traffic track information;

a determining unit, configured to determine, according to the traffic track information, the adjacency relationship between any two adjacent traffic points in a traffic track and the number of track passes;

a computing unit, configured to obtain a high-dimensional representation of the traffic network formed by the traffic points by using a spectral clustering method, based on the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the number of track passes;

an extraction unit, configured to extract category information of the traffic points from the high-dimensional representation;

a storage unit, configured to classify and store the information of the traffic points based on the category information to form a database.

8. The device according to claim 7, wherein the computing unit comprises:

an adjacency matrix calculation module, configured to determine an adjacency matrix by taking the number of track passes between any two adjacent traffic points as the weight of the adjacency relationship between those two traffic points;

an adjacency graph calculation module, configured to obtain a weighted undirected adjacency graph of the traffic network formed by the traffic points according to the information of the traffic points, the adjacency relationship between any two adjacent traffic points, and the adjacency matrix;

a high-dimensional representation calculation module, configured to calculate the high-dimensional representation of the traffic network formed by the traffic points according to the weighted undirected adjacency graph.

9. A terminal, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.

10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.