WO2015062540A9 - Driving amount model event-based storage and index methods and system - Google Patents

Driving amount model event-based storage and index methods and system

Info

Publication number
WO2015062540A9
WO2015062540A9 PCT/CN2014/090016 CN2014090016W WO2015062540A9 WO 2015062540 A9 WO2015062540 A9 WO 2015062540A9 CN 2014090016 W CN2014090016 W CN 2014090016W WO 2015062540 A9 WO2015062540 A9 WO 2015062540A9
Authority
WO
WIPO (PCT)
Prior art keywords
data
usage model
traffic usage
index
tree
Prior art date
Application number
PCT/CN2014/090016
Other languages
French (fr)
Chinese (zh)
Other versions
WO2015062540A1 (en)
Inventor
黄晓庆
饶佳
刘祎
杨景
Original Assignee
中国移动通信集团公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国移动通信集团公司
Publication of WO2015062540A1
Publication of WO2015062540A9

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof

Definitions

  • The invention relates to the field of the Internet of Vehicles, and in particular to a storage and indexing method and system based on traffic usage model events.
  • Telematics terminal equipment for the Internet of Vehicles is expected to see explosive growth.
  • Telematics refers to on-board computer systems that use wireless communication technology; it offers operators developing data service models considerable value-added revenue and opportunities for sustained growth.
  • Unlike the traditional Intelligent Transportation System (ITS), the Internet of Vehicles places more emphasis on interactive communication between vehicles, between vehicles and roads, and between vehicles and people. The emergence of the Internet of Vehicles can be said to redefine how vehicle traffic operates.
  • ITS: Intelligent Transportation System
  • The storage and indexing of the vehicle network raw data provided by the information subjects is an important basis and prerequisite for optimizing vehicle traffic operation and using resources effectively.
  • Millions of information subjects periodically generate vehicle network raw data, so a traditional relational database becomes a scalability bottleneck and the throughput of the Internet of Vehicles system falls short of requirements.
  • Existing cloud data management systems are highly scalable, fault tolerant and highly available, and they support high concurrency, so they are often chosen to store and index vehicle network raw data. Some cloud data management systems also support the MapReduce model to improve query performance and efficiency. For indexing, a double-layer index is used to handle massive data volumes and system scalability.
  • The first approach is a data management system based on distributed storage.
  • Distributed storage does not keep data on one or a few specific nodes. Instead it uses storage space on different machines across the network, so that these spaces together form a virtual storage device and the data is scattered throughout the network.
  • Distributed storage uses a key-value storage model, which supports efficient point queries and range queries on the row key (rowkey). Queries on attributes other than the row key, however, require a full table scan and comparison. Although the MapReduce model can improve the efficiency of such queries, performance is still poor for queries with low selectivity.
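The contrast between row-key queries and non-row-key queries can be illustrated with a minimal sketch. The `KeyValueTable` class, the `car001#0900` key layout and the `speed` attribute are hypothetical names chosen for illustration; this is a single-process toy, not the API of any real distributed store.

```python
from bisect import bisect_left

class KeyValueTable:
    """Toy sorted key-value table (illustrative only, not a real distributed store)."""

    def __init__(self, rows):
        # rows: dict mapping rowkey -> record (a dict of attributes)
        self.keys = sorted(rows)
        self.rows = dict(rows)

    def get(self, rowkey):
        """Point query on the row key: a direct lookup."""
        return self.rows.get(rowkey)

    def range(self, start, end):
        """Range query on the row key: start <= rowkey < end, via binary search."""
        lo = bisect_left(self.keys, start)
        hi = bisect_left(self.keys, end)
        return [self.rows[k] for k in self.keys[lo:hi]]

    def scan(self, predicate):
        """Non-row-key query: every record has to be inspected."""
        return [r for r in self.rows.values() if predicate(r)]

# Hypothetical row keys of the form "<vehicle id>#<time>"
table = KeyValueTable({
    "car001#0900": {"speed": 42},
    "car001#0905": {"speed": 65},
    "car002#0900": {"speed": 30},
})
by_key = table.range("car001", "car002")      # touches only matching keys
fast = table.scan(lambda r: r["speed"] > 60)  # touches all three records
```

The scan returns a single record yet reads the whole table, which is the low-selectivity case the text describes as performing poorly.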
  • The second approach is a double-layer index built on cloud storage.
  • A local index is built over the data of each computer node in the network; a local index covers only the data of its own node.
  • Each computer node also contributes part of its storage space to hold the global index.
  • The global index is composed of a subset of the local indexes. Because of limits on storage space and query efficiency, not all local indexes can be published to the global index, so some are selected according to set rules. The selected local indexes can be organized in the global index in different ways.
  • The computer nodes need to be split and adjusted continually, so the maintenance cost of the index is too high, which significantly reduces the throughput of the Internet of Vehicles system.
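A minimal sketch of the double-layer idea follows. The selection rule used here, publishing only each node's key range to the global index, is an assumption made for illustration; the text only says that some local index entries are selected "according to the set rules". `Node`, `build_global_index` and `lookup` are hypothetical names.

```python
class Node:
    """One computer node with a local index over its own data only."""

    def __init__(self, node_id, records):
        self.node_id = node_id
        self.local_index = dict(records)  # key -> record

    def summary(self):
        """Entry published to the global index: just the node's key range (assumed rule)."""
        keys = self.local_index.keys()
        return (min(keys), max(keys), self.node_id)

def build_global_index(nodes):
    """Global index: sorted list of (low key, high key, node id)."""
    return sorted(node.summary() for node in nodes)

def lookup(global_index, nodes_by_id, key):
    """Route through the global index, then search the matching local index."""
    for low, high, node_id in global_index:
        if low <= key <= high:
            record = nodes_by_id[node_id].local_index.get(key)
            if record is not None:
                return record
    return None

nodes = [Node("n1", {10: "a", 20: "b"}), Node("n2", {30: "c", 40: "d"})]
global_index = build_global_index(nodes)
nodes_by_id = {n.node_id: n for n in nodes}
found = lookup(global_index, nodes_by_id, 30)
missing = lookup(global_index, nodes_by_id, 25)
```

Whenever a node is split or its key range changes, its published summary has to be republished, which is the maintenance cost the text refers to.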
  • Neither approach fully considers the relationships among the main information subjects of the "person-vehicle-road" Internet of Vehicles. Both lack specificity and do not facilitate subsequent analysis and processing based on traffic events.
  • The present invention provides a storage and indexing method based on traffic usage model events. When vehicle network data is stored and indexed with this method, the index is updated infrequently and the vehicle network data is evenly distributed.
  • The invention also provides a storage and indexing system based on traffic usage model events. When vehicle network data is stored and indexed with this system, the index is updated infrequently and the vehicle network raw data is evenly distributed.
  • A storage and indexing method based on a traffic usage model event, comprising:
  • establishing a traffic usage model event, wherein the traffic usage model event includes traffic usage association rules corresponding to different items of different information subjects;
  • indexing the traffic usage model events with a multi-way search tree (B+tree), wherein each leaf node of the B+tree is an R-tree that indexes the plurality of subspace data segments into which the vehicle network raw data block of the corresponding traffic usage model event is divided;
  • storing the historical data of the corresponding traffic usage model event in a set area, and establishing a record-level index for the set area.
  • The plurality of subspace data segments are obtained by partitioning with a K-dimensional index tree (K-dimension Tree) or a bucket PR quadtree (Bucket PR Quadtree). The partitioning yields a plurality of mutually non-overlapping rectangular subspace data segments, which are stored in storage areas indexed by the R-tree.
  • The record-level index is a local index, which uses an R-tree or a grid index.
  • The method further includes:
  • determining, according to the subspace data segment sizes and the tree depth, whether the partitioning strategy is reasonable; if not, adjusting the partitioning strategy and re-dividing the vehicle network raw data block of the corresponding traffic usage model event into a plurality of subspace data segments for storage according to the adjusted strategy.
  • The subspace data variance is calculated from the subspace data segment sizes. When the calculated variance is greater than or equal to a set first threshold and the tree depth is greater than or equal to a set second threshold, the partitioning strategy is adjusted to shrink the subspace data segments; when the calculated variance is less than the first threshold and the tree depth is less than the second threshold, the partitioning strategy is adjusted to enlarge the subspace data segments.
  • A storage and indexing system based on a traffic usage model event, comprising a model establishing module, a storage indication module and an indexing module, wherein:
  • the model establishing module is configured to establish a traffic usage model event, the traffic usage model event including traffic usage association rules corresponding to different items of different information subjects;
  • the storage indication module is configured to divide the vehicle network raw data into vehicle network raw data blocks according to the traffic usage model events, and to divide the vehicle network raw data block corresponding to each traffic usage model event into a plurality of subspace data segments for storage;
  • the historical data of the traffic usage model event is stored in a set area;
  • the indexing module is configured to index the traffic usage model events with a B+tree, wherein each leaf node of the B+tree holds an R-tree that indexes the plurality of subspace data segments into which the vehicle network raw data block of the corresponding traffic usage model event is divided, and to establish a record-level index for the set area.
  • The storage indication module is further configured, when dividing the vehicle network raw data block corresponding to a traffic usage model event into a plurality of subspace data segments for storage, to partition with a K-dimension Tree or a Bucket PR Quadtree, obtaining a plurality of mutually non-overlapping rectangular subspace data segments that are stored in storage areas indexed by the R-tree.
  • The system further includes an update partitioning module, configured to determine whether the partitioning strategy is reasonable according to the subspace data segment sizes and the tree depth of the vehicle network raw data block of the corresponding traffic usage model event, and, if not, to adjust the partitioning strategy;
  • the storage indication module is further configured to re-divide the vehicle network raw data block corresponding to the traffic usage model event into a plurality of subspace data segments for storage according to the adjusted partitioning strategy.
  • The update partitioning module is further configured to calculate the subspace data variance from the subspace data segment sizes. When the calculated variance is greater than or equal to the set first threshold and the tree depth is greater than or equal to the set second threshold, it adjusts the partitioning strategy to shrink the subspace data segments; when the calculated variance is less than the first threshold and the tree depth is less than the second threshold, it adjusts the partitioning strategy to enlarge the subspace data segments.
  • The present invention sets a traffic usage model event, which includes traffic usage association rules corresponding to different items of different information subjects.
  • The vehicle network data includes the vehicle network raw data and the historical data provided by each information subject. The vehicle network raw data is indexed at a coarse-grained level, by traffic usage model event and by subspace under each traffic usage model event, while the historical data within a traffic usage model event is indexed at a fine-grained, record level.
  • Because vehicle network raw data related to a traffic usage model event is indexed through the existing traffic usage model event index, the index does not need to be updated. Because the vehicle network raw data is evenly distributed over the subspaces within a given range under the traffic usage model event, the maintenance cost of the index is also kept within an acceptable range, without affecting storage performance or the number of index updates. Therefore, when the method and system provided by the present invention store and index Internet of Vehicles data, the index is updated infrequently and the vehicle network data is evenly distributed.
  • FIG. 1 is a schematic structural diagram of an association relationship between "person-vehicle-road" information bodies according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for indexing vehicle network original data based on a driving usage model according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a coarse-grained level indexing process for data related to a driving usage model according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of a process for storing and indexing car network original data from an indexing level and a storage layer according to an embodiment of the present invention
  • FIG. 5 is a schematic structural diagram of a storage and indexing system based on a traffic usage model event according to an embodiment of the present invention.
  • The present invention proposes a traffic usage model among the three-dimensional "person-vehicle-road" information subjects and introduces the concept of "traffic usage", which is described in detail below.
  • Traffic usage: "usage" is short for the amount used. It is a measure of the use of a resource, and usage management rules govern how usage is managed. In a single dimension, the most familiar example of usage management is an electricity meter measuring electricity use. If the measurement of electricity use is extended to the time dimension, then by understanding the relationship between power usage and time, and adjusting the relationship between pricing and market supply and demand accordingly, an optimized strategy for supplying and using power resources can be reached. The relationship between resource supply and resource usage under a usage management model can thus be described by a multidimensional space. The higher the dimension of the space, the more variables are available for resource allocation and the greater the room for benefit.
  • Traffic usage is an important data concept for achieving industrial synergy through multi-party contractual relationships on the Internet of Vehicles platform.
  • Its resources involve multiple parties, such as car owners, depots, traffic management and insurance.
  • The usage behaviors include vehicle depreciation, losses from traffic accidents, auto insurance premiums, fines and penalties, and so on.
  • New semantics and new functions can be given to traffic usage, which in turn brings more beneficial services to the Internet of Vehicles industry.
  • The traffic usage model: different industry entities in the Internet of Vehicles industry have different business objectives and therefore care about different parameters of the driving process. Providing each of these entities with the traffic usage it needs is therefore a matter of projecting the usage data, through certain data processing, into the demand space of the corresponding entity; the data model that does this is the usage model.
  • The traffic usage for evaluating the driving of an insured vehicle is the projection of demand with the insurance company as the subject. For the vehicle owner, driving safely and avoiding road congestion are the main demands, and the traffic usage for traffic capacity is the projection of demand with the vehicle as the subject.
  • FIG. 1 is a schematic structural diagram of the relationship between the person-vehicle-road information subjects according to an embodiment of the present invention. The diagram is a closed-loop relationship structure that includes four mutually influencing interfaces, wherein:
  • the person-vehicle interface, that is, driving behavior coordination, involves the person and the vehicle as information subjects: the driver operates the accelerator pedal, brake and steering wheel to steer and control the driving speed, thereby controlling the vehicle;
  • the person-road interface, that is, traffic information matching coordination, involves the person and the road as information subjects: during driving, the driver continually judges and responds to the characteristics of the vehicle, the road and changing traffic in order to adapt to changes in the road environment;
  • the vehicle-road interface, that is, vehicle driving coordination, involves the vehicle and the road as information subjects: the vehicle and the road interact and share information, achieving coordination and cooperation between the vehicle and the road infrastructure;
  • the person-vehicle-road interface, that is, traffic behavior coordination, involves the person, the vehicle and the road as information subjects: the driver controls the vehicle toward a predetermined goal in accordance with traffic rules, while the vehicle is also affected by road and environmental conditions, and together they complete traffic behavior events in a dynamic process.
  • The established traffic usage model includes the traffic usage association rules corresponding to different items of different information subjects.
  • The present invention sets a traffic usage model event, which includes the traffic usage association rules corresponding to different items of different information subjects.
  • The vehicle network data includes the vehicle network raw data and the historical data provided by each information subject. The vehicle network raw data is indexed at a coarse-grained level, by traffic usage model event and by subspace under each traffic usage model event, while the historical data within a traffic usage model event is indexed at a fine-grained, record level.
  • An adaptive data partitioning method is adopted in the indexing process, so that the vehicle network raw data is indexed relatively uniformly.
  • Because vehicle network raw data related to a traffic usage model event is indexed through the existing traffic usage model event index, the index does not need to be updated. Because the vehicle network raw data is evenly distributed over the subspaces within a given range under the traffic usage model event, the maintenance cost of the index is also kept within an acceptable range, without affecting storage performance or the number of index updates.
  • FIG. 2 is a flowchart of a method for indexing vehicle network raw data based on a traffic usage model according to an embodiment of the present invention. The specific steps are as follows:
  • Step 201: Establish a traffic usage model event, which includes traffic usage association rules corresponding to different items of different information subjects;
  • Step 202: After the vehicle network raw data is obtained, divide it into vehicle network raw data blocks according to the traffic usage model events, and divide the vehicle network raw data block corresponding to each traffic usage model event into a plurality of subspace data segments for storage;
  • In this step, the plurality of subspace data segments are obtained by partitioning with a K-dimension Tree or a Bucket PR Quadtree. The partitioning yields a plurality of mutually non-overlapping rectangular subspace data segments, which are stored correspondingly.
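The partitioning in Step 202 can be sketched with a simple bucket-style k-d split over two-dimensional points. This is an illustrative sketch under stated assumptions, not the patent's implementation: the median split rule, the bucket `capacity` parameter and the function name `kd_partition` are all hypothetical, and a Bucket PR Quadtree would split each region into four quadrants instead.

```python
def kd_partition(points, bounds, capacity, depth=0):
    """Split `bounds` into non-overlapping rectangles (half-open on the split side).

    points: list of (x, y); bounds: (xmin, ymin, xmax, ymax).
    Returns a list of (rectangle, points_in_rectangle, depth) leaves.
    A leaf normally holds at most `capacity` points.
    """
    if len(points) <= capacity:
        return [(bounds, points, depth)]
    axis = depth % 2  # alternate between x (0) and y (1)
    ordered = sorted(points, key=lambda p: p[axis])
    split = ordered[len(ordered) // 2][axis]  # median coordinate (assumed rule)
    left = [p for p in points if p[axis] < split]
    right = [p for p in points if p[axis] >= split]
    if not left:
        # All points share this coordinate; stop here, so the leaf may exceed capacity.
        return [(bounds, points, depth)]
    xmin, ymin, xmax, ymax = bounds
    if axis == 0:
        left_bounds, right_bounds = (xmin, ymin, split, ymax), (split, ymin, xmax, ymax)
    else:
        left_bounds, right_bounds = (xmin, ymin, xmax, split), (xmin, split, xmax, ymax)
    return (kd_partition(left, left_bounds, capacity, depth + 1)
            + kd_partition(right, right_bounds, capacity, depth + 1))

points = [(1, 1), (2, 5), (3, 2), (6, 6), (7, 1), (8, 4)]
segments = kd_partition(points, (0, 0, 10, 10), capacity=2)
tree_depth = max(depth for _, _, depth in segments)
```

Each returned rectangle corresponds to one subspace data segment, and the recorded depth is the kind of quantity the later adjustment step compares against its second threshold.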
  • Step 203: The traffic usage model events are indexed by a multi-way search tree (B+tree), wherein each leaf node of the B+tree is an n-ary tree (R-tree) that indexes the plurality of subspace data segments into which the vehicle network raw data block of the corresponding traffic usage model event is divided;
  • B+tree: multi-way search tree
  • R-tree: n-ary tree
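The two-level structure in Step 203 can be sketched as follows. To keep the example short and runnable, a sorted key list stands in for the B+tree and a plain list of rectangles with a linear search stands in for the R-tree; both substitutions are simplifications, and `EventIndex`, the event identifiers and the region names are hypothetical.

```python
from bisect import bisect_left, insort

class EventIndex:
    """Two-level index: event id -> spatial index over that event's subspace segments."""

    def __init__(self):
        self.event_ids = []  # sorted event keys (stand-in for the B+tree)
        self.leaves = {}     # event id -> list of (rectangle, storage area) (stand-in for the R-tree)

    def add_segment(self, event_id, rect, storage_area):
        if event_id not in self.leaves:
            insort(self.event_ids, event_id)
            self.leaves[event_id] = []
        self.leaves[event_id].append((rect, storage_area))

    def events_between(self, first, last):
        """Range query over event keys: first <= id < last."""
        lo = bisect_left(self.event_ids, first)
        hi = bisect_left(self.event_ids, last)
        return self.event_ids[lo:hi]

    def find_areas(self, event_id, x, y):
        """Storage areas of the event whose rectangle contains the point (x, y)."""
        return [area
                for (xmin, ymin, xmax, ymax), area in self.leaves.get(event_id, [])
                if xmin <= x < xmax and ymin <= y < ymax]

index = EventIndex()
index.add_segment("E11", (0, 0, 6, 10), "region-1")
index.add_segment("E11", (6, 0, 10, 10), "region-2")
index.add_segment("E12", (0, 0, 10, 10), "region-3")
hit = index.find_areas("E11", 7, 3)
```

New raw data for an existing event only adds entries under that event's leaf; the event-level keys stay unchanged, which matches the text's point that the event index itself rarely needs updating.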
  • Step 204: Store the historical data of the corresponding traffic usage model event in a set area, and establish a record-level index for the set area;
  • The record-level index may be a local index, and the local index may use an R-tree or a grid index.
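A record-level grid index of the kind mentioned for Step 204 might look like the sketch below. The uniform `cell_size`, the `GridIndex` class and the record names are assumptions for illustration; the text does not specify how the grid is laid out.

```python
from collections import defaultdict

class GridIndex:
    """Uniform grid over 2-D positions; each cell lists the records that fall in it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> list of (x, y, record)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, x, y, record):
        self.cells[self._cell(x, y)].append((x, y, record))

    def query(self, xmin, ymin, xmax, ymax):
        """Records with xmin <= x <= xmax and ymin <= y <= ymax."""
        col0, row0 = self._cell(xmin, ymin)
        col1, row1 = self._cell(xmax, ymax)
        found = []
        for col in range(col0, col1 + 1):
            for row in range(row0, row1 + 1):
                # .get avoids creating empty cells while querying
                for x, y, record in self.cells.get((col, row), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        found.append(record)
        return found

grid = GridIndex(cell_size=10)
grid.insert(3, 4, "rec-a")
grid.insert(12, 4, "rec-b")
grid.insert(25, 25, "rec-c")
nearby = grid.query(0, 0, 15, 10)
```

Because historical data no longer changes, such an index can be built once per batch and never rebalanced, which is why a simple grid is a plausible choice here.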
  • In step 202, after the vehicle network raw data block corresponding to the traffic usage model event is divided into a plurality of subspace data segments, the method further includes:
  • determining, according to the subspace data segment sizes and the tree depth, whether the partitioning strategy is reasonable; if not, adjusting the partitioning strategy and re-dividing the vehicle network raw data block of the corresponding traffic usage model event into a plurality of subspace data segments for storage according to the adjusted strategy.
  • The subspace data variance is calculated from the subspace data segment sizes. When the calculated variance is greater than or equal to a set first threshold and the tree depth is greater than or equal to a set second threshold, the partitioning strategy is adjusted to shrink the subspace data segments; when the calculated variance is less than the first threshold and the tree depth is less than the second threshold, the partitioning strategy is adjusted to enlarge the subspace data segments.
  • FIG. 3 is a schematic diagram of a coarse-grained level indexing process for data related to a traffic usage model according to an embodiment of the present invention.
  • When traffic usage model event related data is updated, subspaces are created for the updated traffic usage model event, and the event-related data is divided into multiple data segments.
  • Each of these subspaces corresponds to a storage area in the Internet of Vehicles database, which is a distributed database, and the subspaces are indexed with an R-tree.
  • The vehicle network raw data is divided according to the traffic usage model events.
  • Traffic usage model events can be formed that involve different information subjects.
  • The vehicle network raw data is distributed along with the traffic usage model events, so according to the start anchor and end anchor of each traffic usage model event, the vehicle network raw data can be divided into several data blocks related to traffic usage model events (Event Data Blocks).
  • The vehicle network raw data is first divided into several blocks in the event dimension according to the traffic usage model events. Each block is then divided in two-dimensional space into several data segments, which are stored in several subspaces.
  • The vehicle network raw data block [A_s1, A_e1) between the start anchor and the end anchor of a traffic usage model event is stored on the storage node of the corresponding traffic usage model event. If the corresponding traffic usage model events are stored in a cloud storage system, the storage node of the corresponding event is determined through the interface of the cloud storage system, and the vehicle network raw data block [A_s1, A_e1) is written to that storage node.
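Dividing raw data into Event Data Blocks by anchors can be sketched as below. The sketch assumes the anchors are timestamps and that records falling outside every event are simply kept aside; `split_into_event_blocks`, the event identifiers and the sample records are hypothetical.

```python
def split_into_event_blocks(records, events):
    """Assign each record to the event whose half-open anchor interval contains it.

    records: list of (timestamp, payload), in any order.
    events: dict event_id -> (start_anchor, end_anchor), meaning [start, end).
    Returns (blocks, unrelated), where blocks maps event_id -> list of records.
    """
    blocks = {event_id: [] for event_id in events}
    unrelated = []
    for record in records:
        timestamp = record[0]
        for event_id, (start, end) in events.items():
            if start <= timestamp < end:
                blocks[event_id].append(record)
                break
        else:
            # No event interval matched this record.
            unrelated.append(record)
    return blocks, unrelated

events = {"E11": (100, 200), "E12": (200, 350)}
records = [(90, "idle"), (120, "brake"), (200, "turn"), (349, "stop"), (350, "park")]
blocks, unrelated = split_into_event_blocks(records, events)
```

The half-open interval mirrors the [A_s1, A_e1) notation: a record stamped exactly at an end anchor belongs to the next block, so adjacent blocks never share a record.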
  • The traffic usage model events are indexed with a B+tree. Each leaf node of the B+tree corresponds to an R-tree, which indexes the vehicle network raw data of that traffic usage model event.
  • The storage nodes are mostly spatially fixed, so the subspaces within a vehicle network raw data block [A_s1, A_e1) can be partitioned with a K-dimension Tree or a Bucket PR Quadtree. This finally yields a plurality of mutually non-overlapping rectangular subspace regions, which are indexed with an R-tree.
  • The vehicle network raw data block [A_s1, A_e1) of the original traffic usage model event then becomes historical data.
  • For historical data, a record-level local index can be established for each region. The local index uses an R-tree or a grid index to index the historical data within each traffic usage model event.
  • FIG. 4 is a schematic diagram of the process of storing and indexing vehicle network raw data based on traffic usage model events, viewed from the index layer and the storage layer, according to an embodiment of the present invention.
  • First, the vehicle network raw data is divided into traffic usage model event related data and event-independent data. Then, according to the start anchor and end anchor of each traffic usage model event, the event-related data is divided into data blocks corresponding to traffic usage model events. Finally, each data block is divided in two dimensions into several subspaces. Each subspace data segment is stored in a region of the distributed storage system, and the subspace data segments of the data block corresponding to one traffic usage model event are kept in the same region as far as possible, which reduces the number of regions that must be scanned during a query and improves query efficiency.
  • At the index layer there are three main levels: the traffic usage model event index and the subspace index cover the current vehicle network raw data, and the grid index covers the historical data corresponding to traffic usage model events.
  • In the traffic usage model event dimension, the data is divided into current vehicle network raw data and historical data.
  • The traffic usage model event index is built as a B+tree. Because the multiple subspace data segments of a traffic usage model event are stored in different regions, they are indexed with an R-tree.
  • Once a traffic usage model event has been updated, its historical data no longer changes, so the historical data can be stored in batches and indexed at the record level, for example with an R-tree index or a grid index. In this way the cost of maintaining index updates is relatively low and the impact on storing vehicle network raw data is relatively small, ensuring that the Internet of Vehicles system can support large-scale, frequent updates.
  • The vehicle network raw data grows monotonically in the time dimension, and the traffic usage model can also change over time. This requires dividing the vehicle network raw data stream into several feedback cycles, with the data handled within each feedback cycle.
  • Let N be the total number of traffic usage model event records in a feedback cycle. If each subspace holds at most S records, the data in each event segment is divided equally into R subspaces.
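The text does not state how R is obtained from N and S. One natural reading, offered here only as an assumption, is that R is the smallest number of subspaces that keeps every subspace at or below S records:

```latex
R = \left\lceil \frac{N}{S} \right\rceil
% Illustrative numbers (not from the source):
% N = 10000,\ S = 500 \;\Rightarrow\; R = 20
% N = 10001,\ S = 500 \;\Rightarrow\; R = 21
```

Under this reading, an equal division gives each subspace roughly N/R records, which never exceeds S.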
  • In the first feedback cycle, the data is divided into the blocks E11, E12, E13, ..., E1k corresponding to the traffic usage model events.
  • Subspace partitioning is performed on E11, E12, E13, ..., E1k using the Bucket PR KD-tree. The depth Dep_i of the tree is recorded, the size of the data segment of each subspace is monitored, and the variance D_i is calculated according to formula (1), where:
  • N_i represents the number of subspaces within Ei;
  • x_m represents the size of the m-th subspace of Ei;
  • D_i represents the variance of the subspace sizes within Ei.
  • The size of D_i reflects how uniformly the data is partitioned among the subspace data segments.
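Formula (1) itself is not reproduced in this text. A form consistent with the symbol definitions above is the population variance of the subspace sizes; whether the original divides by N_i or by N_i - 1 cannot be confirmed from the text, so the following is an assumption:

```latex
\bar{x}_i = \frac{1}{N_i} \sum_{m=1}^{N_i} x_m ,
\qquad
D_i = \frac{1}{N_i} \sum_{m=1}^{N_i} \left( x_m - \bar{x}_i \right)^2
```

Here \bar{x}_i, the mean subspace size within Ei, is an auxiliary symbol introduced for this sketch. D_i is zero when all subspaces of Ei are the same size and grows as the sizes diverge.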
  • The division of the data segments is adjusted as follows. If D_i is greater than or equal to the set first threshold, the subspace data segments in the data block of the corresponding traffic usage model event are unevenly distributed, and if Dep_i is also greater than or equal to the set second threshold, the subspace data segments need to be reduced. If D_i is less than the first threshold, the subspace data segments in the data block of the corresponding traffic usage model event are relatively uniform, and if Dep_i is also less than the second threshold, the data volume is too small and two subspace data segments are merged;
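The adjustment rule can be sketched as a small decision function. The function name, the threshold values and the returned labels are hypothetical, the population variance is assumed for D_i, and the "keep" result for the two mixed cases is an assumption, since the text does not say what happens when only one of the two conditions holds.

```python
from statistics import pvariance

def adjust_partitioning(segment_sizes, tree_depth, variance_threshold, depth_threshold):
    """Decide how to adjust the partitioning of one event data block.

    segment_sizes: sizes x_m of the subspace data segments (at least one value).
    Returns "shrink", "merge" or "keep".
    """
    variance = pvariance(segment_sizes)  # D_i, assumed to be the population variance
    if variance >= variance_threshold and tree_depth >= depth_threshold:
        return "shrink"  # uneven segments: reduce the subspace data segments
    if variance < variance_threshold and tree_depth < depth_threshold:
        return "merge"   # uniform but too little data: merge subspace data segments
    return "keep"        # mixed case, not covered by the text (assumption)

uneven = adjust_partitioning([10, 200, 15, 175], tree_depth=6,
                             variance_threshold=1000, depth_threshold=5)
sparse = adjust_partitioning([100, 105, 95, 100], tree_depth=2,
                             variance_threshold=1000, depth_threshold=5)
mixed = adjust_partitioning([100, 105, 95, 100], tree_depth=6,
                            variance_threshold=1000, depth_threshold=5)
```

In a feedback-cycle setting, a function like this would run once per cycle for each block Ei, and its result would drive the partitioning used in the next cycle.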
  • The partitioning strategy can also be fixed, with no dynamic partitioning. In that case the partitioning scheme is determined in advance and the vehicle network raw data does not need to be partitioned dynamically when it is stored, which further improves storage performance.
  • FIG. 5 is a schematic structural diagram of a storage and indexing system based on a traffic usage model event according to an embodiment of the present invention. As shown in the figure, the system includes a model establishing module, a storage indication module and an indexing module, wherein:
  • the model establishing module is configured to establish a traffic usage model event, the traffic usage model event including traffic usage association rules corresponding to different items of different information subjects;
  • the storage indication module is configured to divide the vehicle network raw data into vehicle network raw data blocks according to the traffic usage model events, and to divide the vehicle network raw data block corresponding to each traffic usage model event into a plurality of subspace data segments for storage;
  • the historical data of the traffic usage model event is stored in a set area;
  • the indexing module is configured to index the traffic usage model events with a B+tree, wherein each leaf node of the B+tree holds an R-tree that indexes the plurality of subspace data segments into which the vehicle network raw data block of the corresponding traffic usage model event is divided, and to establish a record-level index for the set area.
  • The storage indication module is further configured, when dividing the vehicle network raw data block corresponding to a traffic usage model event into a plurality of subspace data segments for storage, to partition with a K-dimension Tree or a Bucket PR Quadtree, obtaining a plurality of mutually non-overlapping rectangular subspace data segments that are stored in storage areas indexed by the R-tree.
  • The system further includes an update partitioning module, configured to determine whether the partitioning strategy is reasonable according to the subspace data segment sizes and the tree depth of the vehicle network raw data block of the corresponding traffic usage model event, and, if not, to adjust the partitioning strategy;
  • the storage indication module is further configured to re-divide the vehicle network raw data block corresponding to the traffic usage model event into a plurality of subspace data segments for storage according to the adjusted partitioning strategy.
  • The update partitioning module is further configured to calculate the subspace data variance from the subspace data segment sizes. When the calculated variance is greater than or equal to the set first threshold and the tree depth is greater than or equal to the set second threshold, it adjusts the partitioning strategy to shrink the subspace data segments; when the calculated variance is less than the first threshold and the tree depth is less than the second threshold, it adjusts the partitioning strategy to enlarge the subspace data segments.
  • The method and system provided by the invention fully consider the need to measure traffic usage effectively, and achieve optimized storage and use of Internet of Vehicles resource data.
  • The method and system also take into account that vehicle network raw data is generated continuously, that historical data generally does not change once generated, that the distribution of the vehicle network raw data corresponding to traffic usage model events is often skewed, and that traffic usage model events change over time.
  • When the subspace data segments are divided, the imbalance of the data distribution is taken into account while the need to measure traffic usage resources is met, which gives practical guidance for Internet of Vehicles solutions.
  • The applicable scenarios and examples of the method and system provided by the present invention include, but are not limited to, the following Internet of Vehicles applications: intelligent transportation systems, mass data storage and indexing, and resource usage metering. They can meet the needs of existing vehicle network data storage applications.

Abstract

Disclosed are a storage and indexing method and system based on traffic usage model events. A traffic usage model event is set, which comprises traffic usage association rules corresponding to different items of different information subjects. Internet of Vehicles data comprises raw data and historical data provided by each information subject, wherein the raw data is indexed at a coarse-grained level by the traffic usage model event and the subspaces under that event, and a fine-grained, record-level index is set for the historical data in the traffic usage model event. The method and system thereby reduce the number of index updates and keep the Internet of Vehicles data evenly distributed when storing and indexing it.

Description

Storage and indexing method and system based on traffic usage model events
Cross-reference to related applications
The present application claims priority to Chinese Patent Application No. 201310532545.2, filed in China on October 31, 2013, the entire content of which is incorporated herein by reference.
Technical field
The invention relates to the field of the Internet of Vehicles, and in particular to a storage and indexing method and system based on traffic usage model events.
Background art
As Internet of Vehicles technologies mature, sensor technology, mobile communication technology, big data technology and intelligent computing technology have begun to integrate deeply with the Internet of Vehicles industry. Driven by market demand, Telematics terminal equipment for the Internet of Vehicles is expected to grow explosively, where Telematics refers to on-board computer systems that use wireless communication technology; this brings operators considerable value-added revenue and opportunities for sustained growth as they develop data service models. Unlike the traditional Intelligent Transport System (ITS), the Internet of Vehicles focuses on interactive communication between vehicle and vehicle, vehicle and road, and vehicle and person; its emergence can be said to redefine how vehicle traffic operates.
Storing and indexing the Internet of Vehicles raw data provided by information subjects is an important basis and prerequisite for optimizing vehicle traffic operation and using resources effectively. In an Internet of Vehicles environment, millions of information subjects periodically generate raw data. As a result, traditional relational databases hit a scalability bottleneck: system throughput falls short of requirements, and tens of thousands or even hundreds of thousands of concurrent operations cannot be supported. A new method for storing and indexing Internet of Vehicles raw data is therefore needed to meet the management needs of such data.
Existing cloud data management systems offer high scalability, high fault tolerance and high availability, and support highly concurrent access, so they are often chosen for storing and indexing Internet of Vehicles raw data. Some cloud data management systems also support the MapReduce model to improve query performance and efficiency, and use a double-layer index to cope with massive data volumes and system scalability.
Currently, there are two main approaches to data storage and indexing:
The first approach is a data management system based on distributed storage. Unlike common centralized storage, distributed storage does not keep data on one or several specific nodes; instead it uses, over the network, the storage space of a bounded set of different machines so that these spaces form one virtual storage device, with the data scattered across the network. Distributed storage uses key-value storage, which supports efficient point queries and range queries on the row key (rowkey), whereas queries on non-rowkey attributes require a full table scan. Although the MapReduce model can improve the efficiency of such queries, performance is still poor for queries with low selectivity;
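The difference between rowkey queries and non-rowkey queries can be illustrated with a minimal in-memory sketch. The class `SortedKVStore`, its method names and the sample records are hypothetical and are not part of any real key-value store; the sketch only assumes that rows are kept sorted by rowkey.

```python
from bisect import bisect_left


class SortedKVStore:
    """Toy key-value table kept sorted by rowkey (hypothetical, in-memory)."""

    def __init__(self, rows):
        # rows: iterable of (rowkey, value_dict) pairs
        self._rows = sorted(rows, key=lambda r: r[0])
        self._keys = [k for k, _ in self._rows]

    def get(self, rowkey):
        """Point query on the rowkey: binary search, O(log n)."""
        i = bisect_left(self._keys, rowkey)
        if i < len(self._keys) and self._keys[i] == rowkey:
            return self._rows[i][1]
        return None

    def range(self, lo, hi):
        """Range query lo <= rowkey < hi: two binary searches plus a slice."""
        return self._rows[bisect_left(self._keys, lo):bisect_left(self._keys, hi)]

    def scan_where(self, predicate):
        """Query on a non-rowkey attribute: every row must be inspected."""
        return [(k, v) for k, v in self._rows if predicate(v)]


store = SortedKVStore([
    ("v003", {"speed": 80}),
    ("v001", {"speed": 40}),
    ("v002", {"speed": 120}),
])
fast = store.scan_where(lambda v: v["speed"] > 100)  # full scan for one match
```

Here `scan_where` touches all rows to return a single match, which is the low-selectivity case the paragraph above describes as performing poorly.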
The second approach is a double-layer index based on cloud storage. In this approach, a local index is built over the data of each computer node in the network and covers only that node's data. In addition to its local index, each computer node shares part of its storage space to hold the global index, which is composed of a subset of the local indexes. Because of limits on storage space and the demands of query efficiency, not all local indexes can be published to the global index, so a subset of the local indexes is selected according to set rules, and the selected local indexes can be organized within the global index in different configured ways.
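A minimal sketch of the double-layer idea follows. It assumes, purely for illustration, that each node publishes only the key range of its local index to the global index; the source does not specify the selection rule, and the names `Node`, `GlobalIndex`, `publish` and `lookup` are hypothetical.

```python
class Node:
    """One storage node with a local index over its own records only."""

    def __init__(self, name, records):
        self.name = name
        self.local_index = dict(records)  # key -> record

    def summary(self):
        # Assumed publication rule: expose only the covered key range.
        keys = self.local_index.keys()
        return min(keys), max(keys)


class GlobalIndex:
    """Holds the published part of each local index (here: key ranges)."""

    def __init__(self):
        self.entries = []  # list of (lo, hi, node)

    def publish(self, node):
        lo, hi = node.summary()
        self.entries.append((lo, hi, node))

    def lookup(self, key):
        # Use the global layer to pick candidate nodes, then ask each
        # candidate's local index for the record itself.
        for lo, hi, node in self.entries:
            if lo <= key <= hi and key in node.local_index:
                return node.name, node.local_index[key]
        return None


node_a = Node("node-a", {1: "r1", 5: "r5"})
node_b = Node("node-b", {10: "r10", 12: "r12"})
global_index = GlobalIndex()
global_index.publish(node_a)
global_index.publish(node_b)
```

A lookup such as `global_index.lookup(10)` consults the global layer first and only then the local index of `node-b`, so nodes whose published range excludes the key are never contacted.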
Both approaches can store and index data. However, how to store and index Internet of Vehicles raw data so as to optimize resource storage and management remains an open problem, because applying the two approaches to such data raises the following issues. First, a data management system based on distributed storage has a distributed architecture, so Internet of Vehicles raw data queries with low selectivity perform poorly. Second, the cloud-storage double-layer index uses R-trees as both the local and the global index; while Internet of Vehicles raw data is being indexed, nodes must be split and adjusted continually, so index maintenance is too costly and significantly reduces system throughput. Most importantly, neither approach fully considers the associations among the "person-vehicle-road" information subjects that produce the raw data, so both lack specificity and cannot facilitate subsequent analysis and processing based on traffic events.
Summary of the invention
In view of this, the present invention provides a storage and indexing method based on traffic usage model events. Storing and indexing Internet of Vehicles data with this method requires few index updates and keeps the data evenly distributed.
The present invention also provides a storage and indexing system based on traffic usage model events. Storing and indexing Internet of Vehicles data with this system requires few index updates and keeps the raw data evenly distributed.
To achieve the above objectives, the technical solution of the present invention is implemented as follows:
A storage and indexing method based on traffic usage model events, the method comprising:
establishing a traffic usage model event, the traffic usage model event comprising traffic usage association rules corresponding to different items of different information subjects;
obtaining Internet of Vehicles raw data, dividing it into Internet of Vehicles raw data blocks according to the traffic usage model events, and dividing the raw data block corresponding to each traffic usage model event into multiple subspace data segments for storage;
indexing the traffic usage model events with a multi-way search tree (B+tree), wherein each leaf node of the B+tree holds an n-ary tree (R-tree) that indexes the multiple subspace data segments into which the raw data block of the corresponding traffic usage model event is divided;
storing the historical data of the corresponding traffic usage model event in a set area, and building a record-level index for the set area.
The multiple subspace data segments are obtained by partitioning with a K-dimensional index tree (K-dimension Tree) or a Bucket PR Quadtree; the partitioning yields several mutually non-overlapping rectangular subspace data segments, which are stored in storage areas indexed by the R-tree.
The record-level index is a local index, implemented as an R-tree or a grid index.
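As an illustration of the grid index option, the following minimal sketch buckets records by fixed-size cells and answers a rectangle query by visiting only the cells the rectangle touches. The class name `GridIndex`, the cell size and the record layout are assumptions made for the example, not details taken from the source.

```python
from collections import defaultdict


class GridIndex:
    """Record-level grid index: each record is filed under its grid cell."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (cx, cy) -> [(record_id, x, y)]

    def _cell(self, x, y):
        return int(x // self.cell_size), int(y // self.cell_size)

    def insert(self, record_id, x, y):
        self.cells[self._cell(x, y)].append((record_id, x, y))

    def query(self, xmin, ymin, xmax, ymax):
        """Return ids of records inside the closed rectangle."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get avoids creating empty cells while querying
                for rid, x, y in self.cells.get((cx, cy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(rid)
        return hits


history = GridIndex(cell_size=10.0)
history.insert("a", 1, 1)
history.insert("b", 15, 3)
history.insert("c", 25, 25)
```

Because historical data generally does not change after it is written, a fixed grid like this needs no rebalancing, which is one plausible reason to prefer it over an R-tree for the record-level layer.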
After the Internet of Vehicles raw data block corresponding to the traffic usage model event is divided into multiple subspace data segments for storage, the method further comprises:
determining whether the partitioning strategy is reasonable according to the subspace data segment size and the tree depth of the traffic usage raw data block of the traffic usage model event in which the subspace data segments are located; if not, adjusting the partitioning strategy and re-dividing, according to the adjusted strategy, the Internet of Vehicles raw data block corresponding to the traffic usage model event into multiple subspace data segments for storage.
Determining whether the partitioning strategy is reasonable comprises:
calculating the subspace data variance according to the subspace data segment size; when the calculated variance is greater than or equal to a set first threshold and the tree depth is greater than or equal to a set second threshold, adjusting the partitioning strategy to shrink the subspace data segments; when the calculated variance is less than the set first threshold and the tree depth is less than the set second threshold, adjusting the partitioning strategy to enlarge the subspace data segments.
A storage and indexing system based on traffic usage model events, the system comprising a model building module, a storage indication module and an indexing module, wherein:
the model building module is configured to establish a traffic usage model event, the traffic usage model event comprising traffic usage association rules corresponding to different items of different information subjects;
the storage indication module is configured to, after obtaining Internet of Vehicles raw data, divide it into Internet of Vehicles raw data blocks according to the traffic usage model events, and divide the raw data block corresponding to each traffic usage model event into multiple subspace data segments for storage; and to store the historical data of the corresponding traffic usage model event in a set area;
the indexing module is configured to index the traffic usage model events with a B+tree, wherein each leaf node of the B+tree holds an R-tree that indexes the multiple subspace data segments into which the raw data block of the corresponding traffic usage model event is divided; and to build a record-level index for the set area.
The storage indication module is further configured to use a K-dimension Tree or a Bucket PR Quadtree when dividing the Internet of Vehicles raw data block corresponding to the traffic usage model event into multiple subspace data segments for storage; the partitioning yields several mutually non-overlapping rectangular subspace data segments, which are stored in storage areas indexed by the R-tree.
The system further comprises an update partitioning module, configured to determine, according to the subspace data segment size and the tree depth of the traffic usage raw data block of the traffic usage model event in which the subspace data segments are located, whether the partitioning strategy is reasonable and, if not, to adjust the partitioning strategy;
the storage indication module is further configured to re-divide, according to the adjusted partitioning strategy, the Internet of Vehicles raw data block corresponding to the traffic usage model event into multiple subspace data segments for storage.
The update partitioning module is further configured to calculate the subspace data variance according to the subspace data segment size; when the calculated variance is greater than or equal to a set first threshold and the tree depth is greater than or equal to a set second threshold, the partitioning strategy is adjusted to shrink the subspace data segments; when the calculated variance is less than the set first threshold and the tree depth is less than the set second threshold, the partitioning strategy is adjusted to enlarge the subspace data segments.
As the above solution shows, the present invention sets a traffic usage model event comprising traffic usage association rules corresponding to different items of different information subjects. Internet of Vehicles data comprises the raw data and the historical data provided by each information subject. The raw data is indexed at a coarse-grained level, by traffic usage model event and by the subspaces under that event, while the historical data within a traffic usage model event is given a fine-grained, record-level index. Because raw data related to a traffic usage model event is indexed through the existing event index, the index does not need to be updated; and because the raw data is contained in, and evenly distributed over, a bounded set of subspaces under the event, the index maintenance cost is kept within an effective range and neither storage performance nor the number of index updates is affected. Therefore, when storing and indexing Internet of Vehicles data, the method and system provided by the present invention require few index updates and keep the data evenly distributed.
Brief description of the drawings
FIG. 1 is a schematic structural diagram of the associations among the "person-vehicle-road" information subjects according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for indexing Internet of Vehicles raw data based on the traffic usage model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the coarse-grained indexing of data related to the traffic usage model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of storing and indexing Internet of Vehicles raw data based on the traffic usage model, viewed from the index layer and the storage layer, according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of the storage and indexing system based on traffic usage model events according to an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments.
To implement the solution provided by the present invention, the invention provides a traffic usage model based on the three-dimensional "person-vehicle-road" information subjects and introduces the concept of "traffic usage", described in detail below.
Traffic usage. "Usage" is short for "amount used": it is a measurement of resource-use behavior, and usage management rules govern that amount. In a single dimension, the most familiar example of usage management is metering electricity consumption with an electricity meter. If the metering of electricity use is extended along the time dimension, then by tracking how consumption changes over time and adjusting pricing and the supply-demand relationship accordingly, the supply and use of electricity can be optimized, that is, a tiered pricing strategy. Thus, when resource supply and resource-use behavior are designed on a usage management model, the relationship between the two can be described in a multidimensional space: the higher the dimension of the space, the more variables are available for resource allocation and the greater the room for benefit. Here, traffic usage is an important data concept through which the Internet of Vehicles platform achieves industrial collaboration by establishing multi-party contractual relationships. Its resources involve multiple actors, such as car owners, car manufacturers, traffic management and insurers. For a car owner, resource-use behavior includes vehicle depreciation, losses from traffic accidents, car insurance premiums, and fines and penalties for traffic violations. As resource items are added, traffic usage can take on new semantics and new functions, and so serve more beneficiaries in the Internet of Vehicles industry.
Traffic usage model. Different industry players in the Internet of Vehicles industry have different business objectives and therefore care about different parameters of the driving process. Providing each of these players with the traffic usage it needs is thus a process of projecting the traffic usage data, through suitable data processing, onto the demand space of the corresponding information subject, and the data model used is the usage model. For example, the public security traffic management bureau is mainly responsible for road traffic management and control and for traffic safety and security, so obtaining the traffic usage of road traffic accidents is the demand projection for the bureau. An insurance company aims to reduce its accident payout ratio, lower underwriting risk and earn a profit, so extracting traffic usage for assessing the driving of insured vehicles is the demand projection for the insurer. A car owner's main concerns are safe driving and avoiding road congestion, so obtaining traffic usage on road capacity is the demand projection for the vehicle.
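The notion of a demand projection can be sketched as selecting, from one traffic usage record, only the fields a given subject cares about. The field names, the subject keys and the sample record below are invented for illustration; the source does not define a record schema.

```python
# Hypothetical mapping from each subject to the fields of its demand space.
PROJECTIONS = {
    "traffic_bureau": ("vehicle_id", "road_id", "accident"),
    "insurer": ("vehicle_id", "harsh_brakes", "speeding_seconds"),
    "owner": ("road_id", "avg_speed_kmh"),
}


def project(record, subject):
    """Project one traffic usage record onto a subject's demand space."""
    return {field: record[field] for field in PROJECTIONS[subject]}


# One made-up traffic usage record.
record = {
    "vehicle_id": "V42",
    "road_id": "R7",
    "accident": False,
    "harsh_brakes": 3,
    "speeding_seconds": 12,
    "avg_speed_kmh": 32.5,
}

insurer_view = project(record, "insurer")
owner_view = project(record, "owner")
```

The same underlying record yields different views for the insurer and the owner, which mirrors the text's point that each subject sees its own projection of shared traffic usage data.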
Associations among the "person-vehicle-road" information subjects
FIG. 1 is a schematic structural diagram of the associations among the "person-vehicle-road" information subjects according to an embodiment of the present invention. The figure forms a closed-loop relationship structure containing four interfaces that influence one another, wherein:
the person-vehicle interface, i.e. driving behavior coordination, involves the person and the vehicle as information subjects: the driver operates the accelerator pedal, brake and steering wheel to steer and control speed, thereby controlling the vehicle;
the person-road interface, i.e. traffic information matching coordination, involves the person and the road as information subjects: while driving, the driver continually makes correct judgments and responses based on the observed characteristics of the vehicle, the road and changing traffic, in order to adapt to changes in the road environment;
the vehicle-road interface, i.e. vehicle driving coordination, involves the vehicle and the road as information subjects: through vehicle-to-vehicle and vehicle-to-road information exchange and sharing, vehicles and road infrastructure coordinate and cooperate;
the person-vehicle-road interface, i.e. traffic behavior coordination, involves the person, the vehicle and the road as information subjects: in the dynamic process in which the driver controls the vehicle toward a predetermined goal in accordance with traffic rules, the vehicle is at the same time affected by road and environmental conditions, and together they complete a traffic behavior event.
In the present invention, the established traffic usage model comprises the traffic usage association rules corresponding to different items of different information subjects.
So that storing and indexing Internet of Vehicles data requires few index updates and keeps the data evenly distributed, the present invention sets a traffic usage model event comprising traffic usage association rules corresponding to different items of different information subjects. Internet of Vehicles data comprises the raw data and the historical data provided by each information subject. The raw data is indexed at a coarse-grained level, by traffic usage model event and by the subspaces under that event, while the historical data within a traffic usage model event is given a fine-grained, record-level index. Furthermore, because of the spatio-temporal nature of Internet of Vehicles raw data, an adaptive data partitioning scheme is used during indexing so that the raw data is indexed fairly evenly.
Because raw data related to a traffic usage model event is indexed through the existing event index, the index does not need to be updated; and because the raw data is contained in, and evenly distributed over, a bounded set of subspaces under the event, the index maintenance cost is kept within an effective range and neither storage performance nor the number of index updates is affected.
FIG. 2 is a flowchart of a method for indexing Internet of Vehicles raw data based on the traffic usage model according to an embodiment of the present invention. The specific steps are as follows:
Step 201: establish a traffic usage model event, which comprises traffic usage association rules corresponding to different items of different information subjects;
Step 202: after obtaining the Internet of Vehicles raw data, divide it into raw data blocks according to the traffic usage model events, and divide the raw data block corresponding to each traffic usage model event into multiple subspace data segments for storage;
In this step, the multiple subspace data segments are obtained by partitioning with a K-dimensional index tree (K-dimension Tree) or a Bucket PR Quadtree; the partitioning yields several mutually non-overlapping rectangular subspace data segments, which are stored in storage areas indexed by the R-tree;
Step 203: index the traffic usage model events with a multi-way search tree (B+tree), wherein each leaf node of the B+tree holds an n-ary tree (R-tree) that indexes the multiple subspace data segments into which the raw data block of the corresponding traffic usage model event is divided;
Step 204: store the historical data of the corresponding traffic usage model event in a set area, and build a record-level index for the set area;
In this step, the record-level index may be a local index, implemented as either an R-tree or a grid index.
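The coarse-grained, two-level lookup of steps 202 and 203 can be sketched as follows. This is a simplified stand-in, not the patented structure: a sorted list searched with `bisect` plays the role of the B+tree over event keys, and a plain list of rectangles plays the role of the per-leaf R-tree. The class name `EventSegmentIndex`, the event keys and the segment ids are hypothetical.

```python
from bisect import bisect_left, insort


class EventSegmentIndex:
    """Event level (B+tree stand-in) over per-event rectangle lists (R-tree stand-in)."""

    def __init__(self):
        self._event_keys = []  # sorted event keys
        self._leaves = {}      # event key -> [(rect, segment_id)]

    def add_segment(self, event_key, rect, segment_id):
        # rect = (xmin, ymin, xmax, ymax), treated as half-open
        if event_key not in self._leaves:
            insort(self._event_keys, event_key)
            self._leaves[event_key] = []
        self._leaves[event_key].append((rect, segment_id))

    def find_segment(self, event_key, x, y):
        """Locate the subspace data segment for a point within one event."""
        i = bisect_left(self._event_keys, event_key)
        if i == len(self._event_keys) or self._event_keys[i] != event_key:
            return None  # unknown event
        for (xmin, ymin, xmax, ymax), segment_id in self._leaves[event_key]:
            if xmin <= x < xmax and ymin <= y < ymax:
                return segment_id
        return None


index = EventSegmentIndex()
index.add_segment("accident", (0, 0, 5, 10), "seg-1")
index.add_segment("accident", (5, 0, 10, 10), "seg-2")
index.add_segment("congestion", (0, 0, 10, 10), "seg-3")
```

New raw records for an existing event and segment change neither level of this structure, which is the property the description relies on when it says the coarse-grained index seldom needs updating.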
In step 202, after the Internet of Vehicles raw data block corresponding to the traffic usage model event is divided into multiple subspace data segments for storage, the method further comprises:
determining whether the partitioning strategy is reasonable according to the subspace data segment size and the tree depth of the traffic usage raw data block of the traffic usage model event in which the subspace data segments are located; if not, adjusting the partitioning strategy and re-dividing, according to the adjusted strategy, the Internet of Vehicles raw data block corresponding to the traffic usage model event into multiple subspace data segments for storage.
Determining whether the partitioning strategy is reasonable comprises:
calculating the subspace data variance according to the subspace data segment size; when the calculated variance is greater than or equal to a set first threshold and the tree depth is greater than or equal to a set second threshold, adjusting the partitioning strategy to shrink the subspace data segments; when the calculated variance is less than the set first threshold and the tree depth is less than the set second threshold, adjusting the partitioning strategy to enlarge the subspace data segments.
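The two-threshold rule above can be written as a small decision function. The sketch assumes the population variance of the segment sizes is the intended "subspace data variance", and it returns `"keep"` for the mixed cases that the text leaves unspecified; the function name, the return labels and the sample thresholds are hypothetical.

```python
from statistics import pvariance


def adjust_strategy(segment_sizes, tree_depth, var_threshold, depth_threshold):
    """Decide how to adjust the partitioning strategy.

    segment_sizes: record counts (or byte sizes) of the subspace data segments.
    Returns "shrink", "enlarge", or "keep" (the last is an assumption for
    cases the source does not cover).
    """
    variance = pvariance(segment_sizes)
    if variance >= var_threshold and tree_depth >= depth_threshold:
        return "shrink"   # skewed sizes and a deep tree: use smaller segments
    if variance < var_threshold and tree_depth < depth_threshold:
        return "enlarge"  # even sizes and a shallow tree: use larger segments
    return "keep"


even = adjust_strategy([10, 10, 10, 10], tree_depth=2, var_threshold=25.0, depth_threshold=4)
skewed = adjust_strategy([2, 30, 4, 40], tree_depth=5, var_threshold=25.0, depth_threshold=4)
```

With these inputs, the even, shallow layout asks for larger segments and the skewed, deep layout asks for smaller ones.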
FIG. 3 is a schematic diagram of the coarse-grained indexing of data related to the traffic usage model according to an embodiment of the present invention. As shown in FIG. 3, the Internet of Vehicles raw data is first partitioned according to traffic usage model events to obtain the data related to each event. The corresponding traffic usage model event is then updated with this event-related data, subspaces are created for the updated event, and the event-related data is divided into multiple data segments placed in those subspaces. Each subspace corresponds to a storage area in the Internet of Vehicles database, which is a distributed database indexed with an R-tree.
The process shown in FIG. 3 is described in detail below.
首先,根据行车用量模型事件对车联网原始数据进行划分First, the original data of the car network is divided according to the traffic usage model event.
由于“人-车-路”三个信息主体提供的车联网原始数据相互关联又彼此影响的闭环关系,根据不同信息组团对行车用量的需求投影,可以形成行车用量模型事件,包括不同信息主体的不同项目所对应的行车用量关联规则。Due to the closed-loop relationship between the original data of the car network provided by the three information bodies of "people-car-road" and the mutual influence of each other, according to the demand projection of the traffic usage of different information groups, the traffic usage model event can be formed, including different information subjects. The traffic usage association rules for different projects.
车联网原始数据是随着行车用量模型事件分布,所以按照某一行车用量模型事件发生和结束锚,可以将车联网原始数据分为若干个与行车用量模型事件相关的数据块(Event Data Block),锚采用A表示,车联网原始数据DBS={[As1,Ae1),[As2,Ae2),…,[Asi,Aei),…},其中[Asi,Aei)是一个左闭右开的数据区间,表示针对行车用量模型事件的车联网原始数据块,这些区间是不重叠的。The vehicle network raw data is distributed with the traffic usage model event, so according to a traffic usage model event occurrence and end anchor, the car network raw data can be divided into several data blocks related to the traffic usage model event (Event Data Block). The anchor is represented by A, the car network raw data DBS={[A s1 , A e1 ), [A s2 , A e2 ),..., [A si , A ei ),...}, where [A si , A ei ) It is a data interval of left closed right opening, indicating the car network raw data block for the traffic usage model event, these intervals are not overlapping.
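The division into half-open event blocks can be illustrated with a minimal Python sketch. It assumes that anchors are comparable values such as timestamps and that the event intervals are already known and sorted; the function name `assign_to_blocks` and the record layout are hypothetical and do not come from the patent.

```python
from bisect import bisect_right

def assign_to_blocks(records, intervals):
    """Group records into event data blocks.

    records:   iterable of (anchor, payload) pairs; the anchor is assumed to be
               a comparable position such as a timestamp.
    intervals: sorted, non-overlapping list of (start, end) pairs, each read as
               the half-open interval [start, end).
    Returns (blocks, unrelated): blocks[i] holds the payloads that fall into
    intervals[i]; unrelated holds payloads outside every interval.
    """
    starts = [start for start, _ in intervals]
    blocks = [[] for _ in intervals]
    unrelated = []
    for anchor, payload in records:
        i = bisect_right(starts, anchor) - 1  # last interval starting at or before anchor
        if i >= 0 and anchor < intervals[i][1]:
            blocks[i].append(payload)
        else:
            unrelated.append(payload)
    return blocks, unrelated
```

Because the intervals are left-closed and right-open, a record whose anchor equals an end anchor belongs to the next block (if one starts there) rather than to the block that just ended.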
In a concrete implementation, the Internet of Vehicles raw data is first divided into blocks along the event dimension according to the traffic usage model events. Each block is then partitioned in two-dimensional space into several data segments, which are stored in separate subspaces.
To keep the subspace partitioning of the stored data segments reasonable, the size of each subspace is monitored and the depth and offset of the subspaces are computed. The results determine whether the partitioning is reasonable. If it is not, for example because a set data segment partitioning threshold is exceeded, the subspace partitioning strategy is adjusted.
The Internet of Vehicles raw data is stored on the storage node of the corresponding traffic usage model event.
Once the raw data has been partitioned, the Internet of Vehicles raw data block [A_s1, A_e1) between the start anchor and the end anchor of a traffic usage model event is stored on the storage node of that event. If the event is stored in a cloud storage system, the storage node of the event is determined through the interface of the cloud storage system, and the raw data block [A_s1, A_e1) is written to that node.
The index of the corresponding traffic usage model event is updated.
To speed up point queries and range queries on traffic usage model events, the events are indexed with a B+Tree. Each B+Tree leaf node corresponds to an R-Tree that indexes the subspaces into which the event's Internet of Vehicles raw data block [A_s1, A_e1) is partitioned. When the raw data block [A_s1, A_e1) is stored on the storage node of the corresponding traffic usage model event, the corresponding B+Tree index is updated.
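The two-level lookup (event first, then subspace) can be sketched as follows. This is only an illustration of the access path, not an implementation of the patent's index: a sorted list searched with `bisect` stands in for the B+Tree, and a linear scan over rectangles stands in for the R-Tree. The class name `EventIndex` and its methods are hypothetical.

```python
from bisect import bisect_right, insort

class EventIndex:
    """Event-level index whose entries each hold a per-event subspace index."""

    def __init__(self):
        self._starts = []   # sorted start anchors (stand-in for the B+Tree keys)
        self._entries = {}  # start anchor -> (end anchor, [(rect, segment_id), ...])

    def add_event(self, start, end):
        """Register the event block [start, end)."""
        insort(self._starts, start)
        self._entries[start] = (end, [])

    def add_subspace(self, start, rect, segment_id):
        """Attach a subspace rectangle (x0, y0, x1, y1) to the event starting at 'start'."""
        self._entries[start][1].append((rect, segment_id))

    def query(self, anchor, x, y):
        """Return the ids of the subspace segments of the event covering 'anchor'
        whose rectangle contains the point (x, y)."""
        i = bisect_right(self._starts, anchor) - 1
        if i < 0:
            return []
        end, subspaces = self._entries[self._starts[i]]
        if anchor >= end:
            return []
        # Linear scan used here in place of an R-Tree search.
        return [sid for (x0, y0, x1, y1), sid in subspaces
                if x0 <= x < x1 and y0 <= y < y1]
```

In this sketch only segment identifiers are indexed, which mirrors the idea that current raw data is indexed at the segment and subspace level rather than per record.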
A subspace index is created for the Internet of Vehicles raw data block [A_s1, A_e1).
In most Internet of Vehicles application environments, the spatial extent over which the storage nodes are distributed is largely fixed. The subspaces within a raw data block [A_s1, A_e1) can therefore be partitioned with a K-dimension Tree or a Bucket PR Quadtree. The partitioning yields a number of mutually non-overlapping rectangular subspace regions, which are indexed with an R-tree.
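A bucket-style point-region split of this kind can be sketched in a few lines of Python. The sketch assumes two-dimensional points, a half-open bounding rectangle, and a split at the midpoint of alternating axes whenever a bucket exceeds its capacity; the function name and the `max_depth` guard are assumptions added for illustration.

```python
def bucket_pr_kd_split(points, rect, capacity, depth=0, max_depth=32):
    """Recursively split 'rect' until every leaf holds at most 'capacity' points.

    points: list of (x, y) tuples lying inside rect.
    rect:   (x0, y0, x1, y1), treated as half-open on the upper sides.
    Returns a list of (rect, points, depth) leaves whose rectangles do not overlap.
    """
    if len(points) <= capacity or depth >= max_depth:
        return [(rect, points, depth)]
    x0, y0, x1, y1 = rect
    if depth % 2 == 0:  # split on x at even depths
        mid = (x0 + x1) / 2
        low = [p for p in points if p[0] < mid]
        high = [p for p in points if p[0] >= mid]
        rects = ((x0, y0, mid, y1), (mid, y0, x1, y1))
    else:               # split on y at odd depths
        mid = (y0 + y1) / 2
        low = [p for p in points if p[1] < mid]
        high = [p for p in points if p[1] >= mid]
        rects = ((x0, y0, x1, mid), (x0, mid, x1, y1))
    return (bucket_pr_kd_split(low, rects[0], capacity, depth + 1, max_depth)
            + bucket_pr_kd_split(high, rects[1], capacity, depth + 1, max_depth))
```

The leaf depths and leaf sizes returned here are the two quantities the description later uses to judge whether the partitioning is reasonable.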
A record-level index [A_s2, A_e2) is built for the historical data of the Internet of Vehicles raw data block [A_s1, A_e1).
After the raw data block [A_s1, A_e1) has been used to update the corresponding traffic usage model event, the raw data block [A_s1, A_e1) of the original event becomes historical data. To further speed up queries on historical data, a record-level local index can be built for each region. The local index, implemented as either an R-tree or a grid index, indexes the historical data of each traffic usage model event.
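A record-level grid index over immutable historical data could look like the following sketch. It assumes records carry two-dimensional coordinates and that a uniform cell size is chosen in advance; `GridIndex`, `bulk_load`, and `range_query` are hypothetical names.

```python
from collections import defaultdict

class GridIndex:
    """Uniform grid over (x, y) records, built once for historical data."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self._cells = defaultdict(list)  # (cx, cy) -> [(x, y, payload), ...]

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def bulk_load(self, records):
        """Insert (x, y, payload) records in one batch."""
        for x, y, payload in records:
            self._cells[self._cell(x, y)].append((x, y, payload))

    def range_query(self, x0, y0, x1, y1):
        """Return payloads of records inside the closed rectangle [x0, x1] x [y0, y1]."""
        cx0, cy0 = self._cell(x0, y0)
        cx1, cy1 = self._cell(x1, y1)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for x, y, payload in self._cells.get((cx, cy), ()):
                    if x0 <= x <= x1 and y0 <= y <= y1:
                        hits.append(payload)
        return hits
```

Because historical data no longer changes, the index can be loaded in a single batch and never needs incremental maintenance, which is the property the description relies on.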
For ease of description, the process of storing and indexing Internet of Vehicles raw data based on traffic usage model events is described in detail below at the index level and at the storage level. FIG. 4 is a schematic diagram of this process according to an embodiment of the present invention.
As FIG. 4 shows, at the storage level the Internet of Vehicles raw data is first divided, along the traffic usage model event dimension, into event-related data and event-unrelated data. The event-related data is then divided, by the occurrence anchor and end anchor of each traffic usage model event, into data blocks corresponding to the events. Finally, each data block is partitioned in two-dimensional space into several subspaces, and the subspace data segments of each data block are stored in one region of the distributed storage system. Keeping the subspace data segments of an event's data block in the same region as far as possible reduces the number of regions that must be scanned during a query and improves query efficiency.
The index level comprises three layers. The traffic usage model event index and the subspace index cover the current Internet of Vehicles raw data, while the grid index covers the historical data of the traffic usage model events.
Specifically, for indexing, the data is divided along the traffic usage model event dimension into current Internet of Vehicles raw data and historical data. For current raw data, only the data segment and the subspace in which it resides are indexed, not the individual data records, which greatly reduces the number of index updates while current raw data is being stored. The traffic usage model event index is a B+tree; because the subspace data segments of an event are stored in different regions, they are indexed with an R-tree. Once a traffic usage model event has been updated, its historical data no longer changes, so the historical data can be stored in batches and given a record-level index, for example an R-tree index or a grid index. Index maintenance is therefore cheap and has little impact on raw data storage, so the Internet of Vehicles system can sustain large-scale, frequent updates.
The following describes how subspace data segments are partitioned and optimized.
In practice, Internet of Vehicles raw data grows monotonically in the time dimension, and the traffic usage model may also change over time. The raw data stream is therefore divided into feedback cycles, and within each cycle the traffic usage model events and the subspace partitioning strategy are optimized adaptively, step by step:
Step 1: depending on the application scenario, the total number N of traffic usage model event records within a feedback cycle is set. Assuming that each subspace holds at most S records, the data in each event segment is divided evenly into R subspaces, and the traffic usage data of the corresponding traffic usage model event in the first feedback cycle is divided into
[Figure PCTCN2014090016-appb-000001]
blocks, denoted E11, E12, E13, ..., E1k.
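The block-count expression survives only as an image reference. One reading that is consistent with the surrounding definitions, offered as an assumption rather than as the patent's exact formula, is that each block holds R subspaces of at most S records each, so N records need roughly

$$k = \left\lceil \frac{N}{S \cdot R} \right\rceil$$

blocks. With the hypothetical values N = 1,000,000, S = 1,000 and R = 50, this reading gives k = 20.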
Step 2: each of E11, E12, E13, ..., E1k is partitioned into subspaces with a Bucket PR KD-tree. The tree depth Dep_i is recorded and the size of each subspace data segment is monitored. The variance of the data volume of the subspace data segments within the traffic usage data block of the corresponding traffic usage model is computed by formula (1):
[Figure PCTCN2014090016-appb-000002]    Formula (1),
where N_i is the number of subspaces in E_i, x_m is the size of the m-th subspace of E_i, and D_i is the variance of the subspace sizes in E_i. The value of D_i reflects how evenly the data is divided among the subspace data segments.
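Formula (1) is likewise available only as an image. Given the definitions of N_i, x_m and D_i, a standard variance of the subspace sizes would read as follows; the use of the population form (division by N_i rather than N_i - 1) and the symbol for the mean are assumptions:

$$\bar{x}_i = \frac{1}{N_i}\sum_{m=1}^{N_i} x_m, \qquad D_i = \frac{1}{N_i}\sum_{m=1}^{N_i}\left(x_m - \bar{x}_i\right)^2$$

As a small worked example under that assumption, three subspaces of sizes 90, 100 and 110 have mean 100 and D_i = (100 + 0 + 100) / 3 ≈ 66.7, whereas sizes 10, 100 and 190 give D_i = (8100 + 0 + 8100) / 3 = 5400, signalling a much less even split.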
Step 3: the partitioning of the data segments is adjusted according to the data-volume variance D_i of the subspace data segments within the traffic usage data block and the partition depth Dep_i. If D_i is greater than or equal to the set first threshold, the subspace data segments in the block are unevenly distributed; if Dep_i is also greater than or equal to the set second threshold, the subspace data segments must be made smaller. If D_i is less than the set first threshold, the subspace data segments are fairly evenly distributed; if Dep_i is then also less than the set second threshold, the data volume is too small and adjacent pairs of subspace data segments are merged.
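The decision rule of this step can be written as a small Python function. The sketch assumes the population variance for D_i and treats every combination of conditions not named in the text as "leave the partitioning unchanged"; the function name, the return labels, and the threshold parameter names are hypothetical.

```python
from statistics import pvariance

def adjust_partition(segment_sizes, tree_depth, var_threshold, depth_threshold):
    """Decide how to adjust the subspace partitioning of one event data block.

    segment_sizes:   sizes x_m of the subspace data segments in the block.
    tree_depth:      depth Dep_i of the partitioning tree for the block.
    var_threshold:   the "first threshold" applied to the variance D_i.
    depth_threshold: the "second threshold" applied to Dep_i.
    Returns "shrink", "merge", or "keep".
    """
    d_i = pvariance(segment_sizes)  # assumption: population variance
    if d_i >= var_threshold and tree_depth >= depth_threshold:
        return "shrink"  # uneven and deep: use smaller subspace data segments
    if d_i < var_threshold and tree_depth < depth_threshold:
        return "merge"   # even and shallow: merge adjacent subspace data segments
    return "keep"        # cases the text does not name: leave unchanged
```

For example, `adjust_partition([100, 101, 99, 100], 2, 1000.0, 6)` falls into the even-and-shallow case and asks for a merge.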
Step 4: the partitioning strategy is monitored. If the data segment partitioning and the region partitioning strategy remain unchanged over a run of consecutive feedback cycles, the strategy can be fixed and dynamic partitioning stopped. The partitioning scheme is then known in advance, and Internet of Vehicles raw data no longer needs to be partitioned dynamically when it is stored, which further improves storage performance.
Furthermore, the data distribution must still be monitored while the system is running. As soon as the distribution is found to be unbalanced, the dynamic partitioning strategy is resumed.
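The freeze-and-resume behaviour can be sketched as a small state machine. The number of unchanged cycles required before freezing is not given in the text, so `stable_cycles` is an assumed parameter, as are the class and method names and the "keep" label for a cycle in which the strategy did not change.

```python
class StrategyMonitor:
    """Freeze the partitioning strategy after a run of unchanged feedback cycles
    and fall back to dynamic partitioning when the data becomes unbalanced."""

    def __init__(self, stable_cycles):
        self.stable_cycles = stable_cycles  # assumed: cycles required before freezing
        self._unchanged = 0
        self.frozen = False

    def end_of_cycle(self, decision, imbalanced=False):
        """Record the outcome of one feedback cycle and return the frozen state.

        decision:   "keep" if the strategy was left unchanged in this cycle,
                    anything else if it was adjusted.
        imbalanced: True if monitoring found an unbalanced data distribution.
        """
        if self.frozen:
            if imbalanced:  # resume dynamic partitioning
                self.frozen = False
                self._unchanged = 0
            return self.frozen
        self._unchanged = self._unchanged + 1 if decision == "keep" else 0
        if self._unchanged >= self.stable_cycles:
            self.frozen = True
        return self.frozen
```

While `frozen` is true, incoming raw data would be stored with the fixed scheme and no per-cycle repartitioning would be attempted.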
FIG. 5 is a schematic structural diagram of a storage and indexing system based on traffic usage model events according to an embodiment of the present invention. As shown in the figure, the system comprises a model building module, a storage instruction module, and an index module, wherein:
the model building module is configured to establish traffic usage model events, which include the traffic usage association rules corresponding to different items of different information subjects;
the storage instruction module is configured to, after obtaining Internet of Vehicles raw data, divide it into Internet of Vehicles raw data blocks according to the traffic usage model events, partition the raw data block of each traffic usage model event into multiple subspace data segments for storage, and store the historical data of each traffic usage model event in a set region;
the index module is configured to index the traffic usage model events with a B+tree whose leaf nodes hold R-trees that index the multiple subspace data segments into which the raw data block of the corresponding traffic usage model event is partitioned, and to build a record-level index for the set region.
In the present invention, the storage instruction module is further configured to use a K-dimension Tree or a Bucket PR Quadtree when partitioning the raw data block of a traffic usage model event into multiple subspace data segments for storage. The partitioning yields a number of mutually non-overlapping rectangular subspace data segments, which are stored in the storage region indexed with the R-tree.
In an embodiment of the present invention, the system further comprises a partition update module configured to determine, from the subspace data segment sizes and the tree depth of the raw data block of the traffic usage model event to which the segments belong, whether the partitioning strategy is reasonable and, if it is not, to adjust the partitioning strategy;
the storage instruction module is further configured to re-partition the Internet of Vehicles raw data block of the corresponding traffic usage model event into multiple subspace data segments for storage according to the adjusted partitioning strategy.
In an embodiment of the present invention, the partition update module is further configured to compute the variance of the subspace data from the subspace data segment sizes. When the computed variance is greater than or equal to the set first threshold and the tree depth is greater than or equal to the set second threshold, the partitioning strategy is adjusted to shrink the subspace data segments. When the computed variance is less than the set first threshold and the tree depth is less than the set second threshold, the partitioning strategy is adjusted to enlarge the subspace data segments.
The method and system provided by the present invention take full account of the need to meter traffic usage effectively and achieve optimized storage and use of Internet of Vehicles resource data. They also take into account that Internet of Vehicles raw data is generated continuously while historical data generally does not change once generated, that the distribution of raw data across traffic usage model events is often skewed, and that the events themselves change over time. By accounting for imbalance in the data distribution when partitioning subspace data segments, the invention meets the needs of traffic usage resource metering and offers practical guidance for Internet of Vehicles solutions.
Applicable scenarios and examples of the method and system provided by the present invention include, but are not limited to, the following Internet of Vehicles applications: intelligent transportation systems, mass data storage and indexing, and resource usage metering. The method and system can meet the needs of existing Internet of Vehicles data storage applications.
The objects, technical solutions, and advantages of the present invention have been described in further detail above with reference to preferred embodiments. It should be understood that the foregoing describes only preferred embodiments and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (9)

  1. A storage and indexing method based on traffic usage model events, characterized in that the method comprises:
    establishing traffic usage model events, the traffic usage model events including traffic usage association rules corresponding to different items of different information subjects;
    obtaining Internet of Vehicles raw data, dividing it into Internet of Vehicles raw data blocks according to the traffic usage model events, and partitioning the raw data block of the corresponding traffic usage model event into multiple subspace data segments for storage;
    indexing the traffic usage model events with a multiway search tree (B+tree), wherein the leaf nodes of the B+tree hold n-ary trees (R-trees) that index the multiple subspace data segments into which the raw data block of the corresponding traffic usage model event is partitioned;
    storing the historical data of the corresponding traffic usage model event in a set region, and building a record-level index for the set region.
  2. The method of claim 1, characterized in that the multiple subspace data segments are partitioned with a K-dimensional index tree (K-dimension Tree) or an average quadtree (Bucket PR Quadtree), the partitioning yielding a number of mutually non-overlapping rectangular subspace data segments that are stored in a storage region indexed with the R-tree.
  3. The method of claim 1, characterized in that the record-level index is a local index implemented as an R-tree or as a grid index.
  4. The method of claim 1, characterized in that, after the Internet of Vehicles raw data block of the corresponding traffic usage model event has been partitioned into multiple subspace data segments for storage, the method further comprises:
    determining, from the subspace data segment sizes and the tree depth of the raw data block of the traffic usage model event to which the segments belong, whether the partitioning strategy is reasonable and, if it is not, adjusting the partitioning strategy and re-partitioning the Internet of Vehicles raw data block of the corresponding traffic usage model event into multiple subspace data segments for storage according to the adjusted strategy.
  5. The method of claim 4, characterized in that determining whether the partitioning strategy is reasonable comprises:
    computing the variance of the subspace data from the subspace data segment sizes; when the computed variance is greater than or equal to a set first threshold and the tree depth is greater than or equal to a set second threshold, adjusting the partitioning strategy to shrink the subspace data segments; and when the computed variance is less than the set first threshold and the tree depth is less than the set second threshold, adjusting the partitioning strategy to enlarge the subspace data segments.
  6. A storage and indexing system based on traffic usage model events, characterized in that the system comprises a model building module, a storage instruction module, and an index module, wherein:
    the model building module is configured to establish traffic usage model events, the traffic usage model events including traffic usage association rules corresponding to different items of different information subjects;
    the storage instruction module is configured to, after obtaining Internet of Vehicles raw data, divide it into Internet of Vehicles raw data blocks according to the traffic usage model events, partition the raw data block of the corresponding traffic usage model event into multiple subspace data segments for storage, and store the historical data of the corresponding traffic usage model event in a set region;
    the index module is configured to index the traffic usage model events with a B+tree whose leaf nodes hold R-trees that index the multiple subspace data segments into which the raw data block of the corresponding traffic usage model event is partitioned, and to build a record-level index for the set region.
  7. The system of claim 6, characterized in that the storage instruction module is further configured to use a K-dimension Tree or a Bucket PR Quadtree when partitioning the Internet of Vehicles raw data block of the corresponding traffic usage model event into multiple subspace data segments for storage, the partitioning yielding a number of mutually non-overlapping rectangular subspace data segments that are stored in a storage region indexed with the R-tree.
  8. The system of claim 6, characterized in that the system further comprises a partition update module configured to determine, from the subspace data segment sizes and the tree depth of the raw data block of the traffic usage model event to which the segments belong, whether the partitioning strategy is reasonable and, if it is not, to adjust the partitioning strategy;
    the storage instruction module is further configured to re-partition the Internet of Vehicles raw data block of the corresponding traffic usage model event into multiple subspace data segments for storage according to the adjusted partitioning strategy.
  9. The system of claim 8, characterized in that the partition update module is further configured to compute the variance of the subspace data from the subspace data segment sizes; when the computed variance is greater than or equal to a set first threshold and the tree depth is greater than or equal to a set second threshold, to adjust the partitioning strategy to shrink the subspace data segments; and when the computed variance is less than the set first threshold and the tree depth is less than the set second threshold, to adjust the partitioning strategy to enlarge the subspace data segments.
PCT/CN2014/090016 2013-10-31 2014-10-31 Driving amount model event-based storage and index methods and system WO2015062540A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310532545.2 2013-10-31
CN201310532545.2A CN104598475B (en) 2013-10-31 2013-10-31 Storage and indexing means and system based on driving dosage model event

Publications (2)

Publication Number Publication Date
WO2015062540A1 WO2015062540A1 (en) 2015-05-07
WO2015062540A9 true WO2015062540A9 (en) 2015-06-18

Family

ID=53003386

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/090016 WO2015062540A1 (en) 2013-10-31 2014-10-31 Driving amount model event-based storage and index methods and system

Country Status (2)

Country Link
CN (1) CN104598475B (en)
WO (1) WO2015062540A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105302879B (en) * 2015-10-12 2019-03-08 百度在线网络技术(北京)有限公司 For determining the method and apparatus of user demand
US11237162B2 (en) * 2016-05-24 2022-02-01 Jsr Corporation Composite particles, coated particles, method for producing composite particles, ligand-containing solid phase carrier and method for detecting or separating target substance in sample
CN106447724A (en) * 2016-09-12 2017-02-22 厦门大学 Method for determining region limit based on scan conversion algorithm and mesh compression
JP6914035B2 (en) 2016-12-28 2021-08-04 スリーエム イノベイティブ プロパティズ カンパニー A method for manufacturing a sheet-shaped laminate, a mold for molding the sheet-shaped laminate, and a sheet-shaped laminate.
CN109242227A (en) * 2017-07-10 2019-01-18 卢照敢 The driving risk and assessment models of car steering behavior
CN110209884B (en) * 2018-01-10 2022-08-05 杭州海康威视数字技术股份有限公司 Index checking method and device
CN111443401A (en) * 2020-04-01 2020-07-24 南通大学 Weather prediction and information transfer system based on expressway network
CN111800742B (en) * 2020-05-20 2022-10-28 北京掌行通信息技术有限公司 Management method and device of mobile position data, storage medium and terminal
CN116978232B (en) * 2023-09-21 2024-01-12 深圳市领航者汽车智能技术开发有限公司 Vehicle data management system and method based on Internet of vehicles

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110137684A1 (en) * 2009-12-08 2011-06-09 Peak David F System and method for generating telematics-based customer classifications
CN102055800A (en) * 2010-12-13 2011-05-11 南京大学 Traffic internet of things (IOT) layering system architecture based on information gathering
CN102314519B (en) * 2011-10-11 2012-12-19 中国软件与技术服务股份有限公司 Information searching method based on public security domain knowledge ontology model
CN103324642B (en) * 2012-03-23 2016-12-14 日电(中国)有限公司 System and method and the data query method of index is set up for data
CN103049464A (en) * 2012-03-30 2013-04-17 北京峰盛博远科技有限公司 Heterogeneous geospatial data management technique based on spatial object generalized model and grid body indexing
CN103577555A (en) * 2013-10-21 2014-02-12 汕头大学 Big data analysis method based on internet of vehicles

Also Published As

Publication number Publication date
WO2015062540A1 (en) 2015-05-07
CN104598475B (en) 2018-02-23
CN104598475A (en) 2015-05-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14857681

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14857681

Country of ref document: EP

Kind code of ref document: A1