CN112650453B - Method and system for storing and inquiring traffic data - Google Patents

Method and system for storing and inquiring traffic data

Info

Publication number
CN112650453B
Authority
CN
China
Prior art keywords
data
storage area
target
cold
thermal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011631496.4A
Other languages
Chinese (zh)
Other versions
CN112650453A (en)
Inventor
李鹏
郭冰
罗天成
夏曙东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing China Transinfo Stock Co ltd
Original Assignee
Beijing China Transinfo Stock Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing China Transinfo Stock Co ltd
Priority to CN202011631496.4A
Publication of CN112650453A
Application granted
Publication of CN112650453B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0685 Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and a system for storing and querying traffic data. The storage method comprises the following steps: merging received real-time traffic data according to a first period to obtain first-level hot data, and storing the first-level hot data in a temporary partition of a hot data storage area; merging a plurality of temporary partitions according to a second period to obtain second-level hot data, and transferring the second-level hot data to a first partition of the hot data storage area; merging a plurality of first partitions according to a third period to obtain third-level hot data, and storing the third-level hot data in a second partition of the hot data storage area; merging a plurality of second partitions according to a fourth period to obtain first-level cold data, and transferring the first-level cold data to a cold data storage area; and performing format conversion and compression on the first-level cold data to obtain second-level cold data, and storing the second-level cold data in the cold data storage area. Merging the hot data and moving it from the hot data storage area to the cold data storage area relieves the resource pressure on the hot data storage area; format conversion and compression of the first-level cold data relieve the resource pressure on the cold data storage area.

Description

Method and system for storing and inquiring traffic data
Technical Field
The application relates to the technical field of big data storage and query, and in particular to a method and a system for storing and querying traffic data.
Background
Gate and portal passing records are an important component of traffic big data; they record the time and position at which vehicles pass gates and portals on highways and expressways. The collection devices across an entire province or city are typically very numerous, and the number of vehicles passing gates and portals each day is also very large. A traffic big data platform therefore stores a large volume of gate and portal passing data, which places relatively high demands on storage resources and results in higher costs. Meanwhile, the data volume keeps growing as data accumulates over time, which degrades the query and computation performance of the data. However, the business analysis value of the information stored on the traffic big data platform gradually decreases over time.
In view of the foregoing, there is a need for a method and system for storing and querying traffic data that reduces storage costs and increases query efficiency.
Disclosure of Invention
In order to solve the above problems, the application provides a method and a system for storing and querying traffic data.
In a first aspect, the present application proposes a method for storing traffic data, comprising:
obtaining first-level hot data from the real-time traffic data received in a first period, and storing the first-level hot data in a temporary partition of a hot data storage area;
according to a second period, merging the first-level hot data in a plurality of temporary partitions to obtain second-level hot data, and transferring the second-level hot data to a first partition of the hot data storage area, wherein the second period is a first multiple of the first period, and the number of temporary partitions whose first-level hot data are merged is equal to the first multiple;
according to a third period, merging the second-level hot data in a plurality of first partitions to obtain third-level hot data, and transferring the third-level hot data to a second partition of the hot data storage area, wherein the third period is a second multiple of the second period, and the number of first partitions whose second-level hot data are merged is equal to the second multiple;
according to a fourth period, merging the third-level hot data in a plurality of second partitions and transferring the merged data to a cold data storage area to obtain first-level cold data, wherein the fourth period is a third multiple of the third period, and the number of pieces of third-level hot data that are merged is equal to the third multiple;
and performing format conversion and compression on the first-level cold data to obtain second-level cold data, and storing the second-level cold data in the cold data storage area.
Preferably, the first-level hot data, the second-level hot data and the third-level hot data are stored in the hot data storage area in a storage format supporting the HTAP processing mode.
Preferably, the first-level cold data and the second-level cold data are stored in the cold data storage area in a storage format supporting the OLAP query mode.
Preferably, the hot data storage area is a solid state disk.
Preferably, the cold data storage area is a mechanical hard disk.
In a second aspect, the present application proposes a method for querying traffic data, the traffic data being stored according to the above method for storing traffic data, comprising:
receiving a query request, querying the storage area of target traffic data according to the query request and metadata, and taking that storage area as a target storage area, wherein the target storage area comprises a hot data storage area and/or a cold data storage area;
and querying the target traffic data from the target storage area by using a massively parallel processing (MPP) engine.
Preferably, querying the target traffic data from the target storage area by using the MPP engine includes:
if the target storage area comprises only the hot data storage area, acquiring the target traffic data from the hot data storage area by using the MPP engine;
if the target storage area comprises only the cold data storage area, acquiring the target traffic data from the cold data storage area by using the MPP engine;
and if the target storage area comprises both the hot data storage area and the cold data storage area, using the MPP engine to acquire target hot data from the hot data storage area and target cold data from the cold data storage area respectively, and merging the target hot data and the target cold data as the target traffic data.
In a third aspect, the present application provides a system for storing traffic data, comprising:
the data module is used for obtaining first-level hot data from the received real-time traffic data according to a first period; merging a plurality of pieces of first-level hot data according to a second period to obtain second-level hot data, wherein the second period is a first multiple of the first period, and the number of temporary partitions whose first-level hot data are merged is equal to the first multiple; merging a plurality of pieces of second-level hot data according to a third period to obtain third-level hot data, wherein the third period is a second multiple of the second period, and the number of first partitions whose second-level hot data are merged is equal to the second multiple; and merging a plurality of pieces of third-level hot data according to a fourth period to obtain first-level cold data, wherein the fourth period is a third multiple of the third period, and the number of pieces of third-level hot data that are merged is equal to the third multiple;
the data compression module is used for performing format conversion and compression on the first-level cold data to obtain second-level cold data;
the hot data storage module comprises a temporary partition, a first partition and a second partition, and is used for storing the first-level hot data, the second-level hot data and the third-level hot data;
and the cold data storage module is used for storing the first-level cold data and the second-level cold data.
In a fourth aspect, the present application proposes a system for querying traffic data, comprising:
the target storage area query module is used for receiving a query request, querying the storage area of target traffic data according to the query request and metadata, and taking that storage area as a target storage area, wherein the target storage area comprises a hot data storage area and/or a cold data storage area;
and the target traffic data query module is used for querying the target traffic data from the target storage area by using a massively parallel processing (MPP) engine.
Preferably, the target traffic data query module includes:
the data acquisition unit is used for, according to the target storage area, acquiring the target traffic data from the hot data storage area by using the MPP engine, or acquiring the target traffic data from the cold data storage area by using the MPP engine, or acquiring target hot data from the hot data storage area and target cold data from the cold data storage area respectively by using the MPP engine;
and the data merging unit is used for merging the target hot data and the target cold data as the target traffic data when the target storage area comprises both the hot data storage area and the cold data storage area.
The application has the following advantages: the partitions in the hot data storage area are merged in sequence according to the first period, the second period, the third period and the fourth period to obtain first-level cold data, and the first-level cold data is moved to the cold data storage area, which reduces the amount of hot data kept in the hot data storage area and thereby relieves the resource pressure on the hot data storage area; format conversion and compression of the first-level cold data reduce the amount of storage occupied in the cold data storage area and thereby relieve the resource pressure on the cold data storage area; and the target storage area of the target traffic data is determined according to the query conditions and the metadata, and the target traffic data is queried from the target storage area, comprising the hot data storage area and/or the cold data storage area, by using a massively parallel processing (MPP) engine, which improves query efficiency.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic diagram of steps of a method for storing traffic data provided by the present application;
FIG. 2 is a schematic diagram of a hot data storage area of a method for storing traffic data provided by the present application;
FIG. 3 is a schematic diagram of a cold data storage area of a method for storing traffic data provided by the present application;
FIG. 4 is a schematic diagram of steps of a method for querying traffic data provided by the present application;
FIG. 5 is a flow chart of a method for querying traffic data provided by the present application;
FIG. 6 is a schematic diagram of a system for storing traffic data provided by the present application;
FIG. 7 is a schematic diagram of a system for querying traffic data provided by the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In a first aspect, according to an embodiment of the present application, a method for storing traffic data is provided for an on-board module, as shown in FIG. 1, including:
S101, obtaining first-level hot data from the real-time traffic data received in a first period, and storing the first-level hot data in a temporary partition of a hot data storage area;
S102, merging the first-level hot data in a plurality of temporary partitions according to a second period to obtain second-level hot data, and transferring the second-level hot data to a first partition of the hot data storage area, wherein the second period is a first multiple of the first period, and the number of temporary partitions whose first-level hot data are merged is equal to the first multiple;
S103, merging the second-level hot data in a plurality of first partitions according to a third period to obtain third-level hot data, and transferring the third-level hot data to a second partition of the hot data storage area, wherein the third period is a second multiple of the second period, and the number of first partitions whose second-level hot data are merged is equal to the second multiple;
S104, merging the third-level hot data in a plurality of second partitions according to a fourth period and transferring the merged data to a cold data storage area to obtain first-level cold data, wherein the fourth period is a third multiple of the third period, and the number of pieces of third-level hot data that are merged is equal to the third multiple;
S105, performing format conversion and compression on the first-level cold data to obtain second-level cold data, and storing the second-level cold data in the cold data storage area.
The first-level hot data, the second-level hot data and the third-level hot data are all stored in the hot data storage area in a storage format supporting the HTAP processing mode. The first-level cold data and the second-level cold data are stored in the cold data storage area in a storage format supporting the OLAP query mode. The hot data storage area is a solid state disk. The cold data storage area is a mechanical hard disk.
In traffic data scenarios, gate and portal passing data are large in volume and have a high real-time write throughput. Traditional big data platforms are better suited to big data applications with a T+1 cycle and support real-time writes poorly. Big data scenarios with real-time (T+0) writes generally place higher demands on hardware, requiring larger memory configurations and SSD storage. The embodiment of the application adopts different storage strategies and storage media for cold and hot data, and the cold and hot data storage formats are compatible with a variety of MPP engines, so that associated queries over cold and hot data can be performed.
The following is a further explanation of the embodiments of the present application.
First, the hot and cold time ranges of the data are divided according to the service life cycle characteristics of the data, separating hot data from cold data. Here, the first period is 30 minutes, the second period is 1 hour, the third period is 24 hours, and the fourth period is 6 months.
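The relationship between these four periods and the merge multiples described in steps S102 to S104 can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the disclosed method; the names `PERIODS` and `merge_multiples` are hypothetical, and treating "6 months" as 180 days is an assumption made purely so that the arithmetic is exact.

```python
from datetime import timedelta

# Periods from the embodiment. Treating "6 months" as 180 days is an
# assumption made for illustration; the text does not define it.
PERIODS = [
    ("first", timedelta(minutes=30)),
    ("second", timedelta(hours=1)),
    ("third", timedelta(hours=24)),
    ("fourth", timedelta(days=180)),
]


def merge_multiples(periods):
    """Return how many lower-level partitions each merge step combines."""
    multiples = []
    for (_, shorter), (name, longer) in zip(periods, periods[1:]):
        quotient, remainder = divmod(longer, shorter)
        if remainder:
            raise ValueError(
                f"{name} period is not a whole multiple of the previous one"
            )
        multiples.append(quotient)
    return multiples


MULTIPLES = merge_multiples(PERIODS)
```

Under these assumptions the first, second and third multiples are 2, 24 and 180: two 30-minute partitions per hourly merge, 24 hourly partitions per daily merge, and 180 daily partitions per transfer to cold storage.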
The data acquisition module writes the vehicle gate and portal passing data into the hot data storage in real time as hot data; an SSD solid state disk is a suitable storage medium.
As shown in FIG. 2, because hot data is written in real time and in large volume, writing hot data on a big data platform usually generates a large number of small files (typically one small file per write batch). A large number of small files reduces disk I/O read efficiency and puts significant pressure on the disks during querying and computation, so the hot data needs to be merged; file merging, however, can itself cause the system to stall. The hot data therefore needs to be partitioned according to the characteristics of the data. For vehicle gate and portal passing data, the hot data is divided into two parts according to how frequently the data is updated: a temporary partition and a formal partition. The numbers of temporary partitions and formal partitions of the hot data are configured. The formal partition may in turn be divided into a first partition and a second partition. The first partition stores second-level hot data merged by hour; the second partition stores third-level hot data merged by day. The temporary partition stores first-level hot data merged over a short period, for example first-level hot data merged every 30 minutes.
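One way to picture the temporary partition is as a set of buckets keyed by the start of each 30-minute window, into which every write batch is appended. The Python sketch below is a simplified in-memory illustration under that assumption; `partition_start` and `write_batch` are hypothetical names, and a real deployment would rely on the partitioning facilities of the underlying storage rather than a dictionary.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FIRST_PERIOD = timedelta(minutes=30)


def partition_start(ts, period=FIRST_PERIOD):
    """Floor a timestamp to the start of the partition that contains it.

    Assumes the period divides a day evenly, as 30 minutes does.
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // period) * period


def write_batch(temporary, records):
    """Append a batch of (timestamp, payload) records to their partitions."""
    for ts, payload in records:
        temporary[partition_start(ts)].append(payload)
```

For example, with `temporary = defaultdict(list)`, records stamped 10:05 and 10:29 fall into the partition starting at 10:00, while a record stamped 10:30 opens the next partition.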
In the temporary partition, data is stored in one partition per 30 minutes; that is, the temporary partition comprises a plurality of 30-minute partitions, each storing different first-level hot data. Once every hour, the two 30-minute partitions of the past hour are merged, and the merged first-level hot data forms second-level hot data. The second-level hot data obtained by merging is migrated to the first partition, and the two 30-minute partitions and their first-level hot data are deleted from the temporary partition.
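The hourly merge just described can be sketched as follows. This is a minimal in-memory illustration, not the disclosed implementation: partitions are modelled as dictionary entries keyed by their start time, and the function name `merge_temporary_partitions` is hypothetical.

```python
from datetime import datetime, timedelta

HALF_HOUR = timedelta(minutes=30)


def merge_temporary_partitions(temporary, first_partition, hour_start):
    """Merge the two 30-minute partitions of one hour into a 1-hour partition.

    `temporary` maps a partition start time to a list of records. The merged
    list is stored in `first_partition` under `hour_start`, and the two source
    partitions are then deleted from `temporary`.
    """
    keys = [hour_start, hour_start + HALF_HOUR]
    merged = []
    for key in keys:
        merged.extend(temporary.get(key, []))
    first_partition[hour_start] = merged
    for key in keys:
        temporary.pop(key, None)
    return merged
```

Deleting the source partitions only after the merged partition has been written mirrors the order given in the text: merge, migrate, then delete.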
In the formal partition, the second-level hot data of the hourly partitions of the last 48 hours and the third-level hot data of the daily partitions are retained. The first partition includes a plurality of 1-hour partitions, each storing different second-level hot data. The second partition includes a plurality of 24-hour partitions, each storing different third-level hot data. In the first partition, the hourly partitions of the past 24 hours are merged once every 24 hours; the merged second-level hot data forms third-level hot data, which is migrated to the second partition and stored by day. Every 48 hours, hourly partitions that have been kept in the first partition for more than 48 hours are deleted together with their second-level hot data. By default, third-level hot data that has accumulated in the second partition for more than 6 months is archived to the cold data storage area daily, and each daily partition that has been archived to the cold data storage area is deleted together with its third-level hot data.
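The housekeeping on the formal partition can be summarised in one routine. The Python sketch below is a simplified illustration under several assumptions that are not in the source: the stores are plain dictionaries keyed by partition start time, "6 months" is approximated as 180 days, and the 48-hour cleanup is run on every daily pass rather than every 48 hours. The function name `daily_maintenance` is hypothetical.

```python
from datetime import datetime, timedelta

HOURLY_RETENTION = timedelta(hours=48)
HOT_RETENTION = timedelta(days=180)  # "6 months" approximated (assumption)


def daily_maintenance(first_partition, second_partition, cold_store, now):
    """Run the once-a-day housekeeping on the formal partitions.

    All three stores are dicts keyed by partition start time. `now` is
    expected to be midnight at the end of the day being merged.
    """
    day_start = now - timedelta(days=1)
    # 1. Merge the 24 hourly partitions of the past day into a daily partition.
    merged = []
    for hour in range(24):
        merged.extend(first_partition.get(day_start + timedelta(hours=hour), []))
    second_partition[day_start] = merged
    # 2. Drop hourly partitions older than the 48-hour retention window.
    for key in [k for k in first_partition if now - k > HOURLY_RETENTION]:
        del first_partition[key]
    # 3. Archive daily partitions older than the hot retention window.
    for key in sorted(k for k in second_partition if now - k > HOT_RETENTION):
        cold_store[key] = second_partition.pop(key)
```

Note that the hourly partitions of the day just merged are deliberately kept; they fall inside the 48-hour window and are removed on a later pass.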
For the third-level hot data in the second partition, second partitions are merged according to the service life cycle of the data, the merged third-level hot data is converted into first-level cold data and archived to the cold data storage area, and the second partitions corresponding to the transferred third-level hot data are deleted from the hot data storage area.
As shown in FIG. 3, cold data is stored as historical data, is written to the cold data storage in batches, and is rarely changed, so hot data and cold data usually use different storage formats. The merged third-level hot data transferred to the cold data storage area as first-level cold data therefore needs to be converted into the storage format of the cold data storage area so that it conforms to the cold data storage mode; after format conversion, the first-level cold data is stored by column. Because cold data is accessed and modified infrequently, a compression algorithm with a higher compression ratio can be used to encode and compress the data in each column, which reduces the disk space occupied and the hardware storage cost of massive cold data. The format-converted first-level cold data is therefore also encoded and compressed according to the data type of each column to obtain the second-level cold data. Because cold data is queried infrequently, does not need frequent modification, and is written sequentially in batches, it can be stored on mechanical disks. Both cold data and hot data use distributed multi-copy storage, so RAID does not need to be set up for data backup. The available format conversion targets include Parquet and ORC, either of which may be selected as the storage format of the cold data storage area. The available encoding compression methods include LZO and Snappy, either of which may be selected as the encoding compression scheme of the cold data storage area. The available partition storage modes include three copies and erasure codes, either of which may be selected as the partition storage mode of the cold data storage area.
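The two-stage conversion from first-level to second-level cold data (row-to-column conversion, then per-column compression) can be illustrated with the Python standard library alone. The sketch below is not Parquet, ORC, LZO or Snappy: JSON stands in for a real columnar encoding and zlib stands in for the compression codec, both purely as assumptions to keep the example self-contained. The function names are hypothetical.

```python
import json
import zlib


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists (columnar layout)."""
    names = list(rows[0])
    return {name: [row[name] for row in rows] for name in names}


def compress_columns(columns, level=9):
    """Encode and compress each column independently.

    zlib is only a standard-library stand-in for the LZO or Snappy codecs
    named in the text; JSON is a stand-in for a real columnar encoding.
    """
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"), level)
        for name, values in columns.items()
    }


def decompress_columns(blobs):
    """Reverse compress_columns, returning the original column lists."""
    return {
        name: json.loads(zlib.decompress(blob).decode("utf-8"))
        for name, blob in blobs.items()
    }
```

Compressing each column separately is what lets values of the same type and similar content sit next to each other, which is the property the text relies on for a higher compression ratio.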
In embodiments of the present application, third-level hot data is migrated to the cold data storage area by day: typically, one daily piece of third-level hot data is archived to the cold data storage area per day. When third-level hot data is archived for the first time, if historical data exists, N daily partitions of third-level hot data can be archived at once.
In a second aspect, the present application proposes a method for querying traffic data, as shown in FIG. 4, where the traffic data is stored according to the method for storing traffic data described above, and the method includes:
S201, receiving a query request, querying the storage area of target traffic data according to the query request and metadata, and taking that storage area as a target storage area, wherein the target storage area comprises a hot data storage area and/or a cold data storage area;
S202, querying the target traffic data from the target storage area by using a massively parallel processing (MPP) engine.
Querying the target traffic data from the target storage area by using the MPP engine comprises the following: if the target storage area comprises only the hot data storage area, acquiring the target traffic data from the hot data storage area by using the MPP engine; if the target storage area comprises only the cold data storage area, acquiring the target traffic data from the cold data storage area by using the MPP engine; if the target storage area comprises both the hot data storage area and the cold data storage area, using the MPP engine to acquire target hot data from the hot data storage area and target cold data from the cold data storage area respectively, and merging the target hot data and the target cold data as the target traffic data.
An embodiment of the present application is further described below with reference to FIG. 5.
The hot data storage area uses distributed storage suited to the HTAP processing mode, such as Kudu, and the cold data storage area uses distributed storage suited to the OLAP query mode, such as HDFS. Hot data (first-level, second-level and third-level hot data) is stored in Kudu or similar distributed storage and is physically stored on SSD solid state disks; cold data (first-level and second-level cold data) is stored in HDFS or similar distributed storage and is physically stored on mechanical disks. The data query engine used to query the target traffic data is a massively parallel processing (MPP) engine such as Impala or Presto. Metadata is stored uniformly in Hive. Hive is a database technology in which databases and tables can be defined to analyze structured data. Portal vehicle passing data, toll gate information, vehicle information, payment information, and time information may each be a field of the metadata stored in Hive. The time condition in the data query request serves as the basis for deciding where to query. Specifically, the MPP engine uses the time information in the metadata stored in Hive to determine the storage area where the target traffic data is located and takes it as the target storage area, which comprises the hot data storage area and/or the cold data storage area. Once the target storage area has been determined, if it comprises only the hot data storage area, the data query engine queries the Kudu storage directly, acquires the target traffic data from the hot data, and returns it.
If the target storage area comprises only the cold data storage area, the data query engine queries the data stored in HDFS directly, acquires the target traffic data from the cold data, and returns it. If the target storage area comprises both the hot data storage area and the cold data storage area, the data query engine queries hot data from Kudu and cold data from HDFS at the same time, performs a JOIN operation on the queried hot data and cold data in memory to obtain a result set, and returns the combined result set. The big data storage and query technologies used in the embodiment of the application may be, but are not limited to, the open source Hadoop ecosystem components commonly used in the industry.
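The routing decision and the three query branches can be sketched in a few lines. The Python example below is illustrative only: it models the hot and cold areas as in-memory lists instead of Kudu and HDFS, it assumes a single timestamp `hot_boundary` (the oldest time still held in the hot area) can be derived from the metadata, and it combines the two partial results by simple concatenation and sorting rather than through an MPP engine. All names are hypothetical.

```python
from datetime import datetime


def target_storage_areas(query_start, query_end, hot_boundary):
    """Decide which storage areas hold data for [query_start, query_end).

    `hot_boundary` is the oldest timestamp still kept in the hot area; in the
    text this decision is made from time information in the metadata.
    """
    areas = []
    if query_start < hot_boundary:
        areas.append("cold")
    if query_end > hot_boundary:
        areas.append("hot")
    return areas


def query_traffic(query_start, query_end, hot_boundary, hot_rows, cold_rows):
    """Fetch matching rows from each target area and combine them."""
    sources = {"hot": hot_rows, "cold": cold_rows}
    result = []
    for area in target_storage_areas(query_start, query_end, hot_boundary):
        result.extend(
            row for row in sources[area] if query_start <= row["ts"] < query_end
        )
    return sorted(result, key=lambda row: row["ts"])
```

A query whose time range lies entirely on one side of the boundary touches only one storage area, which is the case where routing saves the most work; only ranges that straddle the boundary pay for reading both areas and combining the results.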
In a third aspect, the present application proposes a system for storing traffic data, as shown in fig. 6, comprising:
the data module 101 is configured to obtain primary thermal data from the received real-time traffic data according to a first period; merge a plurality of primary thermal data according to a second period to obtain secondary thermal data, wherein the second period is a first multiple of the first period, and the number of temporary partitions corresponding to the merged primary thermal data is equal to the first multiple; merge a plurality of secondary thermal data according to a third period to obtain tertiary thermal data, wherein the third period is a second multiple of the second period, and the number of first partitions corresponding to the merged secondary thermal data is equal to the second multiple; and merge a plurality of tertiary thermal data according to a fourth period to obtain primary cold data, wherein the fourth period is a third multiple of the third period, and the number of merged tertiary thermal data is equal to the third multiple;
the data compression module 102 is configured to perform format conversion and compression on the primary cold data to obtain secondary cold data;
the thermal data storage module 103 includes a temporary partition, a first partition, and a second partition for storing the primary thermal data, the secondary thermal data, and the tertiary thermal data;
the cold data storage module 104 is used for storing the primary cold data and the secondary cold data.
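The cascade of merges performed by the data module can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the application: it assumes exactly one batch arrives per first period, so that a level is merged as soon as it holds as many partitions as the corresponding multiple, and the function name cascade and the multiples 2, 3 and 2 are hypothetical.

```python
from typing import List


def cascade(levels: List[List[list]], multiples: List[int], batch: list) -> None:
    """Append one first-period batch and merge upward wherever a level is full.

    levels[0] holds temporary partitions (primary thermal data),
    levels[1] holds first partitions (secondary thermal data),
    levels[2] holds second partitions (tertiary thermal data),
    levels[3] holds the cold data storage area (primary cold data).
    multiples[i] is the number of partitions of level i merged into one
    partition of level i + 1.
    """
    levels[0].append(list(batch))
    for i, multiple in enumerate(multiples):
        if len(levels[i]) < multiple:
            break  # this level is not full yet, so nothing above it changes
        merged = [row for part in levels[i] for row in part]
        levels[i].clear()
        levels[i + 1].append(merged)


# Hypothetical multiples: 2 temporary partitions -> 1 first partition,
# 3 first partitions -> 1 second partition, 2 second partitions -> cold.
levels: List[List[list]] = [[], [], [], []]
for i in range(12):
    cascade(levels, [2, 3, 2], [i])
```

With these multiples, 2 x 3 x 2 = 12 first-period batches end up as a single block of primary cold data, and the three hot levels are empty again.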
In a fourth aspect, the present application proposes a system for querying traffic data, as shown in fig. 7, comprising:
the target storage area query module 201 is configured to receive a query request and metadata, query a storage area of target traffic data, and use the storage area as a target storage area, where the target storage area includes a hot data storage area and/or a cold data storage area;
the target traffic data query module 202 is configured to query the target traffic data from the target storage area using the large-scale concurrent processing engine.
The target traffic data query module comprises:
The data acquisition unit is used for acquiring target traffic data from the hot data storage area by adopting a large-scale concurrent processing engine according to the target storage area, or acquiring target traffic data from the cold data storage area by adopting the large-scale concurrent processing engine, or respectively acquiring target hot data from the hot data storage area and acquiring target cold data from the cold data storage area by adopting the large-scale concurrent processing engine;
and a data merging unit for merging the target hot data and the target cold data as target traffic data when the target storage area includes the hot data storage area and the cold data storage area.
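The division of work between the data acquisition unit and the data merging unit can be sketched as two small functions. The in-memory lists HOT and COLD stand in for the hot and cold data storage areas, and merging is shown as a simple union ordered by time; these choices, and all names in the block, are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, str]
Fetch = Callable[[str, str], List[Row]]  # (start, end) -> matching rows


def make_fetcher(rows: List[Row]) -> Fetch:
    """Build a stand-in query function over an in-memory list of rows."""
    def fetch(start: str, end: str) -> List[Row]:
        # ISO-formatted dates compare correctly as plain strings.
        return [r for r in rows if start <= r["time"] < end]
    return fetch


def acquire(areas: Set[str], fetchers: Dict[str, Fetch],
            start: str, end: str) -> Dict[str, List[Row]]:
    """Data acquisition unit: query only the areas in the target storage area."""
    return {name: fetchers[name](start, end) for name in sorted(areas)}


def merge(results: Dict[str, List[Row]]) -> List[Row]:
    """Data merging unit: combine hot and cold rows into one result set."""
    rows = [row for name in sorted(results) for row in results[name]]
    return sorted(rows, key=lambda r: r["time"])


# Hypothetical contents of the two storage areas.
HOT = [{"time": "2020-12-02", "plate": "A1"}]
COLD = [{"time": "2020-11-30", "plate": "B2"}]
FETCHERS = {"hot": make_fetcher(HOT), "cold": make_fetcher(COLD)}

result = merge(acquire({"hot", "cold"}, FETCHERS, "2020-11-01", "2020-12-31"))
```

When the target storage area names only one area, acquire queries that area alone and merge returns its rows unchanged apart from ordering.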
According to the method, the partitions in the hot data storage area are merged in sequence according to the first period, the second period, the third period and the fourth period to obtain the primary cold data, and the primary cold data is moved to the cold data storage area; this reduces the amount of hot data held in the hot data storage area and relieves the pressure on the resources it occupies. Format conversion and compression of the primary cold data in the cold data storage area reduce the amount of cold data stored there, which relieves the pressure on the resources occupied by the cold data storage area. The storage area of the target traffic data is determined from the query conditions and the metadata, and the large-scale concurrent processing engine queries the target traffic data from a target storage area comprising the hot data storage area and/or the cold data storage area, which improves query efficiency and widens the range of data that can be queried. Compared with existing data storage schemes, the embodiments of the application divide the hot and cold time ranges of the data reasonably according to the service life-cycle characteristics of the data, rather than simply storing hot data for a period of time according to the timeliness of the data and then deleting the data directly. The hardware cost of the servers used in the embodiments of the application is lower, and the storage cost in particular is greatly reduced; cold data and hot data can be queried at the same time, which enlarges the effective query range of the data; and the file-merging pressure caused by the large number of small files generated when a big data platform writes data frequently in a T+0 (real-time) cycle is resolved.
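The effect of format conversion and compression on the primary cold data can be illustrated with a small sketch. This passage does not name the formats involved, so the block uses stand-ins from the Python standard library: rows are pivoted into a column-oriented layout, serialized as JSON, and compressed with zlib. The function names and the sample records are hypothetical.

```python
import json
import zlib
from typing import Dict, List

Row = Dict[str, object]


def to_secondary_cold(rows: List[Row]) -> bytes:
    """Pivot rows into columns, serialize them, and compress the result."""
    # Assumes every row has the same keys as the first row.
    columns = {key: [row[key] for row in rows] for key in rows[0]} if rows else {}
    return zlib.compress(json.dumps(columns).encode("utf-8"))


def from_secondary_cold(blob: bytes) -> List[Row]:
    """Decompress and pivot the columns back into rows."""
    columns = json.loads(zlib.decompress(blob).decode("utf-8"))
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


# Hypothetical primary cold data: repetitive toll records compress well.
rows = [{"gate": "G1", "plate": "A%d" % i, "fee": 10} for i in range(1000)]
blob = to_secondary_cold(rows)
raw_size = len(json.dumps(rows).encode("utf-8"))
```

Grouping values by column places similar values next to each other, which is one reason column-oriented formats tend to compress well for analytical data.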
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for storing traffic data, comprising:
obtaining primary thermal data from received real-time traffic data according to a first period, and storing the primary thermal data into a temporary partition in a thermal data storage area;
according to a second period, merging the primary thermal data in a plurality of temporary partitions to obtain secondary thermal data, and transferring the secondary thermal data to a first partition in the thermal data storage area, wherein the second period is a first multiple of the first period, and the number of the temporary partitions corresponding to the merged primary thermal data is equal to the first multiple;
according to a third period, merging the secondary thermal data in a plurality of first partitions to obtain tertiary thermal data, and transferring the tertiary thermal data to a second partition in the thermal data storage area, wherein the third period is a second multiple of the second period, and the number of the first partitions corresponding to the merged secondary thermal data is equal to the second multiple;
according to a fourth period, merging the tertiary thermal data in a plurality of second partitions and transferring the merged data into a cold data storage area to obtain primary cold data, wherein the fourth period is a third multiple of the third period, and the number of merged tertiary thermal data is equal to the third multiple;
and carrying out format conversion and compression on the primary cold data to obtain secondary cold data, and storing the secondary cold data in the cold data storage area.
2. The method for storing traffic data according to claim 1, wherein the primary thermal data, the secondary thermal data, and the tertiary thermal data are each stored in the thermal data storage area in a storage format supporting HTAP processing mode.
3. The method for storing traffic data according to claim 1, wherein the primary cold data and the secondary cold data are each stored in the cold data storage area in a storage format supporting OLAP query mode.
4. The method for storing traffic data according to claim 2, wherein the thermal data storage area is a solid state disk.
5. A method for storing traffic data according to claim 3, wherein the cold data storage area is a mechanical hard disk.
6. A method for querying traffic data, wherein the traffic data is stored according to a method for storing traffic data as claimed in any one of claims 1-5, comprising:
Receiving a query request, querying a storage area of target traffic data according to the query request and metadata, and taking the storage area as a target storage area, wherein the target storage area comprises a hot data storage area and/or a cold data storage area;
And querying target traffic data from the target storage area by adopting a large-scale concurrent processing engine.
7. The method for querying traffic data according to claim 6, wherein querying the target traffic data from the target storage area using a large-scale concurrency processing engine comprises:
If the target storage area only comprises the thermal data storage area, acquiring the target traffic data from the thermal data storage area by adopting a large-scale concurrent processing engine;
If the target storage area only comprises the cold data storage area, acquiring the target traffic data from the cold data storage area by adopting a large-scale concurrent processing engine;
And if the target storage area comprises the hot data storage area and the cold data storage area, a large-scale concurrent processing engine is adopted to respectively acquire target hot data from the hot data storage area, acquire target cold data from the cold data storage area, and combine the target hot data and the target cold data as the target traffic data.
8. A system for storing traffic data, comprising:
The data module is used for obtaining primary thermal data from received real-time traffic data according to a first period; merging a plurality of primary thermal data according to a second period to obtain secondary thermal data, wherein the second period is a first multiple of the first period, and the number of temporary partitions corresponding to the merged primary thermal data is equal to the first multiple; merging a plurality of secondary thermal data according to a third period to obtain tertiary thermal data, wherein the third period is a second multiple of the second period, and the number of first partitions corresponding to the merged secondary thermal data is equal to the second multiple; merging a plurality of tertiary thermal data according to a fourth period to obtain primary cold data, wherein the fourth period is a third multiple of the third period, and the number of merged tertiary thermal data is equal to the third multiple;
the data compression module is used for carrying out format conversion and compression on the primary cold data to obtain secondary cold data;
the thermal data storage module comprises a temporary partition, a first partition and a second partition, and is used for storing primary thermal data, secondary thermal data and tertiary thermal data;
and the cold data storage module is used for storing the primary cold data and the secondary cold data.
9. A system for querying traffic data, wherein the traffic data is stored in a system for storing traffic data according to claim 8, comprising:
The target storage area query module is used for receiving a query request, querying a storage area of target traffic data according to the query request and metadata, and taking the storage area as a target storage area, wherein the target storage area comprises a hot data storage area and/or a cold data storage area;
and the target traffic data query module is used for querying the target traffic data from the target storage area by adopting a large-scale concurrent processing engine.
10. The system for querying traffic data according to claim 9, wherein said target traffic data querying module comprises:
The data acquisition unit is used for, according to the target storage area, acquiring the target traffic data from the hot data storage area by adopting a large-scale concurrent processing engine, or acquiring the target traffic data from the cold data storage area by adopting a large-scale concurrent processing engine, or respectively acquiring target hot data from the hot data storage area and target cold data from the cold data storage area by adopting a large-scale concurrent processing engine;
and the data merging unit is used for merging the target hot data and the target cold data as the target traffic data when the target storage area comprises a hot data storage area and a cold data storage area.
CN202011631496.4A 2020-12-31 2020-12-31 Method and system for storing and inquiring traffic data Active CN112650453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011631496.4A CN112650453B (en) 2020-12-31 2020-12-31 Method and system for storing and inquiring traffic data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011631496.4A CN112650453B (en) 2020-12-31 2020-12-31 Method and system for storing and inquiring traffic data

Publications (2)

Publication Number Publication Date
CN112650453A CN112650453A (en) 2021-04-13
CN112650453B true CN112650453B (en) 2024-05-14

Family

ID=75366792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011631496.4A Active CN112650453B (en) 2020-12-31 2020-12-31 Method and system for storing and inquiring traffic data

Country Status (1)

Country Link
CN (1) CN112650453B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114356225A (en) * 2021-12-17 2022-04-15 得一微电子股份有限公司 Data storage method and device of memory, terminal equipment and storage medium
CN115827653B (en) * 2022-11-25 2023-09-05 深圳计算科学研究院 Pure column type updating method and device for HTAP and mass data

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103400598A (en) * 2007-08-14 2013-11-20 三星电子株式会社 Solid state memory, computer system including same, and method of operating same
CN103942289A (en) * 2014-04-12 2014-07-23 广西师范大学 Memory caching method oriented to range querying on Hadoop
CN106934001A (en) * 2017-03-03 2017-07-07 广州天源迪科信息技术有限公司 Distributed quick inventory inquiry system and method
US9747202B1 (en) * 2013-03-14 2017-08-29 Sandisk Technologies Llc Storage module and method for identifying hot and cold data
CN108268217A (en) * 2018-01-10 2018-07-10 北京航天云路有限公司 A kind of bedding storage method based on the cold and hot classification of time series data
CN109033360A (en) * 2018-07-26 2018-12-18 腾讯科技(深圳)有限公司 A kind of data query method, apparatus, server and storage medium
CN109947373A (en) * 2019-03-28 2019-06-28 北京大道云行科技有限公司 Data processing method and device
CN110908608A (en) * 2019-11-22 2020-03-24 苏州浪潮智能科技有限公司 Storage space saving method and system
CN111475506A (en) * 2020-03-30 2020-07-31 广州虎牙科技有限公司 Data storage and query method, device, system, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150324447A1 (en) * 2014-05-08 2015-11-12 Altibase Corp. Hybrid database management system and method of managing tables therein
US20170017405A1 (en) * 2015-07-14 2017-01-19 HGST Netherlands B.V. Systems and methods for improving flash-oriented file system garbage collection
US20180136842A1 (en) * 2016-11-11 2018-05-17 Hewlett Packard Enterprise Development Lp Partition metadata for distributed data objects

Also Published As

Publication number Publication date
CN112650453A (en) 2021-04-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant