CN112650453B - Method and system for storing and inquiring traffic data - Google Patents
- Publication number
- CN112650453B (application CN202011631496.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- storage area
- target
- cold
- thermal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0685—Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a method and a system for storing and querying traffic data. The storage method comprises: merging the received real-time traffic data according to a first period to obtain first-level hot data, and storing the first-level hot data in a temporary partition in a hot data storage area; merging a plurality of temporary partitions according to a second period to obtain second-level hot data, and transferring the second-level hot data to a first partition in the hot data storage area; merging a plurality of first partitions according to a third period to obtain third-level hot data, and storing the third-level hot data in a second partition in the hot data storage area; merging a plurality of second partitions according to a fourth period to obtain first-level cold data, and transferring the first-level cold data to a cold data storage area; and performing format conversion and compression on the first-level cold data to obtain second-level cold data, and storing the second-level cold data in the cold data storage area. Merging the hot data and moving it from the hot data storage area to the cold data storage area reduces the resource pressure on the hot data storage area; performing format conversion and compression on the first-level cold data reduces the resource pressure on the cold data storage area.
Description
Technical Field
The application relates to the technical field of big data storage and query, in particular to a method and a system for storing and querying traffic data.
Background
In the traffic big data field, checkpoint and gantry passage records are an important component of traffic big data; they record the time and location at which vehicles pass checkpoints and gantries on highways and expressways. A province or city typically deploys a very large number of collection devices, and the number of vehicles passing checkpoints and gantries each day is also very large. A traffic big data platform therefore stores a large volume of passage records, which places high demands on storage resources and raises costs. Meanwhile, the data volume keeps growing over time, which degrades query and computation performance, even though the business analysis value of the stored information gradually decreases over time.
In view of the foregoing, there is a need for a method and system for storing and querying traffic data that reduces storage costs and increases query efficiency.
Disclosure of Invention
In order to solve the problems, the application provides a method and a system for storing and inquiring traffic data.
In a first aspect, the present application proposes a method for storing traffic data, comprising:
obtaining first-level hot data from the received real-time traffic data according to a first period, and storing the first-level hot data in a temporary partition in a hot data storage area;
according to a second period, merging the first-level hot data in a plurality of temporary partitions to obtain second-level hot data, and transferring the second-level hot data to a first partition in the hot data storage area, wherein the second period is a first multiple of the first period, and the number of temporary partitions whose first-level hot data is merged is equal to the first multiple;
according to a third period, merging the second-level hot data in a plurality of first partitions to obtain third-level hot data, and transferring the third-level hot data to a second partition in the hot data storage area, wherein the third period is a second multiple of the second period, and the number of first partitions whose second-level hot data is merged is equal to the second multiple;
according to a fourth period, merging the third-level hot data in a plurality of second partitions and transferring it to a cold data storage area to obtain first-level cold data, wherein the fourth period is a third multiple of the third period, and the number of pieces of third-level hot data merged is equal to the third multiple;
and performing format conversion and compression on the first-level cold data to obtain second-level cold data, and storing the second-level cold data in the cold data storage area.
Preferably, the first-level hot data, the second-level hot data and the third-level hot data are stored in the hot data storage area in a storage format that supports the HTAP processing mode.
Preferably, the first-level cold data and the second-level cold data are stored in the cold data storage area in a storage format that supports the OLAP query mode.
Preferably, the hot data storage area is a solid state disk.
Preferably, the cold data storage area is a mechanical hard disk.
In a second aspect, the present application proposes a method for querying traffic data, the traffic data being stored according to the above method for storing traffic data, comprising:
receiving a query request, determining the storage area of the target traffic data according to the query request and metadata, and taking that storage area as a target storage area, wherein the target storage area comprises a hot data storage area and/or a cold data storage area;
and querying the target traffic data from the target storage area by using a massively parallel processing (MPP) engine.
Preferably, querying the target traffic data from the target storage area by using the MPP engine comprises:
if the target storage area comprises only the hot data storage area, acquiring the target traffic data from the hot data storage area by using the MPP engine;
if the target storage area comprises only the cold data storage area, acquiring the target traffic data from the cold data storage area by using the MPP engine;
and if the target storage area comprises both the hot data storage area and the cold data storage area, using the MPP engine to acquire target hot data from the hot data storage area and target cold data from the cold data storage area, and merging the target hot data and the target cold data as the target traffic data.
In a third aspect, the present application provides a system for storing traffic data, comprising:
a data module, used for obtaining first-level hot data from the received real-time traffic data according to a first period; merging a plurality of pieces of first-level hot data according to a second period to obtain second-level hot data, wherein the second period is a first multiple of the first period, and the number of temporary partitions whose first-level hot data is merged is equal to the first multiple; merging a plurality of pieces of second-level hot data according to a third period to obtain third-level hot data, wherein the third period is a second multiple of the second period, and the number of first partitions whose second-level hot data is merged is equal to the second multiple; and merging a plurality of pieces of third-level hot data according to a fourth period to obtain first-level cold data, wherein the fourth period is a third multiple of the third period, and the number of pieces of third-level hot data merged is equal to the third multiple;
a data compression module, used for performing format conversion and compression on the first-level cold data to obtain second-level cold data;
a hot data storage module, comprising a temporary partition, a first partition and a second partition, and used for storing the first-level hot data, the second-level hot data and the third-level hot data;
and a cold data storage module, used for storing the first-level cold data and the second-level cold data.
In a fourth aspect, the present application proposes a system for querying traffic data, comprising:
a target storage area query module, used for receiving a query request, determining the storage area of the target traffic data according to the query request and metadata, and taking that storage area as a target storage area, wherein the target storage area comprises a hot data storage area and/or a cold data storage area;
and a target traffic data query module, used for querying the target traffic data from the target storage area by using a massively parallel processing (MPP) engine.
Preferably, the target traffic data query module comprises:
a data acquisition unit, used for, according to the target storage area, acquiring the target traffic data from the hot data storage area by using the MPP engine, or acquiring the target traffic data from the cold data storage area by using the MPP engine, or using the MPP engine to acquire target hot data from the hot data storage area and target cold data from the cold data storage area respectively;
and a data merging unit, used for merging the target hot data and the target cold data as the target traffic data when the target storage area comprises both the hot data storage area and the cold data storage area.
The application has the following advantages: the partitions in the hot data storage area are merged in turn according to the first, second, third and fourth periods to obtain first-level cold data, which is moved to the cold data storage area; this reduces the amount of hot data stored in the hot data storage area and thus the resource pressure on it. Format conversion and compression of the first-level cold data reduce the amount of data stored in the cold data storage area and thus the resource pressure on it. The target storage area of the target traffic data is determined from the query conditions and the metadata, and a massively parallel processing (MPP) engine queries the target traffic data from a target storage area comprising the hot data storage area and/or the cold data storage area, which improves query efficiency.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic diagram of steps of a method for storing traffic data provided by the present application;
FIG. 2 is a schematic diagram of a hot data storage area of a method for storing traffic data provided by the present application;
FIG. 3 is a schematic diagram of a cold data storage area of a method for storing traffic data provided by the present application;
FIG. 4 is a schematic diagram of steps of a method for querying traffic data provided by the present application;
FIG. 5 is a flow chart of a method for querying traffic data provided by the present application;
FIG. 6 is a schematic diagram of a system for storing traffic data provided by the present application;
FIG. 7 is a schematic diagram of a system for querying traffic data provided by the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In a first aspect, according to an embodiment of the present application, a method for storing traffic data is provided for an on-board module, as shown in fig. 1, including:
S101, obtaining first-level hot data from the received real-time traffic data according to a first period, and storing the first-level hot data in a temporary partition in a hot data storage area;
S102, merging the first-level hot data in a plurality of temporary partitions according to a second period to obtain second-level hot data, and transferring the second-level hot data to a first partition in the hot data storage area, wherein the second period is a first multiple of the first period, and the number of temporary partitions whose first-level hot data is merged is equal to the first multiple;
S103, merging the second-level hot data in a plurality of first partitions according to a third period to obtain third-level hot data, and transferring the third-level hot data to a second partition in the hot data storage area, wherein the third period is a second multiple of the second period, and the number of first partitions whose second-level hot data is merged is equal to the second multiple;
S104, merging the third-level hot data in a plurality of second partitions according to a fourth period and transferring it to a cold data storage area to obtain first-level cold data, wherein the fourth period is a third multiple of the third period, and the number of pieces of third-level hot data merged is equal to the third multiple;
S105, performing format conversion and compression on the first-level cold data to obtain second-level cold data, and storing the second-level cold data in the cold data storage area.
The first-level hot data, the second-level hot data and the third-level hot data are all stored in the hot data storage area in a storage format that supports the HTAP processing mode. The first-level cold data and the second-level cold data are stored in the cold data storage area in a storage format that supports the OLAP query mode. The hot data storage area is a solid state disk. The cold data storage area is a mechanical hard disk.
In the traffic data scenario, checkpoint and gantry passage data is large in volume and has high real-time write throughput. Traditional big data platforms are better suited to big data applications with a T+1 cycle and support real-time writes poorly. Real-time write (T+0) big data scenarios generally place higher demands on hardware, requiring larger memory configurations and SSD solid state disk storage. The embodiment of the application adopts different storage strategies and storage media for cold and hot data, and the cold and hot data storage formats are compatible with various MPP engines, so that cold and hot data can be queried together.
The following is a further explanation of the embodiments of the present application.
First, the hot and cold time ranges of the data are divided according to the business life cycle characteristics of the data, separating hot data from cold data. In this embodiment, the first period is 30 minutes, the second period is 1 hour, the third period is 24 hours, and the fourth period is 6 months.
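The relation between these four periods and the first, second and third multiples can be checked with a short calculation. The following is a minimal sketch, not part of the claimed method; the function name `period_multiple` is hypothetical, and "6 months" is approximated as 180 days purely for illustration.

```python
from datetime import timedelta

# Periods from the embodiment. "6 months" is approximated as 180 days,
# which is an assumption made only for this sketch.
FIRST_PERIOD = timedelta(minutes=30)
SECOND_PERIOD = timedelta(hours=1)
THIRD_PERIOD = timedelta(hours=24)
FOURTH_PERIOD = timedelta(days=180)


def period_multiple(longer: timedelta, shorter: timedelta) -> int:
    """Return how many `shorter` periods make up one `longer` period."""
    count, remainder = divmod(longer, shorter)
    if remainder:
        raise ValueError("longer period is not a whole multiple of the shorter one")
    return count


# Number of partitions merged at each level.
first_multiple = period_multiple(SECOND_PERIOD, FIRST_PERIOD)   # 30-minute partitions per hour
second_multiple = period_multiple(THIRD_PERIOD, SECOND_PERIOD)  # hour partitions per day
third_multiple = period_multiple(FOURTH_PERIOD, THIRD_PERIOD)   # day partitions per archive window
```

With the periods of this embodiment, the first multiple is 2 and the second multiple is 24; the third multiple depends on how the 6-month period is counted in days.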
The hot data, namely the vehicle checkpoint and gantry passage data, is written into the hot data storage in real time by the data acquisition module, and SSD solid state disks are a suitable storage medium.
As shown in fig. 2, because hot data is written in real time and in large volume, writing hot data on a big data platform usually generates a large number of small files (typically one small file per write batch). A large number of small files reduces disk I/O read efficiency and puts significant pressure on the disks during querying and computing, so the hot data must be merged; file merging, however, can cause the system to stall. The hot data therefore needs to be partitioned according to its characteristics. For checkpoint and gantry passage data, the hot data is divided by update frequency into two parts, a temporary partition and a formal partition, and the numbers of temporary and formal partitions are configured. The formal partition is in turn divided into a first partition and a second partition. The first partition stores second-level hot data merged by hour; the second partition stores third-level hot data merged by day. The temporary partition stores first-level hot data merged over a short period, for example 30 minutes.
In the temporary partition, data is stored in one partition per 30 minutes; that is, the temporary partition comprises a plurality of 30-minute partitions, each storing different first-level hot data. Every hour, the two 30-minute partitions of the past hour are merged once; the merged first-level hot data forms second-level hot data, which is migrated to the first partition, and the two 30-minute partitions and their first-level hot data are then deleted from the temporary partition.
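The hourly merge of the temporary partition can be sketched as follows. This is an illustrative in-memory model only: the partitions are represented as plain dictionaries keyed by partition start time, and the names `merge_hour`, `temporary` and `first_partition` are hypothetical, not taken from the application.

```python
from datetime import datetime, timedelta

HALF_HOUR = timedelta(minutes=30)


def merge_hour(temporary: dict, first_partition: dict, hour_start: datetime) -> None:
    """Merge the two 30-minute partitions of one hour into one hour partition.

    Both dicts map a partition start time to a list of records
    (a hypothetical layout chosen for this sketch).
    """
    merged = []
    for start in (hour_start, hour_start + HALF_HOUR):
        # pop() deletes the 30-minute partition once its records are merged.
        merged.extend(temporary.pop(start, []))
    first_partition[hour_start] = merged


# Usage: two 30-minute partitions become one hour partition.
temporary = {
    datetime(2020, 12, 31, 8, 0): ["record-a", "record-b"],
    datetime(2020, 12, 31, 8, 30): ["record-c"],
}
first_partition = {}
merge_hour(temporary, first_partition, datetime(2020, 12, 31, 8, 0))
```

After the call, the temporary partition no longer holds the two merged 30-minute partitions, and the first partition holds a single hour partition with all three records.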
In the formal partition, the second-level hot data of the hour partitions for the most recent 48 hours and the third-level hot data of the day partitions are retained. The first partition comprises a plurality of 1-hour partitions, each storing different second-level hot data. The second partition comprises a plurality of 24-hour partitions, each storing different third-level hot data. In the first partition, the hour partitions of the past 24 hours are merged once every 24 hours; the merged second-level hot data forms third-level hot data, which is migrated to the second partition and stored by day. Hour partitions that have remained in the first partition for more than 48 hours are deleted together with their second-level hot data; this cleanup runs every 48 hours. By default, third-level hot data that has accumulated in the second partition for more than 6 months is archived into the cold data storage area daily, and each day partition that has been archived is deleted from the second partition together with its third-level hot data.
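The daily merge and the 48-hour retention of hour partitions can be sketched in the same in-memory style. The function names and the dictionary layout are assumptions for illustration; in particular, the age of an hour partition is measured here from its start time, which the application does not specify.

```python
from datetime import datetime, timedelta


def merge_day(first_partition: dict, second_partition: dict, day_start: datetime) -> None:
    """Merge the 24 hour partitions of one day into one day partition.

    The hour partitions are left in place; they are removed later by
    expire_hour_partitions once they are older than the retention window.
    """
    merged = []
    for hour in range(24):
        merged.extend(first_partition.get(day_start + timedelta(hours=hour), []))
    second_partition[day_start] = merged


def expire_hour_partitions(first_partition: dict, now: datetime,
                           retention: timedelta = timedelta(hours=48)) -> list:
    """Delete hour partitions older than `retention` and return their start times."""
    expired = sorted(start for start in first_partition if now - start > retention)
    for start in expired:
        del first_partition[start]
    return expired
```

Keeping the merge and the cleanup as separate steps mirrors the text: merging runs every 24 hours, while hour partitions survive until they exceed the 48-hour window.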
For the third-level hot data in the second partition, the second partitions are merged according to the business life cycle of the data, the merged third-level hot data is converted into first-level cold data and archived into the cold data storage area, and the second partitions corresponding to the transferred third-level hot data are deleted from the hot data storage area.
As shown in fig. 3, cold data is stored as historical data, is written to the cold data storage in batches, and changes infrequently, so hot data and cold data usually use different storage formats. The merged third-level hot data that is transferred into the cold data storage area as first-level cold data therefore needs to be converted into the storage format of the cold data storage area; after format conversion, the first-level cold data is stored in columnar form. Because cold data is accessed and modified infrequently, a compression algorithm with a higher compression ratio can be used to encode and compress the data in each column, which reduces disk space usage and the hardware storage cost of massive cold data. The format-converted first-level cold data is therefore also encoded and compressed according to the data type of each column to obtain the second-level cold data. Because cold data is queried infrequently, does not need frequent modification, and is written sequentially in batches, it can be stored on mechanical disks. Both cold data and hot data use distributed multi-copy storage, so no RAID needs to be built for data backup. The target formats for conversion include Parquet and ORC, either of which may be selected as the storage format of the cold data storage area. The encoding and compression methods include LZO and Snappy, either of which may be selected as the compression scheme of the cold data storage area. The partition storage modes include three copies and erasure coding, either of which may be selected as the partition storage mode of the cold data storage area.
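The idea of converting row-oriented records into columns and compressing each column separately can be illustrated with the Python standard library. This sketch deliberately does not use Parquet, ORC, LZO or Snappy: JSON serialization and `zlib` are stand-ins chosen so that the example is self-contained, and the field names `plate`, `gantry_id` and `passed_at` are hypothetical. It also assumes every record has the same fields.

```python
import json
import zlib


def rows_to_columns(rows: list) -> dict:
    """Pivot a list of row dicts into a dict of column value lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def compress_columns(columns: dict) -> dict:
    """Serialize and compress each column independently (zlib level 9)."""
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"), 9)
        for name, values in columns.items()
    }


def decompress_columns(compressed: dict) -> dict:
    """Reverse compress_columns, returning the original column value lists."""
    return {
        name: json.loads(zlib.decompress(blob).decode("utf-8"))
        for name, blob in compressed.items()
    }


# Usage: first-level cold data (rows) becomes second-level cold data (compressed columns).
rows = [
    {"plate": "A12345", "gantry_id": 7, "passed_at": "2020-06-01T08:00:00"},
    {"plate": "B67890", "gantry_id": 7, "passed_at": "2020-06-01T08:00:05"},
]
columns = rows_to_columns(rows)
second_level = compress_columns(columns)
```

Grouping values by column places similar values next to each other, which is what lets per-column encoding and compression reach higher ratios on real data than compressing row by row.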
In embodiments of the present application, third-level hot data is migrated to the cold data storage area by archiving it per day, typically one day partition of third-level hot data per day. When third-level hot data is archived for the first time and historical data exists, N day partitions of third-level hot data can be archived at once.
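Selecting which day partitions to archive can be sketched as a simple age filter. The function name `day_partitions_to_archive` is hypothetical, and the 6-month window is again approximated as 180 days for illustration.

```python
from datetime import date, timedelta


def day_partitions_to_archive(day_partitions, today: date,
                              hot_window: timedelta = timedelta(days=180)) -> list:
    """Return the day partitions older than the hot window, oldest first.

    On a routine daily run this usually yields a single partition; on the
    first run with historical data it may yield N partitions at once.
    """
    return sorted(day for day in day_partitions if today - day > hot_window)
```

The same function covers both cases in the text: the routine daily archive and the first archive over accumulated history.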
In a second aspect, the present application proposes a method for querying traffic data, as shown in fig. 4, where the traffic data is stored according to a method for storing traffic data as described above, and the method includes:
S201, receiving a query request, determining the storage area of the target traffic data according to the query request and metadata, and taking that storage area as a target storage area, wherein the target storage area comprises a hot data storage area and/or a cold data storage area;
S202, querying the target traffic data from the target storage area by using a massively parallel processing (MPP) engine.
Querying the target traffic data from the target storage area by using the MPP engine comprises the following: if the target storage area comprises only the hot data storage area, the target traffic data is acquired from the hot data storage area by using the MPP engine; if the target storage area comprises only the cold data storage area, the target traffic data is acquired from the cold data storage area by using the MPP engine; if the target storage area comprises both the hot data storage area and the cold data storage area, the MPP engine acquires target hot data from the hot data storage area and target cold data from the cold data storage area, and the target hot data and the target cold data are merged as the target traffic data.
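The three cases can be sketched as a routing decision on the time range of the query. This is an assumption-laden simplification: the metadata is reduced to a single hypothetical boundary timestamp `cold_before`, with records earlier than it assumed to be in cold storage and all later records in hot storage.

```python
from datetime import datetime


def target_storage_areas(query_start: datetime, query_end: datetime,
                         cold_before: datetime) -> set:
    """Return the storage areas overlapping the half-open range [query_start, query_end).

    `cold_before` is a hypothetical metadata value: records earlier than it
    are assumed to be archived in cold storage, the rest to be in hot storage.
    """
    if query_start >= query_end:
        raise ValueError("empty query time range")
    areas = set()
    if query_start < cold_before:
        areas.add("cold")
    if query_end > cold_before:
        areas.add("hot")
    return areas
```

A query entirely before the boundary is routed to cold storage only, a query entirely after it to hot storage only, and a query spanning the boundary to both.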
An embodiment of the present application will be further described below, as shown in fig. 5.
The hot data storage area is distributed storage suited to the HTAP processing mode, such as Kudu, on SSD solid state disks; the cold data storage area is distributed storage suited to the OLAP query mode, such as HDFS, on mechanical disks. That is, hot data (first-level, second-level and third-level hot data) is stored in distributed storage such as Kudu and physically resides on SSD solid state disks, while cold data (first-level and second-level cold data) is stored in distributed storage such as HDFS and physically resides on mechanical disks. The data query engine used to query the target traffic data is a Massively Parallel Processing (MPP) engine such as Impala or Presto. Metadata is stored uniformly in Hive. Hive is a database technology in which databases and tables can be defined to analyze structured data. Gantry vehicle passage data, toll gate information, vehicle information, payment information and time information may each be a field of the metadata stored in Hive. The time condition in the data query request is used as the basis for routing the query. Specifically, the MPP engine uses the time information in the metadata stored in Hive to determine the storage area in which the target traffic data is located and takes it as the target storage area, which comprises the hot data storage area and/or the cold data storage area. When the target storage area comprises only the hot data storage area, the data query engine directly queries the Kudu storage, acquires the target traffic data from the hot data, and returns it.
When the target storage area comprises only the cold data storage area, the data query engine directly queries the data stored in HDFS, acquires the target traffic data from the cold data, and returns it. When the target storage area comprises both the hot data storage area and the cold data storage area, the data query engine queries Kudu and HDFS simultaneously to acquire the hot data and the cold data, performs a JOIN association operation on them in memory to obtain a result set, and returns the combined result set. The big data storage and query technologies used in the embodiment of the application may be, but are not limited to, the open-source Hadoop ecosystem components commonly used in the industry.
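Querying both storage areas at the same time and combining the results in memory can be sketched as below. The sketch does not call Kudu, HDFS, Impala or Presto: the `fetchers` are hypothetical callables standing in for per-area queries, and the in-memory combination is modelled as concatenation followed by a sort on the timestamp, which is an assumption about what the association operation amounts to for time-partitioned records.

```python
from concurrent.futures import ThreadPoolExecutor


def query_target_data(areas, fetchers: dict, start, end) -> list:
    """Query every target storage area concurrently and combine the results.

    areas:    iterable of area names, for example {"hot", "cold"}
    fetchers: dict mapping an area name to a callable(start, end) that returns
              a list of (timestamp, record) tuples (hypothetical stand-ins
              for the real storage queries)
    """
    ordered = sorted(areas)
    with ThreadPoolExecutor(max_workers=max(1, len(ordered))) as pool:
        futures = [pool.submit(fetchers[name], start, end) for name in ordered]
        results = [future.result() for future in futures]
    # Combine the per-area result sets in memory, ordered by timestamp.
    combined = [row for rows in results for row in rows]
    combined.sort(key=lambda row: row[0])
    return combined
```

When only one area is targeted, the same function simply returns that area's rows, so the hot-only, cold-only and combined cases share one code path.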
In a third aspect, the present application proposes a system for storing traffic data, as shown in fig. 6, comprising:
the data module 101 is configured to obtain primary thermal data from the received real-time traffic data according to a first period; merge a plurality of primary thermal data according to a second period to obtain secondary thermal data, wherein the second period is a first multiple of the first period, and the number of temporary partitions corresponding to the merged primary thermal data is equal to the first multiple; merge a plurality of secondary thermal data according to a third period to obtain tertiary thermal data, wherein the third period is a second multiple of the second period, and the number of first partitions corresponding to the merged secondary thermal data is equal to the second multiple; and merge a plurality of tertiary thermal data according to a fourth period to obtain primary cold data, wherein the fourth period is a third multiple of the third period, and the number of merged tertiary thermal data is equal to the third multiple;
the data compression module 102 is configured to perform format conversion and compression on the primary cold data to obtain secondary cold data;
the thermal data storage module 103 comprises a temporary partition, a first partition and a second partition, which are used for storing the primary thermal data, the secondary thermal data and the tertiary thermal data respectively;
the cold data storage module 104 is configured to store the primary cold data and the secondary cold data.
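The cascade performed by the data module can be sketched in a few lines of Python. This is an illustrative model only: a partition is represented as a plain list of records, the helper name `merge_groups` is hypothetical, and the multiples 2, 3 and 2 are arbitrary small values chosen for readability, not values taken from the application.

```python
from typing import List, Tuple

Partition = List[str]  # a partition is modelled as a list of record ids


def merge_groups(partitions: List[Partition],
                 multiple: int) -> Tuple[List[Partition], List[Partition]]:
    """Merge every full run of `multiple` consecutive partitions into one.

    Returns (merged partitions, leftover partitions that do not yet form
    a full run and therefore wait for the next period).
    """
    full = len(partitions) - len(partitions) % multiple
    merged = [
        [rec for part in partitions[i:i + multiple] for rec in part]
        for i in range(0, full, multiple)
    ]
    return merged, partitions[full:]


# Twelve primary partitions, one per first period (hypothetical sizes).
primary = [[f"r{i}"] for i in range(12)]
secondary, _ = merge_groups(primary, 2)     # first multiple = 2
tertiary, _ = merge_groups(secondary, 3)    # second multiple = 3
cold, _ = merge_groups(tertiary, 2)         # third multiple = 2
```

With these multiples the fourth period spans 2 x 3 x 2 = 12 first periods, so the twelve primary partitions end up as a single primary cold partition holding all twelve records.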
In a fourth aspect, the present application proposes a system for querying traffic data, as shown in fig. 7, comprising:
the target storage area query module 201 is configured to receive a query request, query the storage area of the target traffic data according to the query request and metadata, and take that storage area as the target storage area, wherein the target storage area comprises a hot data storage area and/or a cold data storage area;
the target traffic data query module 202 is configured to query the target traffic data from the target storage area using a large-scale concurrent processing engine.
The target traffic data query module comprises:
a data acquisition unit, configured to, according to the target storage area, acquire the target traffic data from the hot data storage area using the large-scale concurrent processing engine, or acquire the target traffic data from the cold data storage area using the large-scale concurrent processing engine, or acquire target hot data from the hot data storage area and target cold data from the cold data storage area respectively using the large-scale concurrent processing engine;
and a data merging unit, configured to merge the target hot data and the target cold data as the target traffic data when the target storage area comprises both the hot data storage area and the cold data storage area.
According to the method, the partitions in the hot data storage area are merged in sequence according to the first, second, third and fourth periods to obtain primary cold data, and the primary cold data is moved to the cold data storage area. This reduces the amount of hot data held in the hot data storage area and relieves the pressure on the resources it occupies. Format conversion and compression of the primary cold data in the cold data storage area reduces the amount of cold data stored there, which likewise relieves the pressure on the resources occupied by the cold data storage area. The storage area of the target traffic data is determined from the query conditions and the metadata, and the large-scale concurrent processing engine queries the target traffic data from a target storage area comprising the hot data storage area and/or the cold data storage area, which improves query efficiency and widens the data query range. Compared with existing data storage schemes, the embodiment of the application divides the hot and cold time ranges of the data according to the service life cycle characteristics of the data, rather than simply keeping hot data for a period of time according to its timeliness and then deleting the data directly. The hardware cost of the servers used in the embodiment of the application is lower, and the storage cost in particular is greatly reduced. Cold data and hot data can be queried simultaneously, which enlarges the effective query range of the data. The method also relieves the file merging pressure caused by the large number of small files that are generated when a big data platform writes data frequently at T+0 (real time).
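The effect of format conversion and compression on cold data can be illustrated with a small Python sketch that uses only the standard library. The application does not name the target format in this passage, so the choices here are assumptions: rows are converted to a column-oriented JSON layout and compressed with gzip, and the function names and sample fields are hypothetical. A production system on HDFS would more likely use a dedicated columnar file format.

```python
import gzip
import json
from typing import Dict, List

Row = Dict[str, object]


def to_secondary_cold(rows: List[Row]) -> bytes:
    """Convert non-empty row-oriented data to a columnar layout, then compress it."""
    columns = {key: [row[key] for row in rows] for key in rows[0]}
    return gzip.compress(json.dumps(columns).encode("utf-8"))


def from_secondary_cold(blob: bytes) -> List[Row]:
    """Decompress and rebuild the original rows from the columnar layout."""
    columns = json.loads(gzip.decompress(blob).decode("utf-8"))
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


# Hypothetical primary cold data: repetitive toll records.
rows = [{"gate": "G1", "plate": f"P{i % 10}", "fee": 5} for i in range(1000)]
raw = json.dumps(rows).encode("utf-8")
packed = to_secondary_cold(rows)
```

Grouping values by column places similar values next to each other, which tends to help a general-purpose compressor; the conversion is lossless, so `from_secondary_cold(packed)` returns the original rows.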
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A method for storing traffic data, comprising:
obtaining primary thermal data from received real-time traffic data according to a first period, and storing the primary thermal data into a temporary partition in a thermal data storage area;
according to a second period, merging the primary thermal data in a plurality of temporary partitions to obtain secondary thermal data, and transferring the secondary thermal data to a first partition in the thermal data storage area, wherein the second period is a first multiple of the first period, and the number of temporary partitions corresponding to the merged primary thermal data is equal to the first multiple;
according to a third period, merging the secondary thermal data in a plurality of first partitions to obtain tertiary thermal data, and transferring the tertiary thermal data to a second partition in the thermal data storage area, wherein the third period is a second multiple of the second period, and the number of first partitions corresponding to the merged secondary thermal data is equal to the second multiple;
according to a fourth period, merging the tertiary thermal data in a plurality of second partitions and transferring the merged data into a cold data storage area to obtain primary cold data, wherein the fourth period is a third multiple of the third period, and the number of merged tertiary thermal data is equal to the third multiple;
and performing format conversion and compression on the primary cold data to obtain secondary cold data, and storing the secondary cold data in the cold data storage area.
2. The method for storing traffic data according to claim 1, wherein the primary thermal data, the secondary thermal data, and the tertiary thermal data are each stored in the thermal data storage area in a storage format supporting HTAP processing mode.
3. The method for storing traffic data according to claim 1, wherein the primary cold data and the secondary cold data are each stored in the cold data storage area in a storage format supporting OLAP query mode.
4. The method for storing traffic data according to claim 2, wherein the thermal data storage area is a solid state disk.
5. The method for storing traffic data according to claim 3, wherein the cold data storage area is a mechanical hard disk.
6. A method for querying traffic data, wherein the traffic data is stored according to a method for storing traffic data as claimed in any one of claims 1-5, comprising:
Receiving a query request, querying a storage area of target traffic data according to the query request and metadata, and taking the storage area as a target storage area, wherein the target storage area comprises a hot data storage area and/or a cold data storage area;
And querying target traffic data from the target storage area by adopting a large-scale concurrent processing engine.
7. The method for querying traffic data according to claim 6, wherein querying the target traffic data from the target storage area using a large-scale concurrent processing engine comprises:
If the target storage area only comprises the thermal data storage area, acquiring the target traffic data from the thermal data storage area by adopting a large-scale concurrent processing engine;
If the target storage area only comprises the cold data storage area, acquiring the target traffic data from the cold data storage area by adopting a large-scale concurrent processing engine;
And if the target storage area comprises the hot data storage area and the cold data storage area, a large-scale concurrent processing engine is adopted to respectively acquire target hot data from the hot data storage area, acquire target cold data from the cold data storage area, and combine the target hot data and the target cold data as the target traffic data.
8. A system for storing traffic data, comprising:
The data module is used for obtaining primary thermal data from the received real-time traffic data according to a first period; merging a plurality of primary thermal data according to a second period to obtain secondary thermal data, wherein the second period is a first multiple of the first period, and the number of temporary partitions corresponding to the merged primary thermal data is equal to the first multiple; merging a plurality of secondary thermal data according to a third period to obtain tertiary thermal data, wherein the third period is a second multiple of the second period, and the number of first partitions corresponding to the merged secondary thermal data is equal to the second multiple; and merging a plurality of tertiary thermal data according to a fourth period to obtain primary cold data, wherein the fourth period is a third multiple of the third period, and the number of merged tertiary thermal data is equal to the third multiple;
the data compression module is used for carrying out format conversion and compression on the primary cold data to obtain secondary cold data;
the thermal data storage module comprises a temporary partition, a first partition and a second partition, and is used for storing primary thermal data, secondary thermal data and tertiary thermal data;
and the cold data storage module is used for storing the primary cold data and the secondary cold data.
9. A system for querying traffic data, wherein the traffic data is stored in a system for storing traffic data according to claim 8, comprising:
The target storage area query module is used for receiving a query request, querying a storage area of target traffic data according to the query request and metadata, and taking the storage area as a target storage area, wherein the target storage area comprises a hot data storage area and/or a cold data storage area;
and the target traffic data query module is used for querying the target traffic data from the target storage area by adopting a large-scale concurrent processing engine.
10. The system for querying traffic data according to claim 9, wherein said target traffic data querying module comprises:
The data acquisition unit is used for, according to the target storage area, acquiring the target traffic data from the hot data storage area by adopting a large-scale concurrent processing engine, or acquiring the target traffic data from the cold data storage area by adopting a large-scale concurrent processing engine, or acquiring target hot data from the hot data storage area and target cold data from the cold data storage area respectively by adopting a large-scale concurrent processing engine;
and the data merging unit is used for merging the target hot data and the target cold data as the target traffic data when the target storage area comprises a hot data storage area and a cold data storage area.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011631496.4A CN112650453B (en) | 2020-12-31 | 2020-12-31 | Method and system for storing and inquiring traffic data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112650453A (en) | 2021-04-13 |
CN112650453B (en) | 2024-05-14 |
Family
ID=75366792
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011631496.4A Active CN112650453B (en) | 2020-12-31 | 2020-12-31 | Method and system for storing and inquiring traffic data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112650453B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114356225A (en) * | 2021-12-17 | 2022-04-15 | 得一微电子股份有限公司 | Data storage method and device of memory, terminal equipment and storage medium |
CN115827653B (en) * | 2022-11-25 | 2023-09-05 | 深圳计算科学研究院 | Pure column type updating method and device for HTAP and mass data |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103400598A (en) * | 2007-08-14 | 2013-11-20 | 三星电子株式会社 | Solid state memory, computer system including same, and method of operating same |
CN103942289A (en) * | 2014-04-12 | 2014-07-23 | 广西师范大学 | Memory caching method oriented to range querying on Hadoop |
CN106934001A (en) * | 2017-03-03 | 2017-07-07 | 广州天源迪科信息技术有限公司 | Distributed quick inventory inquiry system and method |
US9747202B1 (en) * | 2013-03-14 | 2017-08-29 | Sandisk Technologies Llc | Storage module and method for identifying hot and cold data |
CN108268217A (en) * | 2018-01-10 | 2018-07-10 | 北京航天云路有限公司 | A kind of bedding storage method based on the cold and hot classification of time series data |
CN109033360A (en) * | 2018-07-26 | 2018-12-18 | 腾讯科技(深圳)有限公司 | A kind of data query method, apparatus, server and storage medium |
CN109947373A (en) * | 2019-03-28 | 2019-06-28 | 北京大道云行科技有限公司 | Data processing method and device |
CN110908608A (en) * | 2019-11-22 | 2020-03-24 | 苏州浪潮智能科技有限公司 | Storage space saving method and system |
CN111475506A (en) * | 2020-03-30 | 2020-07-31 | 广州虎牙科技有限公司 | Data storage and query method, device, system, equipment and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150324447A1 (en) * | 2014-05-08 | 2015-11-12 | Altibase Corp. | Hybrid database management system and method of managing tables therein |
US20170017405A1 (en) * | 2015-07-14 | 2017-01-19 | HGST Netherlands B.V. | Systems and methods for improving flash-oriented file system garbage collection |
US20180136842A1 (en) * | 2016-11-11 | 2018-05-17 | Hewlett Packard Enterprise Development Lp | Partition metadata for distributed data objects |
Also Published As
Publication number | Publication date |
---|---|
CN112650453A (en) | 2021-04-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11080277B2 (en) | Data set compression within a database system | |
CN111125089B (en) | Time sequence data storage method, device, server and storage medium | |
CN102646130B (en) | Method for storing and indexing mass historical data | |
CN112650453B (en) | Method and system for storing and inquiring traffic data | |
CN107423422B (en) | Spatial data distributed storage and search method and system based on grid | |
JP6495568B2 (en) | Method, computer readable storage medium and system for performing incremental SQL server database backup | |
CN102364474B (en) | Metadata storage system for cluster file system and metadata management method | |
CN102375853A (en) | Distributed database system, method for building index therein and query method | |
CN110727406B (en) | Data storage scheduling method and device | |
CN103366015A (en) | OLAP (on-line analytical processing) data storage and query method based on Hadoop | |
CN102663090A (en) | Method and device for inquiry metadata | |
CN102890722A (en) | Indexing method applied to time sequence historical database | |
CN104239377A (en) | Platform-crossing data retrieval method and device | |
CN102750377A (en) | Massive data storage and retrieval method | |
US12032581B2 (en) | Processing variable-length fields via formatted record data | |
CN111752931B (en) | Intelligent storage table implementation method and system for NEWSQL database management system | |
CN110866006A (en) | Method and device for archiving expired data | |
CN102880615A (en) | Data storage method and device | |
CN111104457A (en) | Massive space-time data management method based on distributed database | |
US20210326320A1 (en) | Data segment storing in a database system | |
CN102411632B (en) | Chain table-based memory database page type storage method | |
CN102890719A (en) | Method and device for fuzzy research of license plate numbers | |
Wang et al. | PLSM: a highly efficient LSM-tree index supporting real-time big data analysis | |
CN102521256A (en) | High-reliability data protection method of real-time/historical database | |
CN104408128A (en) | Read optimization method for asynchronously updating indexes based on B+ tree |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |