CN111694693A - Data stream storage method and device and computer storage medium - Google Patents
- Publication number
- CN111694693A
- Authority
- CN
- China
- Prior art keywords
- data
- data stream
- partition
- storage
- keywords
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
Abstract
A data stream storage method and apparatus, and a computer storage medium. The method comprises: acquiring a data stream; determining keywords of the data in the data stream; allocating the data to partitions according to the keywords of the data; and storing the data in each partition. With this scheme, data loss caused by untimely processing can be avoided, and when data storage fails, the partition holding the data can be easily located according to the keyword of the data, so that the corresponding data can be recovered.
Description
Technical Field
The present invention relates to the field of data processing, and in particular, to a data stream storage method and apparatus, and a computer storage medium.
Background
In data processing, a data stream consists of many pieces of data that are generated continuously and in large quantities, so the stream keeps delivering new data. If the data in the data stream is not processed or stored in time, it is lost.
In the prior art, a scheme for processing the data stream is to directly store data in the data stream.
However, with the above scheme, when a data storage failure occurs, the large data volume of the data stream makes it difficult to determine where the data was stored, which hinders data recovery.
Disclosure of Invention
The technical problem solved by the invention is that data recovery is difficult.
To solve the foregoing technical problem, an embodiment of the present invention provides a data stream storage method, including: acquiring a data stream; determining keywords of data in the data stream; distributing the data to partitions according to the keywords of the data; and storing the data in each partition.
Optionally, the data stream, consisting of a plurality of pieces of data, is acquired through Kafka.
Optionally, the keyword of each piece of data is calculated by using a Hash algorithm.
Optionally, the partition number corresponding to the data is calculated from the keyword of the data by using a Hash modulo algorithm, according to a preset number of partitions.
Optionally, the data in each partition is serialized, and the serialized data is stored.
Optionally, snapshot storage is performed on the data in each partition.
The present invention also provides a data stream storage apparatus, comprising: an acquisition unit configured to acquire a data stream; a determining unit configured to determine keywords of the data in the data stream; an allocation unit configured to allocate the data to partitions according to the keywords of the data; and a storage unit configured to store the data in each partition.
Optionally, the obtaining unit is further configured to obtain a data stream composed of a plurality of pieces of data by Kafka.
Optionally, the determining unit is further configured to calculate a keyword of each piece of data by using a Hash algorithm.
Optionally, the allocating unit is further configured to calculate, according to the number of preset partitions, partition sequence numbers corresponding to the data by using a Hash modulo algorithm through the keywords of the data.
Optionally, the storage unit is further configured to serialize data in each partition; and storing the serialized data.
Optionally, the storage unit is further configured to perform snapshot storage on data in each partition.
The present invention also provides a computer-readable storage medium on which computer instructions are stored, the computer-readable storage medium being a non-volatile storage medium or a non-transitory storage medium, wherein the computer instructions, when executed, perform the steps of any one of the above data stream storage methods.
The invention also provides a data stream storage device comprising a memory and a processor, wherein computer instructions are stored in the memory, and the processor performs the steps of any one of the above data stream storage methods when executing the computer instructions.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
A data stream is acquired; keywords of the data in the data stream are determined; the data is allocated to partitions according to the keywords of the data; and the data in each partition is stored. With this scheme, data loss caused by untimely processing can be avoided, and when data storage fails, the partition holding the data can be easily located according to the keyword of the data, so that the corresponding data can be recovered.
Drawings
Fig. 1 is a schematic flow chart of a data stream storage method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a data stream storage device according to an embodiment of the present invention.
Detailed Description
In the prior art, a scheme for processing a data stream is to directly store the data in the data stream. However, with this scheme, when a data storage failure occurs, the large data volume of the data stream makes it difficult to determine where the data was stored, which hinders data recovery.
In the embodiment of the invention, a data stream is acquired; keywords of the data in the data stream are determined; the data is allocated to partitions according to the keywords of the data; and the data in each partition is stored. With this scheme, data loss caused by untimely processing can be avoided, and when data storage fails, the partition holding the data can be easily located according to the keyword of the data, so that the corresponding data can be recovered.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Fig. 1 shows a schematic flow chart of a data stream storage method according to an embodiment of the present invention, which is described in detail below step by step.
Step S101, a data stream is acquired.
In a specific implementation, as data is generated continuously, the pieces of data generated form a data stream. The amount of data in a data stream is typically large. Classifying, processing and analyzing the data in the stream in real time places a heavy burden on computing power, but if the data stream is not handled at all, data is lost. Therefore, in the embodiment of the invention, the data stream is acquired and stored so that the data in the stream is not lost.
In the embodiment of the invention, a data stream consisting of a plurality of pieces of data can be acquired through Kafka.
In a specific implementation, Kafka is a high-throughput, distributed data platform that can be used to process the data generated by various data sources. Using Kafka as the platform for acquiring the data stream has the advantage of high data throughput, so that a large amount of continuously generated data can be handled and the risk of data loss or system failure is reduced.
Step S102, determining keywords of data in the data stream.
In a specific implementation, the keyword of a piece of data in the data stream may be used as an identifier that distinguishes that piece of data from other data. Determining the keywords of the data in the data stream can improve the efficiency of data queries and avoid the situation in which the storage location of the data is difficult to determine.
In a specific implementation, the keyword of a piece of data may be identification information that characterizes the data itself, or identification information that distinguishes the data through an association established with the data. In a specific application, the user can configure this according to the actual application scenario.
In the embodiment of the invention, the keywords of each piece of data can be calculated by using a Hash algorithm.
In a specific implementation, the Hash algorithm can be used to compress data of any length into a fixed-length data digest, and different pieces of data generally yield different digests (collisions are possible in principle but rare for a well-chosen Hash algorithm). Therefore, the data digest calculated by the Hash algorithm can be used to identify data quickly, which improves the efficiency of data queries and avoids the situation in which the storage location of the data is difficult to determine.
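The keyword calculation described above can be sketched as follows. This is a minimal illustration, not the method fixed by the patent: the choice of SHA-256 and the function name `compute_keyword` are assumptions, since the text only says that a Hash algorithm produces a fixed-length digest.

```python
import hashlib


def compute_keyword(record: bytes) -> str:
    """Return a fixed-length hex digest used as the record's keyword.

    SHA-256 is an assumed choice; the source only requires "a Hash algorithm".
    """
    return hashlib.sha256(record).hexdigest()


# Inputs of very different lengths map to digests of the same length.
short_key = compute_keyword(b"sensor-1,21.5")
long_key = compute_keyword(b"x" * 10_000)
print(len(short_key), len(long_key))
```

Because the digest is deterministic, the same record always produces the same keyword, which is what allows the keyword to be recomputed later when the data has to be located.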
Step S103, allocating the data to partitions according to the keywords of the data.
In specific implementation, because the data volume of the data stream is usually huge, the data in the data stream can be allocated to different partitions, and then the data of different partitions are processed, so that the data processing pressure of each partition node can be reduced, and a system fault is avoided.
In a specific implementation, the keyword of the data may be used as the criterion for allocating the data to different partitions. In a data query, the partition holding the data can then be determined from the keyword of the data before the query proceeds within that partition, which improves query efficiency and avoids the situation in which the storage location of the data is difficult to determine.
In a specific implementation, a partition may be a data processing node at a software level or a data processor at a hardware level.
In the embodiment of the invention, the partition number corresponding to the data can be calculated from the keyword of the data by using a Hash modulo algorithm, according to a preset number of partitions.
In a specific implementation, the Hash modulo algorithm may be used to establish a mapping between the data and the preset partitions according to the Hash value of the data, that is, the data digest calculated by the Hash algorithm. Each partition is represented in the mapping by its partition number, so the mapping is established between the Hash value of the data and the partition number.
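The mapping from Hash value to partition number can be sketched as below. The partition count of 4, the use of SHA-256 hex digests as keywords, and the names `partition_number` and `assign` are all assumptions made for illustration; the source specifies only that the partition number is obtained by a Hash modulo operation over a preset number of partitions.

```python
import hashlib

NUM_PARTITIONS = 4  # assumed preset partition count


def partition_number(keyword: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a hex-digest keyword to a partition number via Hash modulo."""
    return int(keyword, 16) % num_partitions


def assign(records, num_partitions: int = NUM_PARTITIONS):
    """Group (keyword, record) pairs by the partition number of the keyword."""
    partitions = {number: [] for number in range(num_partitions)}
    for record in records:
        keyword = hashlib.sha256(record).hexdigest()
        partitions[partition_number(keyword, num_partitions)].append((keyword, record))
    return partitions


records = [f"event-{i}".encode() for i in range(20)]
partitions = assign(records)
for number, items in partitions.items():
    print(number, len(items))
```

One consequence of plain modulo mapping is that changing the number of partitions changes the partition number of most keywords, which is presumably why the number of partitions is described as preset.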
Step S104, storing the data in each partition.
In the embodiment of the present invention, before storing the data in each partition, the data in each partition may be serialized.
In a specific implementation, serialization is the process of converting the state information of an object into a form that can be stored or transmitted. Serializing the data therefore makes it easier to store.
In a specific implementation, after the data has been stored in serialized form, it can be restored by deserialization when it is read.
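A serialization round trip for one partition might look like the following sketch. JSON is an assumed format chosen for readability; the source does not name a serialization format, and the record layout and function names here are hypothetical.

```python
import json
from typing import Dict, List


def serialize_partition(records: List[Dict]) -> bytes:
    """Convert a partition's records into bytes that can be stored."""
    return json.dumps(records, sort_keys=True).encode("utf-8")


def deserialize_partition(blob: bytes) -> List[Dict]:
    """Rebuild the records from their stored byte form."""
    return json.loads(blob.decode("utf-8"))


# Hypothetical partition contents: each record carries its keyword.
partition_data = [
    {"keyword": "ab12", "value": 21.5},
    {"keyword": "cd34", "value": 19.0},
]
blob = serialize_partition(partition_data)
restored = deserialize_partition(blob)
print(restored == partition_data)
```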
In the embodiment of the present invention, when the data in each partition is stored, snapshot storage may be performed on the data in each partition.
In a specific implementation, snapshots allow data to be stored quickly, which makes it possible to keep up with a large amount of continuously generated data and avoids data loss caused by untimely processing.
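The source does not say how snapshots are taken, so the following is only one possible reading: a point-in-time copy of each partition is written to its own file. The file naming scheme, the JSON format, and the function names are assumptions for illustration.

```python
import json
import os
import tempfile


def snapshot_partitions(partitions, directory):
    """Write a point-in-time copy of every partition to its own file."""
    paths = {}
    for number, records in partitions.items():
        final_path = os.path.join(directory, f"partition-{number}.snapshot.json")
        tmp_path = final_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as handle:
            json.dump(records, handle)
        # Rename into place so a reader does not pick up a half-written file.
        os.replace(tmp_path, final_path)
        paths[number] = final_path
    return paths


def load_snapshot(path):
    """Read one partition's snapshot back into memory."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


snapshot_dir = tempfile.mkdtemp()
partitions = {
    0: [{"keyword": "ab12", "value": 1}],
    1: [{"keyword": "cd35", "value": 2}],
}
paths = snapshot_partitions(partitions, snapshot_dir)
print(load_snapshot(paths[1]))
```

Keeping one file per partition means a failure affecting one partition can be repaired from that partition's snapshot alone.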
In summary, a data stream is acquired; keywords of the data in the data stream are determined; the data is allocated to partitions according to the keywords of the data; and the data in each partition is stored. With this scheme, data loss caused by untimely processing can be avoided, and when data storage fails, the partition holding the data can be easily located according to the keyword of the data, so that the corresponding data can be recovered.
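The recovery benefit claimed above can be illustrated end to end with a small sketch. Everything concrete here is an assumption: SHA-256 keywords, four partitions, JSON blobs held in a dictionary as a stand-in for persistent storage, and the function names. The point it shows is that the keyword alone determines which partition must be read.

```python
import hashlib
import json

NUM_PARTITIONS = 4  # assumed preset partition count


def keyword_of(record: str) -> str:
    """Keyword of a record: an assumed SHA-256 hex digest."""
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def partition_of(keyword: str) -> int:
    """Partition number obtained by Hash modulo."""
    return int(keyword, 16) % NUM_PARTITIONS


def store(records):
    """Allocate records to partitions, then serialize each partition."""
    partitions = {number: {} for number in range(NUM_PARTITIONS)}
    for record in records:
        keyword = keyword_of(record)
        partitions[partition_of(keyword)][keyword] = record
    # The serialized blobs stand in for whatever persistent storage is used.
    return {n: json.dumps(data).encode("utf-8") for n, data in partitions.items()}


def recover(storage, keyword):
    """Locate the partition from the keyword and read only that partition."""
    blob = storage[partition_of(keyword)]
    return json.loads(blob.decode("utf-8")).get(keyword)


stream = [f"order-{i}" for i in range(100)]
storage = store(stream)
wanted = keyword_of("order-42")
print(recover(storage, wanted))  # order-42
```

Only one of the four blobs is deserialized during recovery, rather than the whole stored stream.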
Referring to fig. 2, a schematic structural diagram of a data stream storage device 20 according to an embodiment of the present invention is shown, which specifically includes: an acquisition unit 201 for acquiring a data stream; a determining unit 202, configured to determine a keyword of data in a data stream; the allocation unit 203 is used for allocating the data to the partitions according to the keywords of the data; the storage unit 204 is configured to store data in each partition.
In this embodiment of the present invention, the obtaining unit 201 may be further configured to obtain a data stream composed of a plurality of pieces of data by Kafka.
In this embodiment of the present invention, the determining unit 202 may be further configured to calculate a keyword of each piece of data by using a Hash algorithm.
In this embodiment of the present invention, the allocating unit 203 may be further configured to calculate, according to the number of preset partitions, partition sequence numbers corresponding to the data by using a Hash modulo algorithm through the keywords of the data.
In this embodiment of the present invention, the storage unit 204 may be further configured to serialize data in each partition; and storing the serialized data.
In this embodiment of the present invention, the storage unit 204 may be further configured to perform snapshot storage on data in each partition.
The present invention also provides a computer-readable storage medium on which computer instructions are stored, the computer-readable storage medium being a non-volatile storage medium or a non-transitory storage medium, wherein the computer instructions, when executed, perform the steps of the data stream storage method provided by the embodiment of the present invention.
The invention also provides a data stream storage device comprising a memory and a processor, wherein computer instructions are stored in the memory, and the processor performs the steps of the data stream storage method provided by the embodiment of the invention when executing the computer instructions.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by instructing the relevant hardware through a program, which may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, and the like.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (14)
1. A method for storing a data stream, comprising:
acquiring a data stream;
determining keywords of data in the data stream;
distributing the data to partitions according to the keywords of the data;
and storing the data in each partition.
2. The data stream storage method of claim 1, wherein the obtaining the data stream comprises:
acquiring, through Kafka, a data stream consisting of a plurality of pieces of data.
3. The data stream storage method of claim 1, wherein determining the key of the data in the data stream comprises:
calculating the keyword of each piece of data by using a Hash algorithm.
4. The data stream storage method according to claim 3, wherein the allocating data to partitions according to the keywords of the data comprises:
calculating, according to a preset number of partitions, the partition number corresponding to the data from the keyword of the data by using a Hash modulo algorithm.
5. The data stream storage method according to claim 1, wherein the storing the data in each partition includes:
serializing the data in each partition;
and storing the serialized data.
6. The data stream storage method according to claim 1, wherein the storing the data in each partition includes:
performing snapshot storage on the data in each partition.
7. A data stream storage device, comprising:
an acquisition unit configured to acquire a data stream;
a determining unit for determining a keyword of data in the data stream;
an allocation unit configured to allocate the data to partitions according to the keywords of the data;
and a storage unit configured to store the data in each partition.
8. The data stream storage device according to claim 7, wherein the obtaining unit is further configured to obtain a data stream composed of a plurality of pieces of data by Kafka.
9. The data stream storage device of claim 7, wherein the determining unit is further configured to calculate a key for each piece of data using a Hash algorithm.
10. The data stream storage device according to claim 9, wherein the allocating unit is further configured to calculate, according to the number of preset partitions, a partition number corresponding to the data by using a Hash modulo algorithm through a keyword of the data.
11. The data stream storage device of claim 7, wherein the storage unit is further configured to serialize data in each partition; and storing the serialized data.
12. The data stream storage device according to claim 7, wherein the storage unit is further configured to perform snapshot storage on data in each partition.
13. A computer readable storage medium having stored thereon computer instructions, the computer readable storage medium being a non-volatile storage medium or a non-transitory storage medium, wherein the computer instructions when executed perform the steps of the data stream storage method according to any one of claims 1 to 6.
14. A data stream storage device comprising a memory and a processor, the memory having stored thereon computer instructions, wherein the processor performs the steps of the data stream storage method of any one of claims 1 to 6 when the computer instructions are executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910184336.0A CN111694693A (en) | 2019-03-12 | 2019-03-12 | Data stream storage method and device and computer storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910184336.0A CN111694693A (en) | 2019-03-12 | 2019-03-12 | Data stream storage method and device and computer storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111694693A (en) | 2020-09-22 |
Family
ID=72474820
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910184336.0A Withdrawn CN111694693A (en) | 2019-03-12 | 2019-03-12 | Data stream storage method and device and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111694693A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103136114A (en) * | 2011-11-30 | 2013-06-05 | 华为技术有限公司 | Storage method and storage device |
CN103229151A (en) * | 2012-12-27 | 2013-07-31 | 华为技术有限公司 | Partition extension method and device |
CN103548022A (en) * | 2011-03-28 | 2014-01-29 | 思杰系统有限公司 | Systems and methods of UTF-8 pattern matching |
CN103838770A (en) * | 2012-11-26 | 2014-06-04 | 中国移动通信集团北京有限公司 | Logic data partition method and system |
CN104133661A (en) * | 2014-07-30 | 2014-11-05 | 西安电子科技大学 | Multi-core parallel hash partitioning optimizing method based on column storage |
CN106488055A (en) * | 2015-08-28 | 2017-03-08 | 华为软件技术有限公司 | Calling list rearrangement method, back end equipment and routing node device |
CN107015872A (en) * | 2016-12-09 | 2017-08-04 | 上海壹账通金融科技有限公司 | The processing method and processing device of monitoring data |
CN107633001A (en) * | 2017-08-03 | 2018-01-26 | 北京空间科技信息研究所 | Hash partition optimization method and device |
US20180129579A1 (en) * | 2016-11-10 | 2018-05-10 | Nec Laboratories America, Inc. | Systems and Methods with a Realtime Log Analysis Framework |
WO2019019056A1 (en) * | 2017-07-26 | 2019-01-31 | 杭州复杂美科技有限公司 | Method for frontal machine to participate in block chain consensus |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103548022A (en) * | 2011-03-28 | 2014-01-29 | 思杰系统有限公司 | Systems and methods of UTF-8 pattern matching |
CN103136114A (en) * | 2011-11-30 | 2013-06-05 | 华为技术有限公司 | Storage method and storage device |
CN103838770A (en) * | 2012-11-26 | 2014-06-04 | 中国移动通信集团北京有限公司 | Logic data partition method and system |
CN103229151A (en) * | 2012-12-27 | 2013-07-31 | 华为技术有限公司 | Partition extension method and device |
CN104133661A (en) * | 2014-07-30 | 2014-11-05 | 西安电子科技大学 | Multi-core parallel hash partitioning optimizing method based on column storage |
CN106488055A (en) * | 2015-08-28 | 2017-03-08 | 华为软件技术有限公司 | Calling list rearrangement method, back end equipment and routing node device |
US20180129579A1 (en) * | 2016-11-10 | 2018-05-10 | Nec Laboratories America, Inc. | Systems and Methods with a Realtime Log Analysis Framework |
CN107015872A (en) * | 2016-12-09 | 2017-08-04 | 上海壹账通金融科技有限公司 | The processing method and processing device of monitoring data |
WO2019019056A1 (en) * | 2017-07-26 | 2019-01-31 | 杭州复杂美科技有限公司 | Method for frontal machine to participate in block chain consensus |
CN107633001A (en) * | 2017-08-03 | 2018-01-26 | 北京空间科技信息研究所 | Hash partition optimization method and device |
Non-Patent Citations (1)
Title |
---|
Liu Chunying; Zhang Xiaofen; Zhang Yue: "Analysis of Consistent Hashing Algorithms in Database Cluster Research" * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10176208B2 (en) | Processing time series data from multiple sensors | |
CN108829610B (en) | Memory management method and device in neural network forward computing process | |
CN107832406B (en) | Method, device, equipment and storage medium for removing duplicate entries of mass log data | |
US11347787B2 (en) | Image retrieval method and apparatus, system, server, and storage medium | |
CN106980623B (en) | Data model determination method and device | |
CN110147407B (en) | Data processing method and device and database management server | |
CN105517644B (en) | Data partitioning method and equipment | |
CN106874281B (en) | Method and device for realizing database read-write separation | |
CN109376196B (en) | Method and device for batch synchronization of redo logs | |
US8898677B2 (en) | Data arrangement calculating system, data arrangement calculating method, master unit and data arranging method | |
US11567940B1 (en) | Cache-aware system and method for identifying matching portions of two sets of data in a multiprocessor system | |
CN109063005B (en) | Data migration method and system, storage medium and electronic device | |
WO2014021978A4 (en) | Aggregating data in a mediation system | |
CN110647531A (en) | Data synchronization method, device, equipment and computer readable storage medium | |
CN111966631A (en) | Mirror image file generation method, system, equipment and medium capable of being rapidly distributed | |
CN107656796B (en) | Virtual machine cold migration method, system and equipment | |
CN111522811A (en) | Database processing method and device, storage medium and terminal | |
CN113268328A (en) | Batch processing method and device, computer equipment and storage medium | |
CN111026736B (en) | Data blood margin management method and device and data blood margin analysis method and device | |
CN110222046B (en) | List data processing method, device, server and storage medium | |
CN110765125B (en) | Method and device for storing data | |
CN111694693A (en) | Data stream storage method and device and computer storage medium | |
CN110019357B (en) | Database query script generation method and device | |
CN110851437A (en) | Storage method, device and equipment | |
CN112579591B (en) | Data verification method, device, electronic equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication |
Application publication date: 20200922 |