CN111694693A - Data stream storage method and device and computer storage medium - Google Patents

Data stream storage method and device and computer storage medium

Info

Publication number
CN111694693A
Authority
CN
China
Prior art keywords
data
data stream
partition
storage
keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910184336.0A
Other languages
Chinese (zh)
Inventor
唐英荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jingzan Rongxuan Technology Co ltd
Original Assignee
Shanghai Jingzan Rongxuan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jingzan Rongxuan Technology Co ltd
Priority to CN201910184336.0A
Publication of CN111694693A
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data

Abstract

A data stream storage method, apparatus, and computer storage medium. The method comprises: acquiring a data stream; determining keywords of data in the data stream; distributing the data into partitions according to the keywords of the data; and storing the data in each partition. With this scheme, data loss caused by untimely processing can be avoided, and when data storage fails, the partition holding the data can easily be located from the keywords of the data, so that the corresponding data can be recovered.

Description

Data stream storage method and device and computer storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a data stream storage method and apparatus, and a computer storage medium.
Background
In data processing, a data stream consists of multiple sets of data that are generated continuously and in large quantities, so new data keeps arriving with the stream. If the data in the data stream is not processed or stored in a timely manner, it will be lost.
In the prior art, a scheme for processing the data stream is to directly store data in the data stream.
However, with this scheme, when data storage fails, the sheer volume of data in the data stream makes it difficult to determine where a given piece of data was stored, which hampers data recovery.
Disclosure of Invention
The invention solves the technical problem of difficult data recovery.
To solve the foregoing technical problem, an embodiment of the present invention provides a data stream storage method, including: acquiring a data stream; determining keywords of data in the data stream; distributing the data to partitions according to the keywords of the data; and storing the data in each partition.
Optionally, a data stream consisting of a plurality of pieces of data is acquired through Kafka.
Optionally, a Hash algorithm is used to calculate the keyword of each piece of data.
Optionally, according to a preset number of partitions, the partition sequence number corresponding to the data is calculated from the keyword of the data by using a Hash modulo algorithm.
Optionally, the data in each partition is serialized, and the serialized data is stored.
Optionally, snapshot storage is performed on the data in each partition.
The present invention also provides a data stream storage apparatus, comprising: an acquisition unit, configured to acquire a data stream; a determining unit, configured to determine keywords of data in the data stream; an allocation unit, configured to allocate the data to partitions according to the keywords of the data; and a storage unit, configured to store the data in each partition.
Optionally, the acquisition unit is further configured to acquire, through Kafka, a data stream consisting of a plurality of pieces of data.
Optionally, the determining unit is further configured to calculate the keyword of each piece of data by using a Hash algorithm.
Optionally, the allocation unit is further configured to calculate, according to a preset number of partitions, the partition sequence number corresponding to the data from the keyword of the data by using a Hash modulo algorithm.
Optionally, the storage unit is further configured to serialize the data in each partition and to store the serialized data.
Optionally, the storage unit is further configured to perform snapshot storage on the data in each partition.
The present invention also provides a computer-readable storage medium on which computer instructions are stored, where the computer-readable storage medium is a non-volatile storage medium or a non-transitory storage medium, and the computer instructions, when executed, perform the steps of any one of the above data stream storage methods.
The invention also provides a data stream storage device comprising a memory and a processor, where the memory stores computer instructions and the processor, when executing the computer instructions, performs the steps of any one of the above data stream storage methods.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
A data stream is acquired; keywords of data in the data stream are determined; the data is distributed into partitions according to the keywords of the data; and the data in each partition is stored. With this scheme, data loss caused by untimely processing can be avoided, and when data storage fails, the partition holding the data can easily be located from the keywords of the data, so that the corresponding data can be recovered.
Drawings
Fig. 1 is a schematic flow chart of a data stream storage method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a data stream storage device according to an embodiment of the present invention.
Detailed Description
In the prior art, one scheme for processing a data stream is to store the data in the data stream directly. However, with this scheme, when data storage fails, the sheer volume of data in the data stream makes it difficult to determine where a given piece of data was stored, which hampers data recovery.
In the embodiment of the invention, a data stream is acquired; keywords of data in the data stream are determined; the data is distributed into partitions according to the keywords of the data; and the data in each partition is stored. With this scheme, data loss caused by untimely processing can be avoided, and when data storage fails, the partition holding the data can easily be located from the keywords of the data, so that the corresponding data can be recovered.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Fig. 1 shows a schematic flow chart of a data stream storage method according to an embodiment of the present invention; the specific steps are described in detail below.
Step S101, a data stream is acquired.
In a specific implementation, as data is continuously generated, the generated pieces of data constitute a data stream. Typically, the amount of data in a data stream is large. Classifying, processing, and analyzing that data in real time places a heavy burden on computing power; however, if the data stream is not handled at all, the data is lost. Therefore, in the embodiment of the invention, the data stream can be acquired and stored, so that the data in it is not lost.
In the embodiment of the invention, a data stream consisting of a plurality of pieces of data can be acquired through Kafka.
In a specific implementation, Kafka is a high-throughput distributed data platform that can process the data generated by a variety of data sources. Using Kafka as the platform for acquiring the data stream has the advantage that its high data throughput can cope with a large amount of continuously generated data, which helps avoid data loss or system failure.
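The acquisition step can be sketched without committing to a particular client library. The snippet below is only an illustration under stated assumptions: `acquire_stream` and `source` are hypothetical names, and an in-memory list stands in for the Kafka topic. In a real deployment, a Kafka consumer object would play the role of `source`.

```python
from typing import Iterable, Iterator


def acquire_stream(source: Iterable[bytes]) -> Iterator[bytes]:
    """Yield records one at a time as they arrive from the source.

    `source` is a stand-in for a message-queue consumer (assumption);
    any iterable of raw records works for this sketch.
    """
    for record in source:
        # Skip empty messages so later steps always receive real data.
        if record:
            yield record


# An in-memory list stands in for a Kafka topic in this example.
incoming = [b"order:1001", b"", b"order:1002"]
records = list(acquire_stream(incoming))
```

Because the function is a generator, records are handed to the next step as they arrive rather than after the whole stream has been read.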
Step S102, determining keywords of data in the data stream.
In a specific implementation, the keyword of a piece of data in the data stream can serve as an identifier that distinguishes it from other data. Determining the keywords of the data in the data stream improves the efficiency of data queries and avoids the situation in which the storage location of the data is difficult to determine.
In a specific implementation, the keyword of a piece of data may be identification information that characterizes the data itself, or identification information that serves to distinguish the data by virtue of an association established with it. In a specific application, the user can configure this according to the actual application scenario.
In the embodiment of the invention, the keywords of each piece of data can be calculated by using a Hash algorithm.
In a specific implementation, a Hash algorithm compresses data of any length into a fixed-length data digest, and different data generally yield different digests (collisions are possible but rare for a well-chosen Hash function). The data digest calculated by the Hash algorithm can therefore be used to identify data quickly, which improves the efficiency of data queries and avoids the situation in which the storage location of the data is difficult to determine.
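As a minimal sketch of this step, the digest of each record can be computed with Python's standard `hashlib` module. The patent only says "a Hash algorithm", so the choice of SHA-256 and the function name `keyword_of` are assumptions made for illustration.

```python
import hashlib


def keyword_of(record: bytes) -> str:
    """Return a fixed-length hex digest used as the record's keyword.

    SHA-256 is an assumed choice; the source does not name an algorithm.
    """
    return hashlib.sha256(record).hexdigest()


# Inputs of very different lengths produce digests of the same length.
short_key = keyword_of(b"a")
long_key = keyword_of(b"a" * 10_000)
```

The same record always produces the same keyword, which is what allows the keyword to be recomputed later when the data has to be located.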
Step S103, distributing the data to partitions according to the keywords of the data.
In a specific implementation, because the data volume of a data stream is usually huge, the data in the data stream can be allocated to different partitions and then processed partition by partition. This reduces the data processing pressure on each partition node and helps avoid system faults.
In a specific implementation, the keyword of the data can be used as the criterion for assigning the data to a partition. During a data query, the partition holding the data can then be determined from the keyword of the data before the query proceeds within that partition, which improves query efficiency and avoids the situation in which the storage location of the data is difficult to determine.
In a specific implementation, a partition may be a data processing node at a software level or a data processor at a hardware level.
In the embodiment of the invention, according to the preset number of partitions, a Hash modulo algorithm can be used to calculate the partition sequence number corresponding to the data from the keyword of the data.
In a specific implementation, the Hash modulo algorithm establishes a mapping between data and the preset partitions according to the Hash value of the data, that is, the data digest calculated by the Hash algorithm. Each partition is represented in the mapping by its partition sequence number, so the mapping is in effect between the Hash value of the data and a partition sequence number.
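The mapping from Hash value to partition sequence number can be written as p = h mod N, where h is the Hash value read as an integer and N is the preset number of partitions. The sketch below assumes the keyword is a hexadecimal digest string; `partition_number` and the value of `NUM_PARTITIONS` are illustrative names and values, not taken from the patent.

```python
import hashlib


def partition_number(keyword: str, num_partitions: int) -> int:
    """Map a hex-digest keyword to a partition number in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Interpret the hex digest as an integer, then take it modulo N.
    return int(keyword, 16) % num_partitions


NUM_PARTITIONS = 4  # preset number of partitions (assumed value)
keyword = hashlib.sha256(b"order:1001").hexdigest()
index = partition_number(keyword, NUM_PARTITIONS)
```

One consequence of plain modulo mapping is that changing the number of partitions reassigns most keywords, which is why the partition count is fixed in advance here.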
Step S104, storing the data in each partition.
In the embodiment of the present invention, before storing the data in each partition, the data in each partition may be serialized.
In a specific implementation, serialization is the process of converting the state information of an object into a form that can be stored or transmitted. Serializing the data therefore makes it easier to store.
In a specific implementation, once the data has been stored in serialized form, it can be restored by deserialization when it is read back.
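A round trip through serialization and deserialization might look like the following. The patent does not name a serialization format, so JSON and the record layout used here are assumptions chosen for readability.

```python
import json
from typing import Dict, List


def serialize(records: List[Dict]) -> bytes:
    """Convert a partition's records into bytes suitable for storage."""
    return json.dumps(records, sort_keys=True).encode("utf-8")


def deserialize(payload: bytes) -> List[Dict]:
    """Rebuild the records from their stored byte form."""
    return json.loads(payload.decode("utf-8"))


# Hypothetical records held in one partition.
partition_data = [{"id": 1001, "amount": 25.5}, {"id": 1002, "amount": 7.0}]
payload = serialize(partition_data)
restored = deserialize(payload)
```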
In the embodiment of the present invention, when the data in each partition is stored, snapshot storage may be performed on the data in each partition.
In a specific implementation, snapshots allow data to be stored quickly enough to keep up with a large amount of continuously generated data, which avoids data loss caused by untimely processing.
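The patent does not specify how a snapshot is taken. One possible reading, sketched below, treats a snapshot as a point-in-time copy of a partition written to its own file. The file naming scheme, the JSON format, and the function names are all assumptions.

```python
import json
import os
import tempfile
from typing import Dict, List


def write_snapshot(directory: str, partition: int, records: List[Dict]) -> str:
    """Write one partition's records to a snapshot file and return its path."""
    final_path = os.path.join(directory, f"partition-{partition}.snapshot")
    # Write to a temporary file first, then rename it, so a crash mid-write
    # never leaves a half-written snapshot under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(records, handle)
    os.replace(tmp_path, final_path)
    return final_path


def read_snapshot(path: str) -> List[Dict]:
    """Load the records stored in a snapshot file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


snapshot_dir = tempfile.mkdtemp()
path = write_snapshot(snapshot_dir, 2, [{"id": 1001}])
loaded = read_snapshot(path)
```

Keeping one file per partition means a failure affecting one snapshot leaves the other partitions readable.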
In summary, a data stream is acquired; keywords of data in the data stream are determined; the data is distributed into partitions according to the keywords of the data; and the data in each partition is stored. With this scheme, data loss caused by untimely processing can be avoided, and when data storage fails, the partition holding the data can easily be located from the keyword of the data, so that the corresponding data can be recovered.
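The recovery benefit comes from the fact that the partition can be recomputed from the keyword alone, so only one partition has to be searched. The following sketch uses an in-memory dictionary as a stand-in for the stored partitions; SHA-256, the partition count, and all names are assumptions.

```python
import hashlib
from collections import defaultdict
from typing import Dict

NUM_PARTITIONS = 4  # assumed preset partition count


def keyword_of(record: bytes) -> str:
    """Keyword of a record: its SHA-256 hex digest (assumed algorithm)."""
    return hashlib.sha256(record).hexdigest()


def partition_of(keyword: str) -> int:
    """Partition number derived from the keyword by Hash modulo."""
    return int(keyword, 16) % NUM_PARTITIONS


# Storage stand-in: partition number -> {keyword: record}.
store: Dict[int, Dict[str, bytes]] = defaultdict(dict)

for record in [b"order:1001", b"order:1002", b"order:1003"]:
    key = keyword_of(record)
    store[partition_of(key)][key] = record


def recover(keyword: str) -> bytes:
    """Locate the partition from the keyword, then read only that partition."""
    return store[partition_of(keyword)][keyword]


wanted = keyword_of(b"order:1002")
recovered = recover(wanted)
```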
Referring to Fig. 2, a schematic structural diagram of a data stream storage device 20 according to an embodiment of the present invention is shown. The device specifically includes: an acquisition unit 201, configured to acquire a data stream; a determining unit 202, configured to determine keywords of data in the data stream; an allocation unit 203, configured to allocate the data to partitions according to the keywords of the data; and a storage unit 204, configured to store the data in each partition.
In this embodiment of the present invention, the acquisition unit 201 may be further configured to acquire, through Kafka, a data stream consisting of a plurality of pieces of data.
In this embodiment of the present invention, the determining unit 202 may be further configured to calculate the keyword of each piece of data by using a Hash algorithm.
In this embodiment of the present invention, the allocation unit 203 may be further configured to calculate, according to a preset number of partitions, the partition sequence number corresponding to the data from the keyword of the data by using a Hash modulo algorithm.
In this embodiment of the present invention, the storage unit 204 may be further configured to serialize the data in each partition and to store the serialized data.
In this embodiment of the present invention, the storage unit 204 may be further configured to perform snapshot storage on the data in each partition.
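The four-unit structure can be mirrored in software by giving each unit a pluggable role. The class below is a hypothetical sketch rather than the device of Fig. 2: the class name, field names, and the toy keyword function (record length) are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class DataStreamStorageDevice:
    acquire: Callable[[], Iterable[bytes]]  # role of acquisition unit 201
    determine: Callable[[bytes], int]       # role of determining unit 202
    allocate: Callable[[int], int]          # role of allocation unit 203
    # Role of storage unit 204: partition number -> records.
    stored: Dict[int, List[bytes]] = field(default_factory=dict)

    def run(self) -> None:
        """Pass every acquired record through the remaining three units."""
        for record in self.acquire():
            keyword = self.determine(record)
            partition = self.allocate(keyword)
            self.stored.setdefault(partition, []).append(record)


device = DataStreamStorageDevice(
    acquire=lambda: [b"a", b"bb", b"ccc"],
    determine=len,             # toy keyword: record length (assumption)
    allocate=lambda k: k % 2,  # two partitions
)
device.run()
```

Swapping `determine` for a real Hash function or `acquire` for a message-queue consumer changes one field without touching the other units.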
The present invention also provides a computer-readable storage medium on which computer instructions are stored, where the computer-readable storage medium is a non-volatile storage medium or a non-transitory storage medium, and the computer instructions, when executed, perform the steps of the data stream storage method provided by the embodiment of the present invention.
The invention also provides a data stream storage device comprising a memory and a processor, where the memory stores computer instructions and the processor, when executing the computer instructions, performs the steps of the data stream storage method provided by the embodiment of the invention.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by instructing the relevant hardware through a program, which may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, and the like.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (14)

1. A method for storing a data stream, comprising:
acquiring a data stream;
determining keywords of data in the data stream;
distributing the data to partitions according to the keywords of the data;
and storing the data in each partition.
2. The data stream storage method of claim 1, wherein acquiring the data stream comprises:
acquiring, through Kafka, a data stream consisting of a plurality of pieces of data.
3. The data stream storage method of claim 1, wherein determining the keywords of the data in the data stream comprises:
calculating the keyword of each piece of data by using a Hash algorithm.
4. The data stream storage method according to claim 3, wherein distributing the data to partitions according to the keywords of the data comprises:
calculating, according to a preset number of partitions, the partition sequence number corresponding to the data from the keyword of the data by using a Hash modulo algorithm.
5. The data stream storage method according to claim 1, wherein storing the data in each partition comprises:
serializing the data in each partition; and
storing the serialized data.
6. The data stream storage method according to claim 1, wherein storing the data in each partition comprises:
performing snapshot storage on the data in each partition.
7. A data stream storage device, comprising:
an acquisition unit, configured to acquire a data stream;
a determining unit, configured to determine keywords of data in the data stream;
an allocation unit, configured to allocate the data to partitions according to the keywords of the data; and
a storage unit, configured to store the data in each partition.
8. The data stream storage device according to claim 7, wherein the acquisition unit is further configured to acquire, through Kafka, a data stream consisting of a plurality of pieces of data.
9. The data stream storage device of claim 7, wherein the determining unit is further configured to calculate the keyword of each piece of data by using a Hash algorithm.
10. The data stream storage device according to claim 9, wherein the allocation unit is further configured to calculate, according to a preset number of partitions, the partition sequence number corresponding to the data from the keyword of the data by using a Hash modulo algorithm.
11. The data stream storage device of claim 7, wherein the storage unit is further configured to serialize the data in each partition and to store the serialized data.
12. The data stream storage device according to claim 7, wherein the storage unit is further configured to perform snapshot storage on the data in each partition.
13. A computer readable storage medium having stored thereon computer instructions, the computer readable storage medium being a non-volatile storage medium or a non-transitory storage medium, wherein the computer instructions when executed perform the steps of the data stream storage method according to any one of claims 1 to 6.
14. A data stream storage device comprising a memory and a processor, the memory having stored thereon computer instructions, wherein the processor performs the steps of the data stream storage method of any one of claims 1 to 6 when the computer instructions are executed.
CN201910184336.0A 2019-03-12 2019-03-12 Data stream storage method and device and computer storage medium Withdrawn CN111694693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910184336.0A CN111694693A (en) 2019-03-12 2019-03-12 Data stream storage method and device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910184336.0A CN111694693A (en) 2019-03-12 2019-03-12 Data stream storage method and device and computer storage medium

Publications (1)

Publication Number Publication Date
CN111694693A (en) 2020-09-22

Family

ID=72474820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910184336.0A Withdrawn CN111694693A (en) 2019-03-12 2019-03-12 Data stream storage method and device and computer storage medium

Country Status (1)

Country Link
CN (1) CN111694693A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136114A (en) * 2011-11-30 2013-06-05 华为技术有限公司 Storage method and storage device
CN103229151A (en) * 2012-12-27 2013-07-31 华为技术有限公司 Partition extension method and device
CN103548022A (en) * 2011-03-28 2014-01-29 思杰系统有限公司 Systems and methods of UTF-8 pattern matching
CN103838770A (en) * 2012-11-26 2014-06-04 中国移动通信集团北京有限公司 Logic data partition method and system
CN104133661A (en) * 2014-07-30 2014-11-05 西安电子科技大学 Multi-core parallel hash partitioning optimizing method based on column storage
CN106488055A (en) * 2015-08-28 2017-03-08 华为软件技术有限公司 Calling list rearrangement method, back end equipment and routing node device
CN107015872A (en) * 2016-12-09 2017-08-04 上海壹账通金融科技有限公司 The processing method and processing device of monitoring data
CN107633001A (en) * 2017-08-03 2018-01-26 北京空间科技信息研究所 Hash partition optimization method and device
US20180129579A1 (en) * 2016-11-10 2018-05-10 Nec Laboratories America, Inc. Systems and Methods with a Realtime Log Analysis Framework
WO2019019056A1 (en) * 2017-07-26 2019-01-31 杭州复杂美科技有限公司 Method for frontal machine to participate in block chain consensus

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103548022A (en) * 2011-03-28 2014-01-29 思杰系统有限公司 Systems and methods of UTF-8 pattern matching
CN103136114A (en) * 2011-11-30 2013-06-05 华为技术有限公司 Storage method and storage device
CN103838770A (en) * 2012-11-26 2014-06-04 中国移动通信集团北京有限公司 Logic data partition method and system
CN103229151A (en) * 2012-12-27 2013-07-31 华为技术有限公司 Partition extension method and device
CN104133661A (en) * 2014-07-30 2014-11-05 西安电子科技大学 Multi-core parallel hash partitioning optimizing method based on column storage
CN106488055A (en) * 2015-08-28 2017-03-08 华为软件技术有限公司 Calling list rearrangement method, back end equipment and routing node device
US20180129579A1 (en) * 2016-11-10 2018-05-10 Nec Laboratories America, Inc. Systems and Methods with a Realtime Log Analysis Framework
CN107015872A (en) * 2016-12-09 2017-08-04 上海壹账通金融科技有限公司 The processing method and processing device of monitoring data
WO2019019056A1 (en) * 2017-07-26 2019-01-31 杭州复杂美科技有限公司 Method for frontal machine to participate in block chain consensus
CN107633001A (en) * 2017-08-03 2018-01-26 北京空间科技信息研究所 Hash partition optimization method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘春颖; 张晓芬; 张悦: "Analysis of the consistent hashing algorithm in database cluster research" *

Similar Documents

Publication Publication Date Title
US10176208B2 (en) Processing time series data from multiple sensors
CN108829610B (en) Memory management method and device in neural network forward computing process
CN107832406B (en) Method, device, equipment and storage medium for removing duplicate entries of mass log data
US11347787B2 (en) Image retrieval method and apparatus, system, server, and storage medium
CN106980623B (en) Data model determination method and device
CN110147407B (en) Data processing method and device and database management server
CN105517644B (en) Data partitioning method and equipment
CN106874281B (en) Method and device for realizing database read-write separation
CN109376196B (en) Method and device for batch synchronization of redo logs
US8898677B2 (en) Data arrangement calculating system, data arrangement calculating method, master unit and data arranging method
US11567940B1 (en) Cache-aware system and method for identifying matching portions of two sets of data in a multiprocessor system
CN109063005B (en) Data migration method and system, storage medium and electronic device
WO2014021978A4 (en) Aggregating data in a mediation system
CN110647531A (en) Data synchronization method, device, equipment and computer readable storage medium
CN111966631A (en) Mirror image file generation method, system, equipment and medium capable of being rapidly distributed
CN107656796B (en) Virtual machine cold migration method, system and equipment
CN111522811A (en) Database processing method and device, storage medium and terminal
CN113268328A (en) Batch processing method and device, computer equipment and storage medium
CN111026736B (en) Data blood margin management method and device and data blood margin analysis method and device
CN110222046B (en) List data processing method, device, server and storage medium
CN110765125B (en) Method and device for storing data
CN111694693A (en) Data stream storage method and device and computer storage medium
CN110019357B (en) Database query script generation method and device
CN110851437A (en) Storage method, device and equipment
CN112579591B (en) Data verification method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200922