CN117632029A - Data storage method and device of kafka - Google Patents

Publication number
CN117632029A
Authority
CN
China
Prior art keywords
data
partition
information
written
disk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311660517.9A
Other languages
Chinese (zh)
Inventor
李博
赖鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN202311660517.9A
Publication of CN117632029A

Abstract

The invention discloses a data storage method and device of kafka, and relates to the technical field of data storage. The method comprises the following steps: dividing a server disk into two disk partitions, wherein the first disk partition is used for storing the topic information and partition information of kafka, and the second disk partition is used for storing data in a bare disk mode; each partition in the first disk partition stores description information, wherein the description information comprises position information and storage state information of a recordbatch; sorting all partitions according to the position information of the recordbatches to form a partition sequence; and determining a suitable partition from the partition sequence, and writing the data into the storage block corresponding to the suitable partition. The invention can improve the data read/write performance of kafka.

Description

Data storage method and device of kafka
Technical Field
The invention relates to the technical field of data storage, in particular to a data storage method and device of kafka.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
In kafka, data is organized in units of topics: data of different topics belongs to different logical units and is ultimately stored in different physical units. To improve concurrency, the data of one topic is divided into a series of partitions, and data of the same topic is written into different partitions. For each partition of a topic, the data of that partition is described by a series of files.
Because kafka is a high-speed messaging system, improving performance as much as possible is a goal it continually pursues, and improving data storage performance on the critical path therefore improves the overall performance of the system. In the current storage mode, kafka relies heavily on the local file system for sequential reads and writes of large blocks of data, so its read/write performance is limited by the access efficiency of the local file system; that is, the performance overhead of the file system reduces the storage performance of kafka data. In particular, the way the file system handles large blocks of data, together with file system mechanisms such as metadata, journaling, file layout and data recovery, affects the read/write performance of kafka to varying degrees.
Disclosure of Invention
The embodiment of the invention provides a data storage method of kafka, which is used for improving the data reading and writing performance of the kafka, and comprises the following steps:
dividing a server disk into a first disk partition and a second disk partition; the first disk partition is used for storing the topic information of kafka and the partition information of each topic, and the second disk partition is used for storing data in a bare disk mode;
receiving configuration information, wherein the configuration information is used for creating a topic and the partitions of the topic in the first disk partition, and dividing recordbatch storage blocks in the second disk partition;
according to the configuration information, creating the topic and the partitions of the topic in the first disk partition, and dividing the recordbatch storage blocks in the second disk partition; each partition in the first disk partition stores description information, wherein the description information comprises position information of the recordbatch corresponding to the partition and storage state information of the recordbatch corresponding to the partition;
sorting all partitions according to the position information of the recordbatches corresponding to the partitions to form a partition sequence;
receiving data to be written; the data to be written carries attribute information;
determining a suitable partition in the partition sequence according to the attribute information carried by the data to be written and the storage state information of the recordbatch corresponding to each partition in the partition sequence; the recordbatch corresponding to the suitable partition is used for storing the data to be written;
and writing the data to be written into the recordbatch corresponding to the suitable partition in the second disk partition.
The embodiment of the invention also provides a data storage device of kafka, which is used for improving the data reading and writing performance of kafka, and comprises the following components:
the disk dividing module is used for dividing a server disk into a first disk partition and a second disk partition; the first disk partition is used for storing the topic information of kafka and the partition information of each topic, and the second disk partition is used for storing data in a bare disk mode;
the disk partition processing module is used for receiving configuration information, wherein the configuration information is used for creating a topic and the partitions of the topic in the first disk partition and dividing recordbatch storage blocks in the second disk partition; and, according to the configuration information, creating the topic and the partitions of the topic in the first disk partition, and dividing the recordbatch storage blocks in the second disk partition; each partition in the first disk partition stores description information, wherein the description information comprises position information of the recordbatch corresponding to the partition and storage state information of the recordbatch corresponding to the partition;
the data writing module is used for sorting all partitions according to the position information of the recordbatches corresponding to the partitions to form a partition sequence; receiving data to be written; determining a suitable partition in the partition sequence according to the attribute information carried by the data to be written and the storage state information of the recordbatch corresponding to each partition in the partition sequence; and writing the data to be written into the recordbatch corresponding to the suitable partition in the second disk partition; the data to be written carries attribute information; and the recordbatch corresponding to the suitable partition is used for storing the data to be written.
The embodiment of the invention also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the data storage method of kafka when executing the computer program.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described data storage method of kafka.
The embodiment of the invention also provides a computer program product, which comprises a computer program, and the computer program realizes the data storage method of kafka when being executed by a processor.
In the embodiment of the invention, a server disk is divided into a first disk partition and a second disk partition; the first disk partition is used for storing the topic information of kafka and the partition information of each topic, and the second disk partition is used for storing data in a bare disk mode; configuration information is received, wherein the configuration information is used for creating a topic and the partitions of the topic in the first disk partition, and dividing recordbatch storage blocks in the second disk partition; according to the configuration information, the topic and the partitions of the topic are created in the first disk partition, and the recordbatch storage blocks are divided in the second disk partition; each partition in the first disk partition stores description information, wherein the description information comprises position information of the recordbatch corresponding to the partition and storage state information of the recordbatch corresponding to the partition; all partitions are sorted according to the position information of the recordbatches corresponding to the partitions to form a partition sequence; data to be written is received, and the data to be written carries attribute information; a suitable partition is determined in the partition sequence according to the attribute information carried by the data to be written and the storage state information of the recordbatch corresponding to each partition in the partition sequence; the recordbatch corresponding to the suitable partition is used for storing the data to be written; and the data to be written is written into the recordbatch corresponding to the suitable partition in the second disk partition. In the embodiment of the invention, the server disk is divided into two parts: one part is used for storing the topic information of kafka and the partition information of each topic, and the other part is used for directly storing data in a bare disk mode.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
FIG. 1 is a flow chart of a data storage method of kafka in an embodiment of the invention;
FIG. 2 is a block diagram of a data storage method of kafka according to an embodiment of the present invention;
FIG. 3 is a block diagram of a data storage method of kafka according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a data storage device of kafka in an embodiment of the invention;
FIG. 5 is a schematic diagram of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.
The data acquisition, storage, use, processing and the like in the technical scheme meet the relevant regulations of national laws and regulations.
The applicant finds that the current storage mode of kafka relies heavily on the local file system for sequential reads and writes of large blocks of data, so its read/write performance is limited by the access efficiency of the local file system; that is, the performance overhead of the file system reduces the storage performance of kafka data. Specifically, the way the file system handles large blocks of data, together with file system mechanisms such as metadata, journaling, file layout and data recovery, affects the read/write performance of kafka to varying degrees. To this end, the applicant proposes a data storage method of kafka.
FIG. 1 is a flow chart of a data storage method of kafka according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
step 101, dividing a server disk into a first disk partition and a second disk partition; the first disk partition is used for storing the topic information of kafka and the partition information of each topic, and the second disk partition is used for storing data in a bare disk mode;
step 102, receiving configuration information, wherein the configuration information is used for creating a topic and the partitions of the topic in the first disk partition, and dividing recordbatch storage blocks in the second disk partition;
step 103, creating the topic and the partitions of the topic in the first disk partition according to the configuration information, and dividing the recordbatch storage blocks in the second disk partition; each partition in the first disk partition stores description information, wherein the description information comprises position information of the recordbatch corresponding to the partition and storage state information of the recordbatch corresponding to the partition;
step 104, sorting all partitions according to the position information of the recordbatches corresponding to the partitions to form a partition sequence;
step 105, receiving data to be written; the data to be written carries attribute information;
step 106, determining a suitable partition in the partition sequence according to the attribute information carried by the data to be written and the storage state information of the recordbatch corresponding to each partition in the partition sequence; the recordbatch corresponding to the suitable partition is used for storing the data to be written;
step 107, writing the data to be written into the recordbatch corresponding to the suitable partition in the second disk partition.
As can be seen from the flow shown in FIG. 1, in the embodiment of the invention, the server disk is divided into two parts: one part is used for storing the topic information of kafka and the partition information of each topic, and the other part is used for directly storing data in a bare disk mode.
The data storage method of kafka in the embodiment of the present invention is explained in detail below.
The data storage method of kafka in the embodiment of the invention improves on the existing storage mode of kafka by means of distributed technology, and is specially designed around the storage characteristics of kafka. In implementation, the same disk division instruction can be applied to all servers in the distributed system, so that each server disk is divided into two disk partitions. The distributed system comprises a plurality of servers, including servers that run services, servers that implement the data storage method of kafka in the embodiment of the invention, remotely operable devices controlled by developers, and the like.
Configuration information is then received; the configuration information may be sent remotely by a user. The configuration information is mainly used for creating a topic and the partitions of the topic in the first disk partition of the server, and dividing recordbatch storage blocks in the second disk partition of the server.
According to the configuration information, the topic and the partitions of the topic are created in the first disk partition, and the recordbatch storage blocks are divided in the second disk partition; each partition in the first disk partition stores description information, and the description information includes position information of the recordbatch corresponding to the partition and storage state information of the recordbatch corresponding to the partition.
FIG. 2 is a schematic diagram of an embodiment of the data storage method of kafka according to the present invention. As shown in FIG. 2, the first disk partition stores data in the form of a rocksdb database, and the second disk partition is a bare disk. The first disk partition uses the rocksdb database to store the topic information and the partition information. The second disk partition is managed entirely by the database, and data is stored on it directly in a bare disk mode.
The information held by the rocksdb database is typically structural metadata, including but not limited to: the topic description information topinfo, the partitions of each topic, the datainfo allocated to each partition, the position information and storage state information of the recordbatch used in each datainfo, the index information of each datainfo, segment information, and the like. The description information of each partition is stored in a bitmap data structure.
When recordbatch storage blocks are divided in the second disk partition, the bare disk is divided into fixed-size segment blocks according to the description information of the partitions, and each segment block is evenly divided into fixed-size recordbatch blocks. When writing to the bare disk, datainfo is filled one recordbatch at a time. This reduces the cost of managing bare-disk metadata and improves performance.
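Because segment blocks and recordbatch blocks both have fixed sizes, the byte offset of any recordbatch on the bare disk can be computed arithmetically rather than looked up. The following is a minimal sketch of that calculation; the class name RawDiskLayout, the method offset_of and the concrete sizes are illustrative assumptions, not values given in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawDiskLayout:
    """Hypothetical fixed-size layout: segments split evenly into recordbatch blocks."""
    segment_size: int      # bytes per segment block (assumed value)
    recordbatch_size: int  # bytes per recordbatch block (assumed value)

    def __post_init__(self) -> None:
        # "Evenly divided" is taken to mean the sizes divide without remainder.
        if self.segment_size % self.recordbatch_size != 0:
            raise ValueError("segment size must be a multiple of recordbatch size")

    @property
    def blocks_per_segment(self) -> int:
        return self.segment_size // self.recordbatch_size

    def offset_of(self, segment_index: int, block_index: int) -> int:
        """Byte offset of a recordbatch block from the start of the bare disk."""
        if not 0 <= block_index < self.blocks_per_segment:
            raise IndexError("block index outside segment")
        return segment_index * self.segment_size + block_index * self.recordbatch_size


# Assumed sizes for illustration only: 1 GiB segments, 1 MiB recordbatch blocks.
layout = RawDiskLayout(segment_size=1 << 30, recordbatch_size=1 << 20)
```

With such a layout, the position information kept in the description information could be as small as a (segment index, block index) pair.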
After the server disk is ready, operations such as reading, writing, deleting and the like of data are performed based on the storage mode of kafka.
When writing data, all partitions are sorted according to the position information of the recordbatches corresponding to the partitions to form a partition sequence. Data to be written is received; the data to be written carries attribute information and may be written into a cache at this point. A suitable partition is determined in the partition sequence according to the attribute information carried by the data to be written and the storage state information of the recordbatch corresponding to each partition in the partition sequence. The data to be written is then written from the cache into the recordbatch corresponding to the suitable partition in the second disk partition.
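The write path can be illustrated with simple stand-ins. In the sketch below, an ordinary temporary file stands in for the bare disk partition and a Python dict stands in for the metadata database; the block size, the key "topicA/0" and the helper names write_block and read_block are all assumptions made for illustration, and a real implementation would open the block device itself and use the database described above.

```python
import os
import tempfile

BLOCK = 4096  # assumed recordbatch block size in bytes


def write_block(dev_path: str, block_index: int, payload: bytes) -> None:
    """Write payload at the fixed offset of the given block."""
    if len(payload) > BLOCK:
        raise ValueError("payload does not fit in one block")
    with open(dev_path, "r+b") as dev:
        dev.seek(block_index * BLOCK)
        dev.write(payload)
        dev.flush()
        os.fsync(dev.fileno())  # push the bytes to stable storage


def read_block(dev_path: str, block_index: int, length: int) -> bytes:
    """Read length bytes from the start of the given block."""
    with open(dev_path, "rb") as dev:
        dev.seek(block_index * BLOCK)
        return dev.read(length)


# A regular file stands in for the second (bare) disk partition.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.truncate(BLOCK * 4)
    dev_path = tmp.name

metadata = {}            # stand-in for the metadata database
cache = b"hello kafka"   # data to be written, held in the cache first

write_block(dev_path, 2, cache)
metadata["topicA/0"] = {"block": 2, "used": len(cache)}

round_trip = read_block(dev_path, 2, metadata["topicA/0"]["used"])
os.remove(dev_path)
```

The point of the sketch is the ordering: the payload goes to a computed offset first, and the description information is updated afterwards.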
In the embodiment of the invention, a correspondence or association between the attribute information of the data to be written and the storage state information of the recordbatch can be established in advance; a suitable partition is determined from the partition sequence according to this pre-established correspondence or association, and the data is then written. Compared with the prior art, the method is more flexible, extensible and efficient.
In an embodiment, the attribute information carried by the data to be written may include the data size of the data to be written, and the storage state information of the recordbatch corresponding to a partition may include the remaining storage space size of the recordbatch corresponding to the partition;
determining a suitable partition in the partition sequence according to the attribute information carried by the data to be written and the storage state information of the recordbatch corresponding to each partition in the partition sequence may include:
determining the first partition in the partition sequence;
when the data size of the data to be written is smaller than or equal to the remaining storage space size of the recordbatch corresponding to the first partition, determining the first partition as the suitable partition;
when the data size of the data to be written is larger than the remaining storage space size of the recordbatch corresponding to the first partition, checking in turn the remaining storage space size of the recordbatch corresponding to each subsequent partition in the partition sequence, and when the remaining storage space size of the recordbatch corresponding to a partition is larger than the data size of the data to be written, determining that partition as the suitable partition.
In this example, before data is written, the data size of the data to be written is first compared with the remaining storage space size of the recordbatch corresponding to the partition, so that the data to be written can be written completely into one recordbatch block.
In the embodiment of the invention, all partitions are sorted in advance according to their storage positions on the bare disk, and a suitable partition is determined in the partition sequence according to the data size of the data to be written and the remaining storage space size of the recordbatch corresponding to each partition, so that the data storage performance of kafka is improved.
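The selection rule described above can be sketched as a sort followed by a linear scan. The PartitionInfo class and its field names are hypothetical; the comparison operators follow the wording of this description (less than or equal for the first partition, strictly larger for the partitions after it).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartitionInfo:
    """Hypothetical description information for one partition."""
    name: str
    position: int   # position of the partition's recordbatch on the bare disk
    remaining: int  # remaining storage space of that recordbatch, in bytes


def build_sequence(partitions: List[PartitionInfo]) -> List[PartitionInfo]:
    """Sort partitions by the position of their recordbatch."""
    return sorted(partitions, key=lambda p: p.position)


def choose_partition(sequence: List[PartitionInfo], data_size: int) -> Optional[PartitionInfo]:
    """Return the first partition that can hold the data, or None if none can."""
    if not sequence:
        return None
    first = sequence[0]
    if data_size <= first.remaining:
        return first
    for candidate in sequence[1:]:
        if candidate.remaining > data_size:
            return candidate
    return None


sequence = build_sequence([
    PartitionInfo("t-2", position=2, remaining=100),
    PartitionInfo("t-0", position=0, remaining=10),
    PartitionInfo("t-1", position=1, remaining=50),
])
chosen = choose_partition(sequence, data_size=40)
```

Here the first partition in the sequence has only 10 bytes left, so the scan moves on and settles on the partition at position 1.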
In implementation, if the data to be written is too large, it can be split into parts, and the parts can be marked and stored in sequence.
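One way to split and mark oversized data is sketched below. The marking scheme, a (part index, total parts) pair attached to each chunk, is an assumption; the description only states that the parts are marked and stored in sequence.

```python
from typing import List, Tuple


def split_payload(payload: bytes, block_size: int) -> List[Tuple[int, int, bytes]]:
    """Split payload into block-sized chunks, each marked (index, total, chunk)."""
    if block_size <= 0:
        raise ValueError("block size must be positive")
    # Ceiling division; an empty payload still yields one (empty) chunk.
    total = max(1, -(-len(payload) // block_size))
    return [
        (index, total, payload[index * block_size:(index + 1) * block_size])
        for index in range(total)
    ]


def join_chunks(chunks: List[Tuple[int, int, bytes]]) -> bytes:
    """Reassemble chunks by their index mark, regardless of arrival order."""
    return b"".join(chunk for _, _, chunk in sorted(chunks, key=lambda c: c[0]))


parts = split_payload(b"abcdefg", block_size=3)
```

Carrying the total in every mark lets a reader detect a missing part without consulting other metadata.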
In one embodiment, after writing the data to be written into the recordbatch corresponding to the suitable partition in the second disk partition, the method may further include:
updating the storage state information of the recordbatch corresponding to the suitable partition in the first disk partition according to the remaining storage space size of the recordbatch corresponding to the suitable partition.
In this embodiment, the storage state information of the recordbatch corresponding to each partition, as held in the rocksdb database, is kept accurate in real time, which avoids errors caused by insufficient storage space when writing data.
In one embodiment, the description information further includes flag information, where the flag information is used to indicate whether the recordbatch corresponding to the partition is able to store data;
updating the storage state information of the recordbatch corresponding to the suitable partition in the first disk partition according to the remaining storage space size of the recordbatch corresponding to the suitable partition may include:
when the remaining storage space size of the recordbatch corresponding to the suitable partition is zero, updating the flag information in the description information of the suitable partition to indicate that the recordbatch corresponding to the suitable partition cannot store data.
In this example, the description information of the partition is further refined. In implementation, the datainfo bitmap may be operated on: the preset bit in the datainfo bitmap of the partition whose recordbatch has no spare storage space is marked, for example by setting the bit to 1, to indicate that the recordbatch corresponding to the partition cannot store data.
Therefore, when writing data, whether storage space is available and how much space remains are checked in sequence before the data is stored, which avoids write errors and improves the stability of kafka data storage.
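The state update after a write can be sketched with a small bitmap, one bit per recordbatch, where 1 means the recordbatch cannot store data. The class DataInfoBitmap and the function record_write are hypothetical names; keeping the remaining sizes in a plain list is likewise an assumption made to keep the example short.

```python
class DataInfoBitmap:
    """One bit per recordbatch: 1 = cannot store data, 0 = can store data."""

    def __init__(self, block_count: int) -> None:
        self._count = block_count
        self._bits = bytearray((block_count + 7) // 8)

    def _check(self, index: int) -> None:
        if not 0 <= index < self._count:
            raise IndexError("recordbatch index out of range")

    def set_bit(self, index: int, value: bool) -> None:
        self._check(index)
        mask = 1 << (index % 8)
        if value:
            self._bits[index // 8] |= mask
        else:
            self._bits[index // 8] &= ~mask & 0xFF

    def get_bit(self, index: int) -> bool:
        self._check(index)
        return bool(self._bits[index // 8] & (1 << (index % 8)))


def record_write(remaining: list, bitmap: DataInfoBitmap, index: int, written: int) -> None:
    """Update remaining space after a write; flag the block once it is full."""
    if written > remaining[index]:
        raise ValueError("write exceeds remaining space")
    remaining[index] -= written
    if remaining[index] == 0:
        bitmap.set_bit(index, True)


remaining = [8, 8]
bitmap = DataInfoBitmap(block_count=2)
record_write(remaining, bitmap, 0, 8)  # fills block 0 completely
record_write(remaining, bitmap, 1, 3)  # leaves 5 bytes in block 1
```

A writer can then skip any block whose bit is set without reading its remaining size at all.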
For reading data, in one embodiment, the description information further includes data identification information of the data stored by the partition;
after receiving the data to be written, the method further comprises the following steps:
generating data identification information for the data to be written; the data identification information is used for identifying the data to be written and is generated from the data to be written according to a preset rule, wherein the preset rule generates data identification information from the attribute information of the data to be written;
and storing the data identification information of the data to be written in the description information of the suitable partition in the first disk partition.
Specifically, the attribute information of the data to be written may include the name of the service system from which the data to be written comes, address information of the server from which the data to be written comes, the time point at which the data to be written is written into the cache, the time point at which the data to be written is generated, and the like. The preset rule maps the attribute information of the data to be written to a mark, which may be English characters, a string of digits, and the like.
After the data to be written is stored, the data identification information of the data to be written is stored in the description information of the corresponding partition.
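A preset rule of this kind only needs to be deterministic, so that the same attribute information always yields the same mark. The sketch below uses a truncated SHA-256 digest over three attributes; the choice of hash, the attribute set, the 16-character length and the function name make_data_id are all assumptions, not the rule used by the invention.

```python
import hashlib


def make_data_id(system_name: str, server_address: str, cached_at: str) -> str:
    """Derive a short, repeatable mark from the attribute information."""
    canonical = "|".join((system_name, server_address, cached_at))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical description information for one partition.
descriptions = {"orders-0": {"data_ids": set()}}

data_id = make_data_id("billing", "10.0.0.7", "2023-12-05T10:00:00")
descriptions["orders-0"]["data_ids"].add(data_id)
```

Because the rule is repeatable, a later read request carrying the same attribute information can regenerate the mark and look it up.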
FIG. 3 shows a specific embodiment of the data storage method of kafka according to an embodiment of the present invention. As shown in FIG. 3, after writing the data to be written into the recordbatch corresponding to the suitable partition in the second disk partition, the method may further include:
step 301, receiving a data reading request; the data reading request comprises attribute information of the data to be read;
step 302, obtaining, according to the data reading request, the data identification information of the data to be read according to the preset rule;
step 303, sequentially reading the description information of each partition in the partition sequence according to the data identification information of the data to be read, to obtain a plurality of second partitions; in the description information of each second partition, the data identification information of the stored data is consistent with the data identification information of the data to be read;
step 304, reading data from the recordbatches corresponding to the second partitions.
In this embodiment, when reading data, the correct recordbatch block is quickly located according to the attribute information of the data to be read, which improves the efficiency of reading and searching data in kafka data storage.
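Steps 303 and 304 amount to a scan over the description information followed by block reads. In the sketch below, plain dicts stand in for the description information and for the recordbatch contents; the field names and the function read_by_id are hypothetical.

```python
from typing import Dict, List


def read_by_id(sequence: List[dict], blocks: Dict[int, bytes], data_id: str) -> List[bytes]:
    """Find every partition whose description lists data_id, then read its block."""
    second_partitions = [desc for desc in sequence if data_id in desc["data_ids"]]
    return [blocks[desc["position"]] for desc in second_partitions]


# Partition sequence, already sorted by recordbatch position.
sequence = [
    {"name": "t-0", "position": 0, "data_ids": {"a1"}},
    {"name": "t-1", "position": 1, "data_ids": {"b2"}},
    {"name": "t-2", "position": 2, "data_ids": {"a1", "c3"}},
]
blocks = {0: b"first", 1: b"second", 2: b"third"}  # stand-in for recordbatch contents

found = read_by_id(sequence, blocks, "a1")
```

Since the sequence is ordered by position, the matching blocks come back in on-disk order.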
When deleting data, the recordbatch block is emptied, and the description information of the partition, the topic information and the like are then updated; for example, the preset bit in the datainfo bitmap is reset from 1 to 0.
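Deletion can be sketched as clearing the block and then resetting the metadata that refers to it. In the example below a bytearray stands in for the bare disk and another for the flag bitmap; the block size, the field names and the function delete_block are assumptions for illustration.

```python
BLOCK_SIZE = 8  # assumed recordbatch size, kept tiny for illustration


def delete_block(disk: bytearray, flags: bytearray, descriptions: list, index: int) -> None:
    """Empty one recordbatch block, then update its description and flag bit."""
    start = index * BLOCK_SIZE
    disk[start:start + BLOCK_SIZE] = bytes(BLOCK_SIZE)   # empty the block
    descriptions[index]["remaining"] = BLOCK_SIZE        # all space is free again
    descriptions[index]["data_ids"].clear()              # no data stored any more
    flags[index // 8] &= ~(1 << (index % 8)) & 0xFF      # reset the bit from 1 to 0


disk = bytearray(b"AAAAAAAABBBBBBBB")   # two full blocks
flags = bytearray([0b00000011])         # both blocks flagged as unable to store data
descriptions = [
    {"remaining": 0, "data_ids": {"a1"}},
    {"remaining": 0, "data_ids": {"b2"}},
]

delete_block(disk, flags, descriptions, 1)
```

After the call, block 1 is writable again while block 0 and its metadata are untouched.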
In summary, the embodiment of the invention changes the practice of storing data and its metadata in files, and thereby removes the local file system from the storage mode of kafka. Metadata is stored in a database, which handles metadata operations faster; data is read and written in fixed-size blocks; and the bare disk is read and written through the database. This bypasses the somewhat redundant call stack of the file system, increases the speed of reading and writing the hard disk, and makes kafka data access more efficient.
The embodiment of the invention also provides a data storage device of kafka, which is described in the following embodiment. Since the principle of the device for solving the problem is similar to that of the data storage method of kafka, the implementation of the device can be referred to the implementation of the data storage method of kafka, and the repetition is omitted.
FIG. 4 is a schematic diagram of a data storage device of kafka according to an embodiment of the present invention. As shown in FIG. 4, the device includes:
the disk dividing module 401 is configured to divide a server disk into a first disk partition and a second disk partition; the first disk partition is used for storing the topic information of kafka and the partition information of each topic, and the second disk partition is used for storing data in a bare disk mode;
the disk partition processing module 402 is configured to receive configuration information, where the configuration information is used to create a topic and the partitions of the topic in the first disk partition, and divide recordbatch storage blocks in the second disk partition; and, according to the configuration information, to create the topic and the partitions of the topic in the first disk partition, and divide the recordbatch storage blocks in the second disk partition; each partition in the first disk partition stores description information, wherein the description information comprises position information of the recordbatch corresponding to the partition and storage state information of the recordbatch corresponding to the partition;
the data writing module 403 is configured to sort all partitions according to the position information of the recordbatches corresponding to the partitions to form a partition sequence; receive data to be written; determine a suitable partition in the partition sequence according to the attribute information carried by the data to be written and the storage state information of the recordbatch corresponding to each partition in the partition sequence; and write the data to be written into the recordbatch corresponding to the suitable partition in the second disk partition; the data to be written carries attribute information; and the recordbatch corresponding to the suitable partition is used for storing the data to be written.
In one embodiment, the attribute information includes the data size of the data to be written, and the storage state information of the recordbatch corresponding to a partition includes the remaining storage space size of the recordbatch corresponding to the partition;
the data writing module 403 is specifically configured to:
determine the first partition in the partition sequence;
when the data size of the data to be written is smaller than or equal to the remaining storage space size of the recordbatch corresponding to the first partition, determine the first partition as the suitable partition;
when the data size of the data to be written is larger than the remaining storage space size of the recordbatch corresponding to the first partition, check in turn the remaining storage space size of the recordbatch corresponding to each subsequent partition in the partition sequence, and when the remaining storage space size of the recordbatch corresponding to a partition is larger than the data size of the data to be written, determine that partition as the suitable partition.
In one embodiment, the apparatus further comprises:
the storage state information updating module is configured to update the storage state information of the recordbatch corresponding to the suitable partition in the first disk partition according to the remaining storage space size of the recordbatch corresponding to the suitable partition, after the data writing module 403 writes the data to be written into the recordbatch corresponding to the suitable partition in the second disk partition.
In one embodiment, the description information further includes flag information, where the flag information is used to indicate whether the recordbatch corresponding to the partition is able to store data;
the storage state information updating module is specifically configured to:
when the remaining storage space size of the recordbatch corresponding to the suitable partition is zero, update the flag information in the description information of the suitable partition to indicate that the recordbatch corresponding to the suitable partition cannot store data.
In one embodiment, the description information further includes data identification information of the data stored by the partitioner;
the apparatus further comprises:
the partition marking module is configured to generate data identification information for the data to be written after the data writing module 403 receives the data to be written; the data identification information is used for identifying the data to be written and is generated from the data to be written according to a preset rule, wherein the preset rule generates the data identification information from the attribute information of the data to be written;
and to store the data identification information of the data to be written in the description information of the appropriate partition in the first disk partition.
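The embodiment does not state what the preset rule is. Purely as an assumption for illustration, the sketch below derives the identifier by hashing the attribute information in a canonical order, so that the same attributes always yield the same identifier. The function names and the `data_ids` key are hypothetical.

```python
import hashlib
from typing import Mapping


def make_data_id(attributes: Mapping[str, str]) -> str:
    """Assumed preset rule: hash the attributes in sorted key order."""
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def record_data_id(description: dict, attributes: Mapping[str, str]) -> str:
    """Store the identifier in the description information of a partition."""
    data_id = make_data_id(attributes)
    description.setdefault("data_ids", set()).add(data_id)
    return data_id
```

A deterministic rule matters here because the read path has to regenerate the same identifier from the attribute information carried by the read request.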
In one embodiment, the apparatus further comprises:
the data reading module is configured to receive a data reading request after the data writing module 403 writes the data to be written into the recordbatch corresponding to the appropriate partition in the second disk partition; the data reading request includes attribute information of the data to be read; according to the data reading request, data identification information of the data to be read is obtained according to the preset rule; according to the data identification information of the data to be read, the description information of each partition in the partition sequence is read in turn to obtain a plurality of second partitions, where the data identification information of the stored data recorded in the description information of each second partition is consistent with the data identification information of the data to be read; and data is read from the recordbatches corresponding to the second partitions.
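The read path can be sketched as a scan over the partition sequence followed by reads at the recorded positions. In this sketch a `bytes` object stands in for the second disk partition, and `PartitionDescription`, `used_bytes` and `read_by_data_id` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set


@dataclass
class PartitionDescription:
    """Hypothetical description information of one partition."""
    recordbatch_offset: int                            # position information
    used_bytes: int                                    # bytes already written
    data_ids: Set[str] = field(default_factory=set)    # data identification information


def read_by_data_id(disk: bytes,
                    sequence: Sequence[PartitionDescription],
                    wanted_id: str) -> List[bytes]:
    """Return the stored bytes of every recordbatch whose partition matches."""
    # Step 1: scan the descriptions in sequence order to find the second partitions.
    second_partitions = [p for p in sequence if wanted_id in p.data_ids]
    # Step 2: read each matching recordbatch directly at its recorded position.
    return [
        disk[p.recordbatch_offset:p.recordbatch_offset + p.used_bytes]
        for p in second_partitions
    ]
```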
In one embodiment, the description information of each partition is stored in a bitmap data structure.
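The embodiment does not specify the bitmap layout. As one possible reading, the sketch below keeps a single bit per partition for the flag information, where a set bit means the corresponding recordbatch can no longer store data. The class `PartitionBitmap` and its methods are hypothetical.

```python
from typing import Optional


class PartitionBitmap:
    """One bit per partition; 1 means the recordbatch cannot store data."""

    def __init__(self, partition_count: int) -> None:
        self._count = partition_count
        self._bits = bytearray((partition_count + 7) // 8)

    def _check(self, index: int) -> None:
        if not 0 <= index < self._count:
            raise IndexError(index)

    def mark_full(self, index: int) -> None:
        """Set the bit of the partition at the given sequence index."""
        self._check(index)
        self._bits[index // 8] |= 1 << (index % 8)

    def is_full(self, index: int) -> bool:
        self._check(index)
        return bool(self._bits[index // 8] & (1 << (index % 8)))

    def first_writable(self) -> Optional[int]:
        """Return the lowest index whose bit is clear, or None if all are set."""
        for index in range(self._count):
            if not self.is_full(index):
                return index
        return None
```

With this layout the flags of eight partitions fit in one byte, so the state of a whole topic can be checked without reading each description record.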
In one embodiment, the first disk partition stores data in the form of a RocksDB database.
Fig. 5 is a schematic diagram of a computer device according to an embodiment of the present invention. As shown in Fig. 5, an embodiment of the present invention further provides a computer device 500, including a processor 501, a memory 502, and a computer program 503 stored in the memory 502 and capable of running on the processor 501, where the data storage method of kafka is implemented when the processor 501 executes the computer program 503.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described data storage method of kafka.
The embodiment of the invention also provides a computer program product, which comprises a computer program, and the computer program realizes the data storage method of kafka when being executed by a processor.
In the embodiment of the invention, a server disk is divided into a first disk partition and a second disk partition; the first disk partition is used for storing the topic information of kafka and the partition information of each topic, and the second disk partition is used for storing data in a bare disk mode; configuration information is received, wherein the configuration information is used for creating a topic and the partitions of the topic in the first disk partition, and for dividing storage blocks (recordbatches) in the second disk partition; according to the configuration information, the topic and the partitions of the topic are created in the first disk partition, and the recordbatches are divided in the second disk partition; each partition in the first disk partition stores description information, wherein the description information includes position information of the recordbatch corresponding to the partition and storage state information of the recordbatch corresponding to the partition; all partitions are ordered according to the position information of their corresponding recordbatches to form a partition sequence; data to be written is received, the data to be written carrying attribute information; an appropriate partition is determined in the partition sequence according to the attribute information carried by the data to be written and the storage state information of the recordbatch corresponding to each partition in the partition sequence, the recordbatch corresponding to the appropriate partition being used for storing the data to be written; and the data to be written is written into the recordbatch corresponding to the appropriate partition in the second disk partition. In the embodiment of the invention, the server disk is thus divided into two parts: one part stores the topic information of kafka and the partition information of each topic, and the other part stores data directly in a bare disk mode.
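The overall flow summarized above can be illustrated end to end with a small sketch. It makes several simplifying assumptions that are not part of the embodiment: an ordinary pre-sized file stands in for the second disk partition, an in-memory list stands in for the first disk partition, all recordbatches have the same size, and a single "size is smaller than or equal to remaining space" check is used for every partition. The names `Partition` and `RawStore` are hypothetical.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Partition:
    """Hypothetical description information of one partition."""
    name: str
    offset: int    # position information of the recordbatch
    capacity: int  # size of the recordbatch in bytes
    used: int = 0

    @property
    def remaining(self) -> int:
        return self.capacity - self.used


class RawStore:
    """A file emulates the bare-disk region; a list emulates the metadata."""

    def __init__(self, path: str, topic: str,
                 num_partitions: int, recordbatch_size: int) -> None:
        self.path = path
        # Divide the data region into fixed-size recordbatches.
        with open(path, "wb") as disk:
            disk.truncate(num_partitions * recordbatch_size)
        partitions = [
            Partition(f"{topic}-{i}", i * recordbatch_size, recordbatch_size)
            for i in range(num_partitions)
        ]
        # Order the partitions by the position of their recordbatch.
        self.sequence = sorted(partitions, key=lambda p: p.offset)

    def write(self, payload: bytes) -> Tuple[str, int]:
        """Write to the first partition with room; return (name, position)."""
        for partition in self.sequence:
            if len(payload) <= partition.remaining:
                position = partition.offset + partition.used
                with open(self.path, "r+b") as disk:
                    disk.seek(position)
                    disk.write(payload)
                partition.used += len(payload)  # update storage state
                return partition.name, position
        raise IOError("no recordbatch has enough remaining space")

    def read(self, position: int, length: int) -> bytes:
        with open(self.path, "rb") as disk:
            disk.seek(position)
            return disk.read(length)
```

A short usage example under the same assumptions:

```python
store = RawStore(os.path.join(tempfile.mkdtemp(), "raw.img"), "orders", 3, 8)
name, position = store.write(b"abcde")
data = store.read(position, 5)
```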
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the embodiments has been provided for the purpose of illustrating the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (14)

1. A data storage method of kafka comprising:
dividing a server disk into a first disk partition and a second disk partition; the first disk partition is used for storing the topic information of kafka and the partition information of each topic, and the second disk partition is used for storing data in a bare disk mode;
receiving configuration information, wherein the configuration information is used for creating a topic and the partitions of the topic in the first disk partition, and for dividing storage blocks (recordbatches) in the second disk partition;
according to the configuration information, creating the topic and the partitions of the topic in the first disk partition, and dividing the recordbatches in the second disk partition; each partition in the first disk partition stores description information, wherein the description information includes position information of the recordbatch corresponding to the partition and storage state information of the recordbatch corresponding to the partition;
ordering all partitions according to the position information of the recordbatches corresponding to the partitions to form a partition sequence;
receiving data to be written; the data to be written carries attribute information;
determining an appropriate partition in the partition sequence according to the attribute information carried by the data to be written and the storage state information of the recordbatch corresponding to each partition in the partition sequence; the recordbatch corresponding to the appropriate partition is used for storing the data to be written;
and writing the data to be written into the recordbatch corresponding to the appropriate partition in the second disk partition.
2. The method of claim 1, wherein the attribute information includes a data size of the data to be written, and the storage state information of the recordbatch corresponding to the partition includes a remaining storage space size of the recordbatch corresponding to the partition;
determining an appropriate partition in the partition sequence according to the attribute information carried by the data to be written and the storage state information of the recordbatch corresponding to each partition in the partition sequence includes:
determining the first partition in the partition sequence;
when the data size of the data to be written is smaller than or equal to the remaining storage space size of the recordbatch corresponding to the first partition, determining the first partition as the appropriate partition;
when the data size of the data to be written is larger than the remaining storage space size of the recordbatch corresponding to the first partition, sequentially determining the remaining storage space size of the recordbatch corresponding to each subsequent partition in the partition sequence, and when the remaining storage space size of the recordbatch corresponding to a partition is larger than the data size of the data to be written, determining that partition to be the appropriate partition.
3. The method of claim 2, further comprising, after writing the data to be written to the recordbatch corresponding to the appropriate partition in the second disk partition:
updating the storage state information of the recordbatch corresponding to the appropriate partition in the first disk partition according to the remaining storage space size of the recordbatch corresponding to the appropriate partition.
4. The method of claim 3, wherein the description information further includes flag information for indicating whether the recordbatch corresponding to the partition is capable of storing data;
updating the storage state information of the recordbatch corresponding to the appropriate partition in the first disk partition according to the remaining storage space size of the recordbatch corresponding to the appropriate partition includes:
when the remaining storage space size of the recordbatch corresponding to the appropriate partition is zero, updating the flag information in the description information of the appropriate partition to indicate that the recordbatch corresponding to the appropriate partition cannot store data.
5. The method of claim 1, wherein the description information further includes data identification information of the data stored by the partition;
after receiving the data to be written, the method further comprises:
generating data identification information for the data to be written; the data identification information is used for identifying the data to be written and is generated from the data to be written according to a preset rule, wherein the preset rule generates the data identification information from the attribute information of the data to be written;
and storing the data identification information of the data to be written in the description information of the appropriate partition in the first disk partition.
6. The method of claim 5, further comprising, after writing the data to be written to the recordbatch corresponding to the appropriate partition in the second disk partition:
receiving a data reading request; the data reading request includes attribute information of the data to be read;
according to the data reading request, obtaining data identification information of the data to be read according to the preset rule;
according to the data identification information of the data to be read, sequentially reading the description information of each partition in the partition sequence to obtain a plurality of second partitions; the data identification information of the stored data recorded in the description information of each second partition is consistent with the data identification information of the data to be read;
and reading data from the recordbatches corresponding to the second partitions.
7. The method of claim 1, wherein the description information of each partition is stored in a bitmap data structure.
8. The method of claim 1, wherein the first disk partition stores data in the form of a RocksDB database.
9. A data storage device of kafka comprising:
the disk dividing module is used for dividing a server disk into a first disk partition and a second disk partition; the first disk partition is used for storing the topic information of kafka and the partition information of each topic, and the second disk partition is used for storing data in a bare disk mode;
the disk partition processing module is used for receiving configuration information, wherein the configuration information is used for creating a topic and the partitions of the topic in the first disk partition and for dividing storage blocks (recordbatches) in the second disk partition; and for creating, according to the configuration information, the topic and the partitions of the topic in the first disk partition and dividing the recordbatches in the second disk partition; each partition in the first disk partition stores description information, wherein the description information includes position information of the recordbatch corresponding to the partition and storage state information of the recordbatch corresponding to the partition;
the data writing module is used for ordering all the partitions according to the position information of the recordbatches corresponding to the partitions to form a partition sequence; receiving data to be written, the data to be written carrying attribute information; determining an appropriate partition in the partition sequence according to the attribute information carried by the data to be written and the storage state information of the recordbatch corresponding to each partition in the partition sequence, the recordbatch corresponding to the appropriate partition being used for storing the data to be written; and writing the data to be written into the recordbatch corresponding to the appropriate partition in the second disk partition.
10. The apparatus of claim 9, wherein the description information further includes data identification information of the data stored by the partition;
the apparatus further comprises:
the partition marking module is used for generating data identification information for the data to be written after the data writing module receives the data to be written; the data identification information is used for identifying the data to be written and is generated from the data to be written according to a preset rule, wherein the preset rule generates the data identification information from the attribute information of the data to be written; and for storing the data identification information of the data to be written in the description information of the appropriate partition in the first disk partition.
11. The apparatus as recited in claim 10, further comprising:
the data reading module is used for receiving a data reading request after the data writing module writes the data to be written into the recordbatch corresponding to the appropriate partition in the second disk partition; the data reading request includes attribute information of the data to be read; according to the data reading request, data identification information of the data to be read is obtained according to the preset rule; according to the data identification information of the data to be read, the description information of each partition in the partition sequence is read in turn to obtain a plurality of second partitions, where the data identification information of the stored data recorded in the description information of each second partition is consistent with the data identification information of the data to be read; and data is read from the recordbatches corresponding to the second partitions.
12. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 8 when executing the computer program.
13. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the method of any of claims 1 to 8.
14. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the method of any of claims 1 to 8.
CN202311660517.9A 2023-12-05 2023-12-05 Data storage method and device of kafka Pending CN117632029A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311660517.9A CN117632029A (en) 2023-12-05 2023-12-05 Data storage method and device of kafka

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311660517.9A CN117632029A (en) 2023-12-05 2023-12-05 Data storage method and device of kafka

Publications (1)

Publication Number Publication Date
CN117632029A true CN117632029A (en) 2024-03-01

Family

ID=90016071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311660517.9A Pending CN117632029A (en) 2023-12-05 2023-12-05 Data storage method and device of kafka

Country Status (1)

Country Link
CN (1) CN117632029A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination