WO2020042612A1 - Message storage and reading methods and devices, server, and storage medium - Google Patents

Message storage and reading methods and devices, server, and storage medium

Info

Publication number
WO2020042612A1
WO2020042612A1 PCT/CN2019/081173 CN2019081173W WO2020042612A1 WO 2020042612 A1 WO2020042612 A1 WO 2020042612A1 CN 2019081173 W CN2019081173 W CN 2019081173W WO 2020042612 A1 WO2020042612 A1 WO 2020042612A1
Authority
WO
WIPO (PCT)
Prior art keywords
real
message
storage address
topic
virtual
Application number
PCT/CN2019/081173
Inventor
彭伟
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2020042612A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662: Virtualisation aspects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02: Addressing or allocation; Relocation
    • G06F12/0223: User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023: Free address space management

Definitions

  • the embodiments of the present application relate to the field of data processing technologies, and in particular, to a method and device for storing and reading messages, a server, and a storage medium.
  • The Kafka system is a distributed messaging system with high throughput.
  • The Kafka system can store multiple types of messages. Each type of message is called a topic, each topic has multiple partitions, and the partitions of a topic jointly store the messages belonging to that topic.
  • Kafka clusters are used to deploy Kafka systems; Kafka clusters have multiple storage nodes; the storage nodes can be servers or other devices with computing capabilities; for example, multiple storage nodes in a Kafka cluster can be across data centers.
  • Each topic in the Kafka system can be deployed on one or more storage nodes in a Kafka cluster; if a topic is stored on multiple storage nodes, its partitions can be distributed across those storage nodes; if a topic is stored on a single storage node, all of its partitions are deployed on that storage node.
  • When a client requests to store a message in a Kafka cluster, the client may specify the topic and the partition in which the message is to be stored.
  • The storage request is sent to a target storage node, that is, a storage node on which the specified partition of the topic is deployed; the target storage node hosts a server for that partition of the topic.
  • When the server receives the storage request, the target storage node (specifically, the server deployed on it) stores the message in the specified partition of the topic.
  • the embodiments of the present application provide a method and a device for storing and reading messages, a server, and a storage medium, which can solve the problem of excessive workload of some topics and some partitions in related technologies.
  • the technical solution includes:
  • a message storage method is provided.
  • The method is applied to a Kafka cluster and includes: receiving a first message storage request for storing a message in the Kafka cluster, where the first message storage request specifies that the message it designates is to be stored at a virtual storage address, and the virtual storage address includes an identifier of a virtual topic and an identifier of a virtual partition; determining, based on a correspondence between the virtual storage address and a first real storage address, the first real storage address corresponding to the virtual storage address, where the first real storage address includes an identifier of a first real topic and an identifier of a first real partition; and storing the message specified by the first message storage request in the first real partition of the first real topic specified by the first real storage address.
  • In this method, the real storage address for a message is determined according to the correspondence between the virtual storage address and the real storage address, and the message is stored in the real partition specified by that real storage address, so that the message is stored.
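The storage path described above can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation and not a Kafka client API: `Address`, `MessageRouter`, `bind`, and `store` are hypothetical names, and a Python list stands in for a real partition.

```python
from collections import defaultdict
from typing import NamedTuple


class Address(NamedTuple):
    """A (topic identifier, partition identifier) pair; used for both virtual and real addresses."""
    topic: str
    partition: int


class MessageRouter:
    """Hypothetical stand-in for the server-side storage path (assumption, not a Kafka API)."""

    def __init__(self):
        # Correspondence: virtual storage address -> real storage address.
        self.correspondence = {}
        # Real partitions, modelled as append-only lists keyed by real address.
        self.real_partitions = defaultdict(list)

    def bind(self, virtual, real):
        """Establish (or replace) the correspondence for one virtual address."""
        self.correspondence[virtual] = real

    def store(self, virtual, message):
        """Resolve the virtual address and append the message to the real partition."""
        real = self.correspondence[virtual]
        self.real_partitions[real].append(message)
        return real


router = MessageRouter()
virtual_address = Address("virtual-orders", 0)
router.bind(virtual_address, Address("real-topic-1", 3))
stored_at = router.store(virtual_address, b"order created")
```

The client only ever names the virtual address; which real topic and partition receive the message is decided by the correspondence held on the server side.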
  • The method further includes: receiving a second message storage request for storing a message in the Kafka cluster, where the second message storage request specifies that the message it designates is to be stored at the virtual storage address; determining, based on a correspondence between the virtual storage address and a second real storage address, the second real storage address corresponding to the virtual storage address, where the second real storage address includes an identifier of a second real topic and an identifier of a second real partition; and storing the message specified by the second message storage request in the second real partition of the second real topic specified by the second real storage address.
  • In this way, a message designated to be stored at the virtual storage address can also be stored at the second real storage address.
  • This reduces the workload imbalance among the topics on a storage node and lowers the chance that topics consume resources unevenly on a given storage node.
  • the first real partition and the second real partition may be deployed on different storage nodes in the Kafka cluster.
  • the receiving time of the second message storage request may be later than the receiving time of the first message storage request.
  • The method further includes: before receiving the second message storage request, estimating an amount of pre-stored data of the messages specified by second message storage requests received within a preset time period; and when the amount of pre-stored data is greater than a first threshold, establishing a correspondence between the virtual storage address and the second real storage address.
  • The amount of pre-stored data is the estimated data amount of the messages specified by second message storage requests received within the preset time period.
  • When the amount of pre-stored data is greater than the first threshold, the messages specified by the second message storage request have a large storage requirement.
  • In that case, the correspondence between the virtual storage address and the real storage address can be modified so that the virtual storage address corresponds to the second real storage address. The messages specified by the second message storage request are then stored at a real storage address that is better able to support this storage requirement, which improves the storage performance of the message storage system.
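The threshold check can be illustrated with a short sketch. The function name `remap_if_needed`, the tuple representation of addresses, and the sample numbers are assumptions made for illustration; how the estimate itself is produced is left outside this sketch.

```python
def remap_if_needed(correspondence, virtual_address, second_real_address,
                    pre_stored_amount, first_threshold):
    """Return the correspondence to use for upcoming storage requests.

    If the estimated amount of pre-stored data exceeds the first threshold,
    the virtual address is pointed at the second real storage address;
    otherwise the existing correspondence is returned unchanged.
    """
    if pre_stored_amount > first_threshold:
        updated = dict(correspondence)  # copy so the caller's mapping is not mutated
        updated[virtual_address] = second_real_address
        return updated
    return correspondence


# Addresses are (topic identifier, partition identifier) tuples (hypothetical values).
current = {("v-topic", 0): ("real-1", 0)}
after_quiet = remap_if_needed(current, ("v-topic", 0), ("real-2", 1),
                              pre_stored_amount=40, first_threshold=100)
after_burst = remap_if_needed(current, ("v-topic", 0), ("real-2", 1),
                              pre_stored_amount=250, first_threshold=100)
```

With an estimate of 40 against a threshold of 100 the mapping is left alone; with an estimate of 250 the virtual address is redirected to the second real storage address.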
  • The implementation process of estimating the amount of pre-stored data of the messages specified by second message storage requests received within a preset time period may include: for at least one target virtual topic among the plurality of virtual topics that have a correspondence with the first real topic, obtaining a second data amount of the messages stored in each target virtual topic; obtaining a first data amount of the messages stored in the first real topic; and estimating the amount of pre-stored data based on the first data amount and the second data amount of each target virtual topic.
  • The implementation process of estimating the amount of pre-stored data may include: estimating the amount of pre-stored data using an estimation model, where the input parameters and output parameters of the estimation model each include at least one set of parameters, and the at least one set of parameters corresponds to the at least one target virtual topic.
  • the input parameters include: the identifier of the first real topic and the first data amount, the identifier of the target virtual topic, and the target virtual topic.
  • the output parameters include: the pre-stored data amount, an identifier of the target virtual topic, and a ratio of the third data amount of the target virtual topic to the first data amount.
  • Alternatively, the input parameters include: the identifier of the first real topic and the first data amount, and the identifier of the target virtual topic and the second data amount of the target virtual topic; and the output parameters include: the amount of pre-stored data, and the identifier of the target virtual topic and the third data amount of the target virtual topic.
  • The method may further include: before receiving the second message storage request, estimating an amount of pre-stored data of the messages to be stored, within a preset time period, in the first real topic where the first real partition is located; and when the amount of pre-stored data is greater than a second threshold, establishing a correspondence between the virtual storage address and the second real storage address.
  • Here the amount of pre-stored data is the estimated amount of data to be stored, within the preset time period, in the first real topic where the first real partition is located.
  • When the amount of pre-stored data is greater than the second threshold, the first real partition may not be able to support the message storage needs within the preset time period.
  • In that case, the correspondence between the virtual storage address and the real storage address may be modified so that the virtual storage address corresponds to the second real storage address. Messages that would have been stored in the first real topic where the first real partition is located are then stored at the second real storage address, which improves the storage performance of the message storage system.
  • The first threshold and the second threshold may be determined according to actual needs and may be equal or different; this embodiment of the present application does not specifically limit them.
  • The implementation process of estimating the amount of pre-stored data of the messages to be stored, within a preset time period, in the first real topic where the first real partition is located may include: for at least one target virtual topic among the plurality of virtual topics that have a correspondence with the first real topic, obtaining a second data amount of the messages stored in each target virtual topic; obtaining a first data amount of the messages stored in the first real topic; and estimating the amount of pre-stored data based on the first data amount and the second data amount of each target virtual topic.
  • The implementation process of estimating the amount of pre-stored data may include: using an estimation model to estimate the amount of pre-stored data, where the input parameters and output parameters of the estimation model each include at least one set of parameters, and the at least one set of parameters corresponds to the at least one target virtual topic.
  • the input parameters include: the identifier of the first real topic and the first data amount, the identifier of the target virtual topic, and the target virtual topic.
  • The output parameters include: the identifier of the first real topic and the amount of pre-stored data, the identifier of the target virtual topic, and a ratio of the third data amount of the target virtual topic to the first data amount.
  • Alternatively, the input parameters include: the identifier of the first real topic and the first data amount, and the identifier of the target virtual topic and the second data amount of the target virtual topic; and the output parameters include: the identifier of the first real topic and the amount of pre-stored data, and the identifier of the target virtual topic and the third data amount of the target virtual topic.
  • The at least one target virtual topic includes: all of the multiple virtual topics, or the top one or more virtual topics when the multiple virtual topics are ranked by stored data amount from largest to smallest.
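Selecting the target virtual topics can be sketched as a simple ranking by stored data amount. The function name, the `top_n` parameter, and the sample amounts are hypothetical; the source does not say how many top-ranked topics are taken.

```python
def select_target_virtual_topics(stored_amounts, top_n=None):
    """Rank virtual topics by stored data amount, largest first.

    stored_amounts maps a virtual topic identifier to the data amount of the
    messages stored in it. With top_n=None every virtual topic is a target;
    otherwise only the top_n largest are returned.
    """
    ranked = sorted(stored_amounts, key=stored_amounts.get, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


amounts = {"v-a": 120, "v-b": 900, "v-c": 45}
all_targets = select_target_virtual_topics(amounts)
top_two = select_target_virtual_topics(amounts, top_n=2)
```

Restricting the estimate to the largest virtual topics keeps the model input small while still covering the topics that contribute most of the load on the real topic.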
  • The implementation process of establishing a correspondence between the virtual storage address and the second real storage address may include:
  • searching, based on the third data amount of each target virtual topic, for a real topic whose available data amount is greater than the third data amount, where the available data amount is the difference between the data amount quota of the real topic and the amount of pre-stored data; when such a real topic exists, determining the real topic whose available data amount is greater than the third data amount as the second real topic;
  • when no real topic has an available data amount greater than the third data amount, creating the second real topic in the message storage system; and modifying the correspondence between the virtual storage address corresponding to the target virtual topic and the real storage address so that the virtual storage address corresponds to the second real storage address.
  • In this way, messages designated to be stored at the virtual storage address can be stored at different real storage addresses.
  • When the data amount (or traffic) of the logical topics is unbalanced, this reduces the chance that the resources occupied by the logical topics are uneven.
  • When multiple real topics have an available data amount greater than the third data amount, the real topic with the maximum available data amount may be determined as the second real topic.
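The selection of the second real topic can be sketched as below. This is an illustration under stated assumptions: the function name, the `(quota, pre_stored_amount)` tuple layout, and the sample values are hypothetical, and "available data amount" is taken to be the real topic's data amount quota minus its pre-stored data amount.

```python
def choose_second_real_topic(real_topics, third_data_amount, new_topic_name):
    """Pick the real topic that should receive a target virtual topic.

    real_topics maps a real topic identifier to (quota, pre_stored_amount).
    Returns (topic identifier, created), where created is True when no existing
    real topic had enough available data amount and a new one must be created.
    """
    available = {name: quota - pre_stored
                 for name, (quota, pre_stored) in real_topics.items()}
    candidates = {name: amount for name, amount in available.items()
                  if amount > third_data_amount}
    if candidates:
        # Several topics may qualify; take the one with the most room.
        return max(candidates, key=candidates.get), False
    return new_topic_name, True


topics = {"real-1": (1000, 950), "real-2": (1000, 400), "real-3": (2000, 1100)}
fits_existing = choose_second_real_topic(topics, third_data_amount=300,
                                         new_topic_name="real-new")
needs_new = choose_second_real_topic(topics, third_data_amount=5000,
                                     new_topic_name="real-new")
```

In the first call the available amounts are 50, 600, and 900, so `real-3` is chosen; in the second call no topic has room for 5000, so a new real topic is requested.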
  • Establishing the correspondence between the virtual storage address and the second real storage address may include: for the at least one target virtual topic corresponding to the first real topic, establishing, in order of the second data amount of the target virtual topics from largest to smallest, a correspondence between the virtual storage address corresponding to each target virtual topic and the second real storage address.
  • Establishing the correspondence between the virtual storage address and the second real storage address may further include: determining a message offset of a first message in the second real topic, where the first message is the first message stored in the second real topic based on the correspondence between the virtual storage address and the second real storage address; and storing the message offset of the first message, together with the correspondence between the virtual storage address and the second real storage address, in an index file corresponding to the target virtual topic.
  • each real storage address has a corresponding relationship with a plurality of virtual storage addresses.
  • In the message storage method, after a message storage request for storing a message in a Kafka cluster is received, a real storage address for storing the message is determined according to the correspondence between the virtual storage address and the real storage address, and the message is stored in the real partition specified by the real storage address, so that the message is stored.
  • Because messages designated to be stored at the virtual storage address can be stored at different real storage addresses, the probability of overloading a real partition in a real topic is reduced compared with the related technology, and the throughput of the message storage system is improved.
  • a message storage method is provided.
  • the method can be applied to a Kafka cluster.
  • The method includes: receiving a message storage request for storing a message in the Kafka cluster, where the message storage request specifies that the message is to be stored in a virtual topic; determining, based on the correspondence between the virtual topic and a real topic, the real topic corresponding to the virtual topic; and storing the message specified by the message storage request in a real partition of the real topic.
  • In this method, after a message storage request for storing a message in a Kafka cluster is received, the real topic for storing the message can be determined according to the correspondence between the virtual topic and the real topic, and the message is stored in a real partition of that real topic, so that the message is stored.
  • the method may further include: establishing a correspondence between a virtual topic and a real topic.
  • a message reading method is provided.
  • The method is applied to a Kafka cluster and includes: receiving a message reading request for reading a message in the Kafka cluster, where the message reading request specifies reading the message from a virtual storage address, and the virtual storage address includes an identifier of a virtual topic and an identifier of a virtual partition; determining, based on the correspondence between the virtual storage address and the real storage address, a target real storage address corresponding to the virtual storage address, where the target real storage address includes an identifier of a target real topic and an identifier of a target real partition; and reading the message specified by the message reading request from the target real partition specified by the target real storage address.
  • In this method, the target real storage address corresponding to the virtual storage address is determined through the correspondence between the virtual storage address and the real storage address, and the message specified by the message reading request is read from the target real partition specified by the target real storage address, so that the message is read.
  • The message read request carries a target offset of the message to be read. In that case, determining the target real storage address corresponding to the virtual storage address based on the correspondence between the virtual storage address and the real storage address includes: obtaining the message offset of the first message recorded in a target index file, where the first message is the first message stored, based on the current correspondence between the virtual storage address and the real storage address, in the real topic specified by the current correspondence, and the target index file is the index file corresponding to the virtual topic specified by the virtual storage address;
  • when the target offset is greater than or equal to the message offset, determining the real storage address recorded in the current correspondence as the target real storage address;
  • when the target offset is less than the message offset, determining the real storage address recorded in the historical correspondence between the virtual storage address and the real storage address as the target real storage address.
  • The current correspondence is the correspondence between the virtual storage address and the real storage address after it has been modified during use of the message storage system.
  • The correspondence between the virtual storage address and the real storage address before that modification is the historical correspondence.
  • the real storage address recorded in the historical correspondence relationship is different from the real storage address recorded in the current correspondence relationship.
  • the offset of the message stored based on the current correspondence is greater than the offset of the message stored based on the historical correspondence.
  • the messages stored based on the current correspondence are stored in the real storage address specified by the current correspondence.
  • The messages stored based on the historical correspondence are stored at the real storage address specified by the historical correspondence. Therefore, before the target real storage address is determined, the message offset of the first message must first be obtained and compared with the target offset, to determine whether the target real storage address is the real storage address specified by the historical correspondence or the real storage address specified by the current correspondence, thereby ensuring that messages can be read effectively.
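The offset comparison used when reading can be sketched as follows. `IndexEntry` and `resolve_read_address` are hypothetical names, the index file is modelled as two in-memory records (one historical, one current), and the offsets are made-up sample values.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """One correspondence as recorded in the index file of a virtual topic."""
    first_offset: int    # offset of the first message stored under this correspondence
    real_address: tuple  # (real topic identifier, real partition identifier)


def resolve_read_address(historical, current, target_offset):
    """Choose the real storage address that holds the message at target_offset.

    Messages at or after the first offset of the current correspondence were
    stored at the current real address; earlier messages were stored at the
    historical real address.
    """
    if target_offset >= current.first_offset:
        return current.real_address
    return historical.real_address


historical = IndexEntry(first_offset=0, real_address=("real-1", 0))
current = IndexEntry(first_offset=5000, real_address=("real-2", 1))

old_message_at = resolve_read_address(historical, current, target_offset=4999)
boundary_message_at = resolve_read_address(historical, current, target_offset=5000)
```

Offset 4999 predates the remapping and resolves to the historical address, while offset 5000 is the first message stored under the current correspondence and resolves to the current address.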
  • A message storage device includes: a receiving module, configured to receive a first message storage request for storing a message in a Kafka cluster, where the first message storage request specifies that the message it designates is to be stored at a virtual storage address, and the virtual storage address includes an identifier of a virtual topic and an identifier of a virtual partition; a determining module, configured to determine, based on the correspondence between the virtual storage address and a first real storage address, the first real storage address corresponding to the virtual storage address, where the first real storage address includes an identifier of a first real topic and an identifier of a first real partition; and a storage module, configured to store the message specified by the first message storage request in the first real partition of the first real topic specified by the first real storage address.
  • The receiving module is further configured to receive a second message storage request for storing a message in the Kafka cluster, where the second message storage request specifies that the message it designates is to be stored at the virtual storage address; the determining module is further configured to determine, based on the correspondence between the virtual storage address and a second real storage address, the second real storage address corresponding to the virtual storage address, where the second real storage address includes an identifier of a second real topic and an identifier of a second real partition; and the storage module is further configured to store the message specified by the second message storage request in the second real partition of the second real topic specified by the second real storage address.
  • the first real partition and the second real partition are deployed on different storage nodes in the Kafka cluster.
  • the receiving time of the second message storage request is later than the receiving time of the first message storage request.
  • The device further includes: an estimation module, configured to estimate an amount of pre-stored data of the messages specified by second message storage requests received within a preset time period; and an establishing module, configured to establish a correspondence between the virtual storage address and the second real storage address when the amount of pre-stored data is greater than the first threshold.
  • The estimation module includes: an acquisition submodule, configured to acquire, for at least one target virtual topic among the multiple virtual topics corresponding to the first real topic, a second data amount of the messages stored in each target virtual topic, and further configured to acquire a first data amount of the messages stored in the first real topic; and an estimation submodule, configured to estimate the amount of pre-stored data based on the first data amount and the second data amount of each target virtual topic.
  • The estimation submodule is configured to use an estimation model to estimate the amount of pre-stored data, where the input parameters and output parameters of the estimation model each include at least one set of parameters, and the at least one set of parameters has a one-to-one correspondence with the at least one target virtual topic.
  • the input parameters include: the identifier of the first real topic and the first data amount, the identifier of the target virtual topic, and the second data of the target virtual topic.
  • the output parameters include: the amount of pre-stored data, an identifier of the target virtual topic, and a ratio of the third amount of data of the target virtual topic to the first amount of data.
  • Alternatively, the input parameters include: the identifier of the first real topic and the first data amount, and the identifier of the target virtual topic and the second data amount of the target virtual topic; and the output parameters include: the amount of pre-stored data, and the identifier of the target virtual topic and the third data amount of the target virtual topic.
  • The device includes: an estimation module, configured to estimate the amount of pre-stored data of the messages to be stored, within a preset time period, in the first real topic where the first real partition is located; and an establishing module, configured to establish a correspondence between the virtual storage address and the second real storage address when the amount of pre-stored data is greater than a second threshold.
  • The estimation module includes: an acquisition submodule, configured to acquire, for at least one target virtual topic among the plurality of virtual topics corresponding to the first real topic, a second data amount of the messages stored in each target virtual topic, and further configured to acquire a first data amount of the messages stored in the first real topic; and an estimation submodule, configured to estimate the amount of pre-stored data based on the first data amount and the second data amount of each target virtual topic.
  • The estimation submodule is configured to use an estimation model to estimate the amount of pre-stored data, where the input parameters and output parameters of the estimation model each include at least one set of parameters, and the at least one set of parameters has a one-to-one correspondence with the at least one target virtual topic.
  • the input parameters include: the identifier of the first real topic and the first data amount, the identifier of the target virtual topic, and the second data of the target virtual topic.
  • The output parameters include: the identifier of the first real topic and the amount of pre-stored data, the identifier of the target virtual topic, and a ratio of the third data amount of the target virtual topic to the first data amount.
  • Alternatively, the input parameters include: the identifier of the first real topic and the first data amount, and the identifier of the target virtual topic and the second data amount of the target virtual topic; and the output parameters include: the identifier of the first real topic and the amount of pre-stored data, and the identifier of the target virtual topic and the third data amount of the target virtual topic.
  • The at least one target virtual topic includes: all of the multiple virtual topics, or the top one or more virtual topics when the multiple virtual topics are ranked by stored data amount from largest to smallest.
  • The establishing module includes: a search submodule, configured to search, based on the third data amount of each target virtual topic, for a real topic whose available data amount is greater than the third data amount, where the available data amount is the difference between the data amount quota of the real topic and the amount of pre-stored data; a determining submodule, configured to determine, when a real topic whose available data amount is greater than the third data amount exists, that real topic as the second real topic, and further configured to create a second real topic in the message storage system when it is determined that no real topic has an available data amount greater than the third data amount; and a modification submodule, configured to modify the correspondence between the virtual storage address corresponding to the target virtual topic and the real storage address so that the virtual storage address corresponds to a second real storage address that includes the second real topic.
  • the determining sub-module is further configured to: when determining that there are multiple real topics with available data amount greater than the third data amount, determine the real topic corresponding to the maximum available data amount as the second real topic.
  • The establishing module is further configured to: for the at least one target virtual topic that has a correspondence with the first real topic, establish, in descending order of the second data amount of the target virtual topics, a correspondence between the virtual storage address corresponding to each target virtual topic and the second real storage address.
  • The establishing module is further configured to: determine a message offset of a first message in the second real topic, where the first message is the first message stored in the second real topic based on the correspondence between the virtual storage address and the second real storage address; and store the message offset of the first message, together with the correspondence between the virtual storage address and the second real storage address, in the index file corresponding to the target virtual topic.
  • each real storage address has a corresponding relationship with a plurality of virtual storage addresses.
  • A message reading device includes: a receiving module, configured to receive a message reading request for reading a message in a Kafka cluster, where the message reading request specifies reading the message from a virtual storage address, and the virtual storage address includes an identifier of a virtual topic and an identifier of a virtual partition; a determining module, configured to determine, based on the correspondence between the virtual storage address and the real storage address, a target real storage address corresponding to the virtual storage address, where the target real storage address includes an identifier of a target real topic and an identifier of a target real partition; and a reading module, configured to read the message specified by the message read request from the target real partition specified by the target real storage address.
  • The message read request carries a target offset of the message to be read, and the determining module is configured to: obtain the message offset of the first message recorded in a target index file, where the first message is the first message stored, based on the current correspondence between the virtual storage address and the real storage address, in the real topic specified by the current correspondence, and the target index file is the index file corresponding to the virtual topic specified by the virtual storage address; when the target offset is greater than or equal to the message offset, determine the real storage address recorded in the current correspondence as the target real storage address; and when the target offset is less than the message offset, determine the real storage address recorded in the historical correspondence between the virtual storage address and the real storage address as the target real storage address, where the real storage address recorded in the current correspondence is different from the real storage address recorded in the historical correspondence.
  • a server including a processor and a memory; when the processor executes a computer program stored in the memory, the server executes the message storage method according to any one of the first aspect.
  • a server including a processor and a memory; when the processor executes a computer program stored in the memory, the server executes the message reading method of any one of the second aspect.
  • a storage medium is provided, and a computer program is stored in the storage medium, and the computer program instructs a server to execute the message storage method according to any one of the first aspects.
  • a storage medium is provided, and a computer program is stored in the storage medium, and the computer program instructs a server to execute the message reading method according to any one of the second aspect.
  • FIG. 1 is a schematic diagram of a Kafka cluster-based message storage system in a related technology provided by an embodiment of the present application;
  • FIG. 2 is a schematic structural diagram of a message storage system according to an embodiment of the present application.
  • FIG. 3 is a flowchart of a message storage method according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a data storage structure provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of a method for estimating the amount of pre-stored data of a message to be stored within a preset time period provided by an embodiment of the present application;
  • FIG. 6 is a schematic structural diagram of an LSTM neural network according to an embodiment of the present application.
  • FIG. 7 is a flowchart of a method for establishing a correspondence between a virtual storage address and a second real storage address according to an embodiment of the present application
  • FIG. 8 is a flowchart of a method for determining a second real topic according to an embodiment of the present application.
  • FIG. 9 is a flowchart of a message reading method according to an embodiment of the present application.
  • FIG. 10 is a flowchart of a method for determining a target real storage address corresponding to a virtual storage address according to an embodiment of the present application
  • FIG. 11 is a schematic structural diagram of a message storage device according to an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of another message storage device according to an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an estimation module according to an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a setup module according to an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a message reading device according to an embodiment of the present application.
  • FIG. 16 is a structural block diagram of a server according to an embodiment of the present application.
  • FIG. 17 is a structural block diagram of another server provided by an embodiment of the present application.
  • the message service system is mainly a message service of a Kafka cluster-based message storage system.
  • the Kafka cluster-based message storage system may receive a message sent by a message producer, and store the message in a topic to which the message belongs, so that a consumer can request the message from the topic.
  • each topic is composed of at least one partition, and each partition is composed of at least one segment.
  • Each segment stores a pair of files: an index file and a data file.
  • The data file is used to store the messages sent by the message producer, and the index file is used to record the index information (such as the offset address) of the messages in the corresponding data file.
  • the Kafka cluster-based message storage system may further include multiple storage nodes. When a storage node receives a corresponding message, the message may be immediately stored in the system, thereby increasing the system's ability to persistently store and handle message accumulation.
  • the storage granularity of the stored messages is relatively coarse, which makes it impossible to effectively use the storage space.
  • Each partition can store the messages of only one topic, so the number of topics that each storage node can support is limited.
  • For example, a storage node (used for deploying topics) with a virtual machine specification of 8U16G can usually support fewer than 100 topics; beyond that, the performance of the storage node drops sharply.
  • Because the number of topics that each storage node can support is limited, a large number of storage nodes need to be deployed in the Kafka cluster, and the cost of the Kafka cluster-based message storage system is high.
  • In addition, traffic may be imbalanced across topics, so that the resources occupied by the topics are uneven and topic data needs to be migrated. When the data volume of a topic is large, the migration takes too long and cannot be completed in time.
  • FIG. 2 is a schematic structural diagram of a message storage system involved in the message storage method.
  • the message storage system 10 may include: a plurality of storage nodes 101.
  • the multiple storage nodes 101 may establish a connection through a wired network or a wireless network.
  • the message storage system may be a Kafka cluster-based message storage system.
  • the Kafka cluster is used to deploy Kafka systems.
  • Kafka cluster has multiple storage nodes.
  • the storage node may be a server or other computing-capable devices. Each topic in the Kafka system can be deployed on one or more storage nodes in a Kafka cluster.
  • each storage node 101 is configured with multiple virtual topics, multiple real topics, and an index file corresponding to each virtual topic.
  • each real partition is configured with multiple index files and multiple data files. The data files are used to store messages.
  • the index file is used to store the index information of the message.
  • Each virtual topic includes multiple virtual partitions, and the identifier of the virtual topic and the identifier of a virtual partition included in the virtual topic can form a virtual storage address.
  • Each real topic includes multiple real partitions. The identifier of the real topic and the identifier of a real partition included in the real topic can form a real storage address. And a virtual storage address can correspond to a real storage address.
  • the message can be stored in the real storage address corresponding to the specified virtual storage address, and the index information used to indicate the message is stored in the index file corresponding to the virtual topic.
  • the correspondence between the real topic, real partition, virtual topic, virtual partition, index file, and data file can be determined when the message storage system is established.
  • each real storage address may have a corresponding relationship with multiple virtual storage addresses.
  • messages designated to be stored in the plurality of virtual storage addresses may all be stored in the real storage address.
  • Because each real partition can store data designated to be stored in multiple virtual topics, the messages designated to be stored in the multiple virtual topics can share the storage space of the real partition in the real topic. The real partition can therefore support multiple virtual topics, which increases the number of virtual topics that a storage node configured with real partitions can support and reduces system cost.
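  • The many-to-one relationship between virtual storage addresses and a real storage address can be pictured with the following minimal sketch. The class `AddressMap` and its method names are hypothetical, and an in-memory dictionary is assumed purely for illustration; the application does not specify how the correspondence is held.

```python
from typing import Dict, List, Tuple

# (topic identifier, partition identifier)
Address = Tuple[str, int]


class AddressMap:
    """Correspondence table: each virtual address maps to exactly one real address."""

    def __init__(self) -> None:
        self._virtual_to_real: Dict[Address, Address] = {}

    def bind(self, virtual: Address, real: Address) -> None:
        # At any time a virtual storage address corresponds to only one real
        # storage address, so binding again replaces the previous entry.
        self._virtual_to_real[virtual] = real

    def real_for(self, virtual: Address) -> Address:
        return self._virtual_to_real[virtual]

    def virtuals_sharing(self, real: Address) -> List[Address]:
        # All virtual storage addresses whose messages share this real partition.
        return sorted(v for v, r in self._virtual_to_real.items() if r == real)
```

  • Binding several virtual storage addresses to the same real storage address is what lets one real partition serve several virtual topics.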
  • the message storage system 10 may further include a plurality of data producing nodes (producers) and a plurality of data consuming nodes (consumers).
  • the connection between the data production node and the storage node 101 and between the data consumption node and the storage node 101 can be established through a wired network or a wireless network.
  • the data production node is used to send a message to the storage node 101, so that the storage node 101 stores the message.
  • the data consuming node is used to read messages from the storage node 101.
  • the following is a description of a message storage method provided by an embodiment of the present application.
  • This message storage method can be applied to Kafka clusters.
  • The following describes the message storage method by using an example in which the method is applied to a first storage node in a Kafka cluster.
  • the message storage method may include the following steps:
  • Step 201 Receive a first message storage request for storing a message in a Kafka cluster.
  • the client may send a first message storage request to a first storage node.
  • the first message storage request may carry a designated message to be stored and a virtual storage address for storing the message.
  • the virtual storage address includes an identifier of a virtual topic and an identifier of a virtual partition. That is, the first message storage request may specify that the message it carries is to be stored at a virtual storage address.
  • the virtual storage address is used as an external interface for storing messages in the Kafka cluster, so that the client can specify that the message is stored in the virtual storage address.
  • the message can be stored in the real storage address corresponding to the virtual storage address specified by the first message storage request, thereby realizing message storage in the real storage address.
  • Step 202 Determine a first real storage address corresponding to the virtual storage address based on a correspondence between the virtual storage address and the first real storage address.
  • The correspondence between the virtual storage address and the first real storage address is stored in the message storage system. At any time, a virtual storage address corresponds to only one real storage address, that is, a message designated to be stored in the virtual storage address can be stored in only one corresponding real storage address. Therefore, after the first message storage request is received, the correspondence can be queried according to the virtual storage address specified in the first message storage request to determine the first real storage address corresponding to the virtual storage address, so that the message specified in the first message storage request is stored in the first real storage address.
  • the first real storage address includes an identifier of a first real topic and an identifier of a first real partition.
  • Each real storage address may have a corresponding relationship with a plurality of virtual storage addresses, so that the messages designated to be stored in the plurality of virtual storage addresses may all be stored in that real storage address.
  • Each real partition can thus store data designated to be stored in multiple virtual topics, and the messages of those virtual topics share the storage space of the real partition in the real topic. This increases the number of virtual topics that a storage node configured with real partitions can support and reduces system cost.
  • Because each real partition can store data stored based on multiple virtual topics, FIG. 4 shows a schematic diagram of the messages stored in the real partition 2001.
  • In FIG. 4, "virtual topic X" (where X stands for a number) identifies different virtual topics, and "virtual topic X-message X" identifies the different messages stored in the real partition based on a virtual topic.
  • For example, virtual topic 1 (Index1), virtual topic 2 (Index2), and virtual topic 3 (Index3) identify different virtual topics; virtual topic 1-message 1 (Index1-Msg1) identifies message 1 stored in the real partition based on virtual topic 1, and virtual topic 2-message 1 (Index2-Msg1) identifies message 1 stored in the real partition based on virtual topic 2.
  • Step 203 Store the message specified by the first message storage request in the first real partition in the first real topic specified by the first real storage address.
  • After the first real storage address corresponding to the virtual storage address is determined, the message may be stored in the real partition of the real topic indicated by the first real storage address.
  • The data file used to store the message may be determined among the multiple data files in the real partition according to the message storage situation in the real partition, and the message is then stored in that data file.
  • Step 204 Generate index information according to the storage location of the message specified by the first message storage request, and store the index information in an index file corresponding to the virtual topic indicated by the virtual storage address.
  • the index information of the message is used to indicate the storage location of the message in the first real storage address.
  • Index information can be generated according to the storage location of the message in the first real storage address, and the index information is stored in the index file corresponding to the virtual topic, so that the message can be obtained according to the index information during message reading.
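  • Steps 202 to 204 can be sketched together as follows. This is a minimal in-memory illustration, not the storage node's actual implementation: the class `StorageNode`, its attributes, and the use of a list position as the "storage location" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (topic identifier, partition identifier)
Address = Tuple[str, int]


@dataclass
class StorageNode:
    # Correspondence between virtual and real storage addresses (step 202).
    mapping: Dict[Address, Address]
    # One message log per real partition, shared by several virtual topics.
    partitions: Dict[Address, List[bytes]] = field(default_factory=dict)
    # One index per virtual topic: (real address, position in the real partition).
    indexes: Dict[str, List[Tuple[Address, int]]] = field(default_factory=dict)

    def store(self, virtual: Address, message: bytes) -> int:
        real = self.mapping[virtual]                # step 202: resolve real address
        log = self.partitions.setdefault(real, [])
        log.append(message)                         # step 203: store in real partition
        position = len(log) - 1
        # Step 204: record index information under the virtual topic.
        self.indexes.setdefault(virtual[0], []).append((real, position))
        return position

    def read(self, virtual_topic: str, n: int) -> bytes:
        # Reading follows the virtual topic's own index, so messages of other
        # virtual topics interleaved in the same real partition are skipped.
        real, position = self.indexes[virtual_topic][n]
        return self.partitions[real][position]
```

  • Keeping the index per virtual topic while the data is shared per real partition is what allows a reader to see only the messages of the virtual topic it asked for.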
  • An index file corresponding to the virtual topic may be stored in a storage node configured with the virtual topic. For example, when the virtual topic is configured in a first storage node, the index file corresponding to the virtual topic may be stored in the first storage node.
  • the index file can be created during the establishment of the message storage system.
  • an index directory may be established in the storage node according to the name of each virtual topic deployed in the storage node, and the index directory stores index files.
  • the index directory may be determined according to the virtual topic identifier, and the index information is stored in an index file in the index directory.
  • the index file may include a data record index and a mapping record index.
  • the data record index is used to indicate the offset of the message in the real partition.
  • the mapping record index is used to indicate the correspondence between the virtual storage address and the real storage address.
  • the data record index may include multiple data index entries.
  • the mapping record index may also include multiple mapping index entries.
  • For example, the data record index (Index) 2 may include a data index entry (Entry) 1 and a data index entry 2, where the data index entry 1 is used to indicate the offset of the message 1 corresponding to the index 2 in the real partition,
  • and the data index entry 2 is used to indicate the offset of the message 2 corresponding to the index 2 in the real partition.
  • The data record index 1 (2002) may include a data index entry 3 and a data index entry 4, where the data index entry 3 is used to indicate the offset of the message 1 corresponding to the index 1 in the real partition, and the data index entry 4 is used to indicate the offsets of the message 2 and the message 3 corresponding to the index 1 in the real partition.
  • The mapping record index 2 may include a mapping index entry (MateEntry) 1 and a mapping index entry 2.
  • The mapping record index 1 (2003) may include a mapping index entry 3 and a mapping index entry 4.
  • the sizes of the multiple data index entries may be equal or different, and the sizes of the multiple mapping index entries may also be equal or different.
  • the data index entry and the mapping index entry may each record multiple fields.
  • The following takes the case in which the multiple data index entries are equal in size and the multiple mapping index entries are equal in size as an example to describe the fields recorded in the data index entry and in the mapping index entry, respectively:
  • the data record index may include multiple data index entries, and each data index entry may record one or more of the following fields: a virtual storage address offset field (consumerQueueOffset), a message sequence number field (startPartitionOffset), a file offset field (physicalPostion), a message length field (size), a total message field (msgNum), and a storage timestamp field (timestamp).
  • the content carried in the virtual storage address offset field is the message offset of the data recorded in the data index entry among all the data stored in the virtual storage address.
  • the length of the virtual storage address offset field can be 4 bytes or 8 bytes. For example, suppose that 100 messages are stored in the real partition 2 of the real topic1, that 20 of them are stored based on the virtual partition 1 of the virtual topic1, and that the message corresponding to the index information recorded in the data index entry is the 5th of those 20 messages; the message offset carried in the virtual storage address offset field is then 5.
  • an 8-byte long value can be used to represent the message offset.
  • Alternatively, because the file names in the index directories record the offset of the first message stored in the corresponding virtual topic (also called the base offset, baseOffset), the content carried in the virtual storage address offset field may, in order to save storage space, be the position of the current message relative to the first message.
  • When the offset of the current message is needed, the relative position corresponding to the current message can be added to the offset of the first message to obtain the offset of the current message.
  • In this case, the virtual storage address offset field can be 4 bytes in length.
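  • The space saving obtained by storing a relative position can be sketched as follows. The function names are hypothetical, and the big-endian unsigned 4-byte encoding is an assumption of the sketch; the application states only the field length.

```python
import struct


def encode_relative_offset(absolute_offset: int, base_offset: int) -> bytes:
    """Store only the distance from baseOffset, which fits in 4 bytes."""
    relative = absolute_offset - base_offset
    if not 0 <= relative < 2 ** 32:
        raise ValueError("relative offset does not fit in a 4-byte field")
    return struct.pack(">I", relative)  # assumed: big-endian, unsigned


def decode_absolute_offset(field: bytes, base_offset: int) -> int:
    """Recover the message offset: baseOffset + relative position."""
    (relative,) = struct.unpack(">I", field)
    return base_offset + relative
```

  • An absolute offset such as 5000000123 would need an 8-byte field, whereas its position relative to a base offset of 5000000000 is 123 and fits in 4 bytes.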
  • the content carried in the message sequence number field is the message offset of the first message in the data index entry among the multiple messages stored in the corresponding real partition.
  • the length of the message sequence number field can be 8 bytes. For example, assuming that 100 messages are stored in the real partition, and the first message in the data index entry is the 60th message stored in real topic1, the message offset carried in the message sequence number field is 60.
  • each data index entry may record the message offsets of multiple messages in the real partition, and the first message in the data index entry is the first message in the multiple messages.
  • For example, the data index entry Entry1 of the data record index Index2 records the message offset of the message Msg1 in the real partition, so the first message in that data index entry is the message Msg1.
  • Where a data index entry records the offsets of the message Msg2 and the message Msg3 in the real partition, the first message in that data index entry is the message Msg2.
  • the content carried in the file offset field is the file offset of the first message recorded in the data index entry in the data file of the real partition.
  • the file offset field can be 4 bytes in length. For example, suppose that the first message recorded in the data index entry is stored in the third data file in the real partition, that three messages are stored in the third data file, and that the first message and the second message stored in the third data file are each 1 kilobyte (KB) in size, so that the recorded message is the third message in that file; the file offset of the first message recorded in the data index entry is then 2 KB, that is, the content carried in the file offset field is 2 KB.
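  • The arithmetic of the file offset example can be written out as a short sketch. The function name `file_offset` is hypothetical, and the sketch assumes that messages are written back to back in the data file with no additional per-message header.

```python
from typing import Sequence


def file_offset(message_sizes: Sequence[int], index: int) -> int:
    """Byte offset of the message at `index` within its data file.

    Assumes messages are stored contiguously, so the offset is the total
    size of all messages that precede it in the file.
    """
    return sum(message_sizes[:index])
```

  • For a data file whose first two messages are 1 KB each, `file_offset([1024, 1024, 512], 2)` gives 2048 bytes, that is, 2 KB; the 512-byte size of the third message is an arbitrary value chosen for the example.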
  • the content carried in the message length field is the length of the message block used to store the data index entry.
  • the length of the message length field can be 4 bytes.
  • the content carried in the total message field is the total number of messages recorded in the message block.
  • the length of the total message field can be 4 bytes.
  • the content carried in the storage timestamp field is the timestamp of the data index entry.
  • the length of the storage timestamp field can be 8 bytes.
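  • One possible fixed-size layout of a data index entry, using the field lengths given above with the 4-byte relative form of the virtual storage address offset field, is sketched below. The field order, the big-endian encoding, and the signedness of each field are assumptions of the sketch; only the field names and lengths come from the description.

```python
import struct
from typing import NamedTuple

# 4 + 8 + 4 + 4 + 4 + 8 = 32 bytes; ">" means big-endian with no padding.
_DATA_ENTRY_FORMAT = ">IqIIIq"
DATA_ENTRY_SIZE = struct.calcsize(_DATA_ENTRY_FORMAT)


class DataIndexEntry(NamedTuple):
    consumerQueueOffset: int   # relative offset within the virtual storage address
    startPartitionOffset: int  # offset of the entry's first message in the real partition
    physicalPostion: int       # file offset of that message in the data file
    size: int                  # length of the message block
    msgNum: int                # total number of messages in the message block
    timestamp: int             # storage timestamp

    def pack(self) -> bytes:
        return struct.pack(_DATA_ENTRY_FORMAT, *self)

    @classmethod
    def unpack(cls, raw: bytes) -> "DataIndexEntry":
        return cls(*struct.unpack(_DATA_ENTRY_FORMAT, raw))
```

  • Because every entry has the same size under this layout, the n-th entry of an index file can be located at byte `n * DATA_ENTRY_SIZE` without scanning the preceding entries.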
  • the mapping record index may include multiple mapping index entries, and each mapping index entry may record one or more of the following fields: a message logical sequence number field (startLogicaloffset), a real partition identifier length field (topicNameSize), and a real partition identifier field (topicName).
  • the content carried in the message logical sequence number field is the message sequence number of the first message stored in the real topic based on the correspondence relationship among the multiple messages stored in the virtual storage address when the virtual storage address corresponds to the real storage address.
  • the length of the message logical sequence number field is 8 bytes. For example, suppose that 200 messages are stored in the virtual partition 1 of the virtual topic1 indicated by the virtual storage address, and that, under the correspondence between the virtual storage address and the real storage address, the first message stored in the real storage address is the 101st of those 200 messages; the content carried in the message logical sequence number field is then 101.
  • the content carried in the real partition identifier length field is the length of the identifier of the real partition in the real topic corresponding to the virtual storage address.
  • the length of the real partition identifier length field is 4 bytes. For example, when the virtual partition 1 in the virtual topic 1 corresponds to the real partition 2 in the real topic 1, the content carried in the real partition identifier length field is the length of the identifier of the real partition 2.
  • the content carried in the real partition identifier field is the identifier of the real partition in the real topic corresponding to the virtual storage address.
  • the length of the real partition identification field can be set according to actual needs. For example, when the virtual partition 1 in the virtual topic 1 corresponds to the real partition 2 in the real topic 1, the content carried in the real partition identification field is the identifier of the real partition 2.
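  • A variable-length encoding of a mapping index entry consistent with the three fields above can be sketched as follows. The field order, the big-endian encoding, and the UTF-8 encoding of the identifier are assumptions of the sketch; the length-prefixed identifier simply follows from the presence of the topicNameSize field.

```python
import struct
from typing import NamedTuple, Tuple

_HEADER_FORMAT = ">qI"  # startLogicaloffset (8 bytes) + topicNameSize (4 bytes)
_HEADER_SIZE = struct.calcsize(_HEADER_FORMAT)


class MappingIndexEntry(NamedTuple):
    startLogicaloffset: int  # sequence number of the first message under this correspondence
    topicName: str           # identifier of the real partition in the real topic

    def pack(self) -> bytes:
        name = self.topicName.encode("utf-8")
        # topicNameSize is derived from the encoded identifier.
        return struct.pack(_HEADER_FORMAT, self.startLogicaloffset, len(name)) + name

    @classmethod
    def unpack_from(cls, raw: bytes, pos: int = 0) -> Tuple["MappingIndexEntry", int]:
        """Decode one entry starting at `pos`; return it and the next position."""
        start, name_size = struct.unpack_from(_HEADER_FORMAT, raw, pos)
        name_start = pos + _HEADER_SIZE
        name = raw[name_start:name_start + name_size].decode("utf-8")
        return cls(start, name), name_start + name_size
```

  • Reading the entries of a mapping record index in sequence then yields the historical correspondences followed by the current one, each with the sequence number at which it took effect.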
  • the fields described above for the data index entry and the mapping index entry are merely exemplary, and are not intended to limit the present application.
  • the content and length carried by each field included in the data index entry and the mapping index entry can be set according to actual needs.
  • Step 205 Estimate the amount of pre-stored data of the message to be stored within a preset time period.
  • Estimating the amount of pre-stored data of a message to be stored within a preset time period may include: estimating the amount of pre-stored data of the message specified by a second message storage request received within the preset time period, that is, estimating the amount of pre-stored data of messages to be stored in the virtual storage address within the preset time period.
  • Alternatively, the amount of pre-stored data of messages to be stored in the first real topic where the first real partition is located within the preset time period is estimated, that is, the amount of pre-stored data of messages to be stored in the first real storage address within the preset time period is estimated.
  • the amount of the pre-stored data can be estimated according to the data amount of the real topic and the virtual topic.
  • the implementation manner of step 205 may include:
  • Step 2051 For at least one target virtual topic among a plurality of virtual topics that have a corresponding relationship with the first real topic, obtain a second data amount of the messages stored in each target virtual topic, and obtain a first data amount of the messages stored in the first real topic.
  • the at least one target virtual topic may include: all virtual topics in the multiple virtual topics, or the first at least one virtual topic when the multiple virtual topics are sorted by the amount of stored data in descending order.
  • For example, the N target virtual topics may be the first N virtual topics when the multiple virtual topics are sorted by the amount of stored data in descending order, where N is a positive integer.
  • the at least one target virtual topic can be determined according to actual needs. For example, after obtaining the second data amount of each virtual topic, it may be determined whether the virtual topic needs to be determined as the target virtual topic according to the size of the second data amount of the virtual topic.
  • the preset time period can also be set according to actual needs. For example, the preset time period can be four hours, ten hours, or twenty-four hours after the current time.
  • a data volume collection module may be deployed in the message storage system, or a traffic collection process may be created in the message storage system to obtain the second data amount of the virtual topic through the traffic collection module or the traffic collection process.
  • A queue (for example, a data amount topic) may also be used, and the acquired second data amount may be stored in the queue.
  • the second data amount of the virtual topic may be acquired periodically or in real time, which is not specifically limited in the embodiment of the present application.
  • For the manner of acquiring the first data amount of the first real topic, reference may be made to the manner of acquiring the second data amount of the virtual topic.
  • Alternatively, the sum of the second data amounts of all virtual topics corresponding to the first real topic is the first data amount of the first real topic. Therefore, the second data amounts of all virtual topics corresponding to the first real topic may be acquired, and their sum is determined as the first data amount.
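  • Step 2051 can be sketched as follows, assuming that the second data amount of each virtual topic has already been collected into a dictionary. The function names are hypothetical, and breaking ties by topic identifier is an assumption added only to make the ordering deterministic.

```python
from typing import Dict, List, Tuple


def first_data_amount(second_amounts: Dict[str, int]) -> int:
    """First data amount of the real topic: sum over all its virtual topics."""
    return sum(second_amounts.values())


def select_target_topics(second_amounts: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    """First N virtual topics when sorted by stored data amount, largest first."""
    ranked = sorted(second_amounts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]
```

  • Passing `n = len(second_amounts)` corresponds to the case in which all virtual topics are taken as target virtual topics.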
  • Step 2052 Estimate the amount of pre-stored data based on the first data amount and the second data amount of each target virtual topic.
  • an estimation model may be used to estimate the amount of pre-stored data.
  • the prediction model may be a Kalman filtering prediction model, a regression prediction model, or a neural network prediction model. Both the input parameters and output parameters of the prediction model may include: at least one set of parameters, and the at least one set of parameters corresponds to at least one target virtual topic one-to-one.
  • the input parameters of the corresponding group may include: the identification of the first real topic, the first data amount of the first real topic, the identification of the target virtual topic, and the ratio of the second data amount of the target virtual topic to the first data amount.
  • the output parameters of the corresponding group may include: the amount of pre-stored data, the identifier of the target virtual topic, and the ratio of the third data amount to the first data amount of the target virtual topic. It should be noted that, when the amount of pre-stored data of a message to be stored in the first real topic where the first real partition is located within a preset time period is estimated, the output parameter may further include an identifier of the first real topic.
  • the identifier of the first real topic is used to uniquely identify the real topic in the message storage system
  • the identifier of the target virtual topic is used to uniquely identify the target virtual topic in the message storage system. Both the identifier of the real topic and the identifier of the virtual topic can be determined during the system establishment process.
  • the format of the input parameter may be {identification of the first real topic, first data amount of the first real topic, identification of the first target virtual topic, ratio of the second data amount of the first target virtual topic to the first data amount}, ..., {identification of the first real topic, first data amount of the first real topic, identification of the Nth target virtual topic, ratio of the second data amount of the Nth target virtual topic to the first data amount}.
  • the format of the output parameter may be {pre-stored data amount, identification of the first target virtual topic, ratio of the third data amount of the first target virtual topic to the first data amount}, ..., {pre-stored data amount, identification of the Nth target virtual topic, ratio of the third data amount of the Nth target virtual topic to the first data amount}.
  • When the amount of pre-stored data of messages to be stored in the first real topic is estimated, the format of the output parameter may be {identification of the first real topic, pre-stored data amount, identification of the first target virtual topic, ratio of the third data amount of the first target virtual topic to the first data amount}, ..., {identification of the first real topic, pre-stored data amount, identification of the Nth target virtual topic, ratio of the third data amount of the Nth target virtual topic to the first data amount}.
  • the input parameters of the corresponding group may include: the identifier of the first real topic, the first amount of data of the first real topic, the identifier of the target virtual topic, and the second amount of data of the target virtual topic.
  • the output parameters of the corresponding group may include: the amount of pre-stored data, the identification of the target virtual topic, and the third data amount of the target virtual topic. It should be noted that, when the amount of pre-stored data of a message to be stored in the first real topic where the first real partition is located within a preset time period is estimated, the output parameter may further include an identifier of the first real topic.
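  • Assembling the input parameter groups in the ratio form can be sketched as follows, with one group per target virtual topic. The function name and the dictionary keys are hypothetical, and treating the ratio as 0 when the first data amount is 0 is an assumption of the sketch.

```python
from typing import Dict, List, Sequence, Union

Group = Dict[str, Union[str, int, float]]


def build_input_groups(real_topic_id: str,
                       second_amounts: Dict[str, int],
                       target_topics: Sequence[str]) -> List[Group]:
    """One input group per target virtual topic, using the ratio form."""
    first_amount = sum(second_amounts.values())  # first data amount of the real topic
    groups: List[Group] = []
    for topic_id in target_topics:
        second = second_amounts[topic_id]
        # Assumption: an empty real topic yields a ratio of 0 rather than an error.
        ratio = second / first_amount if first_amount else 0.0
        groups.append({
            "real_topic": real_topic_id,
            "first_data_amount": first_amount,
            "virtual_topic": topic_id,
            "ratio": ratio,
        })
    return groups
```

  • In the alternative form described above, the `ratio` element of each group would simply be replaced by the second data amount itself.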
  • In another implementation, step 205 may include:
  • estimating the amount of pre-stored data by using the second traffic corresponding to at least one target virtual topic among the plurality of virtual topics that have a corresponding relationship with the first real topic, and the first traffic corresponding to the first real topic.
  • the at least one target virtual topic includes: all virtual topics in the plurality of virtual topics corresponding to the first real topic, or the first at least one virtual topic when the plurality of virtual topics corresponding to the first real topic are sorted by traffic ratio in descending order.
  • the traffic ratio is a ratio of the second traffic corresponding to the virtual topic to the first traffic of the first real topic.
  • a traffic collection (Metric Collector) module deployed in the data storage system or a traffic collection process created in the data storage system may be used to obtain the second traffic corresponding to the virtual topic.
  • the traffic collection module or the traffic collection process may also be used to obtain the first traffic corresponding to the first real topic.
  • Alternatively, the sum of the second traffic of all virtual topics corresponding to the first real topic is the first traffic of the first real topic, so the sum of the second traffic of all virtual topics can be determined as the first traffic.
  • an estimation model may be used to estimate the traffic corresponding to the pre-stored data amount according to the second traffic and the first traffic.
  • the input parameters and output parameters of the prediction model may include: at least one set of parameters, and the at least one set of parameters corresponds to at least one target virtual topic.
  • the input parameters of the corresponding group may include: the identification of the first real topic, the first traffic of the first real topic, the identification of the target virtual topic, and the ratio of the second traffic of the target virtual topic to the first traffic.
  • the output parameters of the corresponding group may include: estimated traffic, an identifier of the target virtual topic, and a ratio of the third traffic of the target virtual topic to the estimated traffic. It should be noted that when estimating the estimated traffic corresponding to the amount of pre-stored data of messages stored in the first real topic where the first real partition is located within a preset time period, the output parameter may further include the first The identity of the real topic.
  • the format of the input parameter may be ⁇ the identification of the first real topic, the first traffic of the first real topic, and the first target The identification of the virtual topic, the ratio of the second traffic to the first traffic of the first target virtual topic ⁇ , ..., ⁇ the identification of the first real topic, the first traffic of the first real topic, the Nth The identifier of the target virtual topic, the ratio of the second traffic to the first traffic of the Nth target virtual topic ⁇ .
  • the format of the output parameter can be ⁇ estimated traffic, identification of the first target virtual topic, ratio of third traffic to estimated traffic of the first target virtual topic ⁇ , ..., ⁇ estimated Traffic, the identification of the Nth target virtual topic, the ratio of the third traffic to the estimated traffic of the Nth target virtual topic ⁇ .
  • the format of the output parameter may be {{identifier of the topic, corresponding estimated traffic, identifier of the first target virtual topic, ratio of the third traffic of the first target virtual topic to the corresponding estimated traffic}, ..., {first real
  • the input parameters of the corresponding group may include: the identifier of the first real topic, the first traffic of the first real topic, the identifier of the target virtual topic, and the second traffic of the target virtual topic.
  • the output parameters of the corresponding group may include: the estimated traffic, the identifier of the target virtual topic, and the third traffic of the target virtual topic. It should be noted that when estimating the traffic corresponding to the amount of pre-stored data of messages to be stored, within a preset time period, in the first real topic where the first real partition is located, the output parameters may further include the identifier of the first real topic.
  • When the target virtual topics include only the top at least one virtual topic, ranked by traffic ratio (or data amount ratio) from largest to smallest among the virtual topics corresponding to the first real topic, each estimation only needs to cover those top-ranked virtual topics.
  • This reduces the amount of data to be processed in the estimation process and thereby speeds up estimation. It also reduces the number of samples used in training the prediction model, thereby shortening the training time.
  • the embodiment of the present application can use the LSTM neural network to implement the above prediction.
  • the following uses the prediction model as an LSTM neural network as an example to explain the prediction process:
  • X(t-1), X(t), and X(t+1) are the inputs of the LSTM neural network at times t-1, t, and t+1, respectively, that is, the input parameters at times t-1, t, and t+1.
  • h (t-1), h (t), and h (t + 1) are the outputs of the hidden layer of the LSTM neural network at time t-1, t, and t + 1, respectively.
  • C (t-1), C (t), and C (t + 1) are the cell states passed from time t-1, t, and t + 1 to the next time, respectively.
  • the function of the LSTM neural network is mainly implemented through three gates, namely, a forget gate, an input gate, and an output gate.
  • the forgetting gate is used to decide what information is discarded from the cell state.
  • the threshold σ1 is used to control the amount of data passing through the forgetting gate.
  • [h(t-1), x(t)] represents the vector concatenation of the output state h(t-1) at the previous moment and the current input x(t)
  • W_f is the weight matrix of the forgetting gate
  • B_f is the bias term of the forgetting gate.
  • the values of W_f and B_f can be set according to actual needs.
  • the input gate is used to determine how much of the input information needs to be retained in the current cell state. Its function is mainly realized by the input threshold layer (σ2) and the tanh1 layer.
  • the input threshold layer (σ2) is used to decide which values to update.
  • the tanh1 layer is used to create a new candidate vector and add it to the cell state.
  • the combination of the current memory C_t1 and the long-term memory C(t-1) further realizes the estimation of the traffic after the current time based on the traffic before the current time.
  • the output gate is used to determine how much information in the cell state needs to be output to the output state, and its function is implemented by the output threshold layer (σ3) and the tanh2 layer.
  • the output threshold layer (σ3) determines which parts of the cell state need to be output.
  • the tanh2 layer is used to process the cell state and output a value in the range [-1, 1].
  • the values of the above-mentioned thresholds σ1, σ2, and σ3 can be set according to actual needs.
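The three gates described above can be illustrated with one step of a single-unit LSTM cell. The gate equations of the application are not reproduced in this text, so the sketch below assumes the standard LSTM formulation, in which the σ layers are sigmoid functions applied to a weighted sum of the concatenation [h(t-1), x(t)]; the function name `lstm_step` and the `params` layout are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(weights, values):
    return sum(w * v for w, v in zip(weights, values))


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a single-hidden-unit LSTM cell (standard formulation, assumed).

    x_t is a list of input features; h_prev and c_prev are floats.
    params maps a gate name to (weights over [h_prev] + x_t, bias).
    """
    z = [h_prev] + list(x_t)  # concatenation [h(t-1), x(t)]
    f = sigmoid(dot(params["f"][0], z) + params["f"][1])         # forget gate (sigma1)
    i = sigmoid(dot(params["i"][0], z) + params["i"][1])         # input gate (sigma2)
    c_cand = math.tanh(dot(params["c"][0], z) + params["c"][1])  # tanh1 candidate
    c_t = f * c_prev + i * c_cand  # keep part of the old state, add the new candidate
    o = sigmoid(dot(params["o"][0], z) + params["o"][1])         # output gate (sigma3)
    h_t = o * math.tanh(c_t)       # tanh2 squashes the cell state into [-1, 1]
    return h_t, c_t


# With all-zero weights every gate evaluates to 0.5 and the candidate to 0.
zero = ([0.0, 0.0, 0.0], 0.0)
params = {"f": zero, "i": zero, "c": zero, "o": zero}
h_t, c_t = lstm_step([1.0, 2.0], h_prev=0.0, c_prev=1.0, params=params)
```

In a trained model the weights and biases would be learned from historical traffic samples rather than set to zero.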
  • Step 206 When the amount of pre-stored data is greater than the data amount threshold, establish a correspondence between the virtual storage address and the second real storage address.
  • When the amount of pre-stored data is the estimated data amount of the messages specified by the second message storage requests received within a preset time period, an amount of pre-stored data greater than the first threshold indicates that the messages specified by the second message storage requests have a large storage requirement.
  • In this case, the correspondence between the virtual storage address and the real storage address can be modified so that the virtual storage address corresponds to the second real storage address. The messages specified by the second message storage requests are then stored in a real storage address that is better able to support this storage requirement, thereby improving the storage performance of the message storage system.
  • When the amount of pre-stored data is the estimated amount of data to be stored, within a preset time period, in the first real topic where the first real partition is located, an amount of pre-stored data greater than the second threshold indicates that the first real partition may not be able to support the message storage needs within the preset time period.
  • In this case, the correspondence between the virtual storage address and the real storage address may be modified so that the virtual storage address corresponds to the second real storage address. Messages that would have been stored in the first real topic where the first real partition is located are then stored in the second real storage address, thereby improving the storage performance of the message storage system.
  • the first threshold value and the second threshold value may be determined according to actual needs, and the first threshold value and the second threshold value may be equal or different, which is not specifically limited in this embodiment of the present application.
  • This step 206 is an explanation of the implementation process of changing the correspondence between the virtual storage address corresponding to the target virtual topic and the real storage address.
  • one target virtual topic in at least one target virtual topic is used as an example to describe it.
  • the virtual storage addresses corresponding to other target virtual topics in the at least one target virtual topic are changed to
  • the implementation process of step 206 may include:
  • Step 2061 Determine a second real topic based on the third data amount of the target virtual topic.
  • step 2061 may include:
  • Step 2061a Based on the third data amount of the target virtual topic, find a real topic whose available data amount is greater than the third data amount.
  • the available data amount is the difference between the data amount of the real topic and the pre-stored data amount estimated in step 205.
  • the data amount of the real topic is the maximum amount of data that the real topic can bear when performing read and write operations on the real topic.
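  • When the available data amount of a real topic is greater than the third data amount of the target virtual topic, the real topic can bear the third data amount of the target virtual topic, and the real topic can be determined as the second real topic, that is, step 2061b is executed.
  • When the available data amount of a real topic is not greater than the third data amount of the target virtual topic, the real topic cannot bear the third data amount of the target virtual topic.
  • When no existing real topic can bear it, a second real topic whose available data amount is greater than the third data amount may be created in the message storage system, that is, step 2061c is performed.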
  • the second real topic can also be determined according to traffic. For example, based on the third traffic of the target virtual topic, a real topic whose available traffic is greater than the third traffic may be searched for. When such a real topic exists, it may be determined as the second real topic; when no such real topic exists, a second real topic is created in the message storage system.
  • the third traffic of the target virtual topic is 56 megabytes per second (MB/s)
  • five real topics are configured in the message storage system, namely real topic1, real topic2, real topic3, real topic4, and real topic5.
  • the available traffic of the five real topics is 50 MB/s, 70 MB/s, 40 MB/s, 55 MB/s, and 30 MB/s, respectively.
  • only the available traffic of real topic2 (70 MB/s) is greater than the third traffic of the target virtual topic (56 MB/s).
  • real topic2 may therefore be determined as the second real topic, that is, step 2061b is performed.
  • Step 2061b When it is determined that there is a real topic with a larger amount of available data than the third data amount, determine a real topic with a larger amount of available data than the third data amount as the second real topic.
  • In the process of finding a real topic whose available data amount is greater than the third data amount, multiple such real topics may be found in the message storage system.
  • In this case, the real topic with the largest available data amount may be determined as the second real topic, to ensure that the real topic can be effectively used and to reduce the probability of modifying the correspondence again because the available data amount of the chosen real topic is too small.
  • Step 2061c When it is determined that there is no real topic with an available data amount greater than the third data amount, a second real topic is created in the message storage system.
  • a real topic with an amount of available data greater than the third amount of data may be created in the message storage system, and the created real topic is determined as the second real topic, in order to establish a correspondence between a virtual storage address and a second real storage address including the second real topic.
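Steps 2061a to 2061c can be sketched as a single selection function, using the traffic figures from the example above. The function name, the return convention, and the placeholder name for a newly created topic are hypothetical; actually creating a topic in the cluster is outside the scope of this sketch.

```python
def choose_second_real_topic(available, third_amount, new_topic_name="real topic new"):
    """Return (topic name, created) for a target virtual topic.

    available maps a real topic name to its available traffic (or data amount).
    """
    # Step 2061a: look for real topics with more headroom than is needed.
    candidates = [name for name, amount in available.items() if amount > third_amount]
    if candidates:
        # Step 2061b: if several qualify, prefer the one with the most headroom.
        return max(candidates, key=lambda name: available[name]), False
    # Step 2061c: nothing fits, so the caller has to create a new real topic.
    return new_topic_name, True


# Available traffic in MB/s, as in the example in the text.
available = {
    "real topic1": 50,
    "real topic2": 70,
    "real topic3": 40,
    "real topic4": 55,
    "real topic5": 30,
}
chosen, created = choose_second_real_topic(available, third_amount=56)
```

With a third traffic of 56 MB/s only real topic2 qualifies; with a value above 70 MB/s the function signals that a new real topic is needed.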
  • each real topic usually includes multiple real partitions
  • the process of determining the second real partition may refer to the process of determining the second real topic accordingly.
  • the first real partition and the second real partition may be deployed on the same storage node or different storage nodes in the Kafka cluster, which is not specifically limited in this embodiment of the present application.
  • When the first real partition and the second real partition are deployed on different storage nodes, the workload (traffic or data amount) of the virtual topic specified by the virtual storage address can be distributed across storage nodes, which reduces workload imbalance among multiple topics in the same storage node and reduces the probability that multiple topics consume resources unevenly within a storage node.
  • When the real partitions determined in step 2061b include a real partition in the first storage node and real partitions in other storage nodes, the real partition in the first storage node may preferentially be determined as the second real partition.
  • Step 2062 Modify the correspondence between the virtual storage address corresponding to the target virtual topic and the real storage address so that the virtual storage address corresponds to the second real storage address including the second real topic, and store the modified correspondence in the index file corresponding to the target virtual topic.
  • the message designated to be stored in the target virtual topic can be stored in the data file of the second real partition of the second real topic, so that the message designated to be stored in the virtual storage address is stored in the second real storage address.
  • the modified corresponding relationship may also be stored in an index file corresponding to the target virtual topic, so that the message can be stored and searched according to the modified corresponding relationship.
  • the modified correspondence relationship may be stored in a mapping record index corresponding to the target virtual topic.
  • When the mapping index entry of the mapping record index contains a real partition identifier length field and a real partition identifier field, the identifier length of the second real partition in the second real topic may be recorded in the real partition identifier length field, and the identifier of the second real partition may be recorded in the real partition identifier field, so that the real topic corresponding to the target virtual topic can be determined according to the identifier.
  • When the message storage system is a Kafka cluster-based message storage system, the modified correspondence can also be saved on ZooKeeper (a distributed application coordination service) for subsequent use.
  • Step 2063 Determine the message offset of the first message in the second real topic, and store the message offset in an index file corresponding to the target virtual topic.
  • the first message is the first message stored in the second real topic based on the correspondence between the virtual storage address and the second real storage address. Before the correspondence is modified, the messages designated to be stored in the virtual storage address are all stored in the first real topic; after it is modified, they are all stored in the second real topic. Therefore, after modifying the correspondence, it is necessary to determine the message offset of the first message stored according to the modified correspondence, so that messages can subsequently be stored and looked up based on this message offset.
  • the message offset may also be stored in an index file corresponding to the target virtual topic, so that, among the messages designated to be stored in the virtual storage address, messages stored in the first real topic can be distinguished from messages stored in the second real topic according to the message offset.
  • the message offset may be stored in a mapping record index corresponding to the target virtual topic. When a message logical sequence number field is recorded in the mapping index entry of the mapping record index, the message offset may be recorded in the message logical sequence number field.
  • the estimation model corresponding to the first real topic and the second real topic needs to be performed.
  • an estimation model needs to be created for the created real topic in order to estimate the traffic of the real topic.
  • When there are multiple target virtual topics, the correspondences of the target virtual topics may be modified in turn, in descending order of their second data amount (or second traffic).
  • When determining the second real topic for each target virtual topic in this order, the second real topic needs to be selected according to the available data amount (or available traffic) of the real topics.
  • For example, a real topic with a larger available data amount (or available traffic) can be determined as the second real topic of a target virtual topic with a larger second data amount (or second traffic).
  • In this way, the real topics in the message storage system can be effectively used, and the probability of modifying the correspondence of a target virtual topic a second time is reduced.
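The ordering described above can be sketched as a greedy plan: virtual topics are visited in descending order of second data amount, and each is paired with the real topic that currently has the most headroom. The function name, the use of the third (estimated) data amount as the fit criterion, and all figures are assumptions for illustration.

```python
def plan_migrations(second_amounts, third_amounts, available):
    """Map each target virtual topic to a real topic name, or None if none fits.

    second_amounts: current data amount per virtual topic (sets the order).
    third_amounts: estimated data amount per virtual topic (must fit).
    available: available data amount per real topic.
    """
    remaining = dict(available)
    plan = {}
    for vtopic in sorted(second_amounts, key=second_amounts.get, reverse=True):
        need = third_amounts[vtopic]
        best = max(remaining, key=remaining.get, default=None)
        if best is None or remaining[best] <= need:
            plan[vtopic] = None  # a new real topic would have to be created
            continue
        plan[vtopic] = best
        remaining[best] -= need  # the chosen real topic now has less headroom
    return plan


second_amounts = {"v1": 10, "v2": 40, "v3": 25}
third_amounts = {"v1": 12, "v2": 45, "v3": 30}
available = {"r1": 50, "r2": 35}
plan = plan_migrations(second_amounts, third_amounts, available)
```

Handling the largest virtual topics first keeps the largest real topics from being fragmented by small assignments.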
  • the traffic collection (Metric Collector) module is configured to obtain, periodically or in real time, the traffic of all virtual topics and real topics in the system and save it in the traffic topic; that is, the traffic collection module can be used to perform the above step 2051.
  • the traffic summary module can periodically read the traffic information from the traffic topic, and input the current traffic of the real topic and the virtual topic to the deep learning estimation module.
  • the deep learning estimation module can use the LSTM neural network to estimate the traffic of the real topic and the virtual topic in a preset time period, that is, the deep learning estimation module can be used to perform the above step 2052.
  • the topic migration module may modify the corresponding relationship of the virtual topic according to the traffic estimated by the deep learning estimation module, that is, the topic migration module may be used to perform step 206 described above.
  • the messages specified to be stored in the virtual storage address can be stored in different real storage addresses.
  • Therefore, when the data amounts (or traffic) of the logical topics are not balanced, the probability that the logical topics occupy resources unevenly can be reduced.
  • Step 207 Receive a second message storage request for storing a message in the Kafka cluster, where the second message storage request specifies that the message specified by the second message storage request is stored in a virtual storage address.
  • the receiving time of the second message storage request is later than the receiving time of the first message storage request.
  • for the implementation of step 207, refer to the implementation of step 201 accordingly.
  • Step 208 Determine a second real storage address corresponding to the virtual storage address based on the correspondence between the virtual storage address and the second real storage address.
  • the correspondence between the virtual storage address and the real storage address has been modified so that the virtual storage address corresponds to the second real storage address. Therefore, the real storage address corresponding to the virtual storage address may be determined as the second real storage address according to the correspondence.
  • the second real storage address includes an identifier of a second real topic and an identifier of a second real partition.
  • Step 209 In the second real partition in the second real topic specified by the second real storage address, store the message specified by the second message storage request.
  • The second real partition in the second real topic and the virtual partition in the virtual topic may be deployed in the same storage node or in different storage nodes. Therefore, before storing the message, it is necessary to determine whether they are deployed in the same storage node. When they are deployed in the same storage node, the message may be directly stored in the second real storage address.
  • When they are deployed in different storage nodes, the message needs to be sent to the other storage node, so that the other storage node stores the message in the second real storage address on that node.
  • Step 210 Generate index information according to the storage location of the message specified by the second message storage request, and store the index information in an index file corresponding to the virtual topic indicated by the virtual storage address.
  • When the second real partition in the second real topic and the virtual partition in the virtual topic are deployed in the same storage node, for the implementation process of step 210, refer to the implementation process of step 204 accordingly.
  • When the second real partition in the second real topic and the virtual partition in the virtual topic are deployed in different storage nodes, after the other storage node stores the message, a background thread in the message storage system may obtain the index information and send it to the first storage node, so that the index information is stored in the first storage node.
  • the action of the background thread sending index information to the first storage node may be actively performed by the background thread or passively performed by the background thread.
  • a background thread can be automatically triggered to obtain the index information, and the background thread then actively pushes the index information to the first storage node, so that the first storage node stores the index information.
  • the first storage node may send an index information pull request to the background thread. After receiving the index information pull request, the background thread may obtain the index information and send the index information to the first storage node.
  • By storing messages on other storage nodes while storing the index information on the first storage node, messages and index information can be stored separately, which decouples the real storage address from the virtual storage address.
  • the workload (traffic or data volume) of the virtual topic indicated by the virtual storage address can thus be distributed across storage nodes, which reduces workload imbalance among multiple topics in the same storage node and reduces the probability that multiple topics consume resources unevenly within a storage node.
  • the correspondence between the virtual storage address and the real storage address may also be expressed as the correspondence between the virtual topic and the real topic.
  • the process of message storage may also be performed according to the correspondence between the virtual topic and the real topic.
  • the message storage method may include: receiving a first message storage request for storing a message in the Kafka cluster, the first message storage request specifying that the message be stored in a virtual topic; determining, based on a correspondence between the virtual topic and a first real topic, the first real topic corresponding to the virtual topic; and storing the message specified by the first message storage request in a real partition of the first real topic.
  • the implementation of the message storage according to the correspondence between the virtual topic and the real topic can be referred to the above steps 201 to 210, which will not be repeated here.
  • the message storage method receives a message storage request for storing a message in a Kafka cluster, determines a real storage address for storing the message according to a correspondence between a virtual storage address and a real storage address, and stores the message in the real partition specified by the real storage address, thereby realizing storage of the message.
  • the messages designated to be stored in the virtual storage address can be stored in different real storage addresses, which, compared with the related technology, reduces the probability of overloading a real partition in a real topic and improves the throughput of the message storage system.
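The storage flow summarized above (resolve the virtual storage address, append to the real partition, record index information under the virtual topic) can be illustrated with an in-memory model. This is not Kafka code: the class, its attributes, and the tuple-based addresses are hypothetical stand-ins for the structures described in the text.

```python
class VirtualTopicStore:
    """Toy model of storing messages through a virtual-to-real address mapping."""

    def __init__(self):
        self.mapping = {}     # virtual address -> current real address
        self.partitions = {}  # real address -> list of stored messages
        self.index = {}       # virtual address -> list of (real address, position)

    def map(self, virtual_addr, real_addr):
        """Establish or modify the correspondence for a virtual address."""
        self.mapping[virtual_addr] = real_addr
        self.partitions.setdefault(real_addr, [])

    def store(self, virtual_addr, message):
        """Store a message and return its offset within the virtual partition."""
        real_addr = self.mapping[virtual_addr]
        log = self.partitions[real_addr]
        log.append(message)
        entries = self.index.setdefault(virtual_addr, [])
        entries.append((real_addr, len(log) - 1))
        return len(entries) - 1


store = VirtualTopicStore()
virtual = ("virtual topic", 0)
first_real = ("real topic1", 0)
second_real = ("real topic2", 0)

store.map(virtual, first_real)
offset_a = store.store(virtual, "m1")  # lands in the first real partition
store.map(virtual, second_real)        # correspondence modified (step 206)
offset_b = store.store(virtual, "m2")  # lands in the second real partition
```

The client keeps addressing the same virtual address throughout; only the mapping decides which real partition receives each message.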
  • An embodiment of the present application further provides a message reading method.
  • the message reading method may include:
  • Step 601 Receive a message reading request for reading a message in a Kafka cluster.
  • the client can send a message read request to the first storage node.
  • the message read request specifies reading a message from a virtual storage address, where the virtual storage address includes an identifier of a virtual topic and an identifier of a virtual partition.
  • Step 602 Determine a target real storage address corresponding to the virtual storage address based on the correspondence between the virtual storage address and the real storage address.
  • the message read request usually carries the target offset of the message to be read.
  • the implementation process of step 602 may include:
  • Step 6021 Obtain a target index file of the message to be read based on the target offset.
  • a binary search method can be used to find the target index file of the message to be read in the storage node.
  • the target index file may include a target data record index and a target mapping record index.
  • the target data record index is used to indicate an offset of the message to be read in a real partition.
  • the target mapping record index is used to indicate a correspondence between a virtual storage address and a real storage address for storing the message to be read.
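Locating the target index file from the target offset can be sketched with a binary search. The sketch assumes that each index file is identified by the offset of the first message it covers and that these base offsets are kept in a sorted list; that naming scheme and the function name are assumptions, not details given in this text.

```python
import bisect


def find_index_file(base_offsets, target_offset):
    """Return the base offset of the index file covering target_offset.

    base_offsets must be sorted in ascending order.
    """
    pos = bisect.bisect_right(base_offsets, target_offset) - 1
    if pos < 0:
        raise KeyError(f"offset {target_offset} precedes the first index file")
    return base_offsets[pos]


# Hypothetical base offsets of three index files.
base_offsets = [0, 1000, 2500]
covering = find_index_file(base_offsets, 1700)
```

`bisect_right` returns the insertion point after any equal element, so an offset equal to a base offset resolves to the file that starts at that offset.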
  • Step 6022 Obtain the message offset of the first message recorded in the target index file.
  • the first message is the first message stored in the real topic specified by the current correspondence.
  • the target index file is an index file corresponding to the virtual topic specified by the virtual storage address.
  • the message offset of the first message may be obtained in the target mapping record index.
  • the current correspondence is the correspondence between the virtual storage address and the real storage address after it has been modified during use of the message storage system.
  • the correspondence relationship before the modification of the correspondence relationship between the virtual storage address and the real storage address is a historical correspondence relationship.
  • the real storage address recorded in the historical correspondence relationship is different from the real storage address recorded in the current correspondence relationship.
  • the offset of the message stored based on the current correspondence is greater than the offset of the message stored based on the historical correspondence.
  • the messages stored based on the current correspondence are stored in the real storage address specified by the current correspondence.
  • the message stored based on the historical correspondence is stored in the real storage address specified by the historical correspondence. Therefore, before determining the target real storage address, it is necessary to first obtain the message offset of the first message and compare it with the target offset, to determine whether the target real storage address is the real storage address specified by the historical correspondence or the one specified by the current correspondence, thereby ensuring that messages can be effectively read.
  • When the target offset is smaller than the message offset, it is determined that the target real storage address is the real storage address specified by the historical correspondence, and step 6024 is executed. When the target offset is greater than or equal to the message offset, it is determined that the target real storage address is the real storage address specified by the current correspondence, and step 6023 is executed.
  • the real storage address can be determined.
  • the real storage address specified for the historical correspondence relationship may be determined to execute step 6024 at this time.
  • Step 6023 When the target offset is greater than or equal to the message offset, determine the real storage address recorded in the current correspondence relationship as the target real storage address.
  • When the target offset is greater than or equal to the message offset, it can be determined that the real storage address is the real storage address specified by the current correspondence, so the real storage address recorded in the current correspondence can be determined as the target real storage address.
  • the target real storage address includes an identifier of the target real topic and an identifier of the target real partition.
  • Step 6024 When the target offset is smaller than the message offset, determine the real storage address recorded in the historical correspondence relationship as the target real storage address.
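  • When the target offset is less than the message offset, it can be determined that the real storage address is the real storage address specified by the historical correspondence, so the real storage address recorded in the historical correspondence can be determined as the target real storage address.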
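The comparison in steps 6022 to 6024 reduces to a single check against the message offset of the first message stored under the current correspondence. The function name and the tuple-based addresses below are hypothetical; the sketch covers one historical and one current correspondence, as in the text.

```python
def resolve_real_address(target_offset, first_message_offset,
                         historical_addr, current_addr):
    """Pick the real storage address that holds the message at target_offset."""
    if target_offset >= first_message_offset:
        return current_addr   # step 6023: stored after the correspondence changed
    return historical_addr    # step 6024: stored before the correspondence changed


historical = ("real topic1", 0)
current = ("real topic2", 0)
# Assume the first message stored under the current correspondence has offset 1200.
older = resolve_real_address(800, 1200, historical, current)
newer = resolve_real_address(1200, 1200, historical, current)
```

A message whose offset equals the recorded message offset is itself the first message stored under the current correspondence, which is why the comparison is inclusive.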
  • Step 603 Read the message specified by the message read request at the target real partition specified by the target real storage address.
  • the message to be read can be read in the target real partition specified by the target real storage address, according to the offset, recorded in the target data record index, of the message to be read in the real partition.
  • Depending on how the message and the index information are stored, the implementation of step 603 differs. The following two cases are described:
  • When the target real partition designated by the target real storage address is located in the first storage node, that is, the index information is stored in the same storage node as the message to be read, the message to be read can be read directly in the target real partition.
  • When the target real partition designated by the target real storage address is located in another storage node, the first storage node may send target index information to the other storage node, so that the other storage node obtains the message to be read based on the target index information and sends a second message read response carrying the message to be read to the first storage node.
  • the first storage node may obtain the message to be read according to the second message read response.
  • the other storage node is a storage node to which the real partition belongs.
  • the target index information includes information of a target real partition designated by a target real storage address.
  • Step 604 Send a first message reading response carrying a message to be read.
  • the first storage node may send the first message read response to the client that sent the message read request, so that the client can obtain the message to be read carried in the first message read response.
  • Because the messages are stored contiguously in the message file of the real partition, and the content recorded in each index entry is the index information of contiguously stored messages, when reading messages, the messages in the message file corresponding to an index entry can be read in batches according to the index information stored contiguously in the same index entry, thereby avoiding discrete reads.
  • a message aging mechanism can be set for data stored based on the correspondence before modification, that is, when the storage time of a message in the first storage node reaches a preset time period, the message is deleted.
  • the index can be rebuilt on the other nodes, so that the index information and the message are stored on the same node, thereby ensuring the efficiency of reading the message.
  • the correspondence between the virtual storage address and the real storage address may also be expressed as the correspondence between the virtual topic and the real topic.
  • the process of reading the message may also be performed according to the correspondence between the virtual topic and the real topic.
  • For the implementation of message reading according to the correspondence between the virtual topic and the real topic, reference may be made to steps 601 to 604; details are not described herein again.
  • With the message reading method provided in this embodiment, after a message reading request for reading a message in the Kafka cluster is received, the target real storage address corresponding to the virtual storage address is determined through the correspondence between the virtual storage address and the real storage address, and the message specified by the message reading request is read from the target real partition specified by the target real storage address, thereby implementing message reading.
  • the device 700 may include:
  • the receiving module 701 is configured to receive a first message storage request for storing a message in a Kafka cluster.
  • the first message storage request specifies that a message specified by the first message storage request is stored in a virtual storage address.
  • The virtual storage address may include an identifier of a virtual topic and an identifier of a virtual partition.
  • The determining module 702 is configured to determine a first real storage address corresponding to the virtual storage address based on a correspondence between the virtual storage address and the first real storage address; the first real storage address may include an identifier of a first real topic and an identifier of a first real partition.
  • the storage module 703 is configured to store, in the first real partition in the first real topic specified by the first real storage address, the message specified by the first message storage request.
  • the receiving module 701 is further configured to receive a second message storage request for storing a message in a Kafka cluster, where the second message storage request specifies that the message specified by the second message storage request is stored in a virtual storage address.
  • the determining module 702 is further configured to determine a second real storage address corresponding to the virtual storage address based on the correspondence between the virtual storage address and the second real storage address.
  • The second real storage address may include an identifier of a second real topic and an identifier of a second real partition.
  • the storage module 703 is further configured to store, in the second real partition in the second real topic specified by the second real storage address, the message specified by the second message storage request.
  • the first real partition and the second real partition are deployed on different storage nodes in the Kafka cluster.
  • the receiving time of the second message storage request is later than the receiving time of the first message storage request.
  • the apparatus 700 may further include:
  • the estimation module 704 is configured to estimate a pre-stored data amount of the message specified by the second message storage request received within a preset time period.
  • the establishing module 705 is configured to establish a correspondence between a virtual storage address and a second real storage address when the amount of pre-stored data is greater than the first threshold.
  • the estimation module 704 may include:
  • An obtaining sub-module 7041 is configured to obtain, for at least one target virtual topic among a plurality of virtual topics corresponding to the first real topic, a second data amount of a message stored in each target virtual topic.
  • the obtaining sub-module 7041 is further configured to obtain a first data amount of a message stored in a first real topic.
  • the estimation submodule 7042 is configured to estimate the amount of pre-stored data based on the first data amount and the second data amount of each target virtual topic.
  • the estimation submodule 7042 is configured to: use an estimation model to estimate the amount of pre-stored data.
  • The input parameters and output parameters of the estimation model may each include at least one set of parameters, in one-to-one correspondence with the at least one target virtual topic. For each target virtual topic:
  • The input parameters may include: the identifier of the first real topic and the first data amount, the identifier of the target virtual topic, and the ratio of the second data amount of the target virtual topic to the first data amount.
  • The output parameters may include: the pre-stored data amount, the identifier of the target virtual topic, and the ratio of the third data amount of the target virtual topic to the first data amount.
  • Alternatively, the input parameters may include: the identifier of the first real topic and the first data amount, and the identifier of the target virtual topic and the second data amount of the target virtual topic.
  • In that case, the output parameters may include: the pre-stored data amount, and the identifier of the target virtual topic and the third data amount of the target virtual topic.
  • the estimation module 704 is further configured to estimate an amount of pre-stored data of a message to be stored in a first real topic where the first real partition is located within a preset time period.
  • the establishing module 705 is further configured to establish a correspondence between the virtual storage address and the second real storage address when the amount of pre-stored data is greater than the second threshold.
  • the estimation module 704 may include:
  • An obtaining sub-module 7041 is configured to obtain, for at least one target virtual topic among a plurality of virtual topics corresponding to the first real topic, a second data amount of a message stored in each target virtual topic.
  • the obtaining sub-module 7041 is further configured to obtain a first data amount of a message stored in a first real topic.
  • the estimation submodule 7042 is configured to estimate the amount of pre-stored data based on the first data amount and the second data amount of each target virtual topic.
  • the estimation submodule 7042 is configured to: use an estimation model to estimate the amount of pre-stored data.
  • The input parameters and output parameters of the estimation model may each include at least one set of parameters, in one-to-one correspondence with the at least one target virtual topic. For each target virtual topic:
  • The input parameters may include: the identifier of the first real topic and the first data amount, the identifier of the target virtual topic, and the ratio of the second data amount of the target virtual topic to the first data amount.
  • The output parameters may include: the identifier of the first real topic and the pre-stored data amount, the identifier of the target virtual topic, and the ratio of the third data amount of the target virtual topic to the first data amount.
  • Alternatively, the input parameters may include: the identifier of the first real topic and the first data amount, and the identifier of the target virtual topic and the second data amount of the target virtual topic.
  • In that case, the output parameters may include: the identifier of the first real topic and the pre-stored data amount, and the identifier of the target virtual topic and the third data amount of the target virtual topic.
  • The at least one target virtual topic may include: all of the multiple virtual topics, or the top at least one virtual topic when the multiple virtual topics are ranked by stored data amount from largest to smallest.
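  • The selection of target virtual topics and the assembly of the ratio-based input parameter sets can be sketched as follows. The function names, the tuple layout, and the sample identifiers are hypothetical, and the sketch assumes the first data amount is greater than zero; the estimation model itself is not reproduced here.

```python
from typing import Dict, List, Tuple

# (real topic id, first data amount, virtual topic id, second/first ratio)
InputSet = Tuple[str, int, str, float]


def select_target_topics(second_amounts: Dict[str, int], top_k: int) -> List[str]:
    """Pick the top_k virtual topics by stored data amount, largest first."""
    ranked = sorted(second_amounts, key=lambda t: second_amounts[t], reverse=True)
    return ranked[:top_k]


def build_model_inputs(
    real_topic_id: str,
    first_amount: int,
    second_amounts: Dict[str, int],
    targets: List[str],
) -> List[InputSet]:
    """Build one input parameter set per target virtual topic."""
    return [
        (real_topic_id, first_amount, t, second_amounts[t] / first_amount)
        for t in targets
    ]


second_amounts = {"virtual-1": 20, "virtual-2": 50, "virtual-3": 30}
targets = select_target_topics(second_amounts, top_k=2)
inputs = build_model_inputs("real-1", 100, second_amounts, targets)
```

  • Passing `top_k=len(second_amounts)` covers the other option in the text, in which all virtual topics are targets.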
  • the establishment module 705 may include:
  • The searching submodule 7051 is configured to search, based on the third data amount of each target virtual topic, for a real topic whose available data amount is greater than the third data amount, where the available data amount is the difference between the data amount quota of the real topic and the pre-stored data amount.
  • The determining submodule 7052 is configured to: when a real topic whose available data amount is greater than the third data amount exists, determine that real topic as the second real topic.
  • The determining submodule 7052 is further configured to: when no real topic whose available data amount is greater than the third data amount exists, create the second real topic in the message storage system.
  • The modification submodule 7053 is configured to modify the correspondence between the virtual storage address corresponding to the target virtual topic and a real storage address, so that the virtual storage address corresponds to the second real storage address that includes the second real topic.
  • The determining submodule 7052 is further configured to: when multiple real topics whose available data amount is greater than the third data amount exist, determine the real topic with the largest available data amount as the second real topic.
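  • The selection rule for the second real topic can be sketched as follows. This is an illustrative sketch only: `RealTopic`, `pick_second_real_topic`, and the `create_topic` callback are hypothetical names, and data amounts are modelled as plain integers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RealTopic:
    name: str
    quota: int       # data amount quota of the real topic
    pre_stored: int  # estimated pre-stored data amount

    @property
    def available(self) -> int:
        """Available data amount = quota minus pre-stored data amount."""
        return self.quota - self.pre_stored


def pick_second_real_topic(
    real_topics: List[RealTopic],
    third_amount: int,
    create_topic: Callable[[], RealTopic],
) -> RealTopic:
    """Choose the real topic with the most headroom, or create a new one."""
    candidates = [t for t in real_topics if t.available > third_amount]
    if not candidates:
        # No existing real topic can absorb the third data amount.
        return create_topic()
    return max(candidates, key=lambda t: t.available)


topics = [
    RealTopic("real-1", quota=100, pre_stored=90),
    RealTopic("real-2", quota=100, pre_stored=40),
    RealTopic("real-3", quota=100, pre_stored=20),
]
chosen = pick_second_real_topic(topics, 30, lambda: RealTopic("real-new", 100, 0))
```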
  • The establishing module 705 is further configured to: for the at least one target virtual topic that has a correspondence with the first real topic, establish, in descending order of the second data amount of the at least one target virtual topic, the correspondence between the virtual storage address corresponding to each target virtual topic and the second real storage address.
  • The establishing module 705 is further configured to:
  • Determine the message offset of a first message in the second real topic, where the first message is the first message stored in the second real topic based on the correspondence between the virtual storage address and the second real storage address.
  • Store the message offset of the first message, together with the correspondence between the virtual storage address and the second real storage address, in the index file corresponding to the target virtual topic.
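  • Recording the first message offset together with the new correspondence might look like the sketch below. The in-memory class `VirtualTopicIndex` stands in for the index file and is purely hypothetical, as are the sample addresses; recording the initial correspondence at offset 0 is also an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Address = Tuple[str, int]  # (topic identifier, partition identifier)
RemapRecord = Tuple[int, Address, Address]  # (first offset, virtual, real)


@dataclass
class VirtualTopicIndex:
    """Hypothetical in-memory stand-in for a virtual topic's index file."""
    remaps: List[RemapRecord] = field(default_factory=list)

    def record_remap(self, first_offset: int, virtual: Address, real: Address) -> None:
        """Store the first message's offset with the correspondence it starts."""
        self.remaps.append((first_offset, virtual, real))


index = VirtualTopicIndex()
index.record_remap(0, ("virtual-orders", 0), ("real-1", 3))     # initial mapping
index.record_remap(1500, ("virtual-orders", 0), ("real-2", 1))  # after remapping
```

  • Keeping the offset next to the correspondence lets a later read decide which real storage address holds a given offset without migrating earlier messages.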
  • each real storage address has a corresponding relationship with a plurality of virtual storage addresses.
  • In the message storage apparatus provided in this embodiment, after the receiving module receives a message storage request for storing a message in the Kafka cluster, the determining module determines the real storage address for storing the message according to the correspondence between the virtual storage address and the real storage address, and the storage module stores the message in the real partition designated by that real storage address, thereby implementing message storage.
  • In addition, messages designated to be stored at the virtual storage address can be stored at different real storage addresses. Compared with the related art, this reduces the probability that a real partition in a real topic is overloaded and improves the throughput of the message storage system.
  • the device 800 may include:
  • the receiving module 801 is configured to receive a message reading request for reading a message in a Kafka cluster.
  • the message reading request specifies reading a message from a virtual storage address.
  • The virtual storage address may include an identifier of a virtual topic and an identifier of a virtual partition.
  • the determining module 802 is configured to determine a target real storage address corresponding to the virtual storage address based on the correspondence between the virtual storage address and the real storage address.
  • the target real storage address may include an identifier of the target real topic and an identifier of the target real partition.
  • the reading module 803 is configured to read the message specified by the message read request at the target real partition specified by the target real storage address.
  • The message reading request carries a target offset of the message to be read, and the determining module 802 is configured to:
  • Obtain the message offset of a first message recorded in a target index file, where the first message is the first message stored, based on the current correspondence between the virtual storage address and a real storage address, in the real topic specified by the current correspondence, and the target index file is the index file corresponding to the virtual topic specified by the virtual storage address.
  • When the target offset is greater than or equal to the message offset, determine the real storage address recorded in the current correspondence as the target real storage address.
  • When the target offset is less than the message offset, determine the real storage address recorded in a historical correspondence between the virtual storage address and a real storage address as the target real storage address, where the real storage address recorded in the current correspondence differs from the one recorded in the historical correspondence.
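  • The offset comparison performed by the determining module 802 can be sketched as follows. The function name, the address tuples, and the sample offsets are hypothetical; the sketch covers a single historical correspondence, as in the text above.

```python
from typing import Tuple

Address = Tuple[str, int]  # (real topic identifier, real partition identifier)


def resolve_target_address(
    target_offset: int,
    first_offset_current: int,
    current: Address,
    historical: Address,
) -> Address:
    """Pick the real storage address that holds the requested offset.

    first_offset_current is the offset of the first message stored under the
    current correspondence, as recorded in the target index file.
    """
    if target_offset >= first_offset_current:
        return current
    return historical


current_addr = ("real-2", 1)
historical_addr = ("real-1", 3)
newer = resolve_target_address(1500, 1500, current_addr, historical_addr)
older = resolve_target_address(1499, 1500, current_addr, historical_addr)
```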
  • In the message reading apparatus provided in this embodiment, after the receiving module receives a message reading request for reading a message in the Kafka cluster, the determining module determines the target real storage address corresponding to the virtual storage address through the correspondence between the virtual storage address and the real storage address, and the reading module reads the message specified by the message reading request from the target real partition specified by the target real storage address, thereby implementing message reading.
  • An embodiment of the present application further provides a server, and the server may include a processor and a memory.
  • When the processor executes the computer program stored in the memory, the server executes the message storage method provided in the embodiments of the present application.
  • the server 20 may include: a processor 22 and a signal interface 24.
  • the processor 22 includes one or more processing cores.
  • the processor 22 executes various functional applications and data processing by running software programs and modules.
  • The processor 22 may include one or more of a central processing unit, a digital signal processor, a microprocessor, a microcontroller, or an artificial intelligence processor, and may optionally further include a hardware accelerator required to perform operations, such as various logic operation circuits.
  • the signal interface 24 is used to establish a connection with other devices or modules.
  • the signal interface 24 may be connected to a transceiver. Therefore, optionally, the server 20 may further include a transceiver (not shown in the figure).
  • the transceiver specifically performs signal transmission and reception.
  • When the processor 22 needs to perform a signal transmitting and receiving operation, it can call or drive the transceiver to do so. Thus, when the server 20 transmits or receives signals, the processor 22 decides on or initiates the operation, acting as the initiator, while the transceiver carries out the actual transmission and reception, acting as the performer.
  • the transceiver may also be a transceiver circuit, a radio frequency circuit, or a radio frequency unit, which is not limited in this embodiment.
  • the server 20 further includes components such as a memory 26 and a bus 28.
  • the memory 26 and the signal interface 24 are connected to the processor 22 through a bus 28, respectively.
  • the memory 26 may be used to store software programs and modules. Specifically, the memory 26 may store at least one program module 262 required for a function, and the program may be an application program or a driver program.
  • the program module 262 may include:
  • the receiving unit 2621 has the same or similar functions as the receiving module 701.
  • the determining unit 2622 has the same or similar functions as the determining module 702.
  • the storage unit 2623 has the same or similar functions as the storage module 703.
  • An embodiment of the present invention further provides a storage medium.
  • the storage medium may be a non-volatile computer-readable storage medium.
  • a computer program is stored in the storage medium, and the computer program instructs the server to execute the message storage method provided by the embodiment of the present invention.
  • An embodiment of the present invention also provides a computer program product containing instructions.
  • When the computer program product runs on a computer, the computer is caused to execute the message storage method provided by the embodiments of the present invention.
  • An embodiment of the present application further provides a server, and the server may include a processor and a memory.
  • When the processor executes the computer program stored in the memory, the server executes the message reading method provided in the embodiments of the present application.
  • the server 40 may include a processor 42 and a signal interface 44.
  • the processor 42 includes one or more processing cores.
  • the processor 42 executes various functional applications and data processing by running software programs and modules.
  • The processor 42 may include one or more of a central processing unit, a digital signal processor, a microprocessor, a microcontroller, or an artificial intelligence processor, and may optionally further include a hardware accelerator required to perform operations, such as various logic operation circuits.
  • There may be multiple signal interfaces 44; each signal interface 44 is used to establish a connection with other devices or modules.
  • the signal interface 44 may be connected to a transceiver. Therefore, optionally, the server 40 may further include a transceiver (not shown in the figure).
  • the transceiver specifically performs signal transmission and reception.
  • When the processor 42 needs to perform a signal transmitting and receiving operation, it can call or drive the transceiver to do so. Thus, when the server 40 transmits or receives signals, the processor 42 decides on or initiates the operation, acting as the initiator, while the transceiver carries out the actual transmission and reception, acting as the performer.
  • the transceiver may also be a transceiver circuit, a radio frequency circuit, or a radio frequency unit, which is not limited in this embodiment.
  • the server 40 further includes components such as a memory 46 and a bus 48.
  • the memory 46 and the signal interface 44 are connected to the processor 42 through a bus 48, respectively.
  • the memory 46 may be used to store software programs and modules. Specifically, the memory 46 may store at least one program module 462 required for a function, and the program may be an application program or a driver program.
  • the program module 462 may include:
  • the receiving unit 4621 has the same or similar functions as the receiving module 801.
  • the determining unit 4622 has the same or similar functions as the determining module 802.
  • the reading unit 4623 has the same or similar functions as the reading module 803.
  • An embodiment of the present invention also provides a storage medium.
  • the storage medium may be a non-volatile computer-readable storage medium.
  • A computer program is stored in the storage medium, and the computer program instructs the server to execute the message reading method provided by the embodiments of the present invention.
  • An embodiment of the present invention also provides a computer program product containing instructions.
  • When the computer program product runs on a computer, the computer is caused to execute the message reading method provided by the embodiments of the present invention.
  • the program may be stored in a computer-readable storage medium.
  • the storage medium mentioned may be a read-only memory, a magnetic disk or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

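This application discloses message storage and reading methods and apparatuses, a server, and a storage medium, and belongs to the field of communications technologies. The message storage method is applied to a Kafka cluster and includes: receiving a first message storage request for storing a message in the Kafka cluster, where the first message storage request specifies that the message it designates is to be stored at a virtual storage address, and the virtual storage address includes an identifier of a virtual topic and an identifier of a virtual partition; determining, based on a correspondence between the virtual storage address and a first real storage address, the first real storage address corresponding to the virtual storage address, where the first real storage address includes an identifier of a first real topic and an identifier of a first real partition; and storing the message designated by the first message storage request in the first real partition of the first real topic specified by the first real storage address. This application thereby implements message storage.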

Description

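Message storage and reading methods and apparatuses, server, and storage medium
Technical Field
Embodiments of this application relate to the field of data processing technologies, and in particular to message storage and reading methods and apparatuses, a server, and a storage medium.
Background
Kafka is a high-throughput distributed publish-subscribe messaging system. A Kafka system can store multiple types of messages; each type of message is called a topic, each topic has multiple partitions, and all partitions of a topic share the storage of the messages belonging to that topic.
A Kafka cluster is used to deploy a Kafka system and has multiple storage nodes. A storage node may be a server or another device with computing capability; for example, the storage nodes in a Kafka cluster may span data centers. Each topic in the Kafka system may be deployed on one or more storage nodes in the Kafka cluster. If a topic is stored on multiple storage nodes, its partitions may be distributed across those nodes; if a topic is stored on one storage node, all of its partitions are deployed on that node.
In the related art, when a client requests to store a message in the Kafka cluster, the client may specify the topic and partition in which to store the message. The storage request is sent to the target storage node (the storage node on which that partition of that topic is deployed), which hosts the server side for that partition of the topic. When the server side receives the storage request, the target storage node (specifically, the server side deployed on it) stores the message in that partition of the topic. With this method, some partitions of some topics may become overloaded, especially when a large number of clients all direct their messages to the same partition of the same topic.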
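Summary
Embodiments of this application provide message storage and reading methods and apparatuses, a server, and a storage medium, which can address the problem in the related art that some partitions of some topics may be overloaded. The technical solutions include:
According to a first aspect of this application, a message storage method is provided. The method is applied to a Kafka cluster and includes: receiving a first message storage request for storing a message in the Kafka cluster, where the first message storage request specifies that the message it designates is to be stored at a virtual storage address, and the virtual storage address includes an identifier of a virtual topic and an identifier of a virtual partition; determining, based on a correspondence between the virtual storage address and a first real storage address, the first real storage address corresponding to the virtual storage address, where the first real storage address includes an identifier of a first real topic and an identifier of a first real partition; and storing the message designated by the first message storage request in the first real partition of the first real topic specified by the first real storage address.
With the message storage method provided in the embodiments of this application, after a message storage request for storing a message in the Kafka cluster is received, the real storage address for storing the message is determined according to the correspondence between the virtual storage address and the real storage address, and the message is stored in the real partition specified by that real storage address, thereby implementing message storage.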
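Optionally, the method further includes: receiving a second message storage request for storing a message in the Kafka cluster, where the second message storage request specifies that the message it designates is to be stored at the virtual storage address; determining, based on a correspondence between the virtual storage address and a second real storage address, the second real storage address corresponding to the virtual storage address, where the second real storage address includes an identifier of a second real topic and an identifier of a second real partition; and storing the message designated by the second message storage request in the second real partition of the second real topic specified by the second real storage address.
When the virtual storage address corresponds to the second real storage address and a second message storage request for storing a message in the Kafka cluster is received, the message designated for the virtual storage address can be stored at the second real storage address. Messages designated for the same virtual storage address can thus be stored at different real storage addresses, so the workload (traffic or data amount) of the virtual topic specified by the virtual storage address can be spread across different storage nodes. This reduces the workload imbalance among multiple topics on the same storage node and lowers the probability that multiple topics occupy resources unevenly on a given storage node.
The first real partition and the second real partition may be deployed on different storage nodes in the Kafka cluster.
In addition, the second message storage request may be received later than the first message storage request. Correspondingly, the method further includes: before receiving the second message storage request, estimating a pre-stored data amount of the messages designated by second message storage requests received within a preset time period; and when the pre-stored data amount is greater than a first threshold, establishing the correspondence between the virtual storage address and the second real storage address.
When the pre-stored data amount is the estimated data amount of the messages designated by second message storage requests received within the preset time period, a pre-stored data amount greater than the first threshold indicates that those messages have a large storage demand. In this case, the correspondence between the virtual storage address and a real storage address can be modified so that the virtual storage address corresponds to the second real storage address. The messages designated by the second message storage request are then stored in a real partition better able to support that demand, which improves the storage performance of the message storage system.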
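In one possible implementation, estimating the pre-stored data amount of the messages designated by second message storage requests received within the preset time period may include: for at least one target virtual topic among multiple virtual topics that have a correspondence with the first real topic, obtaining a second data amount of the messages stored in each target virtual topic; obtaining a first data amount of the messages stored in the first real topic; and estimating the pre-stored data amount based on the first data amount and the second data amount of each target virtual topic.
Optionally, estimating the pre-stored data amount may include: estimating the pre-stored data amount by using an estimation model, where the input parameters and the output parameters of the estimation model each include at least one set of parameters, and the at least one set of parameters is in one-to-one correspondence with the at least one target virtual topic. For each target virtual topic, the input parameters include: the identifier of the first real topic and the first data amount, the identifier of the target virtual topic, and the ratio of the second data amount of the target virtual topic to the first data amount; and the output parameters include: the pre-stored data amount, the identifier of the target virtual topic, and the ratio of a third data amount of the target virtual topic to the first data amount.
Alternatively, the input parameters include: the identifier of the first real topic and the first data amount, and the identifier of the target virtual topic and the second data amount of the target virtual topic; and the output parameters include: the pre-stored data amount, and the identifier of the target virtual topic and the third data amount of the target virtual topic.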
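When the second message storage request is received later than the first message storage request, the method may further include: before receiving the second message storage request, estimating a pre-stored data amount of the messages to be stored, within a preset time period, in the first real topic in which the first real partition is located; and when the pre-stored data amount is greater than a second threshold, establishing the correspondence between the virtual storage address and the second real storage address.
When the pre-stored data amount is the estimated data amount of the messages to be stored, within the preset time period, in the first real topic in which the first real partition is located, a pre-stored data amount greater than the second threshold indicates that the first real partition may be unable to support the message storage demand in that period. In this case, to store the pending messages effectively and to preserve the storage performance of the first real partition, the correspondence between the virtual storage address and a real storage address can be modified so that the virtual storage address corresponds to the second real storage address. Messages that would have been stored in the first real topic in which the first real partition is located are then stored at the second real storage address, which improves the storage performance of the message storage system. The first threshold and the second threshold may be determined according to actual needs and may be equal or different; this is not specifically limited in the embodiments of this application.
In one possible implementation, estimating the pre-stored data amount of the messages to be stored, within the preset time period, in the first real topic in which the first real partition is located may include: for at least one target virtual topic among multiple virtual topics that have a correspondence with the first real topic, obtaining a second data amount of the messages stored in each target virtual topic; obtaining a first data amount of the messages stored in the first real topic; and estimating the pre-stored data amount based on the first data amount and the second data amount of each target virtual topic.
Optionally, estimating the pre-stored data amount may include: estimating the pre-stored data amount by using an estimation model, where the input parameters and the output parameters of the estimation model each include at least one set of parameters, and the at least one set of parameters is in one-to-one correspondence with the at least one target virtual topic. For each target virtual topic, the input parameters include: the identifier of the first real topic and the first data amount, the identifier of the target virtual topic, and the ratio of the second data amount of the target virtual topic to the first data amount; and the output parameters include: the identifier of the first real topic and the pre-stored data amount, the identifier of the target virtual topic, and the ratio of the third data amount of the target virtual topic to the first data amount.
Alternatively, the input parameters include: the identifier of the first real topic and the first data amount, and the identifier of the target virtual topic and the second data amount of the target virtual topic; and the output parameters include: the identifier of the first real topic and the pre-stored data amount, and the identifier of the target virtual topic and the third data amount of the target virtual topic.
The at least one target virtual topic includes: all of the multiple virtual topics, or the top at least one virtual topic when the multiple virtual topics are ranked by stored data amount from largest to smallest.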
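Further, establishing the correspondence between the virtual storage address and the second real storage address may include:
based on the third data amount of each target virtual topic, searching for a real topic whose available data amount is greater than the third data amount, where the available data amount is the difference between the data amount quota of the real topic and the pre-stored data amount; when a real topic whose available data amount is greater than the third data amount exists, determining that real topic as the second real topic; when no real topic whose available data amount is greater than the third data amount exists, creating the second real topic in the message storage system; and modifying the correspondence between the virtual storage address corresponding to the target virtual topic and a real storage address, so that the virtual storage address corresponds to the second real storage address that includes the second real topic.
By modifying the correspondence between the virtual storage address and the real storage address, messages designated for a virtual storage address can be stored at different real storage addresses, which reduces the probability that logical topics occupy resources unevenly when their data amounts (or traffic) are unbalanced. Moreover, because only the correspondence is modified, messages that were stored at the first real storage address under the virtual storage address before the modification do not need to be migrated. When resource usage becomes unbalanced, messages can be stored at the second real storage address promptly, and the time spent on data migration is shortened. This addresses the overly long and untimely migration in the related art, reduces disk occupancy, and improves the throughput of the message storage system. In addition, by estimating the data amount (or traffic) and modifying the correspondence according to the estimate, resources can be reserved for messages in advance, which avoids storage node crashes caused by untimely migration.
When multiple real topics whose available data amount is greater than the third data amount exist, the real topic with the largest available data amount may be determined as the second real topic.
In addition, establishing the correspondence between the virtual storage address and the second real storage address may include: for the at least one target virtual topic that has a correspondence with the first real topic, establishing, in descending order of the second data amount of the at least one target virtual topic, the correspondence between the virtual storage address corresponding to each target virtual topic and the second real storage address.
Further, establishing the correspondence between the virtual storage address and the second real storage address may also include: determining the message offset of a first message in the second real topic, where the first message is the first message stored in the second real topic based on the correspondence between the virtual storage address and the second real storage address; and storing the message offset of the first message, together with the correspondence between the virtual storage address and the second real storage address, in the index file corresponding to the target virtual topic.
Optionally, each real storage address has a correspondence with multiple virtual storage addresses.
With the message storage method provided in the embodiments of this application, after a message storage request for storing a message in the Kafka cluster is received, the real storage address for storing the message is determined according to the correspondence between the virtual storage address and the real storage address, and the message is stored in the real partition specified by that real storage address, thereby implementing message storage.
In addition, the amount of data to be stored as designated by message storage requests is estimated, and the correspondence between the virtual storage address and the real storage address is modified according to the estimated data amount, so that messages designated for a virtual storage address can be stored at different real storage addresses. Compared with the related art, this reduces the probability that a real partition in a real topic is overloaded and improves the throughput of the message storage system.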
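According to a second aspect of this application, a message storage method is provided. The method may be applied to a Kafka cluster and includes: receiving a message storage request for storing a message in the Kafka cluster, where the message storage request specifies that the message is to be stored in a virtual topic; determining, based on a correspondence between the virtual topic and a first real topic, the first real topic corresponding to the virtual topic; and storing the message designated by the message storage request in a real partition of that real topic.
With the message storage method provided in the embodiments of this application, after a message storage request for storing a message in the Kafka cluster is received, the real topic for storing the message can be determined according to the correspondence between the virtual topic and the real topic, and the message is stored in a real partition of that real topic, thereby implementing message storage.
Optionally, the method may further include: establishing the correspondence between the virtual topic and the real topic.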
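According to a third aspect of this application, a message reading method is provided. The method is applied to a Kafka cluster and includes: receiving a message reading request for reading a message in the Kafka cluster, where the message reading request specifies that a message is to be read from a virtual storage address, and the virtual storage address includes an identifier of a virtual topic and an identifier of a virtual partition; determining, based on the correspondence between the virtual storage address and a real storage address, a target real storage address corresponding to the virtual storage address, where the target real storage address includes an identifier of a target real topic and an identifier of a target real partition; and reading the message specified by the message reading request from the target real partition specified by the target real storage address.
With the message reading method provided in the embodiments of this application, after a message reading request for reading a message in the Kafka cluster is received, the target real storage address corresponding to the virtual storage address is determined through the correspondence between the virtual storage address and the real storage address, and the message specified by the message reading request is read from the target real partition specified by the target real storage address, thereby implementing message reading.
Optionally, the message reading request carries a target offset of the message to be read, and determining, based on the correspondence between the virtual storage address and a real storage address, the target real storage address corresponding to the virtual storage address includes: obtaining the message offset of a first message recorded in a target index file, where the first message is the first message stored, based on the current correspondence between the virtual storage address and a real storage address, in the real topic specified by the current correspondence, and the target index file is the index file corresponding to the virtual topic specified by the virtual storage address; when the target offset is greater than or equal to the message offset, determining the real storage address recorded in the current correspondence as the target real storage address; and when the target offset is less than the message offset, determining the real storage address recorded in a historical correspondence between the virtual storage address and a real storage address as the target real storage address. The current correspondence is the correspondence obtained after the correspondence between the virtual storage address and a real storage address is modified during use of the message storage system. The correspondence before the modification is the historical correspondence, and the real storage address recorded in the historical correspondence differs from the one recorded in the current correspondence. The offsets of messages stored based on the current correspondence are greater than the offsets of messages stored based on the historical correspondence.
Messages stored based on the current correspondence are stored at the real storage address specified by the current correspondence, and messages stored based on the historical correspondence are stored at the real storage address specified by the historical correspondence. Therefore, before the target real storage address is determined, the message offset of the first message must be obtained and compared with the target offset to decide whether the target real storage address is the one specified by the historical correspondence or the one specified by the current correspondence, which ensures that the message can be read effectively.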
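According to a fourth aspect of this application, a message storage apparatus is provided. The apparatus includes: a receiving module, configured to receive a first message storage request for storing a message in a Kafka cluster, where the first message storage request specifies that the message it designates is to be stored at a virtual storage address, and the virtual storage address includes an identifier of a virtual topic and an identifier of a virtual partition; a determining module, configured to determine, based on a correspondence between the virtual storage address and a first real storage address, the first real storage address corresponding to the virtual storage address, where the first real storage address includes an identifier of a first real topic and an identifier of a first real partition; and a storage module, configured to store the message designated by the first message storage request in the first real partition of the first real topic specified by the first real storage address.
Optionally, the receiving module is configured to receive a second message storage request for storing a message in the Kafka cluster, where the second message storage request specifies that the message it designates is to be stored at the virtual storage address; the determining module is configured to determine, based on a correspondence between the virtual storage address and a second real storage address, the second real storage address corresponding to the virtual storage address, where the second real storage address includes an identifier of a second real topic and an identifier of a second real partition; and the storage module is configured to store the message designated by the second message storage request in the second real partition of the second real topic specified by the second real storage address.
Optionally, the first real partition and the second real partition are deployed on different storage nodes in the Kafka cluster.
Optionally, the second message storage request is received later than the first message storage request.
Optionally, the apparatus further includes: an estimation module, configured to estimate a pre-stored data amount of the messages designated by second message storage requests received within a preset time period; and an establishing module, configured to establish the correspondence between the virtual storage address and the second real storage address when the pre-stored data amount is greater than a first threshold.
Optionally, the estimation module includes: an obtaining submodule, configured to obtain, for at least one target virtual topic among multiple virtual topics that have a correspondence with the first real topic, a second data amount of the messages stored in each target virtual topic, where the obtaining submodule is further configured to obtain a first data amount of the messages stored in the first real topic; and an estimation submodule, configured to estimate the pre-stored data amount based on the first data amount and the second data amount of each target virtual topic.
Optionally, the estimation submodule is configured to estimate the pre-stored data amount by using an estimation model, where the input parameters and the output parameters of the estimation model each include at least one set of parameters, and the at least one set of parameters is in one-to-one correspondence with the at least one target virtual topic. For each target virtual topic, the input parameters include: the identifier of the first real topic and the first data amount, the identifier of the target virtual topic, and the ratio of the second data amount of the target virtual topic to the first data amount; and the output parameters include: the pre-stored data amount, the identifier of the target virtual topic, and the ratio of the third data amount of the target virtual topic to the first data amount.
Alternatively, the input parameters include: the identifier of the first real topic and the first data amount, and the identifier of the target virtual topic and the second data amount of the target virtual topic; and the output parameters include: the pre-stored data amount, and the identifier of the target virtual topic and the third data amount of the target virtual topic.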
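Optionally, the apparatus includes: an estimation module, configured to estimate a pre-stored data amount of the messages to be stored, within a preset time period, in the first real topic in which the first real partition is located; and an establishing module, configured to establish the correspondence between the virtual storage address and the second real storage address when the pre-stored data amount is greater than a second threshold.
Optionally, the estimation module includes: an obtaining submodule, configured to obtain, for at least one target virtual topic among multiple virtual topics that have a correspondence with the first real topic, a second data amount of the messages stored in each target virtual topic, where the obtaining submodule is further configured to obtain a first data amount of the messages stored in the first real topic; and an estimation submodule, configured to estimate the pre-stored data amount based on the first data amount and the second data amount of each target virtual topic.
Optionally, the estimation submodule is configured to estimate the pre-stored data amount by using an estimation model, where the input parameters and the output parameters of the estimation model each include at least one set of parameters, and the at least one set of parameters is in one-to-one correspondence with the at least one target virtual topic. For each target virtual topic, the input parameters include: the identifier of the first real topic and the first data amount, the identifier of the target virtual topic, and the ratio of the second data amount of the target virtual topic to the first data amount; and the output parameters include: the identifier of the first real topic and the pre-stored data amount, the identifier of the target virtual topic, and the ratio of the third data amount of the target virtual topic to the first data amount.
Alternatively, the input parameters include: the identifier of the first real topic and the first data amount, and the identifier of the target virtual topic and the second data amount of the target virtual topic; and the output parameters include: the identifier of the first real topic and the pre-stored data amount, and the identifier of the target virtual topic and the third data amount of the target virtual topic.
Optionally, the at least one target virtual topic includes: all of the multiple virtual topics, or the top at least one virtual topic when the multiple virtual topics are ranked by stored data amount from largest to smallest.
Optionally, the establishing module includes: a searching submodule, configured to search, based on the third data amount of each target virtual topic, for a real topic whose available data amount is greater than the third data amount, where the available data amount is the difference between the data amount quota of the real topic and the pre-stored data amount; a determining submodule, configured to determine, when a real topic whose available data amount is greater than the third data amount exists, that real topic as the second real topic, and further configured to create the second real topic in the message storage system when no such real topic exists; and a modification submodule, configured to modify the correspondence between the virtual storage address corresponding to the target virtual topic and a real storage address, so that the virtual storage address corresponds to the second real storage address that includes the second real topic.
Optionally, the determining submodule is further configured to: when multiple real topics whose available data amount is greater than the third data amount exist, determine the real topic with the largest available data amount as the second real topic.
Optionally, the establishing module is further configured to: for the at least one target virtual topic that has a correspondence with the first real topic, establish, in descending order of the second data amount of the at least one target virtual topic, the correspondence between the virtual storage address corresponding to each target virtual topic and the second real storage address.
Optionally, the establishing module is further configured to: determine the message offset of a first message in the second real topic, where the first message is the first message stored in the second real topic based on the correspondence between the virtual storage address and the second real storage address; and store the message offset of the first message, together with the correspondence between the virtual storage address and the second real storage address, in the index file corresponding to the target virtual topic.
Optionally, each real storage address has a correspondence with multiple virtual storage addresses.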
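According to a fifth aspect of this application, a message reading apparatus is provided. The apparatus includes: a receiving module, configured to receive a message reading request for reading a message in a Kafka cluster, where the message reading request specifies that a message is to be read from a virtual storage address, and the virtual storage address includes an identifier of a virtual topic and an identifier of a virtual partition; a determining module, configured to determine, based on the correspondence between the virtual storage address and a real storage address, a target real storage address corresponding to the virtual storage address, where the target real storage address includes an identifier of a target real topic and an identifier of a target real partition; and a reading module, configured to read the message specified by the message reading request from the target real partition specified by the target real storage address.
Optionally, the message reading request carries a target offset of the message to be read, and the determining module is configured to: obtain the message offset of a first message recorded in a target index file, where the first message is the first message stored, based on the current correspondence between the virtual storage address and a real storage address, in the real topic specified by the current correspondence, and the target index file is the index file corresponding to the virtual topic specified by the virtual storage address; when the target offset is greater than or equal to the message offset, determine the real storage address recorded in the current correspondence as the target real storage address; and when the target offset is less than the message offset, determine the real storage address recorded in a historical correspondence between the virtual storage address and a real storage address as the target real storage address, where the real storage address recorded in the current correspondence differs from the one recorded in the historical correspondence.
According to a sixth aspect of this application, a server is provided, including a processor and a memory. When the processor executes a computer program stored in the memory, the server executes the message storage method according to any implementation of the first aspect.
According to a seventh aspect of this application, a server is provided, including a processor and a memory. When the processor executes a computer program stored in the memory, the server executes the message reading method according to any implementation of the third aspect.
According to an eighth aspect of this application, a storage medium is provided. The storage medium stores a computer program, and the computer program instructs a server to execute the message storage method according to any implementation of the first aspect.
According to a ninth aspect of this application, a storage medium is provided. The storage medium stores a computer program, and the computer program instructs a server to execute the message reading method according to any implementation of the third aspect.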
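Brief Description of Drawings
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings described below show merely some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a Kafka-cluster-based message storage system in the related art according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of a message storage system according to an embodiment of this application;
FIG. 3 is a flowchart of a message storage method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a data storage structure according to an embodiment of this application;
FIG. 5 is a flowchart of a method for estimating the pre-stored data amount of messages to be stored within a preset time period according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of an LSTM neural network according to an embodiment of this application;
FIG. 7 is a flowchart of a method for establishing a correspondence between a virtual storage address and a second real storage address according to an embodiment of this application;
FIG. 8 is a flowchart of a method for determining a second real topic according to an embodiment of this application;
FIG. 9 is a flowchart of a message reading method according to an embodiment of this application;
FIG. 10 is a flowchart of a method for determining a target real storage address corresponding to a virtual storage address according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a message storage apparatus according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of another message storage apparatus according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of an estimation module according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of an establishing module according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of a message reading apparatus according to an embodiment of this application;
FIG. 16 is a structural block diagram of a server according to an embodiment of this application;
FIG. 17 is a structural block diagram of another server according to an embodiment of this application.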
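Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the following further describes the embodiments of the present invention in detail with reference to the accompanying drawings.
Generally, a cloud message service needs to support multiple tenants. Each tenant can create multiple topics of its own in a Kafka cluster, and each topic is used to store cloud messages of the same category. In the related art, the message service is mainly provided by a Kafka-cluster-based message storage system. Such a system can receive messages sent by a message producer and store each message in the topic to which it belongs, so that a message consumer can request the message from that topic.
As shown in FIG. 1, in the Kafka-cluster-based message storage system, each topic consists of at least one partition, each partition consists of at least one segment, and each segment stores an index file and a data file as a pair. The data file stores the messages sent by message producers, and the index file records the index information (such as the offset address) of each message in the corresponding data file. When a consumer reads data, it can use the index information recorded in the index file for the message to be read to fetch that message from the indicated position in the data file. The system may also include multiple storage nodes; when a storage node receives a message, it can store the message in the system immediately, which increases the system's capacity for persistent storage and for handling message backlogs.
However, in this Kafka-cluster-based message storage system, the storage granularity is coarse, so storage space cannot be used effectively. Moreover, because each partition can store messages of only one topic, the number of topics that each storage node can support is limited. For example, a storage node (used to deploy topics) with a virtual machine specification of 8U16G can usually support fewer than 100 topics; beyond that, the node's performance degrades sharply. Because each storage node supports only a limited number of topics, a large number of storage nodes must be deployed in the Kafka cluster, which makes the Kafka-cluster-based message storage system costly. Furthermore, because traffic may be unbalanced across topics, the topics occupy resources unevenly, and topic data must be migrated. When a topic holds a large amount of data, the migration takes too long and is not timely.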
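To this end, an embodiment of the present invention provides a message storage method that can address the foregoing problems. FIG. 2 is a schematic structural diagram of the message storage system involved in the method. As shown in FIG. 2, the message storage system 10 may include multiple storage nodes 101, which may be connected to each other over a wired or wireless network. Optionally, the message storage system may be a Kafka-cluster-based message storage system. A Kafka cluster is used to deploy a Kafka system and has multiple storage nodes. A storage node may be a server or another device with computing capability. Each topic in the Kafka system may be deployed on one or more storage nodes in the Kafka cluster.
In this message storage system, each storage node 101 is configured with multiple virtual topics, multiple real topics, and an index file corresponding to each virtual topic. Each real partition is configured with multiple index files and multiple data files. A data file stores messages, and an index file stores the index information of messages. Each virtual topic includes multiple virtual partitions, and the identifier of a virtual topic together with the identifier of one of its virtual partitions forms a virtual storage address. Each real topic includes multiple real partitions, and the identifier of a real topic together with the identifier of one of its real partitions forms a real storage address. A virtual storage address can correspond to a real storage address. When it does, a message can be stored at the real storage address corresponding to the designated virtual storage address, and the index information indicating the message is stored in the index file corresponding to the virtual topic. The correspondences among real topics, real partitions, virtual topics, virtual partitions, index files, and data files can be determined when the message storage system is set up.
Optionally, each real storage address may have a correspondence with multiple virtual storage addresses. In this case, for one real storage address that corresponds to multiple virtual storage addresses, the messages designated for those virtual storage addresses can all be stored at that real storage address. Because a virtual storage address indicates a virtual topic and a virtual partition, and a real storage address indicates a real topic and a real partition, each real partition can store data designated for multiple virtual topics. In other words, the messages designated for those virtual topics share the storage space of the real partition in the real topic. A real partition can therefore support multiple virtual topics, so a storage node on which real partitions are deployed can support multiple virtual topics. This increases the number of virtual topics that each storage node can support and reduces system cost.
The message storage system 10 may further include multiple data producing nodes (producers) and multiple data consuming nodes (consumers). Connections between the data producing nodes and the storage nodes 101, and between the data consuming nodes and the storage nodes 101, may be established over a wired or wireless network. A data producing node sends messages to a storage node 101 so that the storage node 101 stores them. A data consuming node reads messages from a storage node 101.
The following describes the message storage method provided in the embodiments of this application. The method can be applied to a Kafka cluster, and it is described below by using an example in which it is applied to a first storage node in the Kafka cluster. As shown in FIG. 3, the message storage method may include the following steps: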
步骤201、接收在Kafka集群存储消息的第一消息存储请求。
当客户端需要向Kafka集群存储消息时,该客户端可以向第一存储节点发送第一消息存储请求。该第一消息存储请求中可以携带有指定的待存储消息和用于存储该消息的虚拟存储地址。该虚拟存储地址包括虚拟话题topic的标识和虚拟分区的标识。也即是,该第一消息存储请求可以指定在虚拟存储地址存储该第一消息存储请求指定的消息。
在该实现方式中,该虚拟存储地址作为Kafka集群存储消息的对外接口,使得客户端可以指定将消息存储在虚拟存储地址中。并且,由于当虚拟存储地址与真实存储地址对应时,可以将消息存储至第一消息存储请求指定的虚拟存储地址所对应的真实存储地址中,进而实现了在真实存储地址中的消息存储。
步骤202、基于虚拟存储地址与第一真实存储地址的对应关系,确定与虚拟存储地址对应的第一真实存储地址。
在消息存储系统中存储有虚拟存储地址与第一真实存储地址的对应关系,且在任一时刻,一个虚拟存储地址仅与一个真实存储地址对应,也即是,指定存储至该虚拟存储地址的消息仅能存储至对应的一个真实存储地址中。因此,在接收第一消息存储请求后,可以根据该第一消息存储请求中指定的虚拟存储地址,查询该对应关系,确定与该虚拟存储地址对应的第一真实存储地址,以便于将该第一消息存储请求中指定的消息存储至该第一真实存储地址中。其中,该第一真实存储地址包括第一真实topic的标识和第一真实分区的标识。
并且,在该消息存储系统中,每个真实存储地址可以与多个虚拟存储地址存在对应关系。此时,对于与多个虚拟存储地址对应的一个真实存储地址,指定存储至该多个虚拟存储地址中的消息可以均存储在该真实存储地址中。由于虚拟存储地址用于指示虚拟topic和虚拟分区,真实存储地址用于指示真实topic和真实分区,因此,每个真实分区中可以存储指定存储至多个虚拟topic的数据,即指定存储至该多个虚拟topic中的消息可以共享该真实topic中真实分区的存储空间,使得真实分区能够支持多个虚拟topic,进而使部署有真实分区的存储节点能够支持多个虚拟topic,增加了每个存储节点所能支持的虚拟topic数量,同时也降低系统成本。
示例地,在基于kafka集群的消息存储系统中,当每个真实存储地址与多个虚拟存储地址存在对应关系时,每个真实分区中可以存储基于多个虚拟topic存储的数据,如图4所示的真实分区2001中存储的消息的示意图,虚拟话题×(×用于标识数字)用于标识不同的虚拟topic,虚拟话题×-消息×用于标识基于不同的虚拟topic存储在该真实分区中的消息,例如:虚拟话题1(Index1)、虚拟话题2(Index2)和虚拟话题3(Index3)分别标识不同的虚拟topic,虚拟话题1-消息1(Index1-Msg1)用于标识基于虚拟topic1存储在该真实分区中的消息1,虚拟话题2-消息1(Index2-Msg1)用于标识基于虚拟topic2存储在该真实分区中的消息1。
步骤203、在第一真实存储地址指定的第一真实topic中的第一真实分区,存储第一消息存储请求指定的消息。
在确定与虚拟存储地址对应的第一真实存储地址后,可以在该第一真实存储地址指示的真实topic的真实分区中存储该消息。并且,由于真实分区中配置有多个数据文件,在存储该消息时,可以根据该真实分区中的消息存储情况,在该真实分区中的多个数据文件中确定用于存储该消息的数据文件,然后将该消息存储在对应的数据文件中。
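For example, in a message storage system based on a Kafka cluster, assume that the correspondence among virtual topics, virtual partitions, real topics, real partitions, data files, and index files in the first storage node is as shown in Table 1. When the first message storage request asks to store a message in virtual partition 1 of virtual topic 1, the correspondence between virtual and real storage addresses in Table 1 shows that the first real storage address corresponding to virtual partition 1 of virtual topic 1 indicates real partition 1 of real topic 1. In this case, based on the current storage status of real partition 1 of real topic 1, it may be determined that the message specified by the first message storage request can be stored in data file 3, and the message is then stored in data file 3 of real partition 1 of real topic 1.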
表1
Figure PCTCN2019081173-appb-000001
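Steps 202 to 204 can be sketched as follows. This is only a minimal in-memory illustration, not the implementation described in the application: the `StorageNode` class, its attribute names, and the topic names are hypothetical, and Python lists stand in for the data files and index files.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (topic id, partition id) pairs; all names here are illustrative assumptions.
VirtualAddress = Tuple[str, int]
RealAddress = Tuple[str, int]


@dataclass
class StorageNode:
    # At any moment one virtual address maps to exactly one real address,
    # while several virtual addresses may share the same real address.
    mapping: Dict[VirtualAddress, RealAddress] = field(default_factory=dict)
    # Real partition -> append-only message list (stand-in for data files).
    partitions: Dict[RealAddress, List[bytes]] = field(default_factory=dict)
    # Virtual topic -> index records (virtual partition, real address, position).
    index: Dict[str, List[Tuple[int, RealAddress, int]]] = field(default_factory=dict)

    def store(self, virtual: VirtualAddress, message: bytes) -> int:
        real = self.mapping[virtual]            # step 202: resolve the real address
        log = self.partitions.setdefault(real, [])
        log.append(message)                     # step 203: store in the real partition
        position = len(log) - 1
        # Step 204: record the index information under the virtual topic.
        self.index.setdefault(virtual[0], []).append((virtual[1], real, position))
        return position


node = StorageNode(mapping={
    ("virtual-topic-1", 1): ("real-topic-1", 1),
    ("virtual-topic-2", 1): ("real-topic-1", 1),  # shares the same real partition
})
first = node.store(("virtual-topic-1", 1), b"msg-a")
second = node.store(("virtual-topic-2", 1), b"msg-b")
```

Because both virtual addresses resolve to the same real partition, the two messages end up in one shared log while each virtual topic keeps its own index records.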
步骤204、根据第一消息存储请求指定的消息的存储位置生成索引信息,并将该索引信息存储在虚拟存储地址所指示的虚拟topic对应的索引文件中。
消息的索引信息用于指示消息在第一真实存储地址中的存储位置。在将消息存储在第一真实存储地址后,可以根据该消息在第一真实存储地址中的存储位置,生成索引信息,并将该索引信息存储在与该虚拟topic对应的索引文件中,以便在消息读取过程中根据该索引信息获取该消息。其中,与该虚拟topic对应的索引文件可以存储在配置有该虚拟topic的存储节点中。例如,当该虚拟topic配置在第一存储节点中时,该与该虚拟topic对应的索引文件可以存储在该第一存储节点中。
需要说明的是,索引文件可以在消息存储系统的建立过程中建立。例如,在系统建立过程中,可以根据存储节点中部署的每个虚拟topic的名称,在存储节点中建立索引目录,该索引目录中存储有索引文件。在将消息存储在包括有虚拟topic标识的虚拟存储地址对应的真实存储地址后,可以根据该虚拟topic标识确定该索引目录,并将该索引信息存储在该索引目录中的索引文件中。
其中,索引文件可以包括:数据记录索引和映射记录索引。该数据记录索引用于指示消息在真实分区中的偏移量。该映射记录索引用于指示虚拟存储地址与真实存储地址的对应关系。数据记录索引可以包括多个数据索引项。映射记录索引也可以包括多个映射索引项。例如:如图4所示,数据记录索引(Index)2可 以包括数据索引项(Entry)1和数据索引项2,该数据索引项1用于指示索引2对应的消息1在真实分区中的偏移量,该数据索引项2用于指示索引2对应的消息2在真实分区中的偏移量。数据记录索引1:2002可以包括数据索引项3和数据索引项4,该数据索引项3用于指示索引1对应的消息1在真实分区中的偏移量,该数据索引项4用于指示索引1对应的消息2和消息3在真实分区中的偏移量。映射记录索引2可以包括映射索引项(MateEntry)1和映射索引项2,以及,映射记录索引1:2003可以包括映射索引项3和映射索引项4。并且,该多个数据索引项的大小可以相等或不等,该多个映射索引项的大小也可以相等或不等。
该数据索引项和该映射索引项中可以均记载有多个字段,下面以多个数据索引项的大小相等,以及,多个映射索引项的大小相等,分别对该数据索引项和该映射索引项中记载的字段进行举例说明:
请参考图4,数据记录索引可以包括多个数据索引项,每个数据索引项可以记载以下一个或多个字段:虚拟存储地址偏移量字段(consumerQueueOffset)、消息序号字段(startPartitionOffset)、文件偏移量字段(physicalPostion)、消息长度字段(size)、消息总数字段(msgNum)和存储时间戳字段(timestamp)。各个字段的含义分别如下:
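The virtual storage address offset field carries the message offset of the data recorded in this data index entry among all data stored at the virtual storage address. The length of this field may be 4 bytes or 8 bytes. For example, assume that 100 messages are stored in real partition 2 of real topic 1, that 20 of them were stored via virtual partition 1 of virtual topic 1, and that the message corresponding to the index information recorded in the data index entry is the 5th of those 20 messages; the message offset carried in the virtual storage address offset field is then 5.
Generally, the message offset can be represented as an 8-byte long integer. However, the file names in the index directory record the offset of the first message stored in the corresponding virtual topic (also called the base offset, baseOffset). To save storage space, the virtual storage address offset field may therefore carry the position of the current message relative to that first message. When the offset of the current message is read, the relative position is added to the offset of the first message to obtain the offset of the current message; in this case, the length of the virtual storage address offset field may be 4 bytes.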
消息序号字段携带的内容为该数据索引项中第一个消息在对应真实分区中存储的多个消息中的消息偏移量。该消息序号字段的长度可以为8字节。例如,假设在该真实分区中存储有100个消息,该数据索引项中第一个消息为在真实topic1中存储的第60个消息,则该消息序号字段携带的消息偏移量为60。
可选地,每个数据索引项中可以记载有多个消息在真实分区中的消息偏移量,该数据索引项中第一个消息即为该多个消息中的第一个消息。例如:在图4中,在数据记录索引Index2的数据索引项Entry1中记载有消息Msg1在真实分区中的消息偏移量,此时,数据索引项中第一个消息为消息Msg1。在数据记录索引Index1:2002的数据索引项Entry2中记载有消息Msg2和消息Msg3在真实分区中的偏移量,此时,数据索引项中第一个消息为消息Msg2。
文件偏移量字段携带的内容为该数据索引项中记载的第一个消息在真实分区 的数据文件中的文件偏移量。该文件偏移量字段的长度可以为4字节。例如,假设该数据索引项中记载的第一个消息存储在真实分区中的第三个数据文件中,且该第三个数据文件中存储有三个消息,该第三个数据文件中存储的第一个消息和第二个消息的大小均为1千字节(KB),则该数据索引项中记载的第一个消息的文件偏移量为2KB,即该文件偏移量字段携带的内容为2KB。
消息长度字段携带的内容为用于存储该数据索引项的消息块的长度。该消息长度字段的长度可以为4字节。
消息总数字段携带的内容为消息块中记载的消息的总数。该消息总数字段的长度可以为4字节。
存储时间戳字段携带的内容为写入该数据索引项的时间戳。该存储时间戳字段的长度可以为8字节。
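The data index entry described above can be sketched as a fixed-width binary record. The field names and byte lengths follow the text (including the source spelling `physicalPostion`); the big-endian layout, the field order, and the helper `absolute_offset` are assumptions made for illustration only.

```python
import struct
from typing import NamedTuple

# Big-endian, no padding: 4 + 8 + 4 + 4 + 4 + 8 = 32 bytes per entry.
# The byte order and field order are assumptions, not taken from the source.
ENTRY_FORMAT = ">IQIIIQ"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)


class DataIndexEntry(NamedTuple):
    consumerQueueOffset: int   # offset relative to baseOffset (4 bytes)
    startPartitionOffset: int  # offset of the first message in the real partition (8 bytes)
    physicalPostion: int       # byte offset inside the data file (4 bytes)
    size: int                  # length of the message block (4 bytes)
    msgNum: int                # number of messages in the block (4 bytes)
    timestamp: int             # time the entry was written (8 bytes)

    def pack(self) -> bytes:
        return struct.pack(ENTRY_FORMAT, *self)

    @classmethod
    def unpack(cls, raw: bytes) -> "DataIndexEntry":
        return cls(*struct.unpack(ENTRY_FORMAT, raw))


def absolute_offset(base_offset: int, entry: DataIndexEntry) -> int:
    # The stored value is relative to the first message of the virtual topic,
    # so the absolute offset is the base offset plus the relative position.
    return base_offset + entry.consumerQueueOffset


entry = DataIndexEntry(5, 60, 2048, 1024, 2, 1535673600000)
raw = entry.pack()
```

With equal-sized entries, the n-th entry of an index file starts at byte `n * ENTRY_SIZE`, which is what makes a binary search over the index straightforward.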
请参考图4,映射记录索引可以包括多个映射索引项,映射索引项可以记载以下一个或多个字段:消息逻辑序号字段(startLogicaloffset)、真实分区标识长度字段(topicNameSize)和真实分区标识字段(topicName)。各个字段的含义分别如下:
消息逻辑序号字段携带的内容为虚拟存储地址与真实存储地址对应时,基于该对应关系存储在真实topic中的第一个消息在存储在该虚拟存储地址中的多个消息中的消息序号。该消息逻辑序号字段的长度为8字节。例如,假设虚拟存储地址指示的虚拟topic1中的虚拟分区1中存储了200个消息,根据虚拟存储地址与真实存储地址的对应关系,存储在该真实存储地址中的第一个消息为该200个消息中的第101个,则该消息逻辑序号字段携带的内容为101。
真实分区标识长度字段携带的内容为与虚拟存储地址对应的真实topic中真实分区的标识的长度。该真实分区标识长度字段的长度为4字节。例如,当虚拟topic1中的虚拟分区1与真实topic1中的真实分区2对应时,该真实分区标识长度字段携带的内容为该真实分区2的标识的长度。
真实分区标识字段携带的内容为与虚拟存储地址对应的真实topic中真实分区的标识。该真实分区标识字段的长度可以根据实际需要进行设置。例如,当虚拟topic1中的虚拟分区1与真实topic1中的真实分区2对应时,该真实分区标识字段携带的内容为该真实分区2的标识。
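A mapping index entry, as described above, combines two fixed-width fields with a variable-length identifier. The following sketch assumes a big-endian layout and a UTF-8 encoded identifier; both choices, the function names, and the example identifier are hypothetical.

```python
import struct
from typing import Tuple

# startLogicaloffset (8 bytes) followed by topicNameSize (4 bytes).
HEADER_FORMAT = ">QI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def pack_mapping_entry(start_logical_offset: int, topic_name: str) -> bytes:
    name = topic_name.encode("utf-8")
    # The length field lets a reader know how many bytes of identifier follow.
    return struct.pack(HEADER_FORMAT, start_logical_offset, len(name)) + name


def unpack_mapping_entry(raw: bytes) -> Tuple[int, str]:
    start_logical_offset, name_size = struct.unpack_from(HEADER_FORMAT, raw, 0)
    name = raw[HEADER_SIZE:HEADER_SIZE + name_size].decode("utf-8")
    return start_logical_offset, name


packed = pack_mapping_entry(101, "real-topic-1/partition-2")
decoded = unpack_mapping_entry(packed)
```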
需要说明的是,上述数据索引项和映射索引项中所包括的各个字段携带的内容和长度仅为示例性的说明,不用于限定本申请。该数据索引项和映射索引项中所包括的各个字段携带的内容和长度均可以根据实际需要进行设置。
根据上述存储数据的过程,以及对数据索引项和映射索引项的说明可以看出:对数据进行存储时,均是将数据连续地存储在真实分区的数据文件中的,且每个索引项中记录的内容均为连续存储的数据对应的索引信息,因此,在读取数据时,可以根据同一个索引项中连续存储的索引信息,批量地读取该索引项所对应的数据文件中的数据,进而避免离散地读取数据。
步骤205、预估在预设时间段内待存储消息的预存数据量。
预估在预设时间段内待存储消息的预存数据量可以包括:预估在预设时间段 内接收的第二消息存储请求所指定的消息的预存数据量,也即是,预估在预设时间段内待在该虚拟存储地址中存储的消息的预存数据量。或者,预估在预设时间段内待在第一真实分区所在的第一真实topic中存储的消息的预存数据量,也即是,预估在预设时间段内待在第一真实存储地址中存储的消息的预存数据量。通过对该预存数据量进行预估,可以在该预存数据量较大时,更改虚拟存储地址与真实存储地址的对应关系,以避免该预存数据量较大导致的存储性能下降。
在一种可实现方式中,可以根据真实topic和虚拟topic的数据量对该预存数据量进行预估。如图5,该步骤205的实现方式可以包括:
步骤2051、对于与第一真实topic存在对应关系的多个虚拟topic中的至少一个目标虚拟topic,获取在每个目标虚拟topic中存储的消息的第二数据量,并获取在第一真实topic中存储的消息的第一数据量。
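The at least one target virtual topic may include all of the multiple virtual topics, or the top at least one virtual topic when the multiple virtual topics are ranked by stored data volume in descending order. For example, when the second data volume of the messages stored in N target virtual topics needs to be obtained, the N target virtual topics may be the top N virtual topics by stored data volume, where N is a positive integer. The at least one target virtual topic may be determined according to actual needs. For example, after the second data volume of each virtual topic is obtained, whether that virtual topic is treated as a target virtual topic may be decided from the size of its second data volume. The preset time period may also be set according to actual needs; for example, it may be the four, ten, or twenty-four hours following the current moment.
Optionally, a data volume collection module may be deployed in the message storage system, or a data volume collection process may be created in the message storage system, so that the second data volume of a virtual topic is obtained through that module or process. In addition, a queue for storing data volume information (for example, a data volume topic) may be deployed in the message storage system; after the second data volume of each virtual topic is obtained, it may be saved in this queue. The second data volume of a virtual topic may be obtained periodically or in real time, which is not specifically limited in the embodiments of this application.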
获取第一真实topic的第一数据量的实现方式,可以相应参考获取虚拟topic的第二数据量的实现方式。或者,由于存储在第一真实topic中的数据均需存储在与其存在对应关系的虚拟topic中,因此,与该第一真实topic对应的所有虚拟topic的第二数据量的总和即为该第一真实topic的第一数据量。所以,可以获取与该第一真实topic对应的所有虚拟topic的第二数据量,并将该所有虚拟topic的第二数据量的总和确定为该第一数据量。
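The relationship between the first data volume and the second data volumes, and the selection of the top N target virtual topics, can be sketched as follows. The function names and the sample volumes are illustrative assumptions.

```python
from typing import Dict, List


def first_data_volume(second_volumes: Dict[str, int]) -> int:
    # The first data volume of the real topic is the sum of the second data
    # volumes of all virtual topics that correspond to it.
    return sum(second_volumes.values())


def top_n_targets(second_volumes: Dict[str, int], n: int) -> List[str]:
    # Rank virtual topics by stored data volume, largest first, and keep N.
    ranked = sorted(second_volumes, key=lambda t: second_volumes[t], reverse=True)
    return ranked[:n]


def volume_ratios(second_volumes: Dict[str, int], targets: List[str]) -> Dict[str, float]:
    # Ratio of each target's second data volume to the first data volume,
    # as used in one of the input-parameter formats of the estimation model.
    total = first_data_volume(second_volumes)
    if total == 0:
        return {t: 0.0 for t in targets}
    return {t: second_volumes[t] / total for t in targets}


volumes = {"virtual-topic-1": 600, "virtual-topic-2": 300, "virtual-topic-3": 100}
targets = top_n_targets(volumes, 2)
ratios = volume_ratios(volumes, targets)
```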
步骤2052、基于第一数据量和每个目标虚拟topic的第二数据量,预估预存数据量。
可选地,可以采用预估模型预估预存数据量。其中,该预估模型可以为卡尔曼滤波预估模型、回归预估模型或神经网络预估模型等。该预估模型的输入参数和输出参数均可以包括:至少一组参数,该至少一组参数与至少一个目标虚拟topic一一对应。
对于每个目标虚拟topic,该对应组的输入参数可以包括:第一真实topic的标 识,第一真实topic的第一数据量,目标虚拟topic的标识,及目标虚拟topic的第二数据量与第一数据量的比值。该对应组的输出参数可以包括:预存数据量,目标虚拟topic的标识,及目标虚拟topic的第三数据量与第一数据量的比值。需要说明的是,在预估在预设时间段内待在第一真实分区所在的第一真实topic中存储的消息的预存数据量时,该输出参数中还可以包括第一真实topic的标识。其中,该第一真实topic的标识用于在该消息存储系统中唯一地标识该真实topic,该目标虚拟topic的标识用于在该消息存储系统中唯一地标识该目标虚拟topic,且真实topic的标识和虚拟topic的标识可以均在系统建立过程中确定。
示例地,当基于N个目标虚拟topic的第二数据量预估预存数据量时,该输入参数的格式可以为{{第一真实topic的标识,第一真实topic的第一数据量,第一个目标虚拟topic的标识,第一个目标虚拟topic的第二数据量与第一数据量的比值},......,{第一真实topic的标识,第一真实topic的第一数据量,第N个目标虚拟topic的标识,第N个目标虚拟topic的第二数据量与第一数据量的比值}}。该输出参数的格式可以为{{预存数据量,第一个目标虚拟topic的标识,第一个目标虚拟topic的第三数据量与第一数据量的比值},......,{预存数据量,第N个目标虚拟topic的标识,第N个目标虚拟topic的第三数据量与第一数据量的比值}}。或者,在预估在预设时间段内待在第一真实分区所在的第一真实topic中存储的消息的预存数据量时,该输出参数的格式可以为{{第一真实topic的标识,预存数据量,第一个目标虚拟topic的标识,第一个目标虚拟topic的第三数据量与第一数据量的比值},......,{第一真实topic的标识,预存数据量,第N个目标虚拟topic的标识,第N个目标虚拟topic的第三数据量与第一数据量的比值}}。
或者,对于每个目标虚拟topic,该对应组的输入参数可以包括:第一真实topic的标识,第一真实topic的第一数据量,目标虚拟topic的标识,及目标虚拟topic的第二数据量。该对应组的输出参数可以包括:预存数据量,目标虚拟topic的标识,及目标虚拟topic的第三数据量。需要说明的是,在预估在预设时间段内待在第一真实分区所在的第一真实topic中存储的消息的预存数据量时,该输出参数中还可以包括第一真实topic的标识。
在另一种可实现方式中,可以根据真实topic和虚拟topic的流量对该预存数据量进行预估。该步骤205的实现方式可以包括:
在根据流量对预存数据量进行预估时,可以根据与第一真实topic存在对应关系的多个虚拟topic中的至少一个目标虚拟topic对应的第二流量,及第一真实topic对应的第一流量,对该预存数据量对应的流量进行预估,然后将该预估流量与预设时间段时长的乘积确定为该预存数据量。其中,该至少一个目标虚拟topic包括:与该第一真实topic对应的多个虚拟topic中的所有虚拟topic,或者,与该第一真实topic对应的多个虚拟topic中,流量占比由大到小的前至少一个虚拟topic。该流量占比为对应虚拟topic的第二流量与第一真实topic的第一流量的比值。
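Optionally, the second traffic of a virtual topic may be obtained by a traffic collection (Metric Collector) module deployed in the message storage system, or by a traffic collection process created in that system. The same module or process may also be used to obtain the first traffic of the first real topic. Alternatively, because all data stored in the first real topic is stored via the virtual topics that correspond to it, the sum of the second traffic of all virtual topics corresponding to the first real topic equals the first traffic of the first real topic; the sum of the second traffic of all those virtual topics may therefore be determined as the first traffic.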
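The traffic-based variant reduces to simple arithmetic: the pre-stored data volume is the estimated traffic multiplied by the length of the preset time period. The sketch below assumes traffic in MB/s and a period in hours; the units, function names, and sample values are assumptions for illustration.

```python
from typing import Dict

SECONDS_PER_HOUR = 3600


def first_traffic(second_traffic: Dict[str, float]) -> float:
    # First traffic of the real topic = sum of the second traffic of its virtual topics.
    return sum(second_traffic.values())


def traffic_shares(second_traffic: Dict[str, float]) -> Dict[str, float]:
    # Traffic share = second traffic of a virtual topic / first traffic of the real topic.
    total = first_traffic(second_traffic)
    if total == 0:
        return {t: 0.0 for t in second_traffic}
    return {t: v / total for t, v in second_traffic.items()}


def prestored_volume(estimated_traffic_mb_per_s: float, period_hours: float) -> float:
    # Pre-stored data volume (MB) = estimated traffic (MB/s) x period length (s).
    return estimated_traffic_mb_per_s * period_hours * SECONDS_PER_HOUR


traffic = {"virtual-topic-1": 40.0, "virtual-topic-2": 16.0}
shares = traffic_shares(traffic)
volume = prestored_volume(first_traffic(traffic), 4)
```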
并且,也可以采用预估模型根据第二流量和第一流量对该预存数据量对应的流量进行预估。且该预估模型的输入参数和输出参数均可以包括:至少一组参数,该至少一组参数与至少一个目标虚拟topic一一对应。
对于每个目标虚拟topic,该对应组的输入参数可以包括:该第一真实topic的标识,该第一真实topic的第一流量,该目标虚拟topic的标识,及该目标虚拟topic的第二流量与第一流量的比值。该对应组的输出参数可以包括:预估流量,该目标虚拟topic的标识,及该目标虚拟topic的第三流量与预估流量的比值。需要说明的是,在预估在预设时间段内待在第一真实分区所在的第一真实topic中存储的消息的预存数据量对应的预估流量时,该输出参数中还可以包括第一真实topic的标识。
示例地,当基于N个目标虚拟topic的第二流量预估预存数据量时,该输入参数的格式可以为{{第一真实topic的标识,第一真实topic的第一流量,第一个目标虚拟topic的标识,第一个目标虚拟topic的第二流量与第一流量的比值},......,{第一真实topic的标识,第一真实topic的第一流量,第N个目标虚拟topic的标识,第N个目标虚拟topic的第二流量与第一流量的比值}}。该输出参数的格式可以为{{预估流量,第一个目标虚拟topic的标识,第一个目标虚拟topic的第三流量与预估流量的比值},......,{预估流量,第N个目标虚拟topic的标识,第N个目标虚拟topic的第三流量与预估流量的比值}}。或者,在预估在预设时间段内待在第一真实分区所在的第一真实topic中存储的消息的预存数据量对应的预估流量时,该输出参数的格式可以为{{第一真实topic的标识,对应的预估流量,第一个目标虚拟topic的标识,第一个目标虚拟topic的第三流量与对应的预估流量的比值},......,{第一真实topic的标识,对应的预估流量,第N个目标虚拟topic的标识,第N个目标虚拟topic的第三流量与对应的预估流量的比值}}。
或者,对于每个目标虚拟topic,该对应组的输入参数可以包括:该第一真实topic的标识,该第一真实topic的第一流量,该目标虚拟topic的标识,及该目标虚拟topic的第二流量。该对应组的输出参数可以包括:预估流量,该目标虚拟topic的标识,及该目标虚拟topic的第三流量。需要说明的是,在预估在预设时间段内待在第一真实分区所在的第一真实topic中存储的消息的预存数据量对应的预估流量时,该输出参数中还可以包括第一真实topic的标识。
需要说明的是,由于消息存储系统中存储的数据量非常大,当至少一个目标虚拟topic包括与该第一真实topic对应的所有虚拟topic时,需要在每次预估过程中均对所有虚拟topic进行预估,导致预估过程的预估速度较慢。且需要使用大量的样本对预估模型进行训练,导致该训练过程的训练时长较长。因此,当目标虚拟topic包括与该第一真实topic对应的流量占比(或数据量占比)由大到小的前至少一个虚拟topic时,在每次预估过程中,仅需要对该流量占比(或数据量占比) 由大到小的前至少一个虚拟topic进行预估,能够减少预估过程中需要处理的数据量,进而加快预估速度。且能够相应地减少对预估模型进行训练时所用的样本数,进而缩短训练时长。
在一种可实现方式中,由于长短记忆(Long Short-Term Memory,LSTM)神经网络在预估方面表现出较大的优势,因此,本申请实施例中可以使用该LSTM神经网络实现上述预估功能。下面以该预估模型为LSTM神经网络为例,对该预估过程进行说明:
LSTM神经网络的结构请参考图6,其中,X(t-1)、X(t)和X(t+1)分别为LSTM神经网络在t-1、t和t+1时刻的输入,即分别为t-1、t和t+1时刻输入的输入参数。h(t-1)、h(t)和h(t+1)分别为该LSTM神经网络的隐含层在t-1、t和t+1时刻的输出。C(t-1)、C(t)和C(t+1)分别为从t-1、t和t+1时刻传递至下一时刻的细胞状态。
请继续参考图6,该LSTM神经网络的功能主要通过三个门实现,即遗忘门(Forget gate)、输入门(Input gate)和输出门(Output gate)。
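The forget gate decides which information is discarded from the cell state. The gate function $\delta_1$ controls how much data passes through the forget gate; its value lies in $[0,1]$, where 0 means "discard completely" and 1 means "retain completely". The forget gate is computed as $f_t=\delta_1(W_f\cdot[h_{t-1},x_t]+b_f)$, where $[h_{t-1},x_t]$ denotes the concatenation of the previous output state $h_{t-1}$ and the current input $x_t$, $W_f$ is the weight matrix of the forget gate, and $b_f$ is the bias term of the forget gate. The values of $W_f$ and $b_f$ may be set according to actual needs.
The input gate decides how much of the input information is kept in the cell state at the current moment. Its function is mainly implemented by an input gate layer ($\delta_2$) and a tanh1 layer. The input gate layer decides which values to update and takes the concatenation of the previous output state $h_{t-1}$ and the current input $x_t$ as its input; it is computed as $i_t=\delta_2(W_i\cdot[h_{t-1},x_t]+b_i)$, where $W_i$ is the weight matrix of the input gate layer and $b_i$ is its bias term. The tanh1 layer creates a new candidate vector to be added to the cell state; it is computed as $C_{t1}=\tanh_1(W_c\cdot[h_{t-1},x_t]+b_c)$, where $W_c$ is the weight matrix of the tanh1 layer and $b_c$ is its bias term. From the outputs of the input gate layer and the tanh1 layer, the cell state at the current moment is the sum of a first product, the previous cell state $C_{t-1}$ multiplied element-wise by the forget gate $f_t$, and a second product, the candidate state $C_{t1}$ multiplied element-wise by the input gate $i_t$: $C_t=f_t\odot C_{t-1}+i_t\odot C_{t1}$. In this way the LSTM combines the current memory $C_{t1}$ with the long-term memory $C_{t-1}$, which makes it possible to estimate the traffic after the current moment from the traffic before it.
The output gate decides how much of the cell state is output to the output state. Its function is implemented by an output gate layer ($\delta_3$) and a tanh2 layer. The output gate layer decides which parts of the cell state are output and takes the concatenation of the previous output state $h_{t-1}$ and the current input $x_t$ as its input; its output is $O_t=\delta_3(W_o\cdot[h_{t-1},x_t]+b_o)$, where $W_o$ is the weight matrix of the output gate layer and $b_o$ is its bias term. The tanh2 layer processes the cell state and outputs values in the range $[-1,1]$. The output of the output gate is the element-wise product of the output of the output gate layer and the output of the tanh2 layer: $h_t=O_t\odot\tanh_2(C_t)$.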
可选地,上述门限δ 1、门限δ 2和门限δ 3的取值均可以根据实际需要进行设置。
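One step of the LSTM cell described above can be sketched in plain Python. To stay self-contained, the sketch uses scalar states and scalar weights, and it assumes that the gate functions $\delta_1$, $\delta_2$, $\delta_3$ are logistic sigmoids, as in a standard LSTM; the parameter layout and function names are illustrative, not taken from the application.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float,
              params: Dict[str, Tuple[float, float, float]]) -> Tuple[float, float]:
    """One LSTM step with scalar states.

    params maps a gate name ("f", "i", "c", "o") to (w_h, w_x, b), the weights
    applied to h_prev and x plus the bias (a scalar stand-in for W.[h, x] + b).
    """
    def affine(name: str) -> float:
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x + b

    f = sigmoid(affine("f"))             # forget gate
    i = sigmoid(affine("i"))             # input gate layer
    c_candidate = math.tanh(affine("c"))  # candidate state from the tanh1 layer
    c = f * c_prev + i * c_candidate     # new cell state
    o = sigmoid(affine("o"))             # output gate layer
    h = o * math.tanh(c)                 # new output state
    return h, c


zero = {name: (0.0, 0.0, 0.0) for name in ("f", "i", "c", "o")}
h1, c1 = lstm_step(x=0.3, h_prev=0.0, c_prev=2.0, params=zero)
```

With all weights and biases at zero, every gate evaluates to 0.5 and the candidate state to 0, so half of the previous cell state is kept.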
步骤206、当预存数据量大于数据量阈值时,建立虚拟存储地址与第二真实存储地址的对应关系。
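When the pre-stored data volume is the estimated data volume of the messages specified by second message storage requests received within the preset time period, a value greater than the first threshold indicates that those messages have a large storage demand. In this case, the correspondence between the virtual storage address and the real storage address may be modified so that the virtual storage address corresponds to the second real storage address. The messages specified by the second message storage request are then stored in a real partition that is better able to support the demand, which improves the storage performance of the message storage system.
When the pre-stored data volume is the estimated data volume of the messages to be stored, within the preset time period, in the first real topic where the first real partition is located, a value greater than the second threshold indicates that the first real partition may be unable to support the storage demand in that period. In this case, to store the pending messages effectively and to preserve the storage performance of the first real partition, the correspondence between the virtual storage address and the real storage address may be modified so that the virtual storage address corresponds to the second real storage address. Messages that would have been stored in the first real topic are then stored at the second real storage address, which improves the storage performance of the message storage system. The first threshold and the second threshold may be determined according to actual needs and may be equal or unequal, which is not specifically limited in the embodiments of this application.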
该步骤206为对更改目标虚拟topic对应的虚拟存储地址与真实存储地址的对应关系的实现过程的说明。并且,在对该过程进行说明时,是以至少一个目标虚拟topic中的一个目标虚拟topic为例对其进行说明的,更改该至少一个目标虚拟topic中的其他目标虚拟topic对应的虚拟存储地址与真实存储地址的对应关系的实现过程,请相应参考该实现过程。其中,请参考图7,该步骤206的实现过程可以包括:
步骤2061、基于目标虚拟topic的第三数据量,确定第二真实topic。
可选地,如图8所示,该步骤2061的实现过程可以包括:
步骤2061a、基于目标虚拟topic的第三数据量,查找可用数据量大于第三数据量的真实topic。
其中,可用数据量为真实topic的数据量额度与步骤205中预估的预存数据量的差值。该真实topic的数据量额度为对该真实topic执行读写操作时,该真实topic能够承受的最大数据量。
当真实topic的可用数据量大于目标虚拟topic的第三数据量时,说明该真实topic能够承受该目标虚拟topic的第三数据量,因此,可将该真实topic确定为第二真实topic,即执行步骤2061b。当真实topic的可用数据量不大于目标虚拟topic的第三数据量时,说明该真实topic无法承受该目标虚拟topic的第三数据量,此时,可以在消息存储系统中创建一个可用数据量大于该第三数据量的第二真实topic,即执行步骤2061c。
需要说明的是,由于数据量为流量与时长的乘积,因此,也可以根据流量确定第二真实topic。例如,可以基于目标虚拟topic的第三流量,查找可用流量大于该第三流量的真实topic,并在确定存在可用流量大于第三流量的真实topic时,将 可用流量大于第三流量的真实topic确定为第二真实topic,或者,在确定不存在可用流量大于第三流量的真实topic时,在消息存储系统中创建第二真实topic。
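For example, assume that the third traffic of the target virtual topic is 56 MB/s and that five real topics are configured in the message storage system: real topic 1, real topic 2, real topic 3, real topic 4, and real topic 5, whose available traffic is 50 MB/s, 70 MB/s, 40 MB/s, 55 MB/s, and 30 MB/s respectively. Only real topic 2 has available traffic greater than the third traffic of the target virtual topic, so real topic 2 may be determined as the second real topic; that is, step 2061b is performed.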
步骤2061b、当确定存在可用数据量大于第三数据量的真实topic时,将可用数据量大于第三数据量的真实topic确定为第二真实topic。
其中,在查找可用数据量大于第三数据量的真实topic的过程中,可能查找到消息存储系统中存在多个可用数据量大于第三数据量的真实topic,此时,可以将对应可用数据量最大的真实topic确定为该第二真实topic,以保证能够对真实topic进行有效利用,并减小由于真实topic的可用数据量较小导致的再次修改对应关系的几率。
步骤2061c、当确定不存在可用数据量大于第三数据量的真实topic时,在消息存储系统中创建第二真实topic。
当确定不存在可用数据量大于第三数据量的真实topic时,可以在消息存储系统中创建一个可用数据量大于第三数据量的真实topic,并将该创建的真实topic确定为该第二真实topic,以便于建立虚拟存储地址与包括该第二真实topic的第二真实存储地址的对应关系。
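Steps 2061a to 2061c can be sketched as a single selection function. The function names are hypothetical, and returning `None` stands in for "create a new real topic"; the sample figures reuse the traffic example given earlier, since the same selection applies to traffic as to data volume.

```python
from typing import Dict, Optional


def available_volume(quota: float, prestored: float) -> float:
    # Available data volume = data volume quota of the real topic - estimated pre-stored volume.
    return quota - prestored


def choose_second_real_topic(available: Dict[str, float],
                             third_volume: float) -> Optional[str]:
    # Step 2061a: look for real topics whose available volume exceeds the demand.
    candidates = {t: a for t, a in available.items() if a > third_volume}
    if not candidates:
        # Step 2061c: no real topic fits; the caller would create a new one.
        return None
    # Step 2061b: when several fit, take the one with the most headroom.
    return max(candidates, key=lambda t: candidates[t])


available = {
    "real-topic-1": 50.0,
    "real-topic-2": 70.0,
    "real-topic-3": 40.0,
    "real-topic-4": 55.0,
    "real-topic-5": 30.0,
}
chosen = choose_second_real_topic(available, 56.0)
```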
需要说明的是,由于每个真实topic通常包括多个真实分区,因此,在确定第二真实topic后,还需要在该第二真实topic中确定第二真实分区,以便建立虚拟存储地址与第二真实存储地址的对应关系。且在确定第二真实分区的过程中,还需要确定该第二真实分区的可用数据量大于该虚拟存储地址指定的虚拟分区的预存数据量。其中,该确定第二真实分区的过程可以相应参考确定第二真实topic的过程。
可选地,第一真实分区与该第二真实分区可部署在Kafka集群中的相同存储节点或不同存储节点上,本申请实施例对其不做具体限定。当第一真实分区与该第二真实分区部署在Kafka集群中的不同存储节点上时,能够将虚拟存储地址指定的虚拟topic的工作负载(流量或数据量)分摊到不同的存储节点上,以减小同一存储节点中多个topic的工作负载不均衡程度,降低多个topic在某一存储节点中出现占用资源不均衡的几率。并且,为了简化根据修改后的对应关系进行消息存储和消息读取的过程,当在步骤2061b中确定的真实分区包括第一存储节点中的真实分区和其他存储节点中的真实分区时,可以优先选择将该第一存储节点中的真实分区确定为第二真实分区。
步骤2062、将与目标虚拟topic对应的虚拟存储地址与真实存储地址的对应关系,修改为虚拟存储地址与包括第二真实topic的第二真实存储地址对应,并将修改后的关联关系存储在目标虚拟topic对应的索引文件中。
当修改目标虚拟topic对应的虚拟存储地址与真实存储地址的对应关系后,可 将指定存储至该目标虚拟topic中的消息存储至该第二真实topic的第二真实分区数据文件中,进而实现将指定存储至该虚拟存储地址中的消息存储在第二真实存储地址中。
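In addition, after the correspondence is modified, the modified correspondence may be stored in the index file corresponding to the target virtual topic, so that messages can be stored and looked up according to it. For example, the modified correspondence may be stored in the mapping record index corresponding to the target virtual topic. When a mapping index entry of that mapping record index records the real partition identifier length field and the real partition identifier field, the identifier of the second real partition in the second real topic may be recorded in the real partition identifier field, and the length of that identifier may be recorded in the real partition identifier length field, so that the real topic corresponding to the target virtual topic can be determined from them. Moreover, when the message storage system is based on a Kafka cluster, the modified correspondence may also be saved on zookeeper (a distributed application coordination service) for later use.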
步骤2063、确定第一消息在第二真实topic中的消息偏移量,并将该消息偏移量存储在目标虚拟topic对应的索引文件中。
其中,第一消息为基于虚拟存储地址与第二真实存储地址的对应关系,存储在第二真实topic中的第一个消息。在修改对应关系之后,指定存储至该虚拟存储地址中的消息均存储在该第二真实topic中。而在修改该对应关系之前,由于指定存储至该虚拟存储地址中的消息均存储在第一真实topic中。因此,在修改该对应关系后,需要确定根据该修改后的对应关系,指定存储在该虚拟存储地址中的第一个消息的消息偏移量,以便于在后续存储和查找消息的过程中基于该消息偏移量对消息进行存储和查找。
并且,在确定该消息偏移量后,还可以将该消息偏移量存储在该目标虚拟topic对应的索引文件中,以便于根据该消息偏移量,在指定存储在该虚拟存储地址的消息中,区分存储在第一真实topic中的消息和该第二真实topic中的消息。可选地,可以将消息偏移量存储在该目标虚拟topic对应的映射记录索引中,当该映射记录索引的映射索引项中记载有消息逻辑序号字段时,可以将该消息偏移量记载在该消息逻辑序号字段中。
同时,为了便于后续对目标虚拟topic的数据量(或流量)进行预估,在完成该对应关系的修改过程后,还需要对该第一真实topic和该第二真实topic对应的预估模型进行重新训练以更新模型参数,例如重置LSTM参数。以及,若该第二真实topic为创建的真实topic,还需要对该创建的真实topic创建预估模型,以便于对该真实topic的流量进行预估。
需要说明的是,对于与第一真实topic存在对应关系的至少一个目标虚拟topic,在修改对应关系时,可以按照至少一个目标虚拟topic的第二数据量(或第二流量)由大到小的顺序,依次修改每个目标虚拟topic的对应关系。此时,由于在确定目标虚拟topic对应的第二真实topic时,需要根据真实topic的可用数据量(或可用流量)进行选择,当按照第二数据量(或第二流量)由大到小的顺序依次修改至少一个目标虚拟topic的对应关系时,能够将具有较大可用数据量(或可用流量)的真实topic,确定为具有较大第二数据量(或第二流量)的目标虚拟topic对应的 第二真实topic,使得消息存储系统中的真实topic能够被有效利用,并减小对目标虚拟topic的对应关系进行二次修改的几率。
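The ordering described above can be sketched as a greedy plan: target virtual topics are handled from the largest second data volume to the smallest, and each one takes the real topic with the most available volume that still fits. Deducting the assigned demand from the chosen real topic and the `new-real-topic-N` naming are assumptions added for this illustration.

```python
from typing import Dict


def plan_remapping(second_volumes: Dict[str, float],
                   third_volumes: Dict[str, float],
                   available: Dict[str, float]) -> Dict[str, str]:
    remaining = dict(available)  # do not mutate the caller's view
    plan: Dict[str, str] = {}
    created = 0
    # Largest current volume first, so big topics get the roomiest real topics.
    order = sorted(second_volumes, key=lambda t: second_volumes[t], reverse=True)
    for virtual_topic in order:
        need = third_volumes[virtual_topic]
        fits = [t for t, a in remaining.items() if a > need]
        if fits:
            chosen = max(fits, key=lambda t: remaining[t])
            remaining[chosen] -= need  # assumption: reserve the capacity
        else:
            created += 1
            chosen = f"new-real-topic-{created}"  # placeholder for a created topic
        plan[virtual_topic] = chosen
    return plan


plan = plan_remapping(
    second_volumes={"virtual-topic-1": 60.0, "virtual-topic-2": 30.0},
    third_volumes={"virtual-topic-1": 56.0, "virtual-topic-2": 35.0},
    available={"real-topic-1": 50.0, "real-topic-2": 70.0},
)
```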
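In addition, the message storage system may deploy a traffic collection (Metric Collector) module, a traffic topic (Flow Metric topic), a traffic summary (Flow Summary) module, a deep learning prediction (Deep learning prediction) module, and a topic migration (topic migrate) module. The traffic collection module obtains, periodically or in real time, the traffic of all virtual topics and real topics in the system and saves it in the traffic topic; that is, step 2051 may be performed by the traffic collection module. The traffic summary module may periodically read traffic information from the traffic topic and feed the current traffic of the real topics and virtual topics into the deep learning prediction module. The deep learning prediction module may use an LSTM neural network to estimate the traffic of the real topics and virtual topics in the preset time period; that is, step 2052 may be performed by the deep learning prediction module. The topic migration module may modify the correspondence of a virtual topic according to the traffic estimated by the deep learning prediction module; that is, step 206 may be performed by the topic migration module.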
通过修改虚拟存储地址与真实存储地址的对应关系,使得指定存储至虚拟存储地址中的消息能够存储在不同的真实存储地址中,能够在各个逻辑topic的数据量(或流量)不均衡时,减小各个逻辑topic占用的资源不均衡的几率。并且,通过修改该对应关系,使得无需对修改之前根据虚拟存储地址存储在第一真实存储地址中的消息进行迁移,使得在出现资源占用不均衡时,能够及时地将消息存储在第二真实存储地址中,并缩短对数据进行迁移所耗费的时长,可以解决相关技术中迁移时间过长和迁移不及时的问题,进而减小对磁盘的占用率,并提高了消息存储系统的吞吐率。同时,通过对数据量(或流量)进行预估,并根据预估结果修改该对应关系,能够根据该预估结果提前为消息预留资源,避免因迁移不及时造成的存储节点的崩溃。
步骤207、接收在Kafka集群存储消息的第二消息存储请求,该第二消息存储请求指定在虚拟存储地址存储第二消息存储请求指定的消息。
其中,第二消息存储请求的接收时间晚于第一消息存储请求的接收时间。该步骤207的实现过程请相应参考步骤201的实现过程。
步骤208、基于虚拟存储地址与第二真实存储地址的对应关系,确定与虚拟存储地址对应的第二真实存储地址。
由于第二消息存储请求的接收时间晚于第一消息存储请求的接收时间,在接收到该第二消息存储请求后,虚拟存储地址与真实存储地址的对应关系已修改为虚拟存储地址与第二真实存储地址对应,因此,可以根据该对应关系确定与虚拟存储地址对应的真实存储地址为第二真实存储地址。其中,第二真实存储地址包括第二真实topic的标识和第二真实分区的标识。并且,该步骤208的实现过程请相应参考步骤202的实现过程。
步骤209、在第二真实存储地址指定的第二真实topic中的第二真实分区,存储第二消息存储请求指定的消息。
由于虚拟存储地址与第二真实存储地址的对应关系为修改后的对应关系,且由于该第二真实存储地址指示的第二真实topic中的第二真实分区与虚拟存储地址指示的虚拟topic中的虚拟分区可能部署在相同的存储节点中,也可能部署在不同 的存储节点中。因此,在存储该消息之前,需要确定该第二真实topic中的第二真实分区与虚拟topic中的虚拟分区是否部署在相同的存储节点中。并且,当第二真实topic中的第二真实分区与虚拟topic中的虚拟分区部署在相同的存储节点中时,可以直接将该消息存储在该第二真实存储地址中。当第二真实topic中的第二真实分区与虚拟topic中的虚拟分区部署在不同的存储节点中时,需要将该消息发送至该其他存储节点,以供该其他存储节点将该消息存储在该其他存储节点的第二真实存储地址中。其中,该存储消息的过程可以相应参考步骤203的实现过程,此处不再赘述。
步骤210、根据第二消息存储请求指定的消息的存储位置生成索引信息,并将该索引信息存储在虚拟存储地址所指示的虚拟topic对应的索引文件中。
可选地,当第二真实topic中的第二真实分区与虚拟topic中的虚拟分区部署在相同的存储节点中时,该步骤210的实现过程请相应参考步骤204的实现过程。当第二真实topic中的第二真实分区与虚拟topic中的虚拟分区部署在不同的存储节点中时,在其他存储节点将消息存储在该其他存储节点中后,可以由消息存储系统中的后台线程获取该索引信息,并将该索引信息发送至该第一存储节点,以在该第一存储节点中存储该索引信息。其中,该后台线程向第一存储节点发送索引信息的动作,可以是该后台线程主动执行的,也可以是该后台线程被动执行的。例如:在将消息存储在其他存储节点中后,可以自动触发后台线程,使该后台线程获取该索引信息,然后该后台线程主动地将该索引信息推送至该第一存储节点,使该第一存储节点对该索引信息进行存储。或者,该第一存储节点可以向该后台线程发送索引信息拉取请求,后台线程在接收该索引信息拉取请求后,可以获取该索引信息并向该第一存储节点发送该索引信息。
在该通过后台线程发送该索引信息的实现方式中,由于无需第一存储节点主动地获取该第一存储位置信息,相较于相关技术中第一存储节点在写入数据后,需要再根据该数据的存储位置信息获取索引信息的实现方式,可以节省对该第一存储节点的资源占用,进而减少磁盘的占用率。
并且,当第二真实topic中的第二真实分区与虚拟topic中的虚拟分区部署在不同的存储节点中时,通过将消息存储在其他存储节点上,将索引信息存储在第一存储节点上,可以实现消息与索引信息的分离存储,进而解耦真实存储地址与虚拟存储地址的关系,可以将虚拟存储地址指示的虚拟topic的工作负载(流量或数据量)分摊到不同的存储节点上,能够减小同一存储节点中多个topic的工作负载不均衡程度,降低多个topic在某一存储节点中出现占用资源不均衡的几率。
需要说明的是,在本申请实施例中,虚拟存储地址与真实存储地址的对应关系也可以表示为虚拟topic与真实topic的对应关系。此时,也可以根据该虚拟topic与真实topic的对应关系执行消息存储的过程。例如,该消息存储方法可以包括:接收在该Kafka集群存储消息的第一消息存储请求,该第一消息存储请求指定在虚拟话题topic存储该消息;基于该虚拟topic与第一真实topic的对应关系,确定与该虚拟topic对应的第一真实topic;在该第一真实topic的真实分区中存储该第一消息存储请求指定的消息。其中,根据该虚拟topic与真实topic的对应关系执行 消息存储的实现过程可以相应参考上述步骤201至步骤210,此处不再赘述。
综上所述,本申请实施例提供的消息存储方法,在接收在Kafka集群存储消息的消息存储请求后,通过根据虚拟存储地址与真实存储地址的对应关系,确定用于存储消息的真实存储地址,并将消息存储在该真实存储地址指定的真实分区中,实现了消息的存储。
并且,通过对消息存储请求所指定的待存储消息进行预估,根据预估的数据量修改虚拟存储地址与真实存储地址的对应关系,使得指定存储至虚拟存储地址中的消息能够存储在不同的真实存储地址中,相较于相关技术,减小了真实topic中真实分区工作负载过重的几率,提高了消息存储系统的吞吐率。
本申请实施例还提供了一种消息读取方法,如图9所示,该消息读取方法可以包括:
步骤601、接收在Kafka集群读取消息的消息读取请求。
当客户端需要从Kafka集群读取消息时,该客户端可以向第一存储节点发送消息读取请求。该消息读取请求指定从虚拟存储地址读取消息,该虚拟存储地址包括虚拟话题topic的标识和虚拟分区的标识。
步骤602、基于虚拟存储地址与真实存储地址的对应关系,确定与虚拟存储地址对应的目标真实存储地址。
消息读取请求中通常携带有待读取消息的目标偏移量,相应的,如图10所示,该步骤602的实现过程可以包括:
步骤6021、基于目标偏移量,获取待读取消息的目标索引文件。
在接收到消息读取请求后,可以基于该消息读取请求中的目标偏移量,采用二分法在存储节点中查找该待读取消息的目标索引文件。其中,该目标索引文件可以包括:目标数据记录索引和目标映射记录索引。该目标数据记录索引用于指示该待读取消息在真实分区中的偏移量。该目标映射记录索引用于指示用于存储该待读取消息的虚拟存储地址与真实存储地址的对应关系。
步骤6022、获取目标索引文件中记载的第一消息的消息偏移量。
其中,第一消息为存储在当前对应关系指定的真实topic中的第一个消息。目标索引文件为虚拟存储地址指定的虚拟topic对应的索引文件。可选地,当目标索引文件包括目标数据记录索引和目标映射记录索引时,可以在目标映射记录索引中获取该第一消息的消息偏移量。该当前对应关系为在消息存储系统的使用过程中,对该虚拟存储地址与真实存储地址的对应关系修改后的对应关系。对虚拟存储地址与真实存储地址的对应关系修改前的对应关系为历史对应关系,该历史对应关系中记载的真实存储地址与当前对应关系中记载的真实存储地址不同。且该基于当前对应关系存储的消息的偏移量大于基于历史对应关系存储的消息的偏移量。
由于基于当前对应关系存储的消息,存储在该当前对应关系所指定的真实存储地址中。基于历史对应关系存储的消息,存储在该历史对应关系所指定的真实存储地址中。因此,在确定目标真实存储地址前,需要先获取该第一消息的消息 偏移量,并将该第一消息的消息偏移量与目标偏移量进行比较,以确定该目标真实存储地址为历史对应关系所指定的真实存储地址,还是当前对应关系所指定的真实存储地址,进而保证能够有效地读取消息。且在目标偏移量小于消息偏移量时,确定该目标真实存储地址为历史对应关系所指定的真实存储地址,此时执行步骤6024。在目标偏移量大于或等于消息偏移量时,确定该真实存储地址为当前对应关系所指定的真实存储地址,此时执行步骤6023。
示例地,假设目标映射索引文件中记载的第一消息的消息偏移量为101,且目标偏移量offset为77,由于该目标偏移量小于消息偏移量,则可以确定该真实存储地址为历史对应关系所指定的真实存储地址,此时可以确定执行步骤6024。
步骤6023、当目标偏移量大于或等于消息偏移量时,将当前对应关系中记载的真实存储地址确定为目标真实存储地址。
当目标偏移量大于或等于消息偏移量时,可以确定该真实存储地址为当前对应关系所指定的真实存储地址,此时,可以将该当前对应关系中记载的真实存储地址确定为该目标真实存储地址,且该目标真实存储地址包括目标真实topic的标识和目标真实分区的标识。
步骤6024、当目标偏移量小于消息偏移量时,将历史对应关系中记载的真实存储地址确定为目标真实存储地址。
当目标偏移量小于消息偏移量时,可以确定该真实存储地址为历史对应关系所指定的真实存储地址,此时可以查询该历史对应关系,并将历史对应关系中记载的真实存储地址确定为该目标真实存储地址。
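Steps 6022 to 6024 amount to comparing the target offset with the offset at which each correspondence took effect. The sketch below generalizes the two-way choice (current versus historical) to a sorted list of mapping records and uses a binary search; the record layout, the function name, and the sample addresses are assumptions for illustration.

```python
import bisect
from typing import List, Tuple

RealAddress = Tuple[str, int]
# Each record: (offset of the first message stored under this correspondence, real address).
MappingRecord = Tuple[int, RealAddress]


def resolve_real_address(records: List[MappingRecord], target_offset: int) -> RealAddress:
    # records must be sorted by starting offset; the last one is the current correspondence.
    starts = [start for start, _ in records]
    pos = bisect.bisect_right(starts, target_offset) - 1
    if pos < 0:
        raise KeyError(f"no correspondence covers offset {target_offset}")
    return records[pos][1]


records = [
    (0, ("real-topic-1", 2)),    # historical correspondence
    (101, ("real-topic-2", 1)),  # current correspondence, first message at offset 101
]
historical = resolve_real_address(records, 77)
current = resolve_real_address(records, 101)
```

A target offset of 77 is smaller than 101, so it resolves to the historical real storage address, while offsets of 101 and above resolve to the current one.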
步骤603、在目标真实存储地址指定的目标真实分区读取消息读取请求所指定的消息。
在确定目标真实存储地址后,可以根据目标数据记录索引记载的该待读取消息在真实分区中的偏移量,在该目标真实存储地址指定的目标真实分区中读取该待读取的消息。
可选地,根据消息和索引信息的不同存储方式,该步骤603的实现方式存在一定的差异,下面从以下两个方面进行说明:
在第一方面,当目标真实存储地址指定的目标真实分区位于第一存储节点中时,即索引信息与待读取消息存储在同一存储节点中,此时,可以在该目标真实分区中读取该待读取消息。
在第二方面中,当目标真实存储地址指定的目标真实分区位于其他存储节点中时,即索引信息与待读取消息存储在不同存储节点中,此时,第一存储节点可以向该其他存储节点发送目标索引信息,以供该其他存储节点基于该目标索引信息获取该待读取消息,并向该第一存储节点发送携带有该待读取消息的第二消息读取响应。该第一存储节点在接收该第二消息读取响应后,可根据该第二消息读取响应获取该待读取消息。其中,该其他存储节点为真实分区所属的存储节点。该目标索引信息中记载有目标真实存储地址所指定的目标真实分区的信息。
步骤604、发送携带有待读取消息的第一消息读取响应。
第一存储节点在获取该待读取消息后,可向发送消息读取请求的客户端发送 该第一消息读取响应,以便于该客户端获取该第一消息读取响应中携带的待读取消息。
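In addition, in the embodiments of this application, messages are stored contiguously in the data files of a real partition, and each index entry records the index information of contiguously stored messages. Therefore, when messages are read, the messages in the data file corresponding to an index entry can be read in batches according to the contiguous index information in that entry, which avoids scattered reads.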
需要说明的是,在修改目标虚拟topic对应的虚拟存储地址与真实存储地址的对应关系后,若第二真实topic中的第二真实分区属于其他存储节点,在读取消息时需要先在索引信息所在的第一存储节点中读取索引信息,然后根据该索引信息在其他存储节点中读取消息。此时,为了避免读取消息需要一直跨两个存储节点,可以为基于修改前的对应关系存储的数据设置消息老化机制,即当该消息在第一存储节点中的存储时长达到预设时间段时,将该消息进行删除。并且,在删除该消息后,还可以在该其他节点上重建索引,使得索引信息与消息存储在同一个节点,进而保证消息的读取效率。
还需要说明的是,在本申请实施例中,虚拟存储地址与真实存储地址的对应关系也可以表示为虚拟topic与真实topic的对应关系。此时,也可以根据该虚拟topic与真实topic的对应关系执行消息读取的过程。其中,根据该虚拟topic与真实topic的对应关系执行消息读取的实现过程可以相应参考上述步骤601至步骤604,此处不再赘述。
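In summary, in the message reading method provided in the embodiments of this application, after a message read request for reading a message from the Kafka cluster is received, the target real storage address corresponding to the virtual storage address is determined based on the correspondence between virtual storage addresses and real storage addresses, and the message specified by the message read request is read from the target real partition specified by the target real storage address, thereby implementing message reading.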
本申请实施例提供了一种消息存储装置,如图11所示,该装置700可以包括:
接收模块701,用于接收在Kafka集群存储消息的第一消息存储请求,第一消息存储请求指定在虚拟存储地址存储第一消息存储请求指定的消息,虚拟存储地址可以包括虚拟话题topic的标识和虚拟分区的标识。
确定模块702,用于基于虚拟存储地址与第一真实存储地址的对应关系,确定与虚拟存储地址对应的第一真实存储地址,第一真实存储地址可以包括第一真实topic的标识和第一真实分区的标识。
存储模块703,用于在第一真实存储地址指定的第一真实topic中的第一真实分区,存储第一消息存储请求指定的消息。
可选地,接收模块701,还用于接收在Kafka集群存储消息的第二消息存储请求,第二消息存储请求指定在虚拟存储地址存储第二消息存储请求指定的消息。
确定模块702,还用于基于虚拟存储地址与第二真实存储地址的对应关系,确定与虚拟存储地址对应的第二真实存储地址,第二真实存储地址可以包括第二真实topic的标识和第二真实分区的标识。
存储模块703,还用于在第二真实存储地址指定的第二真实topic中的第二真 实分区,存储第二消息存储请求指定的消息。
可选地,第一真实分区与第二真实分区部署在Kafka集群中的不同存储节点上。
可选地,第二消息存储请求的接收时间晚于第一消息存储请求的接收时间。
可选地,如图12所示,该装置700还可以包括:
预估模块704,用于预估在预设时间段内接收的第二消息存储请求所指定的消息的预存数据量。
建立模块705,用于当预存数据量大于第一阈值时,建立虚拟存储地址与第二真实存储地址的对应关系。
可选地,如图13所示,预估模块704,可以包括:
获取子模块7041,用于对于与第一真实topic存在对应关系的多个虚拟topic中的至少一个目标虚拟topic,获取在每个目标虚拟topic中存储的消息的第二数据量。
获取子模块7041,还用于获取在第一真实topic中存储的消息的第一数据量。
预估子模块7042,用于基于第一数据量和每个目标虚拟topic的第二数据量,预估预存数据量。
可选地,预估子模块7042,用于:采用预估模型预估预存数据量。
其中,预估模型的输入参数和输出参数均可以包括:至少一组参数,至少一组参数与至少一个目标虚拟topic一一对应,对于每个目标虚拟topic:
输入参数可以包括:第一真实topic的标识和第一数据量,目标虚拟topic的标识,目标虚拟topic的第二数据量与第一数据量的比值。
输出参数可以包括:预存数据量,目标虚拟topic的标识,目标虚拟topic的第三数据量与第一数据量的比值。
或者,输入参数可以包括:第一真实topic的标识和第一数据量,目标虚拟topic的标识和目标虚拟topic的第二数据量。
输出参数可以包括:预存数据量,目标虚拟topic的标识和目标虚拟topic的第三数据量。
可选地,预估模块704,还用于预估在预设时间段内待在第一真实分区所在的第一真实topic中存储的消息的预存数据量。
建立模块705,还用于当预存数据量大于第二阈值时,建立虚拟存储地址与第二真实存储地址的对应关系。
可选地,如图13所示,预估模块704,可以包括:
获取子模块7041,用于对于与第一真实topic存在对应关系的多个虚拟topic中的至少一个目标虚拟topic,获取在每个目标虚拟topic中存储的消息的第二数据量。
获取子模块7041,还用于获取在第一真实topic中存储的消息的第一数据量。
预估子模块7042,用于基于第一数据量和每个目标虚拟topic的第二数据量,预估预存数据量。
可选地,预估子模块7042,用于:采用预估模型预估预存数据量。其中,预 估模型的输入参数和输出参数均可以包括:至少一组参数,至少一组参数与至少一个目标虚拟topic一一对应,对于每个目标虚拟topic:
输入参数可以包括:第一真实topic的标识和第一数据量,目标虚拟topic的标识,目标虚拟topic的第二数据量与第一数据量的比值。
输出参数可以包括:第一真实topic的标识和预存数据量,目标虚拟topic的标识,目标虚拟topic的第三数据量与第一数据量的比值。
或者,输入参数可以包括:第一真实topic的标识和第一数据量,目标虚拟topic的标识和目标虚拟topic的第二数据量。
输出参数可以包括:第一真实topic的标识和预存数据量,目标虚拟topic的标识和目标虚拟topic的第三数据量。
可选地,至少一个目标虚拟topic可以包括:多个虚拟topic中的所有虚拟topic,或者,多个虚拟topic中存储的数据量由大到小的前至少一个虚拟topic。
可选地,如图14所示,建立模块705,可以包括:
查找子模块7051,用于基于每个目标虚拟topic的第三数据量,查找可用数据量大于第三数据量的真实topic,可用数据量为真实topic的数据量额度与预存数据量的差值。
确定子模块7052,用于当确定存在可用数据量大于第三数据量的真实topic时,将可用数据量大于第三数据量的真实topic确定为第二真实topic。
确定子模块7052,用于当确定不存在可用数据量大于第三数据量的真实topic时,在消息存储系统中创建第二真实topic。
修改子模块7053,用于将与目标虚拟topic对应的虚拟存储地址与真实存储地址的对应关系,修改为虚拟存储地址与可以包括第二真实topic的第二真实存储地址对应。
可选地,确定子模块7052,用于:当确定存在多个可用数据量大于第三数据量的真实topic时,将最大可用数据量对应的真实topic确定为第二真实topic。
可选地,建立模块705,用于:对于与第一真实topic存在对应关系的至少一个目标虚拟topic,按照至少一个目标虚拟topic的第二数据量由大到小的顺序,依次建立每个目标虚拟topic对应的虚拟存储地址与第二真实存储地址的对应关系。
可选地,建立模块705,还用于:
确定第一消息在第二真实topic中的消息偏移量,第一消息为基于虚拟存储地址与第二真实存储地址的对应关系,存储在第二真实topic中的第一个消息。
将第一消息的消息偏移量和虚拟存储地址与第二真实存储地址的对应关系,存储在目标虚拟topic对应的索引文件中。
可选地,每个真实存储地址与多个虚拟存储地址存在对应关系。
综上所述,本申请实施例提供的消息存储装置,接收模块在接收在Kafka集群存储消息的消息存储请求后,确定模块根据虚拟存储地址与真实存储地址的对应关系,确定用于存储消息的真实存储地址,存储模块将消息存储在该真实存储地址指定的真实分区中,实现了消息的存储。
并且,通过对消息存储请求所指定的待存储消息进行预估,根据预估的数据 量修改虚拟存储地址与真实存储地址的对应关系,使得指定存储至虚拟存储地址中的消息能够存储在不同的真实存储地址中,相较于相关技术,减小了真实topic中真实分区工作负载过重的几率,提高了消息存储系统的吞吐率。
本申请实施例提供了一种消息读取装置,如图15所示,该装置800可以包括:
接收模块801,用于接收在Kafka集群读取消息的消息读取请求,消息读取请求指定从虚拟存储地址读取消息,虚拟存储地址可以包括虚拟话题topic的标识和虚拟分区的标识。
确定模块802,用于基于虚拟存储地址与真实存储地址的对应关系,确定与虚拟存储地址对应的目标真实存储地址,目标真实存储地址可以包括目标真实topic的标识和目标真实分区的标识。
读取模块803,用于在目标真实存储地址指定的目标真实分区读取消息读取请求所指定的消息。
可选地,消息读取请求中携带有待读取消息的目标偏移量,确定模块802,用于:
获取目标索引文件中记载的第一消息的消息偏移量,第一消息为基于虚拟存储地址与真实存储地址的当前对应关系,存储在当前对应关系指定的真实topic中的第一个消息,目标索引文件为虚拟存储地址指定的虚拟topic对应的索引文件。
当目标偏移量大于或等于消息偏移量时,将当前对应关系中记载的真实存储地址确定为目标真实存储地址。
当目标偏移量小于消息偏移量时,将虚拟存储地址与真实存储地址的历史对应关系中记载的真实存储地址确定为目标真实存储地址,当前对应关系中记载的真实存储地址与历史对应关系中记载的真实存储地址不同。
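In summary, in the message reading apparatus provided in the embodiments of this application, after the receiving module receives a message read request for reading a message from the Kafka cluster, the determining module determines, based on the correspondence between virtual storage addresses and real storage addresses, the target real storage address corresponding to the virtual storage address, and the reading module reads the message specified by the message read request from the target real partition specified by the target real storage address, thereby implementing message reading.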
本申请实施例还提供了一种服务器,该服务器可以包括处理器和存储器。在处理器执行存储器存储的计算机程序时,服务器执行本申请实施例提供的消息存储方法。
具体地,请参考图16,该服务器20可以包括:处理器22和信号接口24。
处理器22包括一个或者一个以上处理核心。处理器22通过运行软件程序以及模块,从而执行各种功能应用以及数据处理。处理器22可以包括中央处理单元、数字信号处理器、微处理器、微控制器或人工智能处理器中的一种或多种,还可以进一步选择性地包括执行运算所需的硬件加速器,如各种逻辑运算电路。
信号接口24可以为多个,该信号接口24用于与其它装置或模块建立连接,例如:可以通过该信号接口24与收发机进行连接。因此,可选地,该服务器20还可包括收发机(图中未示出)。该收发机具体执行信号收发。当处理器22需要 执行信号收发操作的时候可以调用或驱动收发机执行相应收发操作。因此,当服务器20进行信号收发的时候,处理器22用于决定或发起收发操作,相当于发起者,而收发机用于具体收发执行,相当于执行者。该收发机也可以是收发电路、射频电路或射频单元,本实施例对此不限定。
可选的,服务器20还包括存储器26、总线28等部件。其中,存储器26与信号接口24分别通过总线28与处理器22相连。
存储器26可用于存储软件程序以及模块。具体的,存储器26可存储至少一个功能所需的程序模块262,该程序可以是应用程序或驱动程序。
其中,该程序模块262可以包括:
接收单元2621,具有与接收模块701相同或相似的功能。
确定单元2622,具有与确定模块702相同或相似的功能。
存储单元2623,具有与存储模块703相同或相似的功能。
本发明实施例还提供了一种存储介质,该存储介质可以为非易失性计算机可读存储介质,存储介质内存储有计算机程序,计算机程序指示服务器执行本发明实施例提供的消息存储方法。
本发明实施例还提供了一种包含指令的计算机程序产品,当计算机程序产品在计算机上运行时,使得计算机执行本发明实施例提供的消息存储方法。
本申请实施例还提供了一种服务器,该服务器可以包括处理器和存储器。在处理器执行存储器存储的计算机程序时,服务器执行本申请实施例提供的消息读取方法。
具体地,请参考图17,该服务器40可以包括:处理器42和信号接口44。
处理器42包括一个或者一个以上处理核心。处理器42通过运行软件程序以及模块,从而执行各种功能应用以及数据处理。处理器42可以包括中央处理单元、数字信号处理器、微处理器、微控制器或人工智能处理器中的一种或多种,还可以进一步选择性地包括执行运算所需的硬件加速器,如各种逻辑运算电路。
信号接口44可以为多个,该信号接口44用于与其它装置或模块建立连接,例如:可以通过该信号接口44与收发机进行连接。因此,可选地,该服务器40还可包括收发机(图中未示出)。该收发机具体执行信号收发。当处理器42需要执行信号收发操作的时候可以调用或驱动收发机执行相应收发操作。因此,当服务器40进行信号收发的时候,处理器42用于决定或发起收发操作,相当于发起者,而收发机用于具体收发执行,相当于执行者。该收发机也可以是收发电路、射频电路或射频单元,本实施例对此不限定。
可选的,服务器40还包括存储器46、总线48等部件。其中,存储器46与信号接口44分别通过总线48与处理器42相连。
存储器46可用于存储软件程序以及模块。具体的,存储器46可存储至少一个功能所需的程序模块462,该程序可以是应用程序或驱动程序。
其中,该程序模块462可以包括:
接收单元4621,具有与接收模块801相同或相似的功能。
确定单元4622,具有与确定模块802相同或相似的功能。
读取单元4623,具有与读取模块803相同或相似的功能。
本发明实施例还提供了一种存储介质,该存储介质可以为非易失性计算机可读存储介质,存储介质内存储有计算机程序,计算机程序指示服务器执行本发明实施例提供的消息读取方法。
本发明实施例还提供了一种包含指令的计算机程序产品,当计算机程序产品在计算机上运行时,使得计算机执行本发明实施例提供的消息读取方法。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本发明的较佳实施例,并不用以限制本发明,凡在本发明的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。

Claims (26)

  1. 一种消息存储方法,其特征在于,所述方法应用于卡夫卡Kafka集群;所述方法包括:
    接收在所述Kafka集群存储消息的第一消息存储请求,所述第一消息存储请求指定在虚拟存储地址存储所述第一消息存储请求指定的消息,所述虚拟存储地址包括虚拟话题topic的标识和虚拟分区的标识;
    基于所述虚拟存储地址与第一真实存储地址的对应关系,确定与所述虚拟存储地址对应的所述第一真实存储地址,所述第一真实存储地址包括第一真实topic的标识和第一真实分区的标识;
    在所述第一真实存储地址指定的所述第一真实topic中的所述第一真实分区,存储所述第一消息存储请求指定的消息。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    接收在所述Kafka集群存储消息的第二消息存储请求,所述第二消息存储请求指定在所述虚拟存储地址存储所述第二消息存储请求指定的消息;
    基于所述虚拟存储地址与第二真实存储地址的对应关系,确定与所述虚拟存储地址对应的所述第二真实存储地址,所述第二真实存储地址包括第二真实topic的标识和第二真实分区的标识;
    在所述第二真实存储地址指定的所述第二真实topic中的所述第二真实分区,存储所述第二消息存储请求指定的消息。
  3. 根据权利要求2所述的方法,其特征在于,所述第一真实分区与所述第二真实分区部署在所述Kafka集群中的不同存储节点上。
  4. 根据权利要求2或3所述的方法,其特征在于,所述第二消息存储请求的接收时间晚于所述第一消息存储请求的接收时间。
  5. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    在接收所述第二消息存储请求之前,预估在预设时间段内接收的所述第二消息存储请求所指定的消息的预存数据量;
    当所述预存数据量大于第一阈值时,建立所述虚拟存储地址与所述第二真实存储地址的对应关系。
  6. 根据权利要求4所述的方法,其特征在于,所述方法包括:
    在接收所述第二消息存储请求之前,预估在预设时间段内待在所述第一真实分区所在的第一真实topic中存储的消息的预存数据量;
    当所述预存数据量大于第二阈值时,建立所述虚拟存储地址与所述第二真实存储地址的对应关系。
  7. 根据权利要求5或6所述的方法,其特征在于,所述建立所述虚拟存储地 址与所述第二真实存储地址的对应关系,包括:
    对于与所述第一真实topic存在对应关系的多个目标虚拟topic,基于所述目标虚拟topic的第三数据量,查找可用数据量大于所述第三数据量的真实topic,所述可用数据量为所述真实topic的数据量额度与所述预存数据量的差值;
    当确定存在可用数据量大于所述第三数据量的真实topic时,将所述可用数据量大于所述第三数据量的真实topic确定为第二真实topic;
    当确定不存在可用数据量大于所述第三数据量的真实topic时,在所述消息存储系统中创建第二真实topic;
    将与所述目标虚拟topic对应的虚拟存储地址与真实存储地址的对应关系,修改为所述虚拟存储地址与包括所述第二真实topic的第二真实存储地址对应。
  8. 根据权利要求5至7任一所述的方法,其特征在于,所述建立所述虚拟存储地址与所述第二真实存储地址的对应关系,还包括:
    确定第一消息在所述第二真实topic中的消息偏移量,所述第一消息为基于所述虚拟存储地址与所述第二真实存储地址的对应关系,存储在所述第二真实topic中的第一个消息;
    将所述第一消息的消息偏移量,及所述虚拟存储地址与所述第二真实存储地址的对应关系,存储在所述目标虚拟topic对应的索引文件中。
  9. 根据权利要求1至8任一所述的方法,其特征在于,每个真实存储地址与多个虚拟存储地址存在对应关系。
  10. 一种消息读取方法,其特征在于,所述方法应用于卡夫卡Kafka集群;所述方法包括:
    接收在所述Kafka集群读取消息的消息读取请求,所述消息读取请求指定从虚拟存储地址读取消息,所述虚拟存储地址包括虚拟topic的标识和虚拟分区的标识;
    基于所述虚拟存储地址与真实存储地址的对应关系,确定与所述虚拟存储地址对应的目标真实存储地址,所述目标真实存储地址包括目标真实topic的标识和目标真实分区的标识;
    在所述目标真实存储地址指定的目标真实分区读取所述消息读取请求所指定的消息。
  11. 根据权利要求10所述的方法,其特征在于,所述消息读取请求中携带有待读取消息的目标偏移量,所述确定与所述虚拟存储地址对应的目标真实存储地址,包括:
    获取目标索引文件中记载的第一消息的消息偏移量,所述第一消息为基于所述虚拟存储地址与真实存储地址的当前对应关系,存储在所述当前对应关系指定的真实topic中的第一个消息,所述目标索引文件为所述虚拟存储地址指定的虚拟topic对应的索引文件;
    当所述目标偏移量大于或等于所述消息偏移量时,将所述当前对应关系中记载的真实存储地址确定为所述目标真实存储地址;
    当所述目标偏移量小于所述消息偏移量时,将所述虚拟存储地址与真实存储地址的历史对应关系中记载的真实存储地址确定为所述目标真实存储地址,所述当前对应关系中记载的真实存储地址与所述历史对应关系中记载的真实存储地址不同。
  12. 一种消息存储装置,其特征在于,所述装置包括:
    接收模块,用于接收在Kafka集群存储消息的第一消息存储请求,所述第一消息存储请求指定在虚拟存储地址存储所述第一消息存储请求指定的消息,所述虚拟存储地址包括虚拟话题topic的标识和虚拟分区的标识;
    确定模块,用于基于所述虚拟存储地址与第一真实存储地址的对应关系,确定与所述虚拟存储地址对应的所述第一真实存储地址,所述第一真实存储地址包括第一真实topic的标识和第一真实分区的标识;
    存储模块,用于在所述第一真实存储地址指定的所述第一真实topic中的所述第一真实分区,存储所述第一消息存储请求指定的消息。
  13. 根据权利要求12所述的装置,其特征在于,
    所述接收模块,用于接收在所述Kafka集群存储消息的第二消息存储请求,所述第二消息存储请求指定在所述虚拟存储地址存储所述第二消息存储请求指定的消息;
    所述确定模块,用于基于所述虚拟存储地址与第二真实存储地址的对应关系,确定与所述虚拟存储地址对应的所述第二真实存储地址,所述第二真实存储地址包括第二真实topic的标识和第二真实分区的标识;
    所述存储模块,用于在所述第二真实存储地址指定的所述第二真实topic中的所述第二真实分区,存储所述第二消息存储请求指定的消息。
  14. 根据权利要求13所述的装置,其特征在于,所述第一真实分区与所述第二真实分区部署在所述Kafka集群中的不同存储节点上。
  15. 根据权利要求13或14所述的装置,其特征在于,所述第二消息存储请求的接收时间晚于所述第一消息存储请求的接收时间。
  16. 根据权利要求15所述的装置,其特征在于,所述装置还包括:
    预估模块,用于预估在预设时间段内接收的所述第二消息存储请求所指定的消息的预存数据量;
    建立模块,用于当所述预存数据量大于第一阈值时,建立所述虚拟存储地址与所述第二真实存储地址的对应关系。
  17. 根据权利要求15所述的装置,其特征在于,所述装置包括:
    预估模块,用于预估在预设时间段内待在所述第一真实分区所在的第一真实topic中存储的消息的预存数据量;
    建立模块,用于当所述预存数据量大于第二阈值时,建立所述虚拟存储地址与所述第二真实存储地址的对应关系。
  18. 根据权利要求16或17所述的装置,所述建立模块,包括:
    查找子模块,用于对于与所述第一真实topic存在对应关系的多个目标虚拟topic,基于所述目标虚拟topic的第三数据量,查找可用数据量大于所述第三数据量的真实topic,所述可用数据量为所述真实topic的数据量额度与所述预存数据量的差值;
    确定子模块,用于当确定存在可用数据量大于所述第三数据量的真实topic时,将所述可用数据量大于所述第三数据量的真实topic确定为第二真实topic;
    所述确定子模块,用于当确定不存在可用数据量大于所述第三数据量的真实topic时,在所述消息存储系统中创建第二真实topic;
    修改子模块,用于将与所述目标虚拟topic对应的虚拟存储地址与真实存储地址的对应关系,修改为所述虚拟存储地址与包括所述第二真实topic的第二真实存储地址对应。
  19. 根据权利要求16至18任一所述的装置,其特征在于,所述建立模块,还用于:
    确定第一消息在所述第二真实topic中的消息偏移量,所述第一消息为基于所述虚拟存储地址与所述第二真实存储地址的对应关系,存储在所述第二真实topic中的第一个消息;
    将所述第一消息的消息偏移量,及所述虚拟存储地址与所述第二真实存储地址的对应关系,存储在所述目标虚拟topic对应的索引文件中。
  20. 根据权利要求12至19任一所述的装置,其特征在于,每个真实存储地址与多个虚拟存储地址存在对应关系。
  21. 一种消息读取装置,其特征在于,所述装置包括:
    接收模块,用于接收在Kafka集群读取消息的消息读取请求,所述消息读取请求指定从虚拟存储地址读取消息,所述虚拟存储地址包括虚拟topic的标识和虚拟分区的标识;
    确定模块,用于基于所述虚拟存储地址与真实存储地址的对应关系,确定与所述虚拟存储地址对应的目标真实存储地址,所述目标真实存储地址包括目标真实topic的标识和目标真实分区的标识;
    读取模块,用于在所述目标真实存储地址指定的目标真实分区读取所述消息读取请求所指定的消息。
  22. 根据权利要求21所述的装置,其特征在于,所述消息读取请求中携带有 待读取消息的目标偏移量,所述确定模块,用于:
    获取目标索引文件中记载的第一消息的消息偏移量,所述第一消息为基于所述虚拟存储地址与真实存储地址的当前对应关系,存储在所述当前对应关系指定的真实topic中的第一个消息,所述目标索引文件为所述虚拟存储地址指定的虚拟topic对应的索引文件;
    当所述目标偏移量大于或等于所述消息偏移量时,将所述当前对应关系中记载的真实存储地址确定为所述目标真实存储地址;
    当所述目标偏移量小于所述消息偏移量时,将所述虚拟存储地址与真实存储地址的历史对应关系中记载的真实存储地址确定为所述目标真实存储地址,所述当前对应关系中记载的真实存储地址与所述历史对应关系中记载的真实存储地址不同。
  23. 一种服务器,其特征在于,包括处理器和存储器;
    在所述处理器执行所述存储器存储的计算机程序时,所述服务器执行权利要求1至9任一所述的消息存储方法。
  24. 一种服务器,其特征在于,包括处理器和存储器;
    在所述处理器执行所述存储器存储的计算机程序时,所述服务器执行权利要求10或11所述的消息读取方法。
  25. 一种存储介质,其特征在于,所述存储介质内存储有计算机程序,所述计算机程序指示服务器执行权利要求1至9任一所述的消息存储方法。
  26. 一种存储介质,其特征在于,所述存储介质内存储有计算机程序,所述计算机程序指示服务器执行权利要求10或11所述的消息读取方法。
PCT/CN2019/081173 2018-08-31 2019-04-03 消息存储、读取方法及装置、服务器、存储介质 WO2020042612A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811014981.XA CN109271106B (zh) 2018-08-31 2018-08-31 消息存储、读取方法及装置、服务器、存储介质
CN201811014981.X 2018-08-31

Publications (1)

Publication Number Publication Date
WO2020042612A1 true WO2020042612A1 (zh) 2020-03-05

Family

ID=65187026

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/081173 WO2020042612A1 (zh) 2018-08-31 2019-04-03 消息存储、读取方法及装置、服务器、存储介质

Country Status (2)

Country Link
CN (1) CN109271106B (zh)
WO (1) WO2020042612A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
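CN109271106B (zh) * 2018-08-31 2021-03-05 华为技术有限公司 Message storage and reading methods and apparatuses, server, and storage medium
CN110928491B (zh) * 2019-10-30 2022-04-19 平安科技(深圳)有限公司 Method and system for dynamically selecting a storage partition, computer device, and storage medium
CN111143580B (zh) * 2019-12-26 2024-04-09 惠州Tcl移动通信有限公司 Multimedia data storage method and apparatus, storage medium, and electronic device
CN111930528A (zh) * 2020-08-12 2020-11-13 银联商务股份有限公司 Message writing method, apparatus, and device for message middleware, and readable storage medium
CN113297309B (zh) * 2021-05-31 2023-11-10 平安证券股份有限公司 Stream data writing method, apparatus, device, and storage medium
CN114968088B (zh) * 2022-04-08 2023-09-05 中移互联网有限公司 File storage method, file reading method, and apparatus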

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
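CN102955717A (zh) * 2012-11-05 2013-03-06 北京奇虎科技有限公司 Message management device and method in a distributed message processing system
CN106095589A (zh) * 2016-06-30 2016-11-09 浪潮软件集团有限公司 Method, apparatus, and system for allocating partitions
US20180060143A1 (en) * 2016-08-26 2018-03-01 Vmware, Inc. Distributed shared log storage system having an adapter for heterogenous big data workloads
CN108023953A (zh) * 2017-12-04 2018-05-11 北京小度信息科技有限公司 Method and apparatus for implementing high availability of an FTP service
CN108255875A (zh) * 2016-12-29 2018-07-06 北京奇虎科技有限公司 Method and apparatus for storing messages in a distributed file system
CN108365971A (zh) * 2018-01-10 2018-08-03 深圳市金立通信设备有限公司 Log parsing method, device, and computer-readable medium
CN109271106A (zh) * 2018-08-31 2019-01-25 华为技术有限公司 Message storage and reading methods and apparatuses, server, and storage medium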

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8037153B2 (en) * 2001-12-21 2011-10-11 International Business Machines Corporation Dynamic partitioning of messaging system topics
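CN103473334B (zh) * 2013-09-18 2017-01-11 中控技术(西安)有限公司 Data storage and query methods and system
CN105490854B (zh) * 2015-12-11 2019-03-12 传线网络科技(上海)有限公司 Real-time log collection method and system, and application server cluster
CN106375462B (zh) * 2016-09-13 2019-05-10 北京百度网讯科技有限公司 Method and apparatus for implementing message persistence in a distributed message system
CN107273310A (zh) * 2017-06-30 2017-10-20 浙江大华技术股份有限公司 Method, apparatus, medium, and device for reading multimedia data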

Also Published As

Publication number Publication date
CN109271106A (zh) 2019-01-25
CN109271106B (zh) 2021-03-05

Similar Documents

Publication Publication Date Title
WO2020042612A1 (zh) Message storage and reading methods and apparatuses, server, and storage medium
US11704144B2 (en) Creating virtual machine groups based on request
US11334382B2 (en) Technologies for batching requests in an edge infrastructure
KR102139410B1 (ko) Time-based node election method and apparatus
WO2020052605A1 (zh) Network slice selection method and apparatus
WO2018000993A1 (zh) Distributed storage method and system
US11573725B2 (en) Object migration method, device, and system
US20160094413A1 (en) Network Resource Governance in Multi-Tenant Datacenters
CN109964507B (zh) Network function management method, management unit, and system
CN111522636A (zh) Application container adjustment method, adjustment system, computer-readable medium, and terminal device
CN111526031B (zh) Method and device for scaling a service virtual network function (VNF)
US10521258B2 (en) Managing test services in a distributed production service environment
CN109710406B (zh) Data allocation and model training method and apparatus, and computing cluster
CN110058937B (zh) Method, device, and medium for scheduling dedicated processing resources
US11513854B1 (en) Resource usage restrictions in a time-series database
US11256719B1 (en) Ingestion partition auto-scaling in a time-series database
CN102480502B (zh) I/O load balancing method and I/O server
US11461053B2 (en) Data storage system with separate interfaces for bulk data ingestion and data access
CN113805816B (zh) Disk space management method, apparatus, device, and storage medium
Xiang et al. Differentiated latency in data center networks with erasure coded files through traffic engineering
US11409725B1 (en) Multi-tenant partitioning in a time-series database
WO2022111456A1 (zh) Core sharing method and apparatus based on a many-core system, electronic device, and medium
US20220103500A1 (en) Method and device for managing group member, and method for processing group message
WO2018053838A1 (zh) Load balancing method and related device
US11108854B2 (en) Peer-to-peer network for internet of things resource allocation operation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19855975

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19855975

Country of ref document: EP

Kind code of ref document: A1