CN109325200B - Method and device for acquiring data and computer readable storage medium - Google Patents

Method and device for acquiring data and computer readable storage medium

Info

Publication number
CN109325200B
Authority
CN
China
Prior art keywords
cluster
data
read
write
kafka
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810822678.6A
Other languages
Chinese (zh)
Other versions
CN109325200A (en)
Inventor
马跃
曹东阳
孙爱林
乔超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201810822678.6A
Publication of CN109325200A
Application granted
Publication of CN109325200B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a method and a device for acquiring data and a computer-readable storage medium, and relates to the technical field of computers. The method for acquiring data comprises the following steps: receiving a topic data acquisition request sent by a user, wherein the topic data acquisition request comprises an identifier of the topic data; querying a pre-created Kafka cluster topic data information table by using the identifier of the topic data, and determining the cluster where the topic data is located among the Kafka clusters, wherein the Kafka cluster topic data information table contains the mapping relationship between each piece of topic data in the Kafka clusters and the cluster where it is located; and sending the configuration parameters and the consumption key of the cluster where the topic data is located to the user, so that the user can obtain the topic data from that cluster. According to the method and the device, the read cluster can be automatically matched according to the topic data acquisition request of the user, and the automatic configuration of the Kafka clusters is realized.

Description

Method and device for acquiring data and computer readable storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for acquiring data, and a computer-readable storage medium.
Background
Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the activity stream data of a consumer-scale website.
Kafka clusters in large enterprises already separate data reads from writes. Kafka reads and writes are decoupled from each other: a piece of data is written once upstream and can be consumed by multiple downstream services, so that consumers outnumber producers.
For disaster recovery of stored data, data with a large volume and a high service level is often backed up on multiple clusters, usually one write cluster and multiple read clusters. FIG. 1 shows a schematic diagram of Kafka clusters in which consumers consume from one write cluster and multiple read clusters. For service data whose volume and service level are one level higher still, one piece of service data needs to be split into multiple parts and written to multiple write clusters at the same time; the data of the multiple write clusters is then merged into one copy and synchronized to multiple read clusters for backup. FIG. 2 shows a schematic diagram of Kafka clusters in which consumers consume from multiple write clusters and multiple read clusters. For data with a small volume and low business importance, a single read-write cluster without read-write separation is used. Fig. 3 shows a schematic diagram of a Kafka cluster in which reads and writes are not separated.
Disclosure of Invention
The technical problem solved by the present disclosure is how to automatically match the read cluster according to the topic data acquisition request of the user, and realize the automatic configuration of the Kafka cluster.
According to an aspect of the embodiments of the present disclosure, there is provided a method of acquiring data, including: receiving a topic data acquisition request sent by a user, wherein the topic data acquisition request includes an identifier of the topic data; querying a pre-created Kafka cluster topic data information table by using the identifier of the topic data, and determining the cluster where the topic data is located among the Kafka clusters, wherein the Kafka cluster topic data information table contains the mapping relationship between each piece of topic data in the Kafka clusters and the cluster where it is located; and sending the configuration parameters and the consumption key of the cluster where the topic data is located to the user, so that the user can obtain the topic data from that cluster.
In some embodiments, determining the cluster where the topic data is located among the Kafka clusters includes: if read clusters storing the topic data exist among the Kafka clusters, selecting the read cluster with the smallest load as the cluster where the topic data is located.
In some embodiments, determining the cluster where the topic data is located among the Kafka clusters includes: if no read cluster storing the topic data exists among the Kafka clusters and a read-write cluster storing the topic data exists, determining the read-write cluster as the cluster where the topic data is located.
In some embodiments, determining the cluster where the topic data is located among the Kafka clusters includes: if neither a read cluster nor a read-write cluster storing the topic data exists among the Kafka clusters, but a write cluster storing the topic data exists, querying a pre-created storage cluster topology graph for the read clusters associated with the write cluster; and synchronizing the write cluster to the associated read clusters, and determining the cluster where the topic data is located from the read clusters obtained through synchronization.
In some embodiments, synchronizing the write cluster to the associated read clusters and determining the cluster where the topic data is located from the read clusters obtained through synchronization includes: if the write cluster is of the single-synchronization type, synchronizing the write cluster to the associated read cluster with the smallest load; and if the write cluster is of the multi-synchronization type, synchronizing the write cluster to all the associated read clusters, and selecting from among those read clusters the one with the smallest load as the cluster where the topic data is located.
In some embodiments, synchronizing the write cluster to the associated read clusters includes: synchronizing the write cluster to each associated read cluster through a plurality of synchronization tasks, wherein the amount of data synchronized by each synchronization task does not exceed a preset value; and determining the number of pieces of topic data synchronized by each synchronization task according to the preset value and the data volume of each piece of topic data to be synchronized.
In some embodiments, the topic data acquisition request further includes a service level of the user; the method further includes: associating the service level with the cluster where the topic data is located; determining a comprehensive evaluation level of each cluster according to the service levels associated with that cluster; and performing cluster isolation on the clusters according to their comprehensive evaluation levels.
According to another aspect of the embodiments of the present disclosure, there is provided an apparatus for acquiring data, including: a message receiving module configured to receive a topic data acquisition request sent by a user, wherein the topic data acquisition request includes an identifier of the topic data; a cluster determining module configured to query a pre-created Kafka cluster topic data information table by using the identifier of the topic data and determine the cluster where the topic data is located among the Kafka clusters, wherein the Kafka cluster topic data information table contains the mapping relationship between each piece of topic data in the Kafka clusters and the cluster where it is located; and a message sending module configured to send the configuration parameters and the consumption key of the cluster where the topic data is located to the user, so that the user can obtain the topic data from that cluster.
In some embodiments, the cluster determining module is configured to: if read clusters storing the topic data exist among the Kafka clusters, select the read cluster with the smallest load as the cluster where the topic data is located.
In some embodiments, the cluster determining module is configured to: if no read cluster storing the topic data exists among the Kafka clusters and a read-write cluster storing the topic data exists, determine the read-write cluster as the cluster where the topic data is located.
In some embodiments, the cluster determining module is configured to: if neither a read cluster nor a read-write cluster storing the topic data exists among the Kafka clusters, but a write cluster storing the topic data exists, query a pre-created storage cluster topology graph for the read clusters associated with the write cluster; and synchronize the write cluster to the associated read clusters, and determine the cluster where the topic data is located from the read clusters obtained through synchronization.
In some embodiments, the cluster determining module is configured to: if the write cluster is of the single-synchronization type, synchronize the write cluster to the associated read cluster with the smallest load; and if the write cluster is of the multi-synchronization type, synchronize the write cluster to all the associated read clusters, and select from among those read clusters the one with the smallest load as the cluster where the topic data is located.
In some embodiments, the cluster determining module is configured to: synchronize the write cluster to each associated read cluster through a plurality of synchronization tasks, wherein the amount of data synchronized by each synchronization task does not exceed a preset value; and determine the number of pieces of topic data synchronized by each synchronization task according to the preset value and the data volume of each piece of topic data to be synchronized.
In some embodiments, the topic data acquisition request further includes a service level of the user; the apparatus further includes a cluster isolation module configured to: associate the service level with the cluster where the topic data is located; determine a comprehensive evaluation level of each cluster according to the service levels associated with that cluster; and perform cluster isolation on the clusters according to their comprehensive evaluation levels.
According to another aspect of the embodiments of the present disclosure, there is provided an apparatus for acquiring data, including: a memory; and a processor coupled to the memory, the processor configured to perform the aforementioned method of retrieving data based on instructions stored in the memory.
According to still another aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores computer instructions, and the instructions are executed by a processor to implement the foregoing method for acquiring data.
According to the method and the device, the read cluster can be automatically matched according to the topic data acquisition request of the user, and the automatic configuration of the Kafka clusters is realized.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
In order to illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present disclosure; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 shows a schematic diagram of Kafka clusters in which consumers consume from one write cluster and multiple read clusters.
FIG. 2 shows a schematic diagram of Kafka clusters in which consumers consume from multiple write clusters and multiple read clusters.
Fig. 3 shows a schematic diagram of a Kafka cluster in which reads and writes are not separated.
Fig. 4 shows a flow diagram of a method of acquiring data according to an embodiment of the present disclosure.
Fig. 5 shows a schematic flow chart of determining the cluster where the topic data is located among the Kafka clusters.
Fig. 6 shows a schematic flow chart of a method of acquiring data according to another embodiment of the present disclosure.
Fig. 7 shows a system workflow of an application example of the method of acquiring data.
Fig. 8 is a schematic structural diagram of an apparatus for acquiring data according to an embodiment of the present disclosure.
Fig. 9 is a schematic structural diagram of an apparatus for acquiring data according to another embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
The inventors studied how a consumer, having settled on a piece of topic data, decides which cluster to consume it from. In large enterprises, Kafka clusters are operated and maintained by a unified team; a business department generally cares only about the accuracy and performance of the data it consumes, not about which cluster the data is on. The business department therefore relies on the operation and maintenance team to determine which cluster to consume from. That is, the cluster is currently assigned manually: first, the business department specifies which topic data it wants to consume and submits the request by email or a web form; then, the operation and maintenance team looks up which clusters hold the topic data and manually assigns the data on a certain read cluster to the business department according to the business classification and load carried by the clusters; finally, the configuration parameters and the consumption key of the cluster are returned to the business department by email or web page.
The inventors' analysis shows that the above related technical solution requires operation and maintenance personnel to judge in advance and manually assign a read cluster to the user, and possibly to manually synchronize the topic data from the write cluster to the read cluster. On the one hand, the labor cost is high; on the other hand, as the Kafka clusters grow, a manual misjudgment may cause the wrong read cluster to be assigned to the business department, so that the business department cannot consume the data or cluster resources are allocated unreasonably. To solve these problems, the present disclosure provides a new method for acquiring data, which can automatically match a read cluster according to the topic data acquisition request of a user and realize the automatic configuration of the Kafka clusters. A method of acquiring data according to an embodiment of the present disclosure is described below with reference to Fig. 4.
Fig. 4 shows a flow diagram of a method of acquiring data according to an embodiment of the present disclosure. As shown in fig. 4, the method of acquiring data in the present embodiment includes steps S402 to S406.
In step S402, a topic data acquisition request sent by a user is received, where the topic data acquisition request includes an identifier of the topic data.
In step S404, the pre-created Kafka cluster topic data information table is queried by using the identifier of the topic data, and the cluster where the topic data is located is determined among the Kafka clusters. The Kafka cluster topic data information table contains the mapping relationship between each piece of topic data in the Kafka clusters and the cluster where it is located.
The specific process of determining the cluster where the topic data is located among the Kafka clusters is described in detail in the following embodiments.
In step S406, the configuration parameters and the consumption key of the cluster where the topic data is located are sent to the user, so that the user can obtain the topic data from that cluster.
This embodiment can automatically match the read cluster according to the topic data acquisition request of the user, and realizes the automatic configuration of the Kafka clusters.
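As a minimal sketch of steps S402 to S406, the lookup can be pictured as a dictionary query followed by returning the connection details of the matched cluster. The names `ClusterInfo`, `TOPIC_TABLE`, `CLUSTERS` and `handle_topic_request`, as well as the sample server address and key, are hypothetical and chosen only for illustration; the disclosure does not specify how the table is stored.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ClusterInfo:
    name: str
    bootstrap_servers: str  # stands in for the "configuration parameters"
    consumption_key: str    # stands in for the "consumption key"


# Hypothetical topic data information table: topic identifier -> cluster name.
TOPIC_TABLE: Dict[str, str] = {"orders": "read-1"}

# Hypothetical cluster registry holding what is returned to the user.
CLUSTERS: Dict[str, ClusterInfo] = {
    "read-1": ClusterInfo("read-1", "read1.example.internal:9092", "demo-key"),
}


def handle_topic_request(topic_id: str) -> Optional[Dict[str, str]]:
    """Receive a topic identifier (S402), query the table (S404), reply (S406)."""
    cluster_name = TOPIC_TABLE.get(topic_id)
    if cluster_name is None:
        return None  # no cluster stores this topic
    cluster = CLUSTERS[cluster_name]
    return {
        "cluster": cluster.name,
        "bootstrap_servers": cluster.bootstrap_servers,
        "consumption_key": cluster.consumption_key,
    }
```

In this sketch a request for an unknown topic simply yields `None`; a real system would presumably notify the user instead.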
How the cluster where the topic data is located is determined among the Kafka clusters is described in detail below with reference to Fig. 5.
Fig. 5 shows a schematic flow chart of determining the cluster where the topic data is located among the Kafka clusters. As shown in Fig. 5, the process specifically includes steps S5041 to S5048.
In step S5041, it is determined whether a read cluster storing the topic data exists among the Kafka clusters.
If a read cluster storing the topic data exists among the Kafka clusters, step S5042 is executed.
In step S5042, the read cluster with the smallest load is selected from the read clusters storing the topic data as the cluster where the topic data is located.
If no read cluster storing the topic data exists among the Kafka clusters, step S5043 is executed.
In step S5043, it is determined whether a read-write cluster storing the topic data exists among the Kafka clusters.
If a read-write cluster storing the topic data exists among the Kafka clusters, step S5044 is executed.
In step S5044, the read-write cluster is determined as the cluster where the topic data is located.
If no read-write cluster storing the topic data exists among the Kafka clusters, step S5045 is executed.
In step S5045, it is determined whether a write cluster storing the topic data exists among the Kafka clusters.
If a write cluster storing the topic data exists, step S5046 is executed.
In step S5046, the read clusters associated with the write cluster are queried from a pre-created storage cluster topology graph.
In step S5047, the write cluster is synchronized to the associated read clusters, and the cluster where the topic data is located is determined from the read clusters obtained through synchronization.
In the synchronization process, if the write cluster is of the single-synchronization type, the write cluster is synchronized to the associated read cluster with the smallest load; if the write cluster is of the multi-synchronization type, the write cluster is synchronized to all the associated read clusters, and the read cluster with the smallest load among them is selected as the cluster where the topic data is located.
During synchronization, the write cluster can be synchronized to each associated read cluster through a plurality of synchronization tasks, and the amount of data synchronized by each synchronization task does not exceed a preset value. The number of pieces of topic data synchronized by each synchronization task is then determined according to the preset value and the data volume of each piece of topic data to be synchronized.
If no write cluster storing the topic data exists, step S5048 is executed.
In step S5048, the user is notified that no cluster storing the topic data exists among the Kafka clusters.
In this embodiment, clusters are offered to the user in a priority order based on the read-write attribute of the clusters storing the topic data. An existing read cluster is offered first, which saves system overhead and improves working efficiency. A read-write cluster is offered second, because read-write clusters hold small data volumes and mostly store topic data of relatively low importance; searching the clusters in this priority order provides cluster information more quickly to users requesting topic data of relatively high importance. As a last resort, a read cluster obtained by synchronizing the write cluster is offered, so that even when only a write cluster stores the requested topic data among the Kafka clusters, the user is given a read cluster from which to obtain the topic data.
In addition, the association between write clusters and read clusters is recorded in the pre-created storage cluster topology graph. The topology graph enables automatic synchronization from a write cluster to its read clusters, which in turn enables automatic matching of the read cluster according to the topic data acquisition request of the user and thus the automatic configuration of the Kafka clusters.
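The priority order of Fig. 5 (read cluster, then read-write cluster, then a read cluster obtained by synchronizing the write cluster) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `Role`, `Cluster` and `choose_cluster` are hypothetical names, load is modeled as a single number, the `sync` callback stands in for the synchronization step and returns the read clusters that were synchronized, and when several write clusters hold the topic the sketch simply takes the first one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class Role(Enum):
    READ = "read"
    WRITE = "write"
    READ_WRITE = "read_write"


@dataclass
class Cluster:
    name: str
    role: Role
    load: float  # assumed scalar load metric; lower means less loaded


def choose_cluster(
    holders: List[Cluster],
    topology: Dict[str, List[Cluster]],
    sync: Callable[[Cluster, List[Cluster]], List[Cluster]],
) -> Optional[Cluster]:
    """Pick the cluster to hand to the user for one topic.

    holders  -- clusters that currently store the topic
    topology -- write cluster name -> associated read clusters
    sync     -- performs synchronization, returns the synchronized read clusters
    """
    reads = [c for c in holders if c.role is Role.READ]
    if reads:  # S5041/S5042: least-loaded existing read cluster
        return min(reads, key=lambda c: c.load)

    read_writes = [c for c in holders if c.role is Role.READ_WRITE]
    if read_writes:  # S5043/S5044
        return read_writes[0]

    writes = [c for c in holders if c.role is Role.WRITE]
    if writes:  # S5045 to S5047
        write = writes[0]  # assumption: use the first write cluster found
        synced = sync(write, topology.get(write.name, []))
        return min(synced, key=lambda c: c.load) if synced else None

    return None  # S5048: the caller notifies the user
```

For example, with a `sync` callback that just returns every associated read cluster, a topic held only by write cluster `w` whose topology entry lists `r1` (load 0.7) and `r2` (load 0.3) resolves to `r2`.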
A method of acquiring data according to another embodiment of the present disclosure is described below with reference to fig. 6.
Fig. 6 shows a schematic flow chart of a method of acquiring data according to another embodiment of the present disclosure. As shown in fig. 6, the method for acquiring data in the present embodiment further includes steps S602 to S612.
In step S602, a topic data acquisition request sent by a user is received, where the topic data acquisition request includes an identifier of the topic data and a service level of the user.
In step S604, the pre-created Kafka cluster topic data information table is queried by using the identifier of the topic data, and the cluster where the topic data is located is determined among the Kafka clusters. The Kafka cluster topic data information table contains the mapping relationship between each piece of topic data in the Kafka clusters and the cluster where it is located.
In step S606, the configuration parameters and the consumption key of the cluster where the topic data is located are sent to the user, so that the user can obtain the topic data from that cluster.
In step S608, the service level is associated with the cluster where the topic data is located.
For example, if the service level of user U1 is level L2, level L2 is associated with the cluster M1 where the requested topic data is located.
In step S610, a comprehensive evaluation level of each cluster is determined according to the service levels associated with that cluster.
For example, cluster M1 may be associated with the service levels of multiple users U1 to U10. The 10 associated service levels may be averaged to obtain the comprehensive evaluation level of cluster M1.
In step S612, cluster isolation is performed on the clusters according to the comprehensive evaluation level of each cluster.
For example, clusters with different comprehensive evaluation levels may be isolated from each other.
In this embodiment, the cluster level is evaluated objectively through the service levels of the users, and automatic cluster isolation is realized according to the comprehensive evaluation level of each cluster, so that the automatic configuration of the Kafka clusters is further realized.
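Steps S608 to S612 can be sketched as averaging the service levels associated with each cluster and then grouping clusters that share a comprehensive evaluation level. The function names are hypothetical, service levels are assumed to be integers, and rounding the average to the nearest integer is an assumption of this sketch; the text above only says the levels may be averaged.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def comprehensive_levels(associations: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Average the user service levels associated with each cluster (S608, S610).

    associations -- (cluster name, user service level) pairs, one per request
    """
    per_cluster: Dict[str, List[int]] = defaultdict(list)
    for cluster, level in associations:
        per_cluster[cluster].append(level)
    # Rounding to an integer level is an assumption made for this sketch.
    return {cluster: round(mean(levels)) for cluster, levels in per_cluster.items()}


def isolation_groups(levels: Dict[str, int]) -> Dict[int, List[str]]:
    """Group clusters by comprehensive evaluation level (S612).

    Clusters in different groups would be isolated from each other.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for cluster, level in levels.items():
        groups[level].append(cluster)
    return {level: sorted(names) for level, names in groups.items()}
```

How the isolation itself is enforced (separate hardware, network partitions, quotas) is not stated in the text and is left out of the sketch.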
A specific application example for implementing the method for acquiring data of the present disclosure is described below. The application example realizes the method for acquiring the data through the Kafka cluster consumer management system, the Kafka inter-cluster data synchronization task management system and the Kafka cluster information management system.
Kafka cluster consumer management system
The system records the information of every consumer that requests to consume Kafka data, specifically:
(1) the service information of the consumer, which may include the service project group to which the consumer belongs, the service level, a description of the use, and the like, so that Kafka operation and maintenance personnel can comprehensively manage the service volume and service importance carried by each cluster, perform cluster isolation according to service level, and perform cost accounting of the resources used at the granularity of service projects;
(2) the parameter information of the consumed Kafka data, which may include which service data is consumed, the retention period of the service data, the storage partitions of the service data, and the like, so that the business department has a clearer view of how the consumed data is stored;
(3) the connection information of the cluster where the data required by the consumer is located, which may include the username and password for connecting to the cluster, the specific servers of the Kafka read cluster to connect to, and the like, so that the business department can look it up by self-service at any time without recording it elsewhere.
The Kafka cluster consumer management system stores the Kafka cluster topic data information table. When a producer creates topic data, operation and maintenance personnel assign the topic data to a certain cluster according to business grouping, and in this process the Kafka cluster consumer management system automatically records the relationship between the topic data and the cluster. The system is open to the business departments and is also the entry point through which a business department initiates a request to consume Kafka data.
Kafka inter-cluster data synchronization task management system
The system is a task management system for data synchronization among Kafka clusters, and is the main tool for realizing read-write separation of Kafka services. In the cluster synchronization process, the source cluster is a Kafka write cluster and the target cluster is a Kafka read cluster. In addition to the operating parameters of the synchronization task, the topic data to be synchronized by the task, which is the most important setting, must also be configured. The configuration process is as follows:
(1) recording in a table which topic data each cluster holds;
(2) determining a source cluster;
(3) selecting the business theme data to be synchronized in the source cluster range determined in the step (2);
(4) determining a target cluster;
(5) save and start the sync task.
The number of topics that can be synchronized by each synchronization task can be determined according to the size of the data volume of the synchronized topics, and when the data volume of synchronization exceeds a set threshold, a new synchronization task is started.
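One way to read the rule above is as a greedy packing of topics into synchronization tasks: topics are added to the current task until the next one would push it past the threshold, at which point a new task is started. The sketch below assumes this reading; `plan_sync_tasks` and its parameters are hypothetical names, sizes are in arbitrary units, and a single topic larger than the threshold is given a task of its own.

```python
from typing import Dict, List


def plan_sync_tasks(topic_sizes: Dict[str, int], limit: int) -> List[List[str]]:
    """Split topics into synchronization tasks of at most `limit` data volume.

    topic_sizes -- topic name -> data volume to synchronize
    limit       -- preset per-task data volume (the "set threshold")
    """
    tasks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for topic, size in topic_sizes.items():
        # Start a new task when adding this topic would exceed the threshold.
        if current and used + size > limit:
            tasks.append(current)
            current, used = [], 0
        current.append(topic)
        used += size
    if current:
        tasks.append(current)
    return tasks
```

With sizes `{"a": 40, "b": 50, "c": 30, "d": 80}` and a limit of 100, the plan is `[["a", "b"], ["c"], ["d"]]`, so the number of topics per task follows from the threshold and the individual topic volumes.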
Kafka cluster information management system
Performing configuration parameter maintenance, server information maintenance, service grouping maintenance, cluster topological relation and other information on the Kafka cluster by taking the Kafka cluster as granularity; the cluster information management system plays a role of an information dictionary in the whole scheme; the system is maintained by operation and maintenance personnel according to the actual conditions of the clusters (for example, the operation and maintenance personnel manually enter or import in batches under the conditions of Kafka cluster capacity expansion, capacity reduction, machine increase and decrease, system upgrade, parameter change and the like).
The application case realizes automatic distribution of the data of the read cluster to the service department mainly through the topological relation maintained in the cluster management system. In the topological relation, the writing cluster has a corresponding sub-cluster, and the sub-cluster refers to a reading cluster corresponding to the subject data on the writing cluster, that is, a corresponding target cluster when the synchronization task is started. The cluster management system also needs to record the attribute of each cluster, which is defined by the operation and maintenance personnel and is input into the Kafka cluster information management system when a cluster is newly built. The read cluster only allows the service department to read data from the cluster, and the write cluster only allows the service department to write data to the cluster; a read-write cluster allows both the business entity to write data to the cluster and the business entity to consume data from the cluster. The cluster synchronization method includes two methods: the single synchronization means that if a plurality of sub-clusters of the cluster are provided, one sub-cluster is selected according to the load to carry out data synchronization; the multiple synchronization means that if there are a plurality of sub-clusters of the cluster, data synchronization is performed to the sub-clusters, respectively.
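The two synchronization modes can be sketched as a choice of target sub-clusters for a write cluster. `sync_targets`, the mode strings `"single"` and `"multi"`, and the representation of load as a number per sub-cluster are illustrative assumptions; the disclosure does not define a concrete data model for the topology.

```python
from typing import Dict, List


def sync_targets(sync_type: str, sub_cluster_loads: Dict[str, float]) -> List[str]:
    """Return the read clusters a write cluster should be synchronized to.

    sync_type         -- "single" or "multi" (hypothetical labels for the two modes)
    sub_cluster_loads -- sub-cluster (read cluster) name -> current load
    """
    if not sub_cluster_loads:
        return []  # the write cluster has no associated read cluster
    if sync_type == "single":
        # Single synchronization: only the least-loaded sub-cluster.
        return [min(sub_cluster_loads, key=sub_cluster_loads.get)]
    if sync_type == "multi":
        # Multi-synchronization: every sub-cluster receives the data.
        return sorted(sub_cluster_loads)
    raise ValueError(f"unknown synchronization type: {sync_type!r}")
```

For a write cluster with sub-clusters `r1` (load 0.8) and `r2` (load 0.2), single synchronization targets only `r2`, while multi-synchronization targets both.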
Fig. 7 shows a system workflow of an application example of the method of acquiring data. As can be seen from fig. 7, no operation and maintenance personnel need to participate in the process: after a business department sends a request, a suitable read cluster is automatically returned to it according to the rules.
This application example replaces the manual allocation, by operation and maintenance personnel, of Kafka read clusters for topic data. Cluster dictionaries and logic rules maintained in the program automatically match a read cluster to the topic data specified by a business department, so no manual configuration by operation and maintenance personnel is needed, and business departments obtain the cluster configuration information for consuming Kafka data automatically.
An apparatus for acquiring data according to an embodiment of the present disclosure is described below with reference to fig. 8.
Fig. 8 is a schematic structural diagram of an apparatus for acquiring data according to an embodiment of the present disclosure. As shown in fig. 8, the apparatus 80 for acquiring data in the present embodiment includes:
a message receiving module 802, configured to receive a topic data acquisition request sent by a user, where the topic data acquisition request includes an identifier of topic data;
a cluster determining module 804, configured to query a pre-created Kafka cluster topic data information table using the identifier of the topic data and to determine, among the Kafka clusters, the cluster where the topic data is located, where the Kafka cluster topic data information table contains the mapping between each piece of topic data in the Kafka clusters and the cluster where it is located;
a message sending module 806, configured to send the configuration parameters and the consumption key of the cluster where the topic data is located to the user, so that the user can obtain the topic data from that cluster.
This embodiment can automatically match a read cluster to the user's topic data acquisition request, thereby realizing automatic configuration of the Kafka cluster.
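The three modules above amount to a lookup followed by a reply. The sketch below illustrates that flow under stated assumptions: the table contents, the cluster name, the broker addresses, the key value, and the names `ConsumerGrant` and `handle_request` are all hypothetical. Only `bootstrap.servers` is a real Kafka client property, used here as an example of a configuration parameter.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConsumerGrant:
    """What is sent back to the user (hypothetical shape)."""
    cluster: str
    config: Dict[str, str]
    consume_key: str


# Hypothetical topic data information table: topic identifier -> cluster name.
TOPIC_TABLE = {"orders": "read-cluster-a"}

# Hypothetical per-cluster configuration parameters and consumption keys.
CLUSTER_CONFIG = {
    "read-cluster-a": {"bootstrap.servers": "broker-a1:9092,broker-a2:9092"},
}
CONSUME_KEYS = {"read-cluster-a": "example-key"}


def handle_request(topic_id: str) -> ConsumerGrant:
    """Receive a request, determine the cluster, and build the reply."""
    cluster = TOPIC_TABLE.get(topic_id)
    if cluster is None:
        raise KeyError(f"no cluster is registered for topic {topic_id!r}")
    return ConsumerGrant(cluster, CLUSTER_CONFIG[cluster], CONSUME_KEYS[cluster])
```

A caller would pass the returned `config` to its Kafka consumer and present `consume_key` when subscribing; how the key is checked is outside the scope of this sketch.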
In some embodiments, the cluster determination module 804 is configured to: if read clusters storing the topic data exist among the Kafka clusters, select the read cluster with the smallest load as the cluster where the topic data is located.
In some embodiments, the cluster determination module 804 is configured to: if no read cluster storing the topic data exists among the Kafka clusters and a read-write cluster storing the topic data exists, determine the read-write cluster as the cluster where the topic data is located.
In some embodiments, the cluster determination module 804 is configured to: if, among the Kafka clusters, no read cluster storing the topic data exists, no read-write cluster storing the topic data exists, and a write cluster storing the topic data exists, query a pre-established storage cluster topological graph for the read clusters associated with the write cluster; synchronize the write cluster to the associated read clusters; and determine the cluster where the topic data is located according to the read clusters obtained through synchronization.
In this embodiment, clusters are selected according to the read-write attribute of the clusters storing the topic data in the Kafka cluster. An existing read cluster is offered to the user first, which saves system overhead and improves working efficiency. A read-write cluster is offered as the second choice, because read-write clusters hold small data volumes and are mostly used for topic data of relatively low importance; searching the clusters in this priority order therefore provides cluster information more quickly to users requesting topic data of relatively high importance. As the last choice, a read cluster obtained by synchronizing a write cluster is offered, so that even when only a write cluster storing the requested topic data exists in the Kafka cluster, the user is given a read cluster from which to obtain the topic data.
In addition, the pre-created storage cluster topological graph records the association between write clusters and read clusters. The graph enables automatic synchronization from a write cluster to its read clusters, which in turn allows a read cluster to be matched automatically to a user's topic data acquisition request and furthers the automatic configuration of the Kafka cluster.
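The priority order described in the preceding paragraphs (read cluster, then read-write cluster, then a read cluster obtained by synchronizing a write cluster) can be sketched as follows. This is an illustration under assumptions, not the disclosed implementation: `Cluster`, `choose_cluster`, and the `synchronize` callback are hypothetical names, and taking the first read-write or write cluster when several exist is a simplification the text does not address.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class Cluster(NamedTuple):
    name: str
    role: str    # "read", "read_write" or "write"
    load: float


def choose_cluster(
    holders: List[Cluster],
    topology: Dict[str, List[Cluster]],
    synchronize: Callable[[Cluster, List[Cluster]], Cluster],
) -> Optional[Cluster]:
    """Pick the cluster to hand to the user for one topic.

    holders  -- clusters that currently store the topic data
    topology -- write cluster name -> associated read clusters
    synchronize -- starts synchronization and returns the read cluster to use
    """
    # 1. Prefer an existing read cluster, least loaded first.
    reads = [c for c in holders if c.role == "read"]
    if reads:
        return min(reads, key=lambda c: c.load)

    # 2. Otherwise fall back to a read-write cluster.
    read_writes = [c for c in holders if c.role == "read_write"]
    if read_writes:
        return read_writes[0]

    # 3. Otherwise synchronize a write cluster to its associated read clusters.
    writes = [c for c in holders if c.role == "write"]
    if writes:
        targets = topology.get(writes[0].name, [])
        if targets:
            return synchronize(writes[0], targets)

    # No cluster stores the topic, or the write cluster has no read clusters.
    return None
```

Passing the synchronization step in as a callback keeps the selection rule separate from the mechanics of copying data between clusters.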
In some embodiments, the cluster determination module 804 is configured to: if the write cluster is of the single synchronization type, synchronize the write cluster to the associated read cluster with the smallest load; and if the write cluster is of the multi-synchronization type, synchronize the write cluster to all the associated read clusters and select, from among those read clusters, the read cluster with the smallest load as the cluster where the topic data is located.
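The difference between the two synchronization types can be illustrated with a short sketch. The function name `sync_and_pick`, the `(name, load)` tuple layout, and the `start_sync` callback are hypothetical; the actual synchronization mechanism is not described in this passage.

```python
from typing import Callable, List, Tuple


def sync_and_pick(
    mode: str,
    read_clusters: List[Tuple[str, float]],
    start_sync: Callable[[str], None],
) -> str:
    """Synchronize a write cluster and return the read cluster to hand out.

    mode          -- "single" or "multi"
    read_clusters -- (name, load) pairs of the associated read clusters
    start_sync    -- called once per read cluster that must be synchronized
    """
    if mode not in ("single", "multi"):
        raise ValueError(f"unknown synchronization type: {mode!r}")
    if not read_clusters:
        raise ValueError("the write cluster has no associated read cluster")

    least_loaded = min(read_clusters, key=lambda c: c[1])

    # Single: only the least loaded read cluster receives the data.
    # Multi: every associated read cluster receives the data.
    targets = [least_loaded] if mode == "single" else list(read_clusters)
    for name, _load in targets:
        start_sync(name)

    # In both cases the least loaded read cluster is offered to the user.
    return least_loaded[0]
```

In both modes the user receives the least loaded read cluster; the modes differ only in how many read clusters end up holding a copy of the topic data.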
In some embodiments, the cluster determination module 804 is configured to: synchronize the write cluster to each associated read cluster through a plurality of synchronization tasks, wherein the data volume synchronized by each synchronization task does not exceed a preset value; and determine the number of pieces of topic data synchronized by each synchronization task according to the preset value and the data volume of each piece of topic data that the task synchronizes.
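One way to divide topics among synchronization tasks so that no task exceeds the preset value is a simple greedy pass, sketched below. The greedy strategy, the function name `plan_sync_tasks`, and the choice to reject a single topic larger than the preset value are assumptions made for illustration; the text fixes only the per-task limit.

```python
from typing import Dict, List


def plan_sync_tasks(topic_sizes: Dict[str, int], limit: int) -> List[List[str]]:
    """Group topics into tasks whose total data volume stays within limit.

    topic_sizes -- topic name -> data volume (any consistent unit)
    limit       -- preset maximum data volume per synchronization task
    """
    tasks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for topic, size in topic_sizes.items():
        if size > limit:
            # Assumption: a topic that cannot fit in any task is an error.
            raise ValueError(f"topic {topic!r} alone exceeds the limit")
        # Close the current task when adding this topic would overflow it.
        if current and used + size > limit:
            tasks.append(current)
            current, used = [], 0
        current.append(topic)
        used += size
    if current:
        tasks.append(current)
    return tasks
```

The number of topics in each task then follows directly from the preset value and the topic sizes, as the text describes. For example, with a limit of 100, topics of sizes 40, 50, 30 and 70 taken in that order fall into two tasks of two topics each.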
In some embodiments, the topic data acquisition request further includes a service level of the user; the apparatus 80 further comprises a cluster isolation module 808 configured to: associate the service level with the cluster where the topic data is located; determine a comprehensive evaluation level of each cluster according to the service levels associated with that cluster; and perform cluster isolation on the clusters according to their comprehensive evaluation levels.
In this embodiment, the service levels of users provide an objective evaluation of cluster levels, and clusters are isolated automatically according to their comprehensive evaluation levels, which furthers the automatic configuration of the Kafka cluster.
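The grading step can be sketched as follows. This passage does not state how the comprehensive evaluation level is computed or what isolation consists of, so the sketch makes two loudly labeled assumptions: the comprehensive level of a cluster is taken to be the highest service level associated with it, and isolation is represented simply as grouping clusters by that level. The function name `isolation_groups` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def isolation_groups(
    associations: Iterable[Tuple[str, int]],
) -> Dict[int, List[str]]:
    """Group clusters by comprehensive evaluation level.

    associations -- (cluster name, service level) pairs, one per request
    """
    # Step 1: collect the service levels associated with each cluster.
    levels: Dict[str, List[int]] = defaultdict(list)
    for cluster, level in associations:
        levels[cluster].append(level)

    # Step 2 (assumption): comprehensive level = highest associated level.
    grade = {cluster: max(values) for cluster, values in levels.items()}

    # Step 3 (assumption): isolation is modeled as one group per level.
    groups: Dict[int, List[str]] = defaultdict(list)
    for cluster, level in grade.items():
        groups[level].append(cluster)
    return {level: sorted(names) for level, names in groups.items()}
```

Other aggregation rules, such as an average or a weighted score, would fit the same structure by changing only step 2.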
Fig. 9 is a schematic structural diagram of an apparatus for acquiring data according to another embodiment of the present disclosure. As shown in fig. 9, the apparatus 90 for acquiring data of this embodiment includes: a memory 910 and a processor 920 coupled to the memory 910, wherein the processor 920 is configured to execute the method of acquiring data in any of the embodiments described above based on instructions stored in the memory 910. Memory 910 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a boot loader, and other programs.
The apparatus 90 for acquiring data may further include an input/output interface 930, a network interface 940, a storage interface 950, and the like. These interfaces 930, 940, 950, the memory 910, and the processor 920 may be connected, for example, by a bus 960. The input/output interface 930 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 940 provides a connection interface for various networking devices. The storage interface 950 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
The present disclosure also includes a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a method of acquiring data in any of the foregoing embodiments.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only exemplary of the present disclosure and is not intended to limit the present disclosure, so that any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (16)

1. A method of acquiring data, comprising:
receiving a topic data acquisition request sent by a user, wherein the topic data acquisition request comprises an identifier of topic data;
querying a pre-created Kafka cluster topic data information table using the identifier of the topic data, and determining, among the Kafka clusters, the cluster where the topic data is located, wherein the Kafka cluster topic data information table contains the mapping between each piece of topic data in the Kafka clusters and the cluster where it is located;
and sending the configuration parameters and the consumption key of the cluster where the topic data is located to the user, so that the user can obtain the topic data from the cluster where the topic data is located.
2. The method of claim 1, wherein said determining, among the Kafka clusters, the cluster where the topic data is located comprises:
if read clusters storing the topic data exist among the Kafka clusters, selecting the read cluster with the smallest load as the cluster where the topic data is located.
3. The method of claim 1, wherein said determining, among the Kafka clusters, the cluster where the topic data is located comprises:
if no read cluster storing the topic data exists among the Kafka clusters and a read-write cluster storing the topic data exists, determining the read-write cluster as the cluster where the topic data is located.
4. The method of claim 1, wherein said determining, among the Kafka clusters, the cluster where the topic data is located comprises:
if, among the Kafka clusters, no read cluster storing the topic data exists, no read-write cluster storing the topic data exists, and a write cluster storing the topic data exists, querying a pre-established storage cluster topological graph for the read clusters associated with the write cluster;
and synchronizing the write cluster to the associated read clusters, and determining the cluster where the topic data is located according to the read clusters obtained by synchronization.
5. The method of claim 4, wherein the synchronizing the write cluster to the associated read clusters and determining the cluster where the topic data is located according to the read clusters obtained by synchronization comprises:
if the write cluster is of a single synchronization type, synchronizing the write cluster to the associated read cluster with the smallest load;
and if the write cluster is of a multi-synchronization type, synchronizing the write cluster to all the associated read clusters, and selecting, from among those read clusters, the read cluster with the smallest load as the cluster where the topic data is located.
6. The method of claim 4, wherein the synchronizing the write cluster to the associated read clusters comprises:
synchronizing the write cluster to each associated read cluster through a plurality of synchronization tasks, wherein the data volume synchronized by each synchronization task does not exceed a preset value;
and determining the number of pieces of topic data synchronized by each synchronization task according to the preset value and the data volume of each piece of topic data that the task synchronizes.
7. The method of claim 1, wherein the topic data acquisition request further comprises a service level of the user;
the method further comprises:
associating the service level with the cluster where the topic data is located;
determining a comprehensive evaluation level of each cluster according to the service levels associated with that cluster;
and performing cluster isolation on the clusters according to the comprehensive evaluation level of each cluster.
8. An apparatus for acquiring data, comprising:
a message receiving module configured to receive a topic data acquisition request sent by a user, wherein the topic data acquisition request comprises an identifier of topic data;
a cluster determining module configured to query a pre-created Kafka cluster topic data information table using the identifier of the topic data and to determine, among the Kafka clusters, the cluster where the topic data is located, wherein the Kafka cluster topic data information table contains the mapping between each piece of topic data in the Kafka clusters and the cluster where it is located;
and a message sending module configured to send the configuration parameters and the consumption key of the cluster where the topic data is located to the user, so that the user can obtain the topic data from the cluster where the topic data is located.
9. The apparatus of claim 8, wherein the cluster determination module is configured to:
if read clusters storing the topic data exist among the Kafka clusters, select the read cluster with the smallest load as the cluster where the topic data is located.
10. The apparatus of claim 8, wherein the cluster determination module is configured to:
if no read cluster storing the topic data exists among the Kafka clusters and a read-write cluster storing the topic data exists, determine the read-write cluster as the cluster where the topic data is located.
11. The apparatus of claim 8, wherein the cluster determination module is configured to:
if, among the Kafka clusters, no read cluster storing the topic data exists, no read-write cluster storing the topic data exists, and a write cluster storing the topic data exists, query a pre-established storage cluster topological graph for the read clusters associated with the write cluster;
and synchronize the write cluster to the associated read clusters, and determine the cluster where the topic data is located according to the read clusters obtained by synchronization.
12. The apparatus of claim 11, wherein the cluster determination module is configured to:
if the write cluster is of a single synchronization type, synchronize the write cluster to the associated read cluster with the smallest load;
and if the write cluster is of a multi-synchronization type, synchronize the write cluster to all the associated read clusters, and select, from among those read clusters, the read cluster with the smallest load as the cluster where the topic data is located.
13. The apparatus of claim 11, wherein the cluster determination module is configured to:
synchronize the write cluster to each associated read cluster through a plurality of synchronization tasks, wherein the data volume synchronized by each synchronization task does not exceed a preset value;
and determine the number of pieces of topic data synchronized by each synchronization task according to the preset value and the data volume of each piece of topic data that the task synchronizes.
14. The apparatus of claim 8, wherein the topic data acquisition request further comprises a service level of the user;
the apparatus further comprises a cluster isolation module configured to: associate the service level with the cluster where the topic data is located; determine a comprehensive evaluation level of each cluster according to the service levels associated with that cluster; and perform cluster isolation on the clusters according to the comprehensive evaluation level of each cluster.
15. An apparatus for acquiring data, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of acquiring data of any of claims 1-7 based on instructions stored in the memory.
16. A computer readable storage medium, wherein the computer readable storage medium stores computer instructions which, when executed by a processor, implement a method of acquiring data as claimed in any one of claims 1 to 7.
CN201810822678.6A 2018-07-25 2018-07-25 Method and device for acquiring data and computer readable storage medium Active CN109325200B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810822678.6A CN109325200B (en) 2018-07-25 2018-07-25 Method and device for acquiring data and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810822678.6A CN109325200B (en) 2018-07-25 2018-07-25 Method and device for acquiring data and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109325200A CN109325200A (en) 2019-02-12
CN109325200B true CN109325200B (en) 2021-05-25

Family

ID=65263716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810822678.6A Active CN109325200B (en) 2018-07-25 2018-07-25 Method and device for acquiring data and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109325200B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10970309B2 (en) 2019-06-05 2021-04-06 Advanced New Technologies Co., Ltd. Data storage method and apparatus
CN110347675A (en) * 2019-06-05 2019-10-18 阿里巴巴集团控股有限公司 A kind of date storage method and device
CN110868322B (en) * 2019-11-15 2023-04-07 北京京东尚科信息技术有限公司 Network management method, system, device and storage medium for distributed message service
CN112751938B (en) * 2020-12-30 2023-04-07 上海赋算通云计算科技有限公司 Real-time data synchronization system based on multi-cluster operation, implementation method and storage medium
CN115174423A (en) * 2022-07-12 2022-10-11 中国建设银行股份有限公司 Kafka query method, device, storage medium and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631026A (en) * 2015-12-30 2016-06-01 北京奇艺世纪科技有限公司 Security data analysis system
CN108038226A (en) * 2017-12-25 2018-05-15 郑州云海信息技术有限公司 A kind of data Fast Acquisition System and method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608223B (en) * 2016-01-12 2019-04-30 北京中交兴路车联网科技有限公司 For the storage method and system of the Hbase database of kafka
US10262032B2 (en) * 2016-02-24 2019-04-16 Salesforce.Com, Inc. Cache based efficient access scheduling for super scaled stream processing systems
US11005933B2 (en) * 2016-03-17 2021-05-11 International Business Machines Corporation Providing queueing in a log streaming messaging system
CN106293968B (en) * 2016-08-04 2019-05-10 华中科技大学 A kind of intercommunication system and method based on Kafka message-oriented middleware
CN108197233A (en) * 2017-12-29 2018-06-22 飞狐信息技术(天津)有限公司 A kind of data managing method, middleware and data management system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"SpringBoot集成Redis缓存在电信运营系统中的应用研究";葛宝龙 等;《中国信通信》;20171205;第19卷(第23期);第102-104页 *

Also Published As

Publication number Publication date
CN109325200A (en) 2019-02-12

Similar Documents

Publication Publication Date Title
CN109325200B (en) Method and device for acquiring data and computer readable storage medium
CN107679192B (en) Multi-cluster cooperative data processing method, system, storage medium and equipment
US9110727B2 (en) Automatic replication of virtual machines
US20110154339A1 (en) Incremental mapreduce-based distributed parallel processing system and method for processing stream data
TW201140450A (en) Continuous optimization of archive management scheduling by use of integrated content-resource analytic model
CN105005570A (en) Method and apparatus for mining massive intelligent power consumption data based on cloud computing
US10432703B2 (en) On-demand session upgrade in a coordination service
CN109840166A (en) Across the cluster object storage async backup methods, devices and systems of one kind
CN105426419B (en) Data promotion system and method between heterogeneous system
CN105635311A (en) Method for synchronizing resource pool information in cloud management platform
CN107645410A (en) A kind of virtual machine management system and method based on OpenStack cloud platforms
CN111966677A (en) Data report processing method and device, electronic equipment and storage medium
AU2011312100A1 (en) Automatic selection of secondary backend computing devices for virtual machine image replication
CN116662441A (en) Distributed data blood margin construction and display method
CN112148578A (en) IT fault defect prediction method based on machine learning
CN115858513A (en) Data governance method, data governance device, computer equipment and storage medium
CN113626447B (en) Civil aviation data management platform and method
CN110119341B (en) IDC equipment management and control system
JP5024453B2 (en) Business flow distributed processing system and method
CN115146000A (en) Database data synchronization method and device, electronic equipment and storage medium
Gopalakrishna et al. Untangling cluster management with Helix
CN108574719B (en) Cloud host creation method and device
CN115757304A (en) Log storage method, device and system, electronic equipment and storage medium
CN115587147A (en) Data processing method and system
Kennedy et al. A MR simulator in facilitating cloud computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant