CN117555679A - Service data processing method, device, computer equipment and storage medium

Info

Publication number
CN117555679A
Authority
CN
China
Prior art keywords
node
data table
data
distributed data
distributed
Prior art date
Legal status
Pending
Application number
CN202311510031.7A
Other languages
Chinese (zh)
Inventor
张艺
张志海
李俊谦
林立成
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202311510031.7A priority Critical patent/CN117555679A/en
Publication of CN117555679A publication Critical patent/CN117555679A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 - Partitioning or combining of resources
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 - Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 - Saving, restoring, recovering or retrying
    • G06F 11/1446 - Point-in-time backing up or restoration of persistent data
    • G06F 11/1448 - Management of the data involved in backup or backup restore
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to a service data processing method, apparatus, computer device, storage medium, and computer program product in the field of big data technology, applicable to the financial technology field and other related fields. The method comprises the following steps: in response to a data write request for a master cluster, writing service data into the shard nodes of a master distributed data table, the master distributed data table being the distributed data table corresponding to the master cluster; after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes, sending a data write completion message to a distributed coordination server, the data write completion message being used to start a replication process; and replicating, through the replication process, the service data in the shard nodes and replica nodes of the master distributed data table to the corresponding nodes in a backup distributed data table, the backup distributed data table being the distributed data table corresponding to the backup cluster. This method improves the reliability of service data backup.

Description

Service data processing method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of big data technologies, and in particular, to a service data processing method, apparatus, computer device, storage medium, and computer program product.
Background
With the continuous development of internet technology, the scale and concurrent access volume of enterprise-level databases are growing rapidly, and distributed platforms have come into wide use. Distributed platforms generally fall into two categories: master-slave architectures and multi-master architectures.
In a multi-master distributed platform (e.g., ClickHouse or MongoDB), every node in the cluster plays a peer role, and a client obtains the same result regardless of which node it accesses. The multi-master architecture therefore effectively avoids the availability problems caused by single points of failure.
However, in practical use in the financial industry, safeguarding service data requires coping with server failures, network failures, malicious attacks, and similar events. The service data backup methods in the related art cannot handle these situations, so the safety of the service data cannot be guaranteed. The related art therefore suffers from low reliability of service data backup.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a service data processing method, apparatus, computer device, computer readable storage medium, and computer program product that can improve the reliability of service data backup.
In a first aspect, the present application provides a service data processing method, including:
in response to a data write request for a master cluster, writing service data into the shard nodes of a master distributed data table; the master distributed data table is the distributed data table corresponding to the master cluster;
after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes, sending a data write completion message to a distributed coordination server; the data write completion message is used to start a replication process;
replicating, through the replication process, the service data in the shard nodes and replica nodes of the master distributed data table to the corresponding nodes in a backup distributed data table; the backup distributed data table is the distributed data table corresponding to the backup cluster.
In one embodiment, replicating, through the replication process, the service data in the shard nodes and replica nodes of the master distributed data table to the corresponding nodes in the backup distributed data table includes:
taking the shard nodes in the master distributed data table that have completed data writing and their corresponding replica nodes as target shard nodes and target replica nodes, respectively;
splitting the replication task through the replication process to obtain a shard replication task for the target shard nodes and a replica replication task for the target replica nodes;
and concurrently replicating the service data in the target shard nodes and the service data in the target replica nodes to the corresponding nodes in the backup distributed data table through the shard replication task and the replica replication task.
In one embodiment, concurrently replicating the service data in the target shard nodes and the service data in the target replica nodes to the corresponding nodes in the backup distributed data table through the shard replication task and the replica replication task includes:
replicating the service data in the target shard nodes to the corresponding shard nodes in the backup distributed data table through the shard replication task;
and replicating the service data in the target replica nodes to the corresponding replica nodes in the backup distributed data table through the replica replication task.
In one embodiment, the method further comprises:
determining the partition directories where the service data written into the master distributed data table is located;
and merging the service data in partition directories belonging to the same partition to obtain the merged service data in the shard nodes of the master distributed data table.
In one embodiment, sending the data write completion message to the distributed coordination server after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes includes:
sending the data write completion message to the distributed coordination server after the merged service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes.
In one embodiment, the method further comprises:
creating a corresponding partition directory for each data write request;
and writing the service data corresponding to each data write request into the correspondingly created partition directory.
In one embodiment, writing the service data into the shard nodes of the master distributed data table in response to the data write request for the master cluster includes:
writing the service data into any shard node of the master distributed data table in response to the data write request;
and splitting the service data through that shard node, and sending the split service data to the other shard nodes to obtain the service data in the shard nodes of the master distributed data table.
In a second aspect, the present application further provides a service data processing apparatus, including:
a writing module, configured to write service data into the shard nodes of a master distributed data table in response to a data write request for a master cluster; the master distributed data table is the distributed data table corresponding to the master cluster;
a sending module, configured to send a data write completion message to a distributed coordination server after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes; the data write completion message is used to start a replication process;
and a replication module, configured to replicate, through the replication process, the service data in the shard nodes and replica nodes of the master distributed data table to the corresponding nodes in a backup distributed data table; the backup distributed data table is the distributed data table corresponding to the backup cluster.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the steps of the method described above.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method described above.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprising a computer program which, when executed by a processor, implements the steps of the method described above.
With the above service data processing method, apparatus, computer device, storage medium, and computer program product, service data is written into the shard nodes of a master distributed data table in response to a data write request for a master cluster, the master distributed data table being the distributed data table corresponding to the master cluster; after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes, a data write completion message is sent to the distributed coordination server, the message being used to start a replication process; and through the replication process, the service data in the shard nodes and replica nodes of the master distributed data table is replicated to the corresponding nodes in a backup distributed data table, the backup distributed data table being the distributed data table corresponding to the backup cluster. In this way, after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes, all of the service data in those shard nodes and replica nodes is replicated to the corresponding nodes in the backup distributed data table, so that another cluster running the same service holds a full backup of the master cluster's data. When server failures, network failures, malicious attacks, or similar problems occur, the written service data can be effectively preserved, loss or corruption of service data is reduced, the safety of the service data is guaranteed, the reliability of service data backup is effectively improved, and the high availability of system services is ensured.
Drawings
To more clearly illustrate the embodiments of the present application and the technical solutions in the related art, the drawings required by the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present application; other drawings can be obtained from them by a person of ordinary skill in the art without inventive effort.
FIG. 1 is a flowchart of a service data processing method in one embodiment;
FIG. 2 is a schematic diagram of the sharding and replication principle of a distributed data table in one embodiment;
FIG. 3 is a flowchart of another service data processing method in one embodiment;
FIG. 4 is a flowchart of a service data processing method according to another embodiment;
FIG. 5 is a block diagram of a service data processing apparatus in one embodiment;
FIG. 6 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that the terms "first," "second," and the like in the description, claims, and drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure, as detailed in the appended claims.
In one embodiment, as shown in fig. 1, a service data processing method is provided. The method is described as applied to a server by way of illustration; the server may be implemented as an independent server or as a server cluster composed of multiple servers. The method comprises the following steps:
step S110, service data is written into the slicing nodes of the main distributed data table in response to the data writing request for the main cluster.
The main distributed data table is a distributed data table corresponding to the main cluster.
It should be noted that the execution body of the data processing method provided in the present application may be a server in the master cluster.
The service data may be service data in the financial technology field or in other technical fields, which is not specifically limited here.
In a specific implementation, a user may initiate a data write request for service data to the master cluster, and the server may respond to that request; that is, the server may respond to the data write request initiated to the master cluster and write the service data into the shard nodes of the distributed data table corresponding to the master cluster.
Step S120, after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes, sending a data write completion message to the distributed coordination server.
The data write completion message is used to start the replication process.
In a specific implementation, each shard node of the master distributed data table may trigger replica replication to copy the service data in the shard node to the corresponding replica node in the master distributed data table. After the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes, the server may send a data write completion message to the distributed coordination server, and the replication process can be started by this message.
For ease of understanding by those skilled in the art, the distributed data table is illustrated as a ClickHouse distributed data table (ClickHouse is a column-oriented database). ClickHouse supports sharding data across a cluster. Each cluster consists of one or more shards, and each shard corresponds to a service node in the cluster; thus, each service node in ClickHouse may be referred to as a shard. FIG. 2 illustrates the sharding and replication principle of a ClickHouse distributed data table, where the data table is divided into four shards with four replicas. As shown in fig. 2, one distributed data table is divided into four shards, and the four shards are located on different service nodes. replica1 is the replica of shard1, i.e., the service data in shard1 is consistent with the service data in replica1. Similarly, replica2 is the replica of shard2, replica3 is the replica of shard3, and replica4 is the replica of shard4. It can be understood that the shard nodes in this embodiment are the shards, and the replica nodes are the replicas. After each shard node completes writing, it can trigger replica replication by means of ZooKeeper (a reliable coordination service for distributed systems).
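By way of illustration only, the shard and replica layout of FIG. 2 can be modeled as a simple mapping. The following Python sketch is not part of the embodiment; the host names and ports are assumptions chosen for the example:

    # Hypothetical layout of the 4-shard / 4-replica master cluster in FIG. 2.
    # Each shard node is paired with one replica node holding identical data.
    MASTER_TOPOLOGY = {
        "shard1": {"shard_node": "master-node1:9000", "replica_node": "master-node5:9000"},
        "shard2": {"shard_node": "master-node2:9000", "replica_node": "master-node6:9000"},
        "shard3": {"shard_node": "master-node3:9000", "replica_node": "master-node7:9000"},
        "shard4": {"shard_node": "master-node4:9000", "replica_node": "master-node8:9000"},
    }

    # The backup cluster mirrors the same layout node for node.
    BACKUP_TOPOLOGY = {
        shard: {role: addr.replace("master-", "backup-") for role, addr in nodes.items()}
        for shard, nodes in MASTER_TOPOLOGY.items()
    }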
Thus, after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes, a data write completion message can be sent to the distributed coordination server (which may be a ZooKeeper). Specifically, a data write completion message may be written to ZooKeeper to inform the replication process that data can now be replicated between clusters.
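As a minimal sketch of this step, the completion flag could be written to ZooKeeper with the kazoo Python client as follows; the znode path, host list, and helper name are assumptions for illustration only, not part of the embodiment:

    from kazoo.client import KazooClient

    def signal_write_complete(table: str, partition: str) -> None:
        """Write a data write completion flag to ZooKeeper (hypothetical znode layout)."""
        zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")  # assumed coordination servers
        zk.start()
        try:
            base = f"/replication/{table}/{partition}"
            zk.ensure_path(base)  # create intermediate znodes if missing
            if not zk.exists(f"{base}/write_complete"):
                # The flag tells the replication process that inter-cluster
                # replication of this partition may begin.
                zk.create(f"{base}/write_complete", b"done")
        finally:
            zk.stop()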
Step S130, replicating the service data in the shard nodes and replica nodes of the master distributed data table to the corresponding nodes in the backup distributed data table through the replication process.
The backup distributed data table is the distributed data table corresponding to the backup cluster.
The backup cluster deploys the same service as the master cluster.
In a specific implementation, after the replication process is started, the service data in the shard nodes and replica nodes of the master distributed data table can be replicated to the corresponding nodes in the backup distributed data table through the replication process. The nodes of the backup distributed data table include its shard nodes and replica nodes. The nodes in the backup distributed data table corresponding to the shard nodes and replica nodes of the master distributed data table include the shard nodes in the backup distributed data table corresponding to the shard nodes of the master distributed data table, and the replica nodes in the backup distributed data table corresponding to the replica nodes of the master distributed data table.
In the above service data processing method, service data is written into the shard nodes of the master distributed data table in response to a data write request for the master cluster, the master distributed data table being the distributed data table corresponding to the master cluster; after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes, a data write completion message is sent to the distributed coordination server, the message being used to start the replication process; and through the replication process, the service data in the shard nodes and replica nodes of the master distributed data table is replicated to the corresponding nodes in the backup distributed data table, the backup distributed data table being the distributed data table corresponding to the backup cluster. In this way, after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes, all of the service data in those shard nodes and replica nodes is replicated to the corresponding nodes of the backup distributed data table, so that another cluster running the same service holds a full backup of the master cluster's data. When server failures, network failures, malicious attacks, or similar problems occur, the written service data can be effectively preserved, loss or corruption of service data is reduced, the safety of the service data is guaranteed, the reliability of service data backup is effectively improved, and the high availability of system services is ensured.
In one embodiment, replicating the service data in the shard nodes and replica nodes of the master distributed data table to the corresponding nodes in the backup distributed data table through the replication process includes: taking the shard nodes in the master distributed data table that have completed data writing and their corresponding replica nodes as target shard nodes and target replica nodes, respectively; splitting the replication task through the replication process to obtain a shard replication task for the target shard nodes and a replica replication task for the target replica nodes; and concurrently replicating the service data in the target shard nodes and the service data in the target replica nodes to the corresponding nodes in the backup distributed data table through the shard replication task and the replica replication task.
When the service data distributed to a shard node of the master distributed data table is consistent with the service data in its corresponding replica node, it can be determined that the shard node and its corresponding replica node have completed data writing.
In a specific implementation, in the process of replicating the service data in the shard nodes and replica nodes of the master distributed data table to the corresponding nodes in the backup distributed data table through the replication process, the server may take the shard nodes that have completed data writing in the master distributed data table and their corresponding replica nodes as target shard nodes and target replica nodes, respectively, and split the replication task at shard granularity through the replication process to obtain a shard replication task for the target shard nodes and a replica replication task for the target replica nodes. The service data in the target shard nodes and the target replica nodes can then be read concurrently through these tasks, so that the service data in the target shard nodes and the service data in the target replica nodes is replicated to the corresponding nodes in the backup distributed data table.
With the above technical solution, the shard nodes that have completed data writing in the master distributed data table and their corresponding replica nodes are taken as target shard nodes and target replica nodes, respectively; the replication task is split through the replication process into a shard replication task for the target shard nodes and a replica replication task for the target replica nodes; and the service data in the target shard nodes and the service data in the target replica nodes is concurrently replicated to the corresponding nodes in the backup distributed data table through these tasks. When replicating data to the backup cluster, splitting the replication task in this way allows service data to be read concurrently from multiple shard nodes of the master cluster and their corresponding replica nodes, which effectively improves the replication efficiency of the service data.
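A minimal sketch of this shard-granularity split, reusing the hypothetical topology above and a placeholder copy_node function, might look as follows:

    from concurrent.futures import ThreadPoolExecutor

    def copy_node(src: str, dst: str) -> None:
        """Placeholder for copying one node's underlying data files to its counterpart."""
        print(f"copying {src} -> {dst}")

    def replicate_to_backup(master_topology: dict, backup_topology: dict) -> None:
        # Split the replication task at shard granularity: one shard replication
        # task and one replica replication task per shard that finished writing.
        tasks = []
        for shard, nodes in master_topology.items():
            tasks.append((nodes["shard_node"], backup_topology[shard]["shard_node"]))
            tasks.append((nodes["replica_node"], backup_topology[shard]["replica_node"]))
        # Read and copy all target nodes concurrently.
        with ThreadPoolExecutor() as pool:
            for src, dst in tasks:
                pool.submit(copy_node, src, dst)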
In one embodiment, concurrently replicating the service data in the target shard nodes and the service data in the target replica nodes to the corresponding nodes in the backup distributed data table through the shard replication task and the replica replication task includes: replicating the service data in the target shard nodes to the corresponding shard nodes in the backup distributed data table through the shard replication task; and replicating the service data in the target replica nodes to the corresponding replica nodes in the backup distributed data table through the replica replication task.
In a specific implementation, the server may replicate the service data in the target shard nodes to the corresponding shard nodes in the backup distributed data table (i.e., the shard nodes in the backup distributed data table corresponding to the target shard nodes) through the shard replication task, and replicate the service data in the target replica nodes to the corresponding replica nodes in the backup distributed data table (i.e., the replica nodes in the backup distributed data table corresponding to the target replica nodes) through the replica replication task.
Further, the replication process performs point-to-point data replication according to whether the service data to be replicated comes from a shard node or a replica node of the master distributed data table: the service data in the target shard nodes is replicated to the corresponding shard nodes in the backup distributed data table, and the service data in the target replica nodes is replicated to the corresponding replica nodes in the backup distributed data table. For example, the service data in shard1 of the master cluster is replicated to shard1 of the backup cluster, the service data in shard2 of the master cluster to shard2 of the backup cluster, the service data in replica1 of the master cluster to replica1 of the backup cluster, and the service data in replica2 of the master cluster to replica2 of the backup cluster. Because the underlying service data files of the master cluster are copied directly to the corresponding nodes in the backup cluster, the replica replication that would otherwise be triggered by writing data through ClickHouse into the shard nodes of the backup distributed data table is avoided, the number of disk I/O operations is reduced, and the replication efficiency of the service data is effectively improved.
With the above technical solution, the service data in the target shard nodes is replicated to the corresponding shard nodes in the backup distributed data table through the shard replication task, and the service data in the target replica nodes is replicated to the corresponding replica nodes in the backup distributed data table through the replica replication task. The underlying service data files of the shard nodes and replica nodes of the master cluster are thus copied directly to the corresponding shard nodes and replica nodes in the backup distributed data table, which avoids the replica replication process on the shard nodes of the backup distributed data table, reduces the number of disk I/O operations, and effectively improves the replication efficiency of the service data.
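One way to realize this point-to-point file-level copy, assuming each master node can push its data directory to the corresponding backup node over rsync (the paths and host names are hypothetical), is sketched below:

    import subprocess

    def push_node_files(backup_host: str, data_dir: str) -> None:
        """Run on a master-cluster node: push its underlying data files directly to
        the corresponding backup node, bypassing the ClickHouse write path and the
        replica replication it would trigger (single disk I/O pass on the backup)."""
        subprocess.run(
            ["rsync", "-a", f"{data_dir}/", f"{backup_host}:{data_dir}/"],
            check=True,
        )

    # Example: shard1 of the master cluster pushes to shard1 of the backup cluster.
    # push_node_files("backup-node1", "/var/lib/clickhouse/data/db/table")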
In one embodiment, the method further comprises: determining the partition directories where the service data written into the master distributed data table is located; and merging the service data in partition directories belonging to the same partition to obtain the merged service data in the shard nodes of the master distributed data table.
In a specific implementation, the server may determine the partition directories in which the service data written into the master distributed data table is located, and merge the service data in partition directories belonging to the same partition to obtain the merged service data in the shard nodes of the master distributed data table.
With the above technical solution, when service data needs to be migrated from the master cluster to the backup cluster, merging the service data in partition directories belonging to the same partition before exporting and importing it can effectively reduce the time and cost of data migration.
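A sketch of grouping partition directories by partition before merging, assuming MergeTree-style part names of the form <partition>_<min>_<max>_<level> (an assumption for illustration):

    from collections import defaultdict
    from pathlib import Path

    def group_parts_by_partition(table_dir: Path) -> dict:
        """Group part directories that belong to the same partition; directories in
        one group are candidates to be merged into a single part before export."""
        groups = defaultdict(list)
        for part in table_dir.iterdir():
            if part.is_dir():
                partition = part.name.split("_")[0]  # "20231113_1_1_0" -> "20231113"
                groups[partition].append(part.name)
        return dict(groups)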
In one embodiment, sending the data write completion message to the distributed coordination server after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes includes: sending the data write completion message to the distributed coordination server after the merged service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes.
In a specific implementation, the server sends the data write completion message to the distributed coordination server only after the merged service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes in the master distributed data table. That is, after the merged service data in the shard nodes of the master distributed data table is obtained through partition merging, each shard node of the master distributed data table can trigger replica replication to copy the merged service data to its corresponding replica node in the master distributed data table, and the data write completion message is then sent to the distributed coordination server.
In this way, partition-merging the service data of the master cluster before replica replication reduces the amount of data to be replicated and ensures data consistency between the replica nodes and their corresponding shard nodes.
In the related art, the replica replication and partition merging triggered by writing data into the backup distributed data table of open-source ClickHouse are two separate steps, which increases the number of disk I/O operations. In this embodiment, the data write completion message is written to the distributed coordination server only after the replica replication and partition merging of the master cluster's service data have completed, so that when the service data in the shard nodes and replica nodes of the master distributed data table is replicated to the corresponding nodes in the backup distributed data table, the backup cluster needs only one I/O pass, effectively reducing the number of disk I/O operations when the backup cluster writes the service data.
It will be appreciated that the replica replication and partition merging of the service data in the master cluster may also be performed simultaneously, rather than in the order described above.
With the technical solution of this embodiment, the data write completion message is sent to the distributed coordination server after the merged service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes. The replication process is thus started only after the replica replication and partition merging of the master cluster's service data have completed, and the merged underlying service data files of the shard nodes and replica nodes of the master cluster can be copied directly to the corresponding shard nodes and replica nodes in the backup distributed data table. This avoids the replica replication and partition merging that writing the service data into the backup distributed data table would otherwise entail, and effectively improves the efficiency of replicating the service data to the backup cluster.
In one embodiment, the method further comprises: creating a corresponding partition directory for each data write request; and writing the service data corresponding to each data write request into the correspondingly created partition directory.
In a specific implementation, the server may create a corresponding partition directory for each data write request and write the corresponding service data into it. Specifically, each time ClickHouse writes data, it logically writes to the same distributed data table, but at the bottom layer it actually creates a new partition directory and writes the newly written data into that directory; an internal ClickHouse process then periodically merges the data in partition directories belonging to the same partition. If multiple data writes occur before the periodic merge begins, the merge involves multiple partition directories.
With the technical solution of this embodiment, a corresponding partition directory is created for each data write request, and the service data corresponding to each request is written into that directory; this allows the written service data to be better managed and organized and improves the efficiency of reading and querying the service data.
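The per-write directory creation can be sketched as follows; the part naming scheme is a simplified assumption modeled on MergeTree part directories, not the embodiment's actual format:

    import itertools
    from pathlib import Path

    _block_no = itertools.count(1)

    def handle_write_request(table_dir: Path, partition: str, rows: list) -> Path:
        """Each data write request logically targets the same distributed table but
        creates a fresh part directory at the bottom layer; a periodic internal
        merge later combines directories belonging to the same partition."""
        n = next(_block_no)
        part_dir = table_dir / f"{partition}_{n}_{n}_0"
        part_dir.mkdir(parents=True)
        (part_dir / "data.bin").write_text("\n".join(map(str, rows)))
        return part_dir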
In one embodiment, writing service data into the shard nodes of the master distributed data table in response to a data write request for the master cluster includes: writing the service data into any shard node of the master distributed data table in response to the data write request; and splitting the service data through that shard node, and sending the split service data to the other shard nodes to obtain the service data in the shard nodes of the master distributed data table.
In a specific implementation, in the process of writing service data into the shard nodes of the master distributed data table in response to a data write request for the master cluster, the server may, in response to the data write request initiated to the master cluster, select any shard node of the master distributed data table corresponding to the master cluster to perform the write operation, write the service data into that shard node, split the service data through that shard node, and send the split service data to the other shard nodes to obtain the service data in each shard node of the master distributed data table.
With the above technical solution, the service data is written into any shard node of the master distributed data table in response to the data write request, the service data is split through that shard node, and the split service data is sent to the other shard nodes to obtain the service data in the shard nodes of the master distributed data table. By selecting one shard node to perform the write operation, the write load can be distributed evenly across the shard nodes of the master cluster, avoiding overload of a single shard node and achieving load balancing. The selected shard node is responsible for splitting the service data and sending the split data to the other shard nodes, which enables parallel writing and storage of the service data and improves write performance and data reliability. Meanwhile, sending the split service data to the other shard nodes ensures that query operations can be processed in parallel on each node in the cluster, improving query performance and throughput. Moreover, because the service data in the distributed data table is spread across multiple nodes, if one node fails or becomes unavailable, the other nodes can still receive and process data write requests, ensuring the reliability and high availability of the service data.
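As an illustration of the splitting step, a receiving shard node might route rows to shards by hashing a key column; the key name and hash scheme below are assumptions for the example:

    def split_to_shards(rows: list, shards: list, key: str = "account_id") -> dict:
        """Sketch: split incoming service data and assign each row to a shard by
        hashing an assumed sharding key; each bucket is then sent to its shard
        node for local writing."""
        buckets = {shard: [] for shard in shards}
        for row in rows:
            buckets[shards[hash(row[key]) % len(shards)]].append(row)
        return buckets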
For ease of understanding by those skilled in the art, fig. 3 provides a flowchart of another service data processing method. As shown in fig. 3, a user may initiate a data write request to the master cluster; the server may respond to the request, select any shard node of the master distributed data table corresponding to the master cluster to perform the write operation, and complete the data writing of each shard node. Each shard node then triggers replica replication, and after the replica replication and partition merging are completed, a data write completion message is written to ZooKeeper to start the replication process. The replication process periodically polls ZooKeeper to determine whether any shard node has completed replication, that is, whether there is a shard node whose distributed service data is consistent with the service data in its corresponding replica node. If so, the service data in the shard nodes and replica nodes is replicated to the corresponding nodes in the backup distributed data table through the shard replication tasks for the shard nodes and the replica replication tasks for the replica nodes. If not, the replication process returns to the step of periodically polling ZooKeeper.
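The periodic polling by the replication process could be sketched as follows, matching the hypothetical znode layout used earlier (the flag path and interval are assumptions):

    import time
    from kazoo.client import KazooClient

    def replication_daemon(poll_seconds: int = 30) -> None:
        """Poll ZooKeeper for write completion flags and trigger inter-cluster
        replication when one is found (sketch only)."""
        zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
        zk.start()
        zk.ensure_path("/replication")
        while True:
            for table in zk.get_children("/replication"):
                for partition in zk.get_children(f"/replication/{table}"):
                    flag = f"/replication/{table}/{partition}/write_complete"
                    if zk.exists(flag):
                        # replicate_to_backup(...) would run the split tasks here.
                        zk.delete(flag)  # consume the flag once replication is done
            time.sleep(poll_seconds)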
In another embodiment, as shown in fig. 4, a service data processing method is provided. The method is described as applied to a server by way of illustration and includes the following steps:
Step S402, writing the service data into any shard node of the master distributed data table in response to the data write request.
Step S404, splitting the service data through that shard node, and sending the split service data to the other shard nodes to obtain the service data in the shard nodes of the master distributed data table.
Step S406, determining the partition directories where the service data written into the master distributed data table is located.
Step S408, merging the service data in partition directories belonging to the same partition to obtain the merged service data in the shard nodes of the master distributed data table.
Step S410, after the merged service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes, sending a data write completion message to the distributed coordination server.
Step S412, taking the shard nodes in the master distributed data table that have completed data writing and their corresponding replica nodes as target shard nodes and target replica nodes, respectively.
Step S414, splitting the replication task through the replication process to obtain a shard replication task for the target shard nodes and a replica replication task for the target replica nodes.
Step S416, replicating the service data in the target shard nodes to the corresponding shard nodes in the backup distributed data table through the shard replication task.
Step S418, replicating the service data in the target replica nodes to the corresponding replica nodes in the backup distributed data table through the replica replication task.
It should be noted that, for the specific limitations of the above steps, reference may be made to the specific limitations of the service data processing method described above.
It should be understood that, although the steps in the flowcharts involved in the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily executed at the same time but may be executed at different times; the execution order of these sub-steps or stages is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides a service data processing apparatus for implementing the service data processing method described above. The implementation of the solution provided by the apparatus is similar to that described in the method above, so for the specific limitations in the embodiments of one or more service data processing apparatuses provided below, reference may be made to the limitations of the service data processing method above, which are not repeated here.
In an exemplary embodiment, as shown in fig. 5, a service data processing apparatus is provided, including: a writing module 510, a sending module 520, and a replication module 530, wherein:
the writing module 510 is configured to write service data into the shard nodes of a master distributed data table in response to a data write request for a master cluster; the master distributed data table is the distributed data table corresponding to the master cluster.
The sending module 520 is configured to send a data write completion message to the distributed coordination server after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes; the data write completion message is used to start the replication process.
The replication module 530 is configured to replicate, through the replication process, the service data in the shard nodes and replica nodes of the master distributed data table to the corresponding nodes in a backup distributed data table; the backup distributed data table is the distributed data table corresponding to the backup cluster.
In one embodiment, the replication module 530 is specifically configured to take the shard nodes in the master distributed data table that have completed data writing and their corresponding replica nodes as target shard nodes and target replica nodes, respectively; split the replication task through the replication process to obtain a shard replication task for the target shard nodes and a replica replication task for the target replica nodes; and concurrently replicate the service data in the target shard nodes and the service data in the target replica nodes to the corresponding nodes in the backup distributed data table through the shard replication task and the replica replication task.
In one embodiment, the replication module 530 is specifically configured to replicate the service data in the target shard nodes to the corresponding shard nodes in the backup distributed data table through the shard replication task, and replicate the service data in the target replica nodes to the corresponding replica nodes in the backup distributed data table through the replica replication task.
In one embodiment, the apparatus further includes a merging module, configured to determine the partition directories where the service data written into the master distributed data table is located, and merge the service data in partition directories belonging to the same partition to obtain the merged service data in the shard nodes of the master distributed data table.
In one embodiment, the sending module 520 is specifically configured to send the data write completion message to the distributed coordination server after the merged service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes.
In one embodiment, the writing module 510 is further specifically configured to create a corresponding partition directory for each data write request, and write the service data corresponding to each data write request into the correspondingly created partition directory.
In one embodiment, the writing module 510 is specifically configured to write the service data into any shard node of the master distributed data table in response to the data write request, split the service data through that shard node, and send the split service data to the other shard nodes to obtain the service data in the shard nodes of the master distributed data table.
Each of the modules in the above service data processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the above modules.
In one exemplary embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store service data. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a service data processing method.
It will be appreciated by those skilled in the art that the structure shown in fig. 6 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use, and processing of the related data are required to meet the related regulations.
Those skilled in the art will appreciate that all or part of the processes of the methods described above may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-transitory computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other media used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum-computing-based data processing logic units, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this specification.
The above embodiments represent only a few implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the present application. It should be noted that several modifications and improvements can be made by those of ordinary skill in the art without departing from the concept of the present application, and all of these fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (11)

1. A method for processing service data, the method comprising:
in response to a data write request for a master cluster, writing service data into the shard nodes of a master distributed data table; the master distributed data table is the distributed data table corresponding to the master cluster;
after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes, sending a data write completion message to a distributed coordination server; the data write completion message is used to start a replication process;
replicating, through the replication process, the service data in the shard nodes and replica nodes of the master distributed data table to the corresponding nodes in a backup distributed data table; the backup distributed data table is the distributed data table corresponding to the backup cluster.
2. The method according to claim 1, wherein replicating, through the replication process, the service data in the shard nodes and replica nodes of the master distributed data table to the corresponding nodes in the backup distributed data table comprises:
taking the shard nodes in the master distributed data table that have completed data writing and their corresponding replica nodes as target shard nodes and target replica nodes, respectively;
splitting the replication task through the replication process to obtain a shard replication task for the target shard nodes and a replica replication task for the target replica nodes;
and concurrently replicating the service data in the target shard nodes and the service data in the target replica nodes to the corresponding nodes in the backup distributed data table through the shard replication task and the replica replication task.
3. The method according to claim 2, wherein concurrently replicating the service data in the target shard nodes and the service data in the target replica nodes to the corresponding nodes in the backup distributed data table through the shard replication task and the replica replication task comprises:
replicating the service data in the target shard nodes to the corresponding shard nodes in the backup distributed data table through the shard replication task;
and replicating the service data in the target replica nodes to the corresponding replica nodes in the backup distributed data table through the replica replication task.
4. The method according to claim 1, wherein the method further comprises:
determining the partition directories where the service data written into the master distributed data table is located;
and merging the service data in partition directories belonging to the same partition to obtain the merged service data in the shard nodes of the master distributed data table.
5. The method of claim 4, wherein sending the data write completion message to the distributed coordination server after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes comprises:
sending the data write completion message to the distributed coordination server after the merged service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes.
6. The method according to claim 4, wherein the method further comprises:
creating a corresponding partition directory for each data write request;
and writing the service data corresponding to each data write request into the correspondingly created partition directory.
7. The method of claim 1, wherein writing service data into the shard nodes of the master distributed data table in response to the data write request for the master cluster comprises:
writing the service data into any shard node of the master distributed data table in response to the data write request;
and splitting the service data through that shard node, and sending the split service data to the other shard nodes to obtain the service data in the shard nodes of the master distributed data table.
8. A traffic data processing apparatus, the apparatus comprising:
a writing module, configured to write service data into the shard nodes of a master distributed data table in response to a data write request for a master cluster; the master distributed data table is the distributed data table corresponding to the master cluster;
a sending module, configured to send a data write completion message to a distributed coordination server after the service data in the shard nodes of the master distributed data table has been replicated to the corresponding replica nodes; the data write completion message is used to start a replication process;
and a replication module, configured to replicate, through the replication process, the service data in the shard nodes and replica nodes of the master distributed data table to the corresponding nodes in a backup distributed data table; the backup distributed data table is the distributed data table corresponding to the backup cluster.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202311510031.7A 2023-11-13 2023-11-13 Service data processing method, device, computer equipment and storage medium Pending CN117555679A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311510031.7A CN117555679A (en) 2023-11-13 2023-11-13 Service data processing method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311510031.7A CN117555679A (en) 2023-11-13 2023-11-13 Service data processing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117555679A true CN117555679A (en) 2024-02-13

Family

ID=89810359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311510031.7A Pending CN117555679A (en) 2023-11-13 2023-11-13 Service data processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117555679A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination