CN116186082A - Data summarizing method based on distribution, first server and electronic equipment - Google Patents

Data summarizing method based on distribution, first server and electronic equipment

Info

Publication number
CN116186082A
Authority
CN
China
Prior art keywords
database
log
data
server
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211711972.2A
Other languages
Chinese (zh)
Inventor
牛新庄
倪一鸣
吴迪
黎育龙
张亚强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Postal Savings Bank of China Ltd
Original Assignee
Postal Savings Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Postal Savings Bank of China Ltd
Priority to CN202211711972.2A
Publication of CN116186082A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a distribution-based data summarizing method, a first server and electronic equipment. The method comprises the following steps: a first server obtains a database log stored in a target source database, parses the database log, and restores the SQL statements in the database log to obtain restored SQL statements; the first server establishes a connection with an aggregation database and executes the restored SQL statements in the aggregation database to obtain aggregated data; and, when the first server receives a target query requirement sent by a terminal, the first server queries target data from the aggregated data of the aggregation database according to the target query requirement and sends the target data to the terminal. In this scheme, the data in a plurality of source databases are summarized into the aggregation database, so that when a query requirement on a dimension other than the slicing key is received, the data can be queried directly in the aggregation database without querying across the plurality of source databases, which reduces the cost of data query.

Description

Data summarizing method based on distribution, first server and electronic equipment
Technical Field
The application relates to the technical field of distributed data processing, and in particular to a distribution-based data summarizing method, a first server, a computer readable storage medium and electronic equipment.
Background
In a complex large-scale distributed service system, data are often split horizontally according to a certain slicing key and stored in mutually independent databases. If there is then a requirement to query along a dimension other than the slicing key, the query has to be run across a plurality of databases, so the cost of data query is high.
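The cost difference described above can be illustrated with a minimal sketch. The shard count, the `user_id` slicing key and the `city` column are assumptions made up for illustration; plain Python lists stand in for the independent databases.

```python
# Hypothetical illustration: three independent "databases" split by user_id.
SHARD_COUNT = 3
shards = [[] for _ in range(SHARD_COUNT)]


def shard_for(user_id: int) -> int:
    """Route a row to a shard by its slicing key (here: user_id modulo shard count)."""
    return user_id % SHARD_COUNT


def insert(row: dict) -> None:
    shards[shard_for(row["user_id"])].append(row)


def query_by_user(user_id: int):
    """Query on the slicing key: exactly one shard is touched."""
    idx = shard_for(user_id)
    return [r for r in shards[idx] if r["user_id"] == user_id], 1


def query_by_city(city: str):
    """Query on another dimension: every shard has to be scanned."""
    hits = []
    for shard in shards:
        hits.extend(r for r in shard if r["city"] == city)
    return hits, len(shards)


for uid, city in [(1, "Beijing"), (2, "Shanghai"), (3, "Beijing"), (4, "Beijing")]:
    insert({"user_id": uid, "city": city})

rows_by_user, touched_by_user = query_by_user(4)
rows_by_city, touched_by_city = query_by_city("Beijing")
```

A query on the slicing key touches one shard, while the query by `city` has to visit all of them; this fan-out is the cost the application aims to avoid.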
Disclosure of Invention
The main objective of the present application is to provide a data summarizing method, a first server, a computer readable storage medium and an electronic device based on a distributed system, so as to at least solve the problem of high data query cost when data is queried across a plurality of databases in the prior art.
To achieve the above object, according to one aspect of the present application, there is provided a distribution-based data summarizing method, including: a first server sequentially selects one of a plurality of source databases as a target source database and establishes a connection with the target source database; the first server obtains a database log stored in the target source database, parses the database log, and restores the SQL statements in the database log to obtain restored SQL statements, wherein the database log is at least used for storing the commands of the operations performed in the target source database; the first server establishes a connection with an aggregation database and executes the restored SQL statements in the aggregation database to obtain aggregated data; and, when a target query requirement sent by a terminal is received, the first server queries target data from the aggregated data of the aggregation database according to the target query requirement and sends the target data to the terminal, wherein the target query requirement represents a request to query data from the plurality of source databases.
Optionally, before the first server obtains the database log stored in the target source database, the method further includes: the first server establishes a connection with a cache database and looks up, in the cache database, a first target key value pair corresponding to the target source database, wherein the first target key value pair is formed from the ID of the source database and the lock state of the source database; the first server determines whether the target source database is locked according to the lock state of the target source database in the first target key value pair; when the lock state in the first target key value pair indicates that the target source database is locked, the first server returns to selecting one of the plurality of source databases as the target source database; and when the lock state in the first target key value pair indicates that the target source database is unlocked, the first server updates the lock state of the target source database in the first target key value pair to locked, wherein a second server does not obtain the database log stored in the target source database while the lock state of the target source database in the first target key value pair is locked, and whichever of the first server and the second server first updates the lock state of the target source database is the one that parses the database log.
Optionally, the database log includes a log ID and a log record, wherein the log ID is the ID of a database log generated by operating on the target source database, one operation generates one database log, and the log record stores the command of the operation on the target source database. Parsing the database log and restoring the SQL statements in the database log to obtain restored SQL statements includes: the first server sorts the plurality of database logs by log ID to obtain sorted database logs; the first server parses the log record in each database log in order from first to last and restores the SQL statement in each database log to obtain the restored SQL statement corresponding to each database log. Executing the restored SQL statements in the aggregation database includes: the first server checks the log IDs to determine whether the order of the plurality of log IDs is correct and obtains a check result; and the first server executes the restored SQL statements in the aggregation database when the check result indicates that the check has passed.
Optionally, the first server parsing the log record in each database log in order from first to last and restoring the SQL statement in each database log to obtain the restored SQL statement corresponding to each database log includes: the first server obtains the mapping relation between the commands of the log records and SQL commands; the first server executes target logic on the plurality of database logs in order from first to last, wherein the target logic is: extracting the fields of the operation instruction, the operation condition and the operation data in the log record; determining, according to the mapping relation, the SQL operation instruction corresponding to the operation instruction in the log record, the SQL query condition corresponding to the operation condition in the log record and the SQL data field corresponding to the field of the operation data in the log record; and combining the SQL operation instruction, the SQL query condition and the SQL data field to obtain the restored SQL statement.
Optionally, the database log further includes a plurality of database transactions, a database transaction being a sequence of database operations that access and possibly operate on various data items of one database, and the method further includes: the first server obtains the transaction types of all the database transactions, wherein a transaction type is a category that classifies the nature of the transaction corresponding to the database transaction; the first server executes the restored SQL statements in the aggregation database according to the transaction types of the plurality of database transactions.
Optionally, the first server checking the log IDs to determine whether the order of the plurality of log IDs is correct and obtaining a check result includes: before executing the restored SQL statement in the aggregation database for the Nth time, the first server stores, in a message queue, the log ID of the database log and the table name of the data table corresponding to the restored SQL statement to be executed for the Nth time; after executing the restored SQL statement in the aggregation database for the Nth time, the first server stores, in a cache database, the ID of the target source database, the log ID of the database log and the table name of the data table corresponding to the restored SQL statement executed for the Nth time as a second target key value pair; before executing the restored SQL statement in the aggregation database for the (N+1)th time, the first server obtains from the message queue the log ID and the table name of the data table corresponding to the restored SQL statement executed for the Nth time, and obtains from the cache database the log ID and the table name of the data table corresponding to the restored SQL statement executed for the Nth time; when the log ID for the Nth execution in the message queue differs from the log ID for the Nth execution in the cache database, and the table name for the Nth execution in the message queue differs from the table name for the Nth execution in the cache database, the first server determines that the order of the plurality of log IDs is incorrect and that the check result is a failure; when the log ID for the Nth execution in the message queue is the same as the log ID for the Nth execution in the cache database, and the table name for the Nth execution in the message queue is the same as the table name for the Nth execution in the cache database, the first server determines that the order of the plurality of log IDs is correct and that the check result is a pass.
Optionally, in the case that the check result is determined to be a failure, the method further includes: the first server re-checks the log IDs and, when the re-check time is greater than or equal to a time threshold, generates alarm information and sends the alarm information to a target terminal; and, when the first server receives a preset operation, the first server updates the check result to a pass, generates an operation log and stores the operation log in the aggregation database, wherein the preset operation represents modifying the order of the log IDs that failed the check so that the order of the log IDs is no longer abnormal.
According to another aspect of the present application, there is provided a first server including: a first selecting unit, configured to select one of the plurality of source databases as a target source database and establish a connection with the target source database; a first processing unit, configured to obtain a database log stored in the target source database, parse the database log, and restore the SQL statements in the database log to obtain restored SQL statements, wherein the database log is at least used for storing the commands of the operations performed in the target source database; a first execution unit, configured to establish a connection with the aggregation database and execute the restored SQL statements in the aggregation database to obtain aggregated data; and a second processing unit, configured to, when a target query requirement sent by a terminal is received, query target data from the aggregated data of the aggregation database according to the target query requirement and send the target data to the terminal, wherein the target query requirement represents a request to query data from the plurality of source databases.
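The restore step described above can be sketched as follows. The log record layout (`op`, `table`, `condition`, `data`) and the command codes in `COMMAND_MAP` are hypothetical; the application does not specify a concrete log format, and real database logs are usually binary. Table and column names are assumed to come from a trusted log, so they are interpolated directly while values are bound as parameters.

```python
# Hypothetical mapping relation between log-record commands and SQL commands.
COMMAND_MAP = {"ins": "INSERT", "upd": "UPDATE", "del": "DELETE"}


def restore_sql(record: dict):
    """Combine the operation instruction, condition and data into one statement.

    Returns a (sql_text, parameters) pair using '?' placeholders.
    """
    op = COMMAND_MAP[record["op"]]          # SQL operation instruction
    table = record["table"]
    data = record.get("data", {})           # SQL data fields
    cond = record.get("condition", {})      # SQL query condition
    where = " AND ".join(f"{col} = ?" for col in cond)

    if op == "INSERT":
        cols = ", ".join(data)
        marks = ", ".join("?" for _ in data)
        return f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(data.values())
    if not cond:
        # Refuse to restore an UPDATE or DELETE that would hit every row.
        raise ValueError("UPDATE/DELETE log record without a condition")
    if op == "UPDATE":
        sets = ", ".join(f"{col} = ?" for col in data)
        return (f"UPDATE {table} SET {sets} WHERE {where}",
                list(data.values()) + list(cond.values()))
    return f"DELETE FROM {table} WHERE {where}", list(cond.values())


sql, params = restore_sql(
    {"op": "upd", "table": "account", "condition": {"id": 7}, "data": {"balance": 50}}
)
```

Here `sql` is `UPDATE account SET balance = ? WHERE id = ?` and `params` is `[50, 7]`, ready to be executed against the aggregation database.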
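A minimal sketch of this check, under stated assumptions: a `collections.deque` stands in for the message queue, a dict stands in for the cache database, and the class and method names are hypothetical. The sketch treats any mismatch in log ID or table name as a failed check, which is one possible reading of the conditions above.

```python
from collections import deque


class LogIdChecker:
    """Compares what was announced before run N with what was confirmed after run N."""

    def __init__(self):
        self.queue = deque()   # stand-in for the message queue
        self.cache = {}        # stand-in for the cache database

    def before_execute(self, log_id, table):
        # Stored before the Nth execution.
        self.queue.append((log_id, table))

    def after_execute(self, source_db_id, log_id, table):
        # Second target key value pair, stored after the Nth execution.
        self.cache[source_db_id] = (log_id, table)

    def check(self, source_db_id) -> bool:
        # Called before the (N+1)th execution.
        announced = self.queue.popleft() if self.queue else None
        confirmed = self.cache.get(source_db_id)
        return announced == confirmed


checker = LogIdChecker()
results = []
for log_id, table in [(1, "account"), (2, "account")]:
    results.append(checker.check("src-1"))
    checker.before_execute(log_id, table)
    # ... the restored SQL statement would be executed here ...
    checker.after_execute("src-1", log_id, table)

# Simulate a run whose confirmation never reaches the cache.
results.append(checker.check("src-1"))
checker.before_execute(3, "account")
failed_check = checker.check("src-1")
```

In the usage above the first three checks pass, and the last one fails because log ID 3 was announced in the queue but never confirmed in the cache.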
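The re-check and alarm behaviour can be sketched as below. This assumes that the "re-check time" means the time elapsed since re-checking started; the function name, the `send_alarm` callback and the injectable `clock` are hypothetical and exist only to keep the sketch testable.

```python
import time


def recheck_until(check, threshold_s, send_alarm, clock=time.monotonic, interval_s=0.0):
    """Re-run `check` until it passes or `threshold_s` seconds have elapsed.

    Returns True if a re-check passed, False if the alarm was raised instead.
    """
    start = clock()
    while True:
        if check():
            return True
        if clock() - start >= threshold_s:
            send_alarm("log ID order check still failing")
            return False
        if interval_s > 0:
            time.sleep(interval_s)


# Usage with a fake clock so that no real waiting is needed.
alarms = []
ticks = iter([0.0, 1.0, 2.0])
gave_up = recheck_until(lambda: False, 2.0, alarms.append, clock=lambda: next(ticks))
recovered = recheck_until(lambda: True, 2.0, alarms.append)
```

The manual correction of the log ID order (the preset operation) is not modelled here; it would reset the check result after an operator intervenes.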
According to still another aspect of the present application, there is provided a computer readable storage medium, where the computer readable storage medium includes a stored program, and when the program runs, controls a device in which the computer readable storage medium is located to perform any one of the methods.
According to still another aspect of the present application, there is provided an electronic apparatus including: one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods.
With the above technical scheme, a first server first sequentially selects one of a plurality of source databases as a target source database and establishes a connection with the target source database. The first server then obtains a database log stored in the target source database, parses the database log, and restores the SQL statements in the database log to obtain restored SQL statements. Next, the first server establishes a connection with an aggregation database and executes the restored SQL statements in the aggregation database to obtain aggregated data. Finally, when the first server receives a target query requirement sent by a terminal, it queries target data from the aggregated data of the aggregation database according to the target query requirement and sends the target data to the terminal. In this scheme, the data in the plurality of source databases are summarized into the aggregation database, so that when a query requirement on a dimension other than the slicing key is received, the data can be queried directly in the aggregation database without querying across the plurality of source databases, which reduces the cost of data query. In addition, because the restored SQL statements are derived from the database logs of the source database and are then executed in the aggregation database, the data in the source database can be synchronized into the aggregation database with good synchronization quality.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 shows a block diagram of the hardware structure of a mobile terminal that performs a distribution-based data summarizing method according to an embodiment of the present application;
FIG. 2 shows a flow diagram of a distribution-based data summarizing method provided according to an embodiment of the present application;
FIG. 3 shows a schematic diagram of the system architecture of the present scheme;
FIG. 4 shows a flow diagram of a distribution-based data summarizing method;
FIG. 5 shows a flow diagram of checking a log ID;
FIG. 6 shows a block diagram of a first server provided according to an embodiment of the present application.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In order that those skilled in the art may better understand the scheme of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the present application described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of description, some terms related to the embodiments of the present application are explained below:
SQL: Structured Query Language, a language used to operate relational databases. Common data management operations fall into four major classes: insert (add), delete, modify and query.
Message queue: middleware that stores messages in a queue structure in memory. A message queue decouples the sender and the receiver of a distributed service request: the sender sends the request to the message queue, and the message queue, in place of the sender, delivers the request to registered consumers according to rules defined by the program developer, so that the business logic is processed synchronously or asynchronously.
Slicing key (sharding key): the field in a data table according to which the data are split horizontally.
Source database: the database in which the data reside before summarizing.
Aggregation database: the database in which the summarized copy of the data resides.
Data playback: extracting and parsing the data changes of the source database to obtain restored SQL statements, and re-executing the restored SQL statements in the aggregation database so that the same data as in the source database are generated in the aggregation database.
Redis database: an in-memory key-value (KV) database that stores data in the form of key value pairs. It reads and writes in memory, is fast, can be deployed in a distributed manner, and can be used as a cache component.
As described in the Background, in the prior art data are often split horizontally according to a certain slicing key and stored in mutually independent databases. If there is then a requirement to query along a dimension other than the slicing key, the query has to be run across a plurality of databases, so the cost of data query is high. To solve this problem of high query cost when querying across a plurality of databases, the embodiments of the present application provide a distribution-based data summarizing method, a first server, a computer readable storage medium and electronic equipment.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The method embodiments provided in the embodiments of the present application may be performed in a mobile terminal, a computer terminal or a similar computing device. Taking a mobile terminal as an example, FIG. 1 is a block diagram of the hardware structure of a mobile terminal that performs the distribution-based data summarizing method according to an embodiment of the present invention. As shown in FIG. 1, the mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. The mobile terminal may also include a transmission device 106 for communication functions and an input-output device 108. Those skilled in the art will appreciate that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the mobile terminal described above. For example, the mobile terminal may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
The memory 104 may be used to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the distribution-based data summarizing method in an embodiment of the present invention. The processor 102 executes the computer programs stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above-described method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short) that can connect to other network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
This embodiment provides a distribution-based data summarizing method that runs on a mobile terminal, a computer terminal, or a similar computing device. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from the one given here.
Fig. 2 is a flow chart of a distributed-based data summarization method according to an embodiment of the present application. As shown in fig. 2, the method comprises the steps of:
Step S201, a first server sequentially selects one of a plurality of source databases as a target source database, and establishes a connection with the target source database;
Specifically, there are a plurality of source databases, and different source databases correspond to different slicing keys. When the first server performs data summarization, one source database can be randomly selected from the plurality of source databases as the target source database, so that the data in the target source database can subsequently be synchronized into the aggregation database, and thus the data in the plurality of source databases can be synchronized into the aggregation database.
Step S202, the first server obtains a database log stored in the target source database, parses the database log, and restores the SQL statements in the database log to obtain restored SQL statements, wherein the database log is at least used for storing the commands of the operations performed in the target source database;
Specifically, a database log is stored in the target source database, and the database log records the commands of all the operations performed in the target source database. To ensure the accuracy of data playback, the database log stored in the target source database can be obtained directly and parsed, so that the SQL statements of the operations performed in the target source database are restored to obtain restored SQL statements, which can then be used for data playback.
Step S203, the first server establishes a connection with an aggregation database, and executes the restored SQL statements in the aggregation database to obtain aggregated data;
Optionally, current data synchronization schemes fall mainly into two types, batch processing and stream processing, and the present scheme may synchronize data in the stream processing mode. When data are synchronized in the stream processing mode, the stream processing framework detects and extracts the modified data in the target source database by means of a change data capture method and copies the data to the aggregation database to complete data synchronization, which can ensure both the accuracy of the data and the real-time performance of the synchronization.
Specifically, once the first server has established a connection with the aggregation database, the corresponding SQL commands can be executed in the aggregation database according to the restored SQL statements, so that the data in the target source database are synchronized into the aggregation database. The same operation is performed for each of the plurality of source databases, so the aggregation database can contain the data of the plurality of source databases, and data can be queried directly in the aggregation database without querying the plurality of source databases.
Step S204, when receiving a target query requirement sent by a terminal, the first server queries target data from the aggregated data in the aggregation database according to the target query requirement, and sends the target data to the terminal, wherein the target query requirement represents a request to query data from the plurality of source databases.
Specifically, since the data in the plurality of source databases have been synchronized into the aggregation database, when a query requirement on a dimension other than the slicing key (the target query requirement) is received, the data can be queried directly in the aggregation database without querying across the plurality of source databases, and thus the cost of data query is lower.
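Steps S201 to S204 can be sketched end to end as follows. This is a simplified illustration, not the implementation of the application: an in-memory SQLite database stands in for the aggregation database, `SOURCE_LOGS` is a hypothetical stand-in for the database logs of two source databases, and only insert operations are restored.

```python
import sqlite3

# Hypothetical database logs: (log ID, inserted row) per source database.
SOURCE_LOGS = {
    "src-0": [(2, {"order_id": 12, "user_id": 6, "city": "Beijing"}),
              (1, {"order_id": 10, "user_id": 3, "city": "Beijing"})],
    "src-1": [(1, {"order_id": 11, "user_id": 4, "city": "Shanghai"})],
}


def restore_insert(row: dict):
    """Restore an INSERT statement (with bound parameters) from a logged row."""
    cols = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    return f"INSERT INTO orders ({cols}) VALUES ({marks})", tuple(row.values())


def summarize(aggregate: sqlite3.Connection) -> None:
    for source_id in SOURCE_LOGS:                                   # S201: pick a target source
        logs = sorted(SOURCE_LOGS[source_id], key=lambda e: e[0])   # S202: order by log ID
        for _log_id, row in logs:
            sql, params = restore_insert(row)                       # S202: restore SQL
            aggregate.execute(sql, params)                          # S203: replay
    aggregate.commit()


def query_by_city(aggregate: sqlite3.Connection, city: str):
    """S204: a query on a non-slicing dimension served by the aggregation database alone."""
    cur = aggregate.execute(
        "SELECT order_id FROM orders WHERE city = ? ORDER BY order_id", (city,)
    )
    return [r[0] for r in cur.fetchall()]


aggregate = sqlite3.connect(":memory:")
aggregate.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, city TEXT)"
)
summarize(aggregate)
beijing_orders = query_by_city(aggregate, "Beijing")
```

After the replay, the query by `city` is answered from a single database even though the rows originated in two different source databases.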
According to the method, a first server sequentially selects one of a plurality of source databases as a target source database, establishes connection with the target source database, acquires a database log stored in the target source database, analyzes the database log, restores SQL sentences in the database log to obtain restored SQL sentences, establishes connection with an aggregation database, executes the restored SQL sentences in the aggregation database to obtain aggregation data, and finally inquires target data from the aggregation data of the aggregation database according to the target inquiry requirement under the condition that the first server receives the target inquiry requirement sent by a terminal. According to the scheme, the data in the plurality of source databases are summarized into the aggregation database, and when the query requirement of the dimension beyond the slicing key is received, the data can be directly queried in the aggregation database without crossing the plurality of source databases, so that the cost of data query can be reduced. Meanwhile, the method adopts the restored SQL sentences obtained by the database logs in the source database, and further executes the restored SQL sentences in the aggregation database, so that the data in the source database can be synchronized into the aggregation database, and the data synchronization effect is good.
The scheme can adopt a distributed mode to collect data, and can carry out horizontal expansion to share performance pressure in the face of a large distributed service system with larger data volume, higher transaction concurrency number and higher real-time requirement, and has higher upper limit of processing capacity.
In the application scenario of data aggregation, the number of source databases is large, the data volume is large, and a plurality of servers work cooperatively in a distributed computing manner. To prevent several servers from summarizing the data of the same source database into the aggregation database, which would cause repeated transmission and thereby harm data consistency and synchronization efficiency, in one embodiment of the present application, before the first server acquires the database log stored in the target source database, the method further includes: the first server establishes a connection with a cache database and searches the cache database for a first target key-value pair corresponding to the target source database, wherein the first target key-value pair is obtained from the ID of the source database and the lock state of the source database; the first server determines whether the target source database is in a locked state according to the lock state of the target source database in the first target key-value pair; when the lock state in the first target key-value pair indicates that the target source database is in a locked state, the first server reselects one of the plurality of source databases as the target source database; and when the lock state in the first target key-value pair indicates that the target source database is in an unlocked state, the first server updates the lock state of the target source database in the first target key-value pair to the locked state, wherein, while the lock state of the target source database in the first target key-value pair is the locked state, a second server does not acquire the database log stored in the target source database, and whichever of the first server and the second server first updates the lock state of the target source database is the one that parses the database log.
Specifically, the first target key-value pair corresponding to the target source database may be <target source database ID, lock state>, where a lock state of 0 indicates unlocked and a lock state of 1 indicates locked.
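The lock check on the <target source database ID, lock state> pair can be sketched as follows. This is an illustrative sketch only: `LockCache` is a hypothetical in-memory stand-in for the shared cache database, and a `threading.Lock` is used here to make the check-and-update step atomic, which a real shared cache would have to provide by its own means.

```python
import threading

UNLOCKED, LOCKED = 0, 1


class LockCache:
    """Hypothetical in-memory stand-in for the shared cache database."""

    def __init__(self, source_ids):
        # One <source database ID, lock state> pair per source database.
        self._state = {source_id: UNLOCKED for source_id in source_ids}
        self._mutex = threading.Lock()

    def try_lock(self, source_id):
        """Set the lock state to LOCKED only if it is currently UNLOCKED."""
        with self._mutex:
            if self._state[source_id] == LOCKED:
                return False
            self._state[source_id] = LOCKED
            return True

    def unlock(self, source_id):
        with self._mutex:
            self._state[source_id] = UNLOCKED


def pick_target(cache, source_ids):
    """Go through the source databases in turn and return the first one this server locks."""
    for source_id in source_ids:
        if cache.try_lock(source_id):
            return source_id
    return None  # every source database is currently locked by some server
```

A server that receives `None` from `pick_target` simply has nothing to synchronize in this round; the server that set the lock first is the only one that goes on to parse that database's log.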
In fact, in the process of synchronizing the source databases to the aggregation database, a plurality of servers communicate with the plurality of source databases at the same time, and the servers can preempt the source databases: the server that preempts a source database communicates with it, acquires its database log, and performs the subsequent SQL restoration operation. The servers should work cooperatively, so that no server sits idle, no server carries an excessive load, and no source database is left without a server communicating with it (that is, without a server synchronizing it). The allocation of a specific server may be determined by parameters such as the idle time of the server, the data synchronization rate of the server, and the concurrency number of the server.
Each of the multiple servers can be configured with a concurrency number; for example, each server can preempt 1 to 2 source databases. The product of the number of servers and the maximum concurrency number of each server should be greater than or equal to the number of source databases, so that no source database is left without a server communicating with it and data aggregation from the multiple source databases to the aggregation database can be achieved.
The cache database can be a Redis database. The Redis database is a shared database, and the plurality of servers can query it for the key-value pairs corresponding to the plurality of source databases to determine whether each source database is in a locked state. Of course, another database such as a relational database may also be used, or the first server may communicate directly with a configuration center, so that the first target key-value pair need not be stored in a cache database.
Alternatively, the locking operation is performed by writing the database ID in the distributed cache component, or by writing the database ID in an additional database or configuration center.
In the scheme, before the first server acquires the database log of the target source database, it searches the cache database for the first target key-value pair corresponding to the target source database and determines whether the target source database is in a locked state. If the target source database is locked, it may already be communicating with a second server or a third server (or another server) that has already acquired its database log. If the first server then acquired and restored the same database log again, several servers would obtain the same data repeatedly and data consistency would suffer. Checking the lock state first and reselecting a target source database when it is locked avoids this situation.
The database logs are generated in the order of the operations, and the log ID in a database log reflects the order in which the database logs were generated. To ensure high data synchronization efficiency, the operation record corresponding to the database logs in the source database should be identical to the operation record in the aggregation database. In a specific implementation, the database log includes a log ID and a log record, wherein the log ID is the ID of the database log generated by an operation on the target source database, one operation generates one database log, and the log record stores the command of the operation on the target source database. Parsing the database log and restoring the SQL statement in the database log to obtain the restored SQL statement can be implemented by the following steps: the first server sorts the plurality of database logs by log ID to obtain sorted database logs; and the first server parses the log record in each database log in order from first to last, and restores the SQL statement in each database log to obtain the restored SQL statement corresponding to each database log.
In the scheme, a database log actually consists of a log header and a log body, wherein the log header contains the log ID and the log body contains the log record. The database logs can first be sorted by log ID, so that the resulting database logs are arranged in the order in which they were generated; the sorted database logs are then parsed one by one to restore the SQL statements used by the target source database. In this way the order of the data in the target source database is identical to the order of the data in the aggregation database, which further ensures high data synchronization efficiency.
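The sorting step can be sketched briefly. The `DatabaseLog` class and its field names are hypothetical; the source only states that the log header carries the log ID and the log body carries the log record.

```python
from dataclasses import dataclass


@dataclass
class DatabaseLog:
    log_id: int   # carried in the log header; reflects generation order
    record: str   # carried in the log body; the command of the operation


def sort_logs(logs):
    """Return the database logs in ascending log ID order, i.e. generation order."""
    return sorted(logs, key=lambda log: log.log_id)


# Logs may be fetched out of order; sorting restores the order of the original operations.
fetched = [
    DatabaseLog(3, "modify account 1"),
    DatabaseLog(1, "add account 1"),
    DatabaseLog(2, "add account 2"),
]
ordered = sort_logs(fetched)
```

Parsing then proceeds over `ordered` from first to last, so each restored SQL statement is produced in the same order as the operation that generated it.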
Because of the characteristics of a distributed system, network delay is uncertain when the first server communicates with the source database, so data that was sent later may arrive first, which can make the data inconsistent. In a specific implementation, executing the restored SQL statement in the aggregation database can be implemented by the following steps: the first server checks the log IDs, determines whether the order of the plurality of log IDs is correct, and obtains a verification result; and the first server executes the restored SQL statement in the aggregation database when the verification result indicates that the verification has passed.
Specifically, the processes of locking the target source database, parsing the database log and restoring the SQL statement can be regarded as one task in the first server. A timing period can be set, and the first server executes the task once each time the timing period (for example, 5 seconds or 1 minute) elapses. The time interval can be adjusted dynamically according to the traffic, and the first server can process all database logs within the timed task of each period, so that no backlog builds up.
In the scheme, locking ensures that, within the same time period, the database log is acquired and restored only by the first server and is not acquired repeatedly by several servers, so the log IDs can be checked. Whether the order of the restored database logs is correct can be determined by checking whether the order of the log IDs is correct. If a log sent later arrives first, replaying the restored database logs is likely to leave the target source database and the aggregation database out of sync. Therefore, the restored SQL statement is executed in the aggregation database only when the verification result indicates that the order of the log IDs is correct, which makes the data synchronization more accurate.
To parse the database logs further, so that the parsing result is accurate and the data synchronization efficiency is high, in the present application the first server parsing the log record in each database log in order from first to last and restoring the SQL statement in each database log to obtain the corresponding restored SQL statement can be implemented by the following steps: the first server acquires the mapping relation between the commands of the log records and SQL commands; and the first server executes target logic on the plurality of database logs in order from first to last, wherein the target logic is: extracting the operation instruction, the operation condition and the fields of the operation data from the log record; determining, according to the mapping relation, the SQL operation instruction corresponding to the operation instruction in the log record, the SQL query condition corresponding to the operation condition in the log record, and the SQL data fields corresponding to the fields of the operation data in the log record; and combining the SQL operation instruction, the SQL query condition and the SQL data fields to obtain the restored SQL statement.
In the scheme, there is a mapping relation between the content recorded in the database log and SQL commands. The SQL statement executed in the target source database can be restored from the content recorded in the database log and this mapping relation, yielding the restored SQL statement. Because restoration follows the order of the database logs, high data synchronization efficiency is ensured when the restored SQL statements are later used for data playback.
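The target logic can be sketched as follows. The log record layout (`command`, `table`, `data`, `condition`) and the command names `ADD`, `MODIFY` and `REMOVE` are assumptions made for illustration; real database log formats differ and the source does not specify one. The sketch returns a parameterized statement together with its values.

```python
# Hypothetical mapping relation between log record commands and SQL commands.
COMMAND_TO_SQL = {"ADD": "INSERT", "MODIFY": "UPDATE", "REMOVE": "DELETE"}


def restore_sql(record):
    """Combine the SQL operation, data fields and query condition into one statement.

    `record` is an assumed, already-decoded log record of the form
    {"command": ..., "table": ..., "data": {...}, "condition": {...}}.
    """
    operation = COMMAND_TO_SQL[record["command"]]
    table = record["table"]
    data = record.get("data", {})
    condition = record.get("condition", {})
    where = " AND ".join(f"{column} = ?" for column in condition)

    if operation == "INSERT":
        columns = ", ".join(data)
        marks = ", ".join("?" for _ in data)
        return f"INSERT INTO {table} ({columns}) VALUES ({marks})", list(data.values())
    if operation == "UPDATE":
        assignments = ", ".join(f"{column} = ?" for column in data)
        values = list(data.values()) + list(condition.values())
        return f"UPDATE {table} SET {assignments} WHERE {where}", values
    # DELETE: only the query condition is needed.
    return f"DELETE FROM {table} WHERE {where}", list(condition.values())
```

For example, a `MODIFY` record with data `{"balance": 80}` and condition `{"id": 1}` on table `account` is restored to `UPDATE account SET balance = ? WHERE id = ?` with the values `[80, 1]`.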
If all database logs in the target source database have been parsed and played back within one synchronization period, the log pointer of the connection to the target source database can be advanced to the position of the last log whose restored SQL statement was executed, and the lock state of the target source database in the first target key-value pair of the cache database is updated to the unlocked state, which completes the unlocking of the target source database. At this point the target source database can again be accessed and locked by other servers.
When data is summarized, the plurality of servers can cooperate in a distributed manner without several servers repeatedly sending the same piece of data, and this cooperation does not depend on additional configuration. The order in which data is copied from each source database is also preserved, so that data sent later does not arrive first, which ensures data consistency.
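The end of a synchronization period can be sketched as follows. The two dictionaries are hypothetical in-memory stand-ins for the per-database log pointer and the lock states held in the cache database, and the function name is illustrative only.

```python
UNLOCKED, LOCKED = 0, 1


def finish_sync_cycle(lock_states, log_pointers, source_id, replayed_log_ids):
    """Advance the log pointer to the last replayed log, then release the lock.

    replayed_log_ids: log IDs whose restored SQL was executed, in execution order.
    """
    if replayed_log_ids:
        # The pointer moves to the position of the last executed log.
        log_pointers[source_id] = replayed_log_ids[-1]
    # Unlocking lets other servers access and lock this source database again.
    lock_states[source_id] = UNLOCKED


lock_states = {"db1": LOCKED}
log_pointers = {"db1": 10}
finish_sync_cycle(lock_states, log_pointers, "db1", [11, 12, 13])
```

Advancing the pointer before unlocking means that whichever server locks the database next starts from the first log that has not yet been played back.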
In practical applications, the database log includes not only the log ID and the log record but also database transactions. To ensure the consistency of data synchronization, in another embodiment of the present application the database log further includes a plurality of database transactions, wherein a database transaction is a sequence of operations that accesses and possibly operates on various data items in a database, and the method further includes: the first server obtains the transaction types of the database transactions, wherein a transaction type is a category that classifies the nature of the transaction corresponding to the database transaction; and the first server executes the restored SQL statement in the aggregation database according to the transaction types of the plurality of database transactions.
For example, if the transaction type of a database transaction is a transfer-out type, the restored SQL statement should be executed in the aggregation database as a transfer-out transaction; if the transaction type of a database transaction is a deposit type, the restored SQL statement should be executed in the aggregation database as a deposit transaction.
In the scheme, the database log also includes a plurality of database transactions, and the database transactions can be synchronized in the aggregation database according to the transaction types of the database transactions of the target source database. This keeps the transactions in the target source database and in the aggregation database synchronized, avoids differences in transaction type, and further ensures high transaction synchronization efficiency.
If a few servers or databases fail, or a single task of the first server runs for so long that it has not finished when the next task starts, database synchronization may become inaccurate. To ensure that a database log being processed by the first server is not also processed by other servers (such as a second server or a third server), a message queue can be introduced to check the order of the log IDs. In some embodiments, the first server checking the log IDs, determining whether the order of the plurality of log IDs is correct, and obtaining a verification result is implemented by the following steps: before the first server executes the restored SQL statement in the aggregation database for the Nth time, the first server stores, in a message queue, the log ID of the database log and the table name of the data table corresponding to the Nth execution of the restored SQL statement; after the first server executes the restored SQL statement in the aggregation database for the Nth time, the first server stores the ID of the target source database, the log ID of the database log and the table name of the data table corresponding to the Nth execution as a second target key-value pair in a cache database; before executing the restored SQL statement in the aggregation database for the (N+1)th time, the first server obtains from the message queue the log ID and the table name of the data table corresponding to the Nth execution, and obtains from the cache database the log ID and the table name of the data table corresponding to the Nth execution; when the log ID corresponding to the Nth execution in the message queue differs from the log ID corresponding to the Nth execution in the cache database, and the table name of the data table corresponding to the Nth execution in the message queue differs from the table name of the data table corresponding to the Nth execution in the cache database, the first server determines that the order of the log IDs is incorrect and that the verification result is a failed verification; and when the log ID corresponding to the Nth execution in the message queue is the same as the log ID corresponding to the Nth execution in the cache database, and the table name of the data table corresponding to the Nth execution in the message queue is the same as the table name of the data table corresponding to the Nth execution in the cache database, the first server determines that the order of the log IDs is correct and that the verification result is a passed verification.
Specifically, the parsing of the database log may be performed by a data summarizing service in the first server. After the database log is parsed, the parsing result may be sent through the message queue to a data summarizing proxy service of the first server, and the data summarizing proxy service completes the final data playback.
In this scheme, a message that the first server places in the message queue may carry the log ID and table name of the current data as well as the log ID and table name of the previous data. Each time the first server executes a restored SQL statement, the ID of the target source database, the log ID and the table name can be stored in the cache database as a second target key-value pair, to be checked before the next execution. Before each execution of a restored SQL statement, it can be checked whether the last log ID stored in the cache database is consistent with the last log ID carried in the message queue, and whether the last table name stored in the cache database is consistent with the last table name carried in the message queue. If they are inconsistent, the current message was sent later but arrived first (that is, other messages were sent earlier and have not yet arrived), the order of the log IDs is determined to be incorrect, and the restored SQL statement is not executed. If they are consistent, the order of the log IDs is determined to be correct and the restored SQL statement can be executed, which avoids data inconsistency caused by messages arriving out of order.
Specifically, when the verification fails, the first server may return a failure response to the message queue, and the message is consumed again after a waiting period through the failure retry mechanism of the message queue. On retry, if the message that was sent earlier has arrived in the meantime, the retried message can be played back normally; otherwise a failure response is returned again and the message waits for the next retry.
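The check between the message queue and the cache database can be sketched as follows. The message fields (`prev_log_id`, `prev_table`, and so on) are hypothetical names, a dictionary stands in for the cache database, and the sketch follows the reading in which playback proceeds only when both the log ID and the table name match.

```python
def verify_order(message, cache, source_id):
    """Pass only if the 'previous' entry carried by the message equals the
    last entry recorded in the cache for this source database."""
    last_played = cache.get(source_id, (None, None))
    expected = (message["prev_log_id"], message["prev_table"])
    return last_played == expected


def play_back(message, cache, source_id, execute):
    """Execute the restored SQL if verification passes, then record the new last entry."""
    if not verify_order(message, cache, source_id):
        return False  # a message sent later arrived first; do not execute yet
    execute(message["sql"])
    # Second target key-value pair: source ID -> (log ID, table name).
    cache[source_id] = (message["log_id"], message["table"])
    return True


cache = {}
executed = []
first = {"log_id": 1, "table": "account", "prev_log_id": None,
         "prev_table": None, "sql": "restored statement 1"}
second = {"log_id": 2, "table": "account", "prev_log_id": 1,
          "prev_table": "account", "sql": "restored statement 2"}

early = play_back(second, cache, "db1", executed.append)   # arrives too early
ok_first = play_back(first, cache, "db1", executed.append)
ok_second = play_back(second, cache, "db1", executed.append)
```

In this run the second message is rejected while the cache is still empty and is accepted once the first message has been played back, so the statements execute in their original order.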
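The failure retry behaviour can be sketched with a simple in-process queue. This is an assumption-laden illustration: `consume_with_retry` and `max_attempts` are hypothetical, a `deque` stands in for the message queue, and the waiting period between retries is omitted.

```python
from collections import deque


def consume_with_retry(messages, handler, max_attempts=5):
    """Deliver messages to `handler`; a message whose handler returns False is
    re-queued and consumed again later, up to `max_attempts` deliveries."""
    queue = deque((message, 1) for message in messages)
    gave_up = []
    while queue:
        message, attempt = queue.popleft()
        if handler(message):
            continue  # success response: the message is done
        if attempt < max_attempts:
            queue.append((message, attempt + 1))  # failure response: retry later
        else:
            gave_up.append(message)
    return gave_up


# A toy handler that accepts log IDs only in strictly consecutive order.
state = {"last": 0}
played = []


def in_order_handler(log_id):
    if log_id != state["last"] + 1:
        return False
    state["last"] = log_id
    played.append(log_id)
    return True


unprocessed = consume_with_retry([2, 1, 3], in_order_handler)
```

Messages 2 and 3 fail on their first delivery because message 1 has not yet been played back, and succeed on retry, so `played` ends up in the original order.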
If none of a plurality of verification results passes, data synchronization is suspended and no data is written into the aggregation database for a long time, so that the suspension in the data synchronization process lasts too long and the cost of data synchronization becomes high. To address this, in some embodiments the first server re-checks the log ID and, when the re-check time is greater than or equal to a time threshold, generates alarm information and sends it to a target terminal; and when the first server receives a predetermined operation, it updates the verification result to passed, generates an operation log and stores the operation log in the aggregation database, wherein the predetermined operation is an operation of modifying the order of the log IDs that failed verification so that the order is no longer abnormal.
Optionally, the predetermined operation may be an operation in which a person manually handles the failed transaction, or an operation of manually correcting the order of the log IDs.
In the scheme, if the plurality of verification results obtained all indicate a failed verification, alarm information can be generated and sent to the target terminal, so that maintenance personnel are promptly informed that an abnormality has occurred in the data synchronization process and can deal with it in time. The maintenance personnel can correct the failed transaction so that the order of the log IDs passes verification and data synchronization continues, and an operation log is generated so that maintenance personnel can later see that the data was abnormal.
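The re-check and alarm step can be sketched as follows. The function name, the return values and the callback parameters are hypothetical, and time is passed in as plain numbers so that the sketch does not depend on a clock.

```python
def recheck(started_at, now, threshold_seconds, verify, send_alarm):
    """Re-check the log ID order; raise an alarm once re-checking has lasted
    at least `threshold_seconds` without success."""
    if verify():
        return "passed"
    if now - started_at >= threshold_seconds:
        send_alarm("log ID order still fails verification; manual handling required")
        return "alarmed"
    return "retry"


alarms = []
early_result = recheck(0, 10, 30, lambda: False, alarms.append)    # below threshold
late_result = recheck(0, 30, 30, lambda: False, alarms.append)     # threshold reached
fixed_result = recheck(0, 100, 30, lambda: True, alarms.append)    # order corrected
```

In this sketch `send_alarm` stands for sending the alarm information to the target terminal; once maintenance personnel have corrected the order, `verify` succeeds and synchronization can continue.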
In order to enable those skilled in the art to more clearly understand the technical solutions of the present application, the implementation procedure of the distributed data summarizing method of the present application will be described in detail below with reference to specific embodiments.
The system structure of the scheme is shown in fig. 3 and comprises service units and an aggregation unit. The distributed system is divided horizontally by service data, and each division, whose service logic is identical to that of the others except for the data it stores and processes, is defined as a service unit; each service unit consists of a database, a service server, and the middleware services on which the service server depends, such as a message queue. The aggregation unit is the unit that summarizes the partitioned data, and it stores a copy of part of the data in a single database and a single table.
As shown in fig. 4, in the distributed data summarizing method, the first server sequentially selects one of a plurality of source databases as the target source database and establishes a connection with it; establishes a connection with the Redis database and searches the Redis database for the first target key-value pair corresponding to the target source database; and determines, from the lock state of the target source database in the first target key-value pair, whether the target source database is in a locked state. When the lock state indicates that the target source database is locked, the first server reselects one of the plurality of source databases as the target source database. When the lock state indicates that the target source database is unlocked, the first server updates the lock state of the target source database in the first target key-value pair to the locked state, acquires the database log stored in the target source database, parses the database log to obtain the restored SQL statements, and finally sends the restored SQL statements, in log order, for execution in the aggregation database to complete the summarizing task.
The process of checking the log ID is shown in fig. 5. After the data summarizing service of the first server parses the log, the parsing result is sent to the data summarizing proxy service through the message queue. The data summarizing proxy service compares the last log ID and the table name of the last data table in the message queue with the last log ID and the table name of the last data table in the Redis database to determine whether the verification passes: if the last log ID in the message queue is the same as the last log ID in the Redis database and the table name of the last data table in the message queue is the same as the table name of the last data table in the Redis database, the verification passes; otherwise the verification fails. When the verification fails, a failure response is returned to the message queue, and the message is consumed again after a waiting period through the failure retry mechanism of the message queue. When the verification passes, data playback is performed, the source database ID, the log ID and the table name of the data table are stored as a key-value pair in the Redis database, the source database ID, the log ID and the table name of the data table are stored as a key-value pair in the message queue, a success response is returned to the message queue, and the next verification is awaited.
The embodiment of the application also provides a first server. It should be noted that the first server of the embodiment of the application may be used to execute the distributed data summarizing method described above. The first server is used to implement the above embodiments and preferred embodiments, and what has already been described is not repeated here. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The following describes a first server provided in an embodiment of the present application.
Fig. 6 is a block diagram of a first server according to an embodiment of the present application. As shown in fig. 6, the first server includes:
a first selection unit 10, configured to sequentially select one from a plurality of source databases as a target source database, and establish a connection with the target source database;
specifically, the plurality of source databases are provided, different source databases correspond to different slicing keys, and when the first server performs data summarization, one source database can be randomly selected from the plurality of source databases to serve as a target source database, so that data in the target source database can be synchronized into the aggregation database subsequently, and the data in the plurality of source databases can be synchronized into the aggregation database.
A first processing unit 20, configured to obtain a database log stored in the target source database, parse the database log, and restore an SQL statement in the database log to obtain a restored SQL statement, where the database log is at least used to store a command of an operation in the target source database;
specifically, a database log is stored in the target source database, the database log records all the commands of the operations in the target source database, and in order to ensure the accuracy of data playback, the database log stored in the target source database can be directly obtained, so that the database log is analyzed, SQL sentences of the operations in the target source database can be restored to obtain restored SQL sentences, and then the data playback can be performed by adopting the restored SQL sentences.
a first execution unit 30, configured to establish a connection with the aggregation database and execute the restored SQL statement in the aggregation database to obtain aggregated data;
optionally, current data synchronization schemes fall mainly into two types, batch processing and stream processing, and this scheme can synchronize data in the stream processing mode; when data is synchronized in the stream processing mode, the stream processing framework detects and extracts the modified data in the target source database by means of change data capture and copies the data to the aggregation database to complete the data synchronization, which ensures both the accuracy of the data and the real-time performance of the synchronization.
Specifically, under the condition that the first server and the aggregation database are already connected, a corresponding SQL command can be executed in the aggregation database according to the restored SQL statement, so that data in the target source database can be synchronized into the aggregation database, the plurality of source databases perform the same operation, the aggregation database can comprise the data in the plurality of source databases, and the data can be directly queried in the aggregation database without querying the data in the plurality of source databases.
and a second processing unit 40, configured to, when a target query requirement sent by a terminal is received, query target data from the aggregated data in the aggregation database according to the target query requirement and send the target data to the terminal, wherein the target query requirement characterizes a request to query data from a plurality of the source databases.
Specifically, since the data in the multiple source databases are synchronized into the aggregated database, when the query requirement (target query requirement) of the dimension beyond the sharded key is received, the data can be directly queried in the aggregated database without crossing the multiple source databases, and thus, the cost of data query is lower.
According to the embodiment, the first selection unit sequentially selects one of the plurality of source databases as a target source database and establishes a connection with it; the first processing unit acquires the database log stored in the target source database, parses the database log, and restores the SQL statements in the database log to obtain restored SQL statements; the first execution unit establishes a connection with the aggregation database and executes the restored SQL statements in the aggregation database to obtain aggregated data; and the second processing unit, when a target query requirement sent by the terminal is received, queries the target data from the aggregated data in the aggregation database according to the target query requirement and sends the target data to the terminal. In this scheme, the data in the plurality of source databases is summarized into the aggregation database, so that when a query requirement on a dimension other than the sharding key is received, the data can be queried directly in the aggregation database without crossing the plurality of source databases, which reduces the cost of the data query. Meanwhile, because the restored SQL statements are obtained from the database log of the source database and then executed in the aggregation database, the data in the source database is synchronized into the aggregation database, and the data synchronization effect is good.
The scheme can summarize data in a distributed manner. When faced with a large distributed service system with a large data volume, a high number of concurrent transactions and strict real-time requirements, it can be scaled horizontally to share the performance pressure, and therefore has a higher upper limit of processing capacity.
In the application scenario of data summarization, because the number of source databases is large, a plurality of servers work cooperatively in a distributed computing manner. To prevent several servers from summarizing the data of the same source database into the aggregation database, which would cause repeated transmission and affect the consistency and synchronization efficiency of the data, in one embodiment of the present application the first server further includes a third processing unit, a determining unit, a second selecting unit, and an updating unit. The third processing unit is configured to establish a connection with a cache database before the database log stored in the target source database is acquired, and to search the cache database for a first target key value pair corresponding to the target source database, where the first target key value pair is obtained according to the ID of the source database and the lock state of the source database. The determining unit is configured to determine whether the target source database is in a locked state according to the lock state of the target source database in the first target key value pair. The second selecting unit is configured to re-select one of the source databases as the target source database when the lock state in the first target key value pair indicates that the target source database is in a locked state. The updating unit is configured to update the lock state of the target source database in the first target key value pair to the locked state when that lock state indicates that the target source database is in an unlocked state. When the lock state of the target source database in the first target key value pair is the locked state, the second server does not acquire the database log stored in the target source database; of the first server and the second server, the one that first updates the lock state of the target source database is the one that parses the database log.
Specifically, the first target key value pair corresponding to the target source database may be <target source database ID, lock state>, where a lock state of 0 indicates unlocked and a lock state of 1 indicates locked.
In practice, while the source databases are being synchronized to the aggregation database, a plurality of servers communicate with the plurality of source databases at the same time. The servers can preempt the source databases: the server that preempts a source database communicates with it, acquires its database log, and performs the subsequent SQL restoration. This allows the servers to work cooperatively and avoids the situations in which a server is idle, a server carries too high a load, or a source database has no server communicating with it (i.e. no server is synchronizing that source database). The allocation of a specific server may be determined by parameters such as the idle time of the server, the data synchronization rate of the server, and the concurrency number of the server.
A concurrency number can be configured for each of the servers, so that each server can preempt, for example, 1 to 2 source databases. The product of the number of servers and the maximum concurrency number of each server is greater than or equal to the number of source databases, so as to avoid the situation in which a source database has no server communicating with it. Data aggregation from multiple source databases to an aggregation database may also be implemented.
The cache database can be a Redis database. The Redis database is a shared database, and the plurality of servers can query it for the key value pairs corresponding to the plurality of source databases to determine whether each source database is in a locked state. Of course, other databases, such as relational databases, are also possible, or the first server may communicate directly with a configuration center instead of storing the first target key value pair in a cache database.
Alternatively, the locking operation is performed by writing the database ID in the distributed cache component, or by writing the database ID in an additional database or configuration center.
In the scheme, before the first server acquires the database log of the target source database, it searches the cache database for the first target key value pair corresponding to the target source database and determines whether the target source database is in a locked state. If the target source database is in a locked state, it may already be communicating with a second server or a third server (or another server) that has already acquired its database log. If the first server were to acquire and restore the database log of the target source database again, several servers would acquire the same data repeatedly and the consistency of the data would be poor. Therefore, in this case the first server does not acquire the database log and re-selects a target source database instead.
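The lock check described above can be illustrated with a minimal sketch. It assumes an in-memory table as a stand-in for the cache database; the names `LockTable`, `try_lock`, and `preempt_source_db` are hypothetical and do not come from the application. In an actual Redis deployment the check and the update would need to be a single atomic operation (for example a set-if-not-exists write), which the mutex below merely emulates.

```python
import threading


class LockTable:
    """In-memory stand-in for the cache database (hypothetical helper).

    Each entry is a <source database ID, lock state> pair,
    where 0 means unlocked and 1 means locked.
    """

    UNLOCKED, LOCKED = 0, 1

    def __init__(self, source_db_ids):
        # The mutex emulates the atomic check-and-set a real cache would provide.
        self._mutex = threading.Lock()
        self._state = {db_id: self.UNLOCKED for db_id in source_db_ids}

    def try_lock(self, db_id):
        """Set the lock state to 1 only if it is currently 0; return True on success."""
        with self._mutex:
            if self._state[db_id] == self.LOCKED:
                return False
            self._state[db_id] = self.LOCKED
            return True

    def unlock(self, db_id):
        """Reset the lock state to 0 so another server can preempt the database."""
        with self._mutex:
            self._state[db_id] = self.UNLOCKED


def preempt_source_db(table, candidate_ids):
    """Return the first candidate this server manages to lock, or None."""
    for db_id in candidate_ids:
        if table.try_lock(db_id):
            return db_id
    return None


table = LockTable(["db_a", "db_b"])
first = preempt_source_db(table, ["db_a", "db_b"])   # first server takes db_a
second = preempt_source_db(table, ["db_a", "db_b"])  # second server skips locked db_a
third = preempt_source_db(table, ["db_a", "db_b"])   # nothing left to preempt
```

Here the server that updates the lock state first is the one that goes on to parse the log, and a server that finds the state already locked simply moves on to another candidate.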
Because database logs are generated in the order of the operations, the log ID in a database log reflects the order in which the database logs were generated. To ensure higher data synchronization efficiency, the operation records corresponding to the database logs in the source database should be identical to the operation records in the aggregation database. In a specific implementation, the database log includes a log ID and a log record, where the log ID is the ID of the database log generated by an operation on the target source database, one operation generates one database log, and the log record stores the command of the operation on the target source database. The first processing unit includes a first processing module and a second processing module. The first processing module is configured to sort the plurality of database logs by log ID to obtain sorted database logs. The second processing module is configured to parse the log record in each database log in order from first to last, and to restore the SQL statement in each database log to obtain the restored SQL statement corresponding to each database log.
In the scheme, a database log actually includes a log head and a log body, where the log head includes the log ID and the log body includes the log record. The database logs can first be sorted by log ID, so that the obtained database logs are arranged in sequence. The sorted database logs are then parsed in turn, and the SQL statements used by the target source database are restored. In this way the order of the data in the target source database and the order of the data in the aggregation database are identical, which further ensures higher data synchronization efficiency.
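As a minimal sketch of the sorting step, assuming a simplified log structure with an integer log ID in the head and a text command in the body (the `DatabaseLog` type, its field names, and the sample commands are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseLog:
    log_id: int  # log head: ID reflecting the order in which the log was generated
    record: str  # log body: the recorded operation command


def order_logs(logs):
    """Sort database logs by log ID so they can be parsed from first to last."""
    return sorted(logs, key=lambda entry: entry.log_id)


# Logs may be fetched in an arbitrary order; sorting restores the generation order.
received = [
    DatabaseLog(3, "delete account 7"),
    DatabaseLog(1, "insert account 7"),
    DatabaseLog(2, "update account 7"),
]
ordered_ids = [entry.log_id for entry in order_logs(received)]
```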
Because of the characteristics of a distributed system, when the first server communicates with the source database, uncertain network delay may cause data that was sent first to arrive later, which leads to inconsistent data. In a specific implementation, the first execution unit includes a determining module and an execution module. The determining module is configured to check the log IDs and determine whether the order of the plurality of log IDs is correct, to obtain a verification result. The execution module is configured to execute the restored SQL statement in the aggregation database when the verification result indicates that verification has passed.
Specifically, locking the target source database, parsing the database log, and restoring the SQL statements can together be regarded as one task in the first server. A timing period can be set, so that the first server executes the task once each time the timing interval (for example, 5 seconds or 1 minute) elapses. The interval can be adjusted dynamically according to the traffic, and the first server processes all pending database logs in the timed task of each period, so that backlog is avoided.
In the scheme, to ensure that the database logs are not acquired repeatedly by several servers within the same time period, the database logs are acquired and restored only in the first server, and the log IDs can then be checked. Whether the order of the restored database logs is correct can be determined from whether the order of the log IDs is correct. If a log that was sent first arrives later, replaying the restored database logs is likely to leave the target source database and the aggregation database out of sync. Therefore, the restored SQL statement is executed in the aggregation database only when the verification result indicates that the order of the log IDs is correct, so that the accuracy of data synchronization is higher.
To parse the database log further, so that the parsing result is accurate and the data synchronization efficiency is high, the second processing module of the present application includes a first acquisition sub-module and an execution sub-module. The first acquisition sub-module is configured to acquire the mapping relation between the commands in the log record and SQL commands. The execution sub-module is configured to execute target logic on the plurality of database logs in order from first to last, where the target logic is: extracting the operation instruction, the operation condition, and the fields of the operation data from the log record; determining, according to the mapping relation, the SQL operation instruction corresponding to the operation instruction in the log record, the SQL query condition corresponding to the operation condition in the log record, and the SQL data fields corresponding to the fields of the operation data in the log record; and combining the SQL operation instruction, the SQL query condition, and the SQL data fields to obtain the restored SQL statement.
In the scheme, the content recorded in the database log has a mapping relation with SQL commands. The SQL statements executed in the target source database can be restored from the content recorded in the database log and this mapping relation, yielding the restored SQL statements. Because restoration follows the order of the database logs, higher data synchronization efficiency can be ensured when the restored SQL statements are subsequently used for data playback.
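The mapping-based restoration can be sketched as follows. The application does not specify the log record format, so the record layout (`op`, `table`, `cond`, `data`) and the mapping table `OP_MAPPING` are assumptions made purely for illustration. The sketch emits parameterized SQL and assumes that table and column names taken from the log are trusted identifiers.

```python
# Hypothetical mapping from log-record operation codes to SQL operation instructions.
OP_MAPPING = {"ins": "INSERT", "upd": "UPDATE", "del": "DELETE"}


def restore_sql(record):
    """Combine operation instruction, condition, and data fields into one SQL statement.

    Returns a (sql, params) pair using '?' placeholders for values.
    """
    operation = OP_MAPPING[record["op"]]
    table = record["table"]
    data = record.get("data", {})  # operation data: column -> new value
    cond = record.get("cond", {})  # operation condition: column -> required value
    where = " AND ".join(f"{column} = ?" for column in cond)

    if operation == "INSERT":
        columns = ", ".join(data)
        marks = ", ".join("?" for _ in data)
        return f"INSERT INTO {table} ({columns}) VALUES ({marks})", list(data.values())
    if operation == "UPDATE":
        assignments = ", ".join(f"{column} = ?" for column in data)
        params = list(data.values()) + list(cond.values())
        return f"UPDATE {table} SET {assignments} WHERE {where}", params
    # Remaining case in this sketch: DELETE.
    return f"DELETE FROM {table} WHERE {where}", list(cond.values())


insert_stmt = restore_sql({"op": "ins", "table": "account", "data": {"id": 7, "balance": 100}})
update_stmt = restore_sql(
    {"op": "upd", "table": "account", "cond": {"id": 7}, "data": {"balance": 50}}
)
delete_stmt = restore_sql({"op": "del", "table": "account", "cond": {"id": 7}})
```

Keeping the values as parameters rather than splicing them into the statement text is a design choice of this sketch, not something the application prescribes.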
If all database logs in the target source database have been parsed and played back within one synchronization period, the log pointer associated with the target source database can be moved forward to the position of the last log whose restored SQL statement was executed, and the lock state of the target source database in the first target key value pair of the cache database is updated to the unlocked state, which completes the unlocking of the target source database. From this point on, the target source database can again be accessed and locked by other servers.
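A minimal sketch of the end of a synchronization period is given below, assuming the log pointer is kept as a per-database checkpoint holding the last replayed log ID. The dictionaries and the names `pending_logs` and `finish_cycle` are hypothetical.

```python
UNLOCKED = 0  # lock state value meaning "unlocked", as in the key value pair above


def pending_logs(log_ids, checkpoint):
    """Return, in order, the log IDs that lie beyond the current log pointer."""
    return sorted(log_id for log_id in log_ids if log_id > checkpoint)


def finish_cycle(checkpoints, lock_states, db_id, replayed_ids):
    """Advance the log pointer to the last replayed log and release the lock."""
    if replayed_ids:
        checkpoints[db_id] = max(replayed_ids)
    lock_states[db_id] = UNLOCKED
    return checkpoints.get(db_id, 0)


checkpoints = {"db_a": 2}   # logs 1 and 2 were replayed in an earlier period
lock_states = {"db_a": 1}   # db_a is currently locked by this server
todo = pending_logs([1, 2, 3, 4], checkpoints["db_a"])
new_checkpoint = finish_cycle(checkpoints, lock_states, "db_a", todo)
```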
During data summarization, a plurality of servers can thus cooperate in a distributed manner without repeatedly sending the same piece of data, and this cooperation does not depend on additional configuration. The order in which data is copied from each source database is also guaranteed, and the situation in which data sent first arrives later is avoided, so that data consistency is ensured.
In practical applications, the database log includes not only the log ID and the log record but also database transactions. To ensure the consistency of data synchronization, in another embodiment of the present application the database log further includes a plurality of database transactions, where a database transaction is a sequence of operations that accesses and operates on various data items in a database. The first server further includes an obtaining unit and a second executing unit. The obtaining unit is configured to obtain the transaction type of each database transaction, where the transaction type is the category into which the transaction corresponding to the database transaction is classified according to its nature. The second executing unit is configured to execute the restored SQL statement in the aggregation database according to the transaction types of the plurality of database transactions.
For example, if the transaction type of a database transaction is the transfer-out type, the restored SQL statement should be executed in the aggregation database according to the transfer-out type, and if the transaction type of a database transaction is the deposit type, the restored SQL statement should be executed in the aggregation database according to the deposit type.
In the scheme, the database log also includes a plurality of database transactions, and the database transactions can be synchronized in the aggregation database according to the transaction types of the database transactions of the target source database. This ensures that the transactions in the target source database and the transactions in the aggregation database stay synchronized, avoids differences in transaction type, and further ensures higher synchronization efficiency for transactions.
When a small number of servers or databases fail, or when a single task on the first server runs so long that the current task has not finished when the next task starts, database synchronization may become inaccurate. To ensure that a database log being processed by the first server is not also processed by other servers (such as a second server or a third server), a message queue can be introduced to verify the order of the log IDs. In some embodiments, the determining module includes a first storage sub-module, a second storage sub-module, a second acquisition sub-module, a first determining sub-module, and a second determining sub-module. The first storage sub-module is configured to store, in the message queue, the log ID of the database log and the table name of the data table corresponding to the Nth execution of the restored SQL statement in the aggregation database, before the restored SQL statement is executed in the aggregation database for the Nth time. The second storage sub-module is configured to store, in a cache database and as a second target key value pair, the ID of the target source database, the log ID of the database log, and the table name of the data table corresponding to the Nth execution of the restored SQL statement in the aggregation database, after the restored SQL statement is executed in the aggregation database for the Nth time. The second acquisition sub-module is configured to obtain, before the restored SQL statement is executed in the aggregation database for the (N+1)th time, the log ID and the table name of the data table corresponding to the Nth execution from the message queue, and the log ID and the table name of the data table corresponding to the Nth execution from the cache database. The first determining sub-module is configured to determine that the order of the log IDs is incorrect, and thereby that the verification result is a failure, when the log ID corresponding to the Nth execution in the message queue differs from the log ID corresponding to the Nth execution in the cache database and the table name corresponding to the Nth execution in the message queue differs from the table name corresponding to the Nth execution in the cache database. The second determining sub-module is configured to determine that the order of the log IDs is correct, and thereby that the verification result is a pass, when the log ID corresponding to the Nth execution in the message queue is the same as the log ID corresponding to the Nth execution in the cache database and the table name corresponding to the Nth execution in the message queue is the same as the table name corresponding to the Nth execution in the cache database.
Specifically, the parsing of the database log may be performed by a data summarizing service in the first server, and after the parsing of the database log, the parsing result may be further sent to a data summarizing proxy service of the first server through a message queue, where the message queue completes final data playback.
In this scheme, a message that the first server places in the message queue may carry the log ID of the current data and the log ID of the previous data, as well as the table name of the current data and the table name of the previous data. Each time the first server executes a restored SQL statement, the ID of the target source database, the log ID, and the table name can be stored in the cache database as a second target key value pair, to be checked before the next restored SQL statement is executed. Before each execution, the first server can check whether the previous log ID stored in the cache database is consistent with the previous log ID carried in the message queue, and whether the previous table name stored in the cache database is consistent with the previous table name carried in the message queue. If they are inconsistent, a message that was sent first has arrived later (that is, an earlier message has been sent but has not yet arrived), and the order of the log IDs is determined to be incorrect. If they are consistent, the order of the log IDs is determined to be correct. Executing the restored SQL statements out of order is thereby avoided.
Specifically, when the check fails, the first server may return a failure response to the message queue, and the message is consumed again after a waiting period through the failure retry mechanism of the message queue. On retry, if the message that was sent first but arrived later has by then been played back, the retried message can be played back normally; otherwise, a failure response is returned again and the message waits for the next retry.
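The order check and the retry behaviour can be sketched as below. The message layout (`log_id`, `table`, `prev_log_id`, `prev_table`), the use of a plain dictionary in place of the cache database, a `deque` in place of a real message queue, and the `max_attempts` limit are all assumptions of this sketch rather than details given in the application.

```python
from collections import deque


def make_message(log_id, table, prev_log_id, prev_table):
    """Build a message carrying the current and the previous log ID and table name."""
    return {
        "log_id": log_id,
        "table": table,
        "prev_log_id": prev_log_id,
        "prev_table": prev_table,
    }


def replay_in_order(messages, cache, db_id, max_attempts=5):
    """Replay messages only when their 'previous' fields match the cached pair.

    A mismatch is treated as a failure response: the message is put back
    on the queue and consumed again later, up to max_attempts times.
    """
    pending = deque((message, 0) for message in messages)
    replayed, abandoned = [], []
    while pending:
        message, attempts = pending.popleft()
        expected_previous = (message["prev_log_id"], message["prev_table"])
        if cache.get(db_id) == expected_previous:
            replayed.append(message["log_id"])  # playback would happen here
            cache[db_id] = (message["log_id"], message["table"])
        elif attempts + 1 < max_attempts:
            pending.append((message, attempts + 1))  # wait and retry
        else:
            abandoned.append(message["log_id"])  # would be escalated to an alarm
    return replayed, abandoned


cache = {"db_a": (0, None)}  # nothing replayed yet for db_a
arrivals = [                 # log 2 arrives before log 1
    make_message(2, "account", 1, "account"),
    make_message(1, "account", 0, None),
    make_message(3, "ledger", 2, "account"),
]
replayed, abandoned = replay_in_order(arrivals, cache, "db_a")
```

In this run the message for log 2 is rejected once, retried after log 1 has been played back, and then accepted, so playback follows the log ID order even though arrival did not.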
If data synchronization is suspended because several verification results fail, and no data is written into the aggregation database for a long time, the suspension lasts too long and the cost of the data synchronization process is high. To avoid this, when the verification result is determined to be a failure, the first server may re-check the log ID and, when the re-check time is greater than or equal to a time threshold, generate alarm information and send it to a target terminal. The fifth processing unit is configured to, upon receiving a predetermined operation, update the verification result to a pass, generate an operation log, and store the operation log in the aggregation database, where the predetermined operation characterizes modifying the order of the log IDs that failed verification so that the order is no longer abnormal.
Optionally, the predetermined operation may be an operation in which a person manually retrieves the failed transaction, or an operation in which a person manually modifies the order of the log IDs to the correct order.
In the scheme, if several of the obtained verification results indicate that verification has failed, alarm information can be generated and sent to the target terminal. Maintenance personnel are thereby promptly informed that an abnormality has occurred in the data synchronization process and can handle it in time. Maintenance personnel can modify the failed transaction so that it is no longer abnormal, after which the order of the log IDs passes verification and data synchronization continues. An operation log is generated so that maintenance personnel can later learn that the data was abnormal.
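The re-check, the alarm threshold, and the manual override can be sketched as follows. The function names, the return values, and the shape of the operation log entry are hypothetical; the clock and sleep functions are injectable only so that the sketch can be exercised without real waiting.

```python
import time


def recheck(check, time_threshold, interval=0.0, now=time.monotonic, sleep=time.sleep):
    """Re-run `check` until it passes or the re-check time reaches the threshold.

    Returns ("passed", None) or ("alarm", message). Sending the alarm to a
    target terminal is outside the scope of this sketch.
    """
    start = now()
    while True:
        if check():
            return "passed", None
        if now() - start >= time_threshold:
            return "alarm", "log ID order check did not pass within the time threshold"
        sleep(interval)


def apply_manual_override(operation_log, operator, log_id):
    """Record the manual correction and mark the verification result as passed."""
    operation_log.append({"operator": operator, "log_id": log_id, "action": "order corrected"})
    return True  # updated verification result: passed


# A check that keeps failing triggers the alarm path; a simulated clock avoids waiting.
ticks = iter(range(100))
status, alarm = recheck(lambda: False, 2, now=lambda: next(ticks), sleep=lambda _: None)

operation_log = []
verification_passed = apply_manual_override(operation_log, "maintainer", 42)
```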
The first server includes a processor and a memory. The first selecting unit, the first processing unit, the first executing unit, the second processing unit, and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions. The above modules may all be located in the same processor; alternatively, the above modules may be located in different processors in any combination.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and the problem of high data query cost when data is queried across multiple databases in the prior art is solved by adjusting kernel parameters.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or nonvolatile memory such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
The embodiment of the invention provides a computer readable storage medium, which comprises a stored program, wherein the device where the computer readable storage medium is located is controlled to execute the distributed data summarizing method when the program runs.
Specifically, the data summarization method based on the distributed mode comprises the following steps:
Step S201, a first server sequentially selects one of a plurality of source databases as a target source database, and establishes a connection with the target source database;
Specifically, there are a plurality of source databases, and different source databases correspond to different sharding keys. When the first server performs data summarization, it can randomly select one source database from the plurality of source databases as the target source database, so that the data in the target source database can subsequently be synchronized into the aggregation database, and in this way the data in the plurality of source databases can be synchronized into the aggregation database.
Step S202, the first server acquires a database log stored in the target source database, parses the database log, and restores the SQL statements in the database log to obtain restored SQL statements, where the database log is at least used for storing the commands of the operations in the target source database;
Specifically, a database log is stored in the target source database, and the database log records the commands of all operations in the target source database. To ensure the accuracy of data playback, the database log stored in the target source database can be obtained directly and parsed, so that the SQL statements of the operations in the target source database are restored to obtain restored SQL statements, which can then be used for data playback.
Step S203, the first server establishes a connection with an aggregation database, and executes the restored SQL statements in the aggregation database to obtain aggregated data;
Specifically, once the first server is connected to the aggregation database, the corresponding SQL commands can be executed in the aggregation database according to the restored SQL statements, so that the data in the target source database is synchronized into the aggregation database. The same operation is performed for the plurality of source databases, so that the aggregation database contains the data of the plurality of source databases, and data can be queried directly in the aggregation database without querying the plurality of source databases.
Step S204, when receiving a target query requirement sent by a terminal, the first server queries target data from the aggregated data in the aggregation database according to the target query requirement, and sends the target data to the terminal, where the target query requirement characterizes a requirement to query data from a plurality of source databases.
Specifically, since the data in the plurality of source databases has been synchronized into the aggregation database, when a query requirement on a dimension other than the sharding key (the target query requirement) is received, the data can be queried directly in the aggregation database without crossing the plurality of source databases, and the cost of data query is therefore lower.
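Steps S201 to S204 can be illustrated end to end with a minimal sketch that uses an in-memory SQLite database as a stand-in for the aggregation database. The `orders` table, its columns, the choice of `customer_id` as the sharding key, and the hard-coded restored SQL statements are all hypothetical; log acquisition and parsing (step S202) are assumed to have already produced the statements shown.

```python
import sqlite3

# Hypothetical restored SQL statements, one list per source database.
# The source databases are assumed to be sharded by customer_id.
restored_by_source = {
    "source_db_1": [
        ("INSERT INTO orders (order_id, customer_id, product) VALUES (?, ?, ?)", (1, 101, "card")),
        ("INSERT INTO orders (order_id, customer_id, product) VALUES (?, ?, ?)", (2, 101, "loan")),
    ],
    "source_db_2": [
        ("INSERT INTO orders (order_id, customer_id, product) VALUES (?, ?, ?)", (3, 202, "card")),
    ],
}

# In-memory stand-in for the aggregation database.
aggregate = sqlite3.connect(":memory:")
aggregate.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, product TEXT)"
)

for source_id, statements in restored_by_source.items():  # S201: take each source in turn
    for sql, params in statements:                         # S202: restored SQL statements
        aggregate.execute(sql, params)                     # S203: replay in the aggregation DB
aggregate.commit()


def query_by_product(connection, product):
    """S204: query on a dimension other than the sharding key, in one place."""
    rows = connection.execute(
        "SELECT order_id FROM orders WHERE product = ? ORDER BY order_id", (product,)
    )
    return [row[0] for row in rows]


card_orders = query_by_product(aggregate, "card")
```

The query by `product` spans rows that originated in both source databases, yet it touches only the aggregation database.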
Optionally, before the first server obtains the database log stored in the target source database, the method further includes: the first server establishes a connection with a cache database, and searches the cache database for a first target key value pair corresponding to the target source database, where the first target key value pair is obtained according to the ID of the source database and the lock state of the source database; the first server determines whether the target source database is in a locked state according to the lock state of the target source database in the first target key value pair; the first server re-selects one of the source databases as the target source database when the lock state in the first target key value pair indicates that the target source database is in a locked state; and the first server updates the lock state of the target source database in the first target key value pair to the locked state when that lock state indicates that the target source database is in an unlocked state. When the lock state of the target source database in the first target key value pair is the locked state, a second server does not acquire the database log stored in the target source database; of the first server and the second server, the one that first updates the lock state of the target source database is the one that parses the database log.
Optionally, the database log includes a log ID and a log record, where the log ID is the ID of the database log generated by an operation on the target source database, one operation generates one database log, and the log record stores the command of the operation on the target source database. Parsing the database log and restoring the SQL statements in the database log to obtain restored SQL statements includes: the first server sorts the plurality of database logs by log ID to obtain sorted database logs; and the first server parses the log record in each database log in order from first to last, and restores the SQL statement in each database log to obtain the restored SQL statement corresponding to each database log. Executing the restored SQL statement in the aggregation database includes: the first server checks the log IDs and determines whether the order of the plurality of log IDs is correct, to obtain a verification result; and the first server executes the restored SQL statement in the aggregation database when the verification result indicates that verification has passed.
Optionally, the first server parsing the log record in each database log in order from first to last, and restoring the SQL statement in each database log to obtain the restored SQL statement corresponding to each database log, includes: the first server acquires the mapping relation between the commands in the log record and SQL commands; and the first server executes target logic on the plurality of database logs in order from first to last, where the target logic is: extracting the operation instruction, the operation condition, and the fields of the operation data from the log record; determining, according to the mapping relation, the SQL operation instruction corresponding to the operation instruction in the log record, the SQL query condition corresponding to the operation condition in the log record, and the SQL data fields corresponding to the fields of the operation data in the log record; and combining the SQL operation instruction, the SQL query condition, and the SQL data fields to obtain the restored SQL statement.
Optionally, the database log further includes a plurality of database transactions, where a database transaction is a sequence of operations that accesses and operates on various data items in a database, and the method further includes: the first server obtains the transaction type of each database transaction, where the transaction type is the category into which the transaction corresponding to the database transaction is classified according to its nature; and the first server executes the restored SQL statement in the aggregation database according to the transaction types of the plurality of database transactions.
Optionally, the first server checking the log IDs and determining whether the order of the plurality of log IDs is correct, to obtain a verification result, includes: before the first server executes the restored SQL statement in the aggregation database for the Nth time, the first server stores, in a message queue, the log ID of the database log and the table name of the data table corresponding to the Nth execution of the restored SQL statement in the aggregation database; after the first server executes the restored SQL statement in the aggregation database for the Nth time, the first server stores, in a cache database and as a second target key value pair, the ID of the target source database, the log ID of the database log, and the table name of the data table corresponding to the Nth execution of the restored SQL statement in the aggregation database; before the restored SQL statement is executed in the aggregation database for the (N+1)th time, the first server obtains the log ID and the table name of the data table corresponding to the Nth execution from the message queue, and obtains the log ID and the table name of the data table corresponding to the Nth execution from the cache database; when the log ID corresponding to the Nth execution in the message queue differs from the log ID corresponding to the Nth execution in the cache database, and the table name corresponding to the Nth execution in the message queue differs from the table name corresponding to the Nth execution in the cache database, the first server determines that the order of the log IDs is incorrect and that the verification result is a failure; and when the log ID corresponding to the Nth execution in the message queue is the same as the log ID corresponding to the Nth execution in the cache database, and the table name corresponding to the Nth execution in the message queue is the same as the table name corresponding to the Nth execution in the cache database, the first server determines that the order of the log IDs is correct and that the verification result is a pass.
Optionally, when the verification result is determined to be a failure, the method further includes: the first server re-checks the log ID, and generates alarm information and sends the alarm information to a target terminal when the re-check time is greater than or equal to a time threshold; and when the first server receives a predetermined operation, the first server updates the verification result to a pass, generates an operation log, and stores the operation log in the aggregation database, where the predetermined operation characterizes modifying the order of the log IDs that failed verification so that the order is no longer abnormal.
The embodiment of the invention provides a processor which is used for running a program, wherein the distributed data summarizing method is executed when the program runs.
Specifically, the data summarization method based on the distributed mode comprises the following steps:
Step S201, a first server sequentially selects one of a plurality of source databases as a target source database, and establishes a connection with the target source database;
Specifically, there are a plurality of source databases, and different source databases correspond to different sharding keys. When the first server performs data summarization, it may randomly select one source database from the plurality of source databases as the target source database, so that the data in the target source database can subsequently be synchronized into the aggregation database and, in turn, the data in the plurality of source databases can be synchronized into the aggregation database.
Step S202, the first server acquires a database log stored in the target source database, parses the database log, and restores the SQL statements in the database log to obtain restored SQL statements, wherein the database log is at least used for storing the commands of the operations performed in the target source database;
Specifically, a database log is stored in the target source database, and the database log records the commands of all operations performed in the target source database. To ensure the accuracy of data playback, the database log stored in the target source database can be obtained directly and parsed, so that the SQL statements of the operations performed in the target source database are restored to obtain the restored SQL statements, which can then be used for data playback.
Step S203, the first server establishes a connection with an aggregation database, and executes the restored SQL statements in the aggregation database to obtain aggregated data;
Specifically, once the first server is connected to the aggregation database, the corresponding SQL commands can be executed in the aggregation database according to the restored SQL statements, so that the data in the target source database is synchronized into the aggregation database. The same operation is performed for each of the plurality of source databases, so that the aggregation database contains the data of the plurality of source databases and the data can be queried directly in the aggregation database without querying the plurality of source databases.
Step S204, upon receiving a target query requirement sent by a terminal, the first server queries target data from the aggregated data in the aggregation database according to the target query requirement, and sends the target data to the terminal, wherein the target query requirement characterizes a requirement to query data from the plurality of source databases.
Specifically, since the data in the plurality of source databases has been synchronized into the aggregation database, when a query requirement on a dimension other than the sharding key (the target query requirement) is received, the data can be queried directly in the aggregation database without querying across the plurality of source databases, so the cost of the data query is lower.
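As an illustration only, steps S203 and S204 can be sketched as follows. This is a minimal sketch and not the implementation of the application: it assumes an in-memory SQLite database stands in for the aggregation database, and the table `accounts`, its columns, the source database IDs, and the function names `replay_into_aggregation` and `query_target_data` are all hypothetical. The restored SQL statements are assumed to be available as parameterized statements.

```python
import sqlite3

# Hypothetical restored SQL statements, grouped by the source database they came from.
restored_sql = {
    "source_db_1": [
        ("INSERT INTO accounts (id, region, balance) VALUES (?, ?, ?)", (1, "north", 100)),
        ("UPDATE accounts SET balance = ? WHERE id = ?", (150, 1)),
    ],
    "source_db_2": [
        ("INSERT INTO accounts (id, region, balance) VALUES (?, ?, ?)", (2, "south", 70)),
    ],
}


def replay_into_aggregation(conn, statements_by_source):
    """Execute every restored statement in the aggregation database (step S203)."""
    for source_id in sorted(statements_by_source):
        for sql, params in statements_by_source[source_id]:
            conn.execute(sql, params)
    conn.commit()


def query_target_data(conn, min_balance):
    """Answer a query on a non-sharding-key dimension from the aggregation database (step S204)."""
    rows = conn.execute(
        "SELECT id, region, balance FROM accounts WHERE balance >= ? ORDER BY id",
        (min_balance,),
    )
    return rows.fetchall()


# In-memory SQLite is an assumption standing in for the aggregation database.
aggregation_db = sqlite3.connect(":memory:")
aggregation_db.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, region TEXT, balance INTEGER)"
)
replay_into_aggregation(aggregation_db, restored_sql)
result = query_target_data(aggregation_db, 50)
```

In this sketch the query filters on `balance`, which is not the sharding key, and it is answered from a single database rather than by visiting each source database.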
Optionally, before the first server acquires the database log stored in the target source database, the method further includes: the first server establishes a connection with a cache database and looks up, in the cache database, a first target key-value pair corresponding to the target source database, wherein the first target key-value pair is obtained from the ID of the source database and the lock state of the source database; the first server determines, according to the lock state of the target source database in the first target key-value pair, whether the target source database is in a locked state; when the lock state in the first target key-value pair indicates that the target source database is in a locked state, the first server reselects one of the plurality of source databases as the target source database; and when the lock state in the first target key-value pair indicates that the target source database is in an unlocked state, the first server updates the lock state of the target source database in the first target key-value pair to the locked state, wherein, while the lock state of the target source database in the first target key-value pair is the locked state, a second server does not acquire the database log stored in the target source database, and whichever of the first server and the second server first updates the lock state of the target source database is the one that parses the database log.
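The locking behaviour described above can be sketched as follows. This is a minimal sketch under stated assumptions: the class `CacheDatabase` is a hypothetical in-memory stand-in for the cache database, the key is the source database ID and the value is its lock state, and the check and the update are made atomic with a local mutex. A real cache database would need an equivalent atomic check-and-set operation; nothing here is taken from the application beyond the key-value layout.

```python
import threading


class CacheDatabase:
    """Hypothetical in-memory stand-in for the cache database."""

    def __init__(self, source_ids):
        self._guard = threading.Lock()
        # First target key-value pairs: source database ID -> lock state.
        self._lock_state = {source_id: "unlocked" for source_id in source_ids}

    def try_lock(self, source_id):
        """Atomically flip 'unlocked' to 'locked'; True only for the caller that did it."""
        with self._guard:
            if self._lock_state[source_id] == "locked":
                return False
            self._lock_state[source_id] = "locked"
            return True


def select_target_source(cache, source_ids):
    """Return the first source database this server manages to lock, or None."""
    for source_id in source_ids:
        if cache.try_lock(source_id):
            return source_id
    return None


sources = ["source_db_1", "source_db_2"]
cache = CacheDatabase(sources)
first_server_target = select_target_source(cache, sources)
second_server_target = select_target_source(cache, sources)
third_attempt = select_target_source(cache, sources)
```

Here the second server skips `source_db_1` because the first server locked it first, and a third attempt finds nothing left to lock. Releasing the lock after parsing is not described in this passage and is therefore omitted.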
Optionally, the database log includes a log ID and a log record, wherein the log ID is the ID of a database log generated by an operation on the target source database, one operation generates one database log, and the log record stores the command of the operation on the target source database. Parsing the database log and restoring the SQL statements in the database log to obtain the restored SQL statements includes: the first server sorts the plurality of database logs by log ID to obtain sorted database logs; and the first server parses the log record in each database log in order from first to last and restores the SQL statement in each database log to obtain the restored SQL statement corresponding to each database log. Executing the restored SQL statements in the aggregation database includes: the first server checks the log IDs to determine whether the order of the plurality of log IDs is correct and obtains a verification result; and the first server executes the restored SQL statements in the aggregation database when the verification result indicates that verification has passed.
Optionally, the first server parsing the log record in each database log in order from first to last and restoring the SQL statement in each database log to obtain the restored SQL statement corresponding to each database log includes: the first server acquires a mapping relation between the commands of the log records and SQL commands; and the first server executes target logic on the plurality of database logs in order from first to last, wherein the target logic is: extracting the operation instruction, the operation condition, and the operation data fields from the log record; determining, according to the mapping relation, the SQL operation instruction corresponding to the operation instruction in the log record, the SQL query condition corresponding to the operation condition in the log record, and the SQL data fields corresponding to the operation data fields in the log record; and combining the SQL operation instruction, the SQL query condition, and the SQL data fields to obtain the restored SQL statement.
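The sorting and restoration logic can be sketched as follows. This is a minimal sketch, not the log format of any particular database: the dictionary layout of a log record (`op`, `table`, `data`, `condition`), the mapping `COMMAND_TO_SQL`, and the function names are hypothetical. Values are emitted as bound parameters, and table and column names are assumed to come from a trusted log.

```python
# Hypothetical mapping relation between log-record commands and SQL commands.
COMMAND_TO_SQL = {"insert": "INSERT", "update": "UPDATE", "delete": "DELETE"}


def restore_sql(record):
    """Combine operation instruction, condition and data fields into one SQL statement."""
    op = COMMAND_TO_SQL[record["op"]]
    table = record["table"]              # assumed trusted identifier
    data = record.get("data", {})        # operation data fields
    cond = record.get("condition", {})   # operation condition
    params = []
    if op == "INSERT":
        cols = ", ".join(data)
        marks = ", ".join("?" for _ in data)
        sql = f"INSERT INTO {table} ({cols}) VALUES ({marks})"
        params.extend(data.values())
    elif op == "UPDATE":
        assignments = ", ".join(f"{col} = ?" for col in data)
        sql = f"UPDATE {table} SET {assignments}"
        params.extend(data.values())
    else:
        sql = f"DELETE FROM {table}"
    if cond and op != "INSERT":
        where = " AND ".join(f"{col} = ?" for col in cond)
        sql += f" WHERE {where}"
        params.extend(cond.values())
    return sql, tuple(params)


def restore_all(database_logs):
    """Sort the database logs by log ID, then restore one statement per log."""
    ordered = sorted(database_logs, key=lambda log: log["log_id"])
    return [(log["log_id"], *restore_sql(log["record"])) for log in ordered]


logs = [
    {"log_id": 2, "record": {"op": "update", "table": "accounts",
                             "data": {"balance": 150}, "condition": {"id": 1}}},
    {"log_id": 1, "record": {"op": "insert", "table": "accounts",
                             "data": {"id": 1, "balance": 100}}},
]
restored = restore_all(logs)
```

Although the logs arrive out of order, sorting by log ID places the insert before the update, so replaying `restored` in sequence reproduces the original order of operations.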
Optionally, the database log further includes a plurality of database transactions, a database transaction being a sequence of database operations that access and operate on various data items, and the method further includes: the first server acquires the transaction types of the plurality of database transactions, wherein a transaction type is a category used to classify the nature of the transaction corresponding to a database transaction; and the first server executes the restored SQL statements in the aggregation database according to the transaction types of the plurality of database transactions.
Optionally, the first server checking the log IDs to determine whether the order of the plurality of log IDs is correct and obtaining a verification result includes: before the first server executes the restored SQL statement in the aggregation database for the Nth time, the first server stores, in a message queue, the log ID of the database log and the table name of the data table corresponding to the restored SQL statement to be executed for the Nth time; after the first server executes the restored SQL statement in the aggregation database for the Nth time, the first server stores, in a cache database as a second target key-value pair, the ID of the target source database together with the log ID of the database log and the table name of the data table corresponding to the Nth execution; before executing the restored SQL statement in the aggregation database for the (N+1)th time, the first server acquires the log ID and the table name of the data table corresponding to the Nth execution from the message queue, and acquires the log ID and the table name of the data table corresponding to the Nth execution from the cache database; when the log ID corresponding to the Nth execution in the cache database differs from the log ID corresponding to the Nth execution in the message queue, and the table name of the data table corresponding to the Nth execution in the message queue differs from the table name of the data table corresponding to the Nth execution in the cache database, the first server determines that the order of the log IDs is incorrect and that the verification result is a failure; and when the log ID corresponding to the Nth execution in the message queue is the same as the log ID corresponding to the Nth execution in the cache database, and the table name of the data table corresponding to the Nth execution in the message queue is the same as the table name of the data table corresponding to the Nth execution in the cache database, the first server determines that the order of the log IDs is correct and that the verification result is a pass.
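The comparison between the message queue and the cache database can be sketched as follows. This is a minimal sketch under stated assumptions: a `deque` stands in for the message queue and a dictionary for the cache database, the class `ExecutionVerifier` and its method names are hypothetical, and the sketch treats any mismatch in either the log ID or the table name as a failure, which is a stricter reading than the wording above.

```python
from collections import deque


class ExecutionVerifier:
    """Hypothetical helper that tracks one source database's execution records."""

    def __init__(self, source_id):
        self.source_id = source_id
        self.message_queue = deque()  # stand-in for the message queue
        self.cache = {}               # stand-in for the cache database

    def before_execute(self, log_id, table_name):
        """Before the Nth execution: record log ID and table name in the message queue."""
        self.message_queue.append((log_id, table_name))

    def after_execute(self, log_id, table_name):
        """After the Nth execution: store the second target key-value pair in the cache."""
        self.cache[self.source_id] = (log_id, table_name)

    def check_before_next(self):
        """Before the (N+1)th execution: compare both records of the Nth execution."""
        queued = self.message_queue.popleft()
        cached = self.cache.get(self.source_id)
        # Assumption: verification passes only if log ID and table name both match.
        return queued == cached


verifier = ExecutionVerifier("source_db_1")

# Nth execution recorded consistently on both sides.
verifier.before_execute(1, "accounts")
verifier.after_execute(1, "accounts")
consistent = verifier.check_before_next()

# Simulated anomaly: the cache reports a different log ID than the queue.
verifier.before_execute(2, "accounts")
verifier.after_execute(3, "accounts")
inconsistent = verifier.check_before_next()
```

When `check_before_next` returns `False`, the (N+1)th restored SQL statement would not be executed until the order of the log IDs has been re-checked or corrected.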
Optionally, when the verification result is determined to be a failure, the method further includes: the first server re-checks the log ID and, when the re-check time is greater than or equal to a time threshold, generates alarm information and sends the alarm information to a target terminal; and, when the first server receives a preset operation, the first server updates the verification result to a pass, generates an operation log, and stores the operation log in the aggregation database, wherein the preset operation indicates that the order of the log IDs that failed verification has been modified so that the order is no longer abnormal.
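The re-check and alarm handling can be sketched as follows. This is a minimal sketch that assumes the "re-check time" means the elapsed time spent re-checking; the function names, the alarm text, and the shape of the operation log entry are hypothetical, and a list stands in for both the target terminal and the operation log stored in the aggregation database.

```python
import time


def recheck_until_timeout(check, time_threshold, send_alarm,
                          interval=0.01, clock=time.monotonic, sleep=time.sleep):
    """Re-run the check until it passes or the elapsed time reaches the threshold."""
    start = clock()
    while True:
        if check():
            return "passed"
        if clock() - start >= time_threshold:
            # Assumption: the alarm is a plain text message sent to the target terminal.
            send_alarm("log ID order still failing after re-check")
            return "failed"
        sleep(interval)


def apply_preset_operation(operator, operation_log):
    """After a manual fix of the log ID order: mark as passed and record an operation log."""
    operation_log.append({"operator": operator, "action": "log ID order corrected"})
    return "passed"


# Deterministic demonstration with a fake clock instead of real waiting.
alarms = []
ticks = iter([0.0, 0.5, 1.0])
outcome = recheck_until_timeout(
    check=lambda: False,
    time_threshold=1.0,
    send_alarm=alarms.append,
    clock=lambda: next(ticks),
    sleep=lambda _: None,
)

operation_log = []
final_result = apply_preset_operation("operator_1", operation_log)
```

Injecting `clock` and `sleep` keeps the demonstration deterministic; with the defaults, the function polls in real time using a monotonic clock.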
The application also provides an electronic device comprising one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the distributed-based data summarization methods described above.
An embodiment of the invention provides a device comprising a processor, a memory, and a program stored in the memory and executable on the processor, wherein the processor implements at least the following steps when executing the program:
Step S201, a first server sequentially selects one of a plurality of source databases as a target source database, and establishes a connection with the target source database;
Specifically, there are a plurality of source databases, and different source databases correspond to different sharding keys. When the first server performs data summarization, it may randomly select one source database from the plurality of source databases as the target source database, so that the data in the target source database can subsequently be synchronized into the aggregation database and, in turn, the data in the plurality of source databases can be synchronized into the aggregation database.
Step S202, the first server acquires a database log stored in the target source database, parses the database log, and restores the SQL statements in the database log to obtain restored SQL statements, wherein the database log is at least used for storing the commands of the operations performed in the target source database;
Specifically, a database log is stored in the target source database, and the database log records the commands of all operations performed in the target source database. To ensure the accuracy of data playback, the database log stored in the target source database can be obtained directly and parsed, so that the SQL statements of the operations performed in the target source database are restored to obtain the restored SQL statements, which can then be used for data playback.
Step S203, the first server establishes a connection with an aggregation database, and executes the restored SQL statements in the aggregation database to obtain aggregated data;
Specifically, once the first server is connected to the aggregation database, the corresponding SQL commands can be executed in the aggregation database according to the restored SQL statements, so that the data in the target source database is synchronized into the aggregation database. The same operation is performed for each of the plurality of source databases, so that the aggregation database contains the data of the plurality of source databases and the data can be queried directly in the aggregation database without querying the plurality of source databases.
Step S204, upon receiving a target query requirement sent by a terminal, the first server queries target data from the aggregated data in the aggregation database according to the target query requirement, and sends the target data to the terminal, wherein the target query requirement characterizes a requirement to query data from the plurality of source databases.
Specifically, since the data in the plurality of source databases has been synchronized into the aggregation database, when a query requirement on a dimension other than the sharding key (the target query requirement) is received, the data can be queried directly in the aggregation database without querying across the plurality of source databases, so the cost of the data query is lower.
Optionally, before the first server acquires the database log stored in the target source database, the method further includes: the first server establishes a connection with a cache database and looks up, in the cache database, a first target key-value pair corresponding to the target source database, wherein the first target key-value pair is obtained from the ID of the source database and the lock state of the source database; the first server determines, according to the lock state of the target source database in the first target key-value pair, whether the target source database is in a locked state; when the lock state in the first target key-value pair indicates that the target source database is in a locked state, the first server reselects one of the plurality of source databases as the target source database; and when the lock state in the first target key-value pair indicates that the target source database is in an unlocked state, the first server updates the lock state of the target source database in the first target key-value pair to the locked state, wherein, while the lock state of the target source database in the first target key-value pair is the locked state, a second server does not acquire the database log stored in the target source database, and whichever of the first server and the second server first updates the lock state of the target source database is the one that parses the database log.
Optionally, the database log includes a log ID and a log record, wherein the log ID is the ID of a database log generated by an operation on the target source database, one operation generates one database log, and the log record stores the command of the operation on the target source database. Parsing the database log and restoring the SQL statements in the database log to obtain the restored SQL statements includes: the first server sorts the plurality of database logs by log ID to obtain sorted database logs; and the first server parses the log record in each database log in order from first to last and restores the SQL statement in each database log to obtain the restored SQL statement corresponding to each database log. Executing the restored SQL statements in the aggregation database includes: the first server checks the log IDs to determine whether the order of the plurality of log IDs is correct and obtains a verification result; and the first server executes the restored SQL statements in the aggregation database when the verification result indicates that verification has passed.
Optionally, the first server parsing the log record in each database log in order from first to last and restoring the SQL statement in each database log to obtain the restored SQL statement corresponding to each database log includes: the first server acquires a mapping relation between the commands of the log records and SQL commands; and the first server executes target logic on the plurality of database logs in order from first to last, wherein the target logic is: extracting the operation instruction, the operation condition, and the operation data fields from the log record; determining, according to the mapping relation, the SQL operation instruction corresponding to the operation instruction in the log record, the SQL query condition corresponding to the operation condition in the log record, and the SQL data fields corresponding to the operation data fields in the log record; and combining the SQL operation instruction, the SQL query condition, and the SQL data fields to obtain the restored SQL statement.
Optionally, the database log further includes a plurality of database transactions, a database transaction being a sequence of database operations that access and operate on various data items, and the method further includes: the first server acquires the transaction types of the plurality of database transactions, wherein a transaction type is a category used to classify the nature of the transaction corresponding to a database transaction; and the first server executes the restored SQL statements in the aggregation database according to the transaction types of the plurality of database transactions.
Optionally, the first server checking the log IDs to determine whether the order of the plurality of log IDs is correct and obtaining a verification result includes: before the first server executes the restored SQL statement in the aggregation database for the Nth time, the first server stores, in a message queue, the log ID of the database log and the table name of the data table corresponding to the restored SQL statement to be executed for the Nth time; after the first server executes the restored SQL statement in the aggregation database for the Nth time, the first server stores, in a cache database as a second target key-value pair, the ID of the target source database together with the log ID of the database log and the table name of the data table corresponding to the Nth execution; before executing the restored SQL statement in the aggregation database for the (N+1)th time, the first server acquires the log ID and the table name of the data table corresponding to the Nth execution from the message queue, and acquires the log ID and the table name of the data table corresponding to the Nth execution from the cache database; when the log ID corresponding to the Nth execution in the cache database differs from the log ID corresponding to the Nth execution in the message queue, and the table name of the data table corresponding to the Nth execution in the message queue differs from the table name of the data table corresponding to the Nth execution in the cache database, the first server determines that the order of the log IDs is incorrect and that the verification result is a failure; and when the log ID corresponding to the Nth execution in the message queue is the same as the log ID corresponding to the Nth execution in the cache database, and the table name of the data table corresponding to the Nth execution in the message queue is the same as the table name of the data table corresponding to the Nth execution in the cache database, the first server determines that the order of the log IDs is correct and that the verification result is a pass.
Optionally, when the verification result is determined to be a failure, the method further includes: the first server re-checks the log ID and, when the re-check time is greater than or equal to a time threshold, generates alarm information and sends the alarm information to a target terminal; and, when the first server receives a preset operation, the first server updates the verification result to a pass, generates an operation log, and stores the operation log in the aggregation database, wherein the preset operation indicates that the order of the log IDs that failed verification has been modified so that the order is no longer abnormal.
The device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The present application also provides a computer program product adapted, when executed on a data processing device, to run a program initialized with at least the following method steps:
Step S201, a first server sequentially selects one of a plurality of source databases as a target source database, and establishes a connection with the target source database;
Specifically, there are a plurality of source databases, and different source databases correspond to different sharding keys. When the first server performs data summarization, it may randomly select one source database from the plurality of source databases as the target source database, so that the data in the target source database can subsequently be synchronized into the aggregation database and, in turn, the data in the plurality of source databases can be synchronized into the aggregation database.
Step S202, the first server acquires a database log stored in the target source database, parses the database log, and restores the SQL statements in the database log to obtain restored SQL statements, wherein the database log is at least used for storing the commands of the operations performed in the target source database;
Specifically, a database log is stored in the target source database, and the database log records the commands of all operations performed in the target source database. To ensure the accuracy of data playback, the database log stored in the target source database can be obtained directly and parsed, so that the SQL statements of the operations performed in the target source database are restored to obtain the restored SQL statements, which can then be used for data playback.
Step S203, the first server establishes a connection with an aggregation database, and executes the restored SQL statements in the aggregation database to obtain aggregated data;
Specifically, once the first server is connected to the aggregation database, the corresponding SQL commands can be executed in the aggregation database according to the restored SQL statements, so that the data in the target source database is synchronized into the aggregation database. The same operation is performed for each of the plurality of source databases, so that the aggregation database contains the data of the plurality of source databases and the data can be queried directly in the aggregation database without querying the plurality of source databases.
Step S204, upon receiving a target query requirement sent by a terminal, the first server queries target data from the aggregated data in the aggregation database according to the target query requirement, and sends the target data to the terminal, wherein the target query requirement characterizes a requirement to query data from the plurality of source databases.
Specifically, since the data in the plurality of source databases has been synchronized into the aggregation database, when a query requirement on a dimension other than the sharding key (the target query requirement) is received, the data can be queried directly in the aggregation database without querying across the plurality of source databases, so the cost of the data query is lower.
Optionally, before the first server acquires the database log stored in the target source database, the method further includes: the first server establishes a connection with a cache database and looks up, in the cache database, a first target key-value pair corresponding to the target source database, wherein the first target key-value pair is obtained from the ID of the source database and the lock state of the source database; the first server determines, according to the lock state of the target source database in the first target key-value pair, whether the target source database is in a locked state; when the lock state in the first target key-value pair indicates that the target source database is in a locked state, the first server reselects one of the plurality of source databases as the target source database; and when the lock state in the first target key-value pair indicates that the target source database is in an unlocked state, the first server updates the lock state of the target source database in the first target key-value pair to the locked state, wherein, while the lock state of the target source database in the first target key-value pair is the locked state, a second server does not acquire the database log stored in the target source database, and whichever of the first server and the second server first updates the lock state of the target source database is the one that parses the database log.
Optionally, the database log includes a log ID and a log record, wherein the log ID is the ID of a database log generated by an operation on the target source database, one operation generates one database log, and the log record stores the command of the operation on the target source database. Parsing the database log and restoring the SQL statements in the database log to obtain the restored SQL statements includes: the first server sorts the plurality of database logs by log ID to obtain sorted database logs; and the first server parses the log record in each database log in order from first to last and restores the SQL statement in each database log to obtain the restored SQL statement corresponding to each database log. Executing the restored SQL statements in the aggregation database includes: the first server checks the log IDs to determine whether the order of the plurality of log IDs is correct and obtains a verification result; and the first server executes the restored SQL statements in the aggregation database when the verification result indicates that verification has passed.
Optionally, the first server parsing the log record in each database log in order from first to last and restoring the SQL statement in each database log to obtain the restored SQL statement corresponding to each database log includes: the first server acquires a mapping relation between the commands of the log records and SQL commands; and the first server executes target logic on the plurality of database logs in order from first to last, wherein the target logic is: extracting the operation instruction, the operation condition, and the operation data fields from the log record; determining, according to the mapping relation, the SQL operation instruction corresponding to the operation instruction in the log record, the SQL query condition corresponding to the operation condition in the log record, and the SQL data fields corresponding to the operation data fields in the log record; and combining the SQL operation instruction, the SQL query condition, and the SQL data fields to obtain the restored SQL statement.
Optionally, the database log further includes a plurality of database transactions, a database transaction being a sequence of database operations that access and operate on various data items, and the method further includes: the first server acquires the transaction types of the plurality of database transactions, wherein a transaction type is a category used to classify the nature of the transaction corresponding to a database transaction; and the first server executes the restored SQL statements in the aggregation database according to the transaction types of the plurality of database transactions.
Optionally, the first server checking the log IDs to determine whether the order of the plurality of log IDs is correct and obtaining a verification result includes: before the first server executes the restored SQL statement in the aggregation database for the Nth time, the first server stores, in a message queue, the log ID of the database log and the table name of the data table corresponding to the restored SQL statement to be executed for the Nth time; after the first server executes the restored SQL statement in the aggregation database for the Nth time, the first server stores, in a cache database as a second target key-value pair, the ID of the target source database together with the log ID of the database log and the table name of the data table corresponding to the Nth execution; before executing the restored SQL statement in the aggregation database for the (N+1)th time, the first server acquires the log ID and the table name of the data table corresponding to the Nth execution from the message queue, and acquires the log ID and the table name of the data table corresponding to the Nth execution from the cache database; when the log ID corresponding to the Nth execution in the cache database differs from the log ID corresponding to the Nth execution in the message queue, and the table name of the data table corresponding to the Nth execution in the message queue differs from the table name of the data table corresponding to the Nth execution in the cache database, the first server determines that the order of the log IDs is incorrect and that the verification result is a failure; and when the log ID corresponding to the Nth execution in the message queue is the same as the log ID corresponding to the Nth execution in the cache database, and the table name of the data table corresponding to the Nth execution in the message queue is the same as the table name of the data table corresponding to the Nth execution in the cache database, the first server determines that the order of the log IDs is correct and that the verification result is a pass.
Optionally, when the verification result is determined to be a failure, the method further includes: the first server re-checks the log ID and, when the re-check time is greater than or equal to a time threshold, generates alarm information and sends the alarm information to a target terminal; and, when the first server receives a preset operation, the first server updates the verification result to a pass, generates an operation log, and stores the operation log in the aggregation database, wherein the preset operation indicates that the order of the log IDs that failed verification has been modified so that the order is no longer abnormal.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented in program code executable by computing devices, so that they can be stored in a storage device and executed by the computing devices; in some cases, the steps shown or described may be performed in an order different from that shown or described herein. They may also each be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media, including both persistent and non-persistent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
From the above description, it can be seen that the above embodiments of the present application achieve the following technical effects:
1) In the distributed data summarization method of the present application, a first server sequentially selects one of a plurality of source databases as a target source database and establishes a connection with it. The first server then acquires the database log stored in the target source database, parses the database log, and restores the SQL statements in the database log to obtain restored SQL statements. Next, the first server establishes a connection with an aggregation database and executes the restored SQL statements in the aggregation database to obtain aggregated data. Finally, on receiving a target query requirement sent by a terminal, the first server queries target data from the aggregated data of the aggregation database according to the target query requirement and sends the target data to the terminal. Because this scheme summarizes the data of the plurality of source databases into the aggregation database, a query on a dimension other than the sharding key can be answered directly from the aggregation database without spanning the plurality of source databases, which reduces the cost of data queries. In addition, because the restored SQL statements are obtained from the database logs of the source databases and then executed in the aggregation database, the data in the source databases is synchronized into the aggregation database, and the data synchronization effect is good.
2) In the first server of the present application, the first selecting unit sequentially selects one of a plurality of source databases as a target source database and establishes a connection with it. The first processing unit acquires the database log stored in the target source database, parses the database log, and restores the SQL statements in the database log to obtain restored SQL statements. The first execution unit establishes a connection with the aggregation database and executes the restored SQL statements in the aggregation database to obtain aggregated data. On receiving a target query requirement sent by a terminal, the second processing unit queries target data from the aggregated data of the aggregation database according to the target query requirement and sends the target data to the terminal. Because this scheme summarizes the data of the plurality of source databases into the aggregation database, a query on a dimension other than the sharding key can be answered directly from the aggregation database without spanning the plurality of source databases, which reduces the cost of data queries. In addition, because the restored SQL statements are obtained from the database logs of the source databases and then executed in the aggregation database, the data in the source databases is synchronized into the aggregation database, and the data synchronization effect is good.
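The flow summarized in the two points above (take each source database in turn, restore SQL statements from its log, replay them in the aggregation database, then answer queries from the aggregation database) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the log record layout, the command names `put`, `set` and `del`, the `COMMAND_TO_SQL` mapping, and the `account` table used in the usage note are hypothetical, and an in-memory SQLite database stands in for the aggregation database. Table and column names are interpolated directly into the SQL text, so the sketch assumes they come from a trusted log.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical log record layout: (log ID, command, table, data, condition).
LogRecord = Tuple[int, str, str, Dict[str, object], Dict[str, object]]

# Assumed mapping relation between log commands and SQL operation instructions.
COMMAND_TO_SQL = {"put": "INSERT", "set": "UPDATE", "del": "DELETE"}


def restore_sql(record: LogRecord) -> Tuple[str, List[object]]:
    """Combine the SQL instruction, data fields and condition of one log
    record into a parameterized SQL statement."""
    _, command, table, data, cond = record
    verb = COMMAND_TO_SQL[command]
    where = " AND ".join(f"{col} = ?" for col in cond)
    where_clause = f" WHERE {where}" if cond else ""
    if verb == "INSERT":
        cols = ", ".join(data)
        marks = ", ".join("?" for _ in data)
        return f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(data.values())
    if verb == "UPDATE":
        sets = ", ".join(f"{col} = ?" for col in data)
        params = list(data.values()) + list(cond.values())
        return f"UPDATE {table} SET {sets}{where_clause}", params
    return f"DELETE FROM {table}{where_clause}", list(cond.values())


def summarize(source_logs: Dict[str, List[LogRecord]],
              aggregate: sqlite3.Connection) -> None:
    """Replay the logs of every source database into the aggregation database."""
    for _source_id, logs in source_logs.items():         # one source at a time
        for record in sorted(logs, key=lambda r: r[0]):   # ascending log ID
            sql, params = restore_sql(record)
            aggregate.execute(sql, params)
    aggregate.commit()
```

After `summarize` has run, a query on a non-sharding dimension, for example grouping a hypothetical `account` table by `owner`, needs only the single aggregation connection rather than one connection per source database.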
The foregoing description covers only the preferred embodiments of the present application and is not intended to limit the present application; various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (10)

1. A distribution-based data summarization method, comprising:
a first server sequentially selects one of a plurality of source databases as a target source database, and establishes a connection with the target source database;
the first server acquires a database log stored in the target source database, parses the database log, and restores the SQL statements in the database log to obtain restored SQL statements, wherein the database log is at least used to store commands of operations performed in the target source database;
the first server establishes a connection with an aggregation database, and executes the restored SQL statements in the aggregation database to obtain aggregated data;
and, when a target query requirement sent by a terminal is received, the first server queries target data from the aggregated data of the aggregation database according to the target query requirement and sends the target data to the terminal, wherein the target query requirement represents a request to query data from the plurality of source databases.
2. The method of claim 1, wherein before the first server acquires the database log stored in the target source database, the method further comprises:
the first server establishes a connection with a cache database and looks up, in the cache database, a first target key-value pair corresponding to the target source database, wherein the first target key-value pair is formed from the ID of the source database and the lock state of the source database;
the first server determines whether the target source database is in a locked state according to the lock state of the target source database in the first target key-value pair;
when the lock state in the first target key-value pair indicates that the target source database is in the locked state, the first server selects one of the plurality of source databases anew as the target source database;
and, when the lock state of the target source database in the first target key-value pair indicates that the target source database is in an unlocked state, the first server updates the lock state of the target source database in the first target key-value pair to the locked state, wherein a second server does not acquire the database log stored in the target source database while the lock state of the target source database in the first target key-value pair is the locked state, and whichever of the first server and the second server updated the lock state of the target source database is the server that parses the database log.
3. The method of claim 1, wherein the database log comprises a log ID and a log record, the log ID being the ID of the one database log generated each time the target source database is operated on, and the log record storing the command of the operation on the target source database,
parsing the database log and restoring the SQL statements in the database log to obtain restored SQL statements comprises:
the first server sorts the plurality of database logs by the magnitude of their log IDs to obtain sorted database logs;
the first server parses the log record in each database log in turn, from first to last, and restores the SQL statement in each database log to obtain the restored SQL statement corresponding to each database log;
executing the restored SQL statements in the aggregation database comprises:
the first server checks the log IDs to determine whether the order of the plurality of log IDs is correct, and obtains a verification result;
and the first server executes the restored SQL statements in the aggregation database when the verification result indicates that the verification has passed.
4. The method of claim 3, wherein the first server parsing the log record in each database log in turn, from first to last, and restoring the SQL statement in each database log to obtain the restored SQL statement corresponding to each database log comprises:
the first server obtains a mapping relation between the commands of the log records and SQL commands;
the first server executes target logic on the plurality of database logs in turn, from first to last, wherein the target logic is: extracting the fields of the operation instruction, the operation condition and the operation data in the log record; determining, according to the mapping relation, the SQL operation instruction corresponding to the operation instruction in the log record, the SQL query condition corresponding to the operation condition in the log record, and the SQL data field corresponding to the field of the operation data in the log record; and combining the SQL operation instruction, the SQL query condition and the SQL data field to obtain the restored SQL statement.
5. The method of claim 3, wherein the database log further comprises a plurality of database transactions, a database transaction being a sequence of database operations that access, and possibly operate on, various data items, the method further comprising:
the first server obtains the transaction type of each database transaction, wherein the transaction type is a category that classifies the nature of the transaction corresponding to the database transaction;
the first server executes the restored SQL statements in the aggregation database according to the transaction types of the plurality of database transactions.
6. The method of claim 3, wherein the first server checking the log IDs to determine whether the order of the plurality of log IDs is correct, and obtaining a verification result, comprises:
before executing the restored SQL statement in the aggregation database for the N-th time, the first server stores, in a message queue, the log ID of the database log and the table name of the data table corresponding to the restored SQL statement executed in the aggregation database for the N-th time;
after executing the restored SQL statement in the aggregation database for the N-th time, the first server stores, in a cache database as a second target key-value pair, the ID of the target source database, the log ID of the database log and the table name of the data table corresponding to the restored SQL statement executed in the aggregation database for the N-th time;
before executing the restored SQL statement in the aggregation database for the (N+1)-th time, the first server acquires, from the message queue, the log ID and the table name of the data table corresponding to the restored SQL statement executed for the N-th time, and acquires, from the cache database, the log ID and the table name of the data table corresponding to the restored SQL statement executed for the N-th time;
when the log ID corresponding to the restored SQL statement executed for the N-th time in the message queue differs from the log ID corresponding to the restored SQL statement executed for the N-th time in the cache database, and the table name of the data table corresponding to the restored SQL statement executed for the N-th time in the message queue differs from the table name of the data table corresponding to the restored SQL statement executed for the N-th time in the cache database, the first server determines that the order of the plurality of log IDs is incorrect and that the verification result is a failure;
when the log ID corresponding to the restored SQL statement executed for the N-th time in the message queue is the same as the log ID corresponding to the restored SQL statement executed for the N-th time in the cache database, and the table name of the data table corresponding to the restored SQL statement executed for the N-th time in the message queue is the same as the table name of the data table corresponding to the restored SQL statement executed for the N-th time in the cache database, the first server determines that the order of the plurality of log IDs is correct and that the verification result is a pass.
7. The method of claim 6, wherein in the case where the verification result is determined to be a failure, the method further comprises:
the first server re-checks the log IDs and, when the re-check time is greater than or equal to a time threshold, generates alarm information and sends the alarm information to a target terminal;
and, when the first server receives a preset operation, the first server updates the verification result to passed, generates an operation log and stores the operation log in the aggregation database, wherein the preset operation represents modifying the order of the log IDs that failed verification so that the order is no longer abnormal.
8. A first server, comprising:
a first selecting unit, configured to sequentially select one of a plurality of source databases as a target source database and establish a connection with the target source database;
a first processing unit, configured to acquire a database log stored in the target source database, parse the database log, and restore the SQL statements in the database log to obtain restored SQL statements, wherein the database log is at least used to store commands of operations performed in the target source database;
a first execution unit, configured to establish a connection with an aggregation database and execute the restored SQL statements in the aggregation database to obtain aggregated data;
and a second processing unit, configured to, when a target query requirement sent by a terminal is received, query target data from the aggregated data of the aggregation database according to the target query requirement and send the target data to the terminal, wherein the target query requirement represents a request to query data from the plurality of source databases.
9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program, when run, controls a device in which the computer readable storage medium is located to perform the method of any one of claims 1 to 7.
10. An electronic device, comprising: one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-7.
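By way of illustration only, the lock handling recited in claim 2 might be sketched as below. The class and function names are hypothetical, a dictionary guarded by a mutex stands in for the cache database, and two points are assumptions that the claims do not state: the check and the update of the lock state are performed as one atomic step, and an `unlock` operation exists for releasing a source database after its log has been processed.

```python
import threading
from typing import Dict, Iterable, List, Optional

UNLOCKED, LOCKED = "unlocked", "locked"


class CacheDatabase:
    """In-memory stand-in for the cache database.
    Each key-value pair maps a source database ID to its lock state."""

    def __init__(self, source_ids: Iterable[str]) -> None:
        self._state: Dict[str, str] = {sid: UNLOCKED for sid in source_ids}
        # Assumption: check-and-set is atomic; a mutex models that here.
        self._mutex = threading.Lock()

    def try_lock(self, source_id: str) -> bool:
        """Set the lock state to LOCKED if it was UNLOCKED; report success."""
        with self._mutex:
            if self._state[source_id] == LOCKED:
                return False
            self._state[source_id] = LOCKED
            return True

    def unlock(self, source_id: str) -> None:
        """Assumed release step, not recited in the claims."""
        with self._mutex:
            self._state[source_id] = UNLOCKED


def pick_target_source(cache: CacheDatabase,
                       source_ids: List[str]) -> Optional[str]:
    """Return the first source database this server manages to lock.
    A locked candidate is skipped and the next one is tried; None means
    every source database is currently held by some server."""
    for sid in source_ids:
        if cache.try_lock(sid):
            return sid
    return None
```

Under these assumptions, the server whose `try_lock` call succeeds is the one that goes on to parse the database log, while any other server moves on to a different source database.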
CN202211711972.2A 2022-12-29 2022-12-29 Data summarizing method based on distribution, first server and electronic equipment Pending CN116186082A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211711972.2A CN116186082A (en) 2022-12-29 2022-12-29 Data summarizing method based on distribution, first server and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211711972.2A CN116186082A (en) 2022-12-29 2022-12-29 Data summarizing method based on distribution, first server and electronic equipment

Publications (1)

Publication Number Publication Date
CN116186082A true CN116186082A (en) 2023-05-30

Family

ID=86437596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211711972.2A Pending CN116186082A (en) 2022-12-29 2022-12-29 Data summarizing method based on distribution, first server and electronic equipment

Country Status (1)

Country Link
CN (1) CN116186082A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117555874A (en) * 2024-01-11 2024-02-13 成都大成均图科技有限公司 Log storage method, device, equipment and medium of distributed database
CN117555874B (en) * 2024-01-11 2024-03-29 成都大成均图科技有限公司 Log storage method, device, equipment and medium of distributed database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination