CN117992240B - Data processing method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN117992240B
CN117992240B
Authority
CN
China
Prior art keywords
log
playback
thread
allocation scheme
target
Prior art date
Legal status
Active
Application number
CN202410398004.3A
Other languages
Chinese (zh)
Other versions
CN117992240A (en)
Inventor
王云龙
Current Assignee
Primitive Data Beijing Information Technology Co ltd
Original Assignee
Primitive Data Beijing Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Primitive Data Beijing Information Technology Co ltd
Priority to CN202410398004.3A
Publication of CN117992240A
Application granted
Publication of CN117992240B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the present application provide a data processing method, a data processing apparatus, a computer device and a storage medium, belonging to the technical field of data processing. The method includes: acquiring a plurality of log tables to be synchronized, and sampling the logs according to a preset sampling period to determine the target logs and the number of logs contained in each log table; performing load-balanced simulated allocation of the plurality of log tables among a plurality of playback threads based on the number of logs in each log table, to obtain a candidate allocation scheme; reading a preset allocation scheme for the plurality of log tables to be synchronized, and determining the degree of difference of the preset allocation scheme relative to the candidate allocation scheme; selecting a target allocation scheme from the candidate allocation scheme and the preset allocation scheme according to the degree of difference; and allocating the target logs in each log table to the corresponding playback threads for playback according to the target allocation scheme. In this way, load balancing of the system can be achieved and system performance improved.

Description

Data processing method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method, a data processing device, a computer device, and a storage medium.
Background
When a database runs in a master-standby high-availability mode, the standby machine can synchronize data by receiving and replaying the logs sent by the master. The standby machine generally has one distribution thread and a plurality of playback threads, where the distribution thread is responsible for reading the log records one by one from the table files and then distributing them to the playback threads according to a distribution rule, so that the playback threads execute the log records.
In the related art, a random number generator is typically used to generate a random playback-thread index, and the table file is then allocated to the playback thread with that index. Such a distribution scheme has a problem: when several table files with heavy loads exist and the generated thread indexes happen to be the same, those table files are all allocated to the same playback thread. One playback thread then becomes overloaded while the other playback threads stay idle, the system load becomes unbalanced, and system performance is reduced.
Disclosure of Invention
The embodiments of the present application mainly aim to provide a data processing method, a data processing apparatus, a computer device and a storage medium, which can achieve load balancing of the system and improve system performance.
To achieve the above object, a first aspect of an embodiment of the present application provides a data processing method, including:
acquiring a plurality of log tables to be synchronized, and sampling the logs according to a preset sampling period to determine the target logs and the number of logs contained in each log table;
performing load-balanced simulated allocation of the plurality of log tables among a plurality of playback threads based on the number of logs in each log table, to obtain a candidate allocation scheme;
reading a preset allocation scheme for the plurality of log tables to be synchronized, and determining the degree of difference of the preset allocation scheme relative to the candidate allocation scheme;
selecting a target allocation scheme from the candidate allocation scheme and the preset allocation scheme according to the degree of difference;
and allocating the target logs in each log table to the corresponding playback threads for playback according to the target allocation scheme.
Accordingly, a second aspect of an embodiment of the present application proposes a data processing apparatus, the apparatus comprising:
the sampling module is used for acquiring a plurality of log tables to be synchronized, and sampling the logs according to a preset sampling period to determine the target logs and the number of logs contained in each log table;
the allocation module is used for performing load-balanced simulated allocation of the plurality of log tables among a plurality of playback threads based on the number of logs in each log table, to obtain a candidate allocation scheme;
the determining module is used for reading the preset allocation scheme for the plurality of log tables to be synchronized, and determining the degree of difference of the preset allocation scheme relative to the candidate allocation scheme;
the selecting module is used for selecting a target allocation scheme from the candidate allocation scheme and the preset allocation scheme according to the degree of difference;
and the playback module is used for allocating the target logs in each log table to the corresponding playback threads for playback according to the target allocation scheme.
In some embodiments, the allocation module is further to:
predicting the predicted load amount of each log table when playback is performed in the playback threads, based on the number of target logs contained in each log table;
and performing load-balanced simulated allocation of the plurality of log tables among the playback threads of the determined thread number according to the predicted load amount occupied by each log table, to obtain the candidate allocation scheme.
In some embodiments, the allocation module is further to:
sorting all the log tables according to the predicted load amount occupied by each log table to obtain a sorting result;
determining the thread number of the available playback threads;
when the thread number is smaller than the number of the plurality of log tables, performing load-balanced grouping of the plurality of log tables among the playback threads of the thread number according to the sorting result, to obtain log table groups corresponding to the thread number;
and obtaining the candidate allocation scheme according to each playback thread and the log table group corresponding to the playback thread.
In some embodiments, the allocation module is further to:
determining the allocation order of each log table according to the sorting result;
for each log table, selecting, according to the real-time available load of each playback thread, the target playback thread with the largest currently available load for allocation, and updating the available load of the target playback thread after the allocation is completed;
and when every log table in the sorting result has been allocated to a playback thread according to its allocation order, obtaining the log table groups corresponding to the thread number.
In some embodiments, the determining module is further configured to:
comparing the candidate allocation scheme with the preset allocation scheme to obtain a first comparison result;
when the first comparison result indicates that the candidate allocation scheme is inconsistent with the preset allocation scheme, predicting the predicted load amount of each log table when playback is performed in the playback threads, based on the number of target logs contained in each log table;
calculating a first load standard deviation corresponding to the preset allocation scheme and a second load standard deviation corresponding to the candidate allocation scheme, based on the predicted load amount of each log table;
and determining the degree of difference of the preset allocation scheme relative to the candidate allocation scheme according to the first load standard deviation and the second load standard deviation.
In some embodiments, the selecting module is further configured to:
acquiring a difference-degree threshold;
comparing the degree of difference with the difference-degree threshold;
and when the degree of difference is larger than the difference-degree threshold, taking the candidate allocation scheme as the target allocation scheme.
In some embodiments, the determining module is further configured to:
predicting, according to the mapping relation between the plurality of log tables and the plurality of playback threads in the preset allocation scheme, a first execution duration required by the plurality of playback threads to finish playing back the target logs corresponding to the plurality of log tables;
predicting, according to the mapping relation between the plurality of log tables and the plurality of playback threads in the candidate allocation scheme, a second execution duration required by the plurality of playback threads to finish playing back the target logs corresponding to the plurality of log tables;
and determining the time difference between the first execution duration and the second execution duration, and determining the degree of difference of the preset allocation scheme relative to the candidate allocation scheme according to the time difference.
In some embodiments, the data processing apparatus further comprises an alignment module for:
when the candidate allocation scheme is taken as the target allocation scheme, determining, according to the target allocation scheme, the first thread number of the playback thread to which each log table is mapped;
determining the historical thread number to which each log table was allocated at the previous moment;
comparing the first thread number with the historical thread number to obtain a second comparison result;
when the second comparison result indicates that the first thread number is inconsistent with the historical thread number, continuing to execute, in the historical playback thread corresponding to the historical thread number, the target logs sampled from the log table at the previous moment;
and for each log table, after the historical playback thread finishes executing the target logs sampled from the log table at the previous moment, performing the step of allocating the target logs in each log table to the corresponding playback threads for playback according to the target allocation scheme.
Accordingly, a third aspect of the embodiments of the present application provides a computer device, the computer device including a memory and a processor, the memory storing a computer program, and the processor implementing the data processing method according to any embodiment of the first aspect of the present application when executing the computer program.
Accordingly, a fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data processing method according to any embodiment of the first aspect of the present application.
According to the embodiments of the present application, a plurality of log tables to be synchronized are acquired, and the logs are sampled according to a preset sampling period, so that the log table corresponding to each log, the target logs, and the number of logs contained in each log table are determined; based on the number of logs in each log table, load-balanced simulated allocation of the plurality of log tables is performed among a plurality of playback threads to obtain a candidate allocation scheme; a preset allocation scheme for the plurality of log tables to be synchronized is read, and the degree of difference of the preset allocation scheme relative to the candidate allocation scheme is determined; a target allocation scheme is selected from the candidate allocation scheme and the preset allocation scheme according to the degree of difference; and the target logs in each log table are allocated to the corresponding playback threads for playback according to the target allocation scheme. In this way, the load occupied by each log table can be determined from the number of logs obtained by sampling, so that load-balanced simulated allocation among the playback threads is performed according to the number of logs in each log table, the problem of unbalanced load is avoided, and the overall performance of the system is improved. Moreover, since the target allocation scheme is selected from the candidate allocation scheme and the preset allocation scheme according to the degree of difference, system resources can be fully utilized and load balancing can be achieved. In summary, the present application can achieve load balancing of the system and improve system performance.
Drawings
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present application;
FIG. 2 is a flow chart of a data processing method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the hardware structure of a computer device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that although functional modules are divided in the device schematic diagram and a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the module division in the device or the order in the flowchart. The terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific sequence or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
When a database runs in a master-standby high-availability mode, the standby machine can synchronize data by receiving and replaying the logs sent by the master. The standby machine generally has one distribution thread and a plurality of playback threads, where the distribution thread is responsible for reading the log records one by one from the table files and then distributing them to the playback threads according to a distribution algorithm, so that the playback threads execute the log records.
In the related art, a random number generator may be used to generate a random playback-thread index and the table file is then allocated to the playback thread with that index, or the table files are classified and then allocated to fixed playback threads for playback. However, such distribution schemes have a problem: when several table files with heavy loads exist, if the generated thread indexes are the same or those table files belong to the same category, they are all distributed to the same playback thread. One playback thread then becomes overloaded while the other playback threads stay idle, the system load becomes unbalanced, and system performance is reduced.
Based on the above, the embodiment of the application provides a data processing method, a data processing device, computer equipment and a storage medium, which can realize load balancing of a system and improve performance effects of the system.
The data processing method, apparatus, computer device and storage medium provided in the embodiments of the present application are specifically described by the following embodiments, and first, a data processing system in the embodiments of the present application is described.
Referring to FIG. 1, in some embodiments, a data processing system may include a server side 11, a controller 12, and a terminal 13, which cooperate to achieve efficient playback of a target log on a standby machine.
In particular, the server side 11 may provide the necessary computing and storage resources to perform the various steps in the data processing method. In the implementation of backup log playback, the server side 11 may include multiple physical servers, virtual machines, or containerized environments that ensure high availability and failover capabilities of the system through a primary and backup high availability architecture. The server 11 receives the task instruction from the controller 12, and performs processing operations such as data sampling, load balancing, distribution scheme calculation, log playback, and the like. In some embodiments, the server 11 may also contain a storage device to store log tables and other related data to be synchronized to support normal, efficient playback of the log by the backup machine.
By way of example, the controller 12 may be the central coordinator of the overall system, responsible for managing, monitoring and controlling the processing of the logs. The controller 12 may be a dedicated control device for receiving and processing instructions and data from the terminal 13 and the server side 11. The controller 12, the terminal 13 and the server side 11 communicate with one another through network connections, which improves the security and efficiency of standby-machine log playback.
Further, the terminal 13 may provide a user interface and an operation interface so that a user may configure and monitor the operation state of the data processing system. The terminal 13 may be a computer, a mobile device or a dedicated console, and the user may use the terminal 13 to send instructions and configuration parameters to the controller 12, and may also receive and display system status, log information and processing results. During target log playback by the standby machine, the terminal 13 allows the user to configure specific log information processing parameters and target allocation schemes to optimize the log playback efficiency.
In some embodiments, the user may interact with the controller 12 by using the terminal 13 to configure parameters and target allocation schemes for log information processing, and the controller 12 receives the user's instructions and configuration parameters and sends task instructions to the server 11. And then, the server 11 distributes the target log in each log table to the corresponding playback thread for playback according to the target distribution scheme. The terminal 13 can query the controller 12 for the task state and the processing result at any time, and the controller 12 returns the result to the terminal 13 for display or storage.
The data processing method in the embodiment of the present application can be described by the following embodiment.
In the embodiments of the present application, when related processing needs to be performed on user information, user behavior data, user history data, user location information or other data related to user identity or characteristics, the permission or consent of the user is obtained first, and the collection, use and processing of such data comply with relevant laws and regulations. In addition, when an embodiment of the present application needs to acquire sensitive personal information of a user, the separate permission or separate consent of the user is obtained through a pop-up window, a jump to a confirmation page or the like, and only after the separate permission or separate consent of the user is explicitly obtained is the user data necessary for the normal operation of the embodiment acquired.
The embodiments of the present application are described below from the perspective of a data processing system, which may in particular be integrated in a computer device. Referring to FIG. 2, FIG. 2 is a flowchart of the steps of a data processing method provided by an embodiment of the present application. Taking the case where the data processing system is integrated on a terminal or a server as an example, when a processor on the terminal or the server executes the program instructions corresponding to the data processing method, the specific flow is as follows:
Step 101, obtaining a plurality of log tables to be synchronized, and sampling the logs according to a preset sampling period to determine the target logs and the number of logs contained in each log table.
It can be understood that, in order to improve log-processing efficiency and accelerate data synchronization, the logs sent by the master can be sampled with a reasonably set preset sampling period, and the log table to which each sampled target log belongs can be determined, so that the number of logs in each log table is obtained and a candidate allocation scheme can later be formulated quickly based on the number of logs in each log table.
The log table may be a structured data object that organizes and stores records of specific operations or events; for example, a log table may be a table used to record operations of a system, an application or a database. For a database, log tables may be used to record the change history of data, including additions, modifications, deletions and so on, so as to track and audit data changes. In software development, log tables can be used to record various events and error information at runtime, helping developers perform fault detection and performance optimization.
The log is a file in which the database records operations or events that have occurred, such as inserting, updating or deleting data; on the standby machine, data can be recovered through the log so as to maintain data consistency. By way of example, the log may be a write-ahead log (WAL, Write-Ahead Logging).
Wherein the preset sampling period may be at least one of a sampling frequency and a sampling period. Specifically, the sampling frequency may be the number of times the log is sampled in a unit time, and is used for measuring the sampling rate, and the sampling frequency may be determined based on the generation period of each log; and the sampling period may be the time required to complete a sample of the log being generated or the amount of data covered. For example, when the preset sampling period includes a sampling frequency and a sampling period, the sampling frequency may be 20 samples of the log per second, and the sampling period may be 1G, that is, in one data processing round, the log may be sampled 20 times per second until the data amount of the log covered by the samples is 1G.
In some embodiments, each preset sampling period may be considered a sampling round, and the logs within the sampling round may be used to determine the target allocation scheme. It can be appreciated that the preset sampling period can be adjusted according to the actual situation, that is, the sampling frequency and the sampling period can be adjusted according to the actual situation.
In some embodiments, the preset sampling period may be determined according to the frequency of log generation; for example, when logs are updated frequently, a higher sampling frequency may be set for the preset sampling period so as to capture log changes in time. In some embodiments, by analyzing historical log data and predicting changes in the number of logs, the sampling frequency at different stages may be adjusted based on those changes. In some embodiments, the sampling frequency of the preset sampling period may be adaptively adjusted by an adaptive algorithm according to the dynamically changing speed of log generation; for example, taking the current speed as a reference, the adaptive algorithm may increase the sampling frequency when it detects that logs are being generated faster than the reference speed, and decrease the sampling frequency when generation is slower. The specific preset sampling frequency can be determined according to the actual situation and is not limited by the embodiments of the present application, as long as the configured sampling frequency ensures that every generated log is sampled and no log is sampled repeatedly.
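By way of example only, the adaptive adjustment described above can be sketched in Python as follows; the proportional rule, the bounds and all names are illustrative assumptions rather than limitations of the embodiment.

```python
def adjust_sampling_frequency(current_freq_hz: float,
                              observed_log_rate: float,
                              reference_log_rate: float,
                              min_freq_hz: float = 1.0,
                              max_freq_hz: float = 100.0) -> float:
    """Raise the sampling frequency when logs are generated faster than the
    reference speed, lower it when generation slows down, within fixed bounds."""
    ratio = observed_log_rate / reference_log_rate
    new_freq = current_freq_hz * max(ratio, 0.1)   # scale with the observed speed
    return min(max(new_freq, min_freq_hz), max_freq_hz)
```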
The target log may be a log obtained by sampling the logs with the preset sampling period; each log is sampled once to obtain one target log. The target logs may be all the logs within one preset sampling period; for example, when a 1G sampling period is treated as one sampling round, the target logs are all the logs within that 1G. If there are 500 logs within the 1G, then all 500 logs are target logs.
The number of logs is the number of target logs, that is, the number of target logs obtained when the logs are sampled within the preset sampling period, which equals the number of sampling operations. For example, when a 1G sampling period is treated as one sampling round and there are 500 logs within the 1G, sampling is performed 500 times in total; all 500 logs are target logs, and the number of logs is 500.
Specifically, the logs can be sampled at a certain sampling frequency, and the sampled logs are allocated to the corresponding playback threads until the amount of sampled logs reaches the sampling period of the preset sampling period. For example, when the preset sampling period includes a sampling frequency and a sampling period, the logs are sampled every 5 seconds, and when the sampling period is 1G, sampling continues every 5 seconds until 1G of logs has been sampled; each sampled log is taken as a target log, and the number of target logs is taken as the number of logs. In this way, the load of the logs in each log table can be quantified by the number of logs, the data-processing speed is improved, and the logs can be allocated more accurately later.
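By way of example only, one sampling round as described above can be sketched in Python as follows; fetch_new_logs, table_of and the size_bytes attribute are assumed helper interfaces, not elements defined by the embodiment.

```python
import time
from collections import Counter

def sampling_round(fetch_new_logs, table_of,
                   sample_interval_s: int = 5,
                   period_bytes: int = 1 << 30):
    """Sample every `sample_interval_s` seconds until roughly `period_bytes`
    (e.g. 1G) of logs have been covered; count target logs per log table."""
    target_logs = []
    log_counts = Counter()                    # log table -> number of target logs
    covered = 0
    while covered < period_bytes:
        for log in fetch_new_logs():          # logs generated since the last sample
            target_logs.append(log)
            log_counts[table_of(log)] += 1    # attribute the log to its log table
            covered += log.size_bytes
        time.sleep(sample_interval_s)
    return target_logs, log_counts
```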
Further, the log table corresponding to the target log, that is, the log table to which the target log belongs, may be determined. Specifically, when the target log is a write-ahead log, the log table it belongs to can be identified by fields such as spcNode, dbNode and relNode. In the database, the log contains key metadata that identifies the table file to which the write-ahead log belongs; for example, by parsing the metadata in the write-ahead log, the values of spcNode, dbNode and relNode can be extracted, and the log table to which the write-ahead log belongs can then be determined in combination with the directory structure of the database.
It can be understood that, because the distribution of logs is relatively uniform and stable, the logs are sampled periodically according to the preset sampling period to obtain the target logs, and the log load of each log table within the preset sampling period can be predicted from the number of target logs in each log table. For example, suppose 100 target logs are obtained by sampling according to the preset sampling period, of which 40 belong to log table 1, 10 belong to log table 2, 20 belong to log table 3 and 30 belong to log table 4. Since the logs are sampled uniformly, the load of the target logs in each log table can be quantified by the number of logs, from which it can be judged that log table 1 has the largest load and log table 2 the smallest. In some embodiments, a log-load reference value may also be selected, which characterizes the load of a target log in the general case. For each log table, the predicted load of the log table when it is played back in a playback thread of the standby machine can be predicted by multiplying the log-load reference value by the number of logs in the log table, so that the load of the log table can be predicted quickly and computing resources are greatly saved.
By the method, the plurality of target logs are sampled according to the preset sampling period, the target logs and the number of the logs contained in each log table can be determined, that is, the predicted load amount when each playback thread processes the target logs in the log table in the preset sampling period can be predicted, the load amounts of all the target logs do not need to be calculated one by one, and therefore calculation resources are greatly saved, and the candidate allocation scheme can be obtained conveniently and rapidly.
And 102, carrying out load balancing simulation distribution on the plurality of log tables among the plurality of playback threads based on the log quantity of each log table to obtain candidate distribution schemes.
It will be appreciated that in one sampling round, when one log table has more logs than the other log tables, this means that writing and storing all target logs in that log table consumes more system resources and bandwidth, and thus the load of that log table is also greater than the other log tables. Therefore, according to the number of the logs contained in each log table, the log table is taken as an allocation unit, and the target logs in each log table are allocated to each playback thread in a simulation mode, so that a candidate allocation scheme is obtained, load balancing of each thread is achieved, and the performance of the system is improved.
The playback thread can be a thread for executing log playback operation, and the playback thread is responsible for executing the playback operation of the target log in the standby machine. Specifically, in the database in the high availability mode of the master and the slave, the slave reads the logs one by one through a distributing thread, determines a log table to which each log belongs, distributes each target log into a playback thread by taking the log table as a unit, and plays back the target logs by the playback thread in the slave.
The candidate allocation scheme may be a scheme for determining a target log in a log table that each playback thread needs to play back. By analyzing the log table to which the target log belongs, the predicted load of each log table in a preset sampling period can be deduced, so that each log table is distributed to each playback thread in a simulation manner as much as possible, and a candidate distribution scheme is obtained.
It can be understood that when the number of threads is greater than the number corresponding to the plurality of log tables, the log tables can be directly allocated to each playback thread, that is, one log table is correspondingly allocated to one playback thread, so that each playback thread processes a target log in one log table, thereby realizing load balancing.
In some embodiments, the number of log table packets equal to the number of playback threads may be determined based on the number of playback threads, and the log tables are grouped according to the number of packets by taking the minimum difference between the maximum total predicted load and the minimum total predicted load in all the log table packets as an allocation principle, so as to obtain log table packets corresponding to the number of threads, where the total predicted load of each playback thread is the sum of the predicted loads occupied by one or more log tables in the corresponding log table packets, and then, according to the log table packets corresponding to each playback thread, a candidate allocation scheme is generated. Therefore, the difference between the total predicted load amounts of each playback thread can be minimized, the load of each playback thread is effectively balanced, and the load balancing of the standby machine when playing back the target log is realized.
For example, suppose there are 5 log tables, denoted log table A, log table B, log table C, log table D and log table E, and 3 playback threads, denoted thread 1, thread 2 and thread 3. The sample list generated from the log tables and target logs is ((log table A, 100), (log table B, 60), (log table C, 30), (log table D, 10), (log table E, 10)), where the first item in each bracket is the log table and the second item is the number of sampled target logs for that table, that is, its number of logs. It can then be determined that there are 3 log table groups, and taking the smallest difference between the largest and the smallest total predicted load as the allocation rule, the following grouping can be obtained: log table A forms one group, log table B forms one group, and log table C, log table D and log table E form one group. A candidate allocation scheme is generated from the groups and the corresponding playback threads, namely thread 1: log table A; thread 2: log table B; thread 3: log table C, log table D and log table E.
In some embodiments, the predicted load capacity of the playback thread when processing the log table can be predicted according to the number of logs in each log table, the log table corresponding to the maximum predicted load capacity is distributed to the playback thread with the largest idle processing load, then the idle processing load of the playback thread with the log table distributed is updated, and the above process is repeated, so that the playback thread with more idle processing loads processes more log tables, and load balancing of each playback thread is realized.
It can be understood that the number of the logs contained in each log table can reflect the load of each log table when playback is executed in the playback thread, so that the log tables are distributed according to the number of the logs, and the log tables are distributed according to the load of the target logs in each log table, so that a certain playback thread can be prevented from processing too many logs, the burden of a single playback thread is reduced, and high-concurrency log playback requests can be responded better through load balancing.
In some embodiments, in order to implement load balancing of the playback threads, the log table may be allocated to the corresponding playback thread according to the number of logs in each log table, so that each playback thread in the standby machine can play back the target log in the log table. The log tables with large load capacity can be divided into one of the playback threads to be processed independently, and the log tables with small load capacity can be divided into the same playback thread to be processed, so that the situation that the log tables with large load capacity are distributed into the same playback thread to be processed is avoided, and load balancing is achieved. For example, step 102 may include:
(102.1) predicting a predicted load amount of each log table when playback is performed in the playback thread based on the number of target logs contained in each log table;
And (102.2) carrying out load balancing simulation distribution on the plurality of log tables in the playback threads of the number of threads according to the predicted load capacity occupied by each log table to obtain candidate distribution schemes.
The predicted load may be obtained by predicting the load required by each log table when playback is performed in the playback thread, that is, the amount of resources occupied by each log table when playback is performed in the playback thread. It can be understood that the processing logic of each target log is relatively consistent, the characteristic attributes are similar, and the occupied load amounts are similar, so that the size relation of the predicted load amounts in each log table can be directly determined based on the log amounts of the target logs obtained by sampling.
In some embodiments, a reference load amount may be selected for all target logs; for each log table, taking the number of target logs it contains and the log size as factors, the predicted load amount of the log table when playback is performed in a playback thread is obtained by multiplying the number of logs by the reference load amount. For example, the reference load amount may be set to 1 KB, 2 KB, etc., and is adjusted according to the actual situation.
In some implementations, the predicted load may also be determined based on the number of logs, the log size, and the log processing complexity. For example, the complexity of log processing may be an input/output operation, and the predicted load of the corresponding log table when playback is performed in the playback thread is comprehensively calculated.
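By way of example only, the predicted load amount can be computed as sketched below; the 1 KB reference load and the complexity weight are illustrative assumptions, not prescribed values.

```python
def predict_table_load(log_count: int,
                       reference_load_kb: float = 1.0,
                       io_complexity: float = 1.0) -> float:
    """Simplest form: number of target logs times a per-log reference load,
    optionally weighted by an I/O-complexity factor."""
    return log_count * reference_load_kb * io_complexity
```

With the earlier example of 40, 10, 20 and 30 target logs and a 1 KB reference load, the predicted loads are 40, 10, 20 and 30, again identifying log table 1 as the heaviest.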
In some embodiments, after the predicted load amount of each log table is predicted, the log table may be used as an allocation unit to perform balanced allocation in all playback threads. For example, there are 2 playback threads and 3 log tables, where the predicted load of log table 1 is 100, the predicted load of log table 2 is 10, and the predicted load of log table 3 is 40, then log table 1 may be allocated to one playback thread, and log table 2 and log table 3 may be allocated to another playback thread, so as to obtain a candidate allocation scheme, so as to achieve efficient use of resources and optimization of system performance.
In some embodiments, in order to improve the allocation efficiency, the log tables may be ordered according to the predicted load amount occupied by each log table, so that the log table with the large occupied predicted load amount is allocated first, and then the log table with the small predicted load amount is allocated, so as to ensure that the load amount handled by each playback thread is as close as possible, and improve the efficiency of load balancing, for example, 102.2 may include:
(102.2.1) sorting all log tables according to the predicted load amount occupied by each log table to obtain a sorting result;
(102.2.2) determining the number of threads of the available playback threads;
(102.2.3) when the number of threads is smaller than the number corresponding to the plurality of log tables, according to the sequencing result, carrying out load balancing grouping on the plurality of log tables in the playback threads of the number of threads to obtain log table groups corresponding to the number of threads;
(102.2.4) obtaining candidate allocation schemes according to each playback thread and the log table group corresponding to the playback thread.
The sorting result may be a list or ordering of all the log tables obtained by sorting them according to the predicted load amount occupied by each log table. Specifically, the log tables may be sorted in descending or ascending order of predicted load; generally, when they are sorted in descending order, the log tables can be allocated to the corresponding playback threads starting from the first one.
Wherein the number of threads may be the number of playback threads that determine availability, i.e. the number of playback threads that can be used for parallel playback operations.
The log table grouping may be the grouping obtained by dividing the plurality of log tables among the playback threads of the determined thread number according to the sorting result, with load-balanced allocation. After the log tables are grouped, each playback thread can conveniently play back the log data in its corresponding log table group. It will be appreciated that after grouping, the predicted loads of the log table groups should be as balanced as possible, so that the processing capacity of every playback thread is used effectively.
The candidate allocation scheme may be an allocation scheme obtained according to each playback thread and a log table packet corresponding to the playback thread. The candidate allocation scheme describes the specific log table packets that each playback thread needs to process to direct the operation and task allocation of the playback thread.
By way of example, assume there are four log tables with the following predicted load amounts: log table A: 100; log table B: 150; log table C: 120; log table D: 80.
Firstly, all log tables are sorted according to the predicted load amount occupied by each log table, giving the following sorting result: log table B (150), log table C (120), log table A (100), log table D (80). Assuming that the number of available playback threads is 2, the number of log table groups is also 2.
Secondly, according to the sorting result, the log tables are grouped with load balancing among the playback threads of that thread number, giving the log table groups corresponding to the thread number: thread 1: log table D (80) and log table B (150), with a total predicted load of 230; thread 2: log table A (100) and log table C (120), with a total predicted load of 220.
Finally, the candidate allocation scheme is obtained from each playback thread and its corresponding log table group: thread 1: log table D and log table B; thread 2: log table A and log table C.
It can be understood that different log tables can be distributed to different playback threads by taking the log tables as a distribution unit, and the log tables are distributed to corresponding playback threads by taking the minimum difference of the total predicted load capacity among the playback threads as a distribution principle, so that the load balance of the system is ensured, and the performance and the stability of the system are improved.
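By way of example only, the sorting and grouping described in (102.2.1) to (102.2.4) can be sketched as a greedy packing in Python; ties are broken by thread index here, whereas the embodiment leaves that choice open.

```python
from typing import Dict, List

def build_candidate_scheme(predicted_loads: Dict[str, float],
                           num_threads: int) -> List[List[str]]:
    """Sort log tables by predicted load in descending order, then always place
    the next table on the playback thread with the smallest total so far."""
    groups: List[List[str]] = [[] for _ in range(num_threads)]
    totals = [0.0] * num_threads
    for table, load in sorted(predicted_loads.items(),
                              key=lambda kv: kv[1], reverse=True):
        i = totals.index(min(totals))          # least-loaded playback thread
        groups[i].append(table)
        totals[i] += load
    return groups

# Reproduces the worked example above:
# build_candidate_scheme({"A": 100, "B": 150, "C": 120, "D": 80}, 2)
# -> thread 1: ["B", "D"] (total 230), thread 2: ["C", "A"] (total 220)
```

When the number of threads is not smaller than the number of log tables, the same loop simply places each table on its own thread, matching the direct assignment described earlier.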
In some embodiments, in order to perform load balancing grouping, the log table may be preferentially allocated according to the amount of the available load in each playback thread, so that the log table with the largest predicted load is preferentially allocated to the playback thread with the largest available load, so as to effectively implement load balancing allocation, for example, "according to the sequencing result, load balancing the plurality of log tables in the playback threads with the number of threads, to obtain the log table group corresponding to the number of threads" in (102.2.3), may include:
(102.2.3.1) determining the allocation order of each log table according to the sorting result;
(102.2.3.2) for each log table, selecting, according to the real-time available load of each playback thread, the target playback thread with the largest currently available load for allocation, and updating the available load of the target playback thread after the allocation is completed;
(102.2.3.3) when every log table in the sorting result has been allocated to a playback thread according to its allocation order, obtaining the log table groups corresponding to the thread number.
The allocation order may be a priority order of each log table in the allocation process determined according to the sorting result. Illustratively, in the reverse order sorting, the first log table, i.e., the log table with the largest predicted load, is sorted with the highest priority.
The available load may be the load capacity that each playback thread currently has available for the target logs of newly assigned log tables; the log table with the largest predicted load can be allocated to the playback thread with the largest available load, thereby achieving load balancing.
The target playback thread may be a playback thread selected as a target for current log table allocation, and after the corresponding log table is allocated to the target playback thread, the available load capacity of the target playback thread may be updated so as to facilitate subsequent allocation according to the available load capacities of the target playback thread and other playback threads.
For example, assume the target logs sampled according to the preset sampling period correspond to the following log tables and predicted load amounts: log table A: 5, log table B: 8, log table C: 3, log table D: 4. The allocation order of the log tables can then be determined from the sorting result. Sorting by load amount from largest to smallest, the allocation order is: log table B, log table A, log table D, log table C. Suppose there are playback thread 1, playback thread 2 and playback thread 3, each with an initial available load of 20; the allocation process is as follows:
Firstly, determining the playback thread with the largest available load, wherein the available loads in the 3 playback threads are the same at the moment, so that one playback thread can be randomly selected for allocation, and the available load of the playback thread 1 after allocation is 12 on the assumption that the log table B is allocated to the playback thread 1 according to the allocation sequence.
Thereafter, the playback thread with the largest available load is repeatedly determined, at this time, the available loads of the playback thread 2 and the playback thread 3 are the same and are both greater than the playback thread 1, so that one playback thread can be randomly selected for allocation, for example, the playback thread 2 is selected for allocation, and after the log table a is allocated to the playback thread 2, the available load of the playback thread 2 is updated to be 15.
Further, the playback thread with the largest available load is repeatedly determined, and at this time, the available load of the playback thread 3 is the largest, and therefore, after the log table D is allocated to the playback thread 3, the available load of the playback thread 3 is updated to 16.
Further, the playback thread with the largest available load is again determined; at this time playback thread 3 has the largest available load, so log table C is allocated to playback thread 3. The allocation of all target logs within the preset sampling period is thus completed, and the log table groups corresponding to the thread number are obtained, namely group 1: log table B; group 2: log table A; group 3: log table D and log table C.
By grouping the log tables in the above manner, the log tables can be allocated to different playback threads in a load-balanced way. Moreover, since the playback thread with the largest real-time available load is always selected for allocation, the processing capacity of the playback threads is used to the maximum, the performance and efficiency of the system are improved, and the number of log tables handled by each playback thread remains relatively balanced.
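By way of example only, the available-load bookkeeping of (102.2.3.1) to (102.2.3.3) can be sketched with a max-heap in Python; the initial capacity of 20 mirrors the walk-through above, and ties are resolved by thread index rather than randomly.

```python
import heapq
from typing import Dict, List

def group_by_available_load(predicted_loads: Dict[str, float],
                            num_threads: int,
                            capacity: float = 20.0) -> Dict[int, List[str]]:
    """Hand each log table, from largest predicted load to smallest, to the
    playback thread that currently has the most available load, then shrink
    that thread's available load accordingly."""
    heap = [(-capacity, idx) for idx in range(num_threads)]   # max-heap on available load
    heapq.heapify(heap)
    groups: Dict[int, List[str]] = {idx: [] for idx in range(num_threads)}
    for table, load in sorted(predicted_loads.items(),
                              key=lambda kv: kv[1], reverse=True):
        neg_avail, idx = heapq.heappop(heap)   # thread with the largest available load
        groups[idx].append(table)
        heapq.heappush(heap, (neg_avail + load, idx))   # available load decreases by `load`
    return groups

# With {"A": 5, "B": 8, "C": 3, "D": 4}, 3 threads and capacity 20, the result
# matches the walk-through above: {0: ["B"], 1: ["A"], 2: ["D", "C"]}.
```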
Step 103, reading preset allocation schemes for a plurality of log tables to be synchronized, and determining the difference degree of the preset allocation schemes relative to the candidate allocation schemes.
It will be appreciated that, in general, the system will set a default preset allocation scheme for allocating each log table to a corresponding playback thread for execution. Therefore, it is necessary to compare the difference between the preset allocation scheme and the candidate allocation scheme, if there is no difference between the two schemes, the system continues to execute the preset allocation scheme, if there is a difference between the two schemes, calculate the difference between the two schemes, and determine the target allocation scheme according to the difference, so as to evaluate the influence of the two schemes on the system performance, and perform trade-off and decision.
The preset allocation scheme is the scheme currently executed by default. The preset allocation scheme may randomly allocate the current plurality of log tables to the playback threads, or it may compute a hash value for the log table to which each target log belongs, take that hash value modulo the number of playback threads, and determine the playback thread for the log table from the resulting value. Alternatively, the log tables may be classified and log tables of different types allocated to the corresponding playback threads. However, with the above methods, several log tables may be distributed to the same playback thread, so the load distribution is unbalanced, log processing becomes inefficient, and system performance is reduced.
The degree of difference may be a degree of relative difference between the preset allocation scheme and the candidate allocation scheme. By calculating the degree of difference, the quality or performance difference of the candidate allocation scheme relative to the preset allocation scheme can be determined, so that the target allocation scheme can be selected from the candidate allocation scheme and the preset allocation scheme.
In some embodiments, the execution time of the preset allocation scheme and the execution time of the candidate allocation scheme may be predicted, so as to obtain respective predicted execution times, and the difference degree of the preset allocation scheme and the candidate allocation scheme may be calculated according to the predicted execution times of the preset allocation scheme and the candidate allocation scheme. In some embodiments, standard deviation of each predicted load amount in the preset allocation scheme and the candidate allocation scheme can be calculated, then difference degree is calculated according to the standard deviation between the preset allocation scheme and the candidate allocation scheme, so that the optimal allocation scheme can be selected according to the difference degree, and load balancing, efficient data synchronization and performance optimization can be realized.
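By way of example only, the execution-time variant mentioned above can be sketched as follows; representing a scheme as a mapping from playback thread to log tables and assuming a constant per-log replay time are simplifications, not requirements of the embodiment.

```python
from typing import Dict, List

def predicted_duration(scheme: Dict[int, List[str]],
                       log_counts: Dict[str, int],
                       secs_per_log: float = 0.001) -> float:
    """Playback threads run in parallel, so the slowest thread determines the
    predicted completion time of a scheme."""
    return max(sum(log_counts[table] * secs_per_log for table in tables)
               for tables in scheme.values())

def duration_based_difference(preset: Dict[int, List[str]],
                              candidate: Dict[int, List[str]],
                              log_counts: Dict[str, int]) -> float:
    """Degree of difference expressed as the time the candidate scheme saves."""
    return predicted_duration(preset, log_counts) - predicted_duration(candidate, log_counts)
```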
In some embodiments, to determine the degree of difference between the preset allocation scheme and the candidate allocation scheme, the degree of difference between the preset allocation scheme and the candidate allocation scheme may be calculated, specifically may be calculated according to standard deviations of predicted loads of the log tables in the preset allocation scheme and the candidate allocation scheme, so as to further calculate the degree of balance of loads of the playback threads in the two schemes, for example, "determining the degree of difference between the preset allocation scheme and the candidate allocation scheme" in step 103 may include:
(103.a1) comparing the candidate allocation scheme with a preset allocation scheme to obtain a first comparison result;
(103.a2) when the first comparison result indicates that the candidate allocation scheme is inconsistent with the preset allocation scheme, predicting a predicted load amount of each log table when playback is performed in the playback thread based on the number of target logs contained in each log table;
(103.a3) calculating a first load standard deviation corresponding to a preset allocation scheme and calculating a second load standard deviation of a candidate allocation scheme based on the predicted load amount of each log table;
(103.a4) determining the difference degree of the preset allocation scheme relative to the candidate allocation scheme according to the first load standard deviation and the second load standard deviation.
The first comparison result may be a result obtained by comparing the candidate allocation scheme with a preset allocation scheme, where the first comparison result characterizes consistency or inconsistency between the candidate allocation scheme and the preset allocation scheme.
The predicted load can be obtained by predicting the number of logs contained in each log table, and can be used for evaluating the computing resources required by each log table when playback is performed. It will be appreciated that when the number of logs of the log table is large, the log table requires a large amount of computing resources in performing playback, i.e., a large predicted load amount.
The first load standard deviation may be a measure of load balance calculated from the predicted load amounts of the log tables allocated to each playback thread in the preset allocation scheme; it reflects how widely the load is dispersed across the playback threads in the preset allocation scheme.
The second load standard deviation may be a measure of load balance calculated from the predicted load amounts of the log tables allocated to each playback thread in the candidate allocation scheme; it reflects how widely the load is dispersed across the playback threads in the candidate allocation scheme.
The degree of difference can then be obtained by comparing the first load standard deviation with the second load standard deviation, for example by calculating the ratio of the first load standard deviation to the second load standard deviation.
Illustratively, suppose three playback threads are currently available, denoted thread 1, thread 2 and thread 3. In the preset allocation scheme, the predicted load of thread 1 is the sum of the predicted loads of the log tables assigned to it; assuming the total predicted load of thread 1 is 100, that of thread 2 is 70 and that of thread 3 is 40, the first load standard deviation is 24.49. Similarly, in the candidate allocation scheme, assuming the total predicted load of thread 1 is 100, that of thread 2 is 60 and that of thread 3 is 50, the second load standard deviation is 21.60.
Then, the degree of difference of the preset allocation scheme relative to the candidate allocation scheme can be calculated as the ratio of the first load standard deviation to the second load standard deviation, namely degree of difference = 24.49 / 21.60 ≈ 1.13. Since 1.13 is greater than 1, the standard deviation of the preset allocation scheme is larger than that of the candidate allocation scheme, so the per-thread loads in the preset allocation scheme deviate more from their mean than those in the candidate allocation scheme. From the degree of difference it can therefore be concluded that the candidate allocation scheme achieves better load balance for the system.
By the above method, the degree of difference of the preset allocation scheme relative to the candidate allocation scheme can be determined by calculating the ratio of the first load standard deviation (preset scheme) to the second load standard deviation (candidate scheme). The degree of difference is then compared with 1: when it is greater than 1, the preset allocation scheme is less well balanced and the candidate allocation scheme can be selected. In this way, the load balance and performance of different allocation schemes can be evaluated and compared as a whole, resource allocation can be optimized, and the design of the allocation scheme can be improved.
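A minimal Python sketch of this calculation follows, using the hypothetical per-thread load totals from the example above; it is an illustration rather than a required implementation of the method.

import math

def load_std(per_thread_loads):
    """Population standard deviation of the per-thread predicted loads."""
    mean = sum(per_thread_loads) / len(per_thread_loads)
    variance = sum((x - mean) ** 2 for x in per_thread_loads) / len(per_thread_loads)
    return math.sqrt(variance)

# Hypothetical per-thread predicted load totals from the example above.
preset_loads    = [100, 70, 40]   # preset allocation scheme
candidate_loads = [100, 60, 50]   # candidate allocation scheme

first_std  = load_std(preset_loads)     # approx. 24.49
second_std = load_std(candidate_loads)  # approx. 21.60

degree_of_difference = first_std / second_std  # approx. 1.13
print(f"degree of difference = {degree_of_difference:.2f}")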
In some embodiments, to determine the degree of difference of the preset allocation scheme relative to the candidate allocation scheme, the degree of difference may also be obtained by calculating the difference between the execution durations of the two schemes, so that the better allocation scheme can be determined in an intuitive way. For example, "determining the degree of difference of the preset allocation scheme relative to the candidate allocation scheme" in step 103 may further include:
(103.b1) predicting a first execution duration required for playing back the target logs corresponding to the plurality of log tables according to the mapping relation between the plurality of log tables and the plurality of playback threads in the preset allocation scheme;
(103.b2) predicting a second execution duration of the plurality of playback threads when playback of the target logs corresponding to the plurality of log tables is completed, according to the mapping relation between the plurality of log tables and the plurality of playback threads in the candidate allocation scheme;
(103.b3) determining a time difference between the first execution duration and the second execution duration, and determining the degree of difference of the preset allocation scheme relative to the candidate allocation scheme according to the time difference.
The mapping relationship may be a corresponding relationship between each playback thread and the log table allocated to the playback thread in a preset allocation scheme or a candidate allocation scheme. For example, if thread 1 corresponds to log table 1 and log table 2, then thread 1 has a mapping relationship with log table 1 and log table 2, and the target log of the log table that needs to be processed by each playback thread can be determined by the mapping relationship.
The first execution duration may be the execution time required by the plurality of playback threads to play back the target logs of the plurality of log tables according to the mapping relationship in the preset allocation scheme; that is, it is the time needed to execute all the target logs sampled in the preset sampling period under the preset allocation scheme.
The second execution duration may be the execution time required by the plurality of playback threads to complete playback of the target logs of the plurality of log tables according to the mapping relationship in the candidate allocation scheme; that is, it is the time needed to execute all the target logs sampled in the preset sampling period under the candidate allocation scheme.
The time difference may be a difference between the first execution duration and the second execution duration, and may be used to measure a difference between the preset allocation scheme and the candidate allocation scheme in playback time. A smaller time difference indicates that the two schemes are similar in execution time, and a larger time difference indicates that there is a larger difference.
In this case, the degree of difference may be the time difference between executing the preset allocation scheme and executing the candidate allocation scheme.
Illustratively, the mapping relationship in the preset allocation scheme is: log table A and log table C correspond to playback thread 1; log table B corresponds to playback thread 2; log table D corresponds to playback thread 3.
The mapping relation in the candidate allocation scheme is as follows: the log table A corresponds to the playback thread 1; the log table B and the log table D correspond to the playback thread 2; log table C corresponds to playback thread 3.
According to the mapping relation of the preset allocation scheme, the first execution duration required by all playback threads to complete playback of the target logs of their log tables can be predicted to be 200 seconds. According to the mapping relation of the candidate allocation scheme, the second execution duration can be predicted to be 170 seconds. The time difference between the first execution duration and the second execution duration is then taken as the degree of difference, which is 30 seconds.
In some embodiments, the time required for the concurrent execution of the multiple playback threads of the standby machine can also be predicted directly from the number of logs of the log tables allocated to each playback thread in the preset allocation scheme and the candidate allocation scheme. In general, when an allocation scheme is unbalanced, the playback thread with the larger load needs a longer execution time, so an unbalanced allocation scheme takes longer to execute than a balanced one; the time difference can therefore be used as the degree of difference for selecting the target allocation scheme.
In some embodiments, the first execution duration or the second execution duration may also be predicted as follows: first, historical data are collected, including the actual execution durations of historical logs with different parameters, and a prediction model is built from these parameters and execution durations. Feature engineering may be performed on the parameters to extract relevant features, such as the size and complexity of each historical log, which are then used to predict the execution duration required to play back each log.
In some embodiments, a regression model (e.g., linear regression, decision tree regression, etc.) may be used as the prediction model: it predicts the execution duration required for each historical log from the relevant features, and its parameters are adjusted continuously during training until the model converges.
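A minimal sketch of such a prediction model follows, assuming scikit-learn is available and that each log is described by two hypothetical features (record size and operation count); both the feature choice and the training data are illustrative placeholders, not values from the application.

# A minimal sketch, assuming scikit-learn is available.
from sklearn.linear_model import LinearRegression

# Hypothetical historical data: features of past logs (record size in bytes,
# operation count) and their measured playback durations in milliseconds.
historical_features = [
    [512, 2], [2048, 4], [4096, 10], [1024, 2], [8192, 16],
]
historical_durations_ms = [0.6, 2.1, 4.3, 1.1, 8.7]

model = LinearRegression()
model.fit(historical_features, historical_durations_ms)

# Predict the playback duration of newly sampled target logs.
new_target_logs = [[1536, 3], [6144, 12]]
predicted_ms = model.predict(new_target_logs)
print(predicted_ms)  # per-log predicted execution durations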
Furthermore, because the playback threads run in parallel under both the preset allocation scheme and the candidate allocation scheme, a scheme finishes when its playback thread with the largest predicted load finishes; by that time all other playback threads have completed. The execution duration of the playback thread with the largest predicted load can therefore be taken as the execution duration of the scheme, so that not every playback thread needs to be predicted, which improves allocation efficiency. Specifically, according to the mapping relationship between the plurality of log tables and the plurality of playback threads in the preset allocation scheme, the playback thread with the largest predicted load is determined as a first reference playback thread, and the first execution duration required to play back all the log tables allocated to the first reference playback thread is predicted; according to the mapping relationship between the plurality of log tables and the plurality of playback threads in the candidate allocation scheme, the playback thread with the largest predicted load is determined as a second reference playback thread, and the second execution duration required to play back all the log tables allocated to the second reference playback thread is predicted. The time difference between the first execution duration and the second execution duration is then determined, and the degree of difference of the preset allocation scheme relative to the candidate allocation scheme is determined from the time difference.
Further, after the trained prediction model is obtained, the target logs can be fed into it to obtain the predicted execution duration of each target log. For each log table, the execution durations of all the target logs it contains are accumulated to obtain the execution duration of that log table; for each playback thread, the execution durations of its allocated log tables are then summed to obtain the execution duration of that playback thread. It is understood that the first execution duration and the second execution duration can both be predicted in this manner.
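The accumulation and the bottleneck-thread shortcut can be sketched as follows; the per-table durations are hypothetical values chosen so that the scheme durations reproduce the 200-second and 170-second figures from the earlier example, and the scheme mappings match the example mappings above.

# Hypothetical per-table predicted playback durations in seconds.
table_duration_s = {"A": 120, "B": 90, "C": 80, "D": 80}

# Scheme = mapping from playback thread number to its assigned log tables.
preset_scheme    = {1: ["A", "C"], 2: ["B"], 3: ["D"]}
candidate_scheme = {1: ["A"], 2: ["B", "D"], 3: ["C"]}

def scheme_duration(scheme, durations):
    """Threads run in parallel, so the scheme finishes with its slowest thread."""
    per_thread = {t: sum(durations[tbl] for tbl in tables)
                  for t, tables in scheme.items()}
    return max(per_thread.values())

first_duration  = scheme_duration(preset_scheme, table_duration_s)     # 200 s
second_duration = scheme_duration(candidate_scheme, table_duration_s)  # 170 s
time_difference = first_duration - second_duration                     # 30 s
print(first_duration, second_duration, time_difference)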
In this way, the degree of difference between the preset allocation scheme and the candidate allocation scheme can be determined from the mapping relationships and the predicted execution durations, which helps select a more suitable target allocation scheme subsequently, makes full use of system resources, and improves the overall performance of the system.
Step 104: selecting a target allocation scheme from the candidate allocation scheme and the preset allocation scheme according to the degree of difference.
It will be appreciated that the target allocation scheme may be selected based on the degree of difference between the candidate allocation scheme and the preset allocation scheme. Specifically, since the preset allocation scheme is the scheme the system is currently executing, adopting the candidate allocation scheme as the target allocation scheme requires the target logs of the log tables currently being executed in each playback thread to finish first, and only then is the adjustment made, so as to avoid data inconsistency or errors. Therefore, if the degree of difference is small, the preset allocation scheme can continue to be executed; if the degree of difference is large, the candidate allocation scheme should be taken as the target allocation scheme, thereby achieving load balancing and improving resource utilization.
In some embodiments, in order to determine from the degree of difference whether the candidate allocation scheme needs to replace the currently executing preset allocation scheme as the target allocation scheme, a difference threshold may be set and the degree of difference compared against it to decide whether an adjustment is needed. For example, step 104 may include:
(104.1) obtaining a difference threshold;
(104.2) comparing the degree of difference with the difference threshold;
(104.3) when the degree of difference is greater than the difference threshold, taking the candidate allocation scheme as the target allocation scheme.
The difference threshold may be a standard reference value and may be set empirically. It can be appreciated that when the currently executing preset allocation scheme is adjusted to the candidate allocation scheme, a certain time is needed to wait for the playback threads to finish executing the target logs of the log tables allocated at the previous moment. The difference threshold may therefore be increased appropriately, so that the preset allocation scheme remains the target allocation scheme when the candidate allocation scheme is only slightly better, for example when the first execution duration of the preset allocation scheme is only slightly longer than the second execution duration of the candidate allocation scheme.
In some embodiments, the difference threshold may be set according to the actual situation. For example, when the degree of difference is a time difference, the difference threshold is also expressed as a time value. Suppose the difference threshold is set to 10 seconds and subtracting the second execution duration of the candidate allocation scheme from the first execution duration of the preset allocation scheme yields a time difference of 5 seconds. The time difference is greater than 0, which indicates that the candidate allocation scheme is better than the preset allocation scheme; however, because the time difference does not exceed the difference threshold, the preset allocation scheme is still used as the target allocation scheme. That is, the standby machine continues to play back the target logs according to the default preset allocation scheme, and after new target logs are sampled in the next preset sampling period, the degree of difference is recalculated to determine whether the current default allocation scheme needs to be adjusted.
In some embodiments, if the degree of difference is calculated as the ratio of the first load standard deviation to the second load standard deviation and the difference threshold is set to 3, then when the degree of difference is greater than 1, for example 1.13, the candidate allocation scheme is better than the preset allocation scheme; however, because the degree of difference does not exceed the difference threshold, the preset allocation scheme is still used as the target allocation scheme.
In some embodiments, when the degree of difference is greater than the difference threshold, the candidate allocation scheme is significantly better than the preset allocation scheme; therefore, in that case the candidate allocation scheme may be taken as the target allocation scheme.
It will be appreciated that setting a difference threshold avoids frequently replacing the allocation scheme for minor performance improvements, which reduces system fluctuations and unnecessary management overhead. At the same time, it means that replacing the currently executing preset allocation scheme is considered only when the candidate allocation scheme brings a significant improvement, thereby ensuring stable operation of the system and improving its performance.
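The selection step itself is a simple comparison; the following sketch uses the hypothetical threshold values from the examples above (10 seconds for the time-difference variant and 3 for the standard-deviation-ratio variant).

def select_target_scheme(degree_of_difference, difference_threshold,
                         preset_scheme, candidate_scheme):
    """Keep the currently executing preset scheme unless the candidate scheme
    is better by more than the threshold."""
    if degree_of_difference > difference_threshold:
        return candidate_scheme
    return preset_scheme

# Time-difference variant: a 5 s improvement does not exceed a 10 s threshold,
# so the preset scheme is kept.
print(select_target_scheme(5, 10, "preset", "candidate"))     # preset

# Standard-deviation-ratio variant: 1.13 does not exceed a threshold of 3,
# so the preset scheme is also kept.
print(select_target_scheme(1.13, 3, "preset", "candidate"))   # preset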
Step 105: allocating the target logs in each log table to the corresponding playback threads for playback according to the target allocation scheme.
Specifically, the target allocation scheme records, for each log table containing target logs sampled in the preset sampling period, the playback thread that table corresponds to. The target logs in each log table are therefore allocated to the corresponding playback thread so that it can play them back, which achieves parallel playback of the log tables and improves the processing efficiency and performance of the system.
It can be understood that after the target allocation scheme is selected from the preset allocation scheme and the candidate allocation scheme, if the preset allocation scheme is the target allocation scheme, the log tables do not need to be reassigned; if the candidate allocation scheme is the target allocation scheme, a log table to be adjusted is moved to its new playback thread only after the target logs sampled for it in the previous preset sampling period have finished executing in its previous playback thread. This ensures the continuity and stability of the playback threads and avoids data loss or inconsistency caused by reassignment midway. Thus, before step 105, the method may include:
(c1) When the candidate allocation scheme is used as a target allocation scheme, determining a first thread number of a playback thread mapped correspondingly to each log table according to the target allocation scheme;
(c2) Determining a historical thread number allocated to each log table at the last moment;
(c3) Comparing the first thread number with the historical thread number to obtain a second comparison result;
(c4) When the second comparison result indicates that the first thread number is inconsistent with the history thread number, continuously executing a target log obtained by sampling the log table at the last moment in a history playback thread corresponding to the history thread number;
(c5) And for each log table, after the history playback thread finishes executing the target log obtained by sampling the log table at the last moment, executing the step of distributing the target log in each log table to the corresponding playback thread for playback according to the target distribution scheme.
The first thread number may be, for each log table, the number of the playback thread that plays back its target logs according to the target allocation scheme; that is, the first thread number identifies the target playback thread to which each log table is allocated in the target allocation scheme.
The historical thread number may be the number of the playback thread allocated to each log table at the previous time, and the historical thread number may be the number of the playback thread allocated to the log table in the previous sampling round, or the number of the playback thread allocated to the log table by default in the current sampling round. Specifically, one sampling round corresponds to one preset sampling period.
The second comparison result is a result of comparing the first thread number with the historical thread number, and when the first thread number is inconsistent with the historical thread number, the second comparison result is in an inconsistent state.
The history playback thread is a playback thread corresponding to the history thread number, that is, a playback thread that performs playback on the log table before the target allocation scheme is executed.
For example, suppose the playback thread allocated to log table A in the target allocation scheme is thread 1, while the historical playback thread of log table A at the previous moment is thread 2; the comparison result of thread 1 and thread 2 is then inconsistent. If log table A still has 5 target logs that have not been played back in thread 2, the standby machine first finishes executing those 5 target logs, and only then executes the target logs of log table A in the playback thread numbered thread 1 according to the target allocation scheme, continuing the playback of the target logs.
In this way, log loss and data inconsistency during playback can be avoided, and the continuity and stability of log playback are ensured. Meanwhile, reallocating the log tables according to the target allocation scheme optimizes the load balance of the playback threads and improves the performance and efficiency of the system.
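A simplified Python sketch of this hand-over logic is given below; the scheme mappings, the pending-log counts and the play_back stub are hypothetical placeholders used only to illustrate the comparison of thread numbers and the deferred reassignment.

# Hypothetical data: the target thread per table under the new target
# allocation scheme, the thread each table used at the previous moment, and
# the number of target logs from the previous moment still unplayed there.
target_thread  = {"A": 1, "B": 2, "C": 3, "D": 2}
history_thread = {"A": 2, "B": 2, "C": 1, "D": 3}
pending_logs   = {"A": 5, "B": 0, "C": 0, "D": 2}

def play_back(table, thread_no):
    # Stub standing in for handing the table's new target logs to the thread.
    print(f"log table {table} -> playback thread {thread_no}")

for table, new_thread in target_thread.items():
    old_thread = history_thread[table]
    if new_thread != old_thread and pending_logs[table] > 0:
        # Second comparison result is "inconsistent": let the historical
        # playback thread finish the logs sampled at the previous moment first.
        print(f"waiting for thread {old_thread} to finish "
              f"{pending_logs[table]} pending logs of table {table}")
        continue
    # Either the mapping is unchanged or the historical thread has drained,
    # so the table's target logs go to the thread from the target scheme.
    play_back(table, new_thread)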
According to the embodiments of the application, the logs to be synchronized are acquired and sampled according to the preset sampling period, so that the log table to which each log belongs, the target logs, and the number of logs contained in each log table are determined; based on the number of logs in each log table, load-balancing simulation allocation of the plurality of log tables among the plurality of playback threads is performed to obtain a candidate allocation scheme; the preset allocation scheme for the plurality of log tables to be synchronized is read, and the degree of difference of the preset allocation scheme relative to the candidate allocation scheme is determined; a target allocation scheme is selected from the candidate allocation scheme and the preset allocation scheme according to the degree of difference; and the target logs in each log table are allocated to the corresponding playback threads for playback according to the target allocation scheme. In this way, the load contributed by each log table can be determined from the number of logs obtained by sampling, so that load-balancing simulation allocation among the playback threads can be performed according to the number of logs in each log table, avoiding load imbalance and improving the overall performance of the system. Moreover, selecting the target allocation scheme from the candidate allocation scheme and the preset allocation scheme according to the degree of difference makes full use of system resources and achieves load balancing. In summary, the application can achieve load balancing of the system and improve its performance.
In order to verify the actual effect of the data processing method, it was compared with a default log allocation method, in which a hash value is calculated for each log table, a modulo value is taken, and the target logs in the log tables are allocated according to the modulo value. Specifically, performance tests were performed on an openGauss database with both the data processing method and the default method, with the log tables allocated and played back simultaneously under the conditions of 1000 bins and 100 playback threads.
Experimental results show that, compared with the default allocation method, the data processing method of the application improves the playback speed of the standby machine for the target logs by 30%. The method can therefore greatly increase the speed at which the standby machine plays back target logs, which is of particular significance in large-scale log playback scenarios, and can substantially save computing resources and improve system performance.
Referring to fig. 3, an embodiment of the present application further provides a data processing apparatus, which may implement the above data processing method, where the data processing apparatus includes:
The sampling module 31 is configured to obtain a plurality of log tables to be synchronized, and sample the plurality of logs according to a preset sampling period, so as to determine a target log and the number of logs included in each log table;
The allocation module 32 is configured to perform load-balancing simulation allocation on the plurality of log tables among the plurality of playback threads based on the log number of each log table, so as to obtain a candidate allocation scheme;
A determining module 33, configured to read preset allocation schemes for a plurality of log tables to be synchronized, and determine a degree of difference between the preset allocation schemes and the candidate allocation schemes;
a selecting module 34, configured to select a target allocation scheme from the candidate allocation scheme and a preset allocation scheme according to the degree of difference;
and the playback module 35 is configured to allocate the target log in each log table to a corresponding playback thread for playback according to the target allocation scheme.
The specific implementation of the data processing apparatus is substantially the same as the specific embodiment of the data processing method described above, and will not be described herein. On the premise of meeting the requirements of the embodiment of the application, the data processing device can be further provided with other functional modules so as to realize the data processing method in the embodiment.
The embodiment of the application also provides computer equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the data processing method when executing the computer program. The computer equipment can be any intelligent terminal including a tablet personal computer, a vehicle-mounted computer and the like.
Referring to fig. 4, fig. 4 illustrates a hardware structure of a computer device according to another embodiment, where the computer device includes:
The processor 41 may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc., for executing related programs to implement the technical solutions provided by the embodiments of the present application;
Memory 42 may be implemented in the form of read-only memory (Read Only Memory, ROM), static storage, dynamic storage, or random access memory (Random Access Memory, RAM). The memory 42 may store an operating system and other application programs; when the technical solutions provided in the embodiments of the present disclosure are implemented by software or firmware, the relevant program code is stored in the memory 42 and invoked by the processor 41 to execute the data processing method of the embodiments of the present disclosure;
An input/output interface 43 for implementing information input and output;
The communication interface 44 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (such as USB, network cable, etc.), or may implement communication in a wireless manner (such as mobile network, WIFI, bluetooth, etc.);
A bus 45 for transferring information between various components of the device, such as the processor 41, memory 42, input/output interface 43, and communication interface 44;
Wherein the processor 41, the memory 42, the input/output interface 43 and the communication interface 44 are in communication connection with each other inside the device via a bus 45.
The embodiment of the application also provides a computer readable storage medium, which stores a computer program, and the computer program realizes the data processing method when being executed by a processor.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by persons skilled in the art that the embodiments of the application are not limited by the illustrations, and that more or fewer steps than those shown may be included, or certain steps may be combined, or different steps may be included.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" and "a plurality" means one or more, and "a plurality" means two or more. "and/or" for describing the association relationship of the association object, the representation may have three relationships, for example, "a and/or B" may represent: only a, only B and both a and B are present, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided by the present application, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative, e.g., the division of the above elements is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including multiple instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method of the various embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing a program.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, and are not thereby limiting the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (10)

1. A method of data processing, the method comprising:
Acquiring a plurality of log tables to be synchronized, and sampling the plurality of logs according to a preset sampling period to determine target logs and the number of the logs contained in each log table;
Based on the log quantity of each log table, carrying out load balancing simulation distribution on a plurality of log tables among a plurality of playback threads to obtain candidate distribution schemes;
Reading preset allocation schemes aiming at the plurality of log tables to be synchronized, and determining the difference degree of the preset allocation schemes relative to the candidate allocation schemes;
selecting a target allocation scheme from the candidate allocation scheme and the preset allocation scheme according to the difference degree;
When the candidate allocation scheme is used as a target allocation scheme, determining a first thread number of a playback thread mapped correspondingly to each log table according to the target allocation scheme;
Determining a historical thread number allocated to each log table at the last moment;
Comparing the first thread number with the historical thread number to obtain a second comparison result;
When the second comparison result indicates that the first thread number is inconsistent with the history thread number, continuously executing a target log obtained by sampling the log table at the last moment in a history playback thread corresponding to the history thread number;
And for each log table, after the history playback thread finishes executing the target log obtained by sampling the log table at the last moment, distributing the target log in each log table to the corresponding playback thread for playback according to the target distribution scheme.
2. The method according to claim 1, wherein the performing load-balanced simulation allocation of the plurality of log tables among the plurality of playback threads based on the number of logs in each log table to obtain the candidate allocation scheme includes:
Predicting a predicted load amount of each log table when playback is performed in the playback thread based on the number of target logs contained in each log table;
And carrying out load balancing simulation distribution on a plurality of log tables in the playback threads of the thread number according to the predicted load capacity occupied by each log table to obtain candidate distribution schemes.
3. The data processing method according to claim 2, wherein performing load balancing simulation allocation on the plurality of log tables in the playback threads of the number of threads according to the predicted load amount occupied by each log table to obtain a candidate allocation scheme includes:
Sequencing all the log tables according to the predicted load capacity occupied by each log table to obtain sequencing results;
determining the number of threads of the available playback threads;
When the number of threads is smaller than the number corresponding to the plurality of log tables, according to the sequencing result, carrying out load balancing grouping on the plurality of log tables in the playback threads of the number of threads to obtain log table grouping corresponding to the number of threads;
and obtaining candidate allocation schemes according to each playback thread and the log table group corresponding to the playback thread.
4. The method for processing data according to claim 3, wherein said grouping the plurality of log tables in the playback threads of the number of threads according to the sorting result to obtain the log table group corresponding to the number of threads comprises:
determining the corresponding allocation sequence of each log table according to the sequencing result;
for each log table, selecting, according to the real-time available load of each playback thread, the target playback thread with the largest currently available load for distribution, and updating the available load of the target playback thread after the distribution is finished;
and when each log table in the sequencing result is distributed to the playback thread according to the corresponding distribution sequence, acquiring log table groups corresponding to the thread number.
5. The data processing method according to claim 1, wherein the determining a degree of difference of the preset allocation scheme with respect to the candidate allocation scheme includes:
comparing the candidate allocation scheme with the preset allocation scheme to obtain a first comparison result;
When the first comparison result indicates that the candidate allocation scheme is inconsistent with the preset allocation scheme, predicting the predicted load capacity of each log table when playback is executed in the playback thread based on the number of target logs contained in each log table;
calculating a first load standard deviation corresponding to the preset allocation scheme and a second load standard deviation of the candidate allocation scheme based on the predicted load capacity of each log table;
And determining the difference degree of the preset distribution scheme relative to the candidate distribution scheme according to the first load standard deviation and the second load standard deviation.
6. The data processing method according to claim 1 or 4, wherein selecting a target allocation scheme from among the candidate allocation scheme and the preset allocation scheme according to the degree of difference comprises:
acquiring a difference threshold;
Comparing the degree of difference to the degree of difference threshold;
And when the difference degree is larger than the difference degree threshold value, taking the candidate allocation scheme as a target allocation scheme.
7. The data processing method according to claim 1, wherein the determining a degree of difference of the preset allocation scheme with respect to the candidate allocation scheme further comprises:
predicting a first execution duration required for playing back target logs corresponding to a plurality of log tables according to the mapping relation between the log tables and a plurality of playback threads in the preset allocation scheme;
Predicting second execution time length of the plurality of playback threads when playback is completed on target logs corresponding to the plurality of log tables according to the mapping relation between the plurality of log tables and the plurality of playback threads in the candidate allocation scheme;
and determining a time difference between the first execution time length and the second execution time length, and determining the degree of difference of the preset allocation scheme relative to the candidate allocation scheme according to the time difference.
8. A data processing apparatus, the apparatus comprising:
the sampling module is used for acquiring a plurality of log tables to be synchronized, and sampling the logs according to a preset sampling period to determine target logs and the number of the logs contained in each log table;
the distribution module is used for carrying out load balancing simulation distribution on a plurality of log tables among a plurality of playback threads based on the log quantity of each log table to obtain candidate distribution schemes;
the determining module is used for reading preset allocation schemes aiming at the plurality of log tables to be synchronized and determining the difference degree of the preset allocation schemes relative to the candidate allocation schemes;
the selecting module is used for selecting a target distribution scheme from the candidate distribution scheme and the preset distribution scheme according to the difference degree;
The playback module is used for determining a first thread number of a playback thread corresponding to the mapping of each log table according to the target allocation scheme when the candidate allocation scheme is used as the target allocation scheme; determining a historical thread number allocated to each log table at the last moment; comparing the first thread number with the historical thread number to obtain a second comparison result; when the second comparison result indicates that the first thread number is inconsistent with the history thread number, continuously executing a target log obtained by sampling the log table at the last moment in a history playback thread corresponding to the history thread number; and for each log table, after the history playback thread finishes executing the target log obtained by sampling the log table at the last moment, distributing the target log in each log table to the corresponding playback thread for playback according to the target distribution scheme.
9. A computer device, characterized in that it comprises a memory storing a computer program and a processor implementing the data processing method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the data processing method of any one of claims 1 to 7.
CN202410398004.3A 2024-04-03 2024-04-03 Data processing method, device, computer equipment and storage medium Active CN117992240B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410398004.3A CN117992240B (en) 2024-04-03 2024-04-03 Data processing method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410398004.3A CN117992240B (en) 2024-04-03 2024-04-03 Data processing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117992240A CN117992240A (en) 2024-05-07
CN117992240B true CN117992240B (en) 2024-07-09

Family

ID=90891375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410398004.3A Active CN117992240B (en) 2024-04-03 2024-04-03 Data processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117992240B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115687283A (en) * 2022-11-03 2023-02-03 中国农业银行股份有限公司 Log-based playback method and device, electronic equipment and medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11455230B2 (en) * 2019-11-20 2022-09-27 International Business Machines Corporation Event specific log file generation
CN114116665B (en) * 2021-11-22 2023-07-25 北京海量数据技术股份有限公司 Method for writing transaction log in parallel in database to promote processing efficiency
CN115309550A (en) * 2022-08-04 2022-11-08 天津神舟通用数据技术有限公司 MPP parallel database instance level copy balancing method
CN115658310A (en) * 2022-10-27 2023-01-31 中国联合网络通信集团有限公司 Log playback method, device, equipment and medium
CN115994053A (en) * 2022-11-25 2023-04-21 金篆信科有限责任公司 Parallel playback method and device of database backup machine, electronic equipment and medium
CN117076195A (en) * 2023-08-11 2023-11-17 中国建设银行股份有限公司 Parameter adjusting method and device, storage medium and electronic device
CN117130871B (en) * 2023-10-26 2024-04-05 本原数据(北京)信息技术有限公司 Parallel playback method and device for database logs and nonvolatile storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115687283A (en) * 2022-11-03 2023-02-03 中国农业银行股份有限公司 Log-based playback method and device, electronic equipment and medium

Also Published As

Publication number Publication date
CN117992240A (en) 2024-05-07

Similar Documents

Publication Publication Date Title
CN112162865B (en) Scheduling method and device of server and server
KR101600129B1 (en) Application efficiency engine
CN102143022B (en) Cloud measurement device and method for IP network
US8572621B2 (en) Selection of server for relocation of application program based on largest number of algorithms with identical output using selected server resource criteria
US10810054B1 (en) Capacity balancing for data storage system
CN104166589A (en) Heartbeat package processing method and device
Costa et al. Towards automating the configuration of a distributed storage system
CN113590281A (en) Distributed parallel fuzzy test method and system based on dynamic centralized scheduling
Dai et al. Two-choice randomized dynamic I/O scheduler for object storage systems
CN111694653A (en) Method, device and system for adjusting distribution of calculation operator types in calculation system
CN111858656A (en) Static data query method and device based on distributed architecture
WO2024174920A1 (en) Metadata load balancing method , apparatus and device, and non-volatile readable storage medium
Guo et al. Fast replica recovery and adaptive consistency preservation for edge cloud system
CN114281256A (en) Data synchronization method, device, equipment and medium based on distributed storage system
CN117992240B (en) Data processing method, device, computer equipment and storage medium
CN109032809A (en) Heterogeneous parallel scheduling system based on remote sensing image storage position
Guo et al. Handling data skew at reduce stage in Spark by ReducePartition
CN112433838A (en) Batch scheduling method, device, equipment and computer storage medium
CN113485828A (en) Distributed task scheduling system and method based on quartz
Luo et al. Towards efficiently supporting database as a service with QoS guarantees
CN111506422A (en) Event analysis method and system
CN113360455B (en) Data processing method, device, equipment and medium of super fusion system
Gu et al. Push-Based Network-efficient Hadoop YARN Scheduling Mechanism for In-Memory Computing
Sharafi et al. Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
CN112181755B (en) Method for determining parameter threshold in search service and related equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant