CN113127427A - Method, system and device for analyzing transaction distribution in database log - Google Patents

Method, system and device for analyzing transaction distribution in database log Download PDF

Info

Publication number
CN113127427A
CN113127427A CN202110430915.6A CN202110430915A CN113127427A CN 113127427 A CN113127427 A CN 113127427A CN 202110430915 A CN202110430915 A CN 202110430915A CN 113127427 A CN113127427 A CN 113127427A
Authority
CN
China
Prior art keywords
transaction
database
binary
log file
binary log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110430915.6A
Other languages
Chinese (zh)
Other versions
CN113127427B (en
Inventor
王绍晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yingxin Computer Technology Co Ltd
Original Assignee
Shandong Yingxin Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Yingxin Computer Technology Co Ltd filed Critical Shandong Yingxin Computer Technology Co Ltd
Priority to CN202110430915.6A priority Critical patent/CN113127427B/en
Publication of CN113127427A publication Critical patent/CN113127427A/en
Application granted granted Critical
Publication of CN113127427B publication Critical patent/CN113127427B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/1734Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2308Concurrency control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a system and a device for analyzing transaction distribution in database logs, which are used for acquiring a binary log file corresponding to a database, and autonomously analyzing binary contents of the binary log file to obtain an analysis text of the binary log file; in the process of analyzing the binary content, according to the keywords representing the distribution condition of each transaction, the distribution information of each transaction corresponding to the binary log file is automatically obtained; and according to a preset large transaction distinguishing rule, autonomously distinguishing the distribution information of the large transactions from the distribution information of each transaction so as to optimize the processing of the large transactions in the database based on the distribution information of the large transactions. Therefore, the method and the device can automatically analyze the binary log file corresponding to the database and automatically analyze the distribution information of the large affairs in the log file, thereby improving the log analysis efficiency.

Description

Method, system and device for analyzing transaction distribution in database log
Technical Field
The invention relates to the field of database performance optimization, in particular to a method, a system and a device for analyzing transaction distribution in database logs.
Background
The MySQL binlog is an unreadable binary log file and is mainly used for recording operations of updating database contents, such as adding, deleting, modifying and the like, inside MySQL (a large database). At present, analysis of transactions in binlog plays an important role in MySQL performance, mainly large transactions in binlog (which means long-time uncommitted transactions in MySQL), because the operation of large transactions occupies system resources for a long time, resulting in a large amount of blocked transactions and seriously affecting the performance of a database; and large transactions are also one of the most common causes of master-slave latency in master-slave database synchronization.
In the prior art, a method for analyzing a transaction in MySQL binlog comprises the following steps: the MySQL self-contained MySQL brinlog tool or a third-party development tool (binlog2sql, myflash and other tools) is utilized to analyze the binlog into a readable text file for a user to check, so that the distribution situation of the transactions in the binlog is obtained through manually analyzing the readable text file, and the distribution situation and related information of the large transactions are determined, so that a basis is provided for optimizing the processing of the large transactions, and the database performance is improved. However, the time taken to manually analyze the text file is long, resulting in inefficient log analysis.
Therefore, how to provide a solution to the above technical problem is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a method, a system and a device for analyzing transaction distribution in database logs, which can automatically analyze binary log files corresponding to a database and automatically analyze the distribution information of large transactions in the log files, thereby improving the log analysis efficiency.
In order to solve the above technical problem, the present invention provides a method for analyzing transaction distribution in a database log, comprising:
acquiring a binary log file corresponding to a database, and performing autonomous analysis on binary contents of the binary log file to obtain an analysis text of the binary log file;
in the process of analyzing the binary content, according to the keywords representing the distribution condition of each transaction, the distribution information of each transaction corresponding to the binary log file is automatically obtained;
and according to a preset large transaction distinguishing rule, autonomously distinguishing the distribution information of the large transactions from the distribution information of each transaction so as to optimize the processing of the large transactions in the database based on the distribution information of the large transactions.
Preferably, the process of obtaining the binary log file corresponding to the database includes:
monitoring whether an index file of a binary log file corresponding to the database is changed;
if yes, triggering the backup of the local binary log file corresponding to the index file before the change occurs, transmitting the backup binary log file to a remote end so as to obtain the binary log file corresponding to the database at the remote end, and then executing the operation of analyzing the transaction distribution in the obtained binary log file at the remote end.
Preferably, the process of transmitting the backed-up binary log file to a remote location includes:
transmitting the backup binary log file to a remote end by taking a data block as a unit;
and the method for analyzing the transaction distribution in the database log further comprises the following steps:
calculating a first hash value of a target data block which is not transmitted to a remote end according to a hash algorithm; wherein the target data block is any data block;
after the target data block is transmitted to a remote end, calculating a second hash value of the target data block received by the remote end according to a hash algorithm;
after the last data block corresponding to the backup binary log file is transmitted to a remote end, judging whether the first hash value of the target data block is consistent with the second hash value of the target data block;
if all the data blocks are uniform, determining that the backup binary log file is not damaged in the transmission process; otherwise, the backup binary log file is determined to be damaged in the transmission process.
Preferably, the process of autonomously analyzing the binary content of the binary log file to obtain the analysis text of the binary log file includes:
decomposing the binary log file into a plurality of binary files, and converting each binary file into a readable binary file so as to read binary contents of each readable binary file;
translating the binary content of each readable binary file according to the corresponding relation of the binary text to obtain a translation text of each readable binary file;
and merging the translation texts of the readable binary files to obtain the analysis text of the binary log file.
Preferably, the process of autonomously obtaining the distribution information of each transaction corresponding to the binary log file according to the keyword representing the distribution condition of each transaction includes:
correspondingly calculating the size of each transaction according to the key word representing the beginning of each transaction and the key word representing the end of each transaction;
correspondingly calculating the starting time and the ending time of each transaction according to the keywords representing the starting time of each transaction and the keywords representing the ending time of each transaction;
calculating the total number of the transactions according to the GTID number representing the id information of each transaction;
correspondingly calculating the starting position and the ending position of each transaction according to the keywords representing the starting positions of the transactions and the keywords representing the ending positions of the transactions;
and summarizing the information obtained by calculating each transaction to obtain the distribution information of each transaction.
Preferably, the method for resolving transaction distribution in a database log further includes:
extracting SQL contents corresponding to the large transaction from the analysis text according to the starting position and the ending position of the large transaction;
and visually displaying the total number of the transactions, the total number of the large transactions, the distribution information of the large transactions, the SQL contents corresponding to the distribution information and the optimization processing suggestions of the large transactions.
Preferably, the setting process of the preset large transaction distinguishing rule includes:
acquiring the average CPU utilization rate of a server for monitoring the database, QPS information of the database and the average master-slave delay time of the database;
judging whether the average CPU utilization rate exceeds a preset utilization rate threshold, whether the QPS information exceeds a preset number threshold and whether the average master-slave delay time exceeds a preset time threshold;
if the average CPU utilization rate exceeds a preset utilization rate threshold value and/or the QPS information exceeds a preset number threshold value and/or the average master-slave delay time exceeds a preset time threshold value, taking the transaction with the transaction size larger than a preset first threshold value and the execution time exceeding the preset first time threshold value as a large transaction; otherwise, taking the transaction with the transaction size larger than the preset second threshold and the execution time exceeding the preset second time threshold as a large transaction; wherein the preset first threshold is smaller than the preset second threshold; the preset first time threshold is smaller than the preset second time threshold.
Preferably, the process of obtaining the average CPU utilization of the server for monitoring the database, the QPS information of the database, and the average master-slave delay time of the database includes:
calculating the average CPU utilization rate of the servers for monitoring the database according to busy ═ (CPU1+ CPU2+ … CPUn)/count 1; wherein busy is the average utilization rate of the CPU; the CPUn is the CPU utilization rate of the server acquired at the nth time; count1 is the total collection times of the CPU utilization rate;
calculating QPS information of the database according to (com _ delete + com _ insert + com _ update)/diff _ time; wherein QPS is the transaction number submitted by the database per second; com _ delete is the number of times per second that the database deletes data, and is an accumulated value; com insert is the number of data insertions per second of the database, which is a cumulative value; com _ update is the number of data updates per second of the database, and is an accumulated value; diff _ time is the time difference;
calculating the average master-slave delay time of the database according to the delay (second _ slave _ master1+ … + second _ slave _ master/count 2; wherein, second _ before _ master n is the master-slave delay time of the database collected at the nth time; count2 is the total number of acquisitions for the master slave delay time.
In order to solve the above technical problem, the present invention further provides a system for analyzing transaction distribution in a database log, including:
the analysis module is used for acquiring a binary log file corresponding to a database and autonomously analyzing the binary content of the binary log file to obtain an analysis text of the binary log file;
the distribution module is used for autonomously obtaining the distribution information of each transaction corresponding to the binary log file according to the keywords representing the distribution condition of each transaction in the process of analyzing the binary content;
and the distinguishing module is used for autonomously distinguishing the distribution information of the large transactions from the distribution information of each transaction according to a preset large transaction distinguishing rule so as to optimize the processing of the large transactions in the database based on the distribution information of the large transactions.
In order to solve the above technical problem, the present invention further provides a device for analyzing transaction distribution in a database log, including:
a storage module for storing a computer program;
and the processing module is used for executing the computer program to realize the steps of any method for analyzing the transaction distribution in the database log.
The invention provides a method for analyzing transaction distribution in database logs, which comprises the steps of obtaining a binary log file corresponding to a database, and autonomously analyzing binary contents of the binary log file to obtain an analysis text of the binary log file; in the process of analyzing the binary content, according to the keywords representing the distribution condition of each transaction, the distribution information of each transaction corresponding to the binary log file is automatically obtained; and according to a preset large transaction distinguishing rule, autonomously distinguishing the distribution information of the large transactions from the distribution information of each transaction so as to optimize the processing of the large transactions in the database based on the distribution information of the large transactions. Therefore, the method and the device can automatically analyze the binary log file corresponding to the database and automatically analyze the distribution information of the large affairs in the log file, thereby improving the log analysis efficiency.
The invention also provides a system and a device for analyzing the transaction distribution in the database log, and the system and the device have the same beneficial effects as the analysis method.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the prior art and the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a flowchart of a method for analyzing transaction distributions in a database log according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a system for analyzing transaction distribution in a database log according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a method, a system and a device for analyzing the transaction distribution in the database log, which can automatically analyze the binary log file corresponding to the database and automatically analyze the distribution information of the large transactions in the log file, thereby improving the log analysis efficiency.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for analyzing transaction distributions in a database log according to an embodiment of the present invention.
The method for analyzing the transaction distribution in the database log comprises the following steps:
step S1: and acquiring a binary log file corresponding to the database, and autonomously analyzing the binary content of the binary log file to obtain an analysis text of the binary log file.
Specifically, taking the MySQL database as an example, the binlog of MySQL needs to be opened in advance, specifically, the log _ bin parameter corresponding to MySQL is set to be on (open state), and the binlog mode is set to be row mode (that is, the log content in the row mode can clearly record the details of modifying each row of data in the database), so as to generate a local binlog file (binary log file).
Based on this, the method and the device for automatically analyzing the binary log file obtain the binary log file corresponding to the database, and then automatically analyze the binary content of the binary log file corresponding to the database to obtain the analysis text of the binary log file corresponding to the database, namely the text readable and understandable by the user.
Step S2: and in the process of analyzing the binary content, automatically obtaining the distribution information of each transaction corresponding to the binary log file according to the keywords representing the distribution condition of each transaction.
Specifically, the content parsed from the binary log file corresponding to the database includes: a key "begin" representing the start of each Transaction, a key "commit" representing the end of each Transaction, a key "timestamp" representing the start and end times of each Transaction, a GTID (Global Transaction Identifier) number representing the id (Identifier) information of each Transaction, and a key "pos" representing the start and end positions of each Transaction.
Based on this, in the process of autonomously analyzing the binary content of the binary log file corresponding to the database, according to the keywords representing the distribution condition of each transaction, the distribution information of each transaction corresponding to the binary log file is autonomously obtained.
Step S3: and according to a preset large transaction distinguishing rule, autonomously distinguishing the distribution information of the large transactions from the distribution information of each transaction so as to optimize the processing of the large transactions in the database based on the distribution information of the large transactions.
Specifically, a large transaction differentiation rule is set in advance, so as to provide conditions required for each transaction to become a large transaction, and to differentiate the transaction as the large transaction from each transaction. Based on the method and the device, the distribution information of the large transactions is autonomously distinguished from the distribution information of each transaction according to the preset large transaction distinguishing rule, so that the processing of the large transactions in the database is optimized based on the distribution information of the large transactions, and the performance of the database is improved.
The invention provides a method for analyzing transaction distribution in database logs, which comprises the steps of obtaining a binary log file corresponding to a database, and autonomously analyzing binary contents of the binary log file to obtain an analysis text of the binary log file; in the process of analyzing the binary content, according to the keywords representing the distribution condition of each transaction, the distribution information of each transaction corresponding to the binary log file is automatically obtained; and according to a preset large transaction distinguishing rule, autonomously distinguishing the distribution information of the large transactions from the distribution information of each transaction so as to optimize the processing of the large transactions in the database based on the distribution information of the large transactions. Therefore, the method and the device can automatically analyze the binary log file corresponding to the database and automatically analyze the distribution information of the large affairs in the log file, thereby improving the log analysis efficiency.
On the basis of the above-described embodiment:
as an alternative embodiment, the process of obtaining the binary log file corresponding to the database includes:
monitoring whether an index file of a binary log file corresponding to the database is changed;
if yes, triggering the backup of the local binary log file corresponding to the index file before the change occurs, transmitting the backup binary log file to a remote end so as to obtain the binary log file corresponding to the database at the remote end, and then executing the operation of analyzing the transaction distribution in the obtained binary log file at the remote end.
Specifically, the size of the binary log file corresponding to the database is limited, and when a binary log file is full, a new binary log file is generated. Based on the difference of the index files of different binary log files, whether a new binary log file is generated or not can be judged by monitoring whether the index file of the binary log file corresponding to the database is changed or not. If a new binary log file is generated, the last generated binary log file can be analyzed to analyze the transaction distribution in the binary log file.
Based on this, the method monitors whether the index file of the binary log file corresponding to the database is changed, if the index file is changed, it indicates that a new binary log file is generated, backups the local binary log file corresponding to the index file before the change, and transmits the backed-up binary log file to the remote end, so as to obtain the binary log file corresponding to the database at the remote end, and perform the operation of analyzing the transaction distribution in the obtained binary log file at the remote end. If the index file is not changed, it indicates that a new binary log file is not generated, i.e. the current binary log file is not written, then waiting for the current binary log file to be written.
As an alternative embodiment, the process of transmitting the backed-up binary log file to a remote location includes:
transmitting the backup binary log file to a remote end by taking a data block as a unit;
and the method for analyzing the transaction distribution in the database log further comprises the following steps:
calculating a first hash value of a target data block which is not transmitted to a remote end according to a hash algorithm; wherein, the target data block is any data block;
after the target data block is transmitted to the far end, calculating a second hash value of the target data block received by the far end according to a hash algorithm;
after the last data block corresponding to the backup binary log file is transmitted to a remote end, judging whether a first hash value of a target data block is consistent with a second hash value of the target data block;
if all the data blocks are uniform, determining that the backup binary log file is not damaged in the transmission process; otherwise, the backup binary log file is determined to be damaged in the transmission process.
Specifically, the backup binary log file is transmitted to a remote end by taking a data block as a unit. In order to ensure that the binary log file transmitted to the remote end is not damaged in the transmission process, the method and the device calculate the first hash value of the target data block which is not transmitted to the remote end according to the hash algorithm, and calculate the second hash value of the target data block received by the remote end according to the hash algorithm after the target data block is transmitted to the remote end. After the last data block corresponding to the backup binary log file is transmitted to a remote end, judging whether the first hash values of all the data blocks corresponding to the backup binary log file are consistent with the respective second hash values; if the first hash values of all the data blocks are consistent with the respective second hash values, determining that the backed-up binary log file is not damaged in the transmission process, namely the backed-up binary log file is complete, and allowing the remote execution of the operation of analyzing the transaction distribution in the acquired binary log file; otherwise, it is determined that the backed-up binary log file is damaged in the transmission process, and the operation of analyzing the transaction distribution in the acquired binary log file is not allowed to be executed at the remote end.
In summary, the present application sets three working threads/processes: monitoring thread/process, backup thread/process and inspection thread/process; the monitoring thread/process is used for monitoring whether an index file of a binary log file corresponding to the database is changed or not, specifically, an index file name of the binary log file is added into a monitoring queue, and once the index file name of the binary log file is changed, a signal is sent out to trigger a backup thread/process; calling a backup thread/process, wherein the backup thread/process is used for backing up a local binary log file corresponding to the index file before the change occurs, and then transmitting the backed-up binary log file to a remote end by utilizing an nc tool (a network tool for network functions such as file transmission and the like); and calling the backup thread/process and calling a checking thread/process simultaneously, wherein the checking thread/process is used for checking the integrity of the transmitted binary log file.
As an optional embodiment, the process of autonomously analyzing the binary content of the binary log file to obtain the analysis text of the binary log file includes:
decomposing the binary log file into a plurality of binary files, and converting each binary file into a readable binary file so as to read binary contents of each readable binary file;
translating the binary content of each readable binary file according to the corresponding relation of the binary text to obtain a translation text of each readable binary file;
and merging the translation texts of the readable binary files to obtain the analysis text of the binary log file.
Specifically, the binary log file to be analyzed (called a target binary log file) is decomposed into a plurality of binary files, so that subsequent multi-process processing is facilitated, and efficiency is improved. Considering that the original binary log file is unreadable, the method needs to convert each binary file obtained by decomposition into a readable binary file, then open each readable binary file by using fopen () function, and read the binary content of each readable binary file in the form of file stream by using fread () function so as to analyze the binary content of each readable binary file.
Considering that the binary content of the binary log file corresponding to the database has a certain corresponding relationship (binary text corresponding relationship for short) with the text content readable and understandable by the user, the application can translate the binary content of each readable binary file according to the binary text corresponding relationship to obtain the translated text of each readable binary file, and then merge the translated texts of each readable binary file to obtain the analysis text of the target binary log file.
As an optional embodiment, the process of autonomously obtaining distribution information of each transaction corresponding to the binary log file according to the keyword that represents the distribution condition of each transaction includes:
correspondingly calculating the size of each transaction according to the key word representing the beginning of each transaction and the key word representing the end of each transaction;
correspondingly calculating the starting time and the ending time of each transaction according to the keywords representing the starting time of each transaction and the keywords representing the ending time of each transaction;
calculating the total number of the transactions according to the GTID number representing the id information of each transaction;
correspondingly calculating the starting position and the ending position of each transaction according to the keywords representing the starting positions of the transactions and the keywords representing the ending positions of the transactions;
and summarizing the information obtained by calculating each transaction to obtain the distribution information of each transaction.
Specifically, the content parsed from the binary log file corresponding to the database includes: the key "begin" representing the beginning of each transaction, the key "commit" representing the end of each transaction, the key "timetag" representing the beginning and ending times of each transaction, the GTID number representing the id information of each transaction, and the key "pos" representing the beginning and ending positions of each transaction.
Based on this, begin with begin, commit finishes as a complete transaction in this application, calculate the size of the transaction; calculating the start time and the end time of the transaction by using the timestamp; calculating the total number of the transactions by taking the GTID number as the id of the transaction; and calculating the starting position and the ending position of the transaction by pos, and finally summarizing the information obtained by calculating each transaction, wherein the summarized information is used as the distribution information of each transaction.
In addition, the distribution information of each transaction can be recorded in a log file (txt plain text file), the log content of the log file is converted into a json (JavaScript Object Notation) format supported by a monitoring system by using a monitoring acquisition node, and then the data is pushed to a data analysis node of the monitoring system, so that the data is stored in a local disk by the data analysis node according to a time sequence for subsequent query and analysis (for example, the distribution information of each transaction is obtained by query, and the distribution information of a large transaction is obtained by analysis according to a preset large transaction distinguishing rule).
In addition, the method can establish an alarm rule and an alarm mode: after the distribution information of the large affairs is analyzed and obtained from the distribution information of each affair, the distribution information of the large affairs is sent to the user in a mail or short message mode.
As an alternative embodiment, the method for resolving the transaction distribution in the database log further includes:
extracting SQL contents corresponding to the large transaction from the analysis text according to the starting position and the ending position of the large transaction;
and visually displaying the total number of the transactions, the total number of the large transactions, the distribution information of the large transactions, the SQL contents corresponding to the distribution information and the optimization processing suggestions of the large transactions.
Furthermore, the analysis result of the binary log file corresponding to the database can be visually displayed. Specifically, the visualization presentation is mainly divided into three parts:
one part is summary information: total number of transactions, total number of large transactions, distribution information of large transactions (execution time and start, end position, etc.).
The other part is detailed information: the specific SQL (data processing) content included in the large transaction can be extracted from the parsing text of the binary log file by using the start position and the end position of the large transaction. More specifically, the SQL content corresponding to the large transaction can be displayed according to the execution time length sequence of the SQL. It should be noted that, the displayed SQL content of the same type is subjected to deduplication: there will be a finger tag for SQL that appears in the same transaction, and the same finger will only output one SQL and the number of times it appears in the transaction.
The last part is optimization suggestion and processing suggestion of a large transaction, which is mainly divided into the following cases:
1) when the number of SQL contained in a transaction is too large, a plurality of SQL is split and executed. For example, over 1000 SQL in a transaction, it is proposed to split into every 100 uncommitted executions.
2) During SQL execution, if the database load is high, whether the database load is high due to large-transaction operation or high due to slow query or high concurrency of the database itself is checked, and if the database load is high due to slow query or high concurrency of the database itself, it is recommended to use redis (a cache database) to assist the database processing service or split part of the database service for processing by other databases.
3) SQL is always in a suspended state due to the lock contention problem caused by concurrency. It may be caused by SQL index missing, and it is suggested to give reasonable index, or it is suggested to turn down the lock release time, speed up the release of the program, and initiate again by using the retry mechanism of the program.
As an alternative embodiment, the process of setting the preset large transaction distinguishing rule includes:
acquiring the average CPU utilization rate of a server for monitoring the database, QPS information of the database and the average master-slave delay time of the database;
judging whether the average utilization rate of the CPU exceeds a preset utilization rate threshold, judging whether QPS information exceeds a preset number threshold, and judging whether the average master-slave delay time exceeds a preset time threshold;
if the average utilization rate of the CPU exceeds a preset utilization rate threshold value and/or QPS information exceeds a preset number threshold value and/or the average master-slave delay time exceeds a preset time threshold value, taking the transaction with the transaction size larger than a preset first threshold value and the execution time exceeding the preset first time threshold value as a large transaction; otherwise, taking the transaction with the transaction size larger than the preset second threshold and the execution time exceeding the preset second time threshold as a large transaction; wherein the preset first threshold is smaller than the preset second threshold; the preset first time threshold is smaller than the preset second time threshold.
Specifically, the database has three factors affecting the large transaction threshold: the average CPU (Central processing Unit) utilization of a server for monitoring the database, QPS (transaction number committed per second) information of the database, and the average master-slave delay time of the database. There are three conditions: the average utilization rate of a CPU of a server for monitoring a database exceeds a preset utilization rate threshold (e.g., 50%), QPS information of the database exceeds a preset number threshold (e.g., 5000), and the average master-slave delay time of the database exceeds a preset time threshold (e.g., 10s), and if any one of the conditions is met, the large transaction threshold corresponding to the database is: transactions having a transaction size greater than a first threshold (e.g., 300k) and an execution time exceeding a first time threshold (e.g., 150s) are treated as large transactions (the execution time of a transaction is equal to the end time of the transaction minus its start time). If the three conditions are not met, the large transaction threshold corresponding to the database is as follows: transactions having a transaction size greater than a second threshold (e.g., 600k) and an execution time exceeding a second time threshold (e.g., 300s) are treated as large transactions.
As an alternative embodiment, the process of obtaining the average CPU utilization of the server for monitoring the database, the QPS information of the database, and the average master-slave delay time of the database includes:
calculating the average CPU utilization rate of the servers for monitoring the database according to busy ═ (CPU1+ CPU2+ … CPUn)/count 1; wherein busy is the average utilization rate of the CPU; CPUn is the CPU utilization rate of the server acquired at the nth time; count1 is the total collection times of the CPU utilization rate;
calculating QPS information of the database according to (com _ delete + com _ insert + com _ update)/diff _ time; wherein QPS is the transaction number submitted by the database per second; com _ delete is the number of times per second that the database deletes data, and is an accumulated value; com insert is the number of data insertions per second of the database, which is a cumulative value; com _ update is the number of data updates per second of the database, and is an accumulated value; diff _ time is the time difference;
calculating the average master-slave delay time of the database according to the delay (second _ slave _ master1+ … + second _ slave _ master/count 2; wherein, second _ before _ master n is the master-slave delay time of the database collected at the nth time; count2 is the total number of acquisitions for the master slave delay time.
Specifically, the calculation relationship of the average CPU utilization of the server for monitoring the database is (referred to herein as the average CPU utilization over a certain period of time): busy ═ (CPU1+ CPU2+ … CPUn)/count 1; the CPU1 and the CPU2 respectively correspond to the CPU utilization at two time points, and count1 is the total collection times of the CPU utilization.
The calculation relation of the QPS information of the database is as follows: QPS ═ com _ delete + com _ insert + com _ update)/diff _ time; where com _ delete, com _ insert, and com _ update are all an accumulated value, and diff _ time is a time difference. For example, the number of data deletions of the first 10s of the database is com _ delete1, the number of data insertions of the first 10s of the database is com _ insert1, and the number of data updates of the first 10s of the database is com _ update1, so that the QPS of the first 10s of the database is (com _ delete1+ com _ insert1+ com _ update1)/10 s. The number of data deletions of the first 20s of the database is com _ delete2, the number of data insertions of the first 20s of the database is com _ insert2, and the number of data updates of the first 20s of the database is com _ update2, so that the QPS of the first 20s of the database is (com _ delete2+ com _ insert2+ com _ update2)/20 s. Then the QPS between the first 10s-20s of the database is [ (com _ delete 2-com _ delete1) + (com _ insert 2-com _ insert1) + (com _ update 2-com _ update1) ]/10 s.
The calculation relationship of the average master-slave delay time of the database is (the average master-slave delay time of a certain period of time is referred to herein): delay (second _ before _ master1+ … + second _ before _ master n)/count 2; the second _ before _ master1 and the second _ before _ master 2 respectively correspond to master-slave delay times of two time points, and the count2 is the total acquisition times of the master-slave delay times.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a system for analyzing transaction distribution in a database log according to an embodiment of the present invention.
The system for analyzing the transaction distribution in the database log comprises the following steps:
the analysis module 1 is used for acquiring a binary log file corresponding to the database, and performing autonomous analysis on binary contents of the binary log file to obtain an analysis text of the binary log file;
the distribution module 2 is used for autonomously obtaining the distribution information of each transaction corresponding to the binary log file according to the keywords representing the distribution condition of each transaction in the process of analyzing the binary content;
and the distinguishing module 3 is used for autonomously distinguishing the distribution information of the large transactions from the distribution information of each transaction according to a preset large transaction distinguishing rule so as to optimize the processing of the large transactions in the database based on the distribution information of the large transactions.
For introduction of the analysis system provided in the present application, reference is made to the embodiments of the analysis method, which are not repeated herein.
The present application further provides a device for analyzing transaction distribution in database logs, including:
a storage module for storing a computer program;
and the processing module is used for executing the computer program to realize the steps of any method for analyzing the transaction distribution in the database log.
For the introduction of the analysis apparatus provided in the present application, reference is made to the embodiments of the analysis method, which are not repeated herein.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for resolving transaction distributions in database logs, comprising:
acquiring a binary log file corresponding to a database, and performing autonomous analysis on binary contents of the binary log file to obtain an analysis text of the binary log file;
in the process of analyzing the binary content, according to the keywords representing the distribution condition of each transaction, the distribution information of each transaction corresponding to the binary log file is automatically obtained;
and according to a preset large transaction distinguishing rule, autonomously distinguishing the distribution information of the large transactions from the distribution information of each transaction so as to optimize the processing of the large transactions in the database based on the distribution information of the large transactions.
2. The method of resolving transaction distributions in database logs as recited in claim 1, wherein the process of obtaining a binary log file corresponding to a database comprises:
monitoring whether an index file of a binary log file corresponding to the database is changed;
if yes, triggering the backup of the local binary log file corresponding to the index file before the change occurs, transmitting the backup binary log file to a remote end so as to obtain the binary log file corresponding to the database at the remote end, and then executing the operation of analyzing the transaction distribution in the obtained binary log file at the remote end.
3. The method of resolving transaction distributions in database logs as recited in claim 2, wherein the process of transmitting the backed up binary log file to a remote location comprises:
transmitting the backup binary log file to a remote end by taking a data block as a unit;
and the method for analyzing the transaction distribution in the database log further comprises the following steps:
calculating a first hash value of a target data block which is not transmitted to a remote end according to a hash algorithm; wherein the target data block is any data block;
after the target data block is transmitted to a remote end, calculating a second hash value of the target data block received by the remote end according to a hash algorithm;
after the last data block corresponding to the backup binary log file is transmitted to a remote end, judging whether the first hash value of the target data block is consistent with the second hash value of the target data block;
if all the data blocks are uniform, determining that the backup binary log file is not damaged in the transmission process; otherwise, the backup binary log file is determined to be damaged in the transmission process.
4. The method of resolving transaction distributions in database logs as claimed in claim 1, wherein the process of autonomously resolving the binary contents of the binary log file to obtain the resolved text of the binary log file comprises:
decomposing the binary log file into a plurality of binary files, and converting each binary file into a readable binary file so as to read binary contents of each readable binary file;
translating the binary content of each readable binary file according to the corresponding relation of the binary text to obtain a translation text of each readable binary file;
and merging the translation texts of the readable binary files to obtain the analysis text of the binary log file.
5. The method according to claim 1, wherein the process of autonomously obtaining the distribution information of each transaction corresponding to the binary log file according to the keyword representing the distribution condition of each transaction comprises:
correspondingly calculating the size of each transaction according to the key word representing the beginning of each transaction and the key word representing the end of each transaction;
correspondingly calculating the starting time and the ending time of each transaction according to the keywords representing the starting time of each transaction and the keywords representing the ending time of each transaction;
calculating the total number of the transactions according to the GTID number representing the id information of each transaction;
correspondingly calculating the starting position and the ending position of each transaction according to the keywords representing the starting positions of the transactions and the keywords representing the ending positions of the transactions;
and summarizing the information obtained by calculating each transaction to obtain the distribution information of each transaction.
6. The method of resolving a transaction distribution in a database log of claim 5, wherein the method of resolving a transaction distribution in a database log further comprises:
extracting SQL contents corresponding to the large transaction from the analysis text according to the starting position and the ending position of the large transaction;
and visually displaying the total number of the transactions, the total number of the large transactions, the distribution information of the large transactions, the SQL contents corresponding to the distribution information and the optimization processing suggestions of the large transactions.
7. The method for resolving transaction distributions in database logs according to any one of claims 1 to 6, wherein the process of setting the big transaction differentiation rule in advance comprises:
acquiring the average CPU utilization rate of a server for monitoring the database, QPS information of the database and the average master-slave delay time of the database;
judging whether the average CPU utilization rate exceeds a preset utilization rate threshold, whether the QPS information exceeds a preset number threshold and whether the average master-slave delay time exceeds a preset time threshold;
if the average CPU utilization rate exceeds a preset utilization rate threshold value and/or the QPS information exceeds a preset number threshold value and/or the average master-slave delay time exceeds a preset time threshold value, taking the transaction with the transaction size larger than a preset first threshold value and the execution time exceeding the preset first time threshold value as a large transaction; otherwise, taking the transaction with the transaction size larger than the preset second threshold and the execution time exceeding the preset second time threshold as a large transaction; wherein the preset first threshold is smaller than the preset second threshold; the preset first time threshold is smaller than the preset second time threshold.
8. The method of resolving transaction distributions in database logs according to claim 7, wherein the process of obtaining the average CPU utilization of the server for monitoring the database, the QPS information of the database, and the average master slave latency of the database comprises:
calculating the average CPU utilization rate of the servers for monitoring the database according to busy ═ (CPU1+ CPU2+ … CPUn)/count 1; wherein busy is the average utilization rate of the CPU; the CPUn is the CPU utilization rate of the server acquired at the nth time; count1 is the total collection times of the CPU utilization rate;
calculating QPS information of the database according to (com _ delete + com _ insert + com _ update)/diff _ time; wherein QPS is the transaction number submitted by the database per second; com _ delete is the number of times per second that the database deletes data, and is an accumulated value; com insert is the number of data insertions per second of the database, which is a cumulative value; com _ update is the number of data updates per second of the database, and is an accumulated value; diff _ time is the time difference;
calculating the average master-slave delay time of the database according to the delay (second _ slave _ master1+ … + second _ slave _ master/count 2; wherein, second _ before _ master n is the master-slave delay time of the database collected at the nth time; count2 is the total number of acquisitions for the master slave delay time.
9. A system for resolving transaction distributions in database logs, comprising:
the analysis module is used for acquiring a binary log file corresponding to a database and autonomously analyzing the binary content of the binary log file to obtain an analysis text of the binary log file;
the distribution module is used for autonomously obtaining the distribution information of each transaction corresponding to the binary log file according to the keywords representing the distribution condition of each transaction in the process of analyzing the binary content;
and the distinguishing module is used for autonomously distinguishing the distribution information of the large transactions from the distribution information of each transaction according to a preset large transaction distinguishing rule so as to optimize the processing of the large transactions in the database based on the distribution information of the large transactions.
10. An apparatus for resolving transaction distributions in a database log, comprising:
a storage module for storing a computer program;
processing means for executing the computer program for implementing the steps of the method for resolving a transaction distribution in a database log according to any of claims 1-8.
CN202110430915.6A 2021-04-21 2021-04-21 Method, system and device for analyzing transaction distribution in database log Active CN113127427B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110430915.6A CN113127427B (en) 2021-04-21 2021-04-21 Method, system and device for analyzing transaction distribution in database log

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110430915.6A CN113127427B (en) 2021-04-21 2021-04-21 Method, system and device for analyzing transaction distribution in database log

Publications (2)

Publication Number Publication Date
CN113127427A true CN113127427A (en) 2021-07-16
CN113127427B CN113127427B (en) 2022-08-02

Family

ID=76778599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110430915.6A Active CN113127427B (en) 2021-04-21 2021-04-21 Method, system and device for analyzing transaction distribution in database log

Country Status (1)

Country Link
CN (1) CN113127427B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016078423A1 (en) * 2014-11-17 2016-05-26 中兴通讯股份有限公司 Transaction processing method and apparatus for distributed database system
CN106776136A (en) * 2016-12-12 2017-05-31 网易(杭州)网络有限公司 Data base processing method and device
WO2017094263A1 (en) * 2015-12-04 2017-06-08 日本電気株式会社 Log analysis system, method, and program
CN109101627A (en) * 2018-08-14 2018-12-28 交通银行股份有限公司 heterogeneous database synchronization method and device
CN110278102A (en) * 2018-03-15 2019-09-24 勤智数码科技股份有限公司 A kind of IT automation operational system and method
CN110990182A (en) * 2019-12-03 2020-04-10 腾讯科技(深圳)有限公司 Transaction processing method, device, equipment and storage medium
CN112084161A (en) * 2020-09-16 2020-12-15 腾讯科技(深圳)有限公司 Database-based data processing method and device and readable storage medium
CN112612859A (en) * 2020-12-31 2021-04-06 上海英方软件股份有限公司 DDL analysis method and device based on log analysis

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016078423A1 (en) * 2014-11-17 2016-05-26 中兴通讯股份有限公司 Transaction processing method and apparatus for distributed database system
WO2017094263A1 (en) * 2015-12-04 2017-06-08 日本電気株式会社 Log analysis system, method, and program
CN106776136A (en) * 2016-12-12 2017-05-31 网易(杭州)网络有限公司 Data base processing method and device
CN110278102A (en) * 2018-03-15 2019-09-24 勤智数码科技股份有限公司 A kind of IT automation operational system and method
CN109101627A (en) * 2018-08-14 2018-12-28 交通银行股份有限公司 heterogeneous database synchronization method and device
CN110990182A (en) * 2019-12-03 2020-04-10 腾讯科技(深圳)有限公司 Transaction processing method, device, equipment and storage medium
CN112084161A (en) * 2020-09-16 2020-12-15 腾讯科技(深圳)有限公司 Database-based data processing method and device and readable storage medium
CN112612859A (en) * 2020-12-31 2021-04-06 上海英方软件股份有限公司 DDL analysis method and device based on log analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
C大调A小调,CSDN: "mysql 大事务 binlog_如何快速找到MYSQL binlog中的大事物以及生成量分布(infobin工具)", 《HTTPS://BLOG.CSDN.NET/WEIXIN_29023079/ARTICLE/DETAILS/113133931》 *
姚远ACE,CSDN: "快速找到 binlog 中是否有大事务", 《HTTPS://BLOG.CSDN.NET/WEIXIN_43424368/ARTICLE/DETAILS/107279748》 *

Also Published As

Publication number Publication date
CN113127427B (en) 2022-08-02

Similar Documents

Publication Publication Date Title
CN109684352B (en) Data analysis system, data analysis method, storage medium, and electronic device
US10261862B2 (en) Data replication in a database management system
US11468062B2 (en) Order-independent multi-record hash generation and data filtering
US10262013B2 (en) Efficient full delete operations
US8234248B2 (en) Tracking changes to a business object
CN108959400B (en) Bank system historical data cleaning method and device
US7797286B2 (en) System and method for externally providing database optimizer statistics
US7673291B2 (en) Automatic database diagnostic monitor architecture
CN109213756B (en) Data storage method, data retrieval method, data storage device, data retrieval device, server and storage medium
CN110569214B (en) Index construction method and device for log file and electronic equipment
CN111400408A (en) Data synchronization method, device, equipment and storage medium
US20150286671A1 (en) Transaction system
CN109086382B (en) Data synchronization method, device, equipment and storage medium
CA2378582A1 (en) Real-time database object statistics collection
CN114625809A (en) Data synchronization method and device based on Binlog log, storage medium and electronic equipment
WO2015168988A1 (en) Data index creation method and device, and computer storage medium
CN110245037B (en) Hive user operation behavior restoration method based on logs
CN109446262B (en) Data aggregation method and device
US9043371B1 (en) Storing information in a trusted environment for use in processing data triggers in an untrusted environment
CN113127427B (en) Method, system and device for analyzing transaction distribution in database log
CN113094442A (en) Full data synchronization method, device, equipment and medium
CN113094154A (en) Big data processing method and system based on Aliyun
CN111382332B (en) Earthquake disaster information processing method and system
US11113311B2 (en) Technology agnostic system and method for achieving eventually-consistent data replication
CN114385188A (en) Code workload statistical method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant